  • 19 January 2011

If you receive PDF documents for translation, you might need to prepare them before you can work on them in memoQ or any other CAT tool. The following description explains how to prepare PDF files for translation.

What is a PDF?

PDF, Adobe's Portable Document Format, has become a de facto standard for formatted documents. PDF documents appear with the same layout on every computer, no matter what your local settings are. Many companies simply send out PDFs for translation. However, for translators, PDF is not an ideal format.
No document is created in PDF in the first place. Every PDF has an underlying source: it is either exported from an application such as Microsoft Word, Apple Pages, Adobe InDesign, or Adobe FrameMaker (regular PDF), or it is a scanned document (scanned PDF).


If you are working with scanned PDFs, memoQ cannot help you turn the scan into text. You need an optical character recognition (OCR) tool such as Nuance PDF Converter or ABBYY PDF Transformer, which turns your PDF into a Word document. These tools can also be useful when you are working with regular PDFs and memoQ’s built-in converter does not produce the desired result. Infix is also a promising solution for regular PDFs. Different solutions work best for different languages and different content types, so it is worth trying each of them to find out which works best for you.

There are several solutions on the market to perform PDF to Word conversion. The best-known desktop applications include Nuance PDF Converter and ABBYY PDF Transformer. There are also web-based services such as OCR Terminal or PDF converter.


Translating PDF files

If you are working with a regular PDF, ask your customer, if possible, to send you the original document that the PDF was created from. If you can get it, you save yourself a lot of trouble. However, sometimes you simply cannot get it. In that case, memoQ has a built-in PDF to DOCX converter that usually delivers good results. Be prepared, however, to fix the target file in Word.

Converted PDF files may contain many tags and lots of unnecessary line breaks. If you are not happy with memoQ’s built-in converter, please try the OCR tools mentioned in the previous section. If you are still not happy with the result, Dave Turner's excellent CodeZapper macros allow you to process the files and remove even more tags.
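If the converted text ends up as a plain text file outside memoQ, the unnecessary line breaks can also be removed with a small script. The following is only a minimal Python sketch, not a memoQ feature or part of CodeZapper. The function name unwrap_hard_breaks is hypothetical, and the sketch assumes that blank lines mark real paragraph boundaries while every other line break is unwanted.

```python
import re


def unwrap_hard_breaks(text: str) -> str:
    """Join the lines inside each paragraph into a single line.

    Assumption: paragraphs are separated by at least one blank line;
    any other line break is treated as an unwanted hard break.
    """
    # Split the text wherever one or more blank lines occur.
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for block in blocks:
        # Collapse every run of whitespace (including newlines) to one space.
        joined = " ".join(block.split())
        if joined:
            paragraphs.append(joined)
    # Keep one blank line between paragraphs.
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    sample = (
        "PDF documents appear with\n"
        "the same layout on\n"
        "every computer.\n"
        "\n"
        "Many companies send\n"
        "PDFs for translation.\n"
    )
    print(unwrap_hard_breaks(sample))
```

The sketch does not rejoin words that were hyphenated at the end of a line, because it cannot tell a line-end hyphen from a real one. Check the result before importing it.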



Aligning PDF files and using monolingual PDF materials

If formatting is not needed, for example when you are aligning translations or using monolingual PDF material, it may be worth using memoQ’s other built-in PDF filter, which turns regular PDFs into plain text. This filter does not produce tags and is more tolerant of line breaks. In LiveDocs, for example, select Add alignment pair, then click Add source documents as. Click Change filter and configuration, and under General, select Import by converting to Plain Text. Repeat the same steps for the target documents. You can do the same under Import document.

