Section 5: Selecting the Document Language

Readiris converts scanned images, image files and PDF files into editable text documents and text-searchable PDF documents. In order for Readiris to recognize the text in your images, you need to select the correct recognition options.

By far the most important recognition option is the document language.

To select the document language:

Tip (Readiris Pro): if you want to recognize documents in multiple languages, select the language with the largest character set. For example, if a document contains both English and French text, select French as the document language. This way, accented characters are recognized correctly.

Recognizing Numeric Documents

When you process documents that contain only numbers and little or no text, it is recommended to select the Numeric option:

When this option is selected, Readiris recognizes only the digits 0-9 and the following symbols:

+   plus sign
*   asterisk
/   slash
%   percent sign
,   comma
.   period
(   opening parenthesis
)   closing parenthesis
-   hyphen
=   equals sign
$   dollar sign
£   pound sign
€   euro sign
¥   yen sign
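If you want to check in advance whether a sample of text fits the Numeric option, the following minimal Python sketch encodes the digits and symbols listed above as a set. The function names are hypothetical illustrations and are not part of Readiris; the sketch only checks which characters occur and says nothing about actual recognition quality.

```python
# Hypothetical helper (not a Readiris API): the character set accepted by
# the Numeric option, taken from the list of digits and symbols above.
NUMERIC_SYMBOLS = set("+*/%,.()-=$£€¥")
NUMERIC_CHARSET = set("0123456789") | NUMERIC_SYMBOLS


def fits_numeric_mode(text: str) -> bool:
    """Return True if every non-whitespace character is in the Numeric set."""
    return all(ch in NUMERIC_CHARSET for ch in text if not ch.isspace())


def characters_outside_numeric(text: str) -> set[str]:
    """Return the non-whitespace characters Numeric mode would not recognize."""
    return {ch for ch in text if not ch.isspace() and ch not in NUMERIC_CHARSET}


if __name__ == "__main__":
    print(fits_numeric_mode("(12.5 + 3) * 2 = 31%"))          # True
    print(characters_outside_numeric("Total: 1,250.00 €"))    # letters and ':'
```

A text such as "Total: 1,250.00 €" fails the check because of the word "Total" and the colon; if your documents regularly contain labels like this, the Numeric option is probably not the right choice.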

Recognizing Western words in non-Latin alphabets

When you process Cyrillic, Slavic, Greek or Asian documents that also contain "Western" words written in the Latin alphabet, such as proper nouns, it is recommended to select one of the available language pairs.

Each language pair combines a language with English. Language pairs are available for Russian, Byelorussian, Ukrainian, Serbian, Macedonian, Bulgarian and Greek.

Note: when processing Asian or Hebrew documents, mixed character sets are used automatically.

To select a Language pair:

Selecting the language per page

When specific pages are in a different language from the rest of the document, you don't need to define a secondary language. Instead, you can assign a different language to those pages.

Select the pages in the Pages panel, Ctrl-click them and use the Language command to assign them a language other than the overall document language.

Pages whose language differs from the overall document language are marked in red in the Pages panel.

Unlike secondary languages, per-page languages are not subject to any limitations.

Note: the tooltip of each page in the Pages panel indicates which language applies to that page.


Recognizing Secondary Languages inside a single document (Readiris Corporate only)

When your documents contain text in multiple languages, it is recommended to select a main recognition language combined with one or more secondary languages. You can select up to 4 secondary languages:

The list of available secondary languages depends on the selected primary language.

Note: do not select languages that do not occur in your documents; the larger the combined character set, the slower the recognition and the higher the risk of OCR errors.