OCR

Overview

OCR (Optical Character Recognition) is the process of converting images of typed, handwritten, or printed text into machine-encoded text.

OCR is commonly used for text recognition and extraction from the following types of documents:

Non-editable scanned PDF files.
Photographs of documents.
Scene photos such as advertising layouts, signboards, etc.
Identification cards, passports, vehicle license plates, and other official plates.
Invoices, bills, receipts, and other financial documents.

The following features support OCR:

PDF to Word
PDF to Excel
PDF to PPT
PDF to HTML
PDF to RTF
PDF to TXT
PDF to CSV
PDF to JSON
PDF to Markdown

OCR languages supported by the ComPDFKit Conversion SDK:

Script / Notice	Language (Native)	Language (In English)
Latn; American	English	English
Latn; Canadian	Français canadien	French
Hans/Hant	中文简体	Chinese (Simplified)
Hans/Hant	中文繁体	Chinese (Traditional)
Jpan	日本語	Japanese
Kore	한국어	Korean
Latn	Deutsch	German
Latn	Српски (латиница)	Serbian (Latin)
Latn	Occitan, lenga d'òc, provençal	Occitan
Latn	Dansk	Danish
Latn	Italiano	Italian
Latn; European	Español	Spanish
Latn; European	Português (Portugal)	Portuguese
Latn	Te reo Māori	Maori
Latn	Bahasa Melayu	Malay
Latn	Malti	Maltese
Latn	Nederlands	Dutch
Latn; Bokmål	Norsk	Norwegian
Latn	Polski	Polish
Latn	Română	Romanian
Latn	Slovenčina	Slovak
Latn	Slovenščina	Slovenian
Latn	shqip	Albanian
Latn	Svenska	Swedish
Latn	Swahili	Swahili
Latn	Wikang Tagalog	Tagalog
Latn	Türkçe	Turkish
Latn	oʻzbekcha	Uzbek
Latn	Tiếng Việt	Vietnamese
Latn	Afrikaans	Afrikaans
Latn	Azərbaycan	Azerbaijani
Latn	Bosanski	Bosnian
Latn	Čeština	Czech
Latn	Cymraeg	Welsh
Latn	Eesti keel	Estonian
Latn	Gaeilge	Irish
Latn	Hrvatski	Croatian
Latn	Magyar	Hungarian
Latn	Bahasa Indonesia	Indonesian
Latn	Íslenska	Icelandic
Latn	Kurdî	Kurdish
Latn	Lietuvių	Lithuanian
Latn	Latviešu	Latvian

Converting Images to Other Document Formats

The OCR function also supports converting input images to Word, Excel, PPT, HTML, CSV, RTF, TXT, JSON, and other formats. This sample demonstrates how to use the ComPDFKit OCR function to convert an image file to a DOCX file.

kotlin

// Load the OCR model and set the recognition language.
// The language should match the text in the input file.
val modelPath = "***"
ConverterManager.setAIModel(modelPath, OCRLanguage.CHINESE)

// Supported input image formats: jpg, jpeg, png, bmp, tiff.
val inputFilePath = "***"
val password = "***"
val outputFileName = "***"

// Enable OCR and AI layout analysis for the Word output.
val wordOptions = WordOptions()
wordOptions.containImages = true
wordOptions.containAnnotations = true
wordOptions.enableAiLayout = true
wordOptions.enableOcr = true

val error = ComPDFKitConverter.startPDFToWord(inputFilePath, password, outputFileName, wordOptions)
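
Before starting a conversion, it can help to confirm that the input is one of the image formats listed in the sample's comment (jpg, jpeg, png, bmp, tiff). The following is a minimal sketch of such a check. `isSupportedImage` and `supportedImageExtensions` are hypothetical names used for illustration and are not part of the ComPDFKit API. The check looks only at the file extension, not at the file contents.

```kotlin
import java.io.File

// Extensions taken from the comment in the sample above.
val supportedImageExtensions = setOf("jpg", "jpeg", "png", "bmp", "tiff")

// Hypothetical helper (not a ComPDFKit call): returns true when the
// file extension, compared case-insensitively, is in the supported set.
fun isSupportedImage(path: String): Boolean =
    File(path).extension.lowercase() in supportedImageExtensions

fun main() {
    println(isSupportedImage("scan.PNG"))   // true
    println(isSupportedImage("report.pdf")) // false
}
```

A check like this would run before the conversion call, so that unsupported inputs are rejected early with a clear message.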

Notice

The quality of the OCR result depends on the quality of the input image. A low-resolution input image reduces OCR quality. As a rule of thumb, the more pixels each character shape has, the better. If a character's bounding box is smaller than 20x20 pixels, OCR quality drops sharply. The ideal input is a grayscale image with a resolution of around 300 DPI.
When performing OCR, make sure the OCR language setting matches the language of the source document to achieve the best conversion quality.
OCR functionality currently does not support Windows versions earlier than Windows 10.
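
The 20x20 pixel and 300 DPI guidance above can be related to font size with simple arithmetic: one point is 1/72 inch, so text of a given point size occupies roughly `fontSizePt / 72 * dpi` pixels in height. The following is a rough sketch under that assumption. It treats the nominal font size as the height of the character box, which overestimates the size of most glyphs, and the helper names are hypothetical rather than SDK functions.

```kotlin
import kotlin.math.roundToInt

const val POINTS_PER_INCH = 72.0
const val MIN_CHAR_BOX_PX = 20 // threshold from the notice above

// Approximate pixel height of text of a given point size at a scan resolution.
fun glyphHeightPx(fontSizePt: Double, dpi: Int): Int =
    (fontSizePt / POINTS_PER_INCH * dpi).roundToInt()

// Hypothetical check: is the estimated character height at least 20 pixels?
fun isLargeEnoughForOcr(fontSizePt: Double, dpi: Int): Boolean =
    glyphHeightPx(fontSizePt, dpi) >= MIN_CHAR_BOX_PX

fun main() {
    println(glyphHeightPx(10.0, 300))       // 42
    println(isLargeEnoughForOcr(10.0, 300)) // true
    println(isLargeEnoughForOcr(10.0, 96))  // false
}
```

Under these assumptions, 10 pt text scanned at 300 DPI is comfortably above the threshold, while the same text at 96 DPI is only about 13 pixels tall and is likely to be recognized poorly.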

Sample

This sample demonstrates how to use the ComPDFKit OCR function to convert a PDF to a DOCX file.

kotlin

// Load the OCR model and set the recognition language.
// The language should match the text in the PDF.
val modelPath = "***"
ConverterManager.setAIModel(modelPath, OCRLanguage.CHINESE)

// Input PDF path. The same call also accepts jpg, jpeg, png, bmp, and tiff images.
val inputFilePath = "***"
val password = "***"
val outputFileName = "***"

// Enable OCR and AI layout analysis for the Word output.
val wordOptions = WordOptions()
wordOptions.containImages = true
wordOptions.containAnnotations = true
wordOptions.enableAiLayout = true
wordOptions.enableOcr = true

val error = ComPDFKitConverter.startPDFToWord(inputFilePath, password, outputFileName, wordOptions)
