OCR

Overview

OCR (Optical Character Recognition) is the process of converting images of typed, handwritten, or printed text into machine-encoded text.

OCR is commonly used for text recognition and extraction from the following types of documents:

  • Non-editable scanned PDF files.
  • Photographs of documents.
  • Scene photos such as advertising layouts, signboards, etc.
  • Identification cards, passports, vehicle license plates, and other official plates.
  • Invoices, bills, receipts, and other financial documents.

The following features support OCR:

  • PDF to Word
  • PDF to Excel
  • PDF to PPT
  • PDF to HTML
  • PDF to RTF
  • PDF to TXT
  • PDF to CSV
  • PDF to JSON
  • PDF to Markdown

OCR Language Support of ComPDFKit Conversion SDK:

Script / Notice | Language (Native) | Language (In English)
--- | --- | ---
Latn; American | English | English
Latn; Canadian | Français canadien | French
Hans/Hant | 中文简体 | Chinese (Simplified)
Hans/Hant | 中文繁体 | Chinese (Traditional)
Jpan | 日本語 | Japanese
Kore | 한국어 | Korean
Latn | Deutsch | German
Latn | Српски (латиница) | Serbian (Latin)
Latn | Occitan, lenga d'òc, provençal | Occitan
Latn | Dansk | Danish
Latn | Italiano | Italian
Latn; European | Español | Spanish
Latn; European | Português (Portugal) | Portuguese
Latn | Te reo Māori | Maori
Latn | Bahasa Melayu | Malay
Latn | Malti | Maltese
Latn | Nederlands | Dutch
Latn; Bokmål | Norsk | Norwegian
Latn | Polski | Polish
Latn | Română | Romanian
Latn | Slovenčina | Slovak
Latn | Slovenščina | Slovenian
Latn | shqip | Albanian
Latn | Svenska | Swedish
Latn | Swahili | Swahili
Latn | Wikang Tagalog | Tagalog
Latn | Türkçe | Turkish
Latn | oʻzbekcha | Uzbek
Latn | Tiếng Việt | Vietnamese
Latn | Afrikaans | Afrikaans
Latn | Azərbaycan | Azerbaijani
Latn | Bosanski | Bosnian
Latn | Čeština | Czech
Latn | Cymraeg | Welsh
Latn | Eesti keel | Estonian
Latn | Gaeilge | Irish
Latn | Hrvatski | Croatian
Latn | Magyar | Hungarian
Latn | Bahasa Indonesia | Indonesian
Latn | Íslenska | Icelandic
Latn | Kurdî | Kurdish
Latn | Lietuvių | Lithuanian
Latn | Latviešu | Latvian

Converting Images to Other Document Formats

The OCR function can also convert input images to Word, Excel, PPT, HTML, CSV, RTF, TXT, JSON, and other formats. This sample demonstrates how to use the ComPDFKit OCR function to convert an image file to a DOCX file.

kotlin
// Set the AI model path and the OCR language before converting.
val modelPath = "***"
ConverterManager.setAIModel(modelPath, OCRLanguage.CHINESE)

// Supported input image formats: jpg, jpeg, png, bmp, tiff.
val inputFilePath = "***"
val password = "***"
val outputFileName = "***"

// Turn on OCR and AI layout, and keep images and annotations in the output.
val wordOptions = WordOptions()
wordOptions.containImages = true
wordOptions.containAnnotations = true
wordOptions.enableAiLayout = true
wordOptions.enableOcr = true

// The image file is passed to the same call that is used for PDF input.
val error = ComPDFKitConverter.startPDFToWord(inputFilePath, password, outputFileName, wordOptions)
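
Because only certain image formats are accepted, it can help to check the file extension before starting a conversion. The following is a minimal sketch in plain Kotlin, assuming the extension list from the comment in the sample above. `supportedImageExtensions` and `isSupportedImage` are hypothetical names for illustration and are not part of the ComPDFKit API.

```kotlin
import java.io.File

// Extensions taken from the sample's comment; adjust if the SDK documents others.
val supportedImageExtensions = setOf("jpg", "jpeg", "png", "bmp", "tiff")

// Hypothetical helper (not a ComPDFKit call): returns true when the path ends
// with one of the supported image extensions, ignoring case.
fun isSupportedImage(path: String): Boolean =
    File(path).extension.lowercase() in supportedImageExtensions
```

A caller could run `isSupportedImage(inputFilePath)` first and skip the conversion when it returns false. The check only looks at the file name, so it does not guarantee that the file content is a valid image.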

Notice

  • The quality of the OCR result depends on the quality of the input image, and low-resolution input degrades recognition. As a rule of thumb, the more pixels each character shape occupies, the better. If a character's bounding box is smaller than 20x20 pixels, OCR quality drops sharply. The ideal input is a grayscale image with a resolution of around 300 DPI.
  • When performing OCR, make sure the OCR language setting matches the language in the PDF document to achieve the best OCR conversion quality.
  • OCR functionality currently does not support operating systems earlier than Windows 10.
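
To relate the 20x20 pixel rule of thumb to scan resolution, the sketch below estimates how tall text of a given point size is in pixels at a given DPI, using the fact that one typographic point is 1/72 inch. The function names and the default threshold parameter are illustrative assumptions, not part of the SDK. The font size is the nominal (em) height, so actual character bounding boxes are usually somewhat smaller than this estimate.

```kotlin
import kotlin.math.roundToInt

// Approximate pixel height of text set at fontSizePt when rendered at dpi.
// 1 pt = 1/72 inch, so pixels = points / 72 * dpi.
fun glyphHeightPx(fontSizePt: Double, dpi: Int): Int =
    (fontSizePt / 72.0 * dpi).roundToInt()

// Hypothetical check against the 20-pixel rule of thumb from the notice above.
fun meetsMinimumGlyphSize(fontSizePt: Double, dpi: Int, minPx: Int = 20): Boolean =
    glyphHeightPx(fontSizePt, dpi) >= minPx
```

For example, 10 pt text at 300 DPI comes to about 42 pixels, which is comfortably above the threshold, while the same text at 96 DPI comes to about 13 pixels and falls below it.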

Sample

This sample demonstrates how to use the ComPDFKit OCR function to convert a PDF to a DOCX file.

kotlin
// Set the AI model path and the OCR language before converting.
val modelPath = "***"
ConverterManager.setAIModel(modelPath, OCRLanguage.CHINESE)

// Input PDF file, its password, and the output file name.
val inputFilePath = "***"
val password = "***"
val outputFileName = "***"

// Turn on OCR and AI layout, and keep images and annotations in the output.
val wordOptions = WordOptions()
wordOptions.containImages = true
wordOptions.containAnnotations = true
wordOptions.enableAiLayout = true
wordOptions.enableOcr = true

val error = ComPDFKitConverter.startPDFToWord(inputFilePath, password, outputFileName, wordOptions)