OCR

Overview

OCR (Optical Character Recognition) is the process of converting images of typed, handwritten, or printed text into machine-encoded text.

OCR is commonly used for text recognition and extraction from scanned PDF files, photographs of documents, scene photos, invoices, receipts, and other image-based documents.

The following features support OCR:

PDF to Word
PDF to Excel
PDF to PowerPoint
PDF to HTML
PDF to RTF
PDF to TXT
PDF to Searchable PDF
PDF to OFD
Extract PDF to JSON
Extract PDF to Markdown

Set OCR Language

Use the languages option to specify which languages OCR should recognize. The value is an array of numeric OCR language constants.

const OCRLanguage = {
  CHINESE: 1,
  ENGLISH: 3,
  AUTO: 16
};

const options = {
  enableOcr: true,
  languages: [OCRLanguage.ENGLISH, OCRLanguage.CHINESE]
};
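Because languages takes raw numbers, a small lookup can make call sites easier to read. The following is a minimal sketch: toLanguageCodes is a hypothetical helper and not part of the SDK, and the constant values are copied from the OCRLanguage object above.

```javascript
// Values copied from the OCRLanguage constants shown above.
const OCRLanguage = { CHINESE: 1, ENGLISH: 3, AUTO: 16 };

// Hypothetical helper (not part of the SDK): map language names to the
// numeric constants expected by the `languages` option.
function toLanguageCodes(names) {
  return names.map((name) => {
    const key = String(name).toUpperCase();
    if (!Object.prototype.hasOwnProperty.call(OCRLanguage, key)) {
      throw new Error(`Unknown OCR language: ${name}`);
    }
    return OCRLanguage[key];
  });
}

const languageOptions = {
  enableOcr: true,
  languages: toLanguageCodes(["english", "chinese"])
};
```

Rejecting unknown names early keeps an undefined entry from ending up in the languages array.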

OCR Options

Use ocrOption to control the scope of OCR processing. Set it to one of the numeric OCROption constants.

const OCROption = {
  INVALID_CHARACTER: 0,
  SCAN_PAGE: 1,
  INVALID_CHARACTER_AND_SCAN_PAGE: 2,
  ALL: 3
};

options.ocrOption = OCROption.ALL;
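When logging or validating configuration, it can help to translate a numeric ocrOption back to its constant name. This is a sketch only: ocrOptionName is a hypothetical helper, not an SDK function, and the values are copied from the OCROption object above.

```javascript
// Values copied from the OCROption constants shown above.
const OCROption = {
  INVALID_CHARACTER: 0,
  SCAN_PAGE: 1,
  INVALID_CHARACTER_AND_SCAN_PAGE: 2,
  ALL: 3
};

// Hypothetical helper (not part of the SDK): return the constant name for a
// numeric value, or throw if the value is not a known OCROption.
function ocrOptionName(value) {
  const entry = Object.entries(OCROption).find(([, v]) => v === value);
  if (!entry) {
    throw new RangeError(`Unknown ocrOption value: ${value}`);
  }
  return entry[0];
}

const scopeName = ocrOptionName(OCROption.ALL);
```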

Preserve Page Background

When OCR is enabled, use containPageBackgroundImage to control whether page background images are preserved.

options.containPageBackgroundImage = true;
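The three settings above can be assembled in one place. The sketch below uses only the option names shown in this section; buildOcrOptions and its default values (English, ALL, background not preserved) are assumptions made for illustration and are not SDK defaults.

```javascript
// Hypothetical helper (not part of the SDK): build an options object that
// uses the field names documented above. The defaults are illustrative.
function buildOcrOptions({ languages = [3], ocrOption = 3, keepBackground = false } = {}) {
  if (!Array.isArray(languages) || languages.length === 0) {
    throw new TypeError("languages must be a non-empty array of numeric constants");
  }
  return {
    enableOcr: true,
    languages: [...languages], // copy so later changes to the caller's array do not leak in
    ocrOption,
    containPageBackgroundImage: keepBackground
  };
}

// English and Chinese, all pages, keep page background images.
const ocrOptions = buildOcrOptions({ languages: [3, 1], keepBackground: true });
```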

Sample

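// Set the Document AI model file used for OCR.
sdk.setDocumentAIModel("/path/to/documentai.model", -1);

const options = {
  enableOcr: true,
  languages: [3], // OCRLanguage.ENGLISH
  ocrOption: 3    // OCROption.ALL
};

// sdk, inputFilePath, and outputFilePath are assumed to be defined by the caller.
const result = sdk.startPDFToWord(inputFilePath, "", outputFilePath, options);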
