ComPDF

Extract PDF To JSON

Overview

Extract text, tables, and images from PDF documents into a JSON file.

Standard Tables and Non-Standard Tables

Tables generally fall into two categories, standard and non-standard, defined as follows:

  • Standard table: The table border and the inner lines are complete and clear, so no table lines need to be added manually to divide the content.
  • Non-standard table: The table lacks a border or clear inner lines, so table lines must be added manually to separate the content.

Table Extraction Option

ComPDF Conversion SDK supports the json_contain_table option. When it is enabled, the SDK extracts table content from the PDF and outputs the table structure; otherwise, table content is treated as regular text.

Notice

  • Tables in the original PDF cannot be extracted unless AI layout analysis or OCR is enabled. Enabling one of them is recommended for high-precision table recognition.

Sample

The following sample code extracts the content of a PDF to a JSON file.

c++
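// Conversion options, left at their default values here.
ConvertOptions opt;
// Extract PDF to JSON.
// Arguments: input PDF path, PDF password, output JSON path, options.
CPDFConversion::StartPDFToJson("json.pdf", "password", "path/output.json", opt);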