ComPDF

Overview

Extract text, tables, and images from PDF documents to a JSON file.

Standard Table and Non-standard Table

Tables can commonly be divided into two categories:

  • Standard table: The table borders and inner lines are complete and clear, so no table lines need to be added manually to separate the content.
  • Non-standard table: The table lacks borders or clear inner lines, so table lines must be added manually to separate the content.

Table Extraction Option

ComPDF Conversion SDK provides the json_contain_table option. When it is enabled, the SDK extracts table content from the PDF and outputs the table structure. When it is disabled, table content is treated as regular text.

Notice

  • If neither AI layout analysis nor OCR is enabled, tables in the original PDF may not be extracted. Enabling AI layout analysis or OCR is recommended for high-precision table recognition.
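
The notice above amounts to a simple pre-flight check on the conversion options. The following is a minimal sketch of that check, not SDK code: SketchOption is a hypothetical stand-in for the SDK's option struct, and the field names enable_ai_layout and enable_ocr are assumptions made for illustration. Only json_contain_table is taken from this page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the SDK's option struct. Only json_contain_table
   appears in the documentation; the other two field names are assumptions
   made for this sketch. */
typedef struct {
    bool json_contain_table; /* documented option: extract table structure */
    bool enable_ai_layout;   /* assumed name for AI layout analysis */
    bool enable_ocr;         /* assumed name for OCR */
} SketchOption;

/* Returns true when table extraction is requested but neither AI layout
   analysis nor OCR is enabled, i.e. the case the notice warns about. */
static bool table_extraction_may_miss(const SketchOption *opt)
{
    assert(opt != NULL);
    return opt->json_contain_table
        && !opt->enable_ai_layout
        && !opt->enable_ocr;
}
```

An application could run a check like this before starting a conversion and log a warning, rather than discovering later that the tables are missing from the JSON output.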

Sample

c
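// Start from the default conversion options and enable table extraction.
CConvertOption option = CPDF_DefaultConvertOption();
option.json_contain_table = true;

// Convert json.pdf (opened with the given password) to path/output.json.
CPDF_StartPDFToJson(CPDF_TEXT("json.pdf"), CPDF_TEXT("password"), CPDF_TEXT("path/output.json"), option, NULL);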