Overview
Extract text, tables, and images from PDF documents to a JSON file.
Standard Table and Non-standard Table
Tables can commonly be divided into two categories:
- Standard table: The outer border and inner lines are complete and clearly drawn, so cell boundaries are explicit and no table lines need to be added manually to divide the content.

- Non-standard table: The table lacks a complete border or clear inner lines, so table lines must be added manually to separate the content.
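
The distinction above can be illustrated with a minimal sketch. It is not part of the ComPDF Conversion SDK: the `TableRulings` struct and `is_standard_table` function are hypothetical names introduced only for illustration, and the sketch assumes the ruling lines of a table have already been counted by some earlier detection step. It relies on the fact that a fully ruled grid of `rows` x `cols` cells has `rows + 1` horizontal lines and `cols + 1` vertical lines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the ruling lines found around one table.
   These names are illustrative and do not come from the ComPDF SDK. */
typedef struct {
    int rows;             /* number of cell rows */
    int cols;             /* number of cell columns */
    int horizontal_rules; /* full-width horizontal lines detected */
    int vertical_rules;   /* full-height vertical lines detected */
} TableRulings;

/* Returns true when the outer border and all inner separators are present,
   i.e. at least rows + 1 horizontal and cols + 1 vertical lines. */
static bool is_standard_table(const TableRulings *t) {
    assert(t != NULL);
    if (t->rows <= 0 || t->cols <= 0) {
        return false; /* an empty grid is not a table */
    }
    return t->horizontal_rules >= t->rows + 1 &&
           t->vertical_rules >= t->cols + 1;
}
```

Under this sketch, a 3 x 4 table with 4 horizontal and 5 vertical lines counts as standard, while the same table with no lines, or with only some of its vertical lines, counts as non-standard.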

Table Extraction Option
ComPDF Conversion SDK provides the json_contain_table option. When it is enabled, the SDK extracts table content from the PDF and writes the table structure to the JSON output. When it is disabled, table content is treated as regular text.
Notice
- If neither AI layout analysis nor OCR is enabled, tables in the original PDF may not be extracted. Enabling AI layout analysis or OCR is recommended for high-precision table recognition.
Sample
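/* Start from the default conversion options. */
CConvertOption option = CPDF_DefaultConvertOption();
/* Extract tables as structured data instead of treating them as regular text. */
option.json_contain_table = true;
/* Convert json.pdf, opened with the given password, to path/output.json. */
CPDF_StartPDFToJson(CPDF_TEXT("json.pdf"), CPDF_TEXT("password"), CPDF_TEXT("path/output.json"), option, NULL);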