Extract PDF To JSON

Overview

Extract text, tables, and images from PDF documents to a JSON file.

Standard Table and Non-standard Table

Tables generally fall into two categories:

  • Standard table: The table has complete, clearly drawn borders and inner lines.
  • Non-standard table: The table lacks borders or clear inner lines, so table recognition is required to recover its structure.

Table Extraction Option

ComPDF Conversion SDK provides the json_contain_table option. When it is enabled, table content is extracted from the PDF together with its table structure; when it is disabled, table content is treated as regular text.

Notice

If neither AI layout analysis nor OCR is enabled, tables in the original PDF may not be extracted accurately. When extracting complex tables, enable AI layout analysis, OCR, or table recognition.
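One way to act on this recommendation is to choose the flags according to whether the PDF contains non-standard tables. The sketch below is illustrative only: table_flags and apply_flags are hypothetical helpers, not part of the SDK, and FakeOptions is a stand-in for ComPDFKitConversion::ConvertOptions so that the snippet runs without the SDK installed. The three attribute names are the ones used in this page's sample; everything else is an assumption.

```ruby
# Hypothetical helper (not part of the SDK): pick table-related flags.
# Tables are always extracted with structure; the AI options are switched
# on only when the document is expected to contain non-standard tables.
def table_flags(non_standard_tables:)
  {
    json_contain_table: true,
    enable_ai_layout: non_standard_tables,
    enable_ai_table_recognition: non_standard_tables
  }
end

# Copy each flag onto any object that exposes a matching setter.
def apply_flags(options, flags)
  flags.each { |name, value| options.public_send("#{name}=", value) }
  options
end

# Stand-in for ComPDFKitConversion::ConvertOptions (assumption for this sketch).
FakeOptions = Struct.new(
  :json_contain_table,
  :enable_ai_layout,
  :enable_ai_table_recognition
)

options = apply_flags(FakeOptions.new, table_flags(non_standard_tables: true))
```

With the real SDK, the same apply_flags call should work on a ConvertOptions instance, provided it exposes these three setters as the sample suggests.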

Sample

ruby
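# Configure the conversion options.
options = ComPDFKitConversion::ConvertOptions.new
options.json_contain_table = true           # extract tables together with their structure
options.enable_ai_layout = true             # AI layout analysis
options.enable_ai_table_recognition = true  # table recognition for non-standard tables

# input_file_path and output_file_path must be defined by the caller.
result = ComPDFKitConversion::Conversion.start_pdf_to_json(
  input_file_path,
  "",  # empty string; assumed here to be the PDF password argument
  output_file_path,
  options
)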
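
After a conversion, it can help to check what the output actually contains. The following is a minimal sketch that uses only Ruby's standard json library and does not assume anything about the schema of the generated file: it reports the type and size of each top-level entry. summarize_json and describe are hypothetical helper names introduced here, not SDK functions.

```ruby
require "json"

# Describe a parsed JSON value by its kind and size.
def describe(value)
  case value
  when Hash then "object, size #{value.size}"
  when Array then "array, size #{value.size}"
  else value.class.name
  end
end

# Summarize the top-level shape of a JSON file without assuming its schema.
def summarize_json(path)
  data = JSON.parse(File.read(path))
  if data.is_a?(Hash)
    # One entry per top-level key.
    data.transform_values { |value| describe(value) }
  else
    # The root is an array or a scalar; report it as a single entry.
    { "(root)" => describe(data) }
  end
end
```

Calling summarize_json with the path of the JSON file written by the conversion returns a hash that maps each top-level key to a short description, which is a quick way to see whether table data was emitted separately from regular text.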