ComPDF

Extraction modes

Intelligent document extraction supports two processing modes, selected with the mode field on the unified endpoint /v2/process/idp/documentExtract. If mode is omitted, the backend defaults to vision. Both modes take the same fixed schema input, extract_fields:

| mode | Description | Schema input | Best for |
|---|---|---|---|
| vision | Vision-language model that runs independently on each page | extract_fields | Handwriting and free-form layouts, which it handles more robustly |
| layout | Layout-aware integrated extraction | extract_fields | Large files, cross-page extraction, and traceable results through bbox grounding |
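
For repeated calls, the request can be wrapped in a small shell function. This is a minimal sketch, not ComPDF tooling: extract_document, API_KEY, and DRY_RUN are hypothetical names. It sends mode only when one is given, so omitting it leaves the backend on its vision default, and it uses curl's --form-string so the JSON schema is passed literally.

```bash
#!/usr/bin/env bash
# Minimal sketch of a reusable caller for the unified extraction endpoint.
# extract_document, API_KEY and DRY_RUN are hypothetical names, not ComPDF tooling.
extract_document() {
  local file="$1" schema="$2" mode="${3:-}"
  # --form-string sends the schema literally, so curl never treats a leading
  # "@" or "<", or a ";type=" inside the JSON, as a directive.
  local args=(
    --location --request POST
    'https://api-server.compdf.com/server/v2/process/idp/documentExtract'
    --header "x-api-key: ${API_KEY:?set API_KEY first}"
    --form "file=@${file}"
    --form-string "extract_fields=${schema}"
  )
  # Add mode only when the caller chose one; otherwise the backend uses vision.
  if [[ -n "$mode" ]]; then
    args+=(--form "mode=${mode}")
  fi
  # DRY_RUN=1 prints the curl arguments one per line instead of sending them.
  if [[ -n "${DRY_RUN:-}" ]]; then
    printf '%s\n' "${args[@]}"
  else
    curl "${args[@]}"
  fi
}
```

Calling it with a file and a schema relies on the vision default; passing layout as a third argument switches modes.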

Vision mode (mode=vision, default)

extract_fields takes a JSON string holding a single schema object:

```bash
curl --location --request POST 'https://api-server.compdf.com/server/v2/process/idp/documentExtract' \
  --header 'x-api-key: public_key' \
  --form 'file=@/path/to/handwriting.pdf' \
  --form 'mode=vision' \
  --form 'extract_fields={"name":"Form","keys":{"Name":{"prompt":"Applicant name","mapping":null}},"tableHeaders":{}}'
```
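
Because extract_fields is a plain string, it can be assembled in the shell. The sketch below uses a hypothetical helper, make_form_schema, to rebuild the schema from the vision example: in this example name is the schema name, keys holds one entry per field with its prompt and mapping, and tableHeaders stays empty because the form has no tables. It does no JSON escaping, so it is only safe for values without double quotes or backslashes; see Extract schema for the full format.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a one-field schema object for extract_fields.
# printf does no JSON escaping, so arguments must not contain " or \.
make_form_schema() {
  local schema_name="$1" key="$2" prompt="$3"
  printf '{"name":"%s","keys":{"%s":{"prompt":"%s","mapping":null}},"tableHeaders":{}}' \
    "$schema_name" "$key" "$prompt"
}

# Reproduces the schema used in the vision request.
schema="$(make_form_schema Form Name 'Applicant name')"
echo "$schema"
```

The result can then be sent with --form-string "extract_fields=${schema}", curl's literal variant of --form.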

Layout mode (mode=layout)

Layout mode takes the same extract_fields input, a JSON string holding one fixed schema object:

```bash
curl --location --request POST 'https://api-server.compdf.com/server/v2/process/idp/documentExtract' \
  --header 'x-api-key: public_key' \
  --form 'file=@/path/to/invoice.pdf' \
  --form 'mode=layout' \
  --form 'extract_fields={"name":"ShipmentList","keys":{"OrderNo":{"prompt":null,"mapping":null},"Consignee":{"prompt":null,"mapping":null}},"tableHeaders":{"Table_1":{"No":{"prompt":null,"mapping":null},"ISBN":{"prompt":null,"mapping":null},"BookName":{"prompt":null,"mapping":null},"Qty":{"prompt":null,"mapping":null}}}}' \
  --form 'enable_grounding=true'
```
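
Schemas with tables quickly become hard to read on one line. One option, sketched below, is to keep the schema in a file and let curl read it with the < prefix, which sends the file's contents as an ordinary form value rather than as a file upload. This assumes the endpoint accepts pretty-printed JSON; whitespace is insignificant in JSON, but that has not been confirmed against the API. shipment-schema.json and API_KEY are hypothetical names, and the request only runs when API_KEY is set.

```bash
#!/usr/bin/env bash
# Sketch: keep the layout schema in a readable file instead of one long quoted line.
# shipment-schema.json is a hypothetical file name; the quoted 'EOF' stops the
# shell from expanding anything inside the heredoc.
cat > shipment-schema.json <<'EOF'
{
  "name": "ShipmentList",
  "keys": {
    "OrderNo":   { "prompt": null, "mapping": null },
    "Consignee": { "prompt": null, "mapping": null }
  },
  "tableHeaders": {
    "Table_1": {
      "No":       { "prompt": null, "mapping": null },
      "ISBN":     { "prompt": null, "mapping": null },
      "BookName": { "prompt": null, "mapping": null },
      "Qty":      { "prompt": null, "mapping": null }
    }
  }
}
EOF

# Only send the request when a key is configured (API_KEY is a hypothetical variable).
# "<file" tells curl to use the file's contents as the field value, not as an upload.
if [ -n "${API_KEY:-}" ]; then
  curl --location --request POST 'https://api-server.compdf.com/server/v2/process/idp/documentExtract' \
    --header "x-api-key: ${API_KEY}" \
    --form 'file=@/path/to/invoice.pdf' \
    --form 'mode=layout' \
    --form 'extract_fields=<shipment-schema.json' \
    --form 'enable_grounding=true'
fi
```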

In layout mode, set enable_grounding=true when you need to map results back to the original text: the response then includes coordinates for the text blocks that correspond to each field, which makes results easier to trace and highlight. Layout mode also accepts the same options_json parameter as the parsing feature; see Parse options for details.

How to choose

  • Start with layout when the document structure is stable, the file is long, or you need to map results back to the source
  • Switch to vision when the document is handwriting-heavy, scan quality is uneven, or the page layout is highly free-form
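
The rules of thumb above can be written down as a tiny decision helper. choose_mode and its flag names are hypothetical, and one precedence rule is an assumption rather than documented behavior: because bbox grounding is described only for layout mode, needing grounding keeps the choice on layout even when vision signals are present.

```bash
#!/usr/bin/env bash
# Hypothetical helper encoding the "How to choose" rules of thumb.
# Accepted flags (any order): grounding, handwriting, uneven-scan, free-form.
choose_mode() {
  local need_grounding=0 vision_signal=0 flag
  for flag in "$@"; do
    case "$flag" in
      grounding) need_grounding=1 ;;
      handwriting|uneven-scan|free-form) vision_signal=1 ;;
    esac
  done
  # Assumption: grounding is a layout feature, so it wins over vision signals.
  if (( need_grounding )); then
    echo layout
  elif (( vision_signal )); then
    echo vision
  else
    # Default recommendation: start with layout.
    echo layout
  fi
}
```

The output can feed the request directly, for example --form "mode=$(choose_mode handwriting)".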

Request examples

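Only the mode-specific form fields differ between the two modes; add them to the curl command shown above.

Vision (default):

```bash
--form 'mode=vision'
```

Layout with grounding:

```bash
--form 'mode=layout' \
--form 'enable_grounding=true'
```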

Next, read Extract schema.