Extraction modes

Intelligent document extraction supports two processing modes, selected with the `mode` field on the unified endpoint `/v2/process/idp/documentExtract`. If `mode` is omitted, the backend defaults to `vision`. Both modes take the same fixed `extract_fields` schema as input:

Mode	Description	Schema input	Best for
`vision`	A vision-language model processes each page independently; more robust on handwriting and free-form layouts.	`extract_fields`	Handwriting-heavy documents, uneven scan quality, highly free-form layouts
`layout`	Layout-aware integrated extraction with support for large files, cross-page extraction, and bbox grounding for traceable results.	`extract_fields`	Stable document structures, long files, results that must be traced back to the source
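`mode` accepts only `vision` or `layout`, and the backend falls back to `vision` when it is omitted. Because of this, a client script may want to normalize the value before sending it. The sketch below is a hypothetical Bash helper that mirrors the documented default. `resolve_mode` is an illustrative name and is not part of the ComPDF API.

```bash
#!/usr/bin/env bash
# Hypothetical helper: normalize the value for the `mode` form field.
# An empty or missing value falls back to "vision", matching the documented backend default.
resolve_mode() {
  local mode="${1:-vision}"
  case "$mode" in
    vision|layout)
      printf '%s\n' "$mode"
      ;;
    *)
      # Reject anything the endpoint does not document.
      printf 'unsupported mode: %s\n' "$mode" >&2
      return 1
      ;;
  esac
}

# Usage sketch: build the curl form argument from a script argument.
# mode_arg="mode=$(resolve_mode "$1")" || exit 1
```

Validating on the client only catches typos earlier. The backend remains the authority on which values it accepts.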

Vision mode (`mode=vision`, default)

In this mode, `extract_fields` is a JSON string containing a single schema object:

```bash
curl --location --request POST 'https://api-server.compdf.com/server/v2/process/idp/documentExtract' \
  --header 'x-api-key: public_key' \
  --form 'file=@/path/to/handwriting.pdf' \
  --form 'mode=vision' \
  --form 'extract_fields={"name":"Form","keys":{"Name":{"prompt":"Applicant name","mapping":null}},"tableHeaders":{}}'
```
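Inlining the schema in the curl command requires nesting JSON quotes inside shell quotes, which becomes error-prone as the schema grows. A minimal sketch, assuming Bash, keeps the same schema object in a file instead. It then uses curl's `name=<file` form syntax, which sends the file's contents as the field value rather than as a file upload. The file name and the `COMPDF_API_KEY` environment variable are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: keep the extract_fields schema in a file to avoid nested quoting.
set -euo pipefail

schema_file="${SCHEMA_FILE:-form_schema.json}"   # hypothetical file name

# Same single schema object as the inline vision-mode example.
cat > "$schema_file" <<'EOF'
{"name":"Form","keys":{"Name":{"prompt":"Applicant name","mapping":null}},"tableHeaders":{}}
EOF

# COMPDF_API_KEY is a hypothetical variable holding your key; the request
# is only sent when it is set.
if [[ -n "${COMPDF_API_KEY:-}" ]]; then
  # "extract_fields=<file" makes curl read the field value from the file.
  curl --location --request POST 'https://api-server.compdf.com/server/v2/process/idp/documentExtract' \
    --header "x-api-key: ${COMPDF_API_KEY}" \
    --form 'file=@/path/to/handwriting.pdf' \
    --form 'mode=vision' \
    --form "extract_fields=<${schema_file}"
fi
```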

Layout mode (`mode=layout`)

`layout` takes the same `extract_fields` input: a JSON string containing one fixed schema object.

```bash
curl --location --request POST 'https://api-server.compdf.com/server/v2/process/idp/documentExtract' \
  --header 'x-api-key: public_key' \
  --form 'file=@/path/to/invoice.pdf' \
  --form 'mode=layout' \
  --form 'extract_fields={"name":"ShipmentList","keys":{"OrderNo":{"prompt":null,"mapping":null},"Consignee":{"prompt":null,"mapping":null}},"tableHeaders":{"Table_1":{"No":{"prompt":null,"mapping":null},"ISBN":{"prompt":null,"mapping":null},"BookName":{"prompt":null,"mapping":null},"Qty":{"prompt":null,"mapping":null}}}}' \
  --form 'enable_grounding=true'
```

In layout mode, set `enable_grounding=true` if you need to map results back to the original text. The response then includes the coordinates of the text blocks that correspond to each field, which you can use to trace and highlight results. Layout mode also accepts the same `options_json` parameter as the parsing feature; see Parse options for details.
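To switch grounding on or off without editing the command, a script can collect the layout-mode form fields in a Bash array. The following is a minimal sketch under these assumptions:

- `shipment_schema.json` is a hypothetical file holding the ShipmentList schema from the layout example.
- `PDF_PATH`, `SCHEMA_FILE`, `GROUNDING` and `COMPDF_API_KEY` are hypothetical environment variables.
- `options_json` is left out because its contents are defined in Parse options.

```bash
#!/usr/bin/env bash
# Sketch: assemble layout-mode form fields so enable_grounding is optional.
set -euo pipefail

# Hypothetical inputs; adjust to your environment.
pdf_path="${PDF_PATH:-/path/to/invoice.pdf}"
schema_file="${SCHEMA_FILE:-shipment_schema.json}"
grounding="${GROUNDING:-true}"   # "true" adds enable_grounding=true

form_args=(
  --form "file=@${pdf_path}"
  --form 'mode=layout'
  --form "extract_fields=<${schema_file}"   # read the schema JSON from the file
)
if [[ "$grounding" == "true" ]]; then
  # Request text-block coordinates so each field can be traced and highlighted.
  form_args+=(--form 'enable_grounding=true')
fi

if [[ -n "${COMPDF_API_KEY:-}" ]]; then
  curl --location --request POST 'https://api-server.compdf.com/server/v2/process/idp/documentExtract' \
    --header "x-api-key: ${COMPDF_API_KEY}" \
    "${form_args[@]}"
else
  # Without a key, print the arguments that would be sent instead.
  printf '%s\n' "${form_args[@]}"
fi
```

Quoting `"${form_args[@]}"` passes each element as one argument, so paths or JSON containing spaces survive intact.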

How to choose

Start with layout when the document structure is stable, the file is long, or you need to map results back to the source
Switch to vision when the document is handwriting-heavy, scan quality is uneven, or the page layout is highly free-form
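The rules of thumb above can be encoded in a small decision helper, for example in a batch script that routes documents. This is a hypothetical sketch. `choose_mode` and its yes/no arguments are illustrative and not API parameters. It also adds one choice the text does not make: when grounding is required, it picks `layout`, since `enable_grounding` is only described for layout mode.

```bash
#!/usr/bin/env bash
# Hypothetical decision helper encoding the "How to choose" guidance.
#   $1: is the document handwriting-heavy, unevenly scanned, or highly free-form? (yes/no)
#   $2: do results need to be mapped back to the source? (yes/no)
choose_mode() {
  local free_form="$1" needs_grounding="$2"
  if [[ "$needs_grounding" == "yes" ]]; then
    # Grounding is documented for layout mode only.
    echo layout
  elif [[ "$free_form" == "yes" ]]; then
    # Handwriting, uneven scans, or free-form pages: switch to vision.
    echo vision
  else
    # Stable structure or long files: start with layout.
    echo layout
  fi
}
```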

Request examples

```bash
--form 'mode=vision'
```

```bash
--form 'mode=layout' \
--form 'enable_grounding=true'
```

Next, read Extract schema.
