Extraction modes
Intelligent document extraction supports two processing modes through the mode field at the unified endpoint /v2/process/idp/documentExtract. When omitted, the backend defaults to vision. Both modes use the same fixed extract_fields schema input:
| mode | Description | Schema input | Best for |
|---|---|---|---|
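| vision | Vision-language model that runs independently on each page. Handles handwriting and free-form layouts more robustly. | extract_fields | Handwriting-heavy documents, uneven scan quality, highly free-form page layouts |
| layout | Layout-aware integrated extraction. Supports large files, cross-page extraction, and bbox grounding for traceable results. | extract_fields | Stable document structure, long files, results that must map back to the source |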
Vision mode (mode=vision, default)
extract_fields is the JSON string of a single schema object:
curl --location --request POST 'https://api-server.compdf.com/server/v2/process/idp/documentExtract' \
--header 'x-api-key: public_key' \
--form 'file=@/path/to/handwriting.pdf' \
--form 'mode=vision' \
--form 'extract_fields={"name":"Form","keys":{"Name":{"prompt":"Applicant name","mapping":null}},"tableHeaders":{}}'
Layout mode (mode=layout)
Layout mode takes the same extract_fields input, a JSON string of a single fixed schema object:
curl --location --request POST 'https://api-server.compdf.com/server/v2/process/idp/documentExtract' \
--header 'x-api-key: public_key' \
--form 'file=@/path/to/invoice.pdf' \
--form 'mode=layout' \
--form 'extract_fields={"name":"ShipmentList","keys":{"OrderNo":{"prompt":null,"mapping":null},"Consignee":{"prompt":null,"mapping":null}},"tableHeaders":{"Table_1":{"No":{"prompt":null,"mapping":null},"ISBN":{"prompt":null,"mapping":null},"BookName":{"prompt":null,"mapping":null},"Qty":{"prompt":null,"mapping":null}}}}' \
--form 'enable_grounding=true'
To map layout-mode results back to the original text, set enable_grounding to true. The response then includes coordinates for the text blocks behind each extracted field, which makes results easier to trace and highlight. Layout mode also supports the same options_json parameter as the parsing feature; see Parse options for details.
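Hand-writing the extract_fields JSON string is error-prone, so the sketch below builds it from plain dictionaries and assembles the non-file form fields for either mode. The helper names build_extract_fields and build_form_fields are hypothetical and not part of the API. Rejecting enable_grounding outside layout mode is a choice made by this sketch, based only on the fact that grounding is described for layout mode here; it is not documented server behaviour.

```python
import json

VALID_MODES = ("vision", "layout")


def build_extract_fields(name, keys, table_headers=None):
    """Serialize one schema object to the JSON string sent as extract_fields.

    keys maps a field name to an optional prompt (or None).
    table_headers maps a table name to {column name: optional prompt}.
    "mapping" is always sent as null, mirroring the curl examples.
    """
    def field(prompt):
        return {"prompt": prompt, "mapping": None}

    schema = {
        "name": name,
        "keys": {key: field(prompt) for key, prompt in keys.items()},
        "tableHeaders": {
            table: {col: field(prompt) for col, prompt in cols.items()}
            for table, cols in (table_headers or {}).items()
        },
    }
    # Compact separators keep the value on one line, like the curl examples.
    return json.dumps(schema, separators=(",", ":"), ensure_ascii=False)


def build_form_fields(extract_fields, mode="vision", enable_grounding=False):
    """Return the non-file multipart form fields for one request."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    fields = {"mode": mode, "extract_fields": extract_fields}
    if enable_grounding:
        # Assumption of this sketch: grounding is only requested in layout mode.
        if mode != "layout":
            raise ValueError("enable_grounding is described for layout mode only")
        fields["enable_grounding"] = "true"
    return fields


# Usage: the layout-mode schema from the curl example, with grounding enabled.
shipment = build_extract_fields(
    "ShipmentList",
    {"OrderNo": None, "Consignee": None},
    {"Table_1": {"No": None, "ISBN": None, "BookName": None, "Qty": None}},
)
print(build_form_fields(shipment, mode="layout", enable_grounding=True))
```

The returned dictionary holds only the text fields. The file itself still has to be attached as the multipart file part, and the x-api-key header set, by whichever HTTP client you use.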
How to choose
- Start with layout when the document structure is stable, the file is long, or you need to map results back to the source
- Switch to vision when the document is handwriting-heavy, scan quality is uneven, or the page layout is highly free-form
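The guidance above can be written down as a small decision helper. This is a minimal sketch: choose_mode and its parameter names are hypothetical, and letting any single vision-favouring trait override layout is an assumption about precedence that the guidance itself does not spell out.

```python
def choose_mode(handwriting_heavy=False, uneven_scan_quality=False,
                free_form_layout=False):
    """Suggest a value for the mode field.

    Starts from "layout" and switches to "vision" as soon as one of the
    traits listed for vision mode applies to the document.
    """
    if handwriting_heavy or uneven_scan_quality or free_form_layout:
        return "vision"
    return "layout"


# A long, consistently structured invoice batch stays on layout.
print(choose_mode())
# A handwritten application form switches to vision.
print(choose_mode(handwriting_heavy=True))
```

If a handwriting-heavy document also needs results traced back to the source, the two criteria conflict and this helper still returns vision; a caller that needs coordinates may prefer to try layout first.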
Request examples
Vision mode:
--form 'mode=vision'
Layout mode with grounding:
--form 'mode=layout' \
--form 'enable_grounding=true'
Next, read Extract schema.