Document parsing guide

`idp/documentParsing` converts a document into structured JSON and Markdown. It is commonly used before extraction, search, RAG, or review workflows.

```bash
curl --location --request POST 'https://api-server.compdf.com/server/v2/process/idp/documentParsing' \
  --header 'x-api-key: <your-public-key>' \
  --form 'file=@/path/to/document.pdf' \
  --form 'pageRanges=1-3' \
  --form 'parseOptions={"applyDocumentTree":true,"mergeTables":true}'
```
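
In the request above, `parseOptions` is sent as a single JSON-encoded string inside a form field, which is easy to get wrong when building the request from code. The following is a minimal Python sketch of assembling the non-file form fields. The helper name `build_form_fields` is hypothetical and not part of any SDK; only the field names `pageRanges` and `parseOptions` and the two option keys come from the curl example.

```python
import json


def build_form_fields(page_ranges: str, apply_document_tree: bool, merge_tables: bool) -> dict[str, str]:
    """Build the non-file form fields for a documentParsing request (hypothetical helper)."""
    parse_options = {
        "applyDocumentTree": apply_document_tree,
        "mergeTables": merge_tables,
    }
    return {
        "pageRanges": page_ranges,
        # parseOptions travels as one JSON-encoded string, not as nested form fields.
        "parseOptions": json.dumps(parse_options, separators=(",", ":")),
    }


fields = build_form_fields("1-3", apply_document_tree=True, merge_tables=True)
print(fields["parseOptions"])
```

The resulting dictionary can be passed as the form data of a multipart request in whichever HTTP client you use, alongside the `file` part and the `x-api-key` header.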

What this endpoint returns

For a typical request, the response body contains:

| Path | What it is used for |
| --- | --- |
| `code`, `message`, `x_request_id` | Request status and troubleshooting |
| `file_type` | Parsed input type, for example `PDF` |
| `result` | Main parse result object, including page output and summary counters |
| `metrics` | Page-level processing metadata such as `dpi`, `angle`, and duration |
| `image_process` | Extra image-processing output; usually empty for standard parsing requests |
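
As a rough illustration of how the status fields might be used, the Python sketch below formats them into one line for logs or support tickets. It assumes the response body has already been decoded from JSON into a dictionary; the function name `describe_response` is hypothetical, and the sketch deliberately does not assume which `code` values mean success, since that is not stated here.

```python
from typing import Any


def describe_response(body: dict[str, Any]) -> str:
    """Format the status fields of a decoded response body into one log-friendly line."""
    # Missing fields show up as None rather than raising, so the line is always printable.
    return (
        f"code={body.get('code')!r} "
        f"message={body.get('message')!r} "
        f"x_request_id={body.get('x_request_id')!r} "
        f"file_type={body.get('file_type')!r}"
    )
```

Recording `x_request_id` alongside `code` and `message` keeps the identifier at hand when a request needs troubleshooting.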

The most commonly used fields inside `result` are:

| Path | What it is used for |
| --- | --- |
| `result.pages` | Per-page parse output, including `structured` and `content` |
| `result.markdown` | A merged Markdown view of the document |
| `result.catalog` | Document tree / TOC when hierarchy detection is enabled |
| `result.valid_page_number` | Number of successfully parsed pages |
| `result.total_page_number` | Total page count of the input file |
| `result.success_count` | Count of successfully processed pages or result units |
| `result.detail` | Flat paragraph-level list merging all pages in reading order |
| `result.excel_base64` | Base64-encoded Excel output when export format includes Excel |
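
The fields above can be combined into a quick sanity check after each request. The Python sketch below is one possible approach, assuming `result` has been decoded into a dictionary and that `pages` is a list; both function names are hypothetical, and the sketch treats every field as optional because the exact shape of each value is not specified here.

```python
import base64
from typing import Any, Optional


def summarize_result(result: dict[str, Any]) -> dict[str, Any]:
    """Collect the page counters and note which optional outputs are present."""
    total = result.get("total_page_number", 0)
    valid = result.get("valid_page_number", 0)
    return {
        "total_pages": total,
        "parsed_pages": valid,
        # Pages in the input file that were not parsed successfully.
        "unparsed_pages": total - valid,
        "page_entries": len(result.get("pages") or []),
        "has_markdown": bool(result.get("markdown")),
        "has_catalog": bool(result.get("catalog")),
    }


def decode_excel(result: dict[str, Any]) -> Optional[bytes]:
    """Return the Excel export as raw bytes, or None if the field is absent or empty."""
    encoded = result.get("excel_base64")
    return base64.b64decode(encoded) if encoded else None
```

A non-zero `unparsed_pages` value is a cheap signal that some pages need a second look before the Markdown or page output is passed to a downstream workflow.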
