# Document parsing guide
`idp/documentParsing` converts a document into structured JSON and Markdown. It is commonly used before extraction, search, RAG, or review workflows.

```bash
curl --location --request POST 'https://api-server.compdf.com/server/v2/process/idp/documentParsing' \
--header 'x-api-key: <your-public-key>' \
--form 'file=@/path/to/document.pdf' \
--form 'pageRanges=1-3' \
--form 'parseOptions={"applyDocumentTree":true,"mergeTables":true}'
```

## What this endpoint returns
For a typical request, the response body contains these top-level fields:

| Path | What it is used for |
|---|---|
| `code`, `message`, `x_request_id` | Request status and troubleshooting |
| `file_type` | Parsed input type, for example `PDF` |
| `result` | Main parse result object, including page output and summary counters |
| `metrics` | Page-level processing metadata such as `dpi`, `angle`, and `duration` |
| `image_process` | Extra image-processing output; usually empty for standard parsing requests |

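
When troubleshooting, it helps to log the status fields together with the request ID. The sketch below assumes that `jq` is installed and that the response body was saved to a file, for example by adding `--output response.json` to the curl command. The helper name `summarize_response` is hypothetical and is not part of the API.

```bash
#!/usr/bin/env bash
# Minimal sketch: assumes jq is installed and the response body was saved
# to a file (e.g. curl ... --output response.json).
set -euo pipefail

# summarize_response (hypothetical helper): print the fields that identify a
# request and its outcome, plus the detected input type. jq string
# interpolation prints "null" for any field missing from the response.
summarize_response() {
  local file="$1"
  jq -r '"code=\(.code) message=\(.message) x_request_id=\(.x_request_id) file_type=\(.file_type)"' "$file"
}
```

For example, `summarize_response response.json` prints a single line that you can attach to a support ticket or a log entry.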
The most commonly used fields inside `result` are:

| Path | What it is used for |
|---|---|
| `result.pages` | Per-page parse output, including `structured` and `content` |
| `result.markdown` | A merged Markdown view of the document |
| `result.catalog` | Document tree (table of contents) when hierarchy detection is enabled |
| `result.valid_page_number` | Number of successfully parsed pages |
| `result.total_page_number` | Total page count of the input file |
| `result.success_count` | Count of successfully processed pages or result units |
| `result.detail` | Flat paragraph-level list merging all pages in reading order |
| `result.excel_base64` | Base64-encoded Excel output when the export format includes Excel |

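
Most downstream steps only need a few of these fields. The following sketch compares `valid_page_number` with `total_page_number` to catch partially parsed files, writes `result.markdown` to disk, decodes `result.excel_base64`, and counts the entries in `result.detail`. It assumes `jq` and a `base64` that accepts `-d` (GNU coreutils or recent macOS). All function names are hypothetical, and the internal layout of `pages` and `detail` entries is deliberately not relied on.

```bash
#!/usr/bin/env bash
# Minimal sketch: assumes jq and base64 (with -d) are installed and the
# response body was saved to a file.
set -euo pipefail

# all_pages_parsed (hypothetical helper): exit 0 only when the response
# reports a total page count and every page was parsed successfully.
all_pages_parsed() {
  jq -e '.result.total_page_number != null
         and .result.valid_page_number == .result.total_page_number' "$1" > /dev/null
}

# save_markdown (hypothetical helper): write the merged Markdown view to a file.
save_markdown() {
  jq -r '.result.markdown // empty' "$1" > "$2"
}

# save_excel (hypothetical helper): decode result.excel_base64 into a file.
# Returns 1 when the response carries no Excel output.
save_excel() {
  local b64
  b64=$(jq -r '.result.excel_base64 // empty' "$1")
  if [ -z "$b64" ]; then
    echo "response has no result.excel_base64" >&2
    return 1
  fi
  printf '%s' "$b64" | base64 -d > "$2"
}

# count_paragraphs (hypothetical helper): number of entries in result.detail
# (0 when the field is absent).
count_paragraphs() {
  jq '.result.detail | length' "$1"
}
```

For example, `all_pages_parsed response.json || echo "partial parse" >&2` flags an incomplete result before `save_markdown response.json document.md` feeds it into a search or RAG pipeline. Comparing the two page counters is a cheap check that does not depend on how `code` values are defined.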
## Recommended reading order
- Start with Parse options to understand request parameters such as `image_type`, `content_filter`, `options_json`, and `ignore_labels`.
- Then read Response structure for the top-level JSON contract.
- Use Page details when you need to map headings, tables, and footnotes back to page coordinates.
- Use Metrics when you are monitoring processing quality or performance.