ComPDF

Document parsing guide

idp/documentParsing converts a document into structured JSON and Markdown. It is commonly used before extraction, search, RAG, or review workflows.

```bash
curl --location --request POST 'https://api-server.compdf.com/server/v2/process/idp/documentParsing' \
  --header 'x-api-key: <your-public-key>' \
  --form 'file=@/path/to/document.pdf' \
  --form 'pageRanges=1-3' \
  --form 'parseOptions={"applyDocumentTree":true,"mergeTables":true}'
```
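
If you build the request from code rather than the shell, note that `parseOptions` is sent as a single form field holding a JSON string. The following is a minimal Python sketch, not an official client. It only assembles the header and the non-file form fields from the curl example and leaves the upload itself to whichever HTTP client you use. The helper name `build_parse_request` is hypothetical; the URL, header name, and option keys are copied from the curl command.

```python
import json

# Endpoint copied from the curl example; kept here for use by your HTTP client.
ENDPOINT = "https://api-server.compdf.com/server/v2/process/idp/documentParsing"


def build_parse_request(api_key, page_ranges, parse_options):
    """Return (headers, form_fields) for a documentParsing request.

    Hypothetical helper: the file part is deliberately left out, because how
    it is attached depends on the HTTP client in use.
    """
    headers = {"x-api-key": api_key}
    form_fields = {
        "pageRanges": page_ranges,
        # parseOptions travels as one form field containing a JSON string.
        "parseOptions": json.dumps(parse_options, separators=(",", ":")),
    }
    return headers, form_fields


headers, fields = build_parse_request(
    "<your-public-key>",
    "1-3",
    {"applyDocumentTree": True, "mergeTables": True},
)
print(fields["parseOptions"])
```

Serializing the options with `json.dumps` avoids hand-written JSON and the quoting mistakes that come with it.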

What this endpoint returns

For a typical request, the response body contains:

| Path | What it is used for |
| --- | --- |
| `code`, `message`, `x_request_id` | Request status and troubleshooting |
| `file_type` | Parsed input type, for example PDF |
| `result` | Main parse result object, including page output and summary counters |
| `metrics` | Page-level processing metadata such as dpi, angle, and duration |
| `image_process` | Extra image-processing output; usually empty for standard parsing requests |
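
As an illustration of how these top-level fields might be consumed, here is a minimal Python sketch that flattens the envelope into a small summary suitable for logging. The function name `summarize_envelope` and the sample values are hypothetical. The sketch does not interpret `code`, because the values that indicate success are not listed here.

```python
def summarize_envelope(body):
    """Collect the top-level response fields into a flat dict for logging."""
    return {
        "code": body.get("code"),
        "message": body.get("message"),
        # Keep the request id close at hand; it is the field meant for troubleshooting.
        "request_id": body.get("x_request_id"),
        "file_type": body.get("file_type"),
        "has_result": isinstance(body.get("result"), dict),
        "has_metrics": bool(body.get("metrics")),
        "has_image_process": bool(body.get("image_process")),
    }


# Hand-written sample with placeholder values; this is not real API output.
sample_body = {
    "code": "<code>",
    "message": "<message>",
    "x_request_id": "<request-id>",
    "file_type": "PDF",
    "result": {},
    "metrics": [{"dpi": 0, "angle": 0, "duration": 0}],
    "image_process": [],
}

summary = summarize_envelope(sample_body)
print(summary)
```

Using `.get()` throughout means a missing field shows up as `None` or `False` in the summary instead of raising an exception.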

The most commonly used fields inside result are:

| Path | What it is used for |
| --- | --- |
| `result.pages` | Per-page parse output, including `structured` and `content` |
| `result.markdown` | A merged Markdown view of the document |
| `result.catalog` | Document tree / TOC when hierarchy detection is enabled |
| `result.valid_page_number` | Number of successfully parsed pages |
| `result.total_page_number` | Total page count of the input file |
| `result.success_count` | Count of successfully processed pages or result units |
| `result.detail` | Flat paragraph-level list merging all pages in reading order |
| `result.excel_base64` | Base64-encoded Excel output when export format includes Excel |

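
The fields above can be combined into a quick sanity check on a parsed document. The Python sketch below is illustrative only. It assumes that `pages` is a list, that `markdown` is a string, and that the two page counters are integers; none of these types are confirmed here. The function name `inspect_result` and the sample data are hypothetical.

```python
import base64


def inspect_result(result):
    """Summarize a `result` object using the field names from the table."""
    valid = result.get("valid_page_number")
    total = result.get("total_page_number")
    summary = {
        "markdown_chars": len(result.get("markdown") or ""),
        "page_entries": len(result.get("pages") or []),
        # True only when both counters are present and they differ.
        "partially_parsed": valid is not None and total is not None and valid != total,
    }
    excel_b64 = result.get("excel_base64")
    # excel_base64 is only expected when Excel export was requested.
    summary["excel_bytes"] = base64.b64decode(excel_b64) if excel_b64 else None
    return summary


# Hand-written sample; the values are placeholders, not real API output.
sample_result = {
    "pages": [{}, {}],
    "markdown": "# Title\n\nBody text.",
    "valid_page_number": 2,
    "total_page_number": 3,
    "excel_base64": base64.b64encode(b"placeholder bytes").decode("ascii"),
}

info = inspect_result(sample_result)
print(info["partially_parsed"], info["page_entries"])
```

Comparing `valid_page_number` with `total_page_number` is a cheap way to flag documents where some pages failed to parse, before handing `markdown` to a downstream search or RAG step.
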
  • Start with Parse options to understand request parameters such as image_type, content_filter, options_json, and ignore_labels.
  • Then read Response structure for the top-level JSON contract.
  • Use Page details when you need to map headings, tables, and footnotes back to page coordinates.
  • Use Metrics when you are monitoring processing quality or performance.