ComPDF

Parse options

This page describes the request parameters for the document parsing endpoint. The examples cover only the file upload flow and use the current public parameter names.

Request parameters

| Parameter | Location | Type | Required | Default | Description |
|---|---|---|---|---|---|
| file | form | file | Yes | - | Input document file |
| image_type | query | string | No | url | How images are embedded in Markdown: url or base64 |
| content_filter | query | string | No | all | Keep only selected content block types |
| options_json | form | JSON string | No | Built-in defaults | Parser configuration merged with the server defaults |
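
The split between query and form parameters can be sketched in Python as follows. This is a minimal illustration rather than an official client: the endpoint URL `PARSE_URL` and the helper name `build_parse_request` are hypothetical, and nothing is sent over the network.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint (assumption); use the URL from your own account or docs.
PARSE_URL = "https://api.example.com/parse"


def build_parse_request(file_path, image_type="url", content_filter="all", options=None):
    """Separate the parameters into a query string and multipart form fields."""
    # image_type and content_filter are query parameters.
    query = {"image_type": image_type, "content_filter": content_filter}
    form = {}
    if options is not None:
        # options_json is a form field that carries a JSON string.
        form["options_json"] = json.dumps(options)
    return {
        "url": f"{PARSE_URL}?{urlencode(query)}",
        "form": form,
        # The document itself is uploaded in the form field named "file".
        "file_field": ("file", file_path),
    }


request = build_parse_request(
    "report.pdf", content_filter="table", options={"mergeTables": True}
)
print(request["url"])
```

The returned pieces map onto most HTTP clients. With the third-party `requests` library, for example, `form` would go to the `data` argument and the opened file to `files`.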

image_type

image_type controls how image content is represented in the Markdown result:

| Value | Meaning |
|---|---|
| url | Embed image content as accessible URLs |
| base64 | Embed image content inline as Base64 |

Use url for most frontend and knowledge-base integrations. Choose base64 when you need a fully self-contained Markdown artifact.
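
To check which form a returned Markdown document actually uses, a small scan of its image references can help. The sketch below assumes that base64 images appear as `data:` URIs inside standard Markdown image syntax; that representation is an assumption and is not confirmed by this page. The function name `classify_images` is hypothetical.

```python
import re

# Matches Markdown image syntax ![alt](target) and captures the target.
IMAGE_PATTERN = re.compile(r"!\[[^\]]*\]\(([^)\s]+)[^)]*\)")


def classify_images(markdown):
    """Count inline (data URI) versus linked images in a Markdown string."""
    counts = {"inline": 0, "linked": 0}
    for target in IMAGE_PATTERN.findall(markdown):
        # Assumption: inline Base64 images are emitted as data: URIs.
        kind = "inline" if target.startswith("data:") else "linked"
        counts[kind] += 1
    return counts


sample = (
    "![a](https://example.com/a.png)\n"
    "![b](data:image/png;base64,iVBORw0KGgo=)"
)
print(classify_images(sample))
```

A document with only inline images needs no further network access to render, which is the self-contained case described above.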

content_filter

content_filter narrows the result to selected block types. Common patterns:

| Value | Meaning |
|---|---|
| all | Return all content blocks |
| text | Keep only text-related content |
| table | Keep only table-related content |
| image | Keep only image-related content |

If your workflow only needs one category, filtering at request time is usually simpler than post-filtering in downstream code.
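
Because a misspelled filter value is easy to miss, it can be worth validating it on the client before sending. The sketch below checks against the four values listed in this section; the service may accept others, and `content_filter_query` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# The values documented in this section; treat the list as possibly incomplete.
KNOWN_FILTERS = ("all", "text", "table", "image")


def content_filter_query(value="all"):
    """Return the query-string fragment for a documented content_filter value."""
    if value not in KNOWN_FILTERS:
        raise ValueError(
            f"unknown content_filter {value!r}; expected one of {KNOWN_FILTERS}"
        )
    return urlencode({"content_filter": value})


print(content_filter_query("table"))
```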

options_json

options_json is a JSON string that controls parsing behaviour. Typical options include:

  • generating a document tree / catalog
  • merging related table fragments
  • re-levelling title hierarchy
  • ignoring headers, footers, footnotes, and similar auxiliary content

Example:

```json
{
  "applyDocumentTree": true,
  "mergeTables": true,
  "relevelTitles": true,
  "ignore_labels": [
    "number",
    "footnote",
    "header",
    "header_image",
    "footer",
    "footer_image",
    "aside_text"
  ]
}
```
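
Since options_json is sent as a string, client code usually builds the options as a native structure and serialises it just before the request. A minimal Python sketch, using the option names from the example above (the variable names are illustrative):

```python
import json

# Same options as the JSON example; Python booleans become JSON true/false.
options = {
    "applyDocumentTree": True,
    "mergeTables": True,
    "relevelTitles": True,
    "ignore_labels": [
        "number",
        "footnote",
        "header",
        "header_image",
        "footer",
        "footer_image",
        "aside_text",
    ],
}

# The form field carries text, so serialise the dict before sending it.
options_json = json.dumps(options)
form_fields = {"options_json": options_json}

# Round-trip check: the string decodes back to the original structure.
decoded = json.loads(form_fields["options_json"])
print(decoded == options)
```

Serialising with a JSON library rather than hand-writing the string avoids quoting mistakes such as Python-style `True` or single-quoted keys.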

ignore_labels

ignore_labels is typically passed inside options_json to suppress auxiliary block types in the parse output. The supported labels are:

  • number
  • footnote
  • header
  • header_image
  • footer
  • footer_image
  • aside_text

To keep all supported auxiliary content, pass an empty array explicitly:

```bash
--form 'options_json={"ignore_labels":[]}'
```
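
The same field value can be produced programmatically. The sketch below validates labels against the list in this section and emits compact JSON; `ignore_labels_option` is a hypothetical helper, not part of any SDK.

```python
import json

# Labels listed as supported in this section.
SUPPORTED_LABELS = frozenset({
    "number", "footnote", "header", "header_image",
    "footer", "footer_image", "aside_text",
})


def ignore_labels_option(labels):
    """Build the options_json string for a given list of labels to ignore."""
    labels = list(labels)
    unknown = sorted(set(labels) - SUPPORTED_LABELS)
    if unknown:
        raise ValueError(f"unsupported labels: {unknown}")
    # Compact separators reproduce the spacing used in the curl example.
    return json.dumps({"ignore_labels": labels}, separators=(",", ":"))


# An explicit empty list keeps all auxiliary content.
keep_everything = ignore_labels_option([])
print(keep_everything)
```

Note that an empty list is still serialised as `[]` rather than dropped, so the key is always present in the request.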

Recommendations

  • For real-time previews, prefer image_type=url to keep payloads smaller.
  • For search, extraction, or RAG workflows, use content_filter=text or content_filter=table to reduce downstream processing.
  • For layout-heavy documents, combine the document-tree and table-merge options, then inspect the output using the Response overview.
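
These recommendations can be captured as named presets so that each workflow starts from a consistent set of parameters. The preset names and the `preset` helper below are illustrative assumptions; the parameter and option names are the ones used on this page.

```python
import copy

# Illustrative presets; the keys are made-up names for the workflows above.
PRESETS = {
    "preview": {"query": {"image_type": "url"}, "options": {}},
    "rag_text": {"query": {"content_filter": "text"}, "options": {}},
    "rag_tables": {"query": {"content_filter": "table"}, "options": {}},
    "layout_heavy": {
        "query": {},
        "options": {"applyDocumentTree": True, "mergeTables": True},
    },
}


def preset(name):
    """Return an independent copy of a preset so callers can modify it safely."""
    if name not in PRESETS:
        raise KeyError(f"unknown preset {name!r}; choose from {sorted(PRESETS)}")
    return copy.deepcopy(PRESETS[name])


params = preset("layout_heavy")
params["options"]["relevelTitles"] = True  # local tweak; PRESETS is unchanged
print(params)
```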