ComPDF

Page details

result.pages returns parse output page by page. Each page object includes metadata, a structured block list, and a lightweight content list. Additionally, result.detail provides a flat paragraph-level view across all pages.

Page object fields

| Field | Type | Meaning |
| --- | --- | --- |
| page_id | int | Page number (1-based) |
| angle | float | Page rotation angle (after correction) |
| height | int | Page pixel height |
| width | int | Page pixel width |
| image_id | string | Original image identifier (usually empty) |
| durations | float | Processing time for this page (seconds) |
| status | string | Processing status, e.g. Success |
| structured | array | Structured block list for fine-grained rendering and positioning |
| content | array | Lightweight content list for quick text access |
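
As a minimal sketch of how the page-level fields might be consumed, the helper below totals processing time and lists pages whose status is not Success. It assumes each page is available as a plain dict with the fields above; the SDK's actual object type may differ. The function name `summarize_pages`, the sample values, and the `Failed` status string are hypothetical.

```python
from typing import Any, Dict, List


def summarize_pages(pages: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Report page count, total processing time, and pages not marked Success."""
    failed = [p["page_id"] for p in pages if p.get("status") != "Success"]
    seconds = sum(p.get("durations", 0.0) for p in pages)
    return {"pages": len(pages), "failed": failed, "seconds": seconds}


# Made-up sample data shaped like the field table; "Failed" is an assumed value.
sample_pages = [
    {"page_id": 1, "angle": 0.0, "height": 842, "width": 595, "image_id": "",
     "durations": 0.5, "status": "Success", "structured": [], "content": []},
    {"page_id": 2, "angle": 0.0, "height": 842, "width": 595, "image_id": "",
     "durations": 0.25, "status": "Failed", "structured": [], "content": []},
]

print(summarize_pages(sample_pages))
```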

structured — structured blocks

Best for detailed rendering, highlight positioning, and content reconstruction. Each block:

| Field | Type | Meaning |
| --- | --- | --- |
| id | int | Block sequence number |
| type | string | Block type, e.g. doc_title, paragraph_title, text, table, image |
| text | string | Extracted text |
| pos | float[] | Quadrilateral coordinates [x1,y1, x2,y2, x3,y3, x4,y4] (top-left, top-right, bottom-right, bottom-left) |
| outline_level | int | Heading level (-1 for non-heading) |
| content | int[] | Referenced text block IDs linking to the content array |
```json
{
  "id": 0,
  "type": "doc_title",
  "text": "# Sample PDF",
  "pos": [141, 154, 509, 154, 509, 220, 141, 220],
  "outline_level": -1,
  "content": [0]
}
```
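
Many highlight layers want an axis-aligned rectangle rather than four corner points. The sketch below collapses a pos quadrilateral into (left, top, right, bottom) by taking the extremes of the x and y values, so it also gives a usable bounding box when the quadrilateral is slightly rotated. The helper name `quad_to_bbox` is hypothetical and not part of the SDK.

```python
from typing import Sequence, Tuple


def quad_to_bbox(pos: Sequence[float]) -> Tuple[float, float, float, float]:
    """Collapse [x1,y1, x2,y2, x3,y3, x4,y4] into (left, top, right, bottom)."""
    if len(pos) != 8:
        raise ValueError(f"expected 8 coordinates, got {len(pos)}")
    xs = pos[0::2]  # x1, x2, x3, x4
    ys = pos[1::2]  # y1, y2, y3, y4
    return min(xs), min(ys), max(xs), max(ys)


# The block from the example above.
block = {
    "id": 0,
    "type": "doc_title",
    "text": "# Sample PDF",
    "pos": [141, 154, 509, 154, 509, 220, 141, 220],
    "outline_level": -1,
    "content": [0],
}

left, top, right, bottom = quad_to_bbox(block["pos"])
print(left, top, right - left, bottom - top)  # 141 154 368 66
```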

Common block types:

| Type | Meaning |
| --- | --- |
| doc_title | Document title |
| paragraph_title | Paragraph heading |
| text | Body text |
| table | Table |
| image | Image |
| figure | Figure / chart |
| header / footer | Page header / footer |
| footnote | Footnote |
| formula | Formula |
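
Block-level filtering usually comes down to comparing the type string. The sketch below assumes the structured list is a list of plain dicts and that header and footer are two separate type values; both helper names and the sample blocks are hypothetical.

```python
from typing import Any, Dict, List

# Types treated as page furniture rather than body content (an assumption).
FURNITURE_TYPES = {"header", "footer", "footnote"}


def blocks_of_type(structured: List[Dict[str, Any]], *types: str) -> List[Dict[str, Any]]:
    """Keep only blocks whose type is one of the requested values."""
    wanted = set(types)
    return [b for b in structured if b.get("type") in wanted]


def body_blocks(structured: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop headers, footers, and footnotes, preserving the original order."""
    return [b for b in structured if b.get("type") not in FURNITURE_TYPES]


# Made-up blocks; only the fields needed for filtering are filled in.
sample_structured = [
    {"id": 0, "type": "header", "text": "ACME Corp"},
    {"id": 1, "type": "doc_title", "text": "# Sample PDF"},
    {"id": 2, "type": "text", "text": "Introductory paragraph."},
    {"id": 3, "type": "table", "text": "..."},
    {"id": 4, "type": "footer", "text": "Page 1"},
]

print([b["id"] for b in blocks_of_type(sample_structured, "doc_title", "paragraph_title")])
print([b["id"] for b in body_blocks(sample_structured)])
```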

content — lightweight content

Better for quick consumption — concatenate into page-level text or build search indexes. Each item:

| Field | Type | Meaning |
| --- | --- | --- |
| id | int | Content block ID, linked from structured[].content |
| type | string | Block type |
| text | string | Text content |
| pos | float[] | Quadrilateral coordinates |
| score | float | Recognition confidence |
| angle | float | Text angle |
```json
{
  "id": 0,
  "type": "doc_title",
  "text": "# Sample PDF",
  "pos": [141, 154, 509, 154, 509, 220, 141, 220],
  "score": 0.5958,
  "angle": 0
}
```
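
The two uses mentioned here, building page-level text and following the structured[].content links, can be sketched as follows. The code assumes a page is a plain dict, that content items appear in reading order, and that a higher score means higher confidence; none of this is confirmed by the SDK reference. The function names and the second sample item are hypothetical.

```python
from typing import Any, Dict, List


def page_text(page: Dict[str, Any], min_score: float = 0.0) -> str:
    """Join content[].text in list order, skipping items below min_score."""
    return "\n".join(
        item["text"]
        for item in page["content"]
        if item.get("score", 1.0) >= min_score
    )


def resolve_content(block: Dict[str, Any], page: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the content items whose ids are listed in block['content']."""
    by_id = {item["id"]: item for item in page["content"]}
    return [by_id[i] for i in block.get("content", []) if i in by_id]


quad = [141, 154, 509, 154, 509, 220, 141, 220]
sample_page = {
    "page_id": 1,
    "structured": [
        {"id": 0, "type": "doc_title", "text": "# Sample PDF", "pos": quad,
         "outline_level": -1, "content": [0]},
        {"id": 1, "type": "text", "text": "Hello world.", "pos": quad,
         "outline_level": -1, "content": [1]},
    ],
    "content": [
        {"id": 0, "type": "doc_title", "text": "# Sample PDF", "pos": quad,
         "score": 0.5958, "angle": 0},
        {"id": 1, "type": "text", "text": "Hello world.", "pos": quad,
         "score": 0.98, "angle": 0},  # made-up second item
    ],
}

print(page_text(sample_page))
print(resolve_content(sample_page["structured"][1], sample_page)[0]["score"])
```

Building the id lookup once per page, rather than once per block, would be the natural optimization when resolving many blocks.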

result.detail — cross-page paragraph view

result.detail aggregates all paragraphs in reading order into a single flat array, eliminating the need to iterate through pages manually. Each record:

| Field | Type | Meaning |
| --- | --- | --- |
| paragraph_id | int | Paragraph sequence number |
| page_id | int | Source page number |
| type | string | Fixed as paragraph |
| sub_type | string | Paragraph sub-type, e.g. doc_title, paragraph_title, text, table |
| text | string | Paragraph text |
| position | float[] | Quadrilateral coordinates |
| outline_level | int | Heading level |
| tags | string[] | Custom tags |
```json
{
  "paragraph_id": 1,
  "page_id": 1,
  "type": "paragraph",
  "sub_type": "doc_title",
  "text": "# Sample PDF",
  "position": [141, 154, 509, 154, 509, 220, 141, 220],
  "outline_level": -1,
  "tags": []
}
```
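
Because the view is flat, regrouping it takes only a few lines. The sketch below groups records by page_id and pulls out headings by sub_type. It selects on sub_type rather than outline_level because the doc_title record above carries outline_level -1. It assumes the records are plain dicts; the helper names and the second and third sample records are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List

HEADING_SUB_TYPES = {"doc_title", "paragraph_title"}


def group_by_page(detail: List[Dict[str, Any]]) -> Dict[int, List[Dict[str, Any]]]:
    """Bucket paragraph records by page_id, keeping their original order."""
    pages: Dict[int, List[Dict[str, Any]]] = defaultdict(list)
    for para in detail:
        pages[para["page_id"]].append(para)
    return dict(pages)


def headings(detail: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Select title and heading paragraphs by sub_type."""
    return [p for p in detail if p.get("sub_type") in HEADING_SUB_TYPES]


quad = [141, 154, 509, 154, 509, 220, 141, 220]
sample_detail = [
    {"paragraph_id": 1, "page_id": 1, "type": "paragraph", "sub_type": "doc_title",
     "text": "# Sample PDF", "position": quad, "outline_level": -1, "tags": []},
    # The records below are made up for illustration.
    {"paragraph_id": 2, "page_id": 1, "type": "paragraph", "sub_type": "text",
     "text": "Hello world.", "position": quad, "outline_level": -1, "tags": []},
    {"paragraph_id": 3, "page_id": 2, "type": "paragraph", "sub_type": "paragraph_title",
     "text": "## Details", "position": quad, "outline_level": 1, "tags": []},
]

for page_id, paragraphs in group_by_page(sample_detail).items():
    print(page_id, [p["paragraph_id"] for p in paragraphs])
print([p["text"] for p in headings(sample_detail)])
```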

Typical usage

  • For UI highlighting, read structured[].pos or content[].pos
  • For block-level filtering, select by structured[].type
  • For reading views, use result.markdown or iterate through content
  • For structured downstream processing, iterate result.pages and consume structured
  • For cross-page paragraph processing, iterate result.detail directly