Skip to content
ComPDF

Response structure

The extraction result can be understood as "field output plus optional grounding data". When enable_grounding is enabled, the response can include bbox references back to the source page.

Top-level view

The API returns a standard task-level response. The extraction result (fields, tables, pages, etc.) is available in the downloaded file referenced by downloadUrl or fileUrl.

json
{
  "code": "200",
  "msg": "success",
  "data": {
    "fileKey": "<string>",
    "taskId": "<string>",
    "fileName": "<string>",
    "downFileName": "<string>",
    "fileUrl": "<string>",
    "downloadUrl": "<string>",
    "sourceType": "<string>",
    "targetType": "<string>",
    "fileSize": 0,
    "convertSize": 0,
    "convertTime": 0,
    "status": "<string>",
    "failureCode": "<string>",
    "failureReason": "<string>",
    "fileParameter": "<string>"
  }
}

The result file downloaded from downloadUrl contains the actual extraction payload:

text
result
  name / key
  value
  page
  bboxes

Field output

Scalar field results commonly include:

FieldMeaning
name / keyField name
valueExtracted field value
pageSource page number, when returned
bboxSource-page coordinates, especially when grounding is enabled

Table output

Table-style output is commonly organised as "table name -> rows -> cells". In practice, this usually means:

  • a table identifier
  • a list of row records
  • cell values mapped to the schema-defined headers

If your schema defines tableHeaders, the returned table data usually follows the same header structure, which makes it easier to map into downstream business objects.

Extraction result example

Below is an actual extraction result organised by page:

json
{
  "Page-1": {
    "批销单号": "PXD222085",
    "发货方式": "汽运",
    "客户单号": "5444412/1891133",
    "审批日期": "2024-05-07",
    "收货单位": "Shanghai Hexiaoxiao Information Technology Co., Ltd.",
    "单位编码": "21002214",
    "仓储联系人": "",
    "tables": [
      [
        {
          "序号": "1",
          "数量": "98",
          "ISBN": "978-7-5197-8886-5",
          "码洋": "4,862.00",
          "图书名称": "Legal Matters Around Zhang San",
          "折扣": "66.00",
          "单价": "49.00",
          "包册数": "2+10(14)",
          "货位号": "01-02-027-005"
        },
        {
          "序号": "2",
          "数量": "3",
          "ISBN": "978-7-5197-9009-7",
          "码洋": "255.00",
          "图书名称": "Bankruptcy Trial Practice and Frontier Issues",
          "折扣": "66.00",
          "单价": "85.00",
          "包册数": "0+3(8)",
          "货位号": "01-02-063-002"
        }
      ]
    ]
  }
}

Asset insufficient error

When your page quota is exhausted, the API returns:

json
{
  "code": "06001",
  "msg": "You have run out of the files which could be processed",
  "data": null
}

enable_grounding

When you pass:

bash
--form 'enable_grounding=true'

the backend attempts to include positional references for extracted fields or table cells. This is useful for:

  • highlighting extracted values in the source PDF
  • jumping from extracted results back to the original page
  • human review and verification workflows

How to use the result

  • If you only need values, read fields
  • If you need line-item detail, read tables
  • If you need UI highlighting, consume page and bbox together
  • If you need full context, combine extraction output with documentParsing

Reading with other pages