Intelligent Document Extraction API

BASE URL: https://api-server.compdf.com/server/

❖ Feature Description

Intelligently extract key fields and structured information from documents.

❖ Request Mode

Synchronous Request (Sync) ✓

The API returns the result file directly after processing. Recommended for small files and real-time interactive scenarios that need immediate feedback.

Asynchronous Request (Async)

The API first returns task acceptance information, then you query progress and results with taskId. Suitable for large files and batch workloads.

Secure Request Mode

Upload and process files through secure mechanisms such as pre-signed URLs. Suitable for high-security and privacy compliance scenarios.

▎Call Flow

1. Upload file

2. Call API (sync)

3. Get result URL

4. Download file
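The four steps above can be sketched in Python as a small function. This is only an outline under stated assumptions: `run_sync_extraction`, `call_api`, and `download` are hypothetical names, the two callables stand in for whatever HTTP client you use, and the result URL is read from `data.downloadUrl` as listed under Response Properties.

```python
from typing import Callable


def run_sync_extraction(
    call_api: Callable[[bytes], dict],
    download: Callable[[str], bytes],
    file_bytes: bytes,
) -> bytes:
    """Run the synchronous call flow with caller-supplied transport functions.

    call_api: hypothetical callable that uploads the file to the sync endpoint
              and returns the parsed JSON response as a dict.
    download: hypothetical callable that fetches the bytes behind a URL.
    """
    # Steps 1 and 2: upload the file and call the API in sync mode.
    response = call_api(file_bytes)
    # Step 3: read the result URL from the response body.
    download_url = response["data"]["downloadUrl"]
    # Step 4: download the result file.
    return download(download_url)
```

Passing the transport in as callables keeps the flow easy to exercise with stand-in functions before wiring it to a real HTTP client.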

▎Usage Limits

Download validity: 24 hours

Synchronous execution

POST https://api-server.compdf.com/server/v2/process/idp/documentExtract

❖ Request Parameters

x-api-key*

Authentication credential sent in the header: x-api-key

Body Parameters (multipart/form-data)

file File*

Upload file

password string

File password (if the PDF is password-protected)

language integer

API error message language (1 = English, 2 = Chinese)

pageRanges string

Page range. Page numbers start from 1, for example 1-3,6. Empty means all pages.
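As a rough illustration of the pageRanges syntax, the following Python sketch expands a value such as 1-3,6 into individual page numbers before the request is sent. The helper name `expand_page_ranges` and its `page_count` argument are hypothetical and not part of the API; the server's own validation may differ.

```python
def expand_page_ranges(spec: str, page_count: int) -> list[int]:
    """Expand a pageRanges string such as "1-3,6" into 1-based page numbers.

    An empty string means all pages. Local illustration only; this is not
    an API call.
    """
    spec = spec.strip()
    if not spec:
        return list(range(1, page_count + 1))

    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        # Page numbers start from 1 and must stay inside the document.
        if start < 1 or end < start or end > page_count:
            raise ValueError(f"invalid page range: {part!r}")
        pages.extend(range(start, end + 1))
    return pages
```

Checking the string locally like this can catch an out-of-range page before a request is spent on it.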

enableOcr integer

Enable OCR (0 = off, 1 = on)

ocrRecognitionLang string

OCR recognition language code. See the list of supported languages.

ocrOption string

OCR strategy: ALL, SCAN_PAGE, INVALID_CHARACTER, or INVALID_CHARACTER_AND_SCAN_PAGE
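A minimal sketch of how the three OCR parameters might be assembled into form fields. `build_ocr_fields` is a hypothetical helper, and leaving out ocrRecognitionLang and ocrOption when OCR is off is an assumption of this sketch, not documented behaviour. The four strategy names are taken from the description above.

```python
from typing import Optional

# Strategy names as listed for the ocrOption parameter.
OCR_OPTIONS = {
    "ALL",
    "SCAN_PAGE",
    "INVALID_CHARACTER",
    "INVALID_CHARACTER_AND_SCAN_PAGE",
}


def build_ocr_fields(
    enable: bool,
    recognition_lang: Optional[str] = None,
    option: Optional[str] = None,
) -> dict[str, str]:
    """Build the OCR-related multipart form fields as strings."""
    fields = {"enableOcr": "1" if enable else "0"}
    if not enable:
        # Assumption: the other OCR fields are irrelevant when OCR is off.
        return fields
    if recognition_lang:
        fields["ocrRecognitionLang"] = recognition_lang
    if option is not None:
        if option not in OCR_OPTIONS:
            raise ValueError(f"unsupported ocrOption: {option!r}")
        fields["ocrOption"] = option
    return fields
```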

isOutputDocumentPerPage integer

Whether to output one file per page (0 = no, 1 = yes)

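mode string

Extraction mode: vision (default; page-by-page extraction based on a vision model, with better handwriting recognition) or layout (integrated extraction based on layout structure, supporting large files, cross-page content, and result traceability). Both modes use extract_fields to pass a fixed extraction schema.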

extractFields string

JSON string of the extraction schema (snake_case alias). Both vision and layout modes use this field; layout mode no longer uses the document_types array.
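The sketch below shows one way to prepare the mode and extractFields form values in Python. The schema format is not described in this section, so the keys and values in `example_schema` are purely hypothetical placeholders. The only points illustrated are that mode is one of the two documented values and that the schema is serialized to a JSON string before it goes into the form. `build_extraction_fields` is likewise a hypothetical helper name.

```python
import json

# The two extraction modes named in the mode parameter description.
VALID_MODES = {"vision", "layout"}


def build_extraction_fields(schema: dict, mode: str = "vision") -> dict[str, str]:
    """Return the mode and extractFields form fields as strings."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    # extractFields is a string parameter, so the schema is sent as JSON text.
    return {"mode": mode, "extractFields": json.dumps(schema, ensure_ascii=False)}


# Hypothetical schema: the real field structure is not documented here.
example_schema = {"invoice_number": "string", "total_amount": "number"}
fields = build_extraction_fields(example_schema, mode="layout")
```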

enableGrounding boolean

Optional. snake_case form,

optionsJson string

Optional. options_json is a JSON string that can contain model configuration parameters and ignore_labels. ignore_labels supports only number, footnote, header, header_image, footer, footer_image, and aside_text; any label passed in is ignored. Full example: {"use_doc_unwarping":false,"use_chart_recognition":false,"use_seal_recognition":false,"use_ocr_for_image_block":false,"use_layout_detection":true,"layout_shape_mode":"auto","merge_tables":true,"relevel_titles":true,"concatenate_pages":false,"ignore_labels":[]}
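One possible way to build the optionsJson value in Python while checking ignore_labels against the seven supported labels. `build_options_json` is a hypothetical helper. Model options are passed through unchanged, since their accepted values are only known from the full example above.

```python
import json
from typing import Iterable

# Labels that ignore_labels accepts, per the parameter description.
ALLOWED_IGNORE_LABELS = {
    "number",
    "footnote",
    "header",
    "header_image",
    "footer",
    "footer_image",
    "aside_text",
}


def build_options_json(ignore_labels: Iterable[str] = (), **model_options) -> str:
    """Serialize model options plus ignore_labels into a JSON string."""
    labels = list(ignore_labels)
    unknown = sorted(set(labels) - ALLOWED_IGNORE_LABELS)
    if unknown:
        raise ValueError(f"unsupported ignore_labels: {unknown}")
    options = dict(model_options)
    options["ignore_labels"] = labels
    return json.dumps(options)
```

For instance, `build_options_json(["header", "footer"], merge_tables=True)` produces a string that can be sent as the optionsJson form field.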

❖ Response Properties

Field	Type	Description
`code`	String	Business status code
`msg`	String	Message
`data`	Object	Response data
`data.fileKey`	String	Unique key of the file in the storage system.
`data.taskId`	String	Task ID
`data.fileName`	String	Source file name. Required in presigned mode to generate the object storage upload URL.
`data.downFileName`	String	Output file name after conversion.
`data.fileUrl`	String	Source file storage URL or object storage key.
`data.downloadUrl`	String	File download URL
`data.sourceType`	String	Source file type
`data.targetType`	String	Target file type
`data.fileSize`	Integer	Source file size in bytes.
`data.convertSize`	Integer	Converted file size in bytes.
`data.convertTime`	Integer	Conversion time for a single file, typically in milliseconds.
`data.status`	String	File processing status. Common values: success, failed, processing, etc.
`data.failureCode`	String	Error code when file conversion fails.
`data.failureReason`	String	Error reason when file conversion fails.
`data.fileParameter`	String	Conversion parameter JSON string submitted when creating the task.

🔗 Request Example

curl --request POST \
  --url https://api-server.compdf.com/server/v2/process/idp/documentExtract \
  --header 'x-api-key: YOUR API-KEY' \
  --form 'file=@/path/to/document.pdf' \
  --form mode=vision
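For callers who prefer Python, a roughly equivalent request can be built with the standard library alone. This is a sketch, not an official client: `encode_multipart` and `build_request` are hypothetical helpers, the multipart body is assembled by hand, and the request object is only constructed here, not sent.

```python
import urllib.request
import uuid

ENDPOINT = "https://api-server.compdf.com/server/v2/process/idp/documentExtract"


def encode_multipart(
    fields: dict[str, str], file_name: str, file_bytes: bytes
) -> tuple[bytes, str]:
    """Encode text fields plus one file part as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode("utf-8")
        )
    # The file goes in the form field named "file", as in the curl example.
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{file_name}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n".encode("utf-8")
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(
    api_key: str, file_name: str, file_bytes: bytes, fields: dict[str, str]
) -> urllib.request.Request:
    """Build (but do not send) the POST request for the sync endpoint."""
    body, content_type = encode_multipart(fields, file_name, file_bytes)
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={"x-api-key": api_key, "Content-Type": content_type},
    )
```

The request would then be sent with `urllib.request.urlopen(request)` and the response body decoded as JSON.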

✓ Response Example
200 OK
{
  "code": "200",
  "msg": "success",
  "data": {
    "fileKey": "<string>",
    "taskId": "<string>",
    "fileName": "<string>",
    "downFileName": "<string>",
    "fileUrl": "<string>",
    "downloadUrl": "<string>",
    "sourceType": "<string>",
    "targetType": "<string>",
    "fileSize": 0,
    "convertSize": 0,
    "convertTime": 0,
    "status": "<string>",
    "failureCode": "<string>",
    "failureReason": "<string>",
    "fileParameter": "<string>"
  }
}
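A response shaped like the example above might be handled as follows. This sketch assumes that a `code` of "200" indicates success (as the example suggests) and that a `data.status` of failed comes with `failureCode` and `failureReason`. `ExtractionError` and `get_download_url` are hypothetical names.

```python
import json


class ExtractionError(Exception):
    """Raised when the response does not describe a usable result."""


def get_download_url(response_text: str) -> str:
    """Return data.downloadUrl from a response body, or raise ExtractionError."""
    payload = json.loads(response_text)
    # Assumption: "200" is the business status code for success.
    if payload.get("code") != "200":
        raise ExtractionError(
            f"request failed: code={payload.get('code')} msg={payload.get('msg')}"
        )
    data = payload.get("data") or {}
    if str(data.get("status", "")).lower() == "failed":
        raise ExtractionError(
            f"{data.get('failureCode')}: {data.get('failureReason')}"
        )
    url = data.get("downloadUrl")
    if not url:
        raise ExtractionError("response contains no downloadUrl")
    return url
```

Because the download link is valid for a limited time, it is safer to fetch the result file soon after this call than to store the URL for later.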