Using Custom AI Model Callbacks

Overview

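Starting with v4.1, ComPDF Conversion SDK provides callback-based extension points that let you plug your own AI inference engine into OCR, layout analysis, and table recognition. Instead of loading the built-in DocumentAI model (via setDocumentAIModel), you can:

Run inference with any model or runtime you choose (for example an in-house engine, PaddleOCR, or a cloud OCR API).
Return the results to the SDK in the agreed JSON format.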

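Once the callback pair for a capability is set in ConvertCallback, the SDK no longer calls the built-in model at that stage and uses the JSON you return instead. If the pair is not set, the SDK still calls the built-in DocumentAI model as before (provided the model has been loaded).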

Callback Pairs

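Each AI capability uses two callbacks: a trigger callback (the SDK passes in the path of a temporary PNG file of the page) and a result getter callback (the SDK calls it immediately afterwards to obtain the JSON string).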

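Capability	Trigger callback field / setter	Result getter callback field / setter	Trigger condition
OCR	`ocr`	`get_ocr_result`	OCR is enabled
Layout analysis	`layout`	`get_layout_result`	Layout analysis is enabled (or implicitly enabled when OCR is enabled)
Table recognition	`table`	`get_table_result`	Table recognition is enabled and layout analysis reported a table region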

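Rules:

The trigger callback receives the UTF-8 path of the temporary PNG file. Return true when inference succeeds; if it returns false, the SDK ignores the result for that page.
The result getter callback must return a UTF-8 JSON string. The SDK copies the content into an internal buffer before consuming it.
The two callbacks of a capability must be set as a pair; if only one of them is set, the SDK falls back to the built-in implementation.
Coordinates in the JSON must use the pixel space of the image that the trigger callback received (origin at the top-left, X to the right, Y downward).
Confidence filtering: the SDK discards OCR text with confidence < 0.1 and layout objects with confidence < 0.45.
If all three capabilities you need are supplied through custom callbacks, you do not have to call setDocumentAIModel.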
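
The coordinate rule matters when an engine resizes the page image before inference. The following is a minimal sketch, assuming hypothetical `PixelRect` and `RectScaler` helpers (neither is part of the SDK), of mapping a box predicted on a resized copy back to the pixel space of the PNG that the trigger callback received:

```java
/** Hypothetical value type for illustration; not part of the SDK. */
final class PixelRect {
    final int left, top, right, bottom;

    PixelRect(int left, int top, int right, int bottom) {
        this.left = left;
        this.top = top;
        this.right = right;
        this.bottom = bottom;
    }
}

/** Hypothetical helper for illustration; not part of the SDK. */
final class RectScaler {
    private RectScaler() {}

    /**
     * Maps a box predicted on a resized copy of the page image (modelW x modelH)
     * back to the original PNG (origW x origH). Both spaces have the origin at
     * the top-left, X to the right, and Y downward.
     */
    static PixelRect toOriginal(PixelRect r, int modelW, int modelH, int origW, int origH) {
        double sx = (double) origW / modelW; // horizontal scale factor
        double sy = (double) origH / modelH; // vertical scale factor
        return new PixelRect(
                (int) Math.round(r.left * sx),
                (int) Math.round(r.top * sy),
                (int) Math.round(r.right * sx),
                (int) Math.round(r.bottom * sy));
    }
}
```

If the engine runs directly on the PNG at its native size, no conversion is needed and the boxes can be written to the JSON unchanged.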

Example

java

class CustomAICallback implements ConvertCallback {
    @Override
    public void onProgress(int currentPage, int totalPage) {}

    @Override
    public boolean isCancelled() { return false; }

    @Override
    public boolean onOcr(String imagePath) {
        // Run your OCR engine on `imagePath`, cache the JSON result.
        return true;
    }

    @Override
    public boolean onLayout(String imagePath) {
        // Run your layout engine on `imagePath`.
        return true;
    }

    @Override
    public boolean onTable(String imagePath) {
        // Run your table engine on `imagePath`.
        return true;
    }

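    @Override
    public String getOcrResult()    { return ""; }  // placeholder: return the UTF-8 JSON cached by onOcr
    @Override
    public String getLayoutResult() { return ""; }  // placeholder: return the UTF-8 JSON cached by onLayout
    @Override
    public String getTableResult()  { return ""; }  // placeholder: return the UTF-8 JSON cached by onTable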
}

WordOptions opt = new WordOptions();
opt.setEnableOcr(true);
opt.setEnableAiLayout(true);
opt.setEnableAiTableRecognition(true);
CPDFConversion.startPDFToWord("input.pdf", "password", "path/output.docx", opt, new CustomAICallback());
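
The getters in the example above return empty strings as placeholders. One way to connect a trigger callback to its getter is to cache the JSON produced during the trigger. The sketch below assumes a hypothetical `PageResultHolder` class (not part of the SDK) and an inference function of your own that maps an image path to a JSON string:

```java
import java.util.function.Function;

/** Hypothetical helper for illustration; not part of the SDK. */
final class PageResultHolder {
    private final Function<String, String> engine; // image path -> JSON (your inference code)
    private String lastJson = "";

    PageResultHolder(Function<String, String> engine) {
        this.engine = engine;
    }

    /** Call from the trigger callback (for example onOcr). Returns false if inference failed. */
    synchronized boolean run(String imagePath) {
        try {
            String json = engine.apply(imagePath);
            if (json == null || json.isEmpty()) {
                lastJson = "";
                return false;
            }
            lastJson = json;
            return true;
        } catch (RuntimeException e) {
            // Treat any engine failure as "no result for this page".
            lastJson = "";
            return false;
        }
    }

    /** Call from the matching getter (for example getOcrResult). */
    synchronized String result() {
        return lastJson;
    }
}
```

With one holder per capability, `onOcr` could return `ocrHolder.run(imagePath)` and `getOcrResult` could return `ocrHolder.result()`. The methods are synchronized as a precaution, since this document does not state which thread the SDK invokes the callbacks on.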

JSON Data Structures

OCR Result JSON Data Structure

Returned by the OCR result getter. The SDK prefers the per-word boxes in words[]; if words[] is not provided, it splits the character regions evenly across text_spans[].rect.

json

{
    "text_spans": [
        {
            "text": "Hello World",
            "confidence": 0.98,
            "rotation": 0.0,
            "rect": { "left": 120, "top": 80, "right": 320, "bottom": 110 },
            "style": {
                "font_size": 18.0,
                "font_color": { "r": 0, "g": 0, "b": 0 }
            },
            "words": [
                { "text": "Hello", "rect": { "left": 120, "top": 80, "right": 200, "bottom": 110 } },
                { "text": "World", "rect": { "left": 210, "top": 80, "right": 320, "bottom": 110 } }
            ]
        }
    ]
}

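Field	Type	Required	Description
`text_spans`	array	Yes	Text spans recognized on the page.
`text`	string	Yes	UTF-8 content of the text span.
`confidence`	number	No	0.0 - 1.0. The SDK discards text spans below 0.1.
`rotation`	number	No	Text rotation angle in degrees. Defaults to 0.
`rect`	object	Yes	Bounding box in image pixel coordinates (`left` / `top` / `right` / `bottom`).
`style.font_size`	number	No	Estimated font size in pixels.
`style.font_color`	object	No	`{ r, g, b }`, each in the range 0 - 255.
`words`	array	No	Per-word bounding boxes. If omitted, the SDK splits the span's bounding box evenly. Strongly recommended for text that mixes CJK and Latin characters, to get more accurate character spacing.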
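
As an illustration of producing this structure without a JSON library, here is a minimal sketch that serializes only the required fields plus `confidence`. The `OcrSpan` and `OcrJsonWriter` names are hypothetical and not part of the SDK; a real integration would more likely use a JSON library and also emit `words` where available.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical value type for one recognized span; not part of the SDK. */
final class OcrSpan {
    final String text;
    final double confidence;
    final int left, top, right, bottom;

    OcrSpan(String text, double confidence, int left, int top, int right, int bottom) {
        this.text = text;
        this.confidence = confidence;
        this.left = left;
        this.top = top;
        this.right = right;
        this.bottom = bottom;
    }
}

/** Hypothetical helper for illustration; not part of the SDK. */
final class OcrJsonWriter {
    private OcrJsonWriter() {}

    /** Escapes quotes, backslashes, and control characters for a JSON string. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds the OCR result JSON with text, confidence, and rect for each span. */
    static String toJson(List<OcrSpan> spans) {
        StringBuilder sb = new StringBuilder("{\"text_spans\":[");
        for (int i = 0; i < spans.size(); i++) {
            OcrSpan s = spans.get(i);
            if (i > 0) sb.append(',');
            // Locale.ROOT keeps the decimal separator a dot regardless of system locale.
            sb.append(String.format(Locale.ROOT,
                    "{\"text\":\"%s\",\"confidence\":%.2f,"
                    + "\"rect\":{\"left\":%d,\"top\":%d,\"right\":%d,\"bottom\":%d}}",
                    escape(s.text), s.confidence, s.left, s.top, s.right, s.bottom));
        }
        return sb.append("]}").toString();
    }
}
```

The returned string can be cached in the trigger callback and handed back from the OCR result getter.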

Layout Analysis Result JSON Data Structure

Returned by the layout analysis result getter. The SDK discards objects with confidence < 0.45.

json

{
    "objects": [
        { "type": "title", "confidence": 0.95, "rect": { "left": 60, "top": 50, "right": 540, "bottom": 90 } },
        { "type": "paragraph", "confidence": 0.97, "rect": { "left": 60, "top": 100, "right": 540, "bottom": 220 } },
        { "type": "figure", "confidence": 0.92, "rect": { "left": 80, "top": 240, "right": 520, "bottom": 460 } },
        { "type": "table", "confidence": 0.93, "rect": { "left": 60, "top": 480, "right": 540, "bottom": 700 } }
    ]
}

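Supported type values:

Value	Meaning
`paragraph`	Body paragraph
`title`	Title or heading
`figure`	Image or graphic
`figure_title`	Figure title
`figure_caption`	Figure caption
`table`	Table region. Whether the table has borders is decided by the table recognition stage, not by the layout label.
`table_title`	Table title
`table_caption`	Table caption
`ordered_list`	Ordered list
`unordered_list`	Unordered list
`catalogue`	Table of contents
`formula`	Mathematical formula
`code`	Code block
`algorithm`	Algorithm block
`header`	Page header
`footer`	Page footer
`page_number`	Page number
`reference`	Bibliography or citation

Types not listed are ignored. Use the values in the table above as the standard layout labels in your custom output.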
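
Because unlisted types and low-confidence objects are dropped, it can help to check your engine's output before returning it. The sketch below uses a hypothetical `LayoutLabels` class (not part of the SDK) that mirrors the label table and the 0.45 threshold stated in this document. It assumes the threshold is exclusive (objects at exactly 0.45 are kept), which follows the "confidence < 0.45" wording.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper for illustration; not part of the SDK. */
final class LayoutLabels {
    private LayoutLabels() {}

    /** Layout objects below this confidence are discarded, per this document. */
    static final double MIN_CONFIDENCE = 0.45;

    /** The type values listed in the table above. */
    static final Set<String> SUPPORTED = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
            "paragraph", "title", "figure", "figure_title", "figure_caption",
            "table", "table_title", "table_caption", "ordered_list", "unordered_list",
            "catalogue", "formula", "code", "algorithm", "header", "footer",
            "page_number", "reference")));

    /** Returns true if an object with this type and confidence should survive filtering. */
    static boolean willBeKept(String type, double confidence) {
        return SUPPORTED.contains(type) && confidence >= MIN_CONFIDENCE;
    }
}
```

A check like this is most useful while mapping a model's native label set (for example `text` or `image`) onto the standard labels, since a mismatched label is ignored rather than reported as an error.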

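Table Recognition Result JSON Data Structure

The table recognition result getter is called once for each detected table region. Polygons use 8 integers [x0, y0, x1, y1, x2, y2, x3, y3], ordered top-left, top-right, bottom-right, bottom-left.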

json

{
    "type": "table_with_line",
    "position": [60, 480, 540, 480, 540, 700, 60, 700],
    "rows": 3,
    "cols": 2,
    "angle": 0.0,
    "height_of_rows": [40, 60, 60],
    "width_of_cols": [200, 280],
    "table_cells": [
        {
            "start_row": 0,
            "end_row": 0,
            "start_col": 0,
            "end_col": 0,
            "cell_background_color_r": 240,
            "cell_background_color_g": 240,
            "cell_background_color_b": 240,
            "position": [60, 480, 260, 480, 260, 520, 60, 520]
        }
    ]
}

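Field	Type	Description
`type`	string	`table_with_line` means a ruled (bordered) table; any other value is treated as a non-standard (borderless) table.
`position`	int[8]	Polygon of the table in image pixel coordinates.
`rows` / `cols`	int	Number of rows / columns.
`angle`	number	Skew angle in degrees.
`height_of_rows`	int[]	Pixel height of each row; length equals `rows`.
`width_of_cols`	int[]	Pixel width of each column; length equals `cols`.
`table_cells[]`	array	One entry per cell, with a merged cell counted as a single entry.
`start_row` / `end_row`	int	First and last row the cell spans (inclusive).
`start_col` / `end_col`	int	First and last column the cell spans (inclusive).
`cell_background_color_*`	int	Cell background color component, in the range 0 - 255.
`position`	int[8]	Polygon of the cell in image pixel coordinates.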
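
To show how the row and column sizes relate to a cell's `position`, here is a minimal sketch that derives the cell polygon for an axis-aligned table (`angle` of 0). The `TableGeometry` class is hypothetical and not part of the SDK; skewed tables would additionally need a rotation, which is omitted here.

```java
/** Hypothetical helper for illustration; not part of the SDK. */
final class TableGeometry {
    private TableGeometry() {}

    /**
     * Returns the 8-int polygon (top-left, top-right, bottom-right, bottom-left)
     * of a cell spanning rows startRow..endRow and columns startCol..endCol
     * (both inclusive), for an axis-aligned table whose top-left corner is at
     * (originX, originY) in image pixel coordinates.
     */
    static int[] cellPolygon(int originX, int originY,
                             int[] heightOfRows, int[] widthOfCols,
                             int startRow, int endRow, int startCol, int endCol) {
        int left = originX + sum(widthOfCols, startCol);
        int right = originX + sum(widthOfCols, endCol + 1);
        int top = originY + sum(heightOfRows, startRow);
        int bottom = originY + sum(heightOfRows, endRow + 1);
        return new int[] { left, top, right, top, right, bottom, left, bottom };
    }

    /** Sums the first count elements of a. */
    private static int sum(int[] a, int count) {
        int total = 0;
        for (int i = 0; i < count; i++) {
            total += a[i];
        }
        return total;
    }
}
```

With the values from the JSON example above (origin (60, 480), row heights 40, 60, 60 and column widths 200, 280), the cell at row 0, column 0 yields [60, 480, 260, 480, 260, 520, 60, 520], matching the `position` of the sample cell.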

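Tip: Validating JSON Output

If you need reference output to compare against, first run a conversion with the built-in DocumentAI model. The SDK uses the same JSON structure internally, so custom model output should follow the same format.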
