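Using Custom AI Models via Callbacks

Overview

Starting with SDK v4.1.0, the PHP SDK exposes the same callback-based extension points as the C++ SDK: you can plug in your own AI inference engine for OCR, layout analysis, and table recognition, and return the results as JSON strings. Once the callback pair for a capability is registered on ConvertCallback, the SDK skips its built-in DocumentAI call for that capability and consumes your JSON output instead. If the pair is not set, the SDK falls back to the built-in DocumentAI model.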

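Callback Pairs

Each AI capability uses two callbacks: a trigger, which receives the path to the page image (saved as a PNG in a temporary directory), and a result getter, which returns a JSON string.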

Capability	Trigger callback	Result getter callback	Fires when
OCR	`$onOcr`	`$onOcrResult`	`enableOcr = true`
Layout analysis	`$onLayout`	`$onLayoutResult`	`enableAiLayout = true` or `enableOcr = true`
Table recognition	`$onTable`	`$onTableResult`	`enableAiTableRecognition = true` and layout analysis detects a table region

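Rules:

The trigger receives the UTF-8 path of a PNG file. Return true when inference succeeds; if it returns false, the SDK ignores the result for that page.
The getter must return a UTF-8 JSON string. The PHP SDK keeps the returned string in an internal buffer for the SDK to read.
Both callbacks of a capability must be set together. If only one is set, the SDK falls back to the built-in path.
Coordinates in the JSON must be in the pixel space of the image passed to the trigger, with the origin at the top-left corner, X increasing to the right, and Y increasing downward.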
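The first three rules can be wrapped in a small factory so that a trigger and its getter are always created together and share one result buffer. This is a minimal sketch, not part of the SDK: `makeCallbackPair` and its `$infer` parameter are hypothetical names, and `$infer` stands in for your own engine (a callable that takes the PNG path and returns a JSON string).

```php
<?php
/**
 * Build a [trigger, getter] pair around one inference callable.
 * $infer is a hypothetical stand-in for your own engine: it receives the
 * PNG path and returns a JSON string (or '' when it has no result).
 */
function makeCallbackPair(callable $infer): array
{
    $json = ''; // shared by reference between the two closures

    $trigger = static function (string $imagePath) use (&$json, $infer): bool {
        try {
            $json = (string) $infer($imagePath);
        } catch (\Throwable $e) {
            $json = ''; // treat an engine failure as "no result for this page"
        }
        if ($json === '') {
            return false;
        }
        // Report success only if the engine produced well-formed JSON.
        json_decode($json);
        if (json_last_error() !== JSON_ERROR_NONE) {
            $json = '';
            return false;
        }
        return true;
    };

    $getter = static function () use (&$json): string {
        return $json;
    };

    return [$trigger, $getter];
}
```

Assuming the callback properties used elsewhere on this page, registration could then be written as `[$cb->onOcr, $cb->onOcrResult] = makeCallbackPair([MyOcrModel::class, 'run']);`, which makes it hard to set only one half of a pair by accident.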

Example

php

use ComPDFKit\Conversion\Conversion;
use ComPDFKit\Conversion\ConvertCallback;
use ComPDFKit\Conversion\ConvertOption;
use ComPDFKit\Conversion\LibraryManager;
use ComPDFKit\Conversion\OcrLanguage;

LibraryManager::licenseVerify('LICENSE_KEY', 'device_id', 'app_id');
LibraryManager::initialize(__DIR__ . '/../');

$ocrJson = '';
$layoutJson = '';
$tableJson = '';

$cb = new ConvertCallback();

$cb->onOcr = static function (string $imagePath) use (&$ocrJson): bool {
    $ocrJson = MyOcrModel::run($imagePath); // your own engine
    return $ocrJson !== '';
};
$cb->onOcrResult = static function () use (&$ocrJson): string {
    return $ocrJson;
};

$cb->onLayout = static function (string $imagePath) use (&$layoutJson): bool {
    $layoutJson = MyLayoutModel::run($imagePath);
    return $layoutJson !== '';
};
$cb->onLayoutResult = static function () use (&$layoutJson): string {
    return $layoutJson;
};

$cb->onTable = static function (string $imagePath) use (&$tableJson): bool {
    $tableJson = MyTableModel::run($imagePath);
    return $tableJson !== '';
};
$cb->onTableResult = static function () use (&$tableJson): string {
    return $tableJson;
};

$option = new ConvertOption();
$option->enableOcr = true;
$option->enableAiLayout = true;
$option->enableAiTableRecognition = true; // needed for the table callbacks to fire
$option->languages = [OcrLanguage::ENGLISH];

Conversion::pdfToWord('input.pdf', '', 'output.docx', $option, $cb);

LibraryManager::release();

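You can register only the capabilities you want to override and leave the rest unset to keep the built-in behavior.

Thread Safety and Lifetime

Callbacks are invoked synchronously on the same OS thread that calls the conversion function. PHP FFI does not support callbacks across threads, so the PHP closures themselves do not need any locking.
The PNG image at the path passed to the trigger lives in the SDK's temporary directory and may be deleted shortly after the trigger returns. Copy or process it before returning.
The ConvertCallback instance and the Conversion::pdfTo*() call jointly own the trampoline; do not modify the callback object until the call returns.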
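If you want to keep a page image beyond the trigger call (for debugging, say, or to compare engines later), one option is to copy it to a location you own before the trigger returns. The helper below is a sketch under that assumption; `keepPageImage` is a hypothetical name and uses only standard PHP file functions.

```php
<?php
/**
 * Copy the SDK's temporary page image to a file we own.
 * Returns the path of the copy, or null if the copy could not be made.
 */
function keepPageImage(string $imagePath, ?string $targetDir = null): ?string
{
    if (!is_file($imagePath)) {
        return null;
    }
    $targetDir = $targetDir ?? sys_get_temp_dir();

    // tempnam() reserves a unique name by creating an empty placeholder file.
    $base = tempnam($targetDir, 'page_');
    if ($base === false) {
        return null;
    }

    // Rename the placeholder so the copy keeps a .png extension.
    $copyPath = $base . '.png';
    if (!rename($base, $copyPath)) {
        @unlink($base);
        return null;
    }
    if (!copy($imagePath, $copyPath)) {
        @unlink($copyPath);
        return null;
    }
    return $copyPath;
}
```

Call it inside the trigger, before returning, and remember to delete the copy yourself once you are done with it.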

JSON Schemas

OCR Result JSON Schema

Returned by $onOcrResult. The SDK populates each text_spans[].chars[] from words[] (if provided); otherwise it splits the span's rect evenly.

json

{
  "text_spans": [
    {
      "text": "Hello World",
      "confidence": 0.98,
      "rotation": 0.0,
      "rect": { "left": 120, "top": 80, "right": 320, "bottom": 110 },
      "style": {
        "font_size": 18.0,
        "font_color": { "r": 0, "g": 0, "b": 0 }
      },
      "words": [
        { "text": "Hello", "rect": { "left": 120, "top": 80, "right": 200, "bottom": 110 } },
        { "text": "World", "rect": { "left": 210, "top": 80, "right": 320, "bottom": 110 } }
      ]
    }
  ]
}

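Field	Type	Required	Description
`text`	string	Yes	UTF-8 text content of the span.
`confidence`	number	No	0.0 – 1.0. Spans below 0.1 are discarded.
`rotation`	number	No	Text rotation angle in degrees. Defaults to 0.
`rect`	object	Yes	Bounding box in image pixels (`left` / `top` / `right` / `bottom`).
`style.font_size`	number	No	Estimated font size in pixels.
`style.font_color`	object	No	`{ r, g, b }`, each 0 – 255.
`words`	array	No	Per-word boxes. If omitted, the SDK splits the span's rect evenly. Strongly recommended for lines that mix CJK and Latin text, to get correct glyph spacing.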
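Many OCR engines report boxes as `[x, y, width, height]` rather than as edges, so a small adapter is usually needed to produce the structure above. The sketch below assumes a hypothetical detection format (`text`, `box`, optional `score`) with boxes already in the pixel space of the page image; it emits only the `text`, `rect`, and `confidence` fields, and rounds coordinates to integers to match the example.

```php
<?php
/**
 * Convert detections from a hypothetical engine into the OCR result JSON.
 * Each detection: ['text' => string, 'box' => [x, y, width, height], 'score' => float]
 * (this input format is an assumption made for illustration).
 */
function buildOcrJson(array $detections): string
{
    $spans = [];
    foreach ($detections as $d) {
        [$x, $y, $w, $h] = $d['box'];
        $span = [
            'text' => (string) $d['text'],
            // Edges in image pixels: origin top-left, X right, Y down.
            'rect' => [
                'left'   => (int) round($x),
                'top'    => (int) round($y),
                'right'  => (int) round($x + $w),
                'bottom' => (int) round($y + $h),
            ],
        ];
        if (isset($d['score'])) {
            $span['confidence'] = (float) $d['score']; // optional field
        }
        $spans[] = $span;
    }
    // Keep non-ASCII text readable and fail loudly on encoding errors.
    return json_encode(
        ['text_spans' => $spans],
        JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR
    );
}
```

A getter can return the string produced here directly; `JSON_THROW_ON_ERROR` requires PHP 7.3 or later.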

Layout Analysis Result JSON Schema

Returned by $onLayoutResult. Objects with confidence < 0.45 are discarded.

json

{
  "objects": [
    { "type": "title", "confidence": 0.95, "rect": { "left": 60, "top": 50, "right": 540, "bottom": 90 } },
    { "type": "paragraph", "confidence": 0.97, "rect": { "left": 60, "top": 100, "right": 540, "bottom": 220 } },
    { "type": "figure", "confidence": 0.92, "rect": { "left": 80, "top": 240, "right": 520, "bottom": 460 } },
    { "type": "table", "confidence": 0.93, "rect": { "left": 60, "top": 480, "right": 540, "bottom": 700 } }
  ]
}

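Supported type values:

Value	Meaning
`paragraph`	Body paragraph
`title`	Title or heading
`figure`	Image or graphic
`figure_title`	Figure title
`figure_caption`	Figure caption text
`table`	Table region. Whether the table is bordered or borderless is decided by the table recognition stage, not by the layout label.
`table_title`	Table title
`table_caption`	Table caption text
`ordered_list`	Ordered list
`unordered_list`	Unordered list
`catalogue`	Table of contents
`formula`	Mathematical formula
`code`	Code block
`algorithm`	Algorithm block
`header`	Page header
`footer`	Page footer
`page_number`	Page number
`reference`	Citation or reference list

Objects whose type is not listed in the table above are ignored. Use the values in the table above as the canonical layout labels in your custom output.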
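Because unlisted labels are ignored, it can help to map your model's own label names onto the canonical ones before serializing. The following is a sketch only: `buildLayoutJson`, the `$labelMap` parameter, and the input region format (`label`, `score`, `rect` as `[left, top, right, bottom]`) are assumptions made for illustration.

```php
<?php
// Canonical labels taken from the table above.
const SUPPORTED_LAYOUT_TYPES = [
    'paragraph', 'title', 'figure', 'figure_title', 'figure_caption',
    'table', 'table_title', 'table_caption', 'ordered_list', 'unordered_list',
    'catalogue', 'formula', 'code', 'algorithm', 'header', 'footer',
    'page_number', 'reference',
];

/**
 * Build the layout result JSON from hypothetical model regions.
 * Each region: ['label' => string, 'score' => float, 'rect' => [left, top, right, bottom]].
 * $labelMap translates model-specific labels to canonical ones.
 */
function buildLayoutJson(array $regions, array $labelMap = []): string
{
    $objects = [];
    foreach ($regions as $r) {
        $type = $labelMap[$r['label']] ?? $r['label'];
        if (!in_array($type, SUPPORTED_LAYOUT_TYPES, true)) {
            continue; // unlisted types would be ignored anyway
        }
        [$left, $top, $right, $bottom] = $r['rect'];
        $objects[] = [
            'type' => $type,
            'confidence' => (float) $r['score'],
            'rect' => [
                'left' => $left, 'top' => $top,
                'right' => $right, 'bottom' => $bottom,
            ],
        ];
    }
    return json_encode(['objects' => $objects], JSON_THROW_ON_ERROR);
}
```

Dropping unknown labels up front is optional, but it makes it obvious during development which of your model's classes have no canonical equivalent.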

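Table Recognition Result JSON Schema

Returned by $onTableResult once for each detected table region. Polygons use 8 integers [x0, y0, x1, y1, x2, y2, x3, y3], ordered top-left, top-right, bottom-right, bottom-left.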

json

{
  "type": "table_with_line",
  "position": [60, 480, 540, 480, 540, 700, 60, 700],
  "rows": 3,
  "cols": 2,
  "angle": 0.0,
  "height_of_rows": [40, 60, 60],
  "width_of_cols": [200, 280],
  "table_cells": [
    {
      "start_row": 0,
      "end_row": 0,
      "start_col": 0,
      "end_col": 0,
      "cell_background_color_r": 240,
      "cell_background_color_g": 240,
      "cell_background_color_b": 240,
      "position": [60, 480, 260, 480, 260, 520, 60, 520]
    }
  ]
}

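Field	Type	Description
`type`	string	`table_with_line` means a bordered table; any other value is treated as a non-standard (borderless) table.
`position`	int[8]	Table polygon in image pixels.
`rows` / `cols`	int	Number of rows / columns.
`angle`	number	Skew angle in degrees.
`height_of_rows`	int[]	Pixel height of each row (length = `rows`).
`width_of_cols`	int[]	Pixel width of each column (length = `cols`).
`table_cells[]`	array	One entry per merged cell.
`start_row` / `end_row`	int	Inclusive row span of the cell.
`start_col` / `end_col`	int	Inclusive column span of the cell.
`cell_background_color_*`	int	Cell background color component (0 – 255).
`position`	int[8]	Cell polygon in image pixels.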
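For an unrotated table (`angle` of 0), a cell polygon can be derived from the row heights, column widths, and the table's top-left corner. The helpers below sketch that arithmetic; `rectToPolygon` and `cellPolygon` are hypothetical names, and the assumption that rows and columns are laid out contiguously from the table's top-left corner is mine rather than something the SDK documents.

```php
<?php
/** Axis-aligned box to the 8-integer polygon: top-left, top-right, bottom-right, bottom-left. */
function rectToPolygon(int $left, int $top, int $right, int $bottom): array
{
    return [$left, $top, $right, $top, $right, $bottom, $left, $bottom];
}

/**
 * Polygon of a cell spanning rows $startRow..$endRow and columns
 * $startCol..$endCol (both inclusive), for a table whose top-left corner
 * is at ($originX, $originY). Assumes angle = 0.
 */
function cellPolygon(
    int $originX,
    int $originY,
    array $widthOfCols,
    array $heightOfRows,
    int $startRow,
    int $endRow,
    int $startCol,
    int $endCol
): array {
    // Offset of the cell = sum of the columns/rows that precede it.
    $left = $originX + array_sum(array_slice($widthOfCols, 0, $startCol));
    $top = $originY + array_sum(array_slice($heightOfRows, 0, $startRow));
    // Size of the cell = sum of the columns/rows it spans.
    $right = $left + array_sum(array_slice($widthOfCols, $startCol, $endCol - $startCol + 1));
    $bottom = $top + array_sum(array_slice($heightOfRows, $startRow, $endRow - $startRow + 1));

    return rectToPolygon($left, $top, $right, $bottom);
}
```

With the values from the example above (origin 60, 480; widths 200 and 280; first row height 40), the cell at row 0, column 0 comes out as `[60, 480, 260, 480, 260, 520, 60, 520]`, which matches the `position` shown for that cell.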

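Tip: Validate Your JSON

If you need a reference output to compare against, first run a conversion with the built-in DocumentAI model. The SDK uses the same JSON structure internally, so your custom output should follow that structure.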
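A lightweight structural check during development can also catch mistakes before the JSON reaches the SDK. The sketch below covers only what the OCR field table on this page states (`text` and `rect` are required, `confidence` is 0.0 to 1.0); `checkOcrJson` is a hypothetical helper and does not reproduce whatever validation the SDK performs internally.

```php
<?php
/** Return a list of problems found in an OCR result JSON string (empty = none found). */
function checkOcrJson(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data) || !is_array($data['text_spans'] ?? null)) {
        return ['top level must be an object with a "text_spans" array'];
    }

    $isNumber = static fn($v): bool => is_int($v) || is_float($v);
    $problems = [];

    foreach ($data['text_spans'] as $i => $span) {
        if (!is_array($span)) {
            $problems[] = "text_spans[{$i}]: must be an object";
            continue;
        }
        if (!is_string($span['text'] ?? null)) {
            $problems[] = "text_spans[{$i}]: \"text\" is required and must be a string";
        }

        // Collect the four edges; missing ones become null.
        $rect = $span['rect'] ?? null;
        $edges = [];
        foreach (['left', 'top', 'right', 'bottom'] as $key) {
            $edges[$key] = is_array($rect) ? ($rect[$key] ?? null) : null;
        }
        if (count(array_filter($edges, $isNumber)) !== 4) {
            $problems[] = "text_spans[{$i}]: \"rect\" needs numeric left/top/right/bottom";
        } elseif ($edges['right'] < $edges['left'] || $edges['bottom'] < $edges['top']) {
            $problems[] = "text_spans[{$i}]: \"rect\" has negative width or height";
        }

        if (array_key_exists('confidence', $span)) {
            $c = $span['confidence'];
            if (!$isNumber($c) || $c < 0 || $c > 1) {
                $problems[] = "text_spans[{$i}]: \"confidence\" must be a number in [0, 1]";
            }
        }
    }
    return $problems;
}
```

Running it inside the trigger and returning false when the list is non-empty is one way to make sure malformed output is never handed to the SDK.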
