Use Custom AI Models via Callbacks
Overview
Starting with SDK v4.1.0, the PHP SDK exposes the same callback-based extension point as the C++ SDK: you can plug in your own AI inference engine for OCR, Layout Analysis, and Table Recognition and return the result as a JSON string. When the relevant callback pair is registered on ConvertCallback, the SDK skips its built-in DocumentAI invocation for that capability and consumes your JSON output instead. If a pair is left unset, the SDK falls back to the built-in DocumentAI model.
Callback Pairs
Each AI capability uses two callbacks: a trigger that receives the path to a page image (saved as PNG in a temporary directory) and a result getter that returns the JSON string.
| Capability | Trigger callback | Result getter callback | Triggered when |
|---|---|---|---|
| OCR | $onOcr | $onOcrResult | enableOcr = true |
| Layout Analysis | $onLayout | $onLayoutResult | enableAiLayout = true or enableOcr = true |
| Table Recognition | $onTable | $onTableResult | enableAiTableRecognition = true and a table region is detected by layout analysis |
Rules:
- The trigger receives a UTF-8 path to a PNG file. Return true if inference succeeded, or false to make the SDK ignore the result for that page.
- The getter must return a UTF-8 JSON string. The PHP SDK keeps the returned string alive in an internal buffer for the SDK to read.
- Both callbacks for a capability must be set together. If only one is provided, the SDK falls back to the built-in path.
- Coordinates in your JSON must be in the pixel space of the image received by the trigger, with top-left origin, X to the right, and Y down.
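The trigger/getter contract described by these rules can be wrapped in a small factory. The following is a minimal sketch, not part of the SDK: makeCallbackPair and the $infer callable are hypothetical names, and the extra JSON syntax check is an assumption about what you would want to reject before the SDK reads the result.

```php
<?php
declare(strict_types=1);

/**
 * Build a [trigger, getter] closure pair around one inference function.
 * $infer is a hypothetical callable: it receives the image path and returns
 * a JSON string. An empty string or a thrown exception counts as failure.
 */
function makeCallbackPair(callable $infer): array
{
    $json = ''; // shared by both closures; holds the latest page result

    $trigger = static function (string $imagePath) use (&$json, $infer): bool {
        try {
            $json = (string) $infer($imagePath);
        } catch (\Throwable $e) {
            $json = '';
        }
        // Reject empty or syntactically invalid JSON so the SDK ignores this page.
        if ($json === '' || json_decode($json) === null) {
            $json = '';
            return false;
        }
        return true;
    };

    $getter = static function () use (&$json): string {
        return $json;
    };

    return [$trigger, $getter];
}

// Possible usage with the callback object from the sample (not executed here):
// [$cb->onOcr, $cb->onOcrResult] = makeCallbackPair([MyOcrModel::class, 'run']);
```

Because both closures come from one call, the pair is always created together, which matches the rule that a capability needs both callbacks set.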
Sample
use ComPDFKit\Conversion\Conversion;
use ComPDFKit\Conversion\ConvertCallback;
use ComPDFKit\Conversion\ConvertOption;
use ComPDFKit\Conversion\LibraryManager;
use ComPDFKit\Conversion\OcrLanguage;
LibraryManager::licenseVerify('LICENSE_KEY', 'device_id', 'app_id');
LibraryManager::initialize(__DIR__ . '/../');
$ocrJson = '';
$layoutJson = '';
$tableJson = '';
$cb = new ConvertCallback();
// OCR pair: the trigger runs inference and caches the JSON, the getter returns it.
$cb->onOcr = static function (string $imagePath) use (&$ocrJson): bool {
    $ocrJson = MyOcrModel::run($imagePath); // your own engine
    return $ocrJson !== '';
};
$cb->onOcrResult = static function () use (&$ocrJson): string {
    return $ocrJson;
};
// Layout analysis pair.
$cb->onLayout = static function (string $imagePath) use (&$layoutJson): bool {
    $layoutJson = MyLayoutModel::run($imagePath);
    return $layoutJson !== '';
};
$cb->onLayoutResult = static function () use (&$layoutJson): string {
    return $layoutJson;
};
// Table recognition pair, called once per detected table region.
$cb->onTable = static function (string $imagePath) use (&$tableJson): bool {
    $tableJson = MyTableModel::run($imagePath);
    return $tableJson !== '';
};
$cb->onTableResult = static function () use (&$tableJson): string {
    return $tableJson;
};
$option = new ConvertOption();
$option->enableOcr = true;
$option->enableAiLayout = true;
$option->enableAiTableRecognition = true; // required for the onTable pair to be triggered
$option->languages = [OcrLanguage::ENGLISH];
Conversion::pdfToWord('input.pdf', '', 'output.docx', $option, $cb);
LibraryManager::release();
You can register only the capabilities you want to override and leave the rest unset to keep the built-in behavior.
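When only some capabilities are overridden, a small guard can prevent a half-configured pair from being assigned by accident. This is a sketch under assumptions: registerPair is a hypothetical helper, and it relies only on the property naming visible in the sample above (onOcr / onOcrResult, onLayout / onLayoutResult, onTable / onTableResult).

```php
<?php
declare(strict_types=1);

/**
 * Assign a trigger/getter pair to a callback object, refusing incomplete pairs.
 * $capability is 'Ocr', 'Layout', or 'Table', mirroring the property names
 * used in the sample. Returns true if the pair was assigned.
 */
function registerPair(object $cb, string $capability, ?\Closure $trigger, ?\Closure $getter): bool
{
    if (!in_array($capability, ['Ocr', 'Layout', 'Table'], true)) {
        throw new InvalidArgumentException("Unknown capability: $capability");
    }
    if ($trigger === null || $getter === null) {
        // Leave both properties untouched so the built-in model handles this capability.
        return false;
    }
    $cb->{'on' . $capability} = $trigger;
    $cb->{'on' . $capability . 'Result'} = $getter;
    return true;
}
```

With this helper, overriding OCR alone is one call with the capability name 'Ocr'; layout analysis and table recognition stay on the built-in path because their properties are never set.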
Thread Safety and Lifetime
- Callbacks are invoked synchronously from the same OS thread that called the conversion function. PHP FFI does not support cross-thread callbacks, so you do not need any locking for the PHP closures themselves.
- The PNG image at the path passed to the trigger lives in the SDK temporary directory and may be deleted shortly after the trigger returns. Copy or process it before returning.
- The ConvertCallback instance and the Conversion::pdfTo*() call own the trampolines together; do not modify the callback object until the call returns.
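If you want to keep a page image beyond the trigger call, for example to debug a model offline, copy it before returning. The helper below is a minimal sketch: preservePageImage and $keepDir are hypothetical names, and the directory is one you manage yourself, not something the SDK provides.

```php
<?php
declare(strict_types=1);

/**
 * Copy the page image to a caller-owned directory and return the new path,
 * or null if the source is missing or the copy fails.
 */
function preservePageImage(string $imagePath, string $keepDir): ?string
{
    if (!is_file($imagePath)) {
        return null;
    }
    // Create the target directory on first use.
    if (!is_dir($keepDir) && !mkdir($keepDir, 0700, true) && !is_dir($keepDir)) {
        return null;
    }
    // A unique file name avoids collisions between pages and between conversions.
    $target = $keepDir . DIRECTORY_SEPARATOR . uniqid('page_', true) . '.png';
    return copy($imagePath, $target) ? $target : null;
}
```

Calling this as the first statement inside a trigger closure keeps the copy within the window in which the temporary file is still present.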
JSON Schemas
OCR Result JSON Schema
Returned by $onOcrResult. The SDK populates each text_spans[].chars[] either from words[] if provided, or by uniformly splitting the span rect.
{
"text_spans": [
{
"text": "Hello World",
"confidence": 0.98,
"rotation": 0.0,
"rect": { "left": 120, "top": 80, "right": 320, "bottom": 110 },
"style": {
"font_size": 18.0,
"font_color": { "r": 0, "g": 0, "b": 0 }
},
"words": [
{ "text": "Hello", "rect": { "left": 120, "top": 80, "right": 200, "bottom": 110 } },
{ "text": "World", "rect": { "left": 210, "top": 80, "right": 320, "bottom": 110 } }
]
}
]
}
| Field | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | UTF-8 text content of the span. |
| confidence | number | No | 0.0 – 1.0. Spans below 0.1 are discarded. |
| rotation | number | No | Text rotation in degrees. Default 0. |
| rect | object | Yes | Bounding box in image pixels (left / top / right / bottom). |
| style.font_size | number | No | Estimated font size in pixels. |
| style.font_color | object | No | { r, g, b } 0 – 255. |
| words | array | No | Per-word boxes. If omitted, the SDK splits the span rect evenly. Strongly recommended for CJK + Latin mixed lines for correct glyph spacing. |
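Many OCR engines report boxes in normalized coordinates rather than pixels. The sketch below shows one way to turn such output into the OCR JSON shape documented above. The input layout (text, box, score, words) and the function names are assumptions made for illustration, and only the required fields plus the optional confidence and words are emitted.

```php
<?php
declare(strict_types=1);

/** Convert a normalized box [x0, y0, x1, y1] (0.0 to 1.0) to a pixel rect. */
function toPixelRect(array $box, int $width, int $height): array
{
    return [
        'left'   => (int) round($box[0] * $width),
        'top'    => (int) round($box[1] * $height),
        'right'  => (int) round($box[2] * $width),
        'bottom' => (int) round($box[3] * $height),
    ];
}

/**
 * $lines is a hypothetical engine output: a list of
 * ['text' => string, 'box' => [x0, y0, x1, y1], 'score' => float (optional),
 *  'words' => list of ['text' => string, 'box' => [...]] (optional)].
 * $width and $height are the pixel dimensions of the image passed to the trigger.
 */
function buildOcrJson(array $lines, int $width, int $height): string
{
    $spans = [];
    foreach ($lines as $line) {
        $span = [
            'text' => $line['text'],
            'rect' => toPixelRect($line['box'], $width, $height),
        ];
        if (isset($line['score'])) {
            $span['confidence'] = $line['score'];
        }
        // Per-word boxes are optional but let the SDK avoid even splitting.
        foreach ($line['words'] ?? [] as $word) {
            $span['words'][] = [
                'text' => $word['text'],
                'rect' => toPixelRect($word['box'], $width, $height),
            ];
        }
        $spans[] = $span;
    }
    // Keep non-ASCII text readable and fail loudly on encoding problems.
    return json_encode(['text_spans' => $spans], JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR);
}
```

The image dimensions can be read inside the trigger with getimagesize() on the PNG path before the file is removed.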
Layout Analysis Result JSON Schema
Returned by $onLayoutResult. Objects with confidence < 0.45 are discarded.
{
"objects": [
{ "type": "title", "confidence": 0.95, "rect": { "left": 60, "top": 50, "right": 540, "bottom": 90 } },
{ "type": "paragraph", "confidence": 0.97, "rect": { "left": 60, "top": 100, "right": 540, "bottom": 220 } },
{ "type": "figure", "confidence": 0.92, "rect": { "left": 80, "top": 240, "right": 520, "bottom": 460 } },
{ "type": "table", "confidence": 0.93, "rect": { "left": 60, "top": 480, "right": 540, "bottom": 700 } }
]
}
Supported type values:
| Value | Meaning |
|---|---|
| paragraph | Body text paragraph |
| title | Heading |
| figure | Image or figure |
| figure_title | Figure caption header |
| figure_caption | Figure caption text |
| table | Table region. Whether the table is bordered or borderless is determined by the table recognition stage, not by the layout label. |
| table_title | Table caption header |
| table_caption | Table caption text |
| ordered_list | Ordered list |
| unordered_list | Unordered list |
| catalogue | Table of contents |
| formula | Math formula |
| code | Code block |
| algorithm | Algorithm block |
| header | Page header |
| footer | Page footer |
| page_number | Page number |
| reference | Reference or citation |
Objects with a type value that is not listed above are ignored. Use the values in this table as the canonical layout labels in your custom output.
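A custom layout model usually has its own label set, so its output needs to be mapped onto the canonical values before it is returned. The following sketch assumes a hypothetical detection format (label, score, rect) and a caller-supplied label map; labels that do not resolve to a canonical value are dropped locally, since the SDK would ignore them anyway.

```php
<?php
declare(strict_types=1);

// The canonical layout labels listed in the table above.
const CANONICAL_LAYOUT_TYPES = [
    'paragraph', 'title', 'figure', 'figure_title', 'figure_caption',
    'table', 'table_title', 'table_caption', 'ordered_list', 'unordered_list',
    'catalogue', 'formula', 'code', 'algorithm', 'header', 'footer',
    'page_number', 'reference',
];

/**
 * $detections: list of ['label' => string, 'score' => float,
 *                       'rect' => ['left' => int, 'top' => int, 'right' => int, 'bottom' => int]].
 * $labelMap: your model's label => canonical type (both hypothetical inputs).
 */
function buildLayoutJson(array $detections, array $labelMap): string
{
    $objects = [];
    foreach ($detections as $det) {
        // Fall back to the raw label in case the model already uses a canonical name.
        $type = $labelMap[$det['label']] ?? $det['label'];
        if (!in_array($type, CANONICAL_LAYOUT_TYPES, true)) {
            continue; // unknown types are skipped
        }
        $objects[] = [
            'type' => $type,
            'confidence' => $det['score'],
            'rect' => $det['rect'],
        ];
    }
    return json_encode(['objects' => $objects], JSON_THROW_ON_ERROR);
}
```

The sketch does not filter by confidence; objects below the 0.45 threshold are left for the SDK to discard.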
Table Recognition Result JSON Schema
Returned by $onTableResult once per detected table region. Polygons use eight integers [x0, y0, x1, y1, x2, y2, x3, y3] in the order top-left, top-right, bottom-right, bottom-left.
{
"type": "table_with_line",
"position": [60, 480, 540, 480, 540, 700, 60, 700],
"rows": 3,
"cols": 2,
"angle": 0.0,
"height_of_rows": [40, 60, 60],
"width_of_cols": [200, 280],
"table_cells": [
{
"start_row": 0,
"end_row": 0,
"start_col": 0,
"end_col": 0,
"cell_background_color_r": 240,
"cell_background_color_g": 240,
"cell_background_color_b": 240,
"position": [60, 480, 260, 480, 260, 520, 60, 520]
}
]
}
| Field | Type | Description |
|---|---|---|
| type | string | table_with_line for bordered tables; any other value is treated as a non-standard (borderless) table. |
| position | int[8] | Table polygon in image pixels. |
| rows / cols | int | Row / column counts. |
| angle | number | Skew angle in degrees. |
| height_of_rows | int[] | Per-row pixel heights (length = rows). |
| width_of_cols | int[] | Per-column pixel widths (length = cols). |
| table_cells[] | array | One entry per merged cell. |
| start_row / end_row | int | Inclusive row span of the cell. |
| start_col / end_col | int | Inclusive column span of the cell. |
| cell_background_color_* | int | Cell background color components (0 – 255). |
| position | int[8] | Cell polygon in image pixels. |
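For a regular, unrotated grid, the polygons and cell entries can be derived from the row heights and column widths alone. The sketch below illustrates the polygon ordering and the inclusive row/column spans. The function names are hypothetical, the grid has no merged cells, and the white background color is a placeholder, not a value the SDK prescribes.

```php
<?php
declare(strict_types=1);

/** Axis-aligned rectangle as an 8-integer polygon: top-left, top-right, bottom-right, bottom-left. */
function rectToPolygon(int $left, int $top, int $right, int $bottom): array
{
    return [$left, $top, $right, $top, $right, $bottom, $left, $bottom];
}

/** Build table JSON for an unmerged grid whose top-left corner is ($originX, $originY). */
function buildGridTableJson(int $originX, int $originY, array $rowHeights, array $colWidths): string
{
    // Running sums give the pixel position of every row and column boundary.
    $ys = [$originY];
    foreach ($rowHeights as $h) {
        $ys[] = end($ys) + $h;
    }
    $xs = [$originX];
    foreach ($colWidths as $w) {
        $xs[] = end($xs) + $w;
    }

    $cells = [];
    for ($r = 0; $r < count($rowHeights); $r++) {
        for ($c = 0; $c < count($colWidths); $c++) {
            $cells[] = [
                'start_row' => $r, 'end_row' => $r, // inclusive span of one row
                'start_col' => $c, 'end_col' => $c, // inclusive span of one column
                'cell_background_color_r' => 255,   // placeholder: white
                'cell_background_color_g' => 255,
                'cell_background_color_b' => 255,
                'position' => rectToPolygon($xs[$c], $ys[$r], $xs[$c + 1], $ys[$r + 1]),
            ];
        }
    }

    return json_encode([
        'type' => 'table_with_line',
        'position' => rectToPolygon($xs[0], $ys[0], end($xs), end($ys)),
        'rows' => count($rowHeights),
        'cols' => count($colWidths),
        'angle' => 0.0,
        'height_of_rows' => array_values($rowHeights),
        'width_of_cols' => array_values($colWidths),
        'table_cells' => $cells,
    ], JSON_THROW_ON_ERROR);
}
```

A merged cell would instead be a single entry whose end_row or end_col is larger than its start value, with a polygon covering the whole merged area.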
Tip: Validate Your JSON
If you need a reference output to compare against, run a conversion once with the built-in DocumentAI model. The SDK uses the same JSON shape internally, so your custom output should follow the same structure.
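Before comparing against a reference output, a cheap structural check can catch the most common mistakes. The following is a minimal sketch, not an SDK feature: checkResultJson and hasRect are hypothetical helpers, and the checks cover only what this page documents (required OCR fields, the 8-value table polygon, and the array lengths). Treating type and rect as mandatory for layout objects is an assumption.

```php
<?php
declare(strict_types=1);

/** True if $rect has numeric left / top / right / bottom entries. */
function hasRect($rect): bool
{
    if (!is_array($rect)) {
        return false;
    }
    foreach (['left', 'top', 'right', 'bottom'] as $key) {
        if (!isset($rect[$key]) || !is_numeric($rect[$key])) {
            return false;
        }
    }
    return true;
}

/** Return a list of problems for $capability ('ocr', 'layout', 'table'); empty means no problem found. */
function checkResultJson(string $json, string $capability): array
{
    try {
        $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    } catch (\JsonException $e) {
        return ['invalid JSON: ' . $e->getMessage()];
    }
    if (!is_array($data)) {
        return ['top-level value must be a JSON object'];
    }

    $problems = [];
    switch ($capability) {
        case 'ocr':
        case 'layout':
            $listKey = $capability === 'ocr' ? 'text_spans' : 'objects';
            $nameKey = $capability === 'ocr' ? 'text' : 'type';
            if (!is_array($data[$listKey] ?? null)) {
                return ["missing $listKey array"];
            }
            foreach ($data[$listKey] as $i => $item) {
                if (!is_string($item[$nameKey] ?? null)) {
                    $problems[] = "{$listKey}[$i]: $nameKey is required";
                }
                if (!hasRect($item['rect'] ?? null)) {
                    $problems[] = "{$listKey}[$i]: rect is required";
                }
            }
            break;
        case 'table':
            $position = $data['position'] ?? null;
            if (!is_array($position) || count($position) !== 8) {
                $problems[] = 'position must contain 8 integers';
            }
            foreach (['height_of_rows' => 'rows', 'width_of_cols' => 'cols'] as $listKey => $countKey) {
                $list = $data[$listKey] ?? null;
                if (!is_array($list) || count($list) !== ($data[$countKey] ?? null)) {
                    $problems[] = "$listKey length must equal $countKey";
                }
            }
            break;
        default:
            throw new InvalidArgumentException("Unknown capability: $capability");
    }
    return $problems;
}
```

Running this check inside a trigger and returning false when the list is non-empty keeps malformed output from reaching the SDK.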