Use Custom AI Models via Callbacks
Overview
Starting with SDK v4.1.0, ComPDF Conversion SDK exposes a callback-based extension point that lets you plug in your own AI inference engine for OCR, Layout Analysis, and Table Recognition. Instead of relying on the built-in DocumentAI model loaded by LibraryManager.SetDocumentAIModel, you can:
- Run inference with any model or runtime you choose (e.g. your in-house engine, PaddleOCR, a cloud OCR API).
- Return the result to the SDK as a JSON string with a well-defined schema.
When the relevant callback pair is registered on ConvertCallback, the SDK skips its built-in model invocation for that capability and consumes your JSON output instead. If a pair is left as IntPtr.Zero (the default), the SDK falls back to the built-in DocumentAI model (when available).
Callback Pairs
Each AI capability uses two delegates: a trigger (invoked by the SDK with the path to a page image saved as PNG in a temp directory) and a result getter (invoked by the SDK immediately afterwards to retrieve the JSON string as a UTF-8 native pointer).
| Capability | Trigger delegate | Result getter delegate | Trigger field on ConvertCallback | Getter field on ConvertCallback | Triggered when |
|---|---|---|---|---|---|
| OCR | OnOCR | OnGetOCRResult | ocr | get_ocr_result | EnableOCR = true |
| Layout Analysis | OnLayout | OnGetLayoutResult | layout | get_layout_result | EnableAiLayout = true or EnableOCR = true |
| Table Recognition | OnTable | OnGetTableResult | table | get_table_result | EnableAiTableRecognition = true and a table region is detected by layout analysis |
Rules:
- The trigger receives a UTF-8 path to a PNG file. Return true if your inference succeeded, or false to make the SDK ignore the result for that page.
- The getter must return an IntPtr to a NUL-terminated UTF-8 JSON C-string. The pointed-to buffer must remain valid until the SDK has finished parsing it (typically until the next call to the same getter). Use Marshal.StringToCoTaskMemUTF8, or pin a managed byte[], to keep it alive. Marshal.StringToHGlobalAnsi encodes with the system ANSI code page on Windows, so it is only safe there when the JSON is pure ASCII.
- Both delegates for a capability must be set together. If only one is provided, the SDK falls back to the built-in path.
- Keep the delegate instances alive for the entire conversion (store them in a field or static variable). Otherwise the GC may collect them and the native call will crash.
- Coordinates in your JSON must be in the pixel space of the image the trigger received (top-left origin, X right, Y down).
- Confidence filtering: OCR spans with confidence < 0.1 and layout objects with confidence < 0.45 are discarded by the SDK.
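The coordinate rule matters when your model runs on a resized copy of the page image. The following is a minimal sketch of mapping a box back to the pixel space of the PNG the trigger received. It assumes a plain resize without letterboxing or cropping; RectScaler and ToSourcePixels are hypothetical names, not part of the SDK.

```csharp
using System;

// Hypothetical helper, not part of the SDK.
public static class RectScaler
{
    // Maps a box from model-input pixels back to the pixel space of the
    // PNG that was passed to the trigger. Assumes a plain, uniform resize.
    public static (int Left, int Top, int Right, int Bottom) ToSourcePixels(
        double left, double top, double right, double bottom,
        int modelWidth, int modelHeight, int sourceWidth, int sourceHeight)
    {
        if (modelWidth <= 0 || modelHeight <= 0)
            throw new ArgumentOutOfRangeException(nameof(modelWidth), "model size must be positive");

        double sx = (double)sourceWidth / modelWidth;
        double sy = (double)sourceHeight / modelHeight;

        return (Clamp(left * sx, sourceWidth), Clamp(top * sy, sourceHeight),
                Clamp(right * sx, sourceWidth), Clamp(bottom * sy, sourceHeight));
    }

    // Rounds to the nearest pixel and keeps the value inside [0, max].
    private static int Clamp(double value, int max)
    {
        int rounded = (int)Math.Round(value);
        return Math.Max(0, Math.Min(max, rounded));
    }
}
```

If your pipeline pads or crops the image before inference, subtract the padding offset before scaling.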
Sample
using System;
using System.Runtime.InteropServices;
using ComPDF_Conversion.Common;
using ComPDF_Conversion.Converter;

public class CustomAiIntegration
{
    // Buffers / delegates owned by the integrator. Must outlive each SDK call.
    private static IntPtr g_ocr_json = IntPtr.Zero;
    private static IntPtr g_layout_json = IntPtr.Zero;
    private static IntPtr g_table_json = IntPtr.Zero;

    private static readonly OnOCR ocrTrigger = MyOcrTrigger;
    private static readonly OnLayout layoutTrigger = MyLayoutTrigger;
    private static readonly OnTable tableTrigger = MyTableTrigger;
    private static readonly OnGetOCRResult ocrGetter = MyOcrGetter;
    private static readonly OnGetLayoutResult layoutGetter = MyLayoutGetter;
    private static readonly OnGetTableResult tableGetter = MyTableGetter;

    // RunMyOcrModel, RunMyLayoutModel, and RunMyTableModel are placeholders
    // for your own inference code; each returns the JSON string for one image.
    private static bool MyOcrTrigger(string image_path) {
        string json = RunMyOcrModel(image_path); // produce JSON (see schema in the API reference)
        UpdateUtf8Buffer(ref g_ocr_json, json);
        return !string.IsNullOrEmpty(json);
    }
    private static IntPtr MyOcrGetter() { return g_ocr_json; }

    private static bool MyLayoutTrigger(string image_path) {
        string json = RunMyLayoutModel(image_path);
        UpdateUtf8Buffer(ref g_layout_json, json);
        return !string.IsNullOrEmpty(json);
    }
    private static IntPtr MyLayoutGetter() { return g_layout_json; }

    private static bool MyTableTrigger(string image_path) {
        string json = RunMyTableModel(image_path);
        UpdateUtf8Buffer(ref g_table_json, json);
        return !string.IsNullOrEmpty(json);
    }
    private static IntPtr MyTableGetter() { return g_table_json; }

    // Frees the previous buffer and stores the new JSON as a NUL-terminated UTF-8 string.
    private static void UpdateUtf8Buffer(ref IntPtr slot, string json) {
        if (slot != IntPtr.Zero) Marshal.FreeHGlobal(slot);
        if (string.IsNullOrEmpty(json)) { slot = IntPtr.Zero; return; }
        byte[] bytes = System.Text.Encoding.UTF8.GetBytes(json + "\0");
        slot = Marshal.AllocHGlobal(bytes.Length);
        Marshal.Copy(bytes, 0, slot, bytes.Length);
    }

    public static ErrorCode Convert(string input, string password, string output) {
        /* SetDocumentAIModel is not required when callbacks cover OCR,
           Layout Analysis, and Table Recognition. */
        ConvertCallback callback = new ConvertCallback();
        callback.ocr = Marshal.GetFunctionPointerForDelegate(ocrTrigger);
        callback.layout = Marshal.GetFunctionPointerForDelegate(layoutTrigger);
        callback.table = Marshal.GetFunctionPointerForDelegate(tableTrigger);
        callback.get_ocr_result = Marshal.GetFunctionPointerForDelegate(ocrGetter);
        callback.get_layout_result = Marshal.GetFunctionPointerForDelegate(layoutGetter);
        callback.get_table_result = Marshal.GetFunctionPointerForDelegate(tableGetter);

        WordOptions opt = new WordOptions();
        opt.EnableOCR = true; // triggers ocr + layout callbacks
        opt.EnableAiLayout = true;
        opt.EnableAiTableRecognition = true;
        return CPDFConversion.StartPDFToWord(input, password, output, opt, callback);
    }
}
Thread Safety
The SDK invokes the trigger and getter pair sequentially on the same worker thread for a given page. If multiple conversion tasks run concurrently, each task should use its own ConvertCallback and its own JSON buffer slot (do not share the static IntPtr slots in the snippet above across concurrent tasks).
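One way to avoid sharing static slots is to hold the buffer and both delegates in an instance that is created per conversion task. The sketch below is an assumption-laden illustration: TriggerFn and GetterFn are stand-in delegate shapes, and OcrCallbackState is a hypothetical helper. In real code you would declare the fields with the SDK's own OnOCR and OnGetOCRResult delegate types.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

// Stand-in delegate shapes for illustration only; use the SDK's
// OnOCR / OnGetOCRResult delegate types in real code.
public delegate bool TriggerFn(string imagePath);
public delegate IntPtr GetterFn();

// Per-task state: create one instance for each concurrent conversion.
public sealed class OcrCallbackState : IDisposable
{
    private readonly Func<string, string> _runModel; // image path -> JSON
    private IntPtr _json = IntPtr.Zero;

    // Held in properties so the delegates stay reachable (and are not
    // collected) for as long as this state object is referenced.
    public TriggerFn Trigger { get; }
    public GetterFn Getter { get; }

    public OcrCallbackState(Func<string, string> runModel)
    {
        _runModel = runModel;
        Trigger = OnTrigger;
        Getter = OnGetter;
    }

    private bool OnTrigger(string imagePath)
    {
        Release(); // the previous page's buffer is no longer needed
        string json = _runModel(imagePath);
        if (string.IsNullOrEmpty(json)) return false;

        byte[] bytes = Encoding.UTF8.GetBytes(json);
        _json = Marshal.AllocHGlobal(bytes.Length + 1);
        Marshal.Copy(bytes, 0, _json, bytes.Length);
        Marshal.WriteByte(_json, bytes.Length, 0); // NUL terminator
        return true;
    }

    private IntPtr OnGetter() => _json;

    private void Release()
    {
        if (_json != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_json);
            _json = IntPtr.Zero;
        }
    }

    public void Dispose() => Release();
}
```

Each task would then pass the function pointers of its own instance's delegates to its own ConvertCallback, keep the instance referenced until the conversion returns, and dispose it afterwards.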
JSON Schemas
OCR Result JSON Schema
Returned by get_ocr_result. The SDK populates each text_spans[].chars[] either from words[] if provided, or by uniformly splitting the span rect.
{
  "text_spans": [
    {
      "text": "Hello World",
      "confidence": 0.98,
      "rotation": 0.0,
      "rect": { "left": 120, "top": 80, "right": 320, "bottom": 110 },
      "style": {
        "font_size": 18.0,
        "font_color": { "r": 0, "g": 0, "b": 0 }
      },
      "words": [
        { "text": "Hello", "rect": { "left": 120, "top": 80, "right": 200, "bottom": 110 } },
        { "text": "World", "rect": { "left": 210, "top": 80, "right": 320, "bottom": 110 } }
      ]
    }
  ]
}
| Field | Type | Required | Description |
|---|---|---|---|
| text_spans | array | Yes | Recognized text spans on the page. |
| text | string | Yes | UTF-8 text content of the span. |
| confidence | number | No | 0.0 – 1.0. Spans below 0.1 are discarded. |
| rotation | number | No | Text rotation in degrees. Default 0. |
| rect | object | Yes | Bounding box in image pixels (left/top/right/bottom). |
| style.font_size | number | No | Estimated font size in pixels. |
| style.font_color | object | No | { r, g, b }, each 0 – 255. |
| words | array | No | Per-word boxes. If omitted, the SDK splits the span rect evenly. Strongly recommended for CJK + Latin mixed lines for correct glyph spacing. |
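As an illustration of producing this shape, here is a minimal sketch that serializes recognized lines with System.Text.Json and emits only the required fields plus confidence. OcrLine and OcrJsonBuilder are hypothetical names for the output of your own engine, not SDK types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical shape for one recognized line from your own OCR engine.
public sealed class OcrLine
{
    public string Text = "";
    public double Confidence;
    public int Left, Top, Right, Bottom;
}

public static class OcrJsonBuilder
{
    // Builds the OCR result JSON. Property names of the anonymous objects
    // are written as-is, so they match the snake_case field names above.
    public static string Build(IEnumerable<OcrLine> lines)
    {
        var spans = lines
            .Where(l => l.Confidence >= 0.1) // the SDK discards lower-confidence spans anyway
            .Select(l => new
            {
                text = l.Text,
                confidence = l.Confidence,
                rect = new { left = l.Left, top = l.Top, right = l.Right, bottom = l.Bottom }
            })
            .ToArray();

        return JsonSerializer.Serialize(new { text_spans = spans });
    }
}
```

The returned string can then be copied into the unmanaged UTF-8 buffer that the OCR result getter hands back. Optional fields such as words or style can be added to the anonymous object in the same way.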
Layout Analysis Result JSON Schema
Returned by get_layout_result. Objects with confidence < 0.45 are discarded.
{
  "objects": [
    { "type": "title", "confidence": 0.95, "rect": { "left": 60, "top": 50, "right": 540, "bottom": 90 } },
    { "type": "paragraph", "confidence": 0.97, "rect": { "left": 60, "top": 100, "right": 540, "bottom": 220 } },
    { "type": "figure", "confidence": 0.92, "rect": { "left": 80, "top": 240, "right": 520, "bottom": 460 } },
    { "type": "table", "confidence": 0.93, "rect": { "left": 60, "top": 480, "right": 540, "bottom": 700 } }
  ]
}
Supported type values:
| Value | Meaning |
|---|---|
| paragraph | Body text paragraph |
| title | Heading |
| figure | Image or figure |
| figure_title | Figure caption header |
| figure_caption | Figure caption text |
| table | Table region. Whether the table is bordered or borderless is determined by the table recognition stage, not by the layout label. |
| table_title | Table caption header |
| table_caption | Table caption text |
| ordered_list | Ordered list |
| unordered_list | Unordered list |
| catalogue | Table of contents |
| formula | Math formula |
| code | Code block |
| algorithm | Algorithm block |
| header | Page header |
| footer | Page footer |
| page_number | Page number |
| reference | Reference or citation |
Objects with a type value that is not listed above are ignored. Use the values in this table as the canonical layout labels in your custom output.
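If your layout model uses its own label vocabulary, you will need to translate it before emitting JSON. The sketch below shows one possible approach. The alias entries (text, heading, image, toc, equation) are invented examples of what an in-house model might output, and LayoutLabels is a hypothetical helper; only the canonical set comes from the table above.

```csharp
using System;
using System.Collections.Generic;

public static class LayoutLabels
{
    // Canonical labels accepted by the SDK (from the table above).
    private static readonly HashSet<string> Canonical = new HashSet<string>(StringComparer.Ordinal)
    {
        "paragraph", "title", "figure", "figure_title", "figure_caption",
        "table", "table_title", "table_caption", "ordered_list", "unordered_list",
        "catalogue", "formula", "code", "algorithm", "header", "footer",
        "page_number", "reference"
    };

    // Hypothetical aliases for labels produced by a custom model.
    private static readonly Dictionary<string, string> Aliases =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "text", "paragraph" }, { "heading", "title" }, { "image", "figure" },
            { "toc", "catalogue" }, { "equation", "formula" }
        };

    // Returns true and the canonical label when the object is worth emitting;
    // false for unknown labels or confidence below the SDK's 0.45 cut-off.
    public static bool TryNormalize(string modelLabel, double confidence, out string canonical)
    {
        canonical = string.Empty;
        if (string.IsNullOrWhiteSpace(modelLabel) || confidence < 0.45) return false;

        string key = modelLabel.Trim();
        key = Aliases.TryGetValue(key, out string mapped) ? mapped : key.ToLowerInvariant();

        if (!Canonical.Contains(key)) return false;
        canonical = key;
        return true;
    }
}
```

Dropping unknown or low-confidence objects on your side keeps the JSON small and makes it easier to see which model labels still lack a mapping.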
Table Recognition Result JSON Schema
Returned by get_table_result once per detected table region. Polygons use 8 integers [x0, y0, x1, y1, x2, y2, x3, y3] in the order top-left, top-right, bottom-right, bottom-left.
{
  "type": "table_with_line",
  "position": [60, 480, 540, 480, 540, 700, 60, 700],
  "rows": 3,
  "cols": 2,
  "angle": 0.0,
  "height_of_rows": [40, 60, 60],
  "width_of_cols": [200, 280],
  "table_cells": [
    {
      "start_row": 0,
      "end_row": 0,
      "start_col": 0,
      "end_col": 0,
      "cell_background_color_r": 240,
      "cell_background_color_g": 240,
      "cell_background_color_b": 240,
      "position": [60, 480, 260, 480, 260, 520, 60, 520]
    }
  ]
}
| Field | Type | Description |
|---|---|---|
| type | string | table_with_line for bordered tables; any other value is treated as a non-standard (borderless) table. |
| position | int[8] | Table polygon in image pixels. |
| rows / cols | int | Row / column counts. |
| angle | number | Skew angle in degrees. |
| height_of_rows | int[] | Per-row pixel heights (length = rows). |
| width_of_cols | int[] | Per-column pixel widths (length = cols). |
| table_cells[] | array | One entry per merged cell. |
| start_row / end_row | int | Inclusive row span of the cell. |
| start_col / end_col | int | Inclusive column span of the cell. |
| cell_background_color_* | int | Cell background color components (0 – 255). |
| position | int[8] | Cell polygon in image pixels. |
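When your table model reports only row heights, column widths, and cell spans, the cell polygons can be derived from them. The following sketch assumes an axis-aligned table (angle 0) whose rows and columns tile outward from the table's top-left corner. TableGeometry and CellPolygon are hypothetical names, not SDK functions.

```csharp
using System;

public static class TableGeometry
{
    // Returns [x0, y0, x1, y1, x2, y2, x3, y3] in the order
    // top-left, top-right, bottom-right, bottom-left.
    // Row and column spans are inclusive, as in the schema above.
    public static int[] CellPolygon(
        int tableLeft, int tableTop, int[] widthOfCols, int[] heightOfRows,
        int startRow, int endRow, int startCol, int endCol)
    {
        if (startRow < 0 || endRow < startRow || endRow >= heightOfRows.Length)
            throw new ArgumentOutOfRangeException(nameof(endRow), "row span outside the table");
        if (startCol < 0 || endCol < startCol || endCol >= widthOfCols.Length)
            throw new ArgumentOutOfRangeException(nameof(endCol), "column span outside the table");

        int left = tableLeft + Sum(widthOfCols, 0, startCol);
        int right = left + Sum(widthOfCols, startCol, endCol + 1);
        int top = tableTop + Sum(heightOfRows, 0, startRow);
        int bottom = top + Sum(heightOfRows, startRow, endRow + 1);

        return new[] { left, top, right, top, right, bottom, left, bottom };
    }

    // Sums values[from] .. values[to - 1].
    private static int Sum(int[] values, int from, int to)
    {
        int total = 0;
        for (int i = from; i < to; i++) total += values[i];
        return total;
    }
}
```

With the sample values above (table origin 60, 480; width_of_cols [200, 280]; height_of_rows [40, 60, 60]), the cell at row 0, column 0 yields the same polygon as the sample's table_cells entry. Skewed tables would need the angle applied to each corner, which this sketch does not attempt.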
Tip: Validate Your JSON
If you need a reference output to compare against, run a conversion once with the built-in DocumentAI model. The SDK uses the same JSON shape internally, so your custom output should follow the same structure.
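A lightweight structural check before handing JSON to the SDK can catch mistakes early. The sketch below covers only the required fields of the OCR result (text_spans, text, rect) and uses System.Text.Json. OcrJsonCheck is a hypothetical helper, and the rule that right must exceed left and bottom must exceed top is an assumption about what a sensible box looks like, not a documented SDK requirement.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class OcrJsonCheck
{
    // Returns a list of human-readable problems; an empty list means the
    // required OCR fields are present and well-typed.
    public static List<string> Validate(string json)
    {
        var problems = new List<string>();
        if (json == null) { problems.Add("JSON is null"); return problems; }

        JsonDocument doc;
        try { doc = JsonDocument.Parse(json); }
        catch (JsonException e) { problems.Add("not valid JSON: " + e.Message); return problems; }

        using (doc)
        {
            JsonElement root = doc.RootElement;
            if (root.ValueKind != JsonValueKind.Object
                || !root.TryGetProperty("text_spans", out JsonElement spans)
                || spans.ValueKind != JsonValueKind.Array)
            {
                problems.Add("text_spans must be an array");
                return problems;
            }

            int index = 0;
            foreach (JsonElement span in spans.EnumerateArray())
            {
                if (span.ValueKind != JsonValueKind.Object)
                    problems.Add($"text_spans[{index}] must be an object");
                else
                {
                    if (!span.TryGetProperty("text", out JsonElement text) || text.ValueKind != JsonValueKind.String)
                        problems.Add($"text_spans[{index}].text must be a string");
                    if (!span.TryGetProperty("rect", out JsonElement rect) || !IsRect(rect))
                        problems.Add($"text_spans[{index}].rect must have numeric left < right and top < bottom");
                }
                index++;
            }
        }
        return problems;
    }

    private static bool IsRect(JsonElement rect)
    {
        if (rect.ValueKind != JsonValueKind.Object) return false;
        return TryNumber(rect, "left", out double l) && TryNumber(rect, "top", out double t)
            && TryNumber(rect, "right", out double r) && TryNumber(rect, "bottom", out double b)
            && r > l && b > t;
    }

    private static bool TryNumber(JsonElement obj, string name, out double value)
    {
        value = 0;
        if (!obj.TryGetProperty(name, out JsonElement e) || e.ValueKind != JsonValueKind.Number) return false;
        value = e.GetDouble();
        return true;
    }
}
```

Running a check like this inside the trigger, and returning false when problems are found, lets the SDK skip a malformed page result instead of parsing it.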