Use Custom AI Models via Callbacks

Overview

Starting with SDK v4.1.0, ComPDF Conversion SDK exposes a callback-based extension point that lets you plug in your own AI inference engine for OCR, Layout Analysis, and Table Recognition. Instead of relying on the built-in DocumentAI model loaded by LibraryManager::SetDocumentAIModel, you can:

Run inference with any model or runtime you choose (e.g. your in-house engine, PaddleOCR, a cloud OCR API).
Return the result to the SDK as a JSON string with a well-defined schema.

When the relevant callback pair is registered on CConvertCallback, the SDK skips its built-in model invocation for that capability and consumes your JSON output instead. If a pair is left as nullptr, the SDK falls back to the built-in DocumentAI model (when available).

Callback Pairs

Each AI capability uses two callbacks: a trigger (invoked by the SDK with the path to a page image saved as PNG in a temp directory) and a result getter (invoked by the SDK immediately afterwards to retrieve the JSON string).

Capability	Trigger callback	Result getter callback	Triggered when
OCR	`ocr`	`get_ocr_result`	`enable_ocr = true`
Layout Analysis	`layout`	`get_layout_result`	`enable_ai_layout = true` or `enable_ocr = true`
Table Recognition	`table`	`get_table_result`	`enable_ai_table_recognition = true` and a table region is detected by layout analysis

Rules:

The trigger receives a UTF-8 path to a PNG file. Return true if your inference succeeded, false to make the SDK ignore the result for that page.
The getter must return a UTF-8 JSON C-string. The pointer must remain valid until the SDK has finished parsing it (typically until the next call to the same getter).
Both callbacks for a capability must be set together. If only one is provided, the SDK falls back to the built-in path.
Coordinates in your JSON must be in the pixel space of the image the trigger received (top-left origin, X right, Y down).
Confidence filtering: OCR spans with confidence < 0.1 and layout objects with confidence < 0.45 are discarded by the SDK.
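Coordinates must be in the pixel space of the PNG the trigger received. If your model works on a resized copy of the page, or reports normalized coordinates, you need to map its boxes back before writing the JSON. The C++ sketch below shows one way to do that. `PixelRect` and `to_image_pixels` are hypothetical helpers, not part of the SDK. The optional Y flip assumes a source space with a bottom-left origin.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>

// Box in the pixel space the SDK expects: top-left origin, X right, Y down.
struct PixelRect {
    int left, top, right, bottom;
};

// Hypothetical helper: maps a box from a "source" space of size src_w x src_h
// into the img_w x img_h pixel space of the PNG passed to the trigger.
// Examples of a source space: the resized model input (e.g. 640 x 640), or
// normalized coordinates (src_w = src_h = 1).
// Set y_up = true if the source space has a bottom-left origin.
PixelRect to_image_pixels(double left, double top, double right, double bottom,
                          double src_w, double src_h, int img_w, int img_h,
                          bool y_up = false) {
    assert(src_w > 0 && src_h > 0 && img_w > 0 && img_h > 0);
    const double sx = img_w / src_w;
    const double sy = img_h / src_h;
    if (y_up) {
        // Flip the Y axis so that Y grows downward.
        top = src_h - top;
        bottom = src_h - bottom;
    }
    // Round to the nearest pixel and clamp into [0, limit].
    auto px = [](double v, int limit) {
        const int i = static_cast<int>(std::lround(v));
        return std::min(std::max(i, 0), limit);
    };
    PixelRect r{px(left * sx, img_w), px(top * sy, img_h),
                px(right * sx, img_w), px(bottom * sy, img_h)};
    // Normalize so that left <= right and top <= bottom.
    if (r.left > r.right) std::swap(r.left, r.right);
    if (r.top > r.bottom) std::swap(r.top, r.bottom);
    return r;
}
```

For example, a box of (10, 20, 30, 40) from a 640 x 640 model input maps to (20, 60, 60, 120) on a 1280 x 1920 page image.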

Sample

c++

#include "compdf_conversion.h"
#include <string>

using namespace compdf;
using namespace compdf::common;
using namespace compdf::conversion;

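// Hypothetical hooks into your own inference engine (not part of the SDK).
// Each returns a JSON string in the schema documented below, or "" on failure.
std::string run_my_ocr_model(const char* image_path);
std::string run_my_layout_model(const char* image_path);
std::string run_my_table_model(const char* image_path);

// Buffers owned by the integrator. Must outlive each SDK getter call.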
static std::string g_ocr_json;
static std::string g_layout_json;
static std::string g_table_json;

// --- OCR ---
bool my_ocr_trigger(const char* image_path) {
    g_ocr_json = run_my_ocr_model(image_path);   // produce JSON (see schema below)
    return !g_ocr_json.empty();
}
const char* my_ocr_getter() { return g_ocr_json.c_str(); }

// --- Layout Analysis ---
bool my_layout_trigger(const char* image_path) {
    g_layout_json = run_my_layout_model(image_path);
    return !g_layout_json.empty();
}
const char* my_layout_getter() { return g_layout_json.c_str(); }

// --- Table Recognition ---
bool my_table_trigger(const char* image_path) {
    g_table_json = run_my_table_model(image_path);
    return !g_table_json.empty();
}
const char* my_table_getter() { return g_table_json.c_str(); }

int main() {
    LibraryManager::LicenseVerify("<license>", "<device_id>", "<app_id>");
    LibraryManager::Initialize("resource");
    /* SetDocumentAIModel is not required when callbacks cover OCR,
       Layout Analysis, and Table Recognition. */

    CConvertCallback callback   = {};
    callback.handle             = nullptr;
    callback.progress           = nullptr;
    callback.cancel             = nullptr;
    callback.ocr                = &my_ocr_trigger;
    callback.get_ocr_result     = &my_ocr_getter;
    callback.layout             = &my_layout_trigger;
    callback.get_layout_result  = &my_layout_getter;
    callback.table              = &my_table_trigger;
    callback.get_table_result   = &my_table_getter;

    ConvertOptions opt;
    opt.enable_ocr                  = true;
    opt.enable_ai_layout            = true;
    opt.enable_ai_table_recognition = true;  // required for the table callback to fire
    opt.languages                   = { OCRLanguage::e_English };

    CPDFConversion::StartPDFToWord("input.pdf", "", "output.docx", opt, &callback);
    return 0;
}

You can register only the capabilities you want to override and leave the others as nullptr to keep the built-in behaviour.

Thread Safety and Lifetime

Callbacks are invoked from the SDK conversion thread. Implement them in a thread-safe manner if your model engine is shared across calls.
The PNG image at image_path lives in the SDK temp directory and may be deleted shortly after the trigger returns; copy or process it before returning.
The JSON pointer returned from a getter must remain valid until the same getter is called again (or until the conversion task ends).
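The sample above uses plain `static` buffers. That is enough when only one conversion runs at a time. If several conversions may run concurrently, one possible pattern is a per-thread result buffer plus a mutex around a shared engine. This sketch assumes that each getter is called on the same thread that just ran its trigger; the SDK does not state this. `run_my_ocr_model` is a hypothetical placeholder for your engine.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for your own inference engine: returns JSON in the
// OCR schema, or an empty string on failure.
static std::string run_my_ocr_model(const char* image_path) {
    if (image_path[0] == '\0') return std::string();
    return std::string("{\"text_spans\":[]}");
}

// Serializes calls into the engine, in case it is shared and not re-entrant.
static std::mutex g_engine_mutex;

// One result buffer per thread (assumption: the getter runs on the same thread
// as the trigger that preceded it). The pointer returned by c_str() stays valid
// until this thread's next call to the trigger.
static thread_local std::string t_ocr_json;

bool my_ocr_trigger(const char* image_path) {
    if (image_path == nullptr) return false;
    std::string json;
    {
        std::lock_guard<std::mutex> lock(g_engine_mutex);
        // Finish reading the PNG here: the SDK may delete it after we return.
        json = run_my_ocr_model(image_path);
    }
    t_ocr_json = std::move(json);
    return !t_ocr_json.empty();  // false tells the SDK to ignore this page
}

const char* my_ocr_getter() { return t_ocr_json.c_str(); }
```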

OCR Result JSON Schema

Returned by get_ocr_result. The SDK builds each span's per-character boxes (text_spans[].chars[]) from words[] when it is provided; otherwise, it splits the span rect uniformly.

json

{
  "text_spans": [
    {
      "text": "Hello World",
      "confidence": 0.98,
      "rotation": 0.0,
      "rect":  { "left": 120, "top": 80, "right": 320, "bottom": 110 },
      "style": {
        "font_size":  18.0,
        "font_color": { "r": 0, "g": 0, "b": 0 }
      },
      "words": [
        { "text": "Hello", "rect": { "left": 120, "top": 80, "right": 200, "bottom": 110 } },
        { "text": "World", "rect": { "left": 210, "top": 80, "right": 320, "bottom": 110 } }
      ]
    }
  ]
}

Field	Type	Required	Description
`text_spans`	array	Yes	Recognized text spans on the page.
`text`	string	Yes	UTF-8 text content of the span.
`confidence`	number	No	0.0 – 1.0. Spans below 0.1 are discarded.
`rotation`	number	No	Text rotation in degrees. Default 0.
`rect`	object	Yes	Bounding box in image pixels (`left`/`top`/`right`/`bottom`).
`style.font_size`	number	No	Estimated font size in pixels.
`style.font_color`	object	No	`{ r, g, b }` 0 – 255.
`words`	array	No	Per-word boxes. If omitted, the SDK splits the span rect evenly. Strongly recommended for CJK + Latin mixed lines for correct glyph spacing.
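The following is a minimal C++ sketch that builds the `get_ocr_result` payload by hand. `Box`, `Word`, `TextSpan`, and the helper functions are hypothetical. In practice, a JSON library would normally handle escaping. The optional `rotation` and `style` fields are omitted. Number formatting assumes the default "C" numeric locale, so decimals are written with a period.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

struct Box { int left, top, right, bottom; };  // image pixels
struct Word { std::string text; Box rect; };
struct TextSpan {
    std::string text;         // UTF-8
    double confidence;        // 0.0 - 1.0; the SDK drops spans below 0.1
    Box rect;
    std::vector<Word> words;  // optional per-word boxes
};

// Escapes a UTF-8 string for a JSON string literal; multi-byte UTF-8 passes through.
std::string json_escape(const std::string& s) {
    std::string out;
    for (unsigned char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n"; break;
            case '\r': out += "\\r"; break;
            case '\t': out += "\\t"; break;
            default:
                if (c < 0x20) {  // other control characters
                    char buf[8];
                    std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

std::string rect_json(const Box& b) {
    return "{\"left\":" + std::to_string(b.left) + ",\"top\":" + std::to_string(b.top) +
           ",\"right\":" + std::to_string(b.right) + ",\"bottom\":" + std::to_string(b.bottom) + "}";
}

// Builds {"text_spans":[...]}, adding "words" only when per-word boxes exist.
std::string ocr_result_json(const std::vector<TextSpan>& spans) {
    std::string out = "{\"text_spans\":[";
    for (std::size_t i = 0; i < spans.size(); ++i) {
        const TextSpan& s = spans[i];
        if (i) out += ',';
        char conf[32];
        std::snprintf(conf, sizeof conf, "%.2f", s.confidence);
        out += "{\"text\":\"" + json_escape(s.text) + "\",\"confidence\":" + conf +
               ",\"rect\":" + rect_json(s.rect);
        if (!s.words.empty()) {
            out += ",\"words\":[";
            for (std::size_t j = 0; j < s.words.size(); ++j) {
                if (j) out += ',';
                out += "{\"text\":\"" + json_escape(s.words[j].text) +
                       "\",\"rect\":" + rect_json(s.words[j].rect) + "}";
            }
            out += ']';
        }
        out += '}';
    }
    out += "]}";
    return out;
}
```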

Layout Analysis Result JSON Schema

Returned by get_layout_result. Objects with confidence < 0.45 are discarded.

json

{
  "objects": [
    { "type": "title",     "confidence": 0.95, "rect": { "left": 60, "top": 50,  "right": 540, "bottom": 90  } },
    { "type": "paragraph", "confidence": 0.97, "rect": { "left": 60, "top": 100, "right": 540, "bottom": 220 } },
    { "type": "figure",    "confidence": 0.92, "rect": { "left": 80, "top": 240, "right": 520, "bottom": 460 } },
    { "type": "table",     "confidence": 0.93, "rect": { "left": 60, "top": 480, "right": 540, "bottom": 700 } }
  ]
}

Supported type values:

Value	Meaning
`paragraph`	Body text paragraph
`title`	Heading
`figure`	Image or figure
`figure_title`	Figure caption header
`figure_caption`	Figure caption text
`table`	Table region. Whether the table is bordered or borderless is determined by the table recognition stage, not by the layout label.
`table_title`	Table caption header
`table_caption`	Table caption text
`ordered_list`	Ordered list
`unordered_list`	Unordered list
`catalogue`	Table of contents
`formula`	Math formula
`code`	Code block
`algorithm`	Algorithm block
`header`	Page header
`footer`	Page footer
`page_number`	Page number
`reference`	Reference or citation

Objects with a type value that is not listed above are ignored. Use the values in this table as the canonical layout labels in your custom output.
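Custom layout models usually have their own label vocabulary, so their output must be mapped onto the canonical `type` values above. The sketch below assumes a hypothetical model that emits labels such as `text`, `list`, and `equation`. The mapping table is illustrative only; for example, mapping `list` to `unordered_list` is an arbitrary choice. The sketch also drops detections below 0.45 up front, mirroring what the SDK would do anyway.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <string>
#include <vector>

struct Detection {
    std::string label;             // label from YOUR model's vocabulary (hypothetical)
    double confidence;
    int left, top, right, bottom;  // image pixels, top-left origin
};

// Hypothetical mapping from custom labels to the SDK's canonical types.
const std::map<std::string, std::string>& label_map() {
    static const std::map<std::string, std::string> m = {
        {"text", "paragraph"}, {"title", "title"}, {"list", "unordered_list"},
        {"table", "table"}, {"figure", "figure"}, {"equation", "formula"},
        {"page_header", "header"}, {"page_footer", "footer"},
    };
    return m;
}

// Builds {"objects":[...]}, skipping unmapped labels and low-confidence boxes.
std::string layout_result_json(const std::vector<Detection>& dets) {
    std::string out = "{\"objects\":[";
    bool first = true;
    for (const Detection& d : dets) {
        auto it = label_map().find(d.label);
        if (it == label_map().end() || d.confidence < 0.45) continue;
        char buf[160];
        std::snprintf(buf, sizeof buf,
                      "{\"type\":\"%s\",\"confidence\":%.2f,\"rect\":"
                      "{\"left\":%d,\"top\":%d,\"right\":%d,\"bottom\":%d}}",
                      it->second.c_str(), d.confidence, d.left, d.top, d.right, d.bottom);
        if (!first) out += ',';
        out += buf;
        first = false;
    }
    out += "]}";
    return out;
}
```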

Table Recognition Result JSON Schema

Returned by get_table_result once per detected table region. Polygons use 8 integers [x0, y0, x1, y1, x2, y2, x3, y3] in the order top-left, top-right, bottom-right, bottom-left.

json

{
  "type": "table_with_line",
  "position": [60, 480, 540, 480, 540, 700, 60, 700],
  "rows": 3,
  "cols": 2,
  "angle": 0.0,
  "height_of_rows": [40, 90, 90],
  "width_of_cols":  [200, 280],
  "table_cells": [
    {
      "start_row": 0, "end_row": 0,
      "start_col": 0, "end_col": 0,
      "cell_background_color_r": 240,
      "cell_background_color_g": 240,
      "cell_background_color_b": 240,
      "position": [60, 480, 260, 480, 260, 520, 60, 520]
    }
  ]
}

Field	Type	Description
`type`	string	`table_with_line` for bordered tables; any other value is treated as a non-standard (borderless) table.
`position`	int[8]	Table polygon in image pixels.
`rows` / `cols`	int	Row / column counts.
`angle`	number	Skew angle in degrees.
`height_of_rows`	int[]	Per-row pixel heights (length = `rows`).
`width_of_cols`	int[]	Per-column pixel widths (length = `cols`).
`table_cells[]`	array	One entry per cell. A merged cell is a single entry whose row/column range covers several grid positions.
`start_row` / `end_row`	int	Inclusive row span of the cell.
`start_col` / `end_col`	int	Inclusive column span of the cell.
`cell_background_color_*`	int	Cell background color components (0 – 255).
`position`	int[8]	Cell polygon in image pixels.
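If your table model reports row heights and column widths, you can derive the cell polygons from them. The sketch below assumes an unrotated table (`angle` = 0) and uses hypothetical helpers `rect_to_polygon` and `cell_polygon`. It emits points in the top-left, top-right, bottom-right, bottom-left order the schema requires. For the first cell of the example above (table origin 60, 480; first row 40 px; first column 200 px), it reproduces that cell's `position` array.

```cpp
#include <array>
#include <cassert>
#include <numeric>
#include <vector>

// x0,y0 (top-left), x1,y1 (top-right), x2,y2 (bottom-right), x3,y3 (bottom-left)
using Polygon = std::array<int, 8>;

// Axis-aligned box -> 8-integer polygon in the schema's corner order.
Polygon rect_to_polygon(int left, int top, int right, int bottom) {
    return {left, top, right, top, right, bottom, left, bottom};
}

// Polygon of a (possibly merged) cell covering the inclusive ranges
// [start_row, end_row] x [start_col, end_col] of an unrotated table whose
// top-left corner is (table_left, table_top).
Polygon cell_polygon(int table_left, int table_top,
                     const std::vector<int>& height_of_rows,
                     const std::vector<int>& width_of_cols,
                     int start_row, int end_row, int start_col, int end_col) {
    assert(0 <= start_row && start_row <= end_row && end_row < (int)height_of_rows.size());
    assert(0 <= start_col && start_col <= end_col && end_col < (int)width_of_cols.size());
    // Edges are cumulative sums of the row heights / column widths before them.
    const int top = table_top + std::accumulate(height_of_rows.begin(), height_of_rows.begin() + start_row, 0);
    const int bottom = table_top + std::accumulate(height_of_rows.begin(), height_of_rows.begin() + end_row + 1, 0);
    const int left = table_left + std::accumulate(width_of_cols.begin(), width_of_cols.begin() + start_col, 0);
    const int right = table_left + std::accumulate(width_of_cols.begin(), width_of_cols.begin() + end_col + 1, 0);
    return rect_to_polygon(left, top, right, bottom);
}
```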

Tip: Validate Your JSON

If you need a reference output to compare against, run one conversion with the built-in DocumentAI model. The SDK uses the same JSON shape internally, so your custom output should follow the same structure.
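In addition to comparing against the built-in output, you can check a few constraints from the schemas above before a getter returns. These include: `height_of_rows` and `width_of_cols` lengths matching `rows` and `cols`, inclusive cell spans staying inside the grid, and boxes lying inside the trigger image. The sketch below uses hypothetical helpers, not an SDK API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Cell { int start_row, end_row, start_col, end_col; };  // inclusive spans
struct TableResult {
    int rows, cols;
    std::vector<int> height_of_rows, width_of_cols;
    std::vector<Cell> table_cells;
};

// Returns human-readable problems; an empty result means all checks passed.
std::vector<std::string> check_table(const TableResult& t) {
    std::vector<std::string> problems;
    if (t.rows <= 0 || t.cols <= 0) problems.push_back("rows and cols must be positive");
    if ((int)t.height_of_rows.size() != t.rows) problems.push_back("height_of_rows length != rows");
    if ((int)t.width_of_cols.size() != t.cols) problems.push_back("width_of_cols length != cols");
    for (std::size_t i = 0; i < t.table_cells.size(); ++i) {
        const Cell& c = t.table_cells[i];
        const bool rows_ok = 0 <= c.start_row && c.start_row <= c.end_row && c.end_row < t.rows;
        const bool cols_ok = 0 <= c.start_col && c.start_col <= c.end_col && c.end_col < t.cols;
        if (!rows_ok || !cols_ok)
            problems.push_back("cell " + std::to_string(i) + " spans outside the grid");
    }
    return problems;
}

// True if a box is non-empty and lies inside the img_w x img_h trigger image.
bool rect_in_image(int left, int top, int right, int bottom, int img_w, int img_h) {
    return 0 <= left && left < right && right <= img_w &&
           0 <= top && top < bottom && bottom <= img_h;
}
```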
