Overview

Starting with SDK v4.1.0, ComPDF Conversion SDK exposes a callback-based extension point that lets you plug in your own AI inference engine for OCR, Layout Analysis, and Table Recognition. Instead of relying on the built-in DocumentAI model loaded by CPDF_SetDocumentAIModel, you can run inference with any model or service and return the result to the SDK as a JSON string.

When the relevant callback pair is registered on CConvertCallback, the SDK skips its built-in model invocation for that capability and consumes your JSON output instead. If a pair is left as NULL, the SDK falls back to the built-in DocumentAI model when available.

Callback Pairs

Each AI capability uses two callbacks: a trigger callback invoked by the SDK with the path to a page image saved as PNG in a temporary directory, and a result getter invoked by the SDK immediately afterwards to retrieve the JSON string.

| Capability | Trigger callback | Result getter callback | Triggered when |
| --- | --- | --- | --- |
| OCR | ocr | get_ocr_result | enable_ocr = true |
| Layout Analysis | layout | get_layout_result | enable_ai_layout = true or enable_ocr = true |
| Table Recognition | table | get_table_result | enable_ai_table_recognition = true and a table region is detected by layout analysis |

Rules:

  • The trigger receives a UTF-8 path to a PNG file. Return true if inference succeeded, or false to make the SDK ignore the result for that page.
  • The getter must return a UTF-8 JSON C string. The pointer must remain valid until the SDK has finished parsing it; in practice, keep it valid until the same getter is called again or the conversion task ends.
  • Both callbacks for a capability must be set together. If only one is provided, the SDK falls back to the built-in path.
  • Coordinates in your JSON must be in the pixel space of the image received by the trigger, with top-left origin, X to the right, and Y down.
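
Many detection models report boxes as fractions of the image size rather than in pixels, and some use a bottom-left origin. The sketch below shows one way to convert such output into the pixel space described above. The types NormBox and PixelRect and both functions are hypothetical helpers, not SDK API, and the image width and height are assumed to come from your own PNG decoder.

```c
/* Hypothetical helper types for illustration; not part of the ComPDF SDK. */
typedef struct { double x_min, y_min, x_max, y_max; } NormBox;  /* fractions, 0.0 .. 1.0 */
typedef struct { int left, top, right, bottom; } PixelRect;      /* image pixels */

/* Scales a fraction to pixels, clamps it to [0, size], rounds to nearest. */
static int ScaleAndClamp(double fraction, int size)
{
    double v = fraction * (double)size;
    if (v < 0.0) v = 0.0;
    if (v > (double)size) v = (double)size;
    return (int)(v + 0.5);
}

/* Converts a normalized box with top-left origin into a pixel rect. */
static PixelRect ToPixelRect(NormBox box, int image_width, int image_height)
{
    PixelRect r;
    r.left   = ScaleAndClamp(box.x_min, image_width);
    r.top    = ScaleAndClamp(box.y_min, image_height);
    r.right  = ScaleAndClamp(box.x_max, image_width);
    r.bottom = ScaleAndClamp(box.y_max, image_height);
    return r;
}

/* For models that measure Y upwards from the bottom edge: flip the box
 * so that Y grows downwards from the top edge, as the SDK expects. */
static NormBox FlipFromBottomLeftOrigin(NormBox box)
{
    NormBox flipped = box;
    flipped.y_min = 1.0 - box.y_max;
    flipped.y_max = 1.0 - box.y_min;
    return flipped;
}
```

If your model runs on a resized copy of the page image, scale against the dimensions of the original PNG passed to the trigger, not the resized input.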

Sample

```c
#include <stdbool.h>
#include <stddef.h>

#include "compdf_common_c.h"
#include "compdf_conversion_c.h"

/* Your own inference entry points (not part of the SDK). Each one writes a
 * NUL-terminated UTF-8 JSON string into out and returns true on success. */
bool RunMyOcrModel(const char* image_path, char* out, size_t out_size);
bool RunMyLayoutModel(const char* image_path, char* out, size_t out_size);
bool RunMyTableModel(const char* image_path, char* out, size_t out_size);

/* Static buffers keep each JSON string valid until the next call
 * to the same getter. Size them for your largest expected result. */
static char g_ocr_json[4096];
static char g_layout_json[4096];
static char g_table_json[4096];

static bool MyOcrTrigger(const char* image_path)
{
    return RunMyOcrModel(image_path, g_ocr_json, sizeof(g_ocr_json));
}

static const char* MyOcrGetter(void)
{
    return g_ocr_json;
}

static bool MyLayoutTrigger(const char* image_path)
{
    return RunMyLayoutModel(image_path, g_layout_json, sizeof(g_layout_json));
}

static const char* MyLayoutGetter(void)
{
    return g_layout_json;
}

static bool MyTableTrigger(const char* image_path)
{
    return RunMyTableModel(image_path, g_table_json, sizeof(g_table_json));
}

static const char* MyTableGetter(void)
{
    return g_table_json;
}

int main(void)
{
    CPDF_LicenseVerify("LICENSE_KEY", "device_id", "app_id");
    CPDF_Initialize(CPDF_TEXT("resource"));

    /* Register each trigger together with its getter. */
    CConvertCallback callback = {0};
    callback.ocr = MyOcrTrigger;
    callback.get_ocr_result = MyOcrGetter;
    callback.layout = MyLayoutTrigger;
    callback.get_layout_result = MyLayoutGetter;
    callback.table = MyTableTrigger;
    callback.get_table_result = MyTableGetter;

    COCRLanguage languages[] = {e_CENGLISH};

    CConvertOption option = CPDF_DefaultConvertOption();
    option.enable_ocr = true;
    option.enable_ai_layout = true;
    /* Needed for the table callbacks to be triggered (see Callback Pairs). */
    option.enable_ai_table_recognition = true;
    option.languages = languages;
    option.language_count = 1;

    CPDF_StartPDFToWord(CPDF_TEXT("input.pdf"), CPDF_TEXT(""), CPDF_TEXT("output.docx"), option, &callback);
    CPDF_Release();
    return 0;
}
```

You can register only the capabilities you want to override and leave others as NULL to keep the built-in behavior.

Thread Safety and Lifetime

  • Callbacks are invoked from the SDK conversion thread. Implement them in a thread-safe manner if your model engine is shared across calls.
  • The PNG image at image_path lives in the SDK temporary directory and may be deleted shortly after the trigger returns. Copy or process it before returning.
  • The JSON pointer returned from a getter must remain valid until the same getter is called again or until the conversion task ends.
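
One way to satisfy the pointer lifetime rule without a fixed-size array is a module-owned heap buffer that is replaced only when a new result is stored. The following is a minimal sketch under that assumption. StoreResult, GetResult, and ReleaseResult are hypothetical names, and returning "{}" when nothing has been stored is a defensive choice of this sketch, not documented SDK behavior.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Owned by this module. Replaced only in StoreResult, so a pointer handed
 * out by GetResult stays valid until the next successful StoreResult. */
static char* g_result = NULL;

/* Call from the trigger once inference has produced a JSON string.
 * Copies the string so the caller may free or reuse its own buffer. */
static bool StoreResult(const char* json)
{
    size_t len;
    char* copy;

    if (json == NULL) {
        return false;
    }
    len = strlen(json);
    copy = (char*)malloc(len + 1);
    if (copy == NULL) {
        return false;  /* keep the previous result untouched */
    }
    memcpy(copy, json, len + 1);
    free(g_result);
    g_result = copy;
    return true;
}

/* Call from the getter. Never returns NULL (assumption: an empty
 * JSON object is a safe placeholder when no result exists). */
static const char* GetResult(void)
{
    return g_result != NULL ? g_result : "{}";
}

/* Call after the conversion task has ended. */
static void ReleaseResult(void)
{
    free(g_result);
    g_result = NULL;
}
```

This sketch has no locking. If several conversions run at the same time and share these callbacks, protect the buffer with a mutex or keep one buffer per conversion thread.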

OCR Result JSON Schema

Returned by get_ocr_result. The SDK populates each text_spans[].chars[] from words[] if provided; otherwise it splits the span rect uniformly.

```json
{
  "text_spans": [
    {
      "text": "Hello World",
      "confidence": 0.98,
      "rotation": 0.0,
      "rect": { "left": 120, "top": 80, "right": 320, "bottom": 110 },
      "style": {
        "font_size": 18.0,
        "font_color": { "r": 0, "g": 0, "b": 0 }
      },
      "words": [
        { "text": "Hello", "rect": { "left": 120, "top": 80, "right": 200, "bottom": 110 } },
        { "text": "World", "rect": { "left": 210, "top": 80, "right": 320, "bottom": 110 } }
      ]
    }
  ]
}
```

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| text_spans | array | Yes | Recognized text spans on the page. |
| text | string | Yes | UTF-8 text content of the span. |
| confidence | number | No | 0.0 – 1.0. Spans below 0.1 are discarded. |
| rotation | number | No | Text rotation in degrees. Default 0. |
| rect | object | Yes | Bounding box in image pixels (left/top/right/bottom). |
| style.font_size | number | No | Estimated font size in pixels. |
| style.font_color | object | No | { r, g, b }, each 0 – 255. |
| words | array | No | Per-word boxes. If omitted, the SDK splits the span rect evenly. Strongly recommended for mixed CJK and Latin lines to get correct glyph spacing. |
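
Recognized text often contains quotes or backslashes, which must be escaped before the text is placed in a JSON string. The sketch below builds a one-span OCR document with only the required fields plus confidence. EscapeJson, WriteSingleSpanOcrJson, and OcrRect are hypothetical helpers, and a real integration would more likely use a JSON library and emit many spans.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct { int left, top, right, bottom; } OcrRect;  /* hypothetical */

/* Writes s into out with JSON string escaping for quotes, backslashes,
 * and control characters. UTF-8 bytes >= 0x80 pass through unchanged.
 * Returns false if the escaped text does not fit in cap bytes. */
static bool EscapeJson(const char* s, char* out, size_t cap)
{
    size_t n = 0;
    if (cap == 0) {
        return false;
    }
    for (; *s != '\0'; ++s) {
        unsigned char c = (unsigned char)*s;
        char tmp[7];
        int len;
        if (c == '"' || c == '\\') {
            len = snprintf(tmp, sizeof tmp, "\\%c", c);
        } else if (c < 0x20) {
            len = snprintf(tmp, sizeof tmp, "\\u%04x", (unsigned)c);
        } else {
            tmp[0] = (char)c;
            len = 1;
        }
        if (n + (size_t)len + 1 > cap) {
            return false;  /* no room for this character plus the terminator */
        }
        memcpy(out + n, tmp, (size_t)len);
        n += (size_t)len;
    }
    out[n] = '\0';
    return true;
}

/* Builds {"text_spans":[{...}]} for a single span. Returns false if the
 * text is too long to escape or the output does not fit in cap bytes. */
static bool WriteSingleSpanOcrJson(const char* text, double confidence,
                                   OcrRect rect, char* out, size_t cap)
{
    char escaped[1024];
    int written;
    if (!EscapeJson(text, escaped, sizeof escaped)) {
        return false;
    }
    written = snprintf(out, cap,
        "{\"text_spans\":[{\"text\":\"%s\",\"confidence\":%.2f,"
        "\"rect\":{\"left\":%d,\"top\":%d,\"right\":%d,\"bottom\":%d}}]}",
        escaped, confidence, rect.left, rect.top, rect.right, rect.bottom);
    return written >= 0 && (size_t)written < cap;
}
```

Note that %.2f follows the current C locale. If your program calls setlocale with a locale that uses a decimal comma, format the number another way so the output stays valid JSON.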

Layout Analysis Result JSON Schema

Returned by get_layout_result. Objects with confidence < 0.45 are discarded.

```json
{
  "objects": [
    { "type": "title", "confidence": 0.95, "rect": { "left": 60, "top": 50, "right": 540, "bottom": 90 } },
    { "type": "paragraph", "confidence": 0.97, "rect": { "left": 60, "top": 100, "right": 540, "bottom": 220 } },
    { "type": "figure", "confidence": 0.92, "rect": { "left": 80, "top": 240, "right": 520, "bottom": 460 } },
    { "type": "table", "confidence": 0.93, "rect": { "left": 60, "top": 480, "right": 540, "bottom": 700 } }
  ]
}
```

Supported type values:

| Value | Meaning |
| --- | --- |
| paragraph | Body text paragraph |
| title | Heading |
| figure | Image or figure |
| figure_title | Figure caption header |
| figure_caption | Figure caption text |
| table | Table region. Whether the table is bordered or borderless is determined by the table recognition stage, not by the layout label. |
| table_title | Table caption header |
| table_caption | Table caption text |
| ordered_list | Ordered list |
| unordered_list | Unordered list |
| catalogue | Table of contents |
| formula | Math formula |
| code | Code block |
| algorithm | Algorithm block |
| header | Page header |
| footer | Page footer |
| page_number | Page number |
| reference | Reference or citation |

Objects with a type value that is not listed above are ignored. Use the values in this table as the canonical layout labels in your custom output.
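
Because unknown labels and low-confidence objects are dropped silently, it can help to check your model output against the same rules before returning it. The sketch below mirrors the documented filter: the label must be in the supported list and the confidence must be at least 0.45. Both function names are hypothetical, and the case-sensitive comparison is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The layout labels listed in this document. */
static const char* const kLayoutTypes[] = {
    "paragraph", "title", "figure", "figure_title", "figure_caption",
    "table", "table_title", "table_caption", "ordered_list",
    "unordered_list", "catalogue", "formula", "code", "algorithm",
    "header", "footer", "page_number", "reference"
};

/* Returns true if type is one of the documented labels
 * (exact, case-sensitive match; that strictness is an assumption). */
static bool IsSupportedLayoutType(const char* type)
{
    size_t i;
    if (type == NULL) {
        return false;
    }
    for (i = 0; i < sizeof kLayoutTypes / sizeof kLayoutTypes[0]; ++i) {
        if (strcmp(type, kLayoutTypes[i]) == 0) {
            return true;
        }
    }
    return false;
}

/* Mirrors the documented filter: unsupported labels are ignored and
 * objects with confidence < 0.45 are discarded. */
static bool WouldLayoutObjectBeKept(const char* type, double confidence)
{
    return IsSupportedLayoutType(type) && confidence >= 0.45;
}
```

A check like this is a convenient place to map your model's own label names (for example "heading") onto the canonical values before serializing.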

Table Recognition Result JSON Schema

Returned by get_table_result once per detected table region. Polygons use eight integers [x0, y0, x1, y1, x2, y2, x3, y3] in the order top-left, top-right, bottom-right, bottom-left.

```json
{
  "type": "table_with_line",
  "position": [60, 480, 540, 480, 540, 700, 60, 700],
  "rows": 3,
  "cols": 2,
  "angle": 0.0,
  "height_of_rows": [40, 60, 60],
  "width_of_cols": [200, 280],
  "table_cells": [
    {
      "start_row": 0,
      "end_row": 0,
      "start_col": 0,
      "end_col": 0,
      "cell_background_color_r": 240,
      "cell_background_color_g": 240,
      "cell_background_color_b": 240,
      "position": [60, 480, 260, 480, 260, 520, 60, 520]
    }
  ]
}
```

| Field | Type | Description |
| --- | --- | --- |
| type | string | table_with_line for bordered tables; any other value is treated as a non-standard (borderless) table. |
| position | int[8] | Table polygon in image pixels. |
| rows / cols | int | Row / column counts. |
| angle | number | Skew angle in degrees. |
| height_of_rows | int[] | Per-row pixel heights (length = rows). |
| width_of_cols | int[] | Per-column pixel widths (length = cols). |
| table_cells[] | array | One entry per merged cell. |
| start_row / end_row | int | Inclusive row span of the cell. |
| start_col / end_col | int | Inclusive column span of the cell. |
| cell_background_color_* | int | Cell background color components (0 – 255). |
| position (per cell) | int[8] | Cell polygon in image pixels. |
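
If your table model reports a grid (row heights, column widths, and cell spans) rather than per-cell polygons, the cell polygon can be derived from the grid. The following sketch does this for an unrotated table (angle 0) and uses the polygon order described above. CellPolygon is a hypothetical helper; skewed tables would need an additional rotation step that is not shown.

```c
#include <stdbool.h>

/* Computes the 8-integer polygon (top-left, top-right, bottom-right,
 * bottom-left) of a cell in an axis-aligned grid. Row and column spans
 * are inclusive. Returns false if the span lies outside the grid. */
static bool CellPolygon(int table_left, int table_top,
                        const int* width_of_cols, int cols,
                        const int* height_of_rows, int rows,
                        int start_row, int end_row,
                        int start_col, int end_col,
                        int polygon[8])
{
    int left = table_left;
    int top = table_top;
    int right, bottom, i;

    if (start_row < 0 || end_row < start_row || end_row >= rows) {
        return false;
    }
    if (start_col < 0 || end_col < start_col || end_col >= cols) {
        return false;
    }

    /* Skip the columns and rows that precede the cell. */
    for (i = 0; i < start_col; ++i) left += width_of_cols[i];
    for (i = 0; i < start_row; ++i) top += height_of_rows[i];

    /* Add the extent of every column and row the cell spans. */
    right = left;
    for (i = start_col; i <= end_col; ++i) right += width_of_cols[i];
    bottom = top;
    for (i = start_row; i <= end_row; ++i) bottom += height_of_rows[i];

    polygon[0] = left;  polygon[1] = top;     /* top-left */
    polygon[2] = right; polygon[3] = top;     /* top-right */
    polygon[4] = right; polygon[5] = bottom;  /* bottom-right */
    polygon[6] = left;  polygon[7] = bottom;  /* bottom-left */
    return true;
}
```

With the grid from the sample above (table origin 60, 480; column widths 200 and 280; row heights 40, 60, 60), the cell at row 0, column 0 yields the same polygon as the sample's table_cells entry.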

Tip: Validate Your JSON

If you need a reference output to compare against, run a conversion once with the built-in DocumentAI model. The SDK uses the same JSON shape internally, so your custom output should follow the same structure.