
Use Custom AI Models via Callbacks

Overview

Starting with SDK v4.1.0, the ComPDF Conversion SDK exposes a callback-based extension point that lets you plug in your own AI inference engine for OCR, Layout Analysis, and Table Recognition. Instead of relying on the built-in DocumentAI model loaded by SetDocumentAIModel, you can:

  • Run inference with any model or runtime you choose (e.g. your in-house engine, PaddleOCR, a cloud OCR API).
  • Return the result to the SDK as a JSON string that follows the schema defined for that capability.

When the relevant callback pair is registered on ConvertCallback, the SDK skips its built-in model invocation for that capability and consumes your JSON output instead. If a pair is left unset, the SDK falls back to the built-in DocumentAI model (when available).
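The fallback rule can be read as a small decision function per capability. The sketch below is a hypothetical model of the documented behavior, not ComPDF code: Source and resolveSource are invented names, and the "unavailable" case only marks that this guide does not say what the SDK does when neither a complete pair nor a loaded built-in model exists.

```go
package main

// Source names the engine that serves one capability. Illustrative only;
// this is not a type from the ComPDF SDK.
type Source string

const (
	SourceCustom  Source = "custom callbacks"
	SourceBuiltIn Source = "built-in DocumentAI model"
	SourceNone    Source = "unavailable" // SDK behavior here is not covered by this guide
)

// resolveSource models the documented rule: a complete trigger/getter pair
// replaces the built-in model; a missing or half-provided pair falls back to
// the built-in DocumentAI model when one has been loaded.
func resolveSource(hasTrigger, hasGetter, builtInLoaded bool) Source {
	if hasTrigger && hasGetter {
		return SourceCustom
	}
	if builtInLoaded {
		return SourceBuiltIn
	}
	return SourceNone
}
```

Modeling the pair as two booleans reflects the rule that a trigger without its getter (or vice versa) counts as unset.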

Callback Pairs

Each AI capability uses two callbacks: a trigger, which the SDK invokes with the path to the page image it has saved as a PNG in a temporary directory, and a result getter, which the SDK invokes immediately afterwards to retrieve your JSON string.

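| Capability | Trigger field / setter | Result getter field / setter | Go methods (as in the sample) | Triggered when |
|---|---|---|---|---|
| OCR | ocr | get_ocr_result | OnOCR / GetOCRResult | OCR is enabled |
| Layout Analysis | layout | get_layout_result | OnLayout / GetLayoutResult | Layout analysis is enabled (or implicitly when OCR is enabled) |
| Table Recognition | table | get_table_result | OnTable / GetTableResult | Table recognition is enabled and layout analysis detects a table region |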
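The "Triggered when" column can be expressed as a function of the three conversion switches. This is a sketch under stated assumptions: convertOptions is a local stand-in for the EnableOCR, EnableAILayout, and EnableAITableRecognition settings used in the Sample section, it assumes table recognition needs layout analysis to have run (since layout is what detects the table region), and the order of the returned names is not meant to reflect the SDK's call order.

```go
package main

// convertOptions is a hypothetical stand-in for the three AI switches on the
// SDK's conversion options; it is not a ComPDF type.
type convertOptions struct {
	OCR, Layout, Table bool
}

// triggers lists which trigger callbacks would fire for one page.
// Assumption: table recognition only runs when layout analysis ran and
// reported a table region on that page.
func triggers(o convertOptions, tableRegionFound bool) []string {
	var fired []string
	if o.OCR {
		fired = append(fired, "ocr")
	}
	layout := o.Layout || o.OCR // layout runs implicitly when OCR is enabled
	if layout {
		fired = append(fired, "layout")
	}
	if o.Table && layout && tableRegionFound {
		fired = append(fired, "table")
	}
	return fired
}
```

In practice this means enabling OCR alone is enough to make your layout callbacks fire, so a custom OCR pair without a custom layout pair still relies on the built-in model for layout.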

Rules:

  • The trigger receives a UTF-8 path to a PNG file. Return true if inference succeeded, or false to make the SDK ignore the result for that page.
  • The getter must return a UTF-8 JSON string. The SDK copies the string into an internal buffer before consuming it.
  • Both callbacks for a capability must be set together. If only one is provided, the SDK falls back to the built-in path.
  • Coordinates in your JSON must be in the pixel space of the image the trigger received (top-left origin, X right, Y down).
  • Confidence filtering: OCR spans with confidence < 0.1 and layout objects with confidence < 0.45 are discarded by the SDK.
  • If every capability you enable is covered by your own callbacks, you do not need to call SetDocumentAIModel.
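If your engine resizes the page image before inference, its coordinates have to be mapped back to the pixel space of the PNG the trigger received. The sketch below assumes a hypothetical box type (the real JSON field names come from the schema referenced under JSON Schemas) and also drops entries below the SDK's confidence thresholds, which is optional because the SDK discards them anyway.

```go
package main

// box is a hypothetical detection from your engine, not the SDK's schema.
type box struct {
	X0, Y0, X1, Y1 float64 // corners; top-left origin, X right, Y down
	Confidence     float64
}

// Thresholds below which the SDK discards results, per the rules above.
const (
	minOCRConfidence    = 0.1
	minLayoutConfidence = 0.45
)

// toPagePixels rescales boxes from the image your engine processed
// (engineW x engineH) to the PNG the trigger received (pngW x pngH) and
// drops entries whose confidence is below minConf.
func toPagePixels(in []box, engineW, engineH, pngW, pngH, minConf float64) []box {
	sx, sy := pngW/engineW, pngH/engineH
	out := make([]box, 0, len(in))
	for _, b := range in {
		if b.Confidence < minConf {
			continue // the SDK would discard this entry anyway
		}
		out = append(out, box{b.X0 * sx, b.Y0 * sy, b.X1 * sx, b.Y1 * sy, b.Confidence})
	}
	return out
}
```

An engine that reports a bottom-left origin would additionally need its Y values flipped (y becomes pngH - y, swapping the top and bottom edges) before the boxes are serialized.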

Sample

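```go
// Embed compdf.BaseCallback so you only override the methods you need.
// Import the ComPDF Go binding as compdf; the import path depends on your SDK package.
type customAICallback struct {
	compdf.BaseCallback
	ocrJSON    string // JSON from your OCR engine for the current page
	layoutJSON string // JSON from your layout engine for the current page
	tableJSON  string // JSON from your table engine for the current page
}

// OnOCR is the OCR trigger: imagePath is a PNG of the current page.
// Run your OCR engine on it and cache the JSON; return false on failure
// so the SDK ignores the result for this page.
func (c *customAICallback) OnOCR(imagePath string) bool {
	c.ocrJSON = "" // placeholder: replace with your engine's JSON output
	return true
}

// OnLayout is the layout-analysis trigger.
func (c *customAICallback) OnLayout(imagePath string) bool {
	c.layoutJSON = "" // placeholder: replace with your layout engine's JSON output
	return true
}

// OnTable is the table-recognition trigger; it fires only for pages on
// which layout analysis detected a table region.
func (c *customAICallback) OnTable(imagePath string) bool {
	c.tableJSON = "" // placeholder: replace with your table engine's JSON output
	return true
}

// The SDK calls each getter immediately after its trigger, so caching one
// result per capability is enough.
func (c *customAICallback) GetOCRResult() string    { return c.ocrJSON }
func (c *customAICallback) GetLayoutResult() string { return c.layoutJSON }
func (c *customAICallback) GetTableResult() string  { return c.tableJSON }

// convertToWord wraps the conversion call; the function name is illustrative.
func convertToWord() error {
	callback := &customAICallback{}
	wordOptions := compdf.NewWordOptions()
	wordOptions.EnableOCR = true
	wordOptions.EnableAILayout = true
	wordOptions.EnableAITableRecognition = true
	// All three capabilities are served by callback, so SetDocumentAIModel is not called.
	err := compdf.StartPDFToWord("input.pdf", "password", "path/output.docx", wordOptions, callback)
	return err
}
```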
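The placeholders in the sample assign an empty string, which is not valid JSON. One way to fill them in is to run the engine, check that its output is valid UTF-8 and well-formed JSON, and return false otherwise so the SDK skips the page. engineFunc and runAndValidate are hypothetical helpers, not part of the SDK, and they check syntax only, not conformance to the capability's schema.

```go
package main

import (
	"encoding/json"
	"unicode/utf8"
)

// engineFunc is a hypothetical signature for your own inference code:
// it takes the PNG path and returns raw JSON bytes.
type engineFunc func(pngPath string) ([]byte, error)

// runAndValidate runs the engine and reports whether its output can be
// handed to the SDK: no error, valid UTF-8, and syntactically valid JSON.
// A false result is what the trigger should return to skip the page.
func runAndValidate(engine engineFunc, pngPath string) (string, bool) {
	raw, err := engine(pngPath)
	if err != nil || !utf8.Valid(raw) || !json.Valid(raw) {
		return "", false
	}
	return string(raw), true
}
```

Inside OnOCR, for example, the placeholder could become `out, ok := runAndValidate(myOCREngine, imagePath)`, followed by `c.ocrJSON = out` and `return ok`, where myOCREngine is your own function.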

JSON Schemas

The expected JSON schema for each capability, including field names, coordinate space (pixel space of the image the trigger received), and value ranges, is the same across all language bindings. See the C++ guide chapter Use Custom AI Models via Callbacks.