Use Custom AI Models via Callbacks
Overview
Starting with SDK v4.1.0, ComPDF Conversion SDK exposes a callback-based extension point that lets you plug in your own AI inference engine for OCR, Layout Analysis, and Table Recognition. Instead of relying on the built-in DocumentAI model loaded by SetDocumentAIModel, you can:
- Run inference with any model or runtime you choose (e.g. your in-house engine, PaddleOCR, a cloud OCR API).
- Return the result to the SDK as a JSON string with a well-defined schema.
When the relevant callback pair is registered on ConvertCallback, the SDK skips its built-in model invocation for that capability and consumes your JSON output instead. If a pair is left unset, the SDK falls back to the built-in DocumentAI model (when available).
Callback Pairs
Each AI capability uses two callbacks: a trigger (invoked by the SDK with the path to a page image saved as PNG in a temp directory) and a result getter (invoked by the SDK immediately afterwards to retrieve the JSON string).
| Capability | Trigger field / setter | Result getter field / setter | Triggered when |
|---|---|---|---|
| OCR | ocr | get_ocr_result | OCR is enabled |
| Layout Analysis | layout | get_layout_result | layout analysis is enabled (or implicitly when OCR is enabled) |
| Table Recognition | table | get_table_result | table recognition is enabled and a table region is detected by layout analysis |
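The call order described above can be exercised outside the SDK with a small harness, which is handy for unit-testing a callback pair before wiring it into a conversion. The following is a minimal sketch, not SDK code: `pageCallback` and `runPair` are hypothetical names introduced for illustration, and the JSON payload is a dummy value rather than the SDK schema.

```go
package main

import "fmt"

// pageCallback is a hypothetical stand-in for one trigger/getter pair.
// It is not a type from the ComPDF SDK.
type pageCallback struct {
	trigger func(imagePath string) bool
	getter  func() string
}

// runPair mimics the documented call order: the trigger runs first and the
// getter is consulted immediately afterwards, but only if the trigger
// reported success. The second return value is false when no custom result
// is available.
func runPair(cb pageCallback, imagePath string) (string, bool) {
	if cb.trigger == nil || cb.getter == nil {
		// Incomplete pair: the SDK would fall back to its built-in path.
		return "", false
	}
	if !cb.trigger(imagePath) {
		// Trigger reported failure: the result for this page is ignored.
		return "", false
	}
	return cb.getter(), true
}

func main() {
	var cached string
	cb := pageCallback{
		trigger: func(imagePath string) bool {
			// Placeholder for real inference; the payload is a dummy value.
			cached = `{"source":"` + imagePath + `"}`
			return true
		},
		getter: func() string { return cached },
	}

	out, ok := runPair(cb, "/tmp/page_0001.png")
	fmt.Println(ok, out)
}
```

Because the getter is called right after the trigger, caching the result in a field or closure variable, as the harness does here, is usually sufficient.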
Rules:
- The trigger receives a UTF-8 path to a PNG file. Return true if your inference succeeded, false to make the SDK ignore the result for that page.
- The getter must return a UTF-8 JSON string. The SDK copies the string into an internal buffer before consuming it.
- Both callbacks for a capability must be set together. If only one is provided, the SDK falls back to the built-in path.
- Coordinates in your JSON must be in the pixel space of the image the trigger received (top-left origin, X right, Y down).
- Confidence filtering: OCR spans with confidence < 0.1 and layout objects with confidence < 0.45 are discarded by the SDK.
- When all three capabilities you need are covered by your own callbacks, SetDocumentAIModel does not have to be called.
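Two of the rules above, the coordinate space and the confidence thresholds, are easy to get wrong when your engine runs on a resized copy of the page image. The sketch below shows one way to map detections back to the pixel space of the PNG the trigger received and to drop low-confidence items before serializing. The `box` type and `toImageSpace` function are hypothetical; the actual JSON field names come from the SDK schema, and uniform scaling is an assumption of this example.

```go
package main

import "fmt"

// box is a hypothetical detection record. The real JSON field names are
// defined by the SDK schema, not by this sketch.
type box struct {
	X0, Y0, X1, Y1 float64 // top-left origin, X right, Y down
	Confidence     float64
}

// Thresholds quoted from the rules above; items below them are discarded
// by the SDK anyway, so filtering early only trims the payload.
const (
	minOCRConfidence    = 0.1
	minLayoutConfidence = 0.45
)

// toImageSpace maps boxes detected on a resized copy back to the pixel space
// of the original image and drops boxes whose confidence is below minConf.
// scale is resized width / original width (uniform scaling assumed).
func toImageSpace(boxes []box, scale, minConf float64) []box {
	if scale <= 0 {
		return nil // invalid scale: nothing can be mapped back
	}
	out := make([]box, 0, len(boxes))
	for _, b := range boxes {
		if b.Confidence < minConf {
			continue
		}
		out = append(out, box{
			X0:         b.X0 / scale,
			Y0:         b.Y0 / scale,
			X1:         b.X1 / scale,
			Y1:         b.Y1 / scale,
			Confidence: b.Confidence,
		})
	}
	return out
}

func main() {
	// Detections produced on a copy downscaled to half size.
	detected := []box{
		{X0: 10, Y0: 20, X1: 110, Y1: 40, Confidence: 0.92},
		{X0: 5, Y0: 5, X1: 15, Y1: 15, Confidence: 0.05},
	}
	fmt.Println(toImageSpace(detected, 0.5, minOCRConfidence))
	fmt.Println(toImageSpace(detected, 0.5, minLayoutConfidence))
}
```

If your engine runs on the image at its original size, the scale is 1 and only the confidence filter has any effect.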
Sample
// Embed BaseCallback so you only override the methods you need.
type customAICallback struct {
	compdf.BaseCallback
	ocrJSON    string
	layoutJSON string
	tableJSON  string
}

// Each trigger receives the path to the page image (PNG). Run your engine on
// `imagePath`, cache the JSON result, and return true on success.
// The empty strings below are placeholders for your engine's output.
func (c *customAICallback) OnOCR(imagePath string) bool {
	c.ocrJSON = ""
	return true
}

func (c *customAICallback) OnLayout(imagePath string) bool {
	c.layoutJSON = ""
	return true
}

func (c *customAICallback) OnTable(imagePath string) bool {
	c.tableJSON = ""
	return true
}

// Each getter returns the JSON cached by the matching trigger.
func (c *customAICallback) GetOCRResult() string    { return c.ocrJSON }
func (c *customAICallback) GetLayoutResult() string { return c.layoutJSON }
func (c *customAICallback) GetTableResult() string  { return c.tableJSON }

callback := &customAICallback{}
wordOptions := compdf.NewWordOptions()
wordOptions.EnableOCR = true
wordOptions.EnableAILayout = true
wordOptions.EnableAITableRecognition = true
err := compdf.StartPDFToWord("input.pdf", "password", "path/output.docx", wordOptions, callback)
JSON Schemas
The expected JSON schema for each capability, including field names, coordinate space (pixel space of the image the trigger received), and value ranges, is the same across all language bindings. See the C++ guide chapter Use Custom AI Models via Callbacks.