Table Recognition
Overview
Table Recognition reconstructs the internal structure of tables detected during layout analysis, including rows, columns, merged cells, and cell boundaries, so that the converted document preserves the original tabular semantics instead of producing a flat grid of text fragments.
It is controlled by the independent option enable_ai_table_recognition, which is enabled by default. The table model is invoked only for table regions reported by layout analysis whose detection confidence is below the trusted threshold; high-confidence native PDF tables bypass the model to save inference time.
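The routing rule described above can be sketched in a few lines of Python. This is an illustration only, not the SDK's implementation: the TableRegion class, the needs_table_model function, and the 0.9 threshold are hypothetical names and values, since this page does not state the actual threshold or the internal types.

```python
from dataclasses import dataclass

# Hypothetical value: this page does not state the real trusted threshold.
TRUSTED_THRESHOLD = 0.9


@dataclass
class TableRegion:
    """Hypothetical stand-in for a table region reported by layout analysis."""
    page: int
    confidence: float  # detection confidence in [0, 1]


def needs_table_model(region: TableRegion, threshold: float = TRUSTED_THRESHOLD) -> bool:
    """Return True when the region's confidence is below the trusted threshold."""
    return region.confidence < threshold


regions = [
    TableRegion(page=1, confidence=0.97),  # trusted: bypasses the table model
    TableRegion(page=2, confidence=0.62),  # uncertain: sent to the table model
]
to_model = [r for r in regions if needs_table_model(r)]
print([r.page for r in to_model])  # prints [2]
```

Under these assumptions, only the low-confidence region on page 2 would incur table-model inference.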
Typical scenarios that benefit from Table Recognition:
- Borderless or partially bordered tables, where ruling lines alone cannot describe the structure.
- Tables with merged header cells, multi-row headers, or spanning cells, such as financial statements, lab reports, and invoices.
- Scanned tables processed by OCR, where geometric reconstruction is required before cell-level data extraction.
Features that support Table Recognition:
- PDF to Word
- PDF to Excel
- PDF to PowerPoint (PPT)
- PDF to HTML
- PDF to RTF
- PDF to CSV
- Extract PDF to JSON
- Extract PDF to Markdown
Notice
- Table Recognition runs only when layout analysis is active (i.e. enable_ai_layout is enabled, or implicitly when enable_ocr is enabled).
- You need to load the DocumentAI model before using Table Recognition, or plug in your own table model via the callbacks described in 3.11 Use Custom AI Models via Callbacks.
- Disabling enable_ai_table_recognition turns the table model off entirely; detected table regions then fall back to geometric reconstruction from the underlying page objects.
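How the three flags in these notes interact can be summarized with a small decision function. This is a sketch under stated assumptions: the Options class and table_path function are hypothetical and only mirror the option names used on this page; they are not part of the SDK.

```python
from dataclasses import dataclass


@dataclass
class Options:
    """Hypothetical mirror of the three flags discussed in the notes above."""
    enable_ai_layout: bool
    enable_ocr: bool
    enable_ai_table_recognition: bool


def table_path(opts: Options) -> str:
    """Describe which table path applies for a given flag combination."""
    # Layout analysis is active when enabled directly, or implicitly through OCR.
    layout_active = opts.enable_ai_layout or opts.enable_ocr
    if not layout_active:
        return "table recognition does not run"
    if opts.enable_ai_table_recognition:
        return "table model"
    # Table model switched off: detected regions use geometric reconstruction.
    return "geometric reconstruction"


print(table_path(Options(enable_ai_layout=False, enable_ocr=True,
                         enable_ai_table_recognition=True)))   # prints table model
print(table_path(Options(enable_ai_layout=True, enable_ocr=False,
                         enable_ai_table_recognition=False)))  # prints geometric reconstruction
```

Note that the "table model" result means the model is available for detected regions; the per-region confidence check described in the Overview still decides whether a given table actually reaches it.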
Sample
This sample demonstrates how to convert a PDF to a Word document with Table Recognition enabled.
# Load the DocumentAI model before starting the conversion.
LibraryManager.set_document_ai_model("path/documentai.model", -1)

word_options = ConvertOptions()
word_options.enable_ai_layout = True  # layout analysis must be active for Table Recognition
word_options.enable_ai_table_recognition = True  # enabled by default in 4.1
CPDFConversion.start_pdf_to_word("input.pdf", "password", "path/output.docx", word_options)