Layout Analysis
Overview
Layout analysis uses artificial intelligence (AI) to parse and understand the structure of a document's layout. Its primary goal is to extract text, images, tables, layers, and other data from input documents.
Common use cases for layout analysis include:
- Intelligent recognition of tables within PDF documents: This is particularly useful for analyzing company financial statements, invoices, bank statements, experimental data, medical test reports, and more.
- Smart extraction of text, images, or tables from PDF documents: This helps when analyzing and extracting information from identification cards, receipts, licenses, documents, ancient books, and other types of files.
Features that support Layout Analysis:
- PDF to Word
- PDF to Excel
- PDF to PowerPoint (PPT)
- PDF to HTML
- PDF to RTF
- PDF to TXT
- PDF to CSV
- Extract PDF to JSON
- Extract PDF to Markdown
Notice
- You need to integrate the OCR module before using layout analysis.
- When OCR is enabled, layout analysis is enabled automatically.
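The two notes above can be read as a simple rule. The following is a minimal sketch of that rule only, not part of the ComPDFKit API: the `SketchSettings` struct, its fields, and the `LayoutAnalysisActive` function are hypothetical names introduced for illustration.

```cpp
// Hypothetical model of the two notes above; none of these names come from
// the ComPDFKit API.
struct SketchSettings {
    bool ocr_module_integrated = false;      // the OCR module has been set up
    bool ocr_enabled = false;                // the caller asked for OCR
    bool layout_analysis_requested = false;  // the caller asked for layout analysis
};

// Returns true when layout analysis would take effect under the two notes.
inline bool LayoutAnalysisActive(const SketchSettings& s) {
    // Note 1: without the OCR module, layout analysis cannot be used.
    if (!s.ocr_module_integrated) {
        return false;
    }
    // Note 2: enabling OCR turns layout analysis on automatically.
    return s.ocr_enabled || s.layout_analysis_requested;
}
```

Under this sketch, settings with `ocr_module_integrated` and `ocr_enabled` both set to true report layout analysis as active even when `layout_analysis_requested` is left false.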
Sample
This sample demonstrates how to use the ComPDFKit OCR function to convert a PDF to a DOCX file.
c++
// Set the OCR model path and language.
LibraryManager::SetDocumentAIModel("path/model", OCRLanguage::e_English);
ConvertOptions opt;
// Enable layout analysis option.
opt.enable_ai_layout = true;
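// Convert the PDF to a Word document. Arguments as used here: input file, password, output file, options.
CPDFConversion::StartPDFToWord("word.pdf", "password", "path/output.docx", opt);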