Layout Analysis

Overview

Layout analysis uses artificial intelligence (AI) to parse and understand the structure of a document's layout. Its primary goal is to extract text, images, tables, layers, and other data from input documents.

Layout analysis has several common use cases, including:

  • Recognizing tables in PDF documents: This is particularly useful for analyzing company financial statements, invoices, bank statements, experimental data, medical test reports, and more.
  • Extracting text, images, or tables from PDF documents: This helps analyze and extract information from identification cards, receipts, licenses, documents, ancient books, and other types of files.

The following features support layout analysis:

  • Convert PDF to Word
  • Convert PDF to Excel
  • Convert PDF to PowerPoint
  • Convert PDF to HTML
  • Extract PDF Table

Notice

  • You need to integrate the OCR module before using layout analysis.
  • When OCR is enabled, layout analysis is enabled automatically.

Sample

This sample demonstrates how to use the ComPDFKit OCR function to convert a PDF to a DOCX file.

objective-c
// Get the path of the PDF file.
NSString *pdfPath = @"...";
// Set the output path of the Word file.
NSString *outputPath = @"...";
// PDF to Word conversion options (CPDFConvertWordOptions is a derived class of CPDFConvertOptions).
CPDFConvertWordOptions *options = [[CPDFConvertWordOptions alloc] init];
// Whether to include images in the output; takes effect only when IsAllowOCR is false.
[options setIsContainImages:YES];
// Whether to include background images; takes effect only when IsAllowOCR is true.
[options setIsContainOCRBgImage:YES];
// Whether to include annotations in the output.
[options setIsContainAnnotations:YES];
// Layout option. CPDFConvertRetainPageLayout retains the layout of the original file by splitting the text into multiple text boxes according to its layout.
[options setLayoutOptions:CPDFConvertRetainPageLayout];
// Enable layout analysis.
[options setIsAILayoutAnalysis:YES];
CPDFConverterWord *converter = [[CPDFConverterWord alloc] initWithURL:[NSURL fileURLWithPath:pdfPath] password:nil];
[converter convertToFilePath:outputPath pageIndexs:nil options:options];