Extract Tables from PDFs

Overview

This guide shows how to extract table content from a PDF document.

Standard table and non-standard table

Tables generally fall into two categories, standard and non-standard, defined as follows:

  • Standard table: The table border and the inner lines of the table are complete and clear. There is no need to manually add table lines to divide the table content.


  • Non-standard table: The table border or inner lines are missing or unclear. Table lines need to be added manually to separate the table content.

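The two definitions above amount to a simple decision rule. The following is a minimal sketch of that rule, not part of the SDK: `TableLines`, `TableKind`, and `classify` are hypothetical names introduced for illustration, and the sketch assumes you already know whether a table's border and inner lines are complete.

```kotlin
// Hypothetical model (not an SDK type): which ruling lines a table has.
data class TableLines(
    val borderComplete: Boolean,     // outer border is fully drawn and clear
    val innerLinesComplete: Boolean  // all row/column separators are drawn and clear
)

enum class TableKind { STANDARD, NON_STANDARD }

// A table is standard only when both the border and the inner lines are complete.
// Otherwise lines would have to be added manually, so it is non-standard.
fun classify(lines: TableLines): TableKind =
    if (lines.borderComplete && lines.innerLinesComplete) TableKind.STANDARD
    else TableKind.NON_STANDARD

fun main() {
    println(classify(TableLines(borderComplete = true, innerLinesComplete = true)))   // STANDARD
    println(classify(TableLines(borderComplete = true, innerLinesComplete = false)))  // NON_STANDARD
}
```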

Sample

The following example converts the tables in a PDF document to JSON.

kotlin
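// Create the converter for the source PDF (context and uri come from the caller).
val cPDFConvert = CPDFConverterTableToJson(context, uri, "")

// Start from the default table-to-JSON options.
val params = CPDFConvertTableToJsonOptions()

// Run the conversion; the output location, page selection, and the three
// callbacks are supplied by the caller.
val result: ConvertError = cPDFConvert.convert(
    outputDir, outputFilenameNoSuffix, params, pageArrays,
    onHandle = onHandleCal,
    onProgress = onProgressCal,
    onPost = onPostCal
)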
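
After the conversion returns, it can be useful to check that an output file was actually produced. The following is a minimal sketch under a stated assumption: that the converter writes a file named after `outputFilenameNoSuffix` with a `.json` suffix into `outputDir`. That naming is inferred from the parameter names and is not confirmed by this guide. `expectedOutputFile` is a hypothetical helper, not an SDK function.

```kotlin
import java.io.File

// Hypothetical helper: builds the path where the JSON result is assumed to land.
// Assumption: the converter appends ".json" to the suffix-less file name.
fun expectedOutputFile(outputDir: String, outputFilenameNoSuffix: String): File =
    File(outputDir, "$outputFilenameNoSuffix.json")

fun main() {
    val out = expectedOutputFile("converted", "invoice")
    // The existence check is only meaningful once the conversion has finished.
    println("${out.path} exists: ${out.exists()}")
}
```

If the file is missing, inspect the returned `ConvertError` value first, since the naming assumption above may not match what the SDK actually writes.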