ComPDFKit Document AI Table Recognition -- V1

By ComPDFKit | 2023 Sep 22
Tech Popularization AI Release

When content in images or scanned files needs to be reused, table extraction offers a convenient solution: it converts the table data in an image into data that can be handled directly, avoiding manual input. Because tables are complex, however, extraction results are often poor. To address this problem, ComPDFKit has launched an AI product, Document AI Table Recognition V1. It uses AI technology to improve table detection and table structure restoration, making data conversion and extraction easier and quicker.



Table Types


ComPDFKit Document AI Table Recognition classifies tables into two types and handles each accordingly: standard tables and non-standard tables.

         - Standard table: A table with complete borders and clear, complete inner lines, so no table lines need to be added manually to segment the content. Image of a standard table:

Standard Table


         - Non-standard table: A table that lacks borders or internal lines, or whose lines are unclear, so table lines would have to be added manually to divide the content. For instance, the table shown below is missing many horizontal and vertical lines. Non-standard table example:

Non-standard table



Principles of Table Recognition


In real life, tables vary greatly in size, type, and style: they differ in background fill, in how rows and columns are merged, and in the kind of text they contain. Moreover, existing documents include not only modern electronic files but also historical scans, handwritten drafts, and photos, whose document styles, lighting, and background textures all vary considerably. For these reasons, table recognition has long been a difficult research problem in the field of document recognition.


Document AI combines AI and traditional algorithms to achieve table recognition, mainly using the following capabilities and algorithms:


         - AI Layout Analysis: ComPDFKit Document AI has deeply optimized the capability of analyzing table layouts, which can accurately identify and define the position and region of tables in images, unaffected by image rotation angle and table structure complexity, providing a solid foundation for subsequent table recognition.

         - AI OCR Capability: OCR (Optical Character Recognition) is a technology that reads printed or handwritten text in images or photos and converts it into editable text. ComPDFKit Document AI OCR, which is based on deep learning, recognizes and extracts the text within tables with high precision, whether the content is simple digits or complex characters.

         - Image Processing Algorithm: This component extends the traditional methods of grayscale processing and binarization. By also taking color, texture, shape, and other cues into account, it helps recover the structural information of the table grid and so improves recognition accuracy.


Together with dedicated table recognition algorithms, these capabilities allow Document AI to recognize not only standard tables but also non-standard, irregular ones, which broadens the range of tables it can handle.
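
To make the two traditional steps named above more concrete, here is a minimal sketch of grayscale conversion followed by binarization with Otsu's method, written in plain Python. It is an illustration only, not ComPDFKit's implementation: the function names, the choice of Otsu's method, and the tiny sample image are all assumptions made for this example.

```python
def to_grayscale(rgb_image):
    """Map rows of (r, g, b) tuples to rows of 0-255 gray levels (BT.601 luma weights)."""
    return [
        [min(255, round(0.299 * r + 0.587 * g + 0.114 * b)) for r, g, b in row]
        for row in rgb_image
    ]


def otsu_threshold(gray):
    """Return the gray level that best separates dark from light pixels (Otsu's method)."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_score = 0, -1.0
    dark_count, dark_sum = 0, 0
    for t in range(256):
        dark_count += hist[t]
        dark_sum += t * hist[t]
        light_count = total - dark_count
        if dark_count == 0 or light_count == 0:
            continue  # one class is empty, so there is nothing to separate
        mean_dark = dark_sum / dark_count
        mean_light = (sum_all - dark_sum) / light_count
        # Proportional to the between-class variance for this split.
        score = dark_count * light_count * (mean_dark - mean_light) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def binarize(gray, threshold):
    """Mark pixels at or below the threshold as ink (1) and the rest as background (0)."""
    return [[1 if value <= threshold else 0 for value in row] for row in gray]


# Tiny demo: a dark horizontal stroke on a light background (hypothetical data).
WHITE, BLACK = (250, 250, 250), (10, 10, 10)
sample_rgb = [[WHITE] * 4, [BLACK] * 4, [WHITE] * 4]
sample_gray = to_grayscale(sample_rgb)
sample_binary = binarize(sample_gray, otsu_threshold(sample_gray))
```

In the demo, only the middle row ends up marked as ink. A production pipeline would more likely rely on an image library and on adaptive thresholding for unevenly lit photos.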



Processes of Recognition


Table recognition is a complex process that spans multiple fields, such as digital image processing, computer vision, and pattern recognition. It is what enables machines to identify and understand the content of table images accurately, which greatly improves the efficiency and accuracy of information processing. The general process for recognizing tables in images is:


         - Pre-process the input table image: The main function of pre-processing is to improve the quality of the picture to facilitate subsequent processing. This stage may include steps like grayscale conversion, scale adjustment, and image enhancement. For noisy images, noise reduction may be needed; for pictures that are too large or too small, image size adjustment may be necessary; for images with low contrast, enhancing the image contrast may be needed. The goal of this stage is to make the information in the picture as vivid and clear as possible to aid subsequent recognition work.

         - Filter non-line information through a morphological algorithm: Morphological algorithms extract structural information from images through a series of mathematical operations. Lines play a central role in a table image because they form the framework of the table, so this step extracts them. Morphological algorithms filter out non-line information, such as text and pictures, so that the lines can be examined and processed directly.

         - Detect lines and blocks to get basic table information: Lines can be located with methods such as edge detection and straight-line detection, which yield the position, length, and direction of each line. The blocks in the table must be detected as well. Blocks are usually formed by intersecting lines, and each block maps to a cell in the table. Detecting lines and blocks gives basic table information such as the number of rows and columns and the boundary positions of the cells.
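
The morphological filtering and line/block detection steps above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the product's algorithm: it assumes an already binarized image (1 = ink), a fully ruled table with no merged cells, and hypothetical helper names. It relies on the fact that a morphological opening with a 1 x k horizontal structuring element keeps exactly the horizontal runs of ink that are at least k pixels long.

```python
def keep_long_runs(binary, min_len):
    """Horizontal opening with a 1 x min_len element: keep only ink runs >= min_len."""
    out = []
    for row in binary:
        kept = [0] * len(row)
        start = None
        for x, value in enumerate(list(row) + [0]):  # trailing 0 closes a final run
            if value and start is None:
                start = x
            elif not value and start is not None:
                if x - start >= min_len:
                    kept[start:x] = [1] * (x - start)
                start = None
        out.append(kept)
    return out


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def line_positions(mask):
    """Group consecutive non-empty rows of a mask and return each group's center index."""
    centers, start = [], None
    for y, row in enumerate(mask + [[]]):  # trailing empty row closes a final group
        has_ink = any(row)
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            centers.append((start + y - 1) // 2)
            start = None
    return centers


def detect_grid(binary, min_len):
    """Return horizontal line ys, vertical line xs, and cell boxes (x0, y0, x1, y1)."""
    h_mask = keep_long_runs(binary, min_len)
    v_mask = transpose(keep_long_runs(transpose(binary), min_len))
    ys = line_positions(h_mask)
    xs = line_positions(transpose(v_mask))
    cells = [
        (xs[c], ys[r], xs[c + 1], ys[r + 1])
        for r in range(len(ys) - 1)
        for c in range(len(xs) - 1)
    ]
    return ys, xs, cells


# Demo: a 2 x 2 ruled table with a few short "text" strokes ('#' = ink).
picture = [
    "#########",
    "#.#.#...#",
    "#...#...#",
    "#########",
    "#...###.#",
    "#...#...#",
    "#########",
]
demo_binary = [[1 if ch == "#" else 0 for ch in line] for line in picture]
demo_ys, demo_xs, demo_cells = detect_grid(demo_binary, min_len=5)
```

The short strokes are discarded because none of them forms a run of five pixels in either direction, while the ruling lines survive. Non-standard tables, where lines are missing, would need additional cues (for example text alignment), which this sketch does not attempt.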

Processes of Recognition



Output the Table Data


The data format is often a crucial consideration, especially when processing and transferring tabular data. Table data is arranged in rows and columns, as a matrix or two-dimensional table, which lets users process and understand it quickly and intuitively. The output format directly affects how efficiently the data can be parsed and how usable it is. This article discusses the two most common output formats for table data: HTML and JSON.


HTML Format:

HTML, or HyperText Markup Language, is one of the main ways of presenting data. It is used not only to build web pages but also to output table data. With HTML output, users can see the layout and structure of a two-dimensional table directly, which makes it easier to visualize and debug recognition results.


HTML output is highly flexible, since HTML tags can describe the details of the table structure, such as rows, columns, borders, and colors. Moreover, embedded CSS or JavaScript can add styling and dynamic behavior, making the visualization both more practical and more attractive. A rich table visualization of this kind can help users spot patterns or anomalies in the data.
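
As an illustration of how HTML tags can express table structure, including merged cells, the following sketch renders a list of cell records as an HTML table. The field names (start_row, end_row, start_col, end_col, text) follow the JSON sample later in this article; the function name, the assumption that indices are 1-based and inclusive, and the sample cells are hypothetical.

```python
from html import escape


def cells_to_html(cells):
    """Render cell records as an HTML table, using rowspan/colspan for merged cells."""
    rows = {}
    for cell in sorted(cells, key=lambda c: (c["start_row"], c["start_col"])):
        rowspan = cell["end_row"] - cell["start_row"] + 1
        colspan = cell["end_col"] - cell["start_col"] + 1
        attrs = ""
        if rowspan > 1:
            attrs += f' rowspan="{rowspan}"'
        if colspan > 1:
            attrs += f' colspan="{colspan}"'
        # escape() keeps characters such as & and < from breaking the markup.
        rows.setdefault(cell["start_row"], []).append(
            f"<td{attrs}>{escape(cell['text'])}</td>"
        )
    n_rows = max(cell["end_row"] for cell in cells)
    body = "".join(
        "<tr>" + "".join(rows.get(r, [])) + "</tr>" for r in range(1, n_rows + 1)
    )
    return f'<table border="1">{body}</table>'


# Demo: a header cell merged across two columns, followed by two ordinary cells.
demo_cells = [
    {"start_row": 1, "end_row": 1, "start_col": 1, "end_col": 2, "text": "A & B"},
    {"start_row": 2, "end_row": 2, "start_col": 1, "end_col": 1, "text": "1"},
    {"start_row": 2, "end_row": 2, "start_col": 2, "end_col": 2, "text": "2"},
]
demo_html = cells_to_html(demo_cells)
```

Opening the resulting string in a browser shows the merged header above the two data cells, which is the kind of quick visual check the HTML format is well suited to.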


JSON Format:

JSON, or JavaScript Object Notation, is another widely used data format. It is a lightweight, language-independent data interchange format, so table data in JSON can be passed easily between programs written in different programming languages.


For programmers, JSON is a good choice for handling table data because it copes well with complicated data structures, such as nested arrays and objects. Libraries for parsing and generating JSON exist for virtually every mainstream programming language, which makes the data easy to use and to transmit. Furthermore, the key-value structure of JSON lets developers access exactly the part of the data they need without wading through irrelevant data.
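
As a small example of this key-based access, the sketch below parses a JSON table description with Python's standard json module and expands it into a two-dimensional grid. The field names mirror the JSON sample later in this article, but the sample data, the function name, and the assumption that row and column indices are 1-based and inclusive are illustrative only.

```python
import json

# Hypothetical recognition result with one header cell merged across two columns.
sample_json = """
{
  "rows": 2,
  "cols": 2,
  "table_cells": [
    {"start_row": 1, "end_row": 1, "start_col": 1, "end_col": 2, "text": "Total"},
    {"start_row": 2, "end_row": 2, "start_col": 1, "end_col": 1, "text": "7"},
    {"start_row": 2, "end_row": 2, "start_col": 2, "end_col": 2, "text": "8"}
  ]
}
"""


def table_to_grid(table):
    """Expand cell records into a rows x cols grid; merged cells repeat their text."""
    grid = [[None] * table["cols"] for _ in range(table["rows"])]
    for cell in table["table_cells"]:
        for r in range(cell["start_row"] - 1, cell["end_row"]):
            for c in range(cell["start_col"] - 1, cell["end_col"]):
                grid[r][c] = cell["text"]
    return grid


table = json.loads(sample_json)
grid = table_to_grid(table)
```

Repeating the text of a merged cell in every grid position it covers is one possible design choice; leaving the extra positions empty would work equally well for exports such as CSV.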


In summary, both HTML and JSON are effective formats for table data output, each with its own advantages. HTML is preferable when detailed visualization and debugging are needed, while JSON is the better choice when the data has to be transferred and processed efficiently across different programming languages. In real-world applications, users should choose the output format that fits their specific needs and context.



Example of Table Recognition


Original Image:

Original Image


Results in HTML:

Results in HTML


Results in JSON:

{
  "type": "table_with_line",                                //Table type
  "angle": 0,                                               //Table tilt angle
  "width": 489,                                             //Table width
  "height": 166,                                            //Table height
  "rows": 4,                                                //Row number
  "cols": 4,                                                //Column number
  "position": [114, 444, 603, 444, 603, 610, 114, 610],     //Table position
  "height_of_rows": [65, 30, 31, 36],                       //Height of each row in the table
  "width_of_cols": [122, 122, 118, 122],                    //Width of each column in the table
  "table_cells": [                                          //Information of all cells in the table
    {
      "start_row": 1,                                       //Start row number of the cell
      "end_row": 1,                                         //End row number of the cell
      "start_col": 1,                                       //Start column number of the cell
      "end_col": 1,                                         //End column number of the cell
      "text": "",                                           //Text content in the cell
      "position": [2, 2, 124, 2, 124, 67, 2, 67],           //Position of the cell
      "lines": []                                           //Text lines information in the cell
    },
    ...,                                                    //Remaining cells
    {
      "start_row": 4,
      "end_row": 4,
      "start_col": 4,
      "end_col": 4,
      "text": "8",
      "position": [364, 128, 486, 128, 486, 164, 364, 164],
      "lines": [
        {
          "text": "8",
          "score": 1,
          "position": [416, 129, 433, 129, 433, 158, 416, 158]
        }
      ]
    }
  ]
}
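
The "position" arrays in this result can be turned into simple bounding boxes as sketched below. The article does not document the coordinate convention, so two things here are assumptions inferred from the sample numbers only: that each position lists four corner points as x, y pairs, and that cell positions are measured from the table's top-left corner. The function names are hypothetical.

```python
def polygon_to_bbox(position):
    """Turn [x1, y1, x2, y2, x3, y3, x4, y4] into (left, top, right, bottom)."""
    xs = position[0::2]
    ys = position[1::2]
    return min(xs), min(ys), max(xs), max(ys)


def cell_bbox_on_page(table_position, cell_position):
    """Shift a cell box by the table origin (assumes cell coordinates are table-relative)."""
    table_left, table_top, _, _ = polygon_to_bbox(table_position)
    left, top, right, bottom = polygon_to_bbox(cell_position)
    return (table_left + left, table_top + top, table_left + right, table_top + bottom)


# Values copied from the sample result above.
table_position = [114, 444, 603, 444, 603, 610, 114, 610]
last_cell_position = [364, 128, 486, 128, 486, 164, 364, 164]

table_box = polygon_to_bbox(table_position)
table_width = table_box[2] - table_box[0]
table_height = table_box[3] - table_box[1]
last_cell_box = cell_bbox_on_page(table_position, last_cell_position)
```

With the sample values, the computed table size agrees with the "width" and "height" fields (489 and 166), and the shifted box of the last cell falls inside the table box, which is consistent with the assumed convention but does not prove it.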





Overall, ComPDFKit's Document AI Table Recognition V1 relies on advanced computer vision and deep learning technology to achieve broad, efficient, and accurate recognition of various types of tables. It can be used widely in fields such as education, finance, healthcare, and scientific research.


ComPDFKit is continuously optimizing its technology. Document AI Table Recognition has now reached version V2, which greatly improves the recognition accuracy for standard tables. The details will be explained in the next article.