We’re glad to announce the release of ComPDFKit Conversion SDK 1.9.0. Based on our clients’ feedback, this version brings new features and improvements, many of them powered by AI and OCR, that give your users a better experience. This update includes:
- Add an AI option for recognizing table regions when converting PDF files;
- Add an OCR option for table recognition when converting PDFs;
- Support recognizing and extracting highlight, underline, squiggly, and strikeout markups from PDF files as annotations when converting to other formats;
- Keep outlines and hyperlinks intact and working when converting PDFs to Word/Excel/PPT/HTML;
- Improve image extraction from tables with the new OCR option.
Accurate Table Recognition with AI
AI Table Recognition has long been available for scanned PDFs, which are essentially images. ComPDFKit Conversion SDK 1.9.0 now extends this option to regular PDF documents, so your users get more accurate results when recognizing tables in PDF files. When converting PDFs to Word/PPT/Excel/HTML, users can enable the AI option, which extracts the text from each table and recognizes its style and size.
This is especially useful for tables without regular borders, for example, tables that lack an outer frame. When converting such PDFs to other formats, users can enable AI Table Recognition in ComPDFKit Conversion SDK 1.9.0 and get noticeably better results than before.
Allow OCR for Precise Table Recognition
Similar to AI Table Recognition, ComPDFKit Conversion SDK 1.9.0 lets users turn on OCR mode when converting image-based PDF files to Word/PPT/Excel/HTML, which also makes table recognition more accurate. In particular, when users convert PDFs to Excel with Include PDF Background (OCR) enabled, inserted images are no longer treated as a single background covering all the text. With OCR selected, such images are now resized and placed inside the corresponding table cells.
Keep Outlines and Hyperlinks in Converted Files
When you convert PDF files to Word or PPT format, hyperlinks are now recognized as clickable links rather than plain text. With ComPDFKit Conversion SDK 1.9.0, users no longer lose access to the important reference links contained in a PDF they received. Likewise, because the document outline is preserved, readers can jump directly to any section they want.
Preserve Text Annotations in Converted Documents
Text annotations, including Highlight, Underline, Squiggly, and Strikethrough, are now preserved in converted Office documents and HTML. With ComPDFKit Conversion SDK 1.9.0, these markups are no longer converted to images; instead, they match the original annotations. In previous versions, because these annotations were rendered as images, they could not be shown or hidden. Now this is easy to do.
Add JSON Format for Extraction and Recognition
When a PDF document is converted to the new JSON format, its text can be extracted line by line together with the corresponding coordinates. Tables recognized in a PDF file can also be saved as JSON. This fulfills a frequent request from developers and gives them output that is easier to read.
Conclusion
ComPDFKit Conversion SDK 1.9.0 includes more optimizations. For full details, please refer to the ComPDFKit Conversion SDK changelog. We would really appreciate it if you could share your experience with our conversion SDK and give us any feedback on how we can improve it. With your support, ComPDFKit will keep delivering a better user experience.