Conversion Options: Contain Image & Annotation

Overview

When converting PDF documents to other formats, the ComPDFKit Conversion SDK offers two additional options: one controls whether images are included in the generated document, and the other controls whether annotations from the PDF file are retained.

  • With the "Include Images" option enabled, ComPDFKit Conversion SDK will extract the images from the PDF document and embed them in the corresponding pages and positions in the output file. For areas with overlapping images, ComPDFKit Conversion SDK merges these images into one and embeds it into the exact location on the corresponding page of the output file.
  • When the "Include Annotations" option is selected, most annotations are converted into raster images and embedded at their respective positions in the output document. Text markup annotations (highlights, underlines, strikeouts, and squiggly lines) are handled differently: in converted Word, PPT, and HTML documents they become the equivalent text formatting, applied to the corresponding text. Note that the converted result may not match the original exactly in every case.
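
The merging of overlapping images described above can be pictured as a grouping step over image bounding boxes. The following is a minimal sketch of that idea only, not the SDK's implementation: the Rect type and the mergeOverlapping function are hypothetical names introduced for illustration, and the sketch assumes each image is represented by an axis-aligned rectangle on the page.

```kotlin
// Hypothetical types for illustration; not part of the ComPDFKit API.
data class Rect(val left: Double, val top: Double, val right: Double, val bottom: Double) {
    // True when the two rectangles share some interior area.
    fun overlaps(other: Rect): Boolean =
        left < other.right && other.left < right &&
            top < other.bottom && other.top < bottom

    // Smallest rectangle that contains both rectangles.
    fun union(other: Rect): Rect = Rect(
        minOf(left, other.left), minOf(top, other.top),
        maxOf(right, other.right), maxOf(bottom, other.bottom)
    )
}

// Merges overlapping rectangles until no two results overlap.
fun mergeOverlapping(rects: List<Rect>): List<Rect> {
    val merged = mutableListOf<Rect>()
    for (rect in rects) {
        var current = rect
        // Absorb every already-merged region that overlaps the current one.
        // A union can create new overlaps, so repeat until nothing changes.
        var changed = true
        while (changed) {
            changed = false
            val iterator = merged.iterator()
            while (iterator.hasNext()) {
                val existing = iterator.next()
                if (existing.overlaps(current)) {
                    current = current.union(existing)
                    iterator.remove()
                    changed = true
                }
            }
        }
        merged.add(current)
    }
    return merged
}

fun main() {
    val regions = mergeOverlapping(
        listOf(
            Rect(0.0, 0.0, 100.0, 100.0),
            Rect(50.0, 50.0, 150.0, 150.0),   // overlaps the first image
            Rect(300.0, 300.0, 400.0, 400.0)  // isolated image
        )
    )
    regions.forEach(::println)
}
```

In this sketch the first two rectangles collapse into a single region and the third is left alone, which mirrors the "one merged image per overlapping area" behavior the option describes.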

In the ComPDFKit Conversion SDK, the image and annotation options are commonly used in the following format conversions:

  • PDF to Word
  • PDF to Excel
  • PDF to PPT
  • PDF to HTML
  • PDF to RTF
  • PDF to JSON
  • PDF to Markdown

About Text Markup Annotation

  • Highlight Annotations: Microsoft Word supports only 15 highlight colors, so when converting to Word format the highlighted text is given a background color that matches the original annotation's color, which reproduces the original document more faithfully. In PPT, native highlight tags are applied to the marked text. In HTML, a <span> tag is created around the highlighted text, with its background color set to match the original annotation.
  • Underline & Wavy Line (Squiggly) Annotations: When converting to Word and PPT formats, a straight or wavy underline is applied to the marked text. In HTML, the corresponding styles are applied to represent these annotations. If a piece of text is marked with both an underline and a wavy line, only one is applied during conversion, because in Word, PPT, and HTML a wavy line is essentially a style of underline.
  • Strikethrough Annotations: When converting to Word and PPT formats, a strikethrough is applied to the marked text. However, its color may not match the original PDF, because Word and PPT draw the strikethrough in the font color. In HTML, the strikethrough keeps the original color.
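
To make the HTML behavior above concrete, here is a minimal sketch of how each text markup type could map to an inline style on a <span>. It is an illustration under stated assumptions, not the markup the SDK actually emits: MarkupType, cssFor, escapeHtml, and markupSpan are hypothetical names, and carrying the annotation color on underlines is an assumption (the text above only confirms it for highlights and strikethroughs).

```kotlin
// Hypothetical model for illustration; not part of the ComPDFKit API.
enum class MarkupType { HIGHLIGHT, UNDERLINE, SQUIGGLY, STRIKEOUT }

// Returns an inline CSS declaration for one markup type and annotation color.
// Underline and squiggly both use the "underline" decoration line and differ
// only in style, which is why a single span can carry just one of them.
fun cssFor(type: MarkupType, colorHex: String): String = when (type) {
    MarkupType.HIGHLIGHT -> "background-color: $colorHex;"
    MarkupType.UNDERLINE -> "text-decoration: underline solid $colorHex;"
    MarkupType.SQUIGGLY -> "text-decoration: underline wavy $colorHex;"
    MarkupType.STRIKEOUT -> "text-decoration: line-through solid $colorHex;"
}

// Minimal escaping so the marked text is safe inside element content.
fun escapeHtml(text: String): String =
    text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

// Wraps the marked text in a span that carries the style for its markup type.
fun markupSpan(text: String, type: MarkupType, colorHex: String): String =
    "<span style=\"${cssFor(type, colorHex)}\">${escapeHtml(text)}</span>"

fun main() {
    println(markupSpan("important", MarkupType.HIGHLIGHT, "#FFFF00"))
    println(markupSpan("a < b", MarkupType.STRIKEOUT, "#FF0000"))
}
```

Because CSS lets the decoration color be set independently of the text color, an HTML strikethrough can keep the annotation's color, whereas Word and PPT tie it to the font color.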

Sample

This sample demonstrates how to use the ComPDFKit Conversion SDK to convert a PDF document to a Word document with both options enabled, so that images and annotations are included.

kotlin
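// Placeholders: replace with your own input path, PDF password, and output file name.
val inputFilePath = "***"
val password = "***"
val outputFileName = "***"

val wordOptions = WordOptions()
wordOptions.containImage = true       // include images in the output
wordOptions.containAnnotation = true  // retain annotations from the PDF
wordOptions.enableAiLayout = true
wordOptions.enableOcr = false

// Start the conversion; the returned value reports the outcome.
val error = ComPDFKitConverter.startPDFToWord(inputFilePath, password, outputFileName, wordOptions)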