PDF Conversion

To convert a PDF file to an Office or other format, send a POST request to the /file/handle endpoint of the processor, including the PDF file as input and the file processing parameters. Before you begin, make sure ComPDFKit Processor is running.

For more information about multipart requests, please refer to the API section.

Convert using local PDF file

Send a multipart request to /file/handle and attach the PDF file:

shell

curl -f -X POST http://localhost:7000/file/handle \
-H "Content-Type: multipart/form-data" \
-F file=@"document.pdf" \
-F executeType="pdf/docx" \
-F password="file open password" \
-F parameter="{ \"contentOptions\": \"2\", \"worksheetOptions\": \"1\"}" \
> result.docx
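For callers who prefer not to shell out to curl, the same request can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the helper names build_multipart and convert are hypothetical, and it assumes, as in the curl example above, that the processor listens on localhost:7000 and returns the converted file as the response body.

```python
import json
import os
import uuid
import urllib.request


def build_multipart(fields, file_name, file_bytes, boundary=None):
    """Encode text fields plus one PDF file part as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        chunks.append(f"--{boundary}\r\n".encode())
        chunks.append(
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode()
        )
        chunks.append(str(value).encode("utf-8") + b"\r\n")
    # The file part uses the field name "file", as in the curl example.
    chunks.append(f"--{boundary}\r\n".encode())
    chunks.append(
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'.encode()
    )
    chunks.append(b"Content-Type: application/pdf\r\n\r\n")
    chunks.append(file_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode())
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def convert(pdf_path, out_path, execute_type="pdf/docx", parameter=None,
            password=None, url="http://localhost:7000/file/handle"):
    """POST a local PDF to the processor and save the response body."""
    fields = {"executeType": execute_type}
    if password is not None:
        fields["password"] = password
    if parameter is not None:
        # The parameter field carries the options as a JSON string.
        fields["parameter"] = json.dumps(parameter)
    with open(pdf_path, "rb") as pdf:
        body, content_type = build_multipart(
            fields, os.path.basename(pdf_path), pdf.read()
        )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx, similar to `curl -f`.
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

With the processor running, a call such as convert("document.pdf", "result.docx") would mirror the curl command. The boundary is generated per request, so the Content-Type header is built by the helper rather than set by hand.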

PDF Conversion Parameters

This section describes the parameter settings currently supported by ComPDFKit Processor for PDF file conversion and processing.

PDF to Word

Note: Each conversion function accepts its own parameters when you upload the file. The rest of the steps remain the same.

PDF to Word:

java

{
  "enableAiLayout": 1,
  "isContainImg": 1,
  "isContainAnnot": 1,
  "enableOcr": 0,
  "ocrRecognitionLang": "AUTO",
  "pageRanges": "1,2,3-5",
  "pageLayoutMode": "e_Flow",
  "formulaToImage": 1,
  "ocrOption": "ALL",
  "isOutputDocumentPerPage": 0,
  "containPageBackgroundImage": 1
}

Required parameters

enableAiLayout: Whether to enable AI layout analysis (0: not enabled; 1: enabled). Default 1.

isContainImg: Whether to include images during conversion (0: not enabled; 1: enabled). Default 1.

isContainAnnot: Whether to include annotations during conversion (0: not enabled; 1: enabled). Default is 1.

enableOcr: Whether to use OCR (0: not enabled; 1: enabled). Default is 0.

ocrRecognitionLang: OCR recognition language. Supported values: AUTO (automatic), CHINESE (Simplified Chinese), CHINESE_TRAD (Traditional Chinese), ENGLISH (English), KOREAN (Korean), JAPANESE (Japanese), LATIN (Latin), DEVANAGARI (Devanagari), CYRILLIC (Cyrillic), ARABIC (Arabic), TAMIL (Tamil), TELUGU (Telugu), KANNADA (Kannada), THAI (Thai), GREEK (Greek), ESLAV (Slavic languages). Default is AUTO.

pageRanges: Pages to convert, numbered from 1 (for example, "1,2,3-5"). Default is empty.

pageLayoutMode: Layout mode, either e_Box or e_Flow. Default is e_Flow.

Layout differences

Word's flow layout: Ideal for editing. As you edit, the content reflows dynamically to fit its new position. However, a Word file may display differently across software or app versions, which makes this layout unsuitable for documents that require precise rendering, such as electronic files or certificates.

PDF's fixed page layout: Ensures a stable, uniform appearance and print quality across all devices. The content and formatting are locked upon creation, so changes are difficult to make without affecting the overall layout. It is preferred for formal documentation such as business reports and official electronic records.

formulaToImage: Whether to convert formulas to images (0: not enabled; 1: enabled). Default is 0. If enabled, formulas are saved as images; if not, they are saved as text. For complex formulas, saving as images is recommended.

ocrOption: OCR recognition range. Default is ALL. Supported values:

INVALID_CHARACTER: Recognize illegal characters in PDF documents.
SCAN_PAGE: Recognize scanned pages in PDF documents.
INVALID_CHARACTERAND_SCAN_PAGE: Recognize illegal characters and scanned pages in PDF documents.
ALL: Recognize all characters on all pages.

isOutputDocumentPerPage: Whether to output one document per page (0: disabled; 1: enabled). Default 0.

containPageBackgroundImage: Whether to include page background images during conversion; this setting is only effective when using OCR (0: disabled; 1: enabled). Default 1.
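The defaults and allowed values described above can be collected in a small helper that builds the JSON string for the parameter form field. This is a sketch under stated assumptions: the helper name word_parameter is hypothetical, and an empty pageRanges is represented here by leaving the key out, which the documentation does not confirm.

```python
import json

# Defaults copied from the parameter descriptions above.
# Assumption: "Default is empty" for pageRanges is modelled as None,
# and the key is omitted from the output when it is empty.
WORD_DEFAULTS = {
    "enableAiLayout": 1,
    "isContainImg": 1,
    "isContainAnnot": 1,
    "enableOcr": 0,
    "ocrRecognitionLang": "AUTO",
    "pageRanges": None,
    "pageLayoutMode": "e_Flow",
    "formulaToImage": 0,
    "ocrOption": "ALL",
    "isOutputDocumentPerPage": 0,
    "containPageBackgroundImage": 1,
}

# Parameters documented as 0/1 switches.
FLAG_KEYS = (
    "enableAiLayout", "isContainImg", "isContainAnnot", "enableOcr",
    "formulaToImage", "isOutputDocumentPerPage", "containPageBackgroundImage",
)

# Enumerated values, spelled exactly as in the documentation.
ALLOWED_VALUES = {
    "pageLayoutMode": {"e_Box", "e_Flow"},
    "ocrOption": {
        "INVALID_CHARACTER", "SCAN_PAGE",
        "INVALID_CHARACTERAND_SCAN_PAGE", "ALL",
    },
}


def word_parameter(**overrides):
    """Return the PDF to Word options as a JSON string, validating overrides."""
    unknown = sorted(set(overrides) - set(WORD_DEFAULTS))
    if unknown:
        raise ValueError(f"unknown parameter(s): {', '.join(unknown)}")
    params = {**WORD_DEFAULTS, **overrides}
    for key in FLAG_KEYS:
        value = params[key]
        if isinstance(value, bool) or value not in (0, 1):
            raise ValueError(f"{key} must be 0 or 1")
    for key, allowed in ALLOWED_VALUES.items():
        if params[key] not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}")
    if not params["pageRanges"]:
        del params["pageRanges"]
    return json.dumps(params)
```

For example, word_parameter(enableOcr=1, pageRanges="1,2,3-5") produces a string suitable for the parameter form field, while a misspelled key or an unsupported pageLayoutMode fails locally before any request is sent.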

PDF to Excel

Note: Different parameters can be used when uploading files for different functions. The rest of the steps remain the same.

PDF to Excel:

java

{
  "enableAiLayout": 1,
  "isContainImg": 1,
  "isContainAnnot": 1,
  "enableOcr": 0,
  "ocrRecognitionLang": "AUTO",
  "pageRanges": "1,2,3-5",
  "excelAllContent": 1,
  "excelWorksheetOption": "e_ForTable",
  "ocrOption": "ALL",
  "isOutputDocumentPerPage": 0
}

Required parameters

enableAiLayout: Whether to enable AI layout analysis (0: not enabled; 1: enabled). Default 1.

isContainImg: Whether to include images during conversion (0: not enabled; 1: enabled). Default 1.

isContainAnnot: Whether to include annotations during conversion (0: Disable; 1: Enable). Default is 1.

enableOcr: Whether to use OCR (0: Disable; 1: Enable). Default is 0.

pageRanges: Pages to convert, numbered from 1 (for example, "1,2,3-5"). Default is empty.

excelAllContent: Whether to convert all content (0: no; 1: yes). Default 1.

excelWorksheetOption: Excel worksheet option. e_ForTable: each worksheet contains only one table; e_ForPage: each worksheet contains the tables of one PDF page; e_ForDocument: one worksheet contains the tables of the entire PDF document. Default is e_ForTable.

ocrOption: OCR recognition range. The supported values are those listed under PDF to Word: INVALID_CHARACTER, SCAN_PAGE, INVALID_CHARACTERAND_SCAN_PAGE, and ALL.

isOutputDocumentPerPage: Whether to output one document per page (0: disabled; 1: enabled). Default 0.
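The three excelWorksheetOption values lend themselves to an enumeration, so that a typo is caught before the request is sent. The sketch below uses hypothetical names (WorksheetOption, excel_parameter) and assumes that parameters left out of the JSON fall back to the defaults listed above.

```python
import json
from enum import Enum


class WorksheetOption(Enum):
    """Values of excelWorksheetOption as documented above."""
    FOR_TABLE = "e_ForTable"        # one table per worksheet
    FOR_PAGE = "e_ForPage"          # tables of one PDF page per worksheet
    FOR_DOCUMENT = "e_ForDocument"  # tables of the whole document in one worksheet


def excel_parameter(worksheet=WorksheetOption.FOR_TABLE, all_content=True,
                    page_ranges=None):
    """Return the Excel-specific options as a JSON string.

    Assumption: omitted keys take their documented defaults on the server.
    """
    params = {
        "excelAllContent": 1 if all_content else 0,
        "excelWorksheetOption": WorksheetOption(worksheet).value,
    }
    if page_ranges:
        params["pageRanges"] = page_ranges
    return json.dumps(params)
```

Calling excel_parameter(WorksheetOption.FOR_PAGE, page_ranges="1,2,3-5") yields a string for the parameter form field; passing a value outside the enumeration raises ValueError.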

PDF to Slide

Note: Different parameters can be used when uploading files for different functions. The rest of the steps remain the same.

PDF to Slide:

java

{
  "enableAiLayout": 1,
  "isContainImg": 1,
  "isContainAnnot": 1,
  "enableOcr": 0,
  "ocrRecognitionLang": "AUTO",
  "pageRanges": "1,2,3-5",
  "ocrOption": "ALL",
  "isOutputDocumentPerPage": 0,
  "containPageBackgroundImage": 1
}

Required parameters

enableAiLayout: Whether to enable AI layout analysis (0: not enabled; 1: enabled). Default 1.

isContainImg: Whether to include images during conversion (0: not enabled; 1: enabled). Default 1.

isContainAnnot: Whether to include annotations during conversion (0: not enabled; 1: enabled). Default 1.

enableOcr: Whether to use OCR (0: Disable; 1: Enable). Default is 0.

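pageRanges: Pages to convert, numbered from 1 (for example, "1,2,3-5"). Default is empty.

ocrOption: OCR recognition range. The supported values are those listed under PDF to Word: INVALID_CHARACTER, SCAN_PAGE, INVALID_CHARACTERAND_SCAN_PAGE, and ALL.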

isOutputDocumentPerPage: Whether to output one document per page (0: disabled; 1: enabled). Default 0.

containPageBackgroundImage: Whether to include page background images during conversion; this setting is only effective when using OCR (0: disabled; 1: enabled). Default 1.
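The pageRanges value used in the examples ("1,2,3-5") can be checked on the client before upload. The sketch below rests on an assumption about the syntax that the documentation implies but does not spell out: entries are separated by commas, and "3-5" denotes the inclusive range of pages 3 to 5. The function name expand_page_ranges is hypothetical.

```python
def expand_page_ranges(spec):
    """Expand a pageRanges string into a list of 1-based page numbers.

    Assumed syntax: comma-separated entries, each a single page ("2")
    or an inclusive range ("3-5"). Raises ValueError on malformed input.
    """
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError("empty entry in page range")
        if "-" in part:
            first_text, last_text = part.split("-", 1)
            first, last = int(first_text), int(last_text)
        else:
            first = last = int(part)
        # Page numbers start from 1, and a range must not run backwards.
        if first < 1 or last < first:
            raise ValueError(f"invalid page range: {part!r}")
        pages.extend(range(first, last + 1))
    return pages
```

Under that assumption, expand_page_ranges("1,2,3-5") returns [1, 2, 3, 4, 5], and inputs such as "0", "5-3", or "3-" raise ValueError.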

PDF to HTML

Note: Different parameters can be used when uploading files for different functions. The rest of the steps remain the same.

PDF to HTML:

java

{
  "enableAiLayout": 1,
  "isContainImg": 1,
  "isContainAnnot": 1,
  "enableOcr": 0,
  "ocrRecognitionLang": "AUTO",
  "pageRanges": "1,2,3-5",
  "pageLayoutMode": "e_Flow",
  "htmlOption": "e_SinglePage",
  "ocrOption": "ALL",
  "isOutputDocumentPerPage": 0,
  "containPageBackgroundImage": 1
}

Required parameters

enableAiLayout: Whether to enable AI layout analysis (0: not enabled; 1: enabled). Default 1.

isContainImg: Whether to include images during conversion (0: not enabled; 1: enabled). Default 1.

isContainAnnot: Whether to include annotations during conversion (0: not enabled; 1: enabled). Default 1.

enableOcr: Whether to use OCR (0: Disable; 1: Enable). Default is 0.

pageRanges: Pages to convert, numbered from 1 (for example, "1,2,3-5"). Default is empty.

pageLayoutMode: Layout mode, either e_Box or e_Flow. Default is e_Flow.

htmlOption: HTML output option. Default is e_SinglePage. Supported values:

e_SinglePage: Convert the entire PDF file into a single HTML file.
e_SinglePageWithBookmark: Convert the PDF file into a single HTML file with an outline for navigation at the beginning of the HTML page.
e_MultiPage: Convert the PDF file into multiple HTML files.
e_MultiPageWithBookmark: Convert the PDF file into multiple HTML files. Each HTML file corresponds to a PDF page, and users can navigate to the next HTML file via a link at the bottom of the HTML page.

ocrOption: OCR recognition range. The supported values are those listed under PDF to Word: INVALID_CHARACTER, SCAN_PAGE, INVALID_CHARACTERAND_SCAN_PAGE, and ALL.

isOutputDocumentPerPage: Whether to output one document per page (0: disabled; 1: enabled). Default 0.

containPageBackgroundImage: Whether to include page background images during conversion; this setting is only effective when using OCR (0: disabled; 1: enabled). Default 1.
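The four htmlOption values combine two independent choices: one file or several, and with or without navigation. A lookup table keeps that mapping explicit. In this sketch the names HTML_OPTIONS and html_parameter, and the flags multi_page and with_bookmark, are hypothetical, and it assumes that parameters left out of the JSON take their documented defaults.

```python
import json

# (multi_page, with_bookmark) -> documented htmlOption value
HTML_OPTIONS = {
    (False, False): "e_SinglePage",
    (False, True): "e_SinglePageWithBookmark",
    (True, False): "e_MultiPage",
    (True, True): "e_MultiPageWithBookmark",
}


def html_parameter(multi_page=False, with_bookmark=False,
                   page_layout_mode="e_Flow"):
    """Return the HTML-specific options as a JSON string."""
    if page_layout_mode not in ("e_Box", "e_Flow"):
        raise ValueError("page_layout_mode must be 'e_Box' or 'e_Flow'")
    option = HTML_OPTIONS[(bool(multi_page), bool(with_bookmark))]
    return json.dumps({
        "pageLayoutMode": page_layout_mode,
        "htmlOption": option,
    })
```

With no arguments the helper reproduces the documented defaults (e_Flow and e_SinglePage); html_parameter(multi_page=True, with_bookmark=True) selects e_MultiPageWithBookmark.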

PDF to RTF

Note: Different parameters can be used when uploading files for each specific function. The other steps remain consistent.

PDF to RTF:

java

{
  "enableAiLayout": 1,
  "isContainImg": 1,
  "isContainAnnot": 1,
  "enableOcr": 0,
  "ocrRecognitionLang": "AUTO",
  "pageRanges": "1,2,3-5",
  "ocrOption": "ALL",
  "isOutputDocumentPerPage": 0,
  "containPageBackgroundImage": 1
}