
PDF Conversion

To convert a PDF file to an Office document or another format, send a request to /file/handle that includes the PDF file and the file processing parameters. Before you begin, make sure ComPDFKit Processor is running.

Send a POST request with multipart form data to the processor's /file/handle endpoint. For more information about multipart requests, refer to the API section.

Convert a local PDF file

Send a multipart request to /file/handle with the PDF file attached:

shell
curl -f -X POST http://localhost:7000/file/handle \
-H "Content-Type: multipart/form-data" \
-F file=@"document.pdf" \
-F executeType="pdf/docx" \
-F password="file open password" \
-F parameter="{ \"enableAiLayout\": 1, \"pageLayoutMode\": \"e_Flow\" }" \
> result.docx
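In scripts, the request above is easier to reuse when wrapped in a function. The following is a minimal sketch and is not part of ComPDFKit. The function name convert_pdf and the PROCESSOR_URL variable are hypothetical. The function sends executeType, parameter, and password with curl's --form-string option, which passes the JSON literally. It also deletes the output file when curl -f reports an error, because the shell redirection creates that file before curl runs.

```shell
# Hypothetical helper (not part of ComPDFKit) wrapping the /file/handle request.
# Usage: convert_pdf <input.pdf> <executeType> <parameter-json> <output> [password]
# PROCESSOR_URL is an assumed variable name; it defaults to the local processor.
convert_pdf() {
  input=$1 execute_type=$2 params=$3 output=$4 password=${5:-}
  url="${PROCESSOR_URL:-http://localhost:7000}/file/handle"

  # --form-string sends the value literally, so curl does not treat
  # characters such as @, < or ; in the JSON as file references or attributes.
  set -- -f -sS -X POST "$url" \
    -F "file=@$input" \
    --form-string "executeType=$execute_type" \
    --form-string "parameter=$params"
  # Only send a password field when one was given.
  if [ -n "$password" ]; then
    set -- "$@" --form-string "password=$password"
  fi

  if curl "$@" > "$output"; then
    return 0
  fi
  # The redirection created the output file before curl ran; a failed request
  # leaves it empty or partial, so delete it rather than keep a bogus result.
  rm -f "$output"
  return 1
}
```

For example, `convert_pdf document.pdf pdf/docx '{"enableAiLayout": 1}' result.docx` sends the same kind of request as the command above, without a password field.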

PDF Conversion Parameters

This section describes the file processing parameters supported by ComPDFKit Processor. Format-specific parameters are available for PDF to Word, Excel, PPT, HTML, RTF, image (PNG, JPG), CSV, JSON, TXT, and editable PDF. For other functions, you can omit the parameters, and the default values are used to process the document.

PDF to Word

Note: Each conversion type accepts its own parameters in the upload request; the other steps are the same.

PDF to Word:

json
{
  "enableAiLayout": 1,
  "isContainImg": 1,
  "isContainAnnot": 1,
  "enableOcr": 0,
  "ocrLanguage": 8,
  "pageRanges": "1,2,3-5",
  "pageLayoutMode": "e_Flow",
  "formulaToImage": 0
}

Parameters

enableAiLayout: Whether to enable AI layout analysis (0: not enabled; 1: enabled). Default is 1.

isContainImg: Whether to include images during conversion (0: not enabled; 1: enabled). Default is 1.

isContainAnnot: Whether to include annotations during conversion (0: not enabled; 1: enabled). Default is 1.

enableOcr: Whether to use OCR (0: not enabled; 1: enabled). Default is 0.

ocrLanguage: OCR recognition language. 1: CHINESE; 2: CHINESE_TRA; 3: ENGLISH; 4: KOREAN; 5: JAPANESE; 6: LATIN; 7: DEVANAGARI; 8: AUTO. Default is 8.

pageRanges: Pages to convert, numbered from 1 (for example, "1,2,3-5"). Default is empty.

pageLayoutMode: Layout mode, either e_Box or e_Flow. Default is e_Flow.

formulaToImage: Whether to convert formulas to images (0: not enabled; 1: enabled). Default is 0.
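The pageRanges syntax used in the examples combines single pages and ranges. The sketch below expands such a string into individual page numbers, which can help you check a value before sending it. It assumes that a range such as 3-5 is inclusive and that spaces are not allowed. The function name is hypothetical. An empty string, the documented default, is rejected here because its expansion depends on the document's page count.

```shell
# Hypothetical helper: expand a pageRanges string into single page numbers.
# Assumes inclusive ranges, pages numbered from 1, and no spaces.
expand_page_ranges() {
  old_ifs=$IFS
  set -f                 # no filename globbing while splitting
  IFS=,
  set -- $1              # "1,2,3-5" -> "1" "2" "3-5"
  IFS=$old_ifs
  set +f
  [ $# -gt 0 ] || return 1
  pages=""
  for part in "$@"; do
    # Reject empty items, stray characters, page 0 or leading zeros,
    # and dashes that do not sit between two numbers.
    case $part in
      ''|*[!0-9-]*|0*|*-0*|-*|*-|*-*-*) return 1 ;;
      *-*) first=${part%-*} last=${part#*-} ;;
      *)   first=$part last=$part ;;
    esac
    # A range must not run backwards.
    [ "$first" -le "$last" ] || return 1
    p=$first
    while [ "$p" -le "$last" ]; do
      pages="$pages $p"
      p=$((p + 1))
    done
  done
  echo "${pages# }"
}
```

For example, `expand_page_ranges "1,2,3-5"` prints `1 2 3 4 5`, and `expand_page_ranges "5-3"` fails.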

PDF to Excel

Note: Each conversion type accepts its own parameters in the upload request; the other steps are the same.

PDF to Excel:

json
{
  "enableAiLayout": 1,
  "isContainImg": 1,
  "isContainAnnot": 1,
  "enableOcr": 0,
  "ocrLanguage": 8,
  "pageRanges": "1,2,3-5",
  "excelAllContent": 1,
  "excelWorksheetOption": "e_ForTable"
}

Parameters

enableAiLayout: Whether to enable AI layout analysis (0: not enabled; 1: enabled). Default is 1.

isContainImg: Whether to include images during conversion (0: not enabled; 1: enabled). Default is 1.

isContainAnnot: Whether to include annotations during conversion (0: not enabled; 1: enabled). Default is 1.

enableOcr: Whether to use OCR (0: not enabled; 1: enabled). Default is 0.

ocrLanguage: OCR recognition language. 1: CHINESE; 2: CHINESE_TRA; 3: ENGLISH; 4: KOREAN; 5: JAPANESE; 6: LATIN; 7: DEVANAGARI; 8: AUTO. Default is 8.

pageRanges: Pages to convert, numbered from 1 (for example, "1,2,3-5"). Default is empty.

excelAllContent: Whether to convert all content (0: no; 1: yes). Default is 1.

excelWorksheetOption: How tables are distributed across worksheets. e_ForTable: each worksheet contains only one table; e_ForPage: each worksheet contains the tables of one PDF page; e_ForDocument: one worksheet contains the tables of the entire PDF document. Default is e_ForTable.
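Every parameter has a documented default. A request can therefore presumably send only the values that differ from those defaults. This is an assumption, because the examples above always send the full set. The sketch below builds the Excel-specific part of the parameter JSON and rejects any worksheet option other than the three documented values. The function name and the friendly names table, page, and document are hypothetical.

```shell
# Hypothetical helper: build the Excel-specific parameter JSON.
# $1: sheet layout, one of table | page | document (friendly names are made up)
# $2: excelAllContent, 1 (default) or 0
excel_params() {
  case $1 in
    table)    option=e_ForTable ;;
    page)     option=e_ForPage ;;
    document) option=e_ForDocument ;;
    *) echo "unknown sheet layout: $1" >&2; return 1 ;;
  esac
  all_content=${2:-1}
  case $all_content in
    0|1) ;;
    *) echo "excelAllContent must be 0 or 1" >&2; return 1 ;;
  esac
  printf '{"excelAllContent": %s, "excelWorksheetOption": "%s"}\n' \
    "$all_content" "$option"
}
```

For example, `excel_params page 0` prints `{"excelAllContent": 0, "excelWorksheetOption": "e_ForPage"}`.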

PDF to PPT

Note: Each conversion type accepts its own parameters in the upload request; the other steps are the same.

PDF to PPT:

json
{
  "enableAiLayout": 1,
  "isContainImg": 1,
  "isContainAnnot": 1,
  "enableOcr": 0,
  "ocrLanguage": 8,
  "pageRanges": "1,2,3-5"
}

Parameters

enableAiLayout: Whether to enable AI layout analysis (0: not enabled; 1: enabled). Default is 1.

isContainImg: Whether to include images during conversion (0: not enabled; 1: enabled). Default is 1.

isContainAnnot: Whether to include annotations during conversion (0: not enabled; 1: enabled). Default is 1.

enableOcr: Whether to use OCR (0: not enabled; 1: enabled). Default is 0.

ocrLanguage: OCR recognition language. 1: CHINESE; 2: CHINESE_TRA; 3: ENGLISH; 4: KOREAN; 5: JAPANESE; 6: LATIN; 7: DEVANAGARI; 8: AUTO. Default is 8.

pageRanges: Pages to convert, numbered from 1 (for example, "1,2,3-5"). Default is empty.
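ocrLanguage takes a numeric code rather than a language name. A small lookup like the one below keeps scripts readable. It is a sketch with a hypothetical function name. It matches names case-insensitively and maps an empty value to 8 (AUTO), the documented default.

```shell
# Hypothetical helper: translate a language name into the ocrLanguage code.
ocr_language_code() {
  # Upper-case the input so "english" and "ENGLISH" are treated the same.
  name=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  case $name in
    CHINESE)     echo 1 ;;
    CHINESE_TRA) echo 2 ;;
    ENGLISH)     echo 3 ;;
    KOREAN)      echo 4 ;;
    JAPANESE)    echo 5 ;;
    LATIN)       echo 6 ;;
    DEVANAGARI)  echo 7 ;;
    AUTO|'')     echo 8 ;;   # empty input falls back to the documented default
    *) echo "unsupported OCR language: $1" >&2; return 1 ;;
  esac
}
```

For a scanned presentation in Japanese, `printf '{"enableOcr": 1, "ocrLanguage": %s}\n' "$(ocr_language_code japanese)"` prints `{"enableOcr": 1, "ocrLanguage": 5}`.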

PDF to HTML

Note: Each conversion type accepts its own parameters in the upload request; the other steps are the same.

PDF to HTML:

json
{
  "enableAiLayout": 1,
  "isContainImg": 1,
  "isContainAnnot": 1,
  "enableOcr": 0,
  "ocrLanguage": 8,
  "pageRanges": "1,2,3-5",
  "pageLayoutMode": "e_Flow",
  "htmlOption": "e_SinglePage"
}

Parameters

enableAiLayout: Whether to enable AI layout analysis (0: not enabled; 1: enabled). Default is 1.

isContainImg: Whether to include images during conversion (0: not enabled; 1: enabled). Default is 1.

isContainAnnot: Whether to include annotations during conversion (0: not enabled; 1: enabled). Default is 1.

enableOcr: Whether to use OCR (0: not enabled; 1: enabled). Default is 0.

ocrLanguage: OCR recognition language. 1: CHINESE; 2: CHINESE_TRA; 3: ENGLISH; 4: KOREAN; 5: JAPANESE; 6: LATIN; 7: DEVANAGARI; 8: AUTO. Default is 8.

pageRanges: Pages to convert, numbered from 1 (for example, "1,2,3-5"). Default is empty.

pageLayoutMode: Layout mode, either e_Box or e_Flow. Default is e_Flow.

htmlOption: Structure of the HTML output. e_SinglePage: converts the entire PDF file into a single HTML file. e_SinglePageWithBookmark: converts the PDF file into a single HTML file with an outline for navigation at the beginning of the page. e_MultiPage: converts the PDF file into multiple HTML files. e_MultiPageWithBookmark: converts the PDF file into multiple HTML files, one per PDF page, with a link at the bottom of each page to the next HTML file. Default is e_SinglePage.
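The four htmlOption values combine two independent choices. The first is whether to produce one HTML file or several. The second is whether to add navigation: an outline for a single file, or next-page links for multiple files. The sketch below derives the value from two 0/1 flags. The function name is hypothetical.

```shell
# Hypothetical helper: choose htmlOption from two 0/1 flags.
# $1: 1 for multiple HTML files, 0 for a single file
# $2: 1 to add navigation (outline or next-page links), 0 for none
html_option() {
  case "${1:-},${2:-}" in
    0,0) echo e_SinglePage ;;
    0,1) echo e_SinglePageWithBookmark ;;
    1,0) echo e_MultiPage ;;
    1,1) echo e_MultiPageWithBookmark ;;
    *) echo "usage: html_option <multi 0|1> <navigation 0|1>" >&2; return 1 ;;
  esac
}
```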

PDF to RTF

Note: Each conversion type accepts its own parameters in the upload request; the other steps are the same.

PDF to RTF:

json
{
  "enableAiLayout": 1,
  "isContainImg": 1,
  "isContainAnnot": 1,
  "enableOcr": 0,
  "ocrLanguage": 8,
  "pageRanges": "1,2,3-5"
}

Parameters

enableAiLayout: Whether to enable AI layout analysis (0: not enabled; 1: enabled). Default is 1.

isContainImg: Whether to include images during conversion (0: not enabled; 1: enabled). Default is 1.

isContainAnnot: Whether to include annotations during conversion (0: not enabled; 1: enabled). Default is 1.

enableOcr: Whether to use OCR (0: not enabled; 1: enabled). Default is 0.

ocrLanguage: OCR recognition language. 1: CHINESE; 2: CHINESE_TRA; 3: ENGLISH; 4: KOREAN; 5: JAPANESE; 6: LATIN; 7: DEVANAGARI; 8: AUTO. Default is 8.

pageRanges: Pages to convert, numbered from 1 (for example, "1,2,3-5"). Default is empty.
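PDF to RTF, like PDF to PPT, uses only the common parameters, so the same parameter JSON can be reused for a whole folder. The sketch below converts every PDF in a directory and removes the output of any failed request. This section does not state the executeType value for RTF, so the function takes executeType and the output extension as arguments. The function name and the PROCESSOR_URL variable are hypothetical.

```shell
# Hypothetical helper: convert every *.pdf in a directory with one executeType.
# Usage: batch_convert <dir> <executeType> <output-extension> <parameter-json>
# PROCESSOR_URL is an assumed variable name; it defaults to the local processor.
batch_convert() {
  dir=$1 execute_type=$2 ext=$3 params=$4
  url="${PROCESSOR_URL:-http://localhost:7000}/file/handle"
  failed=0
  for pdf in "$dir"/*.pdf; do
    [ -e "$pdf" ] || continue          # no matches: the pattern stays literal
    out="${pdf%.pdf}.$ext"
    if curl -f -sS -X POST "$url" \
        -F "file=@$pdf" \
        --form-string "executeType=$execute_type" \
        --form-string "parameter=$params" > "$out"; then
      echo "converted: $pdf -> $out"
    else
      rm -f "$out"                     # drop the empty or partial result
      echo "failed: $pdf" >&2
      failed=$((failed + 1))
    fi
  done
  # Succeed only if every file converted.
  [ "$failed" -eq 0 ]
}
```

For example, `batch_convert ./in pdf/docx docx '{"enableAiLayout": 1}'` converts each PDF in ./in to Word, using pdf/docx, the only executeType value shown in this document.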

PDF to Image

Note: Each conversion type accepts its own parameters in the upload request; the other steps are the same.

PDF to Image:

json
{
  "pageRanges": "1,2,3-5",
  "imageColorMode": "e_Color",
  "imageScaling": "1.0"
}

Parameters

pageRanges: Pages to convert, numbered from 1 (for example, "1,2,3-5"). Default is empty.

imageColorMode: Color mode of the output images: e_Color, e_Gray, or e_Binary. Default is e_Color.

imageScaling: Scaling ratio of the output images. Default is 1.0.
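The sketch below builds the PDF to Image parameter JSON and rejects any color mode other than the three documented values. The example above passes imageScaling as a string, so the helper does the same. The check that the value is a positive plain decimal is an assumption, because the accepted range is not documented here. The function name is hypothetical. pageRanges is omitted when empty so that the default applies.

```shell
# Hypothetical helper: build PDF-to-Image parameters.
# Usage: image_params [pageRanges] [imageColorMode] [imageScaling]
image_params() {
  pages=${1:-} color=${2:-e_Color} scaling=${3:-1.0}
  case $color in
    e_Color|e_Gray|e_Binary) ;;
    *) echo "imageColorMode must be e_Color, e_Gray or e_Binary" >&2; return 1 ;;
  esac
  # Accept plain decimals only: digits with at most one dot.
  case $scaling in
    ''|.|*.*.*|*[!0-9.]*) echo "imageScaling must be a decimal number" >&2; return 1 ;;
  esac
  # Reject zero values such as 0, 0.0 or .0 (no digit from 1 to 9).
  case $scaling in
    *[1-9]*) ;;
    *) echo "imageScaling must be greater than 0" >&2; return 1 ;;
  esac
  if [ -n "$pages" ]; then
    printf '{"pageRanges": "%s", "imageColorMode": "%s", "imageScaling": "%s"}\n' \
      "$pages" "$color" "$scaling"
  else
    printf '{"imageColorMode": "%s", "imageScaling": "%s"}\n' "$color" "$scaling"
  fi
}
```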

PDF to CSV

Note: Each conversion type accepts its own parameters in the upload request; the other steps are the same.

PDF to CSV:

json
{
  "enableAiLayout": 1,
  "isContainImg": 1,
  "isContainAnnot": 1,
  "enableOcr": 0,
  "ocrLanguage": 8,
  "pageRanges": "1,2,3-5",
  "excelWorksheetOption": "e_ForTable"
}

Parameters

enableAiLayout: Whether to enable AI layout analysis (0: not enabled; 1: enabled). Default is 1.

isContainImg: Whether to include images during conversion (0: not enabled; 1: enabled). Default is 1.

isContainAnnot: Whether to include annotations during conversion (0: not enabled; 1: enabled). Default is 1.

enableOcr: Whether to use OCR (0: not enabled; 1: enabled). Default is 0.

ocrLanguage: OCR recognition language. 1: CHINESE; 2: CHINESE_TRA; 3: ENGLISH; 4: KOREAN; 5: JAPANESE; 6: LATIN; 7: DEVANAGARI; 8: AUTO. Default is 8.

pageRanges: Pages to convert, numbered from 1 (for example, "1,2,3-5"). Default is empty.

excelWorksheetOption: How tables are distributed across worksheets. e_ForTable: each worksheet contains only one table; e_ForPage: each worksheet contains the tables of one PDF page; e_ForDocument: one worksheet contains the tables of the entire PDF document. Default is e_ForTable.
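This section does not describe how CSV output is returned when a document contains several tables. You may find that the processor packs multiple files into one ZIP archive. This is an assumption to verify against real responses. In that case, the first four bytes of the result distinguish an archive from a single CSV file, because ZIP files begin with the bytes 50 4B 03 04 ("PK" followed by 0x03 0x04). The function name result_kind is hypothetical.

```shell
# Hypothetical helper: report whether a downloaded result is a ZIP archive.
# ZIP local file headers start with the bytes 50 4B 03 04.
result_kind() {
  # Read the first four bytes and print them as lowercase hex without spaces.
  magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
  if [ "$magic" = "504b0304" ]; then
    echo zip
  else
    echo single
  fi
}
```

For example: `case $(result_kind result.out) in zip) unzip -d csv_out result.out ;; *) mv result.out result.csv ;; esac`.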

PDF to JSON

Note: Each conversion type accepts its own parameters in the upload request; the other steps are the same.

PDF to JSON:

json
{
  "enableAiLayout": 1,
  "isContainImg": 1,
  "isContainAnnot": 1,
  "enableOcr": 0,
  "ocrLanguage": 8,
  "pageRanges": "1,2,3-5",
  "resolveType": "EXTRACT"
}

Parameters

enableAiLayout: Whether to enable AI layout analysis (0: not enabled; 1: enabled). Default is 1.

isContainImg: Whether to include images during conversion (0: not enabled; 1: enabled). Default is 1.

isContainAnnot: Whether to include annotations during conversion (0: not enabled; 1: enabled). Default is 1.

enableOcr: Whether to use OCR (0: not enabled; 1: enabled). Default is 0.

ocrLanguage: OCR recognition language. 1: CHINESE; 2: CHINESE_TRA; 3: ENGLISH; 4: KOREAN; 5: JAPANESE; 6: LATIN; 7: DEVANAGARI; 8: AUTO. Default is 8.

pageRanges: Pages to convert, numbered from 1 (for example, "1,2,3-5"). Default is empty.

resolveType: Type of content to extract into JSON: TEXT, TABLE, IMAGE, or EXTRACT (extract all). Default is EXTRACT.

For an explanation of the fields in the JSON output, refer to PDF Data Extraction JSON Format Description.pdf.
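The sketch below builds the PDF to JSON parameters for one extraction type, for example tables only. It upper-cases the value and accepts only the four documented resolveType values. pageRanges is omitted when empty so that the default applies. The function name is hypothetical.

```shell
# Hypothetical helper: build PDF-to-JSON parameters for one resolveType.
# Usage: json_params [TEXT|TABLE|IMAGE|EXTRACT] [pageRanges]
json_params() {
  # Upper-case the value so "table" and "TABLE" are treated the same.
  resolve_type=$(printf '%s' "${1:-EXTRACT}" | tr '[:lower:]' '[:upper:]')
  case $resolve_type in
    TEXT|TABLE|IMAGE|EXTRACT) ;;
    *) echo "resolveType must be TEXT, TABLE, IMAGE or EXTRACT" >&2; return 1 ;;
  esac
  # Leave pageRanges out when it is empty so the documented default applies.
  if [ -n "${2:-}" ]; then
    printf '{"pageRanges": "%s", "resolveType": "%s"}\n' "$2" "$resolve_type"
  else
    printf '{"resolveType": "%s"}\n' "$resolve_type"
  fi
}
```

For example, `json_params table 2-4` prints `{"pageRanges": "2-4", "resolveType": "TABLE"}`.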

PDF to TXT

Note: Each conversion type accepts its own parameters in the upload request; the other steps are the same.

PDF to TXT:

json
{
  "enableAiLayout": 1,
  "enableOcr": 0,
  "ocrLanguage": 8,
  "pageRanges": "1,2,3-5",
  "txtTableFormat": 1
}

Parameters

enableAiLayout: Whether to enable AI layout analysis (0: not enabled; 1: enabled). Default is 1.

enableOcr: Whether to use OCR (0: not enabled; 1: enabled). Default is 0.

ocrLanguage: OCR recognition language. 1: CHINESE; 2: CHINESE_TRA; 3: ENGLISH; 4: KOREAN; 5: JAPANESE; 6: LATIN; 7: DEVANAGARI; 8: AUTO. Default is 8.

pageRanges: Pages to convert, numbered from 1 (for example, "1,2,3-5"). Default is empty.

txtTableFormat: Whether to format tables when converting PDF to TXT (0: not enabled; 1: enabled). Default is 1.
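The TXT example has no isContainImg or isContainAnnot keys, so a helper for this format should not emit them. The sketch below builds the TXT parameters from two 0/1 flags and validates them. The function name is hypothetical.

```shell
# Hypothetical helper: build PDF-to-TXT parameters.
# $1: txtTableFormat (1 = format tables, the default; 0 = plain text)
# $2: enableOcr (0 = off, the default; 1 = on)
txt_params() {
  table_format=${1:-1} enable_ocr=${2:-0}
  for flag in "$table_format" "$enable_ocr"; do
    case $flag in
      0|1) ;;
      *) echo "flags must be 0 or 1, got: $flag" >&2; return 1 ;;
    esac
  done
  printf '{"enableOcr": %s, "txtTableFormat": %s}\n' "$enable_ocr" "$table_format"
}
```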

PDF to Editable PDF

Note: Each conversion type accepts its own parameters in the upload request; the other steps are the same.

PDF to Editable PDF:

json
{
  "enableAiLayout": 1,
  "isContainImg": 1,
  "isContainAnnot": 1,
  "enableOcr": 1,
  "ocrLanguage": 8,
  "pageRanges": "1,2,3-5"
}

Parameters

enableAiLayout: Whether to enable AI layout analysis (0: not enabled; 1: enabled). Default is 1.

isContainImg: Whether to include images during conversion (0: not enabled; 1: enabled). Default is 1.

isContainAnnot: Whether to include annotations during conversion (0: not enabled; 1: enabled). Default is 1.

enableOcr: Whether to use OCR (0: not enabled; 1: enabled). Default is 1.

ocrLanguage: OCR recognition language. 1: CHINESE; 2: CHINESE_TRA; 3: ENGLISH; 4: KOREAN; 5: JAPANESE; 6: LATIN; 7: DEVANAGARI; 8: AUTO. Default is 8.

pageRanges: Pages to convert, numbered from 1 (for example, "1,2,3-5"). Default is empty.
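This conversion returns a PDF. A quick way to confirm that a request succeeded is therefore to check that the saved file starts with the %PDF- header that PDF files begin with. The sketch below treats an empty file, or any other content such as an error message written to the output file, as a failure. The function name is hypothetical.

```shell
# Hypothetical helper: succeed only if the file is non-empty and starts
# with the "%PDF-" header found at the beginning of PDF files.
is_pdf() {
  [ -s "$1" ] && [ "$(head -c 5 "$1")" = "%PDF-" ]
}
```

For example: `is_pdf editable.pdf || echo "conversion did not return a PDF" >&2`.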