PDF Conversion
To convert a PDF file to an Office format or another format, send a request to /file/handle that includes the PDF file as input along with the file processing parameters. Before you begin, make sure ComPDFKit Processor is running.
Send the request as a POST to the /file/handle endpoint of the Processor. For more information about multipart requests, please refer to the API section.
Convert using local PDF file
Send a multipart request to /file/handle and attach the PDF file:
curl -f -X POST http://localhost:7000/file/handle \
-H "Content-Type: multipart/form-data" \
-F file=@"document.pdf" \
-F executeType="pdf/docx" \
-F password="file open password" \
-F parameter="{ \"contentOptions\": \"2\", \"worksheetOptions\": \"1\"}" \
> result.docx
PDF Conversion Parameters
This section describes the parameter settings currently supported by ComPDFKit Processor for PDF file conversion and processing.
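The parameter objects in this section travel as a JSON string in the parameter form field, next to executeType and password, as in the curl request shown earlier. The following is a minimal Python sketch of assembling those text fields. The helper name build_form_fields is hypothetical, and treating password as optional for unprotected files is an assumption, not something this page confirms.

```python
import json


def build_form_fields(execute_type, parameters, password=None):
    """Return the text fields of the multipart form as a dict of strings."""
    fields = {
        "executeType": execute_type,
        # The parameter object is serialized to one JSON string field.
        "parameter": json.dumps(parameters, separators=(",", ":")),
    }
    # Assumption: the password field is only needed for protected files.
    if password is not None:
        fields["password"] = password
    return fields


word_parameters = {
    "enableOcr": 0,
    "pageRanges": "1,2,3-5",
    "pageLayoutMode": "e_Flow",
}
fields = build_form_fields("pdf/docx", word_parameters)
print(fields["parameter"])
```

Any HTTP client with multipart support can then send these fields together with the PDF attached as the file field.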
PDF to Word
Note: Each function accepts its own parameters when you upload the file. The rest of the steps remain the same.
PDF to Word:
{
"enableAiLayout": 1,
"isContainImg": 1,
"isContainAnnot": 1,
"enableOcr": 0,
"ocrRecognitionLang": "AUTO",
"pageRanges": "1,2,3-5",
"pageLayoutMode": "e_Flow",
"formulaToImage": 1,
"ocrOption": "ALL",
"isOutputDocumentPerPage": 0,
"containPageBackgroundImage": 1
}
Required parameters
enableAiLayout: Whether to enable AI layout analysis (0: not enabled; 1: enabled). Default 1.
isContainImg: Whether to include images during conversion (0: not enabled; 1: enabled). Default 1.
isContainAnnot: Whether to include annotations during conversion (0: not enabled; 1: enabled). Default is 1.
enableOcr: Whether to use OCR (0: not enabled; 1: enabled). Default is 0.
ocrRecognitionLang: OCR recognition language. Supported values and their meanings: AUTO: Automatic, CHINESE: Simplified Chinese, CHINESE_TRAD: Traditional Chinese, ENGLISH: English, KOREAN: Korean, JAPANESE: Japanese, LATIN: Latin, DEVANAGARI: Devanagari, CYRILLIC: Cyrillic, ARABIC: Arabic, TAMIL: Tamil, TELUGU: Telugu, KANNADA: Kannada, THAI: Thai, GREEK: Greek, ESLAV: Slavic languages. Default is AUTO.
pageRanges: The pages to convert; page numbers start from 1. Default is empty.
pageLayoutMode: Specify the layout mode: e_Box or e_Flow. Default is e_Flow.
Layout differences
Word's Streaming Layout: Ideal for editing; as you edit, the content dynamically reflows to new positions. However, a Word file may display differently because of incompatibilities between software or app versions, which makes it unsuitable for precise documents such as electronic files or certificates.
PDF's Fixed Page Layout: Ensures a stable, uniform appearance and print quality across all devices. The content and formatting are locked upon creation, making alterations difficult without affecting the overall layout. It is preferred for formal documentation such as business reports and official electronic records.
formulaToImage: Whether to convert formulas to images (0: not enabled; 1: enabled). Default 0. If enabled, formulas are saved as images; if not, they are saved as text. For complex formulas, saving as images is recommended.
ocrOption: OCR recognition range, supported types and definitions:
INVALID_CHARACTER: Recognize illegal characters in PDF documents. SCAN_PAGE: Recognize scanned pages in PDF documents. INVALID_CHARACTERAND_SCAN_PAGE: Recognize illegal characters and scanned pages in PDF documents. ALL: Recognize all characters on all pages. Default: ALL.
isOutputDocumentPerPage: Whether to output one document per page (0: disabled; 1: enabled). Default 0.
containPageBackgroundImage: Whether to include page background images during conversion; this setting is only effective when using OCR (0: disabled; 1: enabled). Default 1.
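The pageRanges value above mixes single pages and ranges, as in "1,2,3-5". The sketch below shows one way to check such a string on the client before sending it. The function parse_page_ranges is hypothetical, and reading "3-5" as an inclusive range is an assumption; the exact grammar accepted by the Processor is not specified here.

```python
from typing import List


def parse_page_ranges(spec: str) -> List[int]:
    """Expand a string such as "1,2,3-5" into sorted 1-based page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty items, e.g. an empty spec
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        # Page numbers start from 1, and a range must not run backwards.
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)


print(parse_page_ranges("1,2,3-5"))
```

Validating locally like this only catches malformed input early; the original string is still what goes into the pageRanges parameter.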
PDF to Excel
Note: Different parameters can be used when uploading files for different functions. The rest of the steps remain the same.
PDF to Excel:
{
"enableAiLayout": 1,
"isContainImg": 1,
"isContainAnnot": 1,
"enableOcr": 0,
"ocrRecognitionLang": "AUTO",
"pageRanges": "1,2,3-5",
"excelAllContent": 1,
"excelWorksheetOption": "e_ForTable",
"ocrOption": "ALL",
"isOutputDocumentPerPage": 0
}
Required parameters
enableAiLayout: Whether to enable AI layout analysis (0: not enabled; 1: enabled). Default 1.
isContainImg: Whether to include images during conversion (0: not enabled; 1: enabled). Default 1.
isContainAnnot: Whether to include annotations during conversion (0: Disable; 1: Enable). Default is 1.
enableOcr: Whether to use OCR (0: Disable; 1: Enable). Default is 0.
ocrRecognitionLang: OCR recognition language. Supported values and their meanings: AUTO: Automatic, CHINESE: Simplified Chinese, CHINESE_TRAD: Traditional Chinese, ENGLISH: English, KOREAN: Korean, JAPANESE: Japanese, LATIN: Latin, DEVANAGARI: Devanagari, CYRILLIC: Cyrillic, ARABIC: Arabic, TAMIL: Tamil, TELUGU: Telugu, KANNADA: Kannada, THAI: Thai, GREEK: Greek, ESLAV: Slavic languages. Default is AUTO.
pageRanges: The pages to convert; page numbers start from 1. Default is empty.
excelAllContent: Whether to convert all contents. 1: Yes; 0: No. Default 1.
excelWorksheetOption: Excel worksheet option. e_ForTable: each worksheet contains only one table; e_ForPage: each worksheet contains the tables of one PDF page; e_ForDocument: one worksheet contains the tables of the whole PDF document. Default is e_ForTable.
ocrOption: OCR recognition range, supported types and definitions:
INVALID_CHARACTER: Recognize illegal characters in PDF documents. SCAN_PAGE: Recognize scanned pages in PDF documents. INVALID_CHARACTERAND_SCAN_PAGE: Recognize illegal characters and scanned pages in PDF documents. ALL: Recognize all characters on all pages. Default: ALL.
isOutputDocumentPerPage: Whether to output one document per page (0: disabled; 1: enabled). Default 0.
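Since every PDF to Excel parameter above has a documented default, a request only needs to state what differs. The following sketch merges overrides into those defaults and rejects misspelled keys or an unsupported excelWorksheetOption. The names EXCEL_DEFAULTS and excel_parameters are hypothetical, and the Processor may already apply the same defaults itself when a key is omitted.

```python
# Defaults as listed above; pageRanges is left out because its default is empty.
EXCEL_DEFAULTS = {
    "enableAiLayout": 1,
    "isContainImg": 1,
    "isContainAnnot": 1,
    "enableOcr": 0,
    "ocrRecognitionLang": "AUTO",
    "excelAllContent": 1,
    "excelWorksheetOption": "e_ForTable",
    "ocrOption": "ALL",
    "isOutputDocumentPerPage": 0,
}
WORKSHEET_OPTIONS = ("e_ForTable", "e_ForPage", "e_ForDocument")


def excel_parameters(**overrides):
    """Merge overrides into the documented defaults and validate the result."""
    allowed = set(EXCEL_DEFAULTS) | {"pageRanges"}
    unknown = set(overrides) - allowed
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    merged = {**EXCEL_DEFAULTS, **overrides}
    if merged["excelWorksheetOption"] not in WORKSHEET_OPTIONS:
        raise ValueError(
            f"unsupported excelWorksheetOption: {merged['excelWorksheetOption']!r}"
        )
    return merged


params = excel_parameters(excelWorksheetOption="e_ForPage", pageRanges="1-3")
print(params)
```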
PDF to Slide
Note: Different parameters can be used when uploading files for different functions. The rest of the steps remain the same.
PDF to Slide:
{
"enableAiLayout": 1,
"isContainImg": 1,
"isContainAnnot": 1,
"enableOcr": 0,
"ocrRecognitionLang": "AUTO",
"pageRanges": "1,2,3-5",
"ocrOption": "ALL",
"isOutputDocumentPerPage": 0,
"containPageBackgroundImage": 1
}
Required parameters
enableAiLayout: Whether to enable AI layout analysis (0: not enabled; 1: enabled). Default 1.
isContainImg: Whether to include images during conversion (0: not enabled; 1: enabled). Default 1.
isContainAnnot: Whether to include annotations during conversion (0: not enabled; 1: enabled). Default 1.
enableOcr: Whether to use OCR (0: Disable; 1: Enable). Default is 0.
ocrRecognitionLang: OCR recognition language. Supported values and their meanings: AUTO: Automatic, CHINESE: Simplified Chinese, CHINESE_TRAD: Traditional Chinese, ENGLISH: English, KOREAN: Korean, JAPANESE: Japanese, LATIN: Latin, DEVANAGARI: Devanagari, CYRILLIC: Cyrillic, ARABIC: Arabic, TAMIL: Tamil, TELUGU: Telugu, KANNADA: Kannada, THAI: Thai, GREEK: Greek, ESLAV: Slavic languages. Default is AUTO.
pageRanges: The pages to convert; page numbers start from 1. Default is empty.
ocrOption: OCR recognition range, supported types and definitions:
INVALID_CHARACTER: Recognize illegal characters in PDF documents. SCAN_PAGE: Recognize scanned pages in PDF documents. INVALID_CHARACTERAND_SCAN_PAGE: Recognize illegal characters and scanned pages in PDF documents. ALL: Recognize all characters on all pages. Default: ALL.
isOutputDocumentPerPage: Whether to output one document per page (0: disabled; 1: enabled). Default 0.
containPageBackgroundImage: Whether to include page background images during conversion; this setting is only effective when using OCR (0: disabled; 1: enabled). Default 1.
PDF to HTML
Note: Different parameters can be used when uploading files for different functions. The rest of the steps remain the same.
PDF to HTML:
{
"enableAiLayout": 1,
"isContainImg": 1,
"isContainAnnot": 1,
"enableOcr": 0,
"ocrRecognitionLang": "AUTO",
"pageRanges": "1,2,3-5",
"pageLayoutMode": "e_Flow",
"htmlOption": "e_SinglePage",
"ocrOption": "ALL",
"isOutputDocumentPerPage": 0,
"containPageBackgroundImage": 1
}
Required parameters
enableAiLayout: Whether to enable AI layout analysis (0: not enabled; 1: enabled). Default 1.
isContainImg: Whether to include images during conversion (0: not enabled; 1: enabled). Default 1.
isContainAnnot: Whether to include annotations during conversion (0: not enabled; 1: enabled). Default 1.
enableOcr: Whether to use OCR (0: Disable; 1: Enable). Default is 0.
ocrRecognitionLang: OCR recognition language. Supported values and their meanings: AUTO: Automatic, CHINESE: Simplified Chinese, CHINESE_TRAD: Traditional Chinese, ENGLISH: English, KOREAN: Korean, JAPANESE: Japanese, LATIN: Latin, DEVANAGARI: Devanagari, CYRILLIC: Cyrillic, ARABIC: Arabic, TAMIL: Tamil, TELUGU: Telugu, KANNADA: Kannada, THAI: Thai, GREEK: Greek, ESLAV: Slavic languages. Default is AUTO.
pageRanges: The pages to convert; page numbers start from 1. Default is empty.
pageLayoutMode: Specify the layout mode: e_Box or e_Flow. Default is e_Flow.
htmlOption: HTML output option. e_SinglePage: Convert the entire PDF file into a single HTML file; e_SinglePageWithBookmark: Convert the PDF file into a single HTML file with an outline for navigation at the beginning of the HTML page; e_MultiPage: Convert the PDF file into multiple HTML files; e_MultiPageWithBookmark: Convert the PDF file into multiple HTML files. Each HTML file corresponds to a PDF page, and users can navigate to the next HTML file via a link at the bottom of the HTML page. Default is e_SinglePage.
ocrOption: OCR recognition range, supported types and definitions:
INVALID_CHARACTER: Recognize illegal characters in PDF documents. SCAN_PAGE: Recognize scanned pages in PDF documents. INVALID_CHARACTERAND_SCAN_PAGE: Recognize illegal characters and scanned pages in PDF documents. ALL: Recognize all characters on all pages. Default: ALL.
isOutputDocumentPerPage: Whether to output one document per page (0: disabled; 1: enabled). Default 0.
containPageBackgroundImage: Whether to include page background images during conversion; this setting is only effective when using OCR (0: disabled; 1: enabled). Default 1.
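The four htmlOption values above combine two independent choices: single file or multiple files, and with or without bookmark navigation. As an illustration only, a small hypothetical helper (html_option is not part of the Processor) can derive the value from two flags:

```python
def html_option(multi_page: bool = False, with_bookmark: bool = False) -> str:
    """Pick one of the four documented htmlOption values from two flags."""
    name = "e_MultiPage" if multi_page else "e_SinglePage"
    if with_bookmark:
        name += "WithBookmark"
    return name


html_parameters = {
    "pageLayoutMode": "e_Flow",
    "htmlOption": html_option(multi_page=True, with_bookmark=True),
}
print(html_parameters["htmlOption"])
```

With both flags left at False the helper returns e_SinglePage, which matches the documented default.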
PDF to RTF
Note: Different parameters can be used when uploading files for each specific function. The other steps remain consistent.
PDF to RTF:
{
"enableAiLayout": 1,
"isContainImg": 1,
"isContainAnnot": 1,
"enableOcr": 0,
"ocrRecognitionLang": "AUTO",
"pageRanges": "1,2,3-5",
"ocrOption": "ALL",
"isOutputDocumentPerPage": 0,
"containPageBackgroundImage": 1
}
Required parameters
enableAiLayout: Whether to enable AI layout analysis (0: not enabled; 1: enabled). Default 1.
isContainImg: Whether to include images during conversion (0: not enabled; 1: enabled). Default 1.
isContainAnnot: Whether to include annotations during conversion (0: not enabled; 1: enabled). Default 1.
enableOcr: Whether to use OCR (0: Disable; 1: Enable). Default is 0.
ocrRecognitionLang: OCR recognition language. Supported values and their meanings: AUTO: Automatic, CHINESE: Simplified Chinese, CHINESE_TRAD: Traditional Chinese, ENGLISH: English, KOREAN: Korean, JAPANESE: Japanese, LATIN: Latin, DEVANAGARI: Devanagari, CYRILLIC: Cyrillic, ARABIC: Arabic, TAMIL: Tamil, TELUGU: Telugu, KANNADA: Kannada, THAI: Thai, GREEK: Greek, ESLAV: Slavic languages. Default is AUTO.
pageRanges: The pages to convert; page numbers start from 1. Default is empty.
ocrOption: OCR recognition range, supported types and definitions:
INVALID_CHARACTER: Recognize illegal characters in PDF documents. SCAN_PAGE: Recognize scanned pages in PDF documents. INVALID_CHARACTERAND_SCAN_PAGE: Recognize illegal characters and scanned pages in PDF documents. ALL: Recognize all characters on all pages. Default: ALL.
isOutputDocumentPerPage: Whether to output one document per page (0: disabled; 1: enabled). Default 0.
containPageBackgroundImage: Whether to include page background images during conversion; this setting is only effective when using OCR (0: disabled; 1: enabled). Default 1.
PDF to Image
Note: Different parameters can be used when uploading files for each specific function. The other steps remain consistent.
PDF to Image:
{
"imageFormat": "JPG",
"pageRanges": "1,2,3-5",
"imageColorMode": "e_Color",
"imageScaling": "1.0"
}
Required parameters
imageFormat: Image format, supported formats: JPG, JPEG, JPEG2000, PNG, BMP, TIFF, TGA, GIF, WEBP. Default is JPG.
pageRanges: The pages to convert; page numbers start from 1. Default is empty.
imageColorMode: Specifies the color mode of the image file: e_Color, e_Gray, or e_Binary. Default is e_Color.
imageScaling: Specifies the image scaling ratio of the image file. Default is 1.0.
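In the PDF to Image sample above, imageScaling is written as the string "1.0" rather than as a number. The sketch below builds an image parameter object that follows that convention and checks the format and color mode against the documented lists. The function image_parameters is hypothetical, and both the positive-scaling check and the string encoding of imageScaling are assumptions drawn from the sample, not confirmed requirements.

```python
IMAGE_FORMATS = ("JPG", "JPEG", "JPEG2000", "PNG", "BMP", "TIFF", "TGA", "GIF", "WEBP")
COLOR_MODES = ("e_Color", "e_Gray", "e_Binary")


def image_parameters(image_format="JPG", color_mode="e_Color", scaling=1.0, page_ranges=None):
    """Build a PDF to Image parameter object from validated arguments."""
    if image_format not in IMAGE_FORMATS:
        raise ValueError(f"unsupported imageFormat: {image_format!r}")
    if color_mode not in COLOR_MODES:
        raise ValueError(f"unsupported imageColorMode: {color_mode!r}")
    # Assumption: a scaling ratio only makes sense when it is positive.
    if scaling <= 0:
        raise ValueError("imageScaling must be positive")
    parameters = {
        "imageFormat": image_format,
        "imageColorMode": color_mode,
        # The sample request passes the ratio as a string, so do the same here.
        "imageScaling": str(float(scaling)),
    }
    if page_ranges:
        parameters["pageRanges"] = page_ranges
    return parameters


print(image_parameters("PNG", "e_Gray", scaling=2, page_ranges="1,2,3-5"))
```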
PDF to CSV
Note: You can use specific parameters for each functionality when uploading files, while the other steps remain the same.
PDF to CSV:
{
"enableAiLayout": 1,
"isContainImg": 1,
"isContainAnnot": 1,
"enableOcr": 0,
"ocrRecognitionLang": "AUTO",
"pageRanges": "1,2,3-5",
"excelWorksheetOption": "e_ForTable",
"ocrOption": "ALL",
"isOutputDocumentPerPage": 0
}
Required parameters
enableAiLayout: Whether to enable AI layout analysis (0: not enabled; 1: enabled). Default 1.
isContainImg: Whether to include images during conversion (0: not enabled; 1: enabled). Default 1.
isContainAnnot: Whether to include annotations during conversion (0: Disable; 1: Enable). Default is 1.
enableOcr: Whether to use OCR (0: Disable; 1: Enable). Default is 0.
ocrRecognitionLang: OCR recognition language. Supported values and their meanings: AUTO: Automatic, CHINESE: Simplified Chinese, CHINESE_TRAD: Traditional Chinese, ENGLISH: English, KOREAN: Korean, JAPANESE: Japanese, LATIN: Latin, DEVANAGARI: Devanagari, CYRILLIC: Cyrillic, ARABIC: Arabic, TAMIL: Tamil, TELUGU: Telugu, KANNADA: Kannada, THAI: Thai, GREEK: Greek, ESLAV: Slavic languages. Default is AUTO.
pageRanges: The pages to convert; page numbers start from 1. Default is empty.
excelWorksheetOption: Excel worksheet option. e_ForTable: each worksheet contains only one table; e_ForPage: each worksheet contains the tables of one PDF page; e_ForDocument: one worksheet contains the tables of the whole PDF document. Default is e_ForTable.
ocrOption: OCR recognition range, supported types and definitions:
INVALID_CHARACTER: Recognize illegal characters in PDF documents. SCAN_PAGE: Recognize scanned pages in PDF documents. INVALID_CHARACTERAND_SCAN_PAGE: Recognize illegal characters and scanned pages in PDF documents. ALL: Recognize all characters on all pages. Default: ALL.
isOutputDocumentPerPage: Whether to output one document per page (0: disabled; 1: enabled). Default 0.
PDF to JSON
Note: You can use specific parameters for each functionality when uploading files, while the other steps remain the same.
PDF to JSON:
{
"enableAiLayout": 1,
"isContainImg": 1,
"isContainAnnot": 1,
"enableOcr": 0,
"ocrRecognitionLang": "AUTO",
"pageRanges": "1,2,3-5",
"resolveType": "EXTRACT",
"ocrOption": "ALL",
"isOutputDocumentPerPage": 0
}
Required parameters
enableAiLayout: Whether to enable AI layout analysis (0: not enabled; 1: enabled). Default 1.
isContainImg: Whether to include images during conversion (0: not enabled; 1: enabled). Default 1.
isContainAnnot: Whether to include annotations during conversion (0: not enabled; 1: enabled). Default is 1.
enableOcr: Whether to use OCR (0: Disable; 1: Enable). Default is 0.
ocrRecognitionLang: OCR recognition language. Supported values and their meanings: AUTO: Automatic, CHINESE: Simplified Chinese, CHINESE_TRAD: Traditional Chinese, ENGLISH: English, KOREAN: Korean, JAPANESE: Japanese, LATIN: Latin, DEVANAGARI: Devanagari, CYRILLIC: Cyrillic, ARABIC: Arabic, TAMIL: Tamil, TELUGU: Telugu, KANNADA: Kannada, THAI: Thai, GREEK: Greek, ESLAV: Slavic languages. Default is AUTO.
pageRanges: The pages to convert; page numbers start from 1. Default is empty.
resolveType: The type of content to extract to JSON: TEXT, TABLE, EXTRACT, or IMAGE. Default is EXTRACT (extract all).
For an explanation of the fields in the JSON output, refer to PDF Data Extraction JSON Format Description.pdf.
ocrOption: OCR recognition range, supported types and definitions:
INVALID_CHARACTER: Recognize illegal characters in PDF documents. SCAN_PAGE: Recognize scanned pages in PDF documents. INVALID_CHARACTERAND_SCAN_PAGE: Recognize illegal characters and scanned pages in PDF documents. ALL: Recognize all characters on all pages. Default: ALL.
isOutputDocumentPerPage: Whether to output one document per page (0: disabled; 1: enabled). Default 0.
PDF to TXT
Note: You can use specific parameters for each functionality when uploading files, while the other steps remain the same.
PDF to TXT:
{
"enableAiLayout": 1,
"enableOcr": 0,
"ocrRecognitionLang": "AUTO",
"pageRanges": "1,2,3-5",
"txtTableFormat": 1,
"ocrOption": "ALL",
"isOutputDocumentPerPage": 0
}
Required parameters
enableAiLayout: Whether to enable AI layout analysis (0: not enabled; 1: enabled). Default 1.
enableOcr: Whether to use OCR (0: not enabled; 1: enabled). Default is 0.
ocrRecognitionLang: OCR recognition language. Supported values and their meanings: AUTO: Automatic, CHINESE: Simplified Chinese, CHINESE_TRAD: Traditional Chinese, ENGLISH: English, KOREAN: Korean, JAPANESE: Japanese, LATIN: Latin, DEVANAGARI: Devanagari, CYRILLIC: Cyrillic, ARABIC: Arabic, TAMIL: Tamil, TELUGU: Telugu, KANNADA: Kannada, THAI: Thai, GREEK: Greek, ESLAV: Slavic languages. Default is AUTO.
pageRanges: The pages to convert; page numbers start from 1. Default is empty.
txtTableFormat: Whether to format tables when converting PDF to TXT (0: not enabled; 1: enabled). Default is 1.
ocrOption: OCR recognition range, supported types and definitions:
INVALID_CHARACTER: Recognize illegal characters in PDF documents. SCAN_PAGE: Recognize scanned pages in PDF documents. INVALID_CHARACTERAND_SCAN_PAGE: Recognize illegal characters and scanned pages in PDF documents. ALL: Recognize all characters on all pages. Default: ALL.
isOutputDocumentPerPage: Whether to output one document per page (0: disabled; 1: enabled). Default 0.
PDF to Editable PDF
Note: Each function accepts its own parameters when you upload the file. The other steps are the same.
PDF to Editable PDF:
{
"enableAiLayout": 1,
"isContainImg": 1,
"isContainAnnot": 1,
"enableOcr": 1,
"ocrRecognitionLang": "AUTO",
"pageRanges": "1,2,3-5",
"ocrOption": "ALL",
"isOutputDocumentPerPage": 0,
"containPageBackgroundImage": 1
}
Required parameters
enableAiLayout: Whether to enable AI layout analysis (0: not enabled; 1: enabled). Default 1.
isContainImg: Whether to include images during conversion (0: not enabled; 1: enabled). Default 1.
isContainAnnot: Whether to include annotations during conversion (0: not enabled; 1: enabled). Default 1.
enableOcr: Whether to use OCR (0: Disable; 1: Enable). Default is 1.
ocrRecognitionLang: OCR recognition language. Supported values and their meanings: AUTO: Automatic, CHINESE: Simplified Chinese, CHINESE_TRAD: Traditional Chinese, ENGLISH: English, KOREAN: Korean, JAPANESE: Japanese, LATIN: Latin, DEVANAGARI: Devanagari, CYRILLIC: Cyrillic, ARABIC: Arabic, TAMIL: Tamil, TELUGU: Telugu, KANNADA: Kannada, THAI: Thai, GREEK: Greek, ESLAV: Slavic languages. Default is AUTO.
pageRanges: The pages to convert; page numbers start from 1. Default is empty.
ocrOption: OCR recognition range, supported types and definitions:
INVALID_CHARACTER: Recognize illegal characters in PDF documents. SCAN_PAGE: Recognize scanned pages in PDF documents. INVALID_CHARACTERAND_SCAN_PAGE: Recognize illegal characters and scanned pages in PDF documents. ALL: Recognize all characters on all pages. Default: ALL.
isOutputDocumentPerPage: Whether to output one document per page (0: disabled; 1: enabled). Default 0.
containPageBackgroundImage: Whether to include page background images during conversion; this setting is only effective when using OCR (0: disabled; 1: enabled). Default 1.
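The three OCR parameters (enableOcr, ocrRecognitionLang, ocrOption) recur across most conversion types in this section. One possible way to switch OCR on for an existing parameter object is sketched below. The helper with_ocr is hypothetical; the language and range values are copied verbatim from the lists in this section.

```python
OCR_LANGUAGES = (
    "AUTO", "CHINESE", "CHINESE_TRAD", "ENGLISH", "KOREAN", "JAPANESE",
    "LATIN", "DEVANAGARI", "CYRILLIC", "ARABIC", "TAMIL", "TELUGU",
    "KANNADA", "THAI", "GREEK", "ESLAV",
)
OCR_OPTIONS = ("INVALID_CHARACTER", "SCAN_PAGE", "INVALID_CHARACTERAND_SCAN_PAGE", "ALL")


def with_ocr(parameters, language="AUTO", option="ALL"):
    """Return a copy of parameters with OCR enabled; the input is not modified."""
    if language not in OCR_LANGUAGES:
        raise ValueError(f"unsupported ocrRecognitionLang: {language!r}")
    if option not in OCR_OPTIONS:
        raise ValueError(f"unsupported ocrOption: {option!r}")
    updated = dict(parameters)
    updated.update(enableOcr=1, ocrRecognitionLang=language, ocrOption=option)
    return updated


base = {"enableOcr": 0, "pageRanges": "1,2,3-5"}
scanned = with_ocr(base, language="ENGLISH", option="SCAN_PAGE")
print(scanned)
```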