PDF Conversion
To convert a PDF file to an Office or other format, send a request to /file/handle that includes the PDF file as input along with the file processing parameters. Before you begin, make sure ComPDFKit Processor is running.
Send a POST request to the /file/handle endpoint of the processor. For more information about multipart requests, refer to the API section.
Convert using local PDF file
Send a multipart request to /file/handle and attach the PDF file:
curl -f -X POST http://localhost:7000/file/handle \
-H "Content-Type: multipart/form-data" \
-F file=@"document.pdf" \
-F executeType="pdf/docx" \
-F password="file open password" \
-F parameter="{ \"contentOptions\": \"2\", \"worksheetOptions\": \"1\"}" \
> result.docx
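The request above can be wrapped in a small shell function so that the endpoint and form fields are written only once. The following is a minimal sketch, not part of ComPDFKit Processor: the function name convert_pdf and the variables CPDF_URL and CPDF_DRY_RUN are invented for illustration, and the optional password field is left out. It uses curl's --form-string for the text fields so that the JSON value is sent literally.

```shell
# Hypothetical wrapper around the documented request (names are assumptions).
CPDF_URL="${CPDF_URL:-http://localhost:7000/file/handle}"

convert_pdf() {
  # $1 input PDF, $2 executeType (e.g. pdf/docx), $3 parameter JSON, $4 output file
  if [ "$#" -ne 4 ]; then
    echo "usage: convert_pdf INPUT EXECUTE_TYPE PARAM_JSON OUTPUT" >&2
    return 2
  fi
  if [ -n "${CPDF_DRY_RUN:-}" ]; then
    # Print what would be sent instead of contacting the server.
    printf 'POST %s file=%s executeType=%s parameter=%s -> %s\n' \
      "$CPDF_URL" "$1" "$2" "$3" "$4"
    return 0
  fi
  # -f makes curl exit non-zero on HTTP errors; -o writes the response body
  # to the output file. --form-string sends the value without interpreting it.
  curl -f -X POST "$CPDF_URL" \
    -F "file=@$1" \
    --form-string "executeType=$2" \
    --form-string "parameter=$3" \
    -o "$4"
}

# Usage (needs a running Processor):
#   convert_pdf document.pdf pdf/docx '{"enableOcr": 0}' result.docx
```

Setting CPDF_DRY_RUN=1 before the call prints the request summary, which is a quick way to check quoting before sending a real file.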
PDF Conversion Parameters
This section describes the file processing parameters supported by ComPDFKit Processor. Special parameter settings are available for PDF to Word, Excel, PPT, HTML, RTF, image (PNG, JPG), CSV, JSON, TXT, and editable PDF. For other functions, parameter settings can be ignored (default parameters will be used for document processing).
PDF to Word
Note: Special parameters can be used when uploading files for different functions, while the remaining steps remain consistent.
PDF to Word:
{
"enableAiLayout": 1,
"isContainImg": 1,
"isContainAnnot": 1,
"enableOcr": 0,
"ocrLanguage": 8,
"pageRanges": "1,2,3-5",
"pageLayoutMode": "e_Flow",
"formulaToImage": 0
}
Required parameters
enableAiLayout
: Whether to enable AI layout analysis (0: not enabled; 1: enabled). Default 1.
isContainImg
: Whether to include images during conversion (0: not enabled; 1: enabled). Default 1.
isContainAnnot
: Whether to include annotations during conversion (0: not enabled; 1: enabled). Default is 1.
enableOcr
: Whether to use OCR (0: not enabled; 1: enabled). Default is 0.
ocrLanguage
: OCR recognition language. 1: CHINESE; 2: CHINESE_TRA; 3: ENGLISH; 4: KOREAN; 5: JAPANESE; 6: LATIN; 7: DEVANAGARI; 8: AUTO. Default is 8.
pageRanges
: Pages to convert, numbered from 1, for example "1,2,3-5". Default is empty.
pageLayoutMode
: Specify the layout mode. e_Box; e_Flow. Default is e_Flow.
formulaToImage
: Whether to convert formulas to images (0: not enabled; 1: enabled). Default 0.
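Writing the parameter object by hand inside a curl command means escaping every quote. As a hedged sketch, the helper below prints a PDF to Word parameter object from a few arguments and rejects values outside the documented ones. The function name word_params and its argument order are assumptions made for this example, and the pageRanges value is inserted without JSON escaping.

```shell
# Hypothetical helper (not part of ComPDFKit Processor).
# $1 pageRanges, $2 enableOcr (default 0), $3 pageLayoutMode (default e_Flow)
word_params() {
  local pages="$1" ocr="${2:-0}" layout="${3:-e_Flow}"
  case "$ocr" in
    0|1) ;;
    *) echo "enableOcr must be 0 or 1" >&2; return 1 ;;
  esac
  case "$layout" in
    e_Flow|e_Box) ;;
    *) echo "pageLayoutMode must be e_Flow or e_Box" >&2; return 1 ;;
  esac
  # Remaining keys use the documented defaults.
  printf '{"enableAiLayout":1,"isContainImg":1,"isContainAnnot":1,'
  printf '"enableOcr":%s,"ocrLanguage":8,"pageRanges":"%s",' "$ocr" "$pages"
  printf '"pageLayoutMode":"%s","formulaToImage":0}\n' "$layout"
}

# Usage (needs a running Processor; same request shape as the local-file example):
#   curl -f -X POST http://localhost:7000/file/handle \
#     -F file=@"document.pdf" -F executeType="pdf/docx" \
#     --form-string "parameter=$(word_params "1,2,3-5" 0 e_Box)" -o result.docx
```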
PDF to Excel
Note: Different parameters can be used when uploading files for different functions. The rest of the steps remain the same.
PDF to Excel:
{
"enableAiLayout": 1,
"isContainImg": 1,
"isContainAnnot": 1,
"enableOcr": 0,
"ocrLanguage": 8,
"pageRanges": "1,2,3-5",
"excelAllContent": 1,
"excelWorksheetOption": "e_ForTable"
}
Required parameters
enableAiLayout
: Whether to enable AI layout analysis (0: not enabled; 1: enabled). Default 1.
isContainImg
: Whether to include images during conversion (0: not enabled; 1: enabled). Default 1.
isContainAnnot
: Whether to include annotations during conversion (0: Disable; 1: Enable). Default is 1.
enableOcr
: Whether to use OCR (0: Disable; 1: Enable). Default is 0.
ocrLanguage
: OCR recognition language. 1: CHINESE; 2: CHINESE_TRA; 3: ENGLISH; 4: KOREAN; 5: JAPANESE; 6: LATIN; 7: DEVANAGARI; 8: AUTO. Default is 8.
pageRanges
: Pages to convert, numbered from 1, for example "1,2,3-5". Default is empty.
excelAllContent
: Whether to convert all contents. 1: Yes; 0: No. Default 1.
excelWorksheetOption
: Excel worksheet option. e_ForTable: each worksheet contains only one table; e_ForPage: each worksheet contains the tables of one PDF page; e_ForDocument: one worksheet contains the tables of the whole PDF document. Default is e_ForTable.
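An alternative to inline escaping is to keep the parameter object in a file. The sketch below writes a PDF to Excel parameter file after checking excelWorksheetOption against the three documented values. The function name write_excel_params and the file name used in the usage comment are hypothetical.

```shell
# Hypothetical helper (not part of ComPDFKit Processor).
# $1 excelWorksheetOption, $2 path of the parameter file to write
write_excel_params() {
  local option="$1" out="$2"
  case "$option" in
    e_ForTable|e_ForPage|e_ForDocument) ;;
    *) echo "unknown excelWorksheetOption: $option" >&2; return 1 ;;
  esac
  # Other keys are copied from the documented sample.
  cat > "$out" <<EOF
{
  "enableAiLayout": 1,
  "isContainImg": 1,
  "isContainAnnot": 1,
  "enableOcr": 0,
  "ocrLanguage": 8,
  "pageRanges": "1,2,3-5",
  "excelAllContent": 1,
  "excelWorksheetOption": "$option"
}
EOF
}

# Usage:
#   write_excel_params e_ForPage excel-params.json
```

curl can then read the field content from that file with -F "parameter=<excel-params.json", so no quoting is needed on the command line.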
PDF to PPT
Note: Different parameters can be used when uploading files for different functions. The rest of the steps remain the same.
PDF to PPT:
{
"enableAiLayout": 1,
"isContainImg": 1,
"isContainAnnot": 1,
"enableOcr": 0,
"ocrLanguage": 8,
"pageRanges": "1,2,3-5"
}
Required parameters
enableAiLayout
: Whether to enable AI layout analysis (0: not enabled; 1: enabled). Default 1.
isContainImg
: Whether to include images during conversion (0: not enabled; 1: enabled). Default 1.
isContainAnnot
: Whether to include annotations during conversion (0: not enabled; 1: enabled). Default 1.
enableOcr
: Whether to use OCR (0: Disable; 1: Enable). Default is 0.
ocrLanguage
: OCR recognition language. 1: CHINESE; 2: CHINESE_TRA; 3: ENGLISH; 4: KOREAN; 5: JAPANESE; 6: LATIN; 7: DEVANAGARI; 8: AUTO. Default is 8.
pageRanges
: Pages to convert, numbered from 1, for example "1,2,3-5". Default is empty.
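A malformed pageRanges value is easy to catch before the file is uploaded. The check below is a sketch under an assumption: the accepted grammar is inferred from the sample value "1,2,3-5" (comma-separated pages or N-M ranges, numbered from 1) and is not a documented specification. The function name valid_page_ranges is hypothetical.

```shell
# Hypothetical client-side check (grammar inferred from the sample "1,2,3-5").
valid_page_ranges() {
  local spec="$1"
  case "$spec" in
    '') return 0 ;;               # empty is the documented default
    *[!0-9,-]*) return 1 ;;       # only digits, commas and hyphens allowed
  esac
  # Each item is a page number or a range, and page numbers start from 1.
  printf '%s\n' "$spec" |
    grep -Eq '^[1-9][0-9]*(-[1-9][0-9]*)?(,[1-9][0-9]*(-[1-9][0-9]*)?)*$'
}

# Usage:
#   valid_page_ranges "1,2,3-5" || echo "bad pageRanges" >&2
```

The check does not verify that a range is ascending or that the pages exist in the document.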
PDF to HTML
Note: Different parameters can be used when uploading files for different functions. The rest of the steps remain the same.
PDF to HTML:
{
"enableAiLayout": 1,
"isContainImg": 1,
"isContainAnnot": 1,
"enableOcr": 0,
"ocrLanguage": 8,
"pageRanges": "1,2,3-5",
"pageLayoutMode": "e_Flow",
"htmlOption": "e_SinglePage"
}
Required parameters
enableAiLayout
: Whether to enable AI layout analysis (0: not enabled; 1: enabled). Default 1.
isContainImg
: Whether to include images during conversion (0: not enabled; 1: enabled). Default 1.
isContainAnnot
: Whether to include annotations during conversion (0: not enabled; 1: enabled). Default 1.
enableOcr
: Whether to use OCR (0: Disable; 1: Enable). Default is 0.
ocrLanguage
: OCR recognition language. 1: CHINESE; 2: CHINESE_TRA; 3: ENGLISH; 4: KOREAN; 5: JAPANESE; 6: LATIN; 7: DEVANAGARI; 8: AUTO. Default is 8.
pageRanges
: Pages to convert, numbered from 1, for example "1,2,3-5". Default is empty.
pageLayoutMode
: Specify layout mode. e_Box; e_Flow. Default is e_Flow.
htmlOption
: HTML output option. e_SinglePage: convert the entire PDF file into a single HTML file; e_SinglePageWithBookmark: convert the PDF file into a single HTML file with an outline for navigation at the beginning of the HTML page; e_MultiPage: convert the PDF file into multiple HTML files; e_MultiPageWithBookmark: convert the PDF file into multiple HTML files, where each HTML file corresponds to a PDF page and users can navigate to the next HTML file via a link at the bottom of the HTML page. Default is e_SinglePage.
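Because htmlOption decides whether the conversion yields one HTML file or several, a client may want to know which case applies before it sends the request. The sketch below classifies the four documented values and builds the parameter object. The function names html_output_kind and html_params are hypothetical, and nothing here describes how the Processor packages multiple files in its response.

```shell
# Hypothetical helpers (not part of ComPDFKit Processor).

# Prints "single" or "multiple" for a documented htmlOption value.
html_output_kind() {
  case "$1" in
    e_SinglePage|e_SinglePageWithBookmark) echo single ;;
    e_MultiPage|e_MultiPageWithBookmark) echo multiple ;;
    *) echo "unknown htmlOption: $1" >&2; return 1 ;;
  esac
}

# Prints a PDF to HTML parameter object. $1 pageRanges, $2 htmlOption.
html_params() {
  local pages="$1" option="${2:-e_SinglePage}"
  html_output_kind "$option" >/dev/null || return 1
  printf '{"enableAiLayout":1,"isContainImg":1,"isContainAnnot":1,'
  printf '"enableOcr":0,"ocrLanguage":8,"pageRanges":"%s",' "$pages"
  printf '"pageLayoutMode":"e_Flow","htmlOption":"%s"}\n' "$option"
}
```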
PDF to RTF
Note: Different parameters can be used when uploading files for each specific function. The other steps remain consistent.
PDF to RTF:
{
"enableAiLayout": 1,
"isContainImg": 1,
"isContainAnnot": 1,
"enableOcr": 0,
"ocrLanguage": 8,
"pageRanges": "1,2,3-5"
}
Required parameters
enableAiLayout
: Whether to enable AI layout analysis (0: not enabled; 1: enabled). Default 1.
isContainImg
: Whether to include images during conversion (0: not enabled; 1: enabled). Default 1.
isContainAnnot
: Whether to include annotations during conversion (0: not enabled; 1: enabled). Default 1.
enableOcr
: Whether to use OCR (0: Disable; 1: Enable). Default is 0.
ocrLanguage
: OCR recognition language. 1: CHINESE; 2: CHINESE_TRA; 3: ENGLISH; 4: KOREAN; 5: JAPANESE; 6: LATIN; 7: DEVANAGARI; 8: AUTO. Default is 8.
pageRanges
: Pages to convert, numbered from 1, for example "1,2,3-5". Default is empty.
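The numeric ocrLanguage codes are hard to read in scripts. As an illustration, the lookup below maps the documented language names to their codes and uses the result in a PDF to RTF parameter object with OCR turned on. Both function names are hypothetical, and the remaining keys use the documented defaults.

```shell
# Hypothetical helpers (not part of ComPDFKit Processor).

# Prints the documented ocrLanguage code for a language name.
ocr_language_code() {
  case "$1" in
    CHINESE) echo 1 ;;
    CHINESE_TRA) echo 2 ;;
    ENGLISH) echo 3 ;;
    KOREAN) echo 4 ;;
    JAPANESE) echo 5 ;;
    LATIN) echo 6 ;;
    DEVANAGARI) echo 7 ;;
    AUTO) echo 8 ;;
    *) echo "unknown OCR language: $1" >&2; return 1 ;;
  esac
}

# Prints a PDF to RTF parameter object with OCR enabled.
# $1 pageRanges, $2 language name (default AUTO)
rtf_ocr_params() {
  local pages="$1" code
  code=$(ocr_language_code "${2:-AUTO}") || return 1
  printf '{"enableAiLayout":1,"isContainImg":1,"isContainAnnot":1,'
  printf '"enableOcr":1,"ocrLanguage":%s,"pageRanges":"%s"}\n' "$code" "$pages"
}
```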
PDF to Image
Note: Different parameters can be used when uploading files for each specific function. The other steps remain consistent.
PDF to Image:
{
"pageRanges": "1,2,3-5",
"imageColorMode": "e_Color",
"imageScaling": "1.0"
}
Required parameters
pageRanges
: Pages to convert, numbered from 1, for example "1,2,3-5". Default is empty.
imageColorMode
: Color mode of the output image: e_Color, e_Gray, or e_Binary. Default is e_Color.
imageScaling
: Specifies the image scaling ratio of the image file. Default is 1.0.
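The image parameters can be assembled the same way. In this sketch the function name image_params is hypothetical, and the check on imageScaling assumes the value is a plain non-negative decimal such as 1.0 or 2; the accepted range is not stated in this section. The value is emitted as a quoted string, as in the sample above.

```shell
# Hypothetical helper (not part of ComPDFKit Processor).
# $1 pageRanges, $2 imageColorMode (default e_Color), $3 imageScaling (default 1.0)
image_params() {
  local pages="$1" mode="${2:-e_Color}" scale="${3:-1.0}"
  case "$mode" in
    e_Color|e_Gray|e_Binary) ;;
    *) echo "unknown imageColorMode: $mode" >&2; return 1 ;;
  esac
  # Assumption: scaling is written as digits with an optional fraction.
  if ! printf '%s\n' "$scale" | grep -Eq '^[0-9]+(\.[0-9]+)?$'; then
    echo "imageScaling must be a decimal number: $scale" >&2
    return 1
  fi
  printf '{"pageRanges":"%s","imageColorMode":"%s","imageScaling":"%s"}\n' \
    "$pages" "$mode" "$scale"
}
```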
PDF to CSV
Note: You can use specific parameters for each functionality when uploading files, while the other steps remain the same.
PDF to CSV:
{
"enableAiLayout": 1,
"isContainImg": 1,
"isContainAnnot": 1,
"enableOcr": 0,
"ocrLanguage": 8,
"pageRanges": "1,2,3-5",
"excelWorksheetOption": "e_ForTable"
}
Required parameters
enableAiLayout
: Whether to enable AI layout analysis (0: not enabled; 1: enabled). Default 1.
isContainImg
: Whether to include images during conversion (0: not enabled; 1: enabled). Default 1.
isContainAnnot
: Whether to include annotations during conversion (0: Disable; 1: Enable). Default is 1.
enableOcr
: Whether to use OCR (0: Disable; 1: Enable). Default is 0.
ocrLanguage
: OCR recognition language. 1: CHINESE; 2: CHINESE_TRA; 3: ENGLISH; 4: KOREAN; 5: JAPANESE; 6: LATIN; 7: DEVANAGARI; 8: AUTO. Default is 8.
pageRanges
: Pages to convert, numbered from 1, for example "1,2,3-5". Default is empty.
excelWorksheetOption
: Excel worksheet option. e_ForTable: each worksheet contains only one table; e_ForPage: each worksheet contains the tables of one PDF page; e_ForDocument: one worksheet contains the tables of the whole PDF document. Default is e_ForTable.
PDF to JSON
Note: You can use specific parameters for each functionality when uploading files, while the other steps remain the same.
PDF to JSON:
{
"enableAiLayout": 1,
"isContainImg": 1,
"isContainAnnot": 1,
"enableOcr": 0,
"ocrLanguage": 8,
"pageRanges": "1,2,3-5",
"resolveType": "EXTRACT"
}
Required parameters
enableAiLayout
: Whether to enable AI layout analysis (0: not enabled; 1: enabled). Default 1.
isContainImg
: Whether to include images during conversion (0: not enabled; 1: enabled). Default 1.
isContainAnnot
: Whether to include annotations during conversion (0: not enabled; 1: enabled). Default is 1.
enableOcr
: Whether to use OCR (0: Disable; 1: Enable). Default is 0.
ocrLanguage
: OCR recognition language. 1: CHINESE; 2: CHINESE_TRA; 3: ENGLISH; 4: KOREAN; 5: JAPANESE; 6: LATIN; 7: DEVANAGARI; 8: AUTO. Default is 8.
pageRanges
: Pages to convert, numbered from 1, for example "1,2,3-5". Default is empty.
resolveType
: Type of content to extract to JSON: TEXT, TABLE, EXTRACT, or IMAGE. Default is EXTRACT (extract all).
For an explanation of the JSON file content fields, refer to PDF Data Extraction JSON Format Description.pdf.
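To request only one kind of content, set resolveType to one of the four documented values. The helper below is a sketch with a hypothetical name, json_params. It validates the value and prints the parameter object, leaving the other keys at their documented defaults.

```shell
# Hypothetical helper (not part of ComPDFKit Processor).
# $1 pageRanges, $2 resolveType (default EXTRACT)
json_params() {
  local pages="$1" kind="${2:-EXTRACT}"
  case "$kind" in
    TEXT|TABLE|EXTRACT|IMAGE) ;;
    *) echo "unknown resolveType: $kind" >&2; return 1 ;;
  esac
  printf '{"enableAiLayout":1,"isContainImg":1,"isContainAnnot":1,'
  printf '"enableOcr":0,"ocrLanguage":8,"pageRanges":"%s",' "$pages"
  printf '"resolveType":"%s"}\n' "$kind"
}

# Usage: print one parameter object per content type.
#   for kind in TEXT TABLE IMAGE; do json_params "1,2,3-5" "$kind"; done
```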
PDF to TXT
Note: You can use specific parameters for each functionality when uploading files, while the other steps remain the same.
PDF to TXT:
{
"enableAiLayout": 1,
"enableOcr": 0,
"ocrLanguage": 8,
"pageRanges": "1,2,3-5",
"txtTableFormat": 1
}
Required parameters
enableAiLayout
: Whether to enable AI layout analysis (0: not enabled; 1: enabled). Default 1.
enableOcr
: Whether to use OCR (0: not enabled; 1: enabled). Default is 0.
ocrLanguage
: OCR recognition language. 1: CHINESE; 2: CHINESE_TRA; 3: ENGLISH; 4: KOREAN; 5: JAPANESE; 6: LATIN; 7: DEVANAGARI; 8: AUTO. Default is 8.
pageRanges
: Pages to convert, numbered from 1, for example "1,2,3-5". Default is empty.
txtTableFormat
: Whether to format tables when converting PDF to TXT (0: not enabled; 1: enabled). Default is 1.
PDF to Editable PDF
Note: Each function accepts its own special parameters when uploading files. The other steps remain the same.
PDF to Editable PDF:
{
"enableAiLayout": 1,
"isContainImg": 1,
"isContainAnnot": 1,
"enableOcr": 1,
"ocrLanguage": 8,
"pageRanges": "1,2,3-5"
}
Required parameters
enableAiLayout
: Whether to enable AI layout analysis (0: not enabled; 1: enabled). Default 1.
isContainImg
: Whether to include images during conversion (0: not enabled; 1: enabled). Default 1.
isContainAnnot
: Whether to include annotations during conversion (0: not enabled; 1: enabled). Default 1.
enableOcr
: Whether to use OCR (0: Disable; 1: Enable). Default is 1.
ocrLanguage
: OCR recognition language. 1: CHINESE; 2: CHINESE_TRA; 3: ENGLISH; 4: KOREAN; 5: JAPANESE; 6: LATIN; 7: DEVANAGARI; 8: AUTO. Default is 8.
pageRanges
: Pages to convert, numbered from 1, for example "1,2,3-5". Default is empty.