PDF to HTML Converter

Deprecated

Pricing

Pay per usage

Developed by

Jan Čurn

Maintained by Community

Converts a PDF document to HTML using the pdf2htmlEX tool.

7

Monthly users

2

Runs succeeded

>99%

Last modified

2 years ago

Dockerfile

FROM debian:jessie

RUN apt-get update --fix-missing \
 && DEBIAN_FRONTEND=noninteractive apt-get -y upgrade \
 && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends curl ca-certificates pdf2htmlex \
 && curl -sL https://deb.nodesource.com/setup_10.x | bash - \
 && apt-get install -y nodejs \
 && node -v \
 && rm -rf /var/lib/apt/lists/*

RUN mkdir -p /pdf/kv-store-dev

WORKDIR /pdf

# Copy the Actor source file and the package manifest into the Docker image
COPY main.js package.json ./

# Install NPM packages, skipping optional and development dependencies to keep the image small,
# keep the install log quiet, and print the dependency tree
RUN npm install --quiet --only=prod --no-optional \
 && npm list \
 && pwd \
 && ls -l

# Define the start command
CMD [ "node", "main.js" ]

INPUT_SCHEMA.json

{
    "title": "PDF to HTML input",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "url": {
            "title": "URL",
            "type": "string",
            "description": "URL that links to a PDF file",
            "editor": "textfield",
            "prefill": "https://apify.com/ext/ycf_application.pdf"
        }
    },
    "required": ["url"]
}
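
The schema above only requires `url` to be a string. A stricter check could look roughly like the sketch below. The `validateInput` helper is hypothetical and not part of this Actor; it assumes that only `http:` and `https:` links should be accepted, which the schema itself does not state.

```javascript
// Hypothetical helper, not part of the Actor's main.js.
// Assumption: only http(s) links are acceptable as PDF sources.
function validateInput(input) {
    if (!input || typeof input.url !== 'string' || input.url.trim() === '') {
        throw new Error('Received invalid input: "url" must be a non-empty string');
    }

    let parsed;
    try {
        // The WHATWG URL constructor throws on strings it cannot parse.
        parsed = new URL(input.url);
    } catch (err) {
        throw new Error(`Received invalid input: "${input.url}" is not a valid URL`);
    }

    if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
        throw new Error(`Received invalid input: unsupported protocol "${parsed.protocol}"`);
    }

    // Return the normalized URL string for the download step.
    return parsed.href;
}
```

In `main.js`, such a helper could replace the one-line `input.url` check, so that a malformed link fails before any download is attempted.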

main.js

const fs = require('fs');
const util = require('util');
const exec = util.promisify(require('child_process').exec);
const Apify = require('apify');
const requestPromise = require('request-promise');

Apify.main(async () => {
    // Fetch the input and check that it has a valid format.
    // Validating the input is optional, but it's good practice.
    const input = await Apify.getValue('INPUT');
    if (!input || !input.url) throw new Error('Received invalid input');

    console.log(`Downloading PDF file: ${input.url}`);
    const options = {
        url: input.url,
        encoding: null // set to `null` to receive the body as binary data
    };
    const response = await requestPromise(options);
    const buffer = Buffer.from(response);

    const tmpTarget = 'temp.pdf';
    console.log(`Saving PDF file to: ${tmpTarget}`);
    fs.writeFileSync(tmpTarget, buffer);

    // pdf2htmlEX writes temp.html next to temp.pdf in the working directory
    const { stdout, stderr } = await exec('pdf2htmlEX --zoom 1.3 temp.pdf');
    console.log('stdout:', stdout);
    console.log('stderr:', stderr);

    const htmlBuffer = fs.readFileSync('temp.html');

    console.log(`Saving HTML (size: ${htmlBuffer.length} bytes) to output...`);
    await Apify.setValue('OUTPUT', htmlBuffer, { contentType: 'text/html' });

    const storeId = process.env.APIFY_DEFAULT_KEY_VALUE_STORE_ID;

    // NOTE: The disableRedirect=1 param is added because Chrome doesn't allow pasting
    // URLs that redirect into the browser address bar
    console.log('HTML file has been stored to:');
    console.log(`https://api.apify.com/v2/key-value-stores/${storeId}/records/OUTPUT?disableRedirect=1`);
});
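
The conversion step in `main.js` passes a fixed command string to `exec`, which runs it through a shell. If the file name or zoom ever became configurable, passing the arguments as an array to `execFile` would avoid shell quoting issues. The sketch below illustrates that idea. The helpers `buildPdf2HtmlArgs`, `expectedHtmlName`, and `convertPdf` are hypothetical and not part of this Actor, and the sketch assumes, as `main.js` does, that pdf2htmlEX writes `<name>.html` into the current working directory.

```javascript
const path = require('path');
const { execFile } = require('child_process');
const { promisify } = require('util');

const execFileAsync = promisify(execFile);

// Hypothetical helper: builds the argument list equivalent to
// `pdf2htmlEX --zoom 1.3 temp.pdf` without going through a shell.
function buildPdf2HtmlArgs(pdfPath, zoom = 1.3) {
    if (!Number.isFinite(zoom) || zoom <= 0) {
        throw new Error('zoom must be a positive number');
    }
    return ['--zoom', String(zoom), pdfPath];
}

// Hypothetical helper: name of the HTML file expected after conversion.
// Assumption: the output keeps the input's base name and lands in the
// working directory, which matches how main.js reads temp.html.
function expectedHtmlName(pdfPath) {
    return `${path.parse(pdfPath).name}.html`;
}

// Runs the conversion and resolves with the expected output file name.
// Requires the pdf2htmlEX binary to be installed, as in the Dockerfile.
async function convertPdf(pdfPath, zoom) {
    const { stdout, stderr } = await execFileAsync('pdf2htmlEX', buildPdf2HtmlArgs(pdfPath, zoom));
    console.log('stdout:', stdout);
    console.log('stderr:', stderr);
    return expectedHtmlName(pdfPath);
}
```

Calling `await convertPdf('temp.pdf')` inside `Apify.main` would then stand in for the `exec` call, with the returned name passed to `fs.readFileSync`.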

package.json

{
  "name": "act-pdf-to-html",
  "version": "0.0.1",
  "private": true,
  "dependencies": {
    "apify": "^0.15.2",
    "request-promise": "^4.2.4"
  },
  "devDependencies": {},
  "scripts": {
    "test-local": "APIFY_DEV_KEY_VALUE_STORE_DIR=./kv-store-dev/ node main.js"
  }
}

Pricing

Pricing model

Pay per usage

This Actor is paid per platform usage: the Actor itself is free to use, and you pay only for the Apify platform usage it consumes.