Pricing

from $1.40 / 1,000 results

PDF Tools (Merge / Split / Compress / OCR / Watermark)

All-in-one PDF processor: merge multiple PDFs, split by page ranges, compress file size, extract text, OCR scanned documents (Tesseract), add text watermarks, rotate pages, and read metadata. Accepts PDF URLs or Key-Value Store keys.

Developer

Alex O

Last modified

2 months ago

PDF Tools

All-in-one PDF processing Actor for the Apify platform. Merge, split, compress, extract text, OCR scanned documents, add watermarks, rotate pages, and read metadata — all from simple PDF URL inputs. No coding required.

What is PDF Tools?

PDF Tools is a serverless PDF processor that runs entirely on the Apify platform. It accepts one or more direct-download PDF URLs and performs the operation you select. Results are stored in the run's dataset (structured JSON) and key-value store (binary PDF/TXT files).

Supported operations:

Merge — Combine multiple PDFs into a single document
Split — Split a PDF by page ranges (individual pages or custom groups)
Compress — Reduce file size with 3 compression levels (low / medium / high)
Extract Text — Extract embedded text from PDF pages
OCR — OCR scanned or image-based PDFs using Tesseract (6 languages pre-installed)
Watermark — Add a customizable diagonal text watermark to every page
Rotate — Rotate pages by 90°, 180°, or 270°
Metadata — Read PDF metadata (title, author, creator, dates, page sizes)
Page Count — Quick page count without full processing

What can PDF Tools be used for?

Document automation — Extract text from invoices, contracts, or reports for downstream processing
Bulk PDF processing — Compress hundreds of PDFs to reduce storage costs
Archival workflows — Add "CONFIDENTIAL" or "DRAFT" watermarks to sensitive documents
OCR pipelines — Convert scanned documents to searchable text (supports English, German, French, Spanish, Italian, Portuguese)
Page management — Split large documents into chapters or merge individual pages into one PDF
Data extraction — Read metadata and page counts from PDF files at scale
Integration with AI agents — Use as a tool in agentic workflows via Apify's MCP integration

How to use PDF Tools

Go to the PDF Tools Input tab
Select the Operation you want to perform
Add one or more PDF file URLs (direct-download links ending in .pdf)
Configure any optional settings (page ranges, compression level, watermark text, etc.)
Click Start and wait for the run to complete
Download results from the Dataset tab (JSON) or Key-Value Store tab (PDF/TXT files)

Input

The Actor accepts the following input fields. For a full technical reference, see the Input tab.

Operation (`operation`, required)

Choose which PDF operation to perform: merge, split, compress, extractText, ocr, watermark, rotate, metadata, or pageCount.

PDF file URLs (`pdfUrls`, required)

A list of direct-download URLs to PDF files. For the merge operation, the order matters — PDFs are combined in the order listed.

Page ranges (`pageRanges`, optional)

Comma-separated page ranges (1-indexed), e.g. 1-3,5,8-10. Used by split, extractText, ocr, and rotate to target specific pages. If omitted, all pages are processed.

For the split operation, use semicolons to create separate output groups: 1-2;3-4;5 produces three separate PDFs.
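The page-range syntax above can be expanded client-side, for example to sanity-check a `pageRanges` value before starting a run. This is a minimal Python sketch, not the Actor's own parser. `parse_page_ranges` and `parse_split_groups` are hypothetical helper names, and sorting and de-duplicating the pages is an assumption.

```python
def parse_page_ranges(spec: str, page_count: int) -> list[int]:
    """Expand a spec like '1-3,5,8-10' into sorted 1-indexed page numbers."""
    if not spec.strip():
        # Omitted ranges mean "all pages"
        return list(range(1, page_count + 1))
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"range {part!r} is outside pages 1-{page_count}")
        pages.update(range(start, end + 1))
    return sorted(pages)


def parse_split_groups(spec: str, page_count: int) -> list[list[int]]:
    """Turn a split spec like '1-2;3-4;5' into one page list per output PDF."""
    return [
        parse_page_ranges(group, page_count)
        for group in spec.split(";")
        if group.strip()  # ignore empty groups such as a trailing ';'
    ]


print(parse_page_ranges("1-3,5,8-10", 10))  # [1, 2, 3, 5, 8, 9, 10]
print(parse_split_groups("1-2;3-4;5", 5))   # [[1, 2], [3, 4], [5]]
```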

Compression settings (`compressionLevel`, optional)

Choose a compression level for the compress operation:

Low — Strips metadata, removes unreferenced objects (lossless)
Medium — Additionally recompresses internal streams using Flate compression
High — Aggressive: linearizes the PDF and applies maximum stream recompression
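Flate is the zlib/deflate format, so the idea behind stream recompression can be illustrated with Python's standard `zlib` module. This sketch only re-encodes one stream's bytes. The Actor's internals are not documented here, and a real tool would also have to walk the PDF object structure and update each stream's `/Length`, which this example deliberately leaves out. `recompress_flate` is a hypothetical name.

```python
import zlib


def recompress_flate(stream: bytes, level: int = 9) -> bytes:
    """Inflate a FlateDecode stream and deflate it again at the given level."""
    raw = zlib.decompress(stream)     # undo the existing Flate encoding
    return zlib.compress(raw, level)  # re-encode; level 9 favours size over speed


# A content stream that was originally written with the fastest setting (level 1)
content = b"BT /F1 12 Tf 72 720 Td (Hello, PDF) Tj ET\n" * 500
original = zlib.compress(content, 1)
recompressed = recompress_flate(original)

# Decoding either version yields the same page content; only the encoding changed
assert zlib.decompress(recompressed) == content
print(len(original), "->", len(recompressed), "bytes")
```

Flate recompression itself is lossless: only the encoding of the stream changes, never the decoded content.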

Watermark settings (optional)

Configure the text watermark for the watermark operation:

Watermark text (`watermarkText`) — The text to overlay (default: CONFIDENTIAL)
Opacity (`watermarkOpacity`) — 0.01 (barely visible) to 1.0 (fully opaque), default: 0.15
Font size (`watermarkFontSize`) — 10 to 200 points, default: 60
Angle (`watermarkAngle`) — Rotation in degrees (0–360°), default: 45°
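If you generate watermark inputs programmatically, you can check them against the documented limits first and avoid a failed run. This is a minimal sketch that assumes the JSON keys used in the watermark example in the Examples section. The `WatermarkSettings` class is hypothetical and not part of any Apify package.

```python
from dataclasses import dataclass


@dataclass
class WatermarkSettings:
    """Watermark options; defaults mirror the documented ones."""
    text: str = "CONFIDENTIAL"
    opacity: float = 0.15  # 0.01 (barely visible) to 1.0 (fully opaque)
    font_size: int = 60    # 10 to 200 points
    angle: float = 45      # 0 to 360 degrees

    def validate(self) -> None:
        # Range checks copied from the documented limits
        if not 0.01 <= self.opacity <= 1.0:
            raise ValueError("opacity must be between 0.01 and 1.0")
        if not 10 <= self.font_size <= 200:
            raise ValueError("font size must be between 10 and 200 points")
        if not 0 <= self.angle <= 360:
            raise ValueError("angle must be between 0 and 360 degrees")

    def to_input(self, pdf_urls: list[str]) -> dict:
        """Build an Actor input for the watermark operation."""
        self.validate()
        return {
            "operation": "watermark",
            "pdfUrls": pdf_urls,
            "watermarkText": self.text,
            "watermarkOpacity": self.opacity,
            "watermarkFontSize": self.font_size,
            "watermarkAngle": self.angle,
        }


draft = WatermarkSettings(text="DRAFT", opacity=0.2, font_size=72)
print(draft.to_input(["https://example.com/file.pdf"]))
```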

Rotation angle (`rotateAngle`, optional)

Clockwise rotation for the rotate operation: 90, 180, or 270 degrees.
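In the PDF format, a page's orientation is stored in its `/Rotate` entry. This is a clockwise angle that must be a multiple of 90, so a rotation tool typically adds the requested angle to any existing value rather than overwriting it. Whether PDF Tools does exactly this is an assumption. The sketch below shows only the arithmetic, and `combined_rotation` is a hypothetical helper.

```python
from typing import Union

ALLOWED_ANGLES = {90, 180, 270}


def combined_rotation(current_rotate: int, angle: Union[int, str]) -> int:
    """Return the page's new /Rotate value after an extra clockwise rotation."""
    degrees = int(angle)  # the rotate example passes the angle as a string, e.g. "180"
    if degrees not in ALLOWED_ANGLES:
        raise ValueError("rotation must be 90, 180, or 270 degrees")
    if current_rotate % 90 != 0:
        raise ValueError("an existing /Rotate value must be a multiple of 90")
    # /Rotate is a clockwise angle; keep the result in the range 0-270
    return (current_rotate + degrees) % 360


print(combined_rotation(0, "180"))  # 180
print(combined_rotation(270, 180))  # 90
```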

OCR languages (`ocrLanguages`, optional)

Tesseract language codes for the ocr operation. Pre-installed packs:

Code	Language
`eng`	English
`deu`	German
`fra`	French
`spa`	Spanish
`ita`	Italian
`por`	Portuguese

Combine multiple languages with +, e.g. eng+deu.
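Because only the six packs above are pre-installed, a client can reject other codes before starting an OCR run. This is a minimal sketch. How the Actor itself handles an unknown code is not documented here, so treating it as an error is an assumption, and `parse_ocr_languages` is a hypothetical helper.

```python
PREINSTALLED_LANGUAGES = {
    "eng": "English",
    "deu": "German",
    "fra": "French",
    "spa": "Spanish",
    "ita": "Italian",
    "por": "Portuguese",
}


def parse_ocr_languages(spec: str) -> list[str]:
    """Split a Tesseract language string like 'eng+deu' and check each code."""
    codes = [code.strip() for code in spec.split("+") if code.strip()]
    if not codes:
        raise ValueError("at least one language code is required")
    missing = [code for code in codes if code not in PREINSTALLED_LANGUAGES]
    if missing:
        raise ValueError("not pre-installed: " + ", ".join(missing))
    return codes


print(parse_ocr_languages("eng+deu"))  # ['eng', 'deu']
```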

Output file name (`outputFileName`, optional)

Base name for the output file saved to the key-value store (without extension). If omitted, a name is generated automatically based on the operation.

Output

Dataset (structured JSON)

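Every run pushes one record per processed PDF to the default dataset. The record below comes from a compress run, so in addition to the common fields it contains the compress-specific `originalSizeKb` and `reductionPercent`: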

{
    "operation": "compress",
    "inputFile": "https://example.com/file.pdf",
    "outputKey": "compressed_1.pdf",
    "pageCount": 3,
    "fileSizeKb": 41.3,
    "originalSizeKb": 48.5,
    "reductionPercent": 14.8,
    "status": "OK",
    "error": null
}

Additional fields are included depending on the operation:

Operation	Additional fields
extractText / ocr	`totalChars`, `pages` (array with per-page `text` and `charCount`)
compress	`originalSizeKb`, `reductionPercent`
watermark	`watermarkText`
rotate	`rotateAngle`
metadata	`title`, `author`, `creator`, `creationDate`, `modificationDate`, `pageSizes`
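When post-processing the dataset, the table above can determine which fields to read for each operation. The sketch below uses hypothetical helpers (`OPERATION_FIELDS`, `extra_fields`, `check_text_totals`) built only from the field names documented here. The consistency check assumes `totalChars` is the sum of the per-page `charCount` values, which holds for the extractText example on this page (2977 + 3585 = 6562).

```python
OPERATION_FIELDS = {
    "extractText": ["totalChars", "pages"],
    "ocr": ["totalChars", "pages"],
    "compress": ["originalSizeKb", "reductionPercent"],
    "watermark": ["watermarkText"],
    "rotate": ["rotateAngle"],
    "metadata": ["title", "author", "creator", "creationDate",
                 "modificationDate", "pageSizes"],
}


def extra_fields(record: dict) -> dict:
    """Return the operation-specific fields that are present in a dataset record."""
    wanted = OPERATION_FIELDS.get(record.get("operation", ""), [])
    return {name: record[name] for name in wanted if name in record}


def check_text_totals(record: dict) -> bool:
    """For extractText/ocr records, compare totalChars with the per-page counts."""
    return record["totalChars"] == sum(page["charCount"] for page in record["pages"])


record = {
    "operation": "extractText",
    "outputKey": "text_1.txt",
    "pageCount": 2,
    "totalChars": 6562,
    "pages": [
        {"page": 1, "text": "Sample PDF  Created for testing ...", "charCount": 2977},
        {"page": 2, "text": "ipsum dolor sit amet ...", "charCount": 3585},
    ],
    "status": "OK",
}
print(sorted(extra_fields(record)))  # ['pages', 'totalChars']
print(check_text_totals(record))     # True
```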

Key-Value Store (binary files)

Operations that produce new PDFs (merge, split, compress, watermark, rotate) save the resulting files to the default key-value store. Text extraction and OCR save .txt files.

Access output files via the Apify API:

https://api.apify.com/v2/key-value-stores/{storeId}/records/{outputKey}
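You can fill in this template programmatically. The following is a minimal sketch with a hypothetical `record_url` helper, where `STORE_ID` is a placeholder for your run's key-value store ID and `compressed_1.pdf` is the `outputKey` from the sample record. Percent-encoding the path segments is a defensive choice, not a documented requirement.

```python
from urllib.parse import quote

API_BASE = "https://api.apify.com/v2"


def record_url(store_id: str, output_key: str) -> str:
    """Build the API URL of one key-value store record (e.g. a result PDF or TXT)."""
    # Percent-encode both path segments so unusual characters stay valid in the URL
    store = quote(store_id, safe="")
    key = quote(output_key, safe="")
    return f"{API_BASE}/key-value-stores/{store}/records/{key}"


print(record_url("STORE_ID", "compressed_1.pdf"))
```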

Examples

Count pages

{
    "operation": "pageCount",
    "pdfUrls": [
        "https://ontheline.trincoll.edu/images/bookdown/sample-local-pdf.pdf"
    ]
}

Result:

{
    "operation": "pageCount",
    "inputFile": "https://ontheline.trincoll.edu/images/bookdown/sample-local-pdf.pdf",
    "pageCount": 3,
    "fileSizeKb": 48.5,
    "status": "OK",
    "error": null
}

Extract text from specific pages

{
    "operation": "extractText",
    "pdfUrls": [
        "https://ontheline.trincoll.edu/images/bookdown/sample-local-pdf.pdf"
    ],
    "pageRanges": "1-2"
}

Result:

{
    "operation": "extractText",
    "outputKey": "text_1.txt",
    "pageCount": 2,
    "totalChars": 6562,
    "pages": [
        { "page": 1, "text": "Sample PDF  Created for testing ...", "charCount": 2977 },
        { "page": 2, "text": "ipsum dolor sit amet ...", "charCount": 3585 }
    ],
    "status": "OK"
}

Compress a PDF

{
    "operation": "compress",
    "pdfUrls": [
        "https://ontheline.trincoll.edu/images/bookdown/sample-local-pdf.pdf"
    ],
    "compressionLevel": "high"
}

Result: 48.5 KB → 41.3 KB (14.8% reduction)
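The reported reduction matches the usual percentage-saved formula, (original - compressed) / original * 100, rounded to one decimal place. That the Actor computes it exactly this way is an assumption. The hypothetical helper below simply reproduces the figure.

```python
def reduction_percent(original_kb: float, compressed_kb: float) -> float:
    """Share of the original size saved by compression, in percent (1 decimal)."""
    if original_kb <= 0:
        raise ValueError("original size must be positive")
    return round((original_kb - compressed_kb) / original_kb * 100, 1)


print(reduction_percent(48.5, 41.3))  # 14.8
```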

Add a watermark

{
    "operation": "watermark",
    "pdfUrls": [
        "https://ontheline.trincoll.edu/images/bookdown/sample-local-pdf.pdf"
    ],
    "watermarkText": "DRAFT",
    "watermarkOpacity": 0.2,
    "watermarkFontSize": 72,
    "watermarkAngle": 45
}

Split into custom groups

Use semicolons to define output groups:

{
    "operation": "split",
    "pdfUrls": [
        "https://ontheline.trincoll.edu/images/bookdown/sample-local-pdf.pdf"
    ],
    "pageRanges": "1-2;3"
}

Result: Two output PDFs — pages 1-2 (42 KB) and page 3 (26.8 KB).

Read metadata

{
    "operation": "metadata",
    "pdfUrls": [
        "https://ontheline.trincoll.edu/images/bookdown/sample-local-pdf.pdf"
    ]
}

Result:

{
    "pageCount": 3,
    "fileSizeKb": 48.5,
    "title": "Sample PDF",
    "pageSizes": [
        { "page": 1, "widthPt": 612, "heightPt": 792 },
        { "page": 2, "widthPt": 612, "heightPt": 792 },
        { "page": 3, "widthPt": 612, "heightPt": 792 }
    ],
    "operation": "metadata",
    "status": "OK"
}
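The `pageSizes` values are in PDF points (1/72 inch), so 612 x 792 pt is 8.5 x 11 in, i.e. US Letter. Below is a small hypothetical helper for converting and naming sizes. The size table and the 1-point tolerance are illustrative choices, not something the Actor reports.

```python
POINTS_PER_INCH = 72  # PDF user-space units default to 1/72 inch

# Common paper sizes in points (portrait); A4 is 210 x 297 mm
PAPER_SIZES_PT = {
    "US Letter": (612.0, 792.0),
    "A4": (595.28, 841.89),
}


def page_size_inches(width_pt: float, height_pt: float) -> tuple:
    """Convert a page size from PDF points to inches."""
    return width_pt / POINTS_PER_INCH, height_pt / POINTS_PER_INCH


def paper_name(width_pt: float, height_pt: float, tolerance: float = 1.0):
    """Name a standard paper size, ignoring orientation; None if nothing matches."""
    shorter, longer = sorted((width_pt, height_pt))
    for name, (w, h) in PAPER_SIZES_PT.items():
        if abs(shorter - w) <= tolerance and abs(longer - h) <= tolerance:
            return name
    return None


for size in [{"page": 1, "widthPt": 612, "heightPt": 792}]:
    print(size["page"], page_size_inches(size["widthPt"], size["heightPt"]),
          paper_name(size["widthPt"], size["heightPt"]))
# 1 (8.5, 11.0) US Letter
```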

Merge two PDFs

{
    "operation": "merge",
    "pdfUrls": [
        "https://example.com/first.pdf",
        "https://example.com/second.pdf"
    ],
    "outputFileName": "combined_report"
}

Rotate pages

{
    "operation": "rotate",
    "pdfUrls": [
        "https://ontheline.trincoll.edu/images/bookdown/sample-local-pdf.pdf"
    ],
    "rotateAngle": "180",
    "pageRanges": "1"
}

OCR a scanned PDF

{
    "operation": "ocr",
    "pdfUrls": [
        "https://example.com/scanned_document.pdf"
    ],
    "ocrLanguages": "eng+deu"
}

Using PDF Tools with the Apify API

The Apify API gives you programmatic access to PDF Tools. You can start runs, retrieve results, and integrate the Actor into your automation workflows.

To access the API using Python, use the apify-client PyPI package. To access the API using JavaScript, use the apify-client NPM package.

Start a run via REST API:

curl -X POST \
  "https://api.apify.com/v2/acts/mrkrokko~pdf-tools/runs?waitForFinish=60" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "operation": "extractText",
    "pdfUrls": ["https://ontheline.trincoll.edu/images/bookdown/sample-local-pdf.pdf"]
  }'
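You can make the same request from Python using only the standard library. This is a minimal sketch rather than the official client: the apify-client package mentioned above wraps these calls for you, and `build_run_request` and `start_run` are hypothetical helper names. The run object in the response's `data` field carries `status`, `defaultDatasetId`, and `defaultKeyValueStoreId`, which the retrieval steps below need.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "mrkrokko~pdf-tools"  # taken from the curl example above


def build_run_request(token: str, run_input: dict, wait_secs: int = 60) -> urllib.request.Request:
    """Prepare (without sending) the POST request that starts a PDF Tools run."""
    # The Apify API caps waitForFinish at 60 seconds
    query = urllib.parse.urlencode({"waitForFinish": min(wait_secs, 60)})
    return urllib.request.Request(
        f"{API_BASE}/acts/{ACTOR_ID}/runs?{query}",
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def start_run(token: str, run_input: dict) -> dict:
    """Send the request and return the run object (the response's "data" field)."""
    with urllib.request.urlopen(build_run_request(token, run_input)) as response:
        return json.load(response)["data"]


# Usage (needs a real token and network access):
# run = start_run("YOUR_API_TOKEN", {"operation": "pageCount",
#                                   "pdfUrls": ["https://example.com/file.pdf"]})
# print(run["status"], run["defaultDatasetId"])
```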

Retrieve dataset results (`DATASET_ID` is the run's `defaultDatasetId`):

curl "https://api.apify.com/v2/datasets/{DATASET_ID}/items?format=json" \
  -H "Authorization: Bearer YOUR_API_TOKEN"

Download output files from the key-value store (`STORE_ID` is the run's `defaultKeyValueStoreId`; `outputKey` comes from the dataset record):

curl "https://api.apify.com/v2/key-value-stores/{STORE_ID}/records/{outputKey}" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -o output.pdf
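In code, the IDs for these two calls come straight from the run object returned when the run was started. The sketch below uses hypothetical helper names. The run fields (`status`, `defaultDatasetId`, `defaultKeyValueStoreId`) are standard Apify run-object fields. Treating any record whose `status` is not `"OK"` as failed is an assumption based on the sample records on this page. A run can still be `RUNNING` when `waitForFinish` expires; in that case, poll it via `GET /v2/actor-runs/{runId}` before reading results.

```python
API_BASE = "https://api.apify.com/v2"


def result_locations(run: dict) -> dict:
    """Derive where a finished run stored its results."""
    if run.get("status") != "SUCCEEDED":
        # The run may still be RUNNING (or may have failed); do not read results yet
        raise RuntimeError(f"run has not finished successfully: {run.get('status')}")
    return {
        "datasetItemsUrl": f"{API_BASE}/datasets/{run['defaultDatasetId']}/items?format=json",
        "storeId": run["defaultKeyValueStoreId"],
    }


def failed_records(items: list) -> list:
    """Dataset records whose status is anything other than "OK"."""
    return [item for item in items if item.get("status") != "OK"]


run = {"status": "SUCCEEDED", "defaultDatasetId": "DATASET_ID",
       "defaultKeyValueStoreId": "STORE_ID"}
print(result_locations(run)["datasetItemsUrl"])
```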

For full API documentation, see the API tab or the Apify API reference.

Integrations

PDF Tools can be integrated with almost any cloud service or web app through Apify's built-in integrations:

Make (Integromat) — Trigger PDF processing as part of automated workflows
Zapier — Connect with 5,000+ apps
Google Drive / Sheets — Store results automatically
Webhooks — Get notified when a run completes
MCP — Use as a tool in AI agent workflows via Apify MCP

Similar Actors

Pdf Power Tools

agenscrape/pdf-power-tools

Split, merge, compress, convert & OCR PDFs via API. Extract text from scanned documents in 14 languages. Compress files for email, convert pages to PNG/JPEG/WebP, split by pages or ranges, merge multiple PDFs. Perfect for document automation & data extraction workflows.

Agenscrape

PDF OCR Tool — Scanned PDF Text Extraction

junipr/pdf-ocr-tool

Run OCR on scanned PDFs and image-based documents. Extract text by page with language options, confidence scores, and searchable text exports.

junipr

PDF Text Extractor - Bulk PDF to Text & Metadata

santamaria-automations/pdf-extractor

Extract text and metadata from any PDF URL in bulk. Get page content, author, title, creation date, and more. Detects scanned PDFs that need OCR. Perfect for document analysis, research, and compliance.

NanoScrape

PDF Extract — Text, Tables & Metadata (OCR-ready)

sathvic_kollu/techtenstein-pdf-extract

Extract clean text, structured tables, and metadata from any PDF URL. Supports OCR for scanned documents. Ideal for building document pipelines, financial data extraction, invoice processing, and research automation.

Techtenstein Services Private Limited

Fast Pdf Processor

contemporary_fruit/pdf-processor-actor

This API is a PDF Processing Service allowing users to upload a PDF to: Extract Text: Reads all text from the PDF and returns it as structured JSON data per page. Merge Pages: Creates a new PDF containing only the specific pages selected by the user.

Andric

PDF Tools MCP Server

zekovdev/pdf-mcp-apify

Merge, split, rotate, watermark, extract text, delete pages, reorder, and set metadata on PDFs. 11 tools via MCP. Fully local processing — zero external APIs, your PDFs stay private. Works with Claude, Cursor, VS Code, ChatGPT.

Zek

PDF Scraper

onidivo/pdf-scraper

Scrape and extract text from PDF links.

Onidivo Technologies

PDF to Markdown Converter

web.harvester/pdf-to-markdown-converter

Convert PDFs to clean Markdown with optional OCR for scanned documents. Uses PDF.js for text extraction and Tesseract.js for optical character recognition.

Web Harvester

PDF OCR API - Document Extraction

alizarin_refrigerator-owner/pdf-ocr-api

Extract text from PDFs including scanned documents. OCR processing, table extraction & structured data output. Process invoices, contracts & forms at scale.