PDF Tools (Merge / Split / Compress / OCR / Watermark)
Pricing
from $1.40 / 1,000 results
All-in-one PDF processor: merge multiple PDFs, split by page ranges, compress file size, extract text, OCR scanned documents (Tesseract), add text watermarks, rotate pages, and read metadata. Accepts PDF URLs or Key-Value Store keys.
Developer
Alex O
PDF Tools
All-in-one PDF processing Actor for the Apify platform. Merge, split, compress, extract text, OCR scanned documents, add watermarks, rotate pages, and read metadata — all from simple PDF URL inputs. No coding required.
What is PDF Tools?
PDF Tools is a serverless PDF processor that runs entirely on the Apify platform. It accepts one or more direct-download PDF URLs and performs the operation you select. Results are stored in the run's dataset (structured JSON) and key-value store (binary PDF/TXT files).
Supported operations:
- Merge — Combine multiple PDFs into a single document
- Split — Split a PDF by page ranges (individual pages or custom groups)
- Compress — Reduce file size with 3 compression levels (low / medium / high)
- Extract Text — Extract embedded text from PDF pages
- OCR — OCR scanned or image-based PDFs using Tesseract (6 languages pre-installed)
- Watermark — Add a customizable diagonal text watermark to every page
- Rotate — Rotate pages by 90°, 180°, or 270°
- Metadata — Read PDF metadata (title, author, creator, dates, page sizes)
- Page Count — Quick page count without full processing
What can PDF Tools be used for?
- Document automation — Extract text from invoices, contracts, or reports for downstream processing
- Bulk PDF processing — Compress hundreds of PDFs to reduce storage costs
- Archival workflows — Add "CONFIDENTIAL" or "DRAFT" watermarks to sensitive documents
- OCR pipelines — Convert scanned documents to searchable text (supports English, German, French, Spanish, Italian, Portuguese)
- Page management — Split large documents into chapters or merge individual pages into one PDF
- Data extraction — Read metadata and page counts from PDF files at scale
- Integration with AI agents — Use as a tool in agentic workflows via Apify's MCP integration
How to use PDF Tools
- Go to the PDF Tools Input tab
- Select the Operation you want to perform
- Add one or more PDF file URLs (direct-download links ending in .pdf)
- Configure any optional settings (page ranges, compression level, watermark text, etc.)
- Click Start and wait for the run to complete
- Download results from the Dataset tab (JSON) or Key-Value Store tab (PDF/TXT files)
Input
The Actor accepts the following input fields. For a full technical reference, see the Input tab.
Operation (required)
Choose which PDF operation to perform: merge, split, compress, extractText, ocr, watermark, rotate, metadata, or pageCount.
PDF file URLs (required)
A list of direct-download URLs to PDF files. For the merge operation, the order matters — PDFs are combined in the order listed.
Page ranges (optional)
Comma-separated page ranges (1-indexed), e.g. 1-3,5,8-10. Used by split, extractText, ocr, and rotate to target specific pages. If omitted, all pages are processed.
For the split operation, use semicolons to create separate output groups: 1-2;3-4;5 produces three separate PDFs.
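The range syntax above can be sketched in a few lines of Python. This is only an illustration of how such a spec might be interpreted, not the Actor's own parser; the function names `parse_page_ranges` and `parse_split_groups` are hypothetical.

```python
def parse_page_ranges(spec: str) -> list[int]:
    """Expand a spec such as "1-3,5,8-10" into 1-indexed page numbers."""
    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        pages.extend(range(start, end + 1))
    return pages


def parse_split_groups(spec: str) -> list[list[int]]:
    """Split on semicolons; each group would become one output PDF."""
    return [parse_page_ranges(group) for group in spec.split(";") if group.strip()]


print(parse_page_ranges("1-3,5,8-10"))  # [1, 2, 3, 5, 8, 9, 10]
print(parse_split_groups("1-2;3-4;5"))  # [[1, 2], [3, 4], [5]]
```

Checking a spec locally like this before starting a run can catch typos such as a reversed range (`5-3`) early.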
Compression settings (optional)
Choose a compression level for the compress operation:
- Low — Strips metadata, removes unreferenced objects (lossless)
- Medium — Additionally recompresses internal streams using Flate compression
- High — Aggressive: linearizes the PDF and applies maximum stream recompression
Watermark settings (optional)
Configure the text watermark for the watermark operation:
- Watermark text: The text to overlay (default: CONFIDENTIAL)
- Opacity: 0.01 (barely visible) to 1.0 (fully opaque), default: 0.15
- Font size: 10 to 200 points, default: 60
- Angle: Rotation in degrees (0–360°), default: 45°
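As a rough sketch, the documented ranges and defaults can be checked on the client side before a run is started. The helper `build_watermark_input` is hypothetical; the field names are taken from the watermark example in this README, and the Actor may validate its input differently.

```python
def build_watermark_input(pdf_urls, text="CONFIDENTIAL", opacity=0.15,
                          font_size=60, angle=45):
    """Return an input object for the watermark operation, or raise ValueError."""
    if not pdf_urls:
        raise ValueError("at least one PDF URL is required")
    if not 0.01 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0.01 and 1.0")
    if not 10 <= font_size <= 200:
        raise ValueError("font size must be between 10 and 200 points")
    if not 0 <= angle <= 360:
        raise ValueError("angle must be between 0 and 360 degrees")
    return {
        "operation": "watermark",
        "pdfUrls": list(pdf_urls),
        "watermarkText": text,
        "watermarkOpacity": opacity,
        "watermarkFontSize": font_size,
        "watermarkAngle": angle,
    }


# Override some defaults; the angle falls back to the documented 45 degrees.
draft_input = build_watermark_input(
    ["https://example.com/file.pdf"], text="DRAFT", opacity=0.2, font_size=72
)
print(draft_input)
```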
Rotation angle (optional)
Clockwise rotation for the rotate operation: 90, 180, or 270 degrees.
OCR languages (optional)
Tesseract language codes for the ocr operation. Pre-installed packs:
| Code | Language |
|---|---|
| eng | English |
| deu | German |
| fra | French |
| spa | Spanish |
| ita | Italian |
| por | Portuguese |
Combine multiple languages with +, e.g. eng+deu.
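A small helper can assemble the language string and reject codes that are not in the pre-installed list above. This is a sketch under the assumption that only those six packs are available; `build_ocr_languages` is a hypothetical name, not part of the Actor.

```python
# Language packs listed as pre-installed in this README.
PREINSTALLED_LANGUAGES = ("eng", "deu", "fra", "spa", "ita", "por")


def build_ocr_languages(codes):
    """Join Tesseract language codes with '+', dropping duplicates in order."""
    unique = list(dict.fromkeys(codes))
    if not unique:
        raise ValueError("at least one language code is required")
    unknown = [code for code in unique if code not in PREINSTALLED_LANGUAGES]
    if unknown:
        raise ValueError(f"language packs not pre-installed: {unknown}")
    return "+".join(unique)


print(build_ocr_languages(["eng", "deu"]))  # eng+deu
```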
Output file name (optional)
Base name for the output file saved to the key-value store (without extension). If omitted, a name is generated automatically based on the operation.
Output
Dataset (structured JSON)
Every run pushes one record per processed PDF to the default dataset. Each record includes:
{"operation": "compress","inputFile": "https://example.com/file.pdf","outputKey": "compressed_1.pdf","pageCount": 3,"fileSizeKb": 41.3,"originalSizeKb": 48.5,"reductionPercent": 14.8,"status": "OK","error": null}
Additional fields are included depending on the operation:
| Operation | Additional fields |
|---|---|
| extractText / ocr | totalChars, pages (array with per-page text and charCount) |
| compress | originalSizeKb, reductionPercent |
| watermark | watermarkText |
| rotate | rotateAngle |
| metadata | title, author, creator, creationDate, modificationDate, pageSizes |
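The compress fields in the sample record are consistent with a simple percentage calculation. The sketch below assumes that `reductionPercent` is derived from `originalSizeKb` and `fileSizeKb` and rounded to one decimal place; the README does not state the exact formula.

```python
def reduction_percent(original_kb: float, compressed_kb: float) -> float:
    """Size reduction as a percentage of the original, rounded to one decimal."""
    if original_kb <= 0:
        raise ValueError("original size must be positive")
    return round((original_kb - compressed_kb) / original_kb * 100, 1)


# Values from the sample compress record above.
print(reduction_percent(48.5, 41.3))  # 14.8
```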
Key-Value Store (binary files)
Operations that produce new PDFs (merge, split, compress, watermark, rotate) save the resulting files to the default key-value store. Text extraction and OCR save .txt files.
Access output files via the Apify API:
https://api.apify.com/v2/key-value-stores/{storeId}/records/{outputKey}
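The URL template above can be filled in programmatically. This minimal sketch uses only the Python standard library; `record_url` is a hypothetical helper, and the store ID shown is a placeholder.

```python
from urllib.parse import quote

API_BASE = "https://api.apify.com/v2"


def record_url(store_id: str, output_key: str) -> str:
    """Build the key-value store record URL for one output file."""
    # Percent-encode both parts so unusual characters in a key stay URL-safe.
    return (
        f"{API_BASE}/key-value-stores/{quote(store_id, safe='')}"
        f"/records/{quote(output_key, safe='')}"
    )


print(record_url("STORE_ID", "compressed_1.pdf"))
```

The `outputKey` value comes from the dataset record of the same run.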
Examples
Count pages
{"operation": "pageCount","pdfUrls": ["https://ontheline.trincoll.edu/images/bookdown/sample-local-pdf.pdf"]}
Result:
{"operation": "pageCount","inputFile": "https://ontheline.trincoll.edu/images/bookdown/sample-local-pdf.pdf","pageCount": 3,"fileSizeKb": 48.5,"status": "OK","error": null}
Extract text from specific pages
{"operation": "extractText","pdfUrls": ["https://ontheline.trincoll.edu/images/bookdown/sample-local-pdf.pdf"],"pageRanges": "1-2"}
Result:
{"operation": "extractText","outputKey": "text_1.txt","pageCount": 2,"totalChars": 6562,"pages": [{ "page": 1, "text": "Sample PDF Created for testing ...", "charCount": 2977 },{ "page": 2, "text": "ipsum dolor sit amet ...", "charCount": 3585 }],"status": "OK"}
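A record like the one above can be post-processed with a few lines of Python. The sketch below joins the per-page text and checks that the per-page `charCount` values add up to `totalChars`; the helper names are hypothetical, and the page text is truncated exactly as in the sample.

```python
record = {
    "operation": "extractText",
    "outputKey": "text_1.txt",
    "pageCount": 2,
    "totalChars": 6562,
    "pages": [
        {"page": 1, "text": "Sample PDF Created for testing ...", "charCount": 2977},
        {"page": 2, "text": "ipsum dolor sit amet ...", "charCount": 3585},
    ],
    "status": "OK",
}


def join_pages(rec: dict) -> str:
    """Concatenate page texts in page order, separated by blank lines."""
    ordered = sorted(rec["pages"], key=lambda page: page["page"])
    return "\n\n".join(page["text"] for page in ordered)


def char_counts_consistent(rec: dict) -> bool:
    """True when the per-page counts sum to the record's totalChars."""
    return sum(page["charCount"] for page in rec["pages"]) == rec["totalChars"]


print(char_counts_consistent(record))  # True (2977 + 3585 == 6562)
print(join_pages(record))
```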
Compress a PDF
{"operation": "compress","pdfUrls": ["https://ontheline.trincoll.edu/images/bookdown/sample-local-pdf.pdf"],"compressionLevel": "high"}
Result: 48.5 KB → 41.3 KB (14.8% reduction)
Add a watermark
{"operation": "watermark","pdfUrls": ["https://ontheline.trincoll.edu/images/bookdown/sample-local-pdf.pdf"],"watermarkText": "DRAFT","watermarkOpacity": 0.2,"watermarkFontSize": 72,"watermarkAngle": 45}
Split into custom groups
Use semicolons to define output groups:
{"operation": "split","pdfUrls": ["https://ontheline.trincoll.edu/images/bookdown/sample-local-pdf.pdf"],"pageRanges": "1-2;3"}
Result: Two output PDFs — pages 1-2 (42 KB) and page 3 (26.8 KB).
Read metadata
{"operation": "metadata","pdfUrls": ["https://ontheline.trincoll.edu/images/bookdown/sample-local-pdf.pdf"]}
Result:
{"pageCount": 3,"fileSizeKb": 48.5,"title": "Sample PDF","pageSizes": [{ "page": 1, "widthPt": 612, "heightPt": 792 },{ "page": 2, "widthPt": 612, "heightPt": 792 },{ "page": 3, "widthPt": 612, "heightPt": 792 }],"operation": "metadata","status": "OK"}
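The `pageSizes` values are in PDF points, where one point is 1/72 inch. As a short sketch, the conversion below turns the sample page size into inches; `page_size_inches` is a hypothetical helper.

```python
POINTS_PER_INCH = 72  # PDF user-space unit: 1 pt = 1/72 in


def page_size_inches(page: dict) -> tuple[float, float]:
    """Convert a pageSizes entry from points to (width, height) in inches."""
    return (page["widthPt"] / POINTS_PER_INCH, page["heightPt"] / POINTS_PER_INCH)


first_page = {"page": 1, "widthPt": 612, "heightPt": 792}
print(page_size_inches(first_page))  # (8.5, 11.0)
```

An 8.5 x 11 inch page corresponds to US Letter.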
Merge two PDFs
{"operation": "merge","pdfUrls": ["https://example.com/first.pdf","https://example.com/second.pdf"],"outputFileName": "combined_report"}
Rotate pages
{"operation": "rotate","pdfUrls": ["https://ontheline.trincoll.edu/images/bookdown/sample-local-pdf.pdf"],"rotateAngle": "180","pageRanges": "1"}
OCR a scanned PDF
{"operation": "ocr","pdfUrls": ["https://example.com/scanned_document.pdf"],"ocrLanguages": "eng+deu"}
Using PDF Tools with the Apify API
The Apify API gives you programmatic access to PDF Tools. You can start runs, retrieve results, and integrate the Actor into your automation workflows.
To access the API using Python, use the apify-client PyPI package. To access the API using JavaScript, use the apify-client NPM package.
Start a run via REST API:
curl -X POST "https://api.apify.com/v2/acts/mrkrokko~pdf-tools/runs?waitForFinish=120" -H "Authorization: Bearer YOUR_API_TOKEN" -H "Content-Type: application/json" -d '{"operation": "extractText", "pdfUrls": ["https://ontheline.trincoll.edu/images/bookdown/sample-local-pdf.pdf"]}'
Retrieve dataset results:
curl "https://api.apify.com/v2/datasets/{DATASET_ID}/items?format=json" -H "Authorization: Bearer YOUR_API_TOKEN"
Download output files from Key-Value Store:
curl "https://api.apify.com/v2/key-value-stores/{STORE_ID}/records/{outputKey}" -H "Authorization: Bearer YOUR_API_TOKEN" -o output.pdf
For full API documentation, see the API tab or the Apify API reference.
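The run-start call from the curl example can also be prepared in Python with the standard library alone. This sketch only builds the request object and does not send it; `build_run_request` is a hypothetical helper, and the token is a placeholder. The apify-client package mentioned above is the more convenient route for real integrations.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

ACTOR_ID = "mrkrokko~pdf-tools"  # Actor ID as used in the curl example


def build_run_request(token: str, run_input: dict, wait_for_finish: int = 120) -> Request:
    """Prepare (but do not send) the POST request that starts a run."""
    query = urlencode({"waitForFinish": wait_for_finish})
    url = f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs?{query}"
    return Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_run_request(
    "YOUR_API_TOKEN",
    {
        "operation": "extractText",
        "pdfUrls": ["https://ontheline.trincoll.edu/images/bookdown/sample-local-pdf.pdf"],
    },
)
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would send it; a valid API token is required for that step.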
Integrations
PDF Tools can be integrated with almost any cloud service or web app through Apify's built-in integrations:
- Make (Integromat) — Trigger PDF processing as part of automated workflows
- Zapier — Connect with 5,000+ apps
- Google Drive / Sheets — Store results automatically
- Webhooks — Get notified when a run completes
- MCP — Use as a tool in AI agent workflows via Apify MCP