You can access PDF Scraper programmatically from your own applications through the Apify API. To use the Apify API, you'll need an Apify account and your API token, found in the Integrations settings in Apify Console. The example below uses Python; the API is also available through JavaScript, the CLI, OpenAPI, HTTP, and MCP.
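The API token is a secret, so it is usually safer to read it from the environment than to paste it into source code. The following is a minimal sketch of that idea; the helper name load_apify_token and the APIFY_TOKEN variable name are assumptions made for illustration, not something this page prescribes.

```python
import os


def load_apify_token(env_var: str = "APIFY_TOKEN") -> str:
    """Return the API token stored in an environment variable.

    The default variable name is an assumption; use whatever name
    your own deployment sets.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Fail early with a clear message instead of sending an empty token.
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Apify API token"
        )
    return token
```

The returned string can then be passed to the client constructor in place of the "<YOUR_API_TOKEN>" placeholder.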

from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "pdfUrls": [{ "url": "http://www.pdf995.com/samples/pdf.pdf" }],
    "downloadTimeout": 90,
    "proxyConfiguration": { "useApifyProxy": True },
}

# Run the Actor and wait for it to finish
run = client.actor("onidivo/pdf-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start
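When the list of PDFs comes from another part of your application, it can help to build the Actor input with a small function instead of a literal. The sketch below reproduces the input shape from the example above (pdfUrls, downloadTimeout, proxyConfiguration); the function name build_run_input and its parameters are hypothetical, and any input fields beyond those three are not covered here.

```python
from typing import Any


def build_run_input(
    urls: list[str],
    download_timeout: int = 90,
    use_apify_proxy: bool = True,
) -> dict[str, Any]:
    """Build an input dict in the same shape as the example above."""
    # Drop empty entries and surrounding whitespace.
    cleaned = [u.strip() for u in urls if u and u.strip()]
    if not cleaned:
        raise ValueError("At least one PDF URL is required")
    # Remove duplicates while keeping the original order.
    unique = list(dict.fromkeys(cleaned))
    return {
        "pdfUrls": [{"url": u} for u in unique],
        "downloadTimeout": download_timeout,
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }
```

The result can be passed as run_input to the call shown in the example.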

PDF Scraper API in Python

The Apify API client for Python is the official library for calling the PDF Scraper API from Python. It provides convenience functions and automatically retries requests that fail with an error.
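To make "automatic retries" concrete, here is a generic retry loop with exponential backoff. It only illustrates the general technique and is not the client's actual implementation; the function name call_with_retries, the attempt count, and the delays are all assumptions chosen for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on any exception with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                # Out of attempts: let the last error propagate.
                raise
            # Wait base_delay, then 2x, 4x, ... before the next try.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Because the official client already retries on its own, a wrapper like this is mainly useful around code that does not use the client.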

Install the apify-client package:

$ pip install apify-client

Other API clients include:

PDF Scraper API in JavaScript

PDF Scraper API through CLI

PDF Scraper OpenAPI definition

PDF Scraper API
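For the plain HTTP route, a run is started by sending the Actor input as JSON in a POST request. The sketch below only builds such a request with the standard library and does not send it. It assumes the public API's run endpoint at https://api.apify.com/v2/acts/{actorId}/runs, the "username~actor-name" form of the Actor ID, and Bearer-token authorization; verify these against the API reference before relying on them. The function name build_run_request is hypothetical.

```python
import json
import urllib.request

# Assumed base URL of the public Apify API.
API_BASE = "https://api.apify.com/v2"


def build_run_request(
    actor_id: str, token: str, run_input: dict
) -> urllib.request.Request:
    """Build (but do not send) a POST request that would start an Actor run."""
    # In URLs the Actor is addressed as "username~actor-name",
    # so the slash in "username/actor-name" is replaced.
    url = f"{API_BASE}/acts/{actor_id.replace('/', '~')}/runs"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send the request; the client libraries listed above wrap this step and add polling for the run's completion.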

Extract text from PDF

akash9078/pdf-text-extractor

Efficiently extract text content from PDF files, ideal for data processing, content analysis, and automation workflows. Supports various PDF structures and outputs clean, readable text.

Akash Kumar Naik

PDF Text Extractor

jirimoravcik/pdf-text-extractor

PDF Text Extractor allows you to extract text from PDF files. It also supports chunking of the text to prepare the data for usage with large language models.

Jiří Moravčík

822

5.0

PDF Extractor 2.0

jupri/pdf-extractor-2-0

💫 Extract PDF Document Contents including Metadata, Images, Pages, Tables, Attachments, etc.

cat

135

Pdf Text Extractor Pro

dainty_screw/pdf-text-extractor-pro

PDF Text Extractor lets you quickly extract text from PDF files with high accuracy. Supports text chunking for AI, chatbots, and large language models (LLMs), making PDF-to-text conversion fast, clean, and ready for NLP or machine learning.

codemaster devops

PDF Text Extractor

sami_apify/PDF-Text-Extractor

This Actor downloads PDFs from the provided URLs, extracts their text content, and saves the extracted data to an Apify dataset. It's ideal for scraping and processing PDFs available online.

sami

HTML to PDF Converter

jancurn/url-to-pdf

Loads a web page in headless Chrome using Puppeteer and prints it to PDF. The input is a JSON object and output is a PDF file.

Jan Čurn

509

HTML to PDF converter

apify/html-to-pdf-converter

Convert HTML string to A4 PDF.

Apify

114

4.3

HTML string to PDF

mhamas/html-string-to-pdf

Convert HTML string to A4 PDF.

Matej Hamas

Markdown Converter

jindrich.bar/markdown-converter

A simple Actor for converting pdf / doc / docx files to Markdown.

Jindřich Bär

Google Scholar Search Scraper

ecomscrape/google-scholar-search-scraper

Extract comprehensive academic data from Google Scholar including research papers, citations, author information, and PDF links. Automate your literature review process with advanced scraping capabilities for researchers and academics.

ecomscrape

1.0