Pay per event
valek.josef/document-ocr
Developed by
Josef Válek
0.0 (0)
3
42
12
Last modified
2 days ago
Developer tools
Automation
m3web/image-text-extractor
Extract text from images using OCR (Optical Character Recognition) via direct URLs or uploaded JSON/CSV files. Works with multiple languages and automatically enriches your structured file with the text found inside images.
M3Web
13
sami_apify/PDF-Text-Extractor
This Actor downloads PDFs from the provided URLs, extracts their text content, and saves the extracted data to an Apify dataset. It’s ideal for scraping and processing PDFs available online.
sami
43
jirimoravcik/pdf-text-extractor
PDF Text Extractor extracts text from PDF files. It also supports chunking the text to prepare the data for use with large language models.
Jiří Moravčík
796
5.0
vancura/docling
Docling document parser & converter – Convert documents into structured data without complexity. This Actor leverages the powerful Docling library to parse and transform various document formats into clean, structured outputs ready for analysis or integration.
Václav Vančura
231
akash9078/pdf-text-extractor
Efficiently extract text content from PDF files, ideal for data processing, content analysis, and automation workflows. Supports various PDF structures and outputs clean, readable text.
Akash Kumar Naik
15
onidivo/pdf-scraper
Scrape and extract text from PDF links.
Onidivo Technologies
417
jupri/pdf-extractor-2-0
💫 Extract PDF Document Contents including Metadata, Images, Pages, Tables, Attachments, etc.
cat
131
powerful_bachelor/html-to-pdf-converter-pro
🔄 Convert web pages to high-quality PDFs with special canvas element handling! Perfect for 📄 documentation, 🖨️ printing, and 🔒 archiving. Features include batch processing and flexible page settings. Transform your web content into professional PDFs! 🚀
Powerful Bachelor
21
dainty_screw/pdf-text-extractor-pro
PDF Text Extractor lets you quickly extract text from PDF files with high accuracy. Supports text chunking for AI, chatbots, and large language models (LLMs), making PDF-to-text conversion fast, clean, and ready for NLP or machine learning.
codemaster devops
14
jancurn/url-to-pdf
Loads a web page in headless Chrome using Puppeteer and prints it to PDF. The input is a JSON object and the output is a PDF file.
Jan Čurn
500