PDF Scraper
1 day trial then $20.00/month - No credit card required now
Scrape PDF files from their URLs and extract the text they contain.
Features
- Scrape multiple files
- Save the file and extracted text to the key-value store
- Want more? Let us know on the Actor's "Issues" tab
Cost of usage
When the Actor runs with 2048 MB of memory and datacenter proxies, average consumption is $4-8 per 1,000 medium-sized files.
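As a rough illustration of that figure, the sketch below scales the quoted $4-8 per 1,000 files to another batch size. Linear scaling is an assumption, and the function name `estimate_cost_usd` is hypothetical. Actual consumption depends on file size, memory, and proxy type.

```python
def estimate_cost_usd(file_count: int,
                      low_per_1000: float = 4.0,
                      high_per_1000: float = 8.0) -> tuple[float, float]:
    """Return a (low, high) cost estimate in USD, assuming linear scaling."""
    if file_count < 0:
        raise ValueError("file_count must be non-negative")
    scale = file_count / 1000
    return (low_per_1000 * scale, high_per_1000 * scale)


if __name__ == "__main__":
    low, high = estimate_cost_usd(2500)
    print(f"Estimated cost for 2500 files: ${low:.2f}-${high:.2f}")
```

Treat the result as a planning range only; a run with large or scanned PDFs may fall outside it.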
Bugs, issues, features, and feedback
You can report issues, request features, and leave feedback on the Actor's "Issues" tab.
Input
You can provide input either through the editor on the Apify platform or as a JSON object.
The only mandatory field you need to provide is the PDF URLs (pdfUrls).
An example of minimal input:
    {
      "pdfUrl": [
        {
          "url": "http://www.pdf995.com/samples/pdf.pdf"
        }
      ],
      "proxyConfiguration": {
        "useApifyProxy": true
      }
    }
We recommend using proxies if you need to avoid blocking and detection.
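If you assemble the input in code rather than in the editor, a minimal sketch could look like this. It mirrors the field names in the example above (`url`, `proxyConfiguration`, `useApifyProxy`). The helper name `build_input` is hypothetical. The key for the URL list is a parameter because the text names the field `pdfUrls` while the example uses `pdfUrl`, so check the Actor's input schema before relying on either. The sketch only builds the JSON object and does not call the Apify API.

```python
import json


def build_input(urls, use_apify_proxy=True, urls_key="pdfUrl"):
    """Build an input dict shaped like the minimal example above."""
    if not urls:
        raise ValueError("at least one PDF URL is required")
    return {
        # One {"url": ...} object per PDF, as in the documented example.
        urls_key: [{"url": u} for u in urls],
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }


if __name__ == "__main__":
    payload = build_input(["http://www.pdf995.com/samples/pdf.pdf"])
    print(json.dumps(payload, indent=2))
```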
Output
The extracted text is saved to the dataset, and it looks like this:
    [
      {
        "pdfUrl": "http://www.pdf995.com/samples/pdf.pdf",
        "extractedText": "\n\n\n\n\n\n\n\n\nThe pdf995 suite of products - Pdf995, PdfEdit995, and Signature995 - is a complete solution for your document publishing needs. It provides ease of use, flexibility in format, and industry-standard security- and all at no cost to you.\nPdf995 makes it easy and affordable to create professional-quality documents in the popular PDF file format. Its easy-to-use interface helps you to create PDF files by simply selecting the \"print\" command from any application, creating documents which can be viewed on any computer with a PDF viewer. Pdf995 supports network file saving, fast user switching on XP, Citrix/Terminal Server, custom page sizes and large format printing. Pdf995 is a printer...",
        "extractedTextFileUrl": ""
      }
    ]
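Once you have the dataset items, for example from a JSON export, you may want to tidy the text before using it. The sketch below assumes items shaped like the sample above (`pdfUrl`, `extractedText`). The helper name `clean_items` and the whitespace trimming are illustrative choices, not part of the Actor.

```python
import json


def clean_items(items):
    """Map each PDF URL to its extracted text with surrounding whitespace removed."""
    cleaned = {}
    for item in items:
        # Treat a missing or null extractedText as an empty string.
        text = item.get("extractedText") or ""
        cleaned[item["pdfUrl"]] = text.strip()
    return cleaned


if __name__ == "__main__":
    raw = (
        '[{"pdfUrl": "http://www.pdf995.com/samples/pdf.pdf", '
        '"extractedText": "\\n\\n\\nThe pdf995 suite of products", '
        '"extractedTextFileUrl": ""}]'
    )
    print(clean_items(json.loads(raw)))
```

Keying the result by URL makes it easy to look up the text for a given file, at the cost of keeping only one entry if the same URL appears twice.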
Actor Metrics
Created in Apr 2023