PDF Extractor 2.0
7-day trial, then $30.00/month - No credit card required now
Extract PDF Document Contents including Metadata, Images, Pages, Tables, Attachments, etc.
Welcome to PDF Extractor
About PDF Format
Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems.[2][3] Based on the PostScript language, each PDF file encapsulates a complete description of a fixed-layout flat document, including the text, fonts, vector graphics, raster images, and other information needed to display it. PDF has its roots in "The Camelot Project", initiated by Adobe co-founder John Warnock in 1991.[4] PDF was standardized as ISO 32000 in 2008.[5] The latest edition, ISO 32000-2:2020, was published in December 2020.
About This Actor
Extract contents from PDF documents.
Features:
- Extract PDF pages as text or image (SVG, PNG, JPEG).
- Extract PDF metadata.
- Extract PDF table of contents.
- Extract PDF tables.
- Extract encrypted (password-protected) PDFs.
- Extract embedded images.
- Extract attachments.
- Extract files from multiple URLs.
Tutorial
Input Parameters
Name | Type | Description |
---|---|---|
url | Array [String] | List of PDF document URLs |
content | String | Output format for pages (text, svg, png, jpg) |
images | Boolean (true/false) | Extract embedded images |
attachments | Boolean (true/false) | Extract embedded files |
tables | Boolean (true/false) | Extract tables |
Note: All extracted resources other than text are saved to the default Key-Value storage.
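As an illustration of the parameters above, here is a minimal sketch of how an input object could be assembled and checked before a run. The field names and the allowed `content` values come from the table. The helper name `build_input` and the default values are assumptions made for this example and are not part of the actor.

```python
import json

# Allowed values for "content", taken from the Input Parameters table.
ALLOWED_CONTENT = {"text", "svg", "png", "jpg"}


def build_input(urls, content="text", images=False, attachments=False, tables=False):
    """Return an input dict using the field names from the table.

    build_input is a hypothetical helper; the defaults are assumptions.
    """
    # Accept a single URL string for convenience and wrap it in a list.
    if isinstance(urls, str):
        urls = [urls]
    urls = list(urls)
    if not urls:
        raise ValueError("at least one PDF URL is required")
    if content not in ALLOWED_CONTENT:
        raise ValueError(f"content must be one of {sorted(ALLOWED_CONTENT)}")
    return {
        "url": urls,
        "content": content,
        "images": bool(images),
        "attachments": bool(attachments),
        "tables": bool(tables),
    }


if __name__ == "__main__":
    payload = build_input(
        ["https://www.w3.org/WAI/WCAG21/working-examples/pdf-table/table.pdf"],
        tables=True,
    )
    # Print the JSON that could be pasted into the actor's input editor.
    print(json.dumps(payload, indent=2))
```

Validating `content` locally catches a mistyped format before a run is started.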
Dataset Output Format:

    [
      # URL-1: Metadata
      { "metadata": { "headers": { ... }, "url": "...", "mime": "..." } },
      # URL-1: Page Contents
      { "index": 0, "content": "...page-0 contents...", "images": [...], "tables": [...] },
      { "index": 1, "content": "...page-1 contents...", "images": [...], "tables": [...] },
      ...
      # URL-2: Metadata
      { "metadata": { "headers": { ... }, "url": "...", "mime": "..." } },
      # URL-2: Page Contents
      { "index": 0, "content": "...page-0 contents...", "images": [...], "tables": [...] },
      { "index": 1, "content": "...page-1 contents...", "images": [...], "tables": [...] },
      ...
    ]
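Because the dataset is a flat list in which each metadata record is followed by that document's page records, a consumer usually needs to regroup the items per document. The sketch below shows one way to do that, assuming the record shapes shown in the output format. The function name `group_by_document` and the sample values are hypothetical.

```python
def group_by_document(items):
    """Group a flat dataset list into one entry per PDF document.

    Assumes a record with a "metadata" key starts a new document and
    records with an "index" key are pages of the most recent document.
    """
    documents = []
    for item in items:
        if "metadata" in item:
            # A metadata record opens a new document.
            documents.append({"metadata": item["metadata"], "pages": []})
        elif "index" in item:
            if not documents:
                raise ValueError("page record found before any metadata record")
            # Attach the page to the document opened most recently.
            documents[-1]["pages"].append(item)
    return documents


if __name__ == "__main__":
    # Made-up sample data that mirrors the documented shape.
    sample = [
        {"metadata": {"headers": {}, "url": "https://example.com/a.pdf", "mime": "application/pdf"}},
        {"index": 0, "content": "first page of a", "images": [], "tables": []},
        {"index": 1, "content": "second page of a", "images": [], "tables": []},
        {"metadata": {"headers": {}, "url": "https://example.com/b.pdf", "mime": "application/pdf"}},
        {"index": 0, "content": "first page of b", "images": [], "tables": []},
    ]
    for doc in group_by_document(sample):
        print(doc["metadata"]["url"], len(doc["pages"]))
```

Grouping on the `metadata` key rather than on `index == 0` keeps the logic valid for documents that yield no page records.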
Output Samples
PDF Sample #1
URL : https://www.w3.org/WAI/WCAG21/working-examples/pdf-table/table.pdf
PDF Sample #2
URL : https://apify.com/img/web-scraping/beginners-guide-to-web-scraping.pdf
Support
Feel free to reach out to the developer for any issues or suggestions for improvement.
Actor Metrics
5 monthly users
2 stars
>99% runs succeeded
Created in Oct 2023
Modified 13 days ago