PDF Extractor 2.0

7-day trial, then $30.00/month. No credit card required now.

jupri/pdf-extractor-2-0

Extract PDF document contents, including metadata, images, pages, tables, attachments, etc.

Welcome to PDF Extractor

๐Ÿ‚ About PDF Format

Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems.[2][3] Based on the PostScript language, each PDF file encapsulates a complete description of a fixed-layout flat document, including the text, fonts, vector graphics, raster images, and other information needed to display it. PDF has its roots in "The Camelot Project", initiated by Adobe co-founder John Warnock in 1991.[4] PDF was standardized as ISO 32000 in 2008.[5] The latest edition, ISO 32000-2:2020, was published in December 2020.

๐Ÿ‚ About This Actor

๐Ÿ’ซ Extract contents from PDF documents

Features:

  • Extract PDF pages as text or images (SVG, PNG, JPEG).
  • Extract PDF metadata.
  • Extract the PDF table of contents.
  • Extract PDF tables.
  • Extract from encrypted (password-protected) PDFs.
  • Extract embedded images.
  • Extract attachments.
  • Extract from multiple PDF URLs in a single run.

๐Ÿ‚ Tutorial

Input Parameters

Name         Type                  Description
url          Array [String]        List of PDF document URLs
content      String                Output format for pages (text, svg, png, jpg)
images       Boolean (true/false)  Extract embedded images
attachments  Boolean (true/false)  Extract embedded files
tables       Boolean (true/false)  Extract tables

Note: All extracted resources other than text are saved to the default key-value store.
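As an illustration of the parameters above, the following Python sketch builds an input object and shows one way it might be submitted. The helper names `build_run_input` and `run_extractor` are hypothetical, and the default values (`content="text"`, all flags off) are assumptions, since this page does not state the Actor's defaults. `run_extractor` assumes the third-party `apify-client` package and a valid API token; it is not called here.

```python
import json
from typing import Any, Dict, List, Sequence

# Page formats listed in the Input Parameters table.
ALLOWED_CONTENT = {"text", "svg", "png", "jpg"}


def build_run_input(
    urls: Sequence[str],
    content: str = "text",      # assumed default, not documented on this page
    images: bool = False,       # assumed default
    attachments: bool = False,  # assumed default
    tables: bool = False,       # assumed default
) -> Dict[str, Any]:
    """Build an input object using the field names from the table above."""
    if not urls:
        raise ValueError("at least one PDF URL is required")
    if content not in ALLOWED_CONTENT:
        raise ValueError(f"content must be one of {sorted(ALLOWED_CONTENT)}")
    return {
        "url": list(urls),
        "content": content,
        "images": bool(images),
        "attachments": bool(attachments),
        "tables": bool(tables),
    }


def run_extractor(token: str, run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Hypothetical helper: run the Actor and return its dataset items."""
    # Imported lazily so the rest of this sketch runs without the package.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor("jupri/pdf-extractor-2-0").call(run_input=run_input)
    if run is None:
        raise RuntimeError("the Actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


run_input = build_run_input(
    ["https://www.w3.org/WAI/WCAG21/working-examples/pdf-table/table.pdf"],
    tables=True,
)
print(json.dumps(run_input, indent=2))
```

Validating `content` locally is a design choice of this sketch; it catches a typo before a run is started, but the Actor itself remains the authority on which values it accepts.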

Dataset Output Format:

[
  # URL-1: Metadata
  { "metadata": { "headers": { ... }, "url": "...", "mime": "..." } },
  # URL-1: Page Contents
  { "index": 0, "content": "...page-0 contents...", "images": [...], "tables": [...] },
  { "index": 1, "content": "...page-1 contents...", "images": [...], "tables": [...] },
  ...
  # URL-2: Metadata
  { "metadata": { "headers": { ... }, "url": "...", "mime": "..." } },
  # URL-2: Page Contents
  { "index": 0, "content": "...page-0 contents...", "images": [...], "tables": [...] },
  { "index": 1, "content": "...page-1 contents...", "images": [...], "tables": [...] },
  ...
]
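Because the dataset is a flat list in which each metadata item is followed by that document's page items, a consumer usually needs to regroup it per document. The sketch below shows one way to do that in Python. The function name `group_by_document` and the sample items (including the `example.com` URLs) are made up for illustration; only the `metadata`, `index`, `content`, `images`, and `tables` keys come from the format above.

```python
from typing import Any, Dict, Iterable, List


def group_by_document(items: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Regroup a flat dataset into one entry per PDF.

    Assumes every metadata item precedes the page items of its document,
    as in the output format shown above.
    """
    documents: List[Dict[str, Any]] = []
    for item in items:
        if "metadata" in item:
            # A metadata item starts a new document.
            documents.append({"metadata": item["metadata"], "pages": []})
        elif "index" in item:
            if not documents:
                raise ValueError("page item found before any metadata item")
            # A page item belongs to the most recently started document.
            documents[-1]["pages"].append(item)
    return documents


# Made-up items that follow the documented shape.
sample_items = [
    {"metadata": {"headers": {}, "url": "https://example.com/a.pdf", "mime": "application/pdf"}},
    {"index": 0, "content": "first page of a", "images": [], "tables": []},
    {"index": 1, "content": "second page of a", "images": [], "tables": []},
    {"metadata": {"headers": {}, "url": "https://example.com/b.pdf", "mime": "application/pdf"}},
    {"index": 0, "content": "first page of b", "images": [], "tables": []},
]

documents = group_by_document(sample_items)
for doc in documents:
    print(doc["metadata"]["url"], len(doc["pages"]))
```

Items that carry neither a `metadata` nor an `index` key are skipped silently in this sketch; a stricter consumer might prefer to raise on them.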

๐Ÿ‚ Output Samples

PDF Sample #1

URL: https://www.w3.org/WAI/WCAG21/working-examples/pdf-table/table.pdf

{

}

PDF Sample #2

URL: https://apify.com/img/web-scraping/beginners-guide-to-web-scraping.pdf

{

}

โœ๏ธ Support

โšก๏ธ Feel free to reach out to the developer for any issues or suggestions for improvement.

Developer
Maintained by Community

Actor Metrics

  • 5 monthly users
  • 2 stars
  • >99% runs succeeded
  • Created in Oct 2023
  • Modified 13 days ago