PDF Extractor 2.0

Pricing: $30.00/month + usage

Developed by cat

Maintained by Community

💫 Extract PDF document contents, including metadata, images, pages, tables, attachments, and more.



Total users: 69 · Monthly users: 9 · Runs succeeded: 95% · Last modified: 5 months ago

Welcome to PDF Extractor

🍂 About PDF Format

Portable Document Format (PDF) is a file format developed by Adobe in 1992 to present documents, including text formatting and images, independently of application software, hardware, and operating systems.[2][3] Based on the PostScript language, each PDF file encapsulates a complete description of a fixed-layout flat document, including the text, fonts, vector graphics, raster images, and other information needed to display it. PDF has its roots in "The Camelot Project", initiated by Adobe co-founder John Warnock in 1991.[4] PDF was standardized as ISO 32000 in 2008.[5] The latest edition, ISO 32000-2:2020, was published in December 2020.
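Every PDF file starts with a short header of the form `%PDF-x.y` that names the format version (for example `%PDF-1.7`, or `%PDF-2.0` for ISO 32000-2). As a minimal sketch, a downloaded file can be checked for that header before extraction. The function name `pdf_version` is hypothetical and not part of this Actor, and scanning only the first 1024 bytes is an assumption based on a common reader tolerance for leading junk.

```python
import re
from typing import Optional

# Matches the PDF file header, e.g. b"%PDF-1.7" or b"%PDF-2.0".
_HEADER = re.compile(rb"%PDF-(\d\.\d)")


def pdf_version(data: bytes) -> Optional[str]:
    """Return the header version (e.g. "1.7"), or None if no PDF header is found.

    Hypothetical helper: it only inspects the header and does not
    validate the rest of the file.
    """
    # Assumption: the header appears within the first 1024 bytes.
    match = _HEADER.search(data[:1024])
    return match.group(1).decode("ascii") if match else None


if __name__ == "__main__":
    print(pdf_version(b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n"))  # 1.7
    print(pdf_version(b"<html>not a pdf</html>"))        # None
```

A header check like this is cheap, but it is only a first filter: a file can carry a valid header and still be truncated or damaged further in.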

🍂 About This Actor

💫 Extract contents from PDF documents

Features:

  • ⭐ Extract PDF pages as text or images (SVG, PNG, JPEG).
  • ⭐ Extract PDF metadata.
  • ⭐ Extract the PDF table of contents.
  • ⭐ Extract PDF tables.
  • ⭐ Extract encrypted (password-protected) PDFs.
  • ⭐ Extract embedded images.
  • ⭐ Extract attachments.
  • ⭐ Process multiple PDF files from a list of URLs.

🍂 Tutorial

Input Parameters

| Name | Type | Description |
|---|---|---|
| url | Array [String] | List of PDF document URLs |
| content | String | Output format for pages (text, svg, png, jpg) |
| images | Boolean (true/false) | Extract embedded images |
| attachments | Boolean (true/false) | Extract embedded files |
| tables | Boolean (true/false) | Extract tables |
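The input parameters can be assembled into a single input object. The sketch below does that in Python using the parameter names from the input table; the helper `build_input`, its default values, and its validation rules are hypothetical illustrations, not the Actor's documented behavior.

```python
import json
from typing import Iterable

# Page output formats accepted by the "content" parameter, per the input table.
CONTENT_FORMATS = {"text", "svg", "png", "jpg"}


def build_input(urls: Iterable[str], content: str = "text", images: bool = False,
                attachments: bool = False, tables: bool = False) -> dict:
    """Assemble an input object for the Actor (hypothetical helper).

    Defaults and validation are assumptions for illustration only.
    """
    url_list = [u for u in urls if u]  # drop empty entries
    if not url_list:
        raise ValueError("at least one PDF URL is required")
    if content not in CONTENT_FORMATS:
        raise ValueError(f"content must be one of {sorted(CONTENT_FORMATS)}")
    return {
        "url": url_list,
        "content": content,
        "images": images,
        "attachments": attachments,
        "tables": tables,
    }


if __name__ == "__main__":
    run_input = build_input(
        ["https://www.w3.org/WAI/WCAG21/working-examples/pdf-table/table.pdf"],
        tables=True,
    )
    print(json.dumps(run_input, indent=2))
```

If you start the Actor programmatically, a dict like this is what you would pass as `run_input` to `ActorClient.call()` in the `apify-client` Python package. The Actor ID is not given on this page, so you would substitute your own.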

Note: All extracted resources other than text are saved to the default key-value store.

Dataset Output Format:

[
# URL-1: Metadata
{ "metadata": { "headers": { ... }, "url": "...", "mime": "..." } },
# URL-1: Page Contents
{ "index": 0, "content": "...page-0 contents...", "images": [...], "tables": [...] },
{ "index": 1, "content": "...page-1 contents...", "images": [...], "tables": [...] },
...
# URL-2: Metadata
{ "metadata": { "headers": { ... }, "url": "...", "mime": "..." } },
# URL-2: Page Contents
{ "index": 0, "content": "...page-0 contents...", "images": [...], "tables": [...] },
{ "index": 1, "content": "...page-1 contents...", "images": [...], "tables": [...] },
...
]
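Because the dataset is a flat list in which each document's metadata item is followed by its page items, a consumer may want to regroup the items per source URL. The sketch below assumes exactly that ordering; the function `group_by_document` and the sample items are hypothetical and only mimic the layout shown in the output format.

```python
from typing import Any, Dict, Iterable, List


def group_by_document(items: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Split the flat dataset item stream into one record per source URL.

    Assumes each document's metadata item comes first, followed by its pages.
    """
    documents: List[Dict[str, Any]] = []
    for item in items:
        if "metadata" in item:
            # A metadata item starts a new document.
            documents.append({"metadata": item["metadata"], "pages": []})
        elif documents:
            # Any other item is a page of the most recent document.
            documents[-1]["pages"].append(item)
        else:
            raise ValueError("page item found before any metadata item")
    return documents


if __name__ == "__main__":
    # Hypothetical items shaped like the output format above.
    sample = [
        {"metadata": {"url": "https://example.com/a.pdf", "mime": "application/pdf"}},
        {"index": 0, "content": "page 0 text"},
        {"index": 1, "content": "page 1 text"},
        {"metadata": {"url": "https://example.com/b.pdf", "mime": "application/pdf"}},
        {"index": 0, "content": "page 0 text"},
    ]
    for doc in group_by_document(sample):
        print(doc["metadata"]["url"], len(doc["pages"]))
    # https://example.com/a.pdf 2
    # https://example.com/b.pdf 1
```

With `apify-client`, the items would typically come from `client.dataset(run["defaultDatasetId"]).iterate_items()`; grouping relies on the Actor writing items in the order shown above.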

🍂 Output Samples

PDF Sample #1

URL: https://www.w3.org/WAI/WCAG21/working-examples/pdf-table/table.pdf


PDF Sample #2

URL: https://apify.com/img/web-scraping/beginners-guide-to-web-scraping.pdf


✏️ Support

⚡️ Feel free to reach out to the developer for any issues or suggestions for improvement.