PDF Extractor 2.0

Pricing

$30.00/month + usage

Developed by

cat

Maintained by Community

💫 Extract PDF document contents, including metadata, images, pages, tables, attachments, and more.

Total users: 94

Monthly users: 9

Runs succeeded: >99%

Last modified: a month ago

Welcome to PDF Extractor

๐Ÿ‚ About PDF Format

Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems.[2][3] Based on the PostScript language, each PDF file encapsulates a complete description of a fixed-layout flat document, including the text, fonts, vector graphics, raster images, and other information needed to display it. PDF has its roots in "The Camelot Project", initiated by Adobe co-founder John Warnock in 1991.[4] PDF was standardized as ISO 32000 in 2008.[5] The latest edition, ISO 32000-2:2020, was published in December 2020.

๐Ÿ‚ About This Actor

💫 Extract contents from PDF documents

Features :

  • โญ Extract PDF pages as Text or Image (SVG, PNG, JPEG).
  • โญ Extract PDF Metadata.
  • โญ Extract PDF Table of Contents
  • โญ Extract PDF Tables
  • โญ Extract Encrypted PDF (password protected)
  • โญ Extract Embedded images.
  • โญ Extract Attachments.
  • โญ Extract multiple URL files

๐Ÿ‚ Tutorial

Input Parameters

| Name | Type | Description |
| --- | --- | --- |
| url | Array [String] | List of PDF document URLs |
| content | String | Output format for pages (text, svg, png, jpg) |
| images | Boolean (true/false) | Extract embedded images |
| attachments | Boolean (true/false) | Extract embedded files |
| tables | Boolean (true/false) | Extract tables |

Note : All extracted resources other than text are saved to the default key-value store.
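
As an illustration of the input parameters above, the following minimal sketch assembles an input object in Python. The field names and the allowed `content` values come from the parameter table; the helper name `build_input`, its default values, and the rule that at least one URL is required are assumptions made for this example, not documented behavior of the Actor.

```python
from typing import Any

# Output formats listed for the `content` parameter in the table above.
ALLOWED_CONTENT = {"text", "svg", "png", "jpg"}


def build_input(
    urls: list[str],
    content: str = "text",
    images: bool = False,
    attachments: bool = False,
    tables: bool = False,
) -> dict[str, Any]:
    """Assemble an input object using the field names from the parameter table.

    Hypothetical helper: the defaults and the non-empty URL check are
    assumptions for illustration, not documented Actor behavior.
    """
    if not urls:
        raise ValueError("at least one PDF URL is required")
    if content not in ALLOWED_CONTENT:
        raise ValueError(f"content must be one of {sorted(ALLOWED_CONTENT)}")
    return {
        "url": list(urls),
        "content": content,
        "images": images,
        "attachments": attachments,
        "tables": tables,
    }


# Request plain-text pages plus table extraction for one of the sample PDFs.
run_input = build_input(
    ["https://www.w3.org/WAI/WCAG21/working-examples/pdf-table/table.pdf"],
    tables=True,
)
```

The resulting dictionary could then be supplied as the run input, for example through the Apify Console or an API client. The Actor ID is not stated on this page, so the call itself is left out.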

Dataset Output Format :

[
# URL-1: Metadata
{ "metadata": { "headers": { ... }, "url": "...", "mime": "..." } },
# URL-1: Page Contents
{ "index": 0, "content": "...page-0 contents...", "images": [...], "tables": [...] },
{ "index": 1, "content": "...page-1 contents...", "images": [...], "tables": [...] },
...
# URL-2: Metadata
{ "metadata": { "headers": { ... }, "url": "...", "mime": "..." } },
# URL-2: Page Contents
{ "index": 0, "content": "...page-0 contents...", "images": [...], "tables": [...] },
{ "index": 1, "content": "...page-1 contents...", "images": [...], "tables": [...] },
...
]
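
Because the dataset is a flat list in which each metadata item is followed by that document's page items, it can be convenient to regroup the items per source PDF. The sketch below shows one way to do that, assuming the items arrive in exactly the order shown above. The function name `group_by_document` and the sample data are hypothetical.

```python
from typing import Any


def group_by_document(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Regroup flat dataset items into one record per source PDF."""
    documents: list[dict[str, Any]] = []
    for item in items:
        if "metadata" in item:
            # A metadata item marks the start of a new document.
            documents.append({"metadata": item["metadata"], "pages": []})
        elif "index" in item:
            # A page item belongs to the most recently seen document.
            if not documents:
                raise ValueError("page item appeared before any metadata item")
            documents[-1]["pages"].append(item)
    return documents


# Hand-written stand-in for real dataset items, shaped like the layout above.
sample = [
    {"metadata": {"headers": {}, "url": "https://example.com/a.pdf", "mime": "application/pdf"}},
    {"index": 0, "content": "first page", "images": [], "tables": []},
    {"index": 1, "content": "second page", "images": [], "tables": []},
    {"metadata": {"headers": {}, "url": "https://example.com/b.pdf", "mime": "application/pdf"}},
    {"index": 0, "content": "only page", "images": [], "tables": []},
]

for doc in group_by_document(sample):
    # Prints each source URL with its page count.
    print(doc["metadata"]["url"], len(doc["pages"]))
```

Grouping by position rather than by a shared key is a deliberate choice here: in the layout shown, page items carry no URL field, so their order relative to the metadata items is the only link back to the source document.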

๐Ÿ‚ Output Samples

PDF Sample #1

URL : https://www.w3.org/WAI/WCAG21/working-examples/pdf-table/table.pdf

{
}

PDF Sample #2

URL : https://apify.com/img/web-scraping/beginners-guide-to-web-scraping.pdf

{
}

โœ๏ธ Support

โšก๏ธ Feel free to reach out to the developer for any issues or suggestions for improvement.