
PDF Text Extractor

jirimoravcik/pdf-text-extractor


PDF Text Extractor extracts text from PDF files. It can also split the text into chunks to prepare the data for use with large language models.

You can access PDF Text Extractor programmatically from your own applications through the Apify API. To use the API, you need an Apify account and your API token, which you can find under Integrations in Apify Console. The example below uses curl.

# Set API token
API_TOKEN=<YOUR_API_TOKEN>

# Prepare Actor input
cat > input.json << 'EOF'
{
  "urls": [
    "https://arxiv.org/pdf/2307.12856"
  ]
}
EOF

# Run the Actor using an HTTP API
# See the full API reference at https://docs.apify.com/api/v2
curl "https://api.apify.com/v2/acts/jirimoravcik~pdf-text-extractor/runs?token=$API_TOKEN" \
  -X POST \
  -d @input.json \
  -H 'Content-Type: application/json'

PDF Text Extractor API

Below is a list of the relevant HTTP API endpoints for calling the PDF Text Extractor Actor. You need an Apify account to use them. Replace <YOUR_API_TOKEN> in the URLs with your Apify API token, which you can find under Integrations in Apify Console. For details, see the API reference.

Run Actor

POST

https://api.apify.com/v2/acts/jirimoravcik~pdf-text-extractor/runs?token=<YOUR_API_TOKEN>

Note: If you add the method=POST query parameter, you can call this API endpoint with a GET request, which makes it usable in third-party webhooks. For details, see the Run Actor API documentation.
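As a rough sketch of the note above, the following snippet builds a Run Actor URL that carries method=POST so that a caller limited to GET requests, such as a third-party webhook, can still start a run. The function name run_url_for_get is hypothetical; only the endpoint and the method=POST parameter come from this page.

```shell
# Hypothetical helper: build a Run Actor URL that can be called with GET.
# The endpoint and the method=POST parameter are taken from this page.
run_url_for_get() {
  printf 'https://api.apify.com/v2/acts/jirimoravcik~pdf-text-extractor/runs?token=%s&method=POST\n' "$1"
}

# Usage (needs a real token, so it is left commented out):
#   curl "$(run_url_for_get "$API_TOKEN")"
```

Keeping the URL construction in one function makes it easy to paste the resulting URL into a webhook configuration without repeating the query string by hand.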

Run Actor synchronously and get dataset items

POST

https://api.apify.com/v2/acts/jirimoravcik~pdf-text-extractor/run-sync-get-dataset-items?token=<YOUR_API_TOKEN>

Note: This endpoint supports both POST and GET requests, but only POST lets you pass input data. For more information, see the Run Actor synchronously and get dataset items API documentation.
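A minimal sketch of a synchronous call, assuming an input.json file like the one in the curl example earlier on this page. The helper names sync_items_url and run_sync are hypothetical; the endpoint URL is the one listed above.

```shell
# Hypothetical helper: build the synchronous-run URL listed on this page.
sync_items_url() {
  printf 'https://api.apify.com/v2/acts/jirimoravcik~pdf-text-extractor/run-sync-get-dataset-items?token=%s\n' "$1"
}

# Hypothetical helper: POST an input file and print the returned dataset items.
# Only POST can carry input, so the JSON file is sent as the request body.
#   $1 = API token, $2 = path to the input JSON file
run_sync() {
  curl --fail --silent --show-error \
    -X POST \
    -H 'Content-Type: application/json' \
    -d @"$2" \
    "$(sync_items_url "$1")"
}

# Usage (needs a real token, so it is left commented out):
#   run_sync "$API_TOKEN" input.json > items.json
```

The --fail flag makes curl exit with a non-zero status on HTTP errors, which is convenient when the call is part of a larger script.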

Get Actor

GET

https://api.apify.com/v2/acts/jirimoravcik~pdf-text-extractor?token=<YOUR_API_TOKEN>

For more information, see the Get Actor API documentation.

Actors can be used to scrape web pages, extract data, or automate browser tasks. You can call PDF Text Extractor programmatically via the Apify API.

You can choose from:

PDF Text Extractor API in Python

PDF Text Extractor API in JavaScript

PDF Text Extractor API through CLI

You can start PDF Text Extractor with the Apify API by sending an HTTP POST request to the Run Actor endpoint. Pass the Actor's input as the payload of the POST request, declare its content type in the Content-Type header, and specify additional options with URL query parameters. Within the API, PDF Text Extractor is identified by its ID, which combines the creator's username and the name of the Actor (jirimoravcik~pdf-text-extractor).
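The sketch below shows how the Actor ID and query parameters fit into a Run Actor URL. The helper names actor_api_id and run_actor_url are hypothetical. The timeout (seconds) and memory (megabytes) parameters are not mentioned on this page; they are assumed from the Apify API reference, so check them there before relying on them.

```shell
# Hypothetical helper: turn "username/actor-name" into the "username~actor-name"
# form that appears in the endpoint URLs on this page.
actor_api_id() {
  printf '%s\n' "$1" | tr '/' '~'
}

# Hypothetical helper: build a Run Actor URL with optional run settings.
# timeout (seconds) and memory (megabytes) are assumptions taken from the
# Apify API reference, not from this page.
#   $1 = Actor name as "username/actor-name", $2 = API token,
#   $3 = timeout in seconds, $4 = memory in megabytes
run_actor_url() {
  printf 'https://api.apify.com/v2/acts/%s/runs?token=%s&timeout=%s&memory=%s\n' \
    "$(actor_api_id "$1")" "$2" "$3" "$4"
}

# Print an example URL with a placeholder token.
run_actor_url "jirimoravcik/pdf-text-extractor" "<YOUR_API_TOKEN>" 120 1024
```

The printed URL can be used in place of the plain Run Actor URL in a curl POST request.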

When the PDF Text Extractor run finishes, you can list the data from its default dataset (storage) via the API or preview it directly in Apify Console.
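One possible way to list the data, sketched under assumptions: the Run Actor response is taken to contain a data.defaultDatasetId field, and dataset items are taken to be available at /v2/datasets/<id>/items, as described in the Apify API reference. The helper names are hypothetical and the sample response is made up for illustration, not real output.

```shell
# Hypothetical helper: pull defaultDatasetId out of a run response on stdin.
# A sed pattern is used to avoid extra tools; it assumes the key and its
# value sit on the same line of the JSON.
extract_dataset_id() {
  sed -n 's/.*"defaultDatasetId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: build the dataset items URL (path assumed from the
# Apify API reference).
#   $1 = dataset ID, $2 = API token
dataset_items_url() {
  printf 'https://api.apify.com/v2/datasets/%s/items?token=%s\n' "$1" "$2"
}

# Illustrative, made-up run response (not real output).
sample_response='{"data":{"id":"RUN_ID","defaultDatasetId":"DATASET_ID"}}'

dataset_id="$(printf '%s\n' "$sample_response" | extract_dataset_id)"
dataset_items_url "$dataset_id" "<YOUR_API_TOKEN>"
```

In a real script, the sample response would be replaced by the output of the Run Actor request, and the printed URL would be fetched with curl once the run has finished.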

Developer

Jiří Moravčík

Actor Metrics

43 monthly users
19 stars
>99% runs succeeded
Created in Oct 2023
Modified 4 months ago

Categories

Integrations

Automation

PDF Scraper

onidivo/pdf-scraper

Scrape and extract text from PDF links.

Onidivo Technologies

235

Website Content Crawler

apify/website-content-crawler

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

Apify

31.7k

852

Google Search Results Scraper

apify/google-search-scraper

Scrape Google Search Engine Results Pages (SERPs). Select the country or language and extract organic and paid results, AI overviews, ads, queries, People Also Ask, prices, reviews, like a Google SERP API. Export scraped data, run the scraper via API, schedule runs, or integrate with other tools.

Apify

52.1k

293

Youtube Video Downloader

epctex/youtube-video-downloader

Effortlessly download YouTube videos of your preferred quality with our user-friendly Video Downloader. Try it now!

epctex

600

Tiktok Shop Scraper

excavator/tiktok-shop-scraper

This Actor crawls data from TikTok Shop product URLs, for example https://shop.tiktok.com/view/product/XXXXXXXXXX. These URLs are only available for TikTok Shop US. You can test it here: https://apify.com/excavator/tiktok-shop-product

Excavator

Reddit Scraper Lite

trudax/reddit-scraper-lite

Pay Per Result, unlimited Reddit web scraper to crawl posts, comments, communities, and users without login. Limit web scraping by number of posts or items and extract all data in a dataset in multiple formats.

Gustavo Rudiger

4.3k

Download HTML from URLs

mtrunkat/url-list-download-html

This actor takes a list of URLs and downloads HTML of each page.

Marek Trunkát

8.3k

🔥 LinkedIn Jobs Scraper

bebity/linkedin-jobs-scraper

ℹ️ Designed for both personal and professional use, simply enter your desired job title and location to receive a tailored list of job opportunities. Try it today!

Bebity

4.9k

121

Rightmove Scraper

dhrumil/rightmove-scraper

Scrape rightmove.co.uk to crawl millions of sale and rental real estate listings from the United Kingdom. The scraper also lets you monitor specific listings for updates and new listings. You can provide multiple search result pages to scrape or monitor.