Cheerio Scraper
apify/cheerio-scraper
Crawls websites using raw HTTP requests, parses the HTML with the Cheerio library, and extracts data from the pages using Node.js code. Supports both recursive crawling and lists of URLs. This Actor is a high-performance alternative to apify/web-scraper for websites that do not require JavaScript.
5.9k
89
Web Scraper
apify/web-scraper
Crawls arbitrary websites using the Chrome browser and extracts data from pages using JavaScript code. The Actor supports both recursive crawling and lists of URLs and automatically manages concurrency for maximum performance. This is Apify's basic tool for web crawling and scraping.
73.3k
309
Puppeteer Scraper
apify/puppeteer-scraper
Crawls websites with headless Chrome and the Puppeteer library using provided server-side Node.js code. This crawler is an alternative to apify/web-scraper that gives you finer control over the process. Supports both recursive crawling and lists of URLs. Supports logging in to websites.
4.9k
76
Legacy PhantomJS Crawler
apify/legacy-phantomjs-crawler
Replacement for the legacy Apify Crawler product with a backward-compatible interface. The Actor uses the PhantomJS headless browser to recursively crawl websites and extract data from them using front-end JavaScript code.
1.6k
21
Playwright Scraper
apify/playwright-scraper
Crawls websites with a headless Chromium, Chrome, or Firefox browser and the Playwright library using provided server-side Node.js code. Supports both recursive crawling and lists of URLs. Supports logging in to websites.
924
21
Pinecone Integration
apify/pinecone-integration
This integration transfers data from Apify Actors to Pinecone and is a good starting point for a question-answering, search, or RAG use case.
130
19
SuperScraper API
apify/super-scraper-api
Generic REST API for scraping websites: send a URL and get back HTML. This Actor is a drop-in replacement for ScrapingBee, ScrapingAnt, and ScraperAPI services. And it is open-source!
509
29
Forward Dataset to Actor or Task
valek.josef/forward-dataset-to-actor-or-task
Forwards the contents of a specified dataset to a specified field in the input of another Actor or task.
4
4
OpenSearch Integration
apify/opensearch-integration
Transfer data from Apify Actors to Amazon OpenSearch Service. This Actor is a good starting point for building question-answering systems, search functionality, or Retrieval-Augmented Generation (RAG) use cases.
3
1
Chroma Integration
apify/chroma-integration
This integration transfers data from Apify Actors to Chroma and is a good starting point for a question-answering, search, or RAG use case.
1
0
Actor Readme Generator
apify/actor-readme-generator
Generates READMEs for scrapers using ChatGPT, based on an Apify-approved template.
15
4
JSDOM Scraper
apify/jsdom-scraper
Parses the HTML using the JSDOM library, providing the same DOM API as browsers do (e.g. `window`). It is able to process client-side JavaScript without using a real browser. Performance-wise, it stands somewhere between the Cheerio Scraper and the browser scrapers.
82
4
Weaviate Integration
apify/weaviate-integration
This integration transfers data from Apify Actors to Weaviate and is a good starting point for a question-answering, search, or RAG use case.
3
1
OpenAI Vector Store Integration
jiri.spilka/openai-vector-store-integration
The Apify OpenAI Vector Store integration uploads data from Apify Actors to the OpenAI Vector Store linked to an OpenAI Assistant.
95
8
Milvus Integration
apify/milvus-integration
This integration transfers data from Apify Actors to a Milvus/Zilliz database and is a good starting point for a question-answering, search, or RAG use case.
5
1
PGVector Integration
apify/pgvector-integration
This integration transfers data from Apify Actors to a PostgreSQL database (with the PGVector extension).
7
1
Vanilla JS Scraper
mstephen190/vanilla-js-scraper
Scrape the web using familiar JavaScript methods! Crawls websites using raw HTTP requests, parses the HTML with the JSDOM package, and extracts data from the pages using Node.js code. Supports both recursive crawling and lists of URLs. This Actor is a non-jQuery alternative to Cheerio Scraper.
429
3
BeautifulSoup Scraper
apify/beautifulsoup-scraper
Crawls websites using raw HTTP requests. It parses the HTML with the BeautifulSoup library and extracts data from the pages using Python code. Supports both recursive crawling and lists of URLs. This Actor is a Python alternative to Cheerio Scraper.
751
4
Airtable Exporter
jupri/airtable-exporter
💫 Export Dataset to Airtable
93
7
Qdrant Integration
apify/qdrant-integration
Transfer data from Apify Actors to a Qdrant vector database.
21
4