Cheerio Scraper
apify/cheerio-scraper
Crawls websites using raw HTTP requests, parses the HTML with the Cheerio library, and extracts data from the pages using Node.js code. Supports both recursive crawling and lists of URLs. This Actor is a high-performance alternative to apify/web-scraper for websites that do not require JavaScript.
5.7k
84
Web Scraper
apify/web-scraper
Crawls arbitrary websites using the Chrome browser and extracts data from pages using JavaScript code. The Actor supports both recursive crawling and lists of URLs and automatically manages concurrency for maximum performance. This is Apify's basic tool for web crawling and scraping.
72k
233
Puppeteer Scraper
apify/puppeteer-scraper
Crawls websites with headless Chrome and the Puppeteer library using provided server-side Node.js code. This crawler is an alternative to apify/web-scraper that gives you finer control over the process. Supports both recursive crawling and lists of URLs. Supports logging in to websites.
4.7k
67
Legacy PhantomJS Crawler
apify/legacy-phantomjs-crawler
Replacement for the legacy Apify Crawler product with a backward-compatible interface. The Actor uses the PhantomJS headless browser to recursively crawl websites and extract data from them using front-end JavaScript code.
1.6k
21
Playwright Scraper
apify/playwright-scraper
Crawls websites with a headless Chromium, Chrome, or Firefox browser and the Playwright library using provided server-side Node.js code. Supports both recursive crawling and lists of URLs. Supports logging in to websites.
872
17
SuperScraper API
apify/super-scraper-api
Generic REST API for scraping websites: send a URL and get back HTML. This Actor is a drop-in replacement for the ScrapingBee, ScrapingAnt, and ScraperAPI services, and it is open-source!
491
27
Pinecone Integration
apify/pinecone-integration
This integration transfers data from Apify Actors to Pinecone and is a good starting point for a question-answering, search, or RAG use case.
112
18
Airtable Exporter
jupri/airtable-exporter
💫 Export Dataset to Airtable
88
4
Vanilla JS Scraper
mstephen190/vanilla-js-scraper
Scrape the web using familiar JavaScript methods! Crawls websites using raw HTTP requests, parses the HTML with the JSDOM package, and extracts data from the pages using Node.js code. Supports both recursive crawling and lists of URLs. This Actor is a non-jQuery alternative to Cheerio Scraper.
424
3
Qdrant Integration
apify/qdrant-integration
Transfer data from Apify Actors to a Qdrant vector database.
16
3
Actor Readme Generator
apify/actor-readme-generator
Generates READMEs for scrapers using ChatGPT, based on an Apify-approved template.
14
4
JSDOM Scraper
apify/jsdom-scraper
Parses the HTML using the JSDOM library, providing the same DOM API as browsers do (e.g. `window`). It is able to process client-side JavaScript without using a real browser. Performance-wise, it stands somewhere between the Cheerio Scraper and the browser scrapers.
77
4
Forward Dataset to Actor or Task
valek.josef/forward-dataset-to-actor-or-task
Forwards the contents of a specified dataset to a specified field in the input of another Actor or task.
4
4
OpenAI Vector Store Integration
jiri.spilka/openai-vector-store-integration
The Apify OpenAI Vector Store integration uploads data from Apify Actors to an OpenAI Vector Store linked to an OpenAI Assistant.
84
7
Milvus Integration
apify/milvus-integration
This integration transfers data from Apify Actors to a Milvus/Zilliz database and is a good starting point for a question-answering, search, or RAG use case.
5
1
BeautifulSoup Scraper
apify/beautifulsoup-scraper
Crawls websites using raw HTTP requests. It parses the HTML with the BeautifulSoup library and extracts data from the pages using Python code. Supports both recursive crawling and lists of URLs. This Actor is a Python alternative to Cheerio Scraper.
733
4
OpenSearch Integration
apify/opensearch-integration
Transfer data from Apify Actors to Amazon OpenSearch Service. This Actor is a good starting point for building question-answering systems, search functionality, or Retrieval-Augmented Generation (RAG) use cases.
3
1
Chroma Integration
apify/chroma-integration
This integration transfers data from Apify Actors to Chroma and is a good starting point for a question-answering, search, or RAG use case.
1
0
PGVector Integration
apify/pgvector-integration
This integration transfers data from Apify Actors to a PostgreSQL database (with the PGVector extension).
6
1
Weaviate Integration
apify/weaviate-integration
This integration transfers data from Apify Actors to Weaviate and is a good starting point for a question-answering, search, or RAG use case.
3
1