AI web scrapers
Actors to extract data from websites using AI, generate data to train AI models, feed LLM applications, RAG, or GPTs. With 🦜️🔗 LangChain support.
612 Actors
Website Content Crawler
apify/website-content-crawler
Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
68K
4.3
Google Search Results Scraper
apify/google-search-scraper
Scrape Google Search Engine Results Pages (SERPs). Select the country or language and extract organic and paid results, AI overviews, ads, queries, People Also Ask, prices, and reviews, much like a Google SERP API. Export scraped data, run the scraper via API, schedule runs, or integrate with other tools.
66K
4.3
Reddit Scraper Lite
trudax/reddit-scraper-lite
Pay-per-result, unlimited Reddit web scraper that crawls posts, comments, communities, and users without login. Limit scraping by number of posts or items and export the full dataset in multiple formats.
8.5K
3.9
RAG Web Browser
apify/rag-web-browser
Web browser for OpenAI Assistants, RAG pipelines, or AI agents, similar to a web browser in ChatGPT. It queries Google Search, scrapes the top N pages, and returns their content as Markdown for further processing by an LLM. It can also scrape individual URLs. Supports Model Context Protocol (MCP).
5.2K
4.4
SAFER FMCSA DOT Crawler
jungle_synthesizer/fmcsa-dot-crawler
Crawl the SAFER DOT.GOV database for publicly registered vehicles. Supports address, phone, email, DUNS, and other registration details. Perfect for lead generation!
115
5.0
Linkedin Posts Search Scraper | No Cookies
apimaestro/linkedin-posts-search-scraper-no-cookies
Scrape LinkedIn posts by keyword without login. Get post content, reactions, author details, and media. Sort by relevance or date. Perfect for research, analysis, and monitoring trends.
1.8K
3.8