Pricing

$8.00 / 1,000 results

Go to Apify Store

AI Web Scraper - Powered by Crawl4AI

Try for free

Developed by

Raizen Technology

A blazing-fast AI web scraper powered by Crawl4AI. Perfect for LLMs, AI agents, AI automation, model training, sentiment analysis, and content generation. Supports deep crawling, multiple extraction strategies and flexible output (Markdown/JSON). Seamlessly integrates with Make.com, n8n, and Zapier.

1.0 (1)

Pricing

$8.00 / 1,000 results

Last modified

4 months ago

Agents

Automation

You can access the AI Web Scraper - Powered by Crawl4AI programmatically from your own applications by using the Apify API. You can also choose the language preference from below. To use the Apify API, you’ll need an Apify account and your API token, found in Integrations settings in Apify Console.

Python

JavaScript

CLI

OpenAPI

HTTP

MCP

1import { ApifyClient } from 'apify-client';
2
3// Initialize the ApifyClient with your Apify API token
4// Replace the '<YOUR_API_TOKEN>' with your token
5const client = new ApifyClient({
6    token: '<YOUR_API_TOKEN>',
7});
8
9// Prepare Actor input
10const input = {
11    "startUrls": [
12        {
13            "url": "https://www.cnbc.com/2025/03/12/googles-deepmind-says-it-will-use-ai-models-to-power-physical-robots.html"
14        }
15    ],
16    "browserConfig": {
17        "browser_type": "chromium",
18        "headless": true,
19        "verbose_logging": false,
20        "ignore_https_errors": true,
21        "user_agent": "random",
22        "proxy": "",
23        "viewport_width": 1280,
24        "viewport_height": 720,
25        "accept_downloads": false,
26        "extra_headers": {}
27    },
28    "crawlerConfig": {
29        "cache_mode": "BYPASS",
30        "page_timeout": 20000,
31        "simulate_user": true,
32        "override_navigator": true,
33        "magic": true,
34        "remove_overlay_elements": true,
35        "delay_before_return_html": 0.75,
36        "wait_for": "",
37        "screenshot": false,
38        "pdf": false,
39        "enable_rate_limiting": false,
40        "memory_threshold_percent": 90,
41        "word_count_threshold": 200,
42        "css_selector": "",
43        "excluded_tags": [],
44        "excluded_selector": "",
45        "only_text": false,
46        "prettify": false,
47        "keep_data_attributes": false,
48        "remove_forms": false,
49        "bypass_cache": false,
50        "disable_cache": false,
51        "no_cache_read": false,
52        "no_cache_write": false,
53        "wait_until": "domcontentloaded",
54        "wait_for_images": false,
55        "check_robots_txt": false,
56        "mean_delay": 0.1,
57        "max_range": 0.3,
58        "js_code": "",
59        "js_only": false,
60        "ignore_body_visibility": true,
61        "scan_full_page": false,
62        "scroll_delay": 0.2,
63        "process_iframes": false,
64        "adjust_viewport_to_content": false,
65        "screenshot_wait_for": 0,
66        "screenshot_height_threshold": 20000,
67        "image_description_min_word_threshold": 50,
68        "image_score_threshold": 3,
69        "exclude_external_images": false,
70        "exclude_social_media_domains": [],
71        "exclude_external_links": false,
72        "exclude_social_media_links": false,
73        "exclude_domains": [],
74        "verbose": true,
75        "log_console": false,
76        "stream": false
77    },
78    "deepCrawlConfig": {
79        "max_pages": 100,
80        "max_depth": 3,
81        "include_external": false,
82        "score_threshold": 0.5,
83        "filter_chain": [],
84        "keywords": [
85            "crawl",
86            "example",
87            "async",
88            "configuration"
89        ],
90        "weight": 0.7
91    },
92    "markdownConfig": {
93        "ignore_links": false,
94        "ignore_images": false,
95        "escape_html": true,
96        "skip_internal_links": false,
97        "include_sup_sub": false,
98        "citations": false,
99        "body_width": 80,
100        "fit_markdown": false
101    },
102    "contentFilterConfig": {
103        "type": "pruning",
104        "user_query": "",
105        "threshold": 0.45,
106        "min_word_threshold": 5,
107        "bm25_threshold": 1.2,
108        "apply_llm_filter": false,
109        "semantic_filter": "",
110        "word_count_threshold": 10,
111        "sim_threshold": 0.3,
112        "max_dist": 0.2,
113        "top_k": 3,
114        "linkage_method": "ward"
115    },
116    "userAgentConfig": {
117        "user_agent_mode": "random",
118        "device_type": "desktop",
119        "browser_type": "chrome",
120        "num_browsers": 1
121    },
122    "llmConfig": {
123        "provider": "groq/deepseek-r1-distill-llama-70b",
124        "api_token": "",
125        "instruction": "Summarize content in clean markdown.",
126        "base_url": "",
127        "chunk_token_threshold": 2048,
128        "apply_chunking": true,
129        "input_format": "markdown",
130        "temperature": 0.7,
131        "max_tokens": 4096
132    },
133    "extractionSchema": {
134        "name": "Custom Extraction",
135        "baseSelector": "div.article",
136        "fields": [
137            {
138                "name": "title",
139                "selector": "h1",
140                "type": "text"
141            },
142            {
143                "name": "link",
144                "selector": "a",
145                "type": "attribute",
146                "attribute": "href"
147            }
148        ]
149    }
150};
151
152// Run the Actor and wait for it to finish
153const run = await client.actor("raizen/ai-web-scraper").call(input);
154
155// Fetch and print Actor results from the run's dataset (if any)
156console.log('Results from dataset');
157console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
158const { items } = await client.dataset(run.defaultDatasetId).listItems();
159items.forEach((item) => {
160    console.dir(item);
161});
162
163// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

AI Web Scraper - Crawl4AI for LLMs, AI Agents & Automation API in JavaScript

The Apify API client for JavaScript is the official library that allows you to use AI Web Scraper - Powered by Crawl4AI API in JavaScript or TypeScript, providing convenience functions and automatic retries on errors.

Install the apify-client

$npm install apify-client

Other API clients include:

AI Web Scraper - Powered by Crawl4AI API in Python

AI Web Scraper - Powered by Crawl4AI API through CLI

AI Web Scraper - Powered by Crawl4AI OpenAPI definition

AI Web Scraper - Powered by Crawl4AI API

Universal AI GPT Scraper

louisdeconinck/ai-gpt-scraper

Transform any website into structured data with AI-powered extraction. This versatile tool combines advanced web scraping with intelligent content analysis to deliver clean, customized JSON output - perfect for automating data collection from any web source.

Louis Deconinck

104

5.0

Smartcontext AI Web Crawler

bluelightco/smartcontext-ai-crawler

Scrape any website and extract structured data using AI-powered instructions. Provide URLs and a natural language prompt to get tailored JSON outputs.

Bluelight

5.0

RAG Web Browser

apify/rag-web-browser

Web browser for OpenAI Assistants, RAG pipelines, or AI agents, similar to a web browser in ChatGPT. It queries Google Search, scrapes the top N pages, and returns their content as Markdown for further processing by an LLM. It can also scrape individual URLs. Supports Model Context Protocol (MCP).

Apify

5.3K

4.4

Smart Scrape AI

llayaa112/smart-scrape-ai

Smart Scrape AI is an autonomous web automation and scraping actor powered by Playwright and AI. It dynamically interprets prompts, navigates websites, performs tasks, extracts data, and provides intelligent answers. Ideal for zero-code, prompt-driven data extraction and interaction workflows.

laya albshlawy

Website Content to Markdown for LLM Training

easyapi/website-content-to-markdown-for-llm-training

🚀 Transform web content into clean, LLM-ready Markdown! 📘 Scrape multiple pages, extract main content, and convert to Markdown format. Perfect for AI researchers, data scientists, and LLM developers. Fast, efficient, and customizable. Supercharge your AI training data today! 🌐📝🧠

EasyApi

103

5.0

AI-Powered Web Content & Link Extractor

scrapercoder/ai-powered-web-content-link-extractor

Crawls websites to extract clean, structured content for AI/LLM use, ideal for training datasets, knowledge bases, and RAG systems. Json output includes: * text: Normalized page content * links: Extracted sub-URLs

wallnut.ai

113

Ai Web Scraper - Extract Data With Ease

eloquent_mountain/ai-web-scraper-extract-data-with-ease

Ai Web Scraper enables scraping for everyone, including non-techies! It uses Google's Gemini LLM to scrape websites with natural language commands. It dynamically extracts data, no selector input needed, handles dynamic content and cookie consent, avoids bot detection, outputs JSON or other formats.

Paco

519

2.0

AI Website Content Markdown Scraper

quaking_pail/ai-website-content-markdown-scraper

This Apify Actor, "Website Content Crawler with Markdown Extraction," is designed to perform a comprehensive crawl of specified websites, extract their text content, convert it into Markdown format, and store it in a structured dataset. The extracted content is suitable for feeding LLMs.

AI_Builder

667

4.6

Ai Web Scraper - Natural language and Vision scraper

eloquent_mountain/ai-universal-web-scraper-natural-language

Powerful AI Web Scraper using Google's Gemini Vision. Specify data extraction in natural language. Supports infinite scroll, above-the-fold analysis, automatic cookie consent, pay-per-event pricing, and screenshot storage for debugging.

Paco

162

AI Vision Scraper

zscrape/ai-vision-scraper

AI Vision Scraper automates web tasks, navigating sites, solving CAPTCHAs, and extracting data on demand using a single prompt. From competitor tracking to form submissions, it streamlines workflows and automation across industries like e-commerce, sales, and recruiting.