Pricing

$8.00 / 1,000 results

AI Web Scraper - Powered by Crawl4AI

A blazing-fast AI web scraper powered by Crawl4AI. Perfect for LLMs, AI agents, AI automation, model training, sentiment analysis, and content generation. Supports deep crawling, multiple extraction strategies, and flexible output (Markdown/JSON). Integrates with Make.com, n8n, and Zapier.

Rating: 1.0 (1)

Developer: Raizen Technology

Actor stats

Total users: 354
Last modified: 5 months ago

Categories: Agents, Automation

You can access the AI Web Scraper - Powered by Crawl4AI programmatically from your own applications through the Apify API. To use the Apify API, you'll need an Apify account and your API token, which is available under Integrations settings in Apify Console. The Actor's OpenAPI definition is shown below.

{
  "openapi": "3.0.1",
  "info": {
    "version": "0.0",
    "x-build-id": "yLnHaI297EvKjolLE"
  },
  "servers": [
    {
      "url": "https://api.apify.com/v2"
    }
  ],
  "paths": {
    "/acts/raizen~ai-web-scraper/run-sync-get-dataset-items": {
      "post": {
        "operationId": "run-sync-get-dataset-items-raizen-ai-web-scraper",
        "x-openai-isConsequential": false,
        "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
        "tags": [
          "Run Actor"
        ],
        "requestBody": {
          "required": true,
          "content": {
            "application/json": {
              "schema": {
                "$ref": "#/components/schemas/inputSchema"
              }
            }
          }
        },
        "parameters": [
          {
            "name": "token",
            "in": "query",
            "required": true,
            "schema": {
              "type": "string"
            },
            "description": "Enter your Apify token here"
          }
        ],
        "responses": {
          "200": {
            "description": "OK"
          }
        }
      }
    },
    "/acts/raizen~ai-web-scraper/runs": {
      "post": {
        "operationId": "runs-sync-raizen-ai-web-scraper",
        "x-openai-isConsequential": false,
        "summary": "Executes an Actor and returns information about the initiated run in response.",
        "tags": [
          "Run Actor"
        ],
        "requestBody": {
          "required": true,
          "content": {
            "application/json": {
              "schema": {
                "$ref": "#/components/schemas/inputSchema"
              }
            }
          }
        },
        "parameters": [
          {
            "name": "token",
            "in": "query",
            "required": true,
            "schema": {
              "type": "string"
            },
            "description": "Enter your Apify token here"
          }
        ],
        "responses": {
          "200": {
            "description": "OK",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/runsResponseSchema"
                }
              }
            }
          }
        }
      }
    },
    "/acts/raizen~ai-web-scraper/run-sync": {
      "post": {
        "operationId": "run-sync-raizen-ai-web-scraper",
        "x-openai-isConsequential": false,
        "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
        "tags": [
          "Run Actor"
        ],
        "requestBody": {
          "required": true,
          "content": {
            "application/json": {
              "schema": {
                "$ref": "#/components/schemas/inputSchema"
              }
            }
          }
        },
        "parameters": [
          {
            "name": "token",
            "in": "query",
            "required": true,
            "schema": {
              "type": "string"
            },
            "description": "Enter your Apify token here"
          }
        ],
        "responses": {
          "200": {
            "description": "OK"
          }
        }
      }
    }
  },
  "components": {
    "schemas": {
      "inputSchema": {
        "type": "object",
        "required": [
          "startUrls"
        ],
        "properties": {
          "startUrls": {
            "title": "URLs to Scrape",
            "type": "array",
            "description": "List of webpages to scrape.",
            "items": {
              "type": "object",
              "required": [
                "url"
              ],
              "properties": {
                "url": {
                  "type": "string",
                  "title": "URL of a web page",
                  "format": "uri"
                }
              }
            }
          },
          "extractionStrategy": {
            "title": "Extraction Strategy",
            "enum": [
              "SimpleExtractionStrategy",
              "LLMExtractionStrategy",
              "JsonCssExtractionStrategy",
              "JsonXPathExtractionStrategy"
            ],
            "type": "string",
            "description": "Select how content is extracted.",
            "default": "SimpleExtractionStrategy"
          },
          "crawlStrategy": {
            "title": "Crawl Strategy",
            "enum": [
              "SimpleCrawlStrategy",
              "BFSDeepCrawlStrategy",
              "DFSDeepCrawlStrategy",
              "BestFirstCrawlingStrategy"
            ],
            "type": "string",
            "description": "Select how pages are crawled.",
            "default": "SimpleCrawlStrategy"
          },
          "browserConfig": {
            "title": "Browser Configuration",
            "type": "object",
            "description": "Browser settings as JSON object."
          },
          "crawlerConfig": {
            "title": "Crawler Configuration",
            "type": "object",
            "description": "Crawler settings as JSON object."
          },
          "deepCrawlConfig": {
            "title": "Deep Crawl Configuration",
            "type": "object",
            "description": "Settings for deep crawling when using BFS, DFS, or Best-First Strategies."
          },
          "markdownConfig": {
            "title": "Markdown Generator Configuration",
            "type": "object",
            "description": "Markdown settings as JSON object."
          },
          "contentFilterConfig": {
            "title": "Content Filter Configuration",
            "type": "object",
            "description": "Content filter settings as JSON object."
          },
          "userAgentConfig": {
            "title": "User Agent Configuration",
            "type": "object",
            "description": "User agent settings for browser requests."
          },
          "llmConfig": {
            "title": "LLM Configuration",
            "type": "object",
            "description": "Configure LLM usage for content extraction."
          },
          "extractionSchema": {
            "title": "Extraction Schema",
            "type": "object",
            "description": "Define custom extraction rules when using JsonCssExtractionStrategy or JsonXPathExtractionStrategy."
          },
          "session_id": {
            "title": "Session ID",
            "type": "string",
            "description": "Use a session ID to persist browser state across multiple requests.",
            "default": ""
          }
        }
      },
      "runsResponseSchema": {
        "type": "object",
        "properties": {
          "data": {
            "type": "object",
            "properties": {
              "id": {
                "type": "string"
              },
              "actId": {
                "type": "string"
              },
              "userId": {
                "type": "string"
              },
              "startedAt": {
                "type": "string",
                "format": "date-time",
                "example": "2025-01-08T00:00:00.000Z"
              },
              "finishedAt": {
                "type": "string",
                "format": "date-time",
                "example": "2025-01-08T00:00:00.000Z"
              },
              "status": {
                "type": "string",
                "example": "READY"
              },
              "meta": {
                "type": "object",
                "properties": {
                  "origin": {
                    "type": "string",
                    "example": "API"
                  },
                  "userAgent": {
                    "type": "string"
                  }
                }
              },
              "stats": {
                "type": "object",
                "properties": {
                  "inputBodyLen": {
                    "type": "integer",
                    "example": 2000
                  },
                  "rebootCount": {
                    "type": "integer",
                    "example": 0
                  },
                  "restartCount": {
                    "type": "integer",
                    "example": 0
                  },
                  "resurrectCount": {
                    "type": "integer",
                    "example": 0
                  },
                  "computeUnits": {
                    "type": "integer",
                    "example": 0
                  }
                }
              },
              "options": {
                "type": "object",
                "properties": {
                  "build": {
                    "type": "string",
                    "example": "latest"
                  },
                  "timeoutSecs": {
                    "type": "integer",
                    "example": 300
                  },
                  "memoryMbytes": {
                    "type": "integer",
                    "example": 1024
                  },
                  "diskMbytes": {
                    "type": "integer",
                    "example": 2048
                  }
                }
              },
              "buildId": {
                "type": "string"
              },
              "defaultKeyValueStoreId": {
                "type": "string"
              },
              "defaultDatasetId": {
                "type": "string"
              },
              "defaultRequestQueueId": {
                "type": "string"
              },
              "buildNumber": {
                "type": "string",
                "example": "1.0.0"
              },
              "containerUrl": {
                "type": "string"
              },
              "usage": {
                "type": "object",
                "properties": {
                  "ACTOR_COMPUTE_UNITS": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATASET_READS": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATASET_WRITES": {
                    "type": "integer",
                    "example": 0
                  },
                  "KEY_VALUE_STORE_READS": {
                    "type": "integer",
                    "example": 0
                  },
                  "KEY_VALUE_STORE_WRITES": {
                    "type": "integer",
                    "example": 1
                  },
                  "KEY_VALUE_STORE_LISTS": {
                    "type": "integer",
                    "example": 0
                  },
                  "REQUEST_QUEUE_READS": {
                    "type": "integer",
                    "example": 0
                  },
                  "REQUEST_QUEUE_WRITES": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATA_TRANSFER_INTERNAL_GBYTES": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATA_TRANSFER_EXTERNAL_GBYTES": {
                    "type": "integer",
                    "example": 0
                  },
                  "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                    "type": "integer",
                    "example": 0
                  },
                  "PROXY_SERPS": {
                    "type": "integer",
                    "example": 0
                  }
                }
              },
              "usageTotalUsd": {
                "type": "number",
                "example": 0.00005
              },
              "usageUsd": {
                "type": "object",
                "properties": {
                  "ACTOR_COMPUTE_UNITS": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATASET_READS": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATASET_WRITES": {
                    "type": "integer",
                    "example": 0
                  },
                  "KEY_VALUE_STORE_READS": {
                    "type": "integer",
                    "example": 0
                  },
                  "KEY_VALUE_STORE_WRITES": {
                    "type": "number",
                    "example": 0.00005
                  },
                  "KEY_VALUE_STORE_LISTS": {
                    "type": "integer",
                    "example": 0
                  },
                  "REQUEST_QUEUE_READS": {
                    "type": "integer",
                    "example": 0
                  },
                  "REQUEST_QUEUE_WRITES": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATA_TRANSFER_INTERNAL_GBYTES": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATA_TRANSFER_EXTERNAL_GBYTES": {
                    "type": "integer",
                    "example": 0
                  },
                  "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                    "type": "integer",
                    "example": 0
                  },
                  "PROXY_SERPS": {
                    "type": "integer",
                    "example": 0
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
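
As a rough illustration of how the definition above might be used from Python, the sketch below builds (but does not send) a POST request for the `run-sync-get-dataset-items` endpoint using only the standard library. The helper names `build_input` and `build_run_request` and the placeholder token are assumptions made for this example, not part of the Apify API; the field names and enum values are copied from `inputSchema`.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"  # from "servers" in the definition
ACTOR_ID = "raizen~ai-web-scraper"     # from the keys under "paths"

# Enum values copied from inputSchema.
EXTRACTION_STRATEGIES = {
    "SimpleExtractionStrategy",
    "LLMExtractionStrategy",
    "JsonCssExtractionStrategy",
    "JsonXPathExtractionStrategy",
}
CRAWL_STRATEGIES = {
    "SimpleCrawlStrategy",
    "BFSDeepCrawlStrategy",
    "DFSDeepCrawlStrategy",
    "BestFirstCrawlingStrategy",
}


def build_input(urls, extraction_strategy="SimpleExtractionStrategy",
                crawl_strategy="SimpleCrawlStrategy"):
    """Hypothetical helper: assemble an input object shaped like inputSchema."""
    # Assumption: an empty list is treated the same as a missing startUrls.
    if not urls:
        raise ValueError("startUrls is required and must not be empty")
    if extraction_strategy not in EXTRACTION_STRATEGIES:
        raise ValueError(f"unknown extractionStrategy: {extraction_strategy}")
    if crawl_strategy not in CRAWL_STRATEGIES:
        raise ValueError(f"unknown crawlStrategy: {crawl_strategy}")
    return {
        "startUrls": [{"url": u} for u in urls],
        "extractionStrategy": extraction_strategy,
        "crawlStrategy": crawl_strategy,
    }


def build_run_request(token, actor_input):
    """Hypothetical helper: prepare (not send) the synchronous run request."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request("YOUR_APIFY_TOKEN", build_input(["https://example.com"]))

# Sending the request starts a real, billable run, so it is left commented out.
# Assumes the endpoint returns the dataset items as JSON:
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)
```

Checking the enum values locally is optional, but it catches a mistyped strategy name before a run is started. The same input object should also work with the `/runs` and `/run-sync` paths, since all three reference `inputSchema`.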

AI Web Scraper - Crawl4AI for LLMs, AI Agents & Automation OpenAPI definition

OpenAPI is a standard for designing and describing RESTful APIs, allowing developers to define API structure, endpoints, and data formats in a machine-readable way. It simplifies API development, integration, and documentation.

OpenAPI works well with AI agents and GPTs because it standardizes how these systems interact with APIs, which makes integrations more reliable.

Because the specification is machine-readable, AI models such as GPTs can discover which endpoints exist, what input each one expects, and what it returns, rather than guessing at request formats. This speeds up development and reduces integration errors, which makes OpenAPI a core component of many AI applications.
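
To make the machine-readable point concrete, here is a small sketch of how a client or agent framework might enumerate the operations a definition exposes. The `list_operations` helper is a name invented for this example, and `SPEC_EXCERPT` is a trimmed copy of the paths from the definition above rather than the full file.

```python
# Method names an OpenAPI 3.0 path item can use as operation keys.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}

# Trimmed copy of the definition: only servers, paths, and operationIds.
SPEC_EXCERPT = {
    "openapi": "3.0.1",
    "servers": [{"url": "https://api.apify.com/v2"}],
    "paths": {
        "/acts/raizen~ai-web-scraper/run-sync-get-dataset-items": {
            "post": {"operationId": "run-sync-get-dataset-items-raizen-ai-web-scraper"}
        },
        "/acts/raizen~ai-web-scraper/runs": {
            "post": {"operationId": "runs-sync-raizen-ai-web-scraper"}
        },
        "/acts/raizen~ai-web-scraper/run-sync": {
            "post": {"operationId": "run-sync-raizen-ai-web-scraper"}
        },
    },
}


def list_operations(spec):
    """Return (METHOD, full URL, operationId) for every operation in the spec."""
    base_url = spec["servers"][0]["url"]
    operations = []
    for path, path_item in spec["paths"].items():
        for method, operation in path_item.items():
            # Skip path-level keys that are not operations, such as "parameters".
            if method in HTTP_METHODS:
                operations.append(
                    (method.upper(), base_url + path, operation.get("operationId"))
                )
    return operations


for method, url, operation_id in list_operations(SPEC_EXCERPT):
    print(method, url, operation_id)
```

An agent given the full definition could go one step further and read each operation's `requestBody` schema to learn which input fields it accepts.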

You can download the OpenAPI definition for AI Web Scraper - Powered by Crawl4AI below:

OpenAPI.json

If you’d like to learn more about how OpenAPI powers GPTs, read our blog post.

You can also check out our other API clients:

AI Web Scraper - Powered by Crawl4AI API in Python

AI Web Scraper - Powered by Crawl4AI API in JavaScript

AI Web Scraper - Powered by Crawl4AI API through CLI

AI Web Scraper - Powered by Crawl4AI API

Crawl4ai To Markdown Pro2

juryless_rainbow/crawl4ai-to-markdown-pro2

A high-performance web-to-markdown crawler for AI agents, optimized for LLM data extraction using Crawl4AI. Features stealth browsing and high-fidelity content extraction.

aaron jungs

Crawl4AI

janbuchar/crawl4ai

Wraps the Crawl4AI open-source library for retrieving text content from websites.

Jan Buchar

786

3.3

Web to Markdown Converter: AI-Ready Scraper for RAG & LLMs

raional/web-to-markdown-converter

Convert any webpage into clean Markdown or JSON for AI, RAG, and LLM pipelines. Strips ads, navigation, and cookie banners. Optionally follows links to convert an entire site. Powered by the open-source Crawl4AI library.

Raion Al

Website to Markdown for LLM & RAG — Crawl4AI URL to Clean

bikram07/web-to-markdown-crawl4ai

Convert any URL, sitemap, or whole website into clean, LLM-ready Markdown for RAG, vector databases, and AI agents. Hosted Crawl4AI in a real Chromium browser — renders JavaScript and SPAs, strips boilerplate, and exports JSON/CSV. Callable over MCP from Claude and Cursor.

Bikram

Reddit Scraper - Markdown for AI & n8n

clearpath/reddit-to-llm-api

Extract Reddit posts and comments as LLM-ready Markdown. No API key needed. Direct n8n/Make integration—connect output to AI nodes instantly. 20x faster than browser scrapers. Perfect for lead gen, product validation, and market research workflows.

ClearPath

AI Sentiment Analysis API

alizarin_refrigerator-owner/sentiment-api

Analyze text sentiment with AI. Emotion detection, key phrase extraction. Supports OpenAI GPT-4, Claude. Webhook Integration for Zapier, Make, n8n, or custom webhook.

The Howlers

web-content-extractor

morph_coder/web-content-extractor

HTTP-first web scraper without proxy by default. Returns clean article text, SEO metadata, and headings. Apify Proxy and Playwright used only as fallback. No-code friendly JSON output for Make, n8n, and Zapier.

Morph Coder

5.0

AI Web Content Crawler - Markdown for LLMs

intelscrape/ai-web-content-crawler

Crawl any website and extract clean Markdown optimized for LLM training, RAG pipelines, and AI knowledge bases - removes boilerplate and outputs structured JSON with URL, title, markdown, and metadata.