Universal AI GPT Scraper

louisdeconinck/ai-gpt-scraper
Transform any website into structured data with AI-powered extraction. This Actor combines web scraping with intelligent content analysis to deliver clean, customized JSON output, making it easy to automate data collection from any web source.

Simply specify the fields you need, and an advanced AI model parses the page content into clean, structured JSON, saving you hours of manual data collection and processing. It is aimed at businesses and developers who need reliable, automated data extraction without complex coding or maintenance.

Use Cases

  • Extract product information from e-commerce sites
  • Gather pricing data from service providers
  • Collect structured data from blog posts or articles
  • Extract specific fields from documentation pages
  • Convert any web content into structured JSON data

Features

  • 🎯 Custom Field Extraction: Define exactly what fields you want to extract
  • 🤖 AI-Powered: Uses advanced language models to understand and extract content
  • 📊 Structured Output: Get clean JSON or CSV data with your specified fields
  • 🔄 Type Support: Specify the data type for each field (string, number, boolean, etc.)
  • 🎛️ Model Selection: Choose from predefined AI models or use your own
  • 🎯 CSS Selector Support: Target specific page elements using CSS selectors
  • 🔒 Secure: Support for secret API keys and proxy configuration

Input Configuration

Required Fields

  • URLs (array): List of web pages to scrape
  • Fields (array): Specification of fields to extract (see the sketch after this list), each containing:
    • name: Field name in the output
    • description: Description to guide the AI, be as specific and descriptive as possible
    • type: Data type (string, number, boolean, array, object)
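
A minimal sketch of these two required inputs, written as a TypeScript object; the URL and field definitions are placeholders for illustration, not part of the actor's documentation:

// Minimal sketch of the required inputs described above.
// The URL and field definitions are placeholders; adapt them to your target pages.
const input = {
    urls: [
        'https://example.com/some-product-page',
    ],
    fields: [
        // Each field tells the AI what to extract and which data type to return.
        { name: 'title', description: 'The main product title', type: 'string' },
        { name: 'price', description: 'The price as a plain number, without currency symbol', type: 'number' },
        { name: 'inStock', description: 'Whether the product is currently in stock', type: 'boolean' },
    ],
};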

Content Extraction Options

  • CSS Selector (string, optional): CSS selector used to target specific elements on the page. This can greatly reduce the AI cost by cutting down the number of input tokens, and it can also improve accuracy. If provided, only text from elements matching this selector is extracted; if not, the default content extraction is used. This is an advanced option: if you are not familiar with CSS selectors, leave it empty. Inspect the HTML of the page to find the correct selector (see the example selectors and the DevTools snippet below).

Example CSS selectors:

  • main: selects elements with tag "main".
  • #price: selects elements with id "price".
  • .product-details-container .price, .product-details-container .description: selects elements with class "price" and "description" that are descendants of elements with class "product-details-container".
  • article.main-story, .article-body > p: selects elements with tag "article" and class "main-story", as well as direct child "p" elements under elements with class "article-body".
  • .documentation-content h2, .documentation-content .method-signature: selects "h2" elements and elements with class "method-signature" that are descendants of elements with class "documentation-content".
  • .post-container[data-type="user-post"] .content: selects elements with class "content" that are descendants of elements with both class "post-container" and data-type attribute "user-post".
  • #product-listing div.item:not(.ad) .details h3, .price-info span.current-price: selects "h3" elements under elements with class "details" that are descendants of "div" elements with class "item" but not class "ad" under element with ID "product-listing", as well as "span" elements with class "current-price" under elements with class "price-info".
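
Before putting a selector into the input, you can verify it in your browser. The following sketch can be pasted into the DevTools console on the target page; it prints the text of all matching elements, which is roughly the text that would be sent to the AI model (the selector shown is just an example):

// Paste into the browser DevTools console on the page you want to scrape.
// Prints the text of every matching element so you can check that the selector
// captures the data you care about and nothing else.
const selector = '.product-details-container .price, .product-details-container .description';
const matches = Array.from(document.querySelectorAll(selector));
console.log(`Matched ${matches.length} element(s)`);
console.log(matches.map((el) => el.textContent?.trim()).join('\n---\n'));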

AI Model Configuration

You can either use one of our predefined models, which we have verified to work well, or specify your own model from OpenRouter. If you use a predefined model, you don't have to bring your own API key: we cover the AI cost and charge it to you through your Apify usage. If you bring your own OpenRouter API key, you will not be charged for the AI cost. Your API key is stored securely and encrypted by Apify.

After some testing we found Google Gemini Flash 2.0 to give the best quality for the lowest price.

Free Apify users can only process 1 URL every 24 hours with predefined models, so they can test out this functionality. To go beyond that, you will have to either upgrade your Apify account to a paid subscription tier to use predefined models, or bring your own OpenRouter API key.

Option 1: Predefined Models

  • Predefined Model (string): Choose from supported models:
    • Google Gemini Flash 1.5
    • Google Gemini Flash 2.0 (recommended)
    • OpenAI GPT-4o-mini
    • Google Gemini Pro 1.5
    • OpenAI GPT-4o

Option 2: Custom Model

  • Use Custom Model (boolean): Toggle to use your own model
  • Custom Model Name (string): OpenRouter model identifier, e.g. google/gemini-2.0-flash-001
  • OpenRouter API Key (string): Your API key for custom model access (stored encrypted)

Make sure your model supports structured outputs. Check model compatibility at: https://openrouter.ai/models?supported_parameters=structured_outputs
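
As a rough sketch of what a custom-model run could look like, the input below reuses the keys from the example input further down (useCustomModel, predefinedModel); the customModelName and openRouterApiKey property names are assumptions, so check the actor's input schema in the Apify Console for the exact keys:

// Rough sketch of an input using a custom OpenRouter model.
// "customModelName" and "openRouterApiKey" are assumed property names --
// verify them against the actor's input schema before running.
const customModelInput = {
    urls: ['https://example.com/article'],
    fields: [
        { name: 'headline', description: 'The article headline', type: 'string' },
    ],
    useCustomModel: true,
    customModelName: 'google/gemini-2.0-flash-001', // any OpenRouter model with structured outputs
    openRouterApiKey: 'sk-or-...',                  // your key, stored encrypted by Apify
};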

Proxies

  • Proxy Configuration (object): Configure proxy settings for web scraping

Example input

{
    "urls": [
        "https://apify.com/clockworks/free-tiktok-scraper"
    ],
    "fields": [
        {
            "name": "name",
            "description": "The name/title of the scraper tool",
            "type": "string"
        },
        {
            "name": "price",
            "description": "The price per 1000 results, only the number",
            "type": "number"
        },
        {
            "name": "author",
            "description": "The author or maintainer of the scraper",
            "type": "string"
        }
    ],
    "cssSelector": "main > article",
    "useCustomModel": false,
    "predefinedModel": "google/gemini-2.0-flash-001",
    "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": [
            "RESIDENTIAL"
        ]
    }
}
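
If you prefer to run the actor programmatically rather than from the Apify Console, a sketch along these lines should work with the apify-client package for Node.js (the token is a placeholder; the actor ID is the one shown at the top of this page):

// Sketch: running the actor from Node.js with the apify-client package
// (npm install apify-client). Replace the token placeholder with your own.
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: '<YOUR_APIFY_TOKEN>' });

// Start the run with a minimal input and wait for it to finish.
const run = await client.actor('louisdeconinck/ai-gpt-scraper').call({
    urls: ['https://apify.com/clockworks/free-tiktok-scraper'],
    fields: [
        { name: 'name', description: 'The name/title of the scraper tool', type: 'string' },
        { name: 'price', description: 'The price per 1000 results, only the number', type: 'number' },
    ],
    useCustomModel: false,
    predefinedModel: 'google/gemini-2.0-flash-001',
});

// Each input URL becomes one item in the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);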

Output

The actor outputs a dataset where each item contains:

  • url: The source URL
  • Custom fields as specified in your input configuration

Example output:

{
    "url": "https://apify.com/clockworks/free-tiktok-scraper",
    "author": "Clockworks",
    "name": "TikTok Data Extractor",
    "price": 4
}

Cost

There are three costs to using this actor: a startup cost, a cost per result, and the AI cost. We split the pricing up like this to keep it as competitive as possible.

  • There's a one-time charge of $0.05 (5 cents) every time you start an actor run. This covers server startup time.
  • Every result pushed to the dataset (= every input URL) is charged at $0.001 (1/10th of a cent).
  • If you use a predefined model, you are also charged per 1,000 tokens depending on the AI model used. If you bring your own API key, this charge does not apply.
    • Google Gemini Flash 1.5: $0.0006 / 1,000 tokens (6/100th of a cent)
    • Google Gemini Flash 2.0: $0.0008 / 1,000 tokens (8/100th of a cent, best value)
    • OpenAI GPT-4o-mini: $0.0012 / 1,000 tokens (12/100th of a cent)
    • Google Gemini Pro 1.5: $0.02 / 1,000 tokens (2 cents)
    • OpenAI GPT-4o: $0.04 / 1,000 tokens (4 cents)

You can check how many tokens are in a given text with the OpenAI Tokenizer: https://platform.openai.com/tokenizer. As a rough rule of thumb, 1,000 tokens corresponds to about 750 English words.
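
To make this concrete, here is a small sketch that estimates the total charge for a run on the predefined Gemini Flash 2.0 model. The number of URLs and the tokens-per-page figure are assumptions for illustration; real pages vary widely, especially if you use a CSS selector to trim the input:

// Back-of-the-envelope cost estimate for a run using a predefined model.
const startupCost = 0.05;          // charged once per actor run
const costPerResult = 0.001;       // charged per input URL / dataset item
const tokenPrice = 0.0008 / 1000;  // Gemini Flash 2.0: $0.0008 per 1,000 tokens

const urlCount = 100;              // assumed number of input URLs
const tokensPerPage = 3000;        // assumed average tokens per page

const total =
    startupCost +
    urlCount * costPerResult +
    urlCount * tokensPerPage * tokenPrice;

console.log(`Estimated cost: $${total.toFixed(2)}`); // about $0.39 under these assumptions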

Limitations

  • The AI models require clear, well-structured content for best results
  • Some models may have token limits affecting the amount of text they can process
  • Custom models must support structured output format
  • Rate limits may apply based on the chosen AI provider

Cost of Usage

  • When using predefined models, AI costs are covered by the actor and billed through your Apify usage (see the Cost section above)
  • Custom model usage requires your own OpenRouter API key and credits
  • Standard Apify platform charges apply (proxy usage if enabled)

Tips for Best Results

  1. Be specific in your field descriptions
  2. Choose appropriate data types for each field
  3. Test with a small number of URLs first
  4. Use the model that best fits your needs (faster models for simple extraction, more powerful models for complex tasks)
  5. Consider using proxies when scraping at scale
  6. Use CSS selectors when you know exactly which elements contain the relevant information
  7. Test your CSS selectors first in browser DevTools to ensure they match the desired elements

Technical Details

  • Built with TypeScript
  • Uses Crawlee for web scraping
  • Integrates with OpenRouter for AI processing
  • Supports structured output with JSON schema validation (see the illustration after this list)
  • Includes automatic error handling and retries
  • Supports both default content extraction and CSS selector-based extraction
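
As a rough illustration of the JSON schema validation mentioned above (this is an assumption about the internals, not the actor's actual code), field definitions can be mapped to a structured-output response format in the style OpenRouter supports:

// Illustration only: an assumed mapping from the "fields" input to a JSON schema
// for structured outputs. The actor's real request may differ.
const fields = [
    { name: 'name', description: 'The name/title of the scraper tool', type: 'string' },
    { name: 'price', description: 'The price per 1000 results, only the number', type: 'number' },
];

const responseFormat = {
    type: 'json_schema',
    json_schema: {
        name: 'extraction',
        strict: true,
        schema: {
            type: 'object',
            properties: Object.fromEntries(
                fields.map((f) => [f.name, { type: f.type, description: f.description }]),
            ),
            required: fields.map((f) => f.name),
            additionalProperties: false,
        },
    },
};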