Pricing

$9.00 / 1,000 pages

GPT Scraper

Developed by

Jakub Drobník

Extract data from any website and feed it into GPT via the OpenAI API. Use ChatGPT to proofread content, analyze sentiment, summarize reviews, extract contact details, and much more.

4.0 (7)

Issues response

12 days

Last modified

7 months ago

Lead generation

This changelog tracks updates to both the GPT Scraper and Extended GPT Scraper Actors.

2024-12-30

Fixes

Fixed extraction of multiple URLs with disabled saveSnapshots option.

2024-11-17

Features

Improved GPT call handling so that GPT calls run in parallel with crawling more effectively.
Added error results to output, which will contain the failed website URL to help with debugging and error handling.
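
One way to use these error results downstream is to split the output items into successes and failures. The sketch below assumes an item shape that the changelog does not specify: it treats any item with an "error" key as a failed page and reads the address from "url". Both field names are hypothetical and should be checked against the output of a real run.

```python
from typing import Any


def split_results(items: list[dict[str, Any]]) -> tuple[list[dict[str, Any]], list[str]]:
    """Separate successful items from failed URLs.

    Assumes (hypothetically) that error results carry an "error" key
    and that every item stores the page address under "url".
    """
    succeeded: list[dict[str, Any]] = []
    failed_urls: list[str] = []
    for item in items:
        if "error" in item:
            failed_urls.append(item.get("url", "<unknown>"))
        else:
            succeeded.append(item)
    return succeeded, failed_urls


# Made-up sample items, for illustration only.
sample = [
    {"url": "https://example.com/a", "answer": "ok"},
    {"url": "https://example.com/b", "error": "Navigation timed out"},
]
ok_items, retry_urls = split_results(sample)
print(retry_urls)
```

The list of failed URLs could then be fed into a follow-up run or logged for debugging.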

2024-10-07

Fixes

Fixed initial cookies not being set correctly from input.

2024-09-22

Fixes

Fixed a bug where HTML minimization was failing on some specific websites.

2024-08-12

Features

Added support for GPT-4o-mini model. (Extended GPT Scraper)
Set this model as the default one for the Pay Per Result scraper with a set token limit.
- With this, the maximum token limit for the Pay Per Result scraper was increased by 150%.
HTTPS errors are now ignored, which allows the scraper to work on broken websites with invalid certificates.

Fixes

Fixed concurrency scaling issues that were causing the Actor to fail due to scaling too quickly.

2024-05-20

Features

Added support for GPT-4o model. (Extended GPT Scraper only)

2024-05-01

Fixes

Fixed Actor resurrection bug that caused the Actor to not process GPT after being resurrected.

2024-03-05

Features

Added option to wait for a specific time and let the page load before scraping, useful for dynamic pages. (dynamicContentWaitSecs)
Added option to remove link URLs while keeping their displayed text. Helps to reduce the amount of content sent to GPT. (removeLinkUrls)
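
To illustrate what removing link URLs while keeping their text means, here is a minimal sketch that assumes the page content is Markdown with inline links. It is a regex-based illustration, not the Actor's actual implementation. The option names in the input fragment come from the entries above; the values and their types (seconds as an integer, a boolean flag) are assumptions.

```python
import re

# Matches inline Markdown links like [text](https://example.com),
# but not images, which start with "!".
_LINK = re.compile(r"(?<!!)\[([^\]]*)\]\([^)]*\)")


def strip_link_urls(markdown: str) -> str:
    """Replace each inline link with its visible text only."""
    return _LINK.sub(r"\1", markdown)


text = "See the [pricing page](https://example.com/pricing?ref=nav) for details."
print(strip_link_urls(text))
# prints: See the pricing page for details.

# Illustrative input fragment; values and types are assumptions.
run_input = {
    "dynamicContentWaitSecs": 5,  # wait for dynamic content before scraping
    "removeLinkUrls": True,       # keep link text, drop link targets
}
```

Dropping the URLs shortens the content sent to GPT, since long query strings and tracking parameters rarely help the model.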

2024-03-05

Features

Added a separate schema description field (input schemaDescription). By default, its value is taken from the instructions input.
Refactored and improved the Actor's input schema to be more user-friendly.
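
The default described for schemaDescription can be pictured as a simple fallback. The helper below is a hypothetical sketch, not the Actor's code; it also assumes that an empty schemaDescription is treated the same as a missing one.

```python
def resolve_schema_description(actor_input: dict) -> str:
    """Use schemaDescription when provided, otherwise fall back to instructions.

    Hypothetical helper: treating an empty string as "not provided"
    is an assumption made for this sketch.
    """
    return actor_input.get("schemaDescription") or actor_input["instructions"]


only_instructions = {"instructions": "Extract the product name and price."}
with_description = {
    "instructions": "Extract the product name and price.",
    "schemaDescription": "Product details found on the page.",
}
print(resolve_schema_description(only_instructions))
print(resolve_schema_description(with_description))
```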

Fixes

Properly handle OpenAI's schema description too long error.

2024-01-31

Fixes

Fixed a bug where the scraper would fail on some sites that contain erroneous JavaScript.

2024-01-26

Fixes

Fixed "max pages per run" not working correctly on specific websites.

2024-01-21

Fixes

Fixed a bug where the Actor would fail on "repetitive patterns in prompt" error from OpenAI. The Actor will now gracefully skip GPT processing for the webpages that trigger the error.

2024-01-10

Features

Added excludeUrlGlobs and renamed globs to includeUrlGlobs. The old globs input will still work the same.
Added initialCookies to be able to extract data behind login.
Added removeElementsCssSelector to enable custom HTML cleanup before sending to models.
Added support for GPT-4 Turbo model. (Extended GPT Scraper only)
Added skipGptGlobs to enable not using GPT on some pages that should only be used for finding further links. (Extended GPT Scraper only)
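
The three glob inputs above can be thought of as two independent decisions per URL: whether to crawl it, and whether to send it to GPT. The sketch below models that with Python's fnmatch, which only approximates the glob syntax the Actor uses (for example, `*` here also matches `/`, and `?` or `[` in a pattern are wildcards). The function names and the rule that an empty include list allows everything are assumptions for illustration.

```python
from fnmatch import fnmatchcase


def matches_any(url: str, globs: list[str]) -> bool:
    """Return True if the URL matches at least one glob pattern."""
    return any(fnmatchcase(url, pattern) for pattern in globs)


def should_crawl(url: str, include: list[str], exclude: list[str]) -> bool:
    """Crawl a URL if it is included (or no includes are set) and not excluded."""
    included = not include or matches_any(url, include)
    return included and not matches_any(url, exclude)


def should_send_to_gpt(url: str, skip_gpt: list[str]) -> bool:
    """Pages matching a skip glob are crawled for links but not sent to GPT."""
    return not matches_any(url, skip_gpt)


# Example patterns, made up for illustration.
include_globs = ["https://example.com/blog/*"]
exclude_globs = ["https://example.com/blog/tag/*"]
skip_gpt_globs = ["https://example.com/blog/page/*"]

for url in [
    "https://example.com/blog/post-1",
    "https://example.com/blog/tag/ai",
    "https://example.com/blog/page/2",
]:
    crawl = should_crawl(url, include_globs, exclude_globs)
    gpt = crawl and should_send_to_gpt(url, skip_gpt_globs)
    print(url, "crawl:", crawl, "gpt:", gpt)
```

In this sketch a listing page such as `/blog/page/2` is still crawled so its links can be followed, but it is not sent to GPT.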

Fixes

Always return answer in the output for consistency. It was previously sometimes missing if jsonAnswer was available.
Improve error handling for errors coming from OpenAI. The Actor will now fail if the user doesn't have access to a model.

2023-12-21

Features

Add GPT model settings temperature, topP, frequencyPenalty, and presencePenalty to the input.
Allow using HTML as a prompt to models. By default, the actor still converts HTML to markdown before sending it to models.
Store HTML, screenshot and content (as it was sent to GPT) as links to the output. This is enabled by default but can be turned off.

Fixes

Fail an Actor run if GPT doesn't accept the formatted output schema defined by the user in the input. This can happen because OpenAI doesn't fully follow the JSON Schema specification, but the problem occurs very rarely.

Changes

Use LangChain to connect to GPT models. This means some error messages are different.
The default model temperature is now set to 0 instead of 1. This should improve the reliability of scraping. While this is technically a breaking change, it should mostly behave as an improvement, so we don't consider it necessary to release a separate version.
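
Anyone who relied on the previous behavior could set the temperature explicitly in the input. The sketch below only builds the input object. The temperature and instructions names come from this changelog, while the helper name and the assumption that temperature is passed as a number (rather than, say, a string) are hypothetical.

```python
import json

DEFAULT_TEMPERATURE = 0  # current default, per the changelog
LEGACY_TEMPERATURE = 1   # default before this change


def with_temperature(run_input: dict, temperature: float = DEFAULT_TEMPERATURE) -> dict:
    """Return a copy of run_input with an explicit temperature set."""
    return {**run_input, "temperature": temperature}


base_input = {"instructions": "Summarize this page in one sentence."}
legacy_input = with_temperature(base_input, LEGACY_TEMPERATURE)
print(json.dumps(legacy_input, indent=2))
```

Returning a copy keeps the base input reusable, so the same instructions can be run with several temperature values for comparison.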
