GPT Scraper

Pay $9.00 for 1,000 pages

drobnikj/gpt-scraper

Extract data from any website and feed it into GPT via the OpenAI API. Use ChatGPT to proofread content, analyze sentiment, summarize reviews, extract contact details, and much more.

Start URLs

startUrls
array
Required

A static list of URLs to scrape.

For details, see Start URLs in README.

Glob patterns

globs
array
Optional

Glob patterns to match links in the page that you want to enqueue. Combine with Link selector to tell the scraper where to find links. Omitting the Glob patterns will cause the scraper to enqueue all links matched by the Link selector.

Default value of this property is

[]

Link selector

linkSelector
string
Optional

A CSS selector specifying which links on the page (<a> elements with an href attribute) are followed and added to the request queue. To filter the links added to the queue, use the Glob patterns setting.

If Link selector is empty, the page links are ignored.

For details, see Link selector in README.
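
The interplay between Link selector and Glob patterns can be illustrated with a small sketch. It assumes the links have already been collected by the selector, and it uses Python's standard fnmatch module, which only approximates the glob syntax the scraper accepts (for instance, fnmatch's * also matches /). The function name and URLs are hypothetical.

```python
from fnmatch import fnmatchcase


def filter_links(links, globs):
    """Return the links that would be enqueued.

    An empty glob list keeps every link found by the Link selector,
    mirroring the behavior described above.
    """
    if not globs:
        return list(links)
    return [link for link in links if any(fnmatchcase(link, g) for g in globs)]


# Hypothetical links, as if matched by a Link selector such as "a[href]".
found = [
    "https://example.com/blog/post-1",
    "https://example.com/about",
    "https://example.com/blog/post-2",
]

blog_only = filter_links(found, ["https://example.com/blog/*"])
everything = filter_links(found, [])
```

Here blog_only keeps only the two blog URLs, while everything contains all three links because no glob was given.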

Instructions for GPT

instructions
string
Required

Instruct GPT how to generate text. For example: "Summarize this page into three sentences."

You can instruct OpenAI to answer with "skip this page", which causes the page to be skipped. For example: "Summarize this page into three sentences. If the page is about Apify Proxy, answer with 'skip this page'."

Content selector

targetSelector
string
Optional

A CSS selector of the HTML element whose content is passed to GPT together with the instructions. Use it to process only part of the page instead of the whole page. For example: "div#content".

Max crawling depth

maxCrawlingDepth
integer
Optional

Specifies how many links away from Start URLs the scraper will descend. This value is a safeguard against infinite crawling depths for misconfigured scrapers.

If set to 0, there is no limit.

Default value of this property is

0

Max pages per run

maxPagesPerCrawl
integer
Optional

Maximum number of pages that the scraper will open. 0 means unlimited.

Default value of this property is

10
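
How the two limits above interact can be sketched as a breadth-first crawl over a toy link graph. This is an illustration under assumptions, not the scraper's actual implementation: the crawl function, the links_of callable, and the way depth is counted from the Start URLs are all hypothetical.

```python
from collections import deque


def crawl(start_urls, links_of, max_crawling_depth=0, max_pages_per_crawl=10):
    """Breadth-first crawl honoring a depth limit and a page limit.

    links_of is a hypothetical callable returning the outgoing links of a URL.
    A limit of 0 means "no limit", matching the input fields described above.
    """
    queue = deque((url, 0) for url in start_urls)
    seen = set(start_urls)
    opened = []
    while queue:
        if max_pages_per_crawl and len(opened) >= max_pages_per_crawl:
            break  # page budget exhausted
        url, depth = queue.popleft()
        opened.append(url)
        if max_crawling_depth and depth >= max_crawling_depth:
            continue  # do not follow links beyond the depth limit
        for link in links_of(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return opened


# Toy link graph standing in for a website.
site = {"/": ["/a", "/b"], "/a": ["/a/1"], "/b": [], "/a/1": ["/a/1/x"], "/a/1/x": []}


def links(url):
    return site.get(url, [])


depth_one = crawl(["/"], links, max_crawling_depth=1)
two_pages = crawl(["/"], links, max_pages_per_crawl=2)
unlimited = crawl(["/"], links, max_crawling_depth=0, max_pages_per_crawl=0)
```

With a depth limit of 1, only the start page and its direct links are opened; with a page limit of 2, the crawl stops after two pages regardless of depth.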

Use JSON schema to format answer

useStructureOutput
boolean
Optional

If true, the answer is transformed into a structured format based on the Schema field and stored in the jsonAnswer attribute.

Schema

schema
object
Optional

This defines how the output is stored in a structured format using JSON Schema. Keep in mind that it relies on function calling, so setting a correct title and descriptions for the fields can give better results.
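
As an illustration of the advice about titles and descriptions, the following sketch builds a small JSON Schema as a Python dictionary and pairs it with the useStructureOutput flag. The schema contents (field names, descriptions, enum values) are hypothetical and chosen only for the example.

```python
import json

# Hypothetical schema for a page summary; the fields are illustrative only.
schema = {
    "type": "object",
    "title": "Page summary",
    "properties": {
        "title": {
            "type": "string",
            "description": "Title of the page",
        },
        "summary": {
            "type": "string",
            "description": "Summary of the page in at most three sentences",
        },
        "sentiment": {
            "type": "string",
            "enum": ["positive", "neutral", "negative"],
            "description": "Overall sentiment of the page content",
        },
    },
    "required": ["title", "summary"],
}

# Fragment of the Actor input that enables structured output.
structured_output_input = {"useStructureOutput": True, "schema": schema}

print(json.dumps(structured_output_input, indent=2))
```

Descriptive titles and per-field descriptions give the model more context about what each field should contain.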

Proxy configuration

proxyConfiguration
object
Optional

Specifies proxy servers that will be used by the scraper in order to hide its origin.

For details, see Proxy configuration in README.

Default value of this property is

{"useApifyProxy":false}
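
The fields described on this page can be assembled into a single input object. The sketch below is a minimal helper, assuming that startUrls is a list of objects with a url key (the page only states that it is an array) and using the default values listed above; build_input is a hypothetical name, and submitting the input to the Actor is left out so the example runs offline.

```python
import copy

# Defaults taken from the field descriptions on this page.
DEFAULTS = {
    "globs": [],
    "maxCrawlingDepth": 0,
    "maxPagesPerCrawl": 10,
    "proxyConfiguration": {"useApifyProxy": False},
}


def build_input(start_urls, instructions, **overrides):
    """Build an input dictionary, checking the two required fields."""
    if not start_urls:
        raise ValueError("startUrls is required")
    if not instructions:
        raise ValueError("instructions is required")
    run_input = copy.deepcopy(DEFAULTS)
    run_input.update(overrides)
    # Assumption: each start URL is wrapped in an object with a "url" key.
    run_input["startUrls"] = [{"url": url} for url in start_urls]
    run_input["instructions"] = instructions
    return run_input


example_input = build_input(
    ["https://example.com/blog"],
    "Summarize this page into three sentences.",
    linkSelector="a[href]",
    globs=["https://example.com/blog/*"],
    targetSelector="div#content",
    maxCrawlingDepth=1,
)
```

Optional fields that are not overridden keep their documented defaults, so example_input still limits the run to 10 pages and does not use Apify Proxy.
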
Developer
Maintained by Apify
Actor stats
  • 2.7k users
  • 132.5k runs
  • Modified about 23 hours ago