AI Web Scraper

2-hour trial, then $15.00/month. No credit card required now.

This Actor is under maintenance.

vulnv/ai-web-scraper

Scrape structured data effortlessly - just describe what you need in plain language, and get precise results tailored to your request. Simplify data extraction with a tool designed for ease and accuracy, no coding required.

Start URLs

start_urls (array, required)

URLs to start with

Prompt

prompt (string, required)

Describe the desired output of the scraper, for example 'Find all articles and their authors'.

Default value of this property is "List me all the features with their description."

OpenAI API Key

api_key (string, required)

API key for OpenAI

Model

model (enum, required)

The model to use for the OpenAI API

Value options:

"gpt-4o": string
"gpt-4o-mini": string
"o1": string

Default value of this property is "gpt-4o-mini"

Crawler type

crawler_type (enum, optional)

Type of the crawler to use

Value options:

"playwright": string
"http": string

Default value of this property is "http"

Maximum depth

max_depth (integer, required)

Depth to which to scrape.

Default value of this property is 0

Initial cookies

initial_cookies (string, optional)

Cookies that will be pre-set to all pages the scraper opens. This is useful for pages that require login. The value is expected to be a JSON array of objects with name and value properties. For example: [{"name": "cookieName", "value": "cookieValue"}].

You can use the EditThisCookie browser extension to copy browser cookies in this format and paste them here.

Default value of this property is "[]"
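The cookie format described above can be produced programmatically. The following is a minimal sketch, assuming the cookies are available as a plain name-to-value mapping; the helper names `cookies_to_initial_cookies` and `parse_initial_cookies` are hypothetical and not part of the Actor.

```python
import json


def cookies_to_initial_cookies(cookies: dict[str, str]) -> str:
    """Serialize a name -> value mapping as a JSON array of {name, value} objects."""
    entries = [{"name": name, "value": value} for name, value in cookies.items()]
    return json.dumps(entries)


def parse_initial_cookies(raw: str) -> list[dict]:
    """Parse an initial_cookies string and check each entry has name and value."""
    entries = json.loads(raw)
    if not isinstance(entries, list):
        raise ValueError("initial_cookies must be a JSON array")
    for entry in entries:
        if not isinstance(entry, dict) or "name" not in entry or "value" not in entry:
            raise ValueError(f"cookie entry needs 'name' and 'value': {entry!r}")
    return entries


if __name__ == "__main__":
    # Placeholder cookie taken from the example in the field description.
    print(cookies_to_initial_cookies({"cookieName": "cookieValue"}))
```

Because the field is typed as a string, the JSON array is passed in serialized form rather than as a nested list.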

Proxy configuration

proxy_configuration (object, optional)

Select proxies to be used by your crawler.

Save HTML to Key-Value store

save_html_to_key_value_store (boolean, optional)

If enabled, the crawler stores the full transformed HTML of every page found in the default key-value store and saves a link to each file in the htmlUrl field of the output dataset. Storing HTML in the key-value store is preferred to storing it in the dataset with the saveHtml option, because there is no size limit and the HTML is easier to view when debugging.

Default value of this property is false
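As a rough illustration of consuming the htmlUrl field, the sketch below collects the stored-HTML links from dataset items that have already been downloaded as a list of dictionaries. Only the `htmlUrl` field name comes from the description above; the function name and the sample items (including the placeholder link values) are assumptions made for the example.

```python
def collect_html_urls(items: list[dict]) -> list[str]:
    """Return the htmlUrl value of every item that has a non-empty one."""
    return [item["htmlUrl"] for item in items if item.get("htmlUrl")]


if __name__ == "__main__":
    # Hypothetical dataset items; real items may carry other fields as well.
    sample_items = [
        {"htmlUrl": "https://example.com/stored-page-1.html"},
        {},  # e.g. an item from a run where the option was disabled
    ]
    for link in collect_html_urls(sample_items):
        print(link)
```

Items without the field are skipped, so the same code works whether or not the option was enabled for a given run.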

Save Markdown

save_markdown_to_key_value_store (boolean, optional)

If enabled, the crawler converts the transformed HTML of all pages found to Markdown, and stores it under the markdown field in the output dataset.

Default value of this property is false
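The fields listed above can be assembled into a single input object. The sketch below builds one in Python using the documented property names, defaults, and enum options. It assumes that start_urls accepts plain URL strings (the item shape is not specified here) and leaves out proxy_configuration, which is optional; `build_run_input` is a hypothetical helper and the API key shown is a placeholder.

```python
import json

# Enum options and defaults as listed in the input description.
ALLOWED_MODELS = ("gpt-4o", "gpt-4o-mini", "o1")
ALLOWED_CRAWLER_TYPES = ("playwright", "http")
DEFAULT_PROMPT = "List me all the features with their description."


def build_run_input(
    start_urls: list[str],
    api_key: str,
    prompt: str = DEFAULT_PROMPT,
    model: str = "gpt-4o-mini",
    crawler_type: str = "http",
    max_depth: int = 0,
    initial_cookies: str = "[]",
    save_html: bool = False,
    save_markdown: bool = False,
) -> dict:
    """Build an input dict, rejecting values outside the documented options."""
    if not start_urls:
        raise ValueError("start_urls is required and must not be empty")
    if not api_key:
        raise ValueError("api_key is required")
    if model not in ALLOWED_MODELS:
        raise ValueError(f"model must be one of {ALLOWED_MODELS}")
    if crawler_type not in ALLOWED_CRAWLER_TYPES:
        raise ValueError(f"crawler_type must be one of {ALLOWED_CRAWLER_TYPES}")
    if isinstance(max_depth, bool) or not isinstance(max_depth, int):
        raise ValueError("max_depth must be an integer")
    return {
        "start_urls": list(start_urls),  # assumption: plain URL strings
        "prompt": prompt,
        "api_key": api_key,
        "model": model,
        "crawler_type": crawler_type,
        "max_depth": max_depth,
        "initial_cookies": initial_cookies,
        "save_html_to_key_value_store": save_html,
        "save_markdown_to_key_value_store": save_markdown,
    }


if __name__ == "__main__":
    run_input = build_run_input(
        ["https://example.com"],
        api_key="sk-placeholder",  # placeholder, not a real key
        prompt="Find all articles and their authors",
    )
    print(json.dumps(run_input, indent=2))
```

A dictionary like this could then be supplied as the run input when starting vulnv/ai-web-scraper, for example through the Apify Console's JSON input editor or an API client.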

Developer
Maintained by Community

Actor Metrics

  • 2 monthly users

  • 1 star

  • 25% runs succeeded

  • Created in Dec 2024

  • Modified 2 days ago