
AI Web Scraper

This Actor is deprecated

This Actor is unavailable because the developer has decided to deprecate it.


vulnv/ai-web-scraper

Scrape structured data effortlessly: describe what you need in plain language and get results tailored to your request. Simplify data extraction with a tool designed for ease and accuracy, with no coding required.

Effortlessly extract structured data from web pages by simply describing what you need. This scraper is designed for precision and ease of use, allowing you to customize your data extraction with natural language prompts.

Features

  • Start URLs: Specify the URLs where scraping begins.
  • Natural Language Prompts: Describe the desired output in plain language.
  • Custom Depth: Set how far the scraper recurses from the start URLs.
  • Flexible AI Model Selection: Choose gpt-4o, gpt-4o-mini, or o1 for AI processing.
  • Secure API Key Integration: Provide your own OpenAI API key, which is required for AI processing.

Configuration

Input Schema

The actor accepts the following input parameters:

  • start_urls (Array, default: [{"url": "https://apify.com"}]): List of URLs to start scraping from.
  • prompt (String, default: "List me all the features with their description."): Natural language description of the desired scraping output.
  • api_key (String, required): OpenAI API key. This field is required for AI processing.
  • model (String, default: gpt-4o-mini): The model to use for AI processing. Options: gpt-4o, gpt-4o-mini, o1.
  • max_depth (Integer, default: 0): The maximum depth for recursive scraping.
  • initial_cookies (String, default: []): Cookies that are pre-set on every page the scraper opens.
  • proxy_configuration (Object, default: { "useApifyProxy": false }): Proxies to be used by the crawler.
  • save_html_to_key_value_store (Boolean, default: false): If enabled, stores the full transformed HTML of every page found in the default key-value store.
  • save_markdown_to_key_value_store (Boolean, default: false): If enabled, converts the transformed HTML of every page found to Markdown and stores it under the markdown field in the output dataset.
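The schema describes max_depth only as "the maximum depth for recursive scraping." The sketch below shows how such a limit commonly bounds a breadth-first crawl. It assumes that depth 0 means only the start URLs are processed, and it uses a hypothetical in-memory links mapping instead of real pages. It illustrates the general technique and is not the Actor's actual implementation.

```python
from collections import deque


def crawl_order(start_urls, links, max_depth=0):
    """Return URLs in the order a depth-limited breadth-first crawl visits them.

    links: hypothetical mapping of URL -> URLs linked from that page.
    max_depth: assumed semantics; 0 visits only the start URLs,
    1 also visits pages they link to, and so on.
    """
    seen = set()
    order = []
    # Each queue entry pairs a URL with its distance from the start URLs.
    queue = deque((url, 0) for url in start_urls)
    while queue:
        url, depth = queue.popleft()
        if url in seen:
            continue  # skip pages already visited (avoids cycles)
        seen.add(url)
        order.append(url)
        # Follow outgoing links only while we are below the depth limit.
        if depth < max_depth:
            for nxt in links.get(url, []):
                if nxt not in seen:
                    queue.append((nxt, depth + 1))
    return order
```

Because the crawl is breadth-first, each page is first reached at its shallowest depth, so raising max_depth by one adds exactly one more "ring" of linked pages.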

Example Input

{
    "start_urls": [
        { "url": "https://apify.com" }
    ],
    "prompt": "List me all the features with their description.",
    "api_key": "your-openai-api-key",
    "model": "gpt-4o-mini",
    "max_depth": 0
}
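If you assemble the input programmatically, you can merge your values over the documented defaults and check the required field before starting a run. This is a minimal sketch: build_input, DEFAULTS, and ALLOWED_MODELS are hypothetical helpers, not part of the Actor. The rule that max_depth must be a non-negative integer is an assumption.

```python
import copy
import json

# Defaults copied from the input schema above (hypothetical helper data).
DEFAULTS = {
    "start_urls": [{"url": "https://apify.com"}],
    "prompt": "List me all the features with their description.",
    "model": "gpt-4o-mini",
    "max_depth": 0,
    "proxy_configuration": {"useApifyProxy": False},
    "save_html_to_key_value_store": False,
    "save_markdown_to_key_value_store": False,
}
ALLOWED_MODELS = {"gpt-4o", "gpt-4o-mini", "o1"}


def build_input(overrides):
    """Merge user-supplied fields over the defaults and validate the result."""
    run_input = copy.deepcopy(DEFAULTS)  # deep copy so DEFAULTS is never mutated
    run_input.update(overrides)
    if not run_input.get("api_key"):
        raise ValueError("api_key is required")
    if run_input["model"] not in ALLOWED_MODELS:
        raise ValueError(f"unsupported model: {run_input['model']}")
    # Assumption: negative depths are not meaningful.
    if not isinstance(run_input["max_depth"], int) or run_input["max_depth"] < 0:
        raise ValueError("max_depth must be a non-negative integer")
    return run_input


if __name__ == "__main__":
    print(json.dumps(build_input({"api_key": "your-openai-api-key"}), indent=4))
```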

How to Use

  1. Set the start URLs to specify the pages you want to scrape.
  2. Write a prompt to describe your desired output.
  3. Add your OpenAI API key for natural language processing.
  4. Optionally choose a model for AI processing (the default is gpt-4o-mini).
  5. Set the maximum depth to control recursive scraping.
  6. Run the actor and get structured results based on your input!
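Runs can also be started through the Apify API instead of the Console. The sketch below only builds the HTTP request for the Apify API v2 "run Actor" endpoint (POST /v2/acts/{actorId}/runs, with the "/" in the Actor ID written as "~"). Two separate credentials are involved: the Apify token authenticates the API call, and the OpenAI key travels inside the input as api_key. The token value is a placeholder, and build_run_request is a hypothetical helper. Because this Actor is deprecated, actually sending such a request would likely return an error.

```python
import json
import urllib.request


def build_run_request(actor_id, run_input, apify_token):
    """Build (but do not send) a POST request that starts an Actor run."""
    # The Apify API expects "username~actor-name" in the URL path.
    path_id = actor_id.replace("/", "~")
    url = f"https://api.apify.com/v2/acts/{path_id}/runs"
    body = json.dumps(run_input).encode("utf-8")  # the run input is the JSON body
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {apify_token}",  # Apify token, not the OpenAI key
        },
    )
```

To send it, you would pass the request to urllib.request.urlopen; keeping construction separate makes the payload easy to inspect before any network call.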

Output

The actor outputs structured data in JSON format, tailored to your provided prompt.


🚀 Looking for a hassle-free web scraping solution? 🚀 This next-generation Actor, available on the Apify platform, requires no API keys or configuration and makes web scraping simpler than ever. Explore advanced features such as CAPTCHA bypass and JavaScript rendering. 🌐 Learn more and start using it at AI Web Scraper.


Explore More Actors

Looking for additional solutions? Check out more actors on Apify that can help with your web automation and data extraction needs. Discover a wide range of tools tailored for different scenarios at 🌐 Explore Vulnv's Actors on Apify.

📧 For inquiries or support, feel free to reach out to us at apify@vulnv.com.

Developer
Maintained by Community