Deprecated

Pricing

$160.00 / 1,000 requests

AI Web Scraper [No API Key Needed]

Developed by

VulnV

Scrape structured data effortlessly: just describe what you need in plain language and get results tailored to your request. Simplify data extraction with a tool designed for ease and accuracy, no coding required.

Last modified

6 months ago

Automation

Developer tools

Effortlessly extract structured data from web pages by simply describing what you need. This AI-powered web scraper is designed for precision and ease of use, allowing you to customize your data extraction with natural language prompts. It also attempts to bypass captchas so that scraping is less likely to be interrupted. Perfect for developers and data analysts looking to streamline their web scraping tasks.

Features

Start URLs: Specify the URLs to begin your scraping.
Natural Language Prompts: Define the desired output by describing it in plain language.
Custom Depth: Configure the scraping depth to suit your needs.
Captcha Bypass: Attempts to bypass captchas to reduce interruptions during scraping.
Initial Cookies: Pre-set cookies for all pages the scraper opens.
Save HTML: Option to store the full transformed HTML of all pages.
Save Markdown: Option to convert and store the transformed HTML as Markdown.
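
The Actor's crawler is not documented beyond the `max_depth` setting, so the following is only a sketch of how a depth limit is commonly applied: a breadth-first traversal in which depth 0 visits just the start URLs and each additional level follows links one hop further. That reading of depth 0 is an assumption, consistent with both examples below returning only their start page. `crawl_order` and `get_links` are hypothetical names, and link extraction is stubbed out with a dictionary.

```python
from collections import deque
from typing import Callable, Iterable, List, Set


def crawl_order(
    start_urls: Iterable[str],
    get_links: Callable[[str], Iterable[str]],
    max_depth: int = 0,
) -> List[str]:
    """Return the URLs a depth-limited breadth-first crawl would visit, in order.

    Assumption (not confirmed by the Actor's docs): depth 0 visits only the
    start URLs; depth n also follows links up to n hops from a start URL.
    """
    if max_depth < 0:
        raise ValueError("max_depth must be >= 0")
    seen: Set[str] = set()
    queue: deque = deque()
    for url in start_urls:
        if url not in seen:
            seen.add(url)
            queue.append((url, 0))
    visited: List[str] = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)  # a scraper would fetch and process the page here
        if depth == max_depth:
            continue  # stop expanding: following links would exceed max_depth
        for link in get_links(url):
            if link not in seen:  # skip pages already queued or visited
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


if __name__ == "__main__":
    # Hypothetical link graph standing in for real link extraction.
    links = {
        "https://example.com/": ["https://example.com/a", "https://example.com/b"],
        "https://example.com/a": ["https://example.com/c"],
    }
    for depth in (0, 1, 2):
        print(depth, crawl_order(["https://example.com/"], lambda u: links.get(u, []), depth))
```

Breadth-first order means every page at depth n is handled before any page at depth n + 1, and the `seen` set keeps link cycles from being crawled twice.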

Configuration

Input Schema

The actor accepts the following input parameters:

Field	Type	Description	Default Value
start_urls	Array	List of URLs to start scraping from.	`[{"url": "https://apify.com"}]`
prompt	String	Natural language description of the desired scraping output.	`"List me all the features with their description."`
max_depth	Integer	The maximum depth for recursive scraping.	`0`
initial_cookies	String	Cookies that will be pre-set to all pages the scraper opens.	`[]`
save_html_to_key_value_store	Boolean	If enabled, stores the full transformed HTML of all pages found in the default key-value store.	`true`
save_markdown_to_key_value_store	Boolean	If enabled, converts the transformed HTML of all pages found to Markdown, and stores it under the markdown field in the output dataset.	`true`
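
If you assemble the input programmatically, a small helper that merges your values over the defaults in the table can catch misspelled field names before a run starts. This is a minimal sketch assuming the field names and defaults exactly as listed above; `build_run_input` is a hypothetical helper, not part of the Actor, and its validation rules are this sketch's own.

```python
import copy
from typing import Any, Dict

# Default values as listed in the input schema table
# (initial_cookies keeps the `[]` default shown there).
DEFAULTS: Dict[str, Any] = {
    "start_urls": [{"url": "https://apify.com"}],
    "prompt": "List me all the features with their description.",
    "max_depth": 0,
    "initial_cookies": [],
    "save_html_to_key_value_store": True,
    "save_markdown_to_key_value_store": True,
}


def build_run_input(**overrides: Any) -> Dict[str, Any]:
    """Merge user-supplied fields over the documented defaults and sanity-check them."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown input field(s): {sorted(unknown)}")
    # deepcopy so callers can mutate the result without changing DEFAULTS
    run_input = copy.deepcopy(DEFAULTS)
    run_input.update(overrides)
    urls = run_input["start_urls"]
    if not urls or not all(isinstance(u, dict) and "url" in u for u in urls):
        raise ValueError('start_urls must be a non-empty list of {"url": ...} objects')
    depth = run_input["max_depth"]
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(depth, bool) or not isinstance(depth, int) or depth < 0:
        raise ValueError("max_depth must be a non-negative integer")
    if not isinstance(run_input["prompt"], str) or not run_input["prompt"].strip():
        raise ValueError("prompt must be a non-empty string")
    return run_input
```

For example, `build_run_input(start_urls=[{"url": "https://simple.wikipedia.org/wiki/List_of_European_countries"}], prompt="List all the European countries")` returns a dictionary that `json.dumps` can serialize into an input body like the ones below.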

Example Input 1

{
    "start_urls": [
        { "url": "https://simple.wikipedia.org/wiki/List_of_European_countries" }
    ],
    "prompt": "List all the European countries",
    "max_depth": 0
}

Output:

[
    {
        "EuropeanCountries": [
            "Albania",
            "Andorra",
            "Armenia",
            "Austria",
            "Azerbaijan",
            "Belarus",
            "Belgium",
            "Bosnia and Herzegovina",
            "Bulgaria",
            "Croatia",
            "Cyprus",
            "Czech Republic",
            "Denmark",
            "Estonia",
            "Finland",
            "France",
            "Georgia",
            "Germany",
            "Greece",
            "Hungary",
            "Iceland",
            "Ireland",
            "Italy",
            "Kazakhstan",
            "Kosovo",
            "Latvia",
            "Liechtenstein",
            "Lithuania",
            "Luxembourg",
            "Malta",
            "Moldova",
            "Monaco",
            "Montenegro",
            "Netherlands",
            "North Macedonia",
            "Norway",
            "Poland",
            "Portugal",
            "Romania",
            "Russia",
            "San Marino",
            "Serbia",
            "Slovakia",
            "Slovenia",
            "Spain",
            "Sweden",
            "Switzerland",
            "Turkey",
            "Ukraine",
            "United Kingdom",
            "Vatican City"
        ],
        "url": "https://simple.wikipedia.org/wiki/List_of_European_countries",
        "key": "simple_wikipedia_org_wiki_List_of_European_countries"
    }
]

Example Input 2

{
    "max_depth": 0,
    "prompt": "This page contains a list of fashion products. Per each product, scrape the following fields for each product: product code, full price, price, currency, itemurl, imageurl, product category, product subcategory, product name. The product code is a numeric string that can be found in the item url. Full price is the price of the product without discounts, if any. If there is no discount, use the only price shown. Price is the product price after discounts: if there's no discount, use the only product price available. Currency is the ISO code of the currency used to display prices on this page. Imageurl is the URL of the image of the product, used as a thumbnail on this page.",
    "save_html_to_key_value_store": true,
    "save_markdown_to_key_value_store": true,
    "start_urls": [
        {
            "url": "https://www.net-a-porter.com/en-it/shop/clothing",
            "method": "GET"
        }
    ]
}

Output:

[
    {
        "products": [
            {
                "product_code": "1647597349677034",
                "full_price": "1590",
                "price": "1590",
                "currency": "EUR",
                "itemurl": "https://www.net-a-porter.com/en-it/shop/product/gabriela-hearst/clothing/midi-dresses/tenes-belted-ribbed-silk-and-cashmere-blend-midi-dress/1647597349677034",
                "imageurl": "//www.net-a-porter.com/variants/images/1647597349677034/in/w358_q60.jpg",
                "product_category": "Clothing",
                "product_subcategory": "Midi Dresses",
                "product_name": "Tenes belted ribbed silk and cashmere-blend midi dress"
            },
            {
                "product_code": "1647597344535411",
                "full_price": "3243",
                "price": "3243",
                "currency": "EUR",
                "itemurl": "https://www.net-a-porter.com/en-it/shop/product/suzie-kondi/clothing/long/kyma-cashmere-coat/1647597344535411",
                "imageurl": "//www.net-a-porter.com/variants/images/1647597344535411/in/w358_q60.jpg",
                "product_category": "Clothing",
                "product_subcategory": "Coats",
                "product_name": "Kyma cashmere coat"
            },
            ...
        ],
        "url": "https://www.net-a-porter.com/en-it/shop/clothing",
        "key": "www_net_a_porter_com_en_it_shop_clothing"
    }
]
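
In the output above, prices come back as strings and `imageurl` is protocol-relative (`//www.net-a-porter.com/...`). A minimal post-processing sketch, assuming prices are plain digit strings as in this sample (model output may vary); `normalize_product` and the derived fields `discount_pct` and `code_in_url` are hypothetical additions, not produced by the Actor.

```python
from typing import Any, Dict
from urllib.parse import urljoin


def normalize_product(item: Dict[str, Any], page_url: str) -> Dict[str, Any]:
    """Return a copy of one scraped product with numeric prices and an absolute image URL."""
    product = dict(item)  # shallow copy so the original dataset item is left untouched
    # Assumes plain digit strings such as "1590", as in the sample output.
    full_price = float(product["full_price"])
    price = float(product["price"])
    product["full_price"] = full_price
    product["price"] = price
    # "//host/path" is a protocol-relative URL; urljoin gives it the page's scheme.
    product["imageurl"] = urljoin(page_url, product["imageurl"])
    # Hypothetical derived fields (not part of the Actor's output):
    product["discount_pct"] = round(100 * (1 - price / full_price), 1) if full_price else 0.0
    # The prompt says the product code appears in the item URL; check that it does.
    product["code_in_url"] = product["product_code"] in product["itemurl"]
    return product
```

The `code_in_url` flag is a cheap way to spot items where the model may have produced a code that does not match the page it came from.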

How to Use

Set the start URLs to specify the pages you want to scrape.
Write a prompt to describe your desired output.
Set the maximum depth to control recursive scraping.
Run the actor and get structured results based on your input!
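
The steps above can also be scripted. This sketch assumes the official `apify-client` Python package (`ApifyClient`, `actor(...).call(run_input=...)`, `dataset(...).list_items()`). The Actor ID is a placeholder because this page does not show it, and since the Actor is marked deprecated it may no longer be available to run. The client is passed in as a parameter so `run_scraper` (a hypothetical helper) can be exercised with a stub.

```python
from typing import Any, Dict, List

# Hypothetical placeholder: this page does not show the Actor's ID,
# so substitute the real "username/actor-name".
ACTOR_ID = "<username>/<actor-name>"


def run_scraper(client: Any, actor_id: str, run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Start an Actor run, wait for it to finish, and return its dataset items.

    `client` is expected to be an apify_client.ApifyClient, or any object with
    the same actor(...).call(...) and dataset(...).list_items() methods.
    """
    run = client.actor(actor_id).call(run_input=run_input)  # waits for the run to finish
    if run is None:
        raise RuntimeError("no run object was returned")
    if run.get("status") != "SUCCEEDED":
        raise RuntimeError(f"run ended with status {run.get('status')}")
    # Each run writes its results to a default dataset identified by defaultDatasetId.
    return client.dataset(run["defaultDatasetId"]).list_items().items


# Example usage (requires `pip install apify-client` and an API token):
# import os
# from apify_client import ApifyClient
# client = ApifyClient(os.environ["APIFY_TOKEN"])
# items = run_scraper(client, ACTOR_ID, {
#     "start_urls": [{"url": "https://simple.wikipedia.org/wiki/List_of_European_countries"}],
#     "prompt": "List all the European countries",
#     "max_depth": 0,
# })
# print(items[0]["url"])
```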

Output

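The actor outputs structured JSON tailored to your prompt. In the examples above, each dataset item contains the fields extracted according to the prompt (`EuropeanCountries`, `products`), plus the scraped page's `url` and a `key` derived from that URL.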
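
The `key` values in both examples match a simple rule: drop the URL scheme and replace every run of non-alphanumeric characters with `_`. That rule is inferred from two samples, not documented, so treat the hypothetical `url_to_key` below as a sketch for checking or reproducing keys on your side. Stripping leading and trailing underscores (for URLs ending in `/`) is this sketch's own choice.

```python
import re


def url_to_key(url: str) -> str:
    """Inferred key rule: strip the scheme, collapse non-alphanumeric runs to '_'."""
    without_scheme = re.sub(r"^[A-Za-z][A-Za-z0-9+.-]*://", "", url)
    # Trailing "_" (e.g. from a final "/") is stripped by assumption.
    return re.sub(r"[^A-Za-z0-9]+", "_", without_scheme).strip("_")


if __name__ == "__main__":
    print(url_to_key("https://simple.wikipedia.org/wiki/List_of_European_countries"))
    # simple_wikipedia_org_wiki_List_of_European_countries
    print(url_to_key("https://www.net-a-porter.com/en-it/shop/clothing"))
    # www_net_a_porter_com_en_it_shop_clothing
```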

Explore More Actors

✨ Looking for additional solutions? Check out more actors on Apify that can help with your web automation and data extraction needs. For tools tailored to other scenarios, 🌐 explore VulnV's Actors on Apify.

📧 For inquiries or support, feel free to reach out to us at apify@vulnv.com.

Alternative Actors

tsboi index

trim_flag/tsboi-index

Indexing for LLMs. This application crawls specified websites, processes their content into a searchable vector database, and enables users to ask natural language questions about the content.

Ikenna Chidoka

AFZU VERIFYER

user-jxqocdpplm4gwu5iz/afzu-verifyer

Email Address Validation: validates if a string contains a valid email. Email Verification Lookup via SMTP: performs an email verification on the passed email (catchAll detection enabled by default) MX Validation: checks the

aftab shaikh

Substack Newsletter Scraper

red.cars/substack-newsletter-scraper

Extract newsletter content, subscriber counts, post analytics, and creator intelligence from any Substack publication - completely free, no authentication needed!

AutomateLab

Mastra.ai MCP Agent

jakub.kopecky/actor-mastra-mcp-agent

🤖 AI agent using mastra.ai with Apify MCP Server. 🚀 Runs queries via OpenAI models, taps Apify Actors for web data, and outputs to datasets. 🛠️

Jakub Kopecký

Pinecone GPT Chatbot

tri_angle/pinecone-gpt-chatbot

Pinecone GPT Chatbot combines OpenAI's GPT models with Pinecone's database to generate insightful responses. Its interactive chatbot interface presents precise and comprehensive answers to user queries. Benefit from semantic understanding, efficient workflows, and enriched knowledge integration!

Tri⟁angle

Ask Website with AI

fayoussef/ask-website-with-ai

Analyzes websites using AI (Gemini/OpenAI) to answer questions from scraped content. It can explore internal links for comprehensive answers, taking a list of URLs and questions. Ideal for targeted data extraction and content summarization.

youssef farhan

Website Content Crawler

apify/website-content-crawler

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

Apify

TrustRadius

canadesk/trustradius

Get Software overview, reviews and sentiment (Negative, Neutral, Positive) from TrustRadius. It's fast and costs little.

Canadesk Support

Twitter(X) Comment Scraper:Support Sentiment&Tone Analyzer

fastcrawler/twitter-x-comment-scraper-support-sentiment-tone-analyzer

Capture all Twitter replies, including hidden and nested ones that conversation ID methods often miss. With advanced sentiment and tone analysis, quickly sort replies by likes, relevance, or emotion. No cookies required. Fast, accurate, and ideal for complete conversation scraping.

fastcrawler

Threads Scraper

red.cars/threads-scraper

Threads Scraper is a powerful Apify actor that extracts public profile and post data from Meta's Threads platform. Get comprehensive user profiles, posts, and engagement metrics without authentication or API keys.

AutomateLab

Pinterest Board Scraper

red.cars/pinterest-board-scraper

Transform Pinterest boards into actionable business intelligence. Extract comprehensive board analytics, pin performance metrics, and visual trends to drive your e-commerce strategy, marketing campaigns, and product development decisions.

AutomateLab