Under maintenance

Pricing

Pay per usage

AI Powered Scraper

AI Powered Scraper using LangChain and OpenAI.

Rating

0.0

(0)

Developer

Dev with Bobby

Actor stats

Last modified

2 months ago

Categories

Automation

Developer tools

Start URLs

startUrls

Optional

One or more URLs of pages where the crawler will start. The Actor crawls only sub-pages of these URLs. For example, for the start URL https://www.example.com/blog, it will crawl pages like https://example.com/blog/article-1, but will skip https://example.com/docs/something-else.

Type: array
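
The sub-page rule described above can be sketched roughly as follows. This is only an illustration of the documented behaviour, not the Actor's actual code: the helper names `_host` and `is_in_scope` are hypothetical, and treating `www.example.com` and `example.com` as the same host is an assumption inferred from the example URLs.

```python
from urllib.parse import urlsplit


def _host(url: str) -> str:
    """Lower-cased host name with a leading 'www.' dropped (assumption of this sketch)."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_in_scope(start_url: str, candidate: str) -> bool:
    """Return True if candidate is the start URL itself or one of its sub-pages."""
    if _host(start_url) != _host(candidate):
        return False
    base = urlsplit(start_url).path.rstrip("/")
    path = urlsplit(candidate).path
    # Accept the exact path or anything below it, so "/blog" does not match "/blogger".
    return path.rstrip("/") == base or path.startswith(base + "/")


start = "https://www.example.com/blog"
print(is_in_scope(start, "https://example.com/blog/article-1"))
print(is_in_scope(start, "https://example.com/docs/something-else"))
```

With this check, the first call reports the article as in scope and the second reports the docs page as out of scope, matching the example in the description.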

Max pages

maxCrawlPages

Optional

The maximum number of pages to crawl. It includes the start URLs, pagination pages, pages with no content, etc. The crawler will automatically finish after reaching this number. This setting is useful to prevent accidental crawler runaway.

Type: integer

Minimum: 0

Default: 9999999

OpenAI API key

openAIApiKey

Optional

Enter the API key for your OpenAI account. The key is needed to vectorize the crawled data and to prompt the OpenAI model.

Type: string

Query

query

Optional

The query you want to ask the model about the crawled data.

Type: string

Re-crawl the data

forceRecrawl

Optional

If enabled, the data will be re-crawled even if a cached vector index is available.

Type: boolean

Default: false

Load URLs from Sitemaps

loadUrlsFromSitemaps

Optional

If enabled, the scraper will automatically find and load URLs from sitemap.xml files.

Type: boolean

Default: false
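
As a rough illustration of what loading URLs from a sitemap involves, the sketch below pulls the `<loc>` entries out of a sitemap.xml document with Python's standard library. It is not the Actor's implementation; the function name `urls_from_sitemap` and the sample document are hypothetical, and discovery of the sitemap file itself is left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> value found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


# Hypothetical sample sitemap used only for this example.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/blog</loc></url>
  <url><loc>https://www.example.com/blog/article-1</loc></url>
</urlset>"""

for url in urls_from_sitemap(SAMPLE_SITEMAP):
    print(url)
```

For a sitemap index file, the same function would return the locations of the nested sitemaps, which would then need to be fetched and parsed in turn.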

Respect robots.txt file

respectRobotsTxt

Optional

If enabled, the scraper will respect the robots.txt file and avoid crawling disallowed pages.

Type: boolean

Default: true
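
To make the effect of this setting concrete, here is a minimal sketch of a robots.txt check built on Python's standard `urllib.robotparser`. How the Actor performs the check internally is not documented here; the helper name `build_robots_checker`, the sample rules, and the `AI-Powered-Scraper` agent token are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def build_robots_checker(robots_txt: str, user_agent: str):
    """Return a function that tells whether a URL may be crawled under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    def allowed(url: str) -> bool:
        return parser.can_fetch(user_agent, url)

    return allowed


# Hypothetical robots.txt content used only for this example.
SAMPLE_ROBOTS_TXT = """User-agent: *
Disallow: /private/
"""

allowed = build_robots_checker(SAMPLE_ROBOTS_TXT, "AI-Powered-Scraper")
print(allowed("https://www.example.com/blog"))
print(allowed("https://www.example.com/private/page"))
```

A crawler that respects robots.txt would skip any URL for which such a check returns False.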

Crawler type

crawlerType

Optional

Select the crawler type based on your needs

Type: string

Default: adaptive

Options: adaptive, cheerio, playwright, jsdom

User Agent

userAgent

Optional

Custom User-Agent string to use for requests

Type: string

Default: Mozilla/5.0 (compatible; AI-Powered-Scraper/1.0)

Request delay (ms)

requestDelay

Optional

Delay between requests in milliseconds to avoid overwhelming the server

Type: integer

Minimum: 0

Maximum: 10000

Default: 1000
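
One way to picture a fixed delay between requests is a small throttle that sleeps until the configured number of milliseconds has passed since the previous request. The class below is a sketch under that assumption, not the Actor's scheduling logic; `RequestThrottle` and its injectable `clock` and `sleep` parameters are hypothetical names.

```python
import time


class RequestThrottle:
    """Enforce a minimum pause of delay_ms milliseconds between consecutive requests."""

    def __init__(self, delay_ms: int, clock=time.monotonic, sleep=time.sleep):
        # Bounds taken from the documented range of requestDelay.
        if not 0 <= delay_ms <= 10000:
            raise ValueError("delay_ms must be between 0 and 10000")
        self._delay = delay_ms / 1000
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous request was released

    def wait(self) -> float:
        """Sleep if the previous request was too recent; return the seconds slept."""
        now = self._clock()
        pause = 0.0
        if self._last is not None:
            pause = max(0.0, self._delay - (now - self._last))
            if pause > 0:
                self._sleep(pause)
        self._last = now + pause
        return pause


throttle = RequestThrottle(delay_ms=1000)
throttle.wait()  # first request goes out immediately
```

Calling `wait()` before each request keeps the request rate at or below one per `delay_ms` milliseconds.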

Max request retries

maxRequestRetries

Optional

Maximum number of times to retry failed requests

Type: integer

Minimum: 0

Maximum: 10

Default: 3
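
The input fields listed above can be put together into a single input object. The sketch below fills in the documented defaults and checks the documented ranges and crawler options before a run. The helper `build_input` is hypothetical, and the `{"url": ...}` shape of the `startUrls` items is an assumption, since the listing only states that the field is an array.

```python
# Defaults, options, and ranges as documented for the input fields above.
DEFAULTS = {
    "maxCrawlPages": 9999999,
    "forceRecrawl": False,
    "loadUrlsFromSitemaps": False,
    "respectRobotsTxt": True,
    "crawlerType": "adaptive",
    "userAgent": "Mozilla/5.0 (compatible; AI-Powered-Scraper/1.0)",
    "requestDelay": 1000,
    "maxRequestRetries": 3,
}
FIELDS_WITHOUT_DEFAULT = {"startUrls", "openAIApiKey", "query"}
CRAWLER_TYPES = {"adaptive", "cheerio", "playwright", "jsdom"}
RANGES = {
    "maxCrawlPages": (0, None),
    "requestDelay": (0, 10000),
    "maxRequestRetries": (0, 10),
}


def build_input(**overrides):
    """Merge overrides into the defaults and validate the result."""
    unknown = set(overrides) - set(DEFAULTS) - FIELDS_WITHOUT_DEFAULT
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    data = {**DEFAULTS, **overrides}
    if data["crawlerType"] not in CRAWLER_TYPES:
        raise ValueError(f"unsupported crawlerType: {data['crawlerType']!r}")
    for key, (low, high) in RANGES.items():
        value = data[key]
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{key} must be an integer")
        if value < low or (high is not None and value > high):
            raise ValueError(f"{key} is out of range: {value}")
    return data


run_input = build_input(
    startUrls=[{"url": "https://www.example.com/blog"}],  # item shape is an assumption
    query="What topics does the blog cover?",
    requestDelay=500,
)
print(run_input["crawlerType"], run_input["requestDelay"])
```

Validating locally like this catches an out-of-range `requestDelay` or a misspelled `crawlerType` before any crawl is started.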
