Website Content Crawler

apify/website-content-crawler

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

Limit Actor run to just the exact input URL

Closed

thom_vd_donk opened this issue

We are using an Actor to scrape URLs and extract their content. The problem is that the Actor also scrapes every URL linked from those pages, and we want only the exact URL we provide.

How can I limit the Actor to scraping just that one URL?

Jiří Spilka (jiri.spilka)

Hi, thank you for using Website Content Crawler.

If you need to scrape and extract content only from the URLs specified in startUrls, set "maxCrawlDepth": 0.
Additionally, ensure "useSitemaps": false (which you’ve already done).

I hope this helps. Jiri
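The settings described in the reply above can be sketched in Python roughly as follows. This is a minimal illustration, not an official example: the helper names `build_single_page_input` and `run_crawler`, the `APIFY_TOKEN` environment variable, and the example URL are assumptions, and the run step assumes the `apify-client` package is installed. The input keys `maxCrawlDepth` and `useSitemaps` come from the reply; passing each start URL as an object with a `url` key is assumed from the Actor's usual input shape.

```python
import json
import os


def build_single_page_input(urls):
    """Build an Actor input that scrapes only the given URLs (hypothetical helper)."""
    return {
        # Each start URL is passed as an object with a "url" key (assumed input shape).
        "startUrls": [{"url": u} for u in urls],
        # Depth 0: do not follow links found on the start pages.
        "maxCrawlDepth": 0,
        # Do not discover additional pages through sitemaps.
        "useSitemaps": False,
    }


def run_crawler(run_input, token):
    """Start the Actor and wait for the run to finish (assumes apify-client is installed)."""
    # Imported lazily so the input builder works without the package.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    return client.actor("apify/website-content-crawler").call(run_input=run_input)


if __name__ == "__main__":
    run_input = build_single_page_input(["https://example.com/some-page"])
    print(json.dumps(run_input, indent=2))

    # Only start a real run when a token is available.
    token = os.environ.get("APIFY_TOKEN")
    if token:
        run = run_crawler(run_input, token)
        if run:
            print("Dataset ID:", run["defaultDatasetId"])
```

Without a token the script only prints the input JSON, which can also be pasted into the Actor's JSON input editor.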

Jiří Spilka (jiri.spilka)

I’ll go ahead and close this issue for now. If you have any further questions or need assistance, feel free to ask or open a new issue. Best, Jiri

Developer

Apify

Actor Metrics

5.4k monthly users
990 bookmarks
>99% runs succeeded
1 day response time
Created in Mar 2023
Modified 13 days ago

Categories

Fast Website Content Crawler

6sigmag/fast-website-content-crawler

A high-performance web scraper that rapidly extracts and analyzes content from multiple websites simultaneously. Perfect for competitive research, content aggregation, and website structure analysis.

David Deng

283

Deep Website Content Crawler

6sigmag/deep-website-content-crawler

Scrape Failed Killer! A high-performance web scraper that rapidly extracts and analyzes content from multiple websites simultaneously. Perfect for competitive research, content aggregation, and website structure analysis.

David Deng

163

AI Website Content Markdown Scraper

quaking_pail/ai-website-content-markdown-scraper

This Apify Actor, "Website Content Crawler with Markdown Extraction," is designed to perform a comprehensive crawl of specified websites, extract their text content, convert it into Markdown format, and store it in a structured dataset. The extracted content is suitable for feeding LLMs.

AI_Builder

329

Pinecone Integration

apify/pinecone-integration

This integration transfers data from Apify Actors to Pinecone and is a good starting point for a question-answering, search, or RAG use case.

Apify

164

Example Website Screenshot Crawler

dz_omar/example-website-screenshot-crawler

Automated website screenshot crawler using Pyppeteer and Apify. This open-source actor captures screenshots from specified URLs, uploads them to the Apify Key-Value Store, and provides easy access to the results, making it ideal for monitoring website changes and archiving web content.

Omar Abdlhakim

Web Scraper

apify/web-scraper

Crawls arbitrary websites using the Chrome browser and extracts structured data from web pages using a provided JavaScript function. The Actor supports both recursive crawling and lists of URLs, and automatically manages concurrency for maximum performance.

Apify

75.9k

451

Video Link Crawler

infoweaver/video-link-crawler

Effortlessly discover and extract video links from any website within a few seconds with our powerful Video Link Crawler. Starting from a specified URL, it navigates through web pages, identifies video content, and compiles structured datasets. Try it now!

InfoWeaver

News Website Crawler & Article Extractor

xtech/news-source-crawler

Scrape all articles from any news website. Extract full text, metadata, keywords, and summaries. Ideal for content analysis, research, and news aggregation.

Xtech

Web Crawler

rigelbytes/webcrawler

This web crawler is designed to give users complete flexibility by allowing them to use their **own proxies**. The scraper collects all pages from the website and extracts the **MetaData**, **Title**, and **Content** of each page in Markdown.

Rigel Bytes

📩📍 Google Maps Email Extractor

lukaskrivka/google-maps-with-contact-details

Extract Google Maps contact details. Scrape websites of Google Maps places for contact details and get email addresses, website, location, address, zipcode, phone number, social media links. Export scraped data, run the scraper via API, schedule and monitor runs or integrate with other tools.