This Actor is under maintenance.

This Actor may be unreliable while under maintenance.

My Actr testing 2

xrhibiyftd/my-actr-testing-2

Crawls websites with headless Chrome and the Puppeteer library using provided server-side Node.js code. This crawler is an alternative to apify/web-scraper that gives you finer control over the process. Supports both recursive crawling and lists of URLs. Supports logging in to websites.

Change Log

3.0.8 (2023-08-22)

Updated Crawlee version to v3.5.2.
Updated Node.js version to v18.
Added new options:
- Dismiss cookie modals (closeCookieModals): Uses the I don't care about cookies browser extension. When on, the crawler automatically tries to dismiss cookie consent modals. This can be useful when crawling European websites that show cookie consent modals.
- Maximum scrolling distance in pixels (maxScrollHeightPixels): The crawler will scroll down the page until all content is loaded or the maximum scrolling distance is reached. Setting this to 0 disables scrolling altogether.
- Exclude Glob Patterns (excludes): Glob patterns to match links in the page that you want to exclude from being enqueued.
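
As a rough illustration of how these options might appear in an Actor input, the sketch below builds an input object using the three field names from the list above. The start URL, the glob pattern, the `{ glob: ... }` object shape, and the `plannedScrollDistance` helper are assumptions made for illustration, not the Actor's confirmed schema or implementation.

```javascript
// Hypothetical input fragment; only the three option names come from the change log.
const input = {
    startUrls: [{ url: 'https://example.com' }], // assumed start URL
    closeCookieModals: true,     // try to dismiss cookie consent modals
    maxScrollHeightPixels: 5000, // stop scrolling after 5000 px; 0 disables scrolling
    excludes: [{ glob: 'https://example.com/private/**' }], // assumed pattern shape
};

// Illustrative model of the documented scrolling rule (not the Actor's real code):
// scroll until the content ends or the limit is reached; 0 means no scrolling.
function plannedScrollDistance(contentHeightPixels, maxScrollHeightPixels) {
    if (maxScrollHeightPixels === 0) return 0;
    return Math.min(contentHeightPixels, maxScrollHeightPixels);
}

console.log(plannedScrollDistance(12000, input.maxScrollHeightPixels)); // 5000
console.log(plannedScrollDistance(3000, input.maxScrollHeightPixels));  // 3000
console.log(plannedScrollDistance(12000, 0));                           // 0
```

The helper only models the rule as stated: a page shorter than the limit is scrolled to its end, a longer page is cut off at the limit, and a limit of 0 skips scrolling.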

3.0 (`version-3`)

Rewritten from Apify SDK to Crawlee. See the v3 migration guide for more details.
Proxy usage is now required.
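
Since a proxy is now required, an input without a proxy setting should be expected to fail. The check below is a minimal sketch of that idea. The `proxyConfiguration`, `useApifyProxy`, and `proxyUrls` field names are assumed to follow the usual Apify input convention, and `hasProxy` is a hypothetical helper, not part of the Actor.

```javascript
// Hypothetical pre-flight check: returns true when the input selects some proxy.
function hasProxy(input) {
    const proxy = input.proxyConfiguration;
    if (!proxy) return false;
    // Either the platform proxy is switched on, or custom proxy URLs are listed.
    const hasCustomUrls = Array.isArray(proxy.proxyUrls) && proxy.proxyUrls.length > 0;
    return proxy.useApifyProxy === true || hasCustomUrls;
}

console.log(hasProxy({ proxyConfiguration: { useApifyProxy: true } })); // true
console.log(hasProxy({}));                                              // false
```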

2.0 (`version-2`)

The main difference between v1 and v2 of the scrapers is the upgrade of the SDK to v2, which requires Node.js v15.10+. SDK v2 uses HTTP/2 for requests made by cheerio-scraper, and HTTP/2 support in older Node.js versions was too buggy, so we decided to drop support for those versions. If you need to run on an older Node.js version, use SDK v1.

Please refer to the SDK 1.0 migration guide for more details about functional changes in the SDK. SDK v2 basically only changes the required Node.js version and has no other breaking changes.

The deprecated useRequestQueue option has been removed.
- RequestQueue will always be used.
The deprecated context.html getter from the cheerio-scraper has been removed.
- Use context.body instead.
The prepareRequestFunction input option is deprecated.
- Use pre/postNavigationHooks instead.
puppeteerPool and autoscaledPool have been removed from the crawlingContext object.
- puppeteerPool was replaced by browserPool.
- autoscaledPool and browserPool are available on the crawler property of the crawlingContext object.
The custom "Key-value store name" option in Advanced configuration is now fixed; previously the default store was always used.
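
The following sketch shows what code adapted to these changes might look like: a pre-navigation hook in place of prepareRequestFunction, and a page function that reads the pools from the crawler property. The hook signature, the `x-example` header, and the mock context are assumptions for a dry run outside the platform; the real crawlingContext carries many more fields.

```javascript
// Hypothetical hook list standing in for the deprecated prepareRequestFunction.
const preNavigationHooks = [
    async (crawlingContext, gotoOptions) => {
        // Runs before navigation; adjust the request here.
        crawlingContext.request.headers = {
            ...crawlingContext.request.headers,
            'x-example': 'demo', // hypothetical header
        };
    },
];

// Page function sketch: the pools are read from context.crawler,
// not from the context object itself.
async function pageFunction(context) {
    const { autoscaledPool, browserPool } = context.crawler;
    return {
        url: context.request.url,
        exampleHeader: context.request.headers['x-example'],
        hasAutoscaledPool: autoscaledPool !== undefined,
        hasBrowserPool: browserPool !== undefined,
    };
}

// Minimal mock of a crawling context, for illustration only.
function makeMockContext() {
    return {
        request: { url: 'https://example.com', headers: {} },
        crawler: { autoscaledPool: {}, browserPool: {} },
    };
}

// Dry run: apply the hooks, then call the page function on the mock.
async function runDemo() {
    const context = makeMockContext();
    for (const hook of preNavigationHooks) {
        await hook(context, {});
    }
    return pageFunction(context);
}

runDemo().then((result) => console.log(result));
```

Code that still reads `context.puppeteerPool` or `context.autoscaledPool` would get `undefined` under v2, so moving those reads to `context.crawler` is the main change to look for when porting a page function.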

Actor Metrics

1 monthly user
1 star
>99% runs succeeded
Created in Jan 2024
Modified a year ago

Categories

Other
