Website Crawler
Pricing
Pay per usage
Crawls a website starting from one or more URLs and extracts the title, meta description, headings and text from each page.
Developer: elcon software (maintained by Community)
Website Crawler (Apify Actor)
A simple Apify actor built with Crawlee that crawls a website starting from one or more URLs and extracts, for each page:
- url, title, description (meta), h1
- all h1/h2/h3 headings
- wordCount of the body text
- depth and crawledAt
It uses CheerioCrawler (fast, HTTP-only, no browser) so it's cheap to run.
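For orientation, the per-page record listed above could be typed roughly as follows. This is only a sketch, not the actor's actual source: the names `PageRecord`, `Heading`, `headings` and `countWords`, and all field types, are assumptions made for illustration. Only the field names url, title, description, h1, wordCount, depth and crawledAt come from the list above.

```typescript
// Hypothetical shape of one dataset item. Field names follow the README;
// the types are assumptions.
interface Heading {
  level: 1 | 2 | 3;
  text: string;
}

interface PageRecord {
  url: string;
  title: string;
  description: string; // content of the meta description tag
  h1: string;          // assumed: text of the page's first h1
  headings: Heading[]; // all h1/h2/h3 headings (property name assumed)
  wordCount: number;   // number of words in the body text
  depth: number;       // link distance from a start URL
  crawledAt: string;   // assumed: ISO 8601 timestamp
}

// One plausible definition of wordCount: whitespace-separated tokens.
// The actor may count words differently.
function countWords(bodyText: string): number {
  const trimmed = bodyText.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Example record built by hand, with made-up values.
const example: PageRecord = {
  url: "https://example.com/",
  title: "Example Domain",
  description: "",
  h1: "Example Domain",
  headings: [{ level: 1, text: "Example Domain" }],
  wordCount: countWords("Example Domain is for use in examples"),
  depth: 0,
  crawledAt: new Date().toISOString(),
};
```

Typing the record this way makes it easier to consume the dataset from downstream TypeScript code, but the real output should be checked against an actual run before relying on these types.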
Input
| Field | Type | Default | Description |
|---|---|---|---|
| startUrls | array | required | URLs to start crawling from. |
| maxRequestsPerCrawl | integer | 50 | Hard cap on pages visited. |
| maxCrawlDepth | integer | 2 | How many links deep to follow (0 = start URLs only). |
| sameDomainOnly | boolean | true | Only follow links on the start URL's hostname. |
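To make the defaults and the two link-following options concrete, here is a minimal sketch of how they might be applied. It is not taken from the actor's source: `CrawlerInput`, `withDefaults` and `shouldFollow` are hypothetical names, and treating startUrls as plain URL strings is an assumption (the actor may expect objects such as `{ url }` instead). The sketch does not enforce maxRequestsPerCrawl, which would cap the total number of pages regardless of depth.

```typescript
// Hypothetical input type mirroring the table above.
interface CrawlerInput {
  startUrls: string[]; // assumed to be plain URL strings
  maxRequestsPerCrawl?: number;
  maxCrawlDepth?: number;
  sameDomainOnly?: boolean;
}

// Fill in the documented defaults for any omitted optional field.
function withDefaults(input: CrawlerInput): Required<CrawlerInput> {
  if (input.startUrls.length === 0) {
    throw new Error("startUrls is required");
  }
  return {
    startUrls: input.startUrls,
    maxRequestsPerCrawl: input.maxRequestsPerCrawl ?? 50,
    maxCrawlDepth: input.maxCrawlDepth ?? 2,
    sameDomainOnly: input.sameDomainOnly ?? true,
  };
}

// Decide whether a link found on a page at `parentDepth` should be followed.
// Start URLs are at depth 0, so maxCrawlDepth = 0 follows no links at all.
function shouldFollow(
  link: string,
  startUrl: string,
  parentDepth: number,
  input: Required<CrawlerInput>,
): boolean {
  if (parentDepth + 1 > input.maxCrawlDepth) return false;
  if (!input.sameDomainOnly) return true;
  return new URL(link).hostname === new URL(startUrl).hostname;
}

const config = withDefaults({ startUrls: ["https://example.com/"] });
const followSameHost = shouldFollow("https://example.com/about", "https://example.com/", 0, config);
const followOtherHost = shouldFollow("https://other.example.org/", "https://example.com/", 0, config);
```

With the defaults, `followSameHost` is true and `followOtherHost` is false, and a link found on a depth-2 page is not followed because it would sit at depth 3.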
Run locally
npm install
npm run start:dev
Local input is read from storage/key_value_stores/default/INPUT.json.
Results are written to storage/datasets/default/.
Build
npm run build # compiles src -> dist
npm run start:prod # runs the compiled actor
Deploy to Apify
See the step-by-step in the project chat, or in short:
- Install the CLI:
npm install -g apify-cli
apify login
- From this folder:
apify push
Alternatively, connect this Git repo in the Apify Console (Actors → Create new → link Git repository) and Apify will build from the .actor/ config automatically.