Website Crawler

Pricing

Pay per usage

Crawls a website starting from one or more URLs and extracts the title, meta description, headings and text from each page.

Rating

0.0

(0)

Developer

elcon software

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 3
  • Monthly active users: 2
  • Last modified: 3 days ago

Website Crawler (Apify Actor)

A simple Apify actor built with Crawlee that crawls a website starting from one or more URLs and extracts, for each page:

  • url, title, description (meta), h1
  • all h1/h2/h3 headings
  • wordCount of the body text
  • depth and crawledAt
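
As a rough illustration of the fields listed above, one dataset item could be typed as follows. The field names come from the list; the types, the structure of `headings`, and the `countWords` helper are assumptions made for this sketch, not the actor's actual source.

```typescript
// Hypothetical shape of one dataset item. Field names follow the README;
// the types and the layout of `headings` are assumptions.
interface PageRecord {
  url: string;
  title: string;
  description: string; // meta description, assumed empty when absent
  h1: string;
  headings: { level: 1 | 2 | 3; text: string }[];
  wordCount: number;
  depth: number; // assumed to be 0 for start URLs
  crawledAt: string; // assumed to be an ISO 8601 timestamp
}

// One plausible way to compute wordCount: split the body text on whitespace runs.
function countWords(bodyText: string): number {
  const trimmed = bodyText.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Example record built by hand for illustration only.
const exampleBody = "Welcome to the example site";
const exampleRecord: PageRecord = {
  url: "https://example.com/",
  title: "Example",
  description: "",
  h1: "Welcome",
  headings: [{ level: 1, text: "Welcome" }],
  wordCount: countWords(exampleBody),
  depth: 0,
  crawledAt: new Date().toISOString(),
};
```

Splitting on whitespace is only one definition of a word; the actor may count differently, so treat `wordCount` values as approximate when comparing against other tools.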

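It uses CheerioCrawler, which fetches pages over plain HTTP and parses the returned HTML without launching a browser, so it is fast and cheap to run. The trade-off is that content rendered client-side by JavaScript is not seen.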

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| startUrls | array | required | URLs to start crawling from. |
| maxRequestsPerCrawl | integer | 50 | Hard cap on pages visited. |
| maxCrawlDepth | integer | 2 | How many links deep to follow (0 = start URLs only). |
| sameDomainOnly | boolean | true | Only follow links on the start URL's hostname. |
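
To make the interaction of these options concrete, here is a minimal sketch that applies the documented defaults and then simulates which pages would be visited. It runs over a hypothetical in-memory link graph instead of real HTTP requests. `CrawlerInput`, `withDefaults`, and `planCrawl` are illustrative names, `startUrls` is assumed to hold plain URL strings, and with several start URLs the sketch assumes any of their hostnames is allowed. None of this is taken from the actor's source.

```typescript
// Assumed input shape; the README only says `startUrls` is an array.
interface CrawlerInput {
  startUrls: string[];
  maxRequestsPerCrawl: number;
  maxCrawlDepth: number;
  sameDomainOnly: boolean;
}

// Fill in the documented defaults; `startUrls` has none and must be supplied.
function withDefaults(raw: Partial<CrawlerInput>): CrawlerInput {
  if (!raw.startUrls || raw.startUrls.length === 0) {
    throw new Error("startUrls is required");
  }
  return {
    startUrls: raw.startUrls,
    maxRequestsPerCrawl: raw.maxRequestsPerCrawl ?? 50,
    maxCrawlDepth: raw.maxCrawlDepth ?? 2,
    sameDomainOnly: raw.sameDomainOnly ?? true,
  };
}

// Breadth-first walk over a hypothetical link graph (page URL -> outgoing links).
function planCrawl(input: CrawlerInput, links: Record<string, string[]>): string[] {
  const allowedHosts = new Set(input.startUrls.map((u) => new URL(u).hostname));
  const queue = input.startUrls.map((url) => ({ url, depth: 0 }));
  const seen = new Set(input.startUrls);
  const visited: string[] = [];

  while (queue.length > 0 && visited.length < input.maxRequestsPerCrawl) {
    const { url, depth } = queue.shift()!;
    visited.push(url);
    if (depth >= input.maxCrawlDepth) continue; // depth limit reached: do not follow links
    for (const next of links[url] ?? []) {
      if (seen.has(next)) continue; // already queued or visited
      if (input.sameDomainOnly && !allowedHosts.has(new URL(next).hostname)) continue;
      seen.add(next);
      queue.push({ url: next, depth: depth + 1 });
    }
  }
  return visited;
}

// Hypothetical site used only for this illustration.
const demoLinks: Record<string, string[]> = {
  "https://example.com/": ["https://example.com/a", "https://other.org/x"],
  "https://example.com/a": ["https://example.com/b"],
  "https://example.com/b": ["https://example.com/c"],
};
const demoPlan = planCrawl(withDefaults({ startUrls: ["https://example.com/"] }), demoLinks);
console.log(demoPlan);
```

With the defaults, the page at depth 2 is still visited but its links are not followed, and the link to the other hostname is skipped. Setting `maxCrawlDepth` to 0 visits only the start URLs.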

Run locally

npm install
npm run start:dev

Local input is read from storage/key_value_stores/default/INPUT.json. Results are written to storage/datasets/default/.
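
For a local run, the input file can be created by hand or with a small script such as the sketch below. `writeLocalInput` and its `storageDir` parameter are hypothetical helpers for illustration, and the sample values (including plain-string `startUrls`) are assumptions rather than settings the project prescribes.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Write an input object to <storageDir>/key_value_stores/default/INPUT.json.
// `storageDir` defaults to "storage", matching the path named in the README.
function writeLocalInput(input: object, storageDir = "storage"): string {
  const dir = path.join(storageDir, "key_value_stores", "default");
  fs.mkdirSync(dir, { recursive: true }); // create missing folders, no error if present
  const file = path.join(dir, "INPUT.json");
  fs.writeFileSync(file, JSON.stringify(input, null, 2) + "\n", "utf8");
  return file;
}

// Sample input for a small test crawl; the values are illustrative.
const sampleInput = {
  startUrls: ["https://example.com"],
  maxRequestsPerCrawl: 20,
  maxCrawlDepth: 1,
  sameDomainOnly: true,
};
```

Calling `writeLocalInput(sampleInput)` from the project root before `npm run start:dev` would place the file where the local run reads its input.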

Build

npm run build # compiles src -> dist
npm run start:prod # runs the compiled actor

Deploy to Apify

In short:

  1. Install the CLI: npm install -g apify-cli
  2. Log in: apify login
  3. From this folder, push the actor: apify push

Alternatively, connect this Git repo in the Apify Console (Actors → Create new → link Git repository) and Apify will build from the .actor/ config automatically.