
Website Title Crawler — Best-in-class web crawler

Crawl any website and extract page titles and URLs with one actor. Three engines (Cheerio · JSDOM · Playwright), robots.txt support, configurable timeouts and retries, crawl depth and scrapedAt in every output row, live status during runs, and optional URL globs plus a Playwright wait-for-selector. Ideal for sitemaps, SEO checks, and link inventories.

What this actor does

  • Crawler type – Cheerio (fast, static), JSDOM (light JS), or Playwright (full browser); see the sketch after this list.
  • Polite crawling – Respect robots.txt, configurable request timeout and retries.
  • Live progress – Status message updates as pages are crawled.
  • Rich output – Each row: title, url, depth (crawl depth from start URL), scrapedAt (ISO timestamp), and error when failed.
  • URL globs – Optionally only follow links matching patterns (e.g. **/blog/**).
  • Playwright – Optional “wait for selector” before extracting (for JS-heavy pages).
  • Crawl presets – Quick (10), Standard (100), Large (1000), or Custom.
  • User charge limit – Stops the crawl when the run’s max result charge limit is reached (pay-per-event).
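
A minimal sketch of how these options can map onto Crawlee (illustrative only, not the actor's actual src/main.ts; the preset number, glob, and start URL are placeholders taken from the list above):

import { Actor } from 'apify';
import { CheerioCrawler, Dataset } from 'crawlee';

await Actor.init();

const crawler = new CheerioCrawler({
    maxRequestsPerCrawl: 100,        // "Standard" preset (100 pages)
    maxConcurrency: 10,              // input: Max concurrency
    requestHandlerTimeoutSecs: 30,   // input: Request timeout (seconds)
    maxRequestRetries: 2,            // input: Max request retries
    async requestHandler({ request, $, enqueueLinks }) {
        const depth = (request.userData.depth as number | undefined) ?? 0;
        await Dataset.pushData({
            title: $('title').text().trim() || '(no title)',
            url: request.loadedUrl ?? request.url,   // final URL after redirects
            depth,
            scrapedAt: new Date().toISOString(),
        });
        await enqueueLinks({
            strategy: 'same-domain',   // input: Limit to same domain
            globs: ['**/blog/**'],     // input: URL globs (example pattern)
            userData: { depth: depth + 1 },
        });
    },
    async failedRequestHandler({ request }, error) {
        // Failed pages still get a row, with the error field set.
        await Dataset.pushData({
            url: request.url,
            depth: (request.userData.depth as number | undefined) ?? 0,
            scrapedAt: new Date().toISOString(),
            error: error.message,
        });
    },
});

await crawler.run(['https://example.com']);
await Actor.exit();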

Input (run configuration)

  • Crawler type – Cheerio = fastest, static HTML only. JSDOM = light JavaScript. Playwright = full browser, any JS-heavy site.
  • Start URLs – One or more URLs where the crawl starts. The crawler follows links from these pages.
  • Crawl size – Quick (10 pages), Standard (100), Large (1000), or Custom (use Max requests below).
  • Max requests (custom only) – Used only when Crawl size is Custom. Maximum number of pages to crawl.
  • Use proxy – Use Apify proxy to rotate IPs and reduce blocking. Turn off for quick local tests.
  • Max concurrency – How many pages to fetch in parallel (1–50).
  • Limit to same domain – Only follow links on the same domain as the start URLs. Recommended for focused crawls.
  • Respect robots.txt – Skip URLs disallowed by the site’s robots.txt. Recommended for polite crawling.
  • Request timeout (seconds) – Max seconds to wait for each page load.
  • Max request retries – How many times to retry a failed request before recording an error.
  • URL globs (optional) – Only follow links matching these patterns (e.g. **/blog/**). Empty = all links (within the same domain if enabled).
  • Playwright: wait for selector – Optional CSS selector to wait for before extracting (Playwright only).
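
A hedged example of starting a run via the Apify API client. The input keys below are assumptions mirroring the form labels above (the authoritative names live in .actor/input_schema.json), and the actor ID is a placeholder:

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Hypothetical input: keys mirror the form labels, not a verified schema.
const run = await client.actor('<username>/<actor-name>').call({
    crawlerType: 'cheerio',
    startUrls: [{ url: 'https://example.com' }],
    crawlSize: 'standard',        // Quick | Standard | Large | Custom
    useProxy: true,
    maxConcurrency: 10,
    limitToSameDomain: true,
    respectRobotsTxt: true,
    requestTimeoutSecs: 30,
    maxRequestRetries: 2,
    urlGlobs: ['**/blog/**'],
});
console.log(`Results in dataset: ${run.defaultDatasetId}`);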

Output

The actor writes one row per page to the run dataset:

  • title – Page <title> (or (no title) if missing)
  • url – Final URL (after redirects)
  • depth – Crawl depth from start URL (0 = start page)
  • scrapedAt – ISO timestamp when the page was scraped
  • error – Set only when the request failed (e.g. timeout, block)

In Apify Console you can view, filter, and export the dataset (JSON, CSV, etc.).
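
In TypeScript terms, each dataset row has this shape (illustrative, matching the field list above):

interface TitleRow {
    title: string;      // page <title>, or "(no title)" if missing
    url: string;        // final URL after redirects
    depth: number;      // 0 = start page
    scrapedAt: string;  // ISO timestamp
    error?: string;     // present only when the request failed
}

// Example row:
const row: TitleRow = {
    title: 'Example Domain',
    url: 'https://example.com/',
    depth: 0,
    scrapedAt: '2025-01-01T12:00:00.000Z',
};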

Pricing and cost (per 1,000 results)

For the maximum cost at highest capacity (Playwright + proxy + large crawl) and the recommended price per 1,000 results, see ./COST-ESTIMATE.md.
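
As a rough guide at the listed rate of $40.00 per 1,000 results, a Standard crawl (100 pages) works out to about $4.00 and a Large crawl (1,000 pages) to about $40.00, assuming each crawled page yields one charged result.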

Quick start

pnpm install
apify run

Push changes to your Actor on Apify

  1. Install Apify CLI (if needed): npm install -g apify-cli
  2. Log in: apify login (opens browser; use your Apify account).
  3. Link the project (first time only): from the actor folder run apify init and follow the prompts to link to an existing actor or create one.
  4. Push: apify push — builds the Docker image, pushes it to Apify, and updates the actor.

Your code and .actor/ config (input schema, dataset schema, etc.) are uploaded; the actor on Apify Console will use the new version on the next run. To run locally first: apify run.
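
The full sequence, condensed (the same commands as in the steps above):

npm install -g apify-cli
apify login
apify init
apify push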

Project structure

.actor/
├── actor.json # Actor config: name, version, runtime settings
├── dataset_schema.json # How dataset output is displayed in Console
├── input_schema.json # Input validation & run form (presets, options)
└── output_schema.json # Where the Actor stores its output
src/
└── main.ts # Actor entry point and crawler logic

See Actor definition for details.

Tech stack

  • Apify SDK – storage, input, proxy, lifecycle
  • Crawlee – CheerioCrawler, JSDOMCrawler, PlaywrightCrawler
  • Cheerio – fast HTML parsing (no browser)
  • JSDOM – DOM API, light JS execution
  • Playwright – headless browser for JS-heavy sites
  • Proxy – optional IP rotation

Resources