# Website Title Crawler — Best-in-class web crawler
Crawl any website and extract page titles and URLs with one actor. Three engines (Cheerio · JSDOM · Playwright), robots.txt support, configurable timeouts and retries, crawl depth and scrapedAt in every output row, live status during runs, and optional URL globs and Playwright wait-for-selector. Ideal for sitemaps, SEO checks, and link inventories.
## What this actor does
- Crawler type – Cheerio (fast, static), JSDOM (light JS), or Playwright (full browser).
- Polite crawling – Respect robots.txt, configurable request timeout and retries.
- Live progress – Status message updates as pages are crawled.
- Rich output – Each row: title, url, depth (crawl depth from start URL), scrapedAt (ISO timestamp), and error when failed.
- URL globs – Optionally only follow links matching patterns (e.g. `**/blog/**`).
- Playwright – Optional “wait for selector” before extracting (for JS-heavy pages).
- Crawl presets – Quick (10), Standard (100), Large (1000), or Custom.
- User charge limit – Stops the crawl when the run’s max result charge limit is reached (pay-per-event).
## Input (run configuration)
| Field | Description |
|---|---|
| Crawler type | Cheerio = fastest, static HTML only. JSDOM = light JavaScript. Playwright = full browser, any JS-heavy site. |
| Start URLs | One or more URLs where the crawl starts. The crawler follows links from these pages. |
| Crawl size | Quick (10 pages), Standard (100), Large (1000), or Custom (use Max requests below). |
| Max requests (custom only) | Used only when Crawl size is Custom. Maximum number of pages to crawl. |
| Use proxy | Use Apify proxy to rotate IPs and reduce blocking. Turn off for quick local tests. |
| Max concurrency | How many pages to fetch in parallel (1–50). |
| Limit to same domain | Only follow links on the same domain as the start URLs. Recommended for focused crawls. |
| Respect robots.txt | Skip URLs disallowed by the site’s robots.txt. Recommended for polite crawling. |
| Request timeout (seconds) | Max seconds to wait for each page load. |
| Max request retries | How many times to retry a failed request before recording an error. |
| URL globs (optional) | Only follow links matching these patterns (e.g. **/blog/**). Empty = all links (within same domain if enabled). |
| Playwright: wait for selector | Optional CSS selector to wait for before extracting (Playwright only). |
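For reference, a complete run input might look like the following. The exact keys are defined by the actor's input schema, so treat the field names below (`crawlerType`, `crawlSize`, `urlGlobs`, and so on) as illustrative:

```json
{
  "crawlerType": "cheerio",
  "startUrls": [{ "url": "https://example.com" }],
  "crawlSize": "standard",
  "useProxy": true,
  "maxConcurrency": 10,
  "sameDomain": true,
  "respectRobotsTxt": true,
  "requestTimeoutSecs": 30,
  "maxRequestRetries": 2,
  "urlGlobs": ["**/blog/**"],
  "waitForSelector": "h1"
}
```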
## Output
The actor writes one row per page to the run dataset:
- title – Page `<title>` (or `(no title)` if missing)
- url – Final URL (after redirects)
- depth – Crawl depth from start URL (0 = start page)
- scrapedAt – ISO timestamp when the page was scraped
- error – Set only when the request failed (e.g. timeout, block)
In Apify Console you can view, filter, and export the dataset (JSON, CSV, etc.).
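A typical dataset row (the values below are illustrative):

```json
{
  "title": "Example Domain",
  "url": "https://example.com/",
  "depth": 0,
  "scrapedAt": "2025-06-01T12:34:56.789Z"
}
```

When a request fails, the row also includes an `error` field describing the failure.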
## Pricing and cost (per 1,000 results)
For the maximum cost at the highest-capacity configuration (Playwright + proxy + large crawl) and the recommended price per 1,000 results, see ./COST-ESTIMATE.md.
## Quick start

```bash
pnpm install
apify run
```
## Push changes to your Actor on Apify

- Install Apify CLI (if needed): `npm install -g apify-cli`
- Log in: `apify login` (opens a browser; use your Apify account).
- Link the project (first time only): from the actor folder, run `apify init` and follow the prompts to link to an existing actor or create one.
- Push: `apify push` builds the Docker image, pushes it to Apify, and updates the actor.

Your code and `.actor/` config (input schema, dataset schema, etc.) are uploaded; the actor on Apify Console will use the new version on the next run. To run locally first: `apify run`.
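In shell form, the full sequence (first-time setup included) is:

```bash
npm install -g apify-cli  # one-time install
apify login               # one-time login
apify init                # first time only, from the actor folder
apify push                # build, upload, and update the actor
```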
## Project structure

```text
.actor/
├── actor.json            # Actor config: name, version, runtime settings
├── dataset_schema.json   # How dataset output is displayed in Console
├── input_schema.json     # Input validation & run form (presets, options)
└── output_schema.json    # Where the Actor stores its output
src/
└── main.ts               # Actor entry point and crawler logic
```
See Actor definition for details.
## Tech stack
- Apify SDK – storage, input, proxy, lifecycle
- Crawlee – CheerioCrawler, JSDOMCrawler, PlaywrightCrawler
- Cheerio – fast HTML parsing (no browser)
- JSDOM – DOM API, light JS execution
- Playwright – headless browser for JS-heavy sites
- Proxy – optional IP rotation
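
To show how these pieces fit together, here is a minimal sketch of the Cheerio path built on the public Apify SDK and Crawlee APIs. It is not the actor's actual source: the input keys, defaults, and manual depth tracking are assumptions for illustration.

```ts
import { Actor } from 'apify';
import { CheerioCrawler } from 'crawlee';

await Actor.init();

// Hypothetical input shape; the real schema may use different keys.
interface Input {
    startUrls?: { url: string }[];
    maxRequests?: number;
    urlGlobs?: string[];
}

const input = (await Actor.getInput<Input>()) ?? {};
const startUrls = (input.startUrls ?? [{ url: 'https://example.com' }]).map((s) => s.url);

const crawler = new CheerioCrawler({
    maxRequestsPerCrawl: input.maxRequests ?? 100,
    async requestHandler({ request, $, enqueueLinks, pushData }) {
        // Depth travels in userData; start pages are depth 0.
        const depth = (request.userData.depth as number | undefined) ?? 0;

        // One dataset row per page, matching the Output section above.
        await pushData({
            title: $('title').text().trim() || '(no title)',
            url: request.loadedUrl ?? request.url,
            depth,
            scrapedAt: new Date().toISOString(),
        });

        // Follow links on the same domain, optionally filtered by globs.
        await enqueueLinks({
            strategy: 'same-domain',
            globs: input.urlGlobs?.length ? input.urlGlobs : undefined,
            transformRequestFunction: (req) => {
                req.userData = { depth: depth + 1 };
                return req;
            },
        });
    },
    async failedRequestHandler({ request }, error) {
        // Failed requests still produce a row, with an error message.
        await Actor.pushData({
            url: request.url,
            error: error.message,
            scrapedAt: new Date().toISOString(),
        });
    },
});

await crawler.run(startUrls);
await Actor.exit();
```

The JSDOM and Playwright paths follow the same pattern with JSDOMCrawler and PlaywrightCrawler; the Playwright variant can additionally wait for a CSS selector (via Playwright's `page.waitForSelector`) before extracting.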


