Website Key Pages Finder

Find key pages (pricing, docs, status, security, privacy, terms) on any website. Crawls start URLs and returns structured URLs with confidence scores and evidence. Great for competitor analysis, lead enrichment, and audits.

Automatically find and scrape key pages from any website: pricing pages, documentation, status pages, security information, privacy policies, and terms of service. This Apify Actor crawls websites intelligently and returns structured data with confidence scores for each discovered page.

🔍 What does Website Key Pages Finder do?

Website Key Pages Finder is an Apify Actor that automatically discovers important pages on any website. Given a list of URLs, it crawls each site and returns the URLs of six key page types along with confidence scores and evidence explaining how each page was found.

Key page types discovered:

  • Pricing - Plans, costs, and billing information
  • Documentation - API docs, guides, and developer resources
  • Status - System uptime and incident pages
  • Security - Trust centers, compliance, and security policies
  • Privacy - Privacy policies and data protection information
  • Terms - Terms of service and legal agreements

The Actor uses a multi-phase discovery approach that combines URL pattern probing, homepage link extraction, and intelligent crawling to find pages even on sites with non-standard structures.
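
As a rough illustration of the first phase, URL pattern probing amounts to requesting a handful of well-known paths per page type and treating a successful response as evidence. The following is a simplified sketch, not the Actor's actual source; probeFastPath and CANDIDATE_PATHS are hypothetical names, and the paths mirror the "Common URL Patterns" table below:

// Illustrative sketch of fast-path URL probing (hypothetical, not the Actor's source).
const CANDIDATE_PATHS = {
    pricing: ['/pricing', '/plans', '/price'],
    docs: ['/docs', '/documentation', '/api', '/developer'],
    status: ['/status', '/uptime'],
    security: ['/security', '/trust', '/compliance'],
    privacy: ['/privacy', '/privacy-policy', '/data-protection'],
    terms: ['/terms', '/tos', '/terms-of-service', '/legal'],
};

// Probe each candidate path with a HEAD request; a 2xx response is strong evidence.
async function probeFastPath(baseUrl) {
    const found = {};
    for (const [type, paths] of Object.entries(CANDIDATE_PATHS)) {
        for (const path of paths) {
            const url = new URL(path, baseUrl).href;
            const res = await fetch(url, { method: 'HEAD', redirect: 'follow' });
            if (res.ok) {
                found[type] = { url: res.url, evidence: [`exact_path:${path}`] };
                break; // first hit wins for this type
            }
        }
    }
    return found;
}

Pages found this way still go through link extraction and content verification before scoring, which is why the evidence arrays in the output often combine several signals.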

🎯 Why scrape key pages from websites?

Finding key pages manually across dozens or hundreds of websites is time-consuming and error-prone. This Actor automates the process, making it valuable for:

  • Competitor Analysis - Quickly gather pricing pages and documentation from competitor websites to understand their offerings and positioning
  • Sales Intelligence - Enrich lead data with links to company pricing, security, and compliance pages before outreach
  • Website Auditing - Verify that your own sites have discoverable key pages and assess how competitors structure their information architecture
  • Market Research - Collect pricing pages across an industry to analyze pricing trends and strategies
  • Due Diligence - Gather legal documents (privacy policies, terms) from potential partners or acquisition targets
  • Compliance Monitoring - Track privacy policy and terms changes across a portfolio of vendors

🚀 How to use Website Key Pages Finder

Follow these steps to find key pages on any website:

  1. Open the Actor in Apify Console or via the API
  2. Add your URLs to the Start URLs field (homepage URLs work best)
  3. Configure options (optional) - adjust crawl depth, page limits, and timeout as needed
  4. Run the Actor by clicking "Start" or calling the API
  5. Download results from the Dataset tab in JSON, CSV, or Excel format

Example Input

{
  "startUrls": [
    { "url": "https://apify.com" },
    { "url": "https://stripe.com" },
    { "url": "https://github.com" }
  ],
  "maxDepth": 1,
  "maxPagesPerDomain": 12,
  "includeSubdomains": true
}

💰 How much does it cost to find key pages?

Website Key Pages Finder uses Pay Per Event (PPE) pricing, so you only pay for the websites you analyze.

Pricing:

  • Per website analyzed: $0.005 per site
  • Start fee: $0.005 per run
  • No hidden compute costs - the price per website includes all crawling and processing

Cost control:

  • Set a maximum spend per run in the Actor input to limit costs
  • The Actor stops gracefully when your spending limit is reached
  • Remaining URLs are skipped (not charged) when the limit is hit

Example costs:

| Websites | Cost   |
|----------|--------|
| 10       | $0.055 |
| 100      | $0.505 |
| 1,000    | $5.005 |
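
These totals follow directly from the pricing above: total = $0.005 start fee + $0.005 × websites analyzed, so 100 websites cost 0.005 + 100 × 0.005 = $0.505.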

Free tier: Apify provides a free tier with monthly credits, typically sufficient for testing and small-scale usage. Check your Apify account for current free tier limits.

📥 Input

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| startUrls | array | required | URLs of websites to analyze. Each URL should be a homepage or any page from the domain. |
| maxDepth | integer | 1 | Crawl depth. 0 = homepage only, 1 = homepage + priority pages (recommended). |
| maxPagesPerDomain | integer | 12 | Maximum pages to fetch per domain. Controls costs and processing time. |
| includeSubdomains | boolean | true | Whether to include subdomains when discovering pages (e.g., docs.example.com). |
| returnTopN | integer | 1 | Number of top candidates to return per page type. Set higher to see alternative candidates. |
| timeoutSecs | integer | 30 | Timeout in seconds for processing each site. |
| proxyConfiguration | object | { "useApifyProxy": false } | Proxy settings for sites that block direct access. |
| debug | boolean | false | Include debug information (raw candidates) in output. |
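
For instance, a minimal input that enables Apify Proxy and a longer timeout for a stubborn site might look like this (the field values are illustrative, using only the fields documented above):

{
  "startUrls": [{ "url": "https://example.com" }],
  "proxyConfiguration": { "useApifyProxy": true },
  "timeoutSecs": 60
}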

📤 Output

Each website produces one result object in the dataset:

{
  "schemaVersion": "1.0.0",
  "inputUrl": "https://apify.com",
  "finalUrl": "https://apify.com/",
  "domain": "apify.com",
  "pages": {
    "pricing": {
      "url": "https://apify.com/pricing",
      "confidence": 0.95,
      "evidence": ["exact_path:/pricing", "anchor:Pricing", "footer_link"]
    },
    "docs": {
      "url": "https://docs.apify.com",
      "confidence": 0.92,
      "evidence": ["subdomain:docs", "anchor:Documentation"]
    },
    "status": {
      "url": "https://status.apify.com",
      "confidence": 0.88,
      "evidence": ["subdomain:status", "anchor:Status"]
    },
    "security": {
      "url": "https://apify.com/security",
      "confidence": 0.85,
      "evidence": ["path_token:security", "footer_link"]
    },
    "privacy": {
      "url": "https://apify.com/privacy-policy",
      "confidence": 0.90,
      "evidence": ["path_token:privacy", "anchor:Privacy Policy", "footer_link"]
    },
    "terms": {
      "url": "https://apify.com/terms-of-service",
      "confidence": 0.88,
      "evidence": ["path_token:terms", "anchor:Terms of Service", "footer_link"]
    }
  },
  "crawlStats": {
    "pagesFetched": 8,
    "timeMs": 2340,
    "errors": [],
    "likelyJsRendered": false
  },
  "timestamp": "2024-01-15T10:30:00.000Z"
}

Output Fields

| Field | Description |
|-------|-------------|
| inputUrl | The URL you provided |
| finalUrl | The URL after following redirects |
| domain | The root domain extracted from the URL |
| pages | Object containing discovered pages for each type |
| pages.[type].url | URL of the discovered page |
| pages.[type].confidence | Confidence score from 0 to 1 |
| pages.[type].evidence | Array of signals that contributed to the score |
| crawlStats.pagesFetched | Number of pages fetched during discovery |
| crawlStats.timeMs | Processing time in milliseconds |
| crawlStats.errors | Any errors encountered during crawling |
| crawlStats.likelyJsRendered | Whether the site appears to be JavaScript-rendered |
| topCandidates | (Optional) When returnTopN > 1, contains all top candidates per type |
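
As an example of consuming this schema, the sketch below pulls a finished run's dataset with apify-client and keeps only pricing pages found with confidence of at least 0.8; the threshold is an arbitrary choice for illustration, and YOUR_DATASET_ID is a placeholder:

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Fetch results from a finished run's dataset (ID is a placeholder).
const { items } = await client.dataset('YOUR_DATASET_ID').listItems();

// Keep only pricing pages discovered with high confidence.
const pricingPages = items
    .filter((item) => item.pages?.pricing?.confidence >= 0.8)
    .map((item) => ({ domain: item.domain, url: item.pages.pricing.url }));

console.log(pricingPages);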

Page Types

| Type | Common URL Patterns | Description |
|------|---------------------|-------------|
| pricing | /pricing, /plans, /price | Pricing and plan information |
| docs | /docs, /documentation, /api, /developer | Documentation and API reference |
| status | /status, /uptime, status.example.com | System status and uptime pages |
| security | /security, /trust, /compliance | Security and compliance information |
| privacy | /privacy, /privacy-policy, /data-protection | Privacy policy |
| terms | /terms, /tos, /terms-of-service, /legal | Terms of service |

📊 How does confidence scoring work?

Each discovered page includes a confidence score between 0 and 1 that indicates how certain the Actor is that the page is correct.

| Score | Meaning |
|-------|---------|
| 0.80 - 1.00 | Very confident - strong signals from URL path, anchor text, and page location |
| 0.50 - 0.79 | Probable match - good evidence but some ambiguity |
| 0.30 - 0.49 | Best guess - limited evidence, may need manual verification |
| Below 0.30 | Not returned - insufficient confidence |

Scoring Factors

  1. Discovery Source - Base score from how the page was found

    • Fast-path (direct URL probe): +0.40
    • Homepage link: +0.30
    • Depth-1 crawl: +0.20
    • Sitemap: +0.10
  2. Positive Signals - Added to the score

    • Exact path match (e.g., /pricing): +0.30
    • Token in path (e.g., /pricing-plans): +0.20
    • Anchor text match: +0.25
    • Footer/nav location: +0.12 to +0.15
    • Subdomain match (e.g., docs.example.com): +0.25
  3. Verification - Final adjustment after checking page content

    • Title matches expected keywords: +0.20
    • Content verified: +0.15
    • HTTP error: -0.50
    • Wrong content type: -0.30
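
Putting these together, a page's confidence is roughly the sum of its base score, positive signals, and verification adjustment. Here is a minimal sketch of that idea, assuming the result is clamped to [0, 1]; the Actor's exact arithmetic may differ:

// Simplified illustration of how the documented factors could combine.
function estimateConfidence(baseScore, signalScores, verificationAdjustment) {
    // Sum the documented factors and clamp to the valid confidence range (clamping is assumed).
    const raw = baseScore + signalScores.reduce((sum, s) => sum + s, 0) + verificationAdjustment;
    return Math.min(1, Math.max(0, raw));
}

// Fast-path probe (+0.40) with an exact path match (+0.30) and matching anchor text (+0.25):
console.log(estimateConfidence(0.40, [0.30, 0.25], 0).toFixed(2)); // "0.95"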

🔗 Integrations and API access

REST API

Run the Actor via the Apify API:

curl -X POST "https://api.apify.com/v2/acts/YOUR_USERNAME~website-key-pages-finder/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "startUrls": [
      { "url": "https://example.com" }
    ],
    "maxDepth": 1,
    "maxPagesPerDomain": 12
  }'
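
For small batches, you can also start the run and fetch its results in a single request via Apify's run-sync-get-dataset-items endpoint; note that synchronous runs are subject to a platform time limit, so this suits short runs best:

curl -X POST "https://api.apify.com/v2/acts/YOUR_USERNAME~website-key-pages-finder/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "startUrls": [{ "url": "https://example.com" }] }'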

JavaScript SDK

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('YOUR_USERNAME/website-key-pages-finder').call({
    startUrls: [
        { url: 'https://example.com' },
        { url: 'https://another-site.com' }
    ],
    maxDepth: 1,
    maxPagesPerDomain: 12
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Python SDK

from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')

run = client.actor('YOUR_USERNAME/website-key-pages-finder').call(run_input={
    'startUrls': [
        {'url': 'https://example.com'},
        {'url': 'https://another-site.com'},
    ],
    'maxDepth': 1,
    'maxPagesPerDomain': 12,
})

items = client.dataset(run['defaultDatasetId']).list_items().items
print(items)

Webhooks and Integrations

Apify supports webhooks to notify your systems when a run completes. You can also integrate with:

  • Zapier - Trigger workflows when new data is available
  • Make (Integromat) - Build automated pipelines
  • Google Sheets - Export results directly to spreadsheets
  • Slack - Get notifications when runs complete

See Apify Integrations for more options.
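
For example, one way to register a webhook through the Apify API so that a finished run notifies your endpoint looks like this; the requestUrl and Actor ID are placeholders, and Apify's webhook documentation has the full payload schema:

curl -X POST "https://api.apify.com/v2/webhooks?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "eventTypes": ["ACTOR.RUN.SUCCEEDED"],
    "condition": { "actorId": "YOUR_ACTOR_ID" },
    "requestUrl": "https://your-server.example.com/apify-webhook"
  }'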

❓ FAQ

What happens if a page type isn't found?

If the Actor cannot find a page type with sufficient confidence (score >= 0.30), that type will be omitted from the pages object in the output. This is normal - not all websites have all page types.

Why are some confidence scores lower than expected?

Confidence scores depend on the signals found during crawling. Sites with non-standard URL structures, unusual navigation, or pages behind authentication may have lower scores. Check the evidence array to understand what signals were detected.

Can this Actor handle JavaScript-rendered websites?

This Actor uses HTTP-based crawling (CheerioCrawler) for speed and efficiency. Sites that heavily rely on JavaScript for rendering may have incomplete results. The output includes a likelyJsRendered flag to indicate when this might be an issue. For such sites, consider using a browser-based scraper.
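
The flag is easy to act on. This short sketch collects the affected domains for a second pass with a browser-based scraper (the dataset ID is a placeholder):

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });
const { items } = await client.dataset('YOUR_DATASET_ID').listItems();

// Domains where HTTP-only crawling probably missed JavaScript-rendered content.
const needsBrowser = items
    .filter((item) => item.crawlStats?.likelyJsRendered)
    .map((item) => item.domain);

console.log('Re-run with a browser-based scraper:', needsBrowser);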

How do I increase accuracy for specific sites?

  • Increase maxPagesPerDomain to allow more thorough crawling
  • Set returnTopN > 1 to see alternative candidates
  • Enable debug mode to see all candidates and their scores

What's the difference between maxDepth 0 and 1?

  • maxDepth: 0 - Only analyzes the homepage (fastest, cheapest)
  • maxDepth: 1 - Analyzes homepage plus follows promising links (recommended for best results)

Does this work with sites behind login?

No, this Actor only crawls publicly accessible pages. It cannot handle authentication or login flows.

⚖️ Is it legal to scrape key pages?

Web scraping legality varies by jurisdiction and use case. When using this Actor:

  • Respect robots.txt - The Actor follows standard web crawling conventions
  • Review Terms of Service - Some websites explicitly prohibit scraping in their ToS
  • Use reasonable rate limits - The Actor includes delays to avoid overwhelming servers
  • Public data only - Only scrape publicly accessible information
  • Intended use - Ensure your use case complies with applicable laws (GDPR, CCPA, etc.)

This Actor is designed for legitimate business purposes such as competitive research, lead enrichment, and website auditing. Users are responsible for ensuring their use complies with applicable laws and website terms of service.

Disclaimer: This information is not legal advice. Consult with a legal professional for guidance specific to your jurisdiction and use case.

⚠️ Limitations

  • JavaScript-rendered content - Uses HTTP-based crawling (CheerioCrawler), so heavily JavaScript-rendered sites may have incomplete results. Check the likelyJsRendered flag.
  • Rate limiting - Some sites may block rapid requests. The Actor includes retry logic, but sites with aggressive anti-bot measures may cause failures.
  • Page budget - Limited to maxPagesPerDomain fetches per site to control costs. Increase this for complex sites.
  • Crawl depth - Currently supports depth 0 (homepage only) or depth 1 (homepage + one level). Deep recursive crawling is not supported.
  • Authentication - Cannot access pages behind login or authentication.

🛠️ Local Development

Prerequisites

  • Node.js 18+
  • npm

Setup

# Install dependencies
npm install
# Run locally
apify run
# Run with custom input
apify run --input='{"startUrls":[{"url":"https://example.com"}]}'

Deploy

# Login to Apify
apify login
# Deploy to Apify platform
apify push