Website Links Graph Generator

Developed by Crawler Bros · Maintained by Community
Pricing: from $5.00 / 1,000 results · Rating: 5.0 (5)

Creates an oriented graph visualizing links between webpages. Outputs: graph.png (visual network diagram) and graph.json (structured data) saved to Key-Value Store, plus detailed dataset of all crawled pages. Configure depth, boundaries, and layout.
Web Link Graph Visualizer

Creates oriented graphs visualizing links between webpages

Crawl a website starting from a URL, extract all links, build a directed graph of the link structure, and export it as a PNG image or JSON file.


📥 What You'll Get

After the actor completes, you'll receive:

๐Ÿ–ผ๏ธ graph.png - Visual Network Diagram

  • Location: Key-Value Store โ†’ graph.png
  • Format: High-resolution PNG (2000x1600px)
  • Content: Visual graph with color-coded nodes and directed edges
  • Download: Click "Actions" โ†’ "Download" in Key-Value Store tab

📊 graph.json - Structured Data

  • Location: Key-Value Store → graph.json
  • Format: JSON file with complete graph structure
  • Content: All nodes, edges, and statistics
  • Use: Import into analysis tools or custom visualizations

📑 Dataset - All Crawled Pages

  • Location: Dataset tab (Storage section)
  • Format: JSON records (one per page)
  • Content: URL, title, depth, all links per page
  • Export: CSV, JSON, or Excel from Dataset tab

๐Ÿ” Where to Find in Apify Console:

  1. After actor finishes, go to "Storage" section
  2. Key-Value Store tab:
    • Download graph.png (your visual graph image)
    • Download graph.json (data for analysis)
  3. Dataset tab:
    • View/export all crawled pages
    • See links extracted from each page

Features

✅ Smart Crawling:

  • Start from any URL
  • Follow links matching a boundary regex
  • Configurable depth and page limits
  • Respects robots.txt (via Playwright)
  • Adjustable request delays

✅ Graph Building:

  • Directed graph (oriented edges)
  • Track internal vs external links
  • URL normalization (remove fragments, trailing slashes)
  • Depth tracking for each node
  • Duplicate link detection

✅ Visualization:

  • Multiple layout algorithms (hierarchical, spring, circular, random)
  • Customizable node labels (URL, path, title, or index)
  • Color-coded nodes (internal=blue, external=red)
  • High-resolution PNG export
  • JSON export for programmatic use

✅ Statistics:

  • Total nodes and edges
  • Average outgoing links per page
  • Max depth reached
  • Internal vs external link counts

Input Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| startUrl | String | Required | The URL to start crawling from |
| boundaryRegex | String | `.*` | Regex to limit which URLs to crawl |
| maxDepth | Integer | 3 | Maximum crawl depth (1-10) |
| maxPages | Integer | 50 | Maximum pages to crawl (1-1000) |
| exportFormat | Select | both | Output format: both, image, or json |
| graphLayout | Select | hierarchical | Layout: hierarchical, spring, circular, random |
| nodeLabels | Select | path | Label type: url, path, title, index |
| includeExternal | Boolean | true | Show external links in graph |
| waitForSelector | String | - | CSS selector to wait for (optional) |
| requestDelay | Integer | 1000 | Delay between requests (ms) |

Example Inputs

Example 1: Small Website

{
  "startUrl": "https://example.com",
  "boundaryRegex": "^https://example\\.com/.*",
  "maxDepth": 2,
  "maxPages": 20,
  "exportFormat": "both",
  "graphLayout": "hierarchical",
  "nodeLabels": "path"
}

Example 2: Documentation Site

{
  "startUrl": "https://docs.python.org/3/",
  "boundaryRegex": "^https://docs\\.python\\.org/3/tutorial/.*",
  "maxDepth": 3,
  "maxPages": 50,
  "exportFormat": "image",
  "graphLayout": "spring",
  "nodeLabels": "title",
  "includeExternal": false,
  "requestDelay": 500
}

Example 3: Blog with Subdomains

{
  "startUrl": "https://blog.example.com",
  "boundaryRegex": "^https://.*\\.example\\.com/.*",
  "maxDepth": 2,
  "maxPages": 30,
  "exportFormat": "both",
  "graphLayout": "circular",
  "nodeLabels": "path"
}

Output

Dataset

Each crawled page is saved to the dataset with:

  • url - Page URL
  • title - Page title
  • depth - Depth from start URL
  • links - All extracted links
  • internal_links - Links matching boundary
  • external_links - Links outside boundary
  • crawled_at - Timestamp
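Once exported as JSON, these records are easy to post-process with the standard library. A minimal sketch, using made-up records that follow the field list above (not real crawl output):

```python
# Illustrative dataset records; field names follow the documented schema,
# values are invented for the example.
records = [
    {"url": "https://example.com", "depth": 0,
     "internal_links": ["https://example.com/a", "https://example.com/b"],
     "external_links": ["https://other.org"]},
    {"url": "https://example.com/a", "depth": 1,
     "internal_links": ["https://example.com"],
     "external_links": []},
]

# Average internal links per page and the deepest page reached.
avg_internal = sum(len(r["internal_links"]) for r in records) / len(records)
max_depth = max(r["depth"] for r in records)
print(avg_internal, max_depth)  # 1.5 1
```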

Key-Value Store

graph.json (if JSON export enabled):

{
  "graph": {
    "nodes": [
      {
        "id": "https://example.com",
        "url": "https://example.com",
        "title": "Example Domain",
        "depth": 0,
        "is_internal": true,
        "outgoing_links": 3
      }
    ],
    "edges": [
      {
        "source": "https://example.com",
        "target": "https://example.com/page1"
      }
    ],
    "directed": true
  },
  "statistics": {
    "nodes": 15,
    "edges": 42,
    "crawled_pages": 15,
    "external_links": 3,
    "avg_outgoing_links": 2.8,
    "max_depth_reached": 2
  }
}
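The graph.json export can be analyzed without extra dependencies. For example, nodes that never appear as an edge target are orphan-page candidates (the start URL naturally shows up too, since nothing links to it). A sketch using a hand-written payload in the documented shape:

```python
import json

# Illustrative payload matching the documented graph.json structure.
graph_json = json.loads("""
{
  "graph": {
    "nodes": [
      {"id": "https://example.com", "depth": 0, "is_internal": true},
      {"id": "https://example.com/page1", "depth": 1, "is_internal": true},
      {"id": "https://example.com/page2", "depth": 1, "is_internal": true}
    ],
    "edges": [
      {"source": "https://example.com", "target": "https://example.com/page1"}
    ],
    "directed": true
  }
}
""")

nodes = {n["id"] for n in graph_json["graph"]["nodes"]}
targets = {e["target"] for e in graph_json["graph"]["edges"]}
orphans = nodes - targets  # pages with no incoming links
print(sorted(orphans))
```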

graph.png (if image export enabled):

  • High-resolution PNG image (2000×1600 px)
  • Color-coded nodes (blue=internal, red=external)
  • Directed edges with arrows
  • Legend and statistics

OUTPUT (run summary record):

{
  "start_url": "https://example.com",
  "statistics": {
    "nodes": 15,
    "edges": 42,
    "crawled_pages": 15
  },
  "exports": {
    "json": true,
    "image": true
  }
}

Boundary Regex Examples

| Pattern | Matches |
|---|---|
| `^https://example\\.com/.*` | All pages on example.com |
| `^https://example\\.com/blog/.*` | Only blog section |
| `^https://.*\\.example\\.com/.*` | All subdomains |
| `^https://example\\.com/(?!admin).*` | Exclude admin section |
| `.*` | Everything (no boundary) |
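A boundary pattern can be sanity-checked before a run with Python's re module. Note that the doubled backslashes above are JSON escaping; the raw regex uses a single backslash, as in the raw string below. Remember also that the start URL itself must match the boundary:

```python
import re

# Raw pattern ("\.") — the actor's JSON input would double the backslash ("\\.").
boundary = re.compile(r"^https://example\.com/(?!admin).*")

print(bool(boundary.match("https://example.com/blog/post")))    # True
print(bool(boundary.match("https://example.com/admin/login")))  # False
print(bool(boundary.match("https://other.com/page")))           # False
```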

Use Cases

๐Ÿ” SEO Analysis:

  • Visualize site structure
  • Find orphan pages
  • Identify link depth issues

📊 Content Strategy:

  • Map content relationships
  • Find hub pages
  • Identify external dependencies

🔗 Link Building:

  • Discover internal linking opportunities
  • Find broken link paths
  • Analyze link distribution

๐Ÿ› ๏ธ Site Migration:

  • Document current structure
  • Plan URL redirects
  • Validate link integrity

Graph Layouts

Hierarchical (Default)

Best for: Sites with clear hierarchy (docs, blogs)

  • Top-down structure
  • Shows depth clearly

Spring (Force-Directed)

Best for: Discovering clusters

  • Nodes repel/attract based on connections
  • Reveals natural groupings

Circular

Best for: Small sites

  • Nodes arranged in a circle
  • Shows connections clearly

Random

Best for: Quick visualization

  • Fast to generate
  • Good for dense graphs
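Since the actor is built on NetworkX, the layout names map naturally onto its layout functions. A sketch of how the same small graph would be positioned under each option (the actor's exact parameters are not published; "hierarchical" is commonly approximated with multipartite_layout keyed on crawl depth):

```python
import networkx as nx

# A tiny directed graph standing in for a crawled link structure.
G = nx.DiGraph()
G.add_edges_from([
    ("https://example.com", "https://example.com/a"),
    ("https://example.com", "https://example.com/b"),
    ("https://example.com/a", "https://example.com/b"),
])

pos = nx.spring_layout(G, seed=42)    # force-directed: clusters emerge
# pos = nx.circular_layout(G)         # nodes arranged on a circle
# pos = nx.random_layout(G, seed=42)  # fast, arbitrary placement
print(len(pos))  # one (x, y) position per node -> 3
```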

Node Label Types

| Type | Example | Best For |
|---|---|---|
| url | https://example.com/page | Small graphs |
| path | /blog/post-title | Medium graphs (default) |
| title | My Blog Post | Readable labels |
| index | 1, 2, 3 | Large graphs |
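One plausible way the label types above could be derived from a node's metadata (a hypothetical helper for illustration, not the actor's actual code):

```python
from urllib.parse import urlparse

def node_label(url: str, title: str, index: int, label_type: str) -> str:
    """Hypothetical mapping from label type to display string."""
    if label_type == "url":
        return url
    if label_type == "path":
        return urlparse(url).path or "/"
    if label_type == "title":
        return title
    return str(index)  # "index"

print(node_label("https://example.com/blog/post-title", "My Blog Post", 7, "path"))
# /blog/post-title
```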

Performance Tips

  1. Start Small: use maxPages: 20 for initial runs, then increase gradually.
  2. Tight Boundaries: use specific regex patterns; avoid crawling entire domains.
  3. Adjust Depth: depth 2-3 is usually sufficient; depth 4+ can grow the crawl exponentially.
  4. Request Delays: use 1000 ms or more as a courtesy; reduce for fast sites.
  5. External Links: set includeExternal: false for cleaner graphs; enable it to see external dependencies.

Limitations

  • Max Pages: 1000 (configurable limit)
  • Max Depth: 10 (configurable limit)
  • JavaScript: Rendered via Playwright (may be slow)
  • Image Size: Large graphs (100+ nodes) may have small labels

Technical Details

Built With:

  • Python 3.11
  • Apify SDK
  • Playwright (browser automation)
  • BeautifulSoup4 (HTML parsing)
  • NetworkX (graph algorithms)
  • Matplotlib (visualization)

Graph Type:

  • Directed graph (DiGraph)
  • Nodes = URLs
  • Edges = Links (from → to)

URL Normalization:

  • Removes fragments (#section)
  • Removes trailing slashes
  • Preserves query strings
  • Converts relative to absolute

Example Output

Small Site (10 pages)

Nodes: 10
Edges: 28
Crawled pages: 10
External links: 3
Avg links per page: 2.8
Max depth reached: 2

Documentation Site (50 pages)

Nodes: 53 (50 internal + 3 external)
Edges: 142
Crawled pages: 50
External links: 3
Avg links per page: 2.7
Max depth reached: 3

Troubleshooting

Issue: No links found

  • Check waitForSelector for dynamic sites
  • Verify boundary regex matches start URL

Issue: Too many nodes

  • Reduce maxPages or maxDepth
  • Tighten boundary regex

Issue: Image labels too small

  • Use nodeLabels: "index" for large graphs
  • Reduce number of nodes

Issue: Slow crawling

  • Reduce requestDelay
  • Decrease maxPages
  • Check site performance

Support

For issues or questions:

  1. Check input parameters
  2. Verify boundary regex
  3. Test with small maxPages first
  4. Review dataset for crawl results

License

MIT License - Free for commercial and personal use


Built with ❤️ using Apify SDK