Internal Links Scraper

Pricing

$25.00/month + usage

When given a sitemap of a website, this scraper will go through every page listed on the sitemap and find all the internal links. Useful for SEO, finding orphaned pages, and visualizing internal linking structure.

Rating

0.0

(0)

Developer

Mysterious Shadow

Maintained by Community

Actor stats

3

Bookmarked

98

Total users

2

Monthly active users

10 months ago

Last modified

Sitemap-Based Web Scraper

This tool crawls every page listed on a website's sitemap and retrieves all internal links from each page. It’s ideal for SEO analysis, identifying orphaned pages, and visualizing the internal linking structure of a site.

Features

  • Crawl Entire Website: Starts with a sitemap and navigates through each page listed for thorough coverage.
  • Internal Link Extraction: Finds and catalogs all internal links on each page.
  • Internal Link Validation: Each internal link is validated before it is recorded. For example, self-referencing links, which point back to the page they appear on (e.g., /about linking to /about), carry no useful information and are excluded from the results.
  • SEO Insights: Helps identify orphaned pages and underlinked/overlinked pages.
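
To illustrate the extraction and validation features above, here is a minimal Python sketch of how internal links could be pulled from a page. It is not the Actor's actual code. The names `extract_internal_links` and `_AnchorCollector` are hypothetical, and the rules it applies (same host counts as internal, trailing slashes are ignored when comparing paths, the root is returned as an empty string) are assumptions based on the description in this document.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _AnchorCollector(HTMLParser):
    """Collects the raw href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_internal_links(page_url, html):
    """Return relative paths of internal links on a page, skipping self-links."""
    collector = _AnchorCollector()
    collector.feed(html)

    page = urlparse(page_url)
    page_path = page.path.rstrip("/")
    links = []
    for href in collector.hrefs:
        # Resolve relative hrefs against the page URL before inspecting them.
        target = urlparse(urljoin(page_url, href))
        if target.scheme not in ("http", "https"):
            continue  # mailto:, tel:, javascript: and similar
        if target.netloc != page.netloc:
            continue  # external link (assumption: same host means internal)
        path = target.path.rstrip("/")  # the root "/" becomes ""
        if path == page_path:
            continue  # self-referencing link
        links.append(path)
    return links
```

Duplicates are kept on purpose in this sketch, so a page that links to /contact from both its header and its footer yields two entries.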

Usage

  1. Provide Sitemap URL: Start by giving the scraper the URL of the sitemap (XML format).
  2. Run the Scraper: The scraper will visit each URL in the sitemap and collect internal links.
  3. Data Analysis: Use the output to get insights and make improvements on your site.
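
As a rough illustration of step 2, the sketch below reads page URLs out of a sitemap using only the Python standard library. It assumes a standard sitemaps.org `urlset` document; the function name `sitemap_urls` is hypothetical, and sitemap index files (which point to other sitemaps) are not handled. How the Actor itself parses sitemaps is not documented here.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol, in ElementTree's {uri}tag form.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text):
    """Return the page URLs listed in a sitemap <urlset> document."""
    root = ET.fromstring(xml_text)
    urls = []
    # Only <url><loc> children of the root are read; a <sitemapindex> yields [].
    for loc in root.findall(f"{SITEMAP_NS}url/{SITEMAP_NS}loc"):
        if loc.text and loc.text.strip():
            urls.append(loc.text.strip())
    return urls
```

Running this on a sitemap before starting the scraper is a quick way to confirm that the file is valid XML and lists the pages you expect.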

How to Get a Sitemap

If you're unsure how to find a website's sitemap, follow this guide:
How to Find a Sitemap on Any Website.

Note: For larger websites, more RAM, CPU, and time may be needed to handle the extensive data collection.

Output Format

The scraper produces a structured output showing internal link relationships for each URL in the sitemap. The output includes:

  • linking_structure: The complete internal linking structure of the site, mapping each sitemap URL to the list of internal links found on that page. Link targets are shown as relative paths for readability. For example:
    • The root domain is represented as "" (empty string).
    • /about instead of https://example.com/about.
  • incoming_links: The number of internal links pointing to the URL.
    • incoming_links[url] == 0 indicates an orphaned page (a page listed in the sitemap but not linked to from any other page).
  • outgoing_links: The number of internal links the URL contains, pointing to other pages within the site.
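
The relationship between the three fields can be sketched in Python as follows. This is an interpretation inferred from the sample output in this document, not the Actor's confirmed logic: it assumes outgoing counts include duplicate links, and incoming counts are tallied after stripping trailing slashes. The function names `link_counts` and `orphaned_pages` are hypothetical.

```python
from urllib.parse import urlparse


def link_counts(linking_structure):
    """Derive (incoming, outgoing) count dicts from a linking_structure mapping."""
    incoming = {}
    outgoing = {}
    for page_url, targets in linking_structure.items():
        # Keys are full URLs; reduce them to the relative-path form ("" for root).
        source = urlparse(page_url).path.rstrip("/")
        outgoing[source] = len(targets)  # duplicates are counted (assumption)
        incoming.setdefault(source, 0)   # every sitemap page gets an entry
        for target in targets:
            key = target.rstrip("/")     # "/about/" and "/about" are one page
            incoming[key] = incoming.get(key, 0) + 1
    return incoming, outgoing


def orphaned_pages(incoming):
    """Return the paths that no page links to, in sorted order."""
    return sorted(path for path, count in incoming.items() if count == 0)
```

Pages with an incoming count of 0 are the orphaned pages. The same dictionaries can be sorted by count to spot underlinked or overlinked pages.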

Troubleshooting

If there are no results or unexpected results:

  1. Wait: Results can take a while to appear after the Actor has exited.
  2. Ensure the sitemap is accessible and in XML format: Double-check that the sitemap is reachable and correctly formatted in XML.
  3. Ensure the pages are accessible: If pages are not being crawled, you might need to adjust the proxy settings.
  4. Contact me: If the issue persists, feel free to reach out, and I’ll address the problem as soon as possible.

Sample Output

{
"linking_structure": {
"https://pliwriters.com": [
"/blog",
"/about",
"/contact",
"/contact",
"/contact",
"/about",
"/blog",
"/contact",
"/privacy-policy",
"/terms-and-conditions"
],
"https://pliwriters.com/blog/how-to-find-internal-links-to-a-page": [
"",
"",
"/blog",
"/about",
"/contact",
"/blog/category/uncategorized",
"/blog/internal-links-vs-external-links",
"/internal-link-visualization-beta",
"/blog/how-to-find-a-sitemap-on-any-website",
"/blog/the-ultimate-guide-to-anchor-text",
"/blog/how-to-find-internal-links-to-a-page/",
"/about",
"/blog",
"/contact",
"/privacy-policy",
"/terms-and-conditions"
],
"https://pliwriters.com/about": [
"",
"",
"/blog",
"/contact",
"/blog",
"/contact",
"/privacy-policy",
"/terms-and-conditions"
],
"https://pliwriters.com/blog/category/uncategorized": [
"",
"",
"/blog",
"/about",
"/contact",
"/blog/the-ultimate-guide-to-anchor-text",
"/blog/best-practices-for-website-navigation",
"/blog/what-are-orphan-pages",
"/blog/internal-links-vs-external-links",
"/blog/internal-links-vs-external-links",
"/blog/how-to-find-internal-links-to-a-page",
"/blog/how-to-find-a-sitemap-on-any-website",
"/blog/how-to-find-a-sitemap-on-any-website",
"/blog/3-key-components-of-seo",
"/blog/3-key-components-of-seo",
"/about",
"/blog",
"/contact",
"/privacy-policy",
"/terms-and-conditions"
],
"https://pliwriters.com/blog/what-are-orphan-pages": [
"",
"",
"/blog",
"/about",
"/contact",
"/blog/category/uncategorized",
"/blog/how-to-find-a-sitemap-on-any-website",
"/blog/what-are-orphan-pages/",
"/about",
"/blog",
"/contact",
"/privacy-policy",
"/terms-and-conditions"
],
"https://pliwriters.com/terms-and-conditions": [
"",
"",
"/blog",
"/about",
"/contact",
"/about",
"/blog",
"/contact",
"/privacy-policy"
],
"https://pliwriters.com/privacy-policy": [
"",
"",
"/blog",
"/about",
"/contact",
"/about",
"/blog",
"/contact",
"/terms-and-conditions"
],
"https://pliwriters.com/blog": [
"",
"",
"/about",
"/contact",
"/blog/the-ultimate-guide-to-anchor-text",
"/blog/the-ultimate-guide-to-anchor-text",
"/blog/best-practices-for-website-navigation",
"/blog/best-practices-for-website-navigation",
"/about",
"/contact",
"/privacy-policy",
"/terms-and-conditions"
],
"https://pliwriters.com/internal-link-visualization-beta": [
"",
"",
"/blog",
"/about",
"/contact",
"/blog/how-to-find-a-sitemap-on-any-website",
"/contact",
"/about",
"/blog",
"/contact",
"/privacy-policy",
"/terms-and-conditions"
],
"https://pliwriters.com/blog/the-ultimate-guide-to-anchor-text": [
"",
"",
"/blog",
"/about",
"/contact",
"/blog/category/uncategorized",
"/blog/best-practices-for-website-navigation",
"/blog/the-ultimate-guide-to-anchor-text/",
"/about",
"/blog",
"/contact",
"/privacy-policy",
"/terms-and-conditions"
],
"https://pliwriters.com/blog/3-key-components-of-seo": [
"",
"",
"/blog",
"/about",
"/contact",
"/blog/category/uncategorized",
"",
"/contact",
"/blog/3-key-components-of-seo/",
"/about",
"/blog",
"/contact",
"/privacy-policy",
"/terms-and-conditions"
],
"https://pliwriters.com/contact": [
"",
"",
"/blog",
"/about",
"/about",
"/blog",
"/privacy-policy",
"/terms-and-conditions"
],
"https://pliwriters.com/orphan-page-test": [
"",
"",
"/blog",
"/about",
"/contact",
"/about",
"/blog",
"/contact",
"/privacy-policy",
"/terms-and-conditions"
],
"https://pliwriters.com/blog/best-practices-for-website-navigation": [
"",
"",
"/blog",
"/about",
"/contact",
"/blog/category/uncategorized",
"/blog/internal-links-vs-external-links",
"/blog/what-are-orphan-pages",
"/blog/how-to-find-a-sitemap-on-any-website",
"/blog/best-practices-for-website-navigation/",
"/about",
"/blog",
"/contact",
"/privacy-policy",
"/terms-and-conditions"
],
"https://pliwriters.com/blog/how-to-find-a-sitemap-on-any-website": [
"",
"",
"/blog",
"/about",
"/contact",
"/blog/category/uncategorized",
"/blog/how-to-find-a-sitemap-on-any-website/",
"/about",
"/blog",
"/contact",
"/privacy-policy",
"/terms-and-conditions"
],
"https://pliwriters.com/blog/internal-links-vs-external-links": [
"",
"",
"/blog",
"/about",
"/contact",
"/blog/category/uncategorized",
"/blog/what-are-orphan-pages",
"/blog/internal-links-vs-external-links/",
"/about",
"/blog",
"/contact",
"/privacy-policy",
"/terms-and-conditions"
]
},
"incoming_links": {
"/orphan-page-test": 0,
"/internal-link-visualization-beta": 1,
"/blog/how-to-find-internal-links-to-a-page": 2,
"/blog/3-key-components-of-seo": 3,
"/blog/what-are-orphan-pages": 4,
"/blog/the-ultimate-guide-to-anchor-text": 5,
"/blog/best-practices-for-website-navigation": 5,
"/blog/internal-links-vs-external-links": 5,
"/blog/category/uncategorized": 7,
"/blog/how-to-find-a-sitemap-on-any-website": 7,
"/privacy-policy": 15,
"/terms-and-conditions": 15,
"/about": 30,
"/blog": 30,
"": 31,
"/contact": 34
},
"outgoing_links": {
"/blog/category/uncategorized": 20,
"/blog/how-to-find-internal-links-to-a-page": 16,
"/blog/best-practices-for-website-navigation": 15,
"/blog/3-key-components-of-seo": 14,
"/blog/what-are-orphan-pages": 13,
"/blog/the-ultimate-guide-to-anchor-text": 13,
"/blog/internal-links-vs-external-links": 13,
"/blog": 12,
"/internal-link-visualization-beta": 12,
"/blog/how-to-find-a-sitemap-on-any-website": 12,
"": 10,
"/orphan-page-test": 10,
"/terms-and-conditions": 9,
"/privacy-policy": 9,
"/about": 8,
"/contact": 8
}
}
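
One way to visualize output like the sample above is to convert linking_structure into Graphviz DOT text. The sketch below is only an illustration under stated assumptions: the function name `to_dot` is hypothetical, duplicate links are collapsed into a single edge, trailing slashes are ignored, self-links are dropped, and the root page is labeled "/" for readability.

```python
from urllib.parse import urlparse


def to_dot(linking_structure):
    """Render linking_structure as Graphviz DOT text, one edge per unique link."""
    edges = set()
    for page_url, targets in linking_structure.items():
        # Normalize both ends to relative paths; show the root as "/".
        source = urlparse(page_url).path.rstrip("/") or "/"
        for target in targets:
            dest = target.rstrip("/") or "/"
            if dest != source:
                edges.add((source, dest))
    lines = ["digraph internal_links {"]
    for source, dest in sorted(edges):
        lines.append(f'  "{source}" -> "{dest}";')
    lines.append("}")
    return "\n".join(lines)
```

The returned text can be saved to a .dot file and rendered with any Graphviz-compatible tool. To use it on real results, load the Actor's JSON output with `json.load` and pass its `linking_structure` value to the function.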