Internal Links Scraper
Pricing: $25.00/month + usage
Given a website's sitemap, this scraper visits every page listed in it and finds all of the internal links on each page. Useful for SEO, finding orphaned pages, and visualizing a site's internal linking structure.
Developer: Mysterious Shadow
Actor stats: 3 bookmarked, 98 total users, 2 monthly active users, last modified 10 months ago
Sitemap-Based Web Scraper
This tool crawls every page listed in a website's sitemap and retrieves all internal links from each page. It's ideal for SEO analysis, identifying orphaned pages, and visualizing the internal linking structure of a site.
Features
- Crawl Entire Website: Starts from the sitemap and visits every page it lists, so coverage is as complete as the sitemap itself.
- Internal Link Extraction: Finds and catalogs all internal links on each page.
- Internal Link Validation: All internal links are validated before they are recorded. For example, self-referencing links, which point to the page they appear on (e.g., /about linking to /about), add nothing to the linking structure and are left out of the results.
- SEO Insights: Helps identify orphaned pages and underlinked or overlinked pages.
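The extraction and self-link filtering described above can be sketched in Python with only the standard library. This is not the Actor's code: `extract_internal_links` is a hypothetical helper, and the sketch assumes a link is internal when it resolves to the same host as the page. It also strips trailing slashes, so /about/ and /about count as the same page, which may differ from how the Actor itself reports paths.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class AnchorCollector(HTMLParser):
    """Collects the href of every <a> tag, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_internal_links(page_url: str, html: str) -> list[str]:
    """Return the page's internal links as relative paths ("" = root)."""
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    site = urlparse(page_url)
    own_path = site.path.rstrip("/")
    links = []
    for href in collector.hrefs:
        # Resolve relative hrefs ("/about", "../x", "#top") against the page URL.
        target = urlparse(urljoin(page_url, href))
        if target.scheme not in ("http", "https") or target.netloc != site.netloc:
            continue  # other host, mailto:, tel:, javascript:, ...
        path = target.path.rstrip("/")
        if path == own_path:
            continue  # self-referencing link (including "#fragment" links)
        links.append(path)
    return links
```

Because "#top" and "/contact#form" resolve to the page's own path, they are dropped as self-references along with direct links to the page.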
Usage
- Provide Sitemap URL: Start by giving the scraper the URL of the sitemap (XML format).
- Run the Scraper: The scraper will visit each URL in the sitemap and collect internal links.
- Data Analysis: Use the output to spot linking problems and improve your site's internal linking.
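The first step, reading page URLs out of an XML sitemap, might look like the following minimal sketch. It assumes the standard sitemaps.org namespace; `parse_sitemap` is a hypothetical name, and since it is not stated whether the Actor follows sitemap index files, the sketch simply returns any child sitemaps separately.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str) -> tuple[list[str], list[str]]:
    """Return (page_urls, child_sitemap_urls) from one sitemap document.

    A <urlset> lists pages directly; a <sitemapindex> lists further
    sitemaps, which would have to be fetched and parsed in turn.
    """
    root = ET.fromstring(xml_text)
    pages = [el.text.strip() for el in root.findall("sm:url/sm:loc", SITEMAP_NS) if el.text]
    children = [el.text.strip() for el in root.findall("sm:sitemap/sm:loc", SITEMAP_NS) if el.text]
    return pages, children
```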
How to Get a Sitemap
If you're unsure how to find a website's sitemap, see the guide How to Find a Sitemap on Any Website.
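As a rough illustration of where sitemaps usually live: many sites declare them with "Sitemap:" lines in /robots.txt, and /sitemap.xml is a common but not guaranteed location. The hypothetical helper below lists candidate URLs from a robots.txt body you have already downloaded; it does no network access itself, and the fallback paths are conventions, not rules.

```python
from urllib.parse import urljoin


def sitemap_candidates(site_url: str, robots_txt: str) -> list[str]:
    """List likely sitemap URLs, robots.txt declarations first."""
    found = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the URL's own colons survive.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    # Conventional locations, tried only if not already declared.
    for fallback in ("/sitemap.xml", "/sitemap_index.xml"):
        url = urljoin(site_url, fallback)
        if url not in found:
            found.append(url)
    return found
```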
Note: Larger websites may need more RAM, CPU, and run time to complete the crawl.
Output Format
The scraper produces a structured output showing internal link relationships for each URL in the sitemap. The output includes:
- linking_structure: The complete internal linking structure of the site, mapping each crawled page URL to the list of internal links found on that page. Links are shown as relative paths for clarity. For example:
  - The root domain is represented as "" (empty string).
  - /about is shown instead of https://example.com/about.
- incoming_links: The number of internal links pointing to the URL, counting every occurrence. incoming_links[url] == 0 indicates an orphaned page (a page listed in the sitemap but not linked to from any other page).
- outgoing_links: The number of internal links the URL contains, pointing to other pages within the site. Repeated links to the same page are counted each time they appear.
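To make the relationship between the three fields concrete, here is a sketch that derives incoming_links and outgoing_links from linking_structure. It assumes every occurrence of a link is counted and that a trailing slash does not denote a different page; both assumptions agree with the sample output below, but `link_counts` is illustrative, not the Actor's implementation.

```python
from collections import Counter
from urllib.parse import urlparse


def link_counts(linking_structure: dict[str, list[str]]) -> tuple[dict[str, int], dict[str, int]]:
    # Key each crawled page by its relative path ("" for the root).
    paths = {url: urlparse(url).path.rstrip("/") for url in linking_structure}
    # outgoing: every link on the page, duplicates included.
    outgoing = {paths[url]: len(links) for url, links in linking_structure.items()}
    # Tally every link occurrence across the site, ignoring trailing slashes.
    seen = Counter(link.rstrip("/") for links in linking_structure.values() for link in links)
    # incoming: sitemap pages that are never linked to get 0 (orphans).
    incoming = {path: seen[path] for path in paths.values()}
    return incoming, outgoing
```

Orphaned pages are then the paths whose incoming count is 0.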
Troubleshooting
If there are no results or unexpected results:
- Wait: It can take a while for the result to show up after the Actor has exited.
- Ensure the sitemap is accessible and in XML format: Double-check that the sitemap is reachable and correctly formatted in XML.
- Ensure the pages are accessible: If pages are not being crawled, you might need to adjust the proxy settings.
- Contact me: If the issue persists, feel free to reach out, and I’ll address the problem as soon as possible.
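For the sitemap-format check above, a quick local test can catch common problems, such as a server returning an HTML error page instead of XML. This is a hypothetical helper, not part of the Actor; it assumes you have already downloaded the sitemap text and applies only basic rules of the sitemaps.org format (a urlset or sitemapindex root in the sitemap namespace, with absolute URLs in each loc).

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_problems(xml_text: str) -> list[str]:
    """Return human-readable problems found in a sitemap; [] means none found."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    local_name = root.tag.split("}")[-1]
    if local_name not in ("urlset", "sitemapindex"):
        # Typical when the server sends an HTML page (login, 404, etc.).
        return [f"unexpected root element {local_name!r}; expected urlset or sitemapindex"]
    if not root.tag.startswith(NS):
        return ["root element is missing the sitemaps.org namespace"]
    problems = []
    locs = [el.text or "" for el in root.iter(NS + "loc")]
    if not locs:
        problems.append("no <loc> entries found")
    for text in locs:
        if not text.strip().startswith(("http://", "https://")):
            problems.append(f"<loc> is not an absolute URL: {text!r}")
    return problems
```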
Sample Output
{"linking_structure": {"https://pliwriters.com": ["/blog","/about","/contact","/contact","/contact","/about","/blog","/contact","/privacy-policy","/terms-and-conditions"],"https://pliwriters.com/blog/how-to-find-internal-links-to-a-page": ["","","/blog","/about","/contact","/blog/category/uncategorized","/blog/internal-links-vs-external-links","/internal-link-visualization-beta","/blog/how-to-find-a-sitemap-on-any-website","/blog/the-ultimate-guide-to-anchor-text","/blog/how-to-find-internal-links-to-a-page/","/about","/blog","/contact","/privacy-policy","/terms-and-conditions"],"https://pliwriters.com/about": ["","","/blog","/contact","/blog","/contact","/privacy-policy","/terms-and-conditions"],"https://pliwriters.com/blog/category/uncategorized": ["","","/blog","/about","/contact","/blog/the-ultimate-guide-to-anchor-text","/blog/best-practices-for-website-navigation","/blog/what-are-orphan-pages","/blog/internal-links-vs-external-links","/blog/internal-links-vs-external-links","/blog/how-to-find-internal-links-to-a-page","/blog/how-to-find-a-sitemap-on-any-website","/blog/how-to-find-a-sitemap-on-any-website","/blog/3-key-components-of-seo","/blog/3-key-components-of-seo","/about","/blog","/contact","/privacy-policy","/terms-and-conditions"],"https://pliwriters.com/blog/what-are-orphan-pages": ["","","/blog","/about","/contact","/blog/category/uncategorized","/blog/how-to-find-a-sitemap-on-any-website","/blog/what-are-orphan-pages/","/about","/blog","/contact","/privacy-policy","/terms-and-conditions"],"https://pliwriters.com/terms-and-conditions": ["","","/blog","/about","/contact","/about","/blog","/contact","/privacy-policy"],"https://pliwriters.com/privacy-policy": ["","","/blog","/about","/contact","/about","/blog","/contact","/terms-and-conditions"],"https://pliwriters.com/blog": 
["","","/about","/contact","/blog/the-ultimate-guide-to-anchor-text","/blog/the-ultimate-guide-to-anchor-text","/blog/best-practices-for-website-navigation","/blog/best-practices-for-website-navigation","/about","/contact","/privacy-policy","/terms-and-conditions"],"https://pliwriters.com/internal-link-visualization-beta": ["","","/blog","/about","/contact","/blog/how-to-find-a-sitemap-on-any-website","/contact","/about","/blog","/contact","/privacy-policy","/terms-and-conditions"],"https://pliwriters.com/blog/the-ultimate-guide-to-anchor-text": ["","","/blog","/about","/contact","/blog/category/uncategorized","/blog/best-practices-for-website-navigation","/blog/the-ultimate-guide-to-anchor-text/","/about","/blog","/contact","/privacy-policy","/terms-and-conditions"],"https://pliwriters.com/blog/3-key-components-of-seo": ["","","/blog","/about","/contact","/blog/category/uncategorized","","/contact","/blog/3-key-components-of-seo/","/about","/blog","/contact","/privacy-policy","/terms-and-conditions"],"https://pliwriters.com/contact": ["","","/blog","/about","/about","/blog","/privacy-policy","/terms-and-conditions"],"https://pliwriters.com/orphan-page-test": ["","","/blog","/about","/contact","/about","/blog","/contact","/privacy-policy","/terms-and-conditions"],"https://pliwriters.com/blog/best-practices-for-website-navigation": ["","","/blog","/about","/contact","/blog/category/uncategorized","/blog/internal-links-vs-external-links","/blog/what-are-orphan-pages","/blog/how-to-find-a-sitemap-on-any-website","/blog/best-practices-for-website-navigation/","/about","/blog","/contact","/privacy-policy","/terms-and-conditions"],"https://pliwriters.com/blog/how-to-find-a-sitemap-on-any-website": ["","","/blog","/about","/contact","/blog/category/uncategorized","/blog/how-to-find-a-sitemap-on-any-website/","/about","/blog","/contact","/privacy-policy","/terms-and-conditions"],"https://pliwriters.com/blog/internal-links-vs-external-links": 
["","","/blog","/about","/contact","/blog/category/uncategorized","/blog/what-are-orphan-pages","/blog/internal-links-vs-external-links/","/about","/blog","/contact","/privacy-policy","/terms-and-conditions"]},"incoming_links": {"/orphan-page-test": 0,"/internal-link-visualization-beta": 1,"/blog/how-to-find-internal-links-to-a-page": 2,"/blog/3-key-components-of-seo": 3,"/blog/what-are-orphan-pages": 4,"/blog/the-ultimate-guide-to-anchor-text": 5,"/blog/best-practices-for-website-navigation": 5,"/blog/internal-links-vs-external-links": 5,"/blog/category/uncategorized": 7,"/blog/how-to-find-a-sitemap-on-any-website": 7,"/privacy-policy": 15,"/terms-and-conditions": 15,"/about": 30,"/blog": 30,"": 31,"/contact": 34},"outgoing_links": {"/blog/category/uncategorized": 20,"/blog/how-to-find-internal-links-to-a-page": 16,"/blog/best-practices-for-website-navigation": 15,"/blog/3-key-components-of-seo": 14,"/blog/what-are-orphan-pages": 13,"/blog/the-ultimate-guide-to-anchor-text": 13,"/blog/internal-links-vs-external-links": 13,"/blog": 12,"/internal-link-visualization-beta": 12,"/blog/how-to-find-a-sitemap-on-any-website": 12,"": 10,"/orphan-page-test": 10,"/terms-and-conditions": 9,"/privacy-policy": 9,"/about": 8,"/contact": 8}}
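Once you have the output, the SEO insights mentioned under Features can be pulled out with a few lines of Python. This sketch assumes the output object has the incoming_links and outgoing_links keys shown above; `seo_report` and its `underlinked_max` and `top_n` parameters are hypothetical, since the Actor does not define what counts as underlinked or overlinked.

```python
def seo_report(output: dict, underlinked_max: int = 1, top_n: int = 3) -> dict[str, list[str]]:
    """Summarize orphaned, underlinked, and link-heavy pages from the output."""
    incoming = output["incoming_links"]
    outgoing = output["outgoing_links"]
    return {
        # Listed in the sitemap but never linked to.
        "orphans": sorted(p for p, n in incoming.items() if n == 0),
        # Linked to, but only rarely (the threshold is an arbitrary choice).
        "underlinked": sorted(p for p, n in incoming.items() if 0 < n <= underlinked_max),
        # Pages with the most outgoing links, as candidates for "overlinked".
        "most_outgoing": sorted(outgoing, key=outgoing.get, reverse=True)[:top_n],
    }
```

With the sample output above, this reports /orphan-page-test as the only orphan and /internal-link-visualization-beta as underlinked at the default threshold of 1. A saved copy of the output can be passed in after loading it with json.load.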