Internal Links Scraper
1 day trial then $25.00/month - No credit card required now
When given a sitemap of a website, this scraper will go through every page listed on the sitemap and find all the internal links. Useful for SEO, finding orphaned pages, and visualizing internal linking structure.
Sitemap-Based Web Scraper
This tool crawls every page listed on a website's sitemap and retrieves all internal links from each page. It’s ideal for SEO analysis, identifying orphaned pages, and visualizing the internal linking structure of a site.
Features
- Crawl Entire Website: Starts with a sitemap and navigates through each page listed for thorough coverage.
- Internal Link Extraction: Finds and catalogs all internal links on each page.
- Internal Link Validation: All internal links are validated to ensure accuracy. For example, self-referencing links, which point to the same page (e.g., /about linking to /about), are pointless and are ignored in the results.
- SEO Insights: Helps identify orphaned pages and underlinked/overlinked pages.
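The extraction and validation features above can be sketched in Python. This is only an illustration under stated assumptions, not the Actor's actual implementation: the function names internal_links and to_relative are hypothetical, "internal" is assumed to mean "same host as the page", and trailing slashes are assumed to be ignored when comparing paths.

```python
from urllib.parse import urljoin, urlsplit


def to_relative(url: str) -> str:
    """Return the URL's path without a trailing slash; the root becomes ""."""
    return urlsplit(url).path.rstrip("/")


def internal_links(page_url: str, hrefs: list[str]) -> list[str]:
    """Keep same-host links as relative paths and drop self-references.

    Assumption: a link is internal when its host equals the page's host.
    """
    page = urlsplit(page_url)
    page_path = to_relative(page_url)
    result = []
    for href in hrefs:
        absolute = urljoin(page_url, href)  # resolve relative hrefs
        parts = urlsplit(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, tel:, javascript:, etc.
        if parts.netloc != page.netloc:
            continue  # external link
        path = to_relative(absolute)
        if path == page_path:
            continue  # self-referencing link, e.g. /about linking to /about
        result.append(path)
    return result


hrefs = ["/about", "/blog", "https://other.com/x", "contact", "/", "mailto:a@b.c"]
print(internal_links("https://example.com/about", hrefs))
# ['/blog', '/contact', '']
```

Duplicates are kept on purpose in this sketch, since a page that links to the same target several times contributes several entries.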
Usage
- Provide Sitemap URL: Start by giving the scraper the URL of the sitemap (XML format).
- Run the Scraper: The scraper will visit each URL in the sitemap and collect internal links.
- Data Analysis: Use the output to get insights and make improvements on your site.
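The first step, reading page URLs from an XML sitemap, can be sketched as follows. This is a minimal illustration rather than the Actor's own code: the function name sitemap_urls and the example URLs are assumptions, and the sketch only handles a plain urlset sitemap in the standard sitemaps.org namespace (not a sitemap index file).

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> value found under <url> entries of a sitemap."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sitemap content for illustration.
SAMPLE = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>
"""

print(sitemap_urls(SAMPLE))
# ['https://example.com/', 'https://example.com/about']
```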
How to Get a Sitemap
If you're unsure how to find a website's sitemap, follow this guide: How to Find a Sitemap on Any Website.
Note: For larger websites, more RAM, CPU, and time may be needed to handle the extensive data collection.
Output Format
The scraper produces a structured output showing internal link relationships for each URL in the sitemap. The output includes:
- linking_structure: The complete internal linking structure of the site. Relative paths are shown for better clarity. For example:
  - The root domain is represented as "" (empty string).
  - /about is shown instead of https://example.com/about.
- incoming_links: The number of internal links pointing to the URL. incoming_links[url] == 0 indicates an orphaned page (a page listed in the sitemap but not linked to from any other page).
- outgoing_links: The number of internal links the URL contains, pointing to other pages within the site.
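As a rough example of working with these fields, the sketch below lists orphaned and underlinked pages from the incoming_links counts. The function names orphaned_pages and underlinked_pages and the max_incoming threshold are hypothetical choices for illustration; the data is a trimmed copy of the sample output on this page.

```python
def orphaned_pages(output: dict) -> list[str]:
    """Paths listed in the sitemap that no crawled page links to."""
    return sorted(p for p, n in output["incoming_links"].items() if n == 0)


def underlinked_pages(output: dict, max_incoming: int = 2) -> list[str]:
    """Paths with at least one but at most max_incoming incoming links.

    The default threshold of 2 is an arbitrary assumption; tune it per site.
    """
    incoming = output["incoming_links"]
    return sorted(p for p, n in incoming.items() if 0 < n <= max_incoming)


# Trimmed copy of the sample output's incoming_links section.
output = {
    "incoming_links": {
        "/orphan-page-test": 0,
        "/internal-link-visualization-beta": 1,
        "/blog/how-to-find-internal-links-to-a-page": 2,
        "/about": 30,
        "": 31,
    }
}

print(orphaned_pages(output))
# ['/orphan-page-test']
print(underlinked_pages(output))
# ['/blog/how-to-find-internal-links-to-a-page', '/internal-link-visualization-beta']
```

In practice the full result would be loaded with json.load from the exported file instead of being written inline.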
Troubleshooting
If there are no results or unexpected results:
- Wait: It can take a while for the result to show up after the Actor has exited.
- Ensure the sitemap is accessible and in XML format: Double-check that the sitemap is reachable and correctly formatted in XML.
- Ensure the pages are accessible: If pages are not being crawled, you might need to adjust the proxy settings.
- Contact me: If the issue persists, feel free to reach out, and I’ll address the problem as soon as possible.
Sample Output
{
  "linking_structure": {
    "https://pliwriters.com": [
      "/blog",
      "/about",
      "/contact",
      "/contact",
      "/contact",
      "/about",
      "/blog",
      "/contact",
      "/privacy-policy",
      "/terms-and-conditions"
    ],
    "https://pliwriters.com/blog/how-to-find-internal-links-to-a-page": [
      "",
      "",
      "/blog",
      "/about",
      "/contact",
      "/blog/category/uncategorized",
      "/blog/internal-links-vs-external-links",
      "/internal-link-visualization-beta",
      "/blog/how-to-find-a-sitemap-on-any-website",
      "/blog/the-ultimate-guide-to-anchor-text",
      "/blog/how-to-find-internal-links-to-a-page/",
      "/about",
      "/blog",
      "/contact",
      "/privacy-policy",
      "/terms-and-conditions"
    ],
    "https://pliwriters.com/about": [
      "",
      "",
      "/blog",
      "/contact",
      "/blog",
      "/contact",
      "/privacy-policy",
      "/terms-and-conditions"
    ],
    "https://pliwriters.com/blog/category/uncategorized": [
      "",
      "",
      "/blog",
      "/about",
      "/contact",
      "/blog/the-ultimate-guide-to-anchor-text",
      "/blog/best-practices-for-website-navigation",
      "/blog/what-are-orphan-pages",
      "/blog/internal-links-vs-external-links",
      "/blog/internal-links-vs-external-links",
      "/blog/how-to-find-internal-links-to-a-page",
      "/blog/how-to-find-a-sitemap-on-any-website",
      "/blog/how-to-find-a-sitemap-on-any-website",
      "/blog/3-key-components-of-seo",
      "/blog/3-key-components-of-seo",
      "/about",
      "/blog",
      "/contact",
      "/privacy-policy",
      "/terms-and-conditions"
    ],
    "https://pliwriters.com/blog/what-are-orphan-pages": [
      "",
      "",
      "/blog",
      "/about",
      "/contact",
      "/blog/category/uncategorized",
      "/blog/how-to-find-a-sitemap-on-any-website",
      "/blog/what-are-orphan-pages/",
      "/about",
      "/blog",
      "/contact",
      "/privacy-policy",
      "/terms-and-conditions"
    ],
    "https://pliwriters.com/terms-and-conditions": [
      "",
      "",
      "/blog",
      "/about",
      "/contact",
      "/about",
      "/blog",
      "/contact",
      "/privacy-policy"
    ],
    "https://pliwriters.com/privacy-policy": [
      "",
      "",
      "/blog",
      "/about",
      "/contact",
      "/about",
      "/blog",
      "/contact",
      "/terms-and-conditions"
    ],
    "https://pliwriters.com/blog": [
      "",
      "",
      "/about",
      "/contact",
      "/blog/the-ultimate-guide-to-anchor-text",
      "/blog/the-ultimate-guide-to-anchor-text",
      "/blog/best-practices-for-website-navigation",
      "/blog/best-practices-for-website-navigation",
      "/about",
      "/contact",
      "/privacy-policy",
      "/terms-and-conditions"
    ],
    "https://pliwriters.com/internal-link-visualization-beta": [
      "",
      "",
      "/blog",
      "/about",
      "/contact",
      "/blog/how-to-find-a-sitemap-on-any-website",
      "/contact",
      "/about",
      "/blog",
      "/contact",
      "/privacy-policy",
      "/terms-and-conditions"
    ],
    "https://pliwriters.com/blog/the-ultimate-guide-to-anchor-text": [
      "",
      "",
      "/blog",
      "/about",
      "/contact",
      "/blog/category/uncategorized",
      "/blog/best-practices-for-website-navigation",
      "/blog/the-ultimate-guide-to-anchor-text/",
      "/about",
      "/blog",
      "/contact",
      "/privacy-policy",
      "/terms-and-conditions"
    ],
    "https://pliwriters.com/blog/3-key-components-of-seo": [
      "",
      "",
      "/blog",
      "/about",
      "/contact",
      "/blog/category/uncategorized",
      "",
      "/contact",
      "/blog/3-key-components-of-seo/",
      "/about",
      "/blog",
      "/contact",
      "/privacy-policy",
      "/terms-and-conditions"
    ],
    "https://pliwriters.com/contact": [
      "",
      "",
      "/blog",
      "/about",
      "/about",
      "/blog",
      "/privacy-policy",
      "/terms-and-conditions"
    ],
    "https://pliwriters.com/orphan-page-test": [
      "",
      "",
      "/blog",
      "/about",
      "/contact",
      "/about",
      "/blog",
      "/contact",
      "/privacy-policy",
      "/terms-and-conditions"
    ],
    "https://pliwriters.com/blog/best-practices-for-website-navigation": [
      "",
      "",
      "/blog",
      "/about",
      "/contact",
      "/blog/category/uncategorized",
      "/blog/internal-links-vs-external-links",
      "/blog/what-are-orphan-pages",
      "/blog/how-to-find-a-sitemap-on-any-website",
      "/blog/best-practices-for-website-navigation/",
      "/about",
      "/blog",
      "/contact",
      "/privacy-policy",
      "/terms-and-conditions"
    ],
    "https://pliwriters.com/blog/how-to-find-a-sitemap-on-any-website": [
      "",
      "",
      "/blog",
      "/about",
      "/contact",
      "/blog/category/uncategorized",
      "/blog/how-to-find-a-sitemap-on-any-website/",
      "/about",
      "/blog",
      "/contact",
      "/privacy-policy",
      "/terms-and-conditions"
    ],
    "https://pliwriters.com/blog/internal-links-vs-external-links": [
      "",
      "",
      "/blog",
      "/about",
      "/contact",
      "/blog/category/uncategorized",
      "/blog/what-are-orphan-pages",
      "/blog/internal-links-vs-external-links/",
      "/about",
      "/blog",
      "/contact",
      "/privacy-policy",
      "/terms-and-conditions"
    ]
  },
  "incoming_links": {
    "/orphan-page-test": 0,
    "/internal-link-visualization-beta": 1,
    "/blog/how-to-find-internal-links-to-a-page": 2,
    "/blog/3-key-components-of-seo": 3,
    "/blog/what-are-orphan-pages": 4,
    "/blog/the-ultimate-guide-to-anchor-text": 5,
    "/blog/best-practices-for-website-navigation": 5,
    "/blog/internal-links-vs-external-links": 5,
    "/blog/category/uncategorized": 7,
    "/blog/how-to-find-a-sitemap-on-any-website": 7,
    "/privacy-policy": 15,
    "/terms-and-conditions": 15,
    "/about": 30,
    "/blog": 30,
    "": 31,
    "/contact": 34
  },
  "outgoing_links": {
    "/blog/category/uncategorized": 20,
    "/blog/how-to-find-internal-links-to-a-page": 16,
    "/blog/best-practices-for-website-navigation": 15,
    "/blog/3-key-components-of-seo": 14,
    "/blog/what-are-orphan-pages": 13,
    "/blog/the-ultimate-guide-to-anchor-text": 13,
    "/blog/internal-links-vs-external-links": 13,
    "/blog": 12,
    "/internal-link-visualization-beta": 12,
    "/blog/how-to-find-a-sitemap-on-any-website": 12,
    "": 10,
    "/orphan-page-test": 10,
    "/terms-and-conditions": 9,
    "/privacy-policy": 9,
    "/about": 8,
    "/contact": 8
  }
}
Actor Metrics
10 monthly users
2 stars
91% runs succeeded
Created in Nov 2024
Modified 17 days ago