Pricing

from $0.10 / 1,000 results

Sitemap Extractor

This Apify Actor extracts all URLs from a website's sitemaps and checks their status codes via lightweight HTTP requests. It provides a clean list of valid links, acting as an ideal pre-processor to ensure your larger crawling projects target only active URLs.

Rating

3.1

(5)

Developer

Apify

Features

Recursive Sitemap Discovery: Automatically detects and traverses nested sitemaps (sitemap indexes).
Efficiency: Validates URLs with HTTP HEAD requests, which return only the response headers and therefore use less bandwidth and are typically faster than full GET requests.
Proxy Support: Integrated with Apify Proxy to reduce the risk of rate limiting or blocking during the discovery phase.
Detailed Output: Provides each page URL, its HTTP status code, and a best-effort last-modification timestamp when one is available.
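
As a rough illustration of the HEAD-based validation described above, the following Python sketch checks a single URL using only the standard library. The helper name `head_status` and the 10-second timeout are assumptions made for this example. It is not the Actor's own code, which this README does not show.

```python
import urllib.error
import urllib.request


def head_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a HEAD request to `url`.

    Hypothetical helper for illustration only.
    """
    # A HEAD request asks for the headers only, so no response body is downloaded.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx/5xx responses; the status code is still available.
        return error.code
```

Network-level failures (DNS errors, timeouts) raise `urllib.error.URLError` or `OSError` and are deliberately left to the caller in this sketch.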

How it Works

Input: You provide one or more "Start URLs", each pointing to a domain root, a sitemap, or a sitemap index.
Extraction: The Actor parses the XML, extracting both page URLs and links to further sitemaps.
Validation: For every page URL found, the Actor sends an HTTP HEAD request and records the returned status code.
Deduplication: The crawler tracks URLs by unique key, so a URL that appears in several sitemaps is checked only once.
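
The extraction and deduplication steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Actor's implementation: the function names `parse_sitemap` and `collect_pages` are hypothetical, the caller supplies a `fetch` function that returns the XML text for a sitemap URL, and the URL string itself stands in for the unique key.

```python
import xml.etree.ElementTree as ET
from collections import deque

# XML namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str):
    """Split one sitemap document into page entries and nested sitemap URLs."""
    root = ET.fromstring(xml_text)
    # A <sitemapindex> document lists further sitemaps to traverse.
    nested = [
        loc.text.strip()
        for loc in root.findall("sm:sitemap/sm:loc", NS)
        if loc.text and loc.text.strip()
    ]
    # A <urlset> document lists page URLs, optionally with a <lastmod> tag.
    pages = []
    for entry in root.findall("sm:url", NS):
        loc = (entry.findtext("sm:loc", default="", namespaces=NS) or "").strip()
        if loc:
            lastmod = entry.findtext("sm:lastmod", default=None, namespaces=NS)
            pages.append({"url": loc, "lastmod": lastmod.strip() if lastmod else None})
    return pages, nested


def collect_pages(start_urls, fetch):
    """Traverse sitemaps breadth-first, keeping each page URL only once."""
    seen_sitemaps, seen_pages = set(), set()
    queue = deque(start_urls)
    results = []
    while queue:
        sitemap_url = queue.popleft()
        if sitemap_url in seen_sitemaps:
            continue  # also guards against sitemap indexes that reference each other
        seen_sitemaps.add(sitemap_url)
        pages, nested = parse_sitemap(fetch(sitemap_url))
        queue.extend(nested)
        for page in pages:
            if page["url"] not in seen_pages:  # the URL acts as the unique key here
                seen_pages.add(page["url"])
                results.append(page)
    return results
```

Passing `fetch` in as a parameter keeps the traversal logic independent of how the XML is downloaded, which also makes it easy to try out with in-memory documents.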

Output

For each page URL, the Actor outputs:

Field	Description
`url`	The page URL from the sitemap.
`status`	The HTTP status code returned by the HEAD request.
`lastmod`	Best-effort last-modification time (ISO 8601). See the note below.

A note on last-modification data

The lastmod field is a single best-effort timestamp derived from two sources, in this order of preference:

1. The <lastmod> tag declared for the URL in the sitemap.
2. The Last-Modified HTTP header returned by the page (used only when the sitemap has no <lastmod>).

This information is not guaranteed to be available. Both sources are optional: many sitemaps omit <lastmod> entirely, and many servers do not send a Last-Modified header (especially for dynamically generated pages). When neither source provides a value, lastmod is null. Even when present, the value is self-reported by the site and may not reflect when the content actually changed.
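
The order of preference described above could be expressed roughly as in the Python sketch below. The function name `resolve_lastmod` is hypothetical, and two details are assumptions of this example rather than documented behaviour: the sitemap value is returned unchanged, and the header value is converted to UTC before formatting.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def resolve_lastmod(
    sitemap_lastmod: Optional[str],
    last_modified_header: Optional[str],
) -> Optional[str]:
    """Pick a single best-effort timestamp, or None when no source has one."""
    # 1. Prefer the <lastmod> value declared in the sitemap.
    if sitemap_lastmod and sitemap_lastmod.strip():
        return sitemap_lastmod.strip()
    # 2. Otherwise fall back to the Last-Modified HTTP header, if present.
    if last_modified_header:
        try:
            parsed = parsedate_to_datetime(last_modified_header)
        except (TypeError, ValueError):
            return None  # malformed header: treat as missing
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc).isoformat()
    # 3. Neither source provided a value.
    return None
```

For example, `resolve_lastmod(None, "Wed, 21 Oct 2015 07:28:00 GMT")` yields `"2015-10-21T07:28:00+00:00"`, while `resolve_lastmod(None, None)` yields `None`.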

Usage

This Actor is ideal for:

Pre-crawling filter: Generating a "clean" list of URLs for Actors like Website Content Crawler or Web Scraper.
SEO Audits: Quickly identifying 404 Not Found or 500 Internal Server Error pages listed in your sitemap.
Site Mapping: Getting a high-level overview of a site's architecture.
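
As one possible way to use the results for the first two scenarios, the Python sketch below post-processes a list of output items shaped like the records in the Output section (`url`, `status`, `lastmod`). The helper names and the sample data are made up for illustration, and the sketch assumes `status` is always an integer.

```python
from collections import Counter


def split_by_status(items):
    """Separate items into crawlable URLs (2xx) and broken entries (4xx/5xx)."""
    ok_urls = [item["url"] for item in items if 200 <= item["status"] < 300]
    broken = [item for item in items if item["status"] >= 400]
    return ok_urls, broken


def status_summary(items):
    """Count how many URLs returned each status code."""
    return Counter(item["status"] for item in items)


# Illustrative sample data, not real Actor output.
sample_items = [
    {"url": "https://example.com/", "status": 200, "lastmod": "2024-01-02"},
    {"url": "https://example.com/old", "status": 404, "lastmod": None},
    {"url": "https://example.com/api", "status": 500, "lastmod": None},
]

ok_urls, broken = split_by_status(sample_items)
summary = status_summary(sample_items)
```

Here `ok_urls` could be passed on as start URLs for a larger crawl, and `broken` is the list to review in an SEO audit.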

Configuration

Field	Description
Start URLs	One or more starting points: a bare domain name, or URLs of sitemap XML files or sitemap indexes.
Proxy configuration	Apify Proxy settings used for the Actor's requests.
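
For illustration, an input object for the two fields above might be assembled like this in Python. The JSON key names `startUrls`, `proxyConfiguration`, and `useApifyProxy` follow a naming pattern common among Apify Actors but are assumptions here. Check the Actor's input schema for the exact names before relying on them.

```python
import json


def build_run_input(start_urls, use_apify_proxy=True):
    """Build a run input dict. The key names are assumed, not confirmed by this README."""
    return {
        # Assumed key: list of start points (domain roots or sitemap URLs).
        "startUrls": [{"url": url} for url in start_urls],
        # Assumed key: proxy settings object.
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }


run_input = build_run_input(
    ["https://example.com", "https://example.com/sitemap.xml"]
)
print(json.dumps(run_input, indent=2))
```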
