Pricing

from $0.05 / 1,000 results

Sitemap URL Finder

Find and export URLs from any website’s robots.txt and sitemaps. Enter a domain or website URL, optionally filter matching URLs by text, and get clean dataset rows with the URL, domain, path, source sitemap, and match details.

Developer

Inus Grobler

Last modified

2 days ago

Input

A normal run takes two required fields, websites and maxResults, plus an optional URL text filter, includeUrlText.

{
  "websites": [
    {
      "url": "https://docs.apify.com"
    }
  ],
  "maxResults": 25
}

Fields

websites: One or more website homepages or domains. The Actor discovers robots.txt and /sitemap.xml automatically.
includeUrlText: Optional text that every saved URL must contain. Leave it empty to save every sitemap URL.
maxResults: Maximum number of URLs to save. The example input uses 25 so that scheduled smoke runs finish quickly.
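
The discovery step described for websites can be pictured with a minimal sketch. It assumes the common convention of reading Sitemap: lines from robots.txt and adding /sitemap.xml as a fallback; the function name candidate_sitemaps is hypothetical, and the Actor's actual implementation is not documented here.

```python
from urllib.parse import urlsplit


def candidate_sitemaps(website: str, robots_txt: str) -> list[str]:
    """Return sitemap URLs named in robots.txt, plus the /sitemap.xml fallback.

    Hypothetical helper for illustration; not the Actor's own code.
    """
    # Accept bare domains such as "docs.apify.com" as well as full URLs.
    if "://" not in website:
        website = "https://" + website
    parts = urlsplit(website)
    origin = f"{parts.scheme}://{parts.netloc}"

    found: list[str] = []
    for line in robots_txt.splitlines():
        # The field name is case-insensitive; the value is everything
        # after the first colon.
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())

    # Always try the conventional location as well.
    fallback = origin + "/sitemap.xml"
    if fallback not in found:
        found.append(fallback)
    return found
```

Given the text of a robots.txt file, the function returns the listed sitemaps first and the fallback last, so a crawler could fetch them in that order.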

Advanced API users can still pass targetUrlRegex, maxRequestsPerCrawl, proxyConfiguration, or legacy startUrls, but they are not needed for a normal run. Concurrency is fixed internally at a low value for the 256 MB memory tier.
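
As a rough illustration of how includeUrlText, targetUrlRegex, and maxResults could interact, the sketch below applies a substring filter, an optional regular expression, and a result cap to a list of URLs. The function select_urls is hypothetical, and the case-sensitive matching and the "both filters must pass" rule are assumptions rather than documented behavior.

```python
import re
from typing import Iterable, Optional


def select_urls(
    urls: Iterable[str],
    include_text: str = "",
    regex: Optional[str] = None,
    max_results: int = 25,
) -> list[str]:
    """Keep URLs that pass the filters, stopping at max_results.

    Hypothetical helper; assumes case-sensitive substring matching.
    """
    pattern = re.compile(regex) if regex else None
    kept: list[str] = []
    for url in urls:
        # Stop as soon as the cap is reached.
        if len(kept) >= max_results:
            break
        # An empty include_text keeps every URL.
        if include_text and include_text not in url:
            continue
        if pattern and not pattern.search(url):
            continue
        kept.append(url)
    return kept
```

Calling select_urls(urls, include_text="/platform/") keeps only URLs containing that text, while leaving include_text empty returns the first max_results URLs unfiltered.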

Output

The Actor saves one row per matched URL to the default dataset.

{
  "url": "https://docs.apify.com/platform/actors",
  "domain": "docs.apify.com",
  "path": "/platform/actors",
  "sourceUrl": "https://docs.apify.com/sitemap_base.xml",
  "sourceDomain": "docs.apify.com",
  "filterType": "contains",
  "filterValue": "/platform/",
  "matchedAt": "2026-05-17T08:50:04.000Z"
}
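
The domain, path, and sourceDomain fields in the row above can be derived from url and sourceUrl. The following is a minimal sketch using Python's standard urllib.parse; the helper name derive_fields is hypothetical, and it covers only the URL-derived fields, not filterType, filterValue, or matchedAt.

```python
from urllib.parse import urlsplit


def derive_fields(url: str, source_url: str) -> dict[str, str]:
    """Build the URL-derived part of an output row.

    Hypothetical helper; the Actor's own row builder is not shown in this document.
    """
    target = urlsplit(url)
    source = urlsplit(source_url)
    return {
        "url": url,
        "domain": target.hostname or "",
        # Treat an empty path as the site root.
        "path": target.path or "/",
        "sourceUrl": source_url,
        "sourceDomain": source.hostname or "",
    }
```

For the example row, derive_fields("https://docs.apify.com/platform/actors", "https://docs.apify.com/sitemap_base.xml") yields the same domain, path, and sourceDomain values as shown above.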

The run summary is stored in the key-value store as OUTPUT. It includes the filter used, number of matched URLs, processed sitemap counts, discovered URL counts, and failed request count.

Notes

Duplicate URLs are filtered with a compact memory-safe deduper.
Leave includeUrlText empty when you want the full sitemap URL list.
The Actor defaults to a cheap 256 MB memory tier.
Fixed concurrency keeps large gzip sitemap runs inside the cheap memory tier.
The bundled example input is intentionally small and unfiltered so daily checks return results quickly.
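
One way a compact deduper can keep memory low is to remember a short fixed-size digest of each URL instead of the full string. The sketch below shows that idea under stated assumptions: the function unique_urls and the choice of an 8-byte BLAKE2b digest are illustrative, not a description of the Actor's internals.

```python
import hashlib
from typing import Iterable, Iterator


def unique_urls(urls: Iterable[str]) -> Iterator[str]:
    """Yield each URL the first time it is seen.

    Stores 8-byte digests rather than full URLs (an assumed technique).
    A digest collision would drop a distinct URL, which is very unlikely
    at typical sitemap sizes but not impossible.
    """
    seen: set[bytes] = set()
    for url in urls:
        digest = hashlib.blake2b(url.encode("utf-8"), digest_size=8).digest()
        if digest in seen:
            continue
        seen.add(digest)
        yield url
```

Because the function is a generator, it can be placed between a streaming sitemap parser and the dataset writer without holding the full URL list in memory.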

Python API

from apify_client import ApifyClient

# Replace with your own Apify API token.
TOKEN = "YOUR_APIFY_TOKEN"
ACTOR_ID = "TheScrapeLab/sitemap-target-url-extractor"

apify_client = ApifyClient(TOKEN)
actor_client = apify_client.actor(ACTOR_ID)

# Same input as the example in the Input section.
run_input = {
    "websites": [{"url": "https://docs.apify.com"}],
    "maxResults": 25,
}

# Start the Actor and wait for the run to finish.
call_result = actor_client.call(run_input=run_input)

if call_result is None:
    raise RuntimeError("Actor run failed")

# Read the saved URL rows from the run's default dataset.
dataset_client = apify_client.dataset(call_result.default_dataset_id)
items = dataset_client.list_items().items

for item in items:
    print(item["url"], item["sourceUrl"])
