Pricing

from $0.01 / actor run

gau - Get All URLs

Fetch known URLs from the Wayback Machine, Common Crawl, AlienVault OTX, and URLScan for any domain. A wrapper around the gau OSINT tool for attack-surface and data-pipeline use.

Developer

R.L.

Last modified

a month ago

What does gau - Get All URLs do?

gau - Get All URLs is an Apify Actor that wraps the popular open-source OSINT tool gau (Get All URLs). For any domain you give it, it collects the URLs that four web archives and threat-intelligence sources have recorded for that domain (the Wayback Machine, Common Crawl, AlienVault OTX, and URLScan) and returns them as a clean, structured dataset.

Running it on the Apify platform gives you API access, scheduling, monitoring, proxy rotation, and easy integration with the rest of your data pipeline — no need to install Go or manage the binary yourself.

Why use gau - Get All URLs?

Attack-surface mapping — enumerate historical and current endpoints, parameters, and forgotten paths for a target domain during authorized security assessments.
Content & SEO audits — discover every URL that archives know about, including pages no longer linked from the live site.
Pipeline integration — feed the resulting URLs into other Apify Actors (HTTP scrapers, vulnerability scanners, link checkers) straight from the dataset.
No local setup — schedule recurring runs, call it from the API, and connect it to Make, Zapier, n8n, Google Drive, and more.

How to use gau - Get All URLs

Open the Actor and go to the Input tab.
Enter one or more domains (bare hostnames such as example.com — no https:// and no path).
Optionally pick the providers, enable subdomains, set date ranges, or filter by extension / status code / MIME type.
Click Start and watch URLs stream into the Output tab.
Download the dataset in JSON, CSV, or Excel, or fetch a plain newline-delimited URL list directly from the dataset API with ?fields=url&format=csv&clean=true&skipHeaderRow=true (without skipHeaderRow, the first line of the export is the url column header).
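The dataset export in the last step is just a URL with query parameters. Here is a small sketch that assembles it, assuming Apify's public dataset-items endpoint (https://api.apify.com/v2/datasets/<datasetId>/items). YOUR_DATASET_ID is a placeholder for the run's default dataset ID, and the helper name url_list_export is hypothetical. skipHeaderRow is Apify's CSV option for omitting the header line. A private dataset would also need your API token.

```python
from urllib.parse import urlencode

# Apify's public dataset-items endpoint (API v2).
API_BASE = "https://api.apify.com/v2/datasets"


def url_list_export(dataset_id: str, skip_header: bool = True) -> str:
    """Return a dataset-items URL that exports only the `url` field as CSV.

    `dataset_id` is a placeholder for your run's default dataset ID.
    """
    params = {"fields": "url", "format": "csv", "clean": "true"}
    if skip_header:
        # Drop the "url" header row so the body is one URL per line.
        params["skipHeaderRow"] = "true"
    return f"{API_BASE}/{dataset_id}/items?{urlencode(params)}"


print(url_list_export("YOUR_DATASET_ID"))
```

The parameters are kept in a dict so further export options can be added without string concatenation.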

Input

Configure the run from the Input tab or via JSON. Key fields:

Field	Type	Description
`domains`	array	Required. Bare domains to query, e.g. `["example.com"]`.
`providers`	array	Sources to use: `wayback`, `commoncrawl`, `otx`, `urlscan`.
`includeSubdomains`	boolean	Include subdomains of the target (`--subs`).
`fromDate` / `toDate`	string	Limit by first-seen month, format `YYYYMM`.
`blacklistExtensions`	array	Extensions to skip, e.g. `["png","jpg","woff"]`.
`matchStatusCodes` / `filterStatusCodes`	array	Keep / drop by archived HTTP status.
`matchMimeTypes` / `filterMimeTypes`	array	Keep / drop by archived MIME type.
`removeDuplicateParams`	boolean	Collapse endpoints that differ only in parameter values (`--fp`).
`threads`, `timeout`, `retries`	integer	HTTP client tuning.
`maxResults`	integer	Stop after N URLs (`0` = unlimited).
`proxyConfiguration`	object	Route requests through an Apify or custom proxy.
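The constraints in this table can be checked before a run is started. The function below is a hypothetical pre-flight check and not the Actor's own input schema. It assumes three rules: domains must be bare hostnames with at least one dot, dates must be valid YYYYMM months, and providers must come from the four listed names.

```python
import re

# Assumed rules for a pre-flight check; the Actor's own validation may differ.
PROVIDERS = {"wayback", "commoncrawl", "otx", "urlscan"}
# Bare hostname: dot-separated labels only, so no scheme, port, or path.
HOSTNAME_RE = re.compile(
    r"^(?=.{1,253}$)([a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$",
    re.IGNORECASE,
)
YYYYMM_RE = re.compile(r"^\d{4}(0[1-9]|1[0-2])$")


def validate_input(run_input: dict) -> list:
    """Return a list of problems in `run_input`; an empty list means it looks valid."""
    errors = []
    domains = run_input.get("domains") or []
    if not domains:
        errors.append("domains is required")
    for domain in domains:
        if not HOSTNAME_RE.match(domain):
            errors.append(f"not a bare domain: {domain!r}")
    for provider in run_input.get("providers", []):
        if provider not in PROVIDERS:
            errors.append(f"unknown provider: {provider!r}")
    dates = {}
    for key in ("fromDate", "toDate"):
        value = run_input.get(key)
        if value is None:
            continue
        if YYYYMM_RE.match(value):
            dates[key] = value
        else:
            errors.append(f"{key} must be YYYYMM, got {value!r}")
    # Fixed-width YYYYMM strings compare correctly as plain strings.
    if len(dates) == 2 and dates["fromDate"] > dates["toDate"]:
        errors.append("fromDate is later than toDate")
    if run_input.get("maxResults", 0) < 0:
        errors.append("maxResults must be >= 0 (0 = unlimited)")
    return errors
```

For example, validate_input({"domains": ["https://example.com"]}) reports the scheme as a problem, matching the "bare domains" requirement above.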

Example input:

{
    "domains": ["example.com"],
    "providers": ["wayback", "commoncrawl", "otx"],
    "includeSubdomains": true,
    "blacklistExtensions": ["png", "jpg", "css"],
    "maxResults": 5000
}
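Several fields in the input table name their gau flags (--subs, --fp). Below is a sketch of how the example input might translate into a gau command line. It assumes the remaining fields map to gau's --providers, --blacklist, --mc, --fc, --mt, --ft, --from, --to, --threads, --timeout, and --retries flags, with list values passed comma-separated. The mapping tables and to_gau_args are illustrative and are not the Actor's actual code. maxResults is assumed to be enforced by the Actor itself, because it has no gau flag here.

```python
import shlex

# Hypothetical mapping from this Actor's input fields to gau CLI flags.
LIST_FLAGS = {
    "providers": "--providers",
    "blacklistExtensions": "--blacklist",
    "matchStatusCodes": "--mc",
    "filterStatusCodes": "--fc",
    "matchMimeTypes": "--mt",
    "filterMimeTypes": "--ft",
}
VALUE_FLAGS = {
    "fromDate": "--from",
    "toDate": "--to",
    "threads": "--threads",
    "timeout": "--timeout",
    "retries": "--retries",
}
BOOL_FLAGS = {"includeSubdomains": "--subs", "removeDuplicateParams": "--fp"}


def to_gau_args(run_input: dict, domain: str) -> list:
    """Build a gau argument list for one domain from the Actor input."""
    args = ["gau"]
    for key, flag in LIST_FLAGS.items():
        values = run_input.get(key)
        if values:
            # Lists are passed to gau as one comma-separated value.
            args += [flag, ",".join(str(v) for v in values)]
    for key, flag in VALUE_FLAGS.items():
        if run_input.get(key) is not None:
            args += [flag, str(run_input[key])]
    for key, flag in BOOL_FLAGS.items():
        if run_input.get(key):
            args.append(flag)
    # maxResults has no flag here; assume the Actor stops reading output itself.
    args.append(domain)
    return args


example = {
    "domains": ["example.com"],
    "providers": ["wayback", "commoncrawl", "otx"],
    "includeSubdomains": True,
    "blacklistExtensions": ["png", "jpg", "css"],
    "maxResults": 5000,
}
print(shlex.join(to_gau_args(example, "example.com")))
```

For the example input this prints gau --providers wayback,commoncrawl,otx --blacklist png,jpg,css --subs example.com. One argument list is built per domain, so each entry in domains would get its own gau invocation.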

Output

Each discovered URL becomes one dataset item, pushed to the dataset as soon as gau yields it, so rows stream into the Output tab while the run is still in progress and downstream consumers can pick them up immediately. You can download the dataset in various formats such as JSON, HTML, CSV, or Excel. Need just a plain newline-delimited URL list (e.g. to pipe into httpx or nuclei)? Hit the dataset API with ?fields=url&format=csv&clean=true&skipHeaderRow=true; without skipHeaderRow, the first line is the url column header.
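A naive line split will not always give a clean list. The CSV export may still carry its url header line, and standard CSV quoting wraps any URL that contains a comma or double quote in double quotes. Here is a minimal sketch that turns such an export into plain URLs using Python's csv module. urls_from_csv_export is a hypothetical helper and the sample body is made up.

```python
import csv
import io


def urls_from_csv_export(body: str, has_header: bool = True) -> list:
    """Extract URLs from a `fields=url&format=csv` dataset export."""
    rows = csv.reader(io.StringIO(body, newline=""))
    if has_header:
        next(rows, None)  # skip the "url" header row
    # csv.reader removes the quoting applied to URLs with commas or quotes.
    return [row[0] for row in rows if row]


sample = 'url\nhttps://example.com/a\n"https://example.com/b?x=1,2"\n'
print("\n".join(urls_from_csv_export(sample)))
```

The second sample URL comes back without its surrounding quotes, ready to be written one per line for tools like httpx or nuclei.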

{
    "url": "https://www.example.com/login?next=/account",
    "domain": "example.com",
    "host": "www.example.com",
    "scheme": "https",
    "path": "/login",
    "query": "next=/account",
    "fileExtension": null,
    "provider": "wayback"
}

Data table

Field	Description
`url`	The full discovered URL.
`domain`	Which input domain the URL belongs to.
`host`	Hostname of the URL (may be a subdomain).
`scheme`	`http` or `https`.
`path`	URL path component.
`query`	Raw query string, if any.
`fileExtension`	Lower-cased file extension of the path, if any.
`provider`	Source provider when a single provider is selected, otherwise `null`.
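Every field in this table can be derived from the URL string plus the run input. Below is a sketch of that derivation with urllib.parse, resting on two assumptions. First, a URL belongs to an input domain when its host equals that domain or ends with "." plus the domain. Second, an empty query is reported as null, since the table only says "if any". to_record is hypothetical, and the Actor's own parsing may differ.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def to_record(url: str, input_domains: list, providers: list) -> dict:
    """Build an output item with the fields listed in the data table (a sketch)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Assign the URL to the input domain it equals or is a subdomain of.
    domain = next(
        (d for d in input_domains if host == d or host.endswith("." + d)), None
    )
    suffix = PurePosixPath(parts.path).suffix  # e.g. ".js", or "" if none
    return {
        "url": url,
        "domain": domain,
        "host": host,
        "scheme": parts.scheme,
        "path": parts.path,
        "query": parts.query or None,  # assumption: empty query -> null
        "fileExtension": suffix[1:].lower() or None,
        # As in the table: provider is only set when exactly one was queried.
        "provider": providers[0] if len(providers) == 1 else None,
    }
```

Called with the example URL, ["example.com"], and ["wayback"], it returns the same item as the output example earlier in this section.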

How much does it cost?

The Actor is lightweight — it streams text from public archives and does no browser rendering, and it never buffers the full result set in memory, so it runs comfortably at the modest 512 MB default memory regardless of how many URLs a domain has. Most runs finish in seconds to a few minutes. Cost scales with how many URLs a domain has archived and how many providers you query. Large, popular domains can return hundreds of thousands of URLs; use maxResults, blacklistExtensions, and date ranges to keep runs bounded.
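Because results are streamed rather than collected, maxResults can bound a run without extra memory. URLs are consumed one at a time, and the stream is simply cut off after N. Here is a sketch of that pattern with itertools.islice. fake_gau_output is a made-up stand-in for gau's output, bounded is a hypothetical helper, and neither is the Actor's actual code.

```python
from itertools import islice
from typing import Iterable, Iterator


def bounded(urls: Iterable, max_results: int) -> Iterator:
    """Return a lazy iterator that stops after `max_results` URLs (0 = unlimited)."""
    stream = iter(urls)
    return stream if max_results == 0 else islice(stream, max_results)


def fake_gau_output() -> Iterator:
    # Stand-in for gau's output: an endless stream of URLs.
    n = 0
    while True:
        yield f"https://example.com/page/{n}"
        n += 1


for url in bounded(fake_gau_output(), 3):
    print(url)  # each URL would be pushed to the dataset here
```

Nothing is accumulated between iterations, so memory stays flat however many URLs the source produces. Only the number of items processed, and therefore run time, grows.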

Tips and advanced options

Speed vs. completeness — querying all four providers is the most thorough but slowest; pick a subset for quick runs.
Rate limits — if a provider throttles you, enable proxyConfiguration and/or raise retries and timeout.
Noise reduction — blacklist static asset extensions (png,jpg,css,woff,svg) and use removeDuplicateParams to shrink the result set.
Subdomains — includeSubdomains greatly increases coverage (and volume) for an organization.
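The noise-reduction tip can be approximated in a few lines. In this sketch, reduce_noise is hypothetical. An endpoint is assumed to be the scheme, host, path, and sorted parameter names, so URLs that differ only in parameter values collapse to the first one seen. gau's own --fp rule may differ in details.

```python
from pathlib import PurePosixPath
from typing import Iterable, Iterator
from urllib.parse import parse_qsl, urlsplit


def reduce_noise(urls: Iterable, blacklist: set) -> Iterator:
    """Drop blacklisted extensions, then keep one URL per endpoint + param names."""
    seen = set()
    for url in urls:
        parts = urlsplit(url)
        ext = PurePosixPath(parts.path).suffix[1:].lower()
        if ext in blacklist:
            continue
        # The key ignores parameter values, so ?id=1 and ?id=2 collapse together.
        names = tuple(
            sorted(name for name, _ in parse_qsl(parts.query, keep_blank_values=True))
        )
        key = (parts.scheme, parts.netloc.lower(), parts.path, names)
        if key not in seen:
            seen.add(key)
            yield url


urls = [
    "https://example.com/item?id=1",
    "https://example.com/item?id=2",
    "https://example.com/item?id=2&ref=x",
    "https://example.com/logo.PNG",
]
print(list(reduce_noise(urls, {"png", "jpg", "css", "woff", "svg"})))
```

With these four sample URLs it keeps only https://example.com/item?id=1 and https://example.com/item?id=2&ref=x. Unlike plain streaming, this filter holds one key per distinct endpoint in memory.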

FAQ, disclaimers, and support

Is this legal? The Actor only reads from public archives and threat-intel feeds; it does not touch the target's own servers. Use it only against domains you own or are explicitly authorized to test, and comply with the providers' and Apify's Terms of Service.

Why are some URLs dead or old? Results come from historical archives, so they include URLs that may no longer exist. That is by design for OSINT and attack-surface work.

Found a bug or need a custom version? Open an issue from the Actor's Issues tab — feedback and custom-solution requests are welcome.

This Actor wraps the open-source gau tool by @lc, distributed under the MIT license.

Ultimate URL Harvester — All Site URLs from 6 Archives

inexhaustible_glass/url-harvester

Get every known URL of any domain from Wayback Machine, Common Crawl, AlienVault OTX, URLScan, crt.sh and sitemap — merged & deduped. Block-proof, no API key.

Hitman studio

Wayback Machine Scraper

glassventures/wayback-machine-scraper

Scrape Wayback Machine archive snapshots for any URL or domain. Get archived URLs, timestamps, status codes, MIME types. Export to JSON, CSV, Excel.

Glass Ventures

Wayback Machine Scraper

gio21/wayback-machine-scraper

List Internet Archive Wayback Machine snapshots for one or more URLs. Returns timestamp, snapshot URL, HTTP status, MIME type, digest. Useful for tracking website changes over time, OSINT research, content recovery, and brand monitoring.

Gio

Wayback Machine Historical Content Scraper

happyfhantum/wayback-machine-historical-content-scraper

Compare archived website snapshots through the Wayback Machine and extract page-history change signals.

Kelsey Todd

Wayback Machine URL Extractor - Archived URLs

logiover/wayback-machine-url-extractor

Extract every archived URL of any domain from the Internet Archive's Wayback Machine (CDX API). Recover lost or old pages, build redirect maps and run OSINT, with date and status filters. No API key, export to CSV or JSON.

Logiover

Wayback Machine Search

crawlerbros/wayback-machine-search

Query Internet Archive's Wayback Machine for historical snapshots of any URL or domain. Filter by date, HTTP status, MIME type, and deduplicate. Optionally fetch the archived page text. Free public CDX API, no authentication.

Crawler Bros

Wayback Machine Scraper — Archived Snapshots

hipersoft/wayback-machine-scraper

List every Internet Archive (Wayback Machine) snapshot of a URL or whole domain: timestamp, snapshot URL, status code, MIME type and content digest. Filter by date, status and dedupe. For SEO, OSINT and historical research. No key.

hiper soft

Wayback Machine Extractor - Archive.org Downloader

caulleonard/wayback-extractor

Recover dead or deleted websites. Download HTML, files, and snapshot metadata from the Wayback Machine.

Fatih Şahinbaş

Wayback Machine Search

maximedupre/wayback-machine-search

Search Wayback Machine snapshots for URLs, hosts, and domains. Export archive dates, status codes, MIME types, digests, content text, version timelines, reports, and monitoring alerts.

Maxime Dupré

Common Crawl Scraper

crawlerbros/common-crawl-scraper

Query the Common Crawl URL Index for any domain or URL pattern. Discover a site's archived pages, historical URLs, capture dates, HTTP statuses and MIME types for SEO, domain intelligence and research. Also lists the available monthly crawls.

Crawler Bros