gau - Get All URLs

Pricing

from $0.01 / actor run

Fetch known URLs from the Wayback Machine, Common Crawl, AlienVault OTX, and URLScan for any domain. A wrapper around the gau OSINT tool for attack-surface and data-pipeline use.

Developer

R.L.

Maintained by Community

What does gau - Get All URLs do?

gau - Get All URLs is an Apify Actor that wraps the popular open-source OSINT tool gau (Get All URLs). For any domain you give it, it collects the URLs that major web archives and threat-intelligence feeds (the Wayback Machine, Common Crawl, AlienVault OTX, and URLScan) have recorded for that domain, and returns them as a clean, structured dataset.

Running it on the Apify platform gives you API access, scheduling, monitoring, proxy rotation, and easy integration with the rest of your data pipeline — no need to install Go or manage the binary yourself.

Why use gau - Get All URLs?

  • Attack-surface mapping — enumerate historical and current endpoints, parameters, and forgotten paths for a target domain during authorized security assessments.
  • Content & SEO audits — discover every URL that archives know about, including pages no longer linked from the live site.
  • Pipeline integration — feed the resulting URLs into other Apify Actors (HTTP scrapers, vulnerability scanners, link checkers) straight from the dataset.
  • No local setup — schedule recurring runs, call it from the API, and connect it to Make, Zapier, n8n, Google Drive, and more.
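
As a rough illustration of the "call it from the API" point above, the sketch below builds (but does not send) the HTTP request that would start a run through the Apify API v2 run endpoint. The Actor ID `someuser~gau-get-all-urls` and the token value are hypothetical placeholders, not this Actor's real identifiers; substitute the values shown in your Apify Console.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, actor_input: dict) -> urllib.request.Request:
    """Build (without sending) a POST request that would start an Actor run."""
    return urllib.request.Request(
        url=f"{API_BASE}/acts/{actor_id}/runs",
        data=json.dumps(actor_input).encode("utf-8"),  # the Actor input is the JSON body
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_run_request(
    actor_id="someuser~gau-get-all-urls",  # hypothetical placeholder Actor ID
    token="APIFY_TOKEN_PLACEHOLDER",       # hypothetical placeholder token
    actor_input={"domains": ["example.com"], "maxResults": 5000},
)
# To actually start the run, pass the request to urllib.request.urlopen(request).
```

Keeping request construction separate from sending makes the payload easy to inspect before any run is billed.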

How to use gau - Get All URLs

  1. Open the Actor and go to the Input tab.
  2. Enter one or more domains (bare hostnames such as example.com — no https:// and no path).
  3. Optionally pick the providers, enable subdomains, set date ranges, or filter by extension / status code / MIME type.
  4. Click Start and watch URLs stream into the Output tab.
  5. Download the dataset in JSON, CSV, or Excel — or fetch a plain newline-delimited URL list directly from the dataset API with ?fields=url&format=csv&clean=true.
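
The last step can be sketched in Python as follows. This is a minimal example, assuming the dataset items endpoint of the Apify API v2 and a CSV response whose first row is a `url` header; `DATASET_ID` and the sample CSV text are placeholders made up for illustration.

```python
import csv
import io
from urllib.parse import urlencode


def dataset_url_list_endpoint(dataset_id: str) -> str:
    """Return the dataset items URL with the query string from step 5."""
    params = {"fields": "url", "format": "csv", "clean": "true"}
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{urlencode(params)}"


def csv_to_url_lines(csv_text: str) -> list[str]:
    """Turn the one-column CSV into a plain list of URLs (header row dropped)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["url"] for row in reader if row.get("url")]


endpoint = dataset_url_list_endpoint("DATASET_ID")  # placeholder dataset ID

# Assumed shape of the CSV response: a header row, then one URL per row.
sample_csv = '"url"\n"https://www.example.com/login?next=/account"\n"https://example.com/"\n'
urls = csv_to_url_lines(sample_csv)
print("\n".join(urls))
```

Parsing with the `csv` module rather than splitting on newlines handles quoted values and the header row without special cases.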

Input

Configure the run from the Input tab or via JSON. Key fields:

| Field | Type | Description |
| --- | --- | --- |
| domains | array | Required. Bare domains to query, e.g. ["example.com"]. |
| providers | array | Sources to use: wayback, commoncrawl, otx, urlscan. |
| includeSubdomains | boolean | Include subdomains of the target (--subs). |
| fromDate / toDate | string | Limit by first-seen month, format YYYYMM. |
| blacklistExtensions | array | Extensions to skip, e.g. ["png","jpg","woff"]. |
| matchStatusCodes / filterStatusCodes | array | Keep / drop by archived HTTP status. |
| matchMimeTypes / filterMimeTypes | array | Keep / drop by archived MIME type. |
| removeDuplicateParams | boolean | Collapse endpoints that differ only in parameter values (--fp). |
| threads, timeout, retries | integer | HTTP client tuning. |
| maxResults | integer | Stop after N URLs (0 = unlimited). |
| proxyConfiguration | object | Route requests through an Apify or custom proxy. |

Example input:

{
    "domains": ["example.com"],
    "providers": ["wayback", "commoncrawl", "otx"],
    "includeSubdomains": true,
    "blacklistExtensions": ["png", "jpg", "css"],
    "maxResults": 5000
}
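
Because `domains` expects bare hostnames and `fromDate` / `toDate` expect `YYYYMM`, it can help to normalize values before submitting them. The helper names below (`to_bare_domain`, `build_input`) are hypothetical and not part of the Actor; this is only a sketch of client-side preparation.

```python
import re
from urllib.parse import urlsplit

YYYYMM = re.compile(r"^\d{4}(0[1-9]|1[0-2])$")


def to_bare_domain(value: str) -> str:
    """Strip scheme, port, path, and query, leaving a lower-cased hostname."""
    value = value.strip()
    if "://" not in value:
        value = "//" + value  # urlsplit only recognizes a host after "//"
    host = urlsplit(value).hostname
    if not host:
        raise ValueError(f"no hostname found in {value!r}")
    return host


def build_input(domains, from_date=None, to_date=None, **extra) -> dict:
    """Assemble an input object using the field names from the table above."""
    # dict.fromkeys removes duplicates while preserving the original order.
    actor_input = {"domains": list(dict.fromkeys(to_bare_domain(d) for d in domains))}
    for key, value in (("fromDate", from_date), ("toDate", to_date)):
        if value is not None:
            if not YYYYMM.match(value):
                raise ValueError(f"{key} must be YYYYMM, got {value!r}")
            actor_input[key] = value
    actor_input.update(extra)
    return actor_input


prepared = build_input(
    ["https://Example.com/login", "example.com ", "api.example.com"],
    from_date="202301",
    maxResults=5000,
)
```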

Output

Each discovered URL becomes one dataset item, pushed to the dataset as soon as gau yields it, so rows stream into the Output tab while the run is still in progress and downstream consumers can pick them up immediately. You can download the dataset in formats such as JSON, HTML, CSV, or Excel. Need just a one-column URL list (e.g. to pipe into httpx or nuclei)? Hit the dataset API with ?fields=url&format=csv&clean=true. The CSV response begins with a url header row; add skipHeaderRow=true to omit it.

{
    "url": "https://www.example.com/login?next=/account",
    "domain": "example.com",
    "host": "www.example.com",
    "scheme": "https",
    "path": "/login",
    "query": "next=/account",
    "fileExtension": null,
    "provider": "wayback"
}

Data table

| Field | Description |
| --- | --- |
| url | The full discovered URL. |
| domain | Which input domain the URL belongs to. |
| host | Hostname of the URL (may be a subdomain). |
| scheme | http or https. |
| path | URL path component. |
| query | Raw query string, if any. |
| fileExtension | Lower-cased file extension of the path, if any. |
| provider | Source provider when a single provider is selected, otherwise null. |
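
To show how the fields in the table relate to a URL, here is a small sketch that derives them with Python's standard library. It illustrates the field definitions only and is not the Actor's actual implementation; in particular, returning an empty string for a missing query is an assumption.

```python
import posixpath
from urllib.parse import urlsplit


def describe_url(url: str, domain: str, provider=None) -> dict:
    """Split a URL into the fields listed in the data table."""
    parts = urlsplit(url)
    # splitext returns ("", "") style tuples; index 1 is ".ext" or "".
    extension = posixpath.splitext(parts.path)[1].lstrip(".").lower()
    return {
        "url": url,
        "domain": domain,
        "host": parts.hostname,
        "scheme": parts.scheme,
        "path": parts.path,
        "query": parts.query,  # assumption: "" when the URL has no query string
        "fileExtension": extension or None,
        "provider": provider,
    }


item = describe_url(
    "https://www.example.com/login?next=/account",
    domain="example.com",
    provider="wayback",
)
asset = describe_url("http://example.com/static/App.JS", domain="example.com")
```

The first call reproduces the example item above; the second shows the lower-casing of `fileExtension`.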

How much does it cost?

The Actor is lightweight — it streams text from public archives and does no browser rendering, and it never buffers the full result set in memory, so it runs comfortably at the modest 512 MB default memory regardless of how many URLs a domain has. Most runs finish in seconds to a few minutes. Cost scales with how many URLs a domain has archived and how many providers you query. Large, popular domains can return hundreds of thousands of URLs; use maxResults, blacklistExtensions, and date ranges to keep runs bounded.

Tips and advanced options

  • Speed vs. completeness — querying all four providers is the most thorough but slowest; pick a subset for quick runs.
  • Rate limits — if a provider throttles you, enable proxyConfiguration and/or raise retries and timeout.
  • Noise reduction — blacklist static asset extensions (png,jpg,css,woff,svg) and use removeDuplicateParams to shrink the result set.
  • Subdomains: includeSubdomains greatly increases coverage (and volume) for an organization.
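
The noise-reduction tip can also be approximated on the client side, for example when post-processing a dataset that was collected without filters. The sketch below mimics the idea behind blacklistExtensions and removeDuplicateParams; the function `filter_urls` is hypothetical, and gau's own --fp matching rule may differ in detail.

```python
import posixpath
from urllib.parse import parse_qsl, urlsplit


def filter_urls(urls, blacklist_extensions=(), dedupe_params=False):
    """Drop blacklisted extensions and, optionally, parameter-value duplicates."""
    blocked = {ext.lower().lstrip(".") for ext in blacklist_extensions}
    seen = set()
    kept = []
    for url in urls:
        parts = urlsplit(url)
        extension = posixpath.splitext(parts.path)[1].lstrip(".").lower()
        if extension in blocked:
            continue
        if dedupe_params:
            # Two URLs count as duplicates when host, path, and the set of
            # parameter names match; parameter values are ignored.
            names = tuple(sorted({name for name, _ in parse_qsl(parts.query, keep_blank_values=True)}))
            key = (parts.hostname, parts.path, names)
            if key in seen:
                continue
            seen.add(key)
        kept.append(url)
    return kept


sample = [
    "https://example.com/logo.PNG",
    "https://example.com/item?id=1",
    "https://example.com/item?id=2",
    "https://example.com/item?id=3&ref=home",
    "https://example.com/about",
]
filtered = filter_urls(sample, blacklist_extensions=["png", "css"], dedupe_params=True)
```

Filtering inside the Actor is still preferable for large domains, since it reduces the number of stored items rather than trimming them afterwards.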

FAQ, disclaimers, and support

Is this legal? The Actor only reads from public archives and threat-intel feeds; it does not touch the target's own servers. Use it only against domains you own or are explicitly authorized to test, and comply with the providers' and Apify's Terms of Service.

Why are some URLs dead or old? Results come from historical archives, so they include URLs that may no longer exist. That is by design for OSINT and attack-surface work.

Found a bug or need a custom version? Open an issue from the Actor's Issues tab — feedback and custom-solution requests are welcome.

This Actor wraps the open-source gau tool by @lc, distributed under the MIT license.