Automated reconnaissance actor for bug bounty hunters

Pricing

from $0.50 / full recon scan

This Apify actor automates bug bounty recon by scraping the Wayback Machine and GitHub for legacy attack surfaces. It extracts historical URLs, public code, and deprecated files, parsing them to uncover hidden subdomains and forgotten API endpoints. The findings are saved into structured JSON files.

Rating: 0.0 (0)

Developer: Zaher el siddik (maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 15 days ago

Bug Bounty Recon Actor

Automated reconnaissance for bug bounty hunters. Discovers hidden API endpoints, subdomains, interesting files, and potential secrets from Wayback Machine archives and GitHub code search.

What It Does

  1. Wayback Machine CDX API — Retrieves thousands of historical URLs for a target domain, extracting subdomains, API endpoints, and interesting files (.env, .json, .sql, .bak, etc.)
  2. GitHub Code Search — Finds code referencing the target domain across all public repos, searching for API keys, passwords, tokens, configs, and endpoints
  3. Deep Content Analysis — Fetches archived pages and raw GitHub files, parsing them with 30+ regex patterns for hidden endpoints and 15 secret detectors
  4. Structured Output — Deduplicates and categorizes all findings into a clean JSON dataset
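
As a rough illustration of the first step, the sketch below builds a Wayback CDX query and pulls subdomains out of the returned rows. It is not the Actor's own code: the function names, the choice of CDX parameters (`matchType`, `fl`, `collapse`) and the sample rows are assumptions made for the example, and no network request is sent.

```python
from urllib.parse import urlencode, urlsplit

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(target_domain: str, limit: int = 5000) -> str:
    """Build a CDX query listing archived URLs for a domain and its subdomains."""
    params = {
        "url": target_domain,
        "matchType": "domain",  # include subdomains of the target
        "output": "json",
        "fl": "original",       # return only the original URL column
        "collapse": "urlkey",   # drop repeated captures of the same URL
        "limit": str(limit),
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def extract_subdomains(rows: list[list[str]], target_domain: str) -> list[str]:
    """Collect unique hostnames under target_domain from CDX JSON rows."""
    found = set()
    for row in rows[1:]:  # with output=json the first row is the header
        host = (urlsplit(row[0]).hostname or "").lower()
        if host.endswith("." + target_domain):
            found.add(host)
    return sorted(found)


# Hypothetical rows in the shape the CDX API returns for fl=original.
sample_rows = [
    ["original"],
    ["https://api.example.com/v2/users"],
    ["http://staging.example.com:80/login"],
    ["https://example.com/.env"],
    ["https://api.example.com/graphql"],
]

cdx_url = build_cdx_url("example.com", limit=500)
subdomains = extract_subdomains(sample_rows, "example.com")
print(cdx_url)
print(subdomains)
```

Collapsing on `urlkey` is one way to keep the result count close to the number of distinct URLs rather than the number of captures; the Actor may deduplicate differently.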

Input

| Parameter | Type | Default | Description |
|---|---|---|---|
| targetDomain | string | required | Root domain to recon (e.g. example.com) |
| githubToken | string | | GitHub PAT for authenticated code search (prevents rate limits) |
| waybackLimit | integer | 5000 | Max historical URLs from Wayback CDX |
| fetchContent | boolean | true | Fetch & parse archived pages for endpoints/secrets |
| contentFetchLimit | integer | 200 | Max pages to fetch for deep analysis |
| githubSearchPages | integer | 5 | GitHub search result pages (30 results/page) |
| debug | boolean | false | Enable verbose logging |
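
The following minimal sketch shows one way to assemble an input object from the parameters above before starting a run. The `build_input` helper and its validation rules are hypothetical and are not part of the Actor; only the field names and defaults come from the parameter list.

```python
# Defaults copied from the parameter list; githubToken has no default.
DEFAULTS = {
    "waybackLimit": 5000,
    "fetchContent": True,
    "contentFetchLimit": 200,
    "githubSearchPages": 5,
    "debug": False,
}


def build_input(target_domain: str, **overrides) -> dict:
    """Merge overrides onto the documented defaults with basic validation."""
    if not target_domain or "/" in target_domain:
        raise ValueError("targetDomain must be a bare root domain such as example.com")
    unknown = set(overrides) - set(DEFAULTS) - {"githubToken"}
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    return {"targetDomain": target_domain, **DEFAULTS, **overrides}


# A small first scan: fewer Wayback URLs and fewer pages fetched.
quick_scan = build_input("example.com", waybackLimit=500, contentFetchLimit=50)
print(quick_scan)
```

The resulting dictionary can be serialized to JSON and pasted into the Actor's input form or passed through an API client.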

Output

Results are pushed to the default dataset as structured JSON:

{
  "targetDomain": "example.com",
  "scanTimestamp": "2024-01-15T12:00:00.000Z",
  "totalSubdomains": 15,
  "totalEndpoints": 42,
  "totalGithubFindings": 87,
  "totalWaybackUrls": 5000,
  "totalInterestingFiles": 290,
  "totalPotentialSecrets": 3,
  "subdomains": ["api.example.com", "staging.example.com", ...],
  "endpoints": [
    { "path": "/api/v2/users", "category": "Versioned API" },
    { "path": "/graphql", "category": "GraphQL" },
    { "path": "/admin/config", "category": "Admin" },
    { "path": "/auth/oauth/callback", "category": "Authentication" }
  ],
  "endpointsByCategory": { "API": [...], "Authentication": [...], ... },
  "interestingFiles": [
    { "url": "https://example.com/.env", "extension": ".env", ... }
  ],
  "githubFindings": [
    { "repository": "org/repo", "file": "config.js", "url": "..." }
  ],
  "potentialSecrets": [
    { "type": "AWS Access Key", "value": "AKIA12345678***REDACTED***", "fullLength": 20 }
  ]
}
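
Once a dataset item has been downloaded, it can be post-processed with a few lines of code. The sketch below works on a trimmed copy of the sample item above; the helper names `group_endpoints` and `summarize` are illustrative and do not come from the Actor.

```python
from collections import defaultdict

# Trimmed copy of the sample dataset item shown above.
item = {
    "targetDomain": "example.com",
    "endpoints": [
        {"path": "/api/v2/users", "category": "Versioned API"},
        {"path": "/graphql", "category": "GraphQL"},
        {"path": "/admin/config", "category": "Admin"},
        {"path": "/auth/oauth/callback", "category": "Authentication"},
    ],
    "potentialSecrets": [
        {"type": "AWS Access Key", "value": "AKIA12345678***REDACTED***", "fullLength": 20}
    ],
}


def group_endpoints(endpoints: list[dict]) -> dict[str, list[str]]:
    """Group endpoint paths by their category label."""
    grouped = defaultdict(list)
    for endpoint in endpoints:
        grouped[endpoint["category"]].append(endpoint["path"])
    return dict(grouped)


def summarize(result: dict) -> str:
    """Return a one-line overview of a scan result."""
    grouped = group_endpoints(result["endpoints"])
    return (
        f"{result['targetDomain']}: {len(result['endpoints'])} endpoints "
        f"in {len(grouped)} categories, "
        f"{len(result['potentialSecrets'])} potential secret(s)"
    )


grouped = group_endpoints(item["endpoints"])
summary = summarize(item)
print(summary)
```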

Endpoint Categories

Discovered endpoints are auto-categorized:

  • API: /api/* paths
  • Versioned API: /v1/*, /v2/* paths
  • GraphQL: /graphql endpoints
  • Authentication: /auth/*, /oauth/*, /login/*
  • Admin: /admin/* paths
  • Internal: /internal/*, /private/*, /debug/*
  • Webhook: /webhook/*, /webhooks/*
  • API Documentation: /swagger*, /openapi*
  • Sensitive File: .json, .env, .sql, .bak, .config, etc.
  • File/Upload: /upload/*, /media/*
  • Payment: /pay/*, /billing/*, /subscription/*
  • User Management: /user/*, /account/*, /profile/*
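
A categorizer along these lines can be sketched as an ordered list of regular expressions where the first match wins. The patterns, their order, and the `Other` fallback are assumptions made for this example; the Actor's real rules are not published here. The order is chosen so that `/api/v2/users` is labeled `Versioned API` rather than `API`, which matches the sample output.

```python
import re

# Ordered rules: the first matching pattern decides the category.
CATEGORY_RULES = [
    ("GraphQL", re.compile(r"^/graphql\b", re.I)),
    ("Versioned API", re.compile(r"/v\d+(/|$)", re.I)),
    ("Authentication", re.compile(r"^/(auth|oauth|login)(/|$)", re.I)),
    ("Admin", re.compile(r"^/admin(/|$)", re.I)),
    ("Internal", re.compile(r"^/(internal|private|debug)(/|$)", re.I)),
    ("Webhook", re.compile(r"^/webhooks?(/|$)", re.I)),
    ("API Documentation", re.compile(r"^/(swagger|openapi)", re.I)),
    ("Payment", re.compile(r"^/(pay|billing|subscription)(/|$)", re.I)),
    ("User Management", re.compile(r"^/(user|account|profile)(/|$)", re.I)),
    ("File/Upload", re.compile(r"^/(upload|media)(/|$)", re.I)),
    ("Sensitive File", re.compile(r"\.(json|env|sql|bak|config)$", re.I)),
    ("API", re.compile(r"^/api(/|$)", re.I)),
]


def categorize(path: str) -> str:
    """Return the first matching category, or 'Other' (hypothetical fallback)."""
    for name, pattern in CATEGORY_RULES:
        if pattern.search(path):
            return name
    return "Other"


for sample_path in ["/api/v2/users", "/graphql", "/admin/config", "/backup.sql"]:
    print(sample_path, "->", categorize(sample_path))
```

Because the first match wins, a path such as `/api/config.json` would be labeled `Sensitive File` under this ordering; a different ordering would give a different label.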

Secret Detection

Scans fetched content for 15 secret patterns:

  • AWS Access Keys & Secret Keys
  • Generic API Keys, Secrets, Passwords
  • Bearer Tokens, JWTs
  • GitHub, Slack, Google, Stripe, SendGrid, Twilio, Mailgun tokens
  • Private Keys (RSA, EC, DSA)
  • Heroku API Keys

All detected values are redacted in the output: only the first 12 characters and the secret type are shown.
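
To make the redaction rule concrete, here is a minimal sketch of one detector. The regular expression is the commonly used shape for AWS access key IDs, not necessarily the pattern the Actor uses, and the `redact` and `find_aws_access_keys` helpers are hypothetical. The output fields mirror the `potentialSecrets` entries in the sample output.

```python
import re

# Common shape of an AWS access key ID: "AKIA" plus 16 uppercase letters or digits.
AWS_ACCESS_KEY = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def redact(value: str, keep: int = 12) -> str:
    """Keep the first `keep` characters and mask the rest."""
    return value[:keep] + "***REDACTED***"


def find_aws_access_keys(text: str) -> list[dict]:
    """Return redacted findings for every AWS access key ID found in text."""
    return [
        {
            "type": "AWS Access Key",
            "value": redact(match.group(0)),
            "fullLength": len(match.group(0)),
        }
        for match in AWS_ACCESS_KEY.finditer(text)
    ]


# Obviously fake key used only to exercise the pattern.
sample_text = 'aws_access_key_id = "AKIA1234567890ABCDEF"'
findings = find_aws_access_keys(sample_text)
print(findings)
```

Pattern-only detection like this will also match placeholder values in documentation and test fixtures, so findings should be verified by hand.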

GitHub Token

A GitHub Personal Access Token is recommended for the GitHub code search phase. Without it, GitHub's unauthenticated rate limit (10 req/min) will be hit quickly.

Generate one at https://github.com/settings/tokens. No special scopes are needed (public repo access is sufficient).

Tips

  • Start small — Use waybackLimit: 500 and contentFetchLimit: 50 for a quick scan
  • Scale up — Increase limits for thorough recon on high-value targets
  • GitHub token — Highly recommended for meaningful GitHub results
  • Content fetch — Some domains block Wayback Machine serving; set fetchContent: false if the fetch success rate is very low
  • Schedule runs — Run weekly to catch new archived content

Cost

Typical run costs on Apify:

  • Quick scan (500 URLs, 50 content fetches): ~$0.01, 30-60 seconds
  • Full scan (5000 URLs, 200 content fetches): ~$0.03, 2-5 minutes
  • Deep scan (50000 URLs, 2000 content fetches): ~$0.15, 15-30 minutes

Limitations

  • Wayback Machine CDX only returns URLs that were previously archived — not all subdomains/pages will be present
  • Content fetch success depends on the target; some sites block Wayback from serving archived content
  • GitHub code search is limited to indexed public repositories
  • Secret detection uses pattern matching and may produce false positives on placeholder/example values
  • The Actor filters common vulnerability scanner payloads from Wayback results but some noise may remain