Automated reconnaissance actor for bug bounty hunters
Pricing
from $0.50 / full recon scan
This Apify actor automates bug bounty recon by scraping the Wayback Machine and GitHub for legacy attack surfaces. It extracts historical URLs, public code, and deprecated files, parsing them to uncover hidden subdomains and forgotten API endpoints. The findings are saved into structured JSON files.
Developer
Zaher el siddik
Last modified
15 days ago
Bug Bounty Recon Actor
Automated reconnaissance for bug bounty hunters. Discovers hidden API endpoints, subdomains, interesting files, and potential secrets from Wayback Machine archives and GitHub code search.
What It Does
- Wayback Machine CDX API: Retrieves thousands of historical URLs for a target domain and extracts subdomains, API endpoints, and interesting files (`.env`, `.json`, `.sql`, `.bak`, etc.)
- GitHub Code Search: Finds code referencing the target domain across all public repos, searching for API keys, passwords, tokens, configs, and endpoints
- Deep Content Analysis: Fetches archived pages and raw GitHub files, then parses them with 30+ regex patterns for hidden endpoints and 15 secret detectors
- Structured Output: Deduplicates and categorizes all findings into a clean JSON dataset
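The Wayback phase can be pictured roughly as follows. This is a minimal sketch, not the Actor's actual code: it assumes the public CDX endpoint at `web.archive.org/cdx/search/cdx` queried with `output=json`, and the helper names `build_cdx_url` and `extract_subdomains` are hypothetical.

```python
import json
from urllib.parse import urlencode, urlsplit

# Public Wayback Machine CDX endpoint (assumed; the Actor may query it differently).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(target_domain: str, limit: int = 5000) -> str:
    """Build a CDX query for archived URLs under the target domain."""
    params = {
        "url": target_domain,
        "matchType": "domain",  # include subdomains of the target
        "output": "json",       # the first row of the response is a header row
        "fl": "original",       # return only the originally captured URL
        "collapse": "urlkey",   # drop repeat captures of the same URL
        "limit": str(limit),
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def extract_subdomains(cdx_json: str, target_domain: str) -> list[str]:
    """Return sorted, deduplicated hostnames that belong to the target domain."""
    rows = json.loads(cdx_json)
    hosts = set()
    for (original,) in rows[1:]:  # skip the header row
        candidate = original if "://" in original else f"http://{original}"
        try:
            host = (urlsplit(candidate).hostname or "").lower()
        except ValueError:
            continue  # malformed archived URL
        if host == target_domain or host.endswith(f".{target_domain}"):
            hosts.add(host)
    return sorted(hosts)
```

Matching on `.{target_domain}` rather than a bare suffix keeps look-alike hosts such as `notexample.com` out of the subdomain list.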
Input
| Parameter | Type | Default | Description |
|---|---|---|---|
| `targetDomain` | string | required | Root domain to recon (e.g. `example.com`) |
| `githubToken` | string | none | GitHub PAT for authenticated code search (avoids the unauthenticated rate limit) |
| `waybackLimit` | integer | 5000 | Max historical URLs from Wayback CDX |
| `fetchContent` | boolean | true | Fetch and parse archived pages for endpoints/secrets |
| `contentFetchLimit` | integer | 200 | Max pages to fetch for deep analysis |
| `githubSearchPages` | integer | 5 | GitHub search result pages (30 results/page) |
| `debug` | boolean | false | Enable verbose logging |
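As an illustration of how these parameters fit together, the sketch below assembles a run input with the documented defaults and basic type checks. The helper `build_run_input` is hypothetical and only mirrors the table above; the Actor's own input schema remains the authority.

```python
# Defaults copied from the input table; githubToken has no default.
DEFAULTS = {
    "waybackLimit": 5000,
    "fetchContent": True,
    "contentFetchLimit": 200,
    "githubSearchPages": 5,
    "debug": False,
}

EXPECTED_TYPES = {
    "targetDomain": str,
    "githubToken": str,
    "waybackLimit": int,
    "fetchContent": bool,
    "contentFetchLimit": int,
    "githubSearchPages": int,
    "debug": bool,
}


def build_run_input(target_domain: str, **overrides) -> dict:
    """Merge overrides into the documented defaults and check field types."""
    if not target_domain or "/" in target_domain:
        raise ValueError("targetDomain must be a bare root domain, e.g. example.com")
    run_input = {"targetDomain": target_domain, **DEFAULTS, **overrides}
    for key, value in run_input.items():
        expected = EXPECTED_TYPES.get(key)
        if expected is None:
            raise KeyError(f"unknown input field: {key}")
        # bool is a subclass of int in Python, so compare exact types.
        if type(value) is not expected:
            raise TypeError(f"{key} must be {expected.__name__}")
    return run_input


# Quick-scan settings with reduced limits.
quick_scan = build_run_input("example.com", waybackLimit=500, contentFetchLimit=50)
```

The resulting dictionary can be pasted into the Apify Console as JSON or passed as the run input through an Apify API client.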
Output
Results are pushed to the default dataset as structured JSON:
    {
      "targetDomain": "example.com",
      "scanTimestamp": "2024-01-15T12:00:00.000Z",
      "totalSubdomains": 15,
      "totalEndpoints": 42,
      "totalGithubFindings": 87,
      "totalWaybackUrls": 5000,
      "totalInterestingFiles": 290,
      "totalPotentialSecrets": 3,
      "subdomains": ["api.example.com", "staging.example.com", ...],
      "endpoints": [
        { "path": "/api/v2/users", "category": "Versioned API" },
        { "path": "/graphql", "category": "GraphQL" },
        { "path": "/admin/config", "category": "Admin" },
        { "path": "/auth/oauth/callback", "category": "Authentication" }
      ],
      "endpointsByCategory": { "API": [...], "Authentication": [...], ... },
      "interestingFiles": [
        { "url": "https://example.com/.env", "extension": ".env", ... }
      ],
      "githubFindings": [
        { "repository": "org/repo", "file": "config.js", "url": "..." }
      ],
      "potentialSecrets": [
        { "type": "AWS Access Key", "value": "AKIA12345678***REDACTED***", "fullLength": 20 }
      ]
    }
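Once a dataset item has been downloaded, a short script can condense it for triage. The sketch below assumes the item has the shape shown above; the function `summarize` and the keys of the summary it returns are hypothetical.

```python
from collections import Counter


def summarize(item: dict) -> dict:
    """Condense one dataset item into counts that are easy to scan."""
    endpoints = item.get("endpoints", [])
    secrets = item.get("potentialSecrets", [])
    return {
        "target": item.get("targetDomain"),
        "subdomains": len(item.get("subdomains", [])),
        # Count endpoints per category from the flat "endpoints" list.
        "endpointsByCategory": dict(Counter(e["category"] for e in endpoints)),
        # Only types are kept; values are already redacted in the output.
        "secretTypes": sorted({s["type"] for s in secrets}),
    }
```

Counting from the flat `endpoints` list avoids relying on the elided structure of `endpointsByCategory`.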
Endpoint Categories
Discovered endpoints are auto-categorized:
- API: `/api/*` paths
- Versioned API: `/v1/*`, `/v2/*` paths
- GraphQL: `/graphql` endpoints
- Authentication: `/auth/*`, `/oauth/*`, `/login/*`
- Admin: `/admin/*` paths
- Internal: `/internal/*`, `/private/*`, `/debug/*`
- Webhook: `/webhook/*`, `/webhooks/*`
- API Documentation: `/swagger*`, `/openapi*`
- Sensitive File: `.json`, `.env`, `.sql`, `.bak`, `.config`, etc.
- File/Upload: `/upload/*`, `/media/*`
- Payment: `/pay/*`, `/billing/*`, `/subscription/*`
- User Management: `/user/*`, `/account/*`, `/profile/*`
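A first-match rule table is one plausible way to implement this kind of categorization. The sketch below is an assumption, not the Actor's code: the rule order, the choice to match path segments anywhere in the path, and the `Other` fallback are all hypothetical, chosen so that the sample paths in the output above land in the categories shown there.

```python
import re


def _segment(*names: str) -> re.Pattern:
    """Match any of the given names as a whole path segment."""
    return re.compile(rf"(?:^|/)(?:{'|'.join(names)})(?:/|$)", re.IGNORECASE)


# Order matters: the first matching rule wins, so specific rules come first.
RULES = [
    ("API Documentation", re.compile(r"(?:^|/)(?:swagger|openapi)", re.IGNORECASE)),
    ("Sensitive File", re.compile(r"\.(?:json|env|sql|bak|config)$", re.IGNORECASE)),
    ("GraphQL", _segment("graphql")),
    ("Versioned API", _segment(r"v\d+")),
    ("API", _segment("api")),
    ("Authentication", _segment("auth", "oauth", "login")),
    ("Admin", _segment("admin")),
    ("Internal", _segment("internal", "private", "debug")),
    ("Webhook", _segment("webhooks?")),
    ("File/Upload", _segment("upload", "media")),
    ("Payment", _segment("pay", "billing", "subscription")),
    ("User Management", _segment("user", "account", "profile")),
]


def categorize(path: str) -> str:
    """Return the first matching category, or 'Other' (hypothetical fallback)."""
    path = path.split("?", 1)[0]  # ignore the query string
    for category, pattern in RULES:
        if pattern.search(path):
            return category
    return "Other"
```

Placing `Versioned API` ahead of `API` is what makes `/api/v2/users` resolve to the versioned category, as in the sample output.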
Secret Detection
Scans fetched content for 15 secret patterns:
- AWS Access Keys & Secret Keys
- Generic API Keys, Secrets, Passwords
- Bearer Tokens, JWTs
- GitHub, Slack, Google, Stripe, SendGrid, Twilio, Mailgun tokens
- Private Keys (RSA, EC, DSA)
- Heroku API Keys
All detected values are redacted in the output: only the first 12 characters are shown, alongside the secret type and the full length of the match.
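The detect-then-redact step can be sketched as below. Only two detectors are shown, using widely documented token formats (AWS access key IDs and classic GitHub personal access tokens); the Actor's real patterns are not published here, so the regexes and the names `DETECTORS`, `redact`, and `find_secrets` are assumptions.

```python
import re

# Hypothetical subset of the secret detectors.
DETECTORS = {
    "AWS Access Key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "GitHub Token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}

VISIBLE_CHARS = 12  # per the README: only the first 12 characters are kept


def redact(value: str) -> str:
    """Keep the first 12 characters and mask the rest."""
    return f"{value[:VISIBLE_CHARS]}***REDACTED***"


def find_secrets(text: str) -> list[dict]:
    """Scan text with each detector and return deduplicated, redacted findings."""
    findings = []
    seen = set()
    for secret_type, pattern in DETECTORS.items():
        for match in pattern.finditer(text):
            value = match.group(0)
            if (secret_type, value) in seen:
                continue  # report each distinct secret once
            seen.add((secret_type, value))
            findings.append({
                "type": secret_type,
                "value": redact(value),
                "fullLength": len(value),
            })
    return findings
```

Keeping `fullLength` next to the redacted prefix lets a reviewer judge whether a match has the right shape for its type without exposing the full value.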
GitHub Token
A GitHub Personal Access Token is recommended for the GitHub code search phase. Without it, GitHub's unauthenticated rate limit (10 req/min) will be hit quickly.
Generate one at https://github.com/settings/tokens — no special scopes needed (public repo access is sufficient).
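For reference, an authenticated code search request might be assembled as in the sketch below. It assumes GitHub's REST endpoint `GET /search/code` with a bearer token; the helper `build_code_search_request`, the quoted-domain query, and the `User-Agent` value are hypothetical, and the request is only constructed, not sent.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_code_search_request(
    target_domain: str,
    page: int,
    token: Optional[str] = None,
    per_page: int = 30,
) -> Request:
    """Build (but do not send) one page of a GitHub code search query."""
    query = urlencode({
        "q": f'"{target_domain}"',  # assumed query: exact-match on the domain
        "per_page": per_page,       # 30 results/page, matching the input table
        "page": page,
    })
    headers = {
        "Accept": "application/vnd.github+json",
        "User-Agent": "recon-sketch",  # placeholder client name
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return Request(f"https://api.github.com/search/code?{query}", headers=headers)
```

GitHub's REST documentation describes the code search endpoint as requiring authentication, so a request built without a token may be rejected outright rather than merely throttled.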
Tips
- Start small: Use `waybackLimit: 500` and `contentFetchLimit: 50` for a quick scan
- Scale up: Increase limits for thorough recon on high-value targets
- GitHub token: Highly recommended for meaningful GitHub results
- Content fetch: Some domains block the Wayback Machine from serving their archived content; set `fetchContent: false` if the fetch success rate is very low
- Schedule runs: Run weekly to catch new archived content
Cost
Typical run costs on Apify:
- Quick scan (500 URLs, 50 fetched pages): ~$0.01, 30-60 seconds
- Full scan (5000 URLs, 200 fetched pages): ~$0.03, 2-5 minutes
- Deep scan (50000 URLs, 2000 fetched pages): ~$0.15, 15-30 minutes
Limitations
- The Wayback Machine CDX API only returns URLs that were previously archived, so subdomains and pages that were never crawled will not appear
- Content fetch success depends on the target; some sites block Wayback from serving archived content
- GitHub code search is limited to indexed public repositories
- Secret detection uses pattern matching and may produce false positives on placeholder/example values
- The Actor filters common vulnerability scanner payloads from Wayback results but some noise may remain


