GitHub Org AI Tool Fingerprinter
Pricing
Pay per usage
Fingerprint which AI dev tools a GitHub organization is using. Scans public repos for CLAUDE.md, AGENTS.md, .cursorrules, Copilot instructions, Continue, Aider, Windsurf and reports adoption rate + per-repo signals.
Developer
Yanlong Mu
Maintained by Community
What does GitHub Org AI Tool Fingerprinter do?
GitHub Org AI Tool Fingerprinter scans a GitHub organization's public repositories and reports which AI dev tools the team has adopted — including Claude Code (CLAUDE.md), Cursor (.cursorrules, .cursor/), GitHub Copilot custom instructions (.github/copilot-instructions.md), Aider, Continue, Windsurf, and the emerging AGENTS.md spec. For each hit you get the file path, file size, last-modified date, repo stars, and primary language, plus a final summary row with adoption percentage across the whole org. Run it ad-hoc, schedule it via the Apify platform, integrate via the API, or pipe it into Zapier / Make / your CRM.
Built by Ian Mu as Actor #9 in his 100-actor portfolio. Code style and verification patterns follow the claude-verify-before-stop playbook — short scripts, immutable data, explicit error handling, and silent 404s.
Why use GitHub Org AI Tool Fingerprinter?
- Sales intelligence: find orgs that already use Claude Code / Cursor and target them with relevant outreach.
- Investor research: gauge AI-tool adoption signal across portfolio companies or a competitor's eng team.
- Recruiting: surface AI-forward engineering orgs to source from.
- Internal audit: scan your own org to see which repos still need a CLAUDE.md or AGENTS.md.
- Ecosystem analysis: track adoption of new dev-tool standards over time by re-running on a schedule.
How to use GitHub Org AI Tool Fingerprinter
- Open the Actor in Apify Console and click Run.
- Enter a GitHub org or user slug in the Input tab (e.g. vercel, shopify, anthropics).
- Optionally tune maxReposToScan (default 30) and signalsToCheck (default: all 7 signals).
- Hit Start and wait — typical run is 10–60 seconds for 30 repos.
- Open the Dataset tab to see signal-hits and the final _summary row with adoption percentage.
- Export to JSON / CSV / Excel, or call the dataset over the Apify API.
If you have a personal GitHub token, set it as the GITHUB_TOKEN env var on the Actor to lift the 60 req/hour anonymous rate limit to 5000/hour.
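The steps above can also be driven programmatically. The sketch below builds, without sending, a request against Apify's `run-sync-get-dataset-items` endpoint using only the standard library. The Actor ID and Apify token shown are placeholders, not values from this README; substitute the real ones from Apify Console.

```python
import json
import urllib.request

APIFY_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, apify_token: str, github_org: str,
                      max_repos: int = 30) -> urllib.request.Request:
    """Build (but do not send) a synchronous run request for the Actor."""
    url = f"{APIFY_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    payload = {"githubOrg": github_org, "maxReposToScan": max_repos}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {apify_token}",
        },
        method="POST",
    )


# Hypothetical Actor ID and token, for illustration only.
request = build_run_request(
    "your-username~github-org-ai-tool-fingerprinter", "APIFY_TOKEN_HERE", "vercel"
)
# To actually run the scan and read the rows:
# rows = json.load(urllib.request.urlopen(request, timeout=120))
```

Building the request separately from sending it keeps the example testable offline and makes it easy to inspect the payload before spending any compute.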
Input
The Actor accepts three input fields (see the Input tab for the full form):
```json
{
  "githubOrg": "vercel",
  "maxReposToScan": 30,
  "signalsToCheck": [
    "claude-md",
    "agents-md",
    "cursor",
    "continue",
    "copilot-instructions",
    "aider",
    "windsurf"
  ]
}
```
- githubOrg (string, required): org or user slug to scan.
- maxReposToScan (integer, default 30, max 100): repo cap. The Actor lists repos sorted by most recently pushed and skips forks, archived repos, and private repos.
- signalsToCheck (array of strings, default all): which AI tool fingerprints to look for.
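If you build the input in your own code before calling the Actor, a small validator can catch mistakes early. This is a minimal sketch of the documented constraints, not the Actor's own validation logic; the function name `normalize_input` and the lower bound of 1 for maxReposToScan are assumptions.

```python
ALL_SIGNALS = (
    "claude-md", "agents-md", "cursor", "continue",
    "copilot-instructions", "aider", "windsurf",
)


def normalize_input(raw: dict) -> dict:
    """Return a new, validated input dict with the documented defaults applied."""
    org = str(raw.get("githubOrg", "")).strip()
    if not org:
        raise ValueError("githubOrg is required")

    max_repos = raw.get("maxReposToScan", 30)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_repos, bool) or not isinstance(max_repos, int):
        raise ValueError("maxReposToScan must be an integer")
    if not 1 <= max_repos <= 100:  # lower bound of 1 is an assumption
        raise ValueError("maxReposToScan must be between 1 and 100")

    signals = list(raw.get("signalsToCheck") or ALL_SIGNALS)
    unknown = [s for s in signals if s not in ALL_SIGNALS]
    if unknown:
        raise ValueError(f"unknown signals: {unknown}")

    return {"githubOrg": org, "maxReposToScan": max_repos, "signalsToCheck": signals}
```

The function returns a fresh dict rather than mutating its argument, so the caller's original input stays untouched.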
Output
The Actor pushes one row to the dataset per signal-hit, plus a final _summary row. Sample output:
```json
[
  {
    "org": "vercel",
    "repo": "next.js",
    "repoUrl": "https://github.com/vercel/next.js",
    "signal": "claude-md",
    "filePath": "CLAUDE.md",
    "fileSize": 4231,
    "lastModified": "2026-04-12T14:23:00Z",
    "repoStars": 128000,
    "repoLanguage": "JavaScript"
  },
  {
    "_summary": true,
    "org": "vercel",
    "reposScanned": 30,
    "signalsFound": {
      "claude-md": 3,
      "agents-md": 1,
      "cursor": 8,
      "continue": 0,
      "copilot-instructions": 2,
      "aider": 0,
      "windsurf": 0
    },
    "totalReposWithAnyAiSignal": 12,
    "adoptionPct": 40
  }
]
```
You can download the dataset in various formats such as JSON, HTML, CSV, or Excel.
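Because signal-hit rows and the summary row share one dataset, downstream code usually separates them first. The sketch below does that for a JSON export shaped like the sample output; the helper names are illustrative and not part of the Actor.

```python
from collections import defaultdict


def split_rows(rows):
    """Separate signal-hit rows from the single _summary row (None if absent)."""
    hits = [row for row in rows if not row.get("_summary")]
    summary = next((row for row in rows if row.get("_summary")), None)
    return hits, summary


def repos_by_signal(hits):
    """Group repo names under each signal that matched."""
    grouped = defaultdict(list)
    for hit in hits:
        grouped[hit["signal"]].append(hit["repo"])
    return dict(grouped)


# Trimmed version of the sample output above.
rows = [
    {"org": "vercel", "repo": "next.js", "signal": "claude-md",
     "filePath": "CLAUDE.md", "fileSize": 4231},
    {"_summary": True, "org": "vercel", "reposScanned": 30,
     "totalReposWithAnyAiSignal": 12, "adoptionPct": 40},
]

hits, summary = split_rows(rows)
grouped = repos_by_signal(hits)
```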
Data table
| Field | Type | Description |
|---|---|---|
| org | string | The org you scanned |
| repo | string | Repo name (short, no owner prefix) |
| repoUrl | string | Full GitHub URL |
| signal | string | One of: claude-md, agents-md, cursor, continue, copilot-instructions, aider, windsurf |
| filePath | string | Path to the file/dir that matched |
| fileSize | number | File size in bytes (0 for directories) |
| lastModified | string | ISO-8601 date of the latest commit touching the path |
| repoStars | number | Stargazer count |
| repoLanguage | string | Primary language of the repo |
| _summary | boolean | True on the final summary row only |
| signalsFound | object | Per-signal count (only on summary row) |
| adoptionPct | number | % of scanned repos with at least one AI signal |
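To sanity-check the summary, adoptionPct can be recomputed from the hit rows: count the distinct repos with at least one hit and divide by reposScanned. In the sample output that is 12 of 30 repos, or 40%. The sketch below assumes the result is rounded to the nearest integer, which the README does not state.

```python
def adoption_pct(hits, repos_scanned):
    """Percentage of scanned repos with at least one signal hit."""
    if repos_scanned <= 0:
        return 0
    # A repo with several signals still counts once.
    repos_with_signal = {hit["repo"] for hit in hits}
    return round(100 * len(repos_with_signal) / repos_scanned)


# Twelve distinct repos, one of which matched two signals.
example_hits = [{"repo": f"repo-{i}", "signal": "cursor"} for i in range(12)]
example_hits.append({"repo": "repo-0", "signal": "claude-md"})
pct = adoption_pct(example_hits, 30)
```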
Pricing / Cost estimation
How much does it cost to fingerprint a GitHub org? This is a lightweight HTTP-only Actor with no headless browser. A 30-repo scan typically costs a few hundredths of a compute unit and finishes inside a minute, so it fits comfortably within the monthly free tier of a free Apify account. Large scans (100 repos × 7 signals, up to ~800 API calls) still cost pennies of compute; the bottleneck is GitHub's rate limit, not Apify compute. Set the GITHUB_TOKEN env var to lift that limit.
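The call-count arithmetic can be sketched as a rough budget estimate. It assumes one contents probe per repo per signal, one listing call per page of up to 100 repos, and one commits lookup per hit; the Actor's real call pattern may differ (for example, the cursor signal may probe more than one path).

```python
import math

ANONYMOUS_LIMIT = 60        # requests per hour without a token
AUTHENTICATED_LIMIT = 5000  # requests per hour with GITHUB_TOKEN


def estimate_calls(repos, signals, expected_hits=0):
    """Rough upper bound on GitHub API calls for one run (assumed call pattern)."""
    list_calls = math.ceil(repos / 100)  # repo listing, up to 100 repos per page
    probe_calls = repos * signals        # one contents probe per repo per signal
    commit_calls = expected_hits         # one commits lookup per hit
    return list_calls + probe_calls + commit_calls


def max_repos_within(limit, signals):
    """How many repos fit in a rate-limit budget, reserving one listing call."""
    return max(0, (limit - 1) // signals)
```

Under these assumptions, `max_repos_within(ANONYMOUS_LIMIT, 7)` gives 8 repos for an anonymous run, while narrowing to a single signal raises that to 59.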
Tips and advanced options
- Set GITHUB_TOKEN in the Actor's env vars to get 5000 req/hour instead of 60 (anonymous). Use a fine-grained token with read-only public-repo access.
- Schedule it monthly via the Apify schedule feature on a list of competitor / portfolio orgs to track adoption trends.
- Pipe to Slack / Notion / Sheets via Apify integrations to keep a live "who's using Claude Code" dashboard.
- Narrow signals to one or two (["claude-md"]) to scan more repos within the rate-limit budget.
- Pair with Apify MCP Server Catalog (Actor #1 in the portfolio) to also surface MCP server adoption inside the same org.
FAQ, disclaimers, and support
Is this legal? Yes — the Actor only calls the public GitHub REST API (api.github.com) at endpoints /orgs/{org}/repos, /repos/{owner}/{repo}/contents/{path}, and /repos/{owner}/{repo}/commits. It respects GitHub's anonymous rate limit (60 req/hr) by stopping early and saving partial results. No scraping of HTML pages, no auth bypass, no ToS issues.
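For readers who want to see what those three endpoints look like in practice, here is a minimal sketch of the URL construction and a single file probe. It is not the Actor's source code: the function names, the `sort=pushed` query parameter, and the injected `fetch` callable (which returns a status code and parsed JSON body) are assumptions made so the example runs without network access.

```python
import os
from urllib.parse import quote, urlencode

API = "https://api.github.com"


def build_headers(token=None):
    """Request headers; the Authorization header is added only when a token is set."""
    headers = {"Accept": "application/vnd.github+json",
               "User-Agent": "ai-tool-fingerprinter-sketch"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def repos_url(org, per_page=30):
    query = urlencode({"sort": "pushed", "per_page": per_page})
    return f"{API}/orgs/{quote(org)}/repos?{query}"


def contents_url(owner, repo, path):
    return f"{API}/repos/{quote(owner)}/{quote(repo)}/contents/{quote(path)}"


def commits_url(owner, repo, path):
    query = urlencode({"path": path, "per_page": 1})
    return f"{API}/repos/{quote(owner)}/{quote(repo)}/commits?{query}"


def probe(owner, repo, path, fetch):
    """Return a partial hit dict, or None when the path does not exist."""
    status, body = fetch(contents_url(owner, repo, path))
    if status == 404:
        return None  # a miss is expected and stays silent
    if status in (403, 429):
        raise RuntimeError("rate limited or forbidden: stop and save partial results")
    if status != 200:
        raise RuntimeError(f"unexpected status {status}")
    # The contents endpoint returns an object for a file and a list for a directory.
    size = body.get("size", 0) if isinstance(body, dict) else 0
    return {"filePath": path, "fileSize": size}


headers = build_headers(os.environ.get("GITHUB_TOKEN"))
```

Passing `fetch` in as a parameter keeps the probe logic independent of any particular HTTP client and lets you exercise the 404 and rate-limit branches with a stub.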
What about private repos? Anonymous scans skip them (they don't appear in the response). If you set GITHUB_TOKEN on the Actor env, private repos your token can read will also be included.
Why is adoptionPct 0%? Either the org doesn't use these tools (yet), or you hit the rate limit before any signal was found — check the _summary.rateLimitHit field and the Actor log.
Found a bug or want a new signal added (e.g. .zed/, roo-cline, cline)? Open an issue on the GitHub repo or contact Ian Mu directly. Custom variants of this Actor (e.g. scanning an entire user's starred-repo network) are available on request.
Limitations:
- Anonymous rate limit (60/hr) restricts you to ~7–8 repos' worth of probes per run unless you set GITHUB_TOKEN.
- Detection is filename-based only: a repo whose CLAUDE.md contains just "TODO" still counts as a hit. Combine with fileSize to filter.
- The cursor signal matches .cursorrules OR .cursor/ OR .cursor/rules; you can refine post-hoc by filePath.
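Both workarounds above are simple post-processing steps on the exported rows. The sketch below shows one way to do them; the 200-byte threshold and the choice of which signals count as single files are assumptions to tune for your own data.

```python
# Signals that this README describes as a single file (not a directory).
SINGLE_FILE_SIGNALS = {"claude-md", "agents-md", "copilot-instructions"}


def drop_stub_files(hits, min_bytes=200):
    """Drop single-file hits smaller than min_bytes; other signals pass through,
    since directories report a fileSize of 0."""
    return [
        hit for hit in hits
        if hit["signal"] not in SINGLE_FILE_SIGNALS or hit["fileSize"] >= min_bytes
    ]


def cursor_repos_by_path(hits):
    """Group cursor hits by the exact path that matched."""
    grouped = {}
    for hit in hits:
        if hit["signal"] == "cursor":
            path = hit["filePath"].rstrip("/")
            grouped.setdefault(path, []).append(hit["repo"])
    return grouped


sample_hits = [
    {"repo": "a", "signal": "claude-md", "filePath": "CLAUDE.md", "fileSize": 4},
    {"repo": "b", "signal": "claude-md", "filePath": "CLAUDE.md", "fileSize": 4231},
    {"repo": "b", "signal": "cursor", "filePath": ".cursor/", "fileSize": 0},
    {"repo": "c", "signal": "cursor", "filePath": ".cursorrules", "fileSize": 900},
]
kept = drop_stub_files(sample_hits)
by_path = cursor_repos_by_path(sample_hits)
```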
MIT license. See github.com/ianymu/claude-verify-before-stop for the broader Ian Mu Actor portfolio playbook.