GitHub Org AI Tool Fingerprinter

Pricing

Pay per usage

Fingerprint which AI dev tools a GitHub organization is using. Scans public repos for CLAUDE.md, AGENTS.md, .cursorrules, Copilot instructions, Continue, Aider, Windsurf and reports adoption rate + per-repo signals.

Rating: 0.0 (0)

Developer: Yanlong Mu (maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 4 days ago

What does GitHub Org AI Tool Fingerprinter do?

GitHub Org AI Tool Fingerprinter scans a GitHub organization's public repositories and reports which AI dev tools the team has adopted — including Claude Code (CLAUDE.md), Cursor (.cursorrules, .cursor/), GitHub Copilot custom instructions (.github/copilot-instructions.md), Aider, Continue, Windsurf, and the emerging AGENTS.md spec. For each hit you get the file path, file size, last-modified date, repo stars, and primary language, plus a final summary row with adoption percentage across the whole org. Run it ad-hoc, schedule it via the Apify platform, integrate via the API, or pipe it into Zapier / Make / your CRM.

Built by Ian Mu as Actor #9 in his 100-actor portfolio. Code style and verification patterns follow the claude-verify-before-stop playbook: short scripts, immutable data, explicit error handling, and 404 responses treated silently as "signal not present" rather than as errors.

Why use GitHub Org AI Tool Fingerprinter?

  • Sales intelligence: find orgs that already use Claude Code / Cursor and target them with relevant outreach.
  • Investor research: gauge AI-tool adoption signal across portfolio companies or a competitor's eng team.
  • Recruiting: surface AI-forward engineering orgs to source from.
  • Internal audit: scan your own org to see which repos still need a CLAUDE.md or AGENTS.md.
  • Ecosystem analysis: track adoption of new dev-tool standards over time by re-running on a schedule.

How to use GitHub Org AI Tool Fingerprinter

  1. Open the Actor in Apify Console and click Run.
  2. Enter a GitHub org or user slug in the Input tab (e.g. vercel, shopify, anthropics).
  3. Optionally tune maxReposToScan (default 30) and signalsToCheck (default: all 7 signals).
  4. Hit Start and wait — typical run is 10–60 seconds for 30 repos.
  5. Open the Dataset tab to see signal-hits and the final _summary row with adoption percentage.
  6. Export to JSON / CSV / Excel, or call the dataset over the Apify API.
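
Step 6 can also be scripted. The sketch below only builds (it does not send) a request against Apify's `run-sync-get-dataset-items` API endpoint using the Python standard library. The Actor ID, the token placeholder, and the helper name `build_run_request` are hypothetical; copy the real Actor ID from the Actor's API tab.

```python
import json
import urllib.request

APIFY_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, apify_token: str, run_input: dict) -> urllib.request.Request:
    """Build a POST that runs the Actor synchronously and returns its dataset items."""
    url = f"{APIFY_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {apify_token}",
        },
        method="POST",
    )


# Hypothetical Actor ID and token; replace both before sending.
request = build_run_request(
    "your-username~github-org-ai-tool-fingerprinter",
    "APIFY_TOKEN_PLACEHOLDER",
    {"githubOrg": "vercel", "maxReposToScan": 30},
)

# To execute the run and parse the rows:
#   rows = json.load(urllib.request.urlopen(request))
```

Keeping request construction separate from sending makes it easy to inspect the payload before spending any compute.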

If you have a personal GitHub token, set it as the GITHUB_TOKEN env var on the Actor to lift the 60 req/hour anonymous rate limit to 5000/hour.
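
As an illustration of what the token changes, here is a minimal sketch of request headers for the GitHub REST API with and without a token. The function name `github_headers` and the User-Agent string are assumptions for this example; how the Actor itself reads GITHUB_TOKEN is not shown in this README.

```python
import os
from typing import Dict, Optional


def github_headers(token: Optional[str] = None) -> Dict[str, str]:
    """Return GitHub REST API headers; add Authorization only when a token is set."""
    headers = {
        "Accept": "application/vnd.github+json",
        "User-Agent": "ai-tool-fingerprinter-example",  # hypothetical client name
    }
    if token:
        # Authenticated requests get the higher hourly rate limit.
        headers["Authorization"] = f"Bearer {token}"
    return headers


# Falls back to anonymous headers when GITHUB_TOKEN is not set.
headers = github_headers(os.environ.get("GITHUB_TOKEN"))
```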

Input

The Actor accepts three input fields (see the Input tab for the full form):

{
  "githubOrg": "vercel",
  "maxReposToScan": 30,
  "signalsToCheck": [
    "claude-md",
    "agents-md",
    "cursor",
    "continue",
    "copilot-instructions",
    "aider",
    "windsurf"
  ]
}

  • githubOrg (string, required) — org or user slug to scan.
  • maxReposToScan (integer, default 30, max 100) — repo cap. The Actor lists repos sorted by most-recently-pushed and skips forks / archived / private.
  • signalsToCheck (array of strings, default all) — which AI tool fingerprints to look for.
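
The defaults and limits listed above can be sketched as a small validation step. This is not the Actor's own code: `normalize_input` is a hypothetical helper, and the lower bound of 1 for maxReposToScan is an assumption (the README only states the default and the maximum).

```python
from typing import Any, Dict

ALL_SIGNALS = [
    "claude-md", "agents-md", "cursor", "continue",
    "copilot-instructions", "aider", "windsurf",
]


def normalize_input(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the documented defaults and reject out-of-range values."""
    org = str(raw.get("githubOrg") or "").strip()
    if not org:
        raise ValueError("githubOrg is required")

    max_repos = int(raw.get("maxReposToScan", 30))
    if not 1 <= max_repos <= 100:  # lower bound of 1 is an assumption
        raise ValueError("maxReposToScan must be between 1 and 100")

    # An empty or missing list falls back to all seven signals.
    signals = list(raw.get("signalsToCheck") or ALL_SIGNALS)
    unknown = [s for s in signals if s not in ALL_SIGNALS]
    if unknown:
        raise ValueError(f"unknown signals: {unknown}")

    return {"githubOrg": org, "maxReposToScan": max_repos, "signalsToCheck": signals}


config = normalize_input({"githubOrg": "vercel"})
```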

Output

The Actor pushes one row to the dataset per signal-hit, plus a final _summary row. Sample output:

[
  {
    "org": "vercel",
    "repo": "next.js",
    "repoUrl": "https://github.com/vercel/next.js",
    "signal": "claude-md",
    "filePath": "CLAUDE.md",
    "fileSize": 4231,
    "lastModified": "2026-04-12T14:23:00Z",
    "repoStars": 128000,
    "repoLanguage": "JavaScript"
  },
  {
    "_summary": true,
    "org": "vercel",
    "reposScanned": 30,
    "signalsFound": {
      "claude-md": 3,
      "agents-md": 1,
      "cursor": 8,
      "continue": 0,
      "copilot-instructions": 2,
      "aider": 0,
      "windsurf": 0
    },
    "totalReposWithAnyAiSignal": 12,
    "adoptionPct": 40
  }
]

You can download the dataset in various formats such as JSON, HTML, CSV, or Excel.
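
To make the relationship between hit rows and the _summary row concrete, the sketch below recomputes a summary from a list of hits. It assumes adoptionPct is the rounded share of scanned repos with at least one hit, which matches the sample (12 of 30 is 40) but is not confirmed by the source. The function name `summarize` and the repo names are illustrative.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize(org: str, repos_scanned: int, hits: List[Dict[str, Any]],
              signals: List[str]) -> Dict[str, Any]:
    """Build a summary row: per-signal counts plus the share of repos with any hit."""
    counts = Counter(hit["signal"] for hit in hits)
    # A repo with several signals is counted once toward adoption.
    repos_with_signal = {hit["repo"] for hit in hits}
    pct = round(100 * len(repos_with_signal) / repos_scanned) if repos_scanned else 0
    return {
        "_summary": True,
        "org": org,
        "reposScanned": repos_scanned,
        "signalsFound": {signal: counts.get(signal, 0) for signal in signals},
        "totalReposWithAnyAiSignal": len(repos_with_signal),
        "adoptionPct": pct,
    }


# Illustrative data: two repos with hits out of five scanned.
hits = [
    {"repo": "repo-a", "signal": "claude-md"},
    {"repo": "repo-a", "signal": "cursor"},
    {"repo": "repo-b", "signal": "cursor"},
]
summary = summarize("example-org", 5, hits, ["claude-md", "agents-md", "cursor"])
```

This also explains why the per-signal counts in signalsFound can add up to more than totalReposWithAnyAiSignal: one repo may match several signals.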

Data table

| Field | Type | Description |
|---|---|---|
| org | string | The org you scanned |
| repo | string | Repo name (short, no owner prefix) |
| repoUrl | string | Full GitHub URL |
| signal | string | One of: claude-md, agents-md, cursor, continue, copilot-instructions, aider, windsurf |
| filePath | string | Path to the file/dir that matched |
| fileSize | number | File size in bytes (0 for directories) |
| lastModified | string | ISO-8601 date of the latest commit touching the path |
| repoStars | number | Stargazer count |
| repoLanguage | string | Primary language of the repo |
| _summary | boolean | True on the final summary row only |
| signalsFound | object | Per-signal count (only on summary row) |
| adoptionPct | number | % of scanned repos with at least one AI signal |

Pricing / Cost estimation

This is a lightweight HTTP-only Actor with no headless browser. A 30-repo scan typically costs a few hundredths of a compute unit and finishes inside a minute.

How much does it cost to fingerprint a GitHub org? On a free Apify account, runs fit comfortably within the monthly free tier. Even a large scan (100 repos × 7 signals is at least 700 probes, up to ~800 API calls in total) costs pennies of compute; the bottleneck is GitHub's rate limit, not Apify compute. Set the GITHUB_TOKEN env var to raise that limit from 60 to 5000 requests per hour.
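
The arithmetic behind the rate-limit bottleneck can be sketched as follows. This is a rough lower-bound estimate under stated assumptions: one probe per signal per repo, one listing call per 100 repos, and no extra calls for commit lookups or for the additional cursor paths. Both function names are hypothetical.

```python
import math


def estimate_api_calls(repos: int, probes_per_repo: int, per_page: int = 100) -> int:
    """Listing calls plus one contents probe per signal per repo (lower bound)."""
    listing_calls = math.ceil(repos / per_page)
    return listing_calls + repos * probes_per_repo


def hourly_windows_needed(total_calls: int, limit_per_hour: int) -> int:
    """How many one-hour rate-limit windows the calls would span."""
    return math.ceil(total_calls / limit_per_hour)


calls = estimate_api_calls(repos=100, probes_per_repo=7)    # 701
anonymous_windows = hourly_windows_needed(calls, 60)         # 12
token_windows = hourly_windows_needed(calls, 5000)           # 1
```

Under these assumptions an anonymous 100-repo scan cannot finish within a single hourly window, while an authenticated one fits easily.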

Tips and advanced options

  • Set GITHUB_TOKEN in the Actor's env vars to get 5000 req/hour instead of 60 (anonymous). Use a fine-grained token with read-only public-repo access.
  • Schedule it monthly via the Apify schedule feature on a list of competitor / portfolio orgs to track adoption trends.
  • Pipe to Slack / Notion / Sheets via Apify integrations to keep a live "who's using Claude Code" dashboard.
  • Narrow signals to one or two (["claude-md"]) to scan more repos within the rate-limit budget.
  • Pair with Apify MCP Server Catalog (Actor #1 in the portfolio) to also surface MCP server adoption inside the same org.

FAQ, disclaimers, and support

Is this legal? Yes — the Actor only calls the public GitHub REST API (api.github.com) at endpoints /orgs/{org}/repos, /repos/{owner}/{repo}/contents/{path}, and /repos/{owner}/{repo}/commits. It respects GitHub's anonymous rate limit (60 req/hr) by stopping early and saving partial results. No scraping of HTML pages, no auth bypass, no ToS issues.
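
For readers who want to see what those three endpoints look like in practice, here is a minimal sketch that builds the URLs. The helper names are hypothetical, the query parameters (`sort`, `per_page`, `path`) are standard GitHub REST API options rather than confirmed Actor behavior, and the path map covers only the files this README names; paths for Aider, Continue, and Windsurf are not documented here and are left out.

```python
from urllib.parse import quote, urlencode

API = "https://api.github.com"

# Only the paths named in this README.
SIGNAL_PATHS = {
    "claude-md": ["CLAUDE.md"],
    "agents-md": ["AGENTS.md"],
    "cursor": [".cursorrules", ".cursor", ".cursor/rules"],
    "copilot-instructions": [".github/copilot-instructions.md"],
}


def repos_url(org: str, per_page: int = 30) -> str:
    """List an org's repos, most recently pushed first."""
    query = urlencode({"sort": "pushed", "per_page": per_page})
    return f"{API}/orgs/{quote(org)}/repos?{query}"


def contents_url(owner: str, repo: str, path: str) -> str:
    """Probe one path; a 404 response means the signal is absent."""
    return f"{API}/repos/{quote(owner)}/{quote(repo)}/contents/{quote(path)}"


def last_commit_url(owner: str, repo: str, path: str) -> str:
    """Fetch the latest commit touching a path (source of lastModified)."""
    query = urlencode({"path": path, "per_page": 1})
    return f"{API}/repos/{quote(owner)}/{quote(repo)}/commits?{query}"


probe_urls = [
    contents_url("vercel", "next.js", path)
    for paths in SIGNAL_PATHS.values()
    for path in paths
]
```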

What about private repos? Anonymous scans skip them (they don't appear in the response). If you set GITHUB_TOKEN on the Actor env, private repos your token can read will also be included.

Why is adoptionPct 0%? Either the org doesn't use these tools (yet), or you hit the rate limit before any signal was found — check the _summary.rateLimitHit field and the Actor log.
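
A downstream script can make that distinction explicit. The sketch below assumes the summary row carries the rateLimitHit field mentioned above (it does not appear in the sample output, so it is read defensively); `interpret_summary` is a hypothetical helper.

```python
from typing import Any, Dict


def interpret_summary(summary: Dict[str, Any]) -> str:
    """Separate 'no adoption found' from 'scan cut short by the rate limit'."""
    if summary.get("rateLimitHit"):
        return "inconclusive: rate limit hit, results are partial"
    if summary.get("adoptionPct", 0) == 0:
        return "no AI-tool signals found in the scanned repos"
    return f"{summary['adoptionPct']}% of scanned repos show at least one signal"


verdict = interpret_summary({"_summary": True, "adoptionPct": 0, "rateLimitHit": True})
```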

Found a bug or want a new signal added (e.g. .zed/, roo-cline, cline)? Open an issue on the GitHub repo or contact Ian Mu directly. Custom variants of this Actor (e.g. scanning an entire user's starred-repo network) are available on request.

Limitations:

  • Anonymous rate limit (60/hr) restricts you to ~7–8 repos worth of probes per run unless you set GITHUB_TOKEN.
  • Detection is filename-based only: a repo whose CLAUDE.md contains nothing but "TODO" still counts as a hit. Filter on fileSize to exclude near-empty files.
  • The cursor signal matches .cursorrules OR .cursor/ OR .cursor/rules; you can refine post-hoc by filePath.
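
The last two limitations can be handled after export. The following sketch filters dataset rows by fileSize and, optionally, narrows cursor hits to the .cursor/rules path. The 200-byte threshold, the function name `filter_hits`, and the sample rows are illustrative assumptions. Because directories report a size of 0, they are kept regardless of the threshold (a truly empty file would be kept too).

```python
from typing import Any, Dict, List


def filter_hits(rows: List[Dict[str, Any]], min_bytes: int = 200,
                cursor_rules_only: bool = False) -> List[Dict[str, Any]]:
    """Drop near-empty files and, optionally, cursor hits outside .cursor/rules."""
    kept = []
    for row in rows:
        if row.get("_summary"):
            continue  # the summary row is not a hit
        is_directory = row["fileSize"] == 0  # directories report size 0
        if not is_directory and row["fileSize"] < min_bytes:
            continue
        if (cursor_rules_only and row["signal"] == "cursor"
                and not row["filePath"].startswith(".cursor/rules")):
            continue
        kept.append(row)
    return kept


# Illustrative rows, not real scan results.
rows = [
    {"repo": "repo-a", "signal": "claude-md", "filePath": "CLAUDE.md", "fileSize": 4},
    {"repo": "repo-b", "signal": "claude-md", "filePath": "CLAUDE.md", "fileSize": 4231},
    {"repo": "repo-c", "signal": "cursor", "filePath": ".cursorrules", "fileSize": 900},
    {"repo": "repo-d", "signal": "cursor", "filePath": ".cursor/rules", "fileSize": 0},
    {"_summary": True, "adoptionPct": 40},
]
substantial = filter_hits(rows)
rules_only = filter_hits(rows, cursor_rules_only=True)
```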

MIT license. See github.com/ianymu/claude-verify-before-stop for the broader Ian Mu Actor portfolio playbook.