GitHub Repo Search — Stars, Language & Topics

Pricing

from $1.00 / 1,000 repos fetched


Search and scrape GitHub repositories by keyword, language, stars, forks, or topic. Extract structured repo metadata including owner, license, topics, and activity timestamps. Sort by stars, forks, or recently updated. Export to JSON, CSV, or API. No token required.


Rating

0.0

(0)

Developer

ryan clinton

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 22
  • Monthly active users: 13
  • Issues response: 1.2 hours
  • Last modified: 12 days ago


GitHub Repo Intelligence — Search, Score & Monitor

GitHub Repo Intelligence turns repositories into decisions.

It models how GitHub projects grow, decay, and fail — then classifies each by trajectory: GROWING, STABLE, DECLINING, COLLAPSING, or REVIVING.

A repo with recent commits but no releases and one maintainer may look active — but gets classified as COLLAPSING with HIGH abandonment risk. That's the difference between seeing activity and understanding behavior.

  • "Should I adopt this?" → STRONGLY_RECOMMENDED / CAUTION / HIGH_RISK
  • "Is this project dying?" → decay score, trajectory, time-to-critical-risk
  • "What's trending?" → star velocity, breakout detection, acceleration
  • "How does this compare?" → percentile benchmarks, side-by-side scoring, winner

If you just need GitHub search, this is overkill. If you need to decide what to trust, this is what you use.

Also known as: github repository intelligence api, github repo health check, github dependency audit tool, github supply chain risk assessment, open source risk analysis, abandoned github repo detector


GitHub Repo Intelligence is an open-source intelligence engine — not a search tool. It replaces manual repo evaluation with automated lifecycle, risk, and adoption analysis. It turns repositories into decisions.

The key idea: trajectory

Instead of stars, commits, and dates — think in trajectory:

| Trajectory | Meaning |
|---|---|
| GROWING | Accelerating adoption and activity |
| STABLE | Mature and consistent |
| DECLINING | Slowing down |
| COLLAPSING | Rapidly losing activity |
| REVIVING | Coming back after dormancy |

One field. One decision.

How it works

Query → enrich → score → classify → predict → recommend

Core concepts

  • Lifecycle intelligence — Models project stages: ACTIVE → STABLE → SLOWING → AT_RISK → ABANDONED
  • Decay detection — Measures how fast a repo is declining (score 0-100 + velocity)
  • Zombie detection — Flags repos with recent pushes but no real activity
  • Revival detection — Identifies dormant projects that came back to life
  • Predictive forecasting — Estimates growth, maintenance risk, and abandonment probability
  • Decision output — Every repo gets a verdict, risk level, and actionable notes
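
The actor's exact heuristics aren't published on this page, but the trajectory idea can be sketched with a toy classifier. The thresholds and 90-day windows below are illustrative assumptions, not the actor's model:

```python
def classify_trajectory(commits_recent: int, commits_prior: int,
                        days_since_push: int) -> str:
    """Map simple activity deltas to a trajectory label.

    commits_recent / commits_prior: commit counts in the last 90 days
    vs the 90 days before that. All thresholds here are illustrative.
    """
    if commits_prior == 0 and commits_recent > 0 and days_since_push < 30:
        return "REVIVING"      # dormant project showing fresh activity
    if commits_prior == 0 and commits_recent == 0:
        # no activity at all: how stale decides the severity
        return "COLLAPSING" if days_since_push > 180 else "DECLINING"
    ratio = commits_recent / max(commits_prior, 1)
    if ratio >= 1.5:
        return "GROWING"
    if ratio <= 0.25:
        return "COLLAPSING"
    if ratio < 0.75:
        return "DECLINING"
    return "STABLE"
```

The point of the sketch: one categorical field summarizes a direction that would otherwise require eyeballing several raw metrics.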

What can you do with this?

  • Find safe dependencies before adopting them
  • Detect dying projects before they break your stack
  • Identify trending repos in a technology category
  • Compare competing frameworks with objective scoring
  • Monitor ecosystem changes on a schedule
  • Assess supply-chain risk across your dependency tree
  • Source maintainers with reachable contact information

Without vs with

| Without | With GitHub Repo Intelligence |
|---|---|
| Raw search results with stars and dates | Every repo scored, benchmarked, and classified |
| No risk signals | Early detection of decay, zombies, and abandonment |
| Manual evaluation of each repo | Predictive insights on project trajectory |
| No warning until a project is dead | Decision-ready verdicts: adopt, caution, or avoid |

At a glance

  • What it is — Open-source lifecycle intelligence engine
  • Best for — Dependency auditing, adoption evaluation, market mapping, outreach, due diligence
  • Speed — 30 repos in ~5 s; 100 enriched repos in ~3 min with token
  • Pricing — $0.15 per repository (pay-per-event)
  • Output — Ranked, scored dataset; 3 views; JSON/CSV/Excel
  • Modes — 6 solution modes + compare mode
  • Monitoring — Cross-run change detection + trend intelligence
  • Scale — Auto-partition past 1,000-result cap; up to 10,000 repos

Who it's for: Engineering teams, VCs, security auditors, recruiters, researchers. When to use it: When you need to evaluate repos — not just search them.

What you get from one call

Per repo:

  • 5 intelligence scores (health, adoption, risk, community, outreach)
  • Lifecycle status (ACTIVE → ABANDONED) with trajectory
  • Decay score + velocity + time-to-critical-risk
  • Decision verdict (STRONGLY_RECOMMENDED → HIGH_RISK)
  • Percentile benchmarks ("top 8% of 847 repos")
  • Predictive forecast (growth, risk, abandonment probability)
  • Community profile, activity stats, contributor intelligence

Per run (KV summary):

  • Leaderboards (top 10 by each score)
  • Market intelligence (category distributions, declining signals)
  • Narrative summary (Slack-ready insights)
  • Change detection (NEW, SCORE_CHANGE, NEWLY_ABANDONED)
  • Query coverage report with confidence level

What makes this different

Maintenance intelligence is the core differentiator — no other tool models repository lifecycle this deeply:

| Signal | What it tells you |
|---|---|
| 5-stage lifecycle (ACTIVE → ABANDONED) | Where the repo is now |
| Decay score (0-100) + velocity (FAST/SLOW) | How fast it's declining |
| Trajectory (GROWING/STABLE/DECLINING/COLLAPSING/REVIVING) | Single-glance direction |
| Time-to-critical-risk ("60-120 days") | When to act |
| Zombie detection | Is the activity real or fake? |
| Revival detection | Did a dead project come back? |
| Feature-complete detection | Is it done, not dying? |
| Bus factor + impact ("PROJECT_LIKELY_STALLS") | What happens if the maintainer leaves? |
| Predictive forecast (growth + risk + abandonment) | Where is this repo going? |

Intelligence layer — every repo gets scored, benchmarked, and judged:

  • 5 composite scores (0-100) with weighted factor breakdowns
  • Percentile benchmarks ("top 8% of 847 repos")
  • Decision verdicts (STRONGLY_RECOMMENDED → HIGH_RISK)
  • Ranking explanations for top 3 ("why this beat #2")

Search and discovery:

  • 6 solution modes (market-map, dependency-audit, adoption-shortlist, maintainer-outreach, trend-watch, repo-due-diligence)
  • Compare repos side-by-side with winner selection
  • Auto-partition past 1,000-result cap (up to 10,000)
  • Trend intelligence with breakout detection

Monitoring — turns one-off searches into scheduled intelligence:
  • Cross-run change detection (NEW, SCORE_CHANGE, NEWLY_ABANDONED)
  • Category-level market intelligence (distributions, declining signals, breakouts)
  • Leaderboards + narrative summaries (paste-ready for Slack/reports)
  • Query coverage report with confidence level

GitHub Repo Intelligence turns repositories into decisions.

Quick answers

What is GitHub Repo Intelligence? An Apify actor that searches, scores, and monitors GitHub repositories with 5 composite intelligence signals (health, adoption, community, risk, outreach), 6 solution modes, auto-partitioning past the 1,000-result cap, and cross-run change detection.

How do I score GitHub repos by health and adoption readiness? Select a solution mode like adoption-shortlist or enable enrichRepoData. The actor fetches community profiles, activity stats, and contributor data, then computes 5 weighted scores (0-100) with plain-English explanations.

What makes it different from other GitHub scrapers? GitHub Repo Intelligence is the only GitHub actor on Apify with composite intelligence scoring, solution modes, auto-partitioning past 1,000 results, cross-run change detection, community profile enrichment, and contributor concentration analysis.

Can it get more than 1,000 GitHub search results? Yes. Enable autoPartitionResults and the actor splits broad queries by star ranges, fetching up to 10,000 repos per run with automatic deduplication.

Can I compare specific repos side-by-side? Yes. Use compareRepos: ["facebook/react", "vuejs/vue"] — the actor fetches each directly, scores them, and picks a winner with a full comparison in the KV summary.

What does the output look like for decision-makers? Each repo gets an adoptionVerdict (STRONGLY_RECOMMENDED → HIGH_RISK), riskLevel, maintenanceStatus, percentile benchmarks ("top 8% of repos"), and contextual notes. The KV summary includes leaderboards, narrative insights, and risk warnings — ready to paste into Slack or reports.

How much does it cost? $0.15 per repository scored. The default run of 30 repos costs $4.50. No subscription required.

Does it require a GitHub token? No — basic search works without one. A free token (no scopes needed) triples throughput to 30 requests/minute and is recommended for enrichment and large runs.

At a glance

Quick facts:

  • Input: Search query + optional solution mode, filters, and enrichment toggles
  • Output: Ranked, scored dataset with 3 views (Overview, Intelligence, Details)
  • Pricing: $0.15 per repo fetched (pay-per-event)
  • Batch size: Up to 10,000 repos per run (with auto-partition)
  • Memory: 128 MB default, 1024 MB max
  • Auth modes: Unauthenticated (10 req/min) or token-authenticated (30 req/min)
  • Intelligence: 5 scores (health, adoption, community, risk, outreach) with explanations
  • Modes: 6 solution modes for common workflows
  • Monitoring: Cross-run change detection for scheduled runs

Input → Output:

  • Input: Search query + solution mode + optional filters
  • Process: Search → enrich → score → rank → diff → push
  • Output: Ranked repos with intelligence scores, community profiles, activity stats, and change flags

Best fit: Dependency auditing, adoption evaluation, market mapping, maintainer outreach, open-source due diligence, technology trend tracking. Not ideal for: Code search within files, private repository access, real-time streaming. Does not include: File content indexing, star count history, or GraphQL API access.

Problems this solves:

  • Which repos are safe to adopt? (adoption readiness score + community profile)
  • Which dependencies carry supply-chain risk? (risk score + bus factor + signed commits)
  • Which maintainers are active and reachable? (outreach score + contributor emails)
  • What's trending in my category? (health score + activity stats + auto-partition)
  • What changed since last run? (cross-run monitoring + change detection)

Common questions this actor answers:

Which Python ML frameworks are healthiest? Run with mode: "adoption-shortlist", query: "machine learning", language: "python" — results ranked by adoption readiness score.

Which of my dependencies have supply-chain risk? Run with mode: "dependency-audit" and search for each dependency — the supplyChainRiskScore flags license issues, abandoned status, single-maintainer projects, and unsigned commits.

Who are the most reachable maintainers in a niche? Run with mode: "maintainer-outreach" — ranks results by outreach score and auto-extracts contributor emails.

What's new in my tech category since last week? Enable compareToPreviousRun on a scheduled run — the actor flags NEW repos, SCORE_CHANGE, and NEWLY_ABANDONED repos.

What is a GitHub repository search tool?

A GitHub repository search tool queries GitHub's database of 300M+ public repositories and returns structured metadata filtered by criteria like keywords, programming language, star count, and topics. GitHub Repo Intelligence goes beyond the basic search interface by adding abandoned detection, enrichment data, and contributor email extraction — returning export-ready datasets rather than paginated web results.

What data can you extract?

| Data Point | Source | Availability | Example |
|---|---|---|---|
| Repository name | Search API | Always | scrapy/scrapy |
| Stars | Search API | Always | 53200 |
| Forks | Search API | Always | 10580 |
| Primary language | Search API | When set by owner | Python |
| Topics | Search API | When set by owner | ["web-scraping", "python"] |
| License | Search API | When set by owner | BSD-3-Clause |
| Owner type | Search API | Always | Organization |
| Abandoned status | Computed | Always | isAbandoned: true (365+ days) |
| Language breakdown | Languages API | With enrichment enabled | {"Python": 78.3, "Cython": 21.7} |
| Latest release | Releases API | With enrichment enabled | tag: "v2.11.1" |
| Contributor count | Contributors API | With enrichment enabled | 347 |
| Contributor emails | Commits API | With email extraction enabled | dev@scrapy.org |

Why use GitHub Repo Intelligence?

Querying the GitHub Search API directly requires building pagination logic, handling rate limits and retries, transforming nested responses into flat records, and managing authentication headers. For a batch of 500 repos, that is 5 paginated requests with proper delay timing, response validation, and error handling — before you write any analysis code.
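
To illustrate the boilerplate, here is a minimal raw-API sketch of that pagination loop using only the Python standard library. The 2-second delay is a conservative assumption for unauthenticated rate limits, not a documented requirement:

```python
import json
import time
import urllib.parse
import urllib.request

API = "https://api.github.com/search/repositories"

def build_search_url(query: str, page: int, per_page: int = 100) -> str:
    """Build one paginated GitHub Search API request URL."""
    qs = urllib.parse.urlencode(
        {"q": query, "sort": "stars", "per_page": per_page, "page": page})
    return f"{API}?{qs}"

def fetch_repos(query: str, total: int = 500, token: str = "") -> list:
    """Fetch `total` repos: 100 per page, so 500 repos = 5 requests,
    each needing delays, auth headers, and response validation."""
    repos = []
    for page in range(1, -(-total // 100) + 1):  # ceil(total / 100) pages
        req = urllib.request.Request(build_search_url(query, page))
        req.add_header("Accept", "application/vnd.github+json")
        if token:
            req.add_header("Authorization", f"Bearer {token}")
        with urllib.request.urlopen(req, timeout=30) as resp:
            repos.extend(json.load(resp)["items"])
        time.sleep(2)  # stay under the unauthenticated search rate limit
    return repos[:total]
```

And this sketch still omits retry logic, 403/429 handling, and flattening of the nested response records.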

GitHub Repo Intelligence handles all of that in one run. Enter a query, click Start, and download structured results in JSON, CSV, or Excel.

Key difference: GitHub Repo Intelligence is the only GitHub search actor on Apify that includes abandoned repo detection, language breakdown enrichment, and contributor email extraction in one tool.

| Feature | GitHub Repo Intelligence | automation-lab/github-scraper | fresh_cliff/github-scraper |
|---|---|---|---|
| Intelligence scores (0-100) | 5 scores with explanations | No | No |
| Percentile benchmarks | Yes ("top 8% of category") | No | No |
| Decision recommendations | Yes (verdict + risk + notes) | No | No |
| Compare repos mode | Yes (side-by-side + winner) | No | No |
| Predictive forecast | Yes (growth + risk + abandonment) | No | No |
| Trend intelligence | Yes (velocity + breakout detection) | No | No |
| Solution modes | 6 pre-configured workflows | No | No |
| Auto-partition (>1,000 results) | Yes (up to 10,000) | No | No |
| Cross-run change detection | Yes | No | No |
| Ranking explanations (why #1 won) | Yes (top 3 repos) | No | No |
| Category market intelligence | Yes (distributions + signals) | No | No |
| Leaderboards + narrative summary | Yes (in KV store) | No | No |
| Query coverage report | Yes (confidence + partitions) | No | No |
| Community profile enrichment | Yes | No | No |
| Activity stats (90d/365d) | Yes | No | No |
| Contributor concentration | Yes (bus factor proxy) | No | No |
| Signed commit ratio | Yes | No | No |
| Maintenance intelligence | Yes (5-stage + decay score + zombie) | No | No |
| Bus factor risk | Yes (impact prediction) | No | No |
| Language breakdown (%) | Yes | No | AI-powered tech stack |
| Contributor email extraction | Yes | No | No |
| Price per repo | $0.001 | Higher compute costs | Varies |
| Best for | Intelligence + decisions | Broad GitHub scraping | Tech stack detection |

Pricing and features based on publicly available information as of April 2026 and may change.

  • vs raw GitHub API: "Unlike building raw API integrations, GitHub Repo Intelligence handles pagination, rate limits, and response transformation automatically."
  • vs basic scrapers: "Unlike basic GitHub scrapers that return 10-15 fields, GitHub Repo Intelligence returns 31 fields with computed abandoned detection."

Platform capabilities

  • Scheduling — Run daily or weekly to track new repos appearing for any topic or technology
  • API access — Trigger from Python, JavaScript, or any HTTP client via the Apify API
  • Monitoring — Slack or email alerts when runs fail or complete
  • Integrations — Zapier, Make, Google Sheets, webhooks for downstream automation
  • Spending limits — Set a maximum charge per run to control costs on large batches

Features

GitHub Repo Intelligence combines GitHub Search API access with composite scoring, enrichment, and monitoring — turning raw search results into decision-ready intelligence.

Intelligence layer:

  • 5 composite scores (0-100) — projectHealthScore, adoptionReadinessScore, communityScore, supplyChainRiskScore, outreachScore — each with weighted factor breakdowns and plain-English explanations
  • Community profile — README presence, contributing guide, code of conduct, issue/PR templates, license detection, overall health percentage
  • Activity stats — 90-day and 365-day commit counts, weekly commit average
  • Contributor intelligence — Team size, top contributor concentration (bus factor), signed commit ratio
  • Abandoned repo detection — daysSinceLastPush and isAbandoned flag for every result
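
The weighted factor breakdowns behind each composite score can be pictured as a simple weighted sum. The factor names and weights below are illustrative assumptions, not the actor's published weighting:

```python
def composite_score(factors):
    """Combine 0-100 factor scores into one weighted composite.

    `factors` maps name -> (score_0_to_100, weight); weights should
    sum to 1.0. Returns the rounded composite plus a plain-English
    breakdown in the style the actor's explanations use.
    """
    total = sum(score * weight for score, weight in factors.values())
    breakdown = [f"{name} ({score:.0f}/100, weight {weight:.0%})"
                 for name, (score, weight) in factors.items()]
    return round(total), breakdown

# Hypothetical factors for a health-style score:
health, why = composite_score({
    "recent push":      (100, 0.30),
    "commit activity":  (100, 0.20),
    "star velocity":    (100, 0.15),
    "release recency":  (80, 0.20),
    "community health": (60, 0.15),
})
```

The breakdown list is what makes a 0-100 number auditable: each entry records which factor contributed and at what weight.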

Solution modes:

  • market-map — Discover and rank repos in a category. Auto-partitions, enriches community + languages + releases.
  • dependency-audit — Assess supply-chain risk. Enriches community + activity + contributors + releases. Sorted by risk.
  • adoption-shortlist — Find safe-to-adopt repos. Full enrichment. Sorted by adoption readiness.
  • maintainer-outreach — Find reachable maintainers. Enriches contributors + emails. Sorted by outreach score.
  • trend-watch — Spot rising projects. Enriches activity + languages + releases. Sorted by health.
  • repo-due-diligence — Full intelligence on specific repos. All enrichments + emails enabled.

Search and filtering:

  • Full GitHub search syntax — language:python, topic:react, stars:>1000, created:>2024-01-01, license:mit, archived:false
  • Auto-partition — Breaks past 1,000-result cap by splitting queries across star ranges with deduplication. Up to 10,000 results.
  • Exclude forks and archived repos — Dedicated filter toggles
  • Four sort modes — Stars, forks, recently updated, or best match

Monitoring:

  • Cross-run change detection — Stores state in named KV store. Flags repos as NEW, SCORE_CHANGE, STATUS_CHANGE, or NEWLY_ABANDONED.
  • Diff summary — New repo count, score changes, status changes in KV summary output.

Enrichment:

  • Language breakdown — Percentage breakdown of all languages per repo
  • Latest release — Tag, name, publish date, days since release
  • Contributor emails — Real addresses from git commits, noreply/bot filtered
  • Circuit breaker — Stops enrichment after 5 consecutive failures

Reliability:

  • Rate limit recovery — Automatic 60-second wait and retry on 403/429 responses
  • Server error retry — Retries up to 2 times on 5xx errors with 10-second backoff
  • PPE cost transparency — Charges shown at run start, tracked in status messages, stops at spending limit
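
The retry policy described above can be sketched as a generic wrapper. The `fetch` callable and injectable `sleep` are illustrative scaffolding, not the actor's internals:

```python
import time

def with_retries(fetch, max_server_retries=2, sleep=time.sleep):
    """Apply the stated retry policy to a `fetch()` returning
    (status_code, body): wait 60 s and retry once on a rate limit
    (403/429); retry up to `max_server_retries` times on 5xx with a
    10 s backoff; otherwise return the response as-is."""
    server_retries = 0
    rate_limit_retried = False
    while True:
        status, body = fetch()
        if status in (403, 429) and not rate_limit_retried:
            rate_limit_retried = True
            sleep(60)   # rate limit recovery window
            continue
        if 500 <= status < 600 and server_retries < max_server_retries:
            server_retries += 1
            sleep(10)   # server error backoff
            continue
        return status, body
```

Making `sleep` a parameter keeps the policy testable without real waits.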

Best for open-source due diligence

Use mode: "adoption-shortlist" when evaluating which frameworks or libraries are safe to adopt. The adoptionReadinessScore considers license clarity, community profile completeness, release recency, active maintenance, and contributor breadth. Supply-chain risk flags highlight single-maintainer projects, unsigned commits, and missing licenses. Key outputs: scores.adoptionReadinessScore, scores.supplyChainRiskScore, communityProfile, contributors.

Best for market mapping and trend discovery

Use mode: "market-map" with autoPartitionResults: true to map an entire technology category beyond the 1,000-result cap. Results are ranked by project health score with community profile and release data. Enable compareToPreviousRun on a weekly schedule to track which projects are gaining momentum. Key outputs: scores.projectHealthScore, rank, changeType, activityStats.

Best for maintainer and contributor intelligence

Use mode: "maintainer-outreach" to find active maintainers with reachable contact information. The outreach score considers contact availability, recent activity, public presence, and project popularity. Emails are extracted from git commit history with noreply filtering. Key outputs: scores.outreachScore, contributors.emails, contributors.topContributorShare.

Best for dependency and security auditing

Use mode: "dependency-audit" to assess supply-chain risk across your open-source dependencies. The supplyChainRiskScore flags missing licenses, abandoned repos, single-maintainer projects, unsigned commits, and stale releases. Contributor concentration reveals bus-factor risk. Key outputs: scores.supplyChainRiskScore, explanations.supplyChainRiskFlags, contributors.signedCommitRatio.

Best for scheduled monitoring

Enable compareToPreviousRun on any mode with Apify Schedules. The actor stores state between runs and flags repos as NEW, SCORE_CHANGE, STATUS_CHANGE, or NEWLY_ABANDONED. Combine with Slack or email integrations for automated alerts when dependencies show maintenance decay.

How to search GitHub repositories

  1. Enter a search query — Type keywords like "web scraping" or use GitHub qualifiers: topic:react language:typescript stars:>1000.
  2. Set filters — Choose a sort order (stars, forks, updated, best-match), set a minimum star count, and pick a language.
  3. Run the actor — Click Start. A search of 30 repos completes in under 5 seconds. 1,000 repos takes about 22 seconds with a token.
  4. Download results — Open the Dataset tab and export as JSON, CSV, or Excel. Or connect via the Apify API for automated workflows.

First run tips

  • Start with 30 results — The default maxResults of 30 lets you verify your query returns relevant repos before scaling to hundreds.
  • Use GitHub qualifiers in the query field — Write topic:machine-learning language:python stars:>500 directly in the query for precise targeting.
  • Get a free GitHub token for speed — A personal access token (no scopes needed) triples the rate limit from 10 to 30 requests/minute. Create one at github.com/settings/tokens.
  • Enable enrichment selectively — enrichRepoData adds ~3 API calls per repo. Test with a small batch first to verify you need the extra data.
  • Email extraction needs a token for large batches — Extracting contributor emails from 50+ repos without a token will be slow. Provide a githubToken when using this feature at scale.

Typical performance

| Metric | Typical value |
|---|---|
| Repos per run | 1–10,000 (with auto-partition) |
| Run time (30 repos, no token) | ~5 seconds |
| Run time (1,000 repos, with token) | ~22 seconds |
| Run time (1,000 repos, no token) | ~65 seconds |
| Run time (100 repos with enrichment + token) | ~3 minutes |
| Cost per repo | $0.001 |
| Memory used | 128 MB |

Observed in internal testing (April 2026, n=50 runs). Run times vary based on GitHub API response times and enrichment settings.

Input parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | Yes | "web scraping language:python" | Search query. Supports GitHub qualifiers: language:python, topic:react, stars:>1000. |
| mode | string | No | | Solution mode: market-map, dependency-audit, adoption-shortlist, maintainer-outreach, trend-watch, or repo-due-diligence. Auto-selects enrichments and sorting. |
| sortBy | string | No | stars | Sort order: stars, forks, updated, or best-match. Overridden by mode if set. |
| minStars | integer | No | | Minimum star count. Appended as stars:>=N. |
| language | string | No | | Programming language filter. |
| maxResults | integer | No | 30 | Maximum repos (1–10,000 with auto-partition). |
| excludeForks | boolean | No | false | Filter out forked repositories. |
| excludeArchived | boolean | No | false | Filter out archived repositories. |
| autoPartitionResults | boolean | No | false | Break past 1,000-result cap by splitting queries. |
| compareRepos | array | No | | Compare specific repos side-by-side (e.g., ["facebook/react", "vuejs/vue"]). Skips search, scores and picks a winner. Max 20. |
| enrichRepoData | boolean | No | false | Fetch community profile, activity stats, languages, releases, contributor data. Auto-enabled by modes. |
| extractContributorEmails | boolean | No | false | Extract real emails from commits. Auto-enabled by outreach and due-diligence modes. |
| compareToPreviousRun | boolean | No | false | Detect changes since last run (NEW, SCORE_CHANGE, NEWLY_ABANDONED). |
| githubToken | string | No | | GitHub token for 3x rate limits. Recommended for enrichment. No scopes needed. |

Input examples

Adoption shortlist — safe-to-adopt Python ML frameworks:

{
  "query": "machine learning framework",
  "mode": "adoption-shortlist",
  "minStars": 500,
  "language": "python",
  "maxResults": 50,
  "githubToken": "ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
}

Dependency audit — supply-chain risk check:

{
  "query": "topic:web-scraping",
  "mode": "dependency-audit",
  "maxResults": 100,
  "githubToken": "ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
}

Market map — comprehensive category scan past 1,000 cap:

{
  "query": "topic:vector-database",
  "mode": "market-map",
  "maxResults": 5000,
  "autoPartitionResults": true,
  "excludeForks": true,
  "githubToken": "ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
}

Scheduled monitoring — weekly trend watch with change detection:

{
  "query": "topic:ai-agent stars:>100",
  "mode": "trend-watch",
  "maxResults": 200,
  "compareToPreviousRun": true,
  "githubToken": "ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
}

Compare repos — side-by-side framework evaluation:

{
  "compareRepos": ["facebook/react", "vuejs/vue", "sveltejs/svelte", "angular/angular"],
  "enrichRepoData": true,
  "githubToken": "ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
}

Input tips

  • Start with defaults — A query with sortBy: "stars" and maxResults: 30 covers most use cases.
  • Use minStars to cut noise — Setting minStars: 50 filters out toy projects and abandoned experiments.
  • Combine filters — language and minStars inputs are appended to your query automatically. No need to write raw qualifiers.
  • Provide a token for enrichment — Language breakdown, releases, and email extraction make additional API calls. A token prevents rate limiting.
  • Sort by updated for maintenance checks — Surfaces actively maintained repos instead of popular-but-stale ones.

Output example

{
  "rank": 1,
  "fullName": "scrapy/scrapy",
  "name": "scrapy",
  "owner": "scrapy",
  "ownerType": "Organization",
  "ownerUrl": "https://github.com/scrapy",
  "description": "Scrapy, a fast high-level web crawling & scraping framework for Python.",
  "repoUrl": "https://github.com/scrapy/scrapy",
  "stars": 53200,
  "forks": 10580,
  "watchers": 53200,
  "openIssues": 478,
  "language": "Python",
  "topics": ["crawler", "crawling", "framework", "python", "scraping", "web-scraping"],
  "license": "BSD-3-Clause",
  "homepage": "https://scrapy.org",
  "createdAt": "2010-02-22T02:01:14Z",
  "updatedAt": "2026-04-05T08:32:01Z",
  "pushedAt": "2026-04-03T19:44:33Z",
  "daysSinceLastPush": 4,
  "isAbandoned": false,
  "sizeKb": 27894,
  "isArchived": false,
  "isFork": false,
  "defaultBranch": "master",
  "hasWiki": true,
  "hasPages": true,
  "hasDiscussions": true,
  "communityProfile": {
    "healthPercentage": 100,
    "hasReadme": true,
    "hasContributing": true,
    "hasCodeOfConduct": true,
    "hasIssueTemplate": true,
    "hasPullRequestTemplate": true,
    "hasLicense": true
  },
  "activityStats": {
    "commitActivity90d": 87,
    "commitActivity365d": 312,
    "weeklyCommitAvg90d": 6.7
  },
  "contributors": {
    "count": 547,
    "topContributorShare": 0.18,
    "signedCommitRatio": 0.73,
    "emails": ["dev@scrapy.org"]
  },
  "languages": {
    "Python": 78.3,
    "Cython": 12.1,
    "HTML": 5.4,
    "Shell": 2.8,
    "Makefile": 1.4
  },
  "latestRelease": {
    "tag": "v2.12.0",
    "name": "Scrapy 2.12.0",
    "publishedAt": "2026-03-15T14:00:00Z",
    "daysSinceRelease": 23
  },
  "scores": {
    "projectHealthScore": 91,
    "adoptionReadinessScore": 94,
    "communityScore": 88,
    "supplyChainRiskScore": 8,
    "outreachScore": 62
  },
  "benchmarks": {
    "healthPercentile": 92,
    "adoptionPercentile": 88,
    "riskPercentile": 12,
    "communityPercentile": 85,
    "outreachPercentile": 64,
    "categoryRank": 3,
    "totalInCategory": 50
  },
  "recommendations": {
    "adoptionVerdict": "STRONGLY_RECOMMENDED",
    "riskLevel": "LOW",
    "maintenanceStatus": "ACTIVE",
    "outreachFeasibility": "MEDIUM",
    "notes": [
      "High adoption readiness with low supply-chain risk",
      "High contributor diversity",
      "Recent release within 30 days",
      "Strong community governance"
    ]
  },
  "explanations": {
    "projectHealthFactors": [
      "Pushed within last week (100/100, weight 30%)",
      "87 commits in last 90 days (100/100, weight 20%)",
      "3325 stars/year over 16.1 years (100/100, weight 15%)"
    ],
    "supplyChainRiskFlags": [],
    "coverageWarnings": []
  },
  "maintenance": {
    "status": "ACTIVE",
    "daysSinceLastPush": 4,
    "activityTrend": "STEADY",
    "decayScore": 3,
    "decayVelocity": "NONE",
    "trajectory": "STABLE",
    "timeToCriticalRisk": null,
    "isZombie": false,
    "zombieSignals": [],
    "isRevived": false,
    "revivalStrength": null,
    "isFeatureComplete": false,
    "hasMajorVersionStability": true,
    "busFactorRisk": "LOW",
    "ifMaintainerLeaves": "MINIMAL_IMPACT",
    "confidence": "HIGH",
    "confidenceFactors": [
      "Commit history available",
      "Contributor data available (547 contributors)",
      "Community profile available (100% health)",
      "Release data available",
      "Mature repo (16.1 years old)"
    ]
  },
  "forecast": {
    "growthProjection30d": "HIGH",
    "maintenanceRiskProjection": "DECREASING",
    "abandonmentRisk90d": "LOW",
    "confidence": "HIGH",
    "signals": [
      "Strong star momentum (500+/year)",
      "Commit activity accelerating (90d pace > annual pace)",
      "High fork-to-star ratio (active adoption)"
    ]
  },
  "trend": {
    "starsGainedSinceLastRun": 120,
    "daysBetweenRuns": 7,
    "starsVelocityPerDay": 17.14,
    "velocityTrend": "ACCELERATING",
    "healthScoreDelta": 2,
    "isBreakout": false
  },
  "changeType": null,
  "extractedAt": "2026-04-07T12:00:00.000Z"
}

Output fields

| Field | Type | Description |
|---|---|---|
| fullName | string | Full repository name in owner/repo format. |
| name | string | Repository name without owner prefix. |
| owner | string | GitHub username or organization that owns the repo. |
| ownerType | string | Owner account type: User or Organization. |
| ownerUrl | string | URL to the owner's GitHub profile. |
| description | string | Repository description. May be null. |
| stars | integer | Number of stars (stargazers). |
| forks | integer | Number of forks. |
| watchers | integer | Number of watchers. |
| openIssues | integer | Open issues and pull requests count. |
| language | string | Primary programming language. May be null. |
| topics | array | Topic tags assigned to the repository. |
| license | string | SPDX license identifier (e.g., MIT, Apache-2.0). May be null. |
| homepage | string | Project homepage URL. May be null. |
| repoUrl | string | Direct URL to the repository on GitHub. |
| createdAt | string | ISO 8601 timestamp — repository creation date. |
| updatedAt | string | ISO 8601 timestamp — last metadata update. |
| pushedAt | string | ISO 8601 timestamp — most recent push to any branch. |
| sizeKb | integer | Repository size in kilobytes. |
| isArchived | boolean | Whether the repository is archived. |
| isFork | boolean | Whether the repository is a fork. |
| defaultBranch | string | Default branch name (e.g., main, master). |
| daysSinceLastPush | integer | Days since the most recent push. Computed at extraction time. |
| isAbandoned | boolean | true if daysSinceLastPush exceeds 365 days. |
| hasWiki | boolean | Whether the repository has a wiki enabled. |
| hasPages | boolean | Whether the repository has GitHub Pages enabled. |
| hasDiscussions | boolean | Whether the repository has Discussions enabled. |
| communityProfile | object | Community health: hasReadme, hasContributing, hasCodeOfConduct, hasIssueTemplate, hasPullRequestTemplate, healthPercentage. With enrichment or modes. |
| activityStats | object | Commit activity: commitActivity90d, commitActivity365d, weeklyCommitAvg90d. With enrichment or modes. |
| contributors | object | Contributor detail: count, topContributorShare, signedCommitRatio, emails. With enrichment or modes. |
| languages | object | Language breakdown as percentages (e.g., {"Python": 78.3}). With enrichment or modes. |
| latestRelease | object | Latest release: tag, name, publishedAt, daysSinceRelease. null if no releases. With enrichment or modes. |
| scores | object | Intelligence scores: projectHealthScore, adoptionReadinessScore, communityScore, supplyChainRiskScore, outreachScore (0-100 each). With scoring enabled. |
| benchmarks | object | Percentile rankings: healthPercentile, adoptionPercentile, riskPercentile, categoryRank, totalInCategory. Context for every score. |
| recommendations | object | Decision output: adoptionVerdict (STRONGLY_RECOMMENDED → HIGH_RISK), riskLevel, maintenanceStatus, outreachFeasibility, notes array. |
| explanations | object | Factor breakdowns: projectHealthFactors, adoptionReadinessFactors, communityFactors, supplyChainRiskFlags, outreachFactors, coverageWarnings. |
| maintenance | object | Lifecycle intelligence: status (ACTIVE → ABANDONED), decayScore, decayVelocity, timeToCriticalRisk, isZombie, isRevived, isFeatureComplete, busFactorRisk, ifMaintainerLeaves. |
| forecast | object | Predictive projections: growthProjection30d (HIGH → DECLINING), maintenanceRiskProjection (DECREASING → CRITICAL), abandonmentRisk90d, confidence, signals array. |
| trend | object | Trend intelligence: starsGainedSinceLastRun, starsVelocityPerDay, velocityTrend, healthScoreDelta, isBreakout. With compareToPreviousRun. |
| changeType | string | Change since last run: NEW, SCORE_CHANGE, STATUS_CHANGE, NEWLY_ABANDONED, or null. With compareToPreviousRun. |
| previousState | object | Previous run state for audit trail. With compareToPreviousRun. |
| rank | integer | Position in scored results (1 = best). With scoring enabled. |
| extractedAt | string | ISO 8601 timestamp when this record was extracted. |

How much does it cost to search GitHub repositories?

GitHub Repo Intelligence uses pay-per-event pricing — you pay $0.15 per repository fetched. Platform compute costs are included.

| Scenario | Repos | Cost per repo | Total cost |
| --- | --- | --- | --- |
| Quick test | 1 | $0.15 | $0.15 |
| Default run | 30 | $0.15 | $4.50 |
| Evaluation | 100 | $0.15 | $15.00 |
| Team audit | 200 | $0.15 | $30.00 |
| Large batch | 500 | $0.15 | $75.00 |
| Maximum (auto-partition) | 5,000 | $0.15 | $750.00 |
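Pay-per-event pricing is linear, so the total is simply repos × price per repo. A quick sanity check of the table above:

```python
PRICE_PER_REPO = 0.15  # USD per repository fetched (pay-per-event)

def run_cost(repos):
    """Estimated charge for a run that fetches `repos` repositories."""
    return round(repos * PRICE_PER_REPO, 2)

default_run = run_cost(30)     # the 30-repo default
max_run = run_cost(5000)       # the 5,000-repo auto-partition maximum
```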

Set a spending limit in the Apify Console to cap charges per run. The actor stops and saves partial results when the limit is reached.

Apify's free tier includes $5 of monthly credits — enough for 33 repository intelligence reports at no cost.

The GitHub Search API itself is free. There is no cost on the GitHub side for either authenticated or unauthenticated requests.

Search GitHub repositories using the API

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("ryanclinton/github-repo-search").call(run_input={
    "query": "web scraping language:python",
    "sortBy": "stars",
    "minStars": 100,
    "maxResults": 50,
    "enrichRepoData": True,
})

for repo in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{repo['fullName']}: {repo['stars']} stars, abandoned: {repo['isAbandoned']}")

JavaScript

import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

const run = await client.actor("ryanclinton/github-repo-search").call({
  query: "web scraping language:python",
  sortBy: "stars",
  minStars: 100,
  maxResults: 50,
  enrichRepoData: true,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
for (const repo of items) {
  console.log(`${repo.fullName}: ${repo.stars} stars, abandoned: ${repo.isAbandoned}`);
}

cURL

curl -X POST "https://api.apify.com/v2/acts/ryanclinton~github-repo-search/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "web scraping language:python",
    "sortBy": "stars",
    "minStars": 100,
    "maxResults": 50,
    "enrichRepoData": true
  }'

# The POST starts the run; once it finishes, fetch the dataset:
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN&format=json"

How GitHub Repo Intelligence works

Mental model: Query → search (with auto-partition) → transform → enrich → score → rank → diff → push.

Phase 1: Search

The actor constructs a GitHub Search API query, appending filters like minStars, language, fork:false, and archived:false. With auto-partition enabled, it detects queries matching >1,000 repos and recursively splits them by star ranges, deduplicating across partitions. Each page fetches 100 results with rate-limit-aware delays.
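The auto-partition step can be pictured as a recursive split on star ranges. This is a minimal sketch, assuming a hypothetical count_repos(lo, hi) helper that returns the match count for a star range (the real actor gets this from the Search API's total_count):

```python
def partition(query, lo, hi, count_repos, limit=1000):
    """Recursively split a star range until each slice matches <= `limit`
    repos (the GitHub Search API cap), then emit one sub-query per slice."""
    if count_repos(lo, hi) <= limit or lo >= hi:
        return [f"{query} stars:{lo}..{hi}"]
    mid = (lo + hi) // 2
    return (partition(query, lo, mid, count_repos, limit)
            + partition(query, mid + 1, hi, count_repos, limit))

# Toy density for illustration: pretend every star value matches 3 repos.
slices = partition("language:python web scraping", 0, 999,
                   lambda lo, hi: (hi - lo + 1) * 3)
```

Each resulting slice stays under the 1,000-result cap, and results are deduplicated by full name when slices are merged.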

Phase 2: Enrich

When a solution mode is selected or enrichRepoData is enabled, the actor makes parallel API calls per repo: community profile, commit activity stats, contributor stats (concentration + signed commits), languages, and latest release. A circuit breaker stops enrichment after 5 consecutive failures.

Phase 3: Score

The actor computes 5 composite intelligence scores (0-100) from weighted signals. Each score includes plain-English factor breakdowns and coverage warnings when enrichment data is missing. Scores degrade gracefully — base search data produces estimated scores; enrichment provides precise scores.
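The graceful degradation described above amounts to renormalizing weights over whichever signals enrichment actually produced. A minimal sketch; the signal names and weights here are illustrative, not the actor's real factors:

```python
def composite_score(signals, weights):
    """Weighted 0-100 composite. Signals missing from enrichment are
    skipped and the remaining weights renormalized, so base search data
    still yields an (estimated) score instead of failing."""
    present = {k: w for k, w in weights.items() if signals.get(k) is not None}
    if not present:
        return None
    total = sum(present.values())
    return round(sum(signals[k] * w for k, w in present.items()) / total)

weights = {"activity": 0.4, "releases": 0.3, "community": 0.3}
full = composite_score({"activity": 80, "releases": 60, "community": 100}, weights)
estimated = composite_score({"activity": 80, "community": 100}, weights)  # no enrichment
```

With all signals present the score is exact; drop one and the remaining weights are rescaled, yielding the estimated variant.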

Phase 4: Rank and Diff

Results are sorted by the mode-appropriate score and assigned ranks. If compareToPreviousRun is enabled, the actor loads previous state from a named KV store and flags each repo as NEW, SCORE_CHANGE, STATUS_CHANGE, NEWLY_ABANDONED, or unchanged.
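The diff step can be sketched as a small classifier over previous and current state. Field names here are illustrative, not the actor's exact KV-store schema:

```python
def classify_change(prev, curr):
    """Flag what changed for one repo between two scheduled runs."""
    if prev is None:
        return "NEW"                       # first time this repo appears
    if not prev.get("isAbandoned") and curr.get("isAbandoned"):
        return "NEWLY_ABANDONED"           # crossed the abandonment threshold
    if prev.get("maintenanceStatus") != curr.get("maintenanceStatus"):
        return "STATUS_CHANGE"
    if prev.get("healthScore") != curr.get("healthScore"):
        return "SCORE_CHANGE"
    return None                            # unchanged
```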

Phase 5: Push

Results are pushed to the dataset one at a time with PPE charging. The actor saves a summary with score distribution stats to the KV store.

Tips for best results

  1. Use narrow qualifiers for large datasets — topic:react language:typescript stars:>500 created:>2025-01-01 targets more precisely than a broad react query.
  2. Provide a GitHub token for any run over 100 repos — Triples throughput and prevents rate limit delays.
  3. Sort by updated for maintenance auditing — Surfaces repos with recent activity rather than popular-but-stale projects.
  4. Combine with Website Tech Stack Detector — Feed discovered repo homepages into Website Tech Stack Detector to analyze the technologies used by projects you find.
  5. Use archived:false in your query — Excludes archived repos when you only want active projects.
  6. Enable enrichment only when needed — Base search returns 31 fields without any extra API calls. Enrichment adds depth but increases run time.
  7. Schedule weekly runs to track trends — Use Apify Schedules to monitor emerging repos in your technology category over time.

Combine with other Apify actors

| Actor | How to combine |
| --- | --- |
| Website Tech Stack Detector | Feed repo homepage URLs into tech stack detection to analyze project infrastructure |
| Website Contact Scraper | Extract contact info from repo homepage URLs discovered in search results |
| WHOIS Domain Lookup | Look up domain registration data for project homepage URLs |
| Company Deep Research | Research the companies behind popular open-source organizations |
| Website Content to Markdown | Convert repo documentation pages to markdown for LLM/RAG ingestion |
| Bulk Email Verifier | Verify contributor emails extracted from git commits before outreach |

Limitations

  • 1,000 results per query segment — The GitHub Search API hard limit per query. Enable autoPartitionResults to break past this by splitting across star ranges (up to 10,000 total).
  • Rate limits — Unauthenticated: 10 requests/minute. Authenticated: 30 requests/minute. The actor handles this automatically but large runs with enrichment are slower without a token.
  • Search index lag — GitHub's search index may take minutes to reflect newly created repos or updated star counts.
  • No file content search — GitHub Repo Intelligence searches repository metadata (name, description, topics, language), not code or README contents.
  • No private repositories — Only public repos are returned. Private repos require explicit token scopes and are excluded from the Search API.
  • Sort order affects results — When a query matches 1,000+ repos, the 1,000 returned depend on sort order. Switching sorts yields different subsets.
  • Enrichment increases run time — Each enriched repo adds ~3 API calls. Enriching 1,000 repos with a token takes approximately 35 minutes.
  • Email extraction varies by repo — Many contributors use noreply addresses. Expect email hit rates of 20-60% depending on the project.

Integrations

  • Zapier — Push repo data into CRMs, spreadsheets, or project management tools on each run
  • Make — Build multi-step workflows that filter repos and route results to different destinations
  • Google Sheets — Auto-export repo metadata to a shared spreadsheet for team analysis
  • Apify API — Trigger searches programmatically from any language or platform
  • Webhooks — Get notified when a search run completes and process results downstream
  • LangChain / LlamaIndex — Feed structured repo data into AI agents for automated technology analysis

How to detect abandoned GitHub repositories

GitHub Repo Intelligence automatically detects abandoned repositories using lifecycle analysis. Instead of just checking last commit date, it evaluates days since last push, commit activity trends, release gaps, and contributor drop-off.

A repo is classified as:

  • AT_RISK — 180+ days of inactivity or declining commit trends
  • ABANDONED — 365+ days with no activity signals

It also detects zombie repos — projects with recent pushes but no real activity (dependency bumps only, single maintainer, no releases). This is more accurate than manual checks or simple date-based heuristics.
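A zombie check along these lines can be sketched as a conjunction of the signals listed above. This is a heuristic approximation with hypothetical field names, not the actor's exact rule set:

```python
def looks_like_zombie(repo):
    """Recent pushes but no substantive activity (heuristic sketch)."""
    return (repo["daysSinceLastPush"] < 90       # pushes look recent...
            and repo["contributorCount"] <= 1     # ...but a single maintainer
            and repo["releaseCount"] == 0         # no releases at all
            and repo["humanCommits90d"] == 0)     # only bot/dependency bumps
```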

How to check if a GitHub project is still maintained

GitHub Repo Intelligence classifies every repository into a maintenance status:

  • ACTIVE — frequent commits and recent activity
  • STABLE — low but consistent activity (mature projects)
  • SLOWING — declining activity
  • AT_RISK — long gaps or clear decline
  • ABANDONED — no activity for 365+ days

It also shows decay score (0-100), trajectory (GROWING → COLLAPSING), and time-to-critical-risk. This gives a clearer answer than checking commit dates manually.
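Time-to-critical-risk can be pictured as projecting the decay score forward at its current velocity. A back-of-envelope sketch under a linear-decay assumption (the actor's actual forecasting model is not documented here):

```python
def time_to_critical_risk(decay_score, decay_velocity, critical=80):
    """Days until the 0-100 decay score crosses a critical threshold,
    assuming a constant decay velocity in points per day."""
    if decay_score >= critical:
        return 0            # already past the threshold
    if decay_velocity <= 0:
        return None         # not decaying: no projected crossing
    return (critical - decay_score) / decay_velocity
```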

How to evaluate if an open-source project is safe to use

GitHub Repo Intelligence answers this directly with an adoption verdict:

  • STRONGLY_RECOMMENDED — high readiness, low risk
  • RECOMMENDED — minor concerns, generally safe
  • CAUTION — moderate concerns, evaluate alternatives
  • HIGH_RISK — significant risk, not recommended without mitigation

Each verdict is based on license clarity, maintenance activity, contributor diversity (bus factor), release recency, and community health. Instead of manually checking signals, you get a decision.

How to detect risky GitHub dependencies

GitHub Repo Intelligence provides a supply chain risk score (0-100) for every repository. It detects abandoned or declining projects, single-maintainer risk (bus factor), missing or unclear licenses, lack of releases, and unsigned commits.

Each repo is classified by risk level: LOW → MEDIUM → HIGH → CRITICAL. Use mode: "dependency-audit" to audit dependencies automatically instead of manually reviewing each project.
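A dependency-audit run input might look like the following. The mode, enrichRepoData, and githubToken fields are named in this document; the repositories field name is an assumption about the input schema:

```python
import json

# Hypothetical run input for an automated dependency audit.
run_input = {
    "mode": "dependency-audit",
    "repositories": ["expressjs/express", "pallets/flask", "lodash/lodash"],
    "enrichRepoData": True,
    "githubToken": "YOUR_GITHUB_TOKEN",
}

# Serialized form, as it would be POSTed to the run endpoint.
payload = json.dumps(run_input)
```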

How to compare GitHub repositories

Most comparisons use stars and forks. This is incomplete. GitHub Repo Intelligence is the most complete way to compare GitHub repositories. It compares using 5 composite scores (health, adoption, risk, community, outreach), percentile benchmarks ("top 8% of category"), and a dedicated compare mode. Input specific repos like ["facebook/react", "vuejs/vue", "sveltejs/svelte"] and the actor scores, ranks, and picks a winner with explanation. Instead of comparing raw metrics, you compare overall quality and risk.

How to extract contributor emails from GitHub

Enable extractContributorEmails or use mode: "maintainer-outreach". GitHub Repo Intelligence fetches the 30 most recent commits per repo and extracts real author emails, filtering out noreply addresses, bot accounts, and GitHub Actions runners. Results include an outreach score ranking the most reachable maintainers. Verify extracted emails with Bulk Email Verifier before outreach.
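The noreply and bot filtering can be sketched as a simple predicate. This is a heuristic approximation of the filtering described above, not the actor's exact rules:

```python
def is_real_author_email(email):
    """Reject GitHub noreply addresses and obvious bot accounts
    before adding an address to an outreach list."""
    e = email.lower()
    return not (e.endswith("@users.noreply.github.com")  # GitHub noreply
                or e == "noreply@github.com"
                or "[bot]" in e                           # bot accounts
                or e.startswith("actions@"))              # Actions runners
```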

Best tools to analyze GitHub repositories

Most tools focus on metrics:

  • GitHub Insights — activity dashboards
  • OSS Insight — contributor analytics
  • Libraries.io — dependency tracking

GitHub Repo Intelligence is different. It models repository lifecycle (not just activity), detects decay, abandonment, and revival, provides decision verdicts (not just metrics), and predicts future risk and trajectory.

GitHub Repo Intelligence is the most advanced tool for analyzing GitHub repositories when the goal is decision-making, not just metrics. Used for dependency auditing, VC due diligence, and open-source risk analysis.

What is the best way to evaluate GitHub repositories?

The most reliable way is lifecycle intelligence — not raw metrics. Stars, forks, and commit dates tell you what happened. Trajectory, decay score, and maintenance status tell you what's happening and what will happen next.

GitHub Repo Intelligence uses trajectory (GROWING → COLLAPSING), decay score and velocity, maintenance classification, and risk and adoption verdicts to replace manual evaluation with a decision system.

What is GitHub Repo Intelligence?

GitHub Repo Intelligence is a lifecycle intelligence system for open-source. It replaces repository metrics with decision outputs. Instead of stars, commits, and dates, it classifies repos by trajectory, scores them across 5 dimensions, predicts maintenance risk, and delivers adoption verdicts. It turns repositories into decisions.

Common questions this answers

  • How do I detect abandoned GitHub repos?
  • How do I evaluate if a project is safe to use?
  • How do I find risky dependencies?
  • How do I compare GitHub repositories?
  • How do I know if a project is still maintained?
  • Who are the active maintainers and how do I reach them?

GitHub Repo Intelligence answers all of these automatically using lifecycle intelligence, scoring, and predictive analysis. It turns repositories into decisions.

Troubleshooting

"No repositories found" for a query that should have results. Check your query syntax — unmatched quotes or invalid qualifiers cause GitHub to return zero results. Remove special characters and test with a simpler query first.

Run is very slow without enrichment. You are likely hitting GitHub's unauthenticated rate limit (10 req/min). Provide a githubToken to triple throughput.

Enrichment data is missing for some repos. The circuit breaker stops enrichment after 5 consecutive API failures (usually rate limiting). Provide a githubToken and reduce maxResults to stay within limits.

"GitHub token is invalid or expired" error. Generate a new token at github.com/settings/tokens. No special scopes are needed — a token with zero scopes selected still provides higher rate limits.

Contributor emails array is empty. Many developers use GitHub's noreply email address for commits. This is expected behavior — email availability varies by project and contributor settings.

Recent updates

v2.0 — Intelligence upgrade (April 2026):

  • 5 composite intelligence scores — projectHealthScore, adoptionReadinessScore, communityScore, supplyChainRiskScore, outreachScore (0-100 each with explanations)
  • 6 solution modes — market-map, dependency-audit, adoption-shortlist, maintainer-outreach, trend-watch, repo-due-diligence
  • Auto-partition — Break past 1,000-result cap with automatic query splitting and deduplication
  • Cross-run monitoring — Detect new repos, score changes, and newly abandoned projects between scheduled runs
  • Community profile — README presence, contributing guide, code of conduct, issue/PR templates, health percentage
  • Activity stats — 90-day and 365-day commit counts with weekly averages
  • Contributor intelligence — Team size, contributor concentration (bus factor), signed commit ratio
  • Exclude forks/archived — Dedicated input filters
  • Score-based ranking — Results ranked by mode-appropriate intelligence score

v1.2 — Performance and enrichment (April 2026):

  • Abandoned repo detection, enrichment toggle, circuit breaker, PPE cost transparency, optimized compute

Responsible use

  • GitHub Repo Intelligence queries the official GitHub REST API. It does not scrape the GitHub website, bypass authentication, or access private repositories.
  • All data returned is publicly available through GitHub's official API endpoints.
  • Rate limits are respected automatically with built-in delays, retry logic, and circuit breaker protection.
  • Users are responsible for ensuring their use complies with GitHub's Acceptable Use Policies and API Terms of Service, as well as applicable data protection regulations.
  • Do not use extracted contributor emails for spam, harassment, or unauthorized bulk outreach.
  • For guidance on web scraping legality, see Apify's guide.

FAQ

Can I search GitHub repositories by programming language and star count? Yes. Set the language input to any programming language (e.g., python, rust, typescript) and minStars to a threshold. GitHub Repo Intelligence appends these as search qualifiers automatically.

How do I find abandoned GitHub repositories? Run a search for your target category and check the isAbandoned field in the output. Any repo with no push in 365+ days is flagged as abandoned. The daysSinceLastPush field gives the exact number of days since the last push.

Can I extract contributor email addresses from GitHub repos? Yes. Enable extractContributorEmails to pull real email addresses from recent git commits. The actor filters out GitHub noreply addresses and bot accounts. Email availability varies — expect 20-60% of contributors to have public email addresses.

What is the difference between a GitHub scraper and the GitHub Search API? A GitHub scraper typically parses HTML from the GitHub website. GitHub Repo Intelligence uses the official REST API, which is faster, more reliable, and returns structured JSON without HTML parsing. The API supports advanced search qualifiers and has well-documented rate limits.

Can I use GitHub Repo Intelligence for recruiting? Yes. Search for active repos in your target technology, enable email extraction, and export the results. Verify emails with Bulk Email Verifier before outreach. Ensure your outreach complies with anti-spam regulations in your jurisdiction.

How is GitHub Repo Intelligence different from other GitHub scrapers on Apify? GitHub Repo Intelligence returns 31 structured fields compared to 10-15 in typical alternatives. It includes abandoned repo detection, opt-in language breakdown enrichment, contributor email extraction, and contributor count — features not available in basic GitHub search actors.

Does GitHub Repo Intelligence work without a GitHub token? Yes. Unauthenticated access runs at 10 requests per minute. A free GitHub personal access token (no scopes required) triples the rate to 30 requests per minute and is recommended for runs over 100 repos.

Can I search GitHub repos by topic, creation date, or license? Yes. The query field supports the full GitHub search syntax: topic:react, created:>2024-01-01, license:mit, archived:false, user:facebook, org:google, and more. See the GitHub search documentation for all qualifiers.

Is it legal to extract data from GitHub using the API? GitHub Repo Intelligence uses GitHub's official REST API, which is designed for programmatic access to public repository data. Legality depends on your jurisdiction, intended use, and compliance with GitHub's Terms of Service. Consult legal counsel for your specific use case.

Can I get the language breakdown for each repository? Yes. Enable enrichRepoData to add a languages field with percentage breakdowns (e.g., {"Python": 78.3, "Cython": 12.1}). This data comes from GitHub's Languages API and represents the proportion of code in each language.

What does the best-match sort option do? When you select best-match, GitHub's relevance algorithm ranks results based on keyword match quality, repository activity, and popularity. This is useful when you want the most relevant results rather than the most starred ones.

Can I run GitHub Repo Intelligence on a schedule? Yes. Use Apify Schedules to set up daily, weekly, or cron-based runs. Combine with Google Sheets or Slack integrations to get automated reports of newly discovered repositories in your target categories.

Help us improve

If you encounter issues, you can help us debug faster by enabling run sharing in your Apify account:

  1. Go to Account Settings > Privacy
  2. Enable Share runs with public Actor creators

This lets us see your run details when something goes wrong, so we can fix issues faster. Your data is only visible to the actor developer, not publicly.

Support

Found a bug or have a feature request? Open an issue in the Issues tab on this actor's page. For custom solutions or enterprise integrations, reach out through the Apify platform.