GitHub Projects Scraper

Pricing

from $0.01 / actor start

Discover, score, and rank GitHub repositories by quality, stack fit, and reusable feature value.

Top Benefits

  • Stack-aware discovery: filters repos by your real stack (React, PostgreSQL, TypeScript, etc.)
  • Actionable ranking: each repo gets score + compatibility + feature intelligence

Developer

Algirdas Kolesnikovas

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 1
  • Monthly active users: 0
  • Last modified: a month ago

GitHub Projects Scraper — Apify Actor

Purpose: This actor searches GitHub repositories using the GitHub Search API, enriches each repository with additional metadata (README, topics, languages, contributors, activity), scores build potential (0–100), and saves the best candidates to an Apify dataset.

Publish Description (Apify Store)

Discover high-potential GitHub repositories you can build on.
This Actor searches GitHub by your criteria, enriches each repo with README/topic/language/activity metadata, scores it, and highlights:

  • stackCompatibility (how well it matches your tech stack)
  • stealableFeatures (concrete features worth reusing)
  • topFeature (highest-value detected capability)

Use it for startup validation, boilerplate discovery, competitor research, and feature inspiration.
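
To make the stackCompatibility idea concrete, here is a minimal sketch. It assumes the score is simply the percentage of your techStack entries found among a repo's languages and topics. The Actor's real scoring rules are not documented here, so the function name scoreStackCompatibility and the matching rule are hypothetical.

```typescript
// Hypothetical sketch: NOT the Actor's actual scoring logic.
interface StackCompatibilitySketch {
  score: number;     // 0-100, share of techStack entries that matched
  matched: string[]; // techStack entries found in the repo signals
  missing: string[]; // techStack entries not found
}

function scoreStackCompatibility(
  techStack: string[],
  repoSignals: string[], // e.g. repo languages and topics combined
): StackCompatibilitySketch {
  // Compare case-insensitively so "React" matches the topic "react".
  const signals = new Set(repoSignals.map((s) => s.toLowerCase()));
  const matched = techStack.filter((t) => signals.has(t.toLowerCase()));
  const missing = techStack.filter((t) => !signals.has(t.toLowerCase()));
  const score =
    techStack.length === 0
      ? 0
      : Math.round((matched.length / techStack.length) * 100);
  return { score, matched, missing };
}

const example = scoreStackCompatibility(
  ["React", "PostgreSQL", "TypeScript"],
  ["typescript", "react", "nextjs"],
);
console.log(example.score, example.matched, example.missing);
```

With this rule, a repo that covers two of three stack entries scores 67 and reports the third entry under missing. A real scorer would likely weight signals differently (for example, README mentions versus primary language).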

Credentials / Security

  • Required: githubToken (GitHub Personal Access Token)
  • The Actor does not work without a valid token.
  • Never hardcode tokens in code, README, or committed files.
  • If a token was ever shared in chat/screenshots/logs, revoke it and create a new one.
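
One way to follow the "never hardcode" rule is to resolve the token at runtime and mask it before logging. The sketch below is an illustration only: the helper names and the GITHUB_TOKEN environment variable are assumptions, not part of the Actor.

```typescript
// Hypothetical helper: prefers the input field, then falls back to an
// environment variable. The variable name GITHUB_TOKEN is an assumption.
function resolveGithubToken(
  input: { githubToken?: string },
  env: Record<string, string | undefined>,
): string {
  const token = (input.githubToken || env["GITHUB_TOKEN"] || "").trim();
  if (token === "") {
    throw new Error(
      "githubToken is required: pass it as input or set GITHUB_TOKEN",
    );
  }
  return token;
}

// Mask a token before it reaches logs: keep only the last 4 characters.
function maskToken(token: string): string {
  return token.length <= 4 ? "****" : `****${token.slice(-4)}`;
}
```

In a Node.js process you would call resolveGithubToken(input, process.env) and log only maskToken(token), never the raw value.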

Input

The input schema is defined in .actor/input_schema.json. Key fields:

  • githubToken (string, required): GitHub Personal Access Token with repo read access.
  • searchQueries (string[]): Search queries, e.g. ["ai agent", "saas boilerplate"].
  • languages (string[]): Language filters, e.g. ["TypeScript", "Python"].
  • minStars (integer): Minimum stars (default: 50).
  • maxStars (integer): Maximum stars (default: 50000).
  • minForks (integer): Minimum forks (default: 5).
  • pushedAfter (string, date): Only repos with last commit after this date (YYYY-MM-DD). Default: 1 year ago.
  • excludeTopics (string[]): Topics to exclude, e.g. ["deprecated", "abandoned"].
  • includeTopics (string[]): Topics that slightly boost score, e.g. ["boilerplate", "starter", "template", "saas"].
  • maxResults (integer): Maximum number of projects to save (default: 100).
  • scoreThreshold (number): Minimum total score (0–100) (default: 10).
  • techStack (string[]): Your technologies for compatibility scoring.
  • stackMatchThreshold (integer): Minimum stack compatibility score (0–100, default: 30).
  • outputDataset (boolean): Save results to dataset (default: true).
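
The filter fields above map naturally onto GitHub search qualifiers (stars:, forks:, pushed:, language:). The following sketch shows one possible mapping, using the documented defaults. The SearchFilters type and buildSearchStrings function are hypothetical, and how the Actor actually combines queries and languages is an assumption.

```typescript
// Hypothetical input shape mirroring a subset of the fields listed above.
interface SearchFilters {
  languages?: string[];
  minStars?: number;
  maxStars?: number;
  minForks?: number;
  pushedAfter?: string; // YYYY-MM-DD
}

// Builds one GitHub search string per (query, language) pair.
// Assumption: the Actor may combine its filters differently.
function buildSearchStrings(
  queries: string[],
  filters: SearchFilters = {},
): string[] {
  const minStars = filters.minStars ?? 50;
  const maxStars = filters.maxStars ?? 50000;
  const minForks = filters.minForks ?? 5;
  const qualifiers = [`stars:${minStars}..${maxStars}`, `forks:>=${minForks}`];
  if (filters.pushedAfter) qualifiers.push(`pushed:>${filters.pushedAfter}`);

  // With no language filter, emit a single string without a language qualifier.
  const languages: (string | undefined)[] =
    filters.languages && filters.languages.length > 0
      ? filters.languages
      : [undefined];

  const result: string[] = [];
  for (const query of queries) {
    for (const language of languages) {
      const parts = [
        query,
        ...(language ? [`language:${language}`] : []),
        ...qualifiers,
      ];
      result.push(parts.join(" "));
    }
  }
  return result;
}
```

For example, the query "saas boilerplate" with languages ["TypeScript"] and pushedAfter "2024-01-01" yields the search string `saas boilerplate language:TypeScript stars:50..50000 forks:>=5 pushed:>2024-01-01`.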

Running locally

npm install
apify run --input='{
  "githubToken": "YOUR_TOKEN",
  "searchQueries": ["saas boilerplate", "ai agent"],
  "languages": ["TypeScript", "Python"],
  "minStars": 50,
  "maxStars": 50000,
  "maxResults": 50,
  "scoreThreshold": 60
}'

Output dataset

Each item in the dataset has (at least) the following structure:

  • id, name, fullName, url, description
  • stars, forks, watchers, openIssues, language, languages
  • topics, license, createdAt, updatedAt, pushedAt
  • score (0–100) and scoreBreakdown with per-criterion scores
  • readme metadata (exists, wordCount, hasSections, preview)
  • contributors (top 5)
  • buildPotential: "HIGH" | "MEDIUM" | "LOW"
  • suggestedUseCases: string[]
  • stackCompatibility: { score, compatibility, matched, missing }
  • stealableFeatures: ranked feature list with confidence/signals/steal tips
  • topFeature: highest value extracted feature or null
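
When consuming the dataset, it can help to type the fields you rely on and filter on them. The sketch below types only a few of the fields listed above. The RepoItem name and the shortlist helper are hypothetical, and the compatibility field is left out because its type is not documented here.

```typescript
// Partial, hypothetical typing of a dataset item (subset of listed fields).
interface RepoItem {
  fullName: string;
  url: string;
  score: number; // 0-100
  buildPotential: "HIGH" | "MEDIUM" | "LOW";
  stackCompatibility: { score: number; matched: string[]; missing: string[] };
}

// Keep HIGH-potential repos above a stack-fit floor, best total score first.
// The default of 30 mirrors the documented stackMatchThreshold default.
function shortlist(items: RepoItem[], minStackScore = 30): RepoItem[] {
  return items
    .filter(
      (item) =>
        item.buildPotential === "HIGH" &&
        item.stackCompatibility.score >= minStackScore,
    )
    .sort((a, b) => b.score - a.score);
}
```

The filter runs before the sort, so the input array is left unmodified and only the shortlisted copy is reordered.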

Notes

  • All GitHub API calls use a small retry + exponential backoff strategy.
  • Rate limiting errors (403/429) are retried a few times before failing the individual request.
  • Invalid tokens (401) stop the run with a clear error.
  • Missing repos (404) are skipped and, where needed, logged in debugging.md.
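
The error-handling policy in these notes can be sketched as a small wrapper. This is an illustration under stated assumptions: the retry count (3), the base delay (500 ms), and the requestWithRetry name are hypothetical, not the Actor's actual values. The request is passed in as a function, so the sketch does not depend on any particular HTTP client.

```typescript
// Hypothetical sketch of the retry policy described above.
type SimpleResponse = { status: number };

async function requestWithRetry<T extends SimpleResponse>(
  doRequest: () => Promise<T>,
  maxRetries = 3, // assumption: "a few times"
  baseDelayMs = 500, // assumption: first backoff delay
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T | null> {
  for (let attempt = 0; ; attempt++) {
    const res = await doRequest();
    // Invalid token: stop immediately, retrying cannot help.
    if (res.status === 401) throw new Error("Invalid GitHub token (401)");
    // Missing repo: signal "skip" to the caller.
    if (res.status === 404) return null;
    const rateLimited = res.status === 403 || res.status === 429;
    // Any other status is returned as-is for the caller to handle.
    if (!rateLimited) return res;
    if (attempt >= maxRetries) {
      throw new Error(
        `Rate limited (${res.status}) after ${maxRetries} retries`,
      );
    }
    // Exponential backoff: baseDelayMs, then 2x, then 4x, ...
    await sleep(baseDelayMs * 2 ** attempt);
  }
}
```

A caller would treat a null result as "skip this repo" and let the thrown errors either fail the single request (rate limit) or stop the run (401). Injecting sleep keeps the function easy to test without real delays.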