GitHub Projects Scraper
Pricing
from $0.01 / actor start
Discover, score, and rank GitHub repositories by quality, stack fit, and reusable feature value.
Top Benefits
- Stack-aware discovery: filters repos by your real stack (React, PostgreSQL, TypeScript, etc.)
- Actionable ranking: each repo gets score + compatibility + feature intelligence
Developer
Algirdas Kolesnikovas
Last modified: a month ago
GitHub Projects Scraper — Apify Actor
Purpose: This actor searches GitHub repositories using the GitHub Search API, enriches each repository with additional metadata (README, topics, languages, contributors, activity), scores build potential (0–100), and saves the best candidates to an Apify dataset.
Publish Description (Apify Store)
Discover high-potential GitHub repositories you can build on.
This Actor searches GitHub by your criteria, enriches each repo with README/topic/language/activity metadata, scores it, and highlights:
- stackCompatibility (how well it matches your tech stack)
- stealableFeatures (concrete features worth reusing)
- topFeature (highest-value detected capability)
Use it for startup validation, boilerplate discovery, competitor research, and feature inspiration.
Credentials / Security
- Required: githubToken (GitHub Personal Access Token). The Actor does not work without a valid token.
- Never hardcode tokens in code, README, or committed files.
- If a token was ever shared in chat/screenshots/logs, revoke it and create a new one.
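One way to follow the rules above is to resolve the token at run time and to mask it before it reaches any log line. The sketch below is illustrative only: the helper names and the GITHUB_TOKEN environment variable are assumptions, not part of this Actor's code.

```typescript
// Hypothetical helpers; the names and the GITHUB_TOKEN variable are assumptions.
type Env = Record<string, string | undefined>;

// Prefer the token from the Actor input, then fall back to an environment map,
// so the value never has to be written into source files.
function resolveGithubToken(input: { githubToken?: string }, env: Env): string {
  const fromInput = (input.githubToken || "").trim();
  const fromEnv = (env["GITHUB_TOKEN"] || "").trim();
  const token = fromInput !== "" ? fromInput : fromEnv;
  if (token === "") {
    throw new Error("githubToken is required: pass it as input or set GITHUB_TOKEN.");
  }
  return token;
}

// Mask a token before logging so only its last four characters are visible.
function maskToken(token: string): string {
  return token.length <= 4 ? "****" : `****${token.slice(-4)}`;
}
```

In a Node.js run, the second argument would typically be process.env.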
Input
The input schema is defined in .actor/input_schema.json. Key fields:
- githubToken (string, required): GitHub Personal Access Token with repo read access.
- searchQueries (string[]): Search queries, e.g. ["ai agent", "saas boilerplate"].
- languages (string[]): Language filters, e.g. ["TypeScript", "Python"].
- minStars (integer): Minimum stars (default: 50).
- maxStars (integer): Maximum stars (default: 50000).
- minForks (integer): Minimum forks (default: 5).
- pushedAfter (string, date): Only repos with last commit after this date (YYYY-MM-DD). Default: 1 year ago.
- excludeTopics (string[]): Topics to exclude, e.g. ["deprecated", "abandoned"].
- includeTopics (string[]): Topics that slightly boost score, e.g. ["boilerplate", "starter", "template", "saas"].
- maxResults (integer): Maximum number of projects to save (default: 100).
- scoreThreshold (number): Minimum total score (0–100) (default: 10).
- techStack (string[]): Your technologies for compatibility scoring.
- stackMatchThreshold (integer): Minimum stack compatibility score (0–100, default: 30).
- outputDataset (boolean): Save results to dataset (default: true).
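To make the filter fields concrete, here is a minimal sketch of how they could map onto GitHub search qualifiers (stars:, forks:, pushed:, topic:, language:). The ActorInput interface mirrors the field list above; buildSearchStrings is a hypothetical helper, and the Actor's real query construction may differ.

```typescript
// Shape of the input as described in the field list above.
interface ActorInput {
  githubToken: string;
  searchQueries?: string[];
  languages?: string[];
  minStars?: number;
  maxStars?: number;
  minForks?: number;
  pushedAfter?: string; // YYYY-MM-DD
  excludeTopics?: string[];
}

// Hypothetical helper: one search string per (query, language) pair.
// Defaults (50, 50000, 5) are the documented ones.
function buildSearchStrings(input: ActorInput): string[] {
  const minStars = input.minStars ?? 50;
  const maxStars = input.maxStars ?? 50000;
  const minForks = input.minForks ?? 5;

  const qualifiers: string[] = [`stars:${minStars}..${maxStars}`, `forks:>=${minForks}`];
  if (input.pushedAfter) qualifiers.push(`pushed:>${input.pushedAfter}`);
  for (const topic of input.excludeTopics ?? []) qualifiers.push(`-topic:${topic}`);

  // With no language filter, emit a single string per query without a language qualifier.
  const languages: Array<string | undefined> =
    input.languages && input.languages.length > 0 ? input.languages : [undefined];

  const results: string[] = [];
  for (const query of input.searchQueries ?? []) {
    for (const language of languages) {
      const parts = [query, ...qualifiers];
      if (language !== undefined) parts.push(`language:${language}`);
      results.push(parts.join(" "));
    }
  }
  return results;
}
```

Under these assumptions, the query "ai agent" with language "TypeScript" and default thresholds becomes "ai agent stars:50..50000 forks:>=5 language:TypeScript".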
Running locally
npm install
apify run --input='{"githubToken": "YOUR_TOKEN", "searchQueries": ["saas boilerplate", "ai agent"], "languages": ["TypeScript", "Python"], "minStars": 50, "maxStars": 50000, "maxResults": 50, "scoreThreshold": 60}'
Output dataset
Each item in the dataset has (at least) the following structure:
- id, name, fullName, url, description
- stars, forks, watchers, openIssues, language, languages
- topics, license, createdAt, updatedAt, pushedAt
- score (0–100) and scoreBreakdown with per-criterion scores
- readme metadata (exists, wordCount, hasSections, preview)
- contributors (top 5)
- buildPotential: "HIGH" | "MEDIUM" | "LOW"
- suggestedUseCases: string[]
- stackCompatibility: { score, compatibility, matched, missing }
- stealableFeatures: ranked feature list with confidence/signals/steal tips
- topFeature: highest value extracted feature or null
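For downstream processing, a subset of the fields above can be typed and filtered as in the sketch below. The field types are assumptions inferred from the names (for example, compatibility is treated as a plain string), and pickCandidates is a hypothetical helper, not something the Actor ships.

```typescript
// Assumed types for a subset of the dataset fields.
interface StackCompatibility {
  score: number; // 0-100
  compatibility: string; // assumption: a label; the actual values are not documented here
  matched: string[];
  missing: string[];
}

interface ProjectItem {
  fullName: string;
  url: string;
  score: number; // 0-100
  buildPotential: "HIGH" | "MEDIUM" | "LOW";
  stackCompatibility: StackCompatibility;
}

// Keep HIGH-potential repos that meet a minimum stack score, best total score first.
function pickCandidates(items: ProjectItem[], minStackScore: number): ProjectItem[] {
  return items
    .filter((item) => item.buildPotential === "HIGH" && item.stackCompatibility.score >= minStackScore)
    .sort((a, b) => b.score - a.score);
}
```

The items array would come from the dataset export (for example, its JSON download); the filter leaves the input array itself unchanged.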
Notes
- All GitHub API calls use a small retry + exponential backoff strategy.
- Rate limiting errors (403/429) are retried a few times before failing the individual request.
- Invalid tokens (401) stop the run with a clear error.
- Missing repos (404) are skipped but logged in debugging.md as needed.
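The status handling described in these notes can be sketched roughly as follows. This is a minimal illustration, not the Actor's actual code: the function names, the 500 ms base delay, and the limit of 3 retries are all assumptions, and the HTTP call is injected as a callback so the sketch stays independent of any client library.

```typescript
// Hypothetical result shape returned by the injected request function.
type ApiResult<T> = { status: number; body?: T };

// Rate limiting responses are the ones worth retrying.
function isRetryable(status: number): boolean {
  return status === 403 || status === 429;
}

// Exponential backoff: 500 ms, 1000 ms, 2000 ms, ... (base delay is an assumption).
function backoffDelayMs(attempt: number, baseMs: number = 500): number {
  return baseMs * Math.pow(2, attempt);
}

async function requestWithRetry<T>(
  send: () => Promise<ApiResult<T>>,
  maxRetries: number = 3,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T | null> {
  for (let attempt = 0; ; attempt++) {
    const res = await send();
    if (res.status === 401) throw new Error("Invalid GitHub token (401); stopping the run.");
    if (res.status === 404) return null; // missing repo: the caller skips it
    if (isRetryable(res.status)) {
      if (attempt >= maxRetries) {
        throw new Error(`Request failed after ${maxRetries} retries (status ${res.status}).`);
      }
      await sleep(backoffDelayMs(attempt));
      continue;
    }
    if (res.status >= 200 && res.status < 300) return res.body ?? null;
    throw new Error(`Unexpected status ${res.status}.`);
  }
}
```

Passing sleep as a parameter keeps the delay testable; a caller would normally rely on the default timer.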