GitHub Repository Scraper
Pricing
from $0.01 / 1,000 results
Scrape GitHub repos by URL, search, or trending. Extract stars, forks, topics, languages, contributors & more. No login needed.
Developer
Amna Iftikhar
Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 4 days ago
GitHub Repository Scraper
Extract comprehensive data from GitHub repositories — by direct URL, keyword search, or trending. No login, no API keys required.
Perfect for competitive analysis, lead generation, market research, AI training data, and developer tooling pipelines.
🚀 3 Modes — Pick One
⚠️ Each mode uses different input fields. Only fill in fields for the mode you choose.
| Mode | When to use it |
|---|---|
| repos | You already have specific GitHub URLs you want to scrape |
| search | You want to discover repos by keyword or language |
| trending | You want GitHub's trending repos right now |
⚙️ Input by Mode
Mode: repos — Scrape specific repositories
```json
{
  "mode": "repos",
  "repoUrls": [
    "https://github.com/facebook/react",
    "https://github.com/vercel/next.js"
  ],
  "maxResults": 10,
  "includeReadme": false
}
```
| Field | Required | Description |
|---|---|---|
| mode | ✅ | Set to "repos" |
| repoUrls | ✅ | List of GitHub repo URLs to scrape |
| maxResults | optional | Max repos to scrape (default: 10) |
| includeReadme | optional | Also fetch README content (default: false) |
Mode: search — Find repos by keyword
```json
{
  "mode": "search",
  "searchQuery": "machine learning",
  "searchLanguage": "Python",
  "searchSort": "stars",
  "maxResults": 50,
  "includeReadme": false
}
```
| Field | Required | Description |
|---|---|---|
| mode | ✅ | Set to "search" |
| searchQuery | ✅ | Keywords to search (e.g. "web scraper") |
| searchLanguage | optional | Filter by language, e.g. "Python", "JavaScript" |
| searchSort | optional | Sort by "stars", "forks", or "updated" (default: "stars") |
| maxResults | optional | Max repos to return, up to 300 (default: 10) |
| includeReadme | optional | Also fetch README content (default: false) |
Mode: trending — Get GitHub's trending repos
```json
{
  "mode": "trending",
  "trendingLanguage": "python",
  "trendingPeriod": "weekly",
  "maxResults": 25,
  "includeReadme": false
}
```
| Field | Required | Description |
|---|---|---|
| mode | ✅ | Set to "trending" |
| trendingLanguage | optional | Filter by language, e.g. "python", "rust"; leave empty for all languages |
| trendingPeriod | optional | "daily", "weekly", or "monthly" (default: "daily") |
| maxResults | optional | Max repos to return (default: 10) |
| includeReadme | optional | Also fetch README content (default: false) |
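Because each mode accepts a different set of fields, it can help to assemble the input programmatically. The following is a minimal Python sketch, assuming only the field names listed in the tables above. The helper `build_input` and the three lookup tables are hypothetical and not part of the Actor. The Actor itself ignores fields from other modes; this helper rejects them instead, so that mistakes surface before a run starts.

```python
from __future__ import annotations

from typing import Any

# Mode-specific fields, copied from the input tables above.
MODE_FIELDS = {
    "repos": {"repoUrls"},
    "search": {"searchQuery", "searchLanguage", "searchSort"},
    "trending": {"trendingLanguage", "trendingPeriod"},
}
# Fields that every mode accepts.
COMMON_FIELDS = {"maxResults", "includeReadme"}
# Fields marked as required (besides "mode") in the tables above.
REQUIRED_FIELDS = {
    "repos": {"repoUrls"},
    "search": {"searchQuery"},
    "trending": set(),
}


def build_input(mode: str, **fields: Any) -> dict[str, Any]:
    """Return an input dict for `mode`, rejecting fields that belong to other modes."""
    if mode not in MODE_FIELDS:
        raise ValueError(f"unknown mode: {mode!r}")
    unexpected = set(fields) - (MODE_FIELDS[mode] | COMMON_FIELDS)
    if unexpected:
        raise ValueError(f"fields not used by mode {mode!r}: {sorted(unexpected)}")
    missing = {name for name in REQUIRED_FIELDS[mode] if not fields.get(name)}
    if missing:
        raise ValueError(f"missing required fields for mode {mode!r}: {sorted(missing)}")
    return {"mode": mode, **fields}


search_input = build_input(
    "search",
    searchQuery="machine learning",
    searchLanguage="Python",
    maxResults=50,
)
print(search_input)
```

The resulting dictionary can be serialized with `json.dumps` and pasted into the Actor's input editor.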
📦 Output Fields
Each scraped repository returns:
```json
{
  "url": "https://github.com/facebook/react",
  "fullName": "facebook/react",
  "owner": "facebook",
  "name": "react",
  "repoId": "10270250",
  "description": "The library for web and native user interfaces.",
  "website": "https://react.dev",
  "topics": ["react", "javascript", "library", "ui", "frontend"],
  "primaryLanguage": "JavaScript",
  "languages": { "JavaScript": "68.1%", "TypeScript": "29.0%" },
  "license": "MIT",
  "stars": 243937,
  "starsDisplay": "244k",
  "forks": 50761,
  "watchers": 6700,
  "openIssues": 809,
  "openPullRequests": 355,
  "commits": 21425,
  "contributors": 1734,
  "totalReleases": 118,
  "latestRelease": "19.2.4",
  "defaultBranch": "main",
  "lastCommitAt": "2026-01-26T18:29:43Z",
  "scrapedAt": "2026-03-13T10:00:00.000Z"
}
```
Enable includeReadme: true to also get readmeText and readmeHtml fields — useful for AI/LLM pipelines.
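Several output fields arrive as strings (language percentages, ISO 8601 timestamps), so downstream code usually converts them first. The sketch below is one possible way to post-process a record in Python, using a trimmed copy of the sample above. The helper names `parse_timestamp`, `language_shares`, and `days_since_last_commit` are hypothetical, and the sketch assumes every record follows the same shape as the sample.

```python
from datetime import datetime

# Trimmed copy of the sample record shown above.
record = {
    "fullName": "facebook/react",
    "languages": {"JavaScript": "68.1%", "TypeScript": "29.0%"},
    "stars": 243937,
    "forks": 50761,
    "lastCommitAt": "2026-01-26T18:29:43Z",
    "scrapedAt": "2026-03-13T10:00:00.000Z",
}


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' into a timezone-aware datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def language_shares(rec: dict) -> dict:
    """Convert percentage strings such as '68.1%' into floats."""
    return {lang: float(share.rstrip("%")) for lang, share in rec["languages"].items()}


def days_since_last_commit(rec: dict) -> int:
    """Whole days between the last commit and the moment the repo was scraped."""
    delta = parse_timestamp(rec["scrapedAt"]) - parse_timestamp(rec["lastCommitAt"])
    return delta.days


shares = language_shares(record)
idle_days = days_since_last_commit(record)
forks_per_star = record["forks"] / record["stars"]

print(shares)                    # {'JavaScript': 68.1, 'TypeScript': 29.0}
print(idle_days)                 # 45
print(round(forks_per_star, 3))  # 0.208
```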
🎯 Use Cases
- Market research — Track star growth and activity across competing repos
- Lead generation — Find active contributors in a technology stack
- AI training data — Bulk-collect repo descriptions, READMEs, and topics
- Investment research — Monitor open-source adoption signals
- Competitive intelligence — Benchmark your repo vs competitors
💰 Pricing
Pay Per Result — you only pay for repos successfully scraped.
| Volume | Cost |
|---|---|
| 10 repos | ~$0.02 |
| 100 repos | ~$0.20 |
| 1,000 repos | ~$2.00 |
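The three rows above all work out to roughly $0.002 per repo. As a rough budgeting aid, the Python sketch below extrapolates from that rate. Both the constant `ASSUMED_PRICE_PER_REPO_USD` and the helper `estimate_cost` are hypothetical: the rate is inferred from the table, not taken from a published price list, so the actual charge may differ.

```python
# Per-repo rate implied by the table above ($2.00 per 1,000 repos).
# This is an inferred assumption, not an official price.
ASSUMED_PRICE_PER_REPO_USD = 0.002


def estimate_cost(repo_count: int, price_per_repo: float = ASSUMED_PRICE_PER_REPO_USD) -> float:
    """Approximate cost in USD for `repo_count` successfully scraped repos."""
    if repo_count < 0:
        raise ValueError("repo_count must be non-negative")
    return round(repo_count * price_per_repo, 2)


for count in (10, 100, 1000):
    print(f"{count} repos: ~${estimate_cost(count):.2f}")
# 10 repos: ~$0.02
# 100 repos: ~$0.20
# 1000 repos: ~$2.00
```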
⚡ Performance
- Uses Cheerio — no heavy browser, very low compute cost
- Up to 3 concurrent requests
- ~50–100 repos/minute
- No proxies needed for normal volumes
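The throughput figure above gives a rough way to estimate how long a run might take. The Python sketch below turns the 50 to 100 repos/minute range into a best-case and worst-case duration. The helper `estimate_minutes` and its parameter names are hypothetical, and the estimate ignores Actor startup time, retries, and README fetching, all of which may slow a real run.

```python
from __future__ import annotations


def estimate_minutes(
    repo_count: int,
    slow_rate: float = 50,   # repos per minute, lower bound quoted above
    fast_rate: float = 100,  # repos per minute, upper bound quoted above
) -> tuple[float, float]:
    """Return (best_case, worst_case) run time in minutes for `repo_count` repos."""
    if repo_count < 0:
        raise ValueError("repo_count must be non-negative")
    return repo_count / fast_rate, repo_count / slow_rate


best, worst = estimate_minutes(300)
print(f"300 repos: about {best:.0f} to {worst:.0f} minutes")
# 300 repos: about 3 to 6 minutes
```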
❓ FAQ
Can I use all input fields at once?
No. Each mode uses its own fields. Set mode first, then only fill in fields for that mode — other fields are ignored.
Does this require a GitHub account or API key?
No. The Actor scrapes only public GitHub data, so no login is needed.
Can I scrape private repos?
No. Only public repos are supported.
Can I schedule this to run daily?
Yes. Use Apify's built-in scheduler with a cron expression.
Will I get blocked?
Unlikely for normal volumes. The Actor uses proper headers and rate limiting. For 1,000+ repos, enable Apify Proxy.
Built with Apify SDK + Crawlee. Issues or feature requests? Leave a comment on the Actor page.