Github Repositry Scraper

Under maintenance

Pricing

from $0.01 / 1,000 results

Scrape GitHub repos by URL, search, or trending. Extract stars, forks, topics, languages, contributors & more. No login needed.

Developer

Amna Iftikhar

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 4 days ago

GitHub Repository Scraper

Extract comprehensive data from GitHub repositories — by direct URL, keyword search, or trending. No login, no API keys required.

Perfect for competitive analysis, lead generation, market research, AI training data, and developer tooling pipelines.


🚀 3 Modes — Pick One

⚠️ Each mode uses different input fields. Only fill in fields for the mode you choose.

| Mode | When to use it |
| --- | --- |
| repos | You already have specific GitHub URLs you want to scrape |
| search | You want to discover repos by keyword or language |
| trending | You want GitHub's trending repos right now |

⚙️ Input by Mode

Mode: repos — Scrape specific repositories

{
  "mode": "repos",
  "repoUrls": [
    "https://github.com/facebook/react",
    "https://github.com/vercel/next.js"
  ],
  "maxResults": 10,
  "includeReadme": false
}

| Field | Required | Description |
| --- | --- | --- |
| mode | required | Set to "repos" |
| repoUrls | required | List of GitHub repo URLs to scrape |
| maxResults | optional | Max repos to scrape (default: 10) |
| includeReadme | optional | Also fetch README content (default: false) |
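
For readers who prefer to start a run from code, the repos-mode input above can be passed to the Actor through the Apify API. The sketch below assumes the `apify-client` Python package and an Apify API token (needed for any API-triggered run; this is separate from GitHub credentials, which the Actor does not use). The helper names `build_repos_input` and `run_scraper` are illustrative, and the Actor ID must be copied from the Actor page since it is not stated here.

```python
def build_repos_input(repo_urls, max_results=10, include_readme=False):
    """Build a repos-mode input using the field names documented above."""
    return {
        "mode": "repos",
        "repoUrls": list(repo_urls),
        "maxResults": max_results,
        "includeReadme": include_readme,
    }


def run_scraper(run_input, actor_id, token):
    """Start a run and return the scraped items (needs network and a token)."""
    # Imported lazily so the input builder works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    # call() starts the Actor and waits for the run to finish.
    run = client.actor(actor_id).call(run_input=run_input)
    # Results are stored in the run's default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Calling `run_scraper(build_repos_input([...]), actor_id, token)` should return a list of records shaped like the sample in the Output Fields section.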

Mode: search — Find repos by keyword

{
  "mode": "search",
  "searchQuery": "machine learning",
  "searchLanguage": "Python",
  "searchSort": "stars",
  "maxResults": 50,
  "includeReadme": false
}

| Field | Required | Description |
| --- | --- | --- |
| mode | required | Set to "search" |
| searchQuery | required | Keywords to search (e.g. "web scraper") |
| searchLanguage | optional | Filter by language, e.g. "Python", "JavaScript" |
| searchSort | optional | Sort by "stars", "forks", or "updated" (default: "stars") |
| maxResults | optional | Max repos to return, up to 300 (default: 10) |
| includeReadme | optional | Also fetch README content (default: false) |

Mode: trending - Get GitHub's trending repos

{
  "mode": "trending",
  "trendingLanguage": "python",
  "trendingPeriod": "weekly",
  "maxResults": 25,
  "includeReadme": false
}

| Field | Required | Description |
| --- | --- | --- |
| mode | required | Set to "trending" |
| trendingLanguage | optional | Filter by language, e.g. "python", "rust"; leave empty for all |
| trendingPeriod | optional | "daily", "weekly", or "monthly" (default: "daily") |
| maxResults | optional | Max repos to return (default: 10) |
| includeReadme | optional | Also fetch README content (default: false) |
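
Because each mode reads only its own fields, it can help to check an input before starting a run. The following is a minimal sketch and not part of the Actor: the field lists are copied from the tables above, treating every field not marked optional as required is an inference from those tables, and the helper name `check_input` is hypothetical.

```python
# Fields every mode accepts, taken from the input tables above.
COMMON_FIELDS = {"mode", "maxResults", "includeReadme"}
# Extra fields each mode accepts.
MODE_FIELDS = {
    "repos": {"repoUrls"},
    "search": {"searchQuery", "searchLanguage", "searchSort"},
    "trending": {"trendingLanguage", "trendingPeriod"},
}
# Fields assumed to be required for each mode (besides "mode" itself).
REQUIRED_FIELDS = {
    "repos": {"repoUrls"},
    "search": {"searchQuery"},
    "trending": set(),
}


def check_input(run_input):
    """Return a list of problems found in an input dict (empty if none)."""
    mode = run_input.get("mode")
    if mode not in MODE_FIELDS:
        return [f"unknown mode: {mode!r}"]
    problems = []
    allowed = COMMON_FIELDS | MODE_FIELDS[mode]
    # Fields that belong to another mode would be ignored by the Actor.
    for field in sorted(set(run_input) - allowed):
        problems.append(f"'{field}' is not used in {mode} mode")
    for field in sorted(REQUIRED_FIELDS[mode] - set(run_input)):
        problems.append(f"'{field}' is required in {mode} mode")
    return problems
```

An empty list means the input only contains fields that the chosen mode reads.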

📦 Output Fields

Each scraped repository returns:

{
  "url": "https://github.com/facebook/react",
  "fullName": "facebook/react",
  "owner": "facebook",
  "name": "react",
  "repoId": "10270250",
  "description": "The library for web and native user interfaces.",
  "website": "https://react.dev",
  "topics": ["react", "javascript", "library", "ui", "frontend"],
  "primaryLanguage": "JavaScript",
  "languages": { "JavaScript": "68.1%", "TypeScript": "29.0%" },
  "license": "MIT",
  "stars": 243937,
  "starsDisplay": "244k",
  "forks": 50761,
  "watchers": 6700,
  "openIssues": 809,
  "openPullRequests": 355,
  "commits": 21425,
  "contributors": 1734,
  "totalReleases": 118,
  "latestRelease": "19.2.4",
  "defaultBranch": "main",
  "lastCommitAt": "2026-01-26T18:29:43Z",
  "scrapedAt": "2026-03-13T10:00:00.000Z"
}

Enable includeReadme: true to also get readmeText and readmeHtml fields — useful for AI/LLM pipelines.
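
Some output fields arrive as display strings, so downstream code will often normalise them first. As an illustrative sketch, the function below converts the `languages` percentages to floats and parses the ISO 8601 timestamps shown in the sample record. The function names and the derived keys `languageShares` and `daysSinceLastCommit` are my own additions, not fields the Actor returns.

```python
from datetime import datetime


def parse_timestamp(value):
    """Parse an ISO 8601 UTC timestamp such as '2026-01-26T18:29:43Z'."""
    # Older Python versions reject a trailing 'Z', so rewrite it as an offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def normalize_repo(record):
    """Return a copy of a scraped record with extra numeric fields."""
    repo = dict(record)
    # "68.1%" -> 68.1
    repo["languageShares"] = {
        name: float(share.rstrip("%"))
        for name, share in record.get("languages", {}).items()
    }
    last_commit = parse_timestamp(record["lastCommitAt"])
    scraped = parse_timestamp(record["scrapedAt"])
    # Whole days between the last commit and the scrape.
    repo["daysSinceLastCommit"] = (scraped - last_commit).days
    return repo


# Subset of the sample record above.
sample = {
    "fullName": "facebook/react",
    "languages": {"JavaScript": "68.1%", "TypeScript": "29.0%"},
    "lastCommitAt": "2026-01-26T18:29:43Z",
    "scrapedAt": "2026-03-13T10:00:00.000Z",
}
print(normalize_repo(sample)["daysSinceLastCommit"])  # 45
```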


🎯 Use Cases

  • Market research — Track star growth and activity across competing repos
  • Lead generation — Find active contributors in a technology stack
  • AI training data — Bulk-collect repo descriptions, READMEs, and topics
  • Investment research — Monitor open-source adoption signals
  • Competitive intelligence — Benchmark your repo vs competitors

💰 Pricing

Pay Per Result — you only pay for repos successfully scraped.

| Volume | Cost |
| --- | --- |
| 10 repos | ~$0.02 |
| 100 repos | ~$0.20 |
| 1,000 repos | ~$2.00 |
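
The figures above work out to roughly $0.002 per repository. A quick way to budget a run, assuming that flat rate holds at every volume (the constant is derived from this table only and is an estimate, so check the Actor's pricing tab for the current rate):

```python
# Approximate per-repo price implied by the table above ($2.00 per 1,000 repos).
# This is an estimate derived from the table, not an official rate.
PRICE_PER_REPO_USD = 2.00 / 1000


def estimate_cost(repo_count):
    """Estimate the pay-per-result cost in USD for a number of scraped repos."""
    if repo_count < 0:
        raise ValueError("repo_count must be non-negative")
    return round(repo_count * PRICE_PER_REPO_USD, 2)


for count in (10, 100, 1000):
    print(f"{count} repos: ~${estimate_cost(count):.2f}")
```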

⚡ Performance

  • Uses Cheerio — no heavy browser, very low compute cost
  • Up to 3 concurrent requests
  • ~50–100 repos/minute
  • No proxies needed for normal volumes
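
The throughput figure gives a rough run-time range. A back-of-the-envelope sketch, assuming the quoted rate of about 50 to 100 repos per minute holds for the whole run (actual speed may vary with GitHub response times and options such as README fetching):

```python
import math

# Throughput bounds quoted above, in repos per minute.
SLOW_RATE = 50
FAST_RATE = 100


def estimate_minutes(repo_count):
    """Return (best_case, worst_case) run time in whole minutes."""
    best = math.ceil(repo_count / FAST_RATE)
    worst = math.ceil(repo_count / SLOW_RATE)
    return best, worst


# 300 is the documented maximum for search mode.
print(estimate_minutes(300))  # (3, 6)
```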

❓ FAQ

Can I use all input fields at once? No. Each mode uses its own fields. Set mode first, then only fill in fields for that mode — other fields are ignored.

Does this require a GitHub account or API key? No. Scrapes only public GitHub data, no login needed.

Can I scrape private repos? No — public repos only.

Can I schedule this to run daily? Yes. Use Apify's built-in scheduler with a cron expression.

Will I get blocked? Unlikely at normal volumes. The Actor sets appropriate request headers and applies rate limiting. For runs of 1,000+ repos, enable Apify Proxy.


Built with Apify SDK + Crawlee. Issues or feature requests? Leave a comment on the Actor page.