GitHub Scraper - Repos, Developers & Contact Leads
Pricing
from $4.00 / 1,000 repository records
Scrape GitHub via the official API: search repositories & developers, get full repo metadata, README, languages, topics, stars & activity, plus developer/org profiles and contributor & stargazer leads with emails. Developer lead-gen + monitoring. No browser.
Developer
Scrape Sage
Maintained by Community
GitHub Scraper — Repositories, Developers & Contact Leads
Extract complete data from GitHub using the official API — search repositories by keyword, language, stars and topic; search developers by location, language and followers; and pull full repo metadata, READMEs, language breakdowns, developer and organization profiles, and contributor & stargazer leads with recovered emails. Built for developer lead generation, recruiting, open-source intelligence and tech market research.
No login, no browser — fast extraction straight from api.github.com, with 99%+ reliability. Add a free GitHub token for high-volume runs.
Why this GitHub scraper?
Most GitHub scrapers do one thing — search repos, or scrape one profile. This actor combines repository intelligence and developer lead generation in a single tool, and ships the richest record in the category:
| Data | Typical scrapers | This actor |
|---|---|---|
| Repo metadata (stars, forks, language, topics, license) | ✅ | ✅ |
| Repo activity (created / updated / pushed, active flag) + popularity score | partial | ✅ |
| README text + word count, language breakdown, latest release | ❌ | ✅ (opt-in) |
| Developer profiles (name, company, location, bio, followers, hireable) | partial | ✅ |
| Developer email (profile + recovered from public commits) | ❌ | ✅ the lead wedge |
| Website crawl for extra emails, phone & socials | ❌ | ✅ (opt-in) |
| Organization profiles & leads | ❌ | ✅ |
| Contributor and stargazer leads from any repo | partial | ✅ |
| Lead score (0–100) per developer | ❌ | ✅ |
| Search developers by location / language / followers | ❌ | ✅ |
| Monitor mode (only new repos / devs / stars) | ❌ | ✅ |
Use cases
- Developer lead generation: find developers by location, language, topic or by who contributes to or stars a repo, then export them with email, company, blog and social links straight into your CRM. Perfect for DevTool, API and infrastructure companies selling to developers.
- Technical recruiting: search active developers in a city using a given language, filter to `hireableOnly`, and get their contact details and top repositories in one run.
- Open-source & competitive intelligence: track who is building (and starring) in a technology niche, the leading repos by stars and activity, and which companies are most active.
- Market & trend research: map a topic (e.g. `llm-agent`, `vector-database`) across repos, languages and maintainers; build datasets for analysis or LLM training.
- Ecosystem / DevRel outreach: pull a project's contributors and stargazers as a warm audience for community, sponsorship or partnership outreach.
- Due diligence: assess a company's open-source footprint: repos, maintainers, activity and popularity.
How to use
- Sign up for Apify — the free plan is enough to try this actor.
- Open the GitHub Scraper, choose a mode, enter search queries / repos / usernames, and click Start.
- (Recommended) Paste a GitHub token in `githubToken` for 5,000 requests/hour instead of the unauthenticated ~60/hour.
- Watch records stream into the dataset table, then export as JSON, CSV, Excel, XML, or RSS, or pull them via the Apify API.
Input
```json
{
  "mode": "searchUsers",
  "searchQueries": ["machine learning"],
  "userLocation": "Berlin",
  "userLanguage": "Python",
  "minFollowers": 100,
  "extractCommitEmails": true,
  "enrichContactEmails": true,
  "hireableOnly": false,
  "maxResults": 50,
  "githubToken": "ghp_xxx"
}
```
- mode: one of `searchRepositories` (keyword + filters), `searchUsers` (find developers), `repositoryDetails` (full records for the names in `repositories`), `userProfiles` (developer leads from `usernames`), `organizationProfiles` (org leads from `organizations`), or `repositoryContributors` (contributor/stargazer leads from the repos in `repositories`).
- searchQueries: keywords/phrases; each runs separately and combines with the filters.
- repositories / usernames / organizations: inputs for the detail / profile / contributor modes (`"facebook/react"`, `"torvalds"`, `"vercel"`).
- Repository filters: `language`, `minStars` / `maxStars`, `topics`, `repoLicense`, `repoCreatedAfter`, `repoPushedAfter`, `includeForks`, `onlyActive`.
- Developer filters: `userLocation`, `userLanguage`, `minFollowers`, `minRepos`, `userType`, `extraQualifiers` (raw GitHub qualifiers).
- sortBy / sortOrder: `stars` / `forks` / `updated` for repos, `followers` / `repositories` / `joined` for users, or `best-match`.
- extractCommitEmails (default true): recover a developer's public commit email (the lead wedge); GitHub no-reply addresses are filtered out.
- enrichContactEmails (default false): crawl the developer's/org's website for extra emails, phone and socials.
- includeReadme / includeLanguages / includeLatestRelease / includeContributorsCount / includeOwnerProfile: extra repository detail (one request each).
- includeUserRepos: attach a developer's top repos + derived top languages.
- withEmailOnly / hireableOnly: output filters for lead lists.
- monitorMode (default false): remember records from previous runs and emit only new ones. Pair with Schedules.
- githubToken: optional but strongly recommended for speed and volume (5,000 req/hour).
- maxResults / maxResultsPerQuery: limits.
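As a rough illustration of how the filters above combine into a single run input, the sketch below assembles a `searchRepositories` input object. The helper name `buildRepoSearchInput` and its validation rules are assumptions made for this example; only the field names come from the list above, and passing `topics` as an array is also assumed.

```javascript
// Hypothetical helper: builds an input object for a repository search.
// Field names follow the Input list above; the checks are illustrative only.
function buildRepoSearchInput({ queries, language, minStars, maxStars, topics, maxResults = 100 }) {
  if (!Array.isArray(queries) || queries.length === 0) {
    throw new Error('At least one search query is required');
  }
  if (minStars != null && maxStars != null && minStars > maxStars) {
    throw new Error('minStars must not exceed maxStars');
  }
  const input = { mode: 'searchRepositories', searchQueries: queries, maxResults };
  // Only include filters that were actually provided.
  if (language) input.language = language;
  if (minStars != null) input.minStars = minStars;
  if (maxStars != null) input.maxStars = maxStars;
  if (topics && topics.length > 0) input.topics = topics; // array form is assumed
  return input;
}

const input = buildRepoSearchInput({
  queries: ['vector database'],
  language: 'Rust',
  minStars: 100,
  topics: ['vector-database'],
});
console.log(JSON.stringify(input, null, 2));
```

Leaving unset filters out of the object, rather than sending `null`, keeps the input close to what you would type into the actor's form.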
Output
A repository record (type: "repository"):
```json
{
  "type": "repository",
  "fullName": "vercel/next.js",
  "name": "next.js",
  "ownerLogin": "vercel",
  "ownerType": "Organization",
  "description": "The React Framework",
  "homepage": "https://nextjs.org",
  "language": "JavaScript",
  "topics": ["react", "nextjs", "ssr", "vercel"],
  "stars": 128000,
  "forks": 27000,
  "openIssues": 2600,
  "license": { "key": "mit", "name": "MIT License", "spdxId": "MIT" },
  "isArchived": false,
  "defaultBranch": "canary",
  "createdAt": "2016-10-05T00:00:00.000Z",
  "pushedAt": "2026-06-15T00:00:00.000Z",
  "daysSinceLastPush": 0,
  "isActive": true,
  "languages": [{ "name": "JavaScript", "bytes": 4200000, "percent": 88.4 }],
  "latestRelease": { "tagName": "v15.0.0", "publishedAt": "2026-05-01T00:00:00.000Z" },
  "popularityScore": 96,
  "repoUrl": "https://github.com/vercel/next.js",
  "scrapedAt": "2026-06-15T12:00:00.000Z"
}
```
A developer record (type: "user") — a ready-to-use lead:
```json
{
  "type": "user",
  "login": "gaearon",
  "name": "dan",
  "company": "@bsky",
  "blog": "https://danabra.mov",
  "location": "London, UK",
  "bio": "i build user interfaces.",
  "hireable": null,
  "twitterUsername": "dan_abramov2",
  "publicRepos": 280,
  "followers": 92000,
  "emails": ["dan.abramov@example.com"],
  "primaryEmail": "dan.abramov@example.com",
  "emailSources": ["commits"],
  "websiteEmails": [],
  "socialLinks": { "twitter": "https://twitter.com/dan_abramov2" },
  "topRepos": [{ "name": "overreacted.io", "stars": 7000, "language": "JavaScript" }],
  "topLanguages": ["JavaScript", "TypeScript"],
  "sourceRepo": null,
  "sourceRole": null,
  "leadScore": 71,
  "userUrl": "https://github.com/gaearon",
  "scrapedAt": "2026-06-15T12:00:00.000Z"
}
```
Fields are null (or arrays empty) only when the data genuinely doesn't exist — never because the scraper skipped them.
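To show how the developer records above might be turned into an outreach list, here is a minimal post-processing sketch. It assumes records shaped like the sample output; the helper names (`selectLeads`, `csvCell`, `leadsToCsv`), the score threshold of 50 and the column choice are illustrative, and the sample data is made up.

```javascript
// Keep developer records that have a usable email and meet a score threshold,
// then sort by leadScore, highest first.
function selectLeads(records, minScore = 50) {
  return records
    .filter((r) => r.type === 'user' && r.primaryEmail && (r.leadScore ?? 0) >= minScore)
    .sort((a, b) => b.leadScore - a.leadScore);
}

// Quote a value for CSV: wrap in double quotes and double any embedded quotes.
function csvCell(value) {
  const text = value == null ? '' : String(value);
  return `"${text.replace(/"/g, '""')}"`;
}

// Build a CSV string with a header row and one row per lead.
function leadsToCsv(leads) {
  const columns = ['login', 'name', 'company', 'location', 'primaryEmail', 'leadScore'];
  const rows = leads.map((lead) => columns.map((c) => csvCell(lead[c])).join(','));
  return [columns.join(','), ...rows].join('\n');
}

// Sample records shaped like the output above (values are invented).
const sample = [
  { type: 'user', login: 'alice', name: 'Alice', company: null, location: 'Berlin', primaryEmail: 'alice@example.com', leadScore: 82 },
  { type: 'user', login: 'bob', name: 'Bob', company: 'Acme', location: 'Paris', primaryEmail: null, leadScore: 90 },
  { type: 'user', login: 'carol', name: 'Carol', company: 'Acme', location: 'Oslo', primaryEmail: 'carol@example.com', leadScore: 40 },
  { type: 'repository', fullName: 'vercel/next.js' },
];
console.log(leadsToCsv(selectLeads(sample)));
```

Filtering on `type` first matters because a single dataset can hold both repository and developer records.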
Automate & schedule
Run this actor on autopilot and pull results into your own stack:
- Apify API — start runs, fetch datasets, and manage schedules over REST.
- apify-client for JavaScript and apify-client for Python — official SDKs.
- Schedules: run it daily/weekly with `monitorMode` to capture new repos in a topic, or new contributors/stargazers of a project.
- Webhooks: trigger downstream actions (CRM import, Slack alert) the moment a run finishes.
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'MY_APIFY_TOKEN' });

// Start the actor and wait for the run to finish.
const run = await client.actor('scrapesage/github-scraper').call({
    mode: 'repositoryContributors',
    repositories: ['langchain-ai/langchain'],
    peopleSource: 'contributors',
    extractCommitEmails: true,
    maxResults: 100,
    githubToken: 'ghp_xxx',
});

// Read the developer-lead records from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Got ${items.length} developer leads`);
```
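The idea behind `monitorMode` (remember what earlier runs produced, emit only what is new) can also be pictured on the client side. The sketch below is not the actor's internal implementation; it is an assumed, simplified version that keys repositories by `fullName` and developers by `login`, which may be handy if you prefer to deduplicate in your own pipeline. In practice the `seen` set would need to be persisted between runs.

```javascript
// Stable key per record: repositories by fullName, developers by login.
function recordKey(record) {
  return record.type === 'repository' ? `repo:${record.fullName}` : `user:${record.login}`;
}

// Returns the records not seen before and adds their keys to `seen` (a Set).
function onlyNew(records, seen) {
  const fresh = [];
  for (const record of records) {
    const key = recordKey(record);
    if (!seen.has(key)) {
      seen.add(key);
      fresh.push(record);
    }
  }
  return fresh;
}

// Two simulated runs sharing one `seen` set.
const seen = new Set();
const firstRun = onlyNew([{ type: 'user', login: 'a' }, { type: 'user', login: 'b' }], seen);
const secondRun = onlyNew([{ type: 'user', login: 'b' }, { type: 'user', login: 'c' }], seen);
console.log(firstRun.length, secondRun.map((r) => r.login));
```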
Integrate with any app
Connect the dataset to 5,000+ apps — no code required:
- Make — multi-step automation scenarios.
- Zapier — push new developer leads straight into your CRM.
- Slack — get notified when a monitored topic gets new repos.
- Google Drive / Sheets — auto-export every run to a spreadsheet.
- Airbyte — pipe results into your data warehouse.
- GitHub — trigger runs from commits or releases.
Use with AI assistants (MCP)
The output is clean, LLM-ready JSON. Call this actor from Claude, ChatGPT, or any agent framework through the Apify MCP server — ask your assistant to "find the top Rust web-framework repos and the developers behind them" and let it run this scraper for you.
More scrapers from scrapesage
Build a full developer, product & tech market-intelligence stack:
- Product Hunt Scraper — product launches and the makers behind them.
- Y Combinator Scraper — startups, founders and jobs.
- Chrome Web Store Scraper — extensions and developer leads.
- Google Play Scraper — apps, reviews and developer leads.
- Apple App Store Scraper — apps, reviews and charts.
- Steam Scraper — games, prices, reviews and charts.
- Levels.fyi Scraper — tech salaries and compensation by company and level.
- Google Patents Scraper — patents, citations and assignee intelligence.
- SEC EDGAR Scraper — filings, financials and company profiles.
Tips
- Add a token: paste a free GitHub token in `githubToken` for 5,000 requests/hour (vs ~60 unauthenticated). Read-only scope is enough.
- Going past 1,000 results: GitHub search serves up to 1,000 results per query. To exhaust a big topic, window it with `minStars` / `maxStars` (e.g. 100–500, 500–2000, 2000+) or `repoCreatedAfter` dates.
- Best email hit-rate: keep `extractCommitEmails` on and add `enrichContactEmails` to crawl personal sites. Many developers expose a real email in their public commits even when their profile email is hidden.
- Monitoring: combine Schedules + `monitorMode` to capture only new repos/contributors/stargazers each run.
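The star-windowing tip above can be sketched as a small helper that turns a list of boundaries into one input per window. This assumes `minStars` and `maxStars` are inclusive bounds, so the windows are made non-overlapping (100 to 499, 500 to 1999, 2000+); the function names are hypothetical, and a merge step by `fullName` is included in case windows still return repeats.

```javascript
// Turn ascending star boundaries into non-overlapping windows.
// Assumes inclusive bounds: [100, 500, 2000] gives 100-499, 500-1999, 2000+.
function starWindows(boundaries) {
  return boundaries.map((min, i) => {
    const next = boundaries[i + 1];
    // The last window is open-ended, so it carries no maxStars.
    return next === undefined ? { minStars: min } : { minStars: min, maxStars: next - 1 };
  });
}

// One input object per window, sharing the same base search settings.
function windowedInputs(baseInput, boundaries) {
  return starWindows(boundaries).map((w) => ({ ...baseInput, ...w }));
}

// Merge results from several runs, dropping repeats by fullName.
function mergeByFullName(resultSets) {
  const byName = new Map();
  for (const items of resultSets) {
    for (const item of items) {
      if (!byName.has(item.fullName)) byName.set(item.fullName, item);
    }
  }
  return [...byName.values()];
}

const inputs = windowedInputs(
  { mode: 'searchRepositories', searchQueries: ['llm-agent'], maxResults: 1000 },
  [100, 500, 2000],
);
console.log(inputs.map((i) => [i.minStars, i.maxStars ?? 'open']));
```

Each generated input can then be passed to a separate run, and the resulting datasets combined with `mergeByFullName`.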
FAQ
Do I need a GitHub API key? No, but it's strongly recommended. Without a token GitHub allows ~60 requests/hour per IP; with a free token you get 5,000/hour and faster, larger runs.
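To make the difference between the two limits concrete, the back-of-the-envelope sketch below estimates how many hourly rate-limit windows a run would span. The per-record request count is an assumption (one base request plus one per enabled `include*` option, as the Input section notes), and GitHub rate-limits its search endpoints separately, so treat the result as a rough guide rather than a prediction of actual run time.

```javascript
// Hourly limits quoted above: ~60 requests without a token, 5,000 with one.
const LIMITS = { anonymous: 60, token: 5000 };

// Estimate how many hourly rate-limit windows a run spans.
// `requestsPerRecord` is an assumption supplied by the caller.
function estimateHours(records, requestsPerRecord, hasToken) {
  const totalRequests = records * requestsPerRecord;
  const limit = hasToken ? LIMITS.token : LIMITS.anonymous;
  return { totalRequests, hours: Math.ceil(totalRequests / limit) };
}

// 500 repositories with README and languages enabled: assumed 1 + 2 = 3 requests each.
console.log(estimateHours(500, 3, false)); // 1,500 requests over 25 hourly windows
console.log(estimateHours(500, 3, true)); // 1,500 requests within 1 hourly window
```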
How do I find developers to email? Use searchUsers with a userLocation/userLanguage/minFollowers filter, or repositoryContributors on a relevant repo. Keep extractCommitEmails on to recover commit emails, and add enrichContactEmails for website crawling. Filter with withEmailOnly.
Are the emails real? They come from public GitHub profiles and public commit metadata (which developers publish themselves). GitHub users.noreply.github.com privacy addresses are filtered out. Use the data in line with applicable laws and outreach regulations.
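For readers who post-process raw commit data themselves, filtering out privacy addresses might look like the sketch below. It is not the actor's actual filter; it simply matches the `users.noreply.github.com` domain mentioned above and deduplicates what remains. The function names and the lowercase normalization are choices made for this example.

```javascript
// GitHub privacy addresses end in @users.noreply.github.com,
// for example 12345+octocat@users.noreply.github.com.
function isNoReply(email) {
  return /@users\.noreply\.github\.com$/i.test(email.trim());
}

// Collect unique, usable emails from a list of commit author emails.
function usableEmails(commitEmails) {
  const unique = new Set();
  for (const email of commitEmails) {
    if (email && !isNoReply(email)) unique.add(email.trim().toLowerCase());
  }
  return [...unique];
}

console.log(usableEmails([
  '12345+octocat@users.noreply.github.com',
  'Octo@Example.com',
  'octo@example.com',
]));
```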
Can I get a repo's contributors or stargazers? Yes — use repositoryContributors mode with peopleSource set to contributors, stargazers, or both. Each becomes a developer-lead record.
Can I export to Google Sheets, CSV, or Excel? Yes — one click in the dataset view, or automatically on every run via the Google Drive integration.
Is scraping GitHub legal? This actor reads publicly available data through GitHub's official API. You are responsible for using the data in compliance with applicable laws and GitHub's terms.
Need help?
Open an issue on the actor's Issues tab, or visit the Apify help center. Feature requests are welcome — this actor is actively maintained.