GitHub Scraper - Repos, Developers & Contact Leads

Pricing

from $4.00 / 1,000 repository records

Scrape GitHub via the official API: search repositories & developers, get full repo metadata, README, languages, topics, stars & activity, plus developer/org profiles and contributor & stargazer leads with emails. Developer lead-gen + monitoring. No browser.

Developer

Scrape Sage

Maintained by Community

GitHub Scraper — Repositories, Developers & Contact Leads

Extract complete data from GitHub using the official API — search repositories by keyword, language, stars and topic; search developers by location, language and followers; and pull full repo metadata, READMEs, language breakdowns, developer and organization profiles, and contributor & stargazer leads with recovered emails. Built for developer lead generation, recruiting, open-source intelligence and tech market research.

No login, no browser — fast extraction straight from api.github.com, with 99%+ reliability. Add a free GitHub token for high-volume runs.

Why this GitHub scraper?

Most GitHub scrapers do one thing — search repos, or scrape one profile. This actor combines repository intelligence and developer lead generation in a single tool, and ships the richest record in the category:

| Data | Typical scrapers | This actor |
| --- | --- | --- |
| Repo metadata (stars, forks, language, topics, license) | | ✅ |
| Repo activity (created / updated / pushed, active flag) + popularity score | partial | ✅ |
| README text + word count, language breakdown, latest release | | ✅ (opt-in) |
| Developer profiles (name, company, location, bio, followers, hireable) | partial | ✅ |
| Developer email (profile + recovered from public commits) | | ✅ the lead wedge |
| Website crawl for extra emails, phone & socials | | ✅ (opt-in) |
| Organization profiles & leads | | ✅ |
| Contributor and stargazer leads from any repo | partial | ✅ |
| Lead score (0–100) per developer | | ✅ |
| Search developers by location / language / followers | | ✅ |
| Monitor mode (only new repos / devs / stars) | | ✅ |

Use cases

  • Developer lead generation — find developers by location, language, topic or by who contributes to / stars a repo, then export them with email, company, blog and social links straight into your CRM. Perfect for DevTool, API and infrastructure companies selling to developers.
  • Technical recruiting — search active developers in a city using a given language, filter to hireableOnly, and get their contact details and top repositories in one run.
  • Open-source & competitive intelligence — track who is building (and starring) in a technology niche, the leading repos by stars and activity, and which companies are most active.
  • Market & trend research — map a topic (e.g. llm-agent, vector-database) across repos, languages and maintainers; build datasets for analysis or LLM training.
  • Ecosystem / DevRel outreach — pull a project's contributors and stargazers as a warm audience for community, sponsorship or partnership outreach.
  • Due diligence — assess a company's open-source footprint: repos, maintainers, activity and popularity.

How to use

  1. Sign up for Apify — the free plan is enough to try this actor.
  2. Open the GitHub Scraper, choose a mode, enter search queries / repos / usernames, and click Start.
  3. (Recommended) Paste a GitHub token in githubToken for 5,000 requests/hour instead of the unauthenticated ~60/hour.
  4. Watch records stream into the dataset table, then export as JSON, CSV, Excel, XML, or RSS — or pull them via the Apify API.

Input

{
  "mode": "searchUsers",
  "searchQueries": ["machine learning"],
  "userLocation": "Berlin",
  "userLanguage": "Python",
  "minFollowers": 100,
  "extractCommitEmails": true,
  "enrichContactEmails": true,
  "hireableOnly": false,
  "maxResults": 50,
  "githubToken": "ghp_xxx"
}

  • mode: searchRepositories (keyword + filters), searchUsers (find developers), repositoryDetails (full records for the names in repositories), userProfiles (developer leads from usernames), organizationProfiles (org leads from organizations), or repositoryContributors (contributor/stargazer leads from the repos in repositories).
  • searchQueries: keywords/phrases; each runs separately and combines with the filters.
  • repositories / usernames / organizations: inputs for the detail / profile / contributor modes ("facebook/react", "torvalds", "vercel").
  • peopleSource: contributors, stargazers, or both; used by repositoryContributors mode.
  • Repository filters: language, minStars / maxStars, topics, repoLicense, repoCreatedAfter, repoPushedAfter, includeForks, onlyActive.
  • Developer filters: userLocation, userLanguage, minFollowers, minRepos, userType, extraQualifiers (raw GitHub qualifiers).
  • sortBy / sortOrder: stars/forks/updated for repos, followers/repositories/joined for users, or best-match.
  • extractCommitEmails (default true): recover a developer's public commit email (the lead wedge); GitHub no-reply addresses are filtered out.
  • enrichContactEmails (default false): crawl the developer's/org's website for extra emails, phone and socials.
  • includeReadme / includeLanguages / includeLatestRelease / includeContributorsCount / includeOwnerProfile: extra repository detail (one additional request each).
  • includeUserRepos: attach a developer's top repos + derived top languages.
  • withEmailOnly / hireableOnly: output filters for lead lists.
  • monitorMode (default false): remember records from previous runs and emit only new ones. Pair with Schedules.
  • githubToken: optional but strongly recommended for speed and volume (5,000 req/hour).
  • maxResults / maxResultsPerQuery: limits on the total number of records and on the records per search query.
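
As an illustration of how the options above combine, here is a minimal sketch of a repository-search input built in JavaScript. The helper name buildInput is hypothetical and not part of the actor. The mode names come from the list above, while the value formats (an array of topic slugs for topics, an ISO date string for repoPushedAfter) and the default of 100 for maxResults are assumptions to check against the actor's input schema.

```javascript
// Modes documented in the parameter list above.
const MODES = [
  'searchRepositories',
  'searchUsers',
  'repositoryDetails',
  'userProfiles',
  'organizationProfiles',
  'repositoryContributors',
];

// Hypothetical helper (not part of the actor): rejects unknown modes early
// and merges the caller's options over an assumed default limit.
function buildInput(mode, options = {}) {
  if (!MODES.includes(mode)) {
    throw new Error(`Unknown mode: ${mode}`);
  }
  return { mode, maxResults: 100, ...options };
}

const repoSearchInput = buildInput('searchRepositories', {
  searchQueries: ['vector database'],
  language: 'Rust',
  minStars: 100,
  topics: ['vector-database'],   // assumed: array of topic slugs
  repoPushedAfter: '2025-01-01', // assumed: ISO date string
  onlyActive: true,
  includeReadme: true,
});

console.log(JSON.stringify(repoSearchInput, null, 2));
```

Validating the mode locally is only a convenience: it catches a typo before a run is started, and the actor's own input validation remains the authority.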

Output

A repository record (type: "repository"):

{
  "type": "repository",
  "fullName": "vercel/next.js",
  "name": "next.js",
  "ownerLogin": "vercel",
  "ownerType": "Organization",
  "description": "The React Framework",
  "homepage": "https://nextjs.org",
  "language": "JavaScript",
  "topics": ["react", "nextjs", "ssr", "vercel"],
  "stars": 128000,
  "forks": 27000,
  "openIssues": 2600,
  "license": { "key": "mit", "name": "MIT License", "spdxId": "MIT" },
  "isArchived": false,
  "defaultBranch": "canary",
  "createdAt": "2016-10-05T00:00:00.000Z",
  "pushedAt": "2026-06-15T00:00:00.000Z",
  "daysSinceLastPush": 0,
  "isActive": true,
  "languages": [{ "name": "JavaScript", "bytes": 4200000, "percent": 88.4 }],
  "latestRelease": { "tagName": "v15.0.0", "publishedAt": "2026-05-01T00:00:00.000Z" },
  "popularityScore": 96,
  "repoUrl": "https://github.com/vercel/next.js",
  "scrapedAt": "2026-06-15T12:00:00.000Z"
}
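
Repository records like the one above can be post-processed with a few lines of code. The following is a minimal sketch that keeps active, non-archived repositories and ranks them by popularityScore. The function name rankRepositories and the sample records are made up for illustration; only the field names come from the record shown above.

```javascript
// Keep active, non-archived repositories with at least `minStars` stars,
// ordered from highest to lowest popularityScore.
function rankRepositories(items, { minStars = 0 } = {}) {
  return items
    .filter((item) => item.type === 'repository')
    .filter((repo) => repo.isActive && !repo.isArchived && repo.stars >= minStars)
    .sort((a, b) => b.popularityScore - a.popularityScore)
    .map((repo) => ({
      fullName: repo.fullName,
      stars: repo.stars,
      score: repo.popularityScore,
    }));
}

// Illustrative records trimmed to the fields used here; the values are made up.
const sampleRepos = [
  { type: 'repository', fullName: 'example/alpha', stars: 1200, isActive: true, isArchived: false, popularityScore: 64 },
  { type: 'repository', fullName: 'example/beta', stars: 90, isActive: true, isArchived: false, popularityScore: 31 },
  { type: 'repository', fullName: 'example/gamma', stars: 5400, isActive: false, isArchived: true, popularityScore: 80 },
  { type: 'repository', fullName: 'example/delta', stars: 800, isActive: true, isArchived: false, popularityScore: 71 },
  { type: 'user', login: 'someone' },
];

console.log(rankRepositories(sampleRepos, { minStars: 100 }));
```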

A developer record (type: "user") — a ready-to-use lead:

{
  "type": "user",
  "login": "gaearon",
  "name": "dan",
  "company": "@bsky",
  "blog": "https://danabra.mov",
  "location": "London, UK",
  "bio": "i build user interfaces.",
  "hireable": null,
  "twitterUsername": "dan_abramov2",
  "publicRepos": 280,
  "followers": 92000,
  "emails": ["dan.abramov@example.com"],
  "primaryEmail": "dan.abramov@example.com",
  "emailSources": ["commits"],
  "websiteEmails": [],
  "socialLinks": { "twitter": "https://twitter.com/dan_abramov2" },
  "topRepos": [{ "name": "overreacted.io", "stars": 7000, "language": "JavaScript" }],
  "topLanguages": ["JavaScript", "TypeScript"],
  "sourceRepo": null,
  "sourceRole": null,
  "leadScore": 71,
  "userUrl": "https://github.com/gaearon",
  "scrapedAt": "2026-06-15T12:00:00.000Z"
}

Fields are null (or arrays empty) only when the data genuinely doesn't exist — never because the scraper skipped them.
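
Because missing values arrive as null rather than being omitted, a lead list can be filtered with plain truthiness checks. Here is a minimal sketch that selects developer records with an email and a minimum leadScore and renders them as CSV text. The function names, the threshold of 50 and the sample records are illustrative assumptions; only the field names come from the developer record above.

```javascript
// Quote a value for CSV: wrap it in double quotes and double any embedded quotes.
function csvCell(value) {
  const text = value === null || value === undefined ? '' : String(value);
  return `"${text.replace(/"/g, '""')}"`;
}

// Select user records that have a primaryEmail and meet the minimum leadScore,
// then render them as CSV text with a header row.
function leadsToCsv(items, minLeadScore = 50) {
  const columns = ['login', 'name', 'primaryEmail', 'company', 'location', 'leadScore'];
  const rows = items
    .filter((item) => item.type === 'user')
    .filter((user) => user.primaryEmail && user.leadScore >= minLeadScore)
    .map((user) => columns.map((col) => csvCell(user[col])).join(','));
  return [columns.join(','), ...rows].join('\n');
}

// Illustrative records; the values are made up.
const sampleUsers = [
  { type: 'user', login: 'dev-one', name: 'Dev "One"', primaryEmail: 'one@example.com', company: null, location: 'Berlin', leadScore: 71 },
  { type: 'user', login: 'dev-two', name: 'Dev Two', primaryEmail: null, company: 'Acme', location: 'Paris', leadScore: 90 },
  { type: 'user', login: 'dev-three', name: 'Dev Three', primaryEmail: 'three@example.com', company: 'Acme', location: 'Oslo', leadScore: 20 },
];

console.log(leadsToCsv(sampleUsers));
```

With the sample data only dev-one passes both filters: dev-two has no email and dev-three falls below the score threshold.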

Automate & schedule

Run this actor on autopilot and pull results into your own stack:

// Top-level await requires an ES module (for example a .mjs file).
import { ApifyClient } from 'apify-client';

// Authenticate with your Apify API token.
const client = new ApifyClient({ token: 'MY_APIFY_TOKEN' });

// Start the actor and wait for the run to finish.
const run = await client.actor('scrapesage/github-scraper').call({
    mode: 'repositoryContributors',
    repositories: ['langchain-ai/langchain'],
    peopleSource: 'contributors',
    extractCommitEmails: true,
    maxResults: 100,
    githubToken: 'ghp_xxx',
});

// Read the resulting records from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Got ${items.length} developer leads`);

Integrate with any app

Connect the dataset to 5,000+ apps — no code required:

  • Make — multi-step automation scenarios.
  • Zapier — push new developer leads straight into your CRM.
  • Slack — get notified when a monitored topic gets new repos.
  • Google Drive / Sheets — auto-export every run to a spreadsheet.
  • Airbyte — pipe results into your data warehouse.
  • GitHub — trigger runs from commits or releases.

Use with AI assistants (MCP)

The output is clean, LLM-ready JSON. Call this actor from Claude, ChatGPT, or any agent framework through the Apify MCP server — ask your assistant to "find the top Rust web-framework repos and the developers behind them" and let it run this scraper for you.

Tips

  • Add a token: paste a free GitHub token in githubToken for 5,000 requests/hour (vs ~60 unauthenticated). Read-only scope is enough.
  • Going past 1,000 results: GitHub search serves up to 1,000 results per query. To exhaust a big topic, window it with minStars/maxStars (e.g. 100–500, 500–2000, 2000+) or repoCreatedAfter dates.
  • Best email hit-rate: keep extractCommitEmails on and add enrichContactEmails to crawl personal sites. Many developers expose a real email in their public commits even when their profile email is hidden.
  • Monitoring: combine Schedules + monitorMode to capture only new repos/contributors/stargazers each run.
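
The windowing tip above can be sketched as a small helper that splits one large search into several star ranges, one actor input per range. The helper names are hypothetical, and the sketch assumes minStars and maxStars are inclusive bounds, which is why each window ends one star below the next boundary. A merge step that deduplicates by fullName is included in case the windows of separate runs overlap.

```javascript
// Turn ascending star boundaries into non-overlapping windows.
// Assumes inclusive bounds; the last window is open-ended (no maxStars).
function starWindows(boundaries) {
  return boundaries.map((min, i) => {
    const next = boundaries[i + 1];
    return next === undefined
      ? { minStars: min }
      : { minStars: min, maxStars: next - 1 };
  });
}

// One actor input per window; baseInput holds the shared query and filters.
function windowedInputs(baseInput, boundaries) {
  return starWindows(boundaries).map((win) => ({ ...baseInput, ...win }));
}

// Merge records from several runs, keeping the first record per fullName.
function dedupeByFullName(items) {
  const seen = new Map();
  for (const item of items) {
    if (!seen.has(item.fullName)) seen.set(item.fullName, item);
  }
  return [...seen.values()];
}

const inputs = windowedInputs(
  { mode: 'searchRepositories', searchQueries: ['llm-agent'], maxResults: 1000 },
  [100, 500, 2000],
);
console.log(inputs.map(({ minStars, maxStars }) => [minStars, maxStars]));
```

Each generated input can then be passed to a separate run, and the combined datasets fed through dedupeByFullName before analysis.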

FAQ

Do I need a GitHub API key? No, but it's strongly recommended. Without a token GitHub allows ~60 requests/hour per IP; with a free token you get 5,000/hour and faster, larger runs.
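
To put the two limits in perspective, here is a back-of-the-envelope sketch. The workload (500 repositories, one base request plus three opt-in detail requests each) is an assumption chosen for illustration; real request counts depend on the mode and options, and GitHub applies separate, lower limits to its search endpoints, so treat the result as a rough lower bound.

```javascript
// Hours needed to make `requests` API calls at a fixed hourly limit.
function hoursNeeded(requests, limitPerHour) {
  return requests / limitPerHour;
}

// Assumed workload: 500 repositories, 1 base request + 3 detail requests each.
const totalRequests = 500 * (1 + 3);

console.log(hoursNeeded(totalRequests, 60).toFixed(1));   // unauthenticated: "33.3"
console.log(hoursNeeded(totalRequests, 5000).toFixed(1)); // with a token: "0.4"
```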

How do I find developers to email? Use searchUsers with a userLocation/userLanguage/minFollowers filter, or repositoryContributors on a relevant repo. Keep extractCommitEmails on to recover commit emails, and add enrichContactEmails for website crawling. Filter with withEmailOnly.

Are the emails real? They come from public GitHub profiles and public commit metadata (which developers publish themselves). GitHub users.noreply.github.com privacy addresses are filtered out. Use the data in line with applicable laws and outreach regulations.
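
The no-reply filtering described above can be illustrated with a short sketch. This is not the actor's own code; the function names are hypothetical, and the check relies only on the users.noreply.github.com domain that GitHub uses for privacy addresses.

```javascript
// GitHub privacy addresses look like "username@users.noreply.github.com"
// or "12345+username@users.noreply.github.com".
function isNoReplyEmail(email) {
  return email.trim().toLowerCase().endsWith('@users.noreply.github.com');
}

// Drop no-reply addresses and case-insensitive duplicates, keeping first-seen order.
function usableEmails(emails) {
  const seen = new Set();
  const result = [];
  for (const email of emails) {
    const key = email.trim().toLowerCase();
    if (isNoReplyEmail(key) || seen.has(key)) continue;
    seen.add(key);
    result.push(email.trim());
  }
  return result;
}

console.log(usableEmails([
  'dev@example.com',
  '12345+dev@users.noreply.github.com',
  'Dev@Example.com',
]));
```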

Can I get a repo's contributors or stargazers? Yes — use repositoryContributors mode with peopleSource set to contributors, stargazers, or both. Each becomes a developer-lead record.

Can I export to Google Sheets, CSV, or Excel? Yes — one click in the dataset view, or automatically on every run via the Google Drive integration.

Is scraping GitHub legal? This actor reads publicly available data through GitHub's official API. You are responsible for using the data in compliance with applicable laws and GitHub's terms.

Need help?

Open an issue on the actor's Issues tab, or visit the Apify help center. Feature requests are welcome — this actor is actively maintained.