# GitHub Scraper - Repos, Developers & Contact Leads (`scrapesage/github-scraper`) Actor

Scrape GitHub via the official API: search repositories & developers, get full repo metadata, README, languages, topics, stars & activity, plus developer/org profiles and contributor & stargazer leads with emails. Developer lead-gen + monitoring. No browser.

- **URL**: https://apify.com/scrapesage/github-scraper.md
- **Developed by:** [Scrape Sage](https://apify.com/scrapesage) (community)
- **Categories:** Developer tools, Lead generation, AI
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

from $4.00 / 1,000 repository records

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## GitHub Scraper — Repositories, Developers & Contact Leads

Extract **complete data from GitHub** using the official API — search **repositories** by keyword, language, stars and topic; search **developers** by location, language and followers; and pull **full repo metadata, READMEs, language breakdowns, developer and organization profiles, and contributor & stargazer leads with recovered emails**. Built for developer **lead generation**, recruiting, open-source intelligence and tech market research.

No login, no browser — fast extraction straight from `api.github.com`, with 99%+ reliability. Add a free GitHub token for high-volume runs.

### Why this GitHub scraper?

Most GitHub scrapers do one thing — search repos, *or* scrape one profile. This actor combines **repository intelligence and developer lead generation in a single tool**, and ships the **richest record in the category**:

| Data | Typical scrapers | This actor |
|---|---|---|
| Repo metadata (stars, forks, language, topics, license) | ✅ | ✅ |
| Repo activity (created / updated / pushed, active flag) + popularity score | partial | ✅ |
| README text + word count, language breakdown, latest release | ❌ | ✅ (opt-in) |
| Developer profiles (name, company, location, bio, followers, hireable) | partial | ✅ |
| **Developer email** (profile + recovered from public commits) | ❌ | ✅ the lead wedge |
| Website crawl for extra emails, phone & socials | ❌ | ✅ (opt-in) |
| Organization profiles & leads | ❌ | ✅ |
| Contributor **and** stargazer leads from any repo | partial | ✅ |
| Lead score (0–100) per developer | ❌ | ✅ |
| Search developers by location / language / followers | ❌ | ✅ |
| Monitor mode (only new repos / devs / stars) | ❌ | ✅ |

### Use cases

- **Developer lead generation** — find developers by location, language, topic or by who contributes to / stars a repo, then export them with email, company, blog and social links straight into your CRM. Perfect for DevTool, API and infrastructure companies selling to developers.
- **Technical recruiting** — search active developers in a city using a given language, filter to `hireableOnly`, and get their contact details and top repositories in one run.
- **Open-source & competitive intelligence** — track who is building (and starring) in a technology niche, the leading repos by stars and activity, and which companies are most active.
- **Market & trend research** — map a topic (e.g. `llm-agent`, `vector-database`) across repos, languages and maintainers; build datasets for analysis or LLM training.
- **Ecosystem / DevRel outreach** — pull a project's contributors and stargazers as a warm audience for community, sponsorship or partnership outreach.
- **Due diligence** — assess a company's open-source footprint: repos, maintainers, activity and popularity.

### How to use

1. [Sign up for Apify](https://console.apify.com/sign-up) — the free plan is enough to try this actor.
2. Open the **GitHub Scraper**, choose a mode, enter search queries / repos / usernames, and click **Start**.
3. (Recommended) Paste a **GitHub token** in `githubToken` for 5,000 requests/hour instead of the unauthenticated ~60/hour.
4. Watch records stream into the dataset table, then **export** as JSON, CSV, Excel, XML, or RSS — or pull them via the [Apify API](https://docs.apify.com/api/v2).

### Input

```json
{
    "mode": "searchUsers",
    "searchQueries": ["machine learning"],
    "userLocation": "Berlin",
    "userLanguage": "Python",
    "minFollowers": 100,
    "extractCommitEmails": true,
    "enrichContactEmails": true,
    "hireableOnly": false,
    "maxResults": 50,
    "githubToken": "ghp_xxx"
}
````

- **mode** — `searchRepositories` (keyword + filters), `searchUsers` (find developers), `repositoryDetails` (full records for the names in `repositories`), `userProfiles` (developer leads from `usernames`), `organizationProfiles` (org leads from `organizations`), or `repositoryContributors` (contributor/stargazer leads from the repos in `repositories`).
- **searchQueries** — keywords/phrases; each runs separately and combines with the filters.
- **repositories / usernames / organizations** — inputs for the detail / profile / contributor modes (`"facebook/react"`, `"torvalds"`, `"vercel"`).
- **Repository filters** — `language`, `minStars` / `maxStars`, `topics`, `repoLicense`, `repoCreatedAfter`, `repoPushedAfter`, `includeForks`, `onlyActive`.
- **Developer filters** — `userLocation`, `userLanguage`, `minFollowers`, `minRepos`, `userType`, `extraQualifiers` (raw GitHub qualifiers).
- **sortBy / sortOrder** — `stars`/`forks`/`updated`/`help-wanted-issues` for repos, `followers`/`repositories`/`joined` for users, or `best-match` (GitHub's relevance ranking, for which `sortOrder` is ignored).
- **extractCommitEmails** *(default true)* — recover a developer's public commit email (the lead wedge); GitHub no-reply addresses are filtered out.
- **enrichContactEmails** *(default false)* — crawl the developer's/org's website for extra emails, phone and socials.
- **includeReadme / includeLanguages / includeLatestRelease / includeContributorsCount / includeOwnerProfile** — extra repository detail (one request each).
- **includeUserRepos** — attach a developer's top repos + derived top languages.
- **withEmailOnly / hireableOnly** — output filters for lead lists.
- **monitorMode** *(default false)* — remember records from previous runs and emit only **new** ones. Pair with [Schedules](https://docs.apify.com/platform/schedules).
- **githubToken** — optional but strongly recommended for speed and volume (5,000 req/hour).
- **maxResults / maxResultsPerQuery** — limits.
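
Since each mode reads a different primary input, a small client-side check can catch a mismatched input before a run is started. The sketch below is illustrative only: `build_input` and `REQUIRED_FIELD_BY_MODE` are hypothetical helpers, not part of the Actor, and the assumption that every mode strictly requires its primary list (including `searchQueries` for the search modes) is ours.

```python
# Primary input field each mode reads, per the option list above.
# Assumption: the field is treated as mandatory for its mode.
REQUIRED_FIELD_BY_MODE = {
    "searchRepositories": "searchQueries",
    "searchUsers": "searchQueries",
    "repositoryDetails": "repositories",
    "userProfiles": "usernames",
    "organizationProfiles": "organizations",
    "repositoryContributors": "repositories",
}


def build_input(mode, **options):
    """Return an input dict, raising early if the mode's primary field is missing."""
    if mode not in REQUIRED_FIELD_BY_MODE:
        raise ValueError(f"unknown mode: {mode!r}")
    required = REQUIRED_FIELD_BY_MODE[mode]
    if not options.get(required):
        raise ValueError(f"mode {mode!r} needs a non-empty {required!r} list")
    return {"mode": mode, **options}


run_input = build_input("userProfiles", usernames=["torvalds"], maxResults=10)
```

The resulting dictionary can be passed unchanged as the run input to any of the Apify clients.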

### Output

A repository record (`type: "repository"`):

```json
{
    "type": "repository",
    "fullName": "vercel/next.js",
    "name": "next.js",
    "ownerLogin": "vercel",
    "ownerType": "Organization",
    "description": "The React Framework",
    "homepage": "https://nextjs.org",
    "language": "JavaScript",
    "topics": ["react", "nextjs", "ssr", "vercel"],
    "stars": 128000,
    "forks": 27000,
    "openIssues": 2600,
    "license": { "key": "mit", "name": "MIT License", "spdxId": "MIT" },
    "isArchived": false,
    "defaultBranch": "canary",
    "createdAt": "2016-10-05T00:00:00.000Z",
    "pushedAt": "2026-06-15T00:00:00.000Z",
    "daysSinceLastPush": 0,
    "isActive": true,
    "languages": [{ "name": "JavaScript", "bytes": 4200000, "percent": 88.4 }],
    "latestRelease": { "tagName": "v15.0.0", "publishedAt": "2026-05-01T00:00:00.000Z" },
    "popularityScore": 96,
    "repoUrl": "https://github.com/vercel/next.js",
    "scrapedAt": "2026-06-15T12:00:00.000Z"
}
```

A developer record (`type: "user"`) — a ready-to-use lead:

```json
{
    "type": "user",
    "login": "gaearon",
    "name": "dan",
    "company": "@bsky",
    "blog": "https://danabra.mov",
    "location": "London, UK",
    "bio": "i build user interfaces.",
    "hireable": null,
    "twitterUsername": "dan_abramov2",
    "publicRepos": 280,
    "followers": 92000,
    "emails": ["dan.abramov@example.com"],
    "primaryEmail": "dan.abramov@example.com",
    "emailSources": ["commits"],
    "websiteEmails": [],
    "socialLinks": { "twitter": "https://twitter.com/dan_abramov2" },
    "topRepos": [{ "name": "overreacted.io", "stars": 7000, "language": "JavaScript" }],
    "topLanguages": ["JavaScript", "TypeScript"],
    "sourceRepo": null,
    "sourceRole": null,
    "leadScore": 71,
    "userUrl": "https://github.com/gaearon",
    "scrapedAt": "2026-06-15T12:00:00.000Z"
}
```

Fields are `null` (or arrays are empty) only when the data genuinely doesn't exist, never because the scraper skipped them.
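
Once the dataset items are downloaded, the `type`, `primaryEmail` and `leadScore` fields shown above are enough to build a prioritized lead list. A minimal sketch, assuming the items are plain dictionaries shaped like the examples; `top_leads` is a hypothetical helper name.

```python
def top_leads(items, min_score=0):
    """Keep user records that have a primary email, highest leadScore first."""
    leads = [
        item for item in items
        if item.get("type") == "user"
        and item.get("primaryEmail")
        and (item.get("leadScore") or 0) >= min_score
    ]
    return sorted(leads, key=lambda item: item.get("leadScore") or 0, reverse=True)
```

Records with a `null` email or score are skipped rather than raising, which matches the convention that missing data is reported as `null`.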

### Automate & schedule

Run this actor on autopilot and pull results into your own stack:

- **[Apify API](https://docs.apify.com/api/v2)** — start runs, fetch datasets, and manage schedules over REST.
- **[apify-client for JavaScript](https://docs.apify.com/api/client/js/)** and **[apify-client for Python](https://docs.apify.com/api/client/python/)** — official SDKs.
- **[Schedules](https://docs.apify.com/platform/schedules)** — run it daily/weekly with `monitorMode` to capture new repos in a topic, or new contributors/stargazers of a project.
- **[Webhooks](https://docs.apify.com/platform/integrations/webhooks)** — trigger downstream actions (CRM import, Slack alert) the moment a run finishes.

```js
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'MY_APIFY_TOKEN' });

const run = await client.actor('scrapesage/github-scraper').call({
    mode: 'repositoryContributors',
    repositories: ['langchain-ai/langchain'],
    peopleSource: 'contributors',
    extractCommitEmails: true,
    maxResults: 100,
    githubToken: 'ghp_xxx',
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Got ${items.length} developer leads`);
```

### Integrate with any app

Connect the dataset to 5,000+ apps — no code required:

- **[Make](https://docs.apify.com/platform/integrations/make)** — multi-step automation scenarios.
- **[Zapier](https://docs.apify.com/platform/integrations/zapier)** — push new developer leads straight into your CRM.
- **[Slack](https://docs.apify.com/platform/integrations/slack)** — get notified when a monitored topic gets new repos.
- **[Google Drive / Sheets](https://docs.apify.com/platform/integrations/drive)** — auto-export every run to a spreadsheet.
- **[Airbyte](https://docs.apify.com/platform/integrations/airbyte)** — pipe results into your data warehouse.
- **[GitHub](https://docs.apify.com/platform/integrations/github)** — trigger runs from commits or releases.

### Use with AI assistants (MCP)

The output is clean, LLM-ready JSON. Call this actor from Claude, ChatGPT, or any agent framework through the **[Apify MCP server](https://docs.apify.com/platform/integrations/mcp)** — ask your assistant to "find the top Rust web-framework repos and the developers behind them" and let it run this scraper for you.

### More scrapers from scrapesage

Build a full **developer, product & tech market-intelligence stack**:

- **[Product Hunt Scraper](https://apify.com/scrapesage/product-hunt-scraper)** — product launches and the makers behind them.
- **[Y Combinator Scraper](https://apify.com/scrapesage/ycombinator-scraper)** — startups, founders and jobs.
- **[Chrome Web Store Scraper](https://apify.com/scrapesage/chrome-web-store-scraper)** — extensions and developer leads.
- **[Google Play Scraper](https://apify.com/scrapesage/google-play-scraper)** — apps, reviews and developer leads.
- **[Apple App Store Scraper](https://apify.com/scrapesage/app-store-scraper)** — apps, reviews and charts.
- **[Steam Scraper](https://apify.com/scrapesage/steam-scraper)** — games, prices, reviews and charts.
- **[Levels.fyi Scraper](https://apify.com/scrapesage/levels-fyi-scraper)** — tech salaries and compensation by company and level.
- **[Google Patents Scraper](https://apify.com/scrapesage/google-patents-scraper)** — patents, citations and assignee intelligence.
- **[SEC EDGAR Scraper](https://apify.com/scrapesage/sec-edgar-scraper)** — filings, financials and company profiles.

### Tips

- **Add a token**: paste a free [GitHub token](https://github.com/settings/tokens) in `githubToken` for 5,000 requests/hour (vs ~60 unauthenticated). Read-only scope is enough.
- **Going past 1,000 results**: GitHub search serves up to 1,000 results per query. To exhaust a big topic, window it with `minStars`/`maxStars` (e.g. 100–500, 500–2000, 2000+) or `repoCreatedAfter` dates.
- **Best email hit-rate**: keep `extractCommitEmails` on and add `enrichContactEmails` to crawl personal sites. Many developers expose a real email in their public commits even when their profile email is hidden.
- **Monitoring**: combine [Schedules](https://docs.apify.com/platform/schedules) + `monitorMode` to capture only new repos/contributors/stargazers each run.
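
The star-window tip can be scripted by generating one input per window. The sketch below assumes `minStars` and `maxStars` are both inclusive (as the input schema describes them), so it produces non-overlapping windows such as 100 to 499 and 500 to 1999 to avoid duplicate records at the boundaries. `star_windows` and `windowed_inputs` are hypothetical helper names.

```python
def star_windows(boundaries):
    """Turn sorted boundaries like [100, 500, 2000] into (minStars, maxStars) pairs.

    The last window is open-ended, signalled by maxStars being None.
    """
    windows = [(low, high - 1) for low, high in zip(boundaries, boundaries[1:])]
    windows.append((boundaries[-1], None))
    return windows


def windowed_inputs(base_input, boundaries):
    """Copy base_input once per star window, setting minStars/maxStars."""
    inputs = []
    for low, high in star_windows(boundaries):
        run_input = {**base_input, "minStars": low}
        if high is not None:
            run_input["maxStars"] = high
        inputs.append(run_input)
    return inputs
```

Each generated input can then be run separately; if a single window still returns close to 1,000 results, it can be split again with finer boundaries.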

### FAQ

**Do I need a GitHub API key?** No, but it's strongly recommended. Without a token GitHub allows ~60 requests/hour per IP; with a free token you get 5,000/hour and faster, larger runs.
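
To get a feel for what the two rate limits mean in practice, the arithmetic can be sketched as below. It is a rough lower bound that assumes one base request per record plus one request per enabled extra option, and that the hourly limit is the only bottleneck; `estimated_hours` is a hypothetical helper, and search modes (which return many records per request) will need fewer requests than this.

```python
def estimated_hours(records, extra_requests_per_record=0, requests_per_hour=60):
    """Lower bound on run time if the hourly request limit is the only bottleneck."""
    total_requests = records * (1 + extra_requests_per_record)
    return total_requests / requests_per_hour


# 500 profiles with two extra options enabled: 1,500 requests in total.
without_token = estimated_hours(500, extra_requests_per_record=2, requests_per_hour=60)
with_token = estimated_hours(500, extra_requests_per_record=2, requests_per_hour=5000)
```

Under these assumptions the unauthenticated run is bounded by 25 hours of rate-limit budget, while the same run with a token fits in well under an hour.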

**How do I find developers to email?** Use `searchUsers` with a `userLocation`/`userLanguage`/`minFollowers` filter, or `repositoryContributors` on a relevant repo. Keep `extractCommitEmails` on to recover commit emails, and add `enrichContactEmails` for website crawling. Filter with `withEmailOnly`.

**Are the emails real?** They come from public GitHub profiles and public commit metadata (which developers publish themselves). GitHub `users.noreply.github.com` privacy addresses are filtered out. Use the data in line with applicable laws and outreach regulations.
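
If you merge emails from several runs or sources yourself, the same no-reply filtering can be applied downstream. This is a minimal sketch of the idea, not the Actor's own implementation; `is_noreply` and `usable_emails` are hypothetical helper names.

```python
def is_noreply(email):
    """True for GitHub privacy addresses such as 123+user@users.noreply.github.com."""
    domain = email.rsplit("@", 1)[-1].lower()
    return domain == "users.noreply.github.com"


def usable_emails(emails):
    """Drop no-reply addresses and case-insensitive duplicates, keeping first-seen order."""
    seen = set()
    kept = []
    for email in emails:
        cleaned = email.strip()
        key = cleaned.lower()
        if not key or is_noreply(key) or key in seen:
            continue
        seen.add(key)
        kept.append(cleaned)
    return kept
```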

**Can I get a repo's contributors or stargazers?** Yes — use `repositoryContributors` mode with `peopleSource` set to `contributors`, `stargazers`, or `both`. Each becomes a developer-lead record.

**Can I export to Google Sheets, CSV, or Excel?** Yes — one click in the dataset view, or automatically on every run via the [Google Drive integration](https://docs.apify.com/platform/integrations/drive).

**Is scraping GitHub legal?** This actor reads publicly available data through GitHub's official API. You are responsible for using the data in compliance with applicable laws and GitHub's terms.

### Need help?

Open an issue on the actor's **Issues** tab, or visit the [Apify help center](https://help.apify.com/). Feature requests are welcome — this actor is actively maintained.

# Actor input Schema

## `mode` (type: `string`):

What to scrape. searchRepositories = find repos by keyword/filters. searchUsers = find developers by location/language/followers. repositoryDetails = full records for specific repos. userProfiles = developer leads by username. organizationProfiles = org leads. repositoryContributors = contributor/stargazer leads from a repo. Start URLs override the mode.

## `searchQueries` (type: `array`):

Keywords / phrases for searchRepositories or searchUsers mode. Each query runs separately and combines with the filters below. Examples: "llm agent framework", "vector database", "react native".

## `repositories` (type: `array`):

Repository names ("owner/repo") or URLs. Primary input for repositoryDetails and repositoryContributors modes. Examples: "facebook/react", "https://github.com/vercel/next.js".

## `usernames` (type: `array`):

GitHub usernames for userProfiles mode. Examples: "torvalds", "gaearon", "sindresorhus".

## `organizations` (type: `array`):

GitHub organization logins for organizationProfiles mode. Examples: "vercel", "openai", "microsoft".

## `startUrls` (type: `array`):

GitHub URLs to scrape directly (auto-routed). Repo URLs (github.com/owner/repo), user URLs (github.com/torvalds) and org URLs (github.com/orgs/vercel) are all supported.
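
The three URL shapes listed above can be told apart by their path segments alone. The sketch below illustrates that idea; it is an assumption about how such routing could work, not the Actor's actual code, and `classify_github_url` is a hypothetical helper. Note that a bare `github.com/<name>` URL is classified as a user here, because telling users from organizations without the `/orgs/` prefix would require an API call.

```python
from urllib.parse import urlparse


def classify_github_url(url):
    """Return (kind, identifier) for repo, user and org URLs; raise on anything else."""
    parsed = urlparse(url if "://" in url else f"https://{url}")
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        raise ValueError(f"not a github.com URL: {url!r}")
    parts = [part for part in parsed.path.split("/") if part]
    if not parts:
        raise ValueError(f"no user, org or repo in URL: {url!r}")
    if parts[0] == "orgs":
        if len(parts) < 2:
            raise ValueError(f"missing organization name in URL: {url!r}")
        return ("organization", parts[1])
    if len(parts) == 1:
        return ("user", parts[0])
    # Anything deeper (issues, tree, blob, ...) still belongs to owner/repo.
    return ("repository", f"{parts[0]}/{parts[1]}")
```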

## `language` (type: `string`):

Repository language filter (searchRepositories). Examples: "JavaScript", "Python", "Rust", "Go".

## `minStars` (type: `integer`):

Only repositories with at least this many stars.

## `maxStars` (type: `integer`):

Only repositories with at most this many stars (combine with minStars to window large topics past the 1,000-result cap).

## `topics` (type: `array`):

Require these GitHub topics, e.g. "machine-learning", "hacktoberfest", "cli".

## `repoLicense` (type: `string`):

Require a license (SPDX-ish key), e.g. "mit", "apache-2.0", "gpl-3.0".

## `repoCreatedAfter` (type: `string`):

Only repositories created on/after this date.

## `repoPushedAfter` (type: `string`):

Only repositories with a commit pushed on/after this date — great for finding actively maintained projects.

## `includeForks` (type: `boolean`):

Include forked repositories in repository search results.

## `onlyActive` (type: `boolean`):

Exclude archived repositories.

## `userLocation` (type: `string`):

Location filter for searchUsers, e.g. "San Francisco", "Berlin", "London". Matches the free-text GitHub location field.

## `userLanguage` (type: `string`):

Primary language filter for searchUsers (their most-used repo language), e.g. "Python", "TypeScript".

## `minFollowers` (type: `integer`):

Only developers/orgs with at least this many followers.

## `minRepos` (type: `integer`):

Only developers/orgs with at least this many public repositories.

## `userType` (type: `string`):

Restrict searchUsers to individual developers or organizations.

## `extraQualifiers` (type: `string`):

Advanced: raw GitHub search qualifiers appended to the query verbatim, e.g. "sponsorable:true" or "created:>2023-01-01". See GitHub search syntax docs.
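
To see how keywords, filters and raw qualifiers could combine into a single GitHub search string, consider the sketch below. It uses standard GitHub search qualifiers (`language:`, `stars:`, `topic:`), but the mapping from this Actor's inputs to those qualifiers is an assumption, and `repo_search_query` is a hypothetical helper.

```python
def repo_search_query(keywords, language=None, min_stars=None, max_stars=None,
                      topics=(), extra_qualifiers=""):
    """Join keywords and filters into one GitHub repository search string."""
    parts = [keywords]
    if language:
        parts.append(f"language:{language}")
    if min_stars is not None and max_stars is not None:
        parts.append(f"stars:{min_stars}..{max_stars}")
    elif min_stars is not None:
        parts.append(f"stars:>={min_stars}")
    elif max_stars is not None:
        parts.append(f"stars:<={max_stars}")
    parts.extend(f"topic:{topic}" for topic in topics)
    if extra_qualifiers:
        # Appended verbatim, mirroring the description of extraQualifiers.
        parts.append(extra_qualifiers)
    return " ".join(parts)


query = repo_search_query(
    "vector database",
    language="Rust",
    min_stars=100,
    max_stars=499,
    topics=["cli"],
    extra_qualifiers="created:>2023-01-01",
)
```

Pasting the resulting string into GitHub's own search box is a quick way to preview what a given combination of filters will match.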

## `sortBy` (type: `string`):

Result ordering. Repos: stars / forks / updated / help-wanted-issues. Users: followers / repositories / joined. "best-match" is GitHub's relevance ranking.

## `sortOrder` (type: `string`):

Direction for the sort field above (ignored for best-match).

## `maxResults` (type: `integer`):

Total cap on records emitted across all queries/inputs in this run.

## `maxResultsPerQuery` (type: `integer`):

Optional cap per individual query (0 = no per-query cap). GitHub search serves up to 1,000 results per query — window with minStars/maxStars or dates to go deeper.

## `peopleSource` (type: `string`):

For repositoryContributors mode: pull a repo's contributors, its stargazers, or both — each as a developer-lead record.

## `extractCommitEmails` (type: `boolean`):

For developer/contributor records, read the user's public commit activity (and a repo's recent commits) to recover their public commit email — the core lead wedge. GitHub no-reply addresses are filtered out. One extra request per developer.

## `enrichContactEmails` (type: `boolean`):

Crawl each developer's/org's website (their GitHub 'blog' field) home + /contact + /about for extra emails, phone and social links. Opt-in (charged as contact enrichment).

## `includeUserRepos` (type: `boolean`):

For developer/org records, attach their top repositories (by stars) and derived top languages. One extra request per developer.

## `maxReposPerUser` (type: `integer`):

Maximum number of top repos attached per developer when `includeUserRepos` ('Include developer's top repos') is on.

## `includeReadme` (type: `boolean`):

For repository records, fetch and attach the README text (truncated to ~8,000 chars) + word count. One extra request per repo.

## `includeLanguages` (type: `boolean`):

For repository records, attach the full language breakdown with byte counts and percentages. One extra request per repo.
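
The `languages` array in the output pairs each language's byte count with its share of the total. How such percentages follow from raw byte counts can be sketched as below, assuming a mapping of language name to bytes as input and rounding to one decimal place as in the output example; `language_breakdown` is a hypothetical helper, not the Actor's code.

```python
def language_breakdown(byte_counts):
    """Convert {"Python": 750, ...} into rows with name, bytes and percent, largest first."""
    total = sum(byte_counts.values())
    if total == 0:
        return []
    rows = [
        {"name": name, "bytes": count, "percent": round(100 * count / total, 1)}
        for name, count in byte_counts.items()
    ]
    return sorted(rows, key=lambda row: row["bytes"], reverse=True)
```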

## `includeLatestRelease` (type: `boolean`):

For repository records, attach the latest release (tag, name, date). One extra request per repo.

## `includeContributorsCount` (type: `boolean`):

For repository records, compute the total contributor count. One extra request per repo.

## `includeOwnerProfile` (type: `boolean`):

For repository records, attach the owner's profile lead fields (name, company, blog, location, email, followers). One extra request per repo.

## `withEmailOnly` (type: `boolean`):

Emit only developer/org records for which at least one email was found.

## `hireableOnly` (type: `boolean`):

Emit only developers whose profile is flagged 'available for hire'. Great for technical recruiting.

## `monitorMode` (type: `boolean`):

Remember repos/developers/orgs seen in previous runs (in a named key-value store) and emit only NEW ones. Pair with Apify Schedules to watch a topic, company, or repo for new projects, contributors or stargazers. Does not conflict with scheduling.

## `monitorStoreName` (type: `string`):

Named key-value store that holds the 'already seen' keys for monitor mode. Use a distinct name per saved task so their histories stay separate.
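
Conceptually, monitor mode is a "seen keys" filter that persists between runs. The sketch below illustrates the idea with an in-memory set standing in for the named key-value store. The key format (`fullName` for repositories, `login` for users and organizations) is an assumption, and `record_key` and `emit_new` are hypothetical helpers rather than the Actor's implementation.

```python
def record_key(item):
    """Stable identity for a record; the choice of fields is an assumption."""
    if item.get("type") == "repository":
        return f"repository:{item['fullName']}"
    return f"{item.get('type')}:{item['login']}"


def emit_new(items, seen):
    """Return items whose key is not yet in `seen`, adding their keys as a side effect."""
    fresh = []
    for item in items:
        key = record_key(item)
        if key not in seen:
            seen.add(key)
            fresh.append(item)
    return fresh
```

Using a distinct store name per saved task corresponds to keeping a separate `seen` set per task, so one task's history never suppresses another's results.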

## `githubToken` (type: `string`):

A GitHub personal access token (classic or fine-grained, read-only is enough). Without it GitHub allows only ~60 requests/hour per IP; with it you get 5,000/hour and 30 searches/minute — much faster, larger runs. Create one at github.com/settings/tokens.

## `maxConcurrency` (type: `integer`):

Maximum API requests in parallel.

## `proxyConfiguration` (type: `object`):

Proxy settings. The GitHub API has no anti-bot protection; a proxy only serves to spread the per-IP rate limit across IPs when running without a token. The default Apify proxy is fine.

## Actor input object example

```json
{
  "mode": "searchRepositories",
  "searchQueries": [
    "llm agent framework"
  ],
  "includeForks": false,
  "onlyActive": false,
  "userType": "user",
  "sortBy": "best-match",
  "sortOrder": "desc",
  "maxResults": 50,
  "maxResultsPerQuery": 0,
  "peopleSource": "contributors",
  "extractCommitEmails": true,
  "enrichContactEmails": false,
  "includeUserRepos": false,
  "maxReposPerUser": 10,
  "includeReadme": false,
  "includeLanguages": false,
  "includeLatestRelease": false,
  "includeContributorsCount": false,
  "includeOwnerProfile": false,
  "withEmailOnly": false,
  "hireableOnly": false,
  "monitorMode": false,
  "monitorStoreName": "github-monitor",
  "maxConcurrency": 6,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```

# Actor output Schema

## `results` (type: `string`):

All scraped repository, developer and organization records as JSON items in the default dataset.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "searchQueries": [
        "llm agent framework"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("scrapesage/github-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "searchQueries": ["llm agent framework"] }

# Run the Actor and wait for it to finish
run = client.actor("scrapesage/github-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
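
If you would rather not install the client, the `run-sync-get-dataset-items` endpoint from the OpenAPI specification in this document can be called with the standard library alone. The sketch below only constructs the request and does not send it; `build_sync_request` is a hypothetical helper, and passing the token as a query parameter follows the specification's `token` parameter.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.apify.com/v2"


def build_sync_request(actor_input, token, actor_id="scrapesage~github-scraper"):
    """Build (but do not send) a POST request that runs the Actor synchronously."""
    query = urlencode({"token": token})
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items?{query}"
    body = json.dumps(actor_input).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


request = build_sync_request({"searchQueries": ["llm agent framework"]}, "<YOUR_API_TOKEN>")
```

Passing `request` to `urllib.request.urlopen` would start the run and block until the dataset items are returned, so long runs are better served by the client's asynchronous flow.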

## CLI example

```bash
echo '{
  "searchQueries": [
    "llm agent framework"
  ]
}' |
apify call scrapesage/github-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=scrapesage/github-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "GitHub Scraper - Repos, Developers & Contact Leads",
        "description": "Scrape GitHub via the official API: search repositories & developers, get full repo metadata, README, languages, topics, stars & activity, plus developer/org profiles and contributor & stargazer leads with emails. Developer lead-gen + monitoring. No browser.",
        "version": "0.1",
        "x-build-id": "i69p6UOdRUnUDd8ib"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/scrapesage~github-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-scrapesage-github-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/scrapesage~github-scraper/runs": {
            "post": {
                "operationId": "runs-sync-scrapesage-github-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/scrapesage~github-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-scrapesage-github-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "mode": {
                        "title": "Mode",
                        "enum": [
                            "searchRepositories",
                            "searchUsers",
                            "repositoryDetails",
                            "userProfiles",
                            "organizationProfiles",
                            "repositoryContributors"
                        ],
                        "type": "string",
                        "description": "What to scrape. searchRepositories = find repos by keyword/filters. searchUsers = find developers by location/language/followers. repositoryDetails = full records for specific repos. userProfiles = developer leads by username. organizationProfiles = org leads. repositoryContributors = contributor/stargazer leads from a repo. Start URLs override the mode.",
                        "default": "searchRepositories"
                    },
                    "searchQueries": {
                        "title": "Search queries",
                        "type": "array",
                        "description": "Keywords / phrases for searchRepositories or searchUsers mode. Each query runs separately and combines with the filters below. Examples: \"llm agent framework\", \"vector database\", \"react native\".",
                        "items": {
                            "type": "string"
                        }
                    },
                    "repositories": {
                        "title": "Repositories",
                        "type": "array",
                        "description": "Repository names (\"owner/repo\") or URLs. Primary input for repositoryDetails and repositoryContributors modes. Examples: \"facebook/react\", \"https://github.com/vercel/next.js\".",
                        "items": {
                            "type": "string"
                        }
                    },
                    "usernames": {
                        "title": "Usernames (developers)",
                        "type": "array",
                        "description": "GitHub usernames for userProfiles mode. Examples: \"torvalds\", \"gaearon\", \"sindresorhus\".",
                        "items": {
                            "type": "string"
                        }
                    },
                    "organizations": {
                        "title": "Organizations",
                        "type": "array",
                        "description": "GitHub organization logins for organizationProfiles mode. Examples: \"vercel\", \"openai\", \"microsoft\".",
                        "items": {
                            "type": "string"
                        }
                    },
                    "startUrls": {
                        "title": "Start URLs",
                        "type": "array",
                        "description": "GitHub URLs to scrape directly (auto-routed). Repo URLs (github.com/owner/repo), user URLs (github.com/torvalds) and org URLs (github.com/orgs/vercel) are all supported.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "language": {
                        "title": "Programming language",
                        "type": "string",
                        "description": "Repository language filter (searchRepositories). Examples: \"JavaScript\", \"Python\", \"Rust\", \"Go\"."
                    },
                    "minStars": {
                        "title": "Minimum stars",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Only repositories with at least this many stars."
                    },
                    "maxStars": {
                        "title": "Maximum stars",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Only repositories with at most this many stars (combine with minStars to window large topics past the 1,000-result cap)."
                    },
                    "topics": {
                        "title": "Topics",
                        "type": "array",
                        "description": "Require these GitHub topics, e.g. \"machine-learning\", \"hacktoberfest\", \"cli\".",
                        "items": {
                            "type": "string"
                        }
                    },
                    "repoLicense": {
                        "title": "License",
                        "type": "string",
                        "description": "Require a license (SPDX-ish key), e.g. \"mit\", \"apache-2.0\", \"gpl-3.0\"."
                    },
                    "repoCreatedAfter": {
                        "title": "Created after (YYYY-MM-DD)",
                        "type": "string",
                        "description": "Only repositories created on/after this date."
                    },
                    "repoPushedAfter": {
                        "title": "Pushed after (YYYY-MM-DD)",
                        "type": "string",
                        "description": "Only repositories with a commit pushed on/after this date — great for finding actively maintained projects."
                    },
                    "includeForks": {
                        "title": "Include forks",
                        "type": "boolean",
                        "description": "Include forked repositories in repository search results.",
                        "default": false
                    },
                    "onlyActive": {
                        "title": "Only active (not archived)",
                        "type": "boolean",
                        "description": "Exclude archived repositories.",
                        "default": false
                    },
                    "userLocation": {
                        "title": "Developer location",
                        "type": "string",
                        "description": "Location filter for searchUsers, e.g. \"San Francisco\", \"Berlin\", \"London\". Matches the free-text GitHub location field."
                    },
                    "userLanguage": {
                        "title": "Developer language",
                        "type": "string",
                        "description": "Primary language filter for searchUsers (their most-used repo language), e.g. \"Python\", \"TypeScript\"."
                    },
                    "minFollowers": {
                        "title": "Minimum followers",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Only developers/orgs with at least this many followers."
                    },
                    "minRepos": {
                        "title": "Minimum public repos",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Only developers/orgs with at least this many public repositories."
                    },
                    "userType": {
                        "title": "Account type",
                        "enum": [
                            "user",
                            "org",
                            "any"
                        ],
                        "type": "string",
                        "description": "Restrict searchUsers to individual developers or organizations.",
                        "default": "user"
                    },
                    "extraQualifiers": {
                        "title": "Extra search qualifiers",
                        "type": "string",
                        "description": "Advanced: raw GitHub search qualifiers appended to the query verbatim, e.g. \"sponsorable:true\" or \"created:>2023-01-01\". See GitHub search syntax docs."
                    },
                    "sortBy": {
                        "title": "Sort by",
                        "enum": [
                            "best-match",
                            "stars",
                            "forks",
                            "updated",
                            "help-wanted-issues",
                            "followers",
                            "repositories",
                            "joined"
                        ],
                        "type": "string",
                        "description": "Result ordering. Repos: stars / forks / updated / help-wanted-issues. Users: followers / repositories / joined. \"best-match\" is GitHub's relevance ranking.",
                        "default": "best-match"
                    },
                    "sortOrder": {
                        "title": "Sort order",
                        "enum": [
                            "desc",
                            "asc"
                        ],
                        "type": "string",
                        "description": "Direction for the sort field above (ignored for best-match).",
                        "default": "desc"
                    },
                    "maxResults": {
                        "title": "Max records (total)",
                        "minimum": 1,
                        "maximum": 100000,
                        "type": "integer",
                        "description": "Total cap on records emitted across all queries/inputs in this run.",
                        "default": 50
                    },
                    "maxResultsPerQuery": {
                        "title": "Max records per query",
                        "minimum": 0,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Optional cap per individual query (0 = no per-query cap). GitHub search serves up to 1,000 results per query — window with minStars/maxStars or dates to go deeper.",
                        "default": 0
                    },
                    "peopleSource": {
                        "title": "People to extract (contributors mode)",
                        "enum": [
                            "contributors",
                            "stargazers",
                            "both"
                        ],
                        "type": "string",
                        "description": "For repositoryContributors mode: pull a repo's contributors, its stargazers, or both — each as a developer-lead record.",
                        "default": "contributors"
                    },
                    "extractCommitEmails": {
                        "title": "Extract developer emails from commits",
                        "type": "boolean",
                        "description": "For developer/contributor records, read the user's public commit activity (and a repo's recent commits) to recover their public commit email — the core lead wedge. GitHub no-reply addresses are filtered out. One extra request per developer.",
                        "default": true
                    },
                    "enrichContactEmails": {
                        "title": "Enrich from personal website",
                        "type": "boolean",
                        "description": "Crawl each developer's/org's website (their GitHub 'blog' field) home + /contact + /about for extra emails, phone and social links. Opt-in (charged as contact enrichment).",
                        "default": false
                    },
                    "includeUserRepos": {
                        "title": "Include developer's top repos",
                        "type": "boolean",
                        "description": "For developer/org records, attach their top repositories (by stars) and derived top languages. One extra request per developer.",
                        "default": false
                    },
                    "maxReposPerUser": {
                        "title": "Max repos per developer",
                        "minimum": 0,
                        "maximum": 100,
                        "type": "integer",
                        "description": "Cap on a developer's top repos attached when 'Include developer's top repos' is on.",
                        "default": 10
                    },
                    "includeReadme": {
                        "title": "Include README",
                        "type": "boolean",
                        "description": "For repository records, fetch and attach the README text (truncated to ~8,000 chars) + word count. One extra request per repo.",
                        "default": false
                    },
                    "includeLanguages": {
                        "title": "Include language breakdown",
                        "type": "boolean",
                        "description": "For repository records, attach the full language breakdown with byte counts and percentages. One extra request per repo.",
                        "default": false
                    },
                    "includeLatestRelease": {
                        "title": "Include latest release",
                        "type": "boolean",
                        "description": "For repository records, attach the latest release (tag, name, date). One extra request per repo.",
                        "default": false
                    },
                    "includeContributorsCount": {
                        "title": "Include contributors count",
                        "type": "boolean",
                        "description": "For repository records, compute the total contributor count. One extra request per repo.",
                        "default": false
                    },
                    "includeOwnerProfile": {
                        "title": "Include repo owner profile",
                        "type": "boolean",
                        "description": "For repository records, attach the owner's profile lead fields (name, company, blog, location, email, followers). One extra request per repo.",
                        "default": false
                    },
                    "withEmailOnly": {
                        "title": "Only records with an email",
                        "type": "boolean",
                        "description": "Emit only developer/org records for which at least one email was found.",
                        "default": false
                    },
                    "hireableOnly": {
                        "title": "Only hireable developers",
                        "type": "boolean",
                        "description": "Emit only developers whose profile is flagged 'available for hire'. Great for technical recruiting.",
                        "default": false
                    },
                    "monitorMode": {
                        "title": "Monitor mode (only new records)",
                        "type": "boolean",
                        "description": "Remember repos/developers/orgs seen in previous runs (in a named key-value store) and emit only NEW ones. Pair with Apify Schedules to watch a topic, company, or repo for new projects, contributors or stargazers. Does not conflict with scheduling.",
                        "default": false
                    },
                    "monitorStoreName": {
                        "title": "Monitor store name",
                        "type": "string",
                        "description": "Named key-value store that holds the 'already seen' keys for monitor mode. Use a distinct name per saved task so their histories stay separate.",
                        "default": "github-monitor"
                    },
                    "githubToken": {
                        "title": "GitHub token (optional, strongly recommended)",
                        "type": "string",
                        "description": "A GitHub personal access token (classic or fine-grained, read-only is enough). Without it GitHub allows only ~60 requests/hour per IP; with it you get 5,000/hour and 30 searches/minute — much faster, larger runs. Create one at github.com/settings/tokens."
                    },
                    "maxConcurrency": {
                        "title": "Max concurrency",
                        "minimum": 1,
                        "maximum": 16,
                        "type": "integer",
                        "description": "Maximum API requests in parallel.",
                        "default": 6
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Proxy settings. The GitHub API has no anti-bot; a proxy is used to spread the per-IP rate limit across IPs when running without a token. The default Apify proxy is fine.",
                        "default": {
                            "useApifyProxy": true
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
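
The `inputSchema` above declares enum values and numeric ranges for several fields. As a minimal sketch, the Python below checks a run input against those constraints before it is sent anywhere. The helper name `check_input` and the `ENUMS` / `RANGES` tables are hypothetical and not part of the Actor; only the allowed values and bounds are copied from the schema, and the platform may validate input differently.

```python
from typing import Any, Optional

# Enum values copied from the inputSchema above.
ENUMS: dict[str, set[str]] = {
    "mode": {
        "searchRepositories", "searchUsers", "repositoryDetails",
        "userProfiles", "organizationProfiles", "repositoryContributors",
    },
    "userType": {"user", "org", "any"},
    "sortBy": {
        "best-match", "stars", "forks", "updated",
        "help-wanted-issues", "followers", "repositories", "joined",
    },
    "sortOrder": {"desc", "asc"},
    "peopleSource": {"contributors", "stargazers", "both"},
}

# (minimum, maximum) pairs copied from the inputSchema; None means no upper bound.
RANGES: dict[str, tuple[int, Optional[int]]] = {
    "minStars": (0, None),
    "maxStars": (0, None),
    "minFollowers": (0, None),
    "minRepos": (0, None),
    "maxResults": (1, 100000),
    "maxResultsPerQuery": (0, 1000),
    "maxReposPerUser": (0, 100),
    "maxConcurrency": (1, 16),
}


def check_input(run_input: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems: list[str] = []
    for key, allowed in ENUMS.items():
        if key in run_input and run_input[key] not in allowed:
            problems.append(f"{key}: {run_input[key]!r} is not one of {sorted(allowed)}")
    for key, (low, high) in RANGES.items():
        if key not in run_input:
            continue
        value = run_input[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            problems.append(f"{key}: expected an integer")
            continue
        if value < low or (high is not None and value > high):
            problems.append(f"{key}: {value} is outside the allowed range")
    return problems
```

For example, `check_input({"mode": "searchRepositories", "maxResults": 50})` returns an empty list, while an input with `"maxConcurrency": 32` yields one problem because the schema caps that field at 16.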

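The `maxStars` and `maxResultsPerQuery` descriptions suggest windowing by star count to get past GitHub's 1,000-result search cap. One possible way to prepare such windows is sketched below, assuming both star bounds are inclusive as their descriptions ("at least", "at most") imply. The functions `star_windows` and `windowed_inputs` and the chosen boundaries are illustrative assumptions, not part of the Actor.

```python
from typing import Any


def star_windows(bounds: list[int]) -> list[dict[str, int]]:
    """Turn ascending star boundaries into non-overlapping minStars/maxStars windows.

    The last window has no maxStars, so it covers everything from the final
    boundary upward.
    """
    if not bounds:
        raise ValueError("at least one boundary is required")
    if any(a >= b for a, b in zip(bounds, bounds[1:])):
        raise ValueError("boundaries must be strictly ascending")
    windows: list[dict[str, int]] = []
    for low, high in zip(bounds, bounds[1:]):
        # Both bounds are inclusive, so stop one star short of the next window.
        windows.append({"minStars": low, "maxStars": high - 1})
    windows.append({"minStars": bounds[-1]})
    return windows


def windowed_inputs(query: str, bounds: list[int]) -> list[dict[str, Any]]:
    """Build one Actor input per star window for a single search query."""
    return [
        {
            "mode": "searchRepositories",
            "searchQueries": [query],
            # Raise the total cap too; its schema default is 50.
            "maxResults": 1000,
            "maxResultsPerQuery": 1000,
            **window,
        }
        for window in star_windows(bounds)
    ]
```

Each returned dictionary can be passed as the input of a separate run. A window that still matches more than 1,000 repositories would need narrower boundaries or an additional date filter such as `repoCreatedAfter`.
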