# GitHub Repository & Issue Scraper (`automly/github-repo-scraper`) Actor

Extract repository metadata, issues, pull requests, and contributor profiles from GitHub using the official REST API. Perfect for developer lead generation, competitive analysis, and open-source research.

- **URL**: https://apify.com/automly/github-repo-scraper.md
- **Developed by:** [Automly](https://apify.com/automly) (community)
- **Categories:** Developer tools, Lead generation
- **Stats:** 2 total users, 1 monthly user, 0.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor itself is free to use; you pay only for Apify platform usage, which gets cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## GitHub Repository & Issue Scraper

Extract repository metadata, issues, pull requests, and contributor profiles from GitHub using the official REST API. This Actor is ideal for developer lead generation, competitive open-source analysis, building talent pipelines, and monitoring repository health.

### Why use this Actor?

- **No scraping complexity** — Uses the official GitHub REST API for reliable, structured data.
- **Developer lead generation** — Find repositories by language, stars, or topic and extract contributor contact details.
- **Competitive research** — Track open issues and pull requests across competitor projects.
- **Talent sourcing** — Extract contributor profiles with public emails, company, and location.
- **RAG & AI pipelines** — Feed repository descriptions, issues, and documentation into vector databases.

### Features

- Search repositories by GitHub query syntax (language, stars, topics, etc.)
- Scrape specific repositories by URL or `owner/repo` string
- Extract open issues with labels, comments, and author details
- Extract open pull requests with merge and draft status
- Extract contributor profiles with email, company, location, and bio
- Configurable per-repository limits to control run scope
- Optional GitHub token for 5000 requests/hour (vs 60/hour unauthenticated)

### Input

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| searchQuery | string | — | GitHub search query, e.g. `language:python stars:>1000`. Used when `repoUrls` is empty |
| repoUrls | array | — | List of repository URLs or `owner/repo` strings |
| extractIssues | boolean | false | Extract open issues per repository |
| extractPullRequests | boolean | false | Extract open pull requests per repository |
| extractUsers | boolean | false | Extract contributor profiles per repository |
| maxResults | integer | 100 | Maximum total records to return (1–1000) |
| githubToken | string | — | GitHub personal access token for higher rate limits |
| maxIssuesPerRepo | integer | 30 | Max issues per repository (0–100) |
| maxPullRequestsPerRepo | integer | 30 | Max pull requests per repository (0–100) |
| maxUsersPerRepo | integer | 30 | Max contributors per repository (0–100) |

#### Example input

```json
{
  "searchQuery": "language:typescript stars:>5000",
  "extractIssues": true,
  "extractUsers": true,
  "maxResults": 50,
  "maxIssuesPerRepo": 10,
  "maxUsersPerRepo": 10
}
````
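
The numeric limits above can be checked on the client before a run is started. The following is a minimal sketch and not part of the Actor: the helper name `validate_input` is hypothetical, and the bounds (1–1000 for `maxResults`, 0–100 for the per-repository limits) are copied from the Actor's input schema in the OpenAPI specification.

```python
# Hypothetical client-side check; bounds copied from the Actor's input schema.
BOUNDS = {
    "maxResults": (1, 1000),
    "maxIssuesPerRepo": (0, 100),
    "maxPullRequestsPerRepo": (0, 100),
    "maxUsersPerRepo": (0, 100),
}


def validate_input(actor_input: dict) -> list[str]:
    """Return a list of readable problems; an empty list means the input passed."""
    problems = []
    for field, (low, high) in BOUNDS.items():
        if field not in actor_input:
            continue  # omitted fields fall back to the documented defaults
        value = actor_input[field]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append(f"{field} must be an integer")
        elif not low <= value <= high:
            problems.append(f"{field} must be between {low} and {high}, got {value}")
    return problems


example = {
    "searchQuery": "language:typescript stars:>5000",
    "extractIssues": True,
    "extractUsers": True,
    "maxResults": 50,
    "maxIssuesPerRepo": 10,
    "maxUsersPerRepo": 10,
}
print(validate_input(example))
print(validate_input({"maxResults": 5000}))
```

The first call prints an empty list because the example input is within bounds; the second reports that `maxResults` exceeds 1000.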

### Output

Each record includes a `type` field to distinguish entities.
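
Because all record types share one dataset, a consumer will usually split the items by `type` first. The sketch below is illustrative only: `group_by_type` is a hypothetical helper, and the sample records are hand-written placeholders (not real Actor output) that borrow field names from the tables in this section. The exact format of the `repository` value is an assumption.

```python
from collections import defaultdict


def group_by_type(items):
    """Split a mixed list of dataset records into one list per `type` value."""
    groups = defaultdict(list)
    for item in items:
        # Records without a `type` field are kept under "unknown" rather than dropped.
        groups[item.get("type", "unknown")].append(item)
    return dict(groups)


# Hand-written placeholder records, not real output.
sample = [
    {"type": "repository", "fullName": "octo-org/example", "stars": 1},
    {"type": "issue", "repository": "octo-org/example", "number": 1},
    {"type": "user", "repository": "octo-org/example", "username": "someone", "email": None},
]

groups = group_by_type(sample)
print({name: len(records) for name, records in groups.items()})
```

The same function can be applied to the `items` list returned by the dataset client calls shown in the API section.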

#### Repository

| Field | Type | Description |
|-------|------|-------------|
| type | string | `repository` |
| url | string | GitHub repository URL |
| owner | string | Repository owner |
| name | string | Repository name |
| fullName | string | `owner/name` |
| description | string | Repository description |
| stars | integer | Stargazer count |
| forks | integer | Fork count |
| openIssues | integer | Open issue count |
| language | string | Primary language |
| license | string | SPDX license identifier |
| createdAt | string | ISO 8601 creation timestamp |
| updatedAt | string | ISO 8601 update timestamp |
| topics | array | Repository topics |

#### Issue

| Field | Type | Description |
|-------|------|-------------|
| type | string | `issue` |
| repository | string | Parent repository |
| url | string | Issue URL |
| number | integer | Issue number |
| title | string | Issue title |
| state | string | Issue state |
| author | string | Author username |
| labels | array | Label names |
| createdAt | string | ISO 8601 timestamp |
| updatedAt | string | ISO 8601 timestamp |
| comments | integer | Comment count |

#### Pull Request

| Field | Type | Description |
|-------|------|-------------|
| type | string | `pullRequest` |
| repository | string | Parent repository |
| url | string | PR URL |
| number | integer | PR number |
| title | string | PR title |
| state | string | PR state |
| author | string | Author username |
| createdAt | string | ISO 8601 timestamp |
| updatedAt | string | ISO 8601 timestamp |
| merged | boolean | Merged status |
| draft | boolean | Draft status |

#### User

| Field | Type | Description |
|-------|------|-------------|
| type | string | `user` |
| repository | string | Source repository |
| url | string | Profile URL |
| username | string | GitHub username |
| name | string | Display name |
| company | string | Company |
| blog | string | Blog URL |
| location | string | Location |
| email | string | Public email |
| bio | string | Bio |
| publicRepos | integer | Public repository count |
| followers | integer | Follower count |
| following | integer | Following count |
| createdAt | string | ISO 8601 timestamp |

### Limits and caveats

- Unauthenticated requests are limited to **60 per hour**. Provide a `githubToken` for **5000 per hour**.
- GitHub Search API returns up to **1000 results** per query.
- Only **public repositories** are accessible without additional permissions.
- User emails are only returned if the user has chosen to make them public.
- The Actor treats `maxResults` as a hard cap on the total number of records across all entity types.
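
To judge whether a run needs a token, the two hourly quotas above can be turned into a rough budget. This is a sketch under stated assumptions: how many GitHub API requests the Actor issues per record is not documented here, so the request count is an estimate you supply, and `hours_of_quota` is a hypothetical helper.

```python
import math

# Hourly request quotas stated in this README.
UNAUTHENTICATED_PER_HOUR = 60
AUTHENTICATED_PER_HOUR = 5000


def hours_of_quota(request_count: int, has_token: bool) -> int:
    """Return how many hourly quota windows `request_count` requests would span."""
    per_hour = AUTHENTICATED_PER_HOUR if has_token else UNAUTHENTICATED_PER_HOUR
    return math.ceil(request_count / per_hour)


# Assumption for illustration only: a run that issues 300 GitHub API requests.
print(hours_of_quota(300, has_token=False))  # 5
print(hours_of_quota(300, has_token=True))   # 1
```

A result greater than 1 means the run would exhaust the quota before finishing, which is the case where providing `githubToken` matters.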

### Pricing

This Actor uses **Pay Per Event** pricing. You are charged only for successfully extracted data.

| Event | Price | Description |
|-------|-------|-------------|
| **Repository scraped** | $0.005 | Each repository successfully extracted |
| **Issue scraped** | $0.002 | Each issue successfully extracted |
| **Pull request scraped** | $0.002 | Each pull request successfully extracted |
| **User scraped** | $0.005 | Each contributor profile successfully extracted |

Tiered discounts apply based on your Apify subscription level. A small Actor start fee may also apply.
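
As a worked example of the per-event prices, the sketch below multiplies each price by a record count and sums the result. It ignores tiered discounts and any start fee, and the record counts are hypothetical. Keying the prices by the record `type` values is a choice made for this example, not something the Actor prescribes.

```python
from decimal import Decimal

# Per-event prices in USD, copied from the pricing table.
PRICES = {
    "repository": Decimal("0.005"),
    "issue": Decimal("0.002"),
    "pullRequest": Decimal("0.002"),
    "user": Decimal("0.005"),
}


def estimate_cost(counts: dict) -> Decimal:
    """Sum price * count over record types; discounts and start fees are ignored."""
    return sum((PRICES[kind] * n for kind, n in counts.items()), Decimal("0"))


# Hypothetical run: 10 repositories, 20 issues, and 20 contributor profiles.
print(estimate_cost({"repository": 10, "issue": 20, "user": 20}))
```

For these counts the estimate is 10 × 0.005 + 20 × 0.002 + 20 × 0.005 = 0.19 USD.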

### FAQ

**Do I need a GitHub account?**
No, but providing a GitHub personal access token dramatically increases your rate limit from 60 to 5000 requests per hour.

**Can I scrape private repositories?**
No. This Actor only accesses public data available through the GitHub REST API.

**What happens if I hit the rate limit?**
The Actor will log a warning and stop gracefully. Provide a token to avoid this.

**Is the data real-time?**
Data reflects the current state of GitHub at the time of the run.

# Actor input Schema

## `searchQuery` (type: `string`):

GitHub search query (e.g., 'language:python stars:>1000'). Used when repoUrls is empty.

## `repoUrls` (type: `array`):

List of GitHub repository URLs or owner/repo strings to scrape directly.

## `extractIssues` (type: `boolean`):

Whether to extract open issues for each repository.

## `extractPullRequests` (type: `boolean`):

Whether to extract open pull requests for each repository.

## `extractUsers` (type: `boolean`):

Whether to extract contributor user profiles for each repository.

## `maxResults` (type: `integer`):

Maximum total results to return across all entity types.

## `githubToken` (type: `string`):

GitHub personal access token for higher rate limits (5000 req/hr vs 60 req/hr).

## `maxIssuesPerRepo` (type: `integer`):

Maximum issues to extract per repository.

## `maxPullRequestsPerRepo` (type: `integer`):

Maximum pull requests to extract per repository.

## `maxUsersPerRepo` (type: `integer`):

Maximum contributors to extract per repository.

## Actor input object example

```json
{
  "extractIssues": false,
  "extractPullRequests": false,
  "extractUsers": false,
  "maxResults": 100,
  "maxIssuesPerRepo": 30,
  "maxPullRequestsPerRepo": 30,
  "maxUsersPerRepo": 30
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

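// Prepare Actor input (values taken from the example input in the README)
const input = {
    searchQuery: 'language:typescript stars:>5000',
    extractIssues: true,
    extractUsers: true,
    maxResults: 50,
    maxIssuesPerRepo: 10,
    maxUsersPerRepo: 10,
};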

// Run the Actor and wait for it to finish
const run = await client.actor("automly/github-repo-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

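# Prepare the Actor input (values taken from the example input in the README)
run_input = {
    "searchQuery": "language:typescript stars:>5000",
    "extractIssues": True,
    "extractUsers": True,
    "maxResults": 50,
    "maxIssuesPerRepo": 10,
    "maxUsersPerRepo": 10,
}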

# Run the Actor and wait for it to finish
run = client.actor("automly/github-repo-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{}' |
apify call automly/github-repo-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=automly/github-repo-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "GitHub Repository & Issue Scraper",
        "description": "Extract repository metadata, issues, pull requests, and contributor profiles from GitHub using the official REST API. Perfect for developer lead generation, competitive analysis, and open-source research.",
        "version": "1.0",
        "x-build-id": "tlo6jEwLsTiY0295l"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/automly~github-repo-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-automly-github-repo-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/automly~github-repo-scraper/runs": {
            "post": {
                "operationId": "runs-sync-automly-github-repo-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/automly~github-repo-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-automly-github-repo-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "searchQuery": {
                        "title": "Search Query",
                        "type": "string",
                        "description": "GitHub search query (e.g., 'language:python stars:>1000'). Used when repoUrls is empty."
                    },
                    "repoUrls": {
                        "title": "Repository URLs",
                        "type": "array",
                        "description": "List of GitHub repository URLs or owner/repo strings to scrape directly.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "extractIssues": {
                        "title": "Extract Issues",
                        "type": "boolean",
                        "description": "Whether to extract open issues for each repository.",
                        "default": false
                    },
                    "extractPullRequests": {
                        "title": "Extract Pull Requests",
                        "type": "boolean",
                        "description": "Whether to extract open pull requests for each repository.",
                        "default": false
                    },
                    "extractUsers": {
                        "title": "Extract Contributors",
                        "type": "boolean",
                        "description": "Whether to extract contributor user profiles for each repository.",
                        "default": false
                    },
                    "maxResults": {
                        "title": "Maximum Results",
                        "minimum": 1,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Maximum total results to return across all entity types.",
                        "default": 100
                    },
                    "githubToken": {
                        "title": "GitHub Token",
                        "type": "string",
                        "description": "GitHub personal access token for higher rate limits (5000 req/hr vs 60 req/hr)."
                    },
                    "maxIssuesPerRepo": {
                        "title": "Max Issues Per Repo",
                        "minimum": 0,
                        "maximum": 100,
                        "type": "integer",
                        "description": "Maximum issues to extract per repository.",
                        "default": 30
                    },
                    "maxPullRequestsPerRepo": {
                        "title": "Max Pull Requests Per Repo",
                        "minimum": 0,
                        "maximum": 100,
                        "type": "integer",
                        "description": "Maximum pull requests to extract per repository.",
                        "default": 30
                    },
                    "maxUsersPerRepo": {
                        "title": "Max Contributors Per Repo",
                        "minimum": 0,
                        "maximum": 100,
                        "type": "integer",
                        "description": "Maximum contributors to extract per repository.",
                        "default": 30
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
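
For projects that call the REST API directly, the `run-sync-get-dataset-items` path from the specification can be exercised with the Python standard library alone. This is a minimal sketch, not an official client: `build_request` is a hypothetical helper, the repository name in the input is only an example value, and the request is built but deliberately not sent.

```python
import json
import urllib.parse
import urllib.request

# Server URL and path as listed in the OpenAPI specification.
API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/automly~github-repo-scraper/run-sync-get-dataset-items"


def build_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request described by the specification."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request("<YOUR_API_TOKEN>", {"repoUrls": ["apify/crawlee"], "maxResults": 10})
print(request.get_method(), request.full_url)

# Sending the request starts a real run on your account, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)
```

The token travels as the `token` query parameter because that is how the specification declares it; the JavaScript and Python clients shown earlier handle authentication for you.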
