# GitHub Scraper - Repos, Stars, Issues & Profiles (`cryptosignals/github-scraper`) Actor

Scrape GitHub repositories, profiles, and code without authentication. Extract repo stats (stars, forks, issues, PRs), README content, commit history, contributor lists, and file trees. Search by topic, language, or stars. Export to JSON/CSV.

- **URL**: https://apify.com/cryptosignals/github-scraper.md
- **Developed by:** [CryptoSignals Agent](https://apify.com/cryptosignals) (community)
- **Categories:** Developer tools
- **Stats:** 5 total users, 4 monthly users, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

$5.00 / 1,000 results scraped

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
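As a rough illustration of the price above, the sketch below estimates the charge for a given number of results. It assumes that every item written to the dataset counts as one billed result and that no other events are charged; the function name `estimate_cost` is hypothetical, and the Actor's pricing page remains the authoritative source.

```python
from decimal import Decimal

# Price taken from the Pricing section above: $5.00 per 1,000 results.
PRICE_PER_1000_RESULTS = Decimal("5.00")


def estimate_cost(result_count: int) -> Decimal:
    """Return the estimated charge in USD for `result_count` billed results.

    Assumption: one dataset item equals one billed result.
    """
    if result_count < 0:
        raise ValueError("result_count must be non-negative")
    cost = PRICE_PER_1000_RESULTS * result_count / 1000
    return cost.quantize(Decimal("0.0001"))


print(estimate_cost(30))   # the default maxItems
print(estimate_cost(500))  # the largest maxItems allowed per run
```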

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## GitHub Scraper — Repos, Users, Profiles & Organizations

Extract structured data from GitHub — **no API key needed**. Search repositories, discover developers, analyze organizations, and get detailed repo information. Export results to **JSON, CSV, Excel**, or connect via **Zapier / Make.com** integration.

### Why Use This GitHub Scraper?

GitHub hosts over 100 million developers and 300+ million repositories. Whether you're doing competitive analysis, recruiting developers, researching technologies, or building datasets — this scraper gives you structured data from GitHub's public API without authentication.

**No API key needed.** No GitHub developer account or OAuth tokens required. Just configure your input, run the Actor, and download structured data.

### Features

- **Search repositories** by keyword, topic, or technology with language filtering
- **Search users** by keyword, location, or expertise
- **User profiles** — complete developer profiles with top repositories
- **Repository details** — full metadata including contributors and README excerpts
- **Organization repos** — list all public repositories for any GitHub org
- **No API key needed** — uses GitHub's public REST API
- **JSON & CSV export** — download results in JSON, CSV, Excel, XML, or RSS
- **Zapier / Make.com integration** — connect to 5,000+ apps via webhooks
- **Smart rate limiting** — automatic delays and retries to stay within API limits

### Input Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `action` | string | Yes | `search-repos` | Action to perform (see table below) |
| `query` | string | Depends | — | Search query, username, or org name |
| `url` | string | No | — | GitHub URL (overrides query for profile/repo actions) |
| `maxItems` | integer | No | `30` | Maximum results to return (1–500) |
| `language` | string | No | — | Filter by programming language (e.g. `python`, `rust`) |

#### Action Types

| Action | Description | Query Example |
|--------|-------------|---------------|
| `search-repos` | Search repositories by keyword | `"machine learning framework"` |
| `search-users` | Search users/developers | `"location:Berlin language:python"` |
| `user-profile` | Get user profile + top repos | `"torvalds"` or URL |
| `repo-details` | Full repo details + contributors | `"python/cpython"` or URL |
| `org-repos` | All repos for an organization | `"google"` or URL |

### Example Input

```json
{
    "action": "search-repos",
    "query": "machine learning",
    "language": "python",
    "maxItems": 50
}
````
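An input like the one above can be assembled and checked before a run is started. The following is a minimal sketch, not part of the Actor: `build_input` is a hypothetical helper, and its checks are stricter than the documented schema (for example, it rejects `url` for the search actions and requires either `query` or `url`), which is an assumption made for illustration.

```python
from typing import Optional

# Action names and the 1-500 range come from the input tables above.
ACTIONS = {"search-repos", "search-users", "user-profile", "repo-details", "org-repos"}
# Actions for which `url` may stand in for `query`.
URL_ACTIONS = {"user-profile", "repo-details", "org-repos"}


def build_input(action: str, query: Optional[str] = None, url: Optional[str] = None,
                max_items: int = 30, language: Optional[str] = None) -> dict:
    """Hypothetical helper: validate arguments and return an Actor input dict."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if not 1 <= max_items <= 500:
        raise ValueError("maxItems must be between 1 and 500")
    if url and action not in URL_ACTIONS:
        raise ValueError(f"url is not used by {action!r}")
    if not query and not url:
        raise ValueError("provide a query, or a url for profile/repo/org actions")

    # Only include optional fields that were actually supplied.
    run_input = {"action": action, "maxItems": max_items}
    if query:
        run_input["query"] = query
    if url:
        run_input["url"] = url
    if language:
        run_input["language"] = language
    return run_input


print(build_input("search-repos", query="machine learning",
                  language="python", max_items=50))
```

The returned dictionary can be passed as `run_input` to the Apify client calls shown elsewhere in this document.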

### Output Format

#### Repository Search Result

```json
{
    "name": "tensorflow/tensorflow",
    "url": "https://github.com/tensorflow/tensorflow",
    "description": "An Open Source Machine Learning Framework for Everyone",
    "stars": 187000,
    "forks": 74200,
    "language": "C++",
    "topics": ["machine-learning", "deep-learning", "tensorflow"],
    "open_issues": 2100,
    "last_updated": "2026-03-20T10:30:00Z",
    "created_at": "2015-11-07T01:19:32Z"
}
```

#### User Profile Result

```json
{
    "login": "torvalds",
    "name": "Linus Torvalds",
    "bio": null,
    "public_repos": 7,
    "followers": 220000,
    "following": 0,
    "company": "Linux Foundation",
    "location": "Portland, OR",
    "url": "https://github.com/torvalds",
    "top_repos": [
        {
            "name": "linux",
            "stars": 180000,
            "language": "C",
            "description": "Linux kernel source tree"
        }
    ]
}
```
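Nested fields such as `topics` do not map directly onto a flat CSV row. The sketch below shows one way to flatten repository results locally, assuming items shaped like the "Repository Search Result" sample above; `repos_to_csv` is a hypothetical helper, and joining topics with `;` is an arbitrary choice.

```python
import csv
import io

# Field names follow the "Repository Search Result" sample above.
FIELDS = ["name", "url", "description", "stars", "forks", "language",
          "topics", "open_issues", "last_updated", "created_at"]


def repos_to_csv(items: list) -> str:
    """Flatten repository items into CSV text, highest star count first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for item in sorted(items, key=lambda r: r.get("stars") or 0, reverse=True):
        # Missing or null fields become empty cells.
        row = {field: item.get(field) for field in FIELDS}
        # A list does not fit in one CSV cell, so join topics with ";".
        row["topics"] = ";".join(item.get("topics") or [])
        writer.writerow(row)
    return buffer.getvalue()


sample = [
    {"name": "tensorflow/tensorflow", "stars": 187000, "language": "C++",
     "topics": ["machine-learning", "deep-learning", "tensorflow"]},
]
print(repos_to_csv(sample))
```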

### How to Use with Python

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Search for Python machine learning repositories
run = client.actor("cryptosignals/github-scraper").call(run_input={
    "action": "search-repos",
    "query": "machine learning",
    "language": "python",
    "maxItems": 20,
})

for repo in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{repo['name']} — {repo['stars']} stars — {repo.get('language', 'N/A')}")
```

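```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Get all repos for an organization
run = client.actor("cryptosignals/github-scraper").call(run_input={
    "action": "org-repos",
    "query": "google",
    "maxItems": 100,
})

for repo in client.dataset(run["defaultDatasetId"]).iterate_items():
    # `description` can be null, so fall back to an empty string before slicing
    description = repo.get("description") or ""
    print(f"{repo['name']} — {description[:60]}")
```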

### Use Cases

- **Developer recruiting** — Find developers by location, language, and contribution history
- **Competitive analysis** — Track competitor open-source projects, stars, and contributor growth
- **Technology research** — Discover trending libraries and frameworks in any language
- **Academic research** — Build datasets of repositories for software engineering studies
- **Organization audit** — List all public repos for a company and their activity levels
- **Talent mapping** — Identify active contributors in specific technology ecosystems

### Working Around Bot Detection

GitHub's public API allows 60 requests per hour per IP address without authentication. This scraper handles rate limiting automatically with built-in delays and retries, but large-scale scraping (hundreds of repos or users) may still hit the limit.

For higher throughput, use residential proxies to distribute requests across multiple IPs. [ThorData](https://thordata.partnerstack.com/partner/0a0x4nzh) offers residential proxies that work well with GitHub scraping; configure them in the Actor's proxy settings to avoid rate limit blocks.
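To get a feel for the 60 requests per hour figure quoted above, the sketch below estimates how many paginated requests a run needs and how many such runs fit into one hour from a single IP address. It is a back-of-the-envelope model only: the page size of 100 is an assumption (GitHub's maximum `per_page`), the Actor's real request pattern is not documented here, and `extra_requests_per_item` is a hypothetical knob for actions that fetch additional data per result.

```python
import math

UNAUTHENTICATED_REQUESTS_PER_HOUR = 60  # limit quoted in this section
ASSUMED_PAGE_SIZE = 100  # assumption: GitHub's maximum `per_page` value


def estimate_list_requests(max_items: int, page_size: int = ASSUMED_PAGE_SIZE) -> int:
    """Lower bound on paginated requests needed to collect `max_items` results."""
    if max_items < 1 or page_size < 1:
        raise ValueError("max_items and page_size must be positive")
    return math.ceil(max_items / page_size)


def runs_per_hour(max_items: int, extra_requests_per_item: int = 0) -> int:
    """Estimate how many identical runs fit into one hourly budget."""
    per_run = estimate_list_requests(max_items) + max_items * extra_requests_per_item
    return UNAUTHENTICATED_REQUESTS_PER_HOUR // per_run


print(estimate_list_requests(500))                   # pages for the largest run
print(runs_per_hour(500))                            # list requests only
print(runs_per_hour(30, extra_requests_per_item=1))  # one extra call per result
```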

### Integrations

Connect this Actor to your existing tools:

- **Google Sheets** — Export results directly to a spreadsheet
- **Zapier / Make.com** — Trigger workflows when new repos match your criteria
- **Slack** — Get notifications when new repositories appear in your search
- **API** — Call the Actor programmatically from any language

### FAQ

**Is this legal?**
Yes. This scraper only accesses GitHub's public REST API, the same API available to any developer. It respects rate limits and only collects publicly available data.

**Do I need a GitHub account?**
No. The scraper uses unauthenticated API access. No GitHub account, API key, or OAuth token is needed.

**How many results can I get?**
Up to 500 results per run. GitHub's search API returns a maximum of 1,000 results per query — the scraper paginates automatically up to your `maxItems` limit.

# Actor input Schema

## `action` (type: `string`):

What type of data to extract from GitHub.

## `query` (type: `string`):

Search query for repos/users, username for profiles, org name for org-repos, or owner/repo for repo-details.

## `url` (type: `string`):

GitHub URL for user-profile (https://github.com/username), repo-details (https://github.com/owner/repo), or org-repos. Overrides query for these actions.
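The relationship between a GitHub URL and the equivalent `query` value can be sketched as follows. This is an illustration of the mapping described above, not the Actor's own parsing code; `github_url_to_query` is a hypothetical helper.

```python
from urllib.parse import urlparse


def github_url_to_query(url: str, action: str) -> str:
    """Hypothetical helper: derive the `query` value a GitHub URL corresponds to."""
    parsed = urlparse(url)
    if parsed.netloc.lower() not in {"github.com", "www.github.com"}:
        raise ValueError(f"not a github.com URL: {url!r}")
    # Drop empty segments caused by leading or trailing slashes.
    parts = [p for p in parsed.path.split("/") if p]
    if action == "repo-details":
        if len(parts) < 2:
            raise ValueError("repo-details needs https://github.com/owner/repo")
        return f"{parts[0]}/{parts[1]}"
    if action in {"user-profile", "org-repos"}:
        if not parts:
            raise ValueError(f"{action} needs https://github.com/<name>")
        return parts[0]
    raise ValueError(f"{action!r} does not accept a url")


print(github_url_to_query("https://github.com/python/cpython", "repo-details"))
print(github_url_to_query("https://github.com/torvalds", "user-profile"))
```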

## `maxItems` (type: `integer`):

Maximum number of results to return. GitHub API limits apply (60 requests/hour without auth).

## `language` (type: `string`):

Filter results by programming language (e.g., python, javascript, rust). Works with search-repos and search-users.

## Actor input object example

```json
{
  "action": "search-repos",
  "query": "python machine learning",
  "maxItems": 30
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "action": "search-repos",
    "query": "python machine learning",
    "maxItems": 30
};

// Run the Actor and wait for it to finish
const run = await client.actor("cryptosignals/github-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "action": "search-repos",
    "query": "python machine learning",
    "maxItems": 30,
}

# Run the Actor and wait for it to finish
run = client.actor("cryptosignals/github-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "action": "search-repos",
  "query": "python machine learning",
  "maxItems": 30
}' |
apify call cryptosignals/github-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=cryptosignals/github-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "GitHub Scraper - Repos, Stars, Issues & Profiles",
        "description": "Scrape GitHub repositories, profiles, and code without authentication. Extract repo stats (stars, forks, issues, PRs), README content, commit history, contributor lists, and file trees. Search by topic, language, or stars. Export to JSON/CSV.",
        "version": "1.0",
        "x-build-id": "jcLzRK9FZ2gs3pE5r"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/cryptosignals~github-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-cryptosignals-github-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/cryptosignals~github-scraper/runs": {
            "post": {
                "operationId": "runs-sync-cryptosignals-github-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/cryptosignals~github-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-cryptosignals-github-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "action"
                ],
                "properties": {
                    "action": {
                        "title": "Action",
                        "enum": [
                            "search-repos",
                            "search-users",
                            "user-profile",
                            "repo-details",
                            "org-repos"
                        ],
                        "type": "string",
                        "description": "What type of data to extract from GitHub.",
                        "default": "search-repos"
                    },
                    "query": {
                        "title": "Search Query / Username / Org",
                        "type": "string",
                        "description": "Search query for repos/users, username for profiles, org name for org-repos, or owner/repo for repo-details.",
                        "default": "python"
                    },
                    "url": {
                        "title": "GitHub URL (optional)",
                        "type": "string",
                        "description": "GitHub URL for user-profile (https://github.com/username), repo-details (https://github.com/owner/repo), or org-repos. Overrides query for these actions."
                    },
                    "maxItems": {
                        "title": "Max Results",
                        "minimum": 1,
                        "maximum": 500,
                        "type": "integer",
                        "description": "Maximum number of results to return. GitHub API limits apply (60 requests/hour without auth).",
                        "default": 30
                    },
                    "language": {
                        "title": "Language Filter",
                        "type": "string",
                        "description": "Filter results by programming language (e.g., python, javascript, rust). Works with search-repos and search-users."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
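For projects that cannot use the Apify client, the `run-sync-get-dataset-items` path from the specification above can be called with the Python standard library alone. The sketch below is a minimal example under a few assumptions: the response body is treated as a JSON array of dataset items, the 300-second timeout is arbitrary, and the helper names `build_run_request` and `run_actor` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"  # from the `servers` entry in the specification
ACTOR_PATH = "/acts/cryptosignals~github-scraper/run-sync-get-dataset-items"


def build_run_request(token: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request described by the specification."""
    # The specification passes the Apify token as a `token` query parameter.
    query = urllib.parse.urlencode({"token": token})
    body = json.dumps(run_input).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}{ACTOR_PATH}?{query}",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def run_actor(token: str, run_input: dict, timeout: float = 300.0) -> list:
    """Send the request and decode the response (assumed to be a JSON array)."""
    request = build_run_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    items = run_actor("<YOUR_API_TOKEN>", {"action": "search-repos",
                                           "query": "python machine learning",
                                           "maxItems": 30})
    print(len(items))
```

Building the request separately from sending it keeps the URL and body easy to inspect or log before any network call is made.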
