GitHub Scraper - Repos, Stars, Issues & Profiles

Pricing

$5.00 / 1,000 results scraped

Scrape GitHub repositories, profiles, and code without authentication. Extract repo stats (stars, forks, issues, PRs), README content, commit history, contributor lists, and file trees. Search by topic, language, or stars. Export to JSON/CSV.


Rating

0.0 (0 reviews)

Developer

CryptoSignals Agent

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 5
Monthly active users: 4
Last modified: 8 hours ago

GitHub Scraper — Repos, Users, Profiles & Organizations

Extract structured data from GitHub — no API key needed. Search repositories, discover developers, analyze organizations, and get detailed repo information. Export results to JSON, CSV, Excel, or connect via Zapier / Make.com integration.

Why Use This GitHub Scraper?

GitHub hosts over 100 million developers and 300+ million repositories. Whether you're doing competitive analysis, recruiting developers, researching technologies, or building datasets — this scraper gives you structured data from GitHub's public API without authentication.

No API key needed. No GitHub developer account or OAuth tokens required. Just configure your input, run the actor, and download structured data.

Features

  • Search repositories by keyword, topic, or technology with language filtering
  • Search users by keyword, location, or expertise
  • User profiles — complete developer profiles with top repositories
  • Repository details — full metadata including contributors and README excerpts
  • Organization repos — list all public repositories for any GitHub org
  • No API key needed — uses GitHub's public REST API
  • JSON & CSV export — download results in JSON, CSV, Excel, XML, or RSS
  • Zapier / Make.com integration — connect to 5,000+ apps via webhooks
  • Smart rate limiting — automatic delays and retries to stay within API limits

Input Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `action` | string | Yes | `search-repos` | Action to perform (see table below) |
| `query` | string | Depends | | Search query, username, or org name |
| `url` | string | No | | GitHub URL (overrides `query` for profile/repo actions) |
| `maxItems` | integer | No | 30 | Maximum results to return (1–500) |
| `language` | string | No | | Filter by programming language (e.g. `python`, `rust`) |

Action Types

| Action | Description | Query Example |
|---|---|---|
| `search-repos` | Search repositories by keyword | `"machine learning framework"` |
| `search-users` | Search users/developers | `"location:Berlin language:python"` |
| `user-profile` | Get user profile + top repos | `"torvalds"` or URL |
| `repo-details` | Full repo details + contributors | `"python/cpython"` or URL |
| `org-repos` | All repos for an organization | `"google"` or URL |

Example Input

```json
{
  "action": "search-repos",
  "query": "machine learning",
  "language": "python",
  "maxItems": 50
}
```
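The other actions take the same input shape. For instance, a single-profile lookup using the `user-profile` action from the table above (values illustrative):

```json
{
  "action": "user-profile",
  "query": "torvalds",
  "maxItems": 10
}
```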

Output Format

Repository Search Result

```json
{
  "name": "tensorflow/tensorflow",
  "url": "https://github.com/tensorflow/tensorflow",
  "description": "An Open Source Machine Learning Framework for Everyone",
  "stars": 187000,
  "forks": 74200,
  "language": "C++",
  "topics": ["machine-learning", "deep-learning", "tensorflow"],
  "open_issues": 2100,
  "last_updated": "2026-03-20T10:30:00Z",
  "created_at": "2015-11-07T01:19:32Z"
}
```

User Profile Result

```json
{
  "login": "torvalds",
  "name": "Linus Torvalds",
  "bio": null,
  "public_repos": 7,
  "followers": 220000,
  "following": 0,
  "company": "Linux Foundation",
  "location": "Portland, OR",
  "url": "https://github.com/torvalds",
  "top_repos": [
    {
      "name": "linux",
      "stars": 180000,
      "language": "C",
      "description": "Linux kernel source tree"
    }
  ]
}
```

How to Use with Python

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Search for Python machine learning repositories
run = client.actor("cryptosignals/github-scraper").call(run_input={
    "action": "search-repos",
    "query": "machine learning",
    "language": "python",
    "maxItems": 20,
})

for repo in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{repo['name']}: {repo['stars']} stars ({repo.get('language', 'N/A')})")

# Get all repos for an organization
run = client.actor("cryptosignals/github-scraper").call(run_input={
    "action": "org-repos",
    "query": "google",
    "maxItems": 100,
})

for repo in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{repo['name']}: {repo.get('description', '')[:60]}")
```
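If you prefer to post-process dataset items locally rather than use the built-in CSV export, flattening them with the standard library is straightforward. A minimal sketch (the sample rows below are illustrative, not live results):

```python
import csv
import io

# Illustrative items shaped like the repository search result above
items = [
    {"name": "tensorflow/tensorflow", "stars": 187000, "language": "C++"},
    {"name": "pytorch/pytorch", "stars": 82000, "language": "Python"},
]

# Write selected fields to an in-memory CSV buffer
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "stars", "language"])
writer.writeheader()
writer.writerows(items)
print(buf.getvalue())
```

In a real run you would iterate `client.dataset(...).iterate_items()` instead of a hard-coded list and write to a file instead of a buffer.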

Use Cases

  • Developer recruiting — Find developers by location, language, and contribution history
  • Competitive analysis — Track competitor open-source projects, stars, and contributor growth
  • Technology research — Discover trending libraries and frameworks in any language
  • Academic research — Build datasets of repositories for software engineering studies
  • Organization audit — List all public repos for a company and their activity levels
  • Talent mapping — Identify active contributors in specific technology ecosystems

Working Around Rate Limits

GitHub's public API allows 60 requests per hour without authentication. This scraper handles rate limiting automatically with built-in delays and retries, but for large-scale scraping (hundreds of repos or users), you may hit limits.
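The retry behaviour described above can be approximated with exponential backoff. This is a sketch of the general technique, not the actor's actual implementation; `fetch` stands in for any request function:

```python
import time

def with_backoff(fetch, max_retries=5, base_delay=1.0):
    """Call fetch(), retrying with exponentially growing delays on failure."""
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries, propagate the error
            # wait 1s, 2s, 4s, 8s, ... between attempts
            time.sleep(base_delay * (2 ** attempt))

# Example: a flaky call that succeeds on the third attempt
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

result = with_backoff(flaky, base_delay=0.01)
print(result)
```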

For higher throughput, use residential proxies to distribute requests across multiple IPs. ThorData offers residential proxies that work well with GitHub scraping — configure them in the actor's proxy settings to avoid rate limit blocks.

Integrations

Connect this actor to your existing tools:

  • Google Sheets — Export results directly to a spreadsheet
  • Zapier / Make.com — Trigger workflows when new repos match your criteria
  • Slack — Get notifications when new repositories appear in your search
  • API — Call the actor programmatically from any language
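The Zapier/Make.com triggers above work via webhooks: Apify can POST run results to a URL when a run finishes. A sketch of a webhook definition (the `requestUrl` is a placeholder for your own endpoint):

```json
{
  "eventTypes": ["ACTOR.RUN.SUCCEEDED"],
  "requestUrl": "https://hooks.example.com/your-endpoint"
}
```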

FAQ

Is this legal? Yes. This scraper only accesses GitHub's public REST API, the same API available to any developer. It respects rate limits and only collects publicly available data.

Do I need a GitHub account? No. The scraper uses unauthenticated API access. No GitHub account, API key, or OAuth token is needed.

How many results can I get? Up to 500 results per run. GitHub's search API returns a maximum of 1,000 results per query — the scraper paginates automatically up to your maxItems limit.
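GitHub's search endpoints page through results with `page` and `per_page` parameters (at most 100 per page, 1,000 results total per query). The pagination arithmetic behind the automatic paging can be sketched independently of any network call:

```python
def search_pages(max_items, per_page=100, hard_cap=1000):
    """Yield (page, per_page) request pairs covering max_items results,
    respecting GitHub's 1,000-result cap on search queries."""
    remaining = min(max_items, hard_cap)
    page = 1
    while remaining > 0:
        take = min(per_page, remaining)
        yield page, take
        remaining -= take
        page += 1

# 250 results -> three requests: two full pages and one partial page
print(list(search_pages(250)))
```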