GitHub Organization Scraper
Pricing: Pay per event
Pull GitHub organization metadata and its public repo list via the GitHub API: display name, description, location, blog and follower count, plus a per-repo summary (name, stars, language, last push). Export to JSON or CSV. Uses the free GitHub REST API, with an optional token for higher rate limits.
Developer: DevilScrapes
Maintained by Community
🎯 What this scrapes
GitHub exposes every org at api.github.com/orgs/{slug} and its public repos at /orgs/{slug}/repos. This Actor takes a list of org slugs (or full github.com/<org> URLs), fans them out concurrently, and writes one row per organisation — with an optional public-repo summary attached.
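The Actor accepts either a bare slug or a profile URL, so each input has to be normalised before the two endpoints are called. Below is a minimal Python sketch of that step. It assumes the Actor simply takes the first path segment of a `github.com` URL. `normalize_org` and `org_endpoints` are hypothetical helper names, not part of the Actor.

```python
from typing import Dict
from urllib.parse import urlparse

API_ROOT = "https://api.github.com"

def normalize_org(value: str) -> str:
    """'apify', 'github.com/apify' and 'https://github.com/apify/' all become 'apify'."""
    value = value.strip()
    if "github.com" in value:
        if "://" not in value:
            value = "https://" + value  # urlparse only fills .path reliably with a scheme
        # the org slug is the first path segment: /<org>/...
        value = urlparse(value).path.strip("/").split("/")[0]
    if not value:
        raise ValueError("could not extract an org slug")
    return value

def org_endpoints(slug: str) -> Dict[str, str]:
    """The two REST endpoints named above: the org profile and its public repos."""
    return {
        "org": f"{API_ROOT}/orgs/{slug}",
        "repos": f"{API_ROOT}/orgs/{slug}/repos",
    }
```

For example, `org_endpoints(normalize_org("https://github.com/apify"))["repos"]` yields the `/orgs/apify/repos` URL.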
Need orgs at scale? Provide thousands of slugs; the Actor handles pagination and the rate-limit dance so you don't have to.
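Concurrent fan-out with a fixed cap, as set by the `concurrency` input, is commonly done with a semaphore. The sketch below uses `asyncio.Semaphore`. `fan_out` and the injected `fetch` coroutine are hypothetical stand-ins, and the Actor's real HTTP client and scheduling may differ.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

async def fan_out(slugs: List[str],
                  fetch: Callable[[str], Awaitable[Dict]],
                  concurrency: int = 4) -> List[Dict]:
    """Run fetch(slug) for every slug, never more than `concurrency` at once."""
    sem = asyncio.Semaphore(concurrency)

    async def worker(slug: str) -> Dict:
        async with sem:  # waits here while `concurrency` fetches are in flight
            return await fetch(slug)

    # gather returns results in input order, regardless of completion order
    return list(await asyncio.gather(*(worker(s) for s in slugs)))
```

A semaphore keeps the cap independent of batch size: thousands of slugs still produce at most `concurrency` open requests.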
🔥 What we handle for you
- 🛡️ Browser fingerprint rotation: `curl-cffi` impersonates real Chrome / Firefox / Safari TLS handshakes so the target sees a browser, not Python.
- 🌐 Residential proxy rotation via Apify Proxy: fresh session and exit IP on every block or rate-limit signal.
- 🔁 Retries with exponential backoff on `408` / `429` / `5xx`: up to 5 attempts per request, with the `Retry-After` header honoured.
- 🧱 Rate-limit-aware pacing: we back off gracefully when GitHub pushes back, so your run completes instead of getting banned.
- 🧊 Clean, typed dataset rows — Pydantic-validated output, ISO-8601 timestamps, stable IDs; export to JSON, CSV, or Excel straight from Apify Console.
- 💰 Pay-Per-Event pricing — you pay only when a result lands in your dataset. No data, no charge (beyond the tiny actor-start fee).
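The retry policy above retries up to 5 times on 408, 429 and 5xx and honours `Retry-After`. It can be sketched as follows. This is an illustration under stated assumptions, not the Actor's code:

- `send` is a hypothetical zero-argument callable returning `(status, headers, body)`.
- Only the delay-in-seconds form of `Retry-After` is handled, not the HTTP-date form.
- The 1-second base, 60-second cap and jitter range are arbitrary choices.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

# (status, headers, body): a hypothetical minimal response shape
Response = Tuple[int, Mapping[str, str], bytes]

MAX_ATTEMPTS = 5  # "up to 5 attempts per request"

def should_retry(status: int) -> bool:
    """408, 429 and any 5xx are treated as transient."""
    return status in (408, 429) or 500 <= status <= 599

def backoff_delay(attempt: int, retry_after: Optional[str],
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait after failed attempt number `attempt` (1-based)."""
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after)  # the server's own delay wins
    # Exponential growth (1 s, 2 s, 4 s, ...) capped at `cap`, with jitter
    # so many workers do not retry in lockstep.
    return min(cap, base * 2 ** (attempt - 1)) * random.uniform(0.5, 1.0)

def request_with_retries(send: Callable[[], Response],
                         sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send` until it succeeds, fails permanently, or attempts run out."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        status, headers, body = send()
        if not should_retry(status) or attempt == MAX_ATTEMPTS:
            return status, headers, body
        sleep(backoff_delay(attempt, headers.get("Retry-After")))
    raise RuntimeError("unreachable: the loop always returns")
```

Injecting `sleep` keeps the policy testable without real waiting. The last response is returned even when it is still an error, so the caller decides how to report it.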
💡 Use cases
- Lead generation — pull a list of dev-tool companies and surface contact info (email, blog, Twitter/X handle) from their public org profiles.
- M&A / competitive intel — quantify the open-source surface of a target company: repo count, total stars, last-push cadence, verified-org status.
- DevRel benchmarking — compare your org's public-repo activity against competitors; feed the data into your BI tool or Sheets dashboard.
- Recruitment targeting — rank organisations by location, follower count, and activity to prioritise engineering-heavy outreach targets.
- Dependency mapping — combine with the GitHub Repo Scraper to inventory every repo a company maintains.
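For the M&A and DevRel use cases, headline numbers can be derived from a single dataset row. This sketch assumes rows shaped like the example output in the Output section. `stars_sampled` only sums the repos present in the `repos` array, which is capped by `maxReposPerOrg`, so it undercounts orgs with more public repos than the cap. `repo_stats` is a hypothetical helper.

```python
from datetime import datetime
from typing import Optional

def _parse(ts: str) -> datetime:
    # fromisoformat() before Python 3.11 rejects a trailing "Z"
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def repo_stats(row: dict) -> dict:
    """Headline numbers from one output row's `repos` array."""
    repos = row.get("repos") or []  # null when includeRepos was false
    pushes = [r["pushed_at"] for r in repos if r.get("pushed_at")]
    latest: Optional[str] = max(pushes, key=_parse, default=None)
    return {
        "login": row["login"],
        "is_verified": row.get("is_verified"),
        "repos_sampled": len(repos),
        # only counts repos in the array, not all `public_repos`
        "stars_sampled": sum(r.get("stargazers_count") or 0 for r in repos),
        "latest_push": latest,
    }
```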
⚙️ How to use it
- Click Try for free at the top of the Store page.
- Paste your list of GitHub org slugs (e.g. `apify`, `anthropics`) or full profile URLs.
- Optionally add a GitHub personal-access token to raise the rate limit from 60 to 5 000 requests/hour.
- Click Start. Output streams into the run's dataset in real time.
- Export from Storage → Dataset as JSON, CSV, or Excel — or pull via the Apify REST API.
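Results can be pulled via the Apify REST API with the standard library alone. This sketch calls the dataset-items endpoint (`GET /v2/datasets/{datasetId}/items`) and sends the API token as a Bearer header. The dataset ID is the run's default dataset. Both it and the token are placeholders you supply. `dataset_items_url` and `fetch_rows` are hypothetical helper names.

```python
import json
import urllib.request
from typing import List
from urllib.parse import quote, urlencode

APIFY_API = "https://api.apify.com/v2"

def dataset_items_url(dataset_id: str, fmt: str = "json") -> str:
    """URL of the dataset-items endpoint; fmt can also be e.g. 'csv' or 'xlsx'."""
    return f"{APIFY_API}/datasets/{quote(dataset_id, safe='')}/items?{urlencode({'format': fmt})}"

def fetch_rows(dataset_id: str, token: str) -> List[dict]:
    """Download every row of a dataset as parsed JSON."""
    req = urllib.request.Request(
        dataset_items_url(dataset_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Usage: `rows = fetch_rows("YOUR_DATASET_ID", "YOUR_APIFY_TOKEN")`. Both values are placeholders. Keeping the token in a header rather than the query string avoids leaking it into logged URLs.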
📥 Input
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `orgs` | array | yes | `["apify", "anthropics"]` | GitHub org slugs or full `github.com/<org>` URLs. |
| `githubToken` | string | no | none | Personal-access token. Lifts rate limit from 60/hour to 5 000/hour. Read-only public scope is sufficient. |
| `includeRepos` | boolean | no | `true` | Adds up to `maxReposPerOrg` recently updated repos per org. One extra API call per org. |
| `maxReposPerOrg` | integer | no | `30` | Cap on repos returned per org. Maximum 100 (one GitHub results page). |
| `concurrency` | integer | no | `4` | Parallel API requests. Raise for large batch jobs. |
| `proxyConfiguration` | object | no | `{"useApifyProxy": false}` | Apify Proxy config. Optional; enable residential proxies for high-volume runs. |
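The `includeRepos`, `maxReposPerOrg` and `githubToken` inputs map naturally onto one call to GitHub's `GET /orgs/{org}/repos`. Below is a sketch of that request. It assumes the Actor sorts by `updated`, matching "recently updated repos" in the table, and clamps the page size to GitHub's 100-item maximum. `build_repos_request` is hypothetical, and the Actor may build its request differently.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode

GITHUB_PAGE_MAX = 100  # GitHub's maximum per_page value

def build_repos_request(slug: str, max_repos: int = 30,
                        token: str = "") -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for one page of an org's most recently updated repos."""
    per_page = max(1, min(max_repos, GITHUB_PAGE_MAX))
    query = urlencode({"sort": "updated", "direction": "desc", "per_page": per_page})
    headers = {
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    if token:  # optional: 5 000 requests/hour instead of 60
        headers["Authorization"] = f"Bearer {token}"
    return f"https://api.github.com/orgs/{slug}/repos?{query}", headers
```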
Example input
{"orgs": ["apify", "anthropics"],"githubToken": "","includeRepos": true,"maxReposPerOrg": 30,"concurrency": 4,"proxyConfiguration": {"useApifyProxy": false}}
📤 Output
One dataset row per GitHub organisation. When includeRepos is true, the repos field carries a per-repo summary array.
| Field | Type | Notes |
|---|---|---|
| `login` | string | Organisation slug (unique identifier). |
| `name` | string or null | Display name. |
| `description` | string or null | Org bio text. |
| `company` | string or null | Self-declared company string. |
| `blog` | string or null | Homepage / blog URL. |
| `location` | string or null | Location string (user-supplied). |
| `email` | string or null | Public contact email. |
| `twitter_username` | string or null | X / Twitter handle. |
| `public_repos` | integer | Number of public repos. |
| `public_gists` | integer | Number of public gists. |
| `followers` | integer | Follower count. |
| `html_url` | string | GitHub org profile URL. |
| `avatar_url` | string | Avatar / logo URL. |
| `members_url_template` | string | GitHub API template for member listings. |
| `type` | string | Always `"Organization"`. |
| `is_verified` | boolean or null | GitHub verified-org flag. |
| `created_at` | string | Org creation timestamp (ISO-8601). |
| `updated_at` | string | Last profile-update timestamp (ISO-8601). |
| `repos` | array or null | Per-repo summary list (when `includeRepos` is `true`). |
| `scraped_at` | string | When this row was written (ISO-8601). |
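The output columns line up with fields of GitHub's `GET /orgs/{org}` response. Below is a sketch of the mapping. It assumes `members_url_template` is simply the API's `members_url` renamed, and that missing keys become `null`. `to_row` is a hypothetical helper; the Actor itself validates rows with Pydantic rather than building plain dicts like this.

```python
from datetime import datetime, timezone
from typing import List, Optional

# Columns copied straight from the GET /orgs/{org} response
ORG_FIELDS = [
    "login", "name", "description", "company", "blog", "location", "email",
    "twitter_username", "public_repos", "public_gists", "followers",
    "html_url", "avatar_url", "type", "is_verified", "created_at", "updated_at",
]
# Per-repo summary keys, as in the example output
REPO_FIELDS = ["name", "full_name", "stargazers_count", "language", "pushed_at"]

def to_row(org: dict, repos: Optional[List[dict]] = None) -> dict:
    """Build one dataset row; repos=None mirrors includeRepos=false."""
    row = {key: org.get(key) for key in ORG_FIELDS}  # absent keys become None (null)
    row["members_url_template"] = org.get("members_url")  # assumed rename
    row["repos"] = None if repos is None else [
        {key: repo.get(key) for key in REPO_FIELDS} for repo in repos
    ]
    row["scraped_at"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return row
```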
Example output
{"login": "apify","name": "Apify","description": "Web scraping and automation platform.","blog": "https://apify.com","location": "Prague, Czechia","email": null,"twitter_username": "apify","public_repos": 412,"followers": 3800,"html_url": "https://github.com/apify","is_verified": true,"created_at": "2013-01-15T10:22:31Z","updated_at": "2026-05-20T08:14:07Z","scraped_at": "2026-06-01T12:00:00Z","type": "Organization","repos": [{"name": "apify-sdk-python","full_name": "apify/apify-sdk-python","stargazers_count": 1100,"language": "Python","pushed_at": "2026-05-30T18:00:00Z"}]}
💰 Pricing
Pay-Per-Event — you pay only when these events fire:
| Event | USD | What it is |
|---|---|---|
| `actor-start` | $0.005 | One-off warm-up charge per run |
| `result` | $0.003 | Per organisation row written to dataset |
Example: scraping 1 000 organisations in one run costs 1 000 × $0.003 + $0.005 = $3.005, about $3.01 all-in. No subscription, no minimum spend, and no credit card required to try: Apify gives every new account $5 of free credit.
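The arithmetic behind the example fits in a small sketch, which also covers splitting a batch across several runs (each run pays the start fee again). `estimate_cost` is a hypothetical helper. `Decimal` avoids floating-point rounding on cent amounts.

```python
from decimal import Decimal

ACTOR_START = Decimal("0.005")  # charged once per run
RESULT = Decimal("0.003")       # charged per organisation row

def estimate_cost(rows: int, runs: int = 1) -> Decimal:
    """Total USD for `rows` dataset rows spread over `runs` runs."""
    return ACTOR_START * runs + RESULT * rows
```

`estimate_cost(1000)` gives `Decimal("3.005")`. Splitting the same 1 000 orgs over four runs adds three more start fees.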
🚧 Limitations
- Member lists are out of scope: rows include the `members_url_template` API template, but the Actor does not fetch individual member profiles (use the GitHub User Scraper for that).
- Private repos and security advisories are never returned; this is a public-read-only integration.
- Rate limits without a token: 60 requests/hour per IP; 5 000/hour with a personal-access token.
- Nullable fields: `location`, `email`, `blog`, and `company` are user-supplied and are frequently `null`.
- Org size: organisations with more public repos than `maxReposPerOrg` (at most 100) return only their most recently updated repos up to that cap. Raise the cap or paginate across multiple runs.
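Rate-limit-aware pacing typically reads GitHub's `x-ratelimit-remaining` and `x-ratelimit-reset` response headers. The reset value is a Unix epoch in seconds. Below is a sketch of the wait calculation. The one-second safety margin is an arbitrary choice, and `seconds_until_reset` is a hypothetical helper, not the Actor's code.

```python
import time
from typing import Mapping, Optional

def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """0 while budget remains; otherwise seconds until x-ratelimit-reset."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    if int(h.get("x-ratelimit-remaining", "1")) > 0:
        return 0.0
    reset_epoch = int(h.get("x-ratelimit-reset", "0"))
    now = time.time() if now is None else now
    return max(0.0, reset_epoch - now + 1)  # +1 s margin for clock skew
```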
❓ FAQ
Which Actor should I use — GitHub Organization Scraper, GitHub User Scraper, or GitHub Repo Scraper?
Use this Actor when your input is a list of organisations (companies, open-source foundations, teams) and you want org-level metadata plus their public repo list. Use the GitHub User Scraper when your input is individual developer handles. Use the GitHub Repo Scraper when you have specific repo URLs and need commit/contributor detail.
How is this different from the GitHub REST API?
It isn't: under the hood we call the same public GitHub REST API endpoints. What we add is batch orchestration (fan out across hundreds of orgs in one run), automatic pagination, rate-limit pacing, retry logic, and a clean typed dataset ready for export. If you only need data for one or two orgs, the raw API is fine. For anything bigger, this Actor saves you the plumbing.
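Automatic pagination against the GitHub REST API usually means following the `rel="next"` URL in the `Link` response header until none remains. The sketch below assumes an injected `get(url)` callable that returns `(items, link_header)`. Both helpers are hypothetical.

```python
import re
from typing import Callable, List, Optional, Tuple

# Matches the URL part of:  <https://api.github.com/...&page=2>; rel="next"
_NEXT_LINK = re.compile(r'<([^>]+)>\s*;\s*rel="next"')

def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Return the rel="next" URL from a Link header, or None on the last page."""
    if not link_header:
        return None
    match = _NEXT_LINK.search(link_header)
    return match.group(1) if match else None

def collect_all(first_url: str,
                get: Callable[[str], Tuple[List[dict], Optional[str]]]) -> List[dict]:
    """Follow rel="next" links until exhausted, concatenating every page."""
    items: List[dict] = []
    url: Optional[str] = first_url
    while url:
        page, link = get(url)
        items.extend(page)
        url = next_page_url(link)
    return items
```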
What's a GitHub org scraper good for in a real workflow?
DevRel teams use it to track competitor org activity weekly; sales teams feed it into enrichment pipelines to qualify OSS-heavy prospects; competitive-intel platforms monitor repo-count and follower churn across a curated watchlist.
How do I get GitHub company data reliably at scale?
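Provide a `githubToken` to unlock 5 000 requests/hour, set `concurrency` to 8–10, and cap `maxReposPerOrg` at 10 if you only need the headline stats. A batch of about 1 000 orgs needs roughly 2 000 API calls with `includeRepos` on (one for the profile, one for the repo list). That fits within a single hour's budget, and such a run finishes in under 15 minutes. Beyond roughly 2 500 orgs, the hourly budget rather than concurrency becomes the limit.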
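The rate-limit budget behind this advice can be checked with quick arithmetic. The sketch assumes one call per org profile plus one repo-page call per org when `includeRepos` is on, a fresh budget at the start, and no retries. `requests_needed` and `rate_limit_windows` are hypothetical helpers.

```python
import math

def requests_needed(orgs: int, include_repos: bool = True) -> int:
    """One org-profile call per org, plus one repo-list call when includeRepos is on."""
    return orgs * (2 if include_repos else 1)

def rate_limit_windows(orgs: int, hourly_limit: int, include_repos: bool = True) -> int:
    """How many one-hour rate-limit windows the batch spans (1 = never blocked)."""
    return math.ceil(requests_needed(orgs, include_repos) / hourly_limit)
```

With a token, 1 000 orgs need 2 000 calls and fit inside one hourly window. Without a token (60/hour), the same batch spans 34 windows.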
Can I use this as a GitHub org repo list API alternative?
Yes. Set `includeRepos: true` and `maxReposPerOrg: 100`. The `repos` array on each output row then lists up to 100 of the org's most recently updated public repos, with name, stars, language, and last-push timestamp. Combine the outputs across multiple runs for a richer dataset.
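Combining outputs from several runs mostly comes down to de-duplicating on `login`. This sketch keeps the most recently scraped row per org. It assumes `scraped_at` values use the `Z`-suffixed ISO-8601 form shown in the example output, and it treats logins case-insensitively, as GitHub does. `merge_runs` is a hypothetical helper.

```python
from datetime import datetime
from typing import Dict, Iterable, List

def _ts(value: str) -> datetime:
    # fromisoformat() before Python 3.11 rejects a trailing "Z"
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def merge_runs(*runs: Iterable[dict]) -> List[dict]:
    """One row per org across all runs, keeping the newest `scraped_at`."""
    latest: Dict[str, dict] = {}
    for run in runs:
        for row in run:
            key = row["login"].lower()  # GitHub logins are case-insensitive
            if key not in latest or _ts(row["scraped_at"]) > _ts(latest[key]["scraped_at"]):
                latest[key] = row
    return sorted(latest.values(), key=lambda r: r["login"].lower())
```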
Why is email empty for most orgs?
GitHub lets orgs hide their email from the public profile. We surface whatever the API returns — we never guess or infer.
Are private members listed?
No — this uses the public read-only API. Even with a token, private-member data requires org-admin scope inside the org itself, which is out of scope here.
Can I get GitHub Advanced Security findings?
No. Those findings come from GitHub Advanced Security, a paid GitHub product, and are outside the scope of what public org endpoints expose.
💬 Your feedback
Spotted a bug, hit a weird edge case, or need a new field? Open an issue on the Actor's Issues tab in Apify Console — we ship fixes weekly and we read every report.