GitHub Organization Scraper

Pull GitHub organization metadata and its public repo list via the GitHub API: display name, description, location, blog, and follower count, plus a per-repo summary (name, stars, language, last push). Export to JSON or CSV. Uses the free GitHub REST API, with an optional token for higher rate limits.

Developer: DevilScrapes (maintained by Community)

🎯 What this scrapes

GitHub exposes every org at api.github.com/orgs/{slug} and its public repos at /orgs/{slug}/repos. This Actor takes a list of org slugs (or full github.com/<org> URLs), fans them out concurrently, and writes one row per organisation — with an optional public-repo summary attached.
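
As a rough illustration of the input handling described above, the sketch below turns either a bare slug or a github.com/<org> URL into the two API endpoints. The helper names org_slug and org_endpoints are hypothetical and are not taken from the Actor's source; the Actor's real parsing may differ.

```python
from urllib.parse import urlparse

API_ROOT = "https://api.github.com"


def org_slug(value: str) -> str:
    """Return the org slug from a bare slug or a github.com/<org> URL."""
    text = value.strip()
    if "github.com" in text.lower():
        # Add a scheme when it is missing so urlparse fills in the path.
        if "://" not in text:
            text = "https://" + text
        parts = [p for p in urlparse(text).path.split("/") if p]
        if not parts:
            raise ValueError(f"no organisation in URL: {value!r}")
        return parts[0]
    return text.strip("/")


def org_endpoints(value: str) -> dict[str, str]:
    """Build the org and public-repo endpoints for one input value."""
    slug = org_slug(value)
    return {
        "org": f"{API_ROOT}/orgs/{slug}",
        "repos": f"{API_ROOT}/orgs/{slug}/repos",
    }
```

Calling org_endpoints("github.com/apify") and org_endpoints("apify") should yield the same pair of URLs, which is what lets a mixed input list be processed uniformly.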

Need orgs at scale? Provide thousands of slugs; the Actor handles pagination and rate-limit pacing so you don't have to.

🔥 What we handle for you

  • 🛡️ Browser fingerprint rotation: curl-cffi impersonates real Chrome / Firefox / Safari TLS handshakes so the target sees a browser, not Python.
  • 🌐 Residential proxy rotation via Apify Proxy — fresh session and exit IP on every block or rate-limit signal.
  • 🔁 Retries with exponential backoff on 408 / 429 / 5xx — up to 5 attempts per request, Retry-After header honoured.
  • 🧱 Rate-limit-aware pacing — we back off gracefully when GitHub pushes back, so your run completes instead of getting banned.
  • 🧊 Clean, typed dataset rows — Pydantic-validated output, ISO-8601 timestamps, stable IDs; export to JSON, CSV, or Excel straight from Apify Console.
  • 💰 Pay-Per-Event pricing — you pay only when a result lands in your dataset. No data, no charge (beyond the tiny actor-start fee).
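
The retry bullet above can be sketched roughly as follows. This is a minimal illustration, not the Actor's code: the function names, the 1-second base delay, and the 60-second cap are assumptions, and only the numeric (seconds) form of Retry-After is handled. The fetch callable is injected so the logic stays independent of any HTTP client.

```python
import time
from typing import Callable, Mapping, Tuple

MAX_ATTEMPTS = 5  # "up to 5 attempts per request", as stated in the list above
Response = Tuple[int, Mapping[str, str], bytes]  # (status, headers, body)


def is_retryable(status: int) -> bool:
    """408, 429 and any 5xx status are treated as retryable."""
    return status in (408, 429) or 500 <= status <= 599


def retry_delay(attempt: int, headers: Mapping[str, str],
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait after failed attempt number `attempt` (1-based)."""
    retry_after = headers.get("Retry-After", "")
    if retry_after.strip().isdigit():
        # The server asked for a specific wait; honour it.
        return float(retry_after)
    # Otherwise double the delay on every attempt, up to the cap.
    return min(cap, base * 2 ** (attempt - 1))


def fetch_with_retries(fetch: Callable[[], Response],
                       sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `fetch` until it succeeds or MAX_ATTEMPTS is reached."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        status, headers, body = fetch()
        if not is_retryable(status) or attempt == MAX_ATTEMPTS:
            return status, headers, body
        sleep(retry_delay(attempt, headers))
    raise AssertionError("unreachable")
```

Passing sleep as a parameter keeps the sketch easy to test without real waiting. A production version would usually add random jitter to the delay.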

💡 Use cases

  • Lead generation — pull a list of dev-tool companies and surface contact info (email, blog, Twitter/X handle) from their public org profiles.
  • M&A / competitive intel — quantify the open-source surface of a target company: repo count, total stars, last-push cadence, verified-org status.
  • DevRel benchmarking — compare your org's public-repo activity against competitors; feed the data into your BI tool or Sheets dashboard.
  • Recruitment targeting — rank organisations by location, follower count, and activity to prioritise engineering-heavy outreach targets.
  • Dependency mapping — combine with the GitHub Repo Scraper to inventory every repo a company maintains.

⚙️ How to use it

  1. Click Try for free at the top of the Store page.
  2. Paste your list of GitHub org slugs (e.g. apify, anthropics) or full profile URLs.
  3. Optionally add a GitHub personal-access token to raise the rate limit from 60 to 5 000 requests/hour.
  4. Click Start. Output streams into the run's dataset in real time.
  5. Export from Storage → Dataset as JSON, CSV, or Excel — or pull via the Apify REST API.
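
For step 5, pulling rows over the Apify REST API might look like the sketch below. It assumes the standard dataset items endpoint (/v2/datasets/{datasetId}/items with a format query parameter) and a bearer token. The function names are hypothetical, and the dataset ID and token are placeholders you would take from your own run.

```python
import json
import urllib.request
from urllib.parse import urlencode

APIFY_API = "https://api.apify.com/v2"


def dataset_request(dataset_id: str, apify_token: str,
                    fmt: str = "json") -> urllib.request.Request:
    """Build an authenticated request for a run's dataset items."""
    query = urlencode({"format": fmt})
    url = f"{APIFY_API}/datasets/{dataset_id}/items?{query}"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {apify_token}"}
    )


def fetch_rows(dataset_id: str, apify_token: str) -> list:
    """Download all rows as parsed JSON (performs a network call)."""
    request = dataset_request(dataset_id, apify_token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Switching fmt to "csv" or "xlsx" yields the other export formats mentioned in the step; fetch_rows is only meaningful for "json".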

📥 Input

| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| orgs | array | yes | ["apify", "anthropics"] | GitHub org slugs or full github.com/<org> URLs. |
| githubToken | string | no | | Personal-access token. Lifts rate limit from 60/hour to 5 000/hour. Read-only public scope is sufficient. |
| includeRepos | boolean | no | true | Adds up to maxReposPerOrg recently-updated repos per org. One extra API call per org. |
| maxReposPerOrg | integer | no | 30 | Cap on repos returned per org. Hard ceiling 100 per GitHub page. |
| concurrency | integer | no | 4 | Parallel API requests. Raise for large batch jobs. |
| proxyConfiguration | object | no | {"useApifyProxy": false} | Apify Proxy config. Optional; enable residential proxies for high-volume runs. |

Example input

{
  "orgs": ["apify", "anthropics"],
  "githubToken": "",
  "includeRepos": true,
  "maxReposPerOrg": 30,
  "concurrency": 4,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}

📤 Output

One dataset row per GitHub organisation. When includeRepos is true, the repos field carries a per-repo summary array.

| Field | Type | Notes |
|---|---|---|
| login | string | Organisation slug (unique identifier). |
| name | string or null | Display name. |
| description | string or null | Org bio text. |
| company | string or null | Self-declared company string. |
| blog | string or null | Homepage / blog URL. |
| location | string or null | Location string (user-supplied). |
| email | string or null | Public contact email. |
| twitter_username | string or null | X / Twitter handle. |
| public_repos | integer | Number of public repos. |
| public_gists | integer | Number of public gists. |
| followers | integer | Follower count. |
| html_url | string | GitHub org profile URL. |
| avatar_url | string | Avatar / logo URL. |
| members_url_template | string | GitHub API template for member listings. |
| type | string | Always "Organization". |
| is_verified | boolean or null | GitHub verified-org flag. |
| created_at | string | Org creation timestamp (ISO-8601). |
| updated_at | string | Last profile-update timestamp (ISO-8601). |
| repos | array or null | Per-repo summary list (when includeRepos=true). |
| scraped_at | string | When this row was written (ISO-8601). |

Example output

{
  "login": "apify",
  "name": "Apify",
  "description": "Web scraping and automation platform.",
  "blog": "https://apify.com",
  "location": "Prague, Czechia",
  "email": null,
  "twitter_username": "apify",
  "public_repos": 412,
  "followers": 3800,
  "html_url": "https://github.com/apify",
  "is_verified": true,
  "created_at": "2013-01-15T10:22:31Z",
  "updated_at": "2026-05-20T08:14:07Z",
  "scraped_at": "2026-06-01T12:00:00Z",
  "type": "Organization",
  "repos": [
    {
      "name": "apify-sdk-python",
      "full_name": "apify/apify-sdk-python",
      "stargazers_count": 1100,
      "language": "Python",
      "pushed_at": "2026-05-30T18:00:00Z"
    }
  ]
}
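
Because each row nests its repos in an array, spreadsheet-style analysis usually needs one line per repo. The sketch below is one possible way to flatten a row shaped like the example above into CSV. The function names and the chosen column set are illustrative assumptions, not part of the Actor.

```python
import csv
import io

COLUMNS = ["org", "repo", "stars", "language", "pushed_at"]


def repo_rows(org_row: dict) -> list[dict]:
    """Flatten one organisation row into one dict per repo."""
    # `repos` is null when includeRepos is false, so fall back to an empty list.
    repos = org_row.get("repos") or []
    return [
        {
            "org": org_row["login"],
            "repo": repo["name"],
            "stars": repo["stargazers_count"],
            "language": repo.get("language"),
            "pushed_at": repo.get("pushed_at"),
        }
        for repo in repos
    ]


def to_csv(rows: list[dict]) -> str:
    """Serialise the flattened rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Summing the stars column of the flattened rows gives a quick per-org star total for the repos that were returned, which is bounded by maxReposPerOrg.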

💰 Pricing

Pay-Per-Event — you pay only when these events fire:

| Event | USD | What it is |
|---|---|---|
| actor-start | $0.005 | One-off warm-up charge per run |
| result | $0.003 | Per organisation row written to dataset |

Example: scraping 1 000 organisations in a single run costs 1 000 × $0.003 + $0.005 = $3.005, or about $3.00 all-in. No subscription, no minimum spend, and no credit card required to try: Apify gives every new account $5 of free credit.
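
The same arithmetic as a small estimator, using the two event prices from the table above. The function name run_cost is hypothetical, and the sketch assumes every organisation in the input produces exactly one charged row.

```python
from decimal import Decimal

ACTOR_START_USD = Decimal("0.005")  # charged once per run
RESULT_USD = Decimal("0.003")       # charged per organisation row


def run_cost(org_rows: int, runs: int = 1) -> Decimal:
    """Estimated USD cost for `org_rows` rows spread over `runs` runs."""
    return ACTOR_START_USD * runs + RESULT_USD * org_rows
```

Decimal is used instead of float so that fractions of a cent add up exactly.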

🚧 Limitations

  • Member lists are out of scope: the output carries org-level metadata and a members_url_template, not individual member profiles (use the GitHub User Scraper for that).
  • Private repos and security advisories are never returned; this is a public-read-only integration.
  • Rate limits without a token: 60 requests/hour per IP; 5 000/hour with a personal-access token.
  • Nullable fields: location, email, blog, and company are user-supplied and are frequently null.
  • Org size: maxReposPerOrg has a hard ceiling of 100, so organisations with more public repos than the cap return only the capped, recently-updated subset. Raise maxReposPerOrg (up to 100) if you need more than the default 30.

❓ FAQ

Which Actor should I use — GitHub Organization Scraper, GitHub User Scraper, or GitHub Repo Scraper?

Use this Actor when your input is a list of organisations (companies, open-source foundations, teams) and you want org-level metadata plus their public repo list. Use the GitHub User Scraper when your input is individual developer handles. Use the GitHub Repo Scraper when you have specific repo URLs and need commit/contributor detail.

How is this different from the GitHub REST API?

It isn't — under the hood we call the same public GitHub REST API endpoints. What we add is batch orchestration (fan out across hundreds of orgs in one run), automatic pagination, rate-limit pacing, retry logic, and a clean typed dataset ready for export. If you only need data for one or two orgs, the raw API is fine. For anything bigger, this Actor saves the plumbing.

What's a GitHub org scraper good for in a real workflow?

DevRel teams use it to track competitor org activity weekly; sales teams feed it into enrichment pipelines to qualify OSS-heavy prospects; competitive-intel platforms monitor repo-count and follower churn across a curated watchlist.

How do I get GitHub company data reliably at scale?

Provide a githubToken to unlock 5 000 requests/hour, set concurrency to 8–10, and cap maxReposPerOrg at 10 if you only need the headline stats. For bulk runs (1 000+ orgs) that approach finishes in under 15 minutes.
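
A quick way to sanity-check a bulk run against the rate limit is to count requests up front. The sketch below assumes one API call per org plus one extra call when includeRepos is true (as the input table states) and ignores retries. The function names are hypothetical.

```python
import math

LIMIT_WITH_TOKEN = 5000   # requests/hour with a personal-access token
LIMIT_WITHOUT_TOKEN = 60  # requests/hour per IP without a token


def requests_needed(org_count: int, include_repos: bool = True) -> int:
    """One org call, plus one repo-list call when includeRepos is true."""
    return org_count * (2 if include_repos else 1)


def quota_windows(org_count: int, include_repos: bool = True,
                  has_token: bool = True) -> int:
    """Number of one-hour rate-limit windows the run would consume."""
    limit = LIMIT_WITH_TOKEN if has_token else LIMIT_WITHOUT_TOKEN
    return math.ceil(requests_needed(org_count, include_repos) / limit)
```

Under these assumptions a 1 000-org run with repos needs 2 000 requests, which fits inside a single hourly window with a token but would span many windows without one.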

Can I use this as a GitHub org repo list API alternative?

Yes. Set includeRepos: true and maxReposPerOrg: 100; the repos array on each output row then lists up to 100 of the org's public repos, each with name, stars, language, and last-push timestamp. Orgs with more than 100 public repos are truncated at that ceiling. Combine the outputs across multiple runs for a richer dataset.

Why is email empty for most orgs?

GitHub lets orgs hide their email from the public profile. We surface whatever the API returns — we never guess or infer.

Are private members listed?

No — this uses the public read-only API. Even with a token, private-member data requires org-admin scope inside the org itself, which is out of scope here.

Can I get GitHub Advanced Security findings?

That's a paid GitHub Advanced Security API — outside the scope of what public org endpoints expose.

💬 Your feedback

Spotted a bug, hit a weird edge case, or need a new field? Open an issue on the Actor's Issues tab in Apify Console — we ship fixes weekly and we read every report.