GitHub Scraper
Pricing
Pay per usage
Scrape GitHub repositories, users, organizations, and repository search results without a login. Pull a repo's full metadata by owner/repo, a user's or org's profile and repositories, or repository search results. Walks pagination up to your chosen limit.
Developer
Goutam Soni
Maintained by Community
Extract public GitHub data without a login or API key. Scrape any repository's full metadata, any user's or organization's profile and repositories, and repository search results, with pagination walked automatically up to the limit you set.
What it does
- Repository scraper. Pass any owner/repo (or a repo link) and get stars, forks, watchers, open issues, primary language, topics, license, repo size, default branch, and created / updated / last-push timestamps.
- User and organization scraper. Pass a login and get the profile (name, bio, company, location, blog, email, followers, following, public repo and gist counts) and, optionally, every public repository the account owns.
- Repository search. Run any search query with the standard GitHub search syntax (language, stars, topic, etc.) and collect the matching repositories.
- No login, no API key required. Runs out of the box. An optional access token can be supplied to raise the request budget on very large runs, but it is never required.
- Automatic pagination. Multi-page user repository lists and search results are walked page by page until your limit is reached or the source is exhausted.
- Clean, normalized output. Every row is a flat object with a stable schema and a type field (repo or user), ready for spreadsheets, databases, or downstream automation.
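The automatic pagination described in the list above can be pictured as a loop that keeps requesting pages until the limit is reached or the source has nothing left. The sketch below is illustrative only and is not the actor's own code: `fetch_page` is a hypothetical callable standing in for whatever request the actor makes, and the page size of 100 is an assumption.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]


def walk_pages(
    fetch_page: Callable[[int, int], List[Row]],
    limit: int,
    page_size: int = 100,  # assumed page size, not documented by the actor
) -> Iterator[Row]:
    """Yield rows page by page until `limit` rows are produced or the source runs dry.

    `fetch_page(page, page_size)` is a hypothetical callable, not part of the actor.
    """
    yielded = 0
    page = 1
    while yielded < limit:
        rows = fetch_page(page, page_size)
        if not rows:  # empty page: the source is exhausted
            return
        for row in rows:
            if yielded >= limit:
                return
            yield row
            yielded += 1
        if len(rows) < page_size:  # short page: this was the last one
            return
        page += 1


# Usage with an in-memory stand-in for the remote source (250 fake repositories).
FAKE_SOURCE = [{"type": "repo", "id": i} for i in range(250)]


def fake_fetch(page: int, page_size: int) -> List[Row]:
    start = (page - 1) * page_size
    return FAKE_SOURCE[start:start + page_size]


first_120 = list(walk_pages(fake_fetch, limit=120))    # stops at the limit
everything = list(walk_pages(fake_fetch, limit=1000))  # stops when the source is exhausted
```

Writing the walker as a generator means the caller can stop consuming early without any further pages being requested.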
Use cases
- Developer lead generation. Build lists of maintainers and organizations in a niche (for example, repositories matching language:rust stars:>1000) and enrich them with profile and contact fields.
- Open-source market research. Track stars, forks, and activity timestamps across a set of competing projects to see which tools are gaining traction.
- Tech recruiting and sourcing. Pull contributor organizations and their public repositories to find active engineers by language and topic.
- Portfolio and ecosystem monitoring. Re-run on a schedule to watch how a list of repositories or accounts changes over time.
- Dataset building. Assemble a structured dataset of repositories by topic or search query for analysis, dashboards, or training data.
Input
| Field | Type | Description |
|---|---|---|
| repos | array | Repositories to fetch full metadata for, as owner/repo or a repo link. |
| users | array | User or organization logins to fetch, with or without @, or a profile link. |
| searchQueries | array | Repository search queries using the standard GitHub search syntax. |
| maxItemsPerSource | integer | Cap per user (repositories) or per search query. Default 100. Search is capped at 1000 by the source. |
| includeUserRepos | boolean | When on, each user or organization also emits its public repositories after the profile row. Default true. |
| searchSort | string | Sort search results: best match (default), stars, forks, or updated. |
| githubToken | string | Optional access token to raise the request budget for very large runs. Not required. |
| concurrency | integer | How many sources to process in parallel. Default 5. |
| proxyConfig | object | Optional proxy configuration. Not required for public data. |
Example input
    {
      "repos": ["apify/crawlee", "example_user/example-repo"],
      "users": ["example_user"],
      "searchQueries": ["web scraping language:python stars:>1000"],
      "maxItemsPerSource": 200,
      "searchSort": "stars"
    }
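If the input is assembled in code instead of by hand, a small helper can apply the documented defaults and catch obvious mistakes before a run starts. This is a minimal sketch: `build_input` is a hypothetical helper and not part of the actor, the defaults (100, true, 5) are taken from the input table, and the rule that at least one source must be given is an assumption of the sketch.

```python
import json
from typing import Dict, List, Optional


def build_input(
    repos: Optional[List[str]] = None,
    users: Optional[List[str]] = None,
    search_queries: Optional[List[str]] = None,
    max_items_per_source: int = 100,     # documented default
    include_user_repos: bool = True,     # documented default
    concurrency: int = 5,                # documented default
    search_sort: Optional[str] = None,   # left out so the actor's default (best match) applies
) -> Dict[str, object]:
    """Build an input object using the field names from the input table."""
    if not (repos or users or search_queries):
        # Assumption of this sketch: an input with no sources is a mistake.
        raise ValueError("provide at least one of repos, users, search_queries")
    if max_items_per_source < 1 or concurrency < 1:
        raise ValueError("maxItemsPerSource and concurrency must be positive")

    run_input: Dict[str, object] = {
        "repos": repos or [],
        "users": users or [],
        "searchQueries": search_queries or [],
        "maxItemsPerSource": max_items_per_source,
        "includeUserRepos": include_user_repos,
        "concurrency": concurrency,
    }
    if search_sort is not None:
        run_input["searchSort"] = search_sort
    return run_input


# Reproduces the example input, with the defaults filled in explicitly.
example = build_input(
    repos=["apify/crawlee", "example_user/example-repo"],
    users=["example_user"],
    search_queries=["web scraping language:python stars:>1000"],
    max_items_per_source=200,
    search_sort="stars",
)
print(json.dumps(example, indent=2))
```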
Output
Each result is pushed as one row to the dataset. A repository row:
    {
      "type": "repo",
      "id": 66670819,
      "fullName": "example_user/example-repo",
      "name": "example-repo",
      "owner": "example_user",
      "ownerType": "Organization",
      "url": "https://github.com/example_user/example-repo",
      "homepage": "https://example.com",
      "stars": 23801,
      "forks": 1433,
      "watchers": 131,
      "openIssues": 172,
      "sizeKb": 166212,
      "description": "An example repository.",
      "language": "TypeScript",
      "topics": ["example", "scraper"],
      "license": "Apache-2.0",
      "defaultBranch": "main",
      "avatarUrl": "https://avatars.githubusercontent.com/u/000000?v=4",
      "isFork": false,
      "isArchived": false,
      "createdAt": "2016-08-26T18:35:03Z",
      "updatedAt": "2026-06-17T16:38:29Z",
      "pushedAt": "2026-06-17T14:49:49Z",
      "scrapedAt": "2026-06-18T08:15:00Z"
    }
A user or organization row (type: "user") carries login, name, accountType, url, followers, following, publicRepos, publicGists, bio, company, location, blog, email, twitter, avatarUrl, createdAt, and updatedAt.
Key fields. Columns are ordered by importance: identity first (type, id, fullName/login, name, owner, url), then metrics (stars, forks, watchers, openIssues, followers), then content (description, language, topics, license), then media and timestamps. watchers is the true count of accounts watching a repository and is filled in for single-repository lookups. Profile fields such as company, email, and twitter are present only when the account has set them publicly; they are null otherwise.
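Because every row carries a type field, a mixed dataset can be split into repositories and accounts with a few lines of post-processing. The sketch below assumes the rows have already been downloaded as a list of dictionaries; the helper names and the sample rows are made up for illustration, and only the field names come from the output described above.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]


def split_rows(rows: Iterable[Row]) -> Tuple[List[Row], List[Row]]:
    """Separate repository rows from user/organization rows using the type field."""
    repos: List[Row] = []
    users: List[Row] = []
    for row in rows:
        if row.get("type") == "repo":
            repos.append(row)
        elif row.get("type") == "user":
            users.append(row)
    return repos, users


def top_repos(repos: List[Row], n: int = 10) -> List[Row]:
    """Return the n most-starred repositories; a missing star count is treated as 0."""
    return sorted(repos, key=lambda r: r.get("stars") or 0, reverse=True)[:n]


def with_public_email(users: List[Row]) -> List[Row]:
    """Keep profiles that expose an email; unset profile fields arrive as null (None)."""
    return [u for u in users if u.get("email")]


# Made-up sample rows using the documented field names.
sample = [
    {"type": "repo", "fullName": "example_user/other-repo", "stars": 12},
    {"type": "repo", "fullName": "example_user/example-repo", "stars": 23801},
    {"type": "user", "login": "example_user", "email": None, "followers": 40},
    {"type": "user", "login": "example_org", "email": "hello@example.com", "followers": 7},
]

repo_rows, user_rows = split_rows(sample)
best = top_repos(repo_rows, n=1)
reachable = with_public_email(user_rows)
```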
FAQ
Is it free? How is it priced?
The actor is pay-per-result, so you are billed per row returned. There is no separate per-run start fee. Check the current rate on the actor's pricing tab before you run.
Do I need a GitHub login or API key?
No. It runs without any account, password, or API key. You can optionally provide your own access token to raise the request budget on very large runs, but it is never required.
How many results can I get?
Repository and user-profile lookups return one row each. A user's or organization's repositories and each search query return up to maxItemsPerSource rows (search is capped at 1000 per query by the source). Pagination is walked automatically across multiple pages to reach your target.
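Since billing is per row, it can help to compute the upper bound on rows before a run. The function below is a rough sketch of that arithmetic based only on the rules stated above; `max_rows` is a hypothetical helper, and real runs return fewer rows whenever a user or query has fewer matches than the limit.

```python
SEARCH_CAP = 1000  # per-query cap imposed by the source, as stated above


def max_rows(
    n_repos: int,
    n_users: int,
    n_queries: int,
    max_items_per_source: int = 100,
    include_user_repos: bool = True,
) -> int:
    """Upper bound on the number of dataset rows a run can produce."""
    rows = n_repos                  # one row per repository lookup
    rows += n_users                 # one profile row per user or organization
    if include_user_repos:
        rows += n_users * max_items_per_source  # repositories owned by each account
    rows += n_queries * min(max_items_per_source, SEARCH_CAP)
    return rows


# The example input: 2 repos, 1 user, 1 query, maxItemsPerSource = 200.
example_bound = max_rows(2, 1, 1, max_items_per_source=200)

# A limit above 1000 still yields at most 1000 rows per search query.
capped_bound = max_rows(0, 0, 3, max_items_per_source=5000)
```

For the example input this gives 2 + 1 + 200 + 200 = 403 rows at most.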
How fast is it?
Sources are processed in parallel (set by concurrency), and each result is streamed to the dataset as it is collected, so partial output is available while a large run is still in progress.
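One common way to get this behavior is a bounded pool of workers that each append rows to a shared output as soon as they are produced. The sketch below shows that pattern with `asyncio`; it is an assumption about the general technique, not a description of the actor's internals, and `scrape_source` is a hypothetical stand-in that yields fake rows.

```python
import asyncio
from typing import AsyncIterator, Dict, List

Row = Dict[str, object]


async def scrape_source(source: str) -> AsyncIterator[Row]:
    """Hypothetical stand-in for one source; yields three fake rows."""
    for i in range(3):
        await asyncio.sleep(0)  # hand control back, as a real network call would
        yield {"source": source, "index": i}


async def run(sources: List[str], concurrency: int = 5) -> List[Row]:
    semaphore = asyncio.Semaphore(concurrency)
    dataset: List[Row] = []  # stands in for the dataset that rows are pushed to

    async def worker(source: str) -> None:
        async with semaphore:  # at most `concurrency` sources in flight
            async for row in scrape_source(source):
                dataset.append(row)  # stored as soon as it is collected

    await asyncio.gather(*(worker(s) for s in sources))
    return dataset


rows = asyncio.run(run(["repo:a/b", "user:c", "search:d"], concurrency=2))
```

Because rows are appended as they arrive, anything reading `dataset` during the run sees partial output, and rows from different sources may be interleaved.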
Can I scrape a private repository?
No. Only public repositories, users, and organizations are accessible. Supplying an access token raises the request budget but does not unlock private data.
What input formats are accepted for repos and users?
Repositories accept owner/repo or a full repo link. Users and organizations accept a bare login, an @login, or a full profile link. Inputs are cleaned automatically.
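The kind of cleaning described here can be sketched as two small normalizers. This is an illustration of the accepted formats only; the function names are hypothetical, and the actor's actual cleaning rules (for example, how it treats a trailing `.git`) are not documented, so that handling is an assumption.

```python
from typing import List
from urllib.parse import urlparse


def _path_parts(value: str) -> List[str]:
    """Return the non-empty path segments of a link, or of a bare identifier."""
    value = value.strip()
    if "github.com" in value:
        if "://" not in value:
            value = "https://" + value  # tolerate links written without a scheme
        value = urlparse(value).path
    return [part for part in value.split("/") if part]


def normalize_repo(value: str) -> str:
    """Reduce owner/repo or a repo link to the owner/repo form."""
    parts = _path_parts(value)
    if len(parts) < 2:
        raise ValueError(f"not a repository reference: {value!r}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):  # assumption: clone-style links should also work
        repo = repo[: -len(".git")]
    return f"{owner}/{repo}"


def normalize_login(value: str) -> str:
    """Reduce a bare login, an @login, or a profile link to the bare login."""
    parts = _path_parts(value)
    login = parts[0].lstrip("@") if parts else ""
    if not login:
        raise ValueError(f"not a user or organization reference: {value!r}")
    return login


repo_a = normalize_repo("apify/crawlee")
repo_b = normalize_repo("https://github.com/apify/crawlee/")
login_a = normalize_login("@example_user")
login_b = normalize_login("https://github.com/example_user")
```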