GitHub Scraper

Pricing

Pay per usage

Scrape GitHub repositories, users, organizations, and repository search results without a login. Pull a repo's full metadata by owner/repo, a user's or org's profile and repositories, or repository search results. Walks pagination up to your chosen limit.

Rating

5.0

(1)

Developer

Goutam Soni

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 4 days ago

Extract public GitHub data without a login or API key. Scrape any repository's full metadata, any user's or organization's profile and repositories, and repository search results, with pagination walked automatically up to the limit you set.

What it does

  • Repository scraper. Pass any owner/repo (or a repo link) and get stars, forks, watchers, open issues, primary language, topics, license, repo size, default branch, and created / updated / last-push timestamps.
  • User and organization scraper. Pass a login and get the profile (name, bio, company, location, blog, email, followers, following, public repo and gist counts) and, optionally, every public repository the account owns.
  • Repository search. Run any search query with the standard GitHub search syntax (language, stars, topic, etc.) and collect the matching repositories.
  • No login, no API key required. Runs out of the box. An optional access token can be supplied to raise the request budget on very large runs, but it is never required.
  • Automatic pagination. Multi-page user repository lists and search results are walked page by page until your limit is reached or the source is exhausted.
  • Clean, normalized output. Every row is a flat object with a stable schema and a type field (repo or user), ready for spreadsheets, databases, or downstream automation.
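The pagination behavior described above (walk pages until the limit is reached or the source is exhausted) can be sketched roughly as follows. This is an illustration only, not the actor's implementation: `walk_pages`, `fake_fetch`, and the in-memory data are hypothetical names invented for the example.

```python
from typing import Callable, Iterator, List


def walk_pages(fetch_page: Callable[[int], List[dict]], limit: int) -> Iterator[dict]:
    """Yield items page by page until `limit` items are produced
    or a page comes back empty (source exhausted)."""
    produced = 0
    page = 1
    while produced < limit:
        items = fetch_page(page)
        if not items:  # nothing left at the source
            return
        for item in items:
            yield item
            produced += 1
            if produced >= limit:  # stop mid-page once the cap is hit
                return
        page += 1


# Hypothetical in-memory source: 7 items served 3 per page.
_DATA = [{"id": i} for i in range(7)]


def fake_fetch(page: int, per_page: int = 3) -> List[dict]:
    start = (page - 1) * per_page
    return _DATA[start:start + per_page]


first_five = list(walk_pages(fake_fetch, limit=5))
everything = list(walk_pages(fake_fetch, limit=100))
```

With a limit of 5 the walk stops partway through the second page; with a limit larger than the source it ends when an empty page is returned.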

Use cases

  • Developer lead generation. Build lists of maintainers and organizations in a niche (for example, repositories matching language:rust stars:>1000) and enrich them with profile and contact fields.
  • Open-source market research. Track stars, forks, and activity timestamps across a set of competing projects to see which tools are gaining traction.
  • Tech recruiting and sourcing. Pull organizations and their public repositories to find engineers who are active in a given language or topic.
  • Portfolio and ecosystem monitoring. Re-run on a schedule to watch how a list of repositories or accounts changes over time.
  • Dataset building. Assemble a structured dataset of repositories by topic or search query for analysis, dashboards, or training data.

Input

| Field | Type | Description |
| --- | --- | --- |
| repos | array | Repositories to fetch full metadata for, as owner/repo or a repo link. |
| users | array | User or organization logins to fetch, with or without @, or a profile link. |
| searchQueries | array | Repository search queries using the standard GitHub search syntax. |
| maxItemsPerSource | integer | Cap per user (repositories) or per search query. Default 100. Search is capped at 1000 by the source. |
| includeUserRepos | boolean | When on, each user or organization also emits its public repositories after the profile row. Default true. |
| searchSort | string | Sort search results: best match (default), stars, forks, or updated. |
| githubToken | string | Optional access token to raise the request budget for very large runs. Not required. |
| concurrency | integer | How many sources to process in parallel. Default 5. |
| proxyConfig | object | Optional proxy configuration. Not required for public data. |

Example input

{
  "repos": ["apify/crawlee", "example_user/example-repo"],
  "users": ["example_user"],
  "searchQueries": ["web scraping language:python stars:>1000"],
  "maxItemsPerSource": 200,
  "searchSort": "stars"
}
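If you assemble the input programmatically, a small helper can catch obvious mistakes before a run is started. The sketch below is only an illustration: `build_input` is a hypothetical helper, the "at least one source" check is an assumption rather than a documented rule, and because the exact string for the default best-match sort is not given here, the helper simply omits `searchSort` to get the default.

```python
import json
from typing import List, Optional

# Sort values named in the input table; the default (best match) is
# requested here by leaving the field out, which is an assumption.
KNOWN_SORTS = {"stars", "forks", "updated"}


def build_input(
    repos: Optional[List[str]] = None,
    users: Optional[List[str]] = None,
    search_queries: Optional[List[str]] = None,
    max_items_per_source: int = 100,
    include_user_repos: bool = True,
    search_sort: Optional[str] = None,
) -> dict:
    """Return a run-input dict using the field names from the input table."""
    if not (repos or users or search_queries):
        # Assumption: a run with no sources is not useful.
        raise ValueError("provide at least one repo, user, or search query")
    if max_items_per_source < 1:
        raise ValueError("maxItemsPerSource must be a positive integer")

    run_input = {
        "repos": list(repos or []),
        "users": list(users or []),
        "searchQueries": list(search_queries or []),
        "maxItemsPerSource": max_items_per_source,
        "includeUserRepos": include_user_repos,
    }
    if search_sort is not None:
        if search_sort not in KNOWN_SORTS:
            raise ValueError(f"unknown searchSort: {search_sort!r}")
        run_input["searchSort"] = search_sort
    return run_input


example = build_input(
    repos=["apify/crawlee"],
    search_queries=["web scraping language:python stars:>1000"],
    max_items_per_source=200,
    search_sort="stars",
)
example_json = json.dumps(example, indent=2)  # ready to paste as actor input
```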

Output

Each result is pushed as one row to the dataset. A repository row:

{
  "type": "repo",
  "id": 66670819,
  "fullName": "example_user/example-repo",
  "name": "example-repo",
  "owner": "example_user",
  "ownerType": "Organization",
  "url": "https://github.com/example_user/example-repo",
  "homepage": "https://example.com",
  "stars": 23801,
  "forks": 1433,
  "watchers": 131,
  "openIssues": 172,
  "sizeKb": 166212,
  "description": "An example repository.",
  "language": "TypeScript",
  "topics": ["example", "scraper"],
  "license": "Apache-2.0",
  "defaultBranch": "main",
  "avatarUrl": "https://avatars.githubusercontent.com/u/000000?v=4",
  "isFork": false,
  "isArchived": false,
  "createdAt": "2016-08-26T18:35:03Z",
  "updatedAt": "2026-06-17T16:38:29Z",
  "pushedAt": "2026-06-17T14:49:49Z",
  "scrapedAt": "2026-06-18T08:15:00Z"
}

A user or organization row (type: "user") carries login, name, accountType, url, followers, following, publicRepos, publicGists, bio, company, location, blog, email, twitter, avatarUrl, createdAt, and updatedAt.

Key fields. Columns are ordered by importance: identity first (type, id, fullName/login, name, owner, url), then metrics (stars, forks, watchers, openIssues, followers), then content (description, language, topics, license), then media and timestamps. watchers is the true count of accounts watching a repository and is filled in for single-repository lookups. Profile fields such as company, email, and twitter are present only when the account has set them publicly; they are null otherwise.
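Because repository and user rows share one dataset and profile fields can be null, downstream code usually filters on `type` and guards against missing values. A minimal sketch, assuming the rows have already been loaded as a list of dicts; the helper names and the sample rows are invented for illustration.

```python
from typing import Iterable, List


def top_repos(rows: Iterable[dict], n: int = 10) -> List[dict]:
    """Return the n repository rows with the most stars."""
    repos = [r for r in rows if r.get("type") == "repo"]
    # Treat a missing or null star count as 0 so sorting never fails.
    return sorted(repos, key=lambda r: r.get("stars") or 0, reverse=True)[:n]


def users_with_email(rows: Iterable[dict]) -> List[dict]:
    """Return user rows whose email is set publicly (non-null, non-empty)."""
    return [r for r in rows if r.get("type") == "user" and r.get("email")]


# Made-up sample rows using the field names from the output description.
sample_rows = [
    {"type": "repo", "fullName": "example_user/example-repo", "stars": 23801},
    {"type": "repo", "fullName": "example_user/small-repo", "stars": 12},
    {"type": "user", "login": "example_user", "email": None},
    {"type": "user", "login": "other_user", "email": "other@example.com"},
]

best = top_repos(sample_rows, n=1)
reachable = users_with_email(sample_rows)
```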

FAQ

Is it free? How is it priced? The actor is pay-per-result, so you are billed per row returned. There is no separate per-run start fee. Check the current rate on the actor's pricing tab before you run.

Do I need a GitHub login or API key? No. It runs without any account, password, or API key. You can optionally provide your own access token to raise the request budget on very large runs, but it is never required.

How many results can I get? Repository and user-profile lookups return one row each. A user's or organization's repositories and each search query return up to maxItemsPerSource rows (search is capped at 1000 per query by the source). Pagination is walked automatically across multiple pages to reach your target.
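Since billing is per row, it can help to estimate the upper bound on rows before a run. The sketch below turns the rules in this answer into arithmetic. It assumes the profile row is counted in addition to the per-user repository cap, which is one reading of the description rather than a documented guarantee; `max_rows` is a hypothetical helper.

```python
SEARCH_CAP = 1000  # per-query ceiling stated above


def max_rows(
    n_repos: int,
    n_users: int,
    n_queries: int,
    max_items_per_source: int = 100,
    include_user_repos: bool = True,
) -> int:
    """Upper bound on dataset rows for a run; real runs may return fewer."""
    # One profile row per user, plus up to the cap in repository rows.
    per_user = 1 + (max_items_per_source if include_user_repos else 0)
    # Search results are limited by both the cap and the source ceiling.
    per_query = min(max_items_per_source, SEARCH_CAP)
    return n_repos + n_users * per_user + n_queries * per_query


# The example input: 2 repos, 1 user, 1 query, maxItemsPerSource 200.
bound = max_rows(n_repos=2, n_users=1, n_queries=1, max_items_per_source=200)
# 2 + (1 + 200) + 200 = 403
```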

How fast is it? Sources are processed in parallel (set by concurrency), and each result is streamed to the dataset as it is collected, so partial output is available while a large run is still in progress.

Can I scrape a private repository? No. Only public repositories, users, and organizations are accessible. Supplying an access token raises the request budget but does not unlock private data.

What input formats are accepted for repos and users? Repositories accept owner/repo or a full repo link. Users and organizations accept a bare login, an @login, or a full profile link. Inputs are cleaned automatically.
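The actor's exact cleaning rules are not documented here. As a rough idea of what normalizing these formats involves, the following sketch reduces each accepted form to a canonical `owner/repo` or login; `normalize_repo` and `normalize_user` are hypothetical helpers, not part of the actor.

```python
from urllib.parse import urlparse


def _path_parts(value: str) -> list:
    """Strip whitespace and any scheme/host, then split the path on '/'."""
    text = value.strip()
    if "://" in text:
        text = urlparse(text).path
    elif text.lower().startswith("github.com/"):
        text = text[len("github.com/"):]
    return [part for part in text.strip("/").split("/") if part]


def normalize_repo(value: str) -> str:
    """Accept owner/repo or a repo link and return 'owner/repo'."""
    parts = _path_parts(value)
    if len(parts) < 2:
        raise ValueError(f"not a repository reference: {value!r}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):  # tolerate clone-style links
        repo = repo[: -len(".git")]
    return f"{owner}/{repo}"


def normalize_user(value: str) -> str:
    """Accept a bare login, @login, or a profile link and return the login."""
    parts = _path_parts(value)
    if not parts:
        raise ValueError(f"not a user reference: {value!r}")
    return parts[0].lstrip("@")


repo_ref = normalize_repo("https://github.com/apify/crawlee")
user_ref = normalize_user("@example_user")
```

Normalizing inputs yourself before a run also makes it easy to deduplicate sources, so the same repository or account is not fetched and billed twice.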