GitHub Repository Search
Pricing
from $1.00 / 1,000 repos fetched
Search and scrape GitHub repositories by keyword, language, stars, forks, or topic. Extract structured repo metadata including owner, license, topics, and activity timestamps. Sort by stars, forks, or recently updated. Export to JSON, CSV, or API. No token required.
Developer: ryan clinton
Search and extract structured data from millions of GitHub repositories using the official GitHub Search API. Find open-source projects by keyword, programming language, star count, topic, and more -- then export clean, structured results to JSON, CSV, Excel, or connect directly to your workflow via the Apify API.
This actor handles pagination, rate limiting, and response transformation automatically, so you get a ready-to-use dataset of up to 1,000 repositories per search without writing any API integration code. Each result includes 23 fields of metadata: stars, forks, language, topics, license, owner details, timestamps, and more.
Why use GitHub Repository Search?
- No infrastructure required. The actor runs on Apify's cloud platform. You do not need to provision servers, write pagination logic, or implement rate limit handling.
- Structured, exportable data. Every run produces an Apify dataset you can download as JSON, CSV, Excel, or XML in one click, or access programmatically via the API.
- Automatic rate limit recovery. When GitHub returns a 403 rate limit response, the actor waits 60 seconds and retries automatically so you never lose data mid-run.
- Scheduled monitoring. Set up daily or weekly schedules to track new repositories appearing for any topic or technology.
- Built-in integrations. Push results to Google Sheets, Slack, webhooks, Zapier, Make, or any downstream tool without writing glue code.
- Works without a GitHub token. Unauthenticated access is supported out of the box. Provide a personal access token only if you need faster throughput.
Key features
- Full GitHub search syntax -- Supports qualifiers like `language:python`, `topic:react`, `stars:>1000`, `created:>2024-01-01`, `user:facebook`, `license:mit`, `archived:false`, and Boolean operators.
- Four sort modes -- Sort by stars, forks, recently updated, or best match (GitHub's relevance ranking).
- Minimum star filter -- Exclude low-activity repositories by setting a star threshold directly in the input, no query modification needed.
- Language filter -- Restrict results to a specific programming language via a dedicated input field.
- Up to 1,000 results per query -- Automatically paginates through up to 10 pages of 100 results each, the maximum the GitHub Search API allows.
- Authenticated and unauthenticated modes -- Works at 10 requests/min without a token, or 30 requests/min with a GitHub personal access token.
- 23-field structured output -- Each repository includes owner info, topics array, license, timestamps, size, archive status, fork status, and more.
How to use GitHub Repository Search
- Go to the GitHub Repository Search actor page on Apify and click Try for free.
- Enter a Search Query such as `web scraping`, `machine learning framework language:python`, or `topic:nextjs stars:>500`.
- Optionally select a Sort By method -- Most Stars, Most Forks, Recently Updated, or Best Match.
- Optionally set a Min Stars value to exclude repositories below a popularity threshold.
- Optionally type a Language Filter (e.g., `python`, `rust`, `typescript`) to restrict results to one language.
- Adjust Max Results to control how many repositories are returned (1 to 1,000; default is 30).
- Optionally paste a GitHub Token (personal access token) to raise the rate limit from 10 to 30 requests per minute.
- Click Start and wait for results to appear in the Dataset tab.
- Download results as JSON, CSV, or Excel, or access them via the Apify API.
Input parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `query` | string | Yes | -- | Search query supporting GitHub qualifiers like `language:python`, `topic:react`, `stars:>1000`. |
| `sortBy` | string | No | `stars` | Sort order: `stars`, `forks`, `updated`, or `best-match`. |
| `minStars` | integer | No | -- | Minimum star count. Appended to the query as `stars:>=N`. |
| `language` | string | No | -- | Programming language filter. Appended to the query as `language:X`. |
| `maxResults` | integer | No | 30 | Maximum number of repositories to return (1--1,000). |
| `githubToken` | string | No | -- | GitHub personal access token for higher rate limits (30 req/min vs 10). |
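The way `minStars` and `language` are folded into the query can be sketched as follows. This is a minimal illustration based only on the parameter descriptions above; the function name `build_query` and the exact ordering of the appended qualifiers are assumptions, not the actor's confirmed implementation.

```python
from typing import Optional


def build_query(query: str,
                min_stars: Optional[int] = None,
                language: Optional[str] = None) -> str:
    """Hypothetical helper: append optional qualifiers to the base query."""
    parts = [query.strip()]
    if min_stars is not None:
        # Described in the table as "stars:>=N"
        parts.append(f"stars:>={min_stars}")
    if language:
        # Described in the table as "language:X"
        parts.append(f"language:{language.strip()}")
    # Skip empty fragments so a blank base query does not leave a leading space
    return " ".join(p for p in parts if p)


print(build_query("machine learning framework", min_stars=500, language="python"))
```

Under these assumptions, leaving both optional fields empty returns the base query unchanged, so qualifiers written directly into `query` keep working.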
Input example
{
  "query": "machine learning framework",
  "sortBy": "stars",
  "minStars": 500,
  "language": "python",
  "maxResults": 50,
  "githubToken": "ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
}
Tips for best results
- Use GitHub search qualifiers directly in your query for precise targeting. For example, `topic:machine-learning language:python stars:>500 created:>2024-01-01` finds popular, recent ML repositories.
- Provide a GitHub personal access token when fetching more than 100 results. This triples the rate limit and cuts run time significantly.
- Combine the `minStars` and `language` input fields with your query for cleaner configuration without manually writing qualifiers.
- Sort by `updated` to find actively maintained repositories rather than popular but potentially abandoned projects.
- Start with a smaller `maxResults` value (e.g., 30) to verify your query returns relevant results before scaling to 1,000.
- Use `archived:false` in your query to exclude repositories that are no longer actively maintained.
Output
Output example
{
  "fullName": "scrapy/scrapy",
  "name": "scrapy",
  "owner": "scrapy",
  "ownerType": "Organization",
  "ownerUrl": "https://github.com/scrapy",
  "description": "Scrapy, a fast high-level web crawling & scraping framework for Python.",
  "stars": 53200,
  "forks": 10580,
  "watchers": 53200,
  "openIssues": 478,
  "language": "Python",
  "topics": ["crawler", "crawling", "framework", "hacktoberfest", "python", "scraping", "web-crawling", "web-scraping"],
  "license": "BSD-3-Clause",
  "homepage": "https://scrapy.org",
  "repoUrl": "https://github.com/scrapy/scrapy",
  "createdAt": "2010-02-22T02:01:14Z",
  "updatedAt": "2025-05-15T08:32:01Z",
  "pushedAt": "2025-05-14T19:44:33Z",
  "sizeKb": 27894,
  "isArchived": false,
  "isFork": false,
  "defaultBranch": "master",
  "extractedAt": "2025-05-15T12:00:00.000Z"
}
Output fields
| Field | Type | Description |
|---|---|---|
| `fullName` | string | Full repository name in `owner/repo` format. |
| `name` | string | Repository name without the owner prefix. |
| `owner` | string | GitHub username or organization that owns the repository. |
| `ownerType` | string | Owner account type: `User` or `Organization`. |
| `ownerUrl` | string | URL to the owner's GitHub profile page. |
| `description` | string | Repository description as set by the owner. May be null. |
| `stars` | integer | Number of stars (stargazers) the repository has received. |
| `forks` | integer | Number of times the repository has been forked. |
| `watchers` | integer | Watcher count as reported by the GitHub API. In search results this typically mirrors the star count, as in the example above. |
| `openIssues` | integer | Number of currently open issues and pull requests. |
| `language` | string | Primary programming language detected by GitHub. May be null. |
| `topics` | array | List of topic tags assigned to the repository. |
| `license` | string | SPDX license identifier (e.g., `MIT`, `Apache-2.0`). May be null. |
| `homepage` | string | Project homepage URL if set by the owner. May be null. |
| `repoUrl` | string | Direct URL to the repository on GitHub. |
| `createdAt` | string | ISO 8601 timestamp when the repository was created. |
| `updatedAt` | string | ISO 8601 timestamp of the last repository metadata update. |
| `pushedAt` | string | ISO 8601 timestamp of the most recent push to any branch. |
| `sizeKb` | integer | Repository size in kilobytes. |
| `isArchived` | boolean | Whether the repository has been archived by its owner. |
| `isFork` | boolean | Whether the repository is a fork of another repository. |
| `defaultBranch` | string | Name of the default branch (e.g., `main`, `master`). |
| `extractedAt` | string | ISO 8601 timestamp when this record was extracted by the actor. |
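To make the field table concrete, here is a sketch of how one raw item from the GitHub Search API could be mapped to the 23 output fields. The snake_case source keys (`full_name`, `stargazers_count`, `license.spdx_id`, and so on) are the names GitHub uses in its REST responses; the function name `transform_repo` and the choice of which source key feeds each output field are assumptions, since the actor's own mapping is not shown here.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def transform_repo(item: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical mapping from a raw GitHub search item to the flat output record."""
    owner = item.get("owner") or {}
    license_info = item.get("license") or {}  # GitHub returns null when no license is detected
    extracted_at = (
        datetime.now(timezone.utc)
        .isoformat(timespec="milliseconds")
        .replace("+00:00", "Z")
    )
    return {
        "fullName": item.get("full_name"),
        "name": item.get("name"),
        "owner": owner.get("login"),
        "ownerType": owner.get("type"),
        "ownerUrl": owner.get("html_url"),
        "description": item.get("description"),
        "stars": item.get("stargazers_count", 0),
        "forks": item.get("forks_count", 0),
        "watchers": item.get("watchers_count", 0),
        "openIssues": item.get("open_issues_count", 0),
        "language": item.get("language"),
        "topics": item.get("topics") or [],
        "license": license_info.get("spdx_id"),
        "homepage": item.get("homepage") or None,
        "repoUrl": item.get("html_url"),
        "createdAt": item.get("created_at"),
        "updatedAt": item.get("updated_at"),
        "pushedAt": item.get("pushed_at"),
        "sizeKb": item.get("size", 0),
        "isArchived": item.get("archived", False),
        "isFork": item.get("fork", False),
        "defaultBranch": item.get("default_branch"),
        "extractedAt": extracted_at,
    }
```

Flattening `owner` and `license` into plain strings is what lets each record export cleanly to a single CSV row.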
Use cases
- Open-source discovery -- Find trending repositories in a specific language or topic area, sorted by stars or recent activity, to stay on top of the ecosystem.
- Competitive analysis -- Search for repositories related to your product category (e.g., `topic:headless-cms language:javascript stars:>100`) to understand the competitive landscape and identify emerging alternatives.
- Technology research -- Identify the most popular frameworks, libraries, or tools for a given technology stack before making architectural decisions. Compare star counts, fork activity, and maintenance frequency across candidates.
- Recruitment and talent sourcing -- Find prolific open-source contributors by searching for active repositories in niche technologies and reviewing owner profiles linked in the output.
- Security and compliance auditing -- Search for repositories matching your organization's dependencies to track license types (MIT, Apache-2.0, GPL, etc.), maintenance status, and open issue counts.
- Academic research -- Build datasets of repositories for software engineering research, such as analyzing adoption patterns, licensing trends, or language popularity over time. Export directly to CSV for use with R, pandas, or other analysis tools.
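For the auditing and research use cases, a short script is often enough to tally licenses and languages across a downloaded dataset. The sketch below assumes the records are already loaded as a list of dictionaries with the output field names (`license`, `language`, `stars`); the function name `summarize` and the placeholder labels `"none"` and `"unknown"` are illustrative choices.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def summarize(items: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: count licenses and languages across exported records."""
    records = list(items)
    # Null values are grouped under a placeholder label so they stay visible
    licenses = Counter(r.get("license") or "none" for r in records)
    languages = Counter(r.get("language") or "unknown" for r in records)
    return {
        "total": len(records),
        "totalStars": sum(r.get("stars", 0) for r in records),
        "licenses": licenses.most_common(),
        "languages": languages.most_common(),
    }


sample = [
    {"fullName": "a/one", "license": "MIT", "language": "Python", "stars": 120},
    {"fullName": "b/two", "license": "MIT", "language": "Python", "stars": 80},
    {"fullName": "c/three", "license": None, "language": "Rust", "stars": 40},
]
print(summarize(sample))
```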
API & Integration
You can call GitHub Repository Search programmatically using the Apify API or any of the official client libraries.
Python
from apify_client import ApifyClient

# Authenticate with your Apify API token
client = ApifyClient("YOUR_APIFY_API_TOKEN")

# Start the actor and wait for the run to finish
run = client.actor("SN4iHGft2Oos3a4Ud").call(run_input={
    "query": "web scraping",
    "sortBy": "stars",
    "minStars": 100,
    "language": "python",
    "maxResults": 50,
})

# Read the results from the run's default dataset
for repo in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{repo['fullName']} - {repo['stars']} stars")
JavaScript
import { ApifyClient } from "apify-client";

// Authenticate with your Apify API token
const client = new ApifyClient({ token: "YOUR_APIFY_API_TOKEN" });

// Start the actor and wait for the run to finish
const run = await client.actor("SN4iHGft2Oos3a4Ud").call({
  query: "web scraping",
  sortBy: "stars",
  minStars: 100,
  language: "python",
  maxResults: 50,
});

// Read the results from the run's default dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((repo) => {
  console.log(`${repo.fullName} - ${repo.stars} stars`);
});
cURL
curl "https://api.apify.com/v2/acts/SN4iHGft2Oos3a4Ud/runs" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_APIFY_API_TOKEN" \
  -d '{"query": "web scraping","sortBy": "stars","minStars": 100,"language": "python","maxResults": 50}'
Integrations
- Google Sheets -- Automatically push repository data to a spreadsheet for tracking and analysis.
- Slack / Microsoft Teams -- Receive notifications when new repositories match your search criteria.
- Webhooks -- Trigger downstream workflows whenever a run completes.
- Zapier / Make -- Incorporate GitHub repository data into any automation pipeline.
- Apify Schedules -- Run the actor on a recurring basis to monitor emerging repositories over time.
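When the actor runs on a schedule, the interesting output is usually the set of repositories that were not present in the previous run. A minimal sketch of that comparison, assuming both runs' items are available as lists of records keyed by `fullName` (the function name `new_repos` is hypothetical):

```python
from typing import Any, Dict, List


def new_repos(previous: List[Dict[str, Any]],
              current: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: return records in `current` whose fullName is absent from `previous`."""
    seen = {r["fullName"] for r in previous}
    return [r for r in current if r["fullName"] not in seen]


yesterday = [{"fullName": "a/one"}, {"fullName": "b/two"}]
today = [{"fullName": "b/two"}, {"fullName": "c/three"}]
print([r["fullName"] for r in new_repos(yesterday, today)])
```

Only the newly seen records then need to be forwarded to Slack, a spreadsheet, or a webhook.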
How it works
Input query + filters
        |
        v
+-------------------+
| Build search URL  |  Append minStars, language qualifiers
+-------------------+
        |
        v
+-------------------+
| GitHub Search API |  GET api.github.com/search/repositories
+-------------------+  100 results per page, up to 10 pages
        |
        |--- 403 rate limit? ---> Wait 60s, retry
        |
        v
+-------------------+
| Transform results |  Map 23 fields per repository
+-------------------+
        |
        v
+-------------------+
| Push to dataset   |  Apify dataset (JSON, CSV, Excel, XML)
+-------------------+
        |
        v
   Run complete
- The actor reads your input and constructs a GitHub Search API query URL, appending any `minStars` and `language` qualifiers to the base query string.
- It sends paginated GET requests to `api.github.com/search/repositories` with the `Accept: application/vnd.github.v3+json` header. Each page returns up to 100 results.
- Between each request, the actor enforces a delay of 6.5 seconds (unauthenticated) or 2.1 seconds (authenticated) to stay within GitHub's rate limits.
- If a 403 rate limit response is received, the actor waits 60 seconds and retries the same page. No data is lost during rate limit events.
- Pagination continues until `maxResults` is reached, no more results are available, or 10 pages have been fetched (the GitHub Search API hard limit).
- Each raw API response is transformed into a clean 23-field object with consistent camelCase naming, stripping unnecessary nested structures.
- Results are pushed to the Apify dataset one record at a time as they are processed.
- A summary log is printed showing total repos returned, aggregate star count, top languages, and license distribution.
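The pagination, delay, and retry steps above can be sketched as a single loop. This is an illustration under stated assumptions, not the actor's source: `fetch_page` is a hypothetical callable returning an `(http_status, items)` pair for a page number, `sleep` is injectable so the loop can be exercised without waiting, and the `max_retries` guard is an addition of this sketch so that a persistent 403 cannot loop forever.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

PER_PAGE = 100          # results per page, as described above
MAX_PAGES = 10          # GitHub Search API limit: 10 pages of 100
RATE_LIMIT_WAIT = 60.0  # seconds to wait after a 403 rate limit response

Page = Tuple[int, List[Dict[str, Any]]]


def collect(fetch_page: Callable[[int], Page],
            max_results: int,
            delay: float,
            sleep: Callable[[float], None] = time.sleep,
            max_retries: int = 5) -> List[Dict[str, Any]]:
    """Fetch pages until max_results, an empty or short page, or the 10-page cap."""
    results: List[Dict[str, Any]] = []
    page, retries = 1, 0
    while page <= MAX_PAGES and len(results) < max_results:
        status, items = fetch_page(page)
        if status == 403:
            retries += 1
            if retries > max_retries:  # assumption: give up eventually
                break
            sleep(RATE_LIMIT_WAIT)
            continue                   # retry the same page
        retries = 0
        if status != 200 or not items:
            break
        results.extend(items[: max_results - len(results)])
        if len(items) < PER_PAGE:      # short page means no more results
            break
        page += 1
        if page <= MAX_PAGES and len(results) < max_results:
            sleep(delay)               # 6.5 s unauthenticated, 2.1 s with a token
    return results
```

Because a 403 never advances `page`, already collected results are kept and the same page is requested again after the wait.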
Performance & cost
| Scenario | Results | Pages | Approx. Time (no token) | Approx. Time (with token) | Compute Units |
|---|---|---|---|---|---|
| Quick search | 30 | 1 | ~5 seconds | ~3 seconds | ~0.005 |
| Medium batch | 100 | 1 | ~5 seconds | ~3 seconds | ~0.005 |
| Large batch | 500 | 5 | ~35 seconds | ~12 seconds | ~0.01 |
| Maximum | 1,000 | 10 | ~65 seconds | ~22 seconds | ~0.02 |
- The actor runs with 256 MB of memory, a small allocation that is sufficient because it only makes REST API calls with no browser rendering.
- The GitHub Search API itself is free for both authenticated and unauthenticated requests. There is no cost on the GitHub side.
- Unauthenticated mode enforces a 6.5-second delay between requests (10 req/min). Authenticated mode reduces this to 2.1 seconds (30 req/min).
- Total cost for even the largest run is typically under $0.01 in Apify compute units.
- The actor uses the GitHub REST API v3 with the `application/vnd.github.v3+json` media type and identifies itself with the `ApifyGitHubSearch/1.0` User-Agent header.
- Run time is dominated by the inter-request delay, not data processing. Providing a GitHub token is the single most impactful way to speed up large runs.
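Since run time is dominated by the inter-request delay, the waiting time can be estimated from the page count alone. The sketch below assumes one delay between each pair of consecutive pages, so a single-page run waits zero seconds; it gives a lower bound that excludes network latency, actor startup, and any 60-second rate limit waits, which is why the table figures are somewhat higher. The function name is hypothetical.

```python
import math


def estimate_wait_seconds(max_results: int, authenticated: bool) -> float:
    """Hypothetical estimate of time spent in inter-request delays only."""
    capped = min(max(max_results, 1), 1000)   # input range is 1 to 1,000
    pages = math.ceil(capped / 100)           # 100 results per page
    delay = 2.1 if authenticated else 6.5     # seconds between requests
    return round((pages - 1) * delay, 1)


for n in (30, 500, 1000):
    print(n, estimate_wait_seconds(n, False), estimate_wait_seconds(n, True))
```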
Limitations
- GitHub API maximum of 1,000 results. The GitHub Search API caps results at 1,000 per query regardless of how many repositories match. Use narrower qualifiers to target specific subsets if you need different slices of a large result set.
- Rate limits. Unauthenticated access allows 10 requests per minute; authenticated access allows 30. The actor handles this automatically but large runs will take longer without a token.
- Search index lag. GitHub's search index may take a few minutes to reflect very recent repository changes such as newly created repos or updated star counts.
- No file content search. This actor searches repository metadata only (name, description, topics, language). It does not search within file contents or README text. Use GitHub's code search for that.
- No private repositories by default. Without a token, only public repositories are returned. Private repositories can appear in results only when the supplied token has explicit access to them.
- Sort order affects which 1,000 results you get. When a query matches more than 1,000 repositories, the 1,000 returned depend on the sort order. Switching between `stars`, `forks`, `updated`, and `best-match` may return different subsets of the total result pool.
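One way to work around the 1,000-result cap is to split a broad search into several narrower queries and run the actor once per slice. The sketch below generates one query per calendar month using GitHub's `created:YYYY-MM-DD..YYYY-MM-DD` range qualifier. The function name `monthly_slices` is hypothetical, and monthly granularity is an arbitrary choice; a very popular topic may still exceed 1,000 matches in a single month and need finer slices.

```python
from datetime import date, timedelta
from typing import List


def monthly_slices(base_query: str, start: date, end: date) -> List[str]:
    """Hypothetical helper: one query string per calendar month between start and end."""
    queries: List[str] = []
    cursor = start
    while cursor <= end:
        # First day of the month after `cursor`
        if cursor.month == 12:
            next_month = date(cursor.year + 1, 1, 1)
        else:
            next_month = date(cursor.year, cursor.month + 1, 1)
        last_day = min(next_month - timedelta(days=1), end)
        queries.append(
            f"{base_query} created:{cursor.isoformat()}..{last_day.isoformat()}"
        )
        cursor = next_month
    return queries


for q in monthly_slices("topic:react", date(2024, 1, 15), date(2024, 3, 10)):
    print(q)
```

Each generated string can be passed as the `query` input of a separate run, and the resulting datasets merged afterwards by `fullName`.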
Responsible use
This actor queries the official GitHub REST API and complies with GitHub's rate limiting and Terms of Service. It identifies itself with the ApifyGitHubSearch/1.0 User-Agent string. The actor does not scrape the GitHub website, bypass authentication, or access private data.
- All data returned is publicly available through GitHub's official API.
- Rate limits are respected automatically with built-in delays and retry logic.
- No personal or private user data is collected beyond what GitHub exposes in public repository metadata.
Please ensure your usage complies with GitHub's Acceptable Use Policies and API Terms of Service.
FAQ
Do I need a GitHub account or token to use this actor?
No. The actor works without any authentication using GitHub's public API access at 10 requests per minute. A GitHub personal access token is optional and only needed for faster throughput (30 req/min) when fetching large result sets.
What is the maximum number of repositories I can retrieve?
The GitHub Search API enforces a hard limit of 1,000 results per query (10 pages of 100 results). If your search matches more than 1,000 repositories, you will get the top 1,000 sorted by your chosen sort order. Use narrower qualifiers to target different subsets.
Can I search by topic, creation date, or license?
Yes. The query field accepts the full GitHub search syntax. You can use qualifiers such as topic:react, created:>2024-01-01, license:mit, archived:false, user:facebook, org:google, and many more. See the GitHub search documentation for the complete list.
How fresh is the data?
The actor queries the GitHub API in real time. Star counts, fork counts, open issues, and all other metrics reflect the latest values at the moment the actor runs.
Can I export results to CSV or Google Sheets?
Yes. After a run completes, you can download the dataset in JSON, CSV, Excel, or XML format from the Apify Console. You can also connect the output to Google Sheets, Slack, or other tools using Apify's built-in integrations or webhooks.
How do I create a GitHub personal access token?
Go to GitHub Settings > Developer Settings > Personal Access Tokens and generate a new token. No special scopes are required for searching public repositories -- a token with no scopes selected will work and provides the higher rate limit.
What does the best-match sort option do?
When you select best-match, the actor omits the sort parameter and lets GitHub's own relevance algorithm rank results. GitHub considers factors like keyword match quality, repository activity, and popularity. This is useful when you want the most relevant results rather than simply the most starred ones.
Can I run this actor on a schedule?
Yes. In the Apify Console, navigate to the actor's page and click Schedules to set up daily, weekly, or custom cron-based runs. Combine with integrations like Google Sheets or Slack to get automated reports of newly discovered repositories.
Related actors
| Actor | Description | Link |
|---|---|---|
| Stack Overflow & StackExchange Search | Search programming Q&A to complement GitHub research. | View actor |
| Hacker News Search | Find discussions about open-source projects on Hacker News. | View actor |
| Website Tech Stack Detector | Identify technologies used by websites built with projects you discover. | View actor |
| NVD CVE Vulnerability Search | Check for known security vulnerabilities in repository dependencies. | View actor |
| CISA KEV Catalog | Search the CISA Known Exploited Vulnerabilities catalog. | View actor |
| WHOIS Domain Lookup | Look up domain registration data for project homepages you discover. | View actor |