
GitHub Repository Search

Pricing

from $1.00 / 1,000 repos fetched


Search and scrape GitHub repositories by keyword, language, stars, forks, or topic. Extract structured repo metadata including owner, license, topics, and activity timestamps. Sort by stars, forks, or recently updated. Export to JSON, CSV, or API. No token required.


Rating: 0.0 (0 reviews)

Developer: ryan clinton (Maintained by Community)

Actor stats: 0 bookmarks · 10 total users · 6 monthly active users · last modified 3 hours ago

Search and extract structured data from millions of GitHub repositories using the official GitHub Search API. Find open-source projects by keyword, programming language, star count, topic, and more -- then export clean, structured results to JSON, CSV, Excel, or connect directly to your workflow via the Apify API.

This actor handles pagination, rate limiting, and response transformation automatically, so you get a ready-to-use dataset of up to 1,000 repositories per search without writing any API integration code. Each result includes 23 fields of metadata: stars, forks, language, topics, license, owner details, timestamps, and more.


Why use this actor?

  • No infrastructure required. The actor runs on Apify's cloud platform. You do not need to provision servers, write pagination logic, or implement rate limit handling.
  • Structured, exportable data. Every run produces an Apify dataset you can download as JSON, CSV, Excel, or XML in one click, or access programmatically via the API.
  • Automatic rate limit recovery. When GitHub returns a 403 rate limit response, the actor waits 60 seconds and retries automatically so you never lose data mid-run.
  • Scheduled monitoring. Set up daily or weekly schedules to track new repositories appearing for any topic or technology.
  • Built-in integrations. Push results to Google Sheets, Slack, webhooks, Zapier, Make, or any downstream tool without writing glue code.
  • Works without a GitHub token. Unauthenticated access is supported out of the box. Provide a personal access token only if you need faster throughput.

Key features

  • Full GitHub search syntax -- Supports qualifiers like language:python, topic:react, stars:>1000, created:>2024-01-01, user:facebook, license:mit, archived:false, and Boolean operators.
  • Four sort modes -- Sort by stars, forks, recently updated, or best match (GitHub's relevance ranking).
  • Minimum star filter -- Exclude low-activity repositories by setting a star threshold directly in the input, no query modification needed.
  • Language filter -- Restrict results to a specific programming language via a dedicated input field.
  • Up to 1,000 results per query -- Automatically paginates through up to 10 pages of 100 results each, the maximum the GitHub Search API allows.
  • Authenticated and unauthenticated modes -- Works at 10 requests/min without a token, or 30 requests/min with a GitHub personal access token.
  • 23-field structured output -- Each repository includes owner info, topics array, license, timestamps, size, archive status, fork status, and more.

How to use GitHub Repository Search

  1. Go to the GitHub Repository Search actor page on Apify and click Try for free.
  2. Enter a Search Query such as web scraping, machine learning framework language:python, or topic:nextjs stars:>500.
  3. Optionally select a Sort By method -- Most Stars, Most Forks, Recently Updated, or Best Match.
  4. Optionally set a Min Stars value to exclude repositories below a popularity threshold.
  5. Optionally type a Language Filter (e.g., python, rust, typescript) to restrict results to one language.
  6. Adjust Max Results to control how many repositories are returned (1 to 1,000; default is 30).
  7. Optionally paste a GitHub Token (personal access token) for 3x faster rate limits.
  8. Click Start and wait for results to appear in the Dataset tab.
  9. Download results as JSON, CSV, or Excel, or access them via the Apify API.

Input parameters

Parameter | Type | Required | Default | Description
query | string | Yes | -- | Search query supporting GitHub qualifiers like language:python, topic:react, stars:>1000.
sortBy | string | No | stars | Sort order: stars, forks, updated, or best-match.
minStars | integer | No | -- | Minimum star count. Appended to the query as stars:>=N.
language | string | No | -- | Programming language filter. Appended to the query as language:X.
maxResults | integer | No | 30 | Maximum number of repositories to return (1--1,000).
githubToken | string | No | -- | GitHub personal access token for higher rate limits (30 req/min vs 10).
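As the table notes, the minStars and language fields are appended to the query as qualifiers. A minimal sketch of that composition (build_query is an illustrative helper, not the actor's internal code):

```python
def build_query(query, min_stars=None, language=None):
    """Append the minStars and language inputs to the base query
    as GitHub search qualifiers (illustrative, not the actor's source)."""
    parts = [query]
    if min_stars is not None:
        parts.append(f"stars:>={min_stars}")
    if language:
        parts.append(f"language:{language}")
    return " ".join(parts)

print(build_query("machine learning framework", min_stars=500, language="python"))
# machine learning framework stars:>=500 language:python
```

The same qualifiers can equally be written by hand in the query field; the dedicated inputs just keep the configuration cleaner.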

Input example

{
  "query": "machine learning framework",
  "sortBy": "stars",
  "minStars": 500,
  "language": "python",
  "maxResults": 50,
  "githubToken": "ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
}

Tips for best results

  • Use GitHub search qualifiers directly in your query for precise targeting. For example, topic:machine-learning language:python stars:>500 created:>2024-01-01 finds popular, recent ML repositories.
  • Provide a GitHub personal access token when fetching more than 100 results. This triples the rate limit and cuts run time significantly.
  • Combine the minStars and language input fields with your query for cleaner configuration without manually writing qualifiers.
  • Sort by updated to find actively maintained repositories rather than popular but potentially abandoned projects.
  • Start with a smaller maxResults value (e.g., 30) to verify your query returns relevant results before scaling to 1,000.
  • Use archived:false in your query to exclude repositories that are no longer actively maintained.

Output

Output example

{
  "fullName": "scrapy/scrapy",
  "name": "scrapy",
  "owner": "scrapy",
  "ownerType": "Organization",
  "ownerUrl": "https://github.com/scrapy",
  "description": "Scrapy, a fast high-level web crawling & scraping framework for Python.",
  "stars": 53200,
  "forks": 10580,
  "watchers": 53200,
  "openIssues": 478,
  "language": "Python",
  "topics": [
    "crawler",
    "crawling",
    "framework",
    "hacktoberfest",
    "python",
    "scraping",
    "web-crawling",
    "web-scraping"
  ],
  "license": "BSD-3-Clause",
  "homepage": "https://scrapy.org",
  "repoUrl": "https://github.com/scrapy/scrapy",
  "createdAt": "2010-02-22T02:01:14Z",
  "updatedAt": "2025-05-15T08:32:01Z",
  "pushedAt": "2025-05-14T19:44:33Z",
  "sizeKb": 27894,
  "isArchived": false,
  "isFork": false,
  "defaultBranch": "master",
  "extractedAt": "2025-05-15T12:00:00.000Z"
}

Output fields

Field | Type | Description
fullName | string | Full repository name in owner/repo format.
name | string | Repository name without the owner prefix.
owner | string | GitHub username or organization that owns the repository.
ownerType | string | Owner account type: User or Organization.
ownerUrl | string | URL to the owner's GitHub profile page.
description | string | Repository description as set by the owner. May be null.
stars | integer | Number of stars (stargazers) the repository has received.
forks | integer | Number of times the repository has been forked.
watchers | integer | Number of watchers subscribed to the repository.
openIssues | integer | Number of currently open issues and pull requests.
language | string | Primary programming language detected by GitHub. May be null.
topics | array | List of topic tags assigned to the repository.
license | string | SPDX license identifier (e.g., MIT, Apache-2.0). May be null.
homepage | string | Project homepage URL if set by the owner. May be null.
repoUrl | string | Direct URL to the repository on GitHub.
createdAt | string | ISO 8601 timestamp when the repository was created.
updatedAt | string | ISO 8601 timestamp of the last repository metadata update.
pushedAt | string | ISO 8601 timestamp of the most recent push to any branch.
sizeKb | integer | Repository size in kilobytes.
isArchived | boolean | Whether the repository has been archived by its owner.
isFork | boolean | Whether the repository is a fork of another repository.
defaultBranch | string | Name of the default branch (e.g., main, master).
extractedAt | string | ISO 8601 timestamp when this record was extracted by the actor.
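Once exported, these fields are easy to post-process. A quick illustration (active_permissive is a hypothetical downstream helper, not part of the actor) that keeps only actively maintained repos under a permissive license:

```python
# Hypothetical post-processing of exported dataset items: each item is a
# dict with the output fields documented above.
PERMISSIVE = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause"}

def active_permissive(repos):
    """Keep non-archived, non-fork repositories with a permissive license."""
    return [
        r for r in repos
        if not r["isArchived"] and not r["isFork"] and r["license"] in PERMISSIVE
    ]

sample = [
    {"fullName": "scrapy/scrapy", "isArchived": False, "isFork": False, "license": "BSD-3-Clause"},
    {"fullName": "old/tool", "isArchived": True, "isFork": False, "license": "MIT"},
]
print([r["fullName"] for r in active_permissive(sample)])  # ['scrapy/scrapy']
```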

Use cases

  • Open-source discovery -- Find trending repositories in a specific language or topic area, sorted by stars or recent activity, to stay on top of the ecosystem.
  • Competitive analysis -- Search for repositories related to your product category (e.g., topic:headless-cms language:javascript stars:>100) to understand the competitive landscape and identify emerging alternatives.
  • Technology research -- Identify the most popular frameworks, libraries, or tools for a given technology stack before making architectural decisions. Compare star counts, fork activity, and maintenance frequency across candidates.
  • Recruitment and talent sourcing -- Find prolific open-source contributors by searching for active repositories in niche technologies and reviewing owner profiles linked in the output.
  • Security and compliance auditing -- Search for repositories matching your organization's dependencies to track license types (MIT, Apache-2.0, GPL, etc.), maintenance status, and open issue counts.
  • Academic research -- Build datasets of repositories for software engineering research, such as analyzing adoption patterns, licensing trends, or language popularity over time. Export directly to CSV for use with R, pandas, or other analysis tools.

API & Integration

You can call GitHub Repository Search programmatically using the Apify API or any of the official client libraries.

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_API_TOKEN")

run = client.actor("SN4iHGft2Oos3a4Ud").call(run_input={
    "query": "web scraping",
    "sortBy": "stars",
    "minStars": 100,
    "language": "python",
    "maxResults": 50,
})

for repo in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{repo['fullName']} - {repo['stars']} stars")

JavaScript

import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_APIFY_API_TOKEN" });

const run = await client.actor("SN4iHGft2Oos3a4Ud").call({
  query: "web scraping",
  sortBy: "stars",
  minStars: 100,
  language: "python",
  maxResults: 50,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((repo) => {
  console.log(`${repo.fullName} - ${repo.stars} stars`);
});

cURL

curl "https://api.apify.com/v2/acts/SN4iHGft2Oos3a4Ud/runs" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_APIFY_API_TOKEN" \
  -d '{
    "query": "web scraping",
    "sortBy": "stars",
    "minStars": 100,
    "language": "python",
    "maxResults": 50
  }'

Integrations

  • Google Sheets -- Automatically push repository data to a spreadsheet for tracking and analysis.
  • Slack / Microsoft Teams -- Receive notifications when new repositories match your search criteria.
  • Webhooks -- Trigger downstream workflows whenever a run completes.
  • Zapier / Make -- Incorporate GitHub repository data into any automation pipeline.
  • Apify Schedules -- Run the actor on a recurring basis to monitor emerging repositories over time.

How it works

Input query + filters
          |
          v
+-------------------+
| Build search URL  |   Append minStars, language qualifiers
+-------------------+
          |
          v
+-------------------+
| GitHub Search API |   GET api.github.com/search/repositories
+-------------------+   100 results per page, up to 10 pages
          |
          |--- 403 rate limit? ---> wait 60 s, retry
          |
          v
+-------------------+
| Transform results |   Map 23 fields per repository
+-------------------+
          |
          v
+-------------------+
| Push to dataset   |   Apify dataset (JSON, CSV, Excel, XML)
+-------------------+
          |
          v
     Run complete

  1. The actor reads your input and constructs a GitHub Search API query URL, appending any minStars and language qualifiers to the base query string.
  2. It sends paginated GET requests to api.github.com/search/repositories with the Accept: application/vnd.github.v3+json header. Each page returns up to 100 results.
  3. Between each request, the actor enforces a delay of 6.5 seconds (unauthenticated) or 2.1 seconds (authenticated) to stay within GitHub's rate limits.
  4. If a 403 rate limit response is received, the actor waits 60 seconds and retries the same page. No data is lost during rate limit events.
  5. Pagination continues until maxResults is reached, no more results are available, or 10 pages have been fetched (the GitHub Search API hard limit).
  6. Each raw API response is transformed into a clean 23-field object with consistent camelCase naming, stripping unnecessary nested structures.
  7. Results are pushed to the Apify dataset one record at a time as they are processed.
  8. A summary log is printed showing total repos returned, aggregate star count, top languages, and license distribution.
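Steps 2-5 above can be sketched in a few lines of Python. This is a simplified model, not the actor's actual source; fetch_page stands in for the HTTP call and is injected so the retry logic is easy to follow:

```python
import time

PER_PAGE = 100   # GitHub Search API page size
MAX_PAGES = 10   # hard API cap: at most 1,000 results per query

def pagination_plan(max_results):
    """Return (page_number, items_wanted) pairs covering max_results."""
    plan, remaining, page = [], min(max_results, PER_PAGE * MAX_PAGES), 1
    while remaining > 0 and page <= MAX_PAGES:
        plan.append((page, min(PER_PAGE, remaining)))
        remaining -= PER_PAGE
        page += 1
    return plan

def fetch_with_retry(fetch_page, page, wait=60, sleep=time.sleep):
    """Retry the same page after a 403 rate-limit response.

    fetch_page(page) is assumed to return (status_code, parsed_body).
    """
    while True:
        status, body = fetch_page(page)
        if status != 403:
            return body
        sleep(wait)  # rate limited: back off, then retry the same page

print(pagination_plan(250))  # [(1, 100), (2, 100), (3, 50)]
```

Because the retried page is re-requested in full, a mid-run rate limit delays the run but never drops records, which matches the behavior described in step 4.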

Performance & cost

Scenario | Results | Pages | Approx. time (no token) | Approx. time (with token) | Compute units
Quick search | 30 | 1 | ~5 seconds | ~3 seconds | ~0.005
Medium batch | 100 | 1 | ~5 seconds | ~3 seconds | ~0.005
Large batch | 500 | 5 | ~35 seconds | ~12 seconds | ~0.01
Maximum | 1,000 | 10 | ~65 seconds | ~22 seconds | ~0.02
  • The actor runs with 256 MB of memory (Apify minimum) since it only makes REST API calls with no browser rendering.
  • The GitHub Search API itself is free for both authenticated and unauthenticated requests. There is no cost on the GitHub side.
  • Unauthenticated mode enforces a 6.5-second delay between requests (10 req/min). Authenticated mode reduces this to 2.1 seconds (30 req/min).
  • Total cost for even the largest run is typically under $0.01 in Apify compute units.
  • The actor uses the GitHub REST API v3 with the application/vnd.github.v3+json media type and identifies itself with the ApifyGitHubSearch/1.0 User-Agent header.
  • Run time is dominated by the inter-request delay, not data processing. Providing a GitHub token is the single most impactful way to speed up large runs.
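A back-of-the-envelope model of the timing table above: one inter-request delay between each pair of pages plus a fixed ~5 s of startup and request latency. The 5 s constant is an assumption chosen to line up with the table; real runs vary with network conditions:

```python
import math

def approx_run_seconds(max_results, authenticated=False):
    """Rough estimate: (pages - 1) inter-request delays + ~5 s overhead.

    The 5 s overhead term is an assumption, not a documented constant.
    """
    delay = 2.1 if authenticated else 6.5
    pages = math.ceil(min(max_results, 1000) / 100)
    return (pages - 1) * delay + 5.0

print(round(approx_run_seconds(30), 1))          # 5.0   (table: ~5 s)
print(round(approx_run_seconds(1000), 1))        # 63.5  (table: ~65 s)
print(round(approx_run_seconds(1000, True), 1))  # 23.9  (table: ~22 s)
```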

Limitations

  • GitHub API maximum of 1,000 results. The GitHub Search API caps results at 1,000 per query regardless of how many repositories match. Use narrower qualifiers to target specific subsets if you need different slices of a large result set.
  • Rate limits. Unauthenticated access allows 10 requests per minute; authenticated access allows 30. The actor handles this automatically but large runs will take longer without a token.
  • Search index lag. GitHub's search index may take a few minutes to reflect very recent repository changes such as newly created repos or updated star counts.
  • No file content search. This actor searches repository metadata only (name, description, topics, language). It does not search within file contents or README text. Use GitHub's code search for that.
  • No private repositories. Only public repositories are returned. Private repositories are not accessible through the GitHub Search API, even with a token, unless the token has explicit access to those repositories.
  • Sort order affects which 1,000 results you get. When a query matches more than 1,000 repositories, the 1,000 returned depend on the sort order. Switching between stars, forks, updated, and best-match may return different subsets of the total result pool.
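One common way to work around the 1,000-result cap is to partition a broad query into created: date windows and run one search per window. This is a general GitHub Search technique, not a built-in actor feature, and date_window_queries below is a hypothetical helper:

```python
from datetime import date, timedelta

def date_window_queries(base_query, start, end, days=30):
    """Split a query into created:START..END windows, each run separately."""
    queries, cursor = [], start
    while cursor <= end:
        stop = min(cursor + timedelta(days=days - 1), end)
        queries.append(f"{base_query} created:{cursor.isoformat()}..{stop.isoformat()}")
        cursor = stop + timedelta(days=1)
    return queries

for q in date_window_queries("topic:llm language:python", date(2024, 1, 1), date(2024, 2, 29)):
    print(q)
# topic:llm language:python created:2024-01-01..2024-01-30
# topic:llm language:python created:2024-01-31..2024-02-29
```

Each windowed query stays under the cap as long as fewer than 1,000 matching repositories were created in that window; shrink the window size for very popular topics.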

Responsible use

This actor queries the official GitHub REST API and complies with GitHub's rate limiting and Terms of Service. It identifies itself with the ApifyGitHubSearch/1.0 User-Agent string. The actor does not scrape the GitHub website, bypass authentication, or access private data.

  • All data returned is publicly available through GitHub's official API.
  • Rate limits are respected automatically with built-in delays and retry logic.
  • No personal or private user data is collected beyond what GitHub exposes in public repository metadata.

Please ensure your usage complies with GitHub's Acceptable Use Policies and API Terms of Service.


FAQ

Do I need a GitHub account or token to use this actor? No. The actor works without any authentication using GitHub's public API access at 10 requests per minute. A GitHub personal access token is optional and only needed for faster throughput (30 req/min) when fetching large result sets.

What is the maximum number of repositories I can retrieve? The GitHub Search API enforces a hard limit of 1,000 results per query (10 pages of 100 results). If your search matches more than 1,000 repositories, you will get the top 1,000 sorted by your chosen sort order. Use narrower qualifiers to target different subsets.

Can I search by topic, creation date, or license? Yes. The query field accepts the full GitHub search syntax. You can use qualifiers such as topic:react, created:>2024-01-01, license:mit, archived:false, user:facebook, org:google, and many more. See the GitHub search documentation for the complete list.

How fresh is the data? The actor queries the GitHub API in real time. Star counts, fork counts, open issues, and all other metrics reflect the latest values at the moment the actor runs.

Can I export results to CSV or Google Sheets? Yes. After a run completes, you can download the dataset in JSON, CSV, Excel, or XML format from the Apify Console. You can also connect the output to Google Sheets, Slack, or other tools using Apify's built-in integrations or webhooks.

How do I create a GitHub personal access token? Go to GitHub Settings > Developer Settings > Personal Access Tokens and generate a new token. No special scopes are required for searching public repositories -- a token with no scopes selected will work and provides the higher rate limit.

What does the best-match sort option do? When you select best-match, the actor omits the sort parameter and lets GitHub's own relevance algorithm rank results. GitHub considers factors like keyword match quality, repository activity, and popularity. This is useful when you want the most relevant results rather than simply the most starred ones.

Can I run this actor on a schedule? Yes. In the Apify Console, navigate to the actor's page and click Schedules to set up daily, weekly, or custom cron-based runs. Combine with integrations like Google Sheets or Slack to get automated reports of newly discovered repositories.


Related actors

Actor | Description
Stack Overflow & StackExchange Search | Search programming Q&A to complement GitHub research.
Hacker News Search | Find discussions about open-source projects on Hacker News.
Website Tech Stack Detector | Identify technologies used by websites built with projects you discover.
NVD CVE Vulnerability Search | Check for known security vulnerabilities in repository dependencies.
CISA KEV Catalog | Search the CISA Known Exploited Vulnerabilities catalog.
WHOIS Domain Lookup | Look up domain registration data for project homepages you discover.