GitHub Repository & Trending Scraper
Pricing
$6.99/month + usage
GitHub Repository & Trending Scraper extracts repository data from GitHub in two powerful modes: Search (by keyword, language, stars) and Trending (daily/weekly/monthly popular repos)
Rating: 0.0 (0)
Developer: Scrape Pilot
Actor stats: 0 bookmarked, 2 total users, 1 monthly active user
Last modified: 3 days ago
GitHub Repository & Trending Scraper
A powerful GitHub scraper that extracts repository data from GitHub, either by search (keywords, language, stars range) or by fetching trending repositories (daily/weekly/monthly). Get clean, structured JSON output for developer tools, analytics dashboards, research, or your own GitHub insights platform.
Focus keyword: github scraper. Built for flexibility, speed, and reliability.
Features
- Search mode: Search repositories by keyword, language, stars, forks, etc. (supports GitHub search syntax).
- Trending mode: Fetch trending repositories for any language (daily, weekly, monthly).
- Rich repository data: Extracts name, description, URL, stars, forks, watchers, language, topics, license, dates, owner info, and the README URL.
- Proxy support: Built-in Apify Proxy with residential groups to avoid rate limiting.
- Scalable: Set `max_results` from 1 to 1000 (respects GitHub pagination).
- Clean output: Consistent schema with null values where data is missing.
- Fast and efficient: Uses concurrency and request queuing to maximise throughput.
- Easy integration: Run on the Apify platform or use as a Node.js module.
How It Works
- Choose mode: `search` (custom query) or `trending` (pre-defined trending lists).
- Provide parameters: For search: query, language, sort, order. For trending: language, period.
- Scraping process: The actor sends requests to GitHub, parses repository data, and follows pagination.
- Proxy rotation: For large jobs, residential proxies help avoid IP-based blocks.
- Output: Returns a clean JSON array of repository objects.
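The mode and parameter steps above can be sketched as a small URL builder. This is an illustration only: the actor's real request logic is not documented here, and `buildRequestUrl` and `pagesNeeded` are hypothetical names. It assumes search mode targets GitHub's public repository search endpoint with 100 results per page, and that trending mode targets the `https://github.com/trending` page with a `since` parameter.

```javascript
// Hypothetical helpers; the actor's internal request code may differ.
const SEARCH_ENDPOINT = 'https://api.github.com/search/repositories';
const TRENDING_PAGE = 'https://github.com/trending';

function buildRequestUrl(input, page = 1) {
  if (input.mode === 'trending') {
    // Trending pages take the language as a path segment and the period as ?since=
    const path = input.language ? '/' + encodeURIComponent(input.language) : '';
    const url = new URL(TRENDING_PAGE + path);
    url.searchParams.set('since', input.period || 'daily');
    return url.toString();
  }
  // Search mode: fold the language filter into the query as a qualifier.
  const q = [input.query, input.language ? `language:${input.language}` : '']
    .filter(Boolean)
    .join(' ');
  const url = new URL(SEARCH_ENDPOINT);
  url.searchParams.set('q', q);
  url.searchParams.set('sort', input.sort || 'stars');
  url.searchParams.set('order', input.order || 'desc');
  url.searchParams.set('per_page', '100'); // assumed page size (GitHub's maximum)
  url.searchParams.set('page', String(page));
  return url.toString();
}

// Number of pages needed to collect maxResults items at perPage items per page.
function pagesNeeded(maxResults, perPage = 100) {
  return Math.ceil(maxResults / perPage);
}

console.log(buildRequestUrl({ mode: 'trending', language: 'javascript', period: 'weekly' }));
// prints "https://github.com/trending/javascript?since=weekly"
```

Following pagination then amounts to calling `buildRequestUrl(input, page)` for pages 1 through `pagesNeeded(max_results)`.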
Input Schema
The actor accepts the following input fields. Depending on mode, different fields are required.
| Field | Type | Default | Description |
|---|---|---|---|
| `mode` | String | `"search"` | Either `"search"` or `"trending"`. |
| `query` | String | `""` | (Search mode) GitHub search query (e.g., `"machine learning"`). Supports advanced syntax. |
| `language` | String | `""` | (Search or trending) Filter by programming language (e.g., `"python"`, `"javascript"`). |
| `sort` | String | `"stars"` | (Search mode) Sort by `"stars"`, `"forks"`, or `"updated"`. |
| `order` | String | `"desc"` | (Search mode) `"desc"` or `"asc"`. |
| `period` | String | `"daily"` | (Trending mode) `"daily"`, `"weekly"`, or `"monthly"`. |
| `max_results` | Integer | `25` | Maximum number of repositories to return (1 to 1000). |
| `proxyConfiguration` | Object | `{ "useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"] }` | Proxy settings. See Proxy Configuration. |
| `include_readme` | Boolean | `false` | Reserved for internal use. The raw README URL is always included in the output, regardless of this flag. |
Example Input (Search Mode)

    {
      "mode": "search",
      "query": "machine learning",
      "language": "python",
      "sort": "stars",
      "order": "desc",
      "max_results": 20,
      "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": ["RESIDENTIAL"]
      }
    }

Example Input (Trending Mode)

    {
      "mode": "trending",
      "language": "javascript",
      "period": "weekly",
      "max_results": 30
    }
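Before starting a run, it can help to apply the documented defaults and check the allowed values on the client side. The sketch below is one way to do that, not the actor's own validation: `normalizeInput`, `DEFAULTS`, and `ALLOWED` are hypothetical names, and the rule that `query` must be non-empty in search mode is an assumption.

```javascript
// Defaults and allowed values copied from the input schema table.
const DEFAULTS = {
  mode: 'search',
  query: '',
  language: '',
  sort: 'stars',
  order: 'desc',
  period: 'daily',
  max_results: 25,
};

const ALLOWED = {
  mode: ['search', 'trending'],
  sort: ['stars', 'forks', 'updated'],
  order: ['desc', 'asc'],
  period: ['daily', 'weekly', 'monthly'],
};

// Hypothetical helper: merges defaults, then rejects out-of-range values.
function normalizeInput(raw = {}) {
  const input = { ...DEFAULTS, ...raw };
  for (const [field, allowed] of Object.entries(ALLOWED)) {
    if (!allowed.includes(input[field])) {
      throw new Error(`Invalid ${field}: ${JSON.stringify(input[field])}`);
    }
  }
  if (!Number.isInteger(input.max_results) || input.max_results < 1 || input.max_results > 1000) {
    throw new RangeError('max_results must be an integer from 1 to 1000');
  }
  // Assumption: search mode needs a non-empty query.
  if (input.mode === 'search' && String(input.query).trim() === '') {
    throw new Error('query is required in search mode');
  }
  return input;
}

const trendingInput = normalizeInput({ mode: 'trending', language: 'javascript', period: 'weekly', max_results: 30 });
console.log(trendingInput.sort); // prints "stars" (default filled in)
```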
Output Format
The actor returns an array of repository objects. Each object contains the following fields:
| Field | Type | Description |
|---|---|---|
| `type` | String | Always `"repository"`. |
| `name` | String | Full repository name (e.g., `"owner/repo"`). |
| `description` | String or null | Repository description. |
| `url` | String | GitHub URL of the repository. |
| `stars` | Integer | Number of stars. |
| `forks` | Integer | Number of forks. |
| `watchers` | Integer | Number of watchers (usually same as stars). |
| `language` | String or null | Primary programming language. |
| `topics` | Array of strings | Repository topics/tags. |
| `license` | String or null | License name. |
| `created_at` | String | Creation date (YYYY-MM-DD). |
| `updated_at` | String | Last update date. |
| `pushed_at` | String | Last push date. |
| `open_issues` | Integer | Number of open issues. |
| `size_kb` | Integer | Repository size in KB. |
| `default_branch` | String | Default branch name. |
| `is_fork` | Boolean | Whether the repo is a fork. |
| `homepage` | String or null | Project homepage URL. |
| `owner` | String | Owner login. |
| `owner_type` | String | `"User"` or `"Organization"`. |
| `readme_url` | String or null | Raw URL of the README file (if exists). |
Example Output (Truncated)

    [
      {
        "type": "repository",
        "name": "codecrafters-io/build-your-own-x",
        "description": "Master programming by recreating your favorite technologies from scratch.",
        "url": "https://github.com/codecrafters-io/build-your-own-x",
        "stars": 475006,
        "forks": 44543,
        "watchers": 475006,
        "language": "Markdown",
        "topics": ["awesome-list", "free", "programming", "tutorial-code"],
        "license": null,
        "created_at": "2018-05-09",
        "updated_at": "2026-03-14",
        "pushed_at": "2026-02-21",
        "open_issues": 442,
        "size_kb": 1201,
        "default_branch": "master",
        "is_fork": false,
        "homepage": "https://codecrafters.io",
        "owner": "codecrafters-io",
        "owner_type": "Organization",
        "readme_url": "https://raw.githubusercontent.com/codecrafters-io/build-your-own-x/master/README.md"
      },
      {
        "type": "repository",
        "name": "sindresorhus/awesome",
        "description": "😎 Awesome lists about all kinds of interesting topics",
        "url": "https://github.com/sindresorhus/awesome",
        "stars": 445369,
        "forks": 33527,
        "watchers": 445369,
        "language": null,
        "topics": ["awesome", "awesome-list", "lists", "resources"],
        "license": "Creative Commons Zero v1.0 Universal",
        "created_at": "2014-07-11",
        "updated_at": "2026-03-14",
        "pushed_at": "2026-03-09",
        "open_issues": 80,
        "size_kb": 1534,
        "default_branch": "main",
        "is_fork": false,
        "homepage": "",
        "owner": "sindresorhus",
        "owner_type": "User",
        "readme_url": "https://raw.githubusercontent.com/sindresorhus/awesome/main/README.md"
      }
      // ... up to max_results
    ]
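Because every object follows the same schema, downstream processing can stay simple. As a rough illustration (the helper name `groupByLanguage` and the `'unknown'` bucket are assumptions, not part of the actor), the following groups repository names by primary language and handles the nullable `language` field:

```javascript
// Abbreviated copies of the two example objects above.
const sample = [
  { name: 'codecrafters-io/build-your-own-x', stars: 475006, language: 'Markdown' },
  { name: 'sindresorhus/awesome', stars: 445369, language: null },
];

// Hypothetical helper: group repository names by language, keeping only
// repositories with at least minStars stars.
function groupByLanguage(repos, minStars = 0) {
  const byLanguage = {};
  for (const repo of repos) {
    if (repo.stars < minStars) continue;
    const key = repo.language === null ? 'unknown' : repo.language; // language may be null
    if (!byLanguage[key]) byLanguage[key] = [];
    byLanguage[key].push(repo.name);
  }
  return byLanguage;
}

console.log(groupByLanguage(sample, 450000));
// prints { Markdown: [ 'codecrafters-io/build-your-own-x' ] }
```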
Usage
Run on Apify Console
- Go to Apify Console and open the Actor page for GitHub Repository & Trending Scraper.
- Click "Run".
- Select mode and fill in the required fields.
- Click "Start" and wait for results.
Run via Apify API (cURL)

    curl -X POST "https://api.apify.com/v2/acts/your-username~github-trending-scraper/runs?token=<YOUR_API_TOKEN>" \
      -H "Content-Type: application/json" \
      -d '{
        "mode": "search",
        "query": "machine learning",
        "language": "python",
        "max_results": 10
      }'
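The same request can be issued from Node.js. The sketch below is a rough equivalent of the cURL call, not an official client: `buildRunRequest` and `startRun` are hypothetical names, the actor ID is the placeholder used above, and it assumes Node.js 18 or later for the global `fetch`. It passes the token in an `Authorization: Bearer` header instead of the query string.

```javascript
// Hypothetical helper: builds the URL and fetch options for starting a run.
function buildRunRequest(actorId, token, input) {
  return {
    url: `https://api.apify.com/v2/acts/${actorId}/runs`,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify(input),
    },
  };
}

// Sends the request and returns the parsed JSON response.
async function startRun(actorId, token, input) {
  const { url, options } = buildRunRequest(actorId, token, input);
  const response = await fetch(url, options); // global fetch: Node.js 18+
  if (!response.ok) {
    throw new Error(`Apify API responded with status ${response.status}`);
  }
  return response.json();
}

// Usage (not executed here because it needs a real token):
// startRun('your-username~github-trending-scraper', process.env.APIFY_TOKEN, {
//   mode: 'search', query: 'machine learning', language: 'python', max_results: 10,
// }).then(console.log);
```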
Use as a Node.js Module
Install the package:

    npm install github-trending-scraper

Then use it in your code:

    const { scrapeGitHub } = require('github-trending-scraper');

    (async () => {
      // Fetch up to 20 JavaScript repositories trending this week
      const repos = await scrapeGitHub({
        mode: 'trending',
        language: 'javascript',
        period: 'weekly',
        max_results: 20
      });
      console.log(repos);
    })();
Proxy Configuration
To avoid IP-based blocking, especially for large scraping jobs, you can configure proxies. The actor integrates with Apify Proxy.
| Property | Type | Description |
|---|---|---|
| `useApifyProxy` | Boolean | If `true`, enables Apify Proxy. Default: `true`. |
| `apifyProxyGroups` | Array | Proxy groups: `["RESIDENTIAL"]`, `["DATACENTER"]`, or `["SHADER"]`. Residential is recommended for GitHub. |
| `proxyUrls` | Array | Custom proxy URLs (e.g., `["http://user:pass@proxy.example.com:8080"]`). Ignored if `useApifyProxy` is `true`. |
Example with custom proxies:

    {
      "proxyConfiguration": {
        "useApifyProxy": false,
        "proxyUrls": ["http://user:pass@123.45.67.89:8080"]
      }
    }
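The precedence rule in the table (custom `proxyUrls` are ignored when `useApifyProxy` is `true`) can be made explicit with a small resolver. This is only a sketch of that rule under the documented defaults; `resolveProxy` and the shape of its return value are hypothetical and do not reflect the actor's internals.

```javascript
// Hypothetical helper: decides which proxy settings take effect.
function resolveProxy(config = {}) {
  // Documented default: Apify Proxy is on unless explicitly disabled.
  const useApifyProxy = config.useApifyProxy === undefined ? true : config.useApifyProxy;
  if (useApifyProxy) {
    // proxyUrls are ignored in this branch, as stated in the table.
    return { kind: 'apify', groups: config.apifyProxyGroups || ['RESIDENTIAL'] };
  }
  const urls = config.proxyUrls || [];
  for (const proxyUrl of urls) {
    new URL(proxyUrl); // throws a TypeError if the proxy URL is malformed
  }
  return urls.length > 0 ? { kind: 'custom', urls } : { kind: 'none' };
}

console.log(resolveProxy({ useApifyProxy: false, proxyUrls: ['http://user:pass@123.45.67.89:8080'] }).kind);
// prints "custom"
```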
FAQ / Troubleshooting
Q: What's the difference between search and trending mode?
A: Search mode uses GitHub's search API to find repositories matching your query. Trending mode scrapes the GitHub trending page (like https://github.com/trending), which shows popular repositories for a given language and time period.
Q: Can I get more than 1000 results?
A: GitHub's search API limits results to 1000. If you need more, you can run multiple queries with different filters. Trending mode is limited by what GitHub shows on the trending pages (usually around 25 per page, up to a few hundred).
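One way to run "multiple queries with different filters" is to split a query into non-overlapping star ranges with GitHub's `stars:` search qualifier, run the actor once per range, and merge the results. The sketch below assumes that approach; `splitByStars` and `mergeUnique` are hypothetical helpers, and the boundaries are arbitrary. Repositories below the lowest boundary are not covered.

```javascript
// Hypothetical helper: one query per star range, e.g. stars:100..999.
function splitByStars(baseQuery, boundaries) {
  const sorted = [...boundaries].sort((a, b) => a - b);
  const queries = [];
  for (let i = 0; i < sorted.length; i++) {
    const low = sorted[i];
    const next = sorted[i + 1];
    // The last range is open-ended; earlier ranges stop just below the next boundary.
    const range = next === undefined ? `stars:>=${low}` : `stars:${low}..${next - 1}`;
    queries.push(`${baseQuery} ${range}`);
  }
  return queries;
}

// Hypothetical helper: merge result batches, keeping one entry per repository name.
function mergeUnique(batches) {
  const seen = new Map();
  for (const repo of batches.flat()) {
    seen.set(repo.name, repo);
  }
  return [...seen.values()];
}

console.log(splitByStars('machine learning', [100, 1000, 10000]));
// prints [ 'machine learning stars:100..999',
//          'machine learning stars:1000..9999',
//          'machine learning stars:>=10000' ]
```

Each generated string would be passed as the `query` input of a separate search-mode run.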
Q: Why are some fields null?
A: Some repositories may not have a description, license, topics, etc. The actor sets those fields to null.
Q: I'm getting blocked / rate limited.
A: Enable residential proxies (`apifyProxyGroups: ["RESIDENTIAL"]`). You can also reduce `max_results`. Configurable delays between requests are planned but not yet available.
Q: Can I scrape issues or pull requests?
A: Not in this version. This actor focuses on repository metadata only.