GitHub Trending Scraper
Pricing
Pay per event
Scrape GitHub Trending repositories by language and time range: today, this week, or this month. Extracts repo names, star counts, forks, star gains, and top contributors. Great for dev trend tracking, tech newsletters, and investment research. Export to JSON, CSV, or Excel.
Developer
Stas Persiianenko
Scrape trending repositories from GitHub Trending. Get stars, forks, language, description, topics, license, last-updated date, star growth metrics, and full README content for the hottest open-source projects — with built-in AI/ML filtering.
What does GitHub Trending Scraper do?
GitHub Trending Scraper extracts data from GitHub's trending repositories page and enriches each result with GitHub API data. Beyond the basic trending page data (stars, forks, language, description), it fetches topics, license, last-updated date, and optionally the full README markdown for each repository.
Key capabilities:
- 🔥 Scrape trending repos by language, time range (daily/weekly/monthly), and spoken language
- 🤖 Filter for AI/ML repos only: built-in classifier covering LLMs, deep learning, NLP, computer vision, and more
- 🏷️ Filter by GitHub topics: e.g., only repos tagged llm, transformers, pytorch
- 📄 Extract full README content: ideal for training datasets and documentation corpora
- 📈 Star growth metrics: percentage growth (stars gained vs. prior baseline)
- 🔐 License detection: MIT, Apache-2.0, GPL, and others
Who is it for?
- 🤖 AI/ML researchers: building curated datasets of trending ML repos with README content for training data, benchmarks, or competitive analysis
- 📊 Technology analysts: tracking programming language and framework popularity trends with rich metadata
- 🧑‍💻 Software developers: discovering trending repositories and emerging open-source tools in their stack
- 📝 Developer advocates and newsletter writers: curating weekly trending repos with descriptions and topics
- 🏢 Engineering managers and CTOs: scouting promising open-source projects for team adoption
- 💹 Venture capitalists and investors: identifying hot open-source projects gaining traction
Why scrape GitHub Trending?
GitHub Trending is the go-to source for discovering popular and rising open-source projects, updated continuously to reflect what the developer community is actively starring.
Key reasons to scrape it:
- AI/ML landscape monitoring — instantly identify trending LLMs, fine-tuning tools, agentic frameworks, and AI infrastructure
- Training data collection — pair README content (docs, code examples, problem statements) with structured metadata for language model training
- Developer tools discovery — find new libraries and frameworks gaining traction before mainstream adoption
- Competitive intelligence — track trending projects in your tech stack over time
- Investment research — spot emerging technologies with accelerating star growth
- Newsletter content curation — automate weekly trending repo digests with rich metadata
Data extracted
| Field | Type | Description |
|---|---|---|
| rank | number | Position on the trending page |
| owner | string | Repository owner/organization |
| name | string | Repository name |
| fullName | string | Full name (owner/name) |
| url | string | Repository URL |
| description | string | Repository description |
| language | string | Primary programming language |
| stars | number | Total star count |
| forks | number | Total fork count |
| starsToday | number | Stars gained in the selected period |
| starsGrowthPercent | number | Percentage growth (starsToday / prior stars × 100) |
| topics | array | GitHub topic tags (e.g., ["llm", "transformers", "pytorch"]) |
| license | string | SPDX license identifier (e.g., MIT, Apache-2.0) |
| lastUpdated | string | ISO timestamp of last push |
| readmeContent | string | Full README markdown (when includeReadme: true) |
| builtBy | array | Top contributors (username + avatar URL) |
| scrapedAt | string | ISO timestamp of extraction |
How to scrape GitHub Trending
- Go to GitHub Trending Scraper on Apify Store
- Optionally select a programming language filter (e.g., python)
- Choose a time range (today, this week, or this month)
- Enable AI/ML repos only to filter for machine learning and AI repositories
- Enable Include README to extract full documentation for training data use cases
- Click Start and wait for results
- Download data as JSON, CSV, or Excel
Input parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| language | string | "" | Programming language filter (e.g., "python", "javascript") |
| since | string | "daily" | Time range: daily, weekly, or monthly |
| spokenLanguageCode | string | "" | Spoken language code (e.g., "en", "zh") |
| aiOnly | boolean | false | Filter to AI/ML repos only |
| filterTopics | array | [] | Only return repos with at least one of these topic tags |
| includeReadme | boolean | false | Fetch full README markdown for each repo |
| maxRepos | integer | 0 | Max repos to return (0 = all, typically 25) |
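As a rough illustration of how the parameters above fit together, the sketch below builds an input object in Python and rejects values the table does not allow. The helper name `build_input` and its snake_case arguments are hypothetical and not part of the actor; only the resulting keys and defaults come from the table.

```python
from typing import Any, Dict, List, Optional

# Allowed values for "since", taken from the parameter table.
VALID_SINCE = {"daily", "weekly", "monthly"}


def build_input(
    language: str = "",
    since: str = "daily",
    spoken_language_code: str = "",
    ai_only: bool = False,
    filter_topics: Optional[List[str]] = None,
    include_readme: bool = False,
    max_repos: int = 0,
) -> Dict[str, Any]:
    """Hypothetical helper: validate values and return an actor input dict."""
    if since not in VALID_SINCE:
        raise ValueError(f"since must be one of {sorted(VALID_SINCE)}, got {since!r}")
    if max_repos < 0:
        raise ValueError("maxRepos must be 0 (all) or a positive integer")
    return {
        "language": language,
        "since": since,
        "spokenLanguageCode": spoken_language_code,
        "aiOnly": ai_only,
        "filterTopics": list(filter_topics or []),
        "includeReadme": include_readme,
        "maxRepos": max_repos,
    }


if __name__ == "__main__":
    print(build_input(language="python", since="weekly", ai_only=True, max_repos=10))
```

Validating locally is a design convenience only; the actor may apply its own validation on top.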
Input example — AI/ML repos with READMEs
    {
        "language": "python",
        "since": "weekly",
        "aiOnly": true,
        "includeReadme": true,
        "maxRepos": 10
    }
Input example — LLM repos this month
    {
        "since": "monthly",
        "filterTopics": ["llm", "large-language-model", "transformers"],
        "includeReadme": false
    }
Input example — Standard trending (all repos)
    {
        "language": "python",
        "since": "daily"
    }
Output example
    {
        "rank": 1,
        "owner": "Blaizzy",
        "name": "mlx-vlm",
        "fullName": "Blaizzy/mlx-vlm",
        "url": "https://github.com/Blaizzy/mlx-vlm",
        "description": "MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.",
        "language": "Python",
        "stars": 3747,
        "forks": 410,
        "starsToday": 343,
        "starsGrowthPercent": 10.1,
        "topics": ["apple-silicon", "llm", "local-ai", "mlx", "vision-language-model", "vision-transformer"],
        "license": "MIT",
        "lastUpdated": "2026-04-04T15:18:28Z",
        "readmeContent": "# MLX-VLM\n\nMLX-VLM is a package for inference and fine-tuning...",
        "builtBy": [{ "username": "Blaizzy", "avatar": "https://avatars.githubusercontent.com/u/23445657" }],
        "scrapedAt": "2026-04-05T10:14:50.255Z"
    }
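Records like the one above contain nested fields (topics, builtBy) that do not map directly onto spreadsheet columns. The following is a minimal sketch, for custom pipelines only, of flattening such records before writing CSV with the Python standard library. The function names and the choice of ";" as a separator are assumptions made for illustration; the platform's built-in CSV export may lay out columns differently.

```python
import csv
import io
from typing import Dict, List


def flatten_repo(repo: Dict) -> Dict:
    """Copy a record and join nested list fields into ';'-separated strings."""
    row = dict(repo)
    row["topics"] = ";".join(repo.get("topics") or [])
    # builtBy is a list of {"username", "avatar"} objects; keep usernames only.
    row["builtBy"] = ";".join(c["username"] for c in repo.get("builtBy") or [])
    return row


def to_csv(repos: List[Dict]) -> str:
    """Return CSV text whose columns follow the first record's key order."""
    rows = [flatten_repo(r) for r in repos]
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    sample = {
        "rank": 1,
        "fullName": "Blaizzy/mlx-vlm",
        "topics": ["llm", "mlx"],
        "builtBy": [{"username": "Blaizzy", "avatar": "https://avatars.githubusercontent.com/u/23445657"}],
    }
    print(to_csv([sample]))
```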
AI/ML filtering explained
When aiOnly: true is set, the scraper classifies each repo using a two-stage check:
- GitHub topic tags: matches against 70+ AI/ML topics including machine-learning, deep-learning, llm, large-language-model, transformers, nlp, computer-vision, reinforcement-learning, generative-ai, pytorch, tensorflow, huggingface, langchain, rag, ai-agents, fine-tuning, embeddings, quantization, lora, and more.
- Description keywords: if topics don't match, scans the description for phrases like "machine learning", "large language model", "neural network", "computer vision", "fine-tuning", "vector store", etc.
For precise control, use filterTopics instead — it only matches repos whose topics include at least one of the specified tags.
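The two-stage check and the filterTopics rule described above can be sketched roughly as follows. This is not the actor's actual implementation: the topic and keyword sets are only the subsets named in this section (the real lists are longer), the function names are hypothetical, and treating an empty filterTopics list as "no filtering" is an assumption.

```python
from typing import Iterable, Optional

# Subset of the AI/ML topics named in this section (the real list has 70+).
AI_TOPICS = {
    "machine-learning", "deep-learning", "llm", "large-language-model",
    "transformers", "nlp", "computer-vision", "reinforcement-learning",
    "generative-ai", "pytorch", "tensorflow", "huggingface", "langchain",
    "rag", "ai-agents", "fine-tuning", "embeddings", "quantization", "lora",
}

# Description phrases named in this section.
AI_KEYWORDS = (
    "machine learning", "large language model", "neural network",
    "computer vision", "fine-tuning", "vector store",
)


def is_ai_repo(topics: Iterable[str], description: Optional[str]) -> bool:
    """Stage 1: topic match. Stage 2: keyword scan of the description."""
    if any(t.lower() in AI_TOPICS for t in topics):
        return True
    text = (description or "").lower()
    return any(keyword in text for keyword in AI_KEYWORDS)


def matches_filter_topics(topics: Iterable[str], wanted: Iterable[str]) -> bool:
    """True if the repo has at least one wanted tag (ASSUMPTION: empty = no filter)."""
    wanted_set = {w.lower() for w in wanted}
    if not wanted_set:
        return True
    return any(t.lower() in wanted_set for t in topics)


if __name__ == "__main__":
    print(is_ai_repo(["mlx", "llm"], ""))
    print(is_ai_repo([], "A neural network library"))
    print(matches_filter_topics(["rag"], ["rag", "langchain"]))
```

Checking topics before the description keeps the cheaper, more precise signal first; the keyword scan only acts as a fallback and is more likely to produce false positives.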
How much does it cost to scrape GitHub Trending?
GitHub Trending Scraper uses pay-per-event pricing with volume discounts:
| Event | Free tier | Standard | Power users |
|---|---|---|---|
| Run started | $0.001 | $0.001 | $0.001 |
| Repo extracted | $0.00115/repo | $0.001/repo | from $0.00028/repo |
Prices scale with your Apify subscription tier. The run-start fee is a one-time charge per run.
Cost examples
| Scenario | Repos | Approx. cost |
|---|---|---|
| Daily trending (all languages) | ~25 | ~$0.030 |
| Weekly Python AI/ML only | ~5–10 | ~$0.007–$0.013 |
| Monthly LLM repos | ~5–15 | ~$0.007–$0.018 |
Platform compute costs are negligible — typically under $0.001 per run. The free Apify plan includes enough compute to run daily trending scrapes at no cost.
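The figures in the cost table appear to follow from the run-start fee plus the free-tier per-repo price: 0.001 + 25 × 0.00115 = 0.02975, which rounds to the ~$0.030 shown for 25 repos. A minimal estimator under that reading is sketched below; the function and constant names are hypothetical, and the "from $0.00028" power-user rate is left out because the exact tier price is not stated.

```python
# Prices taken from the pricing table above (USD).
RUN_START_FEE = 0.001
PER_REPO_FREE = 0.00115
PER_REPO_STANDARD = 0.001


def estimate_cost(repos: int, per_repo: float = PER_REPO_FREE) -> float:
    """Estimated event cost of one run: run-start fee plus per-repo charges."""
    if repos < 0:
        raise ValueError("repos must be non-negative")
    return RUN_START_FEE + repos * per_repo


if __name__ == "__main__":
    for label, count in [("daily, all languages", 25), ("weekly AI/ML", 10)]:
        print(f"{label}: ${estimate_cost(count):.5f} (free tier)")
    print(f"25 repos, standard tier: ${estimate_cost(25, PER_REPO_STANDARD):.5f}")
```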
Tips
- 🔄 Run on a schedule to build a historical dataset of trending repos over time
- 🤖 Use aiOnly: true + since: "weekly" for a reliable weekly AI/ML digest
- 📈 Sort by starsGrowthPercent to identify repos with explosive recent momentum
- 🏷️ Combine filterTopics for targeted research (e.g., only rag + langchain)
- 📚 Enable includeReadme to collect paired (metadata + documentation) training data
- 🌐 Use spokenLanguageCode: "en" to exclude non-English repos from results
- 🔑 Set the GITHUB_TOKEN environment variable to increase GitHub API rate limits (60 → 5000 req/hr)
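The tip about sorting by starsGrowthPercent could look something like the sketch below, applied to dataset items after a run. Because the FAQ notes that the field can be null when there is insufficient data, the hypothetical helper `top_by_growth` skips such records instead of trying to compare them.

```python
from typing import Dict, List


def top_by_growth(repos: List[Dict], limit: int = 5) -> List[Dict]:
    """Return up to `limit` repos with the highest starsGrowthPercent."""
    # Skip records where growth is missing or null.
    ranked = [r for r in repos if r.get("starsGrowthPercent") is not None]
    ranked.sort(key=lambda r: r["starsGrowthPercent"], reverse=True)
    return ranked[:limit]


if __name__ == "__main__":
    items = [
        {"fullName": "a/one", "starsGrowthPercent": 2.5},
        {"fullName": "b/two", "starsGrowthPercent": None},
        {"fullName": "c/three", "starsGrowthPercent": 10.1},
    ]
    for repo in top_by_growth(items):
        print(repo["fullName"], repo["starsGrowthPercent"])
```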
Integrations
GitHub Trending Scraper works with all Apify integrations:
- Scheduled runs — Track trending repos daily, weekly, or monthly. Schedule AI/ML monitoring for your research workflow.
- Webhooks — Get notified when a scrape finishes and pipe results to a data pipeline
- Google Sheets — Export trending repos directly to a spreadsheet for team visibility
- Slack — Send a daily AI/ML trending digest to your team channel
- Make / Zapier — Trigger downstream workflows when new trending AI repos appear
- API — Trigger runs programmatically and stream results into vector databases or training pipelines
Connect to Zapier, Make, or Google Sheets for automated workflows.
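For digest-style integrations such as the Slack or newsletter use cases above, the dataset items usually need to be turned into plain text first. The sketch below shows one possible formatting step; the helper name `format_digest` and the layout are illustrative choices, and actually delivering the text (Slack, email, and so on) is left to whichever integration is used.

```python
from typing import Dict, List


def format_digest(repos: List[Dict], title: str = "Trending AI/ML repos") -> str:
    """Build a plain-text digest: one summary line and one URL line per repo."""
    lines = [title]
    for repo in repos:
        description = repo.get("description") or "No description"
        lines.append(
            f"{repo['rank']}. {repo['fullName']} "
            f"({repo['stars']} stars, +{repo['starsToday']}): {description}"
        )
        lines.append(f"   {repo['url']}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [{
        "rank": 1,
        "fullName": "Blaizzy/mlx-vlm",
        "stars": 3747,
        "starsToday": 343,
        "description": None,
        "url": "https://github.com/Blaizzy/mlx-vlm",
    }]
    print(format_digest(sample))
```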
Using GitHub Trending Scraper with the Apify API
Node.js
    import { ApifyClient } from 'apify-client';

    const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

    const run = await client.actor('automation-lab/github-trending-scraper').call({
        language: 'python',
        since: 'weekly',
        aiOnly: true,
        includeReadme: true,
    });

    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    console.log(`Found ${items.length} AI/ML trending repos`);
    items.forEach(repo => {
        console.log(`#${repo.rank} ${repo.fullName} ⭐ ${repo.stars} (+${repo.starsToday} / ${repo.starsGrowthPercent}%)`);
        console.log(` Topics: ${repo.topics.join(', ')}`);
        console.log(` README: ${repo.readmeContent.length} chars`);
    });
Python
    from apify_client import ApifyClient

    client = ApifyClient('YOUR_API_TOKEN')

    run = client.actor('automation-lab/github-trending-scraper').call(run_input={
        'language': 'python',
        'since': 'weekly',
        'aiOnly': True,
        'includeReadme': True,
    })

    dataset = client.dataset(run['defaultDatasetId']).list_items().items
    print(f'Found {len(dataset)} AI/ML trending repos')
    for repo in dataset:
        print(f"#{repo['rank']} {repo['fullName']} ⭐ {repo['stars']} (+{repo['starsToday']})")
        print(f" Topics: {', '.join(repo['topics'])}")
        print(f" License: {repo['license']}")
        print(f" README: {len(repo['readmeContent'])} chars")
cURL
    curl -X POST "https://api.apify.com/v2/acts/automation-lab~github-trending-scraper/runs?token=YOUR_API_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"language": "python", "since": "weekly", "aiOnly": true, "includeReadme": false}'
Use with Claude AI (MCP)
This actor is available as a tool in Claude AI through the Model Context Protocol (MCP). Add it to Claude Desktop, Cursor, Windsurf, or any MCP-compatible client.
Setup for Claude Code
    claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/github-trending-scraper"
Setup for Claude Desktop, Cursor, or VS Code
Add this to your MCP config file:
    {
        "mcpServers": {
            "apify": {
                "url": "https://mcp.apify.com?tools=automation-lab/github-trending-scraper"
            }
        }
    }
Example prompts
- "What are the most starred AI/ML repositories trending on GitHub this week?"
- "Show me today's trending Python repositories tagged with LLM or transformers"
- "Find trending machine learning repos from the past month and extract their README docs"
- "Which GitHub repositories have the highest star growth percentage today?"
- "Get all trending repos with topics matching 'rag' or 'langchain' this week"
Learn more in the Apify MCP documentation.
Legality
Scraping publicly available data is generally considered legal in the US: in hiQ Labs v. LinkedIn, the US Court of Appeals for the Ninth Circuit held that scraping publicly accessible data likely does not violate the Computer Fraud and Abuse Act. This actor only accesses publicly available information from GitHub's trending page and public repository data via the GitHub API. It does not require authentication (though a GitHub token improves rate limits). Always review and comply with GitHub's Terms of Service before scraping. For personal data, ensure compliance with GDPR, CCPA, and other applicable privacy regulations.
FAQ
How many repos does it return?
GitHub Trending typically shows 25 repositories per page. The exact number varies by time of day, language filter, and availability. With aiOnly: true or filterTopics, you may get fewer results since filtering happens after enrichment.
How does the AI/ML filter work?
The aiOnly filter checks GitHub topic tags first (70+ AI/ML topics like llm, deep-learning, pytorch) and then scans the repository description for AI/ML keywords. For precise control, use filterTopics with specific topic names.
Does README extraction use the GitHub API rate limit?
Yes. With includeReadme: true, the scraper makes one additional GitHub API request per repository. Without a GITHUB_TOKEN, GitHub's unauthenticated rate limit of 60 requests/hour applies. Set the GITHUB_TOKEN environment variable to raise the limit to 5000 requests/hour (enough for many daily runs).
What is starsGrowthPercent?
It represents the stars gained in the selected period as a percentage of the star count before that period (total stars minus stars gained). For example, a repo that had 1000 stars before today and gained 100 has a 10% growth rate, and its total is now 1100. null is returned when there's insufficient data.
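The calculation can be sketched as below, taking "prior stars" to mean total stars minus stars gained in the period. That reading matches the output example earlier on this page (3747 total, 343 gained, 10.1%). Rounding to one decimal and returning None when the prior count is zero or negative are assumptions; the exact condition the actor treats as "insufficient data" is not documented.

```python
from typing import Optional


def stars_growth_percent(stars: int, stars_gained: int) -> Optional[float]:
    """Growth relative to the star count before the selected period."""
    prior = stars - stars_gained
    if prior <= 0:
        # ASSUMPTION: no meaningful baseline, so report no value.
        return None
    return round(stars_gained / prior * 100, 1)


if __name__ == "__main__":
    # Values from the output example: 343 / (3747 - 343) * 100
    print(stars_growth_percent(3747, 343))
```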
Can I get trending developers instead of repos?
Currently this scraper focuses on repositories. GitHub also has a trending developers page that could be supported in a future version.
How often does GitHub update trending?
GitHub Trending is updated continuously throughout the day. Running the scraper at different times may yield different results.
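Since results change between runs, one way to make use of repeated scrapes is to compare two snapshots and see which repositories entered or left the list. The sketch below is a hypothetical helper that assumes each snapshot is a list of dataset items keyed by fullName.

```python
from typing import Dict, List


def diff_snapshots(previous: List[Dict], current: List[Dict]) -> Dict[str, List[str]]:
    """Compare two scrapes by fullName and report entries and exits."""
    prev_names = {r["fullName"] for r in previous}
    curr_names = {r["fullName"] for r in current}
    return {
        "entered": sorted(curr_names - prev_names),
        "dropped": sorted(prev_names - curr_names),
    }


if __name__ == "__main__":
    morning = [{"fullName": "a/one"}, {"fullName": "b/two"}]
    evening = [{"fullName": "b/two"}, {"fullName": "c/three"}]
    print(diff_snapshots(morning, evening))
```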
The scraper returns fewer than 25 repos.
GitHub Trending page size varies. Some language/period combinations have fewer trending repos. Additionally, aiOnly and filterTopics filters reduce results to only matching repos. This is expected behavior.
starsToday seems too high — is it accurate?
The starsToday field reflects stars gained in the selected period (daily, weekly, or monthly), not just today. For weekly/monthly trending, these numbers represent the full period's star gains.
Rate limit errors or missing topics/license data.
If you run the actor frequently or with many repos, GitHub's unauthenticated API limit (60 req/hr) may be hit. Set a GITHUB_TOKEN environment variable (free to create at github.com/settings/tokens) to increase the limit to 5000 requests/hour.
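A back-of-the-envelope budget can show when the unauthenticated limit becomes a problem. The sketch below assumes one enrichment request per repository plus one more when includeReadme is enabled; the per-repo request count for enrichment is an assumption, not a documented figure, and the function names are hypothetical.

```python
UNAUTHENTICATED_LIMIT = 60   # requests/hour without a token (stated above)
AUTHENTICATED_LIMIT = 5000   # requests/hour with GITHUB_TOKEN (stated above)


def requests_per_run(repos: int, include_readme: bool) -> int:
    """ASSUMPTION: 1 enrichment request per repo, plus 1 for the README."""
    if repos <= 0:
        raise ValueError("repos must be positive")
    per_repo = 2 if include_readme else 1
    return repos * per_repo


def max_runs_per_hour(repos: int, include_readme: bool, has_token: bool) -> int:
    """How many full runs fit inside one hour of rate limit."""
    limit = AUTHENTICATED_LIMIT if has_token else UNAUTHENTICATED_LIMIT
    return limit // requests_per_run(repos, include_readme)


if __name__ == "__main__":
    for token in (False, True):
        runs = max_runs_per_hour(25, include_readme=True, has_token=token)
        print(f"token={token}: about {runs} run(s) of 25 repos per hour")
```

Under these assumptions a single 25-repo run with READMEs uses most of the unauthenticated hourly budget, which is consistent with the advice to set a token for frequent runs.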
Other developer tools
- GitHub Scraper — Repositories, profiles, trending, and search results from GitHub
- Hacker News Scraper — Stories from Hacker News front page, newest, Ask HN, and more
- Homebrew Scraper — Homebrew formulas and casks with install counts
- Stack Overflow Scraper — Questions, answers, and tags from Stack Overflow
- npm Scraper — Package metadata from the npm registry
- PyPI Scraper — Python package data from PyPI
- Crates Scraper — Rust crate metadata from crates.io
- Hash Generator — Generate MD5, SHA-1, SHA-256, and SHA-512 hashes