GitHub Trending Scraper

Scrape GitHub Trending repositories by language and time range: today, this week, or this month. Extracts repo names, star counts, forks, star gains, and top contributors. Great for dev trend tracking, tech newsletters, and investment research. Export to JSON, CSV, or Excel.

Pricing: Pay per event
Developer: Stas Persiianenko · Maintained by Community
Actor stats: 16 total users · 10 monthly active users · last modified 9 days ago

Scrape trending repositories from GitHub Trending. Get stars, forks, language, description, topics, license, last-updated date, star growth metrics, and full README content for the hottest open-source projects — with built-in AI/ML filtering.

GitHub Trending Scraper extracts data from GitHub's trending repositories page and enriches each result with GitHub API data. Beyond the basic trending page data (stars, forks, language, description), it fetches topics, license, last-updated date, and optionally the full README markdown for each repository.

Key capabilities:

  • 🔥 Scrape trending repos by language, time range (daily/weekly/monthly), and spoken language
  • 🤖 Filter for AI/ML repos only — built-in classifier covering LLMs, deep learning, NLP, computer vision, and more
  • 🏷️ Filter by GitHub topics — e.g., only repos tagged llm, transformers, pytorch
  • 📄 Extract full README content — ideal for training datasets and documentation corpora
  • 📈 Star growth metrics — calculate percentage growth (stars gained vs. prior baseline)
  • 🔐 License detection — MIT, Apache-2.0, GPL, and others

Who is it for?

  • 🤖 AI/ML researchers — building curated datasets of trending ML repos with README content for training data, benchmarks, or competitive analysis
  • 📊 Technology analysts — tracking programming language and framework popularity trends with rich metadata
  • 🧑‍💻 Software developers — discovering trending repositories and emerging open-source tools in their stack
  • 📝 Developer advocates and newsletter writers — curating weekly trending repos with descriptions and topics
  • 🏢 Engineering managers and CTOs — scouting promising open-source projects for team adoption
  • 💹 Venture capitalists and investors — identifying hot open-source projects gaining traction

GitHub Trending is the go-to source for discovering popular and rising open-source projects, updated continuously to reflect what the developer community is actively starring.

Key reasons to scrape it:

  • AI/ML landscape monitoring — instantly identify trending LLMs, fine-tuning tools, agentic frameworks, and AI infrastructure
  • Training data collection — pair README content (docs, code examples, problem statements) with structured metadata for language model training
  • Developer tools discovery — find new libraries and frameworks gaining traction before mainstream adoption
  • Competitive intelligence — track trending projects in your tech stack over time
  • Investment research — spot emerging technologies with accelerating star growth
  • Newsletter content curation — automate weekly trending repo digests with rich metadata

Data extracted

| Field | Type | Description |
| --- | --- | --- |
| rank | number | Position on the trending page |
| owner | string | Repository owner/organization |
| name | string | Repository name |
| fullName | string | Full name (owner/name) |
| url | string | Repository URL |
| description | string | Repository description |
| language | string | Primary programming language |
| stars | number | Total star count |
| forks | number | Total fork count |
| starsToday | number | Stars gained in the selected period |
| starsGrowthPercent | number | Percentage growth (starsToday / prior stars × 100) |
| topics | array | GitHub topic tags (e.g., ["llm", "transformers", "pytorch"]) |
| license | string | SPDX license identifier (e.g., MIT, Apache-2.0) |
| lastUpdated | string | ISO timestamp of last push |
| readmeContent | string | Full README markdown (when includeReadme: true) |
| builtBy | array | Top contributors (username + avatar URL) |
| scrapedAt | string | ISO timestamp of extraction |
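For downstream processing, the fields above can be modeled as a small typed record. A minimal Python sketch — the TrendingRepo class and parse_item helper are illustrative, not part of the actor's output:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TrendingRepo:
    # A subset of the dataset fields from the table above.
    rank: int
    fullName: str
    stars: int
    forks: int
    starsToday: int
    starsGrowthPercent: Optional[float] = None
    topics: list = field(default_factory=list)
    license: Optional[str] = None

def parse_item(item: dict) -> TrendingRepo:
    """Build a TrendingRepo from one dataset item, dropping unknown keys.
    Raises TypeError if a required field is missing."""
    known = set(TrendingRepo.__dataclass_fields__)
    return TrendingRepo(**{k: v for k, v in item.items() if k in known})

repo = parse_item({
    "rank": 1, "fullName": "Blaizzy/mlx-vlm",
    "stars": 3747, "forks": 410, "starsToday": 343,
    "starsGrowthPercent": 10.1, "topics": ["llm", "mlx"], "license": "MIT",
    "url": "https://github.com/Blaizzy/mlx-vlm",  # unknown keys are ignored
})
print(repo.fullName, repo.starsToday)  # → Blaizzy/mlx-vlm 343
```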
How to use

  1. Go to GitHub Trending Scraper on Apify Store
  2. Optionally select a programming language filter (e.g., python)
  3. Choose a time range (today, this week, or this month)
  4. Enable AI/ML repos only to filter for machine learning and AI repositories
  5. Enable Include README to extract full documentation for training data use cases
  6. Click Start and wait for results
  7. Download data as JSON, CSV, or Excel

Input parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| language | string | "" | Programming language filter (e.g., "python", "javascript") |
| since | string | "daily" | Time range: daily, weekly, or monthly |
| spokenLanguageCode | string | "" | Spoken language code (e.g., "en", "zh") |
| aiOnly | boolean | false | Filter to AI/ML repos only |
| filterTopics | array | [] | Only return repos with at least one of these topic tags |
| includeReadme | boolean | false | Fetch full README markdown for each repo |
| maxRepos | integer | 0 | Max repos to return (0 = all, typically 25) |

Input example — AI/ML repos with READMEs

{
  "language": "python",
  "since": "weekly",
  "aiOnly": true,
  "includeReadme": true,
  "maxRepos": 10
}

Input example — LLM repos this month

{
  "since": "monthly",
  "filterTopics": ["llm", "large-language-model", "transformers"],
  "includeReadme": false
}

Input example — today's trending Python repos

{
  "language": "python",
  "since": "daily"
}

Output example

{
  "rank": 1,
  "owner": "Blaizzy",
  "name": "mlx-vlm",
  "fullName": "Blaizzy/mlx-vlm",
  "url": "https://github.com/Blaizzy/mlx-vlm",
  "description": "MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.",
  "language": "Python",
  "stars": 3747,
  "forks": 410,
  "starsToday": 343,
  "starsGrowthPercent": 10.1,
  "topics": ["apple-silicon", "llm", "local-ai", "mlx", "vision-language-model", "vision-transformer"],
  "license": "MIT",
  "lastUpdated": "2026-04-04T15:18:28Z",
  "readmeContent": "# MLX-VLM\n\nMLX-VLM is a package for inference and fine-tuning...",
  "builtBy": [
    { "username": "Blaizzy", "avatar": "https://avatars.githubusercontent.com/u/23445657" }
  ],
  "scrapedAt": "2026-04-05T10:14:50.255Z"
}

AI/ML filtering explained

When aiOnly: true is set, the scraper classifies each repo using a two-stage check:

  1. GitHub topic tags — matches against 70+ AI/ML topics including: machine-learning, deep-learning, llm, large-language-model, transformers, nlp, computer-vision, reinforcement-learning, generative-ai, pytorch, tensorflow, huggingface, langchain, rag, ai-agents, fine-tuning, embeddings, quantization, lora, and more.
  2. Description keywords — if topics don't match, scans the description for phrases like "machine learning", "large language model", "neural network", "computer vision", "fine-tuning", "vector store", etc.

For precise control, use filterTopics instead — it only matches repos whose topics include at least one of the specified tags.
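A minimal sketch of that two-stage check — the topic and keyword sets below are small samples for illustration, not the actor's full 70+-topic list:

```python
# Sample subsets of the actor's AI/ML topic and keyword lists.
AI_TOPICS = {"machine-learning", "deep-learning", "llm", "transformers",
             "nlp", "computer-vision", "pytorch", "rag", "ai-agents"}
AI_KEYWORDS = ["machine learning", "large language model", "neural network",
               "computer vision", "fine-tuning", "vector store"]

def is_ai_repo(topics, description):
    # Stage 1: any matching topic tag classifies the repo immediately.
    if AI_TOPICS & set(topics):
        return True
    # Stage 2: otherwise, scan the description for AI/ML phrases.
    desc = (description or "").lower()
    return any(kw in desc for kw in AI_KEYWORDS)

print(is_ai_repo(["llm", "mlx"], ""))                      # → True
print(is_ai_repo([], "A neural network training toolkit")) # → True
print(is_ai_repo(["css"], "A utility-first CSS framework")) # → False
```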

Pricing

GitHub Trending Scraper uses pay-per-event pricing with volume discounts:

| Event | Free tier | Standard | Power users |
| --- | --- | --- | --- |
| Run started | $0.001 | $0.001 | $0.001 |
| Repo extracted | $0.00115/repo | $0.001/repo | from $0.00028/repo |

Prices scale with your Apify subscription tier. The run-start fee is a one-time charge per run.
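As a sanity check, per-run cost is just the run-start fee plus a per-repo fee. A quick sketch using the free-tier rates from the table above (the helper itself is illustrative):

```python
def estimate_cost(repos, per_repo=0.00115, run_start=0.001):
    # One run-start event plus one "repo extracted" event per repo,
    # using the free-tier rates from the pricing table.
    return run_start + repos * per_repo

print(f"${estimate_cost(25):.3f}")  # a full daily page of ~25 repos → $0.030
```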

Cost examples

| Scenario | Repos | Approx. cost |
| --- | --- | --- |
| Daily trending (all languages) | ~25 | ~$0.030 |
| Weekly Python AI/ML only | ~5–10 | ~$0.007–$0.013 |
| Monthly LLM repos | ~5–15 | ~$0.007–$0.018 |

Platform compute costs are negligible — typically under $0.001 per run. The free Apify plan includes enough compute to run daily trending scrapes at no cost.

Tips

  • 🔄 Run on a schedule to build a historical dataset of trending repos over time
  • 🤖 Use aiOnly: true + since: "weekly" for a reliable weekly AI/ML digest
  • 📈 Sort by starsGrowthPercent to identify repos with explosive recent momentum
  • 🏷️ Combine filterTopics for targeted research (e.g., only rag + langchain)
  • 📚 Enable includeReadme to collect paired (metadata + documentation) training data
  • 🌐 Use spokenLanguageCode: "en" to exclude non-English repos from results
  • 🔑 Set GITHUB_TOKEN environment variable to increase GitHub API rate limits (60 → 5000 req/hr)

Integrations

GitHub Trending Scraper works with all Apify integrations:

  • Scheduled runs — Track trending repos daily, weekly, or monthly. Schedule AI/ML monitoring for your research workflow.
  • Webhooks — Get notified when a scrape finishes and pipe results to a data pipeline
  • Google Sheets — Export trending repos directly to a spreadsheet for team visibility
  • Slack — Send a daily AI/ML trending digest to your team channel
  • Make / Zapier — Trigger downstream workflows when new trending AI repos appear
  • API — Trigger runs programmatically and stream results into vector databases or training pipelines

Connect to Zapier, Make, or Google Sheets for automated workflows.
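For example, the Slack digest could be a short script between the dataset and a Slack incoming webhook. A hypothetical sketch — the message format and the webhook URL are placeholders you would supply:

```python
import json
from urllib import request

def format_digest(items):
    """Format dataset items as a Slack mrkdwn message (illustrative layout)."""
    lines = ["*Trending AI/ML repos*"]
    for r in items:
        lines.append(f"#{r['rank']} <{r['url']}|{r['fullName']}> "
                     f"(+{r['starsToday']} stars)")
    return "\n".join(lines)

def post_to_slack(webhook_url, text):
    # POST the message to a Slack incoming webhook (standard Slack payload).
    body = json.dumps({"text": text}).encode()
    req = request.Request(webhook_url, data=body,
                          headers={"Content-Type": "application/json"})
    request.urlopen(req)
```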

Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('automation-lab/github-trending-scraper').call({
  language: 'python',
  since: 'weekly',
  aiOnly: true,
  includeReadme: true,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Found ${items.length} AI/ML trending repos`);

items.forEach((repo) => {
  console.log(`#${repo.rank} ${repo.fullName} ⭐ ${repo.stars} (+${repo.starsToday} / ${repo.starsGrowthPercent}%)`);
  console.log(`  Topics: ${repo.topics.join(', ')}`);
  console.log(`  README: ${repo.readmeContent.length} chars`);
});

Python

from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')

run = client.actor('automation-lab/github-trending-scraper').call(run_input={
    'language': 'python',
    'since': 'weekly',
    'aiOnly': True,
    'includeReadme': True,
})

dataset = client.dataset(run['defaultDatasetId']).list_items().items
print(f'Found {len(dataset)} AI/ML trending repos')

for repo in dataset:
    print(f"#{repo['rank']} {repo['fullName']} ⭐ {repo['stars']} (+{repo['starsToday']})")
    print(f"  Topics: {', '.join(repo['topics'])}")
    print(f"  License: {repo['license']}")
    print(f"  README: {len(repo['readmeContent'])} chars")

cURL

curl -X POST "https://api.apify.com/v2/acts/automation-lab~github-trending-scraper/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "language": "python",
    "since": "weekly",
    "aiOnly": true,
    "includeReadme": false
  }'

Use with Claude AI (MCP)

This actor is available as a tool in Claude AI through the Model Context Protocol (MCP). Add it to Claude Desktop, Cursor, Windsurf, or any MCP-compatible client.

Setup for Claude Code

claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/github-trending-scraper"

Setup for Claude Desktop, Cursor, or VS Code

Add this to your MCP config file:

{
  "mcpServers": {
    "apify": {
      "url": "https://mcp.apify.com?tools=automation-lab/github-trending-scraper"
    }
  }
}

Example prompts

  • "What are the most starred AI/ML repositories trending on GitHub this week?"
  • "Show me today's trending Python repositories tagged with LLM or transformers"
  • "Find trending machine learning repos from the past month and extract their README docs"
  • "Which GitHub repositories have the highest star growth percentage today?"
  • "Get all trending repos with topics matching 'rag' or 'langchain' this week"

Learn more in the Apify MCP documentation.

Legality

Scraping publicly available data is generally legal per the US Ninth Circuit Court of Appeals ruling in hiQ Labs v. LinkedIn. This actor only accesses publicly available information from GitHub's trending page and public repository data via the GitHub API. It does not require authentication (though a GitHub token improves rate limits). Always review and comply with GitHub's Terms of Service before scraping. For personal data, ensure compliance with GDPR, CCPA, and other applicable privacy regulations.

FAQ

How many repos does it return? GitHub Trending typically shows 25 repositories per page. The exact number varies by time of day, language filter, and availability. With aiOnly: true or filterTopics, you may get fewer results since filtering happens after enrichment.

How does the AI/ML filter work? The aiOnly filter checks GitHub topic tags first (70+ AI/ML topics like llm, deep-learning, pytorch) and then scans the repository description for AI/ML keywords. For precise control, use filterTopics with specific topic names.

Does README extraction use the GitHub API rate limit? Yes — with includeReadme: true, one additional GitHub API request is made per repository. Without a GITHUB_TOKEN, GitHub's unauthenticated rate limit is 60 requests/hour. Set the GITHUB_TOKEN environment variable in the actor's input to get 5000 requests/hour (enough for many daily runs).

What is starsGrowthPercent? It represents the percentage of stars gained in the selected period relative to the prior star total. For example, a repo that had 1,000 stars before the period and gained 100 has a 10% growth rate. null is returned when there's insufficient data.
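To make the formula concrete, here is a small sketch that reproduces the value from the output example above, taking prior stars as total stars minus the period's gain:

```python
def stars_growth_percent(stars, stars_today):
    """Percentage growth: stars gained divided by the prior star total."""
    prior = stars - stars_today
    if prior <= 0:
        return None  # insufficient data
    return round(stars_today / prior * 100, 1)

# mlx-vlm from the output example: 3747 total, 343 gained this week.
print(stars_growth_percent(3747, 343))  # → 10.1
```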

Can I get trending developers instead of repos? Currently this scraper focuses on repositories. GitHub also has a trending developers page that could be supported in a future version.

How often does GitHub update trending? GitHub Trending is updated continuously throughout the day. Running the scraper at different times may yield different results.

The scraper returns fewer than 25 repos. GitHub Trending page size varies. Some language/period combinations have fewer trending repos. Additionally, aiOnly and filterTopics filters reduce results to only matching repos. This is expected behavior.

starsToday seems too high — is it accurate? The starsToday field reflects stars gained in the selected period (daily, weekly, or monthly), not just today. For weekly/monthly trending, these numbers represent the full period's star gains.

Rate limit errors or missing topics/license data. If you run the actor frequently or with many repos, GitHub's unauthenticated API limit (60 req/hr) may be hit. Set a GITHUB_TOKEN environment variable (free to create at github.com/settings/tokens) to increase the limit to 5000 requests/hour.
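To check which limit you are currently on, GitHub's public /rate_limit endpoint reports your quota. A small sketch — the rate_limit_request helper is illustrative:

```python
import json
import os
from urllib import request

def rate_limit_request(token=None):
    """Build a GET request for GitHub's /rate_limit endpoint.
    With a token you are on the authenticated 5000 req/hr limit;
    without one, the unauthenticated 60 req/hr limit."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return request.Request("https://api.github.com/rate_limit", headers=headers)

# Uncomment to query your current quota:
# resp = request.urlopen(rate_limit_request(os.environ.get("GITHUB_TOKEN")))
# print(json.load(resp)["resources"]["core"]["limit"])  # 60 or 5000
```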
