GitHub Trending Scraper

Pricing

from $0.03 / 1,000 saved trending repositories

Scrape GitHub Trending repositories by language and time window for developer research, newsletters, and market intelligence.

Developer

Hanna Nosova

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 12 hours ago

Track public GitHub Trending repositories by language and time window. The actor saves ranked repositories, descriptions, languages, stars, forks, stars gained, contributor avatars, and optional README text so you can monitor fast-moving developer ecosystems without copying pages by hand.

At a glance

  • Best for: open-source trend monitoring, developer relations, newsletter research, market intelligence, and technical ecosystem tracking.
  • Inputs: GitHub languages, trending time window, maximum repositories, optional README enrichment, and proxy settings.
  • Outputs: one row per trending repository with rank, owner, repo, URL, description, language, stars, forks, stars gained, contributors, source URL, and timestamp.
  • Exports: download CSV, JSON, Excel, XML, RSS, or use the Apify Dataset API.
  • Cost: a $0.005 start fee per run plus a per-row item charge for each saved repository.

What can it do?

  • Scrape overall GitHub Trending: collect the public overall Trending page for daily, weekly, or monthly windows.
  • Scrape language-specific trends: track JavaScript, Python, TypeScript, Go, Rust, and other GitHub language pages.
  • Export repository metrics: save repository names, owners, URLs, descriptions, languages, stars, forks, and stars gained.
  • Collect contributor hints: save public contributor avatars and profile links shown on Trending cards.
  • Enrich with README text: optionally fetch public README text for smaller research or classification runs.
  • Build a repeatable trend monitor: schedule the same input and compare saved datasets across runs.

Who is it for?

This actor is useful for several teams:

  • Developer-relations teams: track projects that are suddenly gaining attention.
  • Content teams: build newsletters and research queues around open-source tools.
  • VC and market-intelligence teams: discover emerging developer infrastructure and ecosystems.
  • Recruiting teams: watch active projects and technical communities.
  • Product teams: monitor competing frameworks, SDKs, and AI tools.
  • Data teams: build repeatable GitHub trend dashboards.

Why use it?

GitHub Trending is easy to view once, but hard to monitor reliably over time. This actor turns the page into structured rows that can be exported, scheduled, and connected to your workflows.

Use it when you need:

  • Repeatable snapshots of trending repositories
  • Language-specific open-source discovery
  • A clean dataset instead of screenshots or pasted HTML
  • Automation that can run daily, weekly, or monthly
  • Fields that are ready for spreadsheet, BI, CRM, or alerting workflows

Output fields

| Field | Description |
| --- | --- |
| rank | Repository rank on the requested Trending page |
| owner | GitHub owner or organization |
| repo | Repository name |
| fullName | owner/repo value |
| repoUrl | Public GitHub repository URL |
| repositoryDescription | Repository description shown on Trending |
| language | Primary language shown on Trending |
| stars | Total stargazers count |
| forks | Total forks count |
| starsGained | Stars gained in the selected time window |
| builtBy | Public usernames, profile URLs, and avatar URLs shown by GitHub |
| since | Requested time window |
| trendingLanguage | Requested language or overall |
| trendingUrl | Source Trending URL |
| scrapedAt | Timestamp when the row was saved |
| readmeUrl | README URL when enrichment is enabled |
| readmeText | README text when enrichment is enabled |
| readmeTruncated | Whether README text was shortened |

Pricing

The actor uses pay-per-event pricing. You pay a small start fee plus a per-repository charge for saved dataset rows.

| Charge event | Exact price | Charged when |
| --- | --- | --- |
| apify-actor-start | $0.005 | Once when the run starts. |
| item | $0.0000538 at BRONZE tier | For each repository row saved to the dataset. |

At the BRONZE tier, the saved-record charge is about $0.0538 per 1,000 saved repository rows, plus the run start fee.

Only rows saved to the dataset are charged as item events. The final amount is shown in the Apify run billing details.
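
To make the arithmetic concrete, here is a minimal sketch that estimates the charge for one run. It assumes only the two BRONZE-tier prices listed in the pricing table; the function name is illustrative, other tiers may use different item prices, and the Apify run billing details remain the authoritative figure.

```python
# Prices copied from the pricing table; other tiers may differ (assumption: BRONZE).
START_FEE_USD = 0.005        # apify-actor-start, charged once per run
ITEM_PRICE_USD = 0.0000538   # item event, charged per saved repository row


def estimate_run_cost(saved_rows: int,
                      item_price: float = ITEM_PRICE_USD,
                      start_fee: float = START_FEE_USD) -> float:
    """Return the estimated charge in USD for a single run."""
    if saved_rows < 0:
        raise ValueError("saved_rows must be non-negative")
    return start_fee + saved_rows * item_price


if __name__ == "__main__":
    for rows in (30, 1000):
        print(f"{rows} rows: ${estimate_run_cost(rows):.6f}")
```

For scheduled monitoring, multiply the per-run estimate by the number of runs: a daily run saving 30 rows works out to roughly $0.0066 per run under these assumptions.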

Quick start

  1. Open the actor on Apify.
  2. Choose one or more languages, or leave the language list empty for overall Trending.
  3. Select daily, weekly, or monthly.
  4. Set maxItems.
  5. Keep includeReadme disabled for the fastest first run.
  6. Start the actor.
  7. Download the dataset as JSON, CSV, Excel, or via API.

Input configuration

| Setting | JSON key | What it does |
| --- | --- | --- |
| GitHub languages | languages | Array of GitHub language names or slugs such as javascript, python, typescript, go, or rust. Leave empty for overall Trending. |
| Time window | since | Choose daily, weekly, or monthly. |
| Maximum repositories | maxItems | Maximum repositories to save across all requested pages. Use a small number for quick tests. |
| Include README text | includeReadme | Fetches public README text from common README paths. This adds extra requests, so keep it off unless you need text enrichment. |
| Proxy configuration | proxyConfiguration | Optional proxy settings. Most runs should work without proxy because GitHub Trending is public. |

Example input

{
  "languages": ["javascript", "python", "typescript"],
  "since": "daily",
  "maxItems": 30,
  "includeReadme": false,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
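
If you generate the input from a script, a small helper can catch typos before a run is started. The sketch below is only an illustration: `build_input` and its checks are hypothetical and are not part of the Actor, and the Actor's own input schema may validate values differently. It uses the JSON keys documented in the input configuration.

```python
import json

ALLOWED_SINCE = {"daily", "weekly", "monthly"}


def build_input(languages=None, since="daily", max_items=20, include_readme=False):
    """Build an input dict with the documented JSON keys (hypothetical helper)."""
    if since not in ALLOWED_SINCE:
        raise ValueError(f"since must be one of {sorted(ALLOWED_SINCE)}")
    if not isinstance(max_items, int) or max_items < 1:
        raise ValueError("max_items must be a positive integer")
    # Drop blank entries; an empty list means overall Trending.
    cleaned = [lang.strip() for lang in (languages or []) if lang.strip()]
    return {
        "languages": cleaned,
        "since": since,
        "maxItems": max_items,
        "includeReadme": bool(include_readme),
        "proxyConfiguration": {"useApifyProxy": False},
    }


if __name__ == "__main__":
    print(json.dumps(build_input(["javascript", "python"], "weekly", 30), indent=2))
```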

Example output item

{
  "rank": 1,
  "owner": "sveltejs",
  "repo": "svelte",
  "fullName": "sveltejs/svelte",
  "repoUrl": "https://github.com/sveltejs/svelte",
  "repositoryDescription": "web development for the rest of us",
  "language": "JavaScript",
  "stars": 87592,
  "forks": 4960,
  "starsGained": 29,
  "builtBy": [
    {
      "username": "Rich-Harris",
      "profileUrl": "https://github.com/Rich-Harris",
      "avatarUrl": "https://avatars.githubusercontent.com/u/1162160?s=40&v=4"
    }
  ],
  "since": "daily",
  "trendingLanguage": "javascript",
  "trendingUrl": "https://github.com/trending/javascript?since=daily",
  "scrapedAt": "2026-06-30T08:00:00.000Z"
}
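
Once items are downloaded, the numeric fields can be used directly. The following sketch sorts rows by `starsGained` and computes what fraction of a repository's total stars arrived in the selected window. The function names and the second sample row are hypothetical; only the field names come from the output item shown here.

```python
def top_by_stars_gained(items, limit=10):
    """Return up to `limit` rows sorted by starsGained, highest first."""
    ranked = sorted(items, key=lambda row: row.get("starsGained") or 0, reverse=True)
    return ranked[:limit]


def gain_share(row):
    """Stars gained in the window as a fraction of total stars (0.0 if unknown)."""
    stars = row.get("stars") or 0
    gained = row.get("starsGained") or 0
    return gained / stars if stars else 0.0


if __name__ == "__main__":
    rows = [
        {"fullName": "sveltejs/svelte", "stars": 87592, "starsGained": 29},
        # Hypothetical row, for illustration only.
        {"fullName": "example-org/new-tool", "stars": 400, "starsGained": 120},
    ]
    for row in top_by_stars_gained(rows):
        print(f"{row['fullName']}: +{row['starsGained']} ({gain_share(row):.2%} of total)")
```

Sorting by the share rather than the raw gain tends to surface small, fast-growing projects that a large established repository would otherwise overshadow.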

Tips for best results

  • Start with one or two languages and maxItems around 20.
  • Schedule daily runs if you want a trend history.
  • Use weekly or monthly windows for less noisy research.
  • Enable README enrichment only for smaller runs or when text analysis is required.
  • Store each scheduled run's dataset if you want time-series comparisons.

Common workflows

  • Daily developer-news monitoring: run the Actor every morning for javascript, python, and typescript, then send new repositories to a Slack channel or newsletter draft.
  • VC and startup discovery: run weekly Trending snapshots for AI, data, security, and infrastructure languages to review fast-growing repositories.
  • Competitive intelligence: monitor languages and frameworks related to your product category, then export starsGained, descriptions, and README text for classification.
  • Recruiting research: review builtBy users and active repositories in specific technology communities as a starting point for public GitHub profile research.
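
As an illustration of the developer-news workflow, the sketch below turns dataset rows into plain-text digest lines that could be pasted into a Slack message or newsletter draft. The line layout and the function names are assumptions made for this example; sending the text anywhere is left to your own integration.

```python
def digest_line(row):
    """Format one repository row as a single digest line."""
    desc = (row.get("repositoryDescription") or "No description").strip()
    language = row.get("language") or "n/a"
    gained = row.get("starsGained") or 0
    return (f"{row['rank']}. {row['fullName']} (+{gained} stars, {language}): "
            f"{desc} {row['repoUrl']}")


def build_digest(items, title="GitHub Trending"):
    """Join a title and one line per row, ordered by rank."""
    ordered = sorted(items, key=lambda row: row["rank"])
    return "\n".join([title] + [digest_line(row) for row in ordered])


if __name__ == "__main__":
    sample = [{
        "rank": 1,
        "fullName": "sveltejs/svelte",
        "repoUrl": "https://github.com/sveltejs/svelte",
        "repositoryDescription": "web development for the rest of us",
        "language": "JavaScript",
        "starsGained": 29,
    }]
    print(build_digest(sample, title="Daily JavaScript trends"))
```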

Integrations

You can connect the dataset to:

  • Google Sheets for editorial calendars
  • Slack alerts through Apify integrations
  • Airtable or Notion for research queues
  • BigQuery, Snowflake, or S3 for long-term trend storage
  • Zapier or Make for no-code workflows
  • Custom dashboards through the Apify API

API usage with Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Start the Actor and wait for the run to finish.
const run = await client.actor('fetch_cat/github-trending-scraper').call({
  languages: ['javascript', 'python'],
  since: 'daily',
  maxItems: 20,
  includeReadme: false,
});

// Read the saved repository rows from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

API usage with Python

import os

from apify_client import ApifyClient

client = ApifyClient(os.environ['APIFY_TOKEN'])

# Start the Actor and wait for the run to finish.
run = client.actor('fetch_cat/github-trending-scraper').call(run_input={
    'languages': ['javascript', 'python'],
    'since': 'daily',
    'maxItems': 20,
    'includeReadme': False,
})

# Read the saved repository rows from the run's default dataset.
items = client.dataset(run['defaultDatasetId']).list_items().items
print(items)

API usage with cURL

curl -X POST "https://api.apify.com/v2/acts/fetch_cat~github-trending-scraper/runs?token=$APIFY_TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{
    "languages": ["javascript", "python"],
    "since": "daily",
    "maxItems": 20,
    "includeReadme": false
  }'
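
The cURL call only starts a run; the rows still have to be read from the run's default dataset. As a rough sketch using only the Python standard library, the helpers below build the same start-run request and the URL for downloading dataset items in a chosen format. The helper names are hypothetical, the dataset ID in the usage line is a placeholder, and passing the token in an `Authorization: Bearer` header instead of the query string is a choice made for this example. Nothing is sent until you pass the request to `urllib.request.urlopen`.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
# In API paths the "/" in the Actor name is written as "~".
ACTOR_ID = "fetch_cat~github-trending-scraper"


def build_run_request(actor_input: dict, token: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST request that starts a run."""
    return urllib.request.Request(
        f"{API_BASE}/acts/{ACTOR_ID}/runs",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def dataset_items_url(dataset_id: str, fmt: str = "json") -> str:
    """URL for downloading dataset items, e.g. fmt='csv' or fmt='json'."""
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode({'format': fmt})}"


if __name__ == "__main__":
    request = build_run_request({"languages": ["python"], "since": "daily"}, "MY_TOKEN")
    print(request.get_method(), request.full_url)
    print(dataset_items_url("DATASET_ID_PLACEHOLDER", "csv"))
```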

MCP and AI agents

You can use this actor from AI tools through the official Apify MCP server.

MCP endpoint:

https://mcp.apify.com?tools=fetch_cat/github-trending-scraper

Claude Code setup:

claude mcp add apify-github-trending --url "https://mcp.apify.com?tools=fetch_cat/github-trending-scraper"

Claude Desktop JSON configuration:

{
  "mcpServers": {
    "apify-github-trending": {
      "url": "https://mcp.apify.com?tools=fetch_cat/github-trending-scraper"
    }
  }
}

The default Apify MCP server can search and run Actors. The focused URL exposes only this Actor to clients that support tool-scoped MCP connections.

Example prompts for MCP usage:

Developer tooling trend summary:

Use fetch_cat/github-trending-scraper to get today's top JavaScript and Python GitHub Trending repositories. Summarize the top 10 by stars gained and identify projects that look relevant to AI developer tooling.

Backend infrastructure review:

Run the GitHub Trending Scraper for rust and go weekly trends, then create a table with repository, description, stars gained, and why each project may matter to backend infrastructure teams.

Weekly newsletter draft:

Every Monday, run fetch_cat/github-trending-scraper for go, rust, and python weekly trends, then draft a short engineering newsletter with repository links and stars gained.

Scheduling

For monitoring, schedule the actor to run daily or weekly. Keep the same input each time, then compare datasets across runs.

Good schedules include:

  • Daily at 08:00 for newsletters
  • Weekly on Monday for market research
  • Monthly for broad ecosystem reports
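
Comparing two scheduled runs can be as simple as a keyed diff. The sketch below assumes you have already loaded the items of a previous and a current run as lists of dicts. It keys rows by `trendingLanguage` and `fullName`, because the same repository may appear on more than one Trending page. The function name and the sample rows are illustrative.

```python
def diff_snapshots(previous, current):
    """Return (new, dropped, moved) between two lists of dataset rows.

    moved holds (key, old_rank, new_rank) for rows present in both runs.
    """
    def key(row):
        return (row.get("trendingLanguage"), row["fullName"])

    prev = {key(row): row for row in previous}
    curr = {key(row): row for row in current}

    new = [row for k, row in curr.items() if k not in prev]
    dropped = [row for k, row in prev.items() if k not in curr]
    moved = [
        (k, prev[k]["rank"], row["rank"])
        for k, row in curr.items()
        if k in prev and prev[k]["rank"] != row["rank"]
    ]
    return new, dropped, moved


if __name__ == "__main__":
    # Hypothetical rows, for illustration only.
    yesterday = [
        {"trendingLanguage": "python", "fullName": "org/a", "rank": 1},
        {"trendingLanguage": "python", "fullName": "org/b", "rank": 2},
    ]
    today = [
        {"trendingLanguage": "python", "fullName": "org/b", "rank": 1},
        {"trendingLanguage": "python", "fullName": "org/c", "rank": 2},
    ]
    new, dropped, moved = diff_snapshots(yesterday, today)
    print("new:", [r["fullName"] for r in new])
    print("dropped:", [r["fullName"] for r in dropped])
    print("moved:", moved)
```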

Limitations

  • The actor extracts data visible on public GitHub Trending pages.
  • GitHub may change page layout, which can require parser updates.
  • Trending pages usually contain a limited number of repositories per language/window.
  • README enrichment may not find every repository README because branch names and file names vary.
  • The actor does not access private repositories or account-only data.

FAQ

Why did I get fewer items than requested?

GitHub Trending pages have a finite number of visible repositories. If the source page has fewer repositories than your maxItems, the actor saves all available repositories.

Why is README text missing?

README enrichment checks common public README locations. Some repositories use different default branches, different file names, generated documentation, or no README.
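
To see why a lookup over "common locations" can miss, consider how quickly the candidate list grows. The sketch below enumerates raw-file URLs for a repository across a few branch and file names. The branch and file lists are assumptions chosen for illustration and are not the Actor's actual lookup list; a repository whose default branch or README name falls outside such a list will not be found.

```python
from itertools import product

# Assumed candidates for illustration; the Actor's real list is not documented here.
BRANCHES = ("main", "master")
FILENAMES = ("README.md", "readme.md", "README.rst", "README")


def readme_candidates(full_name, branches=BRANCHES, filenames=FILENAMES):
    """List raw.githubusercontent.com URLs where a README might live."""
    owner, _, repo = full_name.partition("/")
    if not owner or not repo:
        raise ValueError("full_name must look like 'owner/repo'")
    return [
        f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{name}"
        for branch, name in product(branches, filenames)
    ]


if __name__ == "__main__":
    for url in readme_candidates("sveltejs/svelte"):
        print(url)
```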

Should I enable proxy?

Usually no. GitHub Trending is public. Enable proxy only if your run environment requires it or you repeatedly see network-level errors.

Data freshness

GitHub Trending changes over time. The actor saves the state visible during the run and includes scrapedAt so you can compare snapshots.
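
When rows from several runs are stored together, `scrapedAt` is what separates one snapshot from another. The sketch below parses the timestamp format shown in the example output item and groups rows by UTC date. The function names are illustrative, and grouping by calendar day assumes at most one run per day.

```python
from collections import defaultdict
from datetime import datetime, timezone


def parse_scraped_at(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2026-06-30T08:00:00.000Z'."""
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def group_by_snapshot_date(items):
    """Map 'YYYY-MM-DD' (UTC) to the list of rows scraped on that day."""
    groups = defaultdict(list)
    for row in items:
        day = parse_scraped_at(row["scrapedAt"]).astimezone(timezone.utc).date()
        groups[day.isoformat()].append(row)
    return dict(groups)


if __name__ == "__main__":
    rows = [
        {"fullName": "sveltejs/svelte", "scrapedAt": "2026-06-30T08:00:00.000Z"},
        # Hypothetical second snapshot, for illustration only.
        {"fullName": "sveltejs/svelte", "scrapedAt": "2026-07-01T08:00:00.000Z"},
    ]
    for day, group in sorted(group_by_snapshot_date(rows).items()):
        print(day, len(group))
```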

Legality

This actor collects public information from GitHub Trending pages. Use the data responsibly, respect GitHub's terms, avoid excessive scheduling, and do not use the output for spam, harassment, or invasive profiling.

Changelog

0.1

  • Initial release with language-specific GitHub Trending scraping, daily/weekly/monthly windows, repository metadata extraction, contributor avatar extraction, optional README enrichment, pricing events, and dataset schema.

Support

If a run fails, returns no data, or a field looks wrong, open an issue from the Actor page.

Please include the Apify run ID or run URL; the input JSON; one example public URL, query, or input item; what you expected; and what the dataset returned. Small, reproducible inputs make parsing or site-layout issues much faster to fix.