Github Trending Scraper

Pricing

Pay per usage

Rating

0.0

(0)

Developer

mohamed el hadi msaid

Maintained by Community

Actor stats

0

Bookmarked

2

Total users

1

Monthly active users

4 days ago

Last modified

GitHub Trending Repositories Scraper

A production-ready Apify Actor that scrapes trending repositories from GitHub with comprehensive filtering options.

Features

  • Scrapes server-rendered HTML - Uses CheerioCrawler for fast, efficient HTTP scraping (no browser overhead)
  • 🔍 Filter by programming language - JavaScript, Python, Go, Rust, TypeScript, etc.
  • 📅 Filter by date range - Daily, Weekly, or Monthly trending repos
  • 🌍 Filter by spoken language - English, Chinese, Spanish, etc.
  • Configurable limits - Set maximum number of repositories to scrape
  • 🔒 Proxy support - Built-in Apify Proxy support or custom proxies
  • 📊 Rich dataset output - Complete repository data with stars, forks, contributors, and more

Input Parameters

The Actor accepts the following input parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| language | String | "" (all) | Filter by programming language (e.g., javascript, python, go) |
| dateRange | String | "daily" | Time period: "daily", "weekly", or "monthly" |
| spokenLanguage | String | "" (all) | Filter by natural language (e.g., en, zh, es) |
| maxItems | Integer | 25 | Maximum number of repositories to scrape (0 = unlimited) |
| proxyConfiguration | Object | { useApifyProxy: false } | Proxy settings |

Example Input

{
    "language": "python",
    "dateRange": "weekly",
    "spokenLanguage": "",
    "maxItems": 50,
    "proxyConfiguration": {
        "useApifyProxy": false
    }
}
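
The defaults and allowed values in the table above can be applied before any request is made. The following is a minimal sketch, not the Actor's actual code: `normalizeInput` is a hypothetical helper, and lower-casing the filters is an assumption.

```javascript
// Defaults copied from the Input Parameters table.
const DEFAULT_INPUT = {
    language: '',
    dateRange: 'daily',
    spokenLanguage: '',
    maxItems: 25,
    proxyConfiguration: { useApifyProxy: false },
};

const VALID_DATE_RANGES = ['daily', 'weekly', 'monthly'];

// Hypothetical helper: merges user input over the defaults and rejects
// values that the table does not allow.
function normalizeInput(rawInput = {}) {
    const input = { ...DEFAULT_INPUT, ...rawInput };

    if (!VALID_DATE_RANGES.includes(input.dateRange)) {
        throw new Error(`dateRange must be one of: ${VALID_DATE_RANGES.join(', ')}`);
    }
    if (!Number.isInteger(input.maxItems) || input.maxItems < 0) {
        throw new Error('maxItems must be a non-negative integer (0 = unlimited)');
    }

    // Assumption: filters are trimmed and lower-cased before use.
    input.language = String(input.language).trim().toLowerCase();
    input.spokenLanguage = String(input.spokenLanguage).trim().toLowerCase();
    return input;
}

console.log(normalizeInput({ language: 'Python', dateRange: 'weekly', maxItems: 50 }));
```

Failing early on an invalid `dateRange` gives a clearer error than an empty dataset at the end of a run.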

Output

The Actor outputs a dataset with the following fields for each repository:

{
    owner: string;            // Repository owner username
    repositoryName: string;   // Repository name
    fullName: string;         // Full name (owner/repo)
    url: string;              // GitHub repository URL
    description: string;      // Repository description
    language: string;         // Primary programming language
    stars: number;            // Total stars count
    forks: number;            // Total forks count
    starsToday: number;       // Stars gained in this period
    builtBy: Array<{          // Top contributors
        username: string;
        profileUrl: string;
    }>;
    scrapedAt: string;        // ISO timestamp
}
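
Several of these fields can be derived from others: `fullName` and `url` follow from `owner` and `repositoryName`, and the numeric fields have to be parsed from text on the page. The sketch below illustrates one way to assemble a record. `parseCount` and `toRecord` are hypothetical helpers, and the `starsText`, `forksText`, and `starsTodayText` inputs are assumptions about what the scraped text looks like.

```javascript
// Hypothetical helper: converts a count as displayed on a page
// (e.g. "162,000" or "150 stars today") into a number.
// Returns 0 when no digits are found. Abbreviated counts such as
// "1.2k" are not handled.
function parseCount(text) {
    const match = String(text ?? '').match(/\d[\d,]*/);
    if (!match) return 0;
    return Number(match[0].replace(/,/g, ''));
}

// Hypothetical helper: assembles one dataset record in the shape listed above.
function toRecord({ owner, repositoryName, description, language,
    starsText, forksText, starsTodayText, builtBy }) {
    return {
        owner,
        repositoryName,
        fullName: `${owner}/${repositoryName}`,
        url: `https://github.com/${owner}/${repositoryName}`,
        description: description ?? '',
        language: language ?? '',
        stars: parseCount(starsText),
        forks: parseCount(forksText),
        starsToday: parseCount(starsTodayText),
        builtBy: builtBy ?? [],
        scrapedAt: new Date().toISOString(),
    };
}

console.log(toRecord({
    owner: 'microsoft',
    repositoryName: 'vscode',
    description: 'Visual Studio Code',
    language: 'TypeScript',
    starsText: '162,000',
    forksText: '28,000',
    starsTodayText: '150 stars today',
}));
```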

Example Output

{
    "owner": "microsoft",
    "repositoryName": "vscode",
    "fullName": "microsoft/vscode",
    "url": "https://github.com/microsoft/vscode",
    "description": "Visual Studio Code",
    "language": "TypeScript",
    "stars": 162000,
    "forks": 28000,
    "starsToday": 150,
    "builtBy": [
        {
            "username": "bpasero",
            "profileUrl": "https://github.com/bpasero"
        }
    ],
    "scrapedAt": "2024-11-26T10:30:00.000Z"
}

How It Works

  1. URL Construction: Builds the GitHub trending URL based on your filters
  2. Server-Side Scraping: Uses CheerioCrawler (fast HTTP requests, no browser)
  3. Data Extraction: Parses HTML to extract repository data
  4. Dataset Storage: Pushes structured data to Apify Dataset
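
Step 1 can be sketched as a small pure function. This is an illustration rather than the Actor's actual code: `buildTrendingUrl` is a hypothetical name, and the URL layout (`/trending/<language>` with `since` and `spoken_language_code` query parameters) is an assumption about the trending page that this README does not state.

```javascript
// Assumption: the trending page lives at /trending/<language> and accepts
// `since` and `spoken_language_code` query parameters.
function buildTrendingUrl({ language = '', dateRange = 'daily', spokenLanguage = '' } = {}) {
    const url = new URL('https://github.com/trending');
    if (language) {
        // Encode the language so names such as "c++" stay valid in a path.
        url.pathname += `/${encodeURIComponent(language.toLowerCase())}`;
    }
    url.searchParams.set('since', dateRange);
    if (spokenLanguage) {
        url.searchParams.set('spoken_language_code', spokenLanguage);
    }
    return url.toString();
}

console.log(buildTrendingUrl({ language: 'python', dateRange: 'weekly' }));
// https://github.com/trending/python?since=weekly
console.log(buildTrendingUrl({ spokenLanguage: 'zh' }));
// https://github.com/trending?since=daily&spoken_language_code=zh
```

Because all filters end up in a single URL, the crawler only needs one request per run.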

Local Development

Prerequisites

  • Node.js 18+
  • npm or yarn

Installation

npm install

Running Locally

IMPORTANT: Always use apify run to run the Actor locally (NOT npm start):

# Run with default input from storage/key_value_stores/default/INPUT.json
apify run
# Run with custom input
apify run -i '{"language":"javascript","dateRange":"weekly","maxItems":10}'
# Run with input from file
apify run --input-file my-input.json

Testing Different Scenarios

# Get top 10 trending Python repos today
apify run -i '{"language":"python","dateRange":"daily","maxItems":10}'
# Get weekly trending JavaScript repos
apify run -i '{"language":"javascript","dateRange":"weekly","maxItems":25}'
# Get monthly trending repos (all languages)
apify run -i '{"dateRange":"monthly","maxItems":50}'
# Get trending repos in Chinese
apify run -i '{"spokenLanguage":"zh","maxItems":20}'

Deployment to Apify Platform

Option 1: Link a Git Repository

  1. Go to the Actor creation page
  2. Click on Link Git Repository
  3. Connect your GitHub repository

Option 2: Push from Local Machine

# Login to Apify (requires API token)
apify login
# Deploy Actor to Apify Platform
apify push

Performance

  • Speed: ~2-5 seconds per run (plain HTTP fetch of server-rendered HTML)
  • Crawler Type: CheerioCrawler (HTTP-based, no browser overhead)
  • Memory: ~256MB typical usage
  • Concurrency: Single request (trending page is one page)

Use Cases

  • 📈 Trend Analysis: Track trending technologies and languages
  • 🔍 Repository Discovery: Find popular new projects
  • 📊 Data Collection: Build datasets for research
  • 🤖 Automation: Schedule daily/weekly trending reports
  • 📧 Notifications: Get alerts for trending repos in your language
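
As an illustration of the notification use case, dataset items can be filtered after a run. The sketch below is one possible approach: `selectForAlert` and the `minStarsToday` threshold are hypothetical and not part of the Actor.

```javascript
// Hypothetical helper: picks repositories worth an alert from dataset items
// and formats one message line per repository, highest gain first.
function selectForAlert(items, { language = '', minStarsToday = 100 } = {}) {
    return items
        .filter((item) => !language
            || (item.language ?? '').toLowerCase() === language.toLowerCase())
        .filter((item) => item.starsToday >= minStarsToday)
        .sort((a, b) => b.starsToday - a.starsToday)
        .map((item) => `${item.fullName} (+${item.starsToday} stars): ${item.url}`);
}

// Sample items using the field names from the Output section.
const sampleItems = [
    { fullName: 'a/b', url: 'https://github.com/a/b', language: 'Python', starsToday: 150 },
    { fullName: 'c/d', url: 'https://github.com/c/d', language: 'Go', starsToday: 500 },
    { fullName: 'e/f', url: 'https://github.com/e/f', language: 'Python', starsToday: 20 },
];

console.log(selectForAlert(sampleItems, { language: 'python', minStarsToday: 100 }));
// [ 'a/b (+150 stars): https://github.com/a/b' ]
```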

Limitations

  • GitHub may rate-limit requests made without a proxy
  • The trending page shows ~25 repositories
  • No pagination (the trending page is a single page), so a maxItems value above ~25 is unlikely to return more results
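
The interaction between `maxItems` and the single-page limit can be shown with a short sketch. `applyLimit` is a hypothetical helper, and the 25-item page is simulated rather than scraped.

```javascript
// Hypothetical helper: applies maxItems to the repositories found on the page.
// A value of 0 means "unlimited", i.e. everything the page contained.
function applyLimit(repos, maxItems) {
    // slice() copies, so the caller's array is never mutated.
    return maxItems > 0 ? repos.slice(0, maxItems) : repos.slice();
}

// Simulated page with 25 repositories.
const pageRepos = Array.from({ length: 25 }, (_, i) => `owner/repo-${i + 1}`);

console.log(applyLimit(pageRepos, 10).length); // 10
console.log(applyLimit(pageRepos, 50).length); // 25
console.log(applyLimit(pageRepos, 0).length);  // 25
```

Under this reading, `maxItems` can only shrink the result; it cannot fetch more than the page lists.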

Troubleshooting

No repositories scraped

  • Check if GitHub changed their HTML structure
  • Enable Apify Proxy if you're being rate-limited
  • Verify your language/date range filters are valid
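
A changed HTML structure often shows up as records with empty or non-numeric fields rather than as an outright failure. The following sketch checks a record for that symptom. `findProblems` is a hypothetical helper, and the choice of which fields to treat as required is an assumption.

```javascript
// Field names come from the Output section; treating these as required
// is an assumption made for this sketch.
const REQUIRED_STRING_FIELDS = ['owner', 'repositoryName', 'fullName', 'url'];
const NUMERIC_FIELDS = ['stars', 'forks', 'starsToday'];

// Hypothetical helper: returns a list of human-readable problems,
// or an empty array when the record looks healthy.
function findProblems(record) {
    const problems = [];
    for (const field of REQUIRED_STRING_FIELDS) {
        if (typeof record[field] !== 'string' || record[field].trim() === '') {
            problems.push(`${field} is empty`);
        }
    }
    for (const field of NUMERIC_FIELDS) {
        if (!Number.isFinite(record[field])) {
            problems.push(`${field} is not a number`);
        }
    }
    return problems;
}

console.log(findProblems({
    owner: 'microsoft',
    repositoryName: '',
    fullName: 'microsoft/vscode',
    url: 'https://github.com/microsoft/vscode',
    stars: 162000,
    forks: NaN,
    starsToday: 150,
}));
// [ 'repositoryName is empty', 'forks is not a number' ]
```

If most records in a run report problems, the page selectors are the first thing to inspect.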

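Rate limiting

If GitHub rate-limits your requests, enable Apify Proxy with the residential proxy group in the input:

{
    "proxyConfiguration": {
        "useApifyProxy": true,
        "groups": ["RESIDENTIAL"]
    }
}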

License

ISC

Development Tools

This Actor was built using the Apify AutoPlans VS Code Extension - an AI-powered development assistant for building Apify Actors with intelligent code generation, testing, and deployment capabilities.

Build Your Own Actor

Want to create your own Apify Actor with AI assistance? Install the extension:

  1. Open VS Code
  2. Search for "Apify AutoPlans" in the Extensions marketplace
  3. Install and start building production-ready scrapers with AI

Author

Built with ❤️ using the Apify SDK, Crawlee, and the Apify AutoPlans VS Code Extension