Github Trending Scraper
Pricing
Pay per usage
Developer: mohamed el hadi msaid
GitHub Trending Repositories Scraper
A production-ready Apify Actor that scrapes trending repositories from GitHub with comprehensive filtering options.
Features
- ✅ Scrapes server-rendered HTML - Uses CheerioCrawler for fast, efficient scraping (no browser overhead)
- 🔍 Filter by programming language - JavaScript, Python, Go, Rust, TypeScript, etc.
- 📅 Filter by date range - Daily, Weekly, or Monthly trending repos
- 🌍 Filter by spoken language - English, Chinese, Spanish, etc.
- ⚡ Configurable limits - Set maximum number of repositories to scrape
- 🔒 Proxy support - Built-in Apify Proxy support or custom proxies
- 📊 Rich dataset output - Complete repository data with stars, forks, contributors, and more
Input Parameters
The Actor accepts the following input parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| language | String | "" (all) | Filter by programming language (e.g., javascript, python, go) |
| dateRange | String | "daily" | Time period: "daily", "weekly", or "monthly" |
| spokenLanguage | String | "" (all) | Filter by natural language (e.g., en, zh, es) |
| maxItems | Integer | 25 | Maximum number of repositories to scrape (0 = unlimited) |
| proxyConfiguration | Object | { useApifyProxy: false } | Proxy settings |
Example Input
    {
      "language": "python",
      "dateRange": "weekly",
      "spokenLanguage": "",
      "maxItems": 50,
      "proxyConfiguration": {
        "useApifyProxy": false
      }
    }
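The defaults in the parameter table can be applied with a small normalization step. The sketch below is only an illustration: `normalizeInput` and `DATE_RANGES` are hypothetical names, and the Actor's actual validation logic is not shown in this README.

```javascript
// Hypothetical helper (not taken from the Actor's source): fills in the
// documented defaults and rejects values the table does not allow.
const DATE_RANGES = ["daily", "weekly", "monthly"];

function normalizeInput(raw = {}) {
  const input = {
    language: String(raw.language ?? "").trim(),
    dateRange: raw.dateRange ?? "daily",
    spokenLanguage: String(raw.spokenLanguage ?? "").trim(),
    maxItems: raw.maxItems ?? 25,
    proxyConfiguration: raw.proxyConfiguration ?? { useApifyProxy: false },
  };
  if (!DATE_RANGES.includes(input.dateRange)) {
    throw new Error(`dateRange must be one of: ${DATE_RANGES.join(", ")}`);
  }
  if (!Number.isInteger(input.maxItems) || input.maxItems < 0) {
    throw new Error("maxItems must be a non-negative integer (0 = unlimited)");
  }
  return input;
}
```

Calling `normalizeInput({ language: "python", dateRange: "weekly" })` would keep the two given values and fill the rest from the defaults.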
Output
The Actor outputs a dataset with the following fields for each repository:
    {
      owner: string;           // Repository owner username
      repositoryName: string;  // Repository name
      fullName: string;        // Full name (owner/repo)
      url: string;             // GitHub repository URL
      description: string;     // Repository description
      language: string;        // Primary programming language
      stars: number;           // Total stars count
      forks: number;           // Total forks count
      starsToday: number;      // Stars gained in this period
      builtBy: Array<{         // Top contributors
        username: string;
        profileUrl: string;
      }>;
      scrapedAt: string;       // ISO timestamp
    }
Example Output
    {
      "owner": "microsoft",
      "repositoryName": "vscode",
      "fullName": "microsoft/vscode",
      "url": "https://github.com/microsoft/vscode",
      "description": "Visual Studio Code",
      "language": "TypeScript",
      "stars": 162000,
      "forks": 28000,
      "starsToday": 150,
      "builtBy": [
        {
          "username": "bpasero",
          "profileUrl": "https://github.com/bpasero"
        }
      ],
      "scrapedAt": "2024-11-26T10:30:00.000Z"
    }
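To show how a record of this shape might be assembled from scraped text, here is a minimal sketch. `parseCount` and `toRecord` are hypothetical helpers, not the Actor's code, and the sketch assumes counts appear on the page as comma-grouped digits (such as "162,000") rather than abbreviated forms.

```javascript
// Hypothetical helper: turns displayed text like "162,000" or
// "150 stars today" into a number. Assumes comma-grouped digits only.
function parseCount(text) {
  const digits = String(text).replace(/[^0-9]/g, "");
  return digits === "" ? 0 : Number(digits);
}

// Hypothetical helper: builds one output record from raw scraped strings.
// `now` is injectable so the timestamp is predictable in tests.
function toRecord(raw, now = new Date()) {
  const { owner, repositoryName } = raw;
  return {
    owner,
    repositoryName,
    fullName: `${owner}/${repositoryName}`,
    url: `https://github.com/${owner}/${repositoryName}`,
    description: (raw.description ?? "").trim(),
    language: raw.language ?? "",
    stars: parseCount(raw.stars),
    forks: parseCount(raw.forks),
    starsToday: parseCount(raw.starsToday),
    builtBy: (raw.builtBy ?? []).map((username) => ({
      username,
      profileUrl: `https://github.com/${username}`,
    })),
    scrapedAt: now.toISOString(),
  };
}
```

Deriving `fullName`, `url`, and `profileUrl` from the owner and repository names keeps the three fields consistent with each other.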
How It Works
- URL Construction: Builds the GitHub trending URL based on your filters
- Server-Side Scraping: Uses CheerioCrawler (fast HTTP requests, no browser)
- Data Extraction: Parses HTML to extract repository data
- Dataset Storage: Pushes structured data to Apify Dataset
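The URL construction step can be sketched as follows. This is an assumption-based illustration, not the Actor's source: `buildTrendingUrl` is a hypothetical name, and the `since` and `spoken_language_code` query parameters reflect how GitHub trending page URLs look at the time of writing.

```javascript
// Hypothetical helper: maps the Actor's input filters onto a GitHub
// trending URL. The query parameter names are assumptions based on the
// public trending page, not taken from the Actor's code.
function buildTrendingUrl({ language = "", dateRange = "daily", spokenLanguage = "" } = {}) {
  const url = new URL("https://github.com/trending");
  if (language) {
    // The programming language is a path segment, e.g. /trending/python
    url.pathname += `/${encodeURIComponent(language)}`;
  }
  url.searchParams.set("since", dateRange);
  if (spokenLanguage) {
    url.searchParams.set("spoken_language_code", spokenLanguage);
  }
  return url.toString();
}
```

For example, `buildTrendingUrl({ language: "python", dateRange: "weekly" })` yields `https://github.com/trending/python?since=weekly`. Because every filter ends up in a single URL, one HTTP request is enough for a run.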
Local Development
Prerequisites
- Node.js 18+
- npm or yarn
Installation
    $ npm install
Running Locally
IMPORTANT: Always use apify run to run the Actor locally (NOT npm start):
    # Run with default input from storage/key_value_stores/default/INPUT.json
    apify run

    # Run with custom input
    apify run -i '{"language":"javascript","dateRange":"weekly","maxItems":10}'

    # Run with input from file
    apify run --input-file my-input.json
Testing Different Scenarios
    # Get top 10 trending Python repos today
    apify run -i '{"language":"python","dateRange":"daily","maxItems":10}'

    # Get weekly trending JavaScript repos
    apify run -i '{"language":"javascript","dateRange":"weekly","maxItems":25}'

    # Get monthly trending repos (all languages)
    apify run -i '{"dateRange":"monthly","maxItems":50}'

    # Get trending repos in Chinese
    apify run -i '{"spokenLanguage":"zh","maxItems":20}'
Deployment to Apify Platform
Option 1: Link Git Repository
- Go to Actor creation page
- Click on Link Git Repository
- Connect your GitHub repository
Option 2: Push from Local Machine
    # Login to Apify (requires API token)
    apify login

    # Deploy Actor to Apify Platform
    apify push
Performance
- Speed: ~2-5 seconds per run (the trending page is server-rendered, so plain HTTP requests suffice)
- Crawler Type: CheerioCrawler (HTTP-based, no browser overhead)
- Memory: ~256MB typical usage
- Concurrency: Single request (trending page is one page)
Use Cases
- 📈 Trend Analysis: Track trending technologies and languages
- 🔍 Repository Discovery: Find popular new projects
- 📊 Data Collection: Build datasets for research
- 🤖 Automation: Schedule daily/weekly trending reports
- 📧 Notifications: Get alerts for trending repos in your language
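As one illustration of the trend analysis use case, the sketch below ranks languages by the stars their repositories gained in the scraped period. `starsByLanguage` is a hypothetical helper that works on items shaped like the Actor's documented output; it is not part of the Actor.

```javascript
// Hypothetical helper: sums starsToday per language across dataset items
// and returns [language, total] pairs, highest total first.
function starsByLanguage(items) {
  const totals = new Map();
  for (const { language, starsToday } of items) {
    const key = language || "Unknown"; // repos without a detected language
    totals.set(key, (totals.get(key) ?? 0) + starsToday);
  }
  return [...totals.entries()].sort((a, b) => b[1] - a[1]);
}
```

Run on the output of a scheduled daily run, a ranking like this could feed a recurring report or an alert threshold.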
Limitations
- GitHub may rate-limit requests without proxy
- Trending page shows ~25 repositories, so a larger maxItems value likely returns no more than that
- No pagination (trending page is a single page)
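The interaction between `maxItems` and the single-page limit can be sketched like this. `limitItems` is a hypothetical helper, and treating 0 as "no limit" follows the parameter table rather than the Actor's actual code.

```javascript
// Hypothetical helper: applies maxItems to the repositories found on the
// page. 0 means unlimited, so every scraped item is kept.
function limitItems(items, maxItems) {
  return maxItems > 0 ? items.slice(0, maxItems) : items.slice();
}
```

With about 25 repositories on the page, `limitItems(items, 50)` returns the same items as `limitItems(items, 0)`: the page itself is the upper bound.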
Troubleshooting
No repositories scraped
- Check if GitHub changed their HTML structure
- Enable Apify Proxy if you're being rate-limited
- Verify your language/date range filters are valid
Rate limiting
    {
      "proxyConfiguration": {
        "useApifyProxy": true,
        "groups": ["RESIDENTIAL"]
      }
    }
License
ISC
Development Tools
This Actor was built using the Apify AutoPlans VS Code Extension - an AI-powered development assistant for building Apify Actors with intelligent code generation, testing, and deployment capabilities.
Build Your Own Actor
Want to create your own Apify Actor with AI assistance? Install the extension:
- Open VS Code
- Search for "Apify AutoPlans" in the Extensions marketplace
- Install and start building production-ready scrapers with AI
Author
Built with ❤️ using Apify SDK, Crawlee, and Apify AutoPlans VS Code Extension