GitHub Repository Scraper
Pricing
from $2.00 / 1,000 results
Scrape GitHub repositories, users, and trending projects via REST API. Extract repo names, stars, forks, languages, descriptions, and contributor data.
Developer: cloud9 (maintained by community)
GitHub Scraper
Scrape GitHub repositories, users, and trending projects via the GitHub REST API.
Features
- 3 Scraping Modes:
  - searchRepos: Search for repositories by keywords
  - searchUsers: Search for GitHub users
  - trending: Get trending repositories (created in last 7 days, sorted by stars)
- Filters:
  - Programming language filter
  - Sort options: stars, forks, updated, best-match
  - Configurable max results (1-100)
- No Authentication Required: Uses GitHub's public REST API (60 requests/hour unauthenticated)
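To make the three modes concrete, here is a minimal sketch of how each one could translate into a GitHub Search API request. The helper name `buildSearchUrl` and the `QueryOptions` type are hypothetical and not taken from the actor's source. The `created:>` and `language:` qualifiers and the `sort`, `order`, and `per_page` parameters are standard GitHub search features, but whether the actor builds its queries exactly this way is an assumption.

```typescript
type Mode = 'searchRepos' | 'searchUsers' | 'trending';

// Hypothetical option bag mirroring the actor's input fields.
interface QueryOptions {
    mode: Mode;
    searchQuery?: string;
    language?: string;
    sort?: 'stars' | 'forks' | 'updated' | 'best-match';
    maxResults?: number;
}

// Hypothetical helper: builds a GitHub Search API URL for the given mode.
function buildSearchUrl(opts: QueryOptions, now: Date = new Date()): string {
    const terms: string[] = [];
    if (opts.mode === 'trending') {
        // Assumption: "trending" means created within the last 7 days.
        const since = new Date(now.getTime() - 7 * 24 * 60 * 60 * 1000);
        terms.push(`created:>${since.toISOString().slice(0, 10)}`);
    } else if (opts.searchQuery) {
        terms.push(opts.searchQuery);
    }
    if (opts.language) {
        terms.push(`language:${opts.language}`);
    }

    const endpoint = opts.mode === 'searchUsers' ? 'users' : 'repositories';
    const params = new URLSearchParams({
        q: terms.join(' '),
        per_page: String(opts.maxResults ?? 30),
    });

    // Trending is always sorted by stars; user search gets no repo-style sort.
    const sort =
        opts.mode === 'trending' ? 'stars'
        : opts.mode === 'searchRepos' ? opts.sort
        : undefined;
    // GitHub falls back to best-match ordering when no sort parameter is sent.
    if (sort && sort !== 'best-match') {
        params.set('sort', sort);
        params.set('order', 'desc');
    }
    return `https://api.github.com/search/${endpoint}?${params.toString()}`;
}

console.log(buildSearchUrl({ mode: 'trending', language: 'TypeScript', maxResults: 50 }));
```

Passing `now` as a parameter keeps the date arithmetic deterministic, which makes the trending window easy to check in isolation.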
Input Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| mode | String | Yes | Scraping mode: searchRepos, searchUsers, or trending |
| searchQuery | String | No | Search keywords (not used for trending mode) |
| language | String | No | Filter by programming language (e.g., Python, JavaScript) |
| sort | String | No | Sort by: stars, forks, updated, or best-match |
| maxResults | Integer | No | Maximum results to scrape (1-100, default: 30) |
| proxyConfiguration | Object | No | Proxy settings for API requests |
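The table above can be expressed as a TypeScript type plus a small normalization step. This is a sketch only: `ActorInput` and `normalizeInput` are hypothetical names, and clamping an out-of-range `maxResults` into 1-100 (rather than rejecting it) is an assumption about how such input might be handled, not documented actor behavior.

```typescript
// Hypothetical type mirroring the input table.
interface ActorInput {
    mode: 'searchRepos' | 'searchUsers' | 'trending';
    searchQuery?: string;
    language?: string;
    sort?: 'stars' | 'forks' | 'updated' | 'best-match';
    maxResults?: number;
    proxyConfiguration?: Record<string, unknown>;
}

const MODES: ReadonlyArray<ActorInput['mode']> = ['searchRepos', 'searchUsers', 'trending'];

// Hypothetical helper: validates the required field and applies documented defaults.
function normalizeInput(raw: Partial<ActorInput>): ActorInput {
    if (!raw.mode || MODES.indexOf(raw.mode) === -1) {
        throw new Error('mode is required: searchRepos, searchUsers, or trending');
    }
    // Default is 30; values outside 1-100 are clamped (an assumption, see above).
    const requested = raw.maxResults ?? 30;
    const maxResults = Math.min(100, Math.max(1, Math.floor(requested)));
    return { ...raw, mode: raw.mode, maxResults };
}

console.log(normalizeInput({ mode: 'searchRepos', searchQuery: 'machine learning', maxResults: 500 }));
```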
Example Input
Search Repositories
    {
      "mode": "searchRepos",
      "searchQuery": "machine learning",
      "language": "Python",
      "sort": "stars",
      "maxResults": 30
    }
Search Users
    {
      "mode": "searchUsers",
      "searchQuery": "javascript developer",
      "maxResults": 20
    }
Trending Repositories
    {
      "mode": "trending",
      "language": "TypeScript",
      "maxResults": 50
    }
Output Data
Repository Output
    {
      "name": "tensorflow",
      "fullName": "tensorflow/tensorflow",
      "description": "An Open Source Machine Learning Framework for Everyone",
      "url": "https://github.com/tensorflow/tensorflow",
      "stars": 185000,
      "forks": 74000,
      "language": "Python",
      "topics": ["machine-learning", "deep-learning", "tensorflow"],
      "owner": "tensorflow",
      "ownerUrl": "https://github.com/tensorflow",
      "createdAt": "2015-11-07T01:19:20Z",
      "updatedAt": "2024-02-14T12:34:56Z",
      "openIssues": 2500,
      "watchers": 185000,
      "defaultBranch": "master",
      "license": "Apache License 2.0",
      "homepage": "https://www.tensorflow.org"
    }
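For readers consuming this record, the following sketch shows how a raw GitHub Search API repository item could be mapped to the output shape above. The left-hand field names (`stargazers_count`, `forks_count`, `html_url`, and so on) are the ones GitHub returns; the interface names and the `toRepositoryOutput` helper are hypothetical, and the exact mapping the actor uses is an assumption inferred from the example record.

```typescript
// Subset of a GitHub Search API repository item (GitHub's own field names).
interface GitHubRepoItem {
    name: string;
    full_name: string;
    description: string | null;
    html_url: string;
    stargazers_count: number;
    forks_count: number;
    language: string | null;
    topics?: string[];
    owner: { login: string; html_url: string };
    created_at: string;
    updated_at: string;
    open_issues_count: number;
    watchers_count: number;
    default_branch: string;
    license: { name: string } | null;
    homepage: string | null;
}

// Shape of the repository record in the example output.
interface RepositoryOutput {
    name: string;
    fullName: string;
    description: string | null;
    url: string;
    stars: number;
    forks: number;
    language: string | null;
    topics: string[];
    owner: string;
    ownerUrl: string;
    createdAt: string;
    updatedAt: string;
    openIssues: number;
    watchers: number;
    defaultBranch: string;
    license: string | null;
    homepage: string | null;
}

// Hypothetical mapper from the API item to the output record.
function toRepositoryOutput(item: GitHubRepoItem): RepositoryOutput {
    return {
        name: item.name,
        fullName: item.full_name,
        description: item.description,
        url: item.html_url,
        stars: item.stargazers_count,
        forks: item.forks_count,
        language: item.language,
        topics: item.topics ?? [],
        owner: item.owner.login,
        ownerUrl: item.owner.html_url,
        createdAt: item.created_at,
        updatedAt: item.updated_at,
        openIssues: item.open_issues_count,
        watchers: item.watchers_count,
        defaultBranch: item.default_branch,
        license: item.license ? item.license.name : null,
        homepage: item.homepage,
    };
}
```

Note that in the GitHub REST API `watchers_count` carries the same value as `stargazers_count`, which is consistent with `stars` and `watchers` being equal in the example.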
User Output
    {
      "login": "torvalds",
      "url": "https://github.com/torvalds",
      "avatarUrl": "https://avatars.githubusercontent.com/u/1024025",
      "type": "User",
      "publicRepos": 6,
      "followers": 180000,
      "following": 0,
      "createdAt": "2011-09-03T15:26:22Z",
      "bio": "Creator of Linux and Git",
      "company": "Linux Foundation",
      "location": "Portland, OR",
      "blog": "https://torvalds-family.blogspot.com"
    }
Rate Limits
- Unauthenticated API: 60 requests per hour
- Rate Limit Handling: Automatic wait and retry if rate limit is hit
- Request Delay: 1.5 seconds between requests to avoid hitting limits
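At 1.5 seconds per request, 60 requests take about 90 seconds, so a long run can use up the hourly unauthenticated budget quickly and then has to wait for the reset. The sketch below shows one way the wait could be computed from GitHub's `x-ratelimit-remaining` and `x-ratelimit-reset` response headers (the latter is a UTC epoch time in seconds). The function names, the one-second safety margin, and the overall policy are assumptions for illustration, not the actor's actual implementation.

```typescript
const REQUEST_DELAY_MS = 1500; // fixed delay between requests, as described above

// Hypothetical helper: how long to pause before the next request.
// Header keys are expected in lowercase.
function computeWaitMs(
    status: number,
    headers: Record<string, string | undefined>,
    nowMs: number = Date.now(),
): number {
    const remaining = Number(headers['x-ratelimit-remaining']);
    const resetSec = Number(headers['x-ratelimit-reset']);
    // GitHub answers 403 or 429 with remaining = 0 once the limit is exhausted.
    const limited = (status === 403 || status === 429) && remaining === 0;
    if (limited && Number.isFinite(resetSec)) {
        // Wait until the reset time, plus a 1 s margin (the margin is an assumption).
        return Math.max(resetSec * 1000 - nowMs, 0) + 1000;
    }
    return REQUEST_DELAY_MS;
}

// Small promise-based sleep to use with the computed wait.
const sleep = (ms: number): Promise<void> =>
    new Promise<void>((resolve) => setTimeout(resolve, ms));

// Usage inside a request loop (sketch):
//   await sleep(computeWaitMs(response.statusCode, response.headers));
```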
Technical Details
- Uses GitHub REST API v3
- Built with Apify SDK 3.0 and Crawlee 3.0
- TypeScript for type safety
- gotScraping for HTTP requests with proxy support
- Multi-stage Docker build for optimized image size
Local Development
    # Install dependencies
    npm install

    # Build TypeScript
    npm run build

    # Run locally
    npm start
Deployment
Deploy to Apify platform:
    apify push
Use Cases
- Repository discovery and analysis
- Trending technology tracking
- Developer community research
- Open source project monitoring
- Programming language popularity tracking
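As an illustration of the language popularity use case, the sketch below tallies the `language` field across a set of repository records in the output format shown earlier. The `countLanguages` helper and the `'Unknown'` bucket for repositories without a detected language are hypothetical choices, and the sample records are made up for the example.

```typescript
// Hypothetical helper: counts repositories per language, most common first.
function countLanguages(
    items: Array<{ language: string | null }>,
): Array<[string, number]> {
    const counts = new Map<string, number>();
    for (const item of items) {
        const key = item.language ?? 'Unknown';
        counts.set(key, (counts.get(key) ?? 0) + 1);
    }
    return Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
}

// Made-up sample records; in practice these would be dataset items from a run.
const ranking = countLanguages([
    { language: 'Python' },
    { language: 'TypeScript' },
    { language: 'Python' },
    { language: null },
]);
console.log(ranking);
```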
License
MIT