GitHub Repository Scraper

Pricing

from $2.00 / 1,000 results

Scrape GitHub repositories, users, and trending projects via REST API. Extract repo names, stars, forks, languages, descriptions, and contributor data.

Rating: 0.0 (0)

Developer: cloud9

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 2 days ago

GitHub Scraper

Scrape GitHub repositories, users, and trending projects via the GitHub REST API.

Features

  • 3 Scraping Modes:

    • searchRepos: Search for repositories by keywords
    • searchUsers: Search for GitHub users
    • trending: Get trending repositories (created in the last 7 days, sorted by stars)
  • Filters:

    • Programming language filter
    • Sort options: stars, forks, updated, best-match
    • Configurable max results (1-100)
  • No Authentication Required: Uses GitHub's public REST API (60 requests/hour unauthenticated)
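
The three modes can be read as thin wrappers over GitHub's search endpoints. The sketch below shows one way the request URL could be built, assuming trending is a repository search with a `created:>` qualifier sorted by stars. `buildSearchUrl`, `daysAgo`, and `SearchOptions` are hypothetical names for illustration and are not taken from the actor's source.

```typescript
// Hypothetical helper: maps the documented modes onto GitHub search endpoints.
type Mode = "searchRepos" | "searchUsers" | "trending";

interface SearchOptions {
  mode: Mode;
  searchQuery?: string;
  language?: string;
  sort?: "stars" | "forks" | "updated" | "best-match";
  maxResults?: number;
}

const API_BASE = "https://api.github.com";

// Returns the date `days` days before `now` as YYYY-MM-DD (UTC).
function daysAgo(days: number, now: Date = new Date()): string {
  const ms = now.getTime() - days * 24 * 60 * 60 * 1000;
  return new Date(ms).toISOString().slice(0, 10);
}

function buildSearchUrl(opts: SearchOptions, now: Date = new Date()): string {
  const qualifiers: string[] = [];
  // Keywords are ignored in trending mode, as stated in the parameter docs.
  if (opts.mode !== "trending" && opts.searchQuery) qualifiers.push(opts.searchQuery);
  if (opts.mode === "trending") qualifiers.push(`created:>${daysAgo(7, now)}`);
  if (opts.language) qualifiers.push(`language:${opts.language}`);

  const endpoint = opts.mode === "searchUsers" ? "/search/users" : "/search/repositories";
  const params: string[] = [`q=${encodeURIComponent(qualifiers.join(" "))}`];

  // Assumption: the listed sort keys are only sent for repository searches,
  // because GitHub's user search accepts different keys (followers, repositories, joined).
  let sort: string | undefined;
  if (opts.mode === "trending") sort = "stars";
  else if (opts.mode === "searchRepos") sort = opts.sort;

  // "best-match" is GitHub's default ordering, so no sort parameter is needed for it.
  if (sort && sort !== "best-match") params.push(`sort=${sort}`, "order=desc");

  params.push(`per_page=${opts.maxResults ?? 30}`);
  return `${API_BASE}${endpoint}?${params.join("&")}`;
}
```

Because the maximum of 100 results matches GitHub's largest page size, a single search request can cover any allowed `maxResults` value in this sketch.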

Input Parameters

| Field | Type | Required | Description |
|---|---|---|---|
| mode | String | Yes | Scraping mode: searchRepos, searchUsers, or trending |
| searchQuery | String | No | Search keywords (not used for trending mode) |
| language | String | No | Filter by programming language (e.g., Python, JavaScript) |
| sort | String | No | Sort by: stars, forks, updated, or best-match |
| maxResults | Integer | No | Maximum results to scrape (1-100, default: 30) |
| proxyConfiguration | Object | No | Proxy settings for API requests |
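
The parameters above translate directly into a typed input object. The following is a minimal sketch of validating that input and applying the documented default for `maxResults`; `ActorInput` and `normalizeInput` are hypothetical names, and clamping out-of-range values (rather than rejecting them) is an assumption, since the source does not say how the actor treats them.

```typescript
// Input shape taken from the parameter list; proxyConfiguration is left loosely typed.
interface ActorInput {
  mode: "searchRepos" | "searchUsers" | "trending";
  searchQuery?: string;
  language?: string;
  sort?: "stars" | "forks" | "updated" | "best-match";
  maxResults?: number;
  proxyConfiguration?: Record<string, unknown>;
}

const MODES = ["searchRepos", "searchUsers", "trending"];
const SORTS = ["stars", "forks", "updated", "best-match"];

// Hypothetical validator: throws on unknown enum values and fills in defaults.
function normalizeInput(raw: Partial<ActorInput>): ActorInput {
  if (!raw.mode || MODES.indexOf(raw.mode) === -1) {
    throw new Error(`mode must be one of: ${MODES.join(", ")}`);
  }
  if (raw.sort !== undefined && SORTS.indexOf(raw.sort) === -1) {
    throw new Error(`sort must be one of: ${SORTS.join(", ")}`);
  }

  // Documented default is 30.
  const requested = raw.maxResults === undefined ? 30 : raw.maxResults;
  if (isNaN(requested)) {
    throw new Error("maxResults must be a number");
  }

  // Assumption: values outside the documented 1-100 range are clamped.
  const maxResults = Math.min(100, Math.max(1, Math.floor(requested)));

  return { ...raw, mode: raw.mode, maxResults };
}
```

For example, `normalizeInput({ mode: "trending", maxResults: 500 })` yields an input with `maxResults` set to 100, and an input without `mode` is rejected.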

Example Input

Search Repositories

{
  "mode": "searchRepos",
  "searchQuery": "machine learning",
  "language": "Python",
  "sort": "stars",
  "maxResults": 30
}

Search Users

{
  "mode": "searchUsers",
  "searchQuery": "javascript developer",
  "maxResults": 20
}

Trending Repositories

{
  "mode": "trending",
  "language": "TypeScript",
  "maxResults": 50
}

Output Data

Repository Output

{
  "name": "tensorflow",
  "fullName": "tensorflow/tensorflow",
  "description": "An Open Source Machine Learning Framework for Everyone",
  "url": "https://github.com/tensorflow/tensorflow",
  "stars": 185000,
  "forks": 74000,
  "language": "Python",
  "topics": ["machine-learning", "deep-learning", "tensorflow"],
  "owner": "tensorflow",
  "ownerUrl": "https://github.com/tensorflow",
  "createdAt": "2015-11-07T01:19:20Z",
  "updatedAt": "2024-02-14T12:34:56Z",
  "openIssues": 2500,
  "watchers": 185000,
  "defaultBranch": "master",
  "license": "Apache License 2.0",
  "homepage": "https://www.tensorflow.org"
}
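
The output fields above correspond closely to fields in GitHub's repository objects. As a rough sketch of that mapping, the function below converts a GitHub API repository item into the documented shape. The `snake_case` field names are GitHub's; `GitHubRepo` and `toRepoOutput` are hypothetical names, and the `null` and empty-array fallbacks are assumptions.

```typescript
// Subset of the fields GitHub returns for a repository in search results.
interface GitHubRepo {
  name: string;
  full_name: string;
  description: string | null;
  html_url: string;
  stargazers_count: number;
  forks_count: number;
  language: string | null;
  topics?: string[];
  owner: { login: string; html_url: string };
  created_at: string;
  updated_at: string;
  open_issues_count: number;
  watchers_count: number;
  default_branch: string;
  license: { name: string } | null;
  homepage: string | null;
}

// Hypothetical mapper from the API shape to the documented output record.
function toRepoOutput(repo: GitHubRepo) {
  return {
    name: repo.name,
    fullName: repo.full_name,
    description: repo.description,
    url: repo.html_url,
    stars: repo.stargazers_count,
    forks: repo.forks_count,
    language: repo.language,
    topics: repo.topics ?? [], // assumption: a missing topics list becomes []
    owner: repo.owner.login,
    ownerUrl: repo.owner.html_url,
    createdAt: repo.created_at,
    updatedAt: repo.updated_at,
    openIssues: repo.open_issues_count,
    watchers: repo.watchers_count,
    defaultBranch: repo.default_branch,
    license: repo.license ? repo.license.name : null, // unlicensed repos have no license object
    homepage: repo.homepage,
  };
}
```

In GitHub's REST API, `watchers_count` carries the same value as `stargazers_count`, which is consistent with `stars` and `watchers` being equal in the sample record.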

User Output

{
  "login": "torvalds",
  "url": "https://github.com/torvalds",
  "avatarUrl": "https://avatars.githubusercontent.com/u/1024025",
  "type": "User",
  "publicRepos": 6,
  "followers": 180000,
  "following": 0,
  "createdAt": "2011-09-03T15:26:22Z",
  "bio": "Creator of Linux and Git",
  "company": "Linux Foundation",
  "location": "Portland, OR",
  "blog": "https://torvalds-family.blogspot.com"
}
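
GitHub's user search returns only basic fields per user (such as `login`, `html_url`, `avatar_url`, and `type`); fields like `followers`, `bio`, and `public_repos` come from the per-user endpoint `GET /users/{login}`. The sketch below assumes the actor fetches that profile for each search hit, which the source does not confirm. `profileUrl`, `UserProfile`, and `toUserOutput` are hypothetical names.

```typescript
// Subset of the fields GitHub returns from GET /users/{login}.
interface UserProfile {
  login: string;
  html_url: string;
  avatar_url: string;
  type: string;
  public_repos: number;
  followers: number;
  following: number;
  created_at: string;
  bio: string | null;
  company: string | null;
  location: string | null;
  blog: string | null;
}

// URL of the profile endpoint for one user found by the search.
function profileUrl(login: string): string {
  return `https://api.github.com/users/${encodeURIComponent(login)}`;
}

// Hypothetical mapper from the profile response to the documented output record.
function toUserOutput(profile: UserProfile) {
  return {
    login: profile.login,
    url: profile.html_url,
    avatarUrl: profile.avatar_url,
    type: profile.type,
    publicRepos: profile.public_repos,
    followers: profile.followers,
    following: profile.following,
    createdAt: profile.created_at,
    bio: profile.bio,
    company: profile.company,
    location: profile.location,
    blog: profile.blog,
  };
}
```

Under this assumption, each user in the results costs one additional API request, so a large `maxResults` in searchUsers mode consumes the unauthenticated quota much faster than a repository search.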

Rate Limits

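  • Unauthenticated API: 60 requests per hour per IP address; GitHub limits search endpoints separately, at 10 requests per minute for unauthenticated clients
  • Rate Limit Handling: Automatic wait and retry if the rate limit is hit
  • Request Delay: 1.5 seconds between requests to reduce the chance of hitting limits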
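
At 1.5 seconds per request, 60 requests take about 90 seconds, so a fixed delay alone cannot keep a long run under an hourly quota; some form of waiting for the quota to reset is needed. The sketch below shows one way the wait could be computed from GitHub's `retry-after`, `x-ratelimit-remaining`, and `x-ratelimit-reset` response headers. `waitMs` and `HeaderMap` are hypothetical names, and the one-second safety margin is an assumption, not the actor's documented behaviour.

```typescript
// Lower-cased response headers, as most Node HTTP clients expose them.
type HeaderMap = Record<string, string | undefined>;

// Returns how many milliseconds to wait before sending the next request.
function waitMs(
  status: number,
  headers: HeaderMap,
  nowMs: number,
  delayMs: number = 1500 // documented delay between requests
): number {
  // An explicit retry-after value (in seconds) takes priority when present.
  const retryAfterSec = Number(headers["retry-after"]);
  if (retryAfterSec > 0) return retryAfterSec * 1000;

  // x-ratelimit-reset is a UTC epoch timestamp in seconds.
  const remaining = Number(headers["x-ratelimit-remaining"]);
  const resetSec = Number(headers["x-ratelimit-reset"]);
  const limited = (status === 403 || status === 429) && remaining === 0;
  if (limited && resetSec > 0) {
    // Assumption: add a 1 s margin so the retry lands after the reset.
    return Math.max(resetSec * 1000 - nowMs, 0) + 1000;
  }

  // Not rate limited: fall back to the fixed delay.
  return delayMs;
}
```

A caller would sleep for the returned duration and then retry the same request, passing `Date.now()` as `nowMs`.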

Technical Details

  • Uses GitHub REST API v3
  • Built with Apify SDK 3.0 and Crawlee 3.0
  • TypeScript for type safety
  • gotScraping for HTTP requests with proxy support
  • Multi-stage Docker build for optimized image size

Local Development

# Install dependencies
npm install
# Build TypeScript
npm run build
# Run locally
npm start

Deployment

Deploy to the Apify platform:

apify push

Use Cases

  • Repository discovery and analysis
  • Trending technology tracking
  • Developer community research
  • Open source project monitoring
  • Programming language popularity tracking

License

MIT