GitHub Repository Analyzer
Pricing
from $8.00 / 1,000 results
GitHub Repository Analyzer extracts comprehensive repository metrics using the official GitHub API: stars, forks, watchers, contributors, commit activity, and issues/PRs.
Developer

Kirill Y
Analyze GitHub repositories with comprehensive stats: stars, forks, watchers, contributors, commit activity, issues, PRs, and more. Uses official GitHub API for reliable, structured data.
What This Actor Does
Extracts repository data including:
- Basic metrics: stars, forks, watchers, open issues
- Repository metadata: language, topics, license, creation/update dates
- Contributor data: top contributors with contribution counts
- Commit activity: 52-week activity stats and recent commit trends
- Issues and pull requests: counts and status
- Repository settings: archived status, wiki/issues enabled, fork status
Perfect for:
- Open source project research and discovery
- Developer portfolio analysis
- Technology trend monitoring
- Competitive intelligence for SaaS products
- Academic research on open source ecosystems
- Identifying active maintainers and contributors
- Tracking project health and community engagement
Why Use the API (Not Web Scraping)?
This actor uses GitHub's official REST API via Octokit, GitHub's recommended client library. Benefits:
- Reliable: The API is stable and versioned, whereas web scraping breaks whenever the UI changes
- Fast: Structured JSON responses, no DOM parsing overhead
- Complete: API provides data not visible in web UI (contributor stats, commit activity)
- Legal: The API is GitHub's supported route for automated access, while scraping the web UI is restricted by GitHub's Acceptable Use Policies
- Rate limiting: Automatic throttling with @octokit/plugin-throttling prevents blocking
- Accurate: Data comes directly from GitHub's database, not rendered HTML
Input Configuration
Required Fields
repositories (array of strings)
- GitHub repository URLs to analyze
- Example: ["https://github.com/apify/crawlee", "https://github.com/mozilla/readability"]
- Supports both HTTPS URLs and SSH format
- Automatically strips the .git suffix
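The URL handling described above can be sketched as a small normalizer. This is only an illustration under stated assumptions: the helper name `parseRepoUrl`, the `RepoRef` type, and the regular expressions are hypothetical and are not taken from the actor's source.

```typescript
// Hypothetical helper; the actor's real parsing logic may differ.
interface RepoRef {
  owner: string;
  repo: string;
}

function parseRepoUrl(input: string): RepoRef | null {
  const trimmed = input.trim();
  // HTTPS form: https://github.com/<owner>/<repo>[.git][/...]
  const httpsForm = /^https?:\/\/github\.com\/([^\/\s]+)\/([^\/\s?#]+)/i;
  // SSH form: git@github.com:<owner>/<repo>[.git]
  const sshForm = /^git@github\.com:([^\/\s]+)\/([^\/\s]+)$/i;

  const match = httpsForm.exec(trimmed) || sshForm.exec(trimmed);
  if (match === null) return null;

  const owner = match[1];
  const rawRepo = match[2];
  if (!owner || !rawRepo) return null;

  // Drop a trailing ".git" so clone URLs and browser URLs normalize alike.
  const repo = rawRepo.replace(/\.git$/, "");
  return repo.length > 0 ? { owner, repo } : null;
}
```

Returning `null` for unrecognized input lets a caller log the bad entry and move on to the next repository instead of aborting the batch.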
Optional Fields
githubToken (string, secret)
- Personal access token for higher rate limits (5,000 req/hour vs 60 unauthenticated)
- No scopes required for public repository analysis
- Marked as secret (not logged or exposed)
- Default: empty (unauthenticated mode)
includeContributors (boolean)
- Fetch contributor data with contribution counts
- Adds 1-2 API calls per repository (with pagination)
- Default: true
maxContributors (integer)
- Limit contributor list size to reduce API calls
- Range: 1-500
- Default: 100
includeCommitActivity (boolean)
- Fetch 52-week commit activity statistics
- Adds 1 API call per repository
- Default: true
includeIssuesPRs (boolean)
- Include issues and pull requests counts
- No extra API calls (included in basic metadata)
- Default: true
Example Input
{
  "repositories": [
    "https://github.com/apify/crawlee",
    "https://github.com/mozilla/readability",
    "https://github.com/facebook/react"
  ],
  "githubToken": "ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  "includeContributors": true,
  "maxContributors": 50,
  "includeCommitActivity": true,
  "includeIssuesPRs": true
}
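For callers that build the input programmatically, the schema and its defaults can be written down as a type plus a resolver. The field names and default values come from the input description above. The `AnalyzerInput` and `resolveInput` names are hypothetical, and clamping `maxContributors` into the 1-500 range is an assumption (the actor may reject out-of-range values instead).

```typescript
// Field names and defaults follow the input description; the helper is a sketch.
interface AnalyzerInput {
  repositories: string[];
  githubToken?: string;
  includeContributors?: boolean;
  maxContributors?: number;
  includeCommitActivity?: boolean;
  includeIssuesPRs?: boolean;
}

type ResolvedInput = Required<AnalyzerInput>;

function resolveInput(input: AnalyzerInput): ResolvedInput {
  const requested = input.maxContributors ?? 100;
  return {
    repositories: input.repositories,
    // An empty token means unauthenticated mode.
    githubToken: input.githubToken ?? "",
    includeContributors: input.includeContributors ?? true,
    // Assumption: out-of-range values are clamped into the documented 1-500 range.
    maxContributors: Math.min(500, Math.max(1, Math.trunc(requested))),
    includeCommitActivity: input.includeCommitActivity ?? true,
    includeIssuesPRs: input.includeIssuesPRs ?? true,
  };
}
```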
Getting a GitHub Token
- Go to GitHub Settings → Developer settings → Personal access tokens
- Click "Generate new token (classic)"
- No scopes required for public repository analysis
- Copy the token and add to actor input (automatically marked as secret)
Note: Tokens are optional but highly recommended for analyzing more than a few repositories.
Output Format
Array of repository objects with comprehensive metadata:
Basic Information
- repository: Repository in owner/repo format
- url: HTML URL to repository
- description: Repository description text
Metrics
- stars: Number of stars (stargazers)
- forks: Number of forks
- watchers: Number of watchers
- openIssues: Number of open issues (includes PRs)
Metadata
- language: Primary programming language
- topics: Array of repository topics/tags
- license: License name (e.g., "MIT License")
- createdAt: Repository creation date (ISO 8601)
- updatedAt: Last update date (ISO 8601)
- pushedAt: Last push date (ISO 8601)
- size: Repository size in KB
- defaultBranch: Default branch name (e.g., "main", "master")
Status Flags
- isArchived: Whether repository is archived
- isFork: Whether repository is a fork
- hasWiki: Whether repository has wiki enabled
- hasIssues: Whether repository has issues enabled
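The fields listed so far line up with the response of GitHub's REST "get a repository" endpoint. The sketch below shows one plausible mapping from that response to the output names. The `GitHubRepoResponse` subset and the `toRepositoryRecord` helper are hypothetical names, and the actor's real mapping may differ in details such as null handling.

```typescript
// Subset of the GitHub REST repository response that the mapping needs.
interface GitHubRepoResponse {
  full_name: string;
  html_url: string;
  description: string | null;
  stargazers_count: number;
  forks_count: number;
  watchers_count: number;
  open_issues_count: number;
  language: string | null;
  topics?: string[];
  license: { name: string } | null;
  created_at: string;
  updated_at: string;
  pushed_at: string;
  size: number;
  default_branch: string;
  archived: boolean;
  fork: boolean;
  has_wiki: boolean;
  has_issues: boolean;
}

// Hypothetical mapper from the API response to the documented output fields.
function toRepositoryRecord(r: GitHubRepoResponse) {
  return {
    repository: r.full_name,
    url: r.html_url,
    description: r.description,
    stars: r.stargazers_count,
    forks: r.forks_count,
    // In the REST API, watchers_count reports the same number as stargazers_count.
    watchers: r.watchers_count,
    // open_issues_count includes open pull requests.
    openIssues: r.open_issues_count,
    language: r.language,
    topics: r.topics ?? [],
    license: r.license ? r.license.name : null,
    createdAt: r.created_at,
    updatedAt: r.updated_at,
    pushedAt: r.pushed_at,
    size: r.size,
    defaultBranch: r.default_branch,
    isArchived: r.archived,
    isFork: r.fork,
    hasWiki: r.has_wiki,
    hasIssues: r.has_issues,
  };
}
```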
Contributor Data (if includeContributors enabled)
- contributors: Array of contributor objects:
  - login: GitHub username
  - contributions: Number of contributions
  - avatarUrl: Avatar image URL
- contributorCount: Number of contributors fetched
Commit Activity (if includeCommitActivity enabled)
- commitActivity: Array of weekly activity objects (last 12 weeks):
  - week: Unix timestamp for week start
  - total: Total commits in week
  - days: Array of commits per day (Sunday-Saturday)
- recentCommitActivity: Total commits in last 12 weeks
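As an illustration of how the two commit activity fields relate, the sketch below trims a weekly series to its last 12 entries and sums their totals. It assumes the input is ordered oldest to newest, as in GitHub's 52-week commit activity statistics. The `summarizeRecentActivity` name is hypothetical, and the actor may compute these fields differently.

```typescript
interface WeeklyActivity {
  week: number;   // Unix timestamp for the start of the week
  total: number;  // total commits in the week
  days: number[]; // commits per day, Sunday through Saturday
}

// Hypothetical helper: keep the most recent weeks and total their commits.
function summarizeRecentActivity(weeks: WeeklyActivity[], recentWeeks: number = 12) {
  // slice(-0) would return the whole array, so handle non-positive counts explicitly.
  const commitActivity = recentWeeks > 0 ? weeks.slice(-recentWeeks) : [];
  const recentCommitActivity = commitActivity.reduce((sum, w) => sum + w.total, 0);
  return { commitActivity, recentCommitActivity };
}
```

If fewer than 12 weeks are available, the helper simply sums whatever is present.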
Metadata
scrapedAt: Timestamp when data was collected (ISO 8601)
Example Output
{
  "repository": "apify/crawlee",
  "url": "https://github.com/apify/crawlee",
  "description": "Crawlee—A web scraping and browser automation library...",
  "stars": 21720,
  "forks": 1207,
  "watchers": 21720,
  "openIssues": 176,
  "language": "TypeScript",
  "topics": ["crawler", "scraping", "puppeteer", "playwright"],
  "license": "Apache License 2.0",
  "createdAt": "2016-08-26T18:35:03Z",
  "updatedAt": "2026-02-17T08:10:39Z",
  "pushedAt": "2026-02-17T07:39:59Z",
  "size": 157102,
  "defaultBranch": "master",
  "isArchived": false,
  "isFork": false,
  "hasWiki": true,
  "hasIssues": true,
  "contributors": [
    {
      "login": "mnmkng",
      "contributions": 1523,
      "avatarUrl": "https://avatars.githubusercontent.com/u/..."
    }
  ],
  "contributorCount": 50,
  "recentCommitActivity": 234,
  "scrapedAt": "2026-02-17T08:45:59.500Z"
}
Rate Limits
GitHub enforces rate limits based on authentication:
Without token: 60 requests per hour
- Can analyze ~6-10 repositories (with all features enabled)
- Suitable for testing or small batches
With token: 5,000 requests per hour
- Can analyze ~500-700 repositories (with all features enabled)
- Recommended for production use
API calls per repository:
- Basic metadata: 1 call (always)
- Contributors: 1-2 calls (if enabled, with pagination)
- Commit activity: 1 call (if enabled)
- Total: 1-4 calls per repository (1 with all optional features disabled, 3-4 with all enabled)
Automatic Rate Limit Handling
The actor automatically handles rate limits with @octokit/plugin-throttling:
- Primary rate limits: Retries up to 3 times with exponential backoff
- Secondary rate limits: Always retries (burst protection)
- Logging: Warns when limits are hit and shows retry attempts
- Graceful degradation: Continues processing other repositories if one fails
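With @octokit/plugin-throttling, retry decisions are made in the `onRateLimit` and `onSecondaryRateLimit` callbacks, which return `true` to request a retry. The sketch below shows callbacks that match the behavior listed above (up to 3 retries for primary limits, always retry for secondary limits, a warning on each hit). The factory name `createThrottleHandlers`, the `ThrottledRequest` type, and the message wording are hypothetical. The wait between attempts is left to the plugin and is not modeled here.

```typescript
// Subset of the request options the plugin passes to its callbacks.
interface ThrottledRequest {
  method: string;
  url: string;
}

const MAX_PRIMARY_RETRIES = 3;

// Hypothetical factory; `warn` receives one message per rate limit hit.
function createThrottleHandlers(warn: (message: string) => void) {
  return {
    // Primary rate limit: retry while fewer than MAX_PRIMARY_RETRIES retries were made.
    onRateLimit(
      retryAfter: number,
      options: ThrottledRequest,
      _octokit: unknown,
      retryCount: number,
    ): boolean {
      warn(`Rate limit hit for ${options.method} ${options.url}; retry after ${retryAfter}s (retries so far: ${retryCount})`);
      return retryCount < MAX_PRIMARY_RETRIES;
    },
    // Secondary (burst) rate limit: always ask the plugin to retry.
    onSecondaryRateLimit(retryAfter: number, options: ThrottledRequest): boolean {
      warn(`Secondary rate limit hit for ${options.method} ${options.url}; retry after ${retryAfter}s`);
      return true;
    },
  };
}

// Intended wiring (not executed here, requires the Octokit packages):
//   const ThrottledOctokit = Octokit.plugin(throttling);
//   new ThrottledOctokit({ auth: token, throttle: createThrottleHandlers(console.warn) });
```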
Reducing API Calls
To analyze more repositories within rate limits:
- Set includeContributors: false (saves 1-2 calls per repo)
- Set includeCommitActivity: false (saves 1 call per repo)
- Reduce maxContributors (reduces pagination overhead)
With all features disabled, you can analyze up to 60 repositories per hour (unauthenticated) or 5,000 per hour (with token).
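The per-feature call costs listed under Rate Limits can be turned into a rough budget calculation. The sketch below is plain arithmetic on those listed costs, with hypothetical names (`estimateCallsPerRepo`, `reposPerHour`). It does not model retries or any extra requests, so real throughput can be lower, as the more conservative practical estimates quoted earlier suggest.

```typescript
interface FeatureFlags {
  includeContributors: boolean;
  includeCommitActivity: boolean;
}

interface Range {
  min: number;
  max: number;
}

// Calls per repository, using the costs listed under "API calls per repository".
function estimateCallsPerRepo(flags: FeatureFlags): Range {
  let min = 1; // basic metadata is always fetched
  let max = 1;
  if (flags.includeContributors) {
    min += 1; // 1-2 calls depending on pagination
    max += 2;
  }
  if (flags.includeCommitActivity) {
    min += 1; // 1 call
    max += 1;
  }
  return { min, max };
}

// Repositories that fit in an hourly request budget (60 or 5,000).
function reposPerHour(hourlyLimit: number, calls: Range): Range {
  return {
    min: Math.floor(hourlyLimit / calls.max), // worst case: most calls per repo
    max: Math.floor(hourlyLimit / calls.min), // best case: fewest calls per repo
  };
}
```

For example, with both optional features disabled `estimateCallsPerRepo` yields one call per repository, which gives the 60 and 5,000 repositories per hour stated above.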
Technical Details
API Client: Octokit (@octokit/rest) - GitHub's official JavaScript SDK
Rate Limiting: @octokit/plugin-throttling for automatic throttling and retry logic
Pagination: Efficiently fetches up to maxContributors with pagination support
Runtime: Node.js 24 on apify/actor-node:24 base image
Dataset: Output stored in Apify dataset with configurable views
Error Handling
The actor is designed to be robust and production-ready:
- Per-repository errors: One failed repository doesn't crash entire batch
- Invalid URLs: Logs error and continues to next repository
- API errors: Catches 404 (not found), 403 (rate limit), 500 (server error) and continues
- Rate limit exhaustion: Automatically retries with exponential backoff
- Partial data: Returns available data even if some API calls fail (e.g., commit activity unavailable)
Error results include an error field with the failure reason for debugging.
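The per-repository isolation described above can be sketched as a loop that converts each failure into an error result instead of rethrowing. Only the `error` field is documented. The `repository` key on error results, the message wording, and the helper names (`describeFailure`, `toErrorResult`, `analyzeBatch`) are assumptions made for illustration.

```typescript
interface ErrorResult {
  repository: string;
  error: string;
}

// Read a numeric HTTP status from an error object, if it carries one.
function extractStatus(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null) {
    const candidate = (err as { status?: unknown }).status;
    if (typeof candidate === "number") return candidate;
  }
  return undefined;
}

function describeFailure(err: unknown): string {
  const code = extractStatus(err);
  if (code === 404) return "Repository not found (404)";
  if (code === 403) return "Rate limit exceeded or access forbidden (403)";
  if (code !== undefined && code >= 500) return `GitHub server error (${code})`;
  return err instanceof Error ? err.message : String(err);
}

function toErrorResult(repository: string, err: unknown): ErrorResult {
  return { repository, error: describeFailure(err) };
}

// Process repositories one by one; a failure is recorded and the loop continues.
async function analyzeBatch<T>(
  repositories: string[],
  analyzeOne: (repository: string) => Promise<T>,
): Promise<Array<T | ErrorResult>> {
  const results: Array<T | ErrorResult> = [];
  for (const repository of repositories) {
    try {
      results.push(await analyzeOne(repository));
    } catch (err) {
      results.push(toErrorResult(repository, err));
    }
  }
  return results;
}
```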
Changelog
See ./CHANGELOG.md for version history.