GitHub Repository Analyzer

Pricing

from $8.00 / 1,000 results

GitHub Repository Analyzer extracts comprehensive repository metrics using the official GitHub API: stars, forks, watchers, contributors, commit activity, and issues/PRs.

Developer

Kirill Y (Maintained by Community)


Analyze GitHub repositories with comprehensive stats: stars, forks, watchers, contributors, commit activity, issues, PRs, and more. Uses official GitHub API for reliable, structured data.

What This Actor Does

Extracts repository data including:

  • Basic metrics: stars, forks, watchers, open issues
  • Repository metadata: language, topics, license, creation/update dates
  • Contributor data: top contributors with contribution counts
  • Commit activity: 52-week activity stats and recent commit trends
  • Issues and pull requests: counts and status
  • Repository settings: archived status, wiki/issues enabled, fork status

Perfect for:

  • Open source project research and discovery
  • Developer portfolio analysis
  • Technology trend monitoring
  • Competitive intelligence for SaaS products
  • Academic research on open source ecosystems
  • Identifying active maintainers and contributors
  • Tracking project health and community engagement

Why Use the API (Not Web Scraping)?

This actor uses GitHub's official REST API via Octokit, GitHub's recommended client library. Benefits:

  • Reliable: API is stable and versioned, web scraping breaks with UI changes
  • Fast: Structured JSON responses, no DOM parsing overhead
  • Complete: API provides data not visible in web UI (contributor stats, commit activity)
  • Legal: No ToS violations (GitHub prohibits web scraping but encourages API use)
  • Rate limiting: Automatic throttling with @octokit/plugin-throttling prevents blocking
  • Accurate: Data comes directly from GitHub's database, not rendered HTML

Input Configuration

Required Fields

repositories (array of strings)

  • GitHub repository URLs to analyze
  • Example: ["https://github.com/apify/crawlee", "https://github.com/mozilla/readability"]
  • Supports both HTTPS URLs and SSH format
  • Automatically strips .git suffix
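The URL handling described above can be sketched as a small normalizer. This is an illustrative helper (parseRepoUrl is a hypothetical name, not the actor's actual code), assuming the two accepted formats and the .git-stripping behavior listed above:

```javascript
// Sketch of repository URL normalization: accepts HTTPS URLs and SSH
// remotes, strips a trailing ".git", and returns "owner/repo".
function parseRepoUrl(input) {
  // SSH format: git@github.com:owner/repo(.git)
  const ssh = input.match(/^git@github\.com:(.+?)(?:\.git)?$/);
  if (ssh) return ssh[1];
  // HTTPS format: https://github.com/owner/repo(.git)
  const https = input.match(/^https?:\/\/github\.com\/([^/]+\/[^/]+?)(?:\.git)?\/?$/);
  if (https) return https[1];
  throw new Error(`Unrecognized GitHub URL: ${input}`);
}
```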

Optional Fields

githubToken (string, secret)

  • Personal access token for higher rate limits (5,000 req/hour vs 60 unauthenticated)
  • No scopes required for public repository analysis
  • Marked as secret (not logged or exposed)
  • Default: empty (unauthenticated mode)

includeContributors (boolean)

  • Fetch contributor data with contribution counts
  • Adds 1-2 API calls per repository (with pagination)
  • Default: true

maxContributors (integer)

  • Limit contributor list size to reduce API calls
  • Range: 1-500
  • Default: 100

includeCommitActivity (boolean)

  • Fetch 52-week commit activity statistics
  • Adds 1 API call per repository
  • Default: true

includeIssuesPRs (boolean)

  • Include issues and pull requests counts
  • No extra API calls (included in basic metadata)
  • Default: true

Example Input

{
  "repositories": [
    "https://github.com/apify/crawlee",
    "https://github.com/mozilla/readability",
    "https://github.com/facebook/react"
  ],
  "githubToken": "ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  "includeContributors": true,
  "maxContributors": 50,
  "includeCommitActivity": true,
  "includeIssuesPRs": true
}

Getting a GitHub Token

  1. Go to GitHub Settings → Developer settings → Personal access tokens
  2. Click "Generate new token (classic)"
  3. No scopes required for public repository analysis
  4. Copy the token and add to actor input (automatically marked as secret)

Note: Tokens are optional but highly recommended for analyzing more than a few repositories.

Output Format

Array of repository objects with comprehensive metadata:

Basic Information

  • repository: Repository in owner/repo format
  • url: HTML URL to repository
  • description: Repository description text

Metrics

  • stars: Number of stars (stargazers)
  • forks: Number of forks
  • watchers: Number of watchers
  • openIssues: Number of open issues (includes PRs)

Metadata

  • language: Primary programming language
  • topics: Array of repository topics/tags
  • license: License name (e.g., "MIT License")
  • createdAt: Repository creation date (ISO 8601)
  • updatedAt: Last update date (ISO 8601)
  • pushedAt: Last push date (ISO 8601)
  • size: Repository size in KB
  • defaultBranch: Default branch name (e.g., "main", "master")

Status Flags

  • isArchived: Whether repository is archived
  • isFork: Whether repository is a fork
  • hasWiki: Whether repository has wiki enabled
  • hasIssues: Whether repository has issues enabled

Contributor Data (if includeContributors enabled)

  • contributors: Array of contributor objects:
    • login: GitHub username
    • contributions: Number of contributions
    • avatarUrl: Avatar image URL
  • contributorCount: Number of contributors fetched

Commit Activity (if includeCommitActivity enabled)

  • commitActivity: Array of weekly activity objects (last 12 weeks):
    • week: Unix timestamp for week start
    • total: Total commits in week
    • days: Array of commits per day (Sunday-Saturday)
  • recentCommitActivity: Total commits in last 12 weeks
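The recentCommitActivity field can be understood as a sum over the trailing weekly entries. This is a hypothetical derivation (sumRecentCommits is not the actor's actual code), assuming the weekly objects have the shape documented above:

```javascript
// Hypothetical derivation of recentCommitActivity: sum the "total"
// field of the last 12 weekly entries from the 52-week stats.
// weeks: [{ week: <unix ts>, total: <commits>, days: [...] }, ...]
function sumRecentCommits(weeks, windowWeeks = 12) {
  return weeks.slice(-windowWeeks).reduce((sum, w) => sum + w.total, 0);
}
```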

Run Metadata

  • scrapedAt: Timestamp when data was collected (ISO 8601)

Example Output

{
  "repository": "apify/crawlee",
  "url": "https://github.com/apify/crawlee",
  "description": "Crawlee—A web scraping and browser automation library...",
  "stars": 21720,
  "forks": 1207,
  "watchers": 21720,
  "openIssues": 176,
  "language": "TypeScript",
  "topics": ["crawler", "scraping", "puppeteer", "playwright"],
  "license": "Apache License 2.0",
  "createdAt": "2016-08-26T18:35:03Z",
  "updatedAt": "2026-02-17T08:10:39Z",
  "pushedAt": "2026-02-17T07:39:59Z",
  "size": 157102,
  "defaultBranch": "master",
  "isArchived": false,
  "isFork": false,
  "hasWiki": true,
  "hasIssues": true,
  "contributors": [
    {
      "login": "mnmkng",
      "contributions": 1523,
      "avatarUrl": "https://avatars.githubusercontent.com/u/..."
    }
  ],
  "contributorCount": 50,
  "recentCommitActivity": 234,
  "scrapedAt": "2026-02-17T08:45:59.500Z"
}

Rate Limits

GitHub enforces rate limits based on authentication:

Without token: 60 requests per hour

  • Can analyze ~6-10 repositories (with all features enabled)
  • Suitable for testing or small batches

With token: 5,000 requests per hour

  • Can analyze ~500-700 repositories (with all features enabled)
  • Recommended for production use

API calls per repository:

  • Basic metadata: 1 call (always)
  • Contributors: 1-2 calls (if enabled, with pagination)
  • Commit activity: 1 call (if enabled)
  • Total: 2-4 calls per repository
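The call counts above translate into a simple budget calculation. The following sketch is illustrative (the function names are hypothetical), assuming one contributor page unless pagination is needed:

```javascript
// Sketch of the per-repository API call count described above.
function callsPerRepo({
  includeContributors = true,
  includeCommitActivity = true,
  contributorPages = 1, // 1-2 pages depending on maxContributors
} = {}) {
  let calls = 1; // basic metadata, always fetched
  if (includeContributors) calls += contributorPages;
  if (includeCommitActivity) calls += 1;
  return calls;
}

// How many repositories fit into an hourly rate limit.
function reposPerHour(rateLimit, options) {
  return Math.floor(rateLimit / callsPerRepo(options));
}
```

For example, with a token (5,000 req/hour) and all features enabled, the budget works out to well over a thousand single-page-contributor repositories in theory; the ~500-700 estimate above leaves headroom for contributor pagination and retries.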

Automatic Rate Limit Handling

The actor automatically handles rate limits using @octokit/plugin-throttling:

  • Primary rate limits: Retries up to 3 times with exponential backoff
  • Secondary rate limits: Always retries (burst protection)
  • Logging: Warns when limits are hit and shows retry attempts
  • Graceful degradation: Continues processing other repositories if one fails
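The retry policy above maps onto the callback shape that @octokit/plugin-throttling expects. The following is a sketch of the throttle options only, not the actor's actual configuration; returning true from a callback tells the plugin to retry:

```javascript
// Sketch of a throttle configuration matching the policy above:
// primary limits retry up to 3 times, secondary limits always retry.
const throttle = {
  onRateLimit: (retryAfter, options, octokit, retryCount) => {
    console.warn(`Rate limit hit for ${options.method} ${options.url}, retry in ${retryAfter}s`);
    return retryCount < 3; // retry up to 3 times
  },
  onSecondaryRateLimit: (retryAfter, options) => {
    console.warn(`Secondary rate limit hit for ${options.method} ${options.url}`);
    return true; // always retry (burst protection)
  },
};
```

These options would be passed to an Octokit constructor with the throttling plugin applied, e.g. `new MyOctokit({ auth: token, throttle })`.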

Reducing API Calls

To analyze more repositories within rate limits:

  1. Set includeContributors: false (saves 1-2 calls per repo)
  2. Set includeCommitActivity: false (saves 1 call per repo)
  3. Reduce maxContributors (reduces pagination overhead)

With all features disabled, you can analyze up to 60 repositories per hour (unauthenticated) or 5,000 per hour (with token).

Technical Details

  • API client: Octokit (@octokit/rest), GitHub's official JavaScript SDK
  • Rate limiting: @octokit/plugin-throttling for automatic throttling and retry logic
  • Pagination: efficiently fetches up to maxContributors with pagination support
  • Runtime: Node.js 24 on the apify/actor-node:24 base image
  • Dataset: output stored in an Apify dataset with configurable views

Error Handling

The actor is designed to be robust and production-ready:

  • Per-repository errors: One failed repository doesn't crash entire batch
  • Invalid URLs: Logs error and continues to next repository
  • API errors: Catches 404 (not found), 403 (rate limit), 500 (server error) and continues
  • Rate limit exhaustion: Automatically retries with exponential backoff
  • Partial data: Returns available data even if some API calls fail (e.g., commit activity unavailable)

Error results include an error field with the failure reason for debugging.
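The per-repository isolation described above can be sketched as a loop that converts failures into error records. This is illustrative code, assuming a hypothetical analyzeRepo() that throws on API failures:

```javascript
// Sketch of per-repository error isolation: one failed repository
// yields a result with an "error" field instead of crashing the batch.
async function analyzeAll(repos, analyzeRepo) {
  const results = [];
  for (const repo of repos) {
    try {
      results.push(await analyzeRepo(repo));
    } catch (err) {
      console.warn(`Skipping ${repo}: ${err.message}`);
      results.push({ repository: repo, error: err.message });
    }
  }
  return results;
}
```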

Changelog

See ./CHANGELOG.md for version history.

Resources