GitHub Repository Analyzer

Pricing

from $8.00 / 1,000 results

GitHub Repository Analyzer extracts comprehensive repository metrics using the official GitHub API: stars, forks, watchers, contributors, commit activity, and issues/PRs.

Developer

Kirill Y (Maintained by Community)


Analyze GitHub repositories with comprehensive stats: stars, forks, watchers, contributors, commit activity, issues, PRs, and more. Uses official GitHub API for reliable, structured data.

What This Actor Does

Extracts repository data including:

  • Basic metrics: stars, forks, watchers, open issues
  • Repository metadata: language, topics, license, creation/update dates
  • Contributor data: top contributors with contribution counts
  • Commit activity: 52-week activity stats and recent commit trends
  • Issues and pull requests: counts and status
  • Repository settings: archived status, wiki/issues enabled, fork status

Perfect for:

  • Open source project research and discovery
  • Developer portfolio analysis
  • Technology trend monitoring
  • Competitive intelligence for SaaS products
  • Academic research on open source ecosystems
  • Identifying active maintainers and contributors
  • Tracking project health and community engagement

Why Use the API (Not Web Scraping)?

This actor uses GitHub's official REST API via Octokit, GitHub's recommended client library. Benefits:

  • Reliable: API is stable and versioned, web scraping breaks with UI changes
  • Fast: Structured JSON responses, no DOM parsing overhead
  • Complete: API provides data not visible in web UI (contributor stats, commit activity)
  • Legal: No ToS violations (GitHub prohibits web scraping but encourages API use)
  • Rate limiting: Automatic throttling with @octokit/plugin-throttling prevents blocking
  • Accurate: Data comes directly from GitHub's database, not rendered HTML

Input Configuration

Required Fields

repositories (array of strings)

  • GitHub repository URLs to analyze
  • Example: ["https://github.com/apify/crawlee", "https://github.com/mozilla/readability"]
  • Supports both HTTPS URLs and SSH format
  • Automatically strips .git suffix
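The URL handling described above can be sketched as a small normalizer. This is an illustrative helper (parseRepoUrl is a hypothetical name, not the actor's actual code), assuming the two accepted formats and the .git-stripping behavior listed above:

```javascript
// Sketch of repository URL normalization: accepts HTTPS URLs and SSH
// remotes, strips a trailing ".git", and returns "owner/repo".
function parseRepoUrl(input) {
  // SSH format: git@github.com:owner/repo(.git)
  const ssh = input.match(/^git@github\.com:(.+?)(?:\.git)?$/);
  if (ssh) return ssh[1];
  // HTTPS format: https://github.com/owner/repo(.git)
  const https = input.match(/^https?:\/\/github\.com\/([^/]+\/[^/]+?)(?:\.git)?\/?$/);
  if (https) return https[1];
  throw new Error(`Unrecognized GitHub URL: ${input}`);
}
```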

Optional Fields

githubToken (string, secret)

  • Personal access token for higher rate limits (5,000 req/hour vs 60 unauthenticated)
  • No scopes required for public repository analysis
  • Marked as secret (not logged or exposed)
  • Default: empty (unauthenticated mode)

includeContributors (boolean)

  • Fetch contributor data with contribution counts
  • Adds 1-2 API calls per repository (with pagination)
  • Default: true

maxContributors (integer)

  • Limit contributor list size to reduce API calls
  • Range: 1-500
  • Default: 100

includeCommitActivity (boolean)

  • Fetch 52-week commit activity statistics
  • Adds 1 API call per repository
  • Default: true

includeIssuesPRs (boolean)

  • Include issues and pull requests counts
  • No extra API calls (included in basic metadata)
  • Default: true

Example Input

{
  "repositories": [
    "https://github.com/apify/crawlee",
    "https://github.com/mozilla/readability",
    "https://github.com/facebook/react"
  ],
  "githubToken": "ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  "includeContributors": true,
  "maxContributors": 50,
  "includeCommitActivity": true,
  "includeIssuesPRs": true
}

Getting a GitHub Token

  1. Go to GitHub Settings → Developer settings → Personal access tokens
  2. Click "Generate new token (classic)"
  3. No scopes required for public repository analysis
  4. Copy the token and add to actor input (automatically marked as secret)

Note: Tokens are optional but highly recommended for analyzing more than a few repositories.

Output Format

Array of repository objects with comprehensive metadata:

Basic Information

  • repository: Repository in owner/repo format
  • url: HTML URL to repository
  • description: Repository description text

Metrics

  • stars: Number of stars (stargazers)
  • forks: Number of forks
  • watchers: Number of watchers
  • openIssues: Number of open issues (includes PRs)

Metadata

  • language: Primary programming language
  • topics: Array of repository topics/tags
  • license: License name (e.g., "MIT License")
  • createdAt: Repository creation date (ISO 8601)
  • updatedAt: Last update date (ISO 8601)
  • pushedAt: Last push date (ISO 8601)
  • size: Repository size in KB
  • defaultBranch: Default branch name (e.g., "main", "master")

Status Flags

  • isArchived: Whether repository is archived
  • isFork: Whether repository is a fork
  • hasWiki: Whether repository has wiki enabled
  • hasIssues: Whether repository has issues enabled

Contributor Data (if includeContributors enabled)

  • contributors: Array of contributor objects:
    • login: GitHub username
    • contributions: Number of contributions
    • avatarUrl: Avatar image URL
  • contributorCount: Number of contributors fetched

Commit Activity (if includeCommitActivity enabled)

  • commitActivity: Array of weekly activity objects (last 12 weeks):
    • week: Unix timestamp for week start
    • total: Total commits in week
    • days: Array of commits per day (Sunday-Saturday)
  • recentCommitActivity: Total commits in last 12 weeks
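The recentCommitActivity field can be understood as a sum over the trailing weekly entries. This is a hypothetical derivation (sumRecentCommits is not the actor's actual code), assuming the weekly objects have the shape documented above:

```javascript
// Hypothetical derivation of recentCommitActivity: sum the "total"
// field of the last 12 weekly entries from the 52-week stats.
// weeks: [{ week: <unix ts>, total: <commits>, days: [...] }, ...]
function sumRecentCommits(weeks, windowWeeks = 12) {
  return weeks.slice(-windowWeeks).reduce((sum, w) => sum + w.total, 0);
}
```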

Run Metadata

  • scrapedAt: Timestamp when data was collected (ISO 8601)

Example Output

{
  "repository": "apify/crawlee",
  "url": "https://github.com/apify/crawlee",
  "description": "Crawlee—A web scraping and browser automation library...",
  "stars": 21720,
  "forks": 1207,
  "watchers": 21720,
  "openIssues": 176,
  "language": "TypeScript",
  "topics": ["crawler", "scraping", "puppeteer", "playwright"],
  "license": "Apache License 2.0",
  "createdAt": "2016-08-26T18:35:03Z",
  "updatedAt": "2026-02-17T08:10:39Z",
  "pushedAt": "2026-02-17T07:39:59Z",
  "size": 157102,
  "defaultBranch": "master",
  "isArchived": false,
  "isFork": false,
  "hasWiki": true,
  "hasIssues": true,
  "contributors": [
    {
      "login": "mnmkng",
      "contributions": 1523,
      "avatarUrl": "https://avatars.githubusercontent.com/u/..."
    }
  ],
  "contributorCount": 50,
  "recentCommitActivity": 234,
  "scrapedAt": "2026-02-17T08:45:59.500Z"
}

Rate Limits

GitHub enforces rate limits based on authentication:

Without token: 60 requests per hour

  • Can analyze ~6-10 repositories (with all features enabled)
  • Suitable for testing or small batches

With token: 5,000 requests per hour

  • Can analyze ~500-700 repositories (with all features enabled)
  • Recommended for production use

API calls per repository:

  • Basic metadata: 1 call (always)
  • Contributors: 1-2 calls (if enabled, with pagination)
  • Commit activity: 1 call (if enabled)
  • Total: 2-4 calls per repository
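The call counts above translate into a simple budget calculation. The following sketch is illustrative (the function names are hypothetical), assuming one contributor page unless pagination is needed:

```javascript
// Sketch of the per-repository API call count described above.
function callsPerRepo({
  includeContributors = true,
  includeCommitActivity = true,
  contributorPages = 1, // 1-2 pages depending on maxContributors
} = {}) {
  let calls = 1; // basic metadata, always fetched
  if (includeContributors) calls += contributorPages;
  if (includeCommitActivity) calls += 1;
  return calls;
}

// How many repositories fit into an hourly rate limit.
function reposPerHour(rateLimit, options) {
  return Math.floor(rateLimit / callsPerRepo(options));
}
```

For example, with a token (5,000 req/hour) and all features enabled, the budget works out to well over a thousand single-page-contributor repositories in theory; the ~500-700 estimate above leaves headroom for contributor pagination and retries.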

Automatic Rate Limit Handling

The actor automatically handles rate limits using @octokit/plugin-throttling:

  • Primary rate limits: Retries up to 3 times with exponential backoff
  • Secondary rate limits: Always retries (burst protection)
  • Logging: Warns when limits are hit and shows retry attempts
  • Graceful degradation: Continues processing other repositories if one fails
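The retry policy above maps onto the callback shape that @octokit/plugin-throttling expects. The following is a sketch of the throttle options only, not the actor's actual configuration; returning true from a callback tells the plugin to retry:

```javascript
// Sketch of a throttle configuration matching the policy above:
// primary limits retry up to 3 times, secondary limits always retry.
const throttle = {
  onRateLimit: (retryAfter, options, octokit, retryCount) => {
    console.warn(`Rate limit hit for ${options.method} ${options.url}, retry in ${retryAfter}s`);
    return retryCount < 3; // retry up to 3 times
  },
  onSecondaryRateLimit: (retryAfter, options) => {
    console.warn(`Secondary rate limit hit for ${options.method} ${options.url}`);
    return true; // always retry (burst protection)
  },
};
```

These options would be passed to an Octokit constructor with the throttling plugin applied, e.g. `new MyOctokit({ auth: token, throttle })`.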

Reducing API Calls

To analyze more repositories within rate limits:

  1. Set includeContributors: false (saves 1-2 calls per repo)
  2. Set includeCommitActivity: false (saves 1 call per repo)
  3. Reduce maxContributors (reduces pagination overhead)

With all features disabled, you can analyze up to 60 repositories per hour (unauthenticated) or 5,000 per hour (with token).

Technical Details

  • API client: Octokit (@octokit/rest), GitHub's official JavaScript SDK
  • Rate limiting: @octokit/plugin-throttling for automatic throttling and retry logic
  • Pagination: efficiently fetches up to maxContributors with pagination support
  • Runtime: Node.js 24 on the apify/actor-node:24 base image
  • Dataset: output stored in an Apify dataset with configurable views

Error Handling

The actor is designed to be robust and production-ready:

  • Per-repository errors: One failed repository doesn't crash entire batch
  • Invalid URLs: Logs error and continues to next repository
  • API errors: Catches 404 (not found), 403 (rate limit), 500 (server error) and continues
  • Rate limit exhaustion: Automatically retries with exponential backoff
  • Partial data: Returns available data even if some API calls fail (e.g., commit activity unavailable)

Error results include an error field with the failure reason for debugging.
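The per-repository isolation described above can be sketched as a loop that converts failures into error records. This is illustrative code, assuming a hypothetical analyzeRepo() that throws on API failures:

```javascript
// Sketch of per-repository error isolation: one failed repository
// yields a result with an "error" field instead of crashing the batch.
async function analyzeAll(repos, analyzeRepo) {
  const results = [];
  for (const repo of repos) {
    try {
      results.push(await analyzeRepo(repo));
    } catch (err) {
      console.warn(`Skipping ${repo}: ${err.message}`);
      results.push({ repository: repo, error: err.message });
    }
  }
  return results;
}
```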

Changelog

See ./CHANGELOG.md for version history.

Resources