# GitHub Repository Analyzer for AI Agents (`project_bbb/github-repo-analyzer`) Actor

Extracts structured data from GitHub repositories for AI agents and RAG pipelines. Supports extraction of the README, file tree, dependencies, issues, and contributors, with multiple output formats.

- **URL**: https://apify.com/project\_bbb/github-repo-analyzer.md
- **Developed by:** [JARVIS](https://apify.com/project_bbb) (community)
- **Categories:** AI, Developer tools, Open source
- **Stats:** 2 total users, 0 monthly users, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is free to use; you pay only for the Apify platform usage it consumes, which gets cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## GitHub Repository Analyzer for AI Agents

Extract structured data from GitHub repositories, optimized for AI agent pipelines, RAG systems, and automated code analysis.

### What it does

This Actor takes one or more GitHub repository URLs and extracts comprehensive structured data:

- **Repository metadata**: Stars, forks, language, license, topics, timestamps
- **README content**: Full Markdown content for RAG ingestion
- **File tree**: Directory structure with configurable depth
- **Dependencies**: Parsed from package.json, requirements.txt, go.mod, Cargo.toml
- **Issues**: Open issues with labels and comments count
- **Contributors**: Top contributors with contribution counts
- **Language breakdown**: Bytes per language
- **Releases**: Recent releases with changelogs

### Output formats

| Format | Description | Use case |
|--------|-------------|----------|
| `ai-optimized` | Includes a pre-formatted summary field for direct LLM consumption | RAG pipelines, AI agents |
| `full` | All available data in structured JSON | Data analysis, indexing |
| `compact` | Essential fields only, minimal payload | Quick lookups, dashboards |

### Example use cases

- **AI Agent code understanding**: Feed repository structure to an AI agent that needs to understand a codebase
- **Tech stack analysis**: Analyze dependencies across hundreds of repositories
- **Open source intelligence**: Track repository health, contributor activity, and release cadence
- **RAG knowledge base**: Build a searchable knowledge base from GitHub repositories

### Rate limits

- Without token: 60 requests/hour (roughly 10 repos with full analysis)
- With token: 5,000 requests/hour (hundreds of repos)

For best results, provide a GitHub personal access token in the input.

### Input example

```json
{
    "repoUrls": [
        "https://github.com/apify/crawlee",
        "https://github.com/microsoft/TypeScript"
    ],
    "includeReadme": true,
    "includeFileTree": true,
    "fileTreeDepth": 3,
    "includeDependencies": true,
    "includeLanguages": true,
    "outputFormat": "ai-optimized"
}
````

### Output example

```json
{
    "repoUrl": "https://github.com/apify/crawlee",
    "name": "crawlee",
    "fullName": "apify/crawlee",
    "description": "Crawlee—A web scraping and browser automation library...",
    "stars": 15000,
    "forks": 800,
    "language": "TypeScript",
    "license": "Apache-2.0",
    "topics": ["web-scraping", "crawler", "automation"],
    "dependencies": {
        "packageManager": "npm",
        "dependencies": { ... }
    },
    "aiSummary": "## apify/crawlee\n..."
}
```
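For RAG ingestion, an item like the one above can be mapped to a text-plus-metadata record. The following is a minimal sketch, not part of the Actor: the helper name `to_rag_document` and the record layout (`id`, `text`, `metadata`) are assumptions chosen for illustration, and only fields shown in the output example are read.

```python
def to_rag_document(item):
    """Map one dataset item to a text-plus-metadata record for indexing.

    Hypothetical helper: the record layout is an assumption, not an Actor feature.
    """
    return {
        "id": item["fullName"],
        # Prefer the pre-formatted summary; fall back to the description.
        "text": item.get("aiSummary") or item.get("description", ""),
        "metadata": {
            "source": item["repoUrl"],
            "language": item.get("language"),
            "license": item.get("license"),
            "stars": item.get("stars"),
            "topics": item.get("topics", []),
        },
    }


# Sample item trimmed from the output example above (summary left truncated as shown).
sample = {
    "repoUrl": "https://github.com/apify/crawlee",
    "fullName": "apify/crawlee",
    "stars": 15000,
    "language": "TypeScript",
    "license": "Apache-2.0",
    "topics": ["web-scraping", "crawler", "automation"],
    "aiSummary": "## apify/crawlee\n...",
}

doc = to_rag_document(sample)
print(doc["id"], doc["metadata"]["topics"])
```

Keeping the summary as the indexed text and the scalar fields as metadata lets a vector store filter by language or license without re-parsing the summary.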

### Cost

This Actor uses minimal compute. Each repository requires 3-10 GitHub API calls depending on options selected. With the free Apify plan, you can analyze hundreds of repositories per month.
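As a rough worked estimate, the 3-10 calls per repository can be combined with the GitHub quotas listed under Rate limits (60 requests/hour without a token, 5,000 with one). The sketch below only does that division; the function name `repos_per_hour` is hypothetical, and real throughput may differ because the Actor's exact call pattern is not documented here.

```python
def repos_per_hour(requests_per_hour, calls_per_repo):
    """Whole repositories that fit in one hour of GitHub API quota."""
    if calls_per_repo <= 0:
        raise ValueError("calls_per_repo must be positive")
    return requests_per_hour // calls_per_repo


for label, quota in (("without token", 60), ("with token", 5000)):
    best = repos_per_hour(quota, 3)    # lightest run: 3 calls per repository
    worst = repos_per_hour(quota, 10)  # heaviest run: 10 calls per repository
    print(f"{label}: {worst} to {best} repositories per hour")
```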

# Actor input Schema

## `repoUrls` (type: `array`):

List of GitHub repository URLs to analyze (e.g., https://github.com/owner/repo)

## `includeReadme` (type: `boolean`):

Extract and include the README content (Markdown)

## `includeFileTree` (type: `boolean`):

Extract the repository file/directory structure

## `fileTreeDepth` (type: `integer`):

Maximum depth for file tree traversal (0 = root only)

## `includeDependencies` (type: `boolean`):

Parse and include dependency information (package.json, requirements.txt, go.mod, etc.)

## `includeIssues` (type: `boolean`):

Fetch open issues

## `issueLimit` (type: `integer`):

Maximum number of issues to fetch per repository

## `includeContributors` (type: `boolean`):

Fetch contributor information

## `includeLanguages` (type: `boolean`):

Fetch language breakdown statistics

## `includeReleases` (type: `boolean`):

Fetch recent releases/tags

## `releaseLimit` (type: `integer`):

Maximum number of releases to fetch

## `githubToken` (type: `string`):

Personal access token for higher rate limits (5000 req/h vs 60 req/h). Required for private repos.

## `outputFormat` (type: `string`):

How to structure the output data

## Actor input object example

```json
{
  "repoUrls": [
    "https://github.com/apify/crawlee"
  ],
  "includeReadme": true,
  "includeFileTree": true,
  "fileTreeDepth": 3,
  "includeDependencies": true,
  "includeIssues": false,
  "issueLimit": 20,
  "includeContributors": false,
  "includeLanguages": true,
  "includeReleases": false,
  "releaseLimit": 5,
  "outputFormat": "ai-optimized"
}
```
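Before starting a run, it can help to check an input object locally. The sketch below is an assumption-laden illustration rather than the Actor's own validation: the bounds and allowed `outputFormat` values are copied from the OpenAPI specification in this document, while the helper name `validate_input` and its error messages are hypothetical.

```python
# Bounds and enum values copied from the OpenAPI specification in this document.
INT_BOUNDS = {"fileTreeDepth": (0, 10), "issueLimit": (1, 100), "releaseLimit": (1, 50)}
OUTPUT_FORMATS = {"full", "compact", "ai-optimized"}


def validate_input(run_input):
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []

    # repoUrls is the only required field and must be a list of strings.
    urls = run_input.get("repoUrls")
    if not isinstance(urls, list) or not all(isinstance(u, str) for u in urls):
        problems.append("repoUrls is required and must be a list of strings")

    # Optional integer fields must fall inside their documented ranges.
    for name, (low, high) in INT_BOUNDS.items():
        if name in run_input:
            value = run_input[name]
            is_int = isinstance(value, int) and not isinstance(value, bool)
            if not is_int or not (low <= value <= high):
                problems.append(f"{name} must be an integer between {low} and {high}")

    # outputFormat defaults to "ai-optimized" when omitted.
    if run_input.get("outputFormat", "ai-optimized") not in OUTPUT_FORMATS:
        problems.append("outputFormat must be one of: " + ", ".join(sorted(OUTPUT_FORMATS)))

    return problems


good = validate_input({"repoUrls": ["https://github.com/apify/crawlee"], "fileTreeDepth": 3})
bad = validate_input({"fileTreeDepth": 11, "outputFormat": "xml"})
print(good)
print(bad)
```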

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "repoUrls": [
        "https://github.com/apify/crawlee"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("project_bbb/github-repo-analyzer").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "repoUrls": ["https://github.com/apify/crawlee"] }

# Run the Actor and wait for it to finish
run = client.actor("project_bbb/github-repo-analyzer").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
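Once items have been fetched as in the example above, they can be aggregated locally, for instance for the tech stack analysis use case. The sketch below is illustrative only: the helper `count_dependencies` is hypothetical, and it assumes that `item["dependencies"]["dependencies"]` maps package names to version strings, which the output example elides and so does not confirm.

```python
from collections import Counter


def count_dependencies(items):
    """Count how many repositories declare each dependency name."""
    counts = Counter()
    for item in items:
        # Assumed shape: item["dependencies"]["dependencies"] maps name -> version.
        declared = (item.get("dependencies") or {}).get("dependencies") or {}
        for name in declared:
            counts[name] += 1
    return counts


# Hypothetical dataset items, trimmed to the fields used here.
items = [
    {
        "fullName": "example/a",
        "dependencies": {
            "packageManager": "npm",
            "dependencies": {"got": "^14.0.0", "cheerio": "^1.0.0"},
        },
    },
    {
        "fullName": "example/b",
        "dependencies": {"packageManager": "npm", "dependencies": {"got": "^13.0.0"}},
    },
    {"fullName": "example/c"},  # repository without parsed dependencies
]

usage = count_dependencies(items)
print(usage.most_common(1))
```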

## CLI example

```bash
echo '{
  "repoUrls": [
    "https://github.com/apify/crawlee"
  ]
}' |
apify call project_bbb/github-repo-analyzer --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=project_bbb/github-repo-analyzer",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "GitHub Repository Analyzer for AI Agents",
        "description": "Extracts structured data from GitHub repositories for AI agents and RAG pipelines. Supports README, file tree, dependencies, issues, contributors extraction with multiple output formats.",
        "version": "1.0",
        "x-build-id": "vZFK3oQ1SbuW7sr5f"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/project_bbb~github-repo-analyzer/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-project_bbb-github-repo-analyzer",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/project_bbb~github-repo-analyzer/runs": {
            "post": {
                "operationId": "runs-sync-project_bbb-github-repo-analyzer",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/project_bbb~github-repo-analyzer/run-sync": {
            "post": {
                "operationId": "run-sync-project_bbb-github-repo-analyzer",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "repoUrls"
                ],
                "properties": {
                    "repoUrls": {
                        "title": "Repository URLs",
                        "type": "array",
                        "description": "List of GitHub repository URLs to analyze (e.g., https://github.com/owner/repo)",
                        "items": {
                            "type": "string"
                        }
                    },
                    "includeReadme": {
                        "title": "Include README",
                        "type": "boolean",
                        "description": "Extract and include the README content (Markdown)",
                        "default": true
                    },
                    "includeFileTree": {
                        "title": "Include File Tree",
                        "type": "boolean",
                        "description": "Extract the repository file/directory structure",
                        "default": true
                    },
                    "fileTreeDepth": {
                        "title": "File Tree Depth",
                        "minimum": 0,
                        "maximum": 10,
                        "type": "integer",
                        "description": "Maximum depth for file tree traversal (0 = root only)",
                        "default": 3
                    },
                    "includeDependencies": {
                        "title": "Include Dependencies",
                        "type": "boolean",
                        "description": "Parse and include dependency information (package.json, requirements.txt, go.mod, etc.)",
                        "default": true
                    },
                    "includeIssues": {
                        "title": "Include Issues",
                        "type": "boolean",
                        "description": "Fetch open issues",
                        "default": false
                    },
                    "issueLimit": {
                        "title": "Issue Limit",
                        "minimum": 1,
                        "maximum": 100,
                        "type": "integer",
                        "description": "Maximum number of issues to fetch per repository",
                        "default": 20
                    },
                    "includeContributors": {
                        "title": "Include Contributors",
                        "type": "boolean",
                        "description": "Fetch contributor information",
                        "default": false
                    },
                    "includeLanguages": {
                        "title": "Include Languages",
                        "type": "boolean",
                        "description": "Fetch language breakdown statistics",
                        "default": true
                    },
                    "includeReleases": {
                        "title": "Include Releases",
                        "type": "boolean",
                        "description": "Fetch recent releases/tags",
                        "default": false
                    },
                    "releaseLimit": {
                        "title": "Release Limit",
                        "minimum": 1,
                        "maximum": 50,
                        "type": "integer",
                        "description": "Maximum number of releases to fetch",
                        "default": 5
                    },
                    "githubToken": {
                        "title": "GitHub Token (Optional)",
                        "type": "string",
                        "description": "Personal access token for higher rate limits (5000 req/h vs 60 req/h). Required for private repos."
                    },
                    "outputFormat": {
                        "title": "Output Format",
                        "enum": [
                            "full",
                            "compact",
                            "ai-optimized"
                        ],
                        "type": "string",
                        "description": "How to structure the output data",
                        "default": "ai-optimized"
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
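The paths in the specification above can also be called without a client library. The following sketch builds, but does not send, a request to the `run-sync-get-dataset-items` path using only the Python standard library. The helper name `build_request` is hypothetical, and the shape of the response body (assumed here to be a JSON array of dataset items) is not stated by the specification, which documents only a `200` response.

```python
import json
import urllib.parse
import urllib.request

# Server URL and path taken from the OpenAPI specification above.
API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/project_bbb~github-repo-analyzer/run-sync-get-dataset-items"


def build_request(token, run_input):
    """Build a POST request with the token in the query string and JSON input as the body."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request(
    "<YOUR_API_TOKEN>",
    {"repoUrls": ["https://github.com/apify/crawlee"]},
)
print(request.get_method(), request.full_url)

# Sending is left commented out so the sketch has no side effects.
# The response is assumed to be a JSON array of dataset items:
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)
```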
