GitHub Repository Analyzer for AI Agents

Pricing

Pay per usage


Extracts structured data from GitHub repositories for AI agents and RAG pipelines. Supports extraction of the README, file tree, dependencies, issues, and contributors, with multiple output formats.

Developer

JARVIS

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 0
  • Last modified: 2 days ago

Extract structured data from GitHub repositories, optimized for AI agent pipelines, RAG systems, and automated code analysis.

What it does

This Actor takes one or more GitHub repository URLs and extracts comprehensive structured data:

  • Repository metadata: Stars, forks, language, license, topics, timestamps
  • README content: Full Markdown content for RAG ingestion
  • File tree: Directory structure with configurable depth
  • Dependencies: Parsed from package.json, requirements.txt, go.mod, Cargo.toml
  • Issues: Open issues with labels and comment counts
  • Contributors: Top contributors with contribution counts
  • Language breakdown: Bytes per language
  • Releases: Recent releases with changelogs
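
To make "configurable depth" concrete, here is a minimal sketch of trimming a list of repository paths to a maximum depth. It is an illustration only, not the Actor's own code: the helper name `limit_tree_depth`, the sample paths, and the assumption that depth counts path components from the repository root are all hypothetical.

```python
def limit_tree_depth(paths, max_depth):
    """Return the sorted, de-duplicated entries visible within max_depth levels.

    Assumption: depth is the number of path components counted from the
    repository root, so "src/core" has depth 2.
    """
    visible = set()
    for path in paths:
        parts = path.strip("/").split("/")
        # Keep each ancestor directory and the entry itself, up to max_depth components.
        for depth in range(1, min(len(parts), max_depth) + 1):
            visible.add("/".join(parts[:depth]))
    return sorted(visible)


# Hypothetical file list, not taken from a real repository.
sample_paths = [
    "README.md",
    "src/index.ts",
    "src/core/crawler.ts",
    "src/core/utils/url.ts",
]

print(limit_tree_depth(sample_paths, 2))
# ['README.md', 'src', 'src/core', 'src/index.ts']
```

A shallow depth keeps the payload small for LLM prompts, while a deeper tree gives an agent more of the layout to reason about.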

Output formats

Format | Description | Use case
ai-optimized | Includes a pre-formatted summary field for direct LLM consumption | RAG pipelines, AI agents
full | All available data in structured JSON | Data analysis, indexing
compact | Essential fields only, minimal payload | Quick lookups, dashboards

Example use cases

  • AI Agent code understanding: Feed repository structure to an AI agent that needs to understand a codebase
  • Tech stack analysis: Analyze dependencies across hundreds of repositories
  • Open source intelligence: Track repository health, contributor activity, and release cadence
  • RAG knowledge base: Build a searchable knowledge base from GitHub repositories
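
As a sketch of the tech stack analysis use case, the snippet below counts how many repositories declare each dependency. It assumes each output record carries a nested `dependencies.dependencies` mapping of package name to version, which matches the shape hinted at in the output example but is not confirmed in detail; the function name and the sample records are hypothetical.

```python
from collections import Counter


def count_dependencies(records):
    """Count how many repositories declare each dependency name."""
    counts = Counter()
    for record in records:
        # Assumed shape: {"dependencies": {"packageManager": ..., "dependencies": {name: version}}}
        deps = (record.get("dependencies") or {}).get("dependencies") or {}
        if isinstance(deps, dict):
            counts.update(deps.keys())
    return counts


# Hypothetical records for illustration; real output may differ.
sample_records = [
    {"fullName": "example/one", "dependencies": {"packageManager": "npm",
                                                 "dependencies": {"typescript": "^5.0.0", "got": "^13.0.0"}}},
    {"fullName": "example/two", "dependencies": {"packageManager": "npm",
                                                 "dependencies": {"typescript": "^5.4.0"}}},
    {"fullName": "example/three"},  # repository with no parsed dependencies
]

print(count_dependencies(sample_records).most_common(2))
# [('typescript', 2), ('got', 1)]
```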

Rate limits

  • Without token: 60 requests/hour (roughly 10 repos with full analysis)
  • With token: 5,000 requests/hour (hundreds of repos)

For best results, provide a GitHub personal access token in the input.
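
As a rough check on these figures, the arithmetic below estimates how many repositories fit into one hour of quota, using the 3 to 10 GitHub API calls per repository quoted in the Cost section. The helper name is illustrative, and the estimate ignores any other traffic on the same quota.

```python
def repos_per_hour(requests_per_hour, calls_per_repo):
    """Whole repositories that fit into an hourly request quota."""
    return requests_per_hour // calls_per_repo


for label, limit in [("without token", 60), ("with token", 5000)]:
    worst_case = repos_per_hour(limit, 10)  # every option enabled: about 10 calls per repo
    best_case = repos_per_hour(limit, 3)    # minimal options: about 3 calls per repo
    print(f"{label}: {worst_case} to {best_case} repos/hour")

# without token: 6 to 20 repos/hour
# with token: 500 to 1666 repos/hour
```

The "roughly 10 repos" figure above sits inside the unauthenticated range, closer to the full-analysis end.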

Input example

{
  "repoUrls": [
    "https://github.com/apify/crawlee",
    "https://github.com/microsoft/TypeScript"
  ],
  "includeReadme": true,
  "includeFileTree": true,
  "fileTreeDepth": 3,
  "includeDependencies": true,
  "includeLanguages": true,
  "outputFormat": "ai-optimized"
}
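
When the input is generated programmatically, it can help to validate it before starting a run. The sketch below builds the same input as above using only the field names shown in the example. The helper `build_run_input`, its URL pattern, and its checks are assumptions made for illustration; the Actor's real input schema may accept more fields or other URL forms.

```python
import json
import re

# Assumed URL shape: https://github.com/<owner>/<repo>, optional trailing slash.
GITHUB_REPO_URL = re.compile(r"^https://github\.com/[\w.-]+/[\w.-]+/?$")
OUTPUT_FORMATS = {"ai-optimized", "full", "compact"}


def build_run_input(repo_urls, output_format="ai-optimized", file_tree_depth=3):
    """Build an input object using the field names from the input example."""
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unknown output format: {output_format!r}")
    invalid = [url for url in repo_urls if not GITHUB_REPO_URL.match(url)]
    if invalid:
        raise ValueError(f"not a GitHub repository URL: {invalid}")
    return {
        "repoUrls": list(repo_urls),
        "includeReadme": True,
        "includeFileTree": True,
        "fileTreeDepth": file_tree_depth,
        "includeDependencies": True,
        "includeLanguages": True,
        "outputFormat": output_format,
    }


run_input = build_run_input(
    ["https://github.com/apify/crawlee", "https://github.com/microsoft/TypeScript"]
)
print(json.dumps(run_input, indent=2))
```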

Output example

{
  "repoUrl": "https://github.com/apify/crawlee",
  "name": "crawlee",
  "fullName": "apify/crawlee",
  "description": "Crawlee—A web scraping and browser automation library...",
  "stars": 15000,
  "forks": 800,
  "language": "TypeScript",
  "license": "Apache-2.0",
  "topics": ["web-scraping", "crawler", "automation"],
  "dependencies": {
    "packageManager": "npm",
    "dependencies": { ... }
  },
  "aiSummary": "# apify/crawlee\n..."
}
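
For RAG ingestion, a record like the one above can be turned into a text chunk with metadata. This is a minimal sketch, assuming the field names in the output example. The function `to_rag_document` and the text-plus-metadata layout are illustrative choices, not something the Actor prescribes.

```python
def to_rag_document(record):
    """Convert one output record into a text chunk with metadata for indexing."""
    text = record.get("aiSummary")
    if not text:
        # Fallback for formats without a summary field: compose a short header.
        title = record.get("fullName") or record.get("name") or "unknown"
        text = f"# {title}\n{record.get('description', '')}"
    metadata = {
        "source": record.get("repoUrl"),
        "language": record.get("language"),
        "license": record.get("license"),
        "stars": record.get("stars"),
        "topics": record.get("topics", []),
    }
    return {"text": text, "metadata": metadata}


# Record trimmed from the output example; elided values are left out.
record = {
    "repoUrl": "https://github.com/apify/crawlee",
    "name": "crawlee",
    "fullName": "apify/crawlee",
    "stars": 15000,
    "language": "TypeScript",
    "license": "Apache-2.0",
    "topics": ["web-scraping", "crawler", "automation"],
}

doc = to_rag_document(record)
print(doc["text"])
print(doc["metadata"]["source"])
```

Keeping the repository URL in the metadata lets a retrieval step cite the source alongside the retrieved text.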

Cost

This Actor uses minimal compute. Each repository requires 3-10 GitHub API calls depending on options selected. With the free Apify plan, you can analyze hundreds of repositories per month.