GitHub Repository Analyzer for AI Agents
Pricing: Pay per usage
Extracts structured data from GitHub repositories for AI agents and RAG pipelines. Supports extraction of the README, file tree, dependencies, issues, and contributors, with multiple output formats.
Developer: JARVIS
Extract structured data from GitHub repositories, optimized for AI agent pipelines, RAG systems, and automated code analysis.
What it does
This Actor takes one or more GitHub repository URLs and extracts the following structured data:
- Repository metadata: Stars, forks, language, license, topics, timestamps
- README content: Full Markdown content for RAG ingestion
- File tree: Directory structure with configurable depth
- Dependencies: Parsed from package.json, requirements.txt, go.mod, or Cargo.toml
- Issues: Open issues with labels and comment counts
- Contributors: Top contributors with contribution counts
- Language breakdown: Bytes per language
- Releases: Recent releases with changelogs
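This page does not say how each category is fetched. As a rough sketch, assuming the Actor relies on the public GitHub REST API, the categories above line up with the following documented endpoints. The helper names `parse_repo_url` and `endpoints_for`, the default branch `main`, and checking manifests only at the repository root are all assumptions for illustration.

```python
from urllib.parse import urlparse

API_ROOT = "https://api.github.com"
# Manifest files named on this page; looking only at the repo root is an assumption.
MANIFESTS = ("package.json", "requirements.txt", "go.mod", "Cargo.toml")


def parse_repo_url(url: str) -> tuple[str, str]:
    """Split a github.com repository URL into (owner, repo)."""
    parsed = urlparse(url)
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        raise ValueError(f"not a github.com URL: {url}")
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"expected https://github.com/<owner>/<repo>, got: {url}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner, repo


def endpoints_for(url: str, default_branch: str = "main") -> dict[str, str]:
    """Map each data category to a GitHub REST API URL (hypothetical helper).

    In practice default_branch would be read from the metadata response's
    "default_branch" field instead of being guessed.
    """
    owner, repo = parse_repo_url(url)
    base = f"{API_ROOT}/repos/{owner}/{repo}"
    urls = {
        "metadata": base,                       # stars, forks, license, topics, timestamps
        "readme": f"{base}/readme",             # content is base64-encoded by default
        "file_tree": f"{base}/git/trees/{default_branch}?recursive=1",
        "issues": f"{base}/issues?state=open",  # also returns pull requests
        "contributors": f"{base}/contributors",
        "languages": f"{base}/languages",       # bytes per language
        "releases": f"{base}/releases",
    }
    for name in MANIFESTS:
        urls[f"manifest:{name}"] = f"{base}/contents/{name}"
    return urls
```

Note that GitHub's issues listing includes pull requests (those items carry a `pull_request` key), so an "open issues" count built from that endpoint would need to filter them out.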
Output formats
| Format | Description | Use case |
|---|---|---|
| `ai-optimized` | Includes a pre-formatted summary field for direct LLM consumption | RAG pipelines, AI agents |
| `full` | All available data in structured JSON | Data analysis, indexing |
| `compact` | Essential fields only, minimal payload | Quick lookups, dashboards |
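To make the difference between the formats concrete, here is a hypothetical sketch of reducing a `full` record to a compact one and rendering an `aiSummary`-style Markdown string. The field names come from the output example further down; which fields the Actor really keeps in `compact` mode, and how it words its summary, are not documented, so `COMPACT_FIELDS`, `to_compact`, and `build_ai_summary` are assumptions.

```python
from typing import Any

# Hypothetical "essential" fields; the Actor's real compact schema is not documented.
COMPACT_FIELDS = ("fullName", "description", "stars", "forks", "language", "license")


def to_compact(record: dict[str, Any]) -> dict[str, Any]:
    """Keep only the essential fields that are present in a full record."""
    return {key: record[key] for key in COMPACT_FIELDS if key in record}


def build_ai_summary(record: dict[str, Any]) -> str:
    """Render a short Markdown summary that starts with '# owner/repo'."""
    lines = [f"# {record['fullName']}"]
    if record.get("description"):
        lines.append(record["description"])
    lines.append(
        f"Language: {record.get('language', 'unknown')} | "
        f"Stars: {record.get('stars', 0)} | "
        f"License: {record.get('license', 'none')}"
    )
    topics = record.get("topics") or []
    if topics:
        lines.append("Topics: " + ", ".join(topics))
    return "\n".join(lines)
```

A plain-text summary like this can be dropped straight into a prompt, while the structured fields stay available for filtering and indexing.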
Example use cases
- AI agent code understanding: Feed the repository structure to an AI agent that needs to understand a codebase
- Tech stack analysis: Analyze dependencies across hundreds of repositories
- Open source intelligence: Track repository health, contributor activity, and release cadence
- RAG knowledge base: Build a searchable knowledge base from GitHub repositories
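For the tech stack use case, records from many repositories can be aggregated after a run. The sketch below counts how many repositories declare each package, assuming (from the output example) that the inner `dependencies` object maps package names to version specifiers; `dependency_counts` is a hypothetical helper, not part of the Actor.

```python
from collections import Counter
from typing import Any, Iterable


def dependency_counts(records: Iterable[dict[str, Any]]) -> Counter:
    """Count how many repositories declare each (packageManager, package) pair."""
    counts: Counter = Counter()
    for record in records:
        deps = record.get("dependencies") or {}
        manager = deps.get("packageManager", "unknown")
        # Assumed shape: {"packageManager": "npm", "dependencies": {name: version}}
        for name in deps.get("dependencies") or {}:
            counts[(manager, name)] += 1
    return counts
```

Calling `dependency_counts(records).most_common(20)` then gives the most widely used packages across the analyzed repositories.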
Rate limits
- Without token: 60 requests/hour (roughly 10 repos with full analysis)
- With token: 5,000 requests/hour (hundreds of repos)
For best results, provide a GitHub personal access token in the input.
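These estimates follow from dividing the hourly request budget by the per-repository call count given under Cost (3 to 10 calls). A quick check of that arithmetic; `repos_per_hour` is an illustrative helper, and it ignores retries and pagination, which would use up additional requests.

```python
def repos_per_hour(hourly_limit: int, calls_per_repo: int) -> int:
    """Whole repositories that fit in one hourly rate-limit window."""
    if calls_per_repo <= 0:
        raise ValueError("calls_per_repo must be positive")
    return hourly_limit // calls_per_repo


for label, limit in (("without token", 60), ("with token", 5000)):
    most = repos_per_hour(limit, 3)    # lightest option set
    fewest = repos_per_hour(limit, 10)  # heaviest option set
    print(f"{label}: {fewest} to {most} repos/hour")
# without token: 6 to 20 repos/hour
# with token: 500 to 1666 repos/hour
```

At the heavy end of the range, the unauthenticated limit covers about 6 repositories per hour, which is why a token matters for batch runs.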
Input example
```json
{
  "repoUrls": [
    "https://github.com/apify/crawlee",
    "https://github.com/microsoft/TypeScript"
  ],
  "includeReadme": true,
  "includeFileTree": true,
  "fileTreeDepth": 3,
  "includeDependencies": true,
  "includeLanguages": true,
  "outputFormat": "ai-optimized"
}
```
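`fileTreeDepth` caps how many directory levels appear in the file tree. How the Actor counts levels is not documented; the sketch below assumes a top-level entry is depth 1 and works on the flat list of `path` strings that GitHub's recursive tree endpoint returns. `limit_depth` and `render_tree` are hypothetical helpers.

```python
def _path_key(path: str) -> list[str]:
    # Sort by path segments so a directory's children follow it directly.
    return path.split("/")


def limit_depth(paths: list[str], max_depth: int) -> list[str]:
    """Keep paths with at most max_depth segments (top-level entries are depth 1)."""
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    kept = [p for p in paths if p.count("/") + 1 <= max_depth]
    return sorted(kept, key=_path_key)


def render_tree(paths: list[str]) -> str:
    """Indent each entry by its depth and show only its last segment."""
    lines = []
    for path in sorted(paths, key=_path_key):
        depth = path.count("/")
        lines.append("  " * depth + path.rsplit("/", 1)[-1])
    return "\n".join(lines)
```

An indented outline like this is compact enough to paste into an LLM prompt. For very large repositories, GitHub marks the recursive tree response as `truncated`, so some entries may be missing regardless of depth.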
Output example
```json
{
  "repoUrl": "https://github.com/apify/crawlee",
  "name": "crawlee",
  "fullName": "apify/crawlee",
  "description": "Crawlee—A web scraping and browser automation library...",
  "stars": 15000,
  "forks": 800,
  "language": "TypeScript",
  "license": "Apache-2.0",
  "topics": ["web-scraping", "crawler", "automation"],
  "dependencies": {
    "packageManager": "npm",
    "dependencies": { ... }
  },
  "aiSummary": "# apify/crawlee\n..."
}
```
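Before embedding, Markdown such as the `aiSummary` or README content usually has to be split into chunks. A minimal sketch that splits at Markdown headings and then caps chunk length; the heading-based strategy, the `max_chars` default, and the `chunk_markdown` name are assumptions, not something this page says the Actor does for you.

```python
import re

# ATX headings ("# ", "## ", ... "###### ") at the start of a line.
# Lines inside fenced code blocks that start with "# " will also match.
HEADING = re.compile(r"^#{1,6} ", re.MULTILINE)


def chunk_markdown(text: str, max_chars: int = 1500) -> list[str]:
    """Split Markdown at headings, then cut oversized sections into max_chars pieces."""
    starts = [m.start() for m in HEADING.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any text before the first heading
    bounds = starts + [len(text)]
    chunks = []
    for begin, end in zip(bounds, bounds[1:]):
        section = text[begin:end].strip()
        for i in range(0, len(section), max_chars):
            piece = section[i : i + max_chars].strip()
            if piece:
                chunks.append(piece)
    return chunks
```

Splitting at headings keeps each chunk about one topic, which tends to retrieve better than fixed-size windows that cut across sections.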
Cost
This Actor uses minimal compute. Each repository requires 3 to 10 GitHub API calls, depending on the options selected. With the free Apify plan, you can analyze hundreds of repositories per month.