GitHub Repository Analyzer for AI Agents
Pricing: Pay per usage
Extracts structured data from GitHub repositories for AI agents and RAG pipelines. Supports extraction of the README, file tree, dependencies, issues, and contributors, with multiple output formats.
Developer: BBB & Company (maintained by Community)
Extract structured data from GitHub repositories, optimized for AI agent pipelines, RAG systems, and automated code analysis.
What it does
This Actor takes one or more GitHub repository URLs and extracts comprehensive structured data:
- Repository metadata: Stars, forks, language, license, topics, timestamps
- README content: Full Markdown content for RAG ingestion
- File tree: Directory structure with configurable depth
- Dependencies: Parsed from package.json, requirements.txt, go.mod, and Cargo.toml
- Issues: Open issues with labels and comment counts
- Contributors: Top contributors with contribution counts
- Language breakdown: Bytes per language
- Releases: Recent releases with changelogs
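The Actor's own parsers are not published here, so the following is only a minimal Python sketch of how two of the listed manifests could be turned into the `dependencies` shape shown in the output example. The function names `parse_package_json` and `parse_requirements_txt`, the `"pip"` package-manager label, and the choice to ignore `devDependencies` are assumptions for illustration.

```python
import json
import re

# Hypothetical helpers; the Actor's real parsing rules may differ.
# Groups: 1 = package name, 2 = optional extras like [socks], 3 = version specifier.
REQUIREMENT = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)(\[[^\]]*\])?\s*(.*)$")


def parse_package_json(text: str) -> dict:
    """Return runtime dependencies from a package.json document (devDependencies are ignored)."""
    manifest = json.loads(text)
    return {"packageManager": "npm", "dependencies": dict(manifest.get("dependencies", {}))}


def parse_requirements_txt(text: str) -> dict:
    """Map each requirement name to its version specifier ("" when unpinned)."""
    deps = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop inline comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options such as -r or -e
            continue
        match = REQUIREMENT.match(line)
        if match:
            # Environment markers (after ";") stay inside the specifier in this sketch.
            deps[match.group(1)] = match.group(3).strip()
    return {"packageManager": "pip", "dependencies": deps}


print(parse_requirements_txt("requests>=2.31\nnumpy==1.26.4  # pinned\nflask\n"))
```

A real implementation would also need go.mod and Cargo.toml parsers and more of the pip requirements grammar (URLs, hashes, markers); the sketch only shows the general mapping from a manifest to a name-to-version dictionary.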
Output formats
| Format | Description | Use case |
|---|---|---|
| `ai-optimized` | Includes a pre-formatted summary field for direct LLM consumption | RAG pipelines, AI agents |
| `full` | All available data in structured JSON | Data analysis, indexing |
| `compact` | Essential fields only, minimal payload | Quick lookups, dashboards |
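To make the difference between the formats concrete, here is a hedged Python sketch that derives a `compact` item and an `aiSummary`-style Markdown string from a `full` item. The field names come from the output example; which fields count as "essential" (`COMPACT_FIELDS`), the helper names, and the summary layout below the `# owner/repo` heading are assumptions, not the Actor's documented behavior.

```python
# Assumed subset of "essential" fields; the Actor's real compact schema may differ.
COMPACT_FIELDS = ("repoUrl", "fullName", "description", "stars", "forks", "language", "license")


def to_compact(full_item: dict) -> dict:
    """Keep only the assumed essential fields that are present in the full item."""
    return {key: full_item[key] for key in COMPACT_FIELDS if key in full_item}


def build_ai_summary(full_item: dict) -> str:
    """Render a short Markdown summary; only the '# owner/repo' heading mirrors the output example."""
    lines = [f"# {full_item['fullName']}"]
    if full_item.get("description"):
        lines += ["", full_item["description"]]
    facts = [("Language", full_item.get("language")), ("License", full_item.get("license")),
             ("Stars", full_item.get("stars")), ("Forks", full_item.get("forks"))]
    lines.append("")
    # One bullet per known fact; missing fields are skipped rather than printed as None.
    lines += [f"- {label}: {value}" for label, value in facts if value is not None]
    if full_item.get("topics"):
        lines.append("- Topics: " + ", ".join(full_item["topics"]))
    return "\n".join(lines)


full = {"repoUrl": "https://github.com/apify/crawlee", "fullName": "apify/crawlee",
        "stars": 15000, "forks": 800, "language": "TypeScript", "license": "Apache-2.0",
        "topics": ["web-scraping", "crawler", "automation"]}
print(to_compact(full))
print(build_ai_summary(full))
```

Producing the summary as plain Markdown keeps it readable both by humans and by an LLM, which is the point of a pre-formatted field.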
Example use cases
- AI agent code understanding: Feed repository structure to an AI agent that needs to understand a codebase
- Tech stack analysis: Analyze dependencies across hundreds of repositories
- Open source intelligence: Track repository health, contributor activity, and release cadence
- RAG knowledge base: Build a searchable knowledge base from GitHub repositories
Rate limits
- Without a token: 60 requests/hour (roughly 10 repositories with full analysis)
- With a token: 5,000 requests/hour (hundreds of repositories)
For best results, provide a GitHub personal access token in the input.
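The repository counts above follow from dividing GitHub's hourly request budget by the number of API calls each repository needs. A minimal Python sketch of that arithmetic, using the 3 to 10 calls per repository quoted under Cost (the helper name `repos_per_hour` and the "typical" value of 6 calls are assumptions):

```python
GITHUB_UNAUTHENTICATED_PER_HOUR = 60
GITHUB_AUTHENTICATED_PER_HOUR = 5_000


def repos_per_hour(requests_per_hour: int, calls_per_repo: int) -> int:
    """Whole repositories that fit in one hourly request budget."""
    if calls_per_repo <= 0:
        raise ValueError("calls_per_repo must be positive")
    return requests_per_hour // calls_per_repo


for calls in (3, 6, 10):  # lightest, assumed typical, and heaviest option sets
    print(f"{calls} calls/repo: "
          f"{repos_per_hour(GITHUB_UNAUTHENTICATED_PER_HOUR, calls)} repos/hour without a token, "
          f"{repos_per_hour(GITHUB_AUTHENTICATED_PER_HOUR, calls)} with one")
```

At 10 calls per repository this gives 6 repositories per hour without a token and 500 with one; the "roughly 10" figure corresponds to about 6 calls per repository.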
Input example
{"repoUrls": ["https://github.com/apify/crawlee","https://github.com/microsoft/TypeScript"],"includeReadme": true,"includeFileTree": true,"fileTreeDepth": 3,"includeDependencies": true,"includeLanguages": true,"outputFormat": "ai-optimized"}
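If you generate the input programmatically, a small validation step can catch malformed URLs or unknown formats before a run starts. The sketch below is an assumption-laden Python example: the URL pattern, the positive-depth rule, and the helper name `build_run_input` are hypothetical, while the field names and default values mirror the input example. The returned dict is the kind of object you could pass as `run_input` to the Apify Python client's `ActorClient.call()`.

```python
import json
import re

# Assumed rule: https://github.com/<owner>/<repo>, optional ".git" and trailing slash; deep links rejected.
REPO_URL = re.compile(r"^https://github\.com/[A-Za-z0-9-]+/[A-Za-z0-9._-]+?(?:\.git)?/?$")
OUTPUT_FORMATS = ("ai-optimized", "full", "compact")


def build_run_input(repo_urls, file_tree_depth=3, output_format="ai-optimized"):
    """Validate settings and return an input dict shaped like the example above."""
    bad = [url for url in repo_urls if not REPO_URL.match(url)]
    if bad:
        raise ValueError(f"not a GitHub repository URL: {bad}")
    if type(file_tree_depth) is not int or file_tree_depth < 1:  # excludes bools and non-ints
        raise ValueError("fileTreeDepth must be a positive integer")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"outputFormat must be one of {OUTPUT_FORMATS}")
    return {
        "repoUrls": list(repo_urls),
        "includeReadme": True,
        "includeFileTree": True,
        "fileTreeDepth": file_tree_depth,
        "includeDependencies": True,
        "includeLanguages": True,
        "outputFormat": output_format,
    }


run_input = build_run_input(["https://github.com/apify/crawlee",
                             "https://github.com/microsoft/TypeScript"])
print(json.dumps(run_input, indent=2))
```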
Output example
{"repoUrl": "https://github.com/apify/crawlee","name": "crawlee","fullName": "apify/crawlee","description": "Crawlee—A web scraping and browser automation library...","stars": 15000,"forks": 800,"language": "TypeScript","license": "Apache-2.0","topics": ["web-scraping", "crawler", "automation"],"dependencies": {"packageManager": "npm","dependencies": { ... }},"aiSummary": "# apify/crawlee\n..."}
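For RAG ingestion, the `aiSummary` Markdown is often split into heading-scoped chunks before embedding. The following Python sketch shows one simple way to do that; it is not part of the Actor, the helper name `chunk_markdown` is hypothetical, and the sample text is invented because the real summary is truncated in the example. It treats every ATX heading line (`#` to `######` followed by a space) as a chunk boundary, so `#` comment lines inside fenced code in a README would be misread as headings.

```python
import re

HEADING = re.compile(r"^#{1,6}\s")  # ATX heading: 1-6 '#' characters followed by whitespace


def chunk_markdown(text: str, source: str) -> list:
    """Split Markdown into chunks, one per heading, each tagged with its source URL."""
    chunks = []
    heading, body = None, []

    def flush():
        content = "\n".join(body).strip()
        if heading is not None or content:  # keep untitled preamble text only if non-empty
            chunks.append({"source": source, "heading": heading, "text": content})

    for line in text.splitlines():
        if HEADING.match(line):
            flush()  # close the previous section before starting a new one
            heading, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    flush()
    return chunks


sample = "# apify/crawlee\nA web scraping library.\n\n## Stats\n- Stars: 15000\n"
for chunk in chunk_markdown(sample, "https://github.com/apify/crawlee"):
    print(chunk)
```

Keeping the `source` URL on every chunk lets a retrieval step cite the repository each passage came from.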
Cost
This Actor uses minimal compute. Each repository requires 3 to 10 GitHub API calls, depending on the options selected. With the free Apify plan, you can analyze hundreds of repositories per month.
Store submission packet
Primary category: Developer tools
Secondary category: AI agents / Data extraction
Store description: Extract GitHub repository metadata, README content, file trees, dependencies, language breakdowns, issues, contributors, and releases into clean JSON for AI agents, RAG ingestion, due diligence, and open-source intelligence workflows.
Recommended monetization: Pay per event (PPE). Keep the automatic apify-actor-start event at Apify's default $0.00005. Enable the automatic apify-default-dataset-item event at $0.03 per repository result ($30 per 1,000 analyzed repositories). Do not pass platform usage costs to users after the first pricing validation run unless margins are negative.
Chargeable output mapping: One default dataset item is written for each repository analysis. If a repository fails after validation or API work, one error item is written; it is also chargeable through apify-default-dataset-item.
User-visible limits: Unauthenticated GitHub runs are limited by GitHub's public API rate limits. Users should provide their own GitHub token for private repositories or large batches.
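The recommended pay-per-event prices translate into a per-run charge as sketched below in Python. It assumes one apify-actor-start event per run (the count is left as a parameter because it may vary), bills error items exactly like successful results as the chargeable output mapping states, and ignores platform usage costs, which the recommendation does not pass on to users. `run_charge` is a hypothetical helper; `Decimal` avoids floating-point rounding in currency sums.

```python
from decimal import Decimal

START_EVENT_PRICE = Decimal("0.00005")  # apify-actor-start (Apify default)
ITEM_EVENT_PRICE = Decimal("0.03")      # apify-default-dataset-item, one per repository result


def run_charge(results: int, errors: int = 0, start_events: int = 1) -> Decimal:
    """User charge in USD for one run; error items are billed like successful results."""
    return start_events * START_EVENT_PRICE + (results + errors) * ITEM_EVENT_PRICE


print(run_charge(1000))         # 1,000 analyzed repositories
print(run_charge(9, errors=1))  # 9 successes plus 1 failed repository
```

For 1,000 repositories this comes to $30.00005: the $30 per 1,000 results plus the negligible start fee.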