GitHub Repo Scraper
Pricing
Pay per usage
Rating: 0.0 (0 reviews)
Developer: Donny Nguyen
Actor stats: 0 bookmarked · 2 total users · 1 monthly active user
Last modified: 2 days ago
GitHub Repository Scraper
What does it do?
GitHub Repository Scraper is an Apify actor that searches GitHub for repositories by keyword or topic and extracts comprehensive repository data. It collects repository names, owners, star counts, fork counts, programming languages, descriptions, last update dates, and topic tags. The scraper navigates through GitHub's search results and individual repository pages to gather detailed information.
This actor is built for developers, tech recruiters, open-source researchers, and anyone analyzing the GitHub ecosystem. It automates the process of discovering and cataloging repositories across any technology domain.
Why use this scraper?
GitHub hosts over 300 million repositories, making it the largest source code hosting platform in the world. Finding and comparing repositories manually across multiple topics is extremely time-consuming. This scraper provides automated discovery and data collection, enabling trend analysis, technology landscape mapping, competitive intelligence, and developer ecosystem research.
Whether you are tracking emerging frameworks, identifying popular libraries in a specific domain, or building a curated list of tools for your organization, this scraper delivers structured data ready for immediate analysis.
How to use it
- Navigate to the GitHub Repository Scraper page on Apify Store.
- Click Try for free to open the actor configuration.
- Enter your search queries (keywords or topics) and set the maximum results.
- Click Start to begin scraping.
- Download the extracted data in JSON, CSV, Excel, or other formats from the Dataset tab.
You can automate runs using the Apify API or integrate with external tools via webhooks.
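As a sketch of that API automation, a run can be started with a plain HTTP call to the Apify API. The actor ID and token below are placeholders, and the helper names are illustrative, not part of the actor:

```python
import json
import urllib.request

APIFY_API = "https://api.apify.com/v2"

def build_run_input(queries, max_results=500):
    # Field names follow the actor's input table; 500 is its documented default.
    return {"queries": list(queries), "maxResults": max_results}

def start_run(actor_id, token, run_input):
    # POST /v2/acts/{actorId}/runs starts an actor run on the Apify platform.
    url = f"{APIFY_API}/acts/{actor_id}/runs?token={token}"
    req = urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example payload for a scheduled run:
payload = build_run_input(["web scraping"], max_results=200)
```

The official `apify-client` packages for JavaScript and Python wrap these endpoints if you prefer not to issue raw requests.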
Input configuration
| Field | Type | Description | Default |
|---|---|---|---|
| queries | Array | List of search keywords or topics | ["web scraping"] |
| maxResults | Integer | Maximum repositories to collect per query | 500 |
| proxyConfiguration | Object | Proxy settings for the scraper | Apify Proxy |
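Putting these fields together, a run input might look like this (values are illustrative; the `proxyConfiguration` shape assumes the standard Apify proxy input object):

```json
{
  "queries": ["web scraping", "data extraction"],
  "maxResults": 200,
  "proxyConfiguration": { "useApifyProxy": true }
}
```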
Output data
Each repository entry in the dataset contains:
{"name": "scrapy","owner": "scrapy","fullName": "scrapy/scrapy","description": "Scrapy, a fast high-level web crawling & scraping framework for Python.","stars": 52000,"forks": 10500,"language": "Python","lastUpdated": "2026-02-15T10:30:00Z","topics": ["web-scraping", "python", "crawler"],"url": "https://github.com/scrapy/scrapy","query": "web scraping","scrapedAt": "2026-02-18T12:00:00.000Z"}
Cost of usage
This actor uses Pay-Per-Event pricing at $0.0003 per result delivered, plus a small fee per actor start. Scraping 500 repositories for a single query costs approximately $0.15 in result fees. The lightweight Cheerio-based approach keeps resource usage minimal.
Monitor your spending on the Apify billing page. Apify Proxy usage is included in the per-event pricing.
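As a rough sanity check on spend, the per-result arithmetic works out like this. The start fee is left as a parameter because the listing only calls it "a small fee":

```python
PRICE_PER_RESULT = 0.0003  # USD per result, per the actor's pay-per-event pricing

def estimate_cost(n_results, start_fee=0.0):
    # start_fee is an assumption/placeholder; check the actor's pricing tab
    # for the actual per-start charge.
    return n_results * PRICE_PER_RESULT + start_fee

cost = estimate_cost(500)  # ~$0.15 for one 500-result query, excluding the start fee
```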
Tips and tricks
- Broad vs. specific queries: Use broad terms like "machine learning" for ecosystem overviews, or specific terms like "transformer NLP pytorch" for targeted results.
- Track trends: Schedule regular runs with Apify Schedules to monitor star growth and new repositories over time.
- Sort by popularity: The scraper returns results in GitHub's default relevance order. Post-process the dataset to sort by stars or forks.
- Multiple topics: Add several queries in one run to efficiently compare different technology ecosystems.
- Data pipelines: Export data via the Apify API or use integrations with Google Sheets, Airtable, or databases for automated reporting.
- Combine with other data: Pair repository data with contributor information or issue tracking for deeper analysis.
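The sort-by-popularity tip can be sketched as a small post-processing step over downloaded dataset items. Field names follow the sample output above; the helper itself is illustrative:

```python
def top_by_stars(items, n=10):
    # Sort dataset items (dicts shaped like the sample output) by star count,
    # most-starred first, and keep the top n.
    return sorted(items, key=lambda r: r.get("stars", 0), reverse=True)[:n]

repos = [
    {"fullName": "a/a", "stars": 120},
    {"fullName": "b/b", "stars": 5400},
    {"fullName": "c/c", "stars": 47},
]
leaders = top_by_stars(repos, n=2)  # the two most-starred repos, in order
```

The same pattern works for `forks` or any other numeric field in the output.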
Built by consummate_mandala with Crawlee and Apify SDK.