Github Repo Scraper

Pricing: Pay per usage
Rating: 0.0 (0)
Developer: Donny Nguyen (Maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 2 days ago

GitHub Repository Scraper

What does it do?

GitHub Repository Scraper is an Apify actor that searches GitHub for repositories by keyword or topic and extracts comprehensive repository data. It collects repository names, owners, star counts, fork counts, programming languages, descriptions, last update dates, and topic tags. The scraper navigates through GitHub's search results and individual repository pages to gather detailed information.

This actor is built for developers, tech recruiters, open-source researchers, and anyone analyzing the GitHub ecosystem. It automates the process of discovering and cataloging repositories across any technology domain.

Why use this scraper?

GitHub hosts over 300 million repositories, making it the largest source code hosting platform in the world. Finding and comparing repositories manually across multiple topics is extremely time-consuming. This scraper provides automated discovery and data collection, enabling trend analysis, technology landscape mapping, competitive intelligence, and developer ecosystem research.

Whether you are tracking emerging frameworks, identifying popular libraries in a specific domain, or building a curated list of tools for your organization, this scraper delivers structured data ready for immediate analysis.

How to use it

  1. Navigate to the GitHub Repository Scraper page on Apify Store.
  2. Click Try for free to open the actor configuration.
  3. Enter your search queries (keywords or topics) and set the maximum results.
  4. Click Start to begin scraping.
  5. Download the extracted data in JSON, CSV, Excel, or other formats from the Dataset tab.

You can automate runs using the Apify API or integrate with external tools via webhooks.
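As a minimal sketch of API automation, a run can be started through Apify's generic actor-run endpoint (`POST /v2/acts/{actorId}/runs`). The token and actor ID below are placeholders, and the input fields mirror the configuration table in the next section:

```python
import json
from urllib import request

APIFY_TOKEN = "YOUR_APIFY_TOKEN"           # placeholder: your Apify API token
ACTOR_ID = "username~github-repo-scraper"  # placeholder: the actor's ID on Apify

def build_run_request(actor_id: str, token: str, run_input: dict):
    """Build the URL and JSON payload for starting an actor run
    via Apify's run endpoint (POST /v2/acts/{actorId}/runs)."""
    url = f"https://api.apify.com/v2/acts/{actor_id}/runs?token={token}"
    payload = json.dumps(run_input).encode("utf-8")
    return url, payload

url, payload = build_run_request(ACTOR_ID, APIFY_TOKEN, {
    "queries": ["web scraping"],
    "maxResults": 100,
})

# To actually start the run, send the payload (requires a valid token):
# req = request.Request(url, data=payload,
#                       headers={"Content-Type": "application/json"})
# with request.urlopen(req) as resp:
#     run = json.load(resp)
print(url)
```

The same endpoint is what Apify Schedules and webhook integrations drive under the hood, so this request shape carries over to automated pipelines.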

Input configuration

Field              | Type    | Description                                | Default
queries            | Array   | List of search keywords or topics          | ["web scraping"]
maxResults         | Integer | Maximum repositories to collect per query  | 500
proxyConfiguration | Object  | Proxy settings for the scraper             | Apify Proxy
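A complete input object combining the fields above might look like this (values are illustrative; `useApifyProxy` is the standard Apify proxy-configuration flag):

```json
{
  "queries": ["web scraping", "data extraction"],
  "maxResults": 200,
  "proxyConfiguration": { "useApifyProxy": true }
}
```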

Output data

Each repository entry in the dataset contains:

{
  "name": "scrapy",
  "owner": "scrapy",
  "fullName": "scrapy/scrapy",
  "description": "Scrapy, a fast high-level web crawling & scraping framework for Python.",
  "stars": 52000,
  "forks": 10500,
  "language": "Python",
  "lastUpdated": "2026-02-15T10:30:00Z",
  "topics": ["web-scraping", "python", "crawler"],
  "url": "https://github.com/scrapy/scrapy",
  "query": "web scraping",
  "scrapedAt": "2026-02-18T12:00:00.000Z"
}

Cost of usage

This actor uses Pay-Per-Event pricing at $0.0003 per result delivered. A small fee is also charged per actor start. Scraping 500 repositories for a single query costs approximately $0.15. The lightweight Cheerio-based approach keeps resource usage minimal.
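The per-result arithmetic can be sketched as follows. The $0.0003 rate comes from the pricing above; the exact actor-start fee is not stated here, so it is a placeholder defaulting to zero:

```python
PER_RESULT_USD = 0.0003  # Pay-Per-Event price per delivered result (from the pricing above)

def estimate_cost(n_results: int, start_fee_usd: float = 0.0) -> float:
    """Estimate a run's cost: per-result events plus the actor-start fee.
    The start fee amount is unspecified, so it defaults to 0."""
    return n_results * PER_RESULT_USD + start_fee_usd

print(estimate_cost(500))  # 500 results comes to about $0.15, matching the example above
```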

Monitor your spending on the Apify billing page. Apify Proxy usage is included in the per-event pricing.

Tips and tricks

  • Broad vs. specific queries: Use broad terms like "machine learning" for ecosystem overviews, or specific terms like "transformer NLP pytorch" for targeted results.
  • Track trends: Schedule regular runs with Apify Schedules to monitor star growth and new repositories over time.
  • Sort by popularity: The scraper returns results in GitHub's default relevance order. Post-process the dataset to sort by stars or forks.
  • Multiple topics: Add several queries in one run to efficiently compare different technology ecosystems.
  • Data pipelines: Export data via the Apify API or use integrations with Google Sheets, Airtable, or databases for automated reporting.
  • Combine with other data: Pair repository data with contributor information or issue tracking for deeper analysis.
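As a sketch of the post-processing mentioned above, downloaded dataset items (in the output format shown earlier) can be re-sorted by star count after export. The smaller repositories here are hypothetical sample records:

```python
# Sort dataset items by star count, descending.
# Records mirror the output format shown earlier; values are illustrative.
items = [
    {"fullName": "scrapy/scrapy", "stars": 52000, "forks": 10500},
    {"fullName": "example/tiny-crawler", "stars": 120, "forks": 8},    # hypothetical repo
    {"fullName": "example/mid-scraper", "stars": 4300, "forks": 900},  # hypothetical repo
]

by_stars = sorted(items, key=lambda r: r["stars"], reverse=True)
print([r["fullName"] for r in by_stars])
# → ['scrapy/scrapy', 'example/mid-scraper', 'example/tiny-crawler']
```

The same pattern works for sorting by forks or filtering by language before loading the data into a spreadsheet or database.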

Built by consummate_mandala with Crawlee and Apify SDK.