GitHub repos search scraper

strajk/github-repos-search-scraper

Given a search query (e.g. "Apify"), scrapes repos from GitHub that contain the query in their title or description. Unlike the official API, it is not limited to the first 1000 results.

Author: Pavel Dolecek

About implementation

The GitHub Search API (https://docs.github.com/en/free-pro-team@latest/rest/reference/search) returns at most 1000 results per search. Working around this limit takes several filtered searches, and even then the results are not guaranteed to be complete.

The workaround

  • Sort results by stars
  • Get first 1000 results
  • Check number of stars of the last result, and use that number for filtering the next search query
  • Repeat
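The steps above can be sketched as a loop. This is a minimal illustration and not the actor's actual code: `fetch_window` is a hypothetical callable standing in for one star-sorted search, `build_query` and `scrape_all` are names made up for this sketch, and the strict `stars:<N` filter is inferred from the filters shown in the example table.

```python
from typing import Any, Callable, Dict, List, Optional

Repo = Dict[str, Any]
MAX_RESULTS = 1000  # per-search cap described above


def build_query(term: str, stars_below: Optional[int]) -> str:
    """Return the search string, adding a `stars:<N` qualifier when a ceiling is set."""
    if stars_below is None:
        return term
    return f"{term} stars:<{stars_below}"


def scrape_all(
    term: str,
    fetch_window: Callable[[str], List[Repo]],
    max_results: int = MAX_RESULTS,
) -> List[Repo]:
    """Collect repos window by window, lowering the star ceiling on each pass.

    `fetch_window` is hypothetical: it takes a query string and returns at
    most `max_results` repos sorted by stars, highest first.
    """
    collected: Dict[str, Repo] = {}
    stars_below: Optional[int] = None
    while True:
        window = fetch_window(build_query(term, stars_below))
        for repo in window:
            collected.setdefault(str(repo["url"]), repo)  # de-duplicate by URL
        if len(window) < max_results:
            break  # the window was not full, so nothing is left below it
        last_stars = int(window[-1]["stars"])
        if last_stars == 0:
            break  # no repo can have fewer than 0 stars
        stars_below = last_stars  # next pass: only repos with fewer stars
    return list(collected.values())
```

Because every pass uses a strictly lower ceiling, the loop always terminates. Repos that share the star count of the last result in a full window, but did not fit into that window, are skipped.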

Limitation

If more than 1000 results share the same number of stars, there is no way to get them all.
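To make the limitation concrete, here is a small sketch that estimates how many repos stay out of reach for each star count. The function name and the input counts are illustrative assumptions, not data from the actor.

```python
from typing import Dict


def unreachable_by_stars(repos_per_star_count: Dict[int, int], cap: int = 1000) -> Dict[int, int]:
    """Map each star count that exceeds the cap to the number of repos beyond it."""
    return {
        stars: count - cap
        for stars, count in repos_per_star_count.items()
        if count > cap
    }


# Illustrative input: 800 repos with 3 stars, 1500 repos with 2 stars.
print(unreachable_by_stars({3: 800, 2: 1500}))  # {2: 500}
```

Star counts shared by many repos, typically the lowest ones, are where results go missing.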

Real example

Result statistics for the query "meteor" (as of 2020-11-03)

Stars      Results  Diff
no filter  46 742   1000
<26        45 759    983
<11        44 851    908
<7         44 120    731
<5         43 401    719
<4         42 839    562
<3         41 971    868
<2         40 415   1556
<1         36 068   5347

Why is it not possible to sort by date? See https://stackoverflow.com/questions/37602893/github-search-limit-results#comment85767535_37639739

Output example

  • owner string e.g. apify
  • name string e.g. apify-js
  • url string e.g. https://github.com/apify/apify-js
  • fork boolean
  • description string e.g. Apify SDK — The scalable web scraping and crawling library for JavaScript/Node.js
  • created_at string e.g. 2012-01-19T01:58:17Z
  • updated_at string e.g. 2020-11-03T04:16:58Z
  • pushed_at string e.g. 2020-10-31T16:21:04Z
  • homepage string e.g. https://sdk.apify.com/
  • size number e.g. 80509
  • stars number e.g. 42034
  • open_issues number e.g. 144
  • forks number e.g. 5140
  • language string e.g. JavaScript
  • archived boolean
  • disabled boolean
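For consumers of the output, the fields above could be modelled as a typed record. This is a sketch only: `RepoRecord` and `parse_timestamp` are hypothetical names, and it assumes the three date fields are UTC timestamp strings in the format shown in the examples.

```python
from datetime import datetime, timezone
from typing import TypedDict


class RepoRecord(TypedDict):
    """One output item, mirroring the field list above."""
    owner: str
    name: str
    url: str
    fork: bool
    description: str
    created_at: str   # assumed timestamp string, e.g. 2012-01-19T01:58:17Z
    updated_at: str
    pushed_at: str
    homepage: str
    size: int
    stars: int
    open_issues: int
    forks: int
    language: str
    archived: bool
    disabled: bool


def parse_timestamp(value: str) -> datetime:
    """Parse a `YYYY-MM-DDTHH:MM:SSZ` string into a timezone-aware UTC datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)
```

Keeping the date fields as strings in the record and parsing them on demand means items can be loaded as-is from the stored output.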
