GitHub Repository & Trending Scraper

Pricing: $6.99/month + usage

GitHub Repository & Trending Scraper extracts repository data from GitHub in two modes: Search (by keyword, language, stars) and Trending (daily, weekly, or monthly popular repositories).

Rating: 0.0 (0)

Developer: Scrape Pilot

Maintained by Community

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 3 days ago

This GitHub scraper extracts repository data from GitHub either by search (keywords, language, star range) or by fetching trending repositories (daily, weekly, or monthly). It returns clean, structured JSON suited to developer tools, analytics dashboards, research, or your own GitHub insights platform.

👉 Focus keyword: github scraper – built for flexibility, speed, and reliability.


✨ Features

  • 🔍 Search mode – Search repositories by keyword, language, stars, forks, etc. (supports GitHub search syntax).
  • 📈 Trending mode – Fetch trending repositories for any language (daily, weekly, monthly).
  • 📊 Rich repository data – Extracts name, description, URL, stars, forks, watchers, language, topics, license, dates, owner info, and even the README URL.
  • 🌐 Proxy support – Built‑in Apify proxy with residential groups to avoid rate limiting.
  • 📦 Scalable – Set max_results from 1 to 1000 (respects GitHub pagination and the search API's 1000-result limit).
  • 🧹 Clean output – Consistent schema with null values where data is missing.
  • ⚡ Fast & efficient – Uses concurrency and request queuing to maximise throughput.
  • 🔌 Easy integration – Run on Apify platform or use as a Node.js module.

🚀 How It Works

  1. Choose mode – search (custom query) or trending (pre‑defined trending lists).
  2. Provide parameters – For search: query, language, sort, order. For trending: language, period.
  3. Scraping process – The actor sends requests to GitHub, parses repository data, and follows pagination.
  4. Proxy rotation – For large jobs, residential proxies help avoid IP‑based blocks.
  5. Output – Returns a clean JSON array of repository objects.
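
For Search mode, steps 2 and 3 can be sketched as below. This is a minimal illustration and not the actor's actual code. It assumes requests go to GitHub's public repository search endpoint, that the language filter is appended as a `language:` qualifier, and that pages of up to 100 items are requested. The helper name `buildSearchUrls` is hypothetical.

```javascript
// Hypothetical helper: plans the paginated search requests for one run.
// Assumptions: GitHub's REST search endpoint, at most 100 items per page,
// and a cap of 1000 results per query.
const SEARCH_ENDPOINT = 'https://api.github.com/search/repositories';
const PER_PAGE = 100;
const MAX_SEARCH_RESULTS = 1000;

function buildSearchUrls({ query = '', language = '', sort = 'stars', order = 'desc', max_results = 25 }) {
  // Combine the free-text query with an optional language qualifier.
  const q = [query.trim(), language ? `language:${language}` : '']
    .filter(Boolean)
    .join(' ');
  if (!q) throw new Error('Search mode needs a query or a language');

  // Clamp the requested count, then work out how many pages cover it.
  const wanted = Math.min(Math.max(1, max_results), MAX_SEARCH_RESULTS);
  const perPage = Math.min(wanted, PER_PAGE);
  const pageCount = Math.ceil(wanted / perPage);

  const urls = [];
  for (let page = 1; page <= pageCount; page += 1) {
    const params = new URLSearchParams({
      q,
      sort,
      order,
      per_page: String(perPage),
      page: String(page),
    });
    urls.push(`${SEARCH_ENDPOINT}?${params}`);
  }
  // The last page may return more items than needed; the caller would truncate.
  return urls;
}

// 250 results at 100 per page needs three requests.
console.log(buildSearchUrls({ query: 'machine learning', language: 'python', max_results: 250 }));
```

Planning the URLs up front keeps pagination separate from fetching, so the same list can be fed to a request queue or run through a proxy.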

📥 Input Schema

The actor accepts the following input fields. Depending on mode, different fields are required.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| mode | String | "search" | Either "search" or "trending". |
| query | String | "" | (Search mode) GitHub search query (e.g., "machine learning"). Supports advanced syntax. |
| language | String | "" | (Search or trending) Filter by programming language (e.g., "python", "javascript"). |
| sort | String | "stars" | (Search mode) Sort by: "stars", "forks", "updated". |
| order | String | "desc" | (Search mode) "desc" or "asc". |
| period | String | "daily" | (Trending mode) "daily", "weekly", or "monthly". |
| max_results | Integer | 25 | Maximum number of repositories to return (1–1000). |
| proxyConfiguration | Object | { "useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"] } | Proxy settings. See Proxy Configuration. |
| include_readme | Boolean | false | Reserved for internal use. The raw README URL (readme_url) is always included in the output, regardless of this flag. |

Example Input (Search Mode)

{
  "mode": "search",
  "query": "machine learning",
  "language": "python",
  "sort": "stars",
  "order": "desc",
  "max_results": 20,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

Example Input (Trending Mode)

{
  "mode": "trending",
  "language": "javascript",
  "period": "weekly",
  "max_results": 30
}
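
Before starting a run, an input object can be checked against the schema. The sketch below applies the documented defaults and allowed values. The function name `normalizeInput` is hypothetical, and the rule that a search needs a query or a language is an assumption, not documented actor behaviour.

```javascript
// Defaults and allowed values copied from the input table.
const DEFAULTS = {
  mode: 'search',
  query: '',
  language: '',
  sort: 'stars',
  order: 'desc',
  period: 'daily',
  max_results: 25,
};
const ALLOWED = {
  mode: ['search', 'trending'],
  sort: ['stars', 'forks', 'updated'],
  order: ['desc', 'asc'],
  period: ['daily', 'weekly', 'monthly'],
};

// Hypothetical helper: fills in defaults and rejects values outside the schema.
function normalizeInput(raw = {}) {
  const input = { ...DEFAULTS, ...raw };
  for (const [field, values] of Object.entries(ALLOWED)) {
    if (!values.includes(input[field])) {
      throw new Error(`${field} must be one of: ${values.join(', ')}`);
    }
  }
  if (!Number.isInteger(input.max_results) || input.max_results < 1 || input.max_results > 1000) {
    throw new Error('max_results must be an integer between 1 and 1000');
  }
  // Assumption: a search run needs at least a query or a language filter.
  if (input.mode === 'search' && !input.query && !input.language) {
    throw new Error('search mode needs a query or a language');
  }
  return input;
}

console.log(normalizeInput({ mode: 'trending', language: 'javascript', period: 'weekly', max_results: 30 }));
```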

📤 Output Format

The actor returns an array of repository objects. Each object contains the following fields:

| Field | Type | Description |
| --- | --- | --- |
| type | String | Always "repository". |
| name | String | Full repository name (e.g., "owner/repo"). |
| description | String or null | Repository description. |
| url | String | GitHub URL of the repository. |
| stars | Integer | Number of stars. |
| forks | Integer | Number of forks. |
| watchers | Integer | Number of watchers (usually same as stars). |
| language | String or null | Primary programming language. |
| topics | Array of strings | Repository topics/tags. |
| license | String or null | License name. |
| created_at | String | Creation date (YYYY-MM-DD). |
| updated_at | String | Last update date. |
| pushed_at | String | Last push date. |
| open_issues | Integer | Number of open issues. |
| size_kb | Integer | Repository size in KB. |
| default_branch | String | Default branch name. |
| is_fork | Boolean | Whether the repo is a fork. |
| homepage | String or null | Project homepage URL. |
| owner | String | Owner login. |
| owner_type | String | "User" or "Organization". |
| readme_url | String or null | Raw URL of the README file (if exists). |

Example Output (Truncated)

[
  {
    "type": "repository",
    "name": "codecrafters-io/build-your-own-x",
    "description": "Master programming by recreating your favorite technologies from scratch.",
    "url": "https://github.com/codecrafters-io/build-your-own-x",
    "stars": 475006,
    "forks": 44543,
    "watchers": 475006,
    "language": "Markdown",
    "topics": ["awesome-list", "free", "programming", "tutorial-code"],
    "license": null,
    "created_at": "2018-05-09",
    "updated_at": "2026-03-14",
    "pushed_at": "2026-02-21",
    "open_issues": 442,
    "size_kb": 1201,
    "default_branch": "master",
    "is_fork": false,
    "homepage": "https://codecrafters.io",
    "owner": "codecrafters-io",
    "owner_type": "Organization",
    "readme_url": "https://raw.githubusercontent.com/codecrafters-io/build-your-own-x/master/README.md"
  },
  {
    "type": "repository",
    "name": "sindresorhus/awesome",
    "description": "😎 Awesome lists about all kinds of interesting topics",
    "url": "https://github.com/sindresorhus/awesome",
    "stars": 445369,
    "forks": 33527,
    "watchers": 445369,
    "language": null,
    "topics": ["awesome", "awesome-list", "lists", "resources"],
    "license": "Creative Commons Zero v1.0 Universal",
    "created_at": "2014-07-11",
    "updated_at": "2026-03-14",
    "pushed_at": "2026-03-09",
    "open_issues": 80,
    "size_kb": 1534,
    "default_branch": "main",
    "is_fork": false,
    "homepage": "",
    "owner": "sindresorhus",
    "owner_type": "User",
    "readme_url": "https://raw.githubusercontent.com/sindresorhus/awesome/main/README.md"
  }
  // ... up to max_results
]
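
Because every record follows the same schema, downstream code can aggregate results without special cases beyond null checks. The sketch below sums stars per primary language using two abbreviated records from the example. The helper name `starsByLanguage` and the "unknown" bucket for null languages are illustrative choices, not part of the actor.

```javascript
// Two abbreviated records shaped like the example output.
const repos = [
  { name: 'codecrafters-io/build-your-own-x', stars: 475006, language: 'Markdown', license: null },
  { name: 'sindresorhus/awesome', stars: 445369, language: null, license: 'Creative Commons Zero v1.0 Universal' },
];

// Hypothetical helper: total stars per primary language.
// Repositories whose language is null are grouped under "unknown".
function starsByLanguage(items) {
  const totals = {};
  for (const repo of items) {
    const key = repo.language ?? 'unknown';
    totals[key] = (totals[key] ?? 0) + repo.stars;
  }
  return totals;
}

console.log(starsByLanguage(repos));
// { Markdown: 475006, unknown: 445369 }
```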

🛠️ Usage

▶️ Run on Apify Console

  1. Go to Apify Console and open the Actor page for GitHub Repository & Trending Scraper.
  2. Click "Run".
  3. Select mode and fill in the required fields.
  4. Click "Start" and wait for results.

🔌 Run via Apify API (cURL)

curl -X POST "https://api.apify.com/v2/acts/your-username~github-trending-scraper/runs?token=<YOUR_API_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "search",
    "query": "machine learning",
    "language": "python",
    "max_results": 10
  }'
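
The same call can be made from Node.js 18+ with the built-in `fetch`. This sketch differs from the cURL command in two ways, both of which are choices made here rather than anything the actor requires: it uses Apify's `run-sync-get-dataset-items` endpoint so the response body is the result array itself, and it sends the token in an `Authorization` header instead of the query string. The actor ID is the same placeholder as in the cURL example, and the `APIFY_TOKEN` environment variable name is an assumption.

```javascript
// Placeholder actor ID, copied from the cURL example.
const ACTOR_ID = 'your-username~github-trending-scraper';

// Builds the request for Apify's "run synchronously and return dataset items" endpoint.
function buildRunRequest(token, input) {
  return {
    url: `https://api.apify.com/v2/acts/${ACTOR_ID}/run-sync-get-dataset-items`,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify(input),
    },
  };
}

// Sends the request and resolves with the parsed array of repository objects.
async function runActor(token, input) {
  const { url, options } = buildRunRequest(token, input);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Apify API responded with HTTP ${response.status}`);
  }
  return response.json();
}

// Usage (needs a real token and actor ID, so it is left commented out):
// runActor(process.env.APIFY_TOKEN, { mode: 'search', query: 'machine learning', language: 'python', max_results: 10 })
//   .then((repos) => console.log(repos.length));
```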

📦 Use as a Node.js Module

Install the package:

$ npm install github-trending-scraper

Then use it in your code:

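// Load the scraper entry point from the package.
const { scrapeGitHub } = require('github-trending-scraper');

(async () => {
  // Fetch this week's trending JavaScript repositories (at most 20).
  const repos = await scrapeGitHub({
    mode: 'trending',
    language: 'javascript',
    period: 'weekly',
    max_results: 20
  });
  console.log(repos);
})().catch((error) => {
  // Report scraping or network failures instead of an unhandled rejection.
  console.error(error);
  process.exitCode = 1;
});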

🌐 Proxy Configuration

To avoid IP‑based blocking, especially for large scraping jobs, you can configure proxies. The actor integrates seamlessly with Apify Proxy.

| Property | Type | Description |
| --- | --- | --- |
| useApifyProxy | Boolean | If true, enables Apify Proxy. Default: true. |
| apifyProxyGroups | Array | Proxy groups: ["RESIDENTIAL"], ["DATACENTER"], or ["SHADER"]. Residential is recommended for GitHub. |
| proxyUrls | Array | Custom proxy URLs (e.g., ["http://user:pass@proxy.example.com:8080"]). Ignored if useApifyProxy is true. |

Example with custom proxies:

{
  "proxyConfiguration": {
    "useApifyProxy": false,
    "proxyUrls": ["http://user:pass@123.45.67.89:8080"]
  }
}
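
The precedence between these properties can be expressed as a small function. This is a sketch of the rules stated in the table (Apify Proxy wins when enabled, otherwise custom URLs are used), not the actor's implementation. The name `describeProxy` and the returned object shape are hypothetical.

```javascript
// Hypothetical helper: reports which proxy source a configuration selects.
function describeProxy(proxyConfiguration = {}) {
  const {
    useApifyProxy = true, // documented default
    apifyProxyGroups = [],
    proxyUrls = [],
  } = proxyConfiguration;

  // proxyUrls is ignored whenever Apify Proxy is enabled.
  if (useApifyProxy) {
    return { source: 'apify', groups: apifyProxyGroups };
  }
  if (proxyUrls.length > 0) {
    // new URL() throws a TypeError on malformed proxy URLs.
    proxyUrls.forEach((proxyUrl) => new URL(proxyUrl));
    return { source: 'custom', urls: proxyUrls };
  }
  // Assumption: with neither option set, requests go out without a proxy.
  return { source: 'none' };
}

console.log(describeProxy({ useApifyProxy: false, proxyUrls: ['http://user:pass@123.45.67.89:8080'] }));
```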

❓ FAQ / Troubleshooting

Q: What is the difference between Search mode and Trending mode?

A: Search mode uses GitHub's search API to find repositories matching your query. Trending mode scrapes the GitHub trending page (like https://github.com/trending), which shows popular repositories for a given language and time period.
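
For Trending mode, the page address can be derived from the language and period inputs. The sketch below assumes the public URL pattern of GitHub's trending pages (language in the path, period in the `since` query parameter). The helper name `buildTrendingUrl` is hypothetical, and the actor may build its URLs differently.

```javascript
const PERIODS = ['daily', 'weekly', 'monthly'];

// Hypothetical helper: maps the language and period inputs to a trending page URL.
function buildTrendingUrl({ language = '', period = 'daily' } = {}) {
  if (!PERIODS.includes(period)) {
    throw new Error(`period must be one of: ${PERIODS.join(', ')}`);
  }
  // An empty language selects the all-languages trending page.
  const path = language
    ? `/trending/${encodeURIComponent(language.toLowerCase())}`
    : '/trending';
  return `https://github.com${path}?since=${period}`;
}

console.log(buildTrendingUrl({ language: 'javascript', period: 'weekly' }));
// https://github.com/trending/javascript?since=weekly
```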

Q: Can I get more than 1000 results?

A: GitHub's search API limits results to 1000. If you need more, you can run multiple queries with different filters. Trending mode is limited by what GitHub shows on the trending pages (usually around 25 per page, up to a few hundred).
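
One way to run multiple queries with different filters is to split a search into star ranges using GitHub's `stars:` qualifier, then run the actor once per query and merge the results. The sketch below only generates the query strings. The helper name `splitByStars` and the boundary values are illustrative, and each range still has to contain fewer than 1000 matches to be returned in full.

```javascript
// Hypothetical helper: one query per star range, plus an open-ended top range.
// `boundaries` must be sorted ascending, e.g. [100, 1000, 10000].
function splitByStars(baseQuery, boundaries) {
  const queries = [];
  for (let i = 0; i < boundaries.length - 1; i += 1) {
    const low = boundaries[i];
    const high = boundaries[i + 1] - 1; // keep ranges non-overlapping
    queries.push(`${baseQuery} stars:${low}..${high}`);
  }
  queries.push(`${baseQuery} stars:>=${boundaries[boundaries.length - 1]}`);
  return queries;
}

for (const query of splitByStars('machine learning', [100, 1000, 10000])) {
  console.log(query);
}
// machine learning stars:100..999
// machine learning stars:1000..9999
// machine learning stars:>=10000
```

Each generated string can be passed as the query input of a separate Search mode run. Since the ranges do not overlap, the combined results need no de-duplication unless star counts change between runs.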

Q: Why are some fields null?

A: Some repositories may not have a description, license, topics, etc. The actor sets those fields to null.

Q: I'm getting blocked / rate limited.

A: Enable residential proxies (apifyProxyGroups: ["RESIDENTIAL"]) or reduce max_results. Configurable request delays are planned but not yet available.

Q: Can I scrape issues or pull requests?

A: Not in this version. This actor focuses on repository metadata only.