GitHub Trending — CSV Stars, Topics by Period, No Token
Pricing
Pay per usage
20 runs. GitHub Trending repos in CSV/JSON — owner, name, url, language, stars, topics. Daily/weekly/monthly + lang filter, no token. Backed by 951-run Trustpilot flagship + 31-actor portfolio. For OSS scouting + VC dealflow. spinov001@gmail.com · blog.spinov.online · t.me/scraping_ai
Developer: Alex
Last modified: 2 days ago
GitHub Trending Scraper — Daily / Weekly / Monthly Trending Repos, Any Language
Get what's trending on GitHub — repositories sliced by language and time window — plus optional keyword-search results. Scrapes the public github.com/trending page (which has no official REST API) and returns clean structured data for trend tracking, signal detection, and competitive intelligence.
Run daily to build a longitudinal dataset of what's gaining traction. Combine repo-level periodStars with longer-term stars to identify genuinely breakout projects vs. short-burst hype.
What you actually get (verified against src/main.js)
The actor runs two extraction modes depending on input — TRENDING (the trending page) and SEARCH (the search results page). Each pushes one record per repo with slightly different fields.
TRENDING records (12 fields per repo)
{"owner": "vercel","name": "ai","fullName": "vercel/ai","url": "https://github.com/vercel/ai","description": "The AI Toolkit for TypeScript","language": "TypeScript","stars": 18400,"forks": 2370,"periodStars": 1840,"trendingPeriod": "daily","contributors": ["jaredpalmer", "shuding", "leerob"],"scrapedAt": "2026-04-29T12:00:00.000Z"}
`periodStars` = stars gained inside the trending window (daily / weekly / monthly), parsed from the float-right counter on the trending page. `contributors` is the up-to-5 visible avatars on the row (the "Built by" cluster).
SEARCH records (10 fields per repo)
{"owner": "openai","name": "openai-cookbook","fullName": "openai/openai-cookbook","url": "https://github.com/openai/openai-cookbook","description": "Examples and guides for using the OpenAI API","language": "Jupyter Notebook","stars": 64200,"topics": ["openai", "gpt-4", "examples"],"updatedAt": "2026-04-28T09:11:00Z","scrapedAt": "2026-04-29T12:00:00.000Z"}
SEARCH records include `topics[]` (parsed from `.topic-tag`) and `updatedAt` (from the `<relative-time>` element), and walk the `?p=2,3,...` pagination chain via the `rel="next"` link.
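The actor's pagination lives in its Cheerio handler (not shown here). As a rough illustration of the same `rel="next"` walk, here is a stdlib-only Python sketch; `NextLinkFinder` and `find_next_page` are hypothetical helper names, not part of the actor:

```python
from html.parser import HTMLParser


class NextLinkFinder(HTMLParser):
    """Collects the href of the first <a rel="next"> link on a page."""

    def __init__(self):
        super().__init__()
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        if tag == "a" and self.next_href is None:
            d = dict(attrs)
            if d.get("rel") == "next" and "href" in d:
                self.next_href = d["href"]


def find_next_page(html: str):
    """Return the rel="next" href if present, else None (end of pagination)."""
    parser = NextLinkFinder()
    parser.feed(html)
    return parser.next_href


page = '<a class="next_page" rel="next" href="/search?p=2">Next</a>'
print(find_next_page(page))  # → /search?p=2
```

A crawler would keep enqueuing the returned href until `find_next_page` yields `None`, which is effectively what the unbounded SEARCH pagination described below does.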
Honest disclosure on what's NOT extracted: there is no developers-tab scrape, no trending-rank field, no language-color, no license, no sponsorship flag, and no separate popular_repo for trending developers. Earlier README versions claimed those — they don't exist in the code.
Input
| Parameter | Type | Default | Description |
|---|---|---|---|
| `scrapeTrending` | boolean | `true` | If `true`, fetch the trending page(s). |
| `trendingPeriod` | string | `"daily"` | One of `daily`, `weekly`, `monthly`. Single window per run. |
| `languages` | array | `[]` | Language slugs (`python`, `typescript`, `rust`, `c++`, `jupyter-notebook`, ...). Empty array = all-languages trending page. |
| `searchQueries` | array | `[]` | Keywords searched against `github.com/search?type=repositories&sort=stars`. SEARCH-mode pagination follows. |
| `maxReposPerSource` | number | `50` | **Dead parameter** — destructured in `main.js` but never read. The `.each()` row iterators have no cap; the only effective limit is the crawler-level `maxRequestsPerCrawl: 200` (HTTP requests, not repos). For a real per-source cap, request a custom build. |
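Putting the table together, a typical run input might look like this (values are illustrative; note that `maxReposPerSource` is accepted but currently ignored, as the table explains):

```json
{
  "scrapeTrending": true,
  "trendingPeriod": "weekly",
  "languages": ["python", "rust"],
  "searchQueries": ["vector database"],
  "maxReposPerSource": 50
}
```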
How it works
- TRENDING URLs: `https://github.com/trending/{language}?since={trendingPeriod}` (or `https://github.com/trending?since={trendingPeriod}` for all languages).
- SEARCH URLs: `https://github.com/search?q={query}&type=repositories&sort=stars`.
- CheerioCrawler with `maxConcurrency=5`, `maxRequestsPerCrawl=200`, `requestHandlerTimeoutSecs=30`.
- Default fallback: if `scrapeTrending=false` AND `searchQueries=[]`, the actor pulls daily all-languages trending, so a no-input run still produces data.
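The URL-building rules above can be sketched in Python. This is an illustration of the documented patterns, not the actor's actual JavaScript; `build_start_urls` is a hypothetical helper:

```python
from urllib.parse import quote, urlencode


def build_start_urls(scrape_trending=True, trending_period="daily",
                     languages=None, search_queries=None):
    """Build the start URLs following the patterns documented above."""
    languages = languages or []
    search_queries = search_queries or []
    urls = []
    if scrape_trending:
        if languages:
            # One trending page per language slug.
            for lang in languages:
                urls.append(f"https://github.com/trending/{quote(lang)}?since={trending_period}")
        else:
            # Empty languages array = all-languages trending page.
            urls.append(f"https://github.com/trending?since={trending_period}")
    for q in search_queries:
        urls.append("https://github.com/search?" + urlencode(
            {"q": q, "type": "repositories", "sort": "stars"}))
    if not urls:
        # Default fallback: daily all-languages trending.
        urls.append("https://github.com/trending?since=daily")
    return urls


print(build_start_urls(languages=["rust"], trending_period="weekly"))
# → ['https://github.com/trending/rust?since=weekly']
```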
Honest limitations (read before bulk runs)
- `maxReposPerSource` is a **dead parameter** — destructured at the top of `main.js` but never referenced again. The row iterators (`$('article.Box-row, .Box-row').each(...)` and the SEARCH equivalent) emit ALL parsed rows regardless of this setting. The actual ceiling is `maxRequestsPerCrawl: 200` HTTP requests on the crawler config — NOT a repo count. For a real per-source cap, request a custom build.
- Search pagination is unbounded (subject only to the crawler's 200-request ceiling). A broad query can chase `rel="next"` links into hundreds of pages until the crawler limit kicks in. Workaround: use specific queries; if you need explicit per-query caps, request a custom build.
- TRENDING yields ~25 rows per page; SEARCH ~10–25 per page. With 5 languages × ~25 rows you get ~125 TRENDING records. Add searches and the global 200-request ceiling caps total HTTP fetches.
- GitHub HTML changes — the trending page is stable, but SEARCH selectors change more often when GitHub re-skins. The actor uses redundant selectors for both modes; a true rename means a same-week patch.
- No proxy. Direct CheerioCrawler fetches from the Apify worker IP. GitHub does rate-limit unauthenticated scrapers; with `maxConcurrency=5` against trending pages this is rarely a problem, but high-volume search runs can hit 429. Crawlee's default retry handles transient errors (3 attempts).
- No GitHub authentication. This actor scrapes public HTML, NOT the REST/GraphQL API. Auth would require a different code path entirely (custom build).
- `description` may be empty for repos without a description set on GitHub. `updatedAt` (SEARCH only) may be `null` if the `<relative-time>` element isn't surfaced in the row.
- `periodStars` appears only on TRENDING records. SEARCH records do NOT include `periodStars`, `forks`, or `contributors`. Don't write code that assumes both modes share a schema.
- `contributors` is up to 5 visible avatars from the "Built by" cluster on the trending row — alt-text only (handles, not commit counts).
- GitHub Enterprise is not supported — no `/trending` page.
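Because the two modes don't share a schema, downstream code should normalize records defensively. A minimal Python sketch, assuming only the field names shown in the samples above (`normalize` is a hypothetical helper, not part of the actor):

```python
def normalize(record: dict) -> dict:
    """Coerce a TRENDING or SEARCH record into one defensive shape.

    Mode-specific fields that are absent become None (scalars) or [] (lists),
    so consumers never hit a KeyError when mixing both record types.
    """
    return {
        # Present in both modes.
        "fullName": record["fullName"],
        "url": record["url"],
        "language": record.get("language"),
        "stars": record.get("stars", 0),
        "description": record.get("description") or "",
        # TRENDING-only fields.
        "periodStars": record.get("periodStars"),
        "forks": record.get("forks"),
        "contributors": record.get("contributors", []),
        # SEARCH-only fields.
        "topics": record.get("topics", []),
        "updatedAt": record.get("updatedAt"),
    }


search_rec = {"fullName": "openai/openai-cookbook",
              "url": "https://github.com/openai/openai-cookbook",
              "stars": 64200, "topics": ["openai"]}
print(normalize(search_rec)["periodStars"])  # → None
```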
Python integration
```python
from apify_client import ApifyClient
from collections import defaultdict

client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("knotless_cadence/github-trending-scraper").call(run_input={
    "scrapeTrending": True,
    "trendingPeriod": "weekly",
    "languages": ["python", "typescript", "rust"],
    "maxReposPerSource": 25,
})

repos = list(client.dataset(run["defaultDatasetId"]).iterate_items())

by_lang = defaultdict(list)
for r in repos:
    by_lang[r.get("language") or "(unknown)"].append(r)

for lang, rs in by_lang.items():
    rs.sort(key=lambda r: r.get("periodStars", 0), reverse=True)
    print(f"\n=== {lang} — top this week ===")
    for r in rs[:5]:
        print(f"  +{r.get('periodStars', 0):>5} stars  {r['fullName']:40s}  {r.get('description', '')[:70]}")
```
Trend-tracking pattern (daily cron → S3)
Append daily snapshots; compute breakout-velocity client-side (today's periodStars ÷ trailing 7-day median):
```bash
curl -s -X POST \
  "https://api.apify.com/v2/acts/knotless_cadence~github-trending-scraper/run-sync-get-dataset-items?token=$APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"scrapeTrending":true,"trendingPeriod":"daily","maxReposPerSource":25}' \
  > "daily_$(date +%Y-%m-%d).json"
```
A repo whose periodStars is >2× its 7-day median is a useful breakout signal. Combine with forks/stars ratio to filter out star-spam.
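The breakout rule above can be made concrete. A small Python sketch, assuming you have collected each repo's daily `periodStars` into a trailing list from your snapshots (`breakout_velocity` and `is_breakout` are hypothetical helpers):

```python
from statistics import median


def breakout_velocity(today: int, trailing: list):
    """Today's periodStars divided by the trailing 7-day median.

    Returns None when there is no usable history (new repo, gaps in snapshots).
    """
    history = [s for s in trailing if s is not None]
    if not history:
        return None
    m = median(history)
    return today / m if m else None


def is_breakout(today: int, trailing: list, factor: float = 2.0) -> bool:
    """True when today's periodStars exceeds factor × the trailing median."""
    v = breakout_velocity(today, trailing)
    return v is not None and v > factor


# A repo jumping from ~125 daily stars to 900 is well past the 2× threshold.
print(is_breakout(900, [120, 150, 100, 140, 130, 110, 125]))  # → True
```

Pairing this with the forks/stars ratio mentioned above helps filter out star-spam before alerting.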
Common questions
Q: Why not use the GitHub REST API directly?
A: GitHub has no /trending API endpoint — the trending page is frontend-only. This actor scrapes it. For other repo data (stars, forks, contributors over time), use the GitHub REST/GraphQL API directly or pair with GitHub Profile Scraper.
Q: Can I get stars-over-time for a specific repo?
A: Not from this actor. Combine with star-history.com for historical counts, or build a custom snapshot pipeline (see Custom scraping below).
Q: How reliable is the scrape?
A: The trending page HTML has been stable for years. The actor uses redundant selectors (`article.Box-row`, `.Box-row`) and falls back to alternate stars / language extractors. SEARCH selectors are more fragile because GitHub re-skins search more often — when a selector breaks, expect a same-week patch.
Q: Does this work for GitHub Enterprise?
A: No — GitHub Enterprise has no public trending page. Actor works against github.com only.
Q: Frequency recommendation?
A: Daily is the sweet spot — the trending window refreshes every 24h. Running hourly won't give you more signal, only more compute cost.
Q: Cost per run?
A: Apify compute-unit pricing. A daily sweep across 5 languages with maxReposPerSource=25 is typically a few cents.
Export integrations
- CSV / JSON / Excel / HTML (native Apify dataset download)
- Google Sheets (via Apify integration)
- Webhooks on each run
- S3 / GCS direct sync
- Zapier / Make.com / n8n
Related scrapers
| Source | Actor | Data |
|---|---|---|
| GitHub Trending (this) | Trending repos | Open-source signal |
| GitHub Profile Scraper | Developer profiles + repos | Recruiting / OSS-strategy |
| GitHub Issues Scraper | Issues / PRs | Project-health diagnostics |
| NPM Package Scraper | Package metadata | JS ecosystem |
| Hacker News Scraper | Tech discussion | Tech discourse |
| Public APIs Directory | Free API catalog | API discovery |
| arXiv Paper Scraper | Research papers | Academic |
All 31 published actors are free to inspect on the Apify Store.
Custom scraping — pilot tiers
Need a developers-tab scrape, breakout-repo alert pipeline, multi-window aggregator, or competitor-OSS dashboard? Three tiers:
- Pilot — $97 · 1 actor, basic config, 7-day support. Good entry point — useful for a one-off "top trending in $LANGUAGE this month" report.
- Standard — $297 · custom actor + Slack/email alerts on results, 30-day support. Most DevRel and competitor-OSS projects fit here.
- Premium — $797 · custom actor + dashboard + 90-day support + 1 modification round. For ongoing pipelines (daily breakout-detection rollup, language-cohort competitor tracking, multi-source enrichment with Profile + Issues + NPM).
Email: spinov001@gmail.com — drop the language list and the schema you need; quote within 48h.
Proof of work: 31 published Apify scrapers (78 total in portfolio) — Trustpilot 949 runs, Reddit 80+, Google News 43, Glassdoor 37, Email Extractor 36+. Recently delivered a paid 3-article series for a client in the proxy industry ($150).
More tips: t.me/scraping_ai · blog.spinov.online
Disclaimer
Scrapes the publicly accessible github.com/trending and github.com/search pages. maxConcurrency=5 keeps the request rate polite. Not affiliated with GitHub, Inc. or Microsoft Corporation.
Honest disclosure: TRENDING records have 12 fields, SEARCH records have 10 (`periodStars`/`forks`/`contributors` are TRENDING-only; `topics`/`updatedAt` are SEARCH-only). No developers-tab scrape, no trending_rank, no license, no primary_color, no has_sponsorship. `maxReposPerSource` is currently a dead parameter — destructured but never used; the only enforced limit is the `maxRequestsPerCrawl: 200` HTTP-request ceiling. Search-query pagination is unbounded subject to that ceiling. No proxy, no GitHub auth; scrapes public HTML only.