🦊 GitLab Scraper: Projects & Repository Data

Pricing

from $20.00 / 1,000 results

Extract GitLab project data: stars, forks, issues, merge requests, and contributor stats. An alternative to GitHub Stats, Sourcegraph, and OpenHub for dev analytics, OSS intelligence, and engineering dashboards. Pay per project, no token needed.

Developer

Stephan Corbeil

Maintained by Community

🦊 GitLab Scraper: Projects, Merge Requests & Issues vs GitLab API

A pay-per-result GitLab scraper that extracts full project metadata, stars, forks, commit cadence, merge-request stats, issue counts, and CI/CD pipeline history from public GitLab.com and self-hosted GitLab instances. It is built for OSS scouts, dev-tool competitive intelligence, and procurement researchers as a no-token-rotation alternative to the official GitLab REST API (10 req/sec authenticated), the GraphQL API (point budget), Sourcegraph Cloud ($299-1000+/mo), and SaaS aggregators such as LinearB ($25-79/user/mo) and Velocity by Code Climate ($499+/mo).

Why GitLab Scraper Beats the GitLab API, Sourcegraph & LinearB

| Feature | NexGenData GitLab Scraper | GitLab official API | Sourcegraph Cloud | LinearB |
|---|---|---|---|---|
| Cost | $0.002 / project, pay-per-result | Free + rate-limited | $299-1000+ / month | $25-79 / user / month |
| Rate limit | None for end user | 10 req/sec authenticated | Plan-dependent | Plan-dependent |
| Self-hosted GitLab support | Yes (set instance URL) | Yes (with PAT) | Yes (with PAT) | Yes (with PAT) |
| Auth | Apify token + optional GitLab PAT | GitLab PAT required for most ops | Account + PAT | Account + PAT |
| Bulk export | Direct dataset → JSON/CSV/Excel | Per-call REST + pagination | UI + limited API | API |
| Cross-org / cross-instance scan | Yes | No (instance-scoped) | Yes | Yes |
| Free trial | Free Apify credits | Free for low volume | 30-day trial | 30-day trial |

OSS scouts, devtool marketers, and procurement researchers pick this actor instead of building their own GitLab PAT-rotation rig because, beyond 2,000 projects, the official API's per-second cap forces multi-hour scans. It is a drop-in alternative to Sourcegraph Cloud for the "I just need GitLab project metadata for my BI tool" case: the bare data without the UI overhead.

What You Get Per Project

Each dataset item is a flat JSON record:

  • id, namespace, path, web_url, description
  • stars, forks, open_issues, closed_issues, open_mrs, merged_mrs
  • primary_language, language_breakdown
  • topics, license, default_branch, visibility
  • created_at, last_activity_at, last_release_at
  • commits_last_90d, unique_committers_last_90d
  • top_committers: array of {username, name, commits}
  • merge_request_stats: average lead time, throughput, review latency
  • pipeline_success_rate_90d, last_pipeline_status
  • readme_text, archived, mirror, forked_from
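
As a rough illustration of working with one of these records, the sketch below types a small subset of the fields and picks the most active committer. The field types are assumptions inferred from the field names, not a published schema, and the sample record is invented for the example.

```python
from typing import List, Optional, TypedDict


class Committer(TypedDict):
    username: str
    name: str
    commits: int


class ProjectRecord(TypedDict, total=False):
    # Subset of the fields listed above; types are assumed, not documented.
    id: int
    path: str
    stars: int
    forks: int
    commits_last_90d: int
    top_committers: List[Committer]
    last_release_at: Optional[str]


def busiest_committer(record: ProjectRecord) -> Optional[str]:
    """Return the username with the most commits, or None if none are listed."""
    committers = record.get("top_committers") or []
    if not committers:
        return None
    return max(committers, key=lambda c: c["commits"])["username"]


# Hypothetical record, made up for illustration only.
sample: ProjectRecord = {
    "id": 1,
    "path": "example-group/example-project",
    "stars": 42,
    "top_committers": [
        {"username": "alice", "name": "Alice", "commits": 30},
        {"username": "bob", "name": "Bob", "commits": 12},
    ],
}

print(busiest_committer(sample))
```

Using `.get()` with a fallback keeps the helper safe on records where optional fields are missing or null.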

Use Cases

  • OSS scouts: find growing GitLab-hosted projects that don't show up on GitHub trending
  • Devtool marketers: discover teams running self-hosted GitLab for outbound targeting
  • DevOps procurement: benchmark a target company's CI velocity before pitching them
  • Migration consultancies: quantify the scope of a GitLab-to-GitHub migration RFP
  • Open-source maintainers: track fork activity and downstream contributions across instances
  • Investor diligence: verify "X projects on GitLab" claims with raw data
  • Internal-platform teams: audit your own org's GitLab health across thousands of repos

Quick Start

from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient("YOUR_APIFY_TOKEN")

# Start the actor and wait for the run to finish.
run = client.actor("nexgendata/gitlab-scraper").call(run_input={
    "instance": "https://gitlab.com",
    "search": "kubernetes",
    "minStars": 10,
    "maxResults": 1000,
    "includeMergeRequestStats": True,
})

# Read the results from the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["path"], item["stars"], item["commits_last_90d"])

Pricing

Pay-per-event pricing: no PAT-rotation rig required, no monthly minimum.

  • Actor Start: $0.0001
  • Per project: $0.002

A 1,000-project search costs about $2. An equivalent rate-respecting scan through the GitLab API takes 2+ hours of PAT juggling.
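
The arithmetic behind that figure can be sketched as follows. It uses the two rates listed above and assumes a flat Actor Start fee per run; the function name is hypothetical.

```python
ACTOR_START_USD = 0.0001  # "Actor Start" rate from the price list
PER_PROJECT_USD = 0.002   # "Per project" rate from the price list


def estimate_cost(projects: int, runs: int = 1) -> float:
    """Estimate the charge in USD for `projects` results across `runs` actor starts."""
    if projects < 0 or runs < 0:
        raise ValueError("projects and runs must be non-negative")
    return runs * ACTOR_START_USD + projects * PER_PROJECT_USD


print(f"${estimate_cost(1000):.4f}")  # prints $2.0001
```

The start fee is negligible at this scale, so the per-project rate dominates the bill.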

| Use case | Actor |
|---|---|
| GitHub repos + stars + contributors | GitHub Scraper |
| Daily / weekly trending repos | GitHub Trending Scraper |
| Deep stargazer-history analytics | GitHub Repo Stats |
| Docker Hub image pull counts | Docker Hub Scraper |
| Dev.to articles & dev audience | Dev.to Scraper |
| Developer tools MCP server | Developer Tools MCP Server |
| Company tech-stack detector | Company Tech Stack Detector |
| StackOverflow Q&A trends | StackOverflow Questions |

FAQ

Q: Does this work with self-hosted GitLab? Yes. Set instance: "https://gitlab.yourcompany.com" and supply a PAT in gitlabToken. The actor uses the same API surface across self-hosted instances and gitlab.com.

Q: Does this need a GitLab PAT? Only for self-hosted, private projects, or higher request volumes. Public gitlab.com searches work without one.
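
A minimal sketch of how the two cases might be handled in one helper: the `gitlabToken` field is only added when a PAT is available. The helper name, its defaults, and the `GITLAB_PAT` environment variable are assumptions made for this example; the input field names are the ones used elsewhere on this page.

```python
import os
from typing import Any, Dict, Optional


def build_run_input(
    instance: str,
    search: str,
    gitlab_token: Optional[str] = None,
    max_results: int = 100,
) -> Dict[str, Any]:
    """Build an actor input dict, including gitlabToken only when one is given."""
    run_input: Dict[str, Any] = {
        "instance": instance,
        "search": search,
        "maxResults": max_results,
    }
    if gitlab_token:
        run_input["gitlabToken"] = gitlab_token
    return run_input


# Public gitlab.com search: no PAT supplied.
public_input = build_run_input("https://gitlab.com", "kubernetes")

# Self-hosted instance: PAT read from a hypothetical environment variable.
private_input = build_run_input(
    "https://gitlab.yourcompany.com",
    "kubernetes",
    gitlab_token=os.environ.get("GITLAB_PAT"),
)
```

Either dict can then be passed as `run_input` to the client call shown in Quick Start.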

Q: How does this compare to the GitLab GraphQL API? GraphQL gives you per-call efficiency but ties you to a point budget. This actor flattens the data, runs the multi-call dance for you, and returns one row per project.

Q: Can I get pipeline history? Yes. Setting includePipelineHistory: true adds the last 90 days of pipeline runs per project.

Q: Is the schema stable? Field names are versioned per actor release. We track GitLab major API versions.

Q: What about merge-request review-latency metrics? merge_request_stats includes avg_lead_time_hours, avg_review_latency_hours, and throughput_per_week. These are computed from the public MR events stream.
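
As one way to use these metrics, the sketch below ranks projects by average review latency. It assumes `merge_request_stats` is a nested object keyed by the metric names above and may be absent on some records; the sample items are invented for illustration.

```python
from typing import Any, Dict, List


def slowest_reviews(items: List[Dict[str, Any]], top_n: int = 3) -> List[Dict[str, Any]]:
    """Return up to top_n items with the highest avg_review_latency_hours."""
    def latency(item: Dict[str, Any]):
        # Tolerate a missing or null merge_request_stats object.
        return (item.get("merge_request_stats") or {}).get("avg_review_latency_hours")

    with_stats = [item for item in items if latency(item) is not None]
    return sorted(with_stats, key=latency, reverse=True)[:top_n]


# Hypothetical dataset items, made up for this example.
items = [
    {"path": "group/a", "merge_request_stats": {"avg_review_latency_hours": 5.0}},
    {"path": "group/b", "merge_request_stats": {"avg_review_latency_hours": 48.5}},
    {"path": "group/c"},  # no MR stats collected
    {"path": "group/d", "merge_request_stats": {"avg_review_latency_hours": 12.0}},
]

for item in slowest_reviews(items, top_n=2):
    print(item["path"], item["merge_request_stats"]["avg_review_latency_hours"])
```

Records without MR stats are skipped rather than treated as zero latency, so they do not distort the ranking.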

About NexGenData

NexGenData publishes 260+ buyer-intent actors covering SEC filings, YC alumni, lead generation, competitive intelligence, stock fundamentals across 30+ exchanges, and more. All pay-per-result. Browse the full catalog at https://apify.com/nexgendata?fpr=2ayu9b


How NexGenData Pricing Works

Every NexGenData actor uses pay-per-event pricing: you only pay for results that actually land in your dataset. No monthly minimum, no seat fees, no surprise overage bills.

  • Actor Start: a single-event charge each time you spin the actor up (scaled to memory size)
  • Result / item: charged per item written to the default dataset
  • No charge for retries, internal proxy rotation, or failed sub-requests; those are absorbed by the platform

Apify Platform Bonus

New to Apify? Sign up with the NexGenData referral link: you get free platform credits on signup (enough for several thousand free results) and you help fund the maintenance of this actor fleet.

Integration Surface

Every actor in the NexGenData catalog can be triggered from:

  • Apify console: point-and-click run
  • Apify API: REST + webhooks
  • Apify Python / JS SDKs: programmatic batch
  • Zapier, Make.com, n8n: official integrations
  • MCP: many actors are exposed as MCP tools for Claude / ChatGPT / Cursor agents
  • Schedules: built-in cron for daily / weekly / monthly runs
  • Webhooks: POST results to any HTTPS endpoint on dataset write
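
For the REST route, a minimal sketch using only the standard library might look like the following. It assumes the Apify API v2 "run actor" endpoint (`/v2/acts/{actorId}/runs`, with `~` separating username and actor name) and bearer-token authentication; the helper name is hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that would start an actor run."""
    # In API paths the actor ID is written as username~actor-name.
    url = f"{API_BASE}/acts/{actor_id.replace('/', '~')}/runs"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


req = build_run_request(
    "nexgendata/gitlab-scraper",
    "YOUR_APIFY_TOKEN",
    {"instance": "https://gitlab.com", "search": "kubernetes", "maxResults": 10},
)
print(req.get_method(), req.full_url)

# To actually start the run, pass the request to urllib.request.urlopen(req).
```

Keeping request construction separate from sending makes it easy to inspect or log the call before any credits are spent.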

Support

NexGenData maintains 260+ Apify actors and ships updates regularly. Bug reports via the Apify console issues tab get a response within 24 hours. Roadmap requests are welcome; high-demand features ship in the next version.

Home: thenextgennexus.com
Full catalog: apify.com/nexgendata