MCP Server Directory Scraper: MCP Discovery Export
Pricing
$10.00/month + usage
Developer
Kris Jensen
Maintained by Community
Aggregate public MCP (Model Context Protocol) server, ChatGPT app, and Claude connector listings from Glama.ai, PulseMCP, mcp.so, and MCP App Store into one deduplicated dataset. Use it to shortlist servers, compare sources, collect GitHub URLs, and optionally add quality signals from GitHub and npm.
Part of the AI Directory Scraper Suite — see also TAAFT Scraper, Futurepedia Scraper, and TopAI.tools Scraper.
What does MCP Server Directory Scraper do?
This Actor scrapes public MCP ecosystem listings from four major sources:
- Glama.ai — servers with structured JSON-LD data, tools, categories, and FAQ
- PulseMCP — servers with author attribution and related server links
- mcp.so — servers with GitHub URLs, key features, and innovation flags
- MCP App Store — ChatGPT apps, Claude connectors, remote MCP surfaces, auth/transport metadata, and connect links
It extracts listing metadata, deduplicates entries that appear across multiple sources, and merges data from all sources into enriched, unified records.
These sources are changing quickly. On 2026-05-20, Glama showed 23,963 servers and PulseMCP showed 15,445 servers. On 2026-05-28, MCP App Store exposed 620 public app/connector listings. Use maxServers to control the run size; deduped output depends on source availability, search terms, and the limit you choose.
Why use this Actor?
- Unified dataset: Cross-reference MCP servers, ChatGPT apps, and Claude connectors across four sources in one API call
- Deduplication: Automatically detects duplicate servers and merges their data
- Optional quality scoring: Composite 0–100 score per server based on GitHub activity, documentation, and completeness
- Enriched records: Combines tools, categories, descriptions, and metadata from all sources
- 4 flexible modes: Scrape everything, target one source, search by keyword, or provide specific URLs
- Production-ready: Residential proxy support, retry logic, and human-friendly error messages
- Part of a suite: Combine with our Futurepedia and TAAFT scrapers for AI tool and MCP ecosystem research
Best first runs
MCP directories are noisy and move quickly. The fastest way to get value is a focused shortlist:
| Use case | Mode | Suggested input |
|---|---|---|
| Browser automation servers | search | browser, maxServers: 25, deduplicate: true |
| GitHub/data servers | search | github or database, maxServers: 25, deduplicate: true |
| Single-source comparison | source | glama, maxServers: 50 |
| ChatGPT/Claude app scan | source | mcpapp, maxServers: 50 |
| Quality-filtered shortlist | all | maxServers: 50, computeQualityScores: true |
For quality scoring at scale, add a GitHub token as a secret environment variable so the run does not stall on GitHub's unauthenticated rate limit.
Quality Scoring
MCP Server Directory Scraper can compute a composite quality score (0–100) for every server it extracts. Use this to filter noisy MCP listings into a smaller shortlist for manual review.
What quality scoring measures
Each server is scored across four dimensions, plus an npm bonus:
| Dimension | Max Points | Signals used |
|---|---|---|
| Popularity | 25 | GitHub stars, GitHub forks |
| Activity | 30 | Days since last commit, not archived, not a fork |
| Documentation | 25 | Has description, FAQ entries, categories, tools list |
| Completeness | 20 | GitHub URL present, npm URL present, author attribution |
| npm bonus | +10 | Weekly npm download volume (capped at 10 points) |
Additional fields exported alongside the score:
- githubStars, githubForks: raw popularity signals
- daysSinceLastCommit: days since the last commit (lower = more active)
- isArchived: whether the repository has been archived (abandoned)
- isFork: whether the repo is a fork rather than an original project
- hasLicense, licenseType: open-source license presence and type
- npmWeeklyDownloads: weekly installs from npm (if the server is published as a package)
Servers without a GitHub URL receive a qualityScore of 0 and are still included in results.
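As a rough illustration of how the per-dimension caps in the table could combine into one number, the sketch below clamps each dimension to its maximum, adds the npm bonus, and caps the total at 100. The helper name `combine_score`, the clamping, and the final cap are assumptions made for this example; the Actor's actual formula is not documented here.

```python
# Maximum points per dimension, copied from the table above.
MAX_POINTS = {"popularity": 25, "activity": 30, "documentation": 25, "completeness": 20}
NPM_BONUS_CAP = 10


def combine_score(breakdown: dict, npm_bonus: float = 0) -> int:
    """Hypothetical helper: sum clamped dimension points plus a capped npm bonus."""
    total = 0.0
    for dimension, cap in MAX_POINTS.items():
        points = breakdown.get(dimension) or 0
        total += min(max(points, 0), cap)
    total += min(max(npm_bonus, 0), NPM_BONUS_CAP)
    # Assumption: the result is capped at 100 so it stays in the 0-100 range.
    return int(min(round(total), 100))


example = {"popularity": 22, "activity": 28, "documentation": 20, "completeness": 12}
print(combine_score(example))  # 82
```

The same breakdown shape appears in the qualityBreakdown output field, so a helper like this can also be used to sanity-check exported records.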
How to enable quality scoring
Set computeQualityScores to true in the actor input. Each server requires up to 6 GitHub API calls to compute its score.
A GitHub personal access token is strongly recommended. Without one, the GitHub API is rate-limited to 60 requests/hour, which limits quality scoring to roughly 10 servers/hour. With a token (no special scopes needed), the limit is 5,000 requests/hour.
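The throughput figures above follow from simple division. The sketch below reproduces that arithmetic using the worst case of 6 calls per server; the function names are illustrative only.

```python
CALLS_PER_SERVER = 6         # upper bound stated above
UNAUTHENTICATED_LIMIT = 60   # GitHub requests per hour without a token
AUTHENTICATED_LIMIT = 5_000  # GitHub requests per hour with a token


def servers_per_hour(requests_per_hour: int, calls_per_server: int = CALLS_PER_SERVER) -> int:
    """Worst-case number of servers that can be scored in one hour."""
    return requests_per_hour // calls_per_server


def hours_to_score(server_count: int, requests_per_hour: int) -> float:
    """Worst-case hours of rate-limit budget needed to score server_count servers."""
    return server_count * CALLS_PER_SERVER / requests_per_hour


print(servers_per_hour(UNAUTHENTICATED_LIMIT))     # 10
print(servers_per_hour(AUTHENTICATED_LIMIT))       # 833
print(hours_to_score(500, UNAUTHENTICATED_LIMIT))  # 50.0
print(hours_to_score(500, AUTHENTICATED_LIMIT))    # 0.6
```

In other words, scoring 500 servers without a token could need up to 50 hours of rate-limit budget, versus well under an hour with one.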
For best security, set your token as a Secret environment variable named GITHUB_TOKEN in the actor's settings. The githubToken input exists for convenience, but secret environment variables are safer.
{"mode": "all","maxServers": 500,"computeQualityScores": true,"githubToken": ""}
How much does it cost?
This Actor runs on the Apify platform. Usage costs depend on how many servers you scrape:
| Servers | Approx. Cost | Time |
|---|---|---|
| 5 | ~$0.01 | ~30s |
| 50 | ~$0.05 | ~2 min |
| 500 | ~$0.50 | ~15 min |
| 5,000 | ~$5.00 | ~1 hr |
Costs are based on Apify platform compute units (CU). Actual costs may vary based on proxy usage and retry rates.
Cost warning for large scrapes: Large crawls across all sources use significant compute and proxy bandwidth. Start with a smaller maxServers value (5-50) to validate your use case before scaling up.
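For budgeting between the rows of the table, a simple interpolation is usually enough. The sketch below is an estimate only: it interpolates linearly between the approximate figures above and, as an assumption, extrapolates beyond 5,000 servers at the last row's per-server rate. It does not account for proxy usage or retries.

```python
# (servers, approximate USD) pairs copied from the cost table above.
COST_POINTS = [(5, 0.01), (50, 0.05), (500, 0.50), (5000, 5.00)]


def estimate_cost(max_servers: int) -> float:
    """Rough cost estimate in USD by linear interpolation between table rows."""
    if max_servers <= COST_POINTS[0][0]:
        return COST_POINTS[0][1]
    for (n0, c0), (n1, c1) in zip(COST_POINTS, COST_POINTS[1:]):
        if max_servers <= n1:
            fraction = (max_servers - n0) / (n1 - n0)
            return round(c0 + fraction * (c1 - c0), 4)
    # Assumption: beyond the last row, cost grows at the last row's per-server rate.
    last_n, last_c = COST_POINTS[-1]
    return round(max_servers * last_c / last_n, 4)


for n in (5, 50, 275, 10_000):
    print(n, estimate_cost(n))
```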
Input
The Actor accepts the following input parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| mode | enum | all | Scraping mode: all, source, urls, or search |
| source | enum | glama | Directory to scrape (only used with source mode) |
| urls | array | [] | Specific server page URLs (only used with urls mode) |
| searchQuery | string | "" | Keyword to search for (only used with search mode) |
| maxServers | integer | 5 | Maximum number of servers to return |
| deduplicate | boolean | true | Merge duplicate servers found across directories |
| computeQualityScores | boolean | false | Enrich each server with a 0–100 quality score using GitHub and npm data |
| githubToken | string | "" | GitHub personal access token for quality scoring (enables 5,000 req/hr vs. 60/hr without) |
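When inputs are generated by a script, it can help to check them before starting a run. The sketch below assembles an input dict from the parameters in the table and rejects combinations that would not make sense, such as urls mode without URLs. `build_run_input` is a hypothetical client-side helper, not part of the Actor, and the checks are assumptions about sensible usage rather than the Actor's own validation. The githubToken input is left out on purpose, in favour of the GITHUB_TOKEN secret environment variable.

```python
VALID_MODES = {"all", "source", "urls", "search"}
VALID_SOURCES = {"glama", "pulsemcp", "mcpso", "mcpapp"}


def build_run_input(mode="all", source="glama", urls=None, search_query="",
                    max_servers=5, deduplicate=True, compute_quality_scores=False):
    """Hypothetical helper: assemble and sanity-check an Actor input dict."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if max_servers < 1:
        raise ValueError("maxServers must be a positive integer")

    run_input = {
        "mode": mode,
        "maxServers": max_servers,
        "deduplicate": deduplicate,
        "computeQualityScores": compute_quality_scores,
    }
    # Only attach the parameter that the chosen mode actually uses.
    if mode == "source":
        if source not in VALID_SOURCES:
            raise ValueError(f"source must be one of {sorted(VALID_SOURCES)}")
        run_input["source"] = source
    elif mode == "urls":
        if not urls:
            raise ValueError("urls mode needs at least one URL")
        run_input["urls"] = list(urls)
    elif mode == "search":
        if not search_query.strip():
            raise ValueError("search mode needs a non-empty searchQuery")
        run_input["searchQuery"] = search_query.strip()
    return run_input


print(build_run_input(mode="search", search_query="browser", max_servers=25))
```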
Input modes
all mode — Scrape from all supported sources simultaneously. Results are interleaved across sources for balanced coverage. Deduplication merges servers that appear in multiple directories.
source mode — Scrape from a single source only. Choose from glama, pulsemcp, mcpso, or mcpapp. Useful when you only need data from one directory or want to compare directories independently.
urls mode — Provide specific server page URLs from any supported directory. Useful for monitoring specific servers or enriching an existing dataset. URLs from different directories can be mixed freely.
search mode — Search across all directories by keyword. Matches against URL slugs (server names in the URL path). For example, searching "puppeteer" finds all servers with "puppeteer" in their URL.
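To make the slug-matching behaviour of search mode concrete, here is a minimal sketch of a case-insensitive match against the URL path. It is a simplification under stated assumptions: it checks the whole path rather than an isolated slug, and the Actor's real matching logic may differ.

```python
from urllib.parse import urlparse


def slug_matches(url: str, query: str) -> bool:
    """Return True when the query appears in the URL path (case-insensitive)."""
    needle = query.strip().lower()
    if not needle:
        return False
    return needle in urlparse(url).path.lower()


urls = [
    "https://glama.ai/mcp/servers/@anthropics/mcp-server-puppeteer",
    "https://www.pulsemcp.com/servers/playwright-mcp-server",
    "https://mcp.so/server/puppeteer/anthropics",
]
print([u for u in urls if slug_matches(u, "puppeteer")])
```

Because matching is on the URL rather than the description, a server that mentions a keyword only in its description would not be found this way.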
Example input
Scrape 50 servers from all directories:
{"mode": "all","maxServers": 50,"deduplicate": true}
Search for browser automation MCP servers:
{"mode": "search","searchQuery": "browser","maxServers": 10}
Scrape specific server pages:
{"mode": "urls","urls": ["https://glama.ai/mcp/servers/@anthropics/mcp-server-puppeteer","https://www.pulsemcp.com/servers/playwright-mcp-server","https://mcp.so/server/puppeteer/anthropics","https://mcpapp.net/app/github"]}
Only scrape from MCP App Store:
{"mode": "source","source": "mcpapp","maxServers": 100}
Output
Each server record contains the following fields:
| Field | Type | Description |
|---|---|---|
| name | string | Server name |
| author | string | Author/organization name |
| authorUrl | string | Author profile URL |
| description | string | Full description (up to 1,000 chars) |
| shortDescription | string | Brief description from meta tags |
| slug | string | URL slug identifier |
| githubUrl | string | GitHub repository URL |
| npmUrl | string | npm package URL |
| serverUrl | string | Server homepage URL |
| appType | string | Listing type when available, such as app, connector, or server |
| platformSurfaces | array | Where the listing is available, such as ChatGPT App or Claude Connector |
| connectUrls | object | Direct connect/install URLs by platform surface |
| capabilities | array | Capability labels such as Reads, Writes, Interactive, Claude, or Remote MCP |
| privacyPolicyUrl | string | Privacy policy URL when published by the source |
| termsUrl | string | Terms of service URL when published by the source |
| supportUrl | string | Support URL or email when published by the source |
| version | string | Version when published by the source |
| transport | string | MCP transport when published by the source |
| authType | string | Auth type when published by the source |
| tools | array | List of MCP tool names exposed by the server |
| categories | array | Server categories/tags (e.g., "Python", "TypeScript", "Remote") |
| keyFeatures | array | Key features list |
| faq | array | FAQ entries (question + answer) |
| relatedServers | array | Names of related servers |
| relatedApps | array | Related app/connector listings when published by the source |
| relatedCollections | array | Related MCP App Store collections when published |
| iconUrl | string | Server icon/logo URL |
| sources | array | Which directories list this server (e.g., ["glama", "mcpso"]) |
| sourceUrls | object | URL for this server in each directory |
| qualityScore | number | Composite quality score 0–100 (when computeQualityScores is enabled) |
| qualityBreakdown | object | Score breakdown by dimension (popularity, activity, documentation, completeness) |
| qualityNote | string | Human-readable quality summary |
| githubStars | integer | GitHub star count |
| githubForks | integer | GitHub fork count |
| daysSinceLastCommit | integer | Days since last commit |
| isArchived | boolean | Whether the repository is archived |
| isFork | boolean | Whether the repository is a fork |
| hasLicense | boolean | Whether a license file is present |
| licenseType | string | License name (MIT, Apache-2.0, etc.) |
| npmWeeklyDownloads | integer | Weekly npm installs |
| scrapedAt | string | ISO timestamp of when data was collected |
Not all fields are available from every directory. Fields are null when the source directory does not provide that data. Quality score fields are null when computeQualityScores is false.
Example output
{"name": "Puppeteer MCP Server","author": "anthropics","authorUrl": "https://github.com/anthropics","description": "A Model Context Protocol server that provides browser automation capabilities using Puppeteer...","slug": "@anthropics/mcp-server-puppeteer","githubUrl": "https://github.com/anthropics/mcp-server-puppeteer","tools": ["puppeteer_navigate", "puppeteer_screenshot", "puppeteer_click", "puppeteer_fill", "puppeteer_evaluate"],"categories": ["TypeScript", "Remote"],"sources": ["glama", "mcpso", "pulsemcp"],"sourceUrls": {"glama": "https://glama.ai/mcp/servers/@anthropics/mcp-server-puppeteer","mcpso": "https://mcp.so/server/puppeteer/anthropics","pulsemcp": "https://www.pulsemcp.com/servers/puppeteer-mcp-server"},"qualityScore": 82,"qualityBreakdown": { "popularity": 22, "activity": 28, "documentation": 20, "completeness": 12 },"qualityNote": "High quality: active repo, well-documented, 3,400+ stars","githubStars": 3421,"githubForks": 298,"daysSinceLastCommit": 14,"isArchived": false,"isFork": false,"hasLicense": true,"licenseType": "MIT","npmWeeklyDownloads": 8200,"scrapedAt": "2026-02-18T03:28:00.000Z"}
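Records shaped like the example above can be filtered into a review shortlist with a few lines of code. The sketch below skips records whose quality fields are null, drops archived repositories, and sorts by score. The threshold values are arbitrary illustrations, not recommendations from the Actor.

```python
def shortlist(records, min_score=60, max_days_since_commit=180):
    """Keep scored, non-archived, recently active records, best score first."""
    kept = []
    for record in records:
        score = record.get("qualityScore")
        days = record.get("daysSinceLastCommit")
        if score is None or days is None:
            continue  # quality fields are null when scoring was disabled or unavailable
        if record.get("isArchived"):
            continue
        if score < min_score or days > max_days_since_commit:
            continue
        kept.append(record)
    return sorted(kept, key=lambda r: r["qualityScore"], reverse=True)


sample = [
    {"name": "Puppeteer MCP Server", "qualityScore": 82, "daysSinceLastCommit": 14, "isArchived": False},
    {"name": "Unscored listing", "qualityScore": None, "daysSinceLastCommit": None, "isArchived": None},
    {"name": "Archived server", "qualityScore": 75, "daysSinceLastCommit": 30, "isArchived": True},
]
print([r["name"] for r in shortlist(sample)])  # ['Puppeteer MCP Server']
```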
Use cases
- MCP ecosystem research: Analyze the MCP server landscape — track growth, identify categories, find trending servers
- Quality filtering: Use quality fields to shortlist servers for manual review
- Competitive analysis: Monitor competing MCP servers across all major directories
- Directory building: Build your own MCP directory or comparison tool using pre-aggregated data
- Server discovery: Find MCP servers by category, author, keyword, or technology stack
- Data enrichment: Cross-reference servers across directories for the most complete profiles
- Change monitoring: Track specific servers with scheduled runs to detect updates
- AI ecosystem mapping: Combine with Futurepedia Scraper and TAAFT Scraper for AI tool + MCP server research
Supported directories
| Directory | URL | Servers | Unique data |
|---|---|---|---|
| Glama.ai | glama.ai | 23,963 shown 2026-05-20 | Tools, categories, FAQ, JSON-LD structured data |
| PulseMCP | pulsemcp.com | 15,445 shown 2026-05-20 | Author info, related servers, server.json detection |
| mcp.so | mcp.so | public sitemap based | Key features, innovation flags, DXT flags, icons |
| MCP App Store | mcpapp.net | 620 public app/connector listings found 2026-05-28 | ChatGPT/Claude surfaces, connect links, auth, transport, support URLs |
Data availability by source
| Field | Glama | PulseMCP | mcp.so | MCP App Store |
|---|---|---|---|---|
| Name | Yes | Yes | Yes | Yes |
| Description | Yes | Yes | Yes | Yes |
| Author | Yes | Yes | Yes | Yes |
| GitHub URL | Yes | Yes | Yes | Sometimes |
| Website URL | Yes | No | No | Yes |
| Tools | Yes | No | Yes | Sometimes |
| Categories | Yes | Yes | No | Yes |
| FAQ | Yes | No | No | Yes |
| Related servers/apps | No | Yes | No | Yes |
| Platform surfaces | No | No | No | Yes |
| Auth/transport | No | No | No | Yes |
| Key features | No | No | Yes | No |
| Icon URL | No | No | Yes | Yes |
Integrations
This Actor works with standard Apify integrations:
- API — Call via REST API or Apify client libraries (Python, JavaScript, Go)
- Webhooks — Get notified when scraping completes
- Scheduler — Run on a schedule for fresh data (daily, weekly, etc.)
- Zapier / Make — Connect to 1,000+ apps for automated workflows
- Google Sheets — Export directly to spreadsheets
- Slack / Email — Send notifications when new servers are found
- Datasets — Browse and download results in JSON, CSV, XML, or Excel
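As one possible way to use the API integration from Python, the sketch below starts a run with the third-party `apify-client` package and reads the resulting dataset. `ACTOR_ID` is a placeholder (copy the real ID from the Actor's page), the helper names are illustrative, and the client calls reflect the package as commonly documented, so check them against the version you install. The second helper builds the public REST URL for downloading dataset items in a chosen format.

```python
from urllib.parse import urlencode

ACTOR_ID = "<username>/<actor-name>"  # placeholder: replace with the real Actor ID


def run_and_fetch(token: str, run_input: dict, actor_id: str = ACTOR_ID) -> list:
    """Start a run, wait for it to finish, and return the dataset items."""
    from apify_client import ApifyClient  # third-party: pip install apify-client

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


def dataset_items_url(dataset_id: str, fmt: str = "json") -> str:
    """Build the REST URL for downloading dataset items (json, csv, xml, xlsx, ...)."""
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{urlencode({'format': fmt})}"


print(dataset_items_url("<dataset-id>", "csv"))
```

A typical call would look like `run_and_fetch(token, {"mode": "search", "searchQuery": "browser", "maxServers": 10})`, with the token read from an environment variable rather than hard-coded.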
Tips and best practices
- Start small: Begin with maxServers: 5-50 to validate output format and quality before scaling up.
- Use search mode for targeted data collection instead of scraping everything. It's faster and cheaper.
- Enable deduplication (default: on) when scraping from all sources to avoid duplicate records.
- Schedule weekly runs to maintain a fresh dataset. MCP directories add new servers daily.
- For quality scoring at scale: Always provide a githubToken. Without it, the 60 req/hr GitHub rate limit will throttle scoring to ~10 servers/hr.
- Combine with our other scrapers: Use alongside Futurepedia Scraper and TAAFT Scraper to build a comprehensive AI ecosystem dataset covering both AI tools and MCP servers.
FAQ
Q: How fresh is the data?
A: Data is scraped in real time from all supported sources when the Actor runs. There is no caching.
Q: Can I scrape every server?
A: Use a high maxServers value only after a small validation run. The source directories are large and changing, so full crawls can take a few hours and may cost materially more than a focused search or source run.
Q: Why are some fields null?
A: Not all directories provide the same data. For example, Glama provides structured FAQ data, mcp.so provides innovation flags, PulseMCP provides related servers, and MCP App Store provides ChatGPT/Claude surface metadata. See the "Data availability by source" table above.
Q: How does deduplication work?
A: The Actor matches servers across directories by GitHub URL (primary) or normalized name + author (fallback). When duplicates are found, it merges data from all sources, keeping the richest value for each field. The sources array shows which directories had a listing for each server.
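The matching and merging described in this answer can be sketched roughly as follows. This is an illustration, not the Actor's code: the key normalization, the rule that "richest" means the non-empty or longer value, and the helper names are all assumptions. One known limitation of this sketch is that a listing with a GitHub URL and a listing of the same server without one get different keys and are not merged.

```python
import re


def dedup_key(record: dict) -> str:
    """Primary key: normalized GitHub URL. Fallback: normalized name + author."""
    url = (record.get("githubUrl") or "").lower().rstrip("/")
    url = re.sub(r"^https?://(www\.)?", "", url)
    if url.endswith(".git"):
        url = url[:-4]
    if url:
        return "gh:" + url
    name = re.sub(r"[^a-z0-9]+", "", (record.get("name") or "").lower())
    author = re.sub(r"[^a-z0-9]+", "", (record.get("author") or "").lower())
    return f"na:{name}|{author}"


def richer(a, b):
    """Assumption: prefer the non-empty value; between two, keep the longer one."""
    if a in (None, "", [], {}):
        return b
    if b in (None, "", [], {}):
        return a
    try:
        return b if len(b) > len(a) else a
    except TypeError:  # numbers and booleans have no length; keep the first seen
        return a


def merge_records(records):
    """Merge records that share a dedup key, unioning sources and sourceUrls."""
    merged = {}
    for record in records:
        key = dedup_key(record)
        if key not in merged:
            merged[key] = dict(record)
            continue
        target = merged[key]
        for field, value in record.items():
            if field == "sources":
                target["sources"] = sorted(set(target.get("sources") or []) | set(value or []))
            elif field == "sourceUrls":
                target["sourceUrls"] = {**(target.get("sourceUrls") or {}), **(value or {})}
            else:
                target[field] = richer(target.get(field), value)
    return list(merged.values())
```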
Q: Why does quality scoring run very slowly without a GitHub token?
A: Without a GitHub personal access token, the GitHub API allows only 60 requests per hour. Each scored server needs up to 6 API calls, so the unauthenticated limit caps scoring at roughly 10 servers per hour. Add a token (Settings → Environment Variables → GITHUB_TOKEN) to raise the limit to 5,000 req/hr.
Q: What proxies does this Actor use?
A: It uses Apify residential proxies with automatic fallback to datacenter proxies. This helps keep access reliable across supported sources.
Q: How accurate are the extracted tool names?
A: Tool names are parsed from page structure where the source exposes them. Accuracy is high for well-structured server pages, but some servers/apps may have incomplete or missing tool data. PulseMCP does not expose tool information.
Q: What are the "sources" and "sourceUrls" fields?
A: sources is an array showing which directories list this server (e.g., ["glama", "mcpso"]). sourceUrls is an object mapping each source to the specific URL for that server on that directory. This lets you verify data against the original listings.
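A quick way to see how much the directories overlap is to count the sources field across a deduplicated dataset. The sketch below is a minimal example that assumes records shaped like the output described earlier; the helper name is illustrative.

```python
from collections import Counter


def source_coverage(records):
    """Return (listings per directory, number of records listed in 2+ directories)."""
    per_source = Counter()
    multi_listed = 0
    for record in records:
        sources = set(record.get("sources") or [])
        per_source.update(sources)
        if len(sources) > 1:
            multi_listed += 1
    return dict(per_source), multi_listed


records = [
    {"name": "Puppeteer MCP Server", "sources": ["glama", "mcpso", "pulsemcp"]},
    {"name": "GitHub", "sources": ["mcpapp"]},
]
print(source_coverage(records))
```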
Related Actors
- Futurepedia Scraper — Export Futurepedia AI tool data
- TopAI.tools Scraper — Export TopAI.tools listings and categories
- TAAFT Scraper — Scrape 6,200+ AI tools from TheresAnAIForThat.com
Changelog
- 1.2 — Added MCP App Store as a fourth source with ChatGPT app, Claude connector, auth, transport, surface, connect URL, support URL, related app, and collection metadata.
- 1.1 — Added quality scoring: composite 0–100 score per server using GitHub stars, forks, commit recency, license status, and npm weekly downloads. Updated first-run guidance and current source-count language.
- 1.0 — Initial release with support for Glama.ai, PulseMCP, and mcp.so. Includes 4 scraping modes, cross-directory deduplication, and balanced source interleaving.


