MCP Server Directory Scraper: MCP Discovery Export

Pricing

$10.00/month + usage

Developer

Kris Jensen

Maintained by Community

Aggregate public MCP (Model Context Protocol) server, ChatGPT app, and Claude connector listings from Glama.ai, PulseMCP, mcp.so, and MCP App Store into one deduplicated dataset. Use it to shortlist servers, compare sources, collect GitHub URLs, and optionally add quality signals from GitHub and npm.

Part of the AI Directory Scraper Suite — see also TAAFT Scraper, Futurepedia Scraper, and TopAI.tools Scraper.

What does MCP Server Directory Scraper do?

This Actor scrapes public MCP ecosystem listings from four major sources:

  • Glama.ai — servers with structured JSON-LD data, tools, categories, and FAQ
  • PulseMCP — servers with author attribution and related server links
  • mcp.so — servers with GitHub URLs, key features, and innovation flags
  • MCP App Store — ChatGPT apps, Claude connectors, remote MCP surfaces, auth/transport metadata, and connect links

It extracts listing metadata, deduplicates entries that appear across multiple sources, and merges data from all sources into enriched, unified records.

These sources are changing quickly. On 2026-05-20, Glama showed 23,963 servers and PulseMCP showed 15,445 servers. On 2026-05-28, MCP App Store exposed 620 public app/connector listings. Use maxServers to control the run size; deduped output depends on source availability, search terms, and the limit you choose.

Why use this Actor?

  • Unified dataset: Cross-reference MCP servers, ChatGPT apps, and Claude connectors across four sources in one API call
  • Deduplication: Automatically detects duplicate servers and merges their data
  • Optional quality scoring: Composite 0–100 score per server based on GitHub activity, documentation, and completeness
  • Enriched records: Combines tools, categories, descriptions, and metadata from all sources
  • 4 flexible modes: Scrape everything, target one source, search by keyword, or provide specific URLs
  • Production-ready: Residential proxy support, retry logic, and human-friendly error messages
  • Part of a suite: Combine with our Futurepedia and TAAFT scrapers for AI tool and MCP ecosystem research

Best first runs

MCP directories are noisy and move quickly. The fastest way to get value is a focused shortlist:

| Use case | Mode | Suggested input |
|---|---|---|
| Browser automation servers | search | searchQuery: browser, maxServers: 25, deduplicate: true |
| GitHub/data servers | search | searchQuery: github (or database), maxServers: 25, deduplicate: true |
| Single-source comparison | source | source: glama, maxServers: 50 |
| ChatGPT/Claude app scan | source | source: mcpapp, maxServers: 50 |
| Quality-filtered shortlist | all | maxServers: 50, computeQualityScores: true |

For quality scoring at scale, add a GitHub token as a secret environment variable so the run does not stall on GitHub's unauthenticated rate limit.

Quality Scoring

MCP Server Directory Scraper can compute a composite quality score (0–100) for every server it extracts. Use this to filter noisy MCP listings into a smaller shortlist for manual review.

What quality scoring measures

Each server is scored across four dimensions, plus an npm bonus:

| Dimension | Max points | Signals used |
|---|---|---|
| Popularity | 25 | GitHub stars, GitHub forks |
| Activity | 30 | Days since last commit, not archived, not a fork |
| Documentation | 25 | Has description, FAQ entries, categories, tools list |
| Completeness | 20 | GitHub URL present, npm URL present, author attribution |
| npm bonus | +10 | Weekly npm download volume (capped at 10 points) |

Additional fields exported alongside the score:

  • githubStars, githubForks — raw popularity signals
  • daysSinceLastCommit — days since the last commit (lower = more active)
  • isArchived — whether the repository has been archived (abandoned)
  • isFork — whether the repo is a fork rather than an original project
  • hasLicense, licenseType — open-source license presence and type
  • npmWeeklyDownloads — weekly installs from npm (if the server is published as a package)

Servers without a GitHub URL receive a qualityScore of 0 and are still included in results.
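
The per-dimension caps above can be combined as in the following sketch. It is illustrative only: the Actor's real formula is not published here, and both the function name `combine_score` and the final clamp to 100 are assumptions made for this example.

```python
# Maximum points per dimension, taken from the scoring table.
MAX_POINTS = {"popularity": 25, "activity": 30, "documentation": 25, "completeness": 20}
NPM_BONUS_CAP = 10


def combine_score(breakdown, npm_bonus=0, has_github_url=True):
    """Combine per-dimension points into a 0-100 score (hypothetical helper)."""
    if not has_github_url:
        return 0  # documented behaviour: no GitHub URL means a score of 0
    total = 0
    for dimension, cap in MAX_POINTS.items():
        points = breakdown.get(dimension, 0)
        total += max(0, min(points, cap))  # never exceed the dimension's cap
    total += max(0, min(npm_bonus, NPM_BONUS_CAP))
    return min(total, 100)  # assumption: the bonus cannot push the score past 100


example = {"popularity": 22, "activity": 28, "documentation": 20, "completeness": 12}
print(combine_score(example))  # 82
```

The sample breakdown is the one from the example output record, whose four dimensions sum to its qualityScore of 82.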

How to enable quality scoring

Set computeQualityScores to true in the actor input. Each server requires up to 6 GitHub API calls to compute its score.

A GitHub personal access token is strongly recommended. Without one, the GitHub API is rate-limited to 60 requests/hour, which limits quality scoring to roughly 10 servers/hour. With a token (no special scopes needed), the limit is 5,000 requests/hour.
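
The throughput figures follow from simple division. The sketch below reproduces that arithmetic under the worst-case assumption that every server uses all 6 calls; the helper names are made up for illustration.

```python
CALLS_PER_SERVER = 6  # "up to 6" per server, so this is the worst case


def servers_per_hour(requests_per_hour, calls_per_server=CALLS_PER_SERVER):
    """Whole servers that can be scored within one hour of API budget."""
    return requests_per_hour // calls_per_server


def hours_to_score(server_count, requests_per_hour):
    """Rough hours needed to score `server_count` servers at a given limit."""
    return server_count / servers_per_hour(requests_per_hour)


print(servers_per_hour(60))      # 10  (unauthenticated)
print(servers_per_hour(5000))    # 833 (with a token)
print(hours_to_score(500, 60))   # 50.0
```

At the unauthenticated limit, a 500-server scoring run would need roughly 50 hours of GitHub budget, which is why a token matters for anything beyond a small sample.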

For best security, set your token as a Secret environment variable named GITHUB_TOKEN in the actor's settings. The githubToken input exists for convenience, but secret environment variables are safer.

{
  "mode": "all",
  "maxServers": 500,
  "computeQualityScores": true,
  "githubToken": ""
}

How much does it cost?

This Actor runs on the Apify platform. Usage costs depend on how many servers you scrape:

| Servers | Approx. cost | Time |
|---|---|---|
| 5 | ~$0.01 | ~30s |
| 50 | ~$0.05 | ~2 min |
| 500 | ~$0.50 | ~15 min |
| 5,000 | ~$5.00 | ~1 hr |

Costs are based on Apify platform compute units (CU). Actual costs may vary based on proxy usage and retry rates.

Cost warning for large scrapes: Large crawls across all sources use significant compute and proxy bandwidth. Start with a smaller maxServers value (5-50) to validate your use case before scaling up.

Input

The Actor accepts the following input parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| mode | enum | all | Scraping mode: all, source, urls, or search |
| source | enum | glama | Directory to scrape (only used with source mode) |
| urls | array | [] | Specific server page URLs (only used with urls mode) |
| searchQuery | string | "" | Keyword to search for (only used with search mode) |
| maxServers | integer | 5 | Maximum number of servers to return |
| deduplicate | boolean | true | Merge duplicate servers found across directories |
| computeQualityScores | boolean | false | Enrich each server with a 0–100 quality score using GitHub and npm data |
| githubToken | string | "" | GitHub personal access token for quality scoring (enables 5,000 req/hr vs. 60/hr without) |
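
Before starting a run, it can help to check an input object against the parameters listed above. The following is a minimal client-side sketch, not the Actor's own validation; `validate_input` is a hypothetical helper, and the accepted source values are the four the Actor documents for source mode.

```python
VALID_MODES = {"all", "source", "urls", "search"}
VALID_SOURCES = {"glama", "pulsemcp", "mcpso", "mcpapp"}


def validate_input(run_input):
    """Return a list of problems found in an input dict (empty list = looks fine)."""
    problems = []
    mode = run_input.get("mode", "all")
    if mode not in VALID_MODES:
        problems.append(f"unknown mode: {mode!r}")
    if mode == "source" and run_input.get("source", "glama") not in VALID_SOURCES:
        problems.append(f"unknown source: {run_input.get('source')!r}")
    if mode == "urls" and not run_input.get("urls"):
        problems.append("urls mode needs at least one URL")
    if mode == "search" and not str(run_input.get("searchQuery", "")).strip():
        problems.append("search mode needs a non-empty searchQuery")
    max_servers = run_input.get("maxServers", 5)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(max_servers, bool) or not isinstance(max_servers, int) or max_servers < 1:
        problems.append("maxServers must be a positive integer")
    return problems


print(validate_input({"mode": "search", "maxServers": 10}))
# ['search mode needs a non-empty searchQuery']
```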

Input modes

all mode — Scrape from all supported sources simultaneously. Results are interleaved across sources for balanced coverage. Deduplication merges servers that appear in multiple directories.
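
Interleaving for balanced coverage can be pictured as a round-robin over the per-source result lists. The sketch below shows one way to do that; it is an assumption about the general technique, not the Actor's actual code, and the listing IDs are placeholders.

```python
from itertools import chain, zip_longest

_MISSING = object()  # sentinel for sources that have run out of items


def interleave(per_source, limit):
    """Take one item from each source in turn, then cut the result at `limit`."""
    rounds = zip_longest(*per_source.values(), fillvalue=_MISSING)
    merged = [item for item in chain.from_iterable(rounds) if item is not _MISSING]
    return merged[:limit]


listings = {
    "glama": ["g1", "g2", "g3"],
    "pulsemcp": ["p1"],
    "mcpso": ["m1", "m2"],
}
print(interleave(listings, 5))  # ['g1', 'p1', 'm1', 'g2', 'm2']
```

With a small maxServers value, this ordering keeps one large directory from filling the whole result set.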

source mode — Scrape from a single source only. Choose from glama, pulsemcp, mcpso, or mcpapp. Useful when you only need data from one directory or want to compare directories independently.

urls mode — Provide specific server page URLs from any supported directory. Useful for monitoring specific servers or enriching an existing dataset. URLs from different directories can be mixed freely.

search mode — Search across all directories by keyword. Matches against URL slugs (server names in the URL path). For example, searching "puppeteer" finds all servers with "puppeteer" in their URL.
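
Slug matching of this kind amounts to a substring test on the URL path. The sketch below illustrates the idea; the case-insensitive comparison and the `slug_matches` name are assumptions, since the exact matching rules are not spelled out.

```python
from urllib.parse import urlparse


def slug_matches(url, query):
    """True if `query` appears in the URL path (the host name is ignored)."""
    needle = query.strip().lower()
    path = urlparse(url).path.lower()
    return bool(needle) and needle in path


urls = [
    "https://glama.ai/mcp/servers/@anthropics/mcp-server-puppeteer",
    "https://www.pulsemcp.com/servers/playwright-mcp-server",
    "https://mcp.so/server/puppeteer/anthropics",
]
print([u for u in urls if slug_matches(u, "puppeteer")])
```

Because only slugs are matched, a server whose description mentions a keyword but whose URL does not will be missed, so try a few synonyms for broad topics.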

Example input

Scrape 50 servers from all directories:

{
  "mode": "all",
  "maxServers": 50,
  "deduplicate": true
}

Search for browser automation MCP servers:

{
  "mode": "search",
  "searchQuery": "browser",
  "maxServers": 10
}

Scrape specific server pages:

{
  "mode": "urls",
  "urls": [
    "https://glama.ai/mcp/servers/@anthropics/mcp-server-puppeteer",
    "https://www.pulsemcp.com/servers/playwright-mcp-server",
    "https://mcp.so/server/puppeteer/anthropics",
    "https://mcpapp.net/app/github"
  ]
}

Only scrape from MCP App Store:

{
  "mode": "source",
  "source": "mcpapp",
  "maxServers": 100
}

Output

Each server record contains the following fields:

| Field | Type | Description |
|---|---|---|
| name | string | Server name |
| author | string | Author/organization name |
| authorUrl | string | Author profile URL |
| description | string | Full description (up to 1,000 chars) |
| shortDescription | string | Brief description from meta tags |
| slug | string | URL slug identifier |
| githubUrl | string | GitHub repository URL |
| npmUrl | string | npm package URL |
| serverUrl | string | Server homepage URL |
| appType | string | Listing type when available, such as app, connector, or server |
| platformSurfaces | array | Where the listing is available, such as ChatGPT App or Claude Connector |
| connectUrls | object | Direct connect/install URLs by platform surface |
| capabilities | array | Capability labels such as Reads, Writes, Interactive, Claude, or Remote MCP |
| privacyPolicyUrl | string | Privacy policy URL when published by the source |
| termsUrl | string | Terms of service URL when published by the source |
| supportUrl | string | Support URL or email when published by the source |
| version | string | Version when published by the source |
| transport | string | MCP transport when published by the source |
| authType | string | Auth type when published by the source |
| tools | array | List of MCP tool names exposed by the server |
| categories | array | Server categories/tags (e.g., "Python", "TypeScript", "Remote") |
| keyFeatures | array | Key features list |
| faq | array | FAQ entries (question + answer) |
| relatedServers | array | Names of related servers |
| relatedApps | array | Related app/connector listings when published by the source |
| relatedCollections | array | Related MCP App Store collections when published |
| iconUrl | string | Server icon/logo URL |
| sources | array | Which directories list this server (e.g., ["glama", "mcpso"]) |
| sourceUrls | object | URL for this server in each directory |
| qualityScore | number | Composite quality score 0–100 (when computeQualityScores is enabled) |
| qualityBreakdown | object | Score breakdown by dimension (popularity, activity, documentation, completeness) |
| qualityNote | string | Human-readable quality summary |
| githubStars | integer | GitHub star count |
| githubForks | integer | GitHub fork count |
| daysSinceLastCommit | integer | Days since last commit |
| isArchived | boolean | Whether the repository is archived |
| isFork | boolean | Whether the repository is a fork |
| hasLicense | boolean | Whether a license file is present |
| licenseType | string | License name (MIT, Apache-2.0, etc.) |
| npmWeeklyDownloads | integer | Weekly npm installs |
| scrapedAt | string | ISO timestamp of when data was collected |

Not all fields are available from every directory. Fields are null when the source directory does not provide that data. Quality score fields are null when computeQualityScores is false.

Example output

{
  "name": "Puppeteer MCP Server",
  "author": "anthropics",
  "authorUrl": "https://github.com/anthropics",
  "description": "A Model Context Protocol server that provides browser automation capabilities using Puppeteer...",
  "slug": "@anthropics/mcp-server-puppeteer",
  "githubUrl": "https://github.com/anthropics/mcp-server-puppeteer",
  "tools": ["puppeteer_navigate", "puppeteer_screenshot", "puppeteer_click", "puppeteer_fill", "puppeteer_evaluate"],
  "categories": ["TypeScript", "Remote"],
  "sources": ["glama", "mcpso", "pulsemcp"],
  "sourceUrls": {
    "glama": "https://glama.ai/mcp/servers/@anthropics/mcp-server-puppeteer",
    "mcpso": "https://mcp.so/server/puppeteer/anthropics",
    "pulsemcp": "https://www.pulsemcp.com/servers/puppeteer-mcp-server"
  },
  "qualityScore": 82,
  "qualityBreakdown": { "popularity": 22, "activity": 28, "documentation": 20, "completeness": 12 },
  "qualityNote": "High quality: active repo, well-documented, 3,400+ stars",
  "githubStars": 3421,
  "githubForks": 298,
  "daysSinceLastCommit": 14,
  "isArchived": false,
  "isFork": false,
  "hasLicense": true,
  "licenseType": "MIT",
  "npmWeeklyDownloads": 8200,
  "scrapedAt": "2026-02-18T03:28:00.000Z"
}
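
Records shaped like the one above can be filtered into a shortlist with a few lines of Python. This is a sketch under stated assumptions: the thresholds (a score of 60, 180 days) are arbitrary illustrations, not recommendations from the Actor, and `shortlist` is a hypothetical helper.

```python
def shortlist(records, min_score=60, max_days_since_commit=180):
    """Keep scored, non-archived, recently active servers, best score first."""
    kept = []
    for record in records:
        score = record.get("qualityScore")
        if score is None or score < min_score:
            continue  # quality fields are null when scoring was disabled
        if record.get("isArchived"):
            continue
        days = record.get("daysSinceLastCommit")
        if days is not None and days > max_days_since_commit:
            continue
        kept.append(record)
    return sorted(kept, key=lambda r: r["qualityScore"], reverse=True)


records = [
    {"name": "A", "qualityScore": 82, "isArchived": False, "daysSinceLastCommit": 14},
    {"name": "B", "qualityScore": None},
    {"name": "C", "qualityScore": 90, "isArchived": True, "daysSinceLastCommit": 3},
    {"name": "D", "qualityScore": 65, "isArchived": False, "daysSinceLastCommit": 400},
]
print([r["name"] for r in shortlist(records)])  # ['A']
```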

Use cases

  • MCP ecosystem research: Analyze the MCP server landscape — track growth, identify categories, find trending servers
  • Quality filtering: Use quality fields to shortlist servers for manual review
  • Competitive analysis: Monitor competing MCP servers across all major directories
  • Directory building: Build your own MCP directory or comparison tool using pre-aggregated data
  • Server discovery: Find MCP servers by category, author, keyword, or technology stack
  • Data enrichment: Cross-reference servers across directories for the most complete profiles
  • Change monitoring: Track specific servers with scheduled runs to detect updates
  • AI ecosystem mapping: Combine with Futurepedia Scraper and TAAFT Scraper for AI tool + MCP server research

Supported directories

| Directory | URL | Servers | Unique data |
|---|---|---|---|
| Glama.ai | glama.ai | 23,963 shown 2026-05-20 | Tools, categories, FAQ, JSON-LD structured data |
| PulseMCP | pulsemcp.com | 15,445 shown 2026-05-20 | Author info, related servers, server.json detection |
| mcp.so | mcp.so | public sitemap based | Key features, innovation flags, DXT flags, icons |
| MCP App Store | mcpapp.net | 620 public app/connector listings found 2026-05-28 | ChatGPT/Claude surfaces, connect links, auth, transport, support URLs |

Data availability by source

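| Field | Glama | PulseMCP | mcp.so | MCP App Store |
|---|---|---|---|---|
| Name | Yes | Yes | Yes | Yes |
| Description | Yes | Yes | Yes | Yes |
| Author | Yes | Yes | Yes | Yes |
| GitHub URL | Yes | Yes | Yes | Sometimes |
| Website URL | Yes | No | No | Yes |
| Tools | Yes | No | Yes | Sometimes |
| Categories | Yes | Yes | No | Yes |
| FAQ | Yes | No | No | Yes |
| Related servers/apps | No | Yes | No | Yes |
| Platform surfaces | No | No | No | Yes |
| Auth/transport | No | No | No | Yes |
| Key features | No | No | Yes | No |
| Icon URL | No | No | Yes | Yes |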

Integrations

This Actor works with standard Apify integrations:

  • API — Call via REST API or Apify client libraries (Python, JavaScript, Go)
  • Webhooks — Get notified when scraping completes
  • Scheduler — Run on a schedule for fresh data (daily, weekly, etc.)
  • Zapier / Make — Connect to 1,000+ apps for automated workflows
  • Google Sheets — Export directly to spreadsheets
  • Slack / Email — Send notifications when new servers are found
  • Datasets — Browse and download results in JSON, CSV, XML, or Excel
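
As a rough illustration of the API route, the sketch below starts a search run with the Apify Python client and reads the resulting dataset. Several names are assumptions: `ACTOR_ID` is a placeholder to replace with the ID shown on the Actor's page, `APIFY_TOKEN` is an assumed environment variable holding your API token, and the client calls reflect the `apify-client` package as commonly documented, so check them against the version you install.

```python
import os

ACTOR_ID = "<username>/<actor-name>"  # placeholder: copy the real ID from the Actor's page


def build_run_input(query, max_servers=25):
    """Input for a focused search run, using the parameters from the Input section."""
    return {
        "mode": "search",
        "searchQuery": query,
        "maxServers": max_servers,
        "deduplicate": True,
    }


def run_search(query, max_servers=25):
    """Start the Actor, wait for it to finish, and return the dataset items."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor(ACTOR_ID).call(run_input=build_run_input(query, max_servers))
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    for server in run_search("browser", max_servers=10):
        print(server.get("name"), server.get("githubUrl"))
```

The client import sits inside `run_search` so that `build_run_input` can be reused or tested without the package installed.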

Tips and best practices

  • Start small: Begin with maxServers: 5-50 to validate output format and quality before scaling up.
  • Use search mode for targeted data collection instead of scraping everything. It's faster and cheaper.
  • Enable deduplication (default: on) when scraping from all sources to avoid duplicate records.
  • Schedule weekly runs to maintain a fresh dataset. MCP directories add new servers daily.
  • For quality scoring at scale: Always provide a GitHub token, ideally as the GITHUB_TOKEN secret environment variable rather than the githubToken input. Without it, the 60 req/hr GitHub rate limit throttles scoring to ~10 servers/hr.
  • Combine with our other scrapers: Use alongside Futurepedia Scraper and TAAFT Scraper to build a comprehensive AI ecosystem dataset covering both AI tools and MCP servers.

FAQ

Q: How fresh is the data? A: Data is scraped live from all supported sources each time the Actor runs. There is no caching.

Q: Can I scrape every server? A: Use a high maxServers value only after a small validation run. The source directories are large and changing, so full crawls can take a few hours and may cost materially more than a focused search or source run.

Q: Why are some fields null? A: Not all directories provide the same data. For example, Glama provides structured FAQ data, mcp.so provides innovation flags, PulseMCP provides related servers, and MCP App Store provides ChatGPT/Claude surface metadata. See the "Data availability by source" table above.

Q: How does deduplication work? A: The Actor matches servers across directories by GitHub URL (primary) or normalized name + author (fallback). When duplicates are found, it merges data from all sources, keeping the richest value for each field. The sources array shows which directories had a listing for each server.
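
The matching and merging described in this answer can be sketched as follows. This is not the Actor's implementation: the normalization rules and the meaning of "richest" (here taken as non-empty first, then the longer value) are assumptions chosen for illustration, and the helper names are hypothetical.

```python
def dedup_key(record):
    """GitHub URL when present (primary), else normalized name + author (fallback)."""
    github = (record.get("githubUrl") or "").lower().rstrip("/")
    if github:
        return ("github", github)
    name = " ".join((record.get("name") or "").lower().split())
    author = (record.get("author") or "").lower().strip()
    return ("name", name, author)


def richer(a, b):
    """Pick the richer of two values: assumed to mean non-empty, then longer."""
    if a in (None, "", [], {}):
        return b
    if b in (None, "", [], {}):
        return a
    if isinstance(a, (str, list, dict)) and isinstance(b, type(a)):
        return b if len(b) > len(a) else a
    return a


def merge_records(records):
    """Merge records that share a dedup key, unioning sources and sourceUrls."""
    merged = {}
    for record in records:
        key = dedup_key(record)
        if key not in merged:
            merged[key] = dict(record)
            continue
        target = merged[key]
        for field, value in record.items():
            if field == "sources":
                target["sources"] = sorted(set(target.get("sources") or []) | set(value or []))
            elif field == "sourceUrls":
                target["sourceUrls"] = {**(target.get("sourceUrls") or {}), **(value or {})}
            else:
                target[field] = richer(target.get(field), value)
    return list(merged.values())
```

Two listings of the same repository from different directories collapse into one record whose sources array names both directories.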

Q: Why does quality scoring run so slowly without a GitHub token? A: Without a GitHub personal access token, the GitHub API allows only 60 requests per hour. Each scored server needs up to 6 API calls, so the unauthenticated limit caps scoring at roughly 10 servers per hour. Add a token (Settings → Environment Variables → GITHUB_TOKEN) to raise the limit to 5,000 req/hr.

Q: What proxies does this Actor use? A: It uses Apify residential proxies with automatic fallback to datacenter proxies. This helps keep access reliable across supported sources.

Q: How accurate are the extracted tool names? A: Tool names are parsed from page structure where the source exposes them. Accuracy is high for well-structured server pages, but some servers/apps may have incomplete or missing tool data. PulseMCP does not expose tool information.

Q: What are the "sources" and "sourceUrls" fields? A: sources is an array showing which directories list this server (e.g., ["glama", "mcpso"]). sourceUrls is an object mapping each source to the specific URL for that server on that directory. This lets you verify data against the original listings.

Changelog

  • 1.2 — Added MCP App Store as a fourth source with ChatGPT app, Claude connector, auth, transport, surface, connect URL, support URL, related app, and collection metadata.
  • 1.1 — Added quality scoring: composite 0–100 score per server using GitHub stars, forks, commit recency, license status, and npm weekly downloads. Updated first-run guidance and current source-count language.
  • 1.0 — Initial release with support for Glama.ai, PulseMCP, and mcp.so. Includes 4 scraping modes, cross-directory deduplication, and balanced source interleaving.