MCP Server Directory Scraper
Aggregate MCP (Model Context Protocol) server data from the three largest directories into a single unified dataset. This is the only MCP directory aggregator on Apify.
What does MCP Server Directory Scraper do?
This Actor scrapes MCP server listings from three major directories:
- Glama.ai — 17,000+ servers with structured JSON-LD data, tools, categories, and FAQ
- PulseMCP — 8,000+ servers with author attribution and related server links
- mcp.so — 3,700+ servers with GitHub URLs, key features, and innovation flags
It extracts server metadata, deduplicates entries that appear across multiple directories, and merges data from all sources into enriched, unified records.
Why use this Actor?
- Unified dataset: Cross-reference servers across 3 directories in one API call
- Deduplication: Automatically detects duplicate servers and merges their data
- Enriched records: Combines tools, categories, descriptions, and metadata from all sources
- 4 flexible modes: Scrape everything, target one directory, search by keyword, or provide specific URLs
- Production-ready: Residential proxy support, retry logic, and human-friendly error messages
How much does it cost?
This Actor runs on the Apify platform. Usage costs depend on how many servers you scrape:
| Servers | Approx. Cost | Time |
|---|---|---|
| 5 | ~$0.01 | ~30s |
| 50 | ~$0.05 | ~2 min |
| 500 | ~$0.50 | ~15 min |
| 5,000 | ~$5.00 | ~1 hr |
Costs are based on Apify platform compute units. Actual costs may vary based on proxy usage and retry rates.
Input
The Actor accepts the following input parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| mode | enum | all | Scraping mode: all, source, urls, or search |
| source | enum | glama | Directory to scrape (only used with source mode) |
| urls | array | [] | Specific server page URLs (only used with urls mode) |
| searchQuery | string | "" | Keyword to search for (only used with search mode) |
| maxServers | integer | 5 | Maximum number of servers to return |
| deduplicate | boolean | true | Merge duplicate servers found across directories |
Input modes
all mode — Scrape from all three directories simultaneously. Results are interleaved across sources for balanced coverage. Deduplication merges servers that appear in multiple directories.
source mode — Scrape from a single directory only. Choose from glama, pulsemcp, or mcpso.
urls mode — Provide specific server page URLs from any supported directory. Useful for monitoring specific servers.
search mode — Search across all directories by keyword. Matches against URL slugs (server names in the URL path).
Example input
Scrape all three directories with deduplication:
```json
{
  "mode": "all",
  "maxServers": 50,
  "deduplicate": true
}
```
Search all directories by keyword:
```json
{
  "mode": "search",
  "searchQuery": "puppeteer",
  "maxServers": 10
}
```
Scrape specific server pages:
```json
{
  "mode": "urls",
  "urls": [
    "https://glama.ai/mcp/servers/@anthropics/mcp-server-puppeteer",
    "https://www.pulsemcp.com/servers/playwright-mcp-server",
    "https://mcp.so/server/puppeteer/anthropics"
  ]
}
```
Output
Each server record contains the following fields:
| Field | Type | Description |
|---|---|---|
| name | string | Server name |
| author | string | Author/organization name |
| authorUrl | string | Author profile URL |
| description | string | Full description (up to 1,000 chars) |
| shortDescription | string | Brief description from meta tags |
| slug | string | URL slug identifier |
| githubUrl | string | GitHub repository URL |
| npmUrl | string | npm package URL |
| serverUrl | string | Server homepage URL |
| tools | array | List of MCP tool names |
| categories | array | Server categories/tags |
| keyFeatures | array | Key features list |
| faq | array | FAQ entries (question + answer) |
| relatedServers | array | Names of related servers |
| iconUrl | string | Server icon/logo URL |
| sources | array | Which directories list this server |
| sourceUrls | object | URL for this server in each directory |
| scrapedAt | string | ISO timestamp of when data was collected |
Example output
{"name": "Puppeteer MCP Server","author": "anthropics","authorUrl": "https://github.com/anthropics","description": "A Model Context Protocol server that provides browser automation capabilities using Puppeteer...","slug": "@anthropics/mcp-server-puppeteer","githubUrl": "https://github.com/anthropics/mcp-server-puppeteer","tools": ["puppeteer_navigate", "puppeteer_screenshot", "puppeteer_click", "puppeteer_fill", "puppeteer_evaluate"],"categories": ["TypeScript", "Remote"],"sources": ["glama", "mcpso", "pulsemcp"],"sourceUrls": {"glama": "https://glama.ai/mcp/servers/@anthropics/mcp-server-puppeteer","mcpso": "https://mcp.so/server/puppeteer/anthropics","pulsemcp": "https://www.pulsemcp.com/servers/puppeteer-mcp-server"},"scrapedAt": "2026-02-17T03:28:00.000Z"}
Use cases
- Market research: Analyze the MCP server ecosystem, track growth trends, identify gaps
- Competitive analysis: Monitor competing MCP servers across all major directories
- Directory building: Build your own MCP directory or comparison tool using aggregated data
- Server discovery: Find MCP servers by category, author, or keyword
- Data enrichment: Combine data from multiple sources for richer server profiles
- Monitoring: Track specific servers across directories for changes
Supported directories
| Directory | URL | Servers | Key data fields |
|---|---|---|---|
| Glama.ai | glama.ai | 17,000+ | Tools, categories, FAQ, JSON-LD |
| PulseMCP | pulsemcp.com | 8,000+ | Author, related servers |
| mcp.so | mcp.so | 3,700+ | Features, innovation flags, icons |
Integrations
This Actor works with standard Apify integrations:
- API — Call via the REST API or the Apify client libraries for Python and JavaScript (see the sketch after this list)
- Webhooks — Get notified when scraping completes
- Scheduler — Run on a schedule for fresh data
- Zapier / Make — Connect to 1,000+ apps
- Google Sheets — Export directly to spreadsheets
- Slack — Send results to Slack channels
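For the REST route, Apify's generic run-sync-get-dataset-items endpoint starts a run and returns the dataset items in a single request. A minimal sketch using the requests library; the Actor ID is a placeholder in the username~actor-name form the API expects, so replace it with this Actor's real ID:
```python
import requests

# Placeholder Actor ID in the username~actor-name form used by the Apify API.
ACTOR_ID = "your-username~mcp-server-directory-scraper"
TOKEN = "<YOUR_APIFY_TOKEN>"

response = requests.post(
    f"https://api.apify.com/v2/acts/{ACTOR_ID}/run-sync-get-dataset-items",
    params={"token": TOKEN},
    json={"mode": "source", "source": "glama", "maxServers": 25},
    timeout=300,
)
response.raise_for_status()

# The endpoint returns the scraped server records as a JSON array.
for server in response.json():
    print(server["name"], server.get("sources"))
```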
FAQ
Q: How fresh is the data? A: Data is scraped in real time from all three directories when the Actor runs. There is no caching.
Q: Can I scrape all 28,000+ servers? A: Yes, set maxServers to a high number. Be aware this will take longer and cost more in compute units.
Q: Why are some fields null? A: Not all directories provide the same data. For example, only Glama provides FAQ data, and only mcp.so provides innovation flags. Fields are null when the source directory doesn't include that information.
Q: How does deduplication work? A: The Actor matches servers across directories by GitHub URL (primary) or normalized name + author (fallback). When duplicates are found, it merges data from all sources, keeping the richest value for each field. A rough sketch of this logic follows the FAQ.
Q: What proxies does this Actor use? A: It uses Apify residential proxies with automatic fallback to datacenter proxies.
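The matching and merging described in the deduplication answer can be pictured with a short sketch. This is illustrative Python, not the Actor's actual implementation: records are keyed by GitHub URL when available, by normalized name + author otherwise, and merged field by field, preferring the richer value.
```python
def dedupe_key(server):
    """Prefer the GitHub URL; otherwise use normalized name + author."""
    github = (server.get("githubUrl") or "").lower().rstrip("/")
    if github:
        return ("github", github)
    name = (server.get("name") or "").lower().strip()
    author = (server.get("author") or "").lower().strip()
    return ("name", name, author)

def merge(a, b):
    """Merge two records for the same server, keeping the richer value per field."""
    merged = dict(a)
    for field, value in b.items():
        current = merged.get(field)
        if not current:
            merged[field] = value
        elif isinstance(current, list) and isinstance(value, list):
            merged[field] = sorted(set(current) | set(value))
        elif isinstance(current, str) and isinstance(value, str) and len(value) > len(current):
            merged[field] = value
    return merged

def deduplicate(servers):
    """Collapse records that refer to the same server across directories."""
    seen = {}
    for server in servers:
        key = dedupe_key(server)
        seen[key] = merge(seen[key], server) if key in seen else server
    return list(seen.values())
```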
Changelog
- 1.0 — Initial release with support for Glama.ai, PulseMCP, and mcp.so