MCP Server Directory Scraper — 28K+ AI Servers

Pricing: Pay per usage

Scrape and aggregate MCP (Model Context Protocol) servers from glama.ai, PulseMCP, and mcp.so. The only unified MCP directory dataset on Apify.

Developer: Kris Jensen (maintained by Community)

MCP Server Directory Scraper

Aggregate MCP (Model Context Protocol) server data from the three largest directories into a single unified dataset. This is the only MCP directory aggregator on Apify.

What does MCP Server Directory Scraper do?

This Actor scrapes MCP server listings from three major directories:

  • Glama.ai — 17,000+ servers with structured JSON-LD data, tools, categories, and FAQ
  • PulseMCP — 8,000+ servers with author attribution and related server links
  • mcp.so — 3,700+ servers with GitHub URLs, key features, and innovation flags

It extracts server metadata, deduplicates entries that appear across multiple directories, and merges data from all sources into enriched, unified records.

Why use this Actor?

  • Unified dataset: Cross-reference servers across 3 directories in one API call
  • Deduplication: Automatically detects duplicate servers and merges their data
  • Enriched records: Combines tools, categories, descriptions, and metadata from all sources
  • 4 flexible modes: Scrape everything, target one directory, search by keyword, or provide specific URLs
  • Production-ready: Residential proxy support, retry logic, and human-friendly error messages

How much does it cost?

This Actor runs on the Apify platform. Usage costs depend on how many servers you scrape:

| Servers | Approx. cost | Time    |
|---------|--------------|---------|
| 5       | ~$0.01       | ~30 s   |
| 50      | ~$0.05       | ~2 min  |
| 500     | ~$0.50       | ~15 min |
| 5,000   | ~$5.00       | ~1 hr   |

Costs are based on Apify platform compute units. Actual costs may vary based on proxy usage and retry rates.
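
For quick budgeting, the table rows are consistent with roughly $0.001 per server and a floor of about $0.01 per run. The helper below is a minimal sketch under that assumption. The function name and the rule itself are inferred from the table and are not an official pricing formula.

```python
def estimate_cost_usd(servers: int) -> float:
    """Rough cost estimate inferred from the table above (not an official formula)."""
    if servers < 0:
        raise ValueError("servers must be non-negative")
    if servers == 0:
        return 0.0
    # Assumption: about $0.001 per server, never less than about $0.01 per run.
    return round(max(0.01, servers * 0.001), 2)
```

Treat the result as a lower-bound guess: proxy usage and retries can push the real cost higher.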

Input

The Actor accepts the following input parameters:

| Parameter   | Type    | Default | Description                                          |
|-------------|---------|---------|------------------------------------------------------|
| mode        | enum    | all     | Scraping mode: all, source, urls, or search          |
| source      | enum    | glama   | Directory to scrape (only used with source mode)     |
| urls        | array   | []      | Specific server page URLs (only used with urls mode) |
| searchQuery | string  | ""      | Keyword to search for (only used with search mode)   |
| maxServers  | integer | 5       | Maximum number of servers to return                  |
| deduplicate | boolean | true    | Merge duplicate servers found across directories     |
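
Because several parameters are only read in one mode, it can help to build the input in code and catch mistakes before starting a run. The following is a minimal sketch: `build_input` is a hypothetical helper, not part of the Actor, and the checks (for example, requiring `maxServers` to be at least 1) are assumptions about what a sensible input looks like.

```python
from __future__ import annotations

from typing import Any

VALID_MODES = {"all", "source", "urls", "search"}
VALID_SOURCES = {"glama", "pulsemcp", "mcpso"}


def build_input(
    mode: str = "all",
    *,
    source: str = "glama",
    urls: list[str] | None = None,
    search_query: str = "",
    max_servers: int = 5,
    deduplicate: bool = True,
) -> dict[str, Any]:
    """Build an Actor input dict using the parameter names and defaults from the table."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if max_servers < 1:  # Assumption: the Actor expects a positive limit.
        raise ValueError("maxServers must be at least 1")

    run_input: dict[str, Any] = {
        "mode": mode,
        "maxServers": max_servers,
        "deduplicate": deduplicate,
    }
    # Only send the parameter that the chosen mode actually reads.
    if mode == "source":
        if source not in VALID_SOURCES:
            raise ValueError(f"unknown source: {source!r}")
        run_input["source"] = source
    elif mode == "urls":
        if not urls:
            raise ValueError("urls mode needs at least one URL")
        run_input["urls"] = list(urls)
    elif mode == "search":
        if not search_query.strip():
            raise ValueError("search mode needs a non-empty searchQuery")
        run_input["searchQuery"] = search_query.strip()
    return run_input
```

For example, `build_input("search", search_query="puppeteer", max_servers=10)` produces a dict with `mode`, `maxServers`, `deduplicate`, and `searchQuery` set.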

Input modes

all mode — Scrape from all three directories simultaneously. Results are interleaved across sources for balanced coverage. Deduplication merges servers that appear in multiple directories.
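
To illustrate what interleaving means here, the sketch below takes one item from each source in turn until a limit is reached. It is an assumption about the general technique (round-robin selection), not the Actor's actual code, and `interleave` is a hypothetical name.

```python
from __future__ import annotations

from itertools import chain, zip_longest
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")
_MISSING = object()  # Placeholder for sources that ran out of items.


def interleave(*sources: Iterable[T], limit: int | None = None) -> Iterator[T]:
    """Yield items round-robin: the first of each source, then the second of each, and so on."""
    count = 0
    for item in chain.from_iterable(zip_longest(*sources, fillvalue=_MISSING)):
        if item is _MISSING:
            continue  # This source is exhausted; keep drawing from the others.
        if limit is not None and count >= limit:
            return
        count += 1
        yield item
```

With a small `maxServers`, a round-robin order like this keeps one large directory from filling the whole result set.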

source mode — Scrape from a single directory only. Choose from glama, pulsemcp, or mcpso.

urls mode — Provide specific server page URLs from any supported directory. Useful for monitoring specific servers.

search mode — Search across all directories by keyword. Matches against URL slugs (server names in the URL path).
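
Since search mode matches URL slugs rather than descriptions, a keyword only hits when it appears in the server's URL path. The sketch below shows one way such matching could work. The path prefixes are inferred from the example URLs in this README, and `extract_slug` and `slug_matches` are hypothetical helpers, not the Actor's implementation.

```python
from __future__ import annotations

from urllib.parse import urlparse

# Assumption: path prefixes taken from the example URLs in this README.
SLUG_PREFIXES = {
    "glama.ai": "/mcp/servers/",
    "www.pulsemcp.com": "/servers/",
    "mcp.so": "/server/",
}


def extract_slug(url: str) -> str | None:
    """Return the part of the URL path after the directory's server prefix."""
    parsed = urlparse(url)
    prefix = SLUG_PREFIXES.get(parsed.netloc.lower())
    if prefix is None or not parsed.path.startswith(prefix):
        return None
    return parsed.path[len(prefix):].strip("/") or None


def slug_matches(url: str, query: str) -> bool:
    """Case-insensitive substring match of the query against the URL slug."""
    slug = extract_slug(url)
    needle = query.strip().lower()
    return bool(slug and needle) and needle in slug.lower()
```

Under this reading, searching for "puppeteer" finds `https://mcp.so/server/puppeteer/anthropics` but not a browser-automation server whose slug only mentions "playwright".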

Example input

{
  "mode": "all",
  "maxServers": 50,
  "deduplicate": true
}

{
  "mode": "search",
  "searchQuery": "puppeteer",
  "maxServers": 10
}

{
  "mode": "urls",
  "urls": [
    "https://glama.ai/mcp/servers/@anthropics/mcp-server-puppeteer",
    "https://www.pulsemcp.com/servers/playwright-mcp-server",
    "https://mcp.so/server/puppeteer/anthropics"
  ]
}

Output

Each server record contains the following fields:

| Field            | Type   | Description                             |
|------------------|--------|-----------------------------------------|
| name             | string | Server name                             |
| author           | string | Author/organization name                |
| authorUrl        | string | Author profile URL                      |
| description      | string | Full description (up to 1,000 chars)    |
| shortDescription | string | Brief description from meta tags        |
| slug             | string | URL slug identifier                     |
| githubUrl        | string | GitHub repository URL                   |
| npmUrl           | string | npm package URL                         |
| serverUrl        | string | Server homepage URL                     |
| tools            | array  | List of MCP tool names                  |
| categories       | array  | Server categories/tags                  |
| keyFeatures      | array  | Key features list                       |
| faq              | array  | FAQ entries (question + answer)         |
| relatedServers   | array  | Names of related servers                |
| iconUrl          | string | Server icon/logo URL                    |
| sources          | array  | Which directories list this server      |
| sourceUrls       | object | URL for this server in each directory   |
| scrapedAt        | string | ISO timestamp of when data was collected |

Example output

{
  "name": "Puppeteer MCP Server",
  "author": "anthropics",
  "authorUrl": "https://github.com/anthropics",
  "description": "A Model Context Protocol server that provides browser automation capabilities using Puppeteer...",
  "slug": "@anthropics/mcp-server-puppeteer",
  "githubUrl": "https://github.com/anthropics/mcp-server-puppeteer",
  "tools": ["puppeteer_navigate", "puppeteer_screenshot", "puppeteer_click", "puppeteer_fill", "puppeteer_evaluate"],
  "categories": ["TypeScript", "Remote"],
  "sources": ["glama", "mcpso", "pulsemcp"],
  "sourceUrls": {
    "glama": "https://glama.ai/mcp/servers/@anthropics/mcp-server-puppeteer",
    "mcpso": "https://mcp.so/server/puppeteer/anthropics",
    "pulsemcp": "https://www.pulsemcp.com/servers/puppeteer-mcp-server"
  },
  "scrapedAt": "2026-02-17T03:28:00.000Z"
}
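
Records shaped like the example above can be analyzed with a few lines of code once they are downloaded. The sketch below filters servers by the directories that list them and counts tool names. Both helper names are hypothetical, and the `or []` guards assume that array fields may be null when a directory does not provide them.

```python
from __future__ import annotations

from collections import Counter
from typing import Any, Iterable


def listed_in(records: Iterable[dict[str, Any]], *directories: str) -> list[dict[str, Any]]:
    """Keep only records whose `sources` field contains every given directory."""
    wanted = set(directories)
    return [r for r in records if wanted <= set(r.get("sources") or [])]


def tool_counts(records: Iterable[dict[str, Any]]) -> Counter:
    """Count how many records expose each tool name."""
    counts: Counter = Counter()
    for record in records:
        counts.update(record.get("tools") or [])  # Tolerate null or missing tools.
    return counts
```

For instance, `listed_in(records, "glama", "mcpso", "pulsemcp")` returns only the servers that appear in all three directories.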

Use cases

  • Market research: Analyze the MCP server ecosystem, track growth trends, identify gaps
  • Competitive analysis: Monitor competing MCP servers across all major directories
  • Directory building: Build your own MCP directory or comparison tool using aggregated data
  • Server discovery: Find MCP servers by category, author, or keyword
  • Data enrichment: Combine data from multiple sources for richer server profiles
  • Monitoring: Track specific servers across directories for changes

Supported directories

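| Directory | URL          | Servers | Data provided                      |
|-----------|--------------|---------|------------------------------------|
| Glama.ai  | glama.ai     | 17,000+ | Tools, categories, FAQ, JSON-LD    |
| PulseMCP  | pulsemcp.com | 8,000+  | Author, related servers            |
| mcp.so    | mcp.so       | 3,700+  | Features, innovation flags, icons  |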

Integrations

This Actor works with standard Apify integrations:

  • API — Call via REST API or Apify client libraries (Python, JavaScript)
  • Webhooks — Get notified when scraping completes
  • Scheduler — Run on a schedule for fresh data
  • Zapier / Make — Connect to 1,000+ apps
  • Google Sheets — Export directly to spreadsheets
  • Slack — Send results to Slack channels
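
As an illustration of the API route, the sketch below runs the Actor with the Apify Python client (`apify-client`) and collects the resulting dataset items. It assumes the client's `actor(...).call(run_input=...)` returns a run dict containing `defaultDatasetId`, which holds for the client versions this sketch was written against. The Actor ID and token are placeholders you must replace, and `fetch_servers` is a hypothetical helper.

```python
from __future__ import annotations

from typing import Any


def fetch_servers(client: Any, actor_id: str, run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Run the Actor, wait for it to finish, and return its dataset items."""
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage (placeholders, not real values):
#   from apify_client import ApifyClient
#   client = ApifyClient("<YOUR_APIFY_TOKEN>")
#   servers = fetch_servers(
#       client,
#       "<username>/<actor-name>",
#       {"mode": "search", "searchQuery": "puppeteer", "maxServers": 10},
#   )
```

Passing the client in as an argument keeps the helper easy to test with a stub and leaves token handling to the caller.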

FAQ

Q: How fresh is the data?
A: Data is scraped live from all three directories each time the Actor runs. Nothing is cached.

Q: Can I scrape all 28,000+ servers?
A: Yes. Set maxServers high enough to cover the combined listing count. Expect a longer run and a higher compute cost.

Q: Why are some fields null?
A: Not all directories provide the same data. For example, only Glama provides FAQ data, and only mcp.so provides innovation flags. A field is null when the source directory doesn't include that information.

Q: How does deduplication work?
A: The Actor matches servers across directories by GitHub URL (primary) or by normalized name + author (fallback). When duplicates are found, it merges data from all sources, keeping the richest value for each field.

Q: What proxies does this Actor use?
A: It uses Apify residential proxies with automatic fallback to datacenter proxies.
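
The deduplication rule described in the FAQ (GitHub URL first, normalized name + author as a fallback, richest value wins) can be sketched as follows. This is not the Actor's code: the normalization rules, the reading of "richest" as "longest non-empty value", and all function names are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Any


def _norm(text: str | None) -> str:
    """Lowercase and drop everything except letters and digits."""
    return "".join(ch for ch in (text or "").lower() if ch.isalnum())


def dedup_key(record: dict[str, Any]) -> tuple[str, str]:
    """Primary key: GitHub URL. Fallback key: normalized name + author."""
    github = (record.get("githubUrl") or "").lower().rstrip("/")
    if github.endswith(".git"):
        github = github[:-4]
    if github:
        return ("github", github)
    return ("name_author", _norm(record.get("name")) + "|" + _norm(record.get("author")))


def _is_empty(value: Any) -> bool:
    return value is None or value == "" or value == [] or value == {}


def _richer(a: Any, b: Any) -> Any:
    """Assumption: the 'richest' value is the longer non-empty one; ties keep `a`."""
    if _is_empty(a):
        return b
    if _is_empty(b):
        return a
    if isinstance(a, (str, list)) and type(a) is type(b):
        return b if len(b) > len(a) else a
    return a


def merge(a: dict[str, Any], b: dict[str, Any]) -> dict[str, Any]:
    """Merge record `b` into a copy of record `a`, field by field."""
    merged = dict(a)
    for field, value in b.items():
        if field == "sources":
            merged[field] = sorted(set(a.get(field) or []) | set(value or []))
        elif field == "sourceUrls":
            merged[field] = {**(a.get(field) or {}), **(value or {})}
        else:
            merged[field] = _richer(a.get(field), value)
    return merged


def deduplicate(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Group records by key and merge each group into one record."""
    by_key: dict[tuple[str, str], dict[str, Any]] = {}
    for record in records:
        key = dedup_key(record)
        by_key[key] = merge(by_key[key], record) if key in by_key else dict(record)
    return list(by_key.values())
```

One limitation of this single-pass sketch: a record without a GitHub URL is never matched to a record that has one, even if name and author agree. A fuller version would need a second pass that links the two kinds of key.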

Changelog

  • 1.0 — Initial release with support for Glama.ai, PulseMCP, and mcp.so