Skill Curator Scraper
Pricing
from $0.02 / 1,000 results
MCP Skill Scraper collects AI skills from SkillsMP and GitHub. It extracts name, description, stars, license, and URLs, then calculates a quality score. Outputs structured JSON for discovering MCP tools, AI skills, and developer resources.
Developer
Data Pilot
Skill Curator Scraper is an Apify Actor that discovers and curates skill resources from SkillsMP and GitHub. Given one or more keywords, it aggregates skill listings, learning resources, and skill repositories for that topic. This is useful for building skill inventories, curating learning resources, or finding professional development materials.
The Actor combines multi-source aggregation, quality scoring, duplicate detection, and GitHub integration. Each result carries a quality score, a repository star count, license information, and a skill description, so results can be ranked and filtered.
Table of Contents
- Features
- Sources
- How It Works
- Input
- Output
- Quality Scoring
- Technical Stack
- Data Fields
- Use Cases
- Quick Start
- Configuration
- Performance
- Important Notes
- Keywords
- Changelog
- Support
Features
- Multi-Source Skill Discovery: Aggregates Skill Curator resources from SkillsMP and GitHub simultaneously using parallel requests.
- SkillsMP Scraping: HTML scraping of the SkillsMP platform for Skill Curator content discovery.
- GitHub Repository Search: GitHub API integration for discovering skill-related repositories and resources.
- Quality Scoring: Scores skills based on stars, licensing, description quality, and availability.
- Duplicate Detection: Automatic deduplication of skills from multiple sources.
- Star-Based Ranking: Prioritizes popular repositories and well-maintained projects.
- License Information: Extracts and includes SPDX license identifiers for legal compliance.
- Author Attribution: Captures author/owner information from repositories.
- Keyword-Based Search: Supports multiple keywords for comprehensive skill discovery.
- Bulk Keyword Processing: Processes multiple skill keywords in one run.
- Rate Limiting: Includes automatic delays to respect API rate limits.
- Proxy Support: Apify residential proxy support for reliable access.
- Real-Time Dataset Push: Pushes results to the Apify Dataset with metadata.
- Timestamp Recording: Records the discovery timestamp for audit trails.
- Error Handling: Graceful error handling with detailed logging.
- Asyncio-Friendly: Non-blocking async/await architecture.
Sources
1. SkillsMP
- Platform: Skill marketplace and curator platform
- Search Type: HTML scraping
- Content: Skill cards, descriptions, skill URLs
- Data Extracted: Name, description, skill URL
- URL Format: https://skillsmp.com/skills/{name}
- Coverage: Broad skill marketplace
2. GitHub Repositories
- Platform: GitHub version control and open source
- Search Type: REST API (JSON)
- Content: Repositories, code projects, implementations
- Data Extracted: Name, description, stars, license, author, GitHub URL
- API Endpoint: https://api.github.com/search/repositories
- Search Query: Keywords + "mcp" + "skill"
- Sorting: By stars (most popular first)
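As a rough illustration of the GitHub search described above, the sketch below builds a request URL for one keyword. The function name `build_github_search_url` and the exact way the keyword is joined with "mcp" and "skill" are assumptions; only the endpoint, the extra search terms, and the star-based sorting come from this README. The `q`, `sort`, `order`, and `per_page` parameters are standard GitHub search parameters.

```python
from urllib.parse import urlencode

GITHUB_SEARCH_URL = "https://api.github.com/search/repositories"


def build_github_search_url(keyword: str, limit: int = 20) -> str:
    """Build a repository search URL for one keyword, sorted by stars."""
    # Assumption: the keyword is joined with the literal terms "mcp" and "skill".
    query = f"{keyword} mcp skill"
    params = {
        "q": query,
        "sort": "stars",   # most-starred repositories first
        "order": "desc",
        "per_page": min(max(limit, 1), 100),  # GitHub allows at most 100 per page
    }
    return f"{GITHUB_SEARCH_URL}?{urlencode(params)}"


print(build_github_search_url("React", limit=25))
```

The resulting URL can be passed to any HTTP client; the Actor's real query construction may differ.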
How It Works
The Skill Curator Scraper takes skill keywords as input and searches multiple sources simultaneously. It scrapes SkillsMP for skill cards and queries GitHub API for repositories. Each skill is assigned a quality score based on stars, licensing, description quality, and availability. Results are deduplicated and pushed to the Apify Dataset.
Key Processing Steps:
- Input Parsing: Accept skill keywords from Actor input
- Proxy Setup: Configure Apify residential proxy if available
- Parallel Source Queries: Launch SkillsMP scraping and GitHub API search
- SkillsMP Scraping: Parse skill cards from SkillsMP HTML
- GitHub API Search: Query GitHub with keyword filters
- Data Extraction: Extract name, description, stars, license, author
- Quality Scoring: Calculate quality score for each skill
- Deduplication: Remove duplicate entries from multiple sources
- Result Compilation: Aggregate findings from all sources
- Dataset Push: Push to Apify Dataset with metadata
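The parallel query and deduplication steps above could be sketched roughly as follows. Both source functions are stubs returning hypothetical data, and the deduplication rule (matching on a normalized name and merging missing fields) is an assumption; this README does not say how the Actor decides that two records are duplicates.

```python
import asyncio


async def search_skillsmp(keyword: str) -> list[dict]:
    # Stub standing in for the SkillsMP HTML scrape (hypothetical data).
    return [{"name": "React Query", "githubUrl": "",
             "skillUrl": "https://skillsmp.com/skills/react-query"}]


async def search_github(keyword: str) -> list[dict]:
    # Stub standing in for the GitHub API search (hypothetical data).
    return [
        {"name": "react-query", "skillUrl": "",
         "githubUrl": "https://github.com/tannerlinsley/react-query"},
        {"name": "react-router", "skillUrl": "",
         "githubUrl": "https://github.com/remix-run/react-router"},
    ]


def dedup_key(record: dict) -> str:
    # Assumption: names match after lowercasing and treating
    # spaces, underscores, and hyphens as the same separator.
    return record["name"].lower().replace("_", "-").replace(" ", "-")


def deduplicate(records: list[dict]) -> list[dict]:
    seen: dict[str, dict] = {}
    for record in records:
        key = dedup_key(record)
        if key not in seen:
            seen[key] = dict(record)
            continue
        # Keep the first record and fill in any fields it was missing.
        for field, value in record.items():
            if not seen[key].get(field):
                seen[key][field] = value
    return list(seen.values())


async def discover(keyword: str) -> list[dict]:
    # Query both sources concurrently, then merge and deduplicate.
    skillsmp, github = await asyncio.gather(
        search_skillsmp(keyword), search_github(keyword)
    )
    return deduplicate(skillsmp + github)


results = asyncio.run(discover("React"))
for record in results:
    print(record["name"], record["githubUrl"], record["skillUrl"])
```

Merging rather than dropping the second record keeps both the SkillsMP URL and the GitHub URL when the same skill appears in both sources.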
Key Benefits:
- Discover Skill Curator resources from multiple trusted sources
- Find popular and well-maintained skill implementations
- Compare skills across SkillsMP and GitHub
- Identify high-quality learning resources
- Build comprehensive skill inventories
- Research skill implementations and examples
Input
The Actor accepts the following input parameters:
| Field | Type | Default | Description |
|---|---|---|---|
| keywords | array | required | Skill keywords to search (e.g., ["React", "Python", "DevOps"]) |
| limit_per_keyword | integer | 20 | Maximum skills per keyword (1-100) |
| proxyConfiguration | object | {"useApifyProxy": true} | Proxy configuration settings |
Example Input:
{"keywords": ["React", "Python", "DevOps", "GraphQL", "Kubernetes"],"limit_per_keyword": 25,"proxyConfiguration": {"useApifyProxy": true}}
Single Keyword Example:
{"keywords": ["Machine Learning"],"limit_per_keyword": 30}
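A minimal sketch of how input like this might be validated and defaulted. The helper name `parse_input` is hypothetical, and clamping an out-of-range `limit_per_keyword` (rather than rejecting it) is an assumption; the field names, defaults, and the 1-100 range come from the table above.

```python
def parse_input(actor_input: dict) -> dict:
    """Validate Actor input and apply the documented defaults."""
    keywords = actor_input.get("keywords")
    if not isinstance(keywords, list):
        raise ValueError("'keywords' is required and must be an array")
    # Drop non-string and blank entries, and trim surrounding whitespace.
    keywords = [k.strip() for k in keywords if isinstance(k, str) and k.strip()]
    if not keywords:
        raise ValueError("'keywords' must contain at least one non-empty string")

    # Assumption: out-of-range limits are clamped to 1-100 instead of rejected.
    limit = int(actor_input.get("limit_per_keyword", 20))
    limit = max(1, min(limit, 100))

    proxy = actor_input.get("proxyConfiguration", {"useApifyProxy": True})
    return {
        "keywords": keywords,
        "limit_per_keyword": limit,
        "proxyConfiguration": proxy,
    }


print(parse_input({"keywords": ["Machine Learning"], "limit_per_keyword": 30}))
```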
Output
The Actor pushes Skill Curator records with the following structure:
| Field | Type | Description |
|---|---|---|
| name | string | Skill or repository name |
| description | string | Skill/repo description (max 300 chars) |
| author | string | Repository owner/author name |
| repoMetadata.stars | integer | GitHub stars count |
| repoMetadata.license | string | SPDX license identifier |
| githubUrl | string | Direct GitHub repository URL |
| skillUrl | string | SkillsMP or project skill URL |
| qualityScore.overall | integer | Quality score (0-100) |
| keyword | string | Search keyword used |
| detected_at | string | ISO 8601 discovery timestamp |
Example Output Record (GitHub):
{"name": "react-query","author": "tannerlinsley","description": "Powerful asynchronous state management, server-state utilities and data fetching with TS/JS, React Query, Solid Query, Svelte Query and Vue Query.","repoMetadata.stars": 42000,"repoMetadata.license": "MIT","githubUrl": "https://github.com/tannerlinsley/react-query","skillUrl": "https://skillsmp.com/skills/react-query","qualityScore.overall": 95,"keyword": "React","detected_at": "2025-02-14T12:00:00Z"}
Example Output Record (SkillsMP):
{"name": "Advanced React Patterns","description": "Learn advanced React patterns including render props, custom hooks, and compound components for building scalable applications.","repoMetadata.stars": 0,"githubUrl": "","skillUrl": "https://skillsmp.com/skills/advanced-react-patterns","qualityScore.overall": 65,"keyword": "React","detected_at": "2025-02-14T12:00:00Z"}
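To make the record shape concrete, here is a hedged sketch of assembling one output record. The helper `build_record` is hypothetical; the flat dotted keys mirror the example records above, the 300-character cap comes from the field table, and collapsing whitespace before a hard cut is an assumption about how the cap is applied.

```python
from datetime import datetime, timezone

MAX_DESCRIPTION_LENGTH = 300  # "max 300 chars" per the field table


def build_record(name, description, keyword, *, author="", stars=0,
                 license_id="", github_url="", skill_url="", score=0,
                 now=None):
    """Assemble one dataset record using the documented field names."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    # Assumption: whitespace is collapsed, then the text is cut at 300 chars.
    clean_description = " ".join(description.split())[:MAX_DESCRIPTION_LENGTH]
    return {
        "name": name,
        "author": author,
        "description": clean_description,
        "repoMetadata.stars": stars,
        "repoMetadata.license": license_id,
        "githubUrl": github_url,
        "skillUrl": skill_url,
        "qualityScore.overall": score,
        "keyword": keyword,
        # ISO 8601 in UTC with a trailing "Z", as in the example records.
        "detected_at": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


record = build_record(
    "Advanced React Patterns",
    "Learn advanced React patterns including render props and custom hooks.",
    "React",
    skill_url="https://skillsmp.com/skills/advanced-react-patterns",
)
print(record["name"], record["detected_at"])
```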
Quality Scoring
The Skill Curator Scraper uses an intelligent quality scoring algorithm to rank skills:
Scoring Criteria
| Factor | Points | Threshold |
|---|---|---|
| GitHub Stars | 40 | ≥100 stars = 40, ≥50 = 30, ≥10 = 15 |
| License | 10 | Has SPDX license |
| Description | 15 | Description > 100 characters |
| GitHub URL | 10 | Repository URL available |
| Total | 75 | Maximum from the four factors above; reported on the 0-100 scale |
Scoring Examples
High-Quality Repository (100+ stars + license + good description):
- Stars (100+): 40 points
- License: 10 points
- Description (>100 chars): 15 points
- GitHub URL: 10 points
- Total: 75/100 (Good)
Popular Repository (1000+ stars + license + excellent description):
- Stars (100+): 40 points (capped at 40)
- License: 10 points
- Description: 15 points
- GitHub URL: 10 points
- Total: 75/100
SkillsMP Card (No GitHub, good description):
- Description: 15 points
- Total: 15/100 (Low)
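The scoring criteria can be written out as a small function. This is a sketch based only on the four factors in the table, not the Actor's actual code; the function name `quality_score` and its parameters are illustrative. With only these four factors the highest reachable score is 40 + 10 + 15 + 10 = 75, which matches the worked examples.

```python
def quality_score(stars: int = 0, license_id: str = "",
                  description: str = "", github_url: str = "") -> int:
    """Score a skill using the four factors from the criteria table."""
    score = 0

    # GitHub stars: tiered, capped at 40 points.
    if stars >= 100:
        score += 40
    elif stars >= 50:
        score += 30
    elif stars >= 10:
        score += 15

    if license_id:               # has an SPDX license identifier
        score += 10
    if len(description) > 100:   # strictly more than 100 characters
        score += 15
    if github_url:               # repository URL available
        score += 10
    return score


long_description = "d" * 101
print(quality_score(150, "MIT", long_description, "https://github.com/owner/repo"))
print(quality_score(description=long_description))
```

The first call corresponds to the high-quality repository example and the second to the SkillsMP card example.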
Score Interpretation
- 90-100: Excellent (Popular, well-maintained, licensed)
- 70-89: Good (Solid project with community support)
- 50-69: Fair (Emerging projects, niche tools)
- 30-49: Basic (Limited info or new projects)
- 0-29: Limited (Minimal metadata, research phase)
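For filtering results downstream, the bands above map onto a simple lookup. The helper `interpret_score` is hypothetical and only restates the ranges listed in this section.

```python
def interpret_score(score: int) -> str:
    """Map a 0-100 quality score to its interpretation band."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Fair"
    if score >= 30:
        return "Basic"
    return "Limited"


for value in (75, 15):
    print(value, interpret_score(value))
```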
Technical Stack
- HTTP Requests: requests library with asyncio executor
- HTML Parsing: BeautifulSoup4 for SkillsMP scraping
- APIs: GitHub REST API v3 (JSON)
- Async: asyncio for concurrent requests
- Pattern Matching: Python regex for text cleaning
- Proxy: Apify Proxy with residential support
- Logging: Apify Actor logging system
- Platform: Apify Actor serverless environment
- Timeout: 20 seconds per request
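The combination of a blocking HTTP client with an asyncio executor might look roughly like the sketch below. `fetch_blocking` is a placeholder for a real call such as `requests.get(url, timeout=20)` and returns dummy data so the example runs without network access; `process_keywords` and the constant names are assumptions. The 1-second pause between keywords follows the rate-limiting entry in the changelog.

```python
import asyncio
import time

REQUEST_TIMEOUT_SECONDS = 20   # per-request timeout from the stack list
KEYWORD_DELAY_SECONDS = 1.0    # pause between keywords (rate limiting)


def fetch_blocking(keyword: str) -> list[dict]:
    # Placeholder for a blocking call such as
    # requests.get(url, timeout=REQUEST_TIMEOUT_SECONDS).
    time.sleep(0.01)
    return [{"keyword": keyword}]


async def process_keywords(keywords, fetch=fetch_blocking,
                           delay=KEYWORD_DELAY_SECONDS):
    loop = asyncio.get_running_loop()
    results = []
    for index, keyword in enumerate(keywords):
        if index > 0:
            # Wait between keywords to stay within rate limits.
            await asyncio.sleep(delay)
        # Run the blocking call in the default thread pool so the
        # event loop is not blocked while the request is in flight.
        results.extend(await loop.run_in_executor(None, fetch, keyword))
    return results


print(asyncio.run(process_keywords(["React", "Python"], delay=0.1)))
```

Passing `fetch` and `delay` as parameters keeps the loop easy to test with a stub and a zero delay.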
Use Cases
- Skill Inventory Building: Create comprehensive Skill Curator inventories
- Learning Resource Curation: Discover quality learning materials
- Technology Research: Research popular skill implementations
- Competitive Analysis: Compare skills across platforms
- Professional Development: Find resources for skill enhancement
- Project Reference: Discover implementation examples
- Technology Stack Planning: Evaluate skill options
- Team Skill Assessment: Identify skill gaps and opportunities
- Startup Research: Discover emerging skills and tools
- Education Planning: Build curriculum with curated resources
- Vendor Evaluation: Assess skill availability and quality
- Open Source Discovery: Find high-quality open source projects
- Technology Benchmarking: Compare skills across metrics
- Knowledge Management: Build skill knowledge bases
- Job Market Analysis: Research in-demand skills
Limit Configuration
Balanced (20 per keyword):
{"limit_per_keyword": 20}
Comprehensive (50 per keyword):
{"limit_per_keyword": 50}
Changelog
Initial Release:
- Multi-source skill discovery (SkillsMP + GitHub)
- SkillsMP HTML scraping for skill cards
- GitHub API integration for repositories
- Quality scoring algorithm (0-100 scale)
- Star-based popularity ranking
- License information extraction
- Duplicate detection across sources
- Author/owner attribution
- Bulk keyword processing
- Keyword-based search capability
- Rate limiting (1 second between keywords)
- Apify proxy support
- Asyncio executor for non-blocking requests
- Real-time Dataset push
- ISO 8601 timestamp recording
- Error handling and logging
Disclaimer: Skill Curator Scraper is provided as-is for skill discovery purposes. Users are responsible for complying with each platform's terms of service and with applicable laws. Always respect original authors and licenses.
Get Started Today
Deploy now for skill discovery!
Use for:
- Learning Resource Curation
- Skill Research
- Technology Intelligence
- Skill Inventory
- Professional Development
Perfect for:
- Learning Platforms
- Career Coaches
- Educators
- Researchers
- Product Managers
Related Tools
- Smart Article Extractor
- Business Social Media Finder
- Fast News Content Scraper
- Startup Company Data Collector
Your complete Apify-powered skill discovery solution!
Skill Discovery Excellence
This Actor is optimized for Skill Curator discovery with:
- Multi-source aggregation
- Intelligent quality scoring
- GitHub API integration
- SkillsMP scraping
- Duplicate detection
- Real-time Dataset integration
- Error recovery
- Production-ready code
Discover and curate skills effortlessly!