Skill Curator Scraper

Pricing

from $0.02 / 1,000 results

MCP Skill Scraper collects AI skills from SkillsMP and GitHub. It extracts name, description, stars, license, and URLs, then calculates a quality score. Outputs structured JSON for discovering MCP tools, AI skills, and developer resources.

Rating

0.0

(0)

Developer

Data Pilot

Maintained by Community

Actor stats

0

Bookmarked

2

Total users

1

Monthly active users

2 days ago

Last modified

🎓 Skill Curator Scraper is an Apify Actor that discovers and curates skill resources from two sources, SkillsMP and GitHub. Given one or more keywords, it aggregates skill listings, learning resources, and skill repositories for any topic or domain. It is suited to building skill inventories, curating learning resources, and finding professional development materials.

The Actor combines multi-source aggregation, quality scoring, duplicate detection, and GitHub API integration. Each record carries a quality score, repository stars, license information, and a description, so results can be compared and ranked.


🔥 Features

  • Multi-Source Skill Discovery – Aggregates skill resources from SkillsMP and GitHub using parallel requests.
  • SkillsMP Scraping – Scrapes SkillsMP HTML pages to discover skill cards.
  • GitHub Repository Search – Uses the GitHub API to find skill-related repositories and resources.
  • Quality Scoring – Scores each skill on stars, licensing, description length, and repository URL availability.
  • Duplicate Detection – Automatically deduplicates skills found in more than one source.
  • Star-Based Ranking – Prioritizes popular, well-maintained repositories.
  • License Information – Extracts SPDX license identifiers for legal compliance.
  • Author Attribution – Captures author/owner information from repositories.
  • Keyword-Based Search – Supports multiple keywords for broad skill discovery.
  • Bulk Keyword Processing – Processes multiple skill keywords in a single run.
  • Rate Limiting – Adds automatic delays to respect API rate limits.
  • Proxy Support – Supports Apify residential proxies for reliable access.
  • Real-Time Dataset Push – Pushes results to the Apify Dataset with metadata.
  • Timestamp Recording – Records the discovery timestamp for audit trails.
  • Error Handling – Handles errors gracefully with detailed logging.
  • Asyncio-Friendly – Non-blocking async/await architecture.
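
The page does not say how duplicates are matched. The following is a minimal sketch, assuming a record is identified by its normalized GitHub URL when it has one and by its lowercased name otherwise. The helper names `dedupe_key` and `deduplicate` are hypothetical and not taken from the Actor's code.

```python
from typing import Iterable


def dedupe_key(record: dict) -> str:
    """Hypothetical key: prefer the GitHub URL, fall back to the name."""
    url = (record.get("githubUrl") or "").strip().lower().rstrip("/")
    if url:
        return url
    return (record.get("name") or "").strip().lower()


def deduplicate(records: Iterable[dict]) -> list[dict]:
    """Keep the first record seen for each key, preserving input order."""
    seen: set[str] = set()
    unique: list[dict] = []
    for record in records:
        key = dedupe_key(record)
        if key in seen:
            continue
        seen.add(key)
        unique.append(record)
    return unique


records = [
    {"name": "react-query", "githubUrl": "https://github.com/tannerlinsley/react-query"},
    {"name": "React-Query", "githubUrl": "https://github.com/tannerlinsley/react-query/"},
    {"name": "Advanced React Patterns", "githubUrl": ""},
]
print(len(deduplicate(records)))  # 2
```

Keeping the first occurrence means the order in which sources are merged decides which copy survives. A real implementation might instead merge fields from both copies.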

🌐 Sources

1. SkillsMP

  • Platform: Skill marketplace and curator platform
  • Search Type: HTML scraping
  • Content: Skill cards, descriptions, skill URLs
  • Data Extracted: Name, description, skill URL
  • URL Format: https://skillsmp.com/skills/{name}
  • Coverage: Broad skill marketplace

2. GitHub Repositories

  • Platform: GitHub version control and open source
  • Search Type: REST API (JSON)
  • Content: Repositories, code projects, implementations
  • Data Extracted: Name, description, stars, license, author, GitHub URL
  • API Endpoint: https://api.github.com/search/repositories
  • Search Query: Keywords + "mcp" + "skill"
  • Sorting: By stars (most popular first)
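
As an illustration of the GitHub search described above, here is a sketch of how the request URL could be assembled. The endpoint and the `q`, `sort`, `order`, and `per_page` parameters belong to the GitHub REST search API. How the Actor actually joins the keyword with "mcp" and "skill" is not documented, so the space-separated query is an assumption, and `build_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

GITHUB_SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(keyword: str, limit: int = 20) -> str:
    """Build a repository search URL for one keyword (assumed query format)."""
    params = {
        "q": f"{keyword} mcp skill",          # assumption: terms joined by spaces
        "sort": "stars",                      # most popular first
        "order": "desc",
        "per_page": max(1, min(limit, 100)),  # GitHub allows at most 100 per page
    }
    return f"{GITHUB_SEARCH_URL}?{urlencode(params)}"


print(build_search_url("React", 25))
```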

⚙️ How It Works

The Skill Curator Scraper takes skill keywords as input and searches both sources in parallel. It scrapes SkillsMP for skill cards and queries the GitHub API for repositories. Each skill is assigned a quality score based on stars, licensing, description length, and repository URL availability. Results are deduplicated and pushed to the Apify Dataset.

Key Processing Steps:

  1. Input Parsing – Accept skill keywords from the Actor input
  2. Proxy Setup – Configure the Apify residential proxy if available
  3. Parallel Source Queries – Launch the SkillsMP scrape and the GitHub API search
  4. SkillsMP Scraping – Parse skill cards from SkillsMP HTML
  5. GitHub API Search – Query GitHub with keyword filters
  6. Data Extraction – Extract name, description, stars, license, author
  7. Quality Scoring – Calculate a quality score for each skill
  8. Deduplication – Remove duplicate entries found in multiple sources
  9. Result Compilation – Aggregate findings from all sources
  10. Dataset Push – Push to the Apify Dataset with metadata
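
The parallel-query step can be sketched with asyncio and a thread-pool executor, which matches the "requests library with asyncio executor" approach named in the Technical Stack. The two fetchers below are stand-ins that only sleep and return placeholder data. Their names, the `run` function, and its `delay` parameter are illustrative assumptions, not the Actor's code.

```python
import asyncio
import time


def fetch_skillsmp(keyword: str) -> list[dict]:
    """Stand-in for the blocking SkillsMP scrape (hypothetical)."""
    time.sleep(0.05)  # simulate network latency
    return [{"name": f"{keyword} basics", "source": "skillsmp"}]


def fetch_github(keyword: str) -> list[dict]:
    """Stand-in for the blocking GitHub API search (hypothetical)."""
    time.sleep(0.05)  # simulate network latency
    return [{"name": f"{keyword}-toolkit", "source": "github"}]


async def search_keyword(keyword: str) -> list[dict]:
    """Run both blocking fetchers concurrently in the default thread pool."""
    loop = asyncio.get_running_loop()
    skillsmp, github = await asyncio.gather(
        loop.run_in_executor(None, fetch_skillsmp, keyword),
        loop.run_in_executor(None, fetch_github, keyword),
    )
    return skillsmp + github


async def run(keywords: list[str], delay: float = 1.0) -> list[dict]:
    """Process keywords one after another, pausing between them."""
    results: list[dict] = []
    for index, keyword in enumerate(keywords):
        if index > 0:
            await asyncio.sleep(delay)  # rate limiting between keywords
        results.extend(await search_keyword(keyword))
    return results


print(asyncio.run(run(["React", "Python"], delay=0.1)))
```

In this sketch the two sources run concurrently for each keyword, while keywords are handled sequentially so the pause between them stays meaningful.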

Key Benefits:

  • Discover Skill Curator resources from multiple trusted sources
  • Find popular and well-maintained skill implementations
  • Compare skills across SkillsMP and GitHub
  • Identify high-quality learning resources
  • Build comprehensive skill inventories
  • Research skill implementations and examples

📥 Input

The Actor accepts the following input parameters:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| keywords | array | required | Skill keywords to search (e.g., ["React", "Python", "DevOps"]) |
| limit_per_keyword | integer | 20 | Maximum skills per keyword (1-100) |
| proxyConfiguration | object | {"useApifyProxy": true} | Proxy configuration settings |

Example Input:

{
  "keywords": ["React", "Python", "DevOps", "GraphQL", "Kubernetes"],
  "limit_per_keyword": 25,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

Single Keyword Example:

{
  "keywords": ["Machine Learning"],
  "limit_per_keyword": 30
}
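
The sketch below shows one way such input could be normalized before use, applying the documented defaults and the 1-100 range. Clamping out-of-range limits rather than rejecting them is an assumption, and `normalize_input` is a hypothetical helper.

```python
DEFAULT_LIMIT = 20
DEFAULT_PROXY = {"useApifyProxy": True}


def normalize_input(actor_input: dict) -> dict:
    """Validate keywords and apply the documented defaults (assumed behavior)."""
    raw_keywords = actor_input.get("keywords") or []
    keywords = [k.strip() for k in raw_keywords if isinstance(k, str) and k.strip()]
    if not keywords:
        raise ValueError("'keywords' must contain at least one non-empty string")

    limit = int(actor_input.get("limit_per_keyword", DEFAULT_LIMIT))
    limit = max(1, min(limit, 100))  # assumption: clamp into the 1-100 range

    proxy = actor_input.get("proxyConfiguration") or dict(DEFAULT_PROXY)
    return {
        "keywords": keywords,
        "limit_per_keyword": limit,
        "proxyConfiguration": proxy,
    }


print(normalize_input({"keywords": ["Machine Learning"], "limit_per_keyword": 300}))
```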

📤 Output

The Actor pushes Skill Curator records with the following structure:

| Field | Type | Description |
| --- | --- | --- |
| name | string | Skill or repository name |
| description | string | Skill/repo description (max 300 chars) |
| author | string | Repository owner/author name |
| repoMetadata.stars | integer | GitHub stars count |
| repoMetadata.license | string | SPDX license identifier |
| githubUrl | string | Direct GitHub repository URL |
| skillUrl | string | SkillsMP or project skill URL |
| qualityScore.overall | integer | Quality score (0-100) |
| keyword | string | Search keyword used |
| detected_at | string | ISO 8601 discovery timestamp |

Example Output Record (GitHub):

{
  "name": "react-query",
  "author": "tannerlinsley",
  "description": "Powerful asynchronous state management, server-state utilities and data fetching with TS/JS, React Query, Solid Query, Svelte Query and Vue Query.",
  "repoMetadata.stars": 42000,
  "repoMetadata.license": "MIT",
  "githubUrl": "https://github.com/tannerlinsley/react-query",
  "skillUrl": "https://skillsmp.com/skills/react-query",
  "qualityScore.overall": 75,
  "keyword": "React",
  "detected_at": "2025-02-14T12:00:00Z"
}

Example Output Record (SkillsMP):

{
  "name": "Advanced React Patterns",
  "description": "Learn advanced React patterns including render props, custom hooks, and compound components for building scalable applications.",
  "repoMetadata.stars": 0,
  "githubUrl": "",
  "skillUrl": "https://skillsmp.com/skills/advanced-react-patterns",
  "qualityScore.overall": 15,
  "keyword": "React",
  "detected_at": "2025-02-14T12:00:00Z"
}
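
A record of this shape could be assembled as in the sketch below, which enforces the 300-character description limit and writes an ISO 8601 UTC timestamp. The `build_record` helper and its parameter names are hypothetical. Only the output field names come from the table above.

```python
from datetime import datetime, timezone

MAX_DESCRIPTION_CHARS = 300


def build_record(name, description, keyword, *, author="", stars=0,
                 license_id="", github_url="", skill_url="", score=0):
    """Assemble one output record using the documented field names."""
    # Collapse runs of whitespace, then truncate to the documented limit.
    clean_description = " ".join(description.split())[:MAX_DESCRIPTION_CHARS]
    return {
        "name": name,
        "author": author,
        "description": clean_description,
        "repoMetadata.stars": stars,
        "repoMetadata.license": license_id,
        "githubUrl": github_url,
        "skillUrl": skill_url,
        "qualityScore.overall": score,
        "keyword": keyword,
        # ISO 8601 in UTC with a trailing "Z", as in the examples above.
        "detected_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


record = build_record("Advanced React Patterns", "Learn   advanced React patterns.", "React")
print(record["description"])  # Learn advanced React patterns.
```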

🎯 Quality Scoring

The Skill Curator Scraper ranks skills with an additive quality score built from the criteria below:

Scoring Criteria

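| Factor | Max points | Criterion |
| --- | --- | --- |
| GitHub Stars | 40 | ≥100 stars = 40, ≥50 = 30, ≥10 = 15 |
| License | 10 | Has SPDX license |
| Description | 15 | Description > 100 characters |
| GitHub URL | 10 | Repository URL available |
| Total | 100 | Maximum of the scale; the factors listed above sum to 75 |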

Scoring Examples

High-Quality Repository (100+ stars + license + good description):
- Stars (100+): 40 points
- License: 10 points
- Description (>100 chars): 15 points
- GitHub URL: 10 points
- Total: 75/100 (Good)

Popular Repository (1000+ stars + license + excellent description):
- Stars (100+): 40 points (capped at 40)
- License: 10 points
- Description (>100 chars): 15 points
- GitHub URL: 10 points
- Total: 75/100 (Good)

SkillsMP Card (No GitHub, good description):
- Description (>100 chars): 15 points
- Total: 15/100 (Limited)
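
The criteria above can be written as a small function. This is a sketch of the documented rubric only. The function name and signature are hypothetical, and the Actor may apply factors that are not listed on this page.

```python
def quality_score(stars: int = 0, license_id: str = "",
                  description: str = "", github_url: str = "") -> int:
    """Sum the points from the scoring criteria listed above."""
    score = 0

    # GitHub stars: tiered, capped at 40 points.
    if stars >= 100:
        score += 40
    elif stars >= 50:
        score += 30
    elif stars >= 10:
        score += 15

    if license_id:               # has an SPDX license identifier
        score += 10
    if len(description) > 100:   # description longer than 100 characters
        score += 15
    if github_url:               # repository URL available
        score += 10
    return score


long_description = "x" * 120
print(quality_score(1000, "MIT", long_description, "https://github.com/owner/repo"))  # 75
print(quality_score(description=long_description))  # 15
```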

Score Interpretation

  • 90-100: Excellent (Popular, well-maintained, licensed)
  • 70-89: Good (Solid project with community support)
  • 50-69: Fair (Emerging projects, niche tools)
  • 30-49: Basic (Limited info or new projects)
  • 0-29: Limited (Minimal metadata, research phase)
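
For downstream filtering, the bands above can be turned into a lookup. The sketch below assumes integer scores from 0 to 100, and `interpret_score` is a hypothetical helper name.

```python
# Lower bound of each band, highest first, as listed above.
SCORE_BANDS = [
    (90, "Excellent"),
    (70, "Good"),
    (50, "Fair"),
    (30, "Basic"),
    (0, "Limited"),
]


def interpret_score(score: int) -> str:
    """Return the band label for a quality score between 0 and 100."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for lower_bound, label in SCORE_BANDS:
        if score >= lower_bound:
            return label
    raise AssertionError("unreachable: the 0 band matches every valid score")


print(interpret_score(75))  # Good
print(interpret_score(15))  # Limited
```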

🧰 Technical Stack

  • HTTP Requests: requests library with asyncio executor
  • HTML Parsing: BeautifulSoup4 for SkillsMP scraping
  • APIs: GitHub REST API v3 (JSON)
  • Async: asyncio for concurrent requests
  • Pattern Matching: Python regex for text cleaning
  • Proxy: Apify Proxy with residential support
  • Logging: Apify Actor logging system
  • Platform: Apify Actor serverless environment
  • Timeout: 20 seconds per request

🎯 Use Cases

  • Skill Inventory Building – Create comprehensive skill inventories
  • Learning Resource Curation – Discover quality learning materials
  • Technology Research – Research popular skill implementations
  • Competitive Analysis – Compare skills across platforms
  • Professional Development – Find resources for skill enhancement
  • Project Reference – Discover implementation examples
  • Technology Stack Planning – Evaluate skill options
  • Team Skill Assessment – Identify skill gaps and opportunities
  • Startup Research – Discover emerging skills and tools
  • Education Planning – Build curriculum with curated resources
  • Vendor Evaluation – Assess skill availability and quality
  • Open Source Discovery – Find high-quality open source projects
  • Technology Benchmarking – Compare skills across metrics
  • Knowledge Management – Build skill knowledge bases
  • Job Market Analysis – Research in-demand skills

Limit Configuration

Balanced (20 per keyword):

{
  "limit_per_keyword": 20
}

Comprehensive (50 per keyword):

{
  "limit_per_keyword": 50
}

📦 Changelog

Initial Release:

  • Multi-source skill discovery (SkillsMP + GitHub)
  • SkillsMP HTML scraping for skill cards
  • GitHub API integration for repositories
  • Quality scoring algorithm (0-100 scale)
  • Star-based popularity ranking
  • License information extraction
  • Duplicate detection across sources
  • Author/owner attribution
  • Bulk keyword processing
  • Keyword-based search capability
  • Rate limiting (1 second between keywords)
  • Apify proxy support
  • Asyncio executor for non-blocking requests
  • Real-time Dataset push
  • ISO 8601 timestamp recording
  • Error handling and logging

Disclaimer: Skill Curator Scraper is provided as-is for skill discovery purposes. Users are responsible for ensuring compliance with platform ToS and laws. Always respect original authors and licenses.


🎉 Get Started Today

Deploy now for skill discovery!

Use for:

  • 📚 Learning Resource Curation
  • 🔍 Skill Research
  • 💡 Technology Intelligence
  • 📋 Skill Inventory
  • 🎯 Professional Development

Perfect for:

  • Learning Platforms
  • Career Coaches
  • Educators
  • Researchers
  • Product Managers

Your complete Apify-powered skill discovery solution! 🚀✨


🎓 Skill Discovery Excellence

This Actor is optimized for skill discovery with:

  • ✅ Multi-source aggregation
  • ✅ Intelligent quality scoring
  • ✅ GitHub API integration
  • ✅ SkillsMP scraping
  • ✅ Duplicate detection
  • ✅ Real-time Dataset integration
  • ✅ Error recovery
  • ✅ Production-ready code

Discover and curate skills effortlessly! 💎🚀