Github Profile Scraper

Pricing

$20.00/month + usage

Developed by

VulnV

Maintained by Community

Scrapes GitHub user profiles including bio, repositories, followers, contributions, and more. Accepts a list of usernames and extracts comprehensive profile data.

5.0 (1)


Last modified

12 hours ago

🚀 GitHub Profile Scraper ⚡ Extract Developer Profiles at Scale

Overview

The GitHub Profile Scraper is an Apify Actor that extracts data from public GitHub user profiles. It is suited to recruitment, developer research, competitive analysis, and building developer databases, and it returns each user's profile details, repositories, and contribution activity.

✅ Bulk username processing | ✅ Comprehensive profile data | ✅ Email extraction (when public) | ✅ Repository analysis | ✅ Contribution tracking


Complete Profile Data Extraction

  • Basic Information: Name, username, bio, location, website
  • Contact Details: Email addresses (when publicly visible)
  • Professional Details: Company, Twitter/X handle
  • Network Statistics: Follower and following counts
  • Repository Data: Public repository count, pinned repositories with details
  • Activity Metrics: Contribution counts and contribution graph data
  • Social Links: Website, social media profiles
  • Starred Repositories: List of starred projects (when accessible)

Key Features

  • Bulk Processing: Process multiple GitHub usernames in one run
  • Smart Email Detection: Extracts emails using multiple methods, including itemprop="email" elements (only for publicly visible emails)
  • Proxy Support: Built-in Apify proxy integration for reliable scraping
  • Error Handling: Robust error handling with detailed status reporting
  • Clean JSON Output: Structured, ready-to-use data format
  • Username Validation: Automatic username cleaning and validation against GitHub's format requirements
  • Format Flexibility: Accepts various username formats and automatically normalizes them
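
As an illustration of the itemprop="email" approach mentioned above, here is a minimal sketch using only Python's standard-library HTML parser. It is not the Actor's actual code (which the listing says is built on BeautifulSoup); the class and function names are hypothetical, and the sample markup in the usage note is an assumption about how a public email might appear on a profile page.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"br", "img", "input", "hr", "meta", "link"}


class EmailItempropParser(HTMLParser):
    """Collects emails found inside elements marked itemprop="email"."""

    def __init__(self):
        super().__init__()
        self._depth = 0   # > 0 while inside an itemprop="email" element
        self.emails = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attr_map = dict(attrs)
        if self._depth:
            self._depth += 1
        elif attr_map.get("itemprop") == "email":
            self._depth = 1
        href = attr_map.get("href") or ""
        if self._depth and href.startswith("mailto:"):
            self._add(href[len("mailto:"):])

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth and "@" in data:
            self._add(data)

    def _add(self, value):
        value = value.strip()
        if value and value not in self.emails:
            self.emails.append(value)


def extract_public_emails(html):
    """Return the emails exposed via itemprop="email" in the given HTML."""
    parser = EmailItempropParser()
    parser.feed(html)
    return parser.emails
```

For example, `extract_public_emails('<li itemprop="email"><a href="mailto:jane@example.com">jane@example.com</a></li>')` yields a single-entry list, while a page without such an element yields an empty list, which corresponds to the "email": null case in the output.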

🧾 Input Configuration

Submit an array of GitHub usernames via the input schema:

{
    "usernames": [
        "johndeveloper",
        "jane-coder",
        "techexpert123",
        "@another-user",
        "https://github.com/some-developer"
    ],
    "max_threads": 5,
    "proxy_configuration": {
        "useApifyProxy": true,
        "apifyProxyGroups": ["RESIDENTIAL"]
    }
}

Note: The scraper automatically normalizes different username formats and validates them against GitHub's requirements. Invalid usernames will be skipped with warning messages.

Input Parameters

  1. Usernames (required):

    • Array of GitHub usernames to scrape
    • Supported formats: username, @username, github.com/username, https://github.com/username
    • Username requirements: Must follow GitHub's username rules (alphanumeric characters and hyphens, no consecutive hyphens, cannot start/end with hyphen, max 39 characters)
    • Invalid usernames will be automatically filtered out with warnings
  2. Max Threads (optional):

    • Number of concurrent threads for scraping (1-20)
    • Default: 5
    • Higher values = faster processing but may increase chance of rate limiting
  3. Proxy Configuration (recommended):

    • Enable Apify proxy to avoid rate limiting
    • Recommended for bulk scraping operations
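
The normalization and validation rules listed under Usernames can be sketched roughly as follows. This is an illustrative reading of those rules, not the Actor's own code; `normalize_username` and `clean_usernames` are hypothetical names.

```python
import re

# First character alphanumeric; each hyphen must be followed by an
# alphanumeric character (so no consecutive or trailing hyphens); 39 max.
USERNAME_RE = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9]|-(?=[A-Za-z0-9])){0,38}$")

# Optional scheme and "www." in front of github.com/
PREFIX_RE = re.compile(r"^(?:https?://)?(?:www\.)?github\.com/", re.IGNORECASE)


def normalize_username(raw):
    """Reduce username, @username, or a github.com URL to the bare username."""
    value = PREFIX_RE.sub("", raw.strip())
    value = value.lstrip("@")
    # Keep only the first path segment, e.g. "user/repo" -> "user".
    return value.split("/")[0]


def clean_usernames(raw_usernames):
    """Return (valid, skipped): normalized valid names and rejected raw inputs."""
    valid, skipped = [], []
    for raw in raw_usernames:
        candidate = normalize_username(raw)
        if USERNAME_RE.match(candidate):
            valid.append(candidate)
        else:
            skipped.append(raw)
    return valid, skipped
```

Running `clean_usernames(["@another-user", "https://github.com/some-developer", "user--name"])` keeps the first two entries in normalized form and puts `user--name` in the skipped list, mirroring the "filtered out with warnings" behavior described above.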

📤 Output Format

Each GitHub profile returns structured data such as:

{
    "username": "johndeveloper",
    "status": "success",
    "name": "John Developer",
    "bio": "Full-stack developer passionate about open source",
    "location": "San Francisco, CA",
    "email": "john@example.com",
    "website": "https://johndeveloper.dev",
    "twitter": "john_codes",
    "followers": "1234",
    "following": "456",
    "repos_count": "42",
    "contribs": "567 contributions in the last year",
    "pinnedrepos": [
        {
            "name": "awesome-project",
            "url": "https://github.com/johndeveloper/awesome-project",
            "desc": "An innovative web application framework",
            "lang": "JavaScript",
            "stars": "2,500",
            "forks": "320"
        }
    ],
    "repos": [
        {
            "url": "https://github.com/johndeveloper/web-framework",
            "name": "web-framework",
            "desc": "Modern web development framework",
            "stars": "1850",
            "forks": "210",
            "languages": [
                {"lang": "JavaScript", "percent": "78.2%"},
                {"lang": "TypeScript", "percent": "18.5%"}
            ]
        }
    ],
    "starred_repos_list": [
        {
            "url": "https://github.com/example-org/popular-tool",
            "name": "popular-tool"
        }
    ],
    "contrib_matrix": [
        {
            "date": "2024-01-01",
            "count": "3",
            "level": "1"
        }
    ]
}
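
Note that the counts in the sample above are strings, some with thousands separators ("2,500") or trailing text ("567 contributions in the last year"). A small post-processing sketch for turning them into integers might look like this. The helper names are hypothetical, and the code assumes counts appear as plain digits with optional commas, as in the sample; abbreviated values such as "1.2k" are deliberately returned as None rather than guessed.

```python
import re

# Leading digits with optional commas, not followed by a decimal point or a
# k/M suffix (abbreviated counts are left unparsed).
_COUNT_RE = re.compile(r"\s*(\d[\d,]*)(?![\d.,kKmM])")


def to_int(value):
    """Parse '1234', '2,500', or '567 contributions ...' into an int, else None."""
    if value is None:
        return None
    match = _COUNT_RE.match(str(value))
    if not match:
        return None
    return int(match.group(1).replace(",", ""))


def numeric_summary(profile):
    """Pull the numeric fields of one dataset item into integers."""
    pinned = profile.get("pinnedrepos") or []
    matrix = profile.get("contrib_matrix") or []
    return {
        "followers": to_int(profile.get("followers")),
        "following": to_int(profile.get("following")),
        "repos_count": to_int(profile.get("repos_count")),
        "contributions": to_int(profile.get("contribs")),
        "pinned_stars": sum(to_int(repo.get("stars")) or 0 for repo in pinned),
        "active_days": sum(1 for day in matrix if (to_int(day.get("count")) or 0) > 0),
    }
```

Applied to the sample item, `numeric_summary` would report 1234 followers, 567 contributions, and 2500 stars across pinned repositories.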

Error Handling

Failed profiles return structured error information:

{
    "username": "nonexistent-user",
    "status": "not_found",
    "message": "User not found"
}

Common Error Cases:

  • not_found: User doesn't exist or profile is private
  • error: Network issues or scraping errors
  • Invalid usernames are filtered out before processing with warning logs
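
When post-processing a run, it can help to split items by their status field. The sketch below treats error items as worth retrying (since they may stem from network issues) and not_found items as final; that retry policy is an assumption for illustration, not documented Actor behavior, and both function names are hypothetical.

```python
from collections import defaultdict


def group_by_status(items):
    """Group dataset items by their 'status' value."""
    groups = defaultdict(list)
    for item in items:
        groups[item.get("status", "unknown")].append(item)
    return dict(groups)


def retry_candidates(items):
    """Usernames whose scrape ended with status 'error' (possibly transient)."""
    return [item["username"] for item in items if item.get("status") == "error"]
```

The list returned by `retry_candidates` can be passed back as the usernames input of a follow-up run.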

💼 Common Use Cases

Recruitment & Talent Sourcing

  • Research developer profiles and technical expertise
  • Analyze contribution patterns and project involvement
  • Build comprehensive talent pipelines with GitHub activity data
  • Assess coding skills through repository analysis

Developer Research & Analysis

  • Study open source community members and contributors
  • Analyze technology trends through developer profiles
  • Research competitor team structures and technical expertise
  • Track developer career progression and project involvement

Lead Generation & Business Development

  • Extract contact information for developer outreach
  • Build databases of potential customers in tech sectors
  • Identify decision-makers in technology companies
  • Enrich existing contact databases with GitHub profiles

Community Building & Networking

  • Find developers with specific skills or interests
  • Build communities around particular technologies
  • Identify potential collaborators for open source projects
  • Research conference speakers and industry experts

📊 Output & Export Options

Dataset Storage

  • All extracted data is stored in an Apify dataset
  • Each profile becomes one dataset item
  • Status tracking for successful and failed extractions

Export Formats

  • JSON: Raw structured data for API integration
  • CSV: Spreadsheet-compatible format for analysis
  • Excel: Formatted spreadsheet with profile data
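
If you download the JSON and want a custom CSV locally rather than the platform's own export, a flattening step along these lines is one option. This is a sketch: the column selection and the choice to JSON-encode nested lists such as pinnedrepos are assumptions, and `profiles_to_csv` is a hypothetical helper.

```python
import csv
import io
import json

# Hypothetical default column selection, taken from the sample output fields.
DEFAULT_COLUMNS = [
    "username", "status", "name", "location", "email",
    "followers", "following", "repos_count",
]


def profiles_to_csv(items, columns=DEFAULT_COLUMNS):
    """Render dataset items as CSV text; nested values are JSON-encoded."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for item in items:
        row = {}
        for column in columns:
            value = item.get(column)
            if isinstance(value, (list, dict)):
                value = json.dumps(value)
            row[column] = "" if value is None else value
        writer.writerow(row)
    return buffer.getvalue()
```

Missing or null fields (for example a private email) become empty cells, so failed items with only username, status, and message still produce a well-formed row.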

Data Processing

  • Clean, validated usernames
  • Structured error reporting
  • Comprehensive logging for troubleshooting

⚡ Quick Start Guide

  1. Configure Input:

    • Add GitHub usernames to the usernames array
    • Set desired max_threads (recommended: 5-10)
    • Enable proxy configuration for reliable scraping
  2. Run the Actor:

    • Execute through Apify Console or API
    • Monitor progress through real-time logs
    • Review extracted data in the dataset
  3. Export Results:

    • Download data in your preferred format
    • Integrate with your existing tools and workflows
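
For the API route, the steps above could be scripted with the apify-client Python package roughly as shown here. Treat it as a sketch under assumptions: the Actor ID is not stated on this page, so it is left as a parameter you must supply, and `build_run_input` and `run_actor` are hypothetical helper names. The input keys match the input schema shown earlier.

```python
def build_run_input(usernames, max_threads=5, use_proxy=True):
    """Assemble the Actor input described in the Input Configuration section."""
    if not 1 <= max_threads <= 20:
        raise ValueError("max_threads must be between 1 and 20")
    run_input = {"usernames": list(usernames), "max_threads": max_threads}
    if use_proxy:
        run_input["proxy_configuration"] = {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        }
    return run_input


def run_actor(token, actor_id, run_input):
    """Start the Actor, wait for it to finish, and return its dataset items.

    actor_id is the Actor's ID or "owner/name" as shown in the Apify Console;
    it is not given in this document, so no value is assumed here.
    """
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

A typical call would be `run_actor(my_token, my_actor_id, build_run_input(["jane-coder"], max_threads=5))`, after which each returned item has the shape shown in the Output Format section.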

🛡️ Privacy & Compliance

  • Public Data Only: Extracts only publicly visible profile information
  • Respects Privacy Settings: Email extraction only works for publicly visible emails
  • Rate Limiting: Built-in delays and proxy support to respect GitHub's terms
  • Error Handling: Graceful handling of private or restricted profiles

🔧 Technical Details

Built With

  • Python & BeautifulSoup: Efficient HTML parsing and data extraction
  • Apify SDK: Robust actor framework with built-in storage and proxy support
  • Multi-threading: Concurrent processing for improved performance
  • Request Handling: Smart retry mechanisms and error recovery
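
To make the multi-threading point concrete, here is one way a max_threads setting could bound concurrent profile fetches, using the standard library's ThreadPoolExecutor. It is a generic sketch, not the Actor's implementation; `scrape_all` is a hypothetical name and `fetch_profile` stands in for whatever function fetches a single profile.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def scrape_all(usernames, fetch_profile, max_threads=5):
    """Fetch profiles concurrently, at most max_threads at a time.

    Failures are converted into items with status "error" so one bad
    profile does not abort the whole run. Results keep the input order.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        futures = {pool.submit(fetch_profile, name): name for name in usernames}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                results[name] = {"username": name, "status": "error", "message": str(exc)}
    return [results[name] for name in usernames]
```

Raising max_threads shortens the run but sends more simultaneous requests, which is the rate-limiting trade-off noted under Input Parameters.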

Performance

  • Process hundreds of profiles per run
  • Configurable concurrency for optimal speed
  • Proxy rotation for reliable access
  • Comprehensive error logging and recovery

📈 Example Results

Successful Profile Extraction

{
    "username": "jane-coder",
    "status": "success",
    "name": "Jane Smith",
    "bio": "Frontend developer specializing in React and TypeScript. Open source enthusiast.",
    "location": "Austin, TX",
    "email": null,
    "website": "https://jane-codes.dev",
    "followers": "3456",
    "following": "234",
    "repos_count": "87",
    "pinnedrepos": [
        {
            "name": "react-toolkit",
            "desc": "Comprehensive React development toolkit",
            "stars": "8500",
            "lang": "TypeScript"
        }
    ]
}

💡 Tips for Best Results

  • Enable Proxies: Use Apify proxy configuration for reliable large-scale scraping
  • Username Format: Ensure usernames follow GitHub's format rules:
    • Only alphanumeric characters and hyphens allowed
    • Cannot start or end with a hyphen
    • No consecutive hyphens (e.g., user--name is invalid)
    • Maximum 39 characters
    • Invalid usernames will be skipped with warnings
  • Monitor Rate Limits: Use appropriate thread counts to avoid GitHub rate limiting
  • Handle Private Profiles: Some data may not be available for users with privacy settings
  • Email Availability: Email extraction only works for publicly visible emails (most users keep emails private)

🆘 Support & Feedback

For questions, feature requests, or technical support:

  • Visit the Apify Community Forum
  • Contact us through the Apify platform
  • Submit issues for improvements and bug reports

🌟 Explore More Actors

✨ Need more scraping solutions? Discover additional Actors on Apify for web automation and data extraction.

📧 For inquiries or custom development, reach out at apify@vulnv.com.