
GitHub Profile Scraper
Pricing
$20.00/month + usage

Scrapes GitHub user profiles including bio, repositories, followers, contributions, and more. Accepts a list of usernames and extracts comprehensive profile data.
Last modified
12 hours ago
GitHub Profile Scraper: Extract Developer Profiles at Scale
Overview
The GitHub Profile Scraper is a powerful Apify Actor designed to extract comprehensive data from GitHub user profiles efficiently. Perfect for recruitment, developer research, competitive analysis, or building developer databases, this scraper provides detailed insights into GitHub users' professional profiles, repositories, and contributions.
Bulk username processing | Comprehensive profile data | Email extraction (when public) | Repository analysis | Contribution tracking
Complete Profile Data Extraction
- Basic Information: Name, username, bio, location, website
- Contact Details: Email addresses (when publicly visible)
- Professional Details: Company, Twitter/X handle
- Network Statistics: Followers, following counts
- Repository Data: Public repositories count, pinned repositories with details
- Activity Metrics: Contribution counts and contribution graph data
- Social Links: Website, social media profiles
- Starred Repositories: List of starred projects (when accessible)
Key Features
- Bulk Processing: Process multiple GitHub usernames in one run
- Smart Email Detection: Extracts emails using multiple methods including `itemprop="email"` elements (only for publicly visible emails)
- Proxy Support: Built-in Apify proxy integration for reliable scraping
- Error Handling: Robust error handling with detailed status reporting
- Clean JSON Output: Structured, ready-to-use data format
- Username Validation: Automatic username cleaning and validation with GitHub format requirements
- Format Flexibility: Accepts various username formats and automatically normalizes them
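As a rough illustration of the `itemprop="email"` detection mentioned above, the sketch below pulls the text out of the first element carrying that attribute. It is not the Actor's code: the Actor is described as using BeautifulSoup, while this version uses only the standard-library `html.parser` so it runs without extra packages. The sample markup and the names `ItempropEmailParser` and `extract_public_email` are assumptions made for the example.

```python
from html.parser import HTMLParser
from typing import Optional

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ItempropEmailParser(HTMLParser):
    """Collects the text inside the first element with itemprop="email"."""

    def __init__(self) -> None:
        super().__init__()
        self._depth = 0            # > 0 while inside the matching element
        self._chunks: list = []    # text fragments found inside it
        self.done = False          # True once the element has been closed

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1       # nested tag inside the email element
        elif dict(attrs).get("itemprop") == "email":
            self._depth = 1        # entering the email element

    def handle_endtag(self, tag):
        if self._depth and tag not in VOID_TAGS:
            self._depth -= 1
            if self._depth == 0:
                self.done = True

    def handle_data(self, data):
        if self._depth and not self.done:
            self._chunks.append(data)

    @property
    def email(self) -> Optional[str]:
        text = "".join(self._chunks).strip()
        return text or None


def extract_public_email(html: str) -> Optional[str]:
    """Return the public email found in the HTML, or None if absent."""
    parser = ItempropEmailParser()
    parser.feed(html)
    return parser.email


# Hypothetical markup for illustration; real profile pages may differ.
sample = '<li itemprop="email"><a href="mailto:john@example.com">john@example.com</a></li>'
print(extract_public_email(sample))
```

A profile without a public email simply yields `None`, which matches the `"email": null` value shown in the example results.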
Input Configuration
Submit an array of GitHub usernames via the input schema:
{
  "usernames": [
    "johndeveloper",
    "jane-coder",
    "techexpert123",
    "@another-user",
    "https://github.com/some-developer"
  ],
  "max_threads": 5,
  "proxy_configuration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
Note: The scraper automatically normalizes different username formats and validates them against GitHub's requirements. Invalid usernames will be skipped with warning messages.
Input Parameters
- Usernames (required):
  - Array of GitHub usernames to scrape
  - Supported formats: `username`, `@username`, `github.com/username`, `https://github.com/username`
  - Username requirements: Must follow GitHub's username rules (alphanumeric characters and hyphens, no consecutive hyphens, cannot start/end with hyphen, max 39 characters)
  - Invalid usernames will be automatically filtered out with warnings
- Max Threads (optional):
  - Number of concurrent threads for scraping (1-20)
  - Default: 5
  - Higher values = faster processing but may increase chance of rate limiting
- Proxy Configuration (recommended):
  - Enable Apify proxy to avoid rate limiting
  - Recommended for bulk scraping operations
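To pre-check a username list before submitting it, the normalization and validation rules listed above can be approximated locally. This is a minimal sketch under the stated rules, not the Actor's own validator; `normalize_username` and `split_usernames` are hypothetical helper names.

```python
import re
from typing import List, Optional, Tuple

# Alphanumerics and single hyphens only, no leading or trailing hyphen,
# at most 39 characters (1 leading character + up to 38 more).
USERNAME_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9]|-(?=[A-Za-z0-9])){0,38}")

# Optional scheme and "www." in front of "github.com/".
URL_PREFIX_RE = re.compile(r"^(?:https?://)?(?:www\.)?github\.com/", re.IGNORECASE)


def normalize_username(raw: str) -> Optional[str]:
    """Reduce a supported input format to a bare username, or None if invalid."""
    value = URL_PREFIX_RE.sub("", raw.strip())
    value = value.lstrip("@").strip("/")
    return value if USERNAME_RE.fullmatch(value) else None


def split_usernames(raw_values: List[str]) -> Tuple[List[str], List[str]]:
    """Return (valid usernames, rejected raw inputs)."""
    valid, rejected = [], []
    for raw in raw_values:
        name = normalize_username(raw)
        if name is None:
            rejected.append(raw)
        else:
            valid.append(name)
    return valid, rejected


valid, rejected = split_usernames(
    ["johndeveloper", "@another-user", "https://github.com/some-developer", "user--name"]
)
print(valid)
print(rejected)
```

Inputs that point past the profile, such as a repository URL, are rejected here rather than trimmed; whether the Actor does the same is not stated.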
Output Format
Each GitHub profile returns structured data such as:
{
  "username": "johndeveloper",
  "status": "success",
  "name": "John Developer",
  "bio": "Full-stack developer passionate about open source",
  "location": "San Francisco, CA",
  "email": "john@example.com",
  "website": "https://johndeveloper.dev",
  "twitter": "john_codes",
  "followers": "1234",
  "following": "456",
  "repos_count": "42",
  "contribs": "567 contributions in the last year",
  "pinnedrepos": [
    {
      "name": "awesome-project",
      "url": "https://github.com/johndeveloper/awesome-project",
      "desc": "An innovative web application framework",
      "lang": "JavaScript",
      "stars": "2,500",
      "forks": "320"
    }
  ],
  "repos": [
    {
      "url": "https://github.com/johndeveloper/web-framework",
      "name": "web-framework",
      "desc": "Modern web development framework",
      "stars": "1850",
      "forks": "210",
      "languages": [
        {"lang": "JavaScript", "percent": "78.2%"},
        {"lang": "TypeScript", "percent": "18.5%"}
      ]
    }
  ],
  "starred_repos_list": [
    {"url": "https://github.com/example-org/popular-tool", "name": "popular-tool"}
  ],
  "contrib_matrix": [
    {"date": "2024-01-01", "count": "3", "level": "1"}
  ]
}
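In the sample above the numeric fields arrive as strings, some with thousands separators (`"2,500"`) or trailing text (`"567 contributions in the last year"`). A small post-processing step can convert them before analysis. The sketch below assumes only the formats visible in the sample; abbreviated values such as `1.2k` are not handled, and `to_int` and `summarize_profile` are hypothetical helper names.

```python
import re
from typing import Optional


def to_int(value) -> Optional[int]:
    """Return the first integer in value, ignoring thousands separators."""
    if value is None:
        return None
    match = re.search(r"\d[\d,]*", str(value))
    return int(match.group().replace(",", "")) if match else None


def summarize_profile(item: dict) -> dict:
    """Build a numeric summary from one dataset item."""
    pinned = item.get("pinnedrepos") or []
    return {
        "username": item.get("username"),
        "followers": to_int(item.get("followers")),
        "following": to_int(item.get("following")),
        "repos_count": to_int(item.get("repos_count")),
        "contributions": to_int(item.get("contribs")),
        # Missing or unparsable star counts are treated as 0.
        "pinned_stars": sum(to_int(repo.get("stars")) or 0 for repo in pinned),
    }


profile = {
    "username": "johndeveloper",
    "followers": "1234",
    "following": "456",
    "repos_count": "42",
    "contribs": "567 contributions in the last year",
    "pinnedrepos": [{"name": "awesome-project", "stars": "2,500", "forks": "320"}],
}
summary = summarize_profile(profile)
print(summary)
```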
Error Handling
Failed profiles return structured error information:
{"username": "nonexistent-user","status": "not_found","message": "User not found"}
Common Error Cases:
- `not_found`: User doesn't exist or profile is private
- `error`: Network issues or scraping errors
- Invalid usernames are filtered out before processing with warning logs
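Because every item carries a `status` field, results can be grouped after a run to separate usable profiles from failures. The sketch below is one way to do this under the status values listed above; treating `error` items as retry candidates is a suggestion made here, not behavior the Actor documents, and `group_by_status` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_status(items: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group dataset items by their status; items without one count as 'error'."""
    groups = defaultdict(list)
    for item in items:
        groups[item.get("status", "error")].append(item)
    return dict(groups)


items = [
    {"username": "johndeveloper", "status": "success", "name": "John Developer"},
    {"username": "nonexistent-user", "status": "not_found", "message": "User not found"},
    {"username": "flaky-user", "status": "error", "message": "Connection reset"},
]
groups = group_by_status(items)

# Network-type failures may succeed on a second run; not_found will not.
retry_usernames = [item["username"] for item in groups.get("error", [])]
print(retry_usernames)
```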
Common Use Cases
Recruitment & Talent Sourcing
- Research developer profiles and technical expertise
- Analyze contribution patterns and project involvement
- Build comprehensive talent pipelines with GitHub activity data
- Assess coding skills through repository analysis
Developer Research & Analysis
- Study open source community members and contributors
- Analyze technology trends through developer profiles
- Research competitor team structures and technical expertise
- Track developer career progression and project involvement
Lead Generation & Business Development
- Extract contact information for developer outreach
- Build databases of potential customers in tech sectors
- Identify decision-makers in technology companies
- Enrich existing contact databases with GitHub profiles
Community Building & Networking
- Find developers with specific skills or interests
- Build communities around particular technologies
- Identify potential collaborators for open source projects
- Research conference speakers and industry experts
Output & Export Options
Dataset Storage
- All extracted data stored in Apify dataset
- Each profile becomes one dataset item
- Status tracking for successful and failed extractions
Export Formats
- JSON: Raw structured data for API integration
- CSV: Spreadsheet-compatible format for analysis
- Excel: Formatted spreadsheet with profile data
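If you download the JSON export and want a custom spreadsheet layout, the nested fields need flattening first. The following sketch writes the scalar fields as columns and keeps pinned repositories as a JSON string in one column. The column choice and the `items_to_csv` name are assumptions for illustration; the platform's own CSV export may flatten nested data differently.

```python
import csv
import io
import json
from typing import Iterable

# Scalar fields taken from the sample output; adjust to your needs.
SCALAR_FIELDS = ["username", "status", "name", "location", "email",
                 "followers", "following", "repos_count"]


def items_to_csv(items: Iterable[dict]) -> str:
    """Serialize dataset items to CSV text with one row per profile."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=SCALAR_FIELDS + ["pinnedrepos_json"],
        lineterminator="\n",
    )
    writer.writeheader()
    for item in items:
        row = {field: item.get(field) for field in SCALAR_FIELDS}
        # Nested lists do not fit a flat table, so store them as JSON text.
        row["pinnedrepos_json"] = json.dumps(item.get("pinnedrepos") or [])
        writer.writerow(row)  # None values are written as empty cells
    return buffer.getvalue()


csv_text = items_to_csv([
    {"username": "jane-coder", "status": "success", "email": None,
     "followers": "3456", "pinnedrepos": [{"name": "react-toolkit"}]},
])
print(csv_text)
```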
Data Processing
- Clean, validated usernames
- Structured error reporting
- Comprehensive logging for troubleshooting
Quick Start Guide
- Configure Input:
  - Add GitHub usernames to the `usernames` array
  - Set desired `max_threads` (recommended: 5-10)
  - Enable proxy configuration for reliable scraping
- Run the Actor:
  - Execute through Apify Console or API
  - Monitor progress through real-time logs
  - Review extracted data in the dataset
- Export Results:
  - Download data in your preferred format
  - Integrate with your existing tools and workflows
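For the API route, the steps above can be sketched with the `apify-client` Python package (`pip install apify-client`). The Actor ID below is a placeholder, since this page does not state it, and `build_run_input` and `run_scraper` are hypothetical helpers. The input keys follow the input schema shown earlier.

```python
from typing import Iterable, List

# Placeholder: replace with the Actor's real ID from its Apify Store page.
ACTOR_ID = "<username>/<actor-name>"


def build_run_input(usernames: Iterable[str], max_threads: int = 5,
                    use_proxy: bool = True) -> dict:
    """Assemble the Actor input described in the Input Configuration section."""
    if not 1 <= max_threads <= 20:
        raise ValueError("max_threads must be between 1 and 20")
    run_input = {"usernames": list(usernames), "max_threads": max_threads}
    if use_proxy:
        run_input["proxy_configuration"] = {"useApifyProxy": True}
    return run_input


def run_scraper(usernames: Iterable[str], token: str,
                actor_id: str = ACTOR_ID) -> List[dict]:
    """Start the Actor, wait for it to finish, and return the dataset items."""
    # Imported here so the rest of the module works without the package.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=build_run_input(usernames))
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage (needs a valid API token and the real Actor ID):
#   items = run_scraper(["johndeveloper", "jane-coder"], token="<APIFY_TOKEN>")
print(build_run_input(["johndeveloper", "jane-coder"], max_threads=5))
```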
Privacy & Compliance
- Public Data Only: Extracts only publicly visible profile information
- Respects Privacy Settings: Email extraction only works for publicly visible emails
- Rate Limiting: Built-in delays and proxy support to respect GitHub's terms
- Error Handling: Graceful handling of private or restricted profiles
Technical Details
Built With
- Python & BeautifulSoup: Efficient HTML parsing and data extraction
- Apify SDK: Robust actor framework with built-in storage and proxy support
- Multi-threading: Concurrent processing for improved performance
- Request Handling: Smart retry mechanisms and error recovery
Performance
- Process hundreds of profiles per run
- Configurable concurrency for optimal speed
- Proxy rotation for reliable access
- Comprehensive error logging and recovery
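The combination of configurable concurrency and retries described above can be pictured with a thread pool and exponential backoff. This is a generic sketch of that pattern, not the Actor's implementation; `fetch` stands in for whatever function downloads and parses one profile, and the retry count and delays are arbitrary example values.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def with_retries(fetch: Callable[[str], dict], username: str,
                 attempts: int = 3, base_delay: float = 0.5) -> dict:
    """Call fetch(username), retrying with exponential backoff on failure."""
    for attempt in range(attempts):
        try:
            return fetch(username)
        except Exception as exc:
            if attempt == attempts - 1:
                # Out of retries: report a structured error item instead.
                return {"username": username, "status": "error", "message": str(exc)}
            time.sleep(base_delay * 2 ** attempt)


def scrape_all(usernames: Iterable[str], fetch: Callable[[str], dict],
               max_threads: int = 5, **retry_options) -> List[dict]:
    """Process usernames concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return list(pool.map(
            lambda name: with_retries(fetch, name, **retry_options), usernames))


# Stand-in fetcher for demonstration: "flaky" fails once, "broken" always fails.
calls = {}

def fake_fetch(username: str) -> dict:
    calls[username] = calls.get(username, 0) + 1
    if username == "flaky" and calls[username] < 2:
        raise ConnectionError("temporary")
    if username == "broken":
        raise ConnectionError("down")
    return {"username": username, "status": "success"}


results = scrape_all(["ok", "flaky", "broken"], fake_fetch,
                     max_threads=3, attempts=2, base_delay=0)
print([r["status"] for r in results])
```

Raising `max_threads` shortens the run but sends more requests at once, which is the rate-limiting trade-off noted in the input parameters.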
Example Results
Successful Profile Extraction
{
  "username": "jane-coder",
  "status": "success",
  "name": "Jane Smith",
  "bio": "Frontend developer specializing in React and TypeScript. Open source enthusiast.",
  "location": "Austin, TX",
  "email": null,
  "website": "https://jane-codes.dev",
  "followers": "3456",
  "following": "234",
  "repos_count": "87",
  "pinnedrepos": [
    {
      "name": "react-toolkit",
      "desc": "Comprehensive React development toolkit",
      "stars": "8500",
      "lang": "TypeScript"
    }
  ]
}
Tips for Best Results
- Enable Proxies: Use Apify proxy configuration for reliable large-scale scraping
- Username Format: Ensure usernames follow GitHub's format rules:
  - Only alphanumeric characters and hyphens allowed
  - Cannot start or end with a hyphen
  - No consecutive hyphens (e.g., `user--name` is invalid)
  - Maximum 39 characters
  - Invalid usernames will be skipped with warnings
- Monitor Rate Limits: Use appropriate thread counts to avoid GitHub rate limiting
- Handle Private Profiles: Some data may not be available for users with privacy settings
- Email Availability: Email extraction only works for publicly visible emails (most users keep emails private)
Support & Feedback
For questions, feature requests, or technical support:
- Visit the Apify Community Forum
- Contact us through the Apify platform
- Submit issues for improvements and bug reports
Explore More Actors
Need more scraping solutions? Discover additional actors on Apify for comprehensive web automation and data extraction.
For inquiries or custom development, reach out at apify@vulnv.com.