Medium Scraper
This Actor may be unreliable while under maintenance.
Medium Profile Scraper
A robust asynchronous scraper for Medium profiles and articles, built with Python. This project consists of two main components:
- A base scraper for finding Medium users and their articles
- A detailed profile scraper for gathering comprehensive user information
Author
Afnan Khan
- GitHub: 2Cloud-S
- LinkedIn: afnankhan-ak
Features
Base Scraper (medium.py)
- Asynchronous scraping of Medium profiles
- Topic-based user discovery
- Premium content detection
- Website and email extraction
- Progress tracking and resumable scraping
- CSV export with duplicate prevention
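The README does not show how website emails are extracted. One common approach is to fetch the HTML of the profile's linked website and pull addresses out with a regular expression. The sketch below is an assumption, not the project's actual code. The function name `extract_emails` and the pattern are hypothetical, and the pattern is a pragmatic approximation rather than full RFC 5322 syntax.

```python
import re

# Hypothetical pattern: a pragmatic approximation of email syntax, not RFC 5322.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html: str) -> list[str]:
    """Return unique email addresses found in `html`, in first-seen order."""
    seen: dict[str, None] = {}
    for match in EMAIL_RE.findall(html):
        # Normalise case so "Me@Example.com" and "me@example.com" count once.
        seen.setdefault(match.lower(), None)
    return list(seen)
```

For example, `extract_emails('<a href="mailto:Hi@Example.com">Hi@Example.com</a> and x@y.org')` yields `['hi@example.com', 'x@y.org']`. The `mailto:` copy and the link text collapse into one entry.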
Profile Scraper (profile_scraper.py)
- Detailed profile information extraction
- Bio and social links
- Article statistics and history
- User interests and topics
- Batch processing with rate limiting
- Structured data export
Installation
- Clone the repository:
git clone https://github.com/2Cloud-S/medium-scraper.git
cd medium-scraper
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
Usage
1. Base Scraper
Run the base scraper to collect Medium users:
python medium.py
This will:
- Scrape users from specified topics
- Save progress to data/medium_users_progress.csv
- Export final results to data/medium_users_final.csv
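The Features list mentions resumable scraping and CSV export with duplicate prevention. Below is a minimal sketch of how a progress file such as data/medium_users_progress.csv could support both. It assumes `username` is the unique key: on start, load the usernames that are already saved, then append only rows that have not been seen. `ProgressWriter` and its methods are hypothetical names, not taken from medium.py.

```python
import csv
import os


class ProgressWriter:
    """Append rows to a CSV progress file, skipping usernames already saved.

    Hypothetical helper: assumes each row is a dict with a "username" key.
    """

    def __init__(self, path: str, fieldnames: list[str]):
        self.path = path
        self.fieldnames = fieldnames
        self.seen: set[str] = set()
        # Resume: collect usernames written by a previous (possibly interrupted) run.
        if os.path.exists(path):
            with open(path, newline="", encoding="utf-8") as f:
                self.seen = {row["username"] for row in csv.DictReader(f)}

    def write(self, row: dict) -> bool:
        """Append `row` unless its username is already present; return True if written."""
        if row["username"] in self.seen:
            return False
        new_file = not os.path.exists(self.path) or os.path.getsize(self.path) == 0
        dir_name = os.path.dirname(self.path)
        if dir_name:
            os.makedirs(dir_name, exist_ok=True)  # e.g. create data/ on first run
        with open(self.path, "a", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=self.fieldnames, extrasaction="ignore")
            if new_file:
                writer.writeheader()  # write the header only once, for a fresh file
            writer.writerow(row)
        self.seen.add(row["username"])
        return True
```

Each write reopens the file in append mode. That costs a little speed, but every saved row is flushed to disk right away, so an interrupted run loses at most the row in progress.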
2. Profile Scraper
After collecting users, run the profile scraper:
python profile_scraper.py
This will:
- Read users from medium_users_final.csv
- Collect detailed profile information
- Save results to data/medium_profiles_detailed.csv
Data Structure
Base Scraper Output
- username
- is_premium
- has_newsletter
- website
- website_emails
- follower_count
- article_count
- premium_articles
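The README lists column names only, so the types in the sketch below are assumptions inferred from those names. It uses booleans for `is_premium` and `has_newsletter` and integer counts for the `*_count` fields. It treats `premium_articles` as a count and flattens `website_emails` from a list into a single CSV cell. The hypothetical `MediumUser` dataclass makes this schema explicit and derives the CSV header from its fields.

```python
from dataclasses import asdict, dataclass, field, fields


@dataclass
class MediumUser:
    """One row of the base scraper output (field types are assumed, not documented)."""

    username: str
    is_premium: bool = False
    has_newsletter: bool = False
    website: str = ""
    website_emails: list[str] = field(default_factory=list)
    follower_count: int = 0
    article_count: int = 0
    premium_articles: int = 0  # assumed to be a count of premium articles

    @classmethod
    def fieldnames(cls) -> list[str]:
        # CSV header in declaration order, matching the column list above.
        return [f.name for f in fields(cls)]

    def to_csv_row(self) -> dict[str, str]:
        row = asdict(self)
        # Flatten the email list into one cell; ";" is an arbitrary separator choice.
        row["website_emails"] = ";".join(self.website_emails)
        return {k: str(v) for k, v in row.items()}
```

Because the header comes from `MediumUser.fieldnames()`, it cannot drift out of sync with the columns a `csv.DictWriter` writes.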
Profile Scraper Output
- username
- bio
- total_claps
- total_responses
- following_count
- top_writer_in
- member_since
- last_active
- social_links
- interests
- latest_articles
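Several profile columns (`top_writer_in`, `social_links`, `interests`, `latest_articles`) hold multiple values, and a flat CSV cell cannot represent them directly. One option is to JSON-encode those cells on write and decode them on read. This is an assumption, not something taken from profile_scraper.py. The choice of JSON columns and the helper names are hypothetical.

```python
import json

# Assumed multi-valued columns; every other value is written as plain text.
JSON_COLUMNS = {"top_writer_in", "social_links", "interests", "latest_articles"}


def encode_profile_row(profile: dict) -> dict[str, str]:
    """Convert a profile dict into CSV-safe strings, JSON-encoding nested values."""
    return {
        key: json.dumps(value, ensure_ascii=False) if key in JSON_COLUMNS else str(value)
        for key, value in profile.items()
    }


def decode_profile_row(row: dict[str, str]) -> dict:
    """Inverse of encode_profile_row for the JSON columns; other values stay strings."""
    return {
        key: json.loads(value) if key in JSON_COLUMNS and value else value
        for key, value in row.items()
    }
```

The `csv` module quotes the commas and double quotes that JSON produces, so encoded cells survive a write/read round trip without any extra escaping.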
Configuration
Modify the following in medium.py:
- topics: List of topics to scrape
- headers: Update cookies for authenticated requests
- search_paths: Customize URL patterns
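The README does not show the exact layout of these settings in medium.py, so the shape sketched below is hypothetical. The topic names, the header values, and the `/tag/{topic}` and `/search?q={topic}` path templates are illustrative assumptions. Replace the cookie placeholder with your own session cookie if you need authenticated requests. `build_search_urls` is a hypothetical helper that shows how path templates could be expanded per topic.

```python
# Hypothetical configuration values; adjust to match what medium.py actually uses.
topics = ["python", "data-science", "machine-learning"]

headers = {
    "User-Agent": "Mozilla/5.0",
    # Placeholder: paste a real cookie string from a logged-in browser session.
    "Cookie": "<your-medium-cookie>",
}

# URL path templates (assumed); "{topic}" is filled in for each topic.
search_paths = ["/tag/{topic}", "/search?q={topic}"]

BASE_URL = "https://medium.com"


def build_search_urls(topic_list: list[str], path_templates: list[str]) -> list[str]:
    """Expand every path template for every topic into an absolute URL."""
    return [
        BASE_URL + template.format(topic=topic)
        for topic in topic_list
        for template in path_templates
    ]
```

With the values above, `build_search_urls(topics, search_paths)` produces six URLs, two per topic.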
Rate Limiting
The scrapers implement:
- Random delays between requests
- Batch processing
- Error handling with retries
- Session management
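The sketch below is a minimal asyncio version of the techniques listed above. It assumes each request is wrapped in a coroutine function. In the real scraper that coroutine would presumably issue an aiohttp request through one shared `aiohttp.ClientSession`, which is where session management comes in. Here a stand-in coroutine keeps the example runnable without network access. `with_retries`, `run_in_batches`, and their parameters are hypothetical names, not the project's API.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(
    call: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Await `call()`, retrying on exceptions with jittered exponential backoff."""
    for attempt in range(attempts):
        try:
            return await call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            # Backoff doubles each attempt; random jitter avoids a fixed request rhythm.
            await asyncio.sleep(base_delay * 2**attempt * random.uniform(0.5, 1.5))
    raise RuntimeError("attempts must be at least 1")


async def run_in_batches(
    items: list[str],
    worker: Callable[[str], Awaitable[T]],
    batch_size: int = 5,
    delay_range: tuple[float, float] = (2.0, 5.0),
    attempts: int = 3,
    base_delay: float = 1.0,
) -> list[T]:
    """Process `items` in concurrent batches, pausing a random interval between batches."""
    results: list[T] = []
    for start in range(0, len(items), batch_size):
        batch = items[start : start + batch_size]
        # The default argument binds each item now, not when the lambda later runs.
        results += await asyncio.gather(
            *(with_retries(lambda item=item: worker(item), attempts, base_delay) for item in batch)
        )
        if start + batch_size < len(items):
            await asyncio.sleep(random.uniform(*delay_range))  # random delay between batches
    return results
```

Sleeping between batches caps how many requests are in flight at once while still spacing out bursts. An `asyncio.Semaphore` combined with a per-request delay is a common alternative when a steadier request rate is preferred.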
Contributing
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Built with aiohttp for async operations
- Uses BeautifulSoup4 for HTML parsing
- Follows common web scraping practices such as randomized delays, batching, and retries
Disclaimer
This tool is for educational purposes only. Be sure to comply with Medium's terms of service and implement appropriate delays between requests.
Actor Metrics
- 1 monthly user
- No stars yet
- >99% runs succeeded
- Created in Feb 2025
- Modified 6 days ago