Medium Scraper

2clouds/medium-scraper

A robust asynchronous scraper for Medium profiles and articles, built with Python. It combines a base scraper for finding Medium users and their articles with a detailed profile scraper for gathering comprehensive user information. Author: Afnan Khan (GitHub: 2Cloud-S, LinkedIn: afnankhan-ak).

Developer
Maintained by Community

Medium Profile Scraper

A robust asynchronous scraper for Medium profiles and articles, built with Python. This project consists of two main components:

  1. A base scraper for finding Medium users and their articles
  2. A detailed profile scraper for gathering comprehensive user information

Author

Afnan Khan

Features

Base Scraper (medium.py)

  • Asynchronous scraping of Medium profiles
  • Topic-based user discovery
  • Premium content detection
  • Website and email extraction
  • Progress tracking and resumable scraping
  • CSV export with duplicate prevention
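
The resumable scraping and duplicate prevention listed above could work roughly as follows. This is a minimal sketch under stated assumptions, not the code in medium.py: the helper names `load_seen` and `append_new` are hypothetical, and `username` is assumed to be the deduplication key.

```python
import csv
from pathlib import Path


def load_seen(path):
    """Return the usernames already saved, so a restarted run can skip them."""
    path = Path(path)
    if not path.exists():
        return set()
    with path.open(newline="", encoding="utf-8") as f:
        return {row["username"] for row in csv.DictReader(f)}


def append_new(path, rows, fieldnames):
    """Append only rows whose username is not yet in the CSV; return how many were written."""
    path = Path(path)
    seen = load_seen(path)
    # Write the header only when the file is new or empty.
    write_header = not path.exists() or path.stat().st_size == 0
    path.parent.mkdir(parents=True, exist_ok=True)
    written = 0
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        if write_header:
            writer.writeheader()
        for row in rows:
            if row["username"] in seen:
                continue  # duplicate, either from an earlier run or this batch
            writer.writerow(row)
            seen.add(row["username"])
            written += 1
    return written
```

Because the file is opened in append mode and re-read on each call, an interrupted run can be restarted and will pick up where the CSV left off.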

Profile Scraper (profile_scraper.py)

  • Detailed profile information extraction
  • Bio and social links
  • Article statistics and history
  • User interests and topics
  • Batch processing with rate limiting
  • Structured data export

Installation

  1. Clone the repository:
git clone https://github.com/2Cloud-S/medium-scraper.git
cd medium-scraper
  2. Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt

Usage

1. Base Scraper

Run the base scraper to collect Medium users:

python medium.py

This will:

  • Scrape users from specified topics
  • Save progress to data/medium_users_progress.csv
  • Export final results to data/medium_users_final.csv

2. Profile Scraper

After collecting users, run the profile scraper:

python profile_scraper.py

This will:

  • Read users from data/medium_users_final.csv
  • Collect detailed profile information
  • Save results to data/medium_profiles_detailed.csv

Data Structure

Base Scraper Output

  • username
  • is_premium
  • has_newsletter
  • email
  • website
  • website_emails
  • follower_count
  • article_count
  • premium_articles
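
One way to model a row of the base scraper output is a small dataclass. This is a sketch only: the class name `MediumUser`, the field types, the defaults, and the `;` separator for `website_emails` are assumptions, since the README lists the column names but not their types.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class MediumUser:
    # Column names follow the list above; types and defaults are assumed.
    username: str
    is_premium: bool = False
    has_newsletter: bool = False
    email: str = ""
    website: str = ""
    website_emails: list = field(default_factory=list)
    follower_count: int = 0
    article_count: int = 0
    premium_articles: int = 0

    def to_csv_row(self):
        """Return a dict suitable for csv.DictWriter, with the email list joined into one cell."""
        row = asdict(self)
        row["website_emails"] = ";".join(self.website_emails)
        return row
```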

Profile Scraper Output

  • username
  • bio
  • total_claps
  • total_responses
  • following_count
  • top_writer_in
  • member_since
  • last_active
  • social_links
  • interests
  • latest_articles
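
Several of these fields (for example `social_links`, `interests`, and `latest_articles`) are likely to hold lists or mappings, which do not fit into a single CSV cell directly. A common approach, shown here as a hedged sketch rather than what profile_scraper.py necessarily does, is to JSON-encode them on export. The helper names and the choice of which fields count as nested are assumptions.

```python
import json

# Assumed to be the fields that hold lists or dicts rather than plain values.
NESTED_FIELDS = ("top_writer_in", "social_links", "interests", "latest_articles")


def flatten_profile(profile):
    """Return a copy of the profile with nested fields JSON-encoded for a CSV cell."""
    row = dict(profile)
    for key in NESTED_FIELDS:
        if key in row:
            row[key] = json.dumps(row[key], ensure_ascii=False)
    return row


def restore_profile(row):
    """Reverse flatten_profile after reading a row back from the CSV."""
    out = dict(row)
    for key in NESTED_FIELDS:
        if out.get(key):
            out[key] = json.loads(out[key])
    return out
```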

Configuration

Modify the following in medium.py:

  • topics: List of topics to scrape
  • headers: Update cookies for authenticated requests
  • search_paths: Customize URL patterns
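
As an illustration of how these three settings might fit together, the sketch below expands every topic against every URL pattern. The values are placeholders, not the ones shipped in medium.py: the topic names, the `{topic}` placeholder syntax, the example URL patterns, and the `build_urls` helper are all assumptions.

```python
# Hypothetical values; replace with the topics you want to scrape.
topics = ["programming", "data-science"]

# Paste the cookie string from a logged-in browser session if you need
# authenticated requests. The value below is a placeholder.
headers = {
    "User-Agent": "Mozilla/5.0 (compatible; example)",
    "Cookie": "PASTE_YOUR_COOKIE_STRING_HERE",
}

# Assumed URL patterns with a {topic} placeholder.
search_paths = [
    "https://medium.com/tag/{topic}",
    "https://medium.com/tag/{topic}/latest",
]


def build_urls(topics, search_paths):
    """Expand every topic against every URL pattern."""
    return [path.format(topic=topic) for topic in topics for path in search_paths]
```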

Rate Limiting

The scrapers implement:

  • Random delays between requests
  • Batch processing
  • Error handling with retries
  • Session management
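
The first three items can be sketched with plain asyncio. This is an illustration under assumptions, not the scrapers' actual code: the function names, the default batch size, the delay range, and the linear backoff are all hypothetical, and session management is left out to keep the example free of third-party dependencies.

```python
import asyncio
import random


async def with_retries(make_call, retries=3, base_delay=1.0):
    """Await make_call(), retrying on failure with a growing, jittered delay."""
    for attempt in range(1, retries + 1):
        try:
            return await make_call()
        except Exception:
            if attempt == retries:
                raise  # out of attempts, surface the last error
            await asyncio.sleep(base_delay * attempt + random.uniform(0, base_delay))


async def process_in_batches(items, worker, batch_size=5, delay_range=(1.0, 3.0)):
    """Run worker(item) concurrently within each batch, pausing randomly between batches."""
    results = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        results.extend(await asyncio.gather(
            *(with_retries(lambda item=item: worker(item)) for item in batch)
        ))
        # Sleep only if another batch follows.
        if start + batch_size < len(items):
            await asyncio.sleep(random.uniform(*delay_range))
    return results
```

Here `worker` would be a coroutine function that fetches and parses one profile. Keeping batches small and delays randomized spreads requests out instead of sending them in a burst.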

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Built with aiohttp for async operations
  • Uses BeautifulSoup4 for HTML parsing
  • Follows common web-scraping practices: random delays between requests, batch processing, and retries on errors

Disclaimer

This tool is for educational purposes only. Be sure to comply with Medium's terms of service and implement appropriate delays between requests.