Job Listings Aggregator Pro

Pricing: $10.00 / 1,000 results

Developed by Gideon Nesh · Maintained by Community

Job Listings Aggregator – Find Jobs Fast! Search 8+ top job boards (LinkedIn, Indeed, RemoteOK, Dice, and more) in one click. Get Python, tech, and remote roles with smart deduplication, keyword filters, and instant results. Supercharge your job hunt with this powerful, all-in-one Python scraper!


Job Listings Aggregator

A powerful Python application that crawls and aggregates job listings from multiple public job boards, providing a unified interface to search, filter, and manage job opportunities.

Features

Core Features

  • Multi-source scraping: Fetch jobs from RemoteOK and We Work Remotely, with an architecture that is easily extensible to other job boards
  • Intelligent data processing: Automatic normalization and deduplication of job listings
  • Flexible storage: Support for both JSON files and SQLite database storage
  • Advanced filtering: Filter by keywords, location, job type, company, and more
  • Export capabilities: Export results to CSV or JSON formats
  • CLI interface: Easy-to-use command-line interface

Advanced Features

  • Automated scheduling: Set up daily automated scraping
  • Data normalization: Consistent formatting across different job sources
  • Duplicate detection: Smart deduplication based on job similarity (see the sketch after this list)
  • Comprehensive logging: Detailed logging for monitoring and debugging
  • Modular architecture: Easy to extend with new job boards
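
As an illustration of the similarity-based approach, here is a minimal sketch of duplicate detection; the project's actual utils/deduplicator.py may use different heuristics:

from difflib import SequenceMatcher

def is_duplicate(job_a, job_b, threshold=0.9):
    # Treat two listings as duplicates when their normalized
    # title-plus-company strings are nearly identical.
    key_a = f"{job_a.title} {job_a.company}".lower().strip()
    key_b = f"{job_b.title} {job_b.company}".lower().strip()
    return SequenceMatcher(None, key_a, key_b).ratio() >= threshold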

Installation

  1. Clone the repository:
git clone <repository-url>
cd job-listings-aggregator
  2. Install dependencies:
pip install -r requirements.txt
  3. Install Playwright browsers (optional, for JS-heavy sites):
playwright install

Configuration

The application can be configured by modifying config.py:

# Storage settings
STORAGE_TYPE = 'json'  # or 'sqlite'

# Scraping settings
MAX_JOBS_PER_BOARD = 100
REQUEST_DELAY = 1  # seconds between requests

# Scheduler settings
SCHEDULE_TIME = '09:00'  # daily scraping time
ENABLE_SCHEDULER = True

Usage

Command Line Interface

1. Scrape Jobs

# Scrape all jobs from enabled job boards
python main.py scrape

# Scrape with specific keywords
python main.py scrape -k python -k developer -k remote

# Limit jobs per board
python main.py scrape -m 50

2. Search Jobs

# Search by keyword
python main.py search -k python

# Search for remote jobs
python main.py search --remote

# Search by location
python main.py search -l "San Francisco"

# Complex search with multiple filters
python main.py search -k "data scientist" -l remote --company google

# Export search results
python main.py search -k python --export results.csv
python main.py search --remote --export jobs.json

3. View Statistics

# Show comprehensive statistics
python main.py stats

4. Manage Data

# Clear all saved jobs
python main.py clear

5. Scheduler Management

# Start automated daily scraping
python main.py schedule --start

# Stop the scheduler
python main.py schedule --stop

# Check scheduler status
python main.py schedule --status

# Run scraping immediately
python main.py schedule --run-now
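
For context, daily scraping at SCHEDULE_TIME can be wired up with the third-party schedule package; this is a minimal sketch, not necessarily how the project's scheduler/job_scheduler.py is structured:

import time
import schedule  # third-party: pip install schedule

from config import SCHEDULE_TIME
from main import JobAggregator

def run_daily_scrape():
    app = JobAggregator()
    jobs = app.scrape_all_jobs()
    app.save_jobs(app.process_jobs(jobs))

# Run every day at the configured time, e.g. '09:00'.
schedule.every().day.at(SCHEDULE_TIME).do(run_daily_scrape)

while True:
    schedule.run_pending()
    time.sleep(60)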

Python API

You can also use the application programmatically:

from main import JobAggregator

# Initialize the aggregator
app = JobAggregator()

# Scrape jobs with keywords
jobs = app.scrape_all_jobs(keywords=['python', 'remote'])

# Process and save jobs
processed_jobs = app.process_jobs(jobs)
app.save_jobs(processed_jobs)

# Search saved jobs
remote_jobs = app.search_jobs(remote_only=True, keyword='developer')

# Print job details
for job in remote_jobs[:5]:
    print(f"{job.title} at {job.company}")
    print(f"Location: {job.location}")
    print(f"Link: {job.application_link}")
    print("---")

Architecture

Project Structure

job-listings-aggregator/
├── main.py                      # CLI interface and main application
├── config.py                    # Configuration settings
├── requirements.txt             # Python dependencies
├── models/
│   ├── __init__.py
│   └── job_listing.py           # Job data model
├── scrapers/
│   ├── __init__.py
│   ├── base_scraper.py          # Abstract scraper base class
│   ├── remoteok_scraper.py
│   └── weworkremotely_scraper.py
├── storage/
│   ├── __init__.py
│   ├── base_storage.py          # Abstract storage interface
│   ├── json_storage.py          # JSON file storage
│   └── sqlite_storage.py        # SQLite database storage
├── utils/
│   ├── __init__.py
│   ├── normalizer.py            # Data normalization utilities
│   └── deduplicator.py          # Duplicate detection
├── filters/
│   ├── __init__.py
│   └── job_filters.py           # Advanced filtering
├── scheduler/
│   ├── __init__.py
│   └── job_scheduler.py         # Automated scheduling
└── data/                        # Generated data directory
    ├── job_listings.json
    └── job_listings.db

Data Model

Each job listing contains the following fields (a dataclass sketch follows the list):

  • title: Job title
  • company: Company name
  • location: Job location
  • job_type: Employment type (Full-time, Remote, etc.)
  • date_posted: When the job was posted
  • description: Job description snippet
  • application_link: URL to apply
  • source: Which job board it came from
  • id: Unique identifier
  • scraped_at: When it was scraped
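
A minimal dataclass capturing these fields might look as follows; the actual models/job_listing.py may differ in field types and defaults:

from dataclasses import dataclass

@dataclass
class JobListing:
    id: str                 # unique identifier
    title: str
    company: str
    location: str
    job_type: str           # e.g. 'Full-time', 'Remote'
    date_posted: str        # when the job was posted
    description: str        # job description snippet
    application_link: str   # URL to apply
    source: str             # originating job board
    scraped_at: str         # when it was scraped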

Adding New Job Boards

To add a new job board, create a new scraper class:

# scrapers/newboard_scraper.py
from .base_scraper import BaseScraper
from models.job_listing import JobListing

class NewBoardScraper(BaseScraper):
    def __init__(self):
        super().__init__('newboard')

    def scrape_jobs(self, keywords=None, max_jobs=None):
        # Implement scraping logic
        jobs = []
        # ... scraping code ...
        return jobs

    def parse_job_element(self, job_element):
        # Implement parsing logic for individual job elements
        # and return a JobListing object
        pass

Then update config.py and the main application to include the new scraper.
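
The exact registration mechanism depends on how main.py discovers scrapers; one plausible pattern is a registry list in config.py (the ENABLED_SCRAPERS name and the existing scraper class names here are assumptions):

# config.py (hypothetical registry; adapt to the actual wiring)
from scrapers.remoteok_scraper import RemoteOKScraper
from scrapers.weworkremotely_scraper import WeWorkRemotelyScraper
from scrapers.newboard_scraper import NewBoardScraper

ENABLED_SCRAPERS = [
    RemoteOKScraper,
    WeWorkRemotelyScraper,
    NewBoardScraper,  # newly added board
]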

Filtering Options

The application supports various filtering criteria (a usage sketch follows the list):

  • keyword: Match in title, description, or job type
  • keywords: Multiple keywords (match any or all)
  • location: Match job location
  • remote_only: Filter for remote jobs only
  • source: Filter by job board source
  • company: Filter by company name
  • job_type: Filter by employment type
  • date_range: Jobs posted within N days
  • exclude_keywords: Exclude jobs with certain keywords
  • min_description_length: Minimum description length
  • custom_filter: Apply custom filter function
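
For example, several criteria can be combined in one programmatic search. This sketch assumes search_jobs accepts the criteria above as keyword arguments, mirroring the CLI flags; the exact signature may differ:

from main import JobAggregator

app = JobAggregator()

# exclude_keywords and custom_filter are assumed keyword arguments
# based on the filter list above.
senior_remote = app.search_jobs(
    keyword='python',
    remote_only=True,
    exclude_keywords=['junior', 'intern'],
    custom_filter=lambda job: 'senior' in job.title.lower(),
)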

Storage Options

JSON Storage (Default)

  • Human-readable format
  • Easy to inspect and edit
  • Good for smaller datasets
  • Portable across systems

SQLite Storage

  • Better performance for large datasets
  • SQL query capabilities
  • ACID compliance
  • Indexing for faster searches

Switch between storage types by setting STORAGE_TYPE in config.py.
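
With SQLite storage you can also query the database directly. A sketch, assuming a job_listings table whose columns match the data model above (the actual schema in storage/sqlite_storage.py may differ):

import sqlite3

conn = sqlite3.connect('data/job_listings.db')
# Table and column names are assumptions based on the data model.
rows = conn.execute(
    "SELECT title, company, application_link FROM job_listings "
    "WHERE location LIKE ? ORDER BY date_posted DESC",
    ('%Remote%',),
).fetchall()
for title, company, link in rows:
    print(f"{title} at {company}: {link}")
conn.close()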

Logging

The application provides comprehensive logging (a configuration sketch follows the list):

  • Console output for user feedback
  • File logging to job_aggregator.log
  • Different log levels for components
  • Structured logging for monitoring
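
Console plus file logging of this kind is typically wired with the standard library; this is a sketch of the pattern, not necessarily the project's exact setup:

import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s %(name)s %(levelname)s: %(message)s',
    handlers=[
        logging.StreamHandler(),                    # console output
        logging.FileHandler('job_aggregator.log'),  # file logging
    ],
)

logger = logging.getLogger('job_aggregator')
logger.info('Scrape started')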

Best Practices

Respectful Scraping

  • Built-in delays between requests
  • Retry logic with exponential backoff (sketched after this list)
  • User-Agent headers
  • Respect robots.txt (implement if needed)
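
Retry with exponential backoff usually follows this shape; a sketch built on requests, which the project's base_scraper.py may implement differently:

import time
import requests

def fetch_with_retries(url, retries=3, base_delay=1.0):
    # GET a URL, doubling the wait after each failed attempt.
    headers = {'User-Agent': 'job-listings-aggregator/1.0'}  # example UA
    for attempt in range(retries):
        try:
            response = requests.get(url, headers=headers, timeout=10)
            response.raise_for_status()
            return response
        except requests.RequestException:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))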

Data Quality

  • Automatic data normalization
  • Duplicate detection and removal
  • Input validation and cleaning
  • Error handling and logging

Performance

  • Configurable limits on jobs per board
  • Efficient storage options
  • Background scheduling
  • Memory-conscious processing (see the generator sketch after this list)
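
Memory-conscious processing generally means streaming listings rather than materializing every board's results at once; a generator sketch:

def iter_jobs(scrapers, keywords=None):
    # Yield listings one at a time so a large scrape never holds
    # all boards' results in memory simultaneously.
    for scraper in scrapers:
        for job in scraper.scrape_jobs(keywords=keywords):
            yield job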

Troubleshooting

Common Issues

  1. No jobs found: Check if job boards have changed their HTML structure
  2. Connection errors: Verify internet connection and site availability
  3. Import errors: Ensure all dependencies are installed
  4. Permission errors: Check file/directory permissions for data storage

Debug Mode

Run with verbose logging:

python main.py -v scrape

Logs

Check the log file for detailed error information:

tail -f job_aggregator.log

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add new scrapers, filters, or storage backends
  4. Include tests for new functionality
  5. Submit a pull request

License

This project is open source. Please use responsibly and respect job board terms of service.

Disclaimer

This tool is for educational and personal use. Always check and comply with the terms of service of the job boards you're scraping. Be respectful with request rates and consider using official APIs when available.