Job Listings Aggregator Pro
Pricing
$10.00 / 1,000 results
Job Listings Aggregator – Find Jobs Fast! Search 8+ top job boards (LinkedIn, Indeed, RemoteOK, Dice, more) in one click. Get Python, tech & remote roles with smart deduplication, keyword filters & instant results. Supercharge your job hunt with this powerful, all-in-one Python scraper!
Job Listings Aggregator
A powerful Python application that crawls and aggregates job listings from multiple public job boards, providing a unified interface to search, filter, and manage job opportunities.
Features
Core Features
- Multi-source scraping: Fetch jobs from RemoteOK and We Work Remotely, with an architecture that is easily extensible to other job boards
- Intelligent data processing: Automatic normalization and deduplication of job listings
- Flexible storage: Support for both JSON files and SQLite database storage
- Advanced filtering: Filter by keywords, location, job type, company, and more
- Export capabilities: Export results to CSV or JSON formats
- CLI interface: Easy-to-use command-line interface
Advanced Features
- Automated scheduling: Set up daily automated scraping
- Data normalization: Consistent formatting across different job sources
- Duplicate detection: Smart deduplication based on job similarity
- Comprehensive logging: Detailed logging for monitoring and debugging
- Modular architecture: Easy to extend with new job boards
Installation
- Clone the repository:
git clone <repository-url>
cd job-listings-aggregator
- Install dependencies:
$pip install -r requirements.txt
- Install Playwright browsers (optional, for JS-heavy sites):
$playwright install
Configuration
The application can be configured by modifying config.py:
# Storage settings
STORAGE_TYPE = 'json'  # or 'sqlite'

# Scraping settings
MAX_JOBS_PER_BOARD = 100
REQUEST_DELAY = 1  # seconds between requests

# Scheduler settings
SCHEDULE_TIME = '09:00'  # Daily scraping time
ENABLE_SCHEDULER = True
Usage
Command Line Interface
1. Scrape Jobs
# Scrape all jobs from enabled job boards
python main.py scrape

# Scrape with specific keywords
python main.py scrape -k python -k developer -k remote

# Limit jobs per board
python main.py scrape -m 50
2. Search Jobs
# Search by keyword
python main.py search -k python

# Search for remote jobs
python main.py search --remote

# Search by location
python main.py search -l "San Francisco"

# Complex search with multiple filters
python main.py search -k "data scientist" -l remote --company google

# Export search results
python main.py search -k python --export results.csv
python main.py search --remote --export jobs.json
3. View Statistics
# Show comprehensive statistics
python main.py stats
4. Manage Data
# Clear all saved jobs
python main.py clear
5. Scheduler Management
# Start automated daily scraping
python main.py schedule --start

# Stop the scheduler
python main.py schedule --stop

# Check scheduler status
python main.py schedule --status

# Run scraping immediately
python main.py schedule --run-now
Python API
You can also use the application programmatically:
from main import JobAggregator

# Initialize the aggregator
app = JobAggregator()

# Scrape jobs with keywords
jobs = app.scrape_all_jobs(keywords=['python', 'remote'])

# Process and save jobs
processed_jobs = app.process_jobs(jobs)
app.save_jobs(processed_jobs)

# Search saved jobs
remote_jobs = app.search_jobs(remote_only=True, keyword='developer')

# Print job details
for job in remote_jobs[:5]:
    print(f"{job.title} at {job.company}")
    print(f"Location: {job.location}")
    print(f"Link: {job.application_link}")
    print("---")
Architecture
Project Structure
job-listings-aggregator/
├── main.py                       # CLI interface and main application
├── config.py                     # Configuration settings
├── requirements.txt              # Python dependencies
├── models/
│   ├── __init__.py
│   └── job_listing.py            # Job data model
├── scrapers/
│   ├── __init__.py
│   ├── base_scraper.py           # Abstract scraper base class
│   ├── remoteok_scraper.py
│   └── weworkremotely_scraper.py
├── storage/
│   ├── __init__.py
│   ├── base_storage.py           # Abstract storage interface
│   ├── json_storage.py           # JSON file storage
│   └── sqlite_storage.py         # SQLite database storage
├── utils/
│   ├── __init__.py
│   ├── normalizer.py             # Data normalization utilities
│   └── deduplicator.py           # Duplicate detection
├── filters/
│   ├── __init__.py
│   └── job_filters.py            # Advanced filtering
├── scheduler/
│   ├── __init__.py
│   └── job_scheduler.py          # Automated scheduling
└── data/                         # Generated data directory
    ├── job_listings.json
    └── job_listings.db
Data Model
Each job listing contains:
- title: Job title
- company: Company name
- location: Job location
- job_type: Employment type (Full-time, Remote, etc.)
- date_posted: When the job was posted
- description: Job description snippet
- application_link: URL to apply
- source: Which job board it came from
- id: Unique identifier
- scraped_at: When it was scraped
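As a rough illustration, the model in models/job_listing.py could be expressed as a dataclass like the sketch below. The field names follow the list above; the actual implementation (types, defaults, methods) may differ.

# Illustrative sketch of the job data model; field names match the list above,
# but the real models/job_listing.py may be implemented differently.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class JobListing:
    title: str              # Job title
    company: str             # Company name
    location: str            # Job location
    job_type: str            # Employment type (Full-time, Remote, etc.)
    date_posted: str         # When the job was posted
    description: str         # Job description snippet
    application_link: str    # URL to apply
    source: str              # Which job board it came from
    id: str                  # Unique identifier
    scraped_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )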
Adding New Job Boards
To add a new job board, create a new scraper class:
# scrapers/newboard_scraper.py
from .base_scraper import BaseScraper
from models.job_listing import JobListing

class NewBoardScraper(BaseScraper):
    def __init__(self):
        super().__init__('newboard')

    def scrape_jobs(self, keywords=None, max_jobs=None):
        # Implement scraping logic
        jobs = []
        # ... scraping code ...
        return jobs

    def parse_job_element(self, job_element):
        # Implement parsing logic for individual job elements
        # Return a JobListing object
        pass
Then update config.py and the main application to include the new scraper.
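How the new scraper is wired in depends on the actual contents of config.py and main.py; a purely hypothetical registration might look like the following (JOB_BOARDS and the assembly code are illustrative assumptions, not the project's real names).

# Hypothetical wiring only -- JOB_BOARDS and the assembly logic below are
# illustrative, not the actual config.py / main.py contents.

# config.py
JOB_BOARDS = {
    'remoteok': True,
    'weworkremotely': True,
    'newboard': True,   # enable the new board
}

# main.py (wherever the scrapers are assembled)
from config import JOB_BOARDS
from scrapers.newboard_scraper import NewBoardScraper

scrapers = []
if JOB_BOARDS.get('newboard'):
    scrapers.append(NewBoardScraper())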
Filtering Options
The application supports various filtering criteria:
- keyword: Match in title, description, or job type
- keywords: Multiple keywords (match any or all)
- location: Match job location
- remote_only: Filter for remote jobs only
- source: Filter by job board source
- company: Filter by company name
- job_type: Filter by employment type
- date_range: Jobs posted within N days
- exclude_keywords: Exclude jobs with certain keywords
- min_description_length: Minimum description length
- custom_filter: Apply custom filter function
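These filters can also be combined through the Python API. In the sketch below, keyword and remote_only are taken from the API example above, while the remaining keyword arguments simply mirror the filter names in this list; the exact signature of search_jobs may differ.

from main import JobAggregator

app = JobAggregator()

# Combine several filters in one search; the extra keyword arguments follow
# the filter names listed above and are assumptions about the signature.
jobs = app.search_jobs(
    keyword='python',
    remote_only=True,
    exclude_keywords=['senior', 'lead'],
    date_range=7,  # posted within the last 7 days
    custom_filter=lambda job: 'django' in job.description.lower(),
)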
Storage Options
JSON Storage (Default)
- Human-readable format
- Easy to inspect and edit
- Good for smaller datasets
- Portable across systems
SQLite Storage
- Better performance for large datasets
- SQL query capabilities
- ACID compliance
- Indexing for faster searches
Switch between storage types by setting STORAGE_TYPE in config.py.
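For example, to use SQLite instead of the default JSON storage:

# config.py
STORAGE_TYPE = 'sqlite'  # default is 'json'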
Logging
The application provides comprehensive logging:
- Console output for user feedback
- File logging to job_aggregator.log
- Different log levels for components
- Structured logging for monitoring
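A minimal sketch of how console plus file logging can be set up with the standard library; the project's actual logging configuration may differ.

import logging

# Console handler for user feedback plus a file handler writing to
# job_aggregator.log; this is an illustrative setup, not the project's exact one.
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s %(name)s %(levelname)s: %(message)s',
    handlers=[
        logging.StreamHandler(),
        logging.FileHandler('job_aggregator.log'),
    ],
)

logger = logging.getLogger('job_aggregator')
logger.info('Scrape started')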
Best Practices
Respectful Scraping
- Built-in delays between requests
- Retry logic with exponential backoff
- User-Agent headers
- Respect robots.txt (implement if needed)
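A sketch of what the built-in delay and retry behaviour can look like, using requests and the REQUEST_DELAY setting from config.py; the helper name and exact retry policy are assumptions.

import time
import requests

from config import REQUEST_DELAY  # seconds between requests (see config.py)

HEADERS = {'User-Agent': 'job-listings-aggregator/1.0'}

def polite_get(url, max_retries=3):
    """Hypothetical helper: fetch a URL with a fixed delay and exponential backoff."""
    for attempt in range(max_retries):
        try:
            time.sleep(REQUEST_DELAY)  # be polite between requests
            response = requests.get(url, headers=HEADERS, timeout=30)
            response.raise_for_status()
            return response
        except requests.RequestException:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...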
Data Quality
- Automatic data normalization
- Duplicate detection and removal
- Input validation and cleaning
- Error handling and logging
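As an illustration, duplicate detection can be as simple as keying on the normalized title and company; the actual similarity logic in utils/deduplicator.py may be more sophisticated.

# Illustrative deduplication sketch; the real utils/deduplicator.py may use
# a more elaborate similarity measure.
def deduplicate(jobs):
    seen = set()
    unique = []
    for job in jobs:
        key = (job.title.strip().lower(), job.company.strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append(job)
    return unique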
Performance
- Configurable limits on jobs per board
- Efficient storage options
- Background scheduling
- Memory-conscious processing
Troubleshooting
Common Issues
- No jobs found: Check if job boards have changed their HTML structure
- Connection errors: Verify internet connection and site availability
- Import errors: Ensure all dependencies are installed
- Permission errors: Check file/directory permissions for data storage
Debug Mode
Run with verbose logging:
$python main.py -v scrape
Logs
Check the log file for detailed error information:
$tail -f job_aggregator.log
Contributing
- Fork the repository
- Create a feature branch
- Add new scrapers, filters, or storage backends
- Include tests for new functionality
- Submit a pull request
License
This project is open source. Please use responsibly and respect job board terms of service.
Disclaimer
This tool is for educational and personal use. Always check and comply with the terms of service of the job boards you're scraping. Be respectful with request rates and consider using official APIs when available.