Philosophy: Removed all unused fields to focus exclusively on aggregate salary intelligence, our core differentiator in a saturated job-scraping market.
Removed Fields
company - Not available on aggregate salary pages
experienceLevel - Not exposed by Indeed's salary pages
Improved Hash Generation: Now includes salary values to detect actual compensation changes
Simplified Dataset Views: Reduced from 5 views to 2 essential views (Overview, Salary Ranges)
Updated Documentation: Removed job-posting references and refocused on salary aggregation use cases
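A minimal sketch of a record hash that covers salary values, assuming records are plain dicts. The field names (salaryMin, salaryMax, salaryAverage, sampleSize) and the choice of SHA-256 over canonical JSON are illustrative assumptions; the scraper's actual fields and hash function aren't specified here. Because the salary values are part of the hashed payload, a record whose compensation changed produces a different digest.

```python
import hashlib
import json
from typing import Any, Mapping

# Hypothetical field names: the changelog only says the hash "includes salary values".
HASH_FIELDS = ("jobTitle", "location", "salaryMin", "salaryMax", "salaryAverage", "sampleSize")


def record_hash(record: Mapping[str, Any]) -> str:
    """Return a SHA-256 hex digest over identity and salary fields."""
    # Take fields in a fixed order; missing fields become None (JSON null).
    payload = {field: record.get(field) for field in HASH_FIELDS}
    # sort_keys plus compact separators make the serialization deterministic.
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


if __name__ == "__main__":
    before = {"jobTitle": "Data Analyst", "location": "Austin, TX", "salaryMin": 60000, "salaryMax": 85000}
    after = dict(before, salaryMax=90000)  # only the compensation changed
    print(record_hash(before) != record_hash(after))  # True: the salary change is detected
```

Hashing identity fields alone would give both records the same digest; including the salary fields is what lets a change in pay register as a change in the record.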
Why This Matters
Market Positioning: Doubles down on the salary intelligence niche instead of competing with 100+ job scrapers
Performance: Faster, cleaner output with no unused fields
User Experience: Clear value proposition: "Fast, reliable salary aggregation for compensation benchmarking"
Version 1.0.16 (2025-10-09)
Fixed
Aggressive Deduplication: Changed the deduplication key from jobTitle|company|location|experienceLevel to jobTitle|location to prevent over-deduplication when fields are null (this was causing 103 of 104 records to be marked as duplicates)
NULL Values in Output: Improved scraping selectors to extract salary ranges (min/max) and sample sizes from Indeed's current HTML structure
Dataset Schema: Simplified from 5 views to 2 essential views (Overview, Salary Ranges) by removing irrelevant tabs that used fields the scraper doesn't populate (benefits, remotePolicy, industryAverage)
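A minimal sketch of first-occurrence deduplication on the jobTitle|location key, assuming each record is a plain dict. The helper names dedup_key and deduplicate are hypothetical and not taken from the scraper's code.

```python
from typing import Any, Iterable, List, Mapping

KEY_FIELDS = ("jobTitle", "location")  # key fields as of 1.0.16


def dedup_key(record: Mapping[str, Any]) -> str:
    """Build the jobTitle|location key; null fields become empty strings."""
    parts = []
    for field in KEY_FIELDS:
        value = record.get(field)
        parts.append("" if value is None else str(value))
    return "|".join(parts)


def deduplicate(records: Iterable[Mapping[str, Any]]) -> List[Mapping[str, Any]]:
    """Keep the first record seen for each key and drop later ones."""
    seen: set = set()
    unique: List[Mapping[str, Any]] = []
    for record in records:
        key = dedup_key(record)
        if key in seen:
            continue  # a record with this key was already kept
        seen.add(key)
        unique.append(record)
    return unique


if __name__ == "__main__":
    rows = [
        {"jobTitle": "Nurse", "location": "Boston, MA", "company": None},
        {"jobTitle": "Nurse", "location": "Denver, CO", "company": None},
        {"jobTitle": "Nurse", "location": "Boston, MA", "company": None},
    ]
    print(len(deduplicate(rows)))  # 2
```

One consequence of the shorter key: records missing both jobTitle and location all map to the key "|" and collapse into one, so this approach relies on those two fields being populated.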
Improved
Scraper now extracts complete salary data including min, max, average, and sample size
Better parsing of Indeed's salary display elements with multiple fallback methods
More reliable page loading with the domcontentloaded wait strategy
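A minimal sketch of parsing with fallbacks, assuming the salary and sample-size text has already been pulled out of the page. The display formats handled here ("$50,000 - $70,000", "$60K to $80K", a single amount, "Based on 1,234 salaries") are hypothetical examples, not Indeed's documented markup, and average-salary extraction is not covered.

```python
import re
from typing import Optional, Tuple

# Hypothetical display formats; Indeed's actual markup and wording may differ.
_AMOUNT = r"\$\s*(\d[\d,]*(?:\.\d+)?)([Kk]\b)?"  # e.g. "$50,000" or "$60K"
_RANGE_RE = re.compile(_AMOUNT + r"\s*(?:-|to)\s*" + _AMOUNT)
_SINGLE_RE = re.compile(_AMOUNT)
_SAMPLE_RE = re.compile(r"(\d[\d,]*)\s+salar(?:y|ies)", re.IGNORECASE)


def _to_number(digits: str, k_suffix: Optional[str]) -> float:
    """Strip thousands separators and apply a "K" multiplier if present."""
    value = float(digits.replace(",", ""))
    return value * 1000 if k_suffix else value


def parse_salary_range(text: str) -> Tuple[Optional[float], Optional[float]]:
    """Return (min, max), trying an explicit range first, then a single amount."""
    # Method 1: explicit range such as "$50,000 - $70,000".
    match = _RANGE_RE.search(text)
    if match:
        low = _to_number(match.group(1), match.group(2))
        high = _to_number(match.group(3), match.group(4))
        return (min(low, high), max(low, high))
    # Method 2: a single amount, used for both bounds.
    match = _SINGLE_RE.search(text)
    if match:
        value = _to_number(match.group(1), match.group(2))
        return (value, value)
    # Nothing recognizable: leave both bounds null rather than guessing.
    return (None, None)


def parse_sample_size(text: str) -> Optional[int]:
    """Extract a count such as "1,234 salaries", or None if absent."""
    match = _SAMPLE_RE.search(text)
    return int(match.group(1).replace(",", "")) if match else None


if __name__ == "__main__":
    print(parse_salary_range("$50,000 - $70,000 per year"))  # (50000.0, 70000.0)
    print(parse_salary_range("$65K"))  # (65000.0, 65000.0)
    print(parse_sample_size("Based on 1,234 salaries"))  # 1234
```

Ordering the patterns from most to least specific keeps a range from being misread as its first amount alone; returning None when nothing matches keeps unparseable pages visible as nulls instead of fabricated values.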