All notable changes to the Infojobs Scraper Actor will be documented in this file.
The format is based on Keep a Changelog,
and this project adheres to Semantic Versioning.
API-based scraping: Replaced Playwright browser automation with direct HTTP requests to the InfoJobs API
Performance: ~10x faster scraping (no browser overhead)
Pagination logic: Now stops when the API returns 0 results (no more jobs) instead of relying on numPages
Location filter: Now uses the location parameter with Spanish province names, which are mapped to the provinceIds API parameter
Work model filter: Maps to the teleworkingIds API parameter with numeric IDs (1=onsite, 2=remote, 3=hybrid)
Optional parameters: Empty strings ("") are treated as undefined, so filters are only added when provided
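The filter handling described above can be sketched roughly as follows. This is a minimal illustration, not the actor's actual code: the `buildSearchParams` and `lookup` helpers, the `keyword` parameter name, the use of `Map`, and the province IDs shown are all assumptions. Only the `provinceIds` and `teleworkingIds` parameter names and the work model IDs come from the entries above.

```typescript
// Work model IDs as listed in this changelog (1=onsite, 2=remote, 3=hybrid).
// Storing them in a Map is an assumption made for this sketch.
const WORK_MODEL_TO_API_ID = new Map<string, number>([
  ["onsite", 1],
  ["remote", 2],
  ["hybrid", 3],
]);

// Hypothetical excerpt: the real IDs live in src/constants/provinces.ts.
const PROVINCE_TO_API_ID = new Map<string, number>([
  ["Madrid", 1],
  ["Barcelona", 2],
]);

interface SearchInput {
  keyword: string;
  location?: string;
  workModel?: string;
}

// Empty strings are treated the same as "not provided".
function optional(value: string | undefined): string | undefined {
  return value === "" ? undefined : value;
}

// Resolve a name to its numeric ID, failing loudly on unknown names.
function lookup(table: Map<string, number>, key: string, label: string): number {
  const id = table.get(key);
  if (id === undefined) {
    throw new Error(`Unknown ${label}: ${key}`);
  }
  return id;
}

// Build the query parameters, adding a filter only when it was provided.
function buildSearchParams(input: SearchInput): Record<string, string> {
  const params: Record<string, string> = { keyword: input.keyword };

  const location = optional(input.location);
  if (location !== undefined) {
    params.provinceIds = String(lookup(PROVINCE_TO_API_ID, location, "province"));
  }

  const workModel = optional(input.workModel);
  if (workModel !== undefined) {
    params.teleworkingIds = String(lookup(WORK_MODEL_TO_API_ID, workModel, "work model"));
  }

  return params;
}
```

Under these assumptions, calling `buildSearchParams({ keyword: "typescript", location: "", workModel: "remote" })` would produce a `teleworkingIds` entry and no `provinceIds` entry.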
Spanish provinces constant: src/constants/provinces.ts with province ID mappings
Work model constant: WORK_MODEL_TO_API_ID mapping for API parameter conversion
Location select dropdown: All 52 Spanish provinces in input schema
Work model select dropdown: remote, hybrid, onsite options in input schema
Better logging: Clear indication of search filters and pagination progress
Playwright dependency: No longer needed for browser automation
Smart scroll loading: Not required with API approach
JobParser class: Logic integrated into InfojobsApiScraper
New runtime: Uses Bun instead of Node.js
Simplified architecture: Single scraper class instead of multiple classes
Faster iteration: Jobs collected at ~20-22 per page with minimal delay
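The pagination behaviour noted in this release (stop when a page comes back empty rather than trusting a page count) might look something like the sketch below. The `collectJobs` function, the `Job` shape, and the injected `fetchPage` callback are hypothetical names used for illustration; the real scraper's request code is not shown in this changelog.

```typescript
interface Job {
  id: string;
}

// Hypothetical page fetcher: returns the jobs found on a 1-based page number.
type FetchPage = (page: number) => Promise<Job[]>;

// Collect up to jobsNumber jobs, stopping early when a page returns 0 results.
async function collectJobs(fetchPage: FetchPage, jobsNumber: number): Promise<Job[]> {
  const jobs: Job[] = [];
  for (let page = 1; jobs.length < jobsNumber; page++) {
    const results = await fetchPage(page);
    if (results.length === 0) {
      break; // no more jobs available
    }
    jobs.push(...results);
  }
  // The last page may overshoot the requested amount, so trim the result.
  return jobs.slice(0, jobsNumber);
}
```

Passing the fetcher in as a parameter keeps the loop independent of any particular HTTP client, which also makes the stop condition easy to exercise with a fake.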
Input validation with Zod schemas: All input parameters are now validated using Zod for type safety and automatic error handling
Output validation: All scraped jobs are validated before being pushed to the dataset
Job schema validation: Jobs are validated against jobSchema to ensure data integrity
JobParser class: Extracted parsing logic into a dedicated class following Single Responsibility Principle
Technology extraction: Automatic detection of 400+ technologies including programming languages, frameworks, cloud platforms, and tools
Graceful abort handling: Actor handles the aborting event for clean shutdown
Smart scroll loading: Intelligent scroll mechanism that detects when new content stops loading
Input schema documentation: Complete input schema with descriptions and defaults
Dataset schema: Organized dataset output schema with table view
Default jobsNumber: Changed from 50 to 100, then to 200
Minimum jobsNumber: Set to 20
Removed maximum jobsNumber: No longer limited to 400
Scroll strategy: Improved from fixed 5 scrolls to smart scrolling with content detection
Link extraction limit: Increased from 25 to 50 job links per page
Refactored InfojobsScraper: Uses JobParser for cleaner separation of concerns
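As an illustration of the scroll strategy change, one way to replace a fixed number of scrolls with content detection is sketched below. It is an assumption-laden example rather than the scraper's code: `scrollUntilStable`, the injected `scrollOnce` and `countItems` callbacks, and the `maxIdleRounds` and `maxScrolls` limits are all hypothetical.

```typescript
// Keep scrolling until the item count stops growing for maxIdleRounds
// consecutive scrolls, or until maxScrolls is reached as a safety cap.
async function scrollUntilStable(
  scrollOnce: () => Promise<void>,
  countItems: () => Promise<number>,
  maxIdleRounds = 2,
  maxScrolls = 50,
): Promise<number> {
  let lastCount = await countItems();
  let idleRounds = 0;

  for (let i = 0; i < maxScrolls && idleRounds < maxIdleRounds; i++) {
    await scrollOnce();
    const count = await countItems();
    if (count > lastCount) {
      lastCount = count; // new content loaded, reset the idle counter
      idleRounds = 0;
    } else {
      idleRounds++; // nothing new appeared after this scroll
    }
  }

  return lastCount;
}
```

In a browser-based scraper the two callbacks would wrap the page's scroll action and a count of job links; they are left abstract here so the loop does not depend on a specific automation API.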
Hardcoded pagination: buildSearchUrls now correctly uses the jobsNumber parameter
Type safety: All internal types now properly exported and used
ESLint compliance: Fixed all linting errors including unused variables and import sorting
Dependencies: Added zod for schema validation
Project structure: Organized into classes, schemas, constants, and utils directories
Generated by: Updated to track actor generation source
[0.0.1] - Initial Release
Basic job scraping from InfoJobs
Keyword-based search
Location filtering (Spanish provinces)
Work model filtering (remote, hybrid, onsite)
Company information extraction
Salary data extraction
Posted date parsing
Playwright-based browser automation