Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog ,
and this project adheres to Semantic Versioning .
[2.1.0] - 2025-01-14
🆕 Business Intelligence & Pricing Data
New Company Fields
Min project size : Minimum project budget requirement ($USD)
Hourly rate range : From/to hourly rates ($USD/hr)
Employees count : Team size range (e.g., "50-249", "2-9")
Year founded : Company establishment year
Client testimonials : "What Clients Have Said" summary with pricing insights
Enhanced Review Analysis
Project Summary : Detailed project description extracted from reviews
Feedback Summary : Results and outcomes summary
Detailed Ratings : Quality, Schedule, Cost ratings with individual comments
Full Review : Complete review text with improved text cleaning
Technical Improvements
Advanced text cleaning : Removes newlines, tabs, excessive whitespace from all fields
Position-based parsing : More reliable extraction using HTML structure analysis
Robust fallback logic : Multiple extraction strategies for maximum data coverage
Universal parsing : Works across different company types and profiles
🔧 Output Schema Updates
Added new fields to CSV/JSON export
Enhanced Apify Dataset Schema with new pricing and review fields
Updated overview and reviews views in Apify Console UI
Improved field labels and formatting
[2.0.0] - 2025-01-10
🚀 Major Enhancements
New Data Fields
Domain extraction : Real company domains with intelligent parsing
Numeric fields : minProjectSize
, hourlyRateFrom/To
, employeesFrom/To
as numbers
Industries breakdown : Complete industry focus with percentages
Client focus : Client size/type distribution with percentages
Most common project size : Extracted as numeric value
Timezone : Company timezone information
Multiple addresses : Support for multiple office locations
Enhanced verification : Detailed business entity and legal filing status
Intelligent deduplication : Global tracking prevents duplicate companies across pages
Follow redirects : Option to follow redirects for accurate domain extraction
Robust error handling : Graceful handling of frame detachment and timeout errors
Increased timeouts : Protocol timeout increased to 3 minutes for stability
Smart retry logic : Failed requests retry up to 3 times
Export Improvements
CSV Excellence : Switched to csv-writer
library for perfect CSV generation
Array formatting : Services, industries as separate columns with percentages
HTML entity decoding : All special characters properly decoded
Indexed columns : Reviews and portfolio items in numbered columns
Clean data : Aggressive whitespace and quote cleaning
New Features
LIST_WEBSITES mode : Optimized scraping mode for faster results
Review sorting : Sort reviews by relevance, date, or rating
Detailed statistics : Comprehensive runtime statistics with:
Duplicate count tracking
Error categorization
Performance metrics
Beautiful console output
📊 Statistics Dashboard
📊 SUMMARY STATISTICS :
• Total unique companies collected
• Duplicates removed
• Runtime and speed metrics
• Error breakdown by type
• Performance analysis
🔧 Technical Improvements
Enhanced TypeScript types
Better memory management
Optimized concurrency settings
Improved logging with progress tracking
Clean code architecture
🐛 Bug Fixes
Fixed double-saving of companies in LIST_DETAIL mode
Resolved CSV export issues with nested objects
Fixed review parsing selectors
Corrected domain extraction logic
Fixed race conditions in parallel processing
[1.0.0] - 2024-01-09
Added
Initial release of Clutch.co Scraper
Support for scraping company listings with filters
Detailed company profile extraction including:
Basic information (name, website, tagline, rating)
Verification status and details
Business entity information
Portfolio items
Client reviews with ratings
Service focus areas with percentages
Founded date and full description
Multiple scraping modes: LIST, LIST_DETAIL
Pagination support with customizable limits
Review scraping with configurable max reviews per company
Portfolio scraping (optional)
Custom data extension via extendOutputFunction
Dataset clearing option for development
Support for both headless and headful browser modes
Comprehensive error handling and retry logic
TypeScript support with full type definitions
Optimized Docker image with Puppeteer pre-installed
Features
Flexible URL support : Direct profile URLs or filtered list pages
Smart pagination : Handles Clutch's pagination with configurable limits
Data enrichment : Merges data from list and detail pages
Performance optimized : Configurable concurrency and request limits
Developer friendly : TypeScript, detailed logging, clear error messages
Technical Details
Built with Crawlee (formerly Apify SDK v3)
Uses PuppeteerCrawler for JavaScript rendering
Implements request routing for different page types
Includes anti-blocking measures (proxy support, fingerprinting)
[0.1.0] - 2024-01-08
Added
Initial development version
Basic scraping functionality
Testing and debugging features