iHerb Product Scraper
Extract comprehensive product data from iHerb, including supplements, vitamins, health foods, beauty products, and sports nutrition. Get structured product information with prices, ratings, reviews, ingredients, and detailed specifications. The Actor is optimized for speed, reading prices, ratings, and stock status directly from category list views; for the best results and uninterrupted scraping, residential proxies are strongly advised.
What is iHerb Product Scraper?
iHerb Product Scraper is a powerful extraction tool designed to collect detailed product information from iHerb.com, one of the world's largest online retailers of natural health products and supplements. This scraper enables automated data collection for market research, price monitoring, product analysis, and competitive intelligence.
Key Benefits
- Comprehensive Data - Extract complete product details including titles, prices, ratings, reviews, ingredients, and specifications
- Fast Performance - Uses JSON API extraction as primary method for 10-50x faster data collection
- Reliable Results - Multiple extraction strategies ensure consistent data retrieval
- Price Monitoring - Track pricing changes, discounts, and special offers automatically
- Market Intelligence - Analyze product trends, ratings, and customer reviews
- Flexible Filtering - Sort and filter products by rating, price, popularity, and availability
Features
Product Information Extracted
The scraper returns the fields listed under Output Data below: title, brand, price, rating, review count, stock status, image URL, product URL, and category, plus descriptions and ingredients when product enrichment is enabled.
Advanced Capabilities
- JSON API Extraction - Primary method using iHerb's internal APIs for fast, reliable data collection
- HTML Parsing Fallback - Automatic fallback to HTML parsing ensures data extraction even when APIs change
- Smart Pagination - Automatically navigates through multiple pages to collect all matching products
- Duplicate Prevention - Built-in deduplication prevents collecting the same product multiple times
- Stealth Technology - Uses advanced browser fingerprinting evasion for undetected scraping
- Parallel Processing - Concurrent data enrichment for faster detailed product information retrieval
Quick Start
Running on Apify Platform
- Open the Actor in Apify Console
- Configure your scraping parameters:
  - Enter a category URL or select a category (e.g., "supplements", "vitamins")
  - Set the maximum number of products to scrape
  - Choose the sorting order and filters
  - Enable product enrichment for detailed data (optional)
- Click Start and wait for results
- Download your data in JSON, CSV, Excel, or other formats
Input Configuration Example
```json
{
  "categoryUrl": "https://www.iherb.com/c/supplements",
  "maxProducts": 100,
  "sortBy": "rating",
  "inStockOnly": true
}
```
Input Parameters
Configure the scraper behavior with these parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| categoryUrl | String | No | Direct iHerb category URL to scrape. If provided, other parameters are ignored. Example: https://www.iherb.com/c/supplements |
| category | String | No | Product category to scrape (default: "supplements"). Examples: "vitamins", "beauty", "sports-nutrition", "grocery" |
| maxProducts | Integer | No | Maximum number of products to scrape (default: 50, 0 = unlimited, max: 10000) |
| sortBy | String | No | Sort order: "relevance", "price-asc", "price-desc", "rating", "newest", "bestselling" (default: "relevance") |
| inStockOnly | Boolean | No | Show only products currently in stock (default: true) |
| enrichProducts | Boolean | No | Fetch complete details including descriptions and ingredients. Slower but more comprehensive (default: false) |
| proxyConfiguration | Object | No | Proxy settings for reliable scraping (recommended for large-scale operations) |
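When a proxy is enabled, the proxyConfiguration object follows the usual Apify proxy input format. A minimal sketch, assuming the Actor accepts the standard useApifyProxy / apifyProxyGroups fields (residential proxies are the recommended group):

```json
{
  "categoryUrl": "https://www.iherb.com/c/sports-nutrition",
  "maxProducts": 200,
  "sortBy": "bestselling",
  "inStockOnly": true,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
```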
Output Data
Each product includes structured data in the following format:
```json
{
  "title": "California Gold Nutrition, Vitamin C, 1,000 mg, 60 Veggie Capsules",
  "productId": "92891",
  "brand": "California Gold Nutrition",
  "price": "6.50",
  "rating": 4.6,
  "reviews": 1847,
  "inStock": true,
  "imageUrl": "https://s3.images-iherb.com/cog/cog92891/y/15.jpg",
  "url": "https://www.iherb.com/pr/california-gold-nutrition-vitamin-c-1-000-mg-60-veggie-capsules/92891",
  "category": "supplements",
  "scrapedAt": "2026-01-01T10:30:00.000Z"
}
```
Data Fields
| Field | Type | Description |
|---|---|---|
| title | String | Full product name and description |
| productId | String | Unique iHerb product identifier |
| brand | String | Manufacturer or brand name |
| price | String | Current selling price |
| rating | Number | Average customer rating (1-5 stars) |
| reviews | Number | Total number of customer reviews |
| inStock | Boolean | Product availability status |
| imageUrl | String | Product image URL |
| url | String | Direct link to product page |
| category | String | Product category extracted from URL |
| scrapedAt | String | ISO timestamp when data was collected |
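To sanity-check the output, a short Python sketch like the one below can load a JSON export of the dataset (the file name is an example), convert the price string into a number, and list the cheapest in-stock products:

```python
import json

# Load a JSON export of the dataset (see Export Formats below).
with open("iherb_products.json", encoding="utf-8") as f:
    products = json.load(f)

def parse_price(value):
    """Price is exported as a string, so convert it before doing any math."""
    try:
        return float(str(value).replace("$", "").replace(",", ""))
    except ValueError:
        return None

in_stock = [p for p in products if p.get("inStock")]
priced = [(p["title"], parse_price(p.get("price"))) for p in in_stock]
priced = [(title, price) for title, price in priced if price is not None]

# Print the ten cheapest in-stock products.
for title, price in sorted(priced, key=lambda pair: pair[1])[:10]:
    print(f"{price:>8.2f}  {title}")
```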
Export Formats
Download your scraped data in multiple formats:
- JSON - Structured data for applications and APIs
- CSV - Spreadsheet compatible format
- Excel - Advanced data analysis and reporting
- XML - Enterprise system integration
- RSS - Feed subscriptions and monitoring
- HTML Table - Web display and embedding
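Each of these formats can also be requested directly from the Apify dataset API by passing a format parameter. A minimal Python sketch (the dataset ID, token, and output file name are placeholders):

```python
import requests

DATASET_ID = "YOUR_DATASET_ID"  # taken from the Actor run
TOKEN = "YOUR_API_TOKEN"

# The dataset items endpoint returns CSV when format=csv is requested.
url = f"https://api.apify.com/v2/datasets/{DATASET_ID}/items"
response = requests.get(url, params={"format": "csv", "token": TOKEN})
response.raise_for_status()

with open("iherb_products.csv", "wb") as f:
    f.write(response.content)
```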
Usage Examples
Example 1: Top-Rated Vitamin C Products
```json
{
  "categoryUrl": "https://www.iherb.com/c/vitamin-c",
  "maxProducts": 50,
  "sortBy": "rating",
  "inStockOnly": true
}
```
Example 2: Price Monitoring for Supplements
```json
{
  "category": "supplements",
  "maxProducts": 200,
  "sortBy": "bestselling",
  "enrichProducts": false
}
```
Example 3: Detailed Beauty Product Analysis
```json
{
  "category": "beauty",
  "maxProducts": 100,
  "sortBy": "newest",
  "enrichProducts": true
}
```
Example 4: Sports Nutrition Market Research
```json
{
  "categoryUrl": "https://www.iherb.com/c/sports-nutrition",
  "maxProducts": 500,
  "sortBy": "price-asc",
  "inStockOnly": true
}
```
Integration
Apify API
Access scraped data programmatically:
```bash
curl "https://api.apify.com/v2/acts/YOUR_ACTOR_ID/runs/last/dataset/items?token=YOUR_API_TOKEN"
```
JavaScript Integration
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('YOUR_ACTOR_ID').call({
    category: 'supplements',
    maxProducts: 100,
    sortBy: 'rating'
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```
Python Integration
```python
from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')

run = client.actor('YOUR_ACTOR_ID').call(run_input={
    'category': 'vitamins',
    'maxProducts': 100,
    'sortBy': 'bestselling',
    'inStockOnly': True,
})

items = client.dataset(run['defaultDatasetId']).list_items().items
print(items)
```
Automation and Scheduling
Set Up Automated Scraping
Monitor product data continuously by scheduling regular runs:
- Navigate to Schedules in Apify Console
- Create a new schedule (hourly, daily, weekly, or custom cron expression)
- Configure your input parameters
- Enable notifications for run completion
- Connect to downstream systems via webhooks
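For a basic scheduled pipeline you do not necessarily need webhooks: a cron-driven script can pull the most recent run's dataset through the same endpoint shown in the Integration section above. A minimal sketch (actor ID, token, and file name are placeholders):

```python
import json
import requests

ACTOR_ID = "YOUR_ACTOR_ID"   # placeholder
TOKEN = "YOUR_API_TOKEN"     # placeholder

# Fetch the items from the latest run's default dataset.
url = f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs/last/dataset/items"
items = requests.get(url, params={"token": TOKEN}).json()

# Store a dated snapshot so price history can be compared later.
date = items[0]["scrapedAt"][:10] if items else "latest"
snapshot = f"iherb_snapshot_{date}.json"
with open(snapshot, "w", encoding="utf-8") as f:
    json.dump(items, f, indent=2)

print(f"Saved {len(items)} products to {snapshot}")
```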
Integration Options
- Webhooks - Trigger actions when scraping completes
- Zapier - Connect to 5000+ apps without coding
- Make (Integromat) - Build complex automation workflows
- Google Sheets - Auto-export data to spreadsheets
- Slack/Discord - Receive notifications with results
- Database Integration - Push data to PostgreSQL, MongoDB, MySQL
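As an illustration of the database option, the sketch below loads an exported JSON file and upserts it into a PostgreSQL table using psycopg2. The table layout, file name, and connection string are assumptions made for the example, not part of the Actor:

```python
import json
import psycopg2

# Connection string and table name are illustrative.
conn = psycopg2.connect("dbname=products user=postgres password=secret host=localhost")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS iherb_products (
        product_id TEXT PRIMARY KEY,
        title      TEXT,
        brand      TEXT,
        price      TEXT,
        rating     REAL,
        in_stock   BOOLEAN,
        scraped_at TIMESTAMPTZ
    )
""")

with open("iherb_products.json", encoding="utf-8") as f:
    for p in json.load(f):
        cur.execute(
            """
            INSERT INTO iherb_products (product_id, title, brand, price, rating, in_stock, scraped_at)
            VALUES (%s, %s, %s, %s, %s, %s, %s)
            ON CONFLICT (product_id) DO UPDATE SET
                price = EXCLUDED.price,
                rating = EXCLUDED.rating,
                in_stock = EXCLUDED.in_stock,
                scraped_at = EXCLUDED.scraped_at
            """,
            (p.get("productId"), p.get("title"), p.get("brand"), p.get("price"),
             p.get("rating"), p.get("inStock"), p.get("scrapedAt")),
        )

conn.commit()
cur.close()
conn.close()
```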
Use Cases
Price Monitoring and Competitive Analysis
Track pricing changes across thousands of products to optimize your pricing strategy. Monitor competitor prices, discount patterns, and identify market opportunities.
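A simple way to track changes is to diff two exported runs. The sketch below assumes two JSON exports saved locally (the file names are examples):

```python
import json

def load(path):
    """Index one exported run by productId."""
    with open(path, encoding="utf-8") as f:
        return {p["productId"]: p for p in json.load(f) if p.get("productId")}

old, new = load("run_yesterday.json"), load("run_today.json")

# Report every product whose price changed between the two runs.
for product_id, product in new.items():
    before = old.get(product_id)
    if before and before.get("price") != product.get("price"):
        print(f"{product['title']}: {before['price']} -> {product['price']}")
```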
Product Research and Development
Analyze successful products, customer reviews, and ratings to inform new product development. Identify gaps in the market and trending ingredients.
Market Intelligence
Understand market trends, popular categories, and emerging brands. Track product launches, seasonal patterns, and consumer preferences.
Affiliate Marketing
Build product databases for affiliate websites. Automatically update product information, prices, and availability for your comparison sites or review platforms.
Inventory Management
Monitor product availability and stock status across categories. Receive alerts when products come back in stock or go out of stock.
Customer Sentiment Analysis
Collect review data and ratings to analyze customer satisfaction trends. Identify product quality issues and customer pain points.
E-commerce Platform Integration
Power your e-commerce platform with fresh product data. Sync inventory, pricing, and product information automatically.
Performance and Reliability
Scraping Performance
Throughput depends mainly on run size, whether enrichProducts is enabled, and proxy settings; see Best Practices below for tuning tips.
Reliability Features
- Automatic Retries - Failed requests are automatically retried with exponential backoff (a generic sketch of this pattern follows the list)
- Duplicate Detection - Prevents collecting the same product multiple times
- Error Recovery - Continues scraping even if individual products fail
- State Persistence - Can resume from where it left off after interruptions
- Proxy Support - Built-in proxy rotation for reliable data collection
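Downstream consumers that call the Apify API themselves can reproduce the same retry-with-exponential-backoff pattern. A generic sketch of the idea, not the Actor's internal code:

```python
import random
import time

import requests

def fetch_with_retries(url, max_attempts=5, base_delay=1.0):
    """Retry a GET request with exponential backoff and a little jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            response = requests.get(url, timeout=30)
            response.raise_for_status()
            return response
        except requests.RequestException as error:
            if attempt == max_attempts:
                raise
            # Delays grow as 1s, 2s, 4s, 8s, ... plus random jitter.
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.5)
            print(f"Attempt {attempt} failed ({error}), retrying in {delay:.1f}s")
            time.sleep(delay)
```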
Technical Implementation
Extraction Methodology
1. JSON API Extraction (Primary Method)
The scraper monitors network requests and extracts data directly from iHerb's internal JSON APIs. This method provides:
- 10-50x faster performance than HTML parsing
- Structured, clean data without parsing overhead
- Reliable field extraction without selector maintenance
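The approach can be illustrated with a short Playwright sketch that records every JSON response while a category page loads; the URL, browser choice, and filtering are for demonstration only and are not the Actor's internal code:

```python
import json

from playwright.sync_api import sync_playwright

captured = []

def handle_response(response):
    # Keep only responses that declare a JSON content type.
    if "application/json" in response.headers.get("content-type", ""):
        try:
            captured.append({"url": response.url, "data": response.json()})
        except Exception:
            pass  # ignore bodies that are not valid JSON

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.on("response", handle_response)
    page.goto("https://www.iherb.com/c/supplements", wait_until="networkidle")
    browser.close()

# Inspect which endpoints carried product data and how large the payloads were.
for entry in captured:
    print(entry["url"], len(json.dumps(entry["data"])))
```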
2. HTML Parsing (Fallback Method)
When API extraction is unavailable, the scraper uses intelligent HTML parsing with multiple selector strategies:
- Primary selectors targeting standard class names
- Data attribute extraction for structured content
- Generic fallback patterns for layout changes
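Conceptually, the fallback chain tries one selector after another and keeps the first match. A small BeautifulSoup sketch with illustrative selectors (not iHerb's real markup):

```python
from bs4 import BeautifulSoup

def select_first(node, selectors):
    """Return the text of the first selector that matches, else None."""
    for selector in selectors:
        found = node.select_one(selector)
        if found:
            return found.get_text(strip=True)
    return None

html = "<div class='product-cell'><div class='product-title'>Vitamin C, 1,000 mg</div></div>"
soup = BeautifulSoup(html, "html.parser")

title = select_first(soup, [
    ".product-title",          # primary: standard class name
    "[data-testid='title']",   # data-attribute fallback
    "a[href*='/pr/']",         # generic fallback pattern
])
print(title)  # -> Vitamin C, 1,000 mg
```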
Architecture Benefits
| Strategy | Speed | Reliability | Use Case |
|---|---|---|---|
| JSON API | Very Fast | Excellent | Primary extraction method |
| HTML Parsing | Fast | Good | Fallback when APIs unavailable |
| Product Enrichment | Moderate | Good | Detailed data collection |
Data Quality Assurance
- Field Validation - All extracted fields validated before storage
- Type Checking - Ensures correct data types (numbers, booleans, strings)
- Deduplication - Product IDs and URLs checked to prevent duplicates
- Normalization - Whitespace trimming, price formatting, consistent structure
- Timestamps - Automatic ISO 8601 timestamps for tracking data freshness
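The same checks are straightforward to repeat in your own post-processing. An illustrative cleaning pass (field names follow the output schema above; the logic is a sketch, not the Actor's code):

```python
from datetime import datetime, timezone

seen_ids = set()

def clean(product):
    """Validate, normalize, and deduplicate one scraped record."""
    product_id = str(product.get("productId") or "").strip()
    if not product_id or product_id in seen_ids:
        return None  # drop empty or duplicate records
    seen_ids.add(product_id)

    return {
        "productId": product_id,
        "title": " ".join(str(product.get("title", "")).split()),  # collapse whitespace
        "price": str(product.get("price", "")).strip(),
        "rating": float(product["rating"]) if product.get("rating") is not None else None,
        "inStock": bool(product.get("inStock")),
        "scrapedAt": datetime.now(timezone.utc).isoformat(),        # ISO 8601 timestamp
    }
```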
Best Practices
Optimizing Scraping Performance
- Start Small - Test with low maxProducts values before large-scale runs
- Use Proxies - Enable Apify residential proxies for reliable, uninterrupted scraping
- Disable Enrichment - For faster results, set enrichProducts to false
- Filter Early - Use inStockOnly to reduce unnecessary data collection
- Schedule Wisely - Run during off-peak hours for better performance
Data Management
- Export Regularly - Download results promptly to avoid storage limits
- Version Control - Track data collection timestamps for historical analysis
- Validate Output - Check sample results before processing large datasets
- Handle Errors - Implement error handling in downstream integrations
Support and Resources
Get Help
- Documentation: Apify Documentation
- Community: Join the Discord Server
- Bug Reports: Submit via Actor feedback in Console
- Support: Contact through Apify Console
License
This Actor is licensed under the Apache License 2.0.
Keywords
iherb scraper, product scraper, supplements data, vitamin extractor, health products, nutrition data, e-commerce scraper, price monitoring, product research, market analysis, competitive intelligence, affiliate data, inventory tracking, customer reviews, product ratings
Start scraping iHerb products today!
Open in Apify Console