🌿 iHerb Product Scraper

Pricing

Pay per usage


Rapidly extract product listing details from iHerb. This actor is optimized for speed, gathering essential data like prices, ratings, and stock status directly from list views. For the best results and to ensure uninterrupted scraping, the use of residential proxies is strongly advised.


Rating: 0.0 (0)

Developer: Shahid Irfan

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 5 days ago


iHerb Product Scraper

Extract comprehensive product data from iHerb including supplements, vitamins, health foods, beauty products, and sports nutrition. Get structured product information with prices, ratings, reviews, ingredients, and detailed specifications.


What is iHerb Product Scraper?

iHerb Product Scraper is a powerful extraction tool designed to collect detailed product information from iHerb.com, one of the world's largest online retailers of natural health products and supplements. This scraper enables automated data collection for market research, price monitoring, product analysis, and competitive intelligence.

Key Benefits

  • Comprehensive Data - Extract complete product details including titles, prices, ratings, reviews, ingredients, and specifications
  • Fast Performance - Uses JSON API extraction as the primary method, collecting data 10-50x faster than HTML parsing
  • Reliable Results - Multiple extraction strategies ensure consistent data retrieval
  • Price Monitoring - Track pricing changes, discounts, and special offers automatically
  • Market Intelligence - Analyze product trends, ratings, and customer reviews
  • Flexible Filtering - Sort and filter products by rating, price, popularity, and availability

Features

Product Information Extracted

Basic Information

  • Product title and ID
  • Brand/manufacturer name
  • Product category
  • Product URL and images

Pricing Data

  • Current price
  • Original price
  • Discount percentage
  • Stock availability

Customer Feedback

  • Average rating (stars)
  • Number of reviews
  • Customer testimonials

Product Details (Optional)

  • Full product description
  • Ingredients list
  • Product specifications
  • Supplement facts

Advanced Capabilities

  • JSON API Extraction - Primary method using iHerb's internal APIs for fast, reliable data collection
  • HTML Parsing Fallback - Automatic fallback to HTML parsing ensures data extraction even when APIs change
  • Smart Pagination - Automatically navigates through multiple pages to collect all matching products
  • Duplicate Prevention - Built-in deduplication prevents collecting the same product multiple times
  • Stealth Technology - Uses advanced browser fingerprinting evasion for undetected scraping
  • Parallel Processing - Concurrent data enrichment for faster detailed product information retrieval
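The pagination and deduplication behaviour described above can be sketched in a few lines of Python. This is an illustrative model, not the Actor's actual internals; `fetch_page` is a hypothetical stand-in for whatever fetches one page of list results:

```python
def collect_products(fetch_page, max_products=50):
    """Walk result pages, skipping products already seen by ID."""
    seen_ids = set()
    products = []
    page = 1
    while max_products == 0 or len(products) < max_products:
        batch = fetch_page(page)          # hypothetical page fetcher
        if not batch:                     # no more pages: stop
            break
        for item in batch:
            pid = item.get("productId")
            if pid in seen_ids:           # duplicate prevention
                continue
            seen_ids.add(pid)
            products.append(item)
            if max_products and len(products) >= max_products:
                break
        page += 1                         # smart pagination
    return products
```

With `max_products=0` the loop runs until an empty page is returned, mirroring the "0 = unlimited" convention used by the input parameters.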

Quick Start

Running on Apify Platform

  1. Open the Actor in Apify Console
  2. Configure your scraping parameters:
    • Enter category URL or select category (e.g., "supplements", "vitamins")
    • Set maximum products to scrape
    • Choose sorting order and filters
    • Enable product enrichment for detailed data (optional)
  3. Click Start and wait for results
  4. Download your data in JSON, CSV, Excel, or other formats

Input Configuration Example

{
  "categoryUrl": "https://www.iherb.com/c/supplements",
  "maxProducts": 100,
  "sortBy": "rating",
  "inStockOnly": true
}

Input Parameters

Configure the scraper behavior with these parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| categoryUrl | String | No | Direct iHerb category URL to scrape. If provided, other parameters are ignored. Example: https://www.iherb.com/c/supplements |
| category | String | No | Product category to scrape (default: "supplements"). Examples: "vitamins", "beauty", "sports-nutrition", "grocery" |
| maxProducts | Integer | No | Maximum number of products to scrape (default: 50, 0 = unlimited, max: 10000) |
| sortBy | String | No | Sort order: "relevance", "price-asc", "price-desc", "rating", "newest", "bestselling" (default: "relevance") |
| inStockOnly | Boolean | No | Show only products currently in stock (default: true) |
| enrichProducts | Boolean | No | Fetch complete details including descriptions and ingredients. Slower but more comprehensive (default: false) |
| proxyConfiguration | Object | No | Proxy settings for reliable scraping (recommended for large-scale operations) |
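The defaults and bounds in the table above could be applied with a small normalization step before a run. The field names mirror the table; the helper itself is an illustrative sketch, not part of the Actor:

```python
SORT_OPTIONS = {"relevance", "price-asc", "price-desc",
                "rating", "newest", "bestselling"}

def normalize_input(raw):
    """Fill defaults and clamp values as described in the parameter table."""
    conf = dict(raw)
    conf.setdefault("category", "supplements")
    conf.setdefault("sortBy", "relevance")
    conf.setdefault("inStockOnly", True)
    conf.setdefault("enrichProducts", False)
    max_products = int(conf.get("maxProducts", 50))
    # 0 means unlimited; otherwise clamp to the documented ceiling of 10000
    conf["maxProducts"] = 0 if max_products == 0 else max(1, min(max_products, 10000))
    if conf["sortBy"] not in SORT_OPTIONS:
        raise ValueError(f"unknown sortBy: {conf['sortBy']}")
    return conf
```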

Output Data

Each product includes structured data in the following format:

{
  "title": "California Gold Nutrition, Vitamin C, 1,000 mg, 60 Veggie Capsules",
  "productId": "92891",
  "brand": "California Gold Nutrition",
  "price": "6.50",
  "rating": 4.6,
  "reviews": 1847,
  "inStock": true,
  "imageUrl": "https://s3.images-iherb.com/cog/cog92891/y/15.jpg",
  "url": "https://www.iherb.com/pr/california-gold-nutrition-vitamin-c-1-000-mg-60-veggie-capsules/92891",
  "category": "supplements",
  "scrapedAt": "2026-01-01T10:30:00.000Z"
}

Data Fields

| Field | Type | Description |
|---|---|---|
| title | String | Full product name and description |
| productId | String | Unique iHerb product identifier |
| brand | String | Manufacturer or brand name |
| price | String | Current selling price |
| rating | Number | Average customer rating (1-5 stars) |
| reviews | Number | Total number of customer reviews |
| inStock | Boolean | Product availability status |
| imageUrl | String | Product image URL |
| url | String | Direct link to product page |
| category | String | Product category extracted from URL |
| scrapedAt | String | ISO timestamp when data was collected |
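The type conventions in the field table can be enforced downstream with a small coercion helper. This is a consumer-side sketch under the assumption that raw items may arrive with loosely typed values; `to_record` is not part of the Actor itself:

```python
from datetime import datetime, timezone

def to_record(raw):
    """Coerce a scraped item into the documented field types."""
    return {
        "title": str(raw.get("title", "")).strip(),
        "productId": str(raw.get("productId", "")),
        "brand": str(raw.get("brand", "")).strip(),
        "price": str(raw.get("price", "")),      # price stays a string, per the table
        "rating": float(raw.get("rating", 0.0)),
        "reviews": int(raw.get("reviews", 0)),
        "inStock": bool(raw.get("inStock", False)),
        "imageUrl": raw.get("imageUrl", ""),
        "url": raw.get("url", ""),
        "category": raw.get("category", ""),
        # ISO 8601 timestamp marking when this record was built
        "scrapedAt": datetime.now(timezone.utc).isoformat(),
    }
```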

Export Formats

Download your scraped data in multiple formats:

  • JSON - Structured data for applications and APIs
  • CSV - Spreadsheet compatible format
  • Excel - Advanced data analysis and reporting
  • XML - Enterprise system integration
  • RSS - Feed subscriptions and monitoring
  • HTML Table - Web display and embedding
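If you download results as JSON, the same items can be flattened to CSV locally with nothing but the standard library. A minimal sketch, assuming every item shares the same keys:

```python
import csv
import io

def items_to_csv(items):
    """Write a list of product dicts to CSV text, one column per field."""
    if not items:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(items[0].keys()))
    writer.writeheader()   # column names taken from the first item
    writer.writerows(items)
    return buf.getvalue()
```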

Usage Examples

Example 1: Top-Rated Vitamin C Products

{
  "categoryUrl": "https://www.iherb.com/c/vitamin-c",
  "maxProducts": 50,
  "sortBy": "rating",
  "inStockOnly": true
}

Example 2: Price Monitoring for Supplements

{
  "category": "supplements",
  "maxProducts": 200,
  "sortBy": "bestselling",
  "enrichProducts": false
}

Example 3: Detailed Beauty Product Analysis

{
  "category": "beauty",
  "maxProducts": 100,
  "sortBy": "newest",
  "enrichProducts": true
}

Example 4: Sports Nutrition Market Research

{
  "categoryUrl": "https://www.iherb.com/c/sports-nutrition",
  "maxProducts": 500,
  "sortBy": "price-asc",
  "inStockOnly": true
}

Integration

Apify API

Access scraped data programmatically:

curl "https://api.apify.com/v2/acts/YOUR_ACTOR_ID/runs/last/dataset/items?token=YOUR_API_TOKEN"

JavaScript Integration

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('YOUR_ACTOR_ID').call({
  category: 'supplements',
  maxProducts: 100,
  sortBy: 'rating'
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Python Integration

from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')
run = client.actor('YOUR_ACTOR_ID').call(run_input={
    'category': 'vitamins',
    'maxProducts': 100,
    'sortBy': 'bestselling',
    'inStockOnly': True,
})
items = client.dataset(run['defaultDatasetId']).list_items().items
print(items)

Automation and Scheduling

Set Up Automated Scraping

Monitor product data continuously by scheduling regular runs:

  1. Navigate to Schedules in Apify Console
  2. Create a new schedule (hourly, daily, weekly, or custom cron expression)
  3. Configure your input parameters
  4. Enable notifications for run completion
  5. Connect to downstream systems via webhooks

Integration Options

  • Webhooks - Trigger actions when scraping completes
  • Zapier - Connect to 5000+ apps without coding
  • Make (Integromat) - Build complex automation workflows
  • Google Sheets - Auto-export data to spreadsheets
  • Slack/Discord - Receive notifications with results
  • Database Integration - Push data to PostgreSQL, MongoDB, MySQL

Use Cases

Price Monitoring and Competitive Analysis

Track pricing changes across thousands of products to optimize your pricing strategy. Monitor competitor prices, discount patterns, and identify market opportunities.

Product Research and Development

Analyze successful products, customer reviews, and ratings to inform new product development. Identify gaps in the market and trending ingredients.

Market Intelligence

Understand market trends, popular categories, and emerging brands. Track product launches, seasonal patterns, and consumer preferences.

Affiliate Marketing

Build product databases for affiliate websites. Automatically update product information, prices, and availability for your comparison sites or review platforms.

Inventory Management

Monitor product availability and stock status across categories. Receive alerts when products come back in stock or go out of stock.

Customer Sentiment Analysis

Collect review data and ratings to analyze customer satisfaction trends. Identify product quality issues and customer pain points.

E-commerce Platform Integration

Power your e-commerce platform with fresh product data. Sync inventory, pricing, and product information automatically.


Performance and Reliability

Scraping Performance

Small Runs (< 100 products)

  • Duration: 1-2 minutes
  • Best for: Quick price checks, category sampling
  • Cost: Minimal compute units

Large Runs (500+ products)

  • Duration: 5-15 minutes
  • Best for: Comprehensive catalogs, market analysis
  • Cost: Moderate compute units

Reliability Features

  • Automatic Retries - Failed requests are automatically retried with exponential backoff
  • Duplicate Detection - Prevents collecting the same product multiple times
  • Error Recovery - Continues scraping even if individual products fail
  • State Persistence - Can resume from where it left off after interruptions
  • Proxy Support - Built-in proxy rotation for reliable data collection
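The retry-with-exponential-backoff behaviour listed above follows a standard pattern, sketched here in Python. The attempt count and delays are illustrative, not the Actor's actual settings:

```python
import time

def retry(fn, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn, retrying on failure with exponentially growing delays."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise                           # out of retries: propagate
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Injecting `sleep` as a parameter keeps the helper testable without real waiting.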

Technical Implementation

Extraction Methodology

1. JSON API Extraction (Primary Method)

The scraper monitors network requests and extracts data directly from iHerb's internal JSON APIs. This method provides:

  • 10-50x faster performance than HTML parsing
  • Structured, clean data without parsing overhead
  • Reliable field extraction without selector maintenance
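Screening captured network responses for product payloads might look like the following. The nesting keys (`products`, `items`) and required fields (`id`, `price`) are assumptions about what iHerb-like list APIs expose, used purely for illustration:

```python
def looks_like_product_payload(body):
    """Heuristic: a JSON body is a product list if it holds product-ish dicts."""
    if isinstance(body, dict):
        # many list APIs nest results under a key such as "products" or "items"
        body = body.get("products", body.get("items", []))
    if not isinstance(body, list) or not body:
        return False
    sample = body[0]
    return isinstance(sample, dict) and {"id", "price"} <= set(sample.keys())

def extract_products(responses):
    """Pick product lists out of a batch of captured JSON response bodies."""
    found = []
    for body in responses:
        if not looks_like_product_payload(body):
            continue
        if isinstance(body, dict):
            body = body.get("products", body.get("items", []))
        found.extend(body)
    return found
```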

2. HTML Parsing (Fallback Method)

When API extraction is unavailable, the scraper uses intelligent HTML parsing with multiple selector strategies:

  • Primary selectors targeting standard class names
  • Data attribute extraction for structured content
  • Generic fallback patterns for layout changes
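The primary-then-fallback chain above can be modelled with an ordered list of patterns tried in sequence. Here regexes over raw HTML stand in for CSS selectors, and the class and attribute names are invented for the example:

```python
import re

# ordered fallback patterns for the price field; names are assumptions
PRICE_PATTERNS = [
    r'class="price-inline"[^>]*>\s*\$?([\d.]+)',  # primary: standard class name
    r'data-price="([\d.]+)"',                     # data attribute extraction
    r'\$(\d+\.\d{2})',                            # generic fallback pattern
]

def extract_price(html):
    """Try each pattern in order; return the first match, or None."""
    for pattern in PRICE_PATTERNS:
        match = re.search(pattern, html)
        if match:
            return match.group(1)
    return None
```

Because later patterns are only consulted when earlier ones fail, a layout change breaks extraction only if every strategy misses at once.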

Architecture Benefits

| Strategy | Speed | Reliability | Use Case |
|---|---|---|---|
| JSON API | ⚑⚑⚑⚑⚑ Very Fast | βœ…βœ…βœ…βœ… Excellent | Primary extraction method |
| HTML Parsing | ⚑⚑⚑ Fast | βœ…βœ…βœ… Good | Fallback when APIs unavailable |
| Product Enrichment | ⚑⚑ Moderate | βœ…βœ…βœ… Good | Detailed data collection |

Data Quality Assurance

  • Field Validation - All extracted fields validated before storage
  • Type Checking - Ensures correct data types (numbers, booleans, strings)
  • Deduplication - Product IDs and URLs checked to prevent duplicates
  • Normalization - Whitespace trimming, price formatting, consistent structure
  • Timestamps - Automatic ISO 8601 timestamps for tracking data freshness

Best Practices

Optimizing Scraping Performance

  • βœ… Start Small - Test with low maxProducts values before large-scale runs
  • βœ… Use Proxies - Enable Apify proxies for reliable, uninterrupted scraping
  • βœ… Disable Enrichment - For faster results, set enrichProducts to false
  • βœ… Filter Early - Use inStockOnly to reduce unnecessary data collection
  • βœ… Schedule Wisely - Run during off-peak hours for better performance

Data Management

  • βœ… Export Regularly - Download results promptly to avoid storage limits
  • βœ… Version Control - Track data collection timestamps for historical analysis
  • βœ… Validate Output - Check sample results before processing large datasets
  • βœ… Handle Errors - Implement error handling in downstream integrations



Support and Resources

Get Help

  • πŸ“– Documentation: Apify Documentation
  • πŸ’¬ Community: Join Discord Server
  • πŸ› Bug Reports: Submit via Actor feedback in Console
  • πŸ“§ Support: Contact through Apify Console

License

This Actor is licensed under the Apache License 2.0.


Keywords

iherb scraper, product scraper, supplements data, vitamin extractor, health products, nutrition data, e-commerce scraper, price monitoring, product research, market analysis, competitive intelligence, affiliate data, inventory tracking, customer reviews, product ratings


Start scraping iHerb products today!
Open in Apify Console
