Etsy Scraper
Extract product data efficiently from Etsy's massive handmade and vintage marketplace. Scraping full detail pages can be slow, so for the fastest results and high-volume data collection we strongly recommend scraping listing data only (set enrichDetails to false) to bypass deep page loads. Perfect for trend and price monitoring!


Etsy Products Scraper

Extract comprehensive product data from Etsy's marketplace including prices, images, ratings, seller information, and detailed descriptions.


Overview

Etsy Products Scraper enables you to extract structured product data from the world's leading handmade and vintage marketplace. Whether you're conducting market research, monitoring prices, analyzing trends, or building product databases, this tool provides reliable and comprehensive data extraction capabilities.

Use Cases

  • Market Research - Analyze product trends, pricing strategies, and market demand across categories
  • Price Monitoring - Track competitor pricing and market dynamics in real-time
  • Product Analysis - Evaluate product performance through ratings, reviews, and favorites
  • Seller Research - Monitor seller activity, product catalogs, and market positioning
  • Dropshipping - Discover trending products and profitable opportunities
  • Competitive Intelligence - Track competitor product launches and pricing strategies

Key Features

Advanced Search Capabilities

  • Category Browsing - Navigate through Etsy's extensive category hierarchy
  • Keyword Search - Find products using specific search terms and queries
  • Price Filtering - Set minimum and maximum price ranges for targeted results
  • Sort Options - Order results by relevance, price, or recency

Comprehensive Data Extraction

Extract detailed product information including:

  • Product titles and descriptions
  • Current prices with currency information
  • High-resolution product images
  • Seller/shop names
  • Star ratings (1-5 scale)
  • Review counts
  • Favorites/likes count
  • Free shipping indicators
  • Listing IDs for tracking
  • Product URLs for direct access

Performance & Reliability

  • Multiple Extraction Methods - Utilizes JSON-LD, internal APIs, and HTML parsing for maximum reliability
  • Smart Pagination - Automatically navigates through multiple result pages
  • Deduplication - Prevents duplicate product entries
  • Flexible Limits - Control the number of products scraped (up to 10,000)
  • Detail Enrichment - Optional deep scraping of individual product pages

Quick Start

Running on Apify Platform

  1. Navigate to the Actor in Apify Console
  2. Configure your search parameters:
    • Category: Select an Etsy category (e.g., /c/art-and-collectibles)
    • Search Query: Add keywords (e.g., vintage prints)
    • Max Products: Set your limit (default: 50)
    • Price Filters: Optional min/max price range
  3. Click Start to begin scraping
  4. Download results in JSON, CSV, Excel, or other formats

Input Configuration

{
  "category": "/c/art-and-collectibles",
  "searchQuery": "vintage prints",
  "maxProducts": 50,
  "minPrice": 10,
  "maxPrice": 500,
  "sortBy": "relevance",
  "enrichDetails": true,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
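
Prefer to start runs programmatically? The same input can be passed through the Apify client. A minimal Node.js sketch (the Actor ID and token placeholders are assumptions; substitute your own):

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start a run with the input shown above and wait for it to finish.
const run = await client.actor('YOUR_ACTOR_ID').call({
  category: '/c/art-and-collectibles',
  searchQuery: 'vintage prints',
  maxProducts: 50,
  proxyConfiguration: { useApifyProxy: true },
});

// Read the scraped products from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Scraped ${items.length} products`);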

Configuration

Input Parameters

Parameter | Type | Description
searchUrl | String | Direct Etsy URL (overrides other parameters if provided)
category | String | Category path (e.g., /c/jewelry, art-and-collectibles)
searchQuery | String | Keywords to search for products
maxProducts | Integer | Maximum products to scrape (0 = unlimited, max: 10,000)
minPrice | Integer | Minimum price filter (in USD)
maxPrice | Integer | Maximum price filter (in USD)
sortBy | String | Sort order: relevance, lowest_price, highest_price, most_recent
enrichDetails | Boolean | Fetch additional details from product pages (default: true)
proxyConfiguration | Object | Proxy settings (Apify Proxy recommended)
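
The proxyConfiguration object uses Apify's standard proxy input format. For sites behind Cloudflare, residential IPs tend to work best; a typical configuration looks like this (RESIDENTIAL is Apify's standard residential proxy group):

{
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}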

Output Data Structure

Each product in the dataset includes:

{
  "listingId": "1234567890",
  "title": "Vintage Art Print - Abstract Design",
  "price": "29.99",
  "currency": "USD",
  "image": "https://i.etsystatic.com/...",
  "url": "https://www.etsy.com/listing/1234567890/...",
  "seller": "VintageArtShop",
  "rating": 4.8,
  "reviewCount": 1523,
  "favoritesCount": 892,
  "freeShipping": true,
  "description": "Beautiful vintage print...",
  "scrapedAt": "2025-12-29T10:30:00.000Z"
}
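
Note that price is serialized as a string. A short TypeScript sketch of downstream processing (which fields are optional is an assumption here; treat the interface as illustrative):

interface EtsyProduct {
  listingId: string;
  title: string;
  price: string;      // serialized as a string, per the schema above
  currency: string;
  rating?: number;
  reviewCount?: number;
  url: string;
}

// Average price across a batch of dataset items, skipping unparsable values.
function averagePrice(items: EtsyProduct[]): number {
  const prices = items
    .map((item) => Number.parseFloat(item.price))
    .filter((p) => Number.isFinite(p));
  return prices.length > 0 ? prices.reduce((a, b) => a + b, 0) / prices.length : 0;
}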

Export Formats

Download your data in multiple formats:

  • JSON - Structured data for programmatic use
  • CSV - Spreadsheet-compatible format
  • Excel - XLSX format for business analysis
  • XML - Structured markup format
  • RSS - Feed format for monitoring
  • HTML Table - Web-ready table format
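
All of these formats are also available through the Apify API's format parameter, so exports can be automated. A minimal sketch (the Actor ID and token placeholders are assumptions):

// Download the latest run's dataset as CSV; swap format for json, xlsx, xml, rss, or html.
const res = await fetch(
  'https://api.apify.com/v2/acts/YOUR_ACTOR_ID/runs/last/dataset/items?format=csv&token=YOUR_API_TOKEN',
);
const csv = await res.text();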

Examples

Example 1: Scrape Jewelry Category

{
  "category": "/c/jewelry",
  "maxProducts": 100,
  "minPrice": 50,
  "sortBy": "most_recent"
}

Example 2: Search Specific Products

{
  "searchQuery": "handmade leather wallet",
  "maxProducts": 50,
  "sortBy": "lowest_price"
}

Example 3: Category with Keyword Filter

{
  "category": "/c/home-and-living",
  "searchQuery": "modern lamp",
  "maxProducts": 75,
  "maxPrice": 200
}

Example 4: Use Direct URL

{
  "searchUrl": "https://www.etsy.com/c/art-and-collectibles?q=vintage+poster&max_price=100",
  "maxProducts": 100
}
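
Because searchUrl overrides the other parameters, it is handy for encoding filters the dedicated fields don't cover. A small sketch of building one programmatically (the q and max_price parameters are taken from Example 4; other Etsy query parameters aren't documented here):

// Build a direct search URL equivalent to Example 4.
const params = new URLSearchParams({ q: 'vintage poster', max_price: '100' });
const searchUrl = `https://www.etsy.com/c/art-and-collectibles?${params}`;
// => https://www.etsy.com/c/art-and-collectibles?q=vintage+poster&max_price=100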

Best Practices

Performance Optimization

  • Set Reasonable Limits - Use maxProducts to control scraping duration
  • Use Specific Categories - Narrow searches yield faster, more relevant results
  • Enable Proxy - Always use Apify Proxy for reliable scraping
  • Monitor Usage - Check runtime and resource consumption

Data Quality

  • Enable Detail Enrichment - Set enrichDetails: true for comprehensive data
  • Validate Results - Review sample data before large-scale scraping
  • Handle Duplicates - The scraper automatically deduplicates by listing ID
  • Check Timestamps - Use the scrapedAt field to track data freshness (see the sketch after this list)
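
For instance, a quick client-side sanity check on freshness and duplicates might look like this (illustrative only; the Actor already deduplicates during the run, and items here is the array from the Quick Start client example):

// Drop records older than 24 hours and collapse any duplicate listing IDs.
const DAY_MS = 24 * 60 * 60 * 1000;
const fresh = items.filter(
  (item) => Date.now() - new Date(item.scrapedAt).getTime() < DAY_MS,
);
const unique = [...new Map(fresh.map((item) => [item.listingId, item])).values()];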

Compliance

  • Respect Rate Limits - Use appropriate delays between requests
  • Review Terms of Service - Ensure compliance with Etsy's policies
  • Ethical Use - Use scraped data responsibly and legally
  • Attribution - Maintain proper data attribution when required

Technical Details

Extraction Methods

The scraper employs a three-tier extraction strategy (a conceptual sketch follows the list):

  1. JSON-LD Parsing - Extracts structured data from schema.org markup (fastest, most reliable)
  2. Internal API - Captures embedded JavaScript data objects (fallback method)
  3. HTML Parsing - Uses Cheerio for DOM manipulation (comprehensive fallback)
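
A sketch of the JSON-LD-first approach is below. This is not the Actor's actual source: the h1 fallback selector is an assumption, and tier 2 (capturing the internal API) is omitted because it requires network monitoring.

import * as cheerio from 'cheerio';

// Tiered extraction: try schema.org Product markup first, then plain HTML.
function extractTitle(html: string): string | undefined {
  const $ = cheerio.load(html);

  // Tier 1: JSON-LD structured data embedded in <script> tags.
  for (const el of $('script[type="application/ld+json"]').toArray()) {
    try {
      const data = JSON.parse($(el).text());
      if (data['@type'] === 'Product' && data.name) return data.name;
    } catch {
      // Malformed JSON-LD block; keep looking.
    }
  }

  // Tier 3: fall back to parsing the rendered HTML.
  return $('h1').first().text().trim() || undefined;
}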

Anti-Bot Protection

Built-in evasion capabilities:

  • Camoufox Browser - Stealth Firefox fork with fingerprint randomization
  • Residential Proxies - Apify Proxy integration for trusted IP addresses
  • GeoIP Matching - Automatic location/timezone alignment
  • Human-like Behavior - Realistic browser patterns and timing

Pagination Handling

Automatic pagination through (see the sketch after this list):

  • Next page link detection
  • URL parameter management
  • Page limit enforcement
  • Duplicate prevention across pages
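
A simplified sketch of that loop follows. The rel="next" and data-listing-id selectors are assumptions about Etsy's markup, and a bare fetch would normally be blocked by Cloudflare; in real runs the Actor fetches through Camoufox and proxies.

import * as cheerio from 'cheerio';

const MAX_PRODUCTS = 100;
const seen = new Set<string>(); // duplicate prevention across pages
let nextUrl: string | undefined = 'https://www.etsy.com/c/jewelry';

while (nextUrl && seen.size < MAX_PRODUCTS) {
  const $ = cheerio.load(await (await fetch(nextUrl)).text());
  for (const el of $('[data-listing-id]').toArray()) {
    const id = $(el).attr('data-listing-id');
    if (id) seen.add(id);
  }
  // Follow the next-page link until it disappears (page limit enforcement).
  nextUrl = $('a[rel="next"]').attr('href');
}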

Troubleshooting

No Products Found

  • Verify category path is correct (check Etsy website)
  • Ensure search query isn't too specific
  • Check price filters aren't too restrictive
  • Try using direct URL to test specific pages

Scraping Too Slow

  • Reduce maxProducts for faster completion
  • Disable enrichDetails for quicker basic scraping
  • Check that the proxy configuration is optimal
  • Monitor network connectivity

Incomplete Data

  • Enable enrichDetails for comprehensive information
  • Verify proxy configuration is active
  • Check the logs for Cloudflare challenges
  • Review debug output in Actor logs

Rate Limiting

  • Use Apify Proxy with residential IPs
  • Reduce concurrency if experiencing blocks
  • Add delays between requests if needed
  • Monitor Actor logs for blocking indicators

Support & Feedback

Need Help?

  • Check the Apify Documentation
  • Review Actor logs for detailed error messages
  • Contact support through Apify Console
  • Submit issues for bugs or feature requests

Improvements

We continuously enhance this Actor based on user feedback. Suggestions for improvements are welcome through the Apify platform.


License

This Actor is licensed under the Apache License 2.0. See the LICENSE file for details.


Version History

Version 1.0

  • Initial release
  • Multi-method extraction (JSON-LD, API, HTML)
  • Category and search support
  • Price filtering and sorting
  • Detail enrichment capability
  • Cloudflare bypass with Camoufox
  • Automatic pagination
  • Duplicate prevention

Built with ❤️ for the Apify community
