Etsy Scraper
Extract product data efficiently from Etsy's massive handmade and vintage marketplace. Scraping full detail pages can be slow. For the fastest results and high-volume data collection, we strongly recommend scraping listings only to bypass deep page loads. Perfect for trend and price monitoring!
Etsy Products Scraper
Extract comprehensive product data from Etsy's marketplace including prices, images, ratings, seller information, and detailed descriptions.
Overview
Etsy Products Scraper enables you to extract structured product data from the world's leading handmade and vintage marketplace. Whether you're conducting market research, monitoring prices, analyzing trends, or building product databases, this tool provides reliable and comprehensive data extraction capabilities.
Use Cases
- Market Research - Analyze product trends, pricing strategies, and market demand across categories
- Price Monitoring - Track competitor pricing and market dynamics in real-time
- Product Analysis - Evaluate product performance through ratings, reviews, and favorites
- Seller Research - Monitor seller activity, product catalogs, and market positioning
- Dropshipping - Discover trending products and profitable opportunities
- Competitive Intelligence - Track competitor product launches and pricing strategies
Key Features
Advanced Search Capabilities
- Category Browsing - Navigate through Etsy's extensive category hierarchy
- Keyword Search - Find products using specific search terms and queries
- Price Filtering - Set minimum and maximum price ranges for targeted results
- Sort Options - Order results by relevance, price, or recency
Comprehensive Data Extraction
Extract detailed product information including:
- Product titles and descriptions
- Current prices with currency information
- High-resolution product images
- Seller/shop names
- Star ratings (1-5 scale)
- Review counts
- Favorites/likes count
- Free shipping indicators
- Listing IDs for tracking
- Product URLs for direct access
Performance & Reliability
- Multiple Extraction Methods - Utilizes JSON-LD, internal APIs, and HTML parsing for maximum reliability
- Smart Pagination - Automatically navigates through multiple result pages
- Deduplication - Prevents duplicate product entries (see the sketch after this list)
- Flexible Limits - Control the number of products scraped (up to 10,000)
- Detail Enrichment - Optional deep scraping of individual product pages
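To make the deduplication and limit behavior above concrete, here is a minimal, hypothetical sketch (not the Actor's actual source); the listingId field mirrors the output schema documented below, and dedupe_and_limit is an illustrative helper name.

```python
# Hypothetical sketch: drop duplicate products by listingId and enforce maxProducts.
from typing import Iterable, Iterator

def dedupe_and_limit(items: Iterable[dict], max_products: int = 0) -> Iterator[dict]:
    """Yield unique products by listingId; max_products=0 means unlimited."""
    seen: set[str] = set()
    yielded = 0
    for item in items:
        listing_id = item.get("listingId")
        if not listing_id or listing_id in seen:
            continue  # skip duplicates and rows without an ID
        seen.add(listing_id)
        yield item
        yielded += 1
        if max_products and yielded >= max_products:
            break  # flexible limit reached

# Example: two identical listings collapse into one record.
rows = [{"listingId": "1", "title": "A"}, {"listingId": "1", "title": "A"}]
print(list(dedupe_and_limit(rows, max_products=50)))  # -> one item
```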
Quick Start
Running on Apify Platform
- Navigate to the Actor in Apify Console
- Configure your search parameters:
  - Category: Select an Etsy category (e.g., /c/art-and-collectibles)
  - Search Query: Add keywords (e.g., vintage prints)
  - Max Products: Set your limit (default: 50)
  - Price Filters: Optional min/max price range
- Click Start to begin scraping
- Download results in JSON, CSV, Excel, or other formats
Input Configuration
{"category": "/c/art-and-collectibles","searchQuery": "vintage prints","maxProducts": 50,"minPrice": 10,"maxPrice": 500,"sortBy": "relevance","enrichDetails": true,"proxyConfiguration": {"useApifyProxy": true}}
Configuration
Input Parameters
| Parameter | Type | Description |
|---|---|---|
| searchUrl | String | Direct Etsy URL (overrides other parameters if provided) |
| category | String | Category path (e.g., /c/jewelry, art-and-collectibles) |
| searchQuery | String | Keywords to search for products |
| maxProducts | Integer | Maximum products to scrape (0 = unlimited, max: 10,000) |
| minPrice | Integer | Minimum price filter (in USD) |
| maxPrice | Integer | Maximum price filter (in USD) |
| sortBy | String | Sort order: relevance, lowest_price, highest_price, most_recent |
| enrichDetails | Boolean | Fetch additional details from product pages (default: true) |
| proxyConfiguration | Object | Proxy settings (Apify Proxy recommended) |
Output Data Structure
Each product in the dataset includes:
{"listingId": "1234567890","title": "Vintage Art Print - Abstract Design","price": "29.99","currency": "USD","image": "https://i.etsystatic.com/...","url": "https://www.etsy.com/listing/1234567890/...","seller": "VintageArtShop","rating": 4.8,"reviewCount": 1523,"favoritesCount": 892,"freeShipping": true,"description": "Beautiful vintage print...","scrapedAt": "2025-12-29T10:30:00.000Z"}
Export Formats
Download your data in multiple formats:
- JSON - Structured data for programmatic use
- CSV - Spreadsheet-compatible format
- Excel - XLSX format for business analysis
- XML - Structured markup format
- RSS - Feed format for monitoring
- HTML Table - Web-ready table format
Examples
Example 1: Scrape Jewelry Category
{"category": "/c/jewelry","maxProducts": 100,"minPrice": 50,"sortBy": "most_recent"}
Example 2: Search Specific Products
{"searchQuery": "handmade leather wallet","maxProducts": 50,"sortBy": "lowest_price"}
Example 3: Category with Keyword Filter
{"category": "/c/home-and-living","searchQuery": "modern lamp","maxProducts": 75,"maxPrice": 200}
Example 4: Use Direct URL
{"searchUrl": "https://www.etsy.com/c/art-and-collectibles?q=vintage+poster&max_price=100","maxProducts": 100}
Best Practices
Performance Optimization
- Set Reasonable Limits - Use maxProducts to control scraping duration
- Use Specific Categories - Narrow searches yield faster, more relevant results
- Enable Proxy - Always use Apify Proxy for reliable scraping
- Monitor Usage - Check runtime and resource consumption
Data Quality
- Enable Detail Enrichment - Set enrichDetails: true for comprehensive data
- Validate Results - Review sample data before large-scale scraping
- Handle Duplicates - The scraper automatically deduplicates by listing ID
- Check Timestamps - Use the scrapedAt field to track data freshness
Compliance
- Respect Rate Limits - Use appropriate delays between requests
- Review Terms of Service - Ensure compliance with Etsy's policies
- Ethical Use - Use scraped data responsibly and legally
- Attribution - Maintain proper data attribution when required
Technical Details
Extraction Methods
The scraper employs a three-tier extraction strategy (a code sketch follows the list below):
- JSON-LD Parsing - Extracts structured data from schema.org markup (fastest, most reliable)
- Internal API - Captures embedded JavaScript data objects (fallback method)
- HTML Parsing - Uses Cheerio for DOM manipulation (comprehensive fallback)
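The snippet below is a simplified sketch of such a tiered strategy, not the Actor's actual implementation. It assumes BeautifulSoup is available; the embedded-data regex and the HTML selectors are illustrative placeholders, not Etsy's real markup.

```python
# Illustrative three-tier extraction: JSON-LD -> embedded JS data -> HTML parsing.
import json
import re
from bs4 import BeautifulSoup  # assumed dependency for this sketch

def extract_product(html: str) -> dict | None:
    soup = BeautifulSoup(html, "html.parser")

    # Tier 1: schema.org JSON-LD (fastest and most reliable when present).
    for script in soup.find_all("script", type="application/ld+json"):
        try:
            data = json.loads(script.string or "")
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict) and data.get("@type") == "Product":
            offers = data.get("offers") or {}
            if isinstance(offers, list):
                offers = offers[0] if offers else {}
            return {"title": data.get("name"), "price": offers.get("price")}

    # Tier 2: embedded JavaScript data object (pattern is hypothetical).
    match = re.search(r"window\.__INITIAL_STATE__\s*=\s*(\{.*?\});", html, re.DOTALL)
    if match:
        try:
            return json.loads(match.group(1))
        except json.JSONDecodeError:
            pass

    # Tier 3: plain HTML parsing (generic selector, comprehensive fallback).
    title = soup.select_one("h1")
    if title:
        return {"title": title.get_text(strip=True), "price": None}
    return None
```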
Anti-Bot Protection
Built-in evasion capabilities:
- Camoufox Browser - Stealth Firefox fork with fingerprint randomization
- Residential Proxies - Apify Proxy integration for trusted IP addresses
- GeoIP Matching - Automatic location/timezone alignment
- Human-like Behavior - Realistic browser patterns and timing
Pagination Handling
Automatic pagination through (see the sketch after this list):
- Next page link detection
- URL parameter management
- Page limit enforcement
- Duplicate prevention across pages
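A minimal sketch of that pagination loop, assuming results pages expose a rel="next" link; fetch_page is a hypothetical caller-supplied helper and the listing-link selector is illustrative.

```python
# Illustrative pagination loop: follow "next" links, cap pages, dedupe across pages.
from urllib.parse import urljoin
from bs4 import BeautifulSoup  # assumed dependency for this sketch

def crawl_listing_pages(fetch_page, start_url: str, max_pages: int = 10) -> list[str]:
    """fetch_page(url) -> HTML string; supplied by the caller in this sketch."""
    seen_urls: set[str] = set()
    product_urls: list[str] = []
    url, page = start_url, 0

    while url and page < max_pages:  # page limit enforcement
        soup = BeautifulSoup(fetch_page(url), "html.parser")

        for link in soup.select("a[href*='/listing/']"):  # illustrative selector
            href = link.get("href", "")
            if href and href not in seen_urls:  # duplicate prevention across pages
                seen_urls.add(href)
                product_urls.append(href)

        next_link = soup.select_one("a[rel='next']")  # next page link detection
        url = urljoin(url, next_link["href"]) if next_link else None
        page += 1

    return product_urls
```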
Troubleshooting
No Products Found
- Verify category path is correct (check Etsy website)
- Ensure search query isn't too specific
- Check price filters aren't too restrictive
- Try using a direct URL to test specific pages
Scraping Too Slow
- Reduce maxProducts for faster completion
- Disable enrichDetails for quicker basic scraping
- Check proxy configuration is optimal
- Monitor network connectivity
Incomplete Data
- Enable enrichDetails for comprehensive information
- Verify proxy configuration is active
- Check for Cloudflare challenges (check logs)
- Review debug output in Actor logs
Rate Limiting
- Use Apify Proxy with residential IPs
- Reduce concurrency if experiencing blocks
- Add delays between requests if needed (a backoff sketch follows this list)
- Monitor Actor logs for blocking indicators
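A simple jittered exponential backoff like the hypothetical helper below is a common way to add such delays; it is not a built-in input of this Actor.

```python
# Illustrative retry helper with jittered exponential backoff between requests.
import random
import time

def fetch_with_backoff(fetch, url: str, retries: int = 4, base_delay: float = 1.0):
    """fetch(url) is a hypothetical callable that raises when blocked or rate limited."""
    for attempt in range(retries):
        try:
            return fetch(url)
        except Exception as exc:  # real code should catch the specific 429/block error
            if attempt == retries - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            print(f"Request failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)
```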
Support & Feedback
Need Help?
- Check the Apify Documentation
- Review Actor logs for detailed error messages
- Contact support through Apify Console
- Submit issues for bugs or feature requests
Improvements
We continuously enhance this Actor based on user feedback. Suggestions for improvements are welcome through the Apify platform.
License
This Actor is licensed under the Apache License 2.0. See the LICENSE file for details.
Version History
Version 1.0
- Initial release
- Multi-method extraction (JSON-LD, API, HTML)
- Category and search support
- Price filtering and sorting
- Detail enrichment capability
- Cloudflare bypass with Camoufox
- Automatic pagination
- Duplicate prevention
Built with ❤️ for the Apify community
Careerjet Jobs Scraper
Architecture Benefits
| Strategy | Speed | Reliability | Complexity |
|---|---|---|---|
| API Detection | ⚡⚡⚡⚡⚡ | ✅ | Low |
| HTML Parsing | ⚡⚡⚡ | ✅✅✅ | Medium |
| Camoufox Bypass | ⚡⚡ | ✅✅✅✅ | High |
How It Works
- Page Navigation → Uses Camoufox for transparent Cloudflare bypass
- API Detection → Listens to network traffic for JSON API calls
- Data Extraction →
  - Primary: Extract from captured APIs (if available)
  - Fallback: Parse HTML with intelligent selectors
- Pagination → Automatically follows search results pages
- Data Storage → Pushes clean, validated data to dataset
Performance Characteristics
- Small Runs (< 100 jobs): ~1-2 minutes
- Medium Runs (100-500 jobs): ~3-5 minutes
- Large Runs (500+ jobs): ~10-15 minutes
- Cloudflare Bypass: Automatic, no manual intervention needed
- Proxy Support: Full proxy rotation for reliability
🚀 Quick Start
Running on Apify Platform
- Open the Actor in Apify Console
- Configure your search parameters:
- Enter your search query (e.g., "Software Engineer")
- Set location (e.g., "New York, USA")
- Adjust additional filters as needed
- Click "Start" and wait for results
- Download your data in your preferred format
Input Configuration
{"searchQuery": "data analyst","location": "London, UK","maxJobs": 100,"newJobsOnly": true,"jobType": "permanent","radius": "30","salaryMin": 50000,"sortBy": "date"}
📥 Input Parameters
Configure the scraper behavior with the following parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| searchQuery | String | ✅ Yes | Job title or keywords (e.g., "administrator", "software engineer") |
| location | String | ✅ Yes | Location to search (e.g., "USA", "London, UK", "New York") |
| maxJobs | Integer | ❌ No | Maximum number of jobs to scrape (default: 100, 0 = unlimited) |
| newJobsOnly | Boolean | ❌ No | Show only recently posted jobs (default: true) |
| jobType | String | ❌ No | Employment type: all, permanent, contract, temp, parttime, internship (default: all) |
| radius | String | ❌ No | Search radius: 0, 10, 20, 30, 50, 100, 200 km/miles (default: 50) |
| salaryMin | Integer | ❌ No | Minimum annual salary filter |
| sortBy | String | ❌ No | Sort order: relevance or date (default: relevance) |
| proxyConfiguration | Object | ❌ No | Proxy settings for reliable scraping (recommended) |
📤 Output Data
Each job listing includes the following structured data:
{"title": "Senior Data Analyst","company": "Tech Solutions Inc.","location": "New York, NY","salary": "$80,000 - $100,000 per year","jobType": "Permanent","postedDate": "2 days ago","description": "We are seeking an experienced Data Analyst to join our growing team...","descriptionHtml": "<p>We are seeking an experienced Data Analyst...</p>","descriptionText": "We are seeking an experienced Data Analyst to join our growing team...","url": "https://www.careerjet.com/jobad/...","scrapedAt": "2024-12-20T10:30:00.000Z"}
Data Fields
| Field | Type | Description |
|---|---|---|
| title | String | Job position title |
| company | String | Hiring company name |
| location | String | Job location (city, state, country) |
| salary | String | Salary range or "Not specified" |
| jobType | String | Employment type (Permanent, Contract, etc.) |
| postedDate | String | When the job was posted |
| description | String | Job description and requirements |
| descriptionHtml | String | Raw HTML version of job description |
| descriptionText | String | Plain text version of job description |
| url | String | Direct link to job posting |
| scrapedAt | String | ISO timestamp of data extraction |
📊 Export Formats
Download your scraped data in multiple formats:
- JSON - Structured data for applications
- CSV - Spreadsheet compatible
- Excel - Advanced data analysis
- XML - Enterprise integration
- RSS - Feed subscriptions
- HTML - Web display
💡 Usage Examples
Example 1: Tech Jobs in San Francisco
{"searchQuery": "software engineer","location": "San Francisco, CA","maxJobs": 50,"newJobsOnly": true,"salaryMin": 120000,"sortBy": "date"}
Example 2: Remote Marketing Positions
{"searchQuery": "digital marketing","location": "Remote","jobType": "permanent","radius": "0","maxJobs": 100}
Example 3: Entry-Level Internships
{"searchQuery": "business analyst","location": "London, UK","jobType": "internship","newJobsOnly": true,"maxJobs": 30}
🔧 Integration
Apify API
Access your scraped data programmatically:
```bash
curl "https://api.apify.com/v2/acts/YOUR_ACTOR_ID/runs/last/dataset/items?token=YOUR_API_TOKEN"
```
JavaScript/Node.js
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('YOUR_ACTOR_ID').call({
    searchQuery: 'data scientist',
    location: 'USA',
    maxJobs: 100
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```
Python
```python
from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')

run = client.actor('YOUR_ACTOR_ID').call(run_input={
    'searchQuery': 'python developer',
    'location': 'Berlin, Germany',
    'maxJobs': 50,
})

dataset_items = client.dataset(run['defaultDatasetId']).list_items().items
print(dataset_items)
```
🔄 Automation & Scheduling
Integration Options
- Webhooks - Trigger actions on scraping completion
- Zapier - Connect to 5000+ apps without coding
- Make (Integromat) - Build complex automation workflows
- Google Sheets - Auto-export to spreadsheets
- Slack/Discord - Get notifications with results
🛠️ Technical Details
Scraping Engine Architecture
Network Monitoring & API Detection
The scraper actively monitors all network requests to Careerjet servers and automatically detects internal JSON APIs used to load job listings. When an API is found, the scraper uses it for direct data extraction—achieving 10-50x faster performance than HTML parsing alone.
How API Detection Works (sketched below):
- Listener captures all HTTP requests/responses during page load
- Responses are analyzed for JSON data containing job information
- API endpoints are logged for performance metrics
- Data is extracted directly from structured API responses
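A hedged sketch of that listener pattern using Playwright's Python API (Camoufox exposes a Playwright-compatible interface); the search URL, the content-type filter, and the "jobs" key check are assumptions, not Careerjet's real endpoints.

```python
# Illustrative network listener: capture JSON responses that look like job data.
from playwright.sync_api import sync_playwright

captured_jobs = []

def on_response(response):
    if "application/json" not in response.headers.get("content-type", ""):
        return
    try:
        data = response.json()
    except Exception:
        return
    # Heuristic check for job-like payloads; the real endpoint shape is unknown here.
    if isinstance(data, dict) and "jobs" in data:
        captured_jobs.extend(data["jobs"])
        print(f"API detected: {response.url}")  # logged for performance metrics

with sync_playwright() as p:
    browser = p.firefox.launch()
    page = browser.new_page()
    page.on("response", on_response)  # captures responses during page load
    page.goto("https://www.careerjet.com/search/jobs?s=data+analyst&l=London")  # assumed URL
    browser.close()

print(f"Captured {len(captured_jobs)} jobs via API detection")
```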
HTML Parsing with Fallback Selectors
If no API endpoint is detected, the scraper uses intelligent CSS selectors with multiple fallback patterns to extract job data from the rendered HTML. This ensures compatibility even if page structure changes.
Selector Strategy (illustrated below):
- Primary selectors: Standard class names and semantic HTML
- Data attributes: data-* attributes for structured data
- Fallback patterns: Generic selectors matching common markup patterns
- Multiple selector attempts for each field
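A minimal sketch of that fallback chain, assuming BeautifulSoup; the selector lists and sample markup are illustrative, not the Actor's real ones.

```python
# Illustrative fallback-selector helper: try selectors in priority order.
from bs4 import BeautifulSoup  # assumed dependency for this sketch

def first_match(soup, selectors, default="Not specified"):
    for selector in selectors:
        node = soup.select_one(selector)
        if node and node.get_text(strip=True):
            return node.get_text(strip=True)
    return default

html = "<article><h2 class='job-title'>Senior Data Analyst</h2></article>"
soup = BeautifulSoup(html, "html.parser")

# Primary class selector, then a data attribute, then a generic pattern.
title = first_match(soup, [".job-title", "[data-title]", "article h2"])
print(title)  # -> Senior Data Analyst
```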
Cloudflare Bypass Technology
Camoufox - a privacy-focused Firefox fork - handles Cloudflare protection transparently:
- Randomized browser fingerprinting
- Dynamic OS and screen resolution
- Realistic timezone and locale
- Anti-detection headers
- No manual challenge solving required
Performance Optimizations
| Optimization | Impact | Implementation |
|---|---|---|
| API First | 10-50x faster | Network monitoring |
| Smart Caching | Reduced requests | Browser context reuse |
| Pagination | Complete data | Automatic next page detection |
| Proxy Rotation | Reliability | Apify proxy integration |
| Concurrent Processing | Throughput | Controlled concurrency (1-5 concurrent) |
Data Quality Assurance
- Field Validation - All extracted fields are validated before storage (see the sketch after this list)
- Deduplication - URLs are checked to prevent duplicate entries
- Cleanup - Whitespace trimming and text normalization
- Fallbacks - Missing optional fields default to "Not specified"
- Timestamps - Automatic ISO 8601 timestamps for all records
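As a hedged sketch of those quality steps (field names follow the output schema above; the helper and required-field list are illustrative):

```python
# Illustrative record cleanup: trim text, default missing fields, stamp ISO 8601 time.
from datetime import datetime, timezone

REQUIRED = ("title", "company", "location", "url")  # assumed minimal field set

def normalize_job(raw: dict) -> dict | None:
    record = {k: v.strip() if isinstance(v, str) else v for k, v in raw.items()}
    if not all(record.get(field) for field in REQUIRED):
        return None  # fails field validation; skip instead of storing
    record.setdefault("salary", "Not specified")  # missing optional field fallback
    record["scrapedAt"] = datetime.now(timezone.utc).isoformat()  # ISO 8601 timestamp
    return record

print(normalize_job({
    "title": "  Senior Data Analyst ",
    "company": "Tech Solutions Inc.",
    "location": "New York, NY",
    "url": "https://www.careerjet.com/jobad/example",
}))
```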
⚙️ Configuration Tips
Maximizing Results
- ✅ Use specific keywords for better targeting
- ✅ Enable proxies for reliable scraping
- ✅ Set reasonable max jobs limits for faster runs
- ✅ Use "New Jobs Only" for frequent scraping
- ✅ Combine with location radius for broader coverage
Performance Optimization
- Small Runs (< 100 jobs): Fast results in 1-2 minutes
- Medium Runs (100-500 jobs): Typically 3-5 minutes
- Large Runs (500+ jobs): May take 10-15 minutes
📈 Use Cases & Applications
1. Recruitment & Talent Acquisition
Build a pipeline of qualified candidates by monitoring job postings for competitor companies and identifying in-demand skills.
2. Market Intelligence
Track hiring trends, salary ranges, and skill requirements across industries to inform business strategy.
3. Job Board Aggregation
Automatically populate your job board platform with fresh listings from Careerjet.
4. Career Research
Analyze job market conditions, growth sectors, and location-based opportunities for career guidance.
5. Salary Benchmarking
Gather compensation data across roles and locations for HR analytics and competitive salary structuring.
🛠️ Technical Details
Rate Limiting & Best Practices
- Respectful scraping with built-in delays
- Proxy rotation to avoid blocks
- Error handling and retry logic
- Cloudflare bypass capabilities
Data Quality
- Structured data extraction
- Duplicate detection
- Field validation
- Clean, normalized output
📄 License
This Actor is licensed under the Apache License 2.0. See the LICENSE file for details.
🏷️ Keywords
job scraper, careerjet, employment data, job search, recruitment automation, job listings, career data, hiring trends, job aggregator, salary data, job board, talent acquisition, hr analytics, job market research, employment search