Product Finder Plus: Crawler & Extractor

Pricing

from $1.00 / 1,000 product details

Product Finder Plus is a high-end e-commerce crawler built for websites where standard scraping tools fall short. It is designed to extract structured product data from complex, dynamic e-commerce stores and platforms.

Actor stats

  • Rating: 0.0 (0 reviews)
  • Developer: Datavault (Maintained by Community)
  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 3 days ago


Product Finder Plus - Crawler & Extractor

Recommendation: For simpler sites, we highly recommend trying the Product Finder Crawler & Extractor as a first step. It is generally faster and more cost-effective. This "Plus" version is designed for sites that require more complex solutions, specifically those with dynamic content or advanced anti-bot protections.

Product Finder Plus: Crawler & Extractor is an enhanced, high-performance implementation of our versatile e-commerce scraper. It is designed to extract product information from virtually any website, including modern Single Page Applications (SPAs) and PWA-based stores. It leverages multi-threaded concurrency and sophisticated parsing strategies (JSON-LD, Microdata, and JS-global objects) to ensure maximum data yield with minimal overhead.

Features

  • High-Performance Concurrency: Uses a worker pool to crawl multiple pages in parallel, significantly reducing total execution time.
  • State Persistence & Resume: Automatically saves crawl progress (visited URLs and queue) to the Apify Key-Value Store. If the run is interrupted, it resumes exactly where it left off.
  • Comprehensive Product Discovery: Automatically identifies and extracts products using Schema.org (JSON-LD, Microdata), Meta Tags, and Next.js __NEXT_DATA__.
  • Dynamic JS-Object Extraction: Specifically tuned for ScandiPWA and React stores by extracting data directly from window.actionName and other global JavaScript objects.
  • Multi-Country Proxy Support: Fully integrated with Apify Proxy to bypass geo-blocks and analyze price differences across regions.
  • Pay-per-event (PPE) Integration: Fully compatible with Apify's PPE model, charging only for successful page loads and products found.
  • Configurable Limits: Control maxPagesPerCrawl, maxConcurrency, and maxRetries to manage depth and operational costs.
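The state-persistence behaviour above can be pictured with a small sketch. This is illustrative only, not the actor's internal code: the real actor stores its state in the Apify Key-Value Store, while here a plain `Map` stands in for it, and all names are hypothetical.

```javascript
// Illustrative sketch of crawl-state persistence and resume.
// A plain Map stands in for the Apify Key-Value Store.
const store = new Map();

function saveState(visited, queue) {
  // Sets are not JSON-serializable, so persist the visited set as an array.
  store.set('CRAWL_STATE', JSON.stringify({ visited: [...visited], queue }));
}

function loadState() {
  const raw = store.get('CRAWL_STATE');
  if (!raw) return { visited: new Set(), queue: [] };
  const { visited, queue } = JSON.parse(raw);
  return { visited: new Set(visited), queue };
}

// Simulate a run that is interrupted mid-crawl...
saveState(new Set(['https://a.example/page-1']), ['https://a.example/page-2']);

// ...and a fresh process picking up exactly where it left off.
const resumed = loadState();
console.log(resumed.visited.has('https://a.example/page-1')); // true
console.log(resumed.queue[0]); // https://a.example/page-2
```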

Input Parameters

  • startUrls: An array of URLs to start the crawl.
  • crawlSubpages: If checked (default: true), the crawler will follow links found on the pages.
  • maxPagesPerCrawl: The maximum number of pages to visit in a single run.
  • maxConcurrency: How many pages to process in parallel (Default: 5).
  • maxRetries: Number of times to retry a failed page fetch (Default: 3).
  • minRequestDelay: Minimum time in milliseconds to wait between requests.
  • proxyConfiguration: Apify Proxy configuration. Recommended for residential proxies on protected sites.
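To illustrate how the documented defaults interact with a partial input, here is a small caller-side helper. It is hypothetical (not part of the actor), and simply fills in the defaults listed above before the input is submitted:

```javascript
// Hypothetical helper: apply the documented defaults to a partial input.
function withDefaults(input) {
  return {
    crawlSubpages: true, // documented default
    maxConcurrency: 5,   // documented default
    maxRetries: 3,       // documented default
    ...input,            // caller-provided values win
  };
}

const input = withDefaults({
  startUrls: [{ url: 'https://www.example-store.com' }],
  maxPagesPerCrawl: 50,
});

console.log(input.maxConcurrency); // 5
console.log(input.maxPagesPerCrawl); // 50
```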

Output

The scraper outputs a dataset where each item represents a found product. Fields include:

  • url: The product page URL.
  • name: Product name.
  • description: Product description.
  • sku: Stock Keeping Unit.
  • brand: Brand name.
  • price: Product price.
  • currency: Currency code (e.g., USD, NOK).
  • image: URL of the product image.
  • availability: Availability status (e.g., InStock).
  • gtin: Global Trade Item Number (GTIN) such as EAN, UPC, ISBN.
  • rawSchema: The full extracted object for debugging or extra fields.
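A short example of post-processing the resulting dataset. The sample products are made up; only the field names match the schema documented above:

```javascript
// Hypothetical dataset items using the documented output fields.
const items = [
  { name: 'Mug', price: 9.9, currency: 'USD', availability: 'InStock' },
  { name: 'Cap', price: 14.5, currency: 'USD', availability: 'OutOfStock' },
  { name: 'Tee', price: 129, currency: 'NOK', availability: 'InStock' },
];

// Keep only purchasable products.
const inStock = items.filter((p) => p.availability === 'InStock');

// Find the cheapest in-stock product per currency.
const cheapestByCurrency = {};
for (const p of inStock) {
  const best = cheapestByCurrency[p.currency];
  if (!best || p.price < best.price) cheapestByCurrency[p.currency] = p;
}

console.log(inStock.length); // 2
console.log(cheapestByCurrency.USD.name); // Mug
```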

Sample Input

```json
{
  "startUrls": [
    { "url": "https://www.example-store.com" }
  ],
  "crawlSubpages": true,
  "maxPagesPerCrawl": 200,
  "maxConcurrency": 5,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
```

How it works

  1. Initialization: The crawler loads any existing state and charges the apify-actor-start event.
  2. Concurrent Fetching: Workers pick URLs from the queue and fetch them using a persistent HTTP client.
  3. Advanced Parsing: It parses the page content using various strategies:
    • Schema.org (JSON-LD, Microdata)
    • Next.js and ScandiPWA data structures
    • Global JavaScript objects and Meta Tags
  4. Resilient Storage: Products are pushed to the Apify Dataset, and the crawl state is periodically saved to the Key-Value Store.
  5. Smart Discovery: New links are identified from both HTML anchors and dynamic JavaScript content to ensure deep coverage.
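The JSON-LD strategy in step 3 can be sketched as follows. This is a simplified stand-in for the actor's parser (a regex instead of a full HTML parser) and only handles top-level `@type: Product` nodes:

```javascript
// Simplified sketch of JSON-LD product extraction: scan
// <script type="application/ld+json"> blocks and keep Product nodes.
function extractJsonLdProducts(html) {
  const products = [];
  const re =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  let m;
  while ((m = re.exec(html)) !== null) {
    try {
      const data = JSON.parse(m[1]);
      const nodes = Array.isArray(data) ? data : [data];
      for (const node of nodes) {
        if (node['@type'] === 'Product') products.push(node);
      }
    } catch {
      // Ignore malformed JSON-LD blocks and keep scanning.
    }
  }
  return products;
}

const html = `
  <script type="application/ld+json">
    { "@context": "https://schema.org", "@type": "Product",
      "name": "Example Mug", "sku": "MUG-1" }
  </script>`;

const found = extractJsonLdProducts(html);
console.log(found[0].name); // Example Mug
```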

Common issues when there are no results

  • Blocking: Some sites might require Residential Proxies or specific User-Agent headers.
  • Non-Standard Structures: If a site doesn't use standard markup or common HTML patterns, generic extraction might fail.

Tip

Try setting just one URL of your site in startUrls and setting crawlSubpages to false. Confirm you get results for that single page before scaling up the crawl.
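Assuming the sample domain used earlier (substitute a real product page from your site), a minimal debug input for this tip might look like:

```json
{
  "startUrls": [
    { "url": "https://www.example-store.com" }
  ],
  "crawlSubpages": false,
  "maxPagesPerCrawl": 1
}
```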


Feedback & Improvements

If the results don't align with your goals, please reach out and leave us a message. We use your feedback to continuously refine our extraction engine, helping us make the Product Finder better for everyone.