Findify Best

Pricing

$20.00 / 1,000 Results

gnyselcuk/findify-best

Developed by

selçuk güney

Maintained by Community

🔍 AI-powered e-commerce scraper that extracts detailed product data from any online store. Uses LLMs (Mistral/Gemini) for intelligent extraction, handles pagination, variants & CAPTCHAs. Perfect for price monitoring, market research & competitive analysis. #webscraping #ecommerce

0

Monthly users

1

Runs succeeded

>99%

Last modified

12 hours ago

Findify.best - AI-Powered E-Commerce Data Solution

Version: 1.1

Findify.best is an AI-powered Apify Actor that automatically extracts product data from e-commerce sites. It uses large language models (Mistral AI and Google Gemini) to collect product name, price, description, SKU, brand, and more, without site-specific selectors. It is designed to work on any e-commerce site, including popular ones such as Amazon, Trendyol, and Hepsiburada.

Why Choose Findify.best?

Data Extraction from Any E-Commerce Site: Collect data from ANYWHERE you want, from a single product page to entire category pages.

AI-Powered Solution: No matter the site structure, our AI technology finds and extracts the right data.

Automatic Pagination: Automatically detects and follows "Next Page" links on category pages.

Variant Detection: Automatically extracts product variants like color, size, and model in a structured format.

Bot Protection Bypass: Works even on sites with strong bot protection like Amazon, thanks to Playwright integration.

CAPTCHA Detection and Bypass: Automatically detects CAPTCHA barriers and tries to bypass them with proxy rotation.

Proxy Support: Overcomes geographical restrictions and blocks with Apify Proxy (Datacenter, Residential).

Customizable Output: You decide which data fields you want to extract.

Secure API Key Management: API keys are included, so no extra configuration is needed.

Robust Error Handling: Works continuously with automatic retry and backup mechanisms.

Who Is It Ideal For?

🔹 E-Commerce Businesses: For competitor analysis and price tracking

🔹 Market Researchers: For collecting market trends and product data

🔹 Price Tracking Services: For automatic price monitoring solutions

🔹 Data Analysts: For creating e-commerce datasets

🔹 Marketing Specialists: For product information and lead generation

How to Use?

Input Settings

  • startUrls: List of URLs to scan. Can be product pages or category pages.
  • targetDataFields: Data fields to extract. Options:
    • productName
    • price
    • currency
    • description
    • brand
    • imageUrls
    • availability
    • variants
    • ratingValue
    • reviewCount
    • sku
    • categoryPath
    • specifications
  • enablePagination: When enabled, follows pagination links on category pages.
  • usePlaywright: Recommended for sites with strong bot protection like Amazon.
  • llmProvider: AI model to use:
    • Mistral: Uses Mistral AI API.
    • Gemini: Uses Google Gemini API.
    • Auto (Default): Tries Mistral first, switches to Gemini if unsuccessful.
  • maxConcurrency: Maximum number of pages to process in parallel.

Note: Mistral and Gemini API keys are included, so no extra configuration is needed.
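The settings above can be assembled and sanity-checked before a run. The following is a minimal sketch, not part of the Actor itself: build_input is a hypothetical helper, and the allowed field names are copied from the targetDataFields list above. It leaves out llmProvider because the exact accepted values are not stated here.

```python
import json

# Field names copied from the targetDataFields list above.
ALLOWED_FIELDS = {
    "productName", "price", "currency", "description", "brand",
    "imageUrls", "availability", "variants", "ratingValue",
    "reviewCount", "sku", "categoryPath", "specifications",
}


def build_input(urls, fields, enable_pagination=False, use_playwright=False):
    """Hypothetical helper: return a run-input dict, rejecting unlisted fields."""
    unknown = sorted(set(fields) - ALLOWED_FIELDS)
    if unknown:
        raise ValueError(f"Unknown targetDataFields: {unknown}")
    if not urls:
        raise ValueError("startUrls must contain at least one URL")
    return {
        "startUrls": [{"url": u} for u in urls],
        "targetDataFields": list(fields),
        "enablePagination": enable_pagination,
        "usePlaywright": use_playwright,
    }


run_input = build_input(
    ["https://www.amazon.com/s?k=laptops"],
    ["productName", "price", "currency"],
    enable_pagination=True,
    use_playwright=True,
)
print(json.dumps(run_input, indent=2))
```

Catching a misspelled field name locally is cheaper than discovering it after a paid run.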

Output Data

The actor saves the extracted data to the Apify Dataset. Each item represents data extracted from a URL.

{
  "scrapedUrl": "https://...", // Processed URL
  "llmUsed": "Mistral", // AI model used: "Mistral" or "Gemini"
  "extractionTimestamp": "YYYY-MM-DDTHH:mm:ss.sssZ", // Timestamp of extraction attempt
  // --- Extracted Data Fields (based on targetDataFields input) ---
  "productName": "Example Product",
  "price": 29.99,
  "currency": "USD",
  "description": "This is a great product...",
  "sku": "EXAMPLE-123",
  "brand": "ExampleBrand",
  "imageUrls": ["https://.../img1.jpg", "https://.../img2.jpg"],
  "availability": "In Stock",
  "variants": [
    {
      "name": "Small Red",
      "size": "S",
      "color": "Red",
      "price": 19.99,
      "currency": "USD",
      "availability": "In Stock",
      "sku": "PROD-S-RED"
    }
  ],
  "ratingValue": 4.5,
  "reviewCount": 105,
  // --- Status & Error ---
  "status": "Success", // or "Failed - ..."
  "error": null // or "Error message..."
}
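Downstream code usually needs to separate failed items from successful ones and to turn nested variants into flat rows. The sketch below assumes items shaped like the example above. The two sample items, the "Failed - timeout" status text, and the helper names are made up for illustration.

```python
import json

# Two hypothetical dataset items shaped like the example above.
items = json.loads("""
[
  {"scrapedUrl": "https://example.com/p/1", "status": "Success", "error": null,
   "productName": "Example Product", "price": 29.99, "currency": "USD",
   "variants": [
     {"name": "Small Red", "size": "S", "color": "Red", "price": 19.99,
      "currency": "USD", "availability": "In Stock", "sku": "PROD-S-RED"}
   ]},
  {"scrapedUrl": "https://example.com/p/2", "status": "Failed - timeout",
   "error": "Error message..."}
]
""")


def split_by_status(items):
    """Return (successful, failed) items based on the status field."""
    ok = [i for i in items if i.get("status") == "Success"]
    failed = [i for i in items if i.get("status") != "Success"]
    return ok, failed


def variant_rows(item):
    """Flatten one item into one row per variant (one row if it has none)."""
    base = {"url": item["scrapedUrl"], "productName": item.get("productName")}
    variants = item.get("variants") or [{}]
    return [
        {
            **base,
            # Variant values win; fall back to the product-level value.
            "sku": v.get("sku", item.get("sku")),
            "price": v.get("price", item.get("price")),
            "currency": v.get("currency", item.get("currency")),
        }
        for v in variants
    ]


ok, failed = split_by_status(items)
rows = [row for item in ok for row in variant_rows(item)]
print(len(ok), "succeeded,", len(failed), "failed,", len(rows), "rows")
```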

Usage Tips

  • Accuracy: Data extraction accuracy depends on HTML quality and the selected model. Results may vary from site to site.
  • CAPTCHA Handling: The actor can detect common CAPTCHA challenges and tries to bypass them using proxy rotation or Playwright. Success rate varies depending on the target website.
  • Playwright Integration: When usePlaywright is enabled, the actor helps bypass complex bot protection mechanisms by simulating real user behavior. This increases the success rate for sites with strong anti-bot measures like Amazon.
  • Pagination: When enablePagination is enabled, the actor tries to detect and follow common pagination patterns (Next links, numbered pagination). This feature works best on standard e-commerce sites.
  • Compliance: It is your responsibility to ensure that your use of this actor complies with the terms of service of the websites you scan and the LLM providers. Avoid collecting personal data.
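To make the pagination tip concrete, here is one way a "Next" link could be detected with the Python standard library. This is an illustrative sketch only, not the Actor's actual implementation. The class name, the set of link labels, and the sample URLs are all assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Link texts treated as "next page" labels (an assumed, non-exhaustive set).
NEXT_LABELS = {"next", "next page", "›", "»"}


class NextLinkFinder(HTMLParser):
    """Remember the href of the first anchor that looks like a next-page link."""

    def __init__(self):
        super().__init__()
        self.next_href = None
        self._current_href = None  # href of the <a> currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.next_href is not None:
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        # rel="next" is an explicit signal and needs no text check.
        if "next" in (attrs.get("rel") or "").lower().split():
            self.next_href = href
            return
        self._current_href = href
        self._text = []

    def handle_data(self, data):
        if self._current_href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            label = "".join(self._text).strip().lower()
            if self.next_href is None and label in NEXT_LABELS:
                self.next_href = self._current_href
            self._current_href = None


def find_next_url(html, page_url):
    """Return the absolute URL of the next page, or None if none is found."""
    finder = NextLinkFinder()
    finder.feed(html)
    return urljoin(page_url, finder.next_href) if finder.next_href else None


sample = '<a href="/c?page=1">1</a> <a href="/c?page=2">Next</a>'
print(find_next_url(sample, "https://shop.example/c"))
```

Sites that load more products through JavaScript or infinite scroll expose no such link, which is one reason pagination works best on standard storefronts.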

Example Usage Scenarios

Scenario 1: Basic Product Data Extraction

To extract basic product information from specific product URLs:

{
  "startUrls": [
    { "url": "https://www.amazon.com/Apple-iPhone-13-128GB-Blue/dp/B09G9HD6PD" },
    { "url": "https://www.bestbuy.com/site/samsung-galaxy-s21-5g-128gb-phantom-gray-unlocked/6448113.p" }
  ],
  "targetDataFields": ["productName", "price", "currency", "brand", "imageUrls"],
  "usePlaywright": true
}

Scenario 2: Category Page Scanning with Pagination

To extract products from a category page and every paginated page that follows it:

{
  "startUrls": [
    { "url": "https://www.amazon.com/s?k=laptops" }
  ],
  "targetDataFields": ["productName", "price", "currency", "availability"],
  "enablePagination": true,
  "usePlaywright": true
}

Scenario 3: Detailed Product Analysis with Variants

For a comprehensive analysis of products including their variants:

{
  "startUrls": [
    { "url": "https://www.amazon.com/Apple-iPhone-13-128GB-Blue/dp/B09G9HD6PD" }
  ],
  "targetDataFields": ["productName", "price", "currency", "description", "brand", "variants", "ratingValue", "reviewCount"],
  "usePlaywright": true
}

Scenario 4: Scraping Amazon with Bot Protection Bypass

To extract product data from Amazon, which has sophisticated bot protection:

{
  "startUrls": [
    { "url": "https://www.amazon.com/Apple-iPad-10-9-inch-Wi-Fi-64GB/dp/B09G9FPHY6" }
  ],
  "targetDataFields": ["productName", "price", "currency", "description", "brand", "variants"],
  "usePlaywright": true,
  "useApifyProxy": true,
  "proxyGroups": ["RESIDENTIAL"]
}
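Any of the scenario inputs above can also be submitted from code instead of the Apify Console. The sketch below assumes the official apify-client Python package and an API token of your own. run_findify is a hypothetical wrapper, and the Actor ID is taken from the store listing.

```python
ACTOR_ID = "gnyselcuk/findify-best"

# Input copied from Scenario 4 above.
SCENARIO_4_INPUT = {
    "startUrls": [
        {"url": "https://www.amazon.com/Apple-iPad-10-9-inch-Wi-Fi-64GB/dp/B09G9FPHY6"}
    ],
    "targetDataFields": ["productName", "price", "currency",
                         "description", "brand", "variants"],
    "usePlaywright": True,
    "useApifyProxy": True,
    "proxyGroups": ["RESIDENTIAL"],
}


def run_findify(client, run_input):
    """Start the Actor, wait for it to finish, and return its dataset items.

    `client` is expected to behave like apify_client.ApifyClient.
    """
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage (requires `pip install apify-client` and a valid token):
#
#   from apify_client import ApifyClient
#   client = ApifyClient("<YOUR_APIFY_TOKEN>")
#   for item in run_findify(client, SCENARIO_4_INPUT):
#       print(item.get("productName"), item.get("price"))
```

Passing the client in as an argument keeps the function easy to test with a stub and avoids hard-coding a token.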

Quick Start

  1. Configure the actor

    • Add the product or category pages you want to scan to the startUrls field.
    • Select the data fields you want to extract from the targetDataFields field.
    • Enable the usePlaywright option for sites with strong anti-bot measures.
    • Adjust other settings like maxConcurrency if desired.
  2. Run the actor

    • Click the "Start" button to begin the scanning process.
    • Monitor the run logs to see progress and potential issues.
    • When completed, download your data in your preferred format (JSON, CSV, Excel).

Troubleshooting

If you encounter issues with the actor, try these solutions:

  1. Browser Automation Issues:

    • Enable the usePlaywright option - this significantly improves scanning for complex websites.
    • Try using a proxy - enable the useApifyProxy option and select "RESIDENTIAL" for proxyGroups.
  2. Data Extraction Issues:

    • Try a different LLM provider - change the llmProvider setting.
    • Request fewer data fields - shorten the targetDataFields list.
  3. Pagination Issues:

    • Some sites use non-standard pagination - in this case, manually add each page to the startUrls list.
  4. CAPTCHA Issues:

    • Use a residential proxy - select "RESIDENTIAL" for proxyGroups.
    • Increase the captchaMaxAttempts value.
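For the non-standard pagination case above, the startUrls list can be generated rather than typed by hand. This is a minimal sketch that assumes the target site selects pages through a query parameter. The parameter name "page", the helper name, and the example URL are assumptions, so check how the site actually numbers its pages.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def paged_start_urls(base_url, first_page, last_page, page_param="page"):
    """Build startUrls entries for pages first_page..last_page (inclusive)."""
    parts = urlparse(base_url)
    query = dict(parse_qsl(parts.query))
    urls = []
    for page in range(first_page, last_page + 1):
        query[page_param] = str(page)  # overwrite any existing page value
        urls.append({"url": urlunparse(parts._replace(query=urlencode(query)))})
    return urls


start_urls = paged_start_urls("https://shop.example/laptops?sort=price", 1, 3)
for entry in start_urls:
    print(entry["url"])
```

With every page listed explicitly, enablePagination can stay off for that run.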

What Can You Do with Findify.best?

🛍️ Competitor Analysis: Automatically track your competitors' product prices, stock status, and features.

📊 Market Research: Conduct market analyses by collecting all products and prices in a specific product category.

💰 Price Monitoring: Regularly track prices of specific products to catch price changes.

📱 Product Comparison: Compare prices and conditions offered by different sellers for the same product.

🔍 Data Mining: Create structured datasets from e-commerce sites.

🤖 Automatic Catalog Creation: Create digital catalogs by extracting bulk product information.
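As an illustration of the price monitoring use case, two runs over the same URLs can be compared to find products whose price moved. The sketch assumes each item carries the scrapedUrl and price fields from the output format. The function name and sample data are hypothetical.

```python
def price_changes(previous, current):
    """Compare two runs keyed by scrapedUrl and report price movements."""
    old = {i["scrapedUrl"]: i for i in previous if i.get("price") is not None}
    changes = []
    for item in current:
        before = old.get(item["scrapedUrl"])
        if before is None or item.get("price") is None:
            continue  # new product or failed extraction: nothing to compare
        if item["price"] != before["price"]:
            changes.append({
                "url": item["scrapedUrl"],
                "old": before["price"],
                "new": item["price"],
                "delta": round(item["price"] - before["price"], 2),
            })
    return changes


yesterday = [
    {"scrapedUrl": "https://example.com/p/1", "price": 29.99},
    {"scrapedUrl": "https://example.com/p/2", "price": 10.0},
]
today = [
    {"scrapedUrl": "https://example.com/p/1", "price": 24.99},
    {"scrapedUrl": "https://example.com/p/2", "price": 10.0},
]
print(price_changes(yesterday, today))
```

The comparison ignores currency, so it is only meaningful when both runs return prices in the same currency.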

Findify.best is a reliable, fast, and easy-to-use solution for your e-commerce data needs. Try it now and collect your data effortlessly!

Recent Updates

Version 1.1

  • Added Playwright integration to extract data from sites with strong bot protection like Amazon
  • Developed automatic pagination support for category pages
  • Added advanced detection mechanism for product variants (color, size, model)
  • Improved CAPTCHA detection and bypass mechanisms
  • Strengthened error handling and retry mechanisms
  • Updated Gemini API to use the latest model
  • Improved CAPTCHA detection and handling
  • Enhanced variant detection and extraction
  • Added support for running with local IP (without proxy) for testing purposes
  • Fixed various bugs and improved error handling

Version 1.0

  • Initial release with basic LLM-powered extraction
  • Support for Mistral and Gemini APIs
  • HTML cleaning and preprocessing
  • Pagination support
  • CAPTCHA detection with proxy rotation

Pricing

Pricing model

Pay per result 

This Actor is paid per result. You are not charged for Apify platform usage; you pay only a fixed price per 1,000 items in the Actor's output dataset.

Price per 1,000 items

$20.00
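A rough budget for a run can be estimated from the listed rate. The sketch below assumes the charge scales proportionally with the number of dataset items, which is an assumption. Check your Apify billing page for the exact rounding rules.

```python
from decimal import Decimal

PRICE_PER_1000 = Decimal("20.00")  # rate from the listing above, in USD


def estimated_cost(items):
    """Estimate the charge in USD for a given number of result items."""
    return (PRICE_PER_1000 * items / 1000).quantize(Decimal("0.01"))


for n in (150, 1000, 2500):
    print(n, "items:", estimated_cost(n), "USD")
```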