Amazon AI Product Intelligence
Under maintenance

Pricing

Pay per event


Developed by

bySeitz AI & Automation

Maintained by Community

Amazon AI Product Intelligence Stream is an advanced, AI-driven Actor designed to provide deep, structured intelligence from the global Amazon marketplace. It is built for targeted competitive and market analysis on e-commerce products.

Last modified

a day ago

🧠 Amazon AI Product Intelligence Stream

This Actor performs advanced, structured data extraction and synthesis on Amazon product pages. It uses Playwright for targeted, stealthy scraping and leverages large language models (LLMs) via LangChain's structured output feature to convert raw HTML product details into actionable, clean JSON data and a final business report.

The Actor processes pages in two tiers: Crawl Only Mode, and Local Structured AI Mode (Tier 2), which adds LLM-based structured extraction and synthesis on top of the crawl.
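
A minimal sketch of how the two tiers could fit together, not the Actor's actual code. The `process_product` function, the `url`, `content`, and `error` fields, and the `CRAWL_ONLY` label are hypothetical; only the `AI_SYNTHESIS` and `CRAWL_ONLY_FALLBACK` tier names come from this README.

```python
from typing import Any, Callable, Dict


def process_product(
    url: str,
    page_text: str,
    enable_ai: bool,
    extract: Callable[[str], Dict[str, Any]],
) -> Dict[str, Any]:
    """Build one dataset item for a product page."""
    if not enable_ai:
        # Crawl Only Mode: no LLM call. "CRAWL_ONLY" is a hypothetical label.
        return {"_tier": "CRAWL_ONLY", "url": url, "content": page_text}
    try:
        # Tier 2: `extract` stands in for the LLM structured-extraction call.
        structured = extract(page_text)
    except Exception as exc:  # e.g. API error or schema validation error
        # Keep the raw content so the page can still be reviewed manually.
        return {
            "_tier": "CRAWL_ONLY_FALLBACK",
            "url": url,
            "content": page_text,
            "error": str(exc),
        }
    return {"_tier": "AI_SYNTHESIS", "url": url, **structured}
```

Passing the extractor in as a callable keeps the fallback path testable without a network call or an API key.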


🚀 Key Features and Improvements

  • Local Structured AI Mode (Tier 2): Replaced the unstable external ChatKit API workflow with reliable local structured extraction using LangChain and OpenAI. This eliminates HTTP 404 errors and ensures predictable JSON output.
  • Dynamic Schema Selection: Automatically switches the LLM's output schema based on the user's Analysis Objective (Prompt Selection). This provides precise, dedicated structured output for technical specifications (AmazonTechnicalSpecs) and general data (AmazonProductData).
  • Complete Data Output: The final dataset now includes the single Aggregate Synthesis Report plus individual Structured Item Reports for every successfully processed product, offering both macro and micro data views.
  • Price & ASIN Robustness: Includes advanced Playwright selectors and injection logic to maximize the capture rate of dynamic data like Price and ASIN before passing content to the LLM for structuring.
  • Improved User Experience: The input interface is optimized with emojis and user-friendly editors, including a multi-select for search queries (stringList) and a dropdown for LLM model selection (including GPT-5) and Amazon domains.
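
The dynamic schema selection described above can be sketched roughly as follows. This is an illustration under assumptions: the Actor uses Pydantic models, while this sketch uses stdlib dataclasses as stand-ins so it runs without dependencies. The `screen_size` and `processor_model` fields, the `SCHEMA_BY_OBJECTIVE` mapping, and the `select_schema` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AmazonProductData:
    """Stand-in for the general-purpose schema."""
    product_title: str
    asin: Optional[str] = None
    price_with_currency: Optional[str] = None


@dataclass
class AmazonTechnicalSpecs:
    """Stand-in for the technical-specs schema (spec fields are hypothetical)."""
    product_title: str
    asin: Optional[str] = None
    screen_size: Optional[str] = None
    processor_model: Optional[str] = None


# Assumed mapping: only technical_specs gets a dedicated schema here.
SCHEMA_BY_OBJECTIVE = {"technical_specs": AmazonTechnicalSpecs}


def select_schema(prompt_selection: str) -> type:
    """Pick the output schema for an analysis objective, defaulting to general data."""
    return SCHEMA_BY_OBJECTIVE.get(prompt_selection, AmazonProductData)
```

With Pydantic models in place of the dataclasses, the selected class could be handed to LangChain's structured output feature; that wiring is omitted here because this README does not show it.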

โš™๏ธ Configuration and Input

The Actor's input is defined in input_schema.json, which presents the fields in three sections in the input UI. Two of them are documented below:

1. ๐Ÿ” Search Configuration

| Field | Type | Description |
| --- | --- | --- |
| amazonSearchQueries | array (stringList) | The keywords to search for (one query per line). |
| amazonDomain | string (select) | The Amazon marketplace to target (e.g., com, co.uk, jp). |
| maxTotalProducts | integer | Max total unique product pages to process in the run. |
| maxProductsPerPage | integer | Max product links to pull from each search result page. |

2. 🧠 Analysis & AI Control

| Field | Type | Description |
| --- | --- | --- |
| enableAISynthesis | boolean | If true (default): Runs the full LLM-based structured extraction and synthesis (Tier 2). |
| promptSelection | string (select) | Defines the analysis objective (e.g., core_summary, technical_specs, customer_sentiment, or custom_input). |
| customPrompt | string (textarea) | Used by the LLM when custom_input is selected (e.g., "Extract the screen size and processor model."). |
| llmModel | string (select) | Selects the GPT model (e.g., gpt-4o-mini, gpt-4o, gpt-5) for all extraction and synthesis tasks. |
| verboseLog | boolean | Enables detailed debug logging for troubleshooting. |
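
As an illustration, a run input covering the fields above might be assembled like this. The field names come from the tables; the values, the `build_run_input` helper, and the rule that `customPrompt` must be non-empty for `custom_input` are assumptions. The exact strings accepted by the select fields are defined in input_schema.json and may differ.

```python
import json
from typing import Any, Dict, List


def build_run_input(
    queries: List[str],
    objective: str = "core_summary",
    custom_prompt: str = "",
) -> Dict[str, Any]:
    """Assemble an input object for one run (values are illustrative)."""
    # Assumed rule: a custom objective is only useful with a prompt.
    if objective == "custom_input" and not custom_prompt.strip():
        raise ValueError("customPrompt is needed when promptSelection is custom_input")
    run_input: Dict[str, Any] = {
        "amazonSearchQueries": queries,
        "amazonDomain": "com",
        "maxTotalProducts": 10,
        "maxProductsPerPage": 5,
        "enableAISynthesis": True,
        "promptSelection": objective,
        "llmModel": "gpt-4o-mini",
        "verboseLog": False,
    }
    if custom_prompt:
        run_input["customPrompt"] = custom_prompt
    return run_input


if __name__ == "__main__":
    print(json.dumps(build_run_input(["mechanical keyboard"]), indent=2))
```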

📊 Output Structure

The Actor pushes multiple JSON objects to the default Dataset, ensuring a comprehensive output:

Item 1: Final Synthesis Report (_tier: AI_SYNTHESIS_REPORT)

This is the single aggregate summary of all products processed for the original query.

| Field | Description |
| --- | --- |
| report | The comprehensive, synthesized final business summary generated by the LLM. |
| sources | Array of all product URLs used in the report. |
| extra_specs_json | A single JSON string summarizing the most common miscellaneous specifications found across all products. |

Subsequent Items: Individual Product Reports (_tier: AI_SYNTHESIS)

These contain the structured data extracted from each successfully processed product page.

| Field | Description |
| --- | --- |
| product_title | The title of the product. |
| asin | The product's ASIN. |
| report | A short, human-readable summary of the structured data extracted for this specific product. |
| core_data_point / price_with_currency / etc. | The specific structured data fields defined by the chosen analysis objective. |

Fallback Items (_tier: CRAWL_ONLY_FALLBACK)

These items are pushed if the LLM extraction fails (e.g., API error or Pydantic error), providing the raw HTML/Markdown content for manual review.
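
Because one dataset mixes three kinds of items, a consumer will usually split them by `_tier` first. The sketch below shows one way to do that on items already loaded as Python dictionaries. The `group_by_tier` and `decode_extra_specs` helpers are hypothetical, and it is an assumption that `extra_specs_json` decodes to a JSON object.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List


def group_by_tier(items: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Bucket dataset items by their _tier marker."""
    groups: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for item in items:
        groups[item.get("_tier", "UNKNOWN")].append(item)
    return dict(groups)


def decode_extra_specs(report_item: Dict[str, Any]) -> Dict[str, Any]:
    """Parse extra_specs_json, returning {} when it is missing or malformed."""
    raw = report_item.get("extra_specs_json")
    if not raw:
        return {}
    try:
        decoded = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    # Assumption: the string encodes a JSON object, not a list or scalar.
    return decoded if isinstance(decoded, dict) else {}
```

Items in the `CRAWL_ONLY_FALLBACK` bucket are the ones to inspect manually or re-run.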


๐Ÿ› ๏ธ Developer Notes

  • Model IDs: The _initialize_llm function automatically strips the redundant "openai/" prefix from the model name selected in the input UI to prevent Invalid Model ID errors when calling the OpenAI API.
  • Schema Handling: scraper_logic.py dynamically selects the Pydantic model (AmazonProductData, AmazonTechnicalSpecs, FinalReportSchema) and converts model instances to Python dictionaries with .model_dump(), which keeps the data flow clean and prevents Pydantic validation errors during aggregation.
  • Dependencies: The requirements.txt includes necessary asynchronous libraries (playwright, httpx) and the LangChain/OpenAI stack (langchain-openai) for robust execution.
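
The model ID handling in the first note can be sketched as below. This is not the body of `_initialize_llm`, which this README does not show; `normalize_model_id` is a hypothetical helper illustrating only the prefix stripping.

```python
def normalize_model_id(model_name: str) -> str:
    """Drop a leading "openai/" so the OpenAI API receives a bare model ID."""
    prefix = "openai/"
    name = model_name.strip()
    if name.startswith(prefix):
        return name[len(prefix):]
    return name
```

For example, `normalize_model_id("openai/gpt-4o-mini")` returns `"gpt-4o-mini"`, and a name without the prefix passes through unchanged.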