Amazon AI Product Intelligence
Pricing
Pay per event
Under maintenance
Amazon AI Product Intelligence Stream is an advanced, AI-driven Actor designed to provide deep, structured intelligence from the global Amazon marketplace. It is built for targeted competitive and market analysis on e-commerce products.
Last modified
a day ago
Amazon AI Product Intelligence Stream
This Actor performs advanced, structured data extraction and synthesis on Amazon product pages. It uses Playwright for targeted, stealthy scraping and leverages large language models (LLMs) via LangChain's structured output feature to convert raw HTML product details into actionable, clean JSON data and a final business report.
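The Actor uses a two-tier processing system: Crawl Only Mode, which outputs the scraped page content, and Local Structured AI Mode (Tier 2), which runs LLM-based structured extraction and synthesis. If AI extraction fails for a product, the Actor falls back to crawl-only output for that item.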
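The two-tier flow can be pictured roughly as follows. This is a minimal sketch, not the Actor's actual code: `process_product`, `ai_extract`, and the `url`, `content`, and `error` fields are hypothetical names, and the `CRAWL_ONLY` label for runs with AI disabled is an assumption (only `AI_SYNTHESIS` and `CRAWL_ONLY_FALLBACK` are named in this README).

```python
from typing import Callable


def process_product(
    url: str,
    raw_content: str,
    enable_ai: bool,
    ai_extract: Callable[[str], dict],
) -> dict:
    """Return one dataset item for a crawled product page."""
    if not enable_ai:
        # Crawl Only Mode. The "CRAWL_ONLY" label is an assumption.
        return {"_tier": "CRAWL_ONLY", "url": url, "content": raw_content}
    try:
        # Tier 2: structured extraction (ai_extract stands in for the LLM call).
        structured = ai_extract(raw_content)
    except Exception as exc:
        # On an API or validation failure, keep the raw content for manual review.
        return {
            "_tier": "CRAWL_ONLY_FALLBACK",
            "url": url,
            "content": raw_content,
            "error": str(exc),
        }
    return {"_tier": "AI_SYNTHESIS", "url": url, **structured}
```

Passing the extractor in as a callable keeps the fallback logic testable without a real LLM call.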
Key Features and Improvements
- Local Structured AI Mode (Tier 2): Replaced the unstable external ChatKit API workflow with reliable local structured extraction using LangChain and OpenAI. This eliminates `HTTP 404` errors and ensures predictable JSON output.
- Dynamic Schema Selection: Automatically switches the LLM's output schema based on the user's Analysis Objective (Prompt Selection). This provides precise, dedicated structured output for technical specifications (`AmazonTechnicalSpecs`) and general data (`AmazonProductData`).
- Complete Data Output: The final dataset now includes the single Aggregate Synthesis Report plus individual Structured Item Reports for every successfully processed product, offering both macro and micro data views.
- Price & ASIN Robustness: Includes advanced Playwright selectors and injection logic to maximize the capture rate of dynamic data like Price and ASIN before passing content to the LLM for structuring.
- Improved User Experience: The input interface is optimized with emojis and user-friendly editors, including a multi-select for search queries (`stringList`) and a dropdown for LLM model selection (including GPT-5) and Amazon domains.
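Dynamic schema selection can be sketched as a lookup from the analysis objective to a schema class. The version below is an illustration under assumptions: the Actor uses Pydantic models, while plain dataclasses stand in here to keep the example dependency-free; the field lists are guesses (only `product_title`, `asin`, and `price_with_currency` appear in this README, and `specs` is hypothetical); and `select_schema` is a made-up helper name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AmazonProductData:
    # General-purpose fields; the real model's fields may differ.
    product_title: str
    asin: Optional[str] = None
    price_with_currency: Optional[str] = None


@dataclass
class AmazonTechnicalSpecs:
    # Spec-oriented fields; "specs" is a hypothetical field name.
    product_title: str
    asin: Optional[str] = None
    specs: Optional[dict] = None


# Objectives with a dedicated schema; everything else uses the general one.
SCHEMA_BY_OBJECTIVE = {"technical_specs": AmazonTechnicalSpecs}


def select_schema(prompt_selection: str) -> type:
    """Pick the output schema class for the chosen analysis objective."""
    return SCHEMA_BY_OBJECTIVE.get(prompt_selection, AmazonProductData)
```

With LangChain, the selected class would typically be handed to the chat model's `with_structured_output` method; whether the Actor wires it up exactly this way is not stated here.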
Configuration and Input
The Actor's input is defined in `input_schema.json`, which renders as a user-friendly form divided into three sections. Two of them are described here:
1. Search Configuration
| Field | Type | Description |
|---|---|---|
| `amazonSearchQueries` | array (stringList) | The keywords to search for (one query per line). |
| `amazonDomain` | string (select) | The Amazon marketplace to target (e.g., `com`, `co.uk`, `jp`). |
| `maxTotalProducts` | integer | Max total unique product pages to process in the run. |
| `maxProductsPerPage` | integer | Max product links to pull from each search result page. |
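One plausible way the two limits interact is sketched below. This is an assumption for illustration, not the Actor's documented behavior: `select_product_links` is a hypothetical helper that caps the links taken from each search result page, skips duplicates, and stops once the run-wide total is reached.

```python
def select_product_links(
    pages: list[list[str]],
    max_total_products: int,
    max_products_per_page: int,
) -> list[str]:
    """Pick unique product links while honoring both limits."""
    if max_total_products <= 0:
        return []
    seen: set[str] = set()
    selected: list[str] = []
    for links in pages:
        # Take at most max_products_per_page links from this result page.
        for link in links[:max_products_per_page]:
            if link in seen:
                continue  # count each product page only once
            seen.add(link)
            selected.append(link)
            if len(selected) >= max_total_products:
                return selected
    return selected
```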
2. Analysis & AI Control
| Field | Type | Description |
|---|---|---|
| `enableAISynthesis` | boolean | If true (default): Runs the full LLM-based structured extraction and synthesis (Tier 2). |
| `promptSelection` | string (select) | Defines the analysis objective (e.g., `core_summary`, `technical_specs`, `customer_sentiment`, or `custom_input`). |
| `customPrompt` | string (textarea) | Used by the LLM when `custom_input` is selected (e.g., "Extract the screen size and processor model."). |
| `llmModel` | string (select) | Selects the GPT model (e.g., `gpt-4o-mini`, `gpt-4o`, `gpt-5`) for all extraction and synthesis tasks. |
| `verboseLog` | boolean | Enables detailed debug logging for troubleshooting. |
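As an illustration, a complete run input built from the fields above might look like the dictionary below. The values are made-up examples, and `check_input` is a hypothetical pre-flight helper whose rules are assumptions rather than constraints taken from `input_schema.json`.

```python
# Example values only; the field names come from the tables above.
run_input = {
    "amazonSearchQueries": ["14 inch laptop", "gaming laptop"],
    "amazonDomain": "com",
    "maxTotalProducts": 20,
    "maxProductsPerPage": 5,
    "enableAISynthesis": True,
    "promptSelection": "custom_input",
    "customPrompt": "Extract the screen size and processor model.",
    "llmModel": "gpt-4o-mini",
    "verboseLog": False,
}


def check_input(data: dict) -> list[str]:
    """Return a list of problems found in the input (empty if none)."""
    problems = []
    if not data.get("amazonSearchQueries"):
        problems.append("amazonSearchQueries must contain at least one query")
    for key in ("maxTotalProducts", "maxProductsPerPage"):
        value = data.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or value < 1:
            problems.append(f"{key} must be a positive integer")
    if data.get("promptSelection") == "custom_input" and not data.get("customPrompt"):
        problems.append("customPrompt is required when promptSelection is custom_input")
    return problems


problems = check_input(run_input)
```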
Output Structure
The Actor pushes multiple JSON objects to the default Dataset. Each object carries a `_tier` field that identifies its type:
Item 1: Final Synthesis Report (_tier: AI_SYNTHESIS_REPORT)
This is the single aggregate summary of all products processed for the original query.
| Field | Description |
|---|---|
| `report` | The comprehensive, synthesized final business summary generated by the LLM. |
| `sources` | Array of all product URLs used in the report. |
| `extra_specs_json` | A single JSON string summarizing the most common miscellaneous specifications found across all products. |
Subsequent Items: Individual Product Reports (_tier: AI_SYNTHESIS)
These contain the raw, structured data extracted from each successful product page.
| Field | Description |
|---|---|
| `product_title` | The title of the product. |
| `asin` | The product's ASIN. |
| `report` | A short, human-readable summary of the structured data extracted for this specific product. |
| `core_data_point`, `price_with_currency`, etc. | The specific structured data fields defined by the chosen analysis objective. |
Fallback Items (_tier: CRAWL_ONLY_FALLBACK)
These items are pushed if the LLM extraction fails (e.g., API error or Pydantic error), providing the raw HTML/Markdown content for manual review.
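Because all item types land in the same dataset, a consumer will usually want to separate them by `_tier` first. The sketch below assumes the items have already been downloaded as a list of dictionaries; `split_by_tier` is a hypothetical helper and the sample items (including the placeholder ASIN and URL) are invented for illustration.

```python
import json
from collections import defaultdict


def split_by_tier(items: list[dict]) -> dict[str, list[dict]]:
    """Group dataset items by their _tier value."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for item in items:
        groups[item.get("_tier", "UNKNOWN")].append(item)
    return dict(groups)


# Invented sample items shaped like the tables above.
items = [
    {
        "_tier": "AI_SYNTHESIS_REPORT",
        "report": "...",
        "sources": ["https://www.amazon.com/dp/B0EXAMPLE1"],
        "extra_specs_json": '{"color": "black"}',
    },
    {"_tier": "AI_SYNTHESIS", "product_title": "Example", "asin": "B0EXAMPLE1", "report": "..."},
    {"_tier": "CRAWL_ONLY_FALLBACK", "content": "<html>...</html>"},
]

groups = split_by_tier(items)
summary = groups["AI_SYNTHESIS_REPORT"][0]
# extra_specs_json is a JSON string, so decode it before use.
extra_specs = json.loads(summary["extra_specs_json"])
```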
Developer Notes
- Model IDs: The `_initialize_llm` function automatically strips the redundant `"openai/"` prefix from the model name selected in the input UI to prevent Invalid Model ID errors when calling the OpenAI API.
- Schema Handling: The `scraper_logic.py` module dynamically selects and converts between Pydantic models (`AmazonProductData`, `AmazonTechnicalSpecs`, `FinalReportSchema`) and Python dictionaries using `.model_dump()` to ensure clean data flow and prevent Pydantic validation errors during aggregation.
- Dependencies: The `requirements.txt` file includes necessary asynchronous libraries (`playwright`, `httpx`) and the LangChain/OpenAI stack (`langchain-openai`) for robust execution.
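The prefix handling described under Model IDs amounts to a one-line normalization. The function below is a sketch of that idea only: `normalize_model_id` is a hypothetical name, and the actual body of `_initialize_llm` is not shown in this README.

```python
def normalize_model_id(model_name: str) -> str:
    """Drop a leading "openai/" prefix, if present (Python 3.9+)."""
    return model_name.removeprefix("openai/")
```

For example, `normalize_model_id("openai/gpt-4o-mini")` returns `"gpt-4o-mini"`, while a name without the prefix passes through unchanged.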