
Ask Any Site: AI Q&A from Web Pages or Google
Pricing: $5.00/month + usage

Get AI answers from URLs or site-specific Google searches. Outputs strict JSON with answer, confidence, sources, and context. Supports Gemini and OpenAI models, concurrency, proxies, and regional search; suited to research, SEO, and content summarization.
Last modified: 13 days ago
Unified Website/Google → AI Answer (Apify Actor)
What this Actor does
- Answers questions about a website’s content (website mode) or from Google search results (google mode).
- For each input URL and question, it produces one dataset item with the AI’s answer and metadata.
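- Google mode: Performs a site-specific Google search (a site: query restricted to the URL's domain) and asks the AI to answer strictly in JSON using the returned results.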
- Website mode: Loads a single page, structures its content (title, meta, headings, paragraphs, links, images, etc.), and asks the AI to answer strictly in JSON using only that content.
- Runs multiple URL–question pairs concurrently.
- Uses Apify Proxy automatically in Google mode if a proxy URL is not provided.
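The fan-out described above (one item per URL × question pair, with a cap on parallel work) can be sketched with an asyncio semaphore. This is a minimal illustration, not the Actor's source: `run_pairs`, `answer_pair`, and `_demo_answer` are hypothetical names, and the worker is a stub standing in for the search or page load plus the AI call.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical worker signature: answers one question about one URL and returns a dataset item.
AnswerFn = Callable[[str, str], Awaitable[dict[str, Any]]]


async def run_pairs(
    urls: list[str],
    questions: list[str],
    answer_pair: AnswerFn,
    max_concurrents: int = 5,
) -> list[dict[str, Any]]:
    """Run answer_pair for every (URL, question) pair, at most max_concurrents at once."""
    semaphore = asyncio.Semaphore(max_concurrents)

    async def bounded(url: str, question: str) -> dict[str, Any]:
        # Waits here while max_concurrents pairs are already in flight.
        async with semaphore:
            return await answer_pair(url, question)

    # One coroutine, and therefore one result item, per URL x question combination.
    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(u, q) for u in urls for q in questions))


async def _demo_answer(url: str, question: str) -> dict[str, Any]:
    await asyncio.sleep(0.01)  # stub for search/page loading plus the AI call
    return {"url": url, "question": question}


if __name__ == "__main__":
    items = asyncio.run(
        run_pairs(["https://apify.com", "https://example.com"], ["Q1", "Q2"], _demo_answer, max_concurrents=2)
    )
    print(len(items))  # 4 items: 2 URLs x 2 questions
```

A semaphore keeps every pair queued as a task while limiting how many run at once. This matches the documented max_concurrents behavior, but the Actor may implement it differently.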
Input
Provide a JSON input object with these fields:
- source: string
  - Selects the mode of operation.
  - "google" (default): Site-specific Google search + AI answer.
  - "website": Single-page content analysis + AI answer.
- start_urls: array of objects
  - Each item is { "url": string }.
  - Used in both modes. In google mode, the domain of each URL is used for the site-specific query; in website mode, the exact page is analyzed.
- questions: array of strings
  - One dataset item is produced per URL × question pair.
- AIModel: string
  - Must contain "gemini" for Google Gemini models (e.g., "gemini-1.5-flash") or "gpt" for OpenAI models (e.g., "gpt-4o-mini").
- ApiKey: string
  - API key for the selected AI provider (Gemini or OpenAI).
- Instruction: string (optional)
  - Additional natural-language instruction appended verbatim to the system prompt.
- proxy_url: string (optional)
  - HTTP proxy URL for Google-mode search requests.
  - If omitted, the Actor automatically creates an Apify Residential proxy URL.
- num_results: integer (optional, google mode)
  - Target number of Google results passed to the AI per query. Default: 5.
- region: string (optional, google mode)
  - Two-letter country/region code (e.g., "US") for localizing Google search.
- max_concurrents: integer (optional)
  - Maximum number of URL–question pairs processed concurrently. Default: 5.
- topNLinksToExplore: integer (optional)
  - Accepted in the input but currently has no effect: website mode analyzes only the provided page.
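Before starting a run, the field rules above can be checked locally. The sketch below assumes that the four fields not marked "(optional)" are required and applies only the defaults stated in this README. `normalize_input` and `provider_for` are hypothetical helpers, not part of the Actor.

```python
from typing import Any

# Defaults documented in the input reference above.
DEFAULTS: dict[str, Any] = {"source": "google", "num_results": 5, "max_concurrents": 5}
# Assumption: the fields not marked "(optional)" are treated as required.
REQUIRED = ("start_urls", "questions", "AIModel", "ApiKey")


def provider_for(model: str) -> str:
    """Pick the AI provider from AIModel using the documented substring rule."""
    if "gemini" in model:
        return "gemini"
    if "gpt" in model:
        return "openai"
    raise ValueError(f'AIModel must contain "gemini" or "gpt", got {model!r}')


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical pre-flight check: apply defaults, validate, and flatten start_urls."""
    cfg = {**DEFAULTS, **raw}
    if cfg["source"] not in ("google", "website"):
        raise ValueError('source must be "google" or "website"')
    missing = [key for key in REQUIRED if not cfg.get(key)]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    cfg["urls"] = [item["url"] for item in cfg["start_urls"]]
    cfg["provider"] = provider_for(cfg["AIModel"])
    return cfg
```

For example, an input containing only start_urls, questions, "AIModel": "gpt-4o-mini", and an ApiKey normalizes to google mode with num_results and max_concurrents set to 5, and its provider resolves to OpenAI.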
Example inputs
Google mode (site-specific search):
{
  "source": "google",
  "start_urls": [
    { "url": "https://apify.com" },
    { "url": "https://example.com" }
  ],
  "questions": [
    "What does this company do?",
    "Where is the headquarters located?"
  ],
  "AIModel": "gpt-4o-mini",
  "ApiKey": "sk-***",
  "Instruction": "Be concise.",
  "num_results": 5,
  "region": "US",
  "proxy_url": "http://auto:<YOUR_PROXY_PASSWORD>@proxy.apify.com:8000",
  "max_concurrents": 5
}
Website mode (single-page analysis):
{
  "source": "website",
  "start_urls": [
    { "url": "https://example.com/about" }
  ],
  "questions": [
    "What is the company's name?",
    "Provide the official contact email."
  ],
  "AIModel": "gemini-1.5-flash",
  "ApiKey": "AIza***",
  "Instruction": "Return direct answers only.",
  "topNLinksToExplore": 3,
  "max_concurrents": 3
}
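To show how the Google-mode example expands into searches, this sketch derives each start URL's domain and builds one site-restricted query per question. The query format `site:<domain> <question>` is an assumption; the README states only that each URL's domain is used for a site-specific search. `site_query` is a hypothetical helper.

```python
from urllib.parse import urlparse


def site_query(url: str, question: str) -> str:
    """Build a site-restricted Google query for one URL-question pair.

    Assumption: the query has the form "site:<domain> <question>".
    """
    domain = urlparse(url).hostname  # lowercased host without port, or None
    if not domain:
        raise ValueError(f"Invalid URL: {url!r}")
    return f"site:{domain} {question}"


example = {
    "start_urls": [{"url": "https://apify.com"}, {"url": "https://example.com"}],
    "questions": ["What does this company do?", "Where is the headquarters located?"],
}
queries = [site_query(s["url"], q) for s in example["start_urls"] for q in example["questions"]]
for query in queries:
    print(query)
```

With two URLs and two questions, this prints four queries. The first is `site:apify.com What does this company do?`.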
Output
For each URL × question pair, one item is pushed to the default dataset with the following structure:
Top-level fields (always present):
- url: string — The input URL.
- domain: string — Domain extracted from the URL.
- question: string — The question being answered.
- mode: "google" | "website" — Which mode produced the answer.
- result: object — The AI’s JSON-only answer object (schema differs slightly by mode, see below).
AI result object schema (google mode):
- answer: string — The answer text.
- confidence: number (0.0–1.0) — AI’s confidence estimate.
- basis: string — Short description of which results supported the answer.
- source: "google"
- results_used: array — Indices or identifiers of the used search results.
- domain: string — Domain context used in the query (added by the Actor).
AI result object schema (website mode):
- answer: string — The answer text.
- confidence: number (0.0–1.0) — AI’s confidence estimate.
- basis: string — Short description of which page fields supported the answer.
- source: "website"
- links: array — Relevant links (if any) pertaining to the answer.
- domain: string — Domain of the analyzed page (added by the Actor).
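A downstream consumer can check result objects against the per-mode fields listed above. This is a minimal sketch: the key sets are copied from the two schema lists, and `check_result` is a hypothetical helper that reports problems instead of raising.

```python
from typing import Any

# Keys per mode, taken from the two result-object schema lists above.
RESULT_KEYS = {
    "google": {"answer", "confidence", "basis", "source", "results_used", "domain"},
    "website": {"answer", "confidence", "basis", "source", "links", "domain"},
}


def check_result(mode: str, result: dict[str, Any]) -> list[str]:
    """Return a list of problems with a result object; an empty list means it matches."""
    problems = [f"missing key: {k}" for k in sorted(RESULT_KEYS[mode] - set(result))]
    conf = result.get("confidence")
    if not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        problems.append("confidence must be a number in [0.0, 1.0]")
    if result.get("source") != mode:
        problems.append(f"source should be {mode!r}")
    return problems
```

Returning a list of problems, rather than failing on the first one, makes it easier to log every schema issue in a batch of items at once.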
Fallback behavior:
- If no definitive answer is found, the Actor instructs the AI to return:
  - google mode: {"answer":"NO DEFINITIVE DATA FOUND","confidence":0.0,"basis":"none","source":"google","results_used":[]}
  - website mode: {"answer":"NO DEFINITIVE DATA FOUND","confidence":0.0,"basis":"none","source":"website","links":[]}
- For an invalid URL, an item is still pushed, with a top-level "error" field set to "Invalid URL".
- If the AI returns non-JSON content, the Actor falls back to returning the raw text in the answer field with confidence 0.0.
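The fallback rules above can be sketched in a few lines. The fallback objects mirror the two JSON examples exactly. The README does not say which fields the Actor sets in the raw-text case beyond answer and confidence, so this sketch sets only those two, plus the domain the Actor adds. `no_data_result` and `parse_ai_reply` are hypothetical names.

```python
import json
from typing import Any

NO_DATA = "NO DEFINITIVE DATA FOUND"


def no_data_result(mode: str) -> dict[str, Any]:
    """The fallback object the AI is told to return when it finds no definitive answer."""
    result: dict[str, Any] = {"answer": NO_DATA, "confidence": 0.0, "basis": "none", "source": mode}
    # google mode lists the search results used; website mode lists relevant links.
    result["results_used" if mode == "google" else "links"] = []
    return result


def parse_ai_reply(text: str, domain: str) -> dict[str, Any]:
    """Parse a model reply as a JSON object; otherwise keep the raw text with confidence 0.0."""
    try:
        result = json.loads(text)
        if not isinstance(result, dict):
            raise ValueError("expected a JSON object")
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        result = {"answer": text, "confidence": 0.0}
    result["domain"] = domain  # the Actor adds the domain itself
    return result
```

Treating a valid JSON array or number as a failure keeps the output shape predictable, because downstream code can always expect an object with an answer and a confidence.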
Example dataset items
Google mode:
{
  "url": "https://apify.com",
  "domain": "apify.com",
  "question": "What does this company do?",
  "result": {
    "answer": "Apify provides a web scraping and automation platform.",
    "confidence": 0.92,
    "basis": "Used top search results referencing Apify's official site and docs.",
    "source": "google",
    "results_used": [0, 1],
    "domain": "apify.com"
  },
  "mode": "google"
}
Website mode:
{
  "url": "https://example.com/about",
  "domain": "example.com",
  "question": "Provide the official contact email.",
  "result": {
    "answer": "contact@example.com",
    "confidence": 0.85,
    "basis": "Found in page headings and paragraphs.",
    "source": "website",
    "links": ["https://example.com/contact"],
    "domain": "example.com"
  },
  "mode": "website"
}
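After downloading the dataset as JSON, items like those above can be filtered before use. This sketch skips items with a top-level "error" field and the "NO DEFINITIVE DATA FOUND" fallback, then keeps answers at or above a confidence threshold. `usable_answers` is a hypothetical helper, and the default threshold of 0.5 is an arbitrary illustrative choice, not an Actor setting.

```python
from typing import Any

NO_DATA = "NO DEFINITIVE DATA FOUND"


def usable_answers(items: list[dict[str, Any]], min_confidence: float = 0.5) -> list[tuple[str, str, str]]:
    """Return (url, question, answer) for items with a real answer at or above min_confidence."""
    kept = []
    for item in items:
        if "error" in item:  # e.g. invalid URLs still produce an item with a top-level "error"
            continue
        result = item.get("result", {})
        if result.get("answer") == NO_DATA:
            continue
        if result.get("confidence", 0.0) >= min_confidence:
            kept.append((item["url"], item["question"], result["answer"]))
    return kept
```

A typical call is `usable_answers(json.load(f))` on an exported dataset file. Raw-text fallbacks carry confidence 0.0, so any positive threshold filters them out along with the no-data items.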
Notes on behavior
- Concurrency: Processing of URL–question pairs is parallelized and controlled by max_concurrents.
- Proxy:
  - Google mode uses the provided proxy_url if present; otherwise it automatically uses an Apify Residential proxy.
  - Website mode loads the single page directly.
- Strict JSON answers: The AI is instructed to return JSON only; the Actor attempts to parse and enforce this format for reliable downstream processing.
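The proxy rules above reduce to a small decision. In this sketch, `make_residential_url` is a hypothetical callback standing in for however the Actor obtains an Apify Residential proxy URL; it is not an Apify SDK call. `opener_for` shows one standard-library way to route requests through the chosen proxy.

```python
import urllib.request
from typing import Callable, Optional


def choose_proxy(source: str, proxy_url: Optional[str], make_residential_url: Callable[[], str]) -> Optional[str]:
    """Return the proxy URL to use for a run, following the rules described above."""
    if source != "google":
        return None  # website mode loads the page directly
    if proxy_url:
        return proxy_url  # an explicit proxy_url always wins in google mode
    return make_residential_url()  # otherwise fall back to an Apify Residential proxy


def opener_for(proxy: Optional[str]) -> urllib.request.OpenerDirector:
    """Build a urllib opener that uses the given proxy, or no proxy at all."""
    # An empty mapping disables proxies, including ones from environment variables.
    mapping = {"http": proxy, "https": proxy} if proxy else {}
    return urllib.request.build_opener(urllib.request.ProxyHandler(mapping))
```

The residential URL is created lazily, only when Google mode runs without a proxy_url. This matches the README's "if omitted" wording.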
How to run on Apify Console
- Create a new Actor and upload this project.
- Open the Input tab and provide your JSON input (see examples above).
- Click Run. Monitor logs in the Run console.
- After completion, open the Dataset tab to download results in JSON, CSV, or other formats.