Ask Any Site: AI Q&A from Web Pages or Google

Pricing

$5.00/month + usage

Developed by

youssef farhan

Maintained by Community

Get precise AI answers from URLs or site-specific Google searches. Outputs strict JSON with answer, confidence, sources, and context. Supports Gemini/OpenAI, concurrency, proxy, and regional search—perfect for research, SEO, and content summarization.

Last modified

13 days ago

Unified Website/Google → AI Answer (Apify Actor)

What this Actor does

  • Answers questions about a website’s content (website mode) or from Google search results (google mode).
  • For each input URL and question, it produces one dataset item with the AI’s answer and metadata.
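  • Google mode: Performs a site-specific Google search (the site: operator restricted to the domain of each input URL) and asks the AI to answer from the returned results.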
  • Website mode: Loads a single page, structures its content (title, meta, headings, paragraphs, links, images, etc.), and asks the AI to answer strictly in JSON using only that content.
  • Runs multiple URL–question pairs concurrently.
  • Uses Apify Proxy automatically in Google mode if a proxy URL is not provided.
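
The pairing described above can be sketched in a few lines of Python. This is only an illustration, not the Actor's source: the helper names extract_domain and build_tasks are hypothetical, and the exact shape of the Google query ("site:<domain> <question>") is an assumption, since the README does not spell it out.

```python
from itertools import product
from urllib.parse import urlparse


def extract_domain(url: str) -> str:
    """Return the host part of a URL (empty string if it cannot be parsed)."""
    return urlparse(url).hostname or ""


def build_tasks(start_urls, questions, source="google"):
    """Expand the input into one task per (URL x question) pair."""
    tasks = []
    for entry, question in product(start_urls, questions):
        url = entry["url"]
        domain = extract_domain(url)
        task = {"url": url, "domain": domain, "question": question, "mode": source}
        if source == "google":
            # Assumed query shape; the Actor's real query string is not documented.
            task["query"] = f"site:{domain} {question}"
        tasks.append(task)
    return tasks
```

Two URLs and two questions therefore yield four tasks, which matches the "one dataset item per URL and question" rule.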

Input

Provide a JSON input object with these fields:

  • source: string

    • Selects the mode of operation.
    • "google" (default): Site-specific Google search + AI answer.
    • "website": Single page content analysis + AI answer.
  • start_urls: array of objects

    • Each item is { "url": string }.
    • Used in both modes. In google mode, the domain of each URL is used for the site-specific query. In website mode, the exact page is analyzed.
  • questions: array of strings

    • One dataset item is produced per (URL × question).
  • AIModel: string

    • Must contain "gemini" for Google Gemini models (e.g., "gemini-1.5-flash") or "gpt" for OpenAI models (e.g., "gpt-4o-mini").
  • ApiKey: string

    • API key for the selected AI provider (Gemini or OpenAI).
  • Instruction: string (optional)

    • Additional natural-language instruction appended to the system prompt, applied verbatim.
  • proxy_url: string (optional)

    • HTTP proxy URL for Google mode search requests.
    • If omitted, the Actor will create an Apify Residential proxy URL automatically.
  • num_results: integer (optional, google mode)

    • Target number of Google results to provide to the AI per query. Default is 5.
  • region: string (optional, google mode)

    • Two-letter region/country code (e.g., "US") for Google search localization.
  • max_concurrents: integer (optional)

    • Limit of concurrent URL–question processing tasks. Default is 5.
  • topNLinksToExplore: integer (optional)

    • Accepted in the input but currently has no effect: website mode analyzes only the provided page.
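
Before starting a run, it can help to check an input object against the field rules listed above. The sketch below is a hypothetical pre-flight check, not part of the Actor: detect_provider and normalize_input are illustrative names, the case-insensitive match on the model name is an assumption, and treating every field not marked optional as required is also an assumption.

```python
def detect_provider(ai_model: str) -> str:
    """Map an AIModel string to a provider using the 'gemini' / 'gpt' rule."""
    name = ai_model.lower()  # assumption: matching is case-insensitive
    if "gemini" in name:
        return "gemini"
    if "gpt" in name:
        return "openai"
    raise ValueError(f"AIModel must contain 'gemini' or 'gpt': {ai_model!r}")


def normalize_input(raw: dict) -> dict:
    """Apply the documented defaults and reject obviously incomplete input."""
    source = raw.get("source", "google")  # documented default
    if source not in ("google", "website"):
        raise ValueError(f"unknown source: {source!r}")
    # Assumption: fields not marked optional in the README are required.
    for key in ("start_urls", "questions", "AIModel", "ApiKey"):
        if not raw.get(key):
            raise ValueError(f"missing required field: {key}")
    return {
        **raw,
        "source": source,
        "provider": detect_provider(raw["AIModel"]),
        "num_results": int(raw.get("num_results", 5)),          # documented default
        "max_concurrents": int(raw.get("max_concurrents", 5)),  # documented default
    }
```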

Example inputs

Google mode (site-specific search):

{
  "source": "google",
  "start_urls": [
    { "url": "https://apify.com" },
    { "url": "https://example.com" }
  ],
  "questions": [
    "What does this company do?",
    "Where is the headquarters located?"
  ],
  "AIModel": "gpt-4o-mini",
  "ApiKey": "sk-***",
  "Instruction": "Be concise.",
  "num_results": 5,
  "region": "US",
  "proxy_url": "http://auto:<YOUR_PROXY_PASSWORD>@proxy.apify.com:8000",
  "max_concurrents": 5
}

Website mode (single-page analysis):

{
  "source": "website",
  "start_urls": [
    { "url": "https://example.com/about" }
  ],
  "questions": [
    "What is the company's name?",
    "Provide the official contact email."
  ],
  "AIModel": "gemini-1.5-flash",
  "ApiKey": "AIza***",
  "Instruction": "Return direct answers only.",
  "topNLinksToExplore": 3,
  "max_concurrents": 3
}

Output

For each URL × question pair, one item is pushed to the default dataset with the following structure:

Top-level fields (always present):

  • url: string — The input URL.
  • domain: string — Domain extracted from the URL.
  • question: string — The question being answered.
  • mode: "google" | "website" — Which mode produced the answer.
  • result: object — The AI’s JSON-only answer object (schema differs slightly by mode, see below).

AI result object schema (google mode):

  • answer: string — The answer text.
  • confidence: number (0.0–1.0) — AI’s confidence estimate.
  • basis: string — Short description of which results supported the answer.
  • source: "google"
  • results_used: array — Indices or identifiers of the used search results.
  • domain: string — Domain context used in the query (added by the Actor).

AI result object schema (website mode):

  • answer: string — The answer text.
  • confidence: number (0.0–1.0) — AI’s confidence estimate.
  • basis: string — Short description of which page fields supported the answer.
  • source: "website"
  • links: array — Relevant links (if any) pertaining to the answer.
  • domain: string — Domain of the analyzed page (added by the Actor).

Fallback behavior:

  • If no definitive answer is found, the Actor instructs the AI to return:
    • google mode: {"answer":"NO DEFINITIVE DATA FOUND","confidence":0.0,"basis":"none","source":"google","results_used":[]}
    • website mode: {"answer":"NO DEFINITIVE DATA FOUND","confidence":0.0,"basis":"none","source":"website","links":[]}
  • In case of invalid URLs, an item is still pushed with an "error" field at the top level indicating "Invalid URL".
  • If the AI returns non-JSON content, the Actor falls back to returning the raw text under answer with confidence 0.0.
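
The non-JSON fallback in the last bullet can be illustrated as follows. This is a minimal sketch under stated assumptions, not the Actor's code: parse_ai_reply and is_no_data are hypothetical names, and the README only confirms that the raw text goes under answer with confidence 0.0, so the extra source field in the fallback object is an assumption.

```python
import json

NO_DATA = "NO DEFINITIVE DATA FOUND"


def parse_ai_reply(text: str, mode: str, domain: str) -> dict:
    """Parse the model reply as a JSON object, falling back to raw text."""
    try:
        result = json.loads(text)
        if not isinstance(result, dict):
            raise ValueError("expected a JSON object")
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        # Documented fallback: raw text under "answer" with confidence 0.0.
        # Including "source" here is an assumption.
        result = {"answer": text.strip(), "confidence": 0.0, "source": mode}
    result["domain"] = domain  # the README says the Actor adds this field
    return result


def is_no_data(result: dict) -> bool:
    """True when the result is the documented 'no definitive answer' sentinel."""
    return result.get("answer") == NO_DATA
```

Downstream code can then treat a confidence of 0.0 as a signal that the answer is either the sentinel or unparsed model output.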

Example dataset items

Google mode:

{
  "url": "https://apify.com",
  "domain": "apify.com",
  "question": "What does this company do?",
  "result": {
    "answer": "Apify provides a web scraping and automation platform.",
    "confidence": 0.92,
    "basis": "Used top search results referencing Apify's official site and docs.",
    "source": "google",
    "results_used": [0, 1],
    "domain": "apify.com"
  },
  "mode": "google"
}

Website mode:

{
  "url": "https://example.com/about",
  "domain": "example.com",
  "question": "Provide the official contact email.",
  "result": {
    "answer": "contact@example.com",
    "confidence": 0.85,
    "basis": "Found in page headings and paragraphs.",
    "source": "website",
    "links": ["https://example.com/contact"],
    "domain": "example.com"
  },
  "mode": "website"
}
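
Once the dataset is downloaded as JSON, items shaped like the examples above can be filtered on the consumer side. The helper below is a hypothetical post-processing sketch: usable_answers and the 0.5 threshold are illustrative choices, not something the Actor provides.

```python
NO_DATA = "NO DEFINITIVE DATA FOUND"


def usable_answers(items, min_confidence=0.5):
    """Keep items that have a real answer at or above a confidence threshold."""
    kept = []
    for item in items:
        if "error" in item:  # e.g. the documented "Invalid URL" items
            continue
        result = item.get("result") or {}
        answer = result.get("answer")
        confidence = result.get("confidence", 0.0)
        if not answer or answer == NO_DATA:
            continue
        if not isinstance(confidence, (int, float)) or confidence < min_confidence:
            continue
        kept.append({
            "url": item.get("url"),
            "question": item.get("question"),
            "answer": answer,
            "confidence": confidence,
        })
    return kept
```

The threshold is arbitrary; since confidence is the model's own estimate, it is better treated as a rough ranking signal than as a calibrated probability.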

Notes on behavior

  • Concurrency: Processing of URL–question pairs is parallelized and controlled by max_concurrents.
  • Proxy:
    • Google mode uses the provided proxy_url if present; otherwise it automatically uses an Apify Residential proxy.
    • Website mode loads the single page directly.
  • Strict JSON answers: The AI is instructed to return JSON only; the Actor attempts to parse and enforce this format for reliable downstream processing.
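
The concurrency limit described above is commonly implemented with a semaphore. The following is a generic asyncio sketch of that pattern, assuming an async worker function per URL–question pair; run_limited and worker are hypothetical names, and the Actor's own implementation may differ.

```python
import asyncio


async def run_limited(tasks, worker, max_concurrents=5):
    """Run worker(task) for every task, at most max_concurrents at a time."""
    semaphore = asyncio.Semaphore(max_concurrents)

    async def guarded(task):
        # Each task waits here until one of the max_concurrents slots is free.
        async with semaphore:
            return await worker(task)

    # gather returns results in the same order as the input tasks.
    return await asyncio.gather(*(guarded(task) for task in tasks))
```

With this pattern, raising max_concurrents speeds up large batches but also increases simultaneous load on the AI provider and, in google mode, on the proxy.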

How to run on Apify Console

  1. Create a new Actor and upload this project.
  2. Open the Input tab and provide your JSON input (see examples above).
  3. Click Run. Monitor logs in the Run console.
  4. After completion, open the Dataset tab to download results in JSON, CSV, or other formats.