Evidence-First Website Facts Extractor
Pricing
from $5.00 / 1,000 useful fact pack results
Extract source-linked pricing, FAQ, feature, policy, docs, company, or custom fact packs from public websites for AI agents and research workflows.
Extract compact, source-linked fact packs from public websites for AI agents, sales research, SEO audits, competitive research, RAG preparation, and due diligence workflows.
This Actor is intentionally evidence-first. It returns direct website snippets with source URLs and diagnostics. It does not use a private API, infer facts without evidence, enrich private individuals, or act as a broad unbounded AI web crawler.
Input
    {
      "urls": ["https://apify.com"],
      "factPack": "features",
      "customTerms": [],
      "maxPagesPerSite": 8,
      "maxFactsPerSite": 20,
      "requestTimeoutSecs": 20
    }
Supported factPack values:
pricing, faq, features, policies, docs, company, custom
For custom, provide customTerms such as ["SOC 2", "API", "enterprise"].
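A client-side check of these input rules can catch mistakes before a run starts. The sketch below is a minimal illustration, not the Actor's own validation. The helper name build_input and the positive-integer checks are assumptions, and the fallback values are copied from the example input rather than taken from documented defaults.

```python
from typing import Any

# factPack values listed in this README.
SUPPORTED_FACT_PACKS = {"pricing", "faq", "features", "policies", "docs", "company", "custom"}

# Fallback values copied from the example input; they are NOT documented Actor defaults.
EXAMPLE_VALUES: dict[str, Any] = {
    "factPack": "features",
    "customTerms": [],
    "maxPagesPerSite": 8,
    "maxFactsPerSite": 20,
    "requestTimeoutSecs": 20,
}


def build_input(urls: list[str], **overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: merge overrides into the example values and check the README's rules."""
    if not urls:
        raise ValueError("urls must contain at least one website")
    run_input: dict[str, Any] = {**EXAMPLE_VALUES, **overrides, "urls": list(urls)}
    # Copy the list so callers never mutate the shared example value.
    run_input["customTerms"] = list(run_input["customTerms"])

    pack = run_input["factPack"]
    if pack not in SUPPORTED_FACT_PACKS:
        raise ValueError(f"unsupported factPack: {pack!r}")
    if pack == "custom" and not run_input["customTerms"]:
        raise ValueError("factPack 'custom' needs at least one customTerms entry")

    # Assumption: the numeric limits must be positive integers.
    for key in ("maxPagesPerSite", "maxFactsPerSite", "requestTimeoutSecs"):
        value = run_input[key]
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{key} must be a positive integer")
    return run_input


print(build_input(["https://apify.com"], factPack="custom", customTerms=["SOC 2", "API", "enterprise"]))
```

Keeping the check outside the Actor means a typo such as an unsupported factPack fails locally instead of costing a run start.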
Output
Each website produces one fact-pack record with:
- direct facts and evidence snippets;
- source URLs for every fact;
- the sanitized starting URL plus the site origin used as the crawl boundary;
- matched terms;
- pages scanned;
- confidence and completeness scores;
- missing fields and diagnostics;
- uncharged error records for blocked or failed sites.
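Because every fact is meant to carry a source URL inside the crawl boundary, a consumer can audit records before passing them to an agent or a RAG index. This is a minimal sketch that assumes hypothetical field names (origin, facts, text, sourceUrl); the real output schema may name these differently, and the sample record is made up.

```python
from urllib.parse import urlsplit


def origin_of(url: str) -> str:
    """scheme://host[:port], lowercased, used here as the crawl boundary."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}".lower()


def unsupported_facts(record: dict) -> list[dict]:
    """Return facts with no sourceUrl or with a sourceUrl outside the record's origin.

    Field names (origin, facts, sourceUrl) are hypothetical placeholders.
    """
    boundary = origin_of(record["origin"])
    flagged = []
    for fact in record.get("facts", []):
        source = fact.get("sourceUrl")
        if not source or origin_of(source) != boundary:
            flagged.append(fact)
    return flagged


# Made-up record for illustration only.
record = {
    "origin": "https://example.com",
    "facts": [
        {"text": "example fact with evidence", "sourceUrl": "https://example.com/pricing"},
        {"text": "example fact without a source", "sourceUrl": ""},
        {"text": "example fact citing another site", "sourceUrl": "https://other.example/faq"},
    ],
}
print(len(unsupported_facts(record)))  # 2
```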
Pricing
- $0.00005 when a run starts.
- $0.005 for each useful, evidence-complete website fact pack.
- Failed, weak partial, duplicate, and empty records do not trigger the useful-fact-pack-result charge.
- There is no apify-default-dataset-item charge, so writing an error record does not create a result fee.
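Using only the two prices listed here, a rough cost estimate is the number of run starts times $0.00005 plus the number of useful fact packs times $0.005. The sketch assumes no other charges apply; estimate_cost is a hypothetical helper, and Decimal is used to avoid floating-point rounding on small amounts.

```python
from decimal import Decimal

RUN_START_PRICE = Decimal("0.00005")       # charged when a run starts
USEFUL_FACT_PACK_PRICE = Decimal("0.005")  # charged per useful, evidence-complete fact pack


def estimate_cost(runs: int, useful_fact_packs: int) -> Decimal:
    """Estimate the charge from the listed prices only.

    Failed, weak partial, duplicate, empty, and error records add nothing,
    so only useful fact packs are counted.
    """
    return runs * RUN_START_PRICE + useful_fact_packs * USEFUL_FACT_PACK_PRICE


print(estimate_cost(1, 7))
```

For example, one run over 10 sites where 7 return useful fact packs and 3 return uncharged error records comes to $0.00005 + 7 × $0.005 = $0.03505, and 1,000 useful fact packs in a single run cost $5.00005, in line with the headline "from $5.00 / 1,000" figure.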
Safety
- Public HTTP and HTTPS pages only.
- URLs with credentials, query parameters, or fragments are rejected.
- Clean path URLs such as /pricing, /docs, or /privacy are preserved as starting pages, so users can target a specific fact page.
- Account, invite, reset, unsubscribe, or token-like paths are rejected and redacted to the site origin.
- Local and private-network addresses are blocked.
- Redirects must stay on the requested website.
- Each HTML page is limited to 3 MB.
- Crawling is bounded by maxPagesPerSite.
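The URL rules in this list can be approximated on the client side to predict which inputs will be accepted. This is a minimal sketch under stated assumptions, not the Actor's implementation: the sensitive-path pattern is a hypothetical heuristic, private-address detection only covers localhost and literal IPs (a real check would also resolve hostnames via DNS), "stay on the requested website" is read as "same hostname", and a sensitive path is treated as falling back to the site origin, which is one reading of "rejected and redacted to the site origin".

```python
import ipaddress
import re
from urllib.parse import urljoin, urlsplit

# Hypothetical heuristic for account/invite/reset/unsubscribe/token-like paths.
SENSITIVE_PATH = re.compile(
    r"/(?:account|invite|reset|unsubscribe|token)s?(?:[/_-]|$)|[A-Za-z0-9]{24,}",
    re.IGNORECASE,
)


def is_private_host(host: str) -> bool:
    """True for localhost and literal private, loopback, link-local, or reserved IPs."""
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name; a real check would also resolve it and test each address
    return ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved or ip.is_unspecified


def sanitize_start_url(url: str) -> str:
    """Return the starting page to crawl, or raise ValueError if the URL is rejected."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError("only HTTP and HTTPS URLs are accepted")
    if parts.username is not None or parts.password is not None:
        raise ValueError("URLs with credentials are rejected")
    if parts.query or parts.fragment:
        raise ValueError("URLs with query parameters or fragments are rejected")
    host = parts.hostname or ""
    if not host or is_private_host(host):
        raise ValueError("local and private-network addresses are blocked")
    if SENSITIVE_PATH.search(parts.path):
        # Assumption: a sensitive path is dropped and only the site origin is kept.
        return f"{parts.scheme}://{parts.netloc}"
    return url


def redirect_stays_on_site(start_url: str, location: str) -> bool:
    """Assumption: same website means same hostname; relative Locations are resolved first."""
    return urlsplit(urljoin(start_url, location)).hostname == urlsplit(start_url).hostname


print(sanitize_start_url("https://apify.com/pricing"))
print(sanitize_start_url("https://apify.com/account/settings"))
```

Checking literal addresses alone is not enough against DNS names that point at private ranges, which is why the comment flags resolution as a required extra step in a real crawler.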