Pricing

from $7.00 / 1,000 results

Website Categorization API — 6-LLM Consensus URL Classifier

Reduce hallucinated category labels. Run URLs through 6 LLMs voting in parallel (DeepSeek-v4, Llama-4, Qwen-3.5, Nemotron-3, GLM-5.1, MiniMax) for higher-confidence taxonomy classification. Suited to lead-gen filtering, content moderation, and dataset labeling. $0.007 per URL.

Developer

yanmiayn

Last modified: 2 months ago

Multi-Model Consensus Web Page Classifier

Classify any list of URLs into your custom taxonomy using a 6-model consensus engine (open-weights frontier LLMs voting in parallel). Reduces single-model hallucination on edge cases — useful for lead-gen filtering, content moderation queues, knowledge-graph ingestion, and dataset labeling.

Why consensus?

A single LLM occasionally hallucinates labels on ambiguous pages. This actor fans out the same classification prompt to 6 independent open-weights models and returns the consensus label plus a confidence signal (currently often null; see Limitations). Agreement among the models makes a label more trustworthy; when they disagree, the row is flagged for review.

Models in the pool: DeepSeek-v4, Llama-4-maverick, Qwen-3.5, NVIDIA Nemotron-3, GLM-5.1, MiniMax-m2.7.
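
To illustrate the voting idea, here is a minimal Python sketch of strict-majority consensus over per-model labels. It is an illustration only: the actor's real aggregation logic is not documented in this listing, and `consensus_label`, its `fallback` parameter, and the agreement ratio are hypothetical names chosen for the example.

```python
from collections import Counter


def consensus_label(votes, fallback="other"):
    """Return (label, agreement) for a list of per-model labels.

    agreement is the share of models that voted for the most common label.
    A strict majority (> 50%) is required; otherwise the fallback is returned.
    This is an assumed rule for illustration, not the actor's documented one.
    """
    if not votes:
        return fallback, 0.0
    label, count = Counter(votes).most_common(1)[0]
    agreement = count / len(votes)
    if agreement <= 0.5:
        return fallback, agreement
    return label, agreement


# Six hypothetical model votes for one URL.
votes = ["fintech", "fintech", "fintech", "developer_tools", "fintech", "other"]
label, agreement = consensus_label(votes)
print(label, round(agreement, 2))  # fintech 0.67
```

An agreement ratio like this is one simple way to derive a confidence signal; rows that fall back to "other" or have low agreement are natural candidates for review.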

Pricing

Pay-per-event (no subscription):

$0.007 per URL classified (charged on each result row written)
$0.01 per run (one-time orchestration fee)

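A single 1,000-URL run costs $7.01 if every URL succeeds ($7.00 for 1,000 result rows plus the $0.01 run fee). Error rows are not charged.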
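
The arithmetic can be sketched in Python as below. The two prices come from this listing; `estimate_cost` and its `batch_size` parameter are hypothetical helpers, and the estimate assumes every URL produces a charged result row.

```python
import math

PRICE_PER_URL = 0.007  # from the listing: charged per result row written
PRICE_PER_RUN = 0.01   # from the listing: orchestration fee, charged once per run


def estimate_cost(n_urls, batch_size=None):
    """Upper-bound cost in USD, assuming no error rows (those are not charged).

    With batch_size set, the URLs are split across several runs and the
    per-run fee is paid once for each run.
    """
    runs = 1 if batch_size is None else math.ceil(n_urls / batch_size)
    return round(n_urls * PRICE_PER_URL + runs * PRICE_PER_RUN, 2)


print(estimate_cost(1000))      # 7.01 (one run)
print(estimate_cost(1000, 30))  # 7.34 (34 runs of at most 30 URLs)
```

Splitting work into small runs only adds the per-run fee, so batching stays cheap relative to the per-URL charge.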

Input

{
  "urls": ["https://stripe.com", "https://nytimes.com"],
  "taxonomy": ["fintech", "news_media", "developer_tools", "ecommerce", "other"],
  "consensusMode": "majority",
  "maxConcurrency": 5
}

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `urls` | string[] | — | Public URLs to classify. |
| `taxonomy` | string[] | — | 2–30 candidate categories. Should be mutually exclusive and include an `"other"` bucket. |
| `consensusMode` | `"majority"` \| `"deep"` | `majority` | `majority` uses fewer models (faster). `deep` uses the full pool. |
| `maxConcurrency` | int | 5 | Parallel URL fetches (1–20). |
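
Before starting a run, the constraints in the table can be checked locally. The following Python sketch is a hypothetical client-side helper (`validate_input` is not part of the actor). The non-empty `urls` rule and the duplicate-category check are assumptions added for the example; the other limits are taken from the table.

```python
def validate_input(run_input):
    """Return a list of problems found in an actor input dict (empty if none)."""
    problems = []
    urls = run_input.get("urls") or []
    taxonomy = run_input.get("taxonomy") or []
    mode = run_input.get("consensusMode", "majority")
    concurrency = run_input.get("maxConcurrency", 5)

    if not urls:
        problems.append("urls must contain at least one URL")  # assumed rule
    if not 2 <= len(taxonomy) <= 30:
        problems.append("taxonomy must have 2-30 categories")
    if len(set(taxonomy)) != len(taxonomy):
        problems.append("taxonomy contains duplicate categories")  # assumed rule
    if "other" not in taxonomy:
        problems.append('taxonomy should include an "other" bucket')
    if mode not in ("majority", "deep"):
        problems.append('consensusMode must be "majority" or "deep"')
    if not isinstance(concurrency, int) or not 1 <= concurrency <= 20:
        problems.append("maxConcurrency must be an integer from 1 to 20")
    return problems


example = {
    "urls": ["https://stripe.com", "https://nytimes.com"],
    "taxonomy": ["fintech", "news_media", "developer_tools", "ecommerce", "other"],
    "consensusMode": "majority",
    "maxConcurrency": 5,
}
print(validate_input(example))  # []
```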

Output (per URL)

{
  "url": "https://stripe.com",
  "title": "Stripe | Financial Infrastructure for the Internet",
  "status": "ok",
  "category": "fintech",
  "confidence": null,
  "consensusMode": "majority",
  "durationMs": 1840
}

The category field returns the consensus answer when models agree, or "other" as a safe fallback when JSON parsing fails. confidence may be null while the post-processing extractor is being improved.

For URLs that take too long to fetch or where the consensus engine times out, status: "error" is returned with a reason — those rows are not charged.

Use cases

Lead-gen filtering — bucket scraped homepages by industry before SDR outreach.
Content moderation triage — pre-tag URLs in user-submitted feeds.
Dataset labeling — bootstrap a training set with consensus labels.
Affiliate / partner discovery — group competitor sites by vertical.
Compliance pre-screening — surface pages that may belong to regulated categories.

Tips

Treat the actor as a first-pass classifier: high-confidence rows go straight through, ambiguous or error rows go to a human queue.
Categories work better when they are concrete and non-overlapping. Add "other" as the safety bucket.
Heavy single-page-application URLs may exceed the 120s consensus timeout; expect a small percentage of error rows on JS-heavy targets.
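
The first-pass workflow from the tips could look like the Python sketch below. It is a hedged illustration: `triage`, the 0.8 threshold, and the `accept_missing_confidence` switch are hypothetical, and it assumes that `confidence`, when present, is a number between 0 and 1, which this listing does not specify.

```python
def triage(rows, min_confidence=0.8, accept_missing_confidence=False):
    """Split output rows into (accepted, review) lists.

    Error rows and rows labeled "other" always go to review, because "other"
    is also the fallback when the engine's JSON cannot be parsed.
    """
    accepted, review = [], []
    for row in rows:
        confidence = row.get("confidence")
        if row.get("status") != "ok" or row.get("category") == "other":
            review.append(row)
        elif confidence is None:
            # confidence is often null for now; choose how strict to be.
            (accepted if accept_missing_confidence else review).append(row)
        elif confidence < min_confidence:
            review.append(row)
        else:
            accepted.append(row)
    return accepted, review


# Hypothetical rows shaped like the actor's per-URL output.
rows = [
    {"url": "https://a.example", "status": "ok", "category": "fintech", "confidence": 0.93},
    {"url": "https://b.example", "status": "ok", "category": "fintech", "confidence": None},
    {"url": "https://c.example", "status": "error", "category": None, "confidence": None},
    {"url": "https://d.example", "status": "ok", "category": "other", "confidence": 0.95},
]
accepted, review = triage(rows)
print(len(accepted), len(review))  # 1 3
```

While `confidence` is usually null, the strict default sends most rows to review. Passing `accept_missing_confidence=True` trades that safety for throughput.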

How it works

Fetches each URL (10s budget, follows redirects).
Extracts title + meta description + ~250 characters of body text.
Sends a compact classification prompt to the public consensus endpoint (/v1/public), which fans out to the 6-model pool and returns the agreed JSON label.
Parses the result and pushes one row per URL to the Apify dataset.
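
The extraction step (title, meta description, and roughly 250 characters of body text) can be approximated with Python's standard library, as in the sketch below. This is not the actor's source code: `PageSummary` and `summarize` are hypothetical names, the fetch step is left out, and the real extractor may handle markup differently.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Collects the <title>, the meta description, and visible body text."""

    SKIP = {"script", "style", "noscript"}  # content that is not page text

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.body_parts = []
        self._in_title = False
        self._in_body = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False
        elif tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_body and not self._skip_depth:
            self.body_parts.append(data)


def summarize(html, body_chars=250):
    """Return the compact page summary that would feed a classification prompt."""
    parser = PageSummary()
    parser.feed(html)
    parser.close()
    body = " ".join(" ".join(parser.body_parts).split())  # collapse whitespace
    return {
        "title": parser.title.strip(),
        "description": parser.description,
        "body": body[:body_chars],
    }


html = """<html><head><title>Example Shop</title>
<meta name="description" content="Buy widgets online.">
<style>body{color:red}</style></head>
<body><h1>Widgets</h1><script>var x = 1;</script><p>Fast   shipping.</p></body></html>"""
print(summarize(html))
```

Keeping the summary this small keeps the prompt compact, but it also means pages whose meaningful text is rendered by JavaScript give the models little to work with.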

No personal data is stored — only the public page text and your taxonomy are sent for classification. The consensus engine is rate-limited at 10 requests per IP per day on the free public endpoint.

Limitations (honest)

The actor is a fresh listing (May 2026). Accuracy claims have not been independently benchmarked yet — early users help us calibrate.
A small fraction of buyer test runs hit the public endpoint's per-IP rate limit on bursts. Use small batches (≤30 URLs/run) for now, or contact the publisher for a private endpoint key.
confidence extraction is being tightened; for now null is common.
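
To follow the small-batch advice above, a URL list can be split into runs of at most 30 URLs. The Python sketch below uses a hypothetical helper, `batches`. How each batch is submitted to the actor is left out, since that depends on your client setup.

```python
def batches(urls, size=30):
    """Yield consecutive slices of urls, each with at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


# Hypothetical URL list; each batch would become the "urls" field of one run.
all_urls = [f"https://example.com/page/{i}" for i in range(65)]
for batch in batches(all_urls):
    print(len(batch))  # 30, 30, 5
```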

Source

Built and maintained by yanmiayn. Bug reports and feature requests via the actor's Issues tab on Apify.
