Competitor Content Radar (SEO + AI)
Monitor competitor sites and blogs to detect new pages, updated content, and topic shifts with AI-powered summarization.
What it does
- Monitor competitor sites/blogs - Automatically crawl and track competitor websites
- Detect changes:
- 🆕 New pages
- ✏️ Updated content
- 🔄 Topic shifts
- AI Summarization with RAG - Get intelligent summaries enriched with historical context using OpenAI
- Notifications - Receive alerts via webhook (Slack, Discord, etc.)
Quickstart
- Add at least one URL to competitorUrls
- Keep the cost protection defaults:
- maxResults: 100
- maxCharge: 1
- Run the Actor
Optional: set openaiApiKey (or OPENAI_API_KEY) to enable AI summaries.
Who pays
- SEO consultants tracking competitor content strategies
- Content teams monitoring industry trends
- Indie SaaS founders keeping tabs on competitors
Differentiator
Not just scraping → change detection + reasoning
- Compares current crawl with previous runs
- Detects content similarity and topic shifts
- AI-powered summaries with RAG context explain what changed and why
- Weekly dataset output + webhook notifications
Features
- Change Detection: Compares content hashes and similarity scores to detect changes (works without AI)
- Topic Extraction: Automatically identifies topics from page content (basic method works without AI, enhanced with OpenAI optional)
- AI Summarization with RAG: Uses OpenAI GPT with Retrieval-Augmented Generation to generate intelligent summaries enriched with historical context from similar pages (optional - requires API key)
- Webhook Integration: Send notifications to Slack, Discord, or any webhook endpoint
- Standby Mode: Runs as a persistent service for scheduled monitoring
- Configurable Depth: Control how deep to crawl from start URLs
- Pattern Filtering: Include/exclude specific URL patterns
- Sitemap Support: Automatically discover URLs from sitemap.xml
- Structural Change Detection: Tracks new/removed internal links
- Robots.txt Compliance: Respects robots.txt rules automatically
- Rate Limiting: Intelligent adaptive rate limiting with exponential backoff
Note: The Actor works fully without an OpenAI API key. You'll still get:
- ✅ Change detection (new pages, updated content, topic shifts)
- ✅ Basic topic extraction (TF-IDF method)
- ✅ All notifications and webhooks
- ❌ AI-generated summaries with RAG context (requires API key)
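For a rough idea of what the basic TF-IDF method can look like, here is a minimal, self-contained sketch. The function names, stopword list, and scoring details are illustrative assumptions, not the Actor's actual code:

```typescript
// Illustrative TF-IDF topic extraction (not the Actor's actual implementation).
// Scores terms by frequency in one page, discounted by how common they are
// across all crawled pages, and keeps the top-scoring terms as "topics".
const STOPWORDS = new Set(["the", "and", "for", "with", "that", "this", "are", "was"]);

function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 2 && !STOPWORDS.has(t));
}

function extractTopics(pageText: string, corpus: string[], topN = 5): string[] {
  const tokens = tokenize(pageText);
  const tf = new Map<string, number>();
  for (const t of tokens) tf.set(t, (tf.get(t) ?? 0) + 1);

  const corpusTokens = corpus.map((doc) => new Set(tokenize(doc)));
  const score = (term: string): number => {
    const docsWithTerm = corpusTokens.filter((s) => s.has(term)).length;
    const idf = Math.log((corpus.length + 1) / (docsWithTerm + 1)) + 1;
    return (tf.get(term)! / tokens.length) * idf;
  };

  return [...tf.keys()]
    .sort((a, b) => score(b) - score(a))
    .slice(0, topN);
}
```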
RAG (Retrieval-Augmented Generation): When OpenAI API key is provided, the Actor uses RAG to enhance summaries by:
- Retrieving context from similar pages on the same domain
- Analyzing historical content patterns
- Providing richer insights about how changes fit into the competitor's overall strategy
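As an illustration of that flow, the sketch below retrieves a few recent snapshots from the same domain, folds them into the prompt, and asks OpenAI for a summary. The snapshot shape, prompt wording, and function names are assumptions; only the overall retrieve-then-summarize idea comes from this README.

```typescript
import OpenAI from "openai";

// Hypothetical snapshot shape, mirroring the "What Gets Stored" section below.
interface PageSnapshot {
  title: string;
  content: string;
  topics: string[];
  crawledAt: string;
}

// Sketch: enrich the summary prompt with snapshots of similar pages from the
// same domain (the "domain index"), then ask OpenAI to explain what changed.
async function summarizeChange(
  current: PageSnapshot,
  previous: PageSnapshot | null,
  domainContext: PageSnapshot[], // retrieved from domain_index_<domain>
  apiKey: string,
  model = "gpt-4o-mini",
): Promise<string> {
  const openai = new OpenAI({ apiKey });
  const context = domainContext
    .slice(0, 3)
    .map((p) => `- ${p.title} (${p.crawledAt}): ${p.topics.join(", ")}`)
    .join("\n");

  const prompt = [
    `Competitor page: ${current.title}`,
    previous ? `Previous version (excerpt): ${previous.content.slice(0, 1000)}` : "This is a new page.",
    `Current version (excerpt): ${current.content.slice(0, 1000)}`,
    `Other recent pages on this domain:\n${context}`,
    "Summarize what changed and how it fits the competitor's content strategy.",
  ].join("\n\n");

  const response = await openai.chat.completions.create({
    model,
    messages: [{ role: "user", content: prompt }],
  });
  return response.choices[0]?.message?.content ?? "";
}
```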
Input Parameters
- competitorUrls (required): List of competitor website URLs to monitor
- maxResults (required): Maximum number of results in dataset. Actor stops when this limit is reached. REQUIRED for cost protection. Default: 100, Range: 1-100000
- maxCharge (required): Maximum cost in USD. Actor stops when estimated cost reaches this limit. REQUIRED for cost protection. Pricing: $0.00005 per start + $0.00001 per result. Default: $1, Range: $0.01-$1000
- maxPagesPerSite: Maximum pages to crawl per site (default: 100, 0 = unlimited)
- crawlDepth: Maximum crawl depth from start URLs (default: 2, 0-10)
- includePatterns: URL patterns to include (e.g., ['**/blog/**', '**/articles/**']). Leave empty to include all URLs.
- excludePatterns: URL patterns to exclude (e.g., ['**/category/**', '**/tag/**'])
- useSitemap: Discover and use sitemap.xml to find URLs (default: false)
- openaiApiKey: OpenAI API key for AI summarization (optional but recommended).
- Option 1: Enter directly in the input (stored as secret)
- Option 2: Set the OPENAI_API_KEY environment variable in Apify Console
- Option 3: Use Apify Secrets (recommended for production)
- useOpenAITopics: Use OpenAI to extract topics (more accurate, uses API credits, default: false)
- openaiModel: OpenAI model to use (gpt-4o-mini, gpt-4, gpt-3.5-turbo; default: gpt-4o-mini)
- webhookUrl (deprecated): Single webhook URL - use webhookUrls instead
- webhookUrls: List of webhook URLs for change notifications (e.g., Slack, Discord webhooks)
- minChangeThreshold: Minimum similarity difference to consider as change (0-1, default: 0.1)
- minLastModified: Only process pages modified after this date (ISO 8601 format, e.g., 2024-01-01T00:00:00Z)
- respectRobotsTxt: Respect robots.txt rules when crawling (recommended: true, default: true)
- cleanupOldEntries: Automatically clean up old entries from Key-Value Store older than specified days (default: false)
- Important: This cleans the Key-Value Store (where historical data is stored), NOT the dataset
- The dataset is append-only and cannot be cleaned automatically
- Use this to free up storage space in the Key-Value Store
- cleanupDays: Delete Key-Value Store entries older than this many days (only if cleanupOldEntries is true, default: 30)
  - Entries are checked based on their crawledAt timestamp
  - Invalid entries (without crawledAt) are also deleted
- customUserAgent: Custom user agent string (leave empty to use default realistic user agent)
- requestTimeout: Timeout for each page request in seconds (default: 60, 10-300)
- proxyConfiguration: Proxy settings for anti-bot protection
How Change Detection Works
The Actor detects new pages and changes by comparing current crawl data with previous runs. Here's how it works:
Persistence Between Runs
- Key-Value Store (KVS): Each page's data is stored under a key based on the URL hash: page_<url_hash>
- Cross-run persistence: The Actor uses a persistent named Key-Value Store (so data survives between runs) and reads/writes snapshots there.
- Data Comparison: On each run, the Actor:
- Retrieves previous data from the persistent KVS
- Compares current content with previous content
- Detects changes based on similarity scores and topic shifts
- Updates KVS with new data for the next run
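In Apify SDK terms, the read-compare-write cycle might look roughly like this. The store name and hash length are assumptions; the key format page_<url_hash> and the named-store approach come from this README:

```typescript
import { Actor } from "apify";
import { createHash } from "node:crypto";

// Sketch of cross-run persistence: read the previous snapshot for a URL from a
// named Key-Value Store, compare it with the current crawl, then overwrite it.
async function loadAndStoreSnapshot(url: string, currentSnapshot: object) {
  // Named store so the data survives between runs ("competitor-content-radar" is assumed).
  const store = await Actor.openKeyValueStore("competitor-content-radar");

  const key = `page_${createHash("sha256").update(url).digest("hex").slice(0, 32)}`;
  const previous = await store.getValue<Record<string, unknown>>(key); // null on the first run

  // ...compare `previous` with `currentSnapshot` to classify the change...

  await store.setValue(key, currentSnapshot); // becomes the baseline for the next run
  return previous;
}
```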
Detection Logic
- New Page: previousData === null → no data found in the KVS
- Updated Page: content similarity < (1 - minChangeThreshold)
- Topic Shift: content similarity < 0.7 AND topic overlap < 0.5
- Unchanged: content similarity >= (1 - minChangeThreshold)
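Expressed as code, the classification above could be sketched as follows (the thresholds are the ones listed above; the rule order is an assumption):

```typescript
type ChangeType = "new" | "updated" | "topic_shift" | "unchanged";

// Illustrative version of the rules above. `similarity` is the content
// similarity score (0-1) and `topicOverlap` is the share of topics the two
// versions have in common.
function classifyChange(
  previousData: object | null,
  similarity: number,
  topicOverlap: number,
  minChangeThreshold = 0.1,
): ChangeType {
  if (previousData === null) return "new"; // no snapshot in the KVS yet
  if (similarity < 0.7 && topicOverlap < 0.5) return "topic_shift";
  if (similarity < 1 - minChangeThreshold) return "updated";
  return "unchanged";
}
```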
What Gets Stored
For each page, the following data is stored in the Key-Value Store:
- title: Page title
- content: First 5000 characters of text content
- contentHash: SHA-256 hash of the content
- topics: Detected topics (array)
- metaDescription: Meta description
- h1: Main H1 heading
- h2s: First 10 H2 headings (array)
- internalLinks: Internal links found on the page (array)
- crawledAt: Timestamp when the page was crawled (ISO 8601)
Viewing Stored Data
You can view stored data in two ways:
- Apify Console: Go to your Actor → Storage → Key-Value Store → browse keys starting with page_
- API Endpoints: Use the /history and /keys endpoints (see the API Endpoints section below)
Cleanup
- Automatic Cleanup: Enable cleanupOldEntries: true to automatically remove entries older than cleanupDays (default: 30 days)
- Manual Cleanup: Use the /cleanup endpoint to manually clean old entries
- What Gets Cleaned: Only Key-Value Store entries, not dataset entries (Apify datasets are append-only)
Output
Dataset
The Actor outputs a dataset with the following fields:
- url: Page URL
- title: Page title
- changeType: Type of change (new, updated, topic_shift, unchanged, error)
- previousTitle: Previous page title (if updated)
- previousContentHash: Hash of previous content
- currentContentHash: Hash of current content
- similarityScore: Content similarity score (0-1)
- aiSummary: AI-generated summary of changes (enriched with RAG context if available)
- topics: Detected topics/themes (array)
- previousTopics: Previous topics (if updated, array)
- crawledAt: Timestamp when the page was crawled (ISO 8601)
- competitorDomain: Domain of the competitor site
- metaDescription: Meta description from the page
- h1: Main H1 heading
- h2Count: Number of H2 headings on the page
- internalLinks: Internal links found on the page (array)
- previousInternalLinks: Previous internal links (if updated, array)
- newInternalLinks: New internal links added (array)
- removedInternalLinks: Internal links removed (array)
- error: Error message if processing failed
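If you consume the dataset programmatically, a small sketch like the following (using the apify-client package; the token and dataset ID are placeholders) pulls out only the items that represent an actual change:

```typescript
import { ApifyClient } from "apify-client";

// Sketch: fetch the Actor's dataset and keep only items that represent a change.
const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

async function getChangedPages(datasetId: string) {
  const { items } = await client.dataset(datasetId).listItems({ limit: 1000 });
  return items.filter((item) =>
    ["new", "updated", "topic_shift"].includes(item.changeType as string),
  );
}
```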
Key-Value Store (history)
The Actor stores page snapshots in the Key-Value Store to detect changes across runs:
- page_<hash>: per-URL snapshots
- domain_index_<domain>: per-domain list of recent pages for RAG context
Pricing
The Actor uses the Pay per Event pricing model:
- Actor Start: $0.00005 per run (fixed cost)
- Dataset Item: $0.00001 per page crawled
Example costs:
- Small crawl (10 pages): ~$0.00015
- Medium crawl (100 pages): ~$0.001
- Large crawl (1000 pages): ~$0.01
Pricing is transparent and scales with usage. No hidden fees.
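Put as a formula, a run's cost is the fixed start fee plus the per-result fee times the number of pages, as in this small sketch:

```typescript
// Estimated cost per run under the Pay per Event prices listed above.
function estimateRunCost(pagesCrawled: number): number {
  const ACTOR_START_USD = 0.00005; // fixed cost per run
  const PER_RESULT_USD = 0.00001;  // per page pushed to the dataset
  return ACTOR_START_USD + pagesCrawled * PER_RESULT_USD;
}

// estimateRunCost(1000) ≈ 0.01 (the "large crawl" example above)
```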
Usage
Local Development
```bash
# Install dependencies
npm install

# Run locally
apify run
```
Deploy to Apify
```bash
# Log in to Apify
apify login

# Deploy the Actor
apify push
```
Standby Mode
The Actor runs in standby mode by default, allowing it to:
- Stay running and wait for HTTP requests
- Be triggered on a schedule
- Respond to webhook triggers
API Endpoints
GET / - Health check and readiness probe
- Returns: 200 OK with "Competitor Content Radar is ready"
- Used by the Apify platform for readiness checks
GET /health - Detailed health status
- Returns: 200 OK with health metrics including memory usage and uptime
GET /metrics - Prometheus-style metrics
- Query parameters: limit (optional, default: 10000) - maximum number of dataset items to process for metrics (1-50000)
- Example: curl "http://localhost:8080/metrics?limit=5000"
- Returns: 200 OK with metrics in Prometheus format
GET /status - Last crawl status
- Query parameters: limit (optional, default: 1) - number of recent entries to return (1-100)
- Example: curl "http://localhost:8080/status?limit=10"
- Returns: 200 OK with information about the last crawl
POST /crawl - Trigger a crawl
- Request body: JSON input (same format as Actor input)
- Response: 200 OK with { success: true, message: 'Crawl completed' }
- Example:

```bash
curl -X POST http://localhost:8080/crawl \
  -H "Content-Type: application/json" \
  -d '{"competitorUrls": ["https://example.com"], "maxPagesPerSite": 50}'
```
GET /stats - Get crawl statistics
- Query parameters: limit (optional, default: 10000) - maximum number of dataset items to process for statistics (1-50000)
- Example: curl "http://localhost:8080/stats?limit=5000"
- Response: 200 OK with a statistics object
- Example response:

```json
{
  "overall": { "total": 100, "new": 15, "updated": 8, "topicShift": 2, "unchanged": 75, "error": 0 },
  "byDomain": [
    { "domain": "example.com", "total": 50, "new": 8, "updated": 4, "topicShift": 1, "unchanged": 37, "error": 0 }
  ],
  "lastUpdated": "2024-01-15T10:00:00Z"
}
```
GET /history?url= - Get historical data for a specific URL
- Purpose: Retrieve the stored historical data for a page from the Key-Value Store
- Query parameter: url (required) - the exact URL to get history for (must be URL-encoded)
- Response: 200 OK with historical data, or 400 Bad Request if the URL is missing
- Use cases:
- Check what data was stored for a specific page
- Debug why a page is detected as "new" or "updated"
- View previous content, topics, and metadata
- Example:

```bash
# The URL must be encoded
curl "http://localhost:8080/history?url=https%3A%2F%2Fexample.com%2Fpage"
```

- Example response (when data is found):

```json
{
  "url": "https://example.com/page",
  "found": true,
  "storedData": {
    "title": "Page Title",
    "content": "First 5000 characters of page content...",
    "contentHash": "a1b2c3d4e5f6...",
    "topics": ["topic1", "topic2"],
    "metaDescription": "Page meta description",
    "h1": "Main Heading",
    "h2s": ["Subheading 1", "Subheading 2"],
    "internalLinks": ["https://example.com/link1", "https://example.com/link2"],
    "crawledAt": "2024-01-15T10:00:00Z"
  },
  "latestEntry": {
    "url": "https://example.com/page",
    "title": "Page Title",
    "changeType": "updated",
    "similarityScore": 0.85,
    "crawledAt": "2024-01-16T10:00:00Z"
  },
  "retrievedAt": "2024-01-16T11:00:00Z"
}
```

- Example response (when no data is found):

```json
{
  "url": "https://example.com/new-page",
  "found": false,
  "message": "No historical data found for this URL"
}
```
GET /keys - List all stored page keys in Key-Value Store
- Purpose: Get an overview of all pages stored in the Key-Value Store
- Query parameters:
  - limit (optional, default: 100) - maximum number of keys to return (1-1000)
  - domain (optional) - filter by domain (currently returns all page keys; domain filtering coming soon)
- Example: curl "http://localhost:8080/keys?limit=50"
- Response: 200 OK with a list of keys and metadata
- Use cases:
- See how many pages are being tracked
- Find pages by title or topics
- Monitor storage usage
- Identify old entries that might need cleanup
- Performance: For large stores (>1000 keys), use pagination by adjusting the limit parameter
- Example response:

```json
{
  "total": 150,
  "limit": 50,
  "keys": [
    {
      "key": "page_a1b2c3d4e5f6g7h8i9j0k1l2m3n4",
      "title": "Page Title",
      "crawledAt": "2024-01-15T10:00:00Z",
      "topics": ["topic1", "topic2"],
      "size": 1234
    },
    {
      "key": "page_b2c3d4e5f6g7h8i9j0k1l2m3n4o5",
      "title": "Another Page",
      "crawledAt": "2024-01-14T09:00:00Z",
      "topics": ["topic3"],
      "size": 987
    }
  ],
  "retrievedAt": "2024-01-16T11:00:00Z"
}
```

- Note: The key field is the Key-Value Store key (format: page_<32-char-hash>). Use it with the /history endpoint or the Apify Console to access the full data.
POST /cleanup - Manually trigger cleanup of old entries from Key-Value Store
- Purpose: Remove old entries from the Key-Value Store to free up storage space
- Important: This only cleans the Key-Value Store (historical data), NOT the dataset. Apify datasets are append-only.
- Request body: JSON with optional parameters
  - days (optional, default: 30) - delete entries older than this many days (integer, 1-365)
  - dryRun (optional, default: false) - if true, only report what would be deleted without actually deleting
- Response: 200 OK with cleanup results, or 500 Internal Server Error if cleanup fails
- What gets deleted:
  - Entries with crawledAt older than the cutoff date
  - Invalid entries (without crawledAt or unparseable data)
- Processing: Cleanup processes entries in batches of 100 to avoid memory issues
- Use cases:
- Free up storage space in Key-Value Store
- Remove stale data from old crawls
- Test cleanup before enabling automatic cleanup
- Example (dry run - safe, no deletion):
```bash
curl -X POST http://localhost:8080/cleanup \
  -H "Content-Type: application/json" \
  -d '{"days": 30, "dryRun": true}'
```
- Example (actual cleanup):
```bash
curl -X POST http://localhost:8080/cleanup \
  -H "Content-Type: application/json" \
  -d '{"days": 30, "dryRun": false}'
```
- Example response (dry run):
{"success": true,"dryRun": true,"cutoffDate": "2023-12-16T10:00:00Z","checked": 150,"found": 25,"deleted": 0,"entries": [{"key": "page_a1b2c3d4...","title": "Old Page","crawledAt": "2023-12-10T10:00:00Z"}],"message": "Found 25 entries older than 30 days (dry run)"}
- Example response (actual cleanup):
{"success": true,"dryRun": false,"cutoffDate": "2023-12-16T10:00:00Z","checked": 150,"found": 25,"deleted": 25,"message": "Deleted 25 entries older than 30 days"}
- Recommendation: Always run with dryRun: true first to see what will be deleted
Example Input
Basic Example
{"competitorUrls": ["https://example.com/blog","https://competitor.com/articles"],"maxPagesPerSite": 50,"crawlDepth": 2,"minChangeThreshold": 0.15}
Advanced Example with All Features
{"competitorUrls": ["https://example.com/blog"],"maxPagesPerSite": 100,"crawlDepth": 3,"includePatterns": ["**/blog/**", "**/articles/**"],"excludePatterns": ["**/category/**", "**/tag/**"],"useSitemap": true,"openaiApiKey": "sk-...","useOpenAITopics": true,"openaiModel": "gpt-4o-mini","webhookUrls": ["https://hooks.slack.com/services/...","https://discord.com/api/webhooks/..."],"minChangeThreshold": 0.1,"minLastModified": "2024-01-01T00:00:00Z","respectRobotsTxt": true,"customUserAgent": "","requestTimeout": 60,"proxyConfiguration": {"useApifyProxy": true}}
OpenAI API Key Configuration
You have three options for configuring the OpenAI key:
Option 1: In the input (recommended for testing)
Enter the key directly in the "OpenAI API Key" input field. The key is stored as a secret and will not be visible in the logs.
Option 2: Environment variable (recommended for production)
- Go to Apify Console → your Actor → Settings → Environment variables
- Add OPENAI_API_KEY with your key as the value
- The key is used automatically if none is provided in the input
Option 3: Apify Secrets (best security)
- Go to Apify Console → Settings → Secrets
- Create a secret named OPENAI_API_KEY
- Use {{OPENAI_API_KEY}} in the input or configure it as an environment variable
Note: The Actor looks for the key in this order:
- Input field openaiApiKey
- Environment variable OPENAI_API_KEY
- If neither is found, AI features are disabled
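A sketch of that lookup order in code (illustrative only):

```typescript
// Illustrative key resolution: input field first, then the environment
// variable, otherwise AI features stay disabled.
function resolveOpenAiKey(input: { openaiApiKey?: string }): string | null {
  if (input.openaiApiKey) return input.openaiApiKey;                  // 1. Actor input
  if (process.env.OPENAI_API_KEY) return process.env.OPENAI_API_KEY; // 2. environment variable
  return null;                                                        // 3. AI features disabled
}
```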
Troubleshooting
Common Issues
Pages not being crawled
- Check includePatterns and excludePatterns - they might be filtering out URLs
- Verify crawlDepth is sufficient
- Check whether the maxPagesPerSite limit has been reached
- Verify respectRobotsTxt is not blocking URLs
No changes detected
- Adjust minChangeThreshold - lower values detect smaller changes
- Verify previous crawl data exists in the Key-Value Store (use the /keys endpoint or check Apify Console → Storage → Key-Value Store)
- Check whether the pages are actually changing
- Use /history?url=<your-url> to see what data is stored for a specific page
- First run: on the first run, all pages are marked as "new" since there is no previous data
OpenAI API errors
- Verify API key is correct and has credits
- Check rate limits - the Actor uses retry with backoff
- Try a different model (gpt-3.5-turbo is faster and cheaper)
Webhook notifications not sent
- Verify webhook URLs are correct
- Check Actor logs for webhook errors
- Ensure webhook endpoints accept POST requests with JSON
- Webhooks support Slack, Discord, and custom JSON formats automatically
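To confirm your endpoint accepts POST + JSON before wiring it into the Actor, you can send a test payload yourself. The field names in this sketch are assumptions for illustration; check a real run's webhook delivery for the exact payload shape.

```typescript
// Sketch: send a test JSON payload to a webhook URL to confirm it accepts
// POST with application/json. Field names here are illustrative, not guaranteed.
async function testWebhook(webhookUrl: string): Promise<void> {
  const response = await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      text: "Competitor Content Radar test notification",
      url: "https://example.com/page",
      changeType: "updated",
    }),
  });
  if (!response.ok) {
    throw new Error(`Webhook rejected the request: ${response.status}`);
  }
}
```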
High memory usage
- Reduce maxPagesPerSite
- Lower crawlDepth
- Enable data compression (automatic)
Rate limiting issues
- The Actor automatically adapts delays based on server responses
- Respects Retry-After headers
- Uses exponential backoff for 429 errors