Competitor Content Radar

Monitor competitor websites and detect new pages, content updates, and topic shifts with AI-powered analysis. Get intelligent summaries, webhook notifications, and track changes over time. Perfect for SEO teams, content strategists, and competitive intelligence.

Pricing: from $0.01 / 1,000 results
Rating: 5.0 (1)
Developer: intelligence automation (Maintained by Community)

Competitor Content Radar (SEO + AI)

Monitor competitor sites and blogs to detect new pages, updated content, and topic shifts with AI-powered summarization.

What it does

  • Monitor competitor sites/blogs - Automatically crawl and track competitor websites
  • Detect changes:
    • 🆕 New pages
    • ✏️ Updated content
    • 🔄 Topic shifts
  • AI Summarization with RAG - Get intelligent summaries enriched with historical context using OpenAI
  • Notifications - Receive alerts via webhook (Slack, Discord, etc.)

Quickstart

  1. Add at least one URL to competitorUrls
  2. Keep the cost protection defaults:
    • maxResults: 100
    • maxCharge: 1
  3. Run the Actor

Optional: set openaiApiKey (or OPENAI_API_KEY) to enable AI summaries.
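
A minimal input that keeps these defaults might look like this (the URL is illustrative):

{
  "competitorUrls": ["https://example.com/blog"],
  "maxResults": 100,
  "maxCharge": 1
}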

Who pays

  • SEO consultants tracking competitor content strategies
  • Content teams monitoring industry trends
  • Indie SaaS founders keeping tabs on competitors

Differentiator

Not just scraping → change detection + reasoning

  • Compares current crawl with previous runs
  • Detects content similarity and topic shifts
  • AI-powered summaries with RAG context explain what changed and why
  • Weekly dataset output + webhook notifications

Features

  • Change Detection: Compares content hashes and similarity scores to detect changes (works without AI)
  • Topic Extraction: Automatically identifies topics from page content (basic method works without AI, enhanced with OpenAI optional)
  • AI Summarization with RAG: Uses OpenAI GPT with Retrieval-Augmented Generation to generate intelligent summaries enriched with historical context from similar pages (optional - requires API key)
  • Webhook Integration: Send notifications to Slack, Discord, or any webhook endpoint
  • Standby Mode: Runs as a persistent service for scheduled monitoring
  • Configurable Depth: Control how deep to crawl from start URLs
  • Pattern Filtering: Include/exclude specific URL patterns
  • Sitemap Support: Automatically discover URLs from sitemap.xml
  • Structural Change Detection: Tracks new/removed internal links
  • Robots.txt Compliance: Respects robots.txt rules automatically
  • Rate Limiting: Intelligent adaptive rate limiting with exponential backoff

Note: The Actor works without an OpenAI API key. You'll still get:

  • ✅ Change detection (new pages, updated content, topic shifts)
  • ✅ Basic topic extraction (TF-IDF method)
  • ✅ All notifications and webhooks
  • ❌ AI-generated summaries with RAG context (requires API key)

RAG (Retrieval-Augmented Generation): When OpenAI API key is provided, the Actor uses RAG to enhance summaries by:

  • Retrieving context from similar pages on the same domain
  • Analyzing historical content patterns
  • Providing richer insights about how changes fit into the competitor's overall strategy

Input Parameters

  • competitorUrls (required): List of competitor website URLs to monitor
  • maxResults (required): Maximum number of results in dataset. Actor stops when this limit is reached. REQUIRED for cost protection. Default: 100, Range: 1-100000
  • maxCharge (required): Maximum cost in USD. Actor stops when estimated cost reaches this limit. REQUIRED for cost protection. Pricing: $0.00005 per start + $0.00001 per result. Default: $1, Range: $0.01-$1000
  • maxPagesPerSite: Maximum pages to crawl per site (default: 100, 0 = unlimited)
  • crawlDepth: Maximum crawl depth from start URLs (default: 2, 0-10)
  • includePatterns: URL patterns to include (e.g., ['**/blog/**', '**/articles/**']). Leave empty to include all URLs.
  • excludePatterns: URL patterns to exclude (e.g., ['**/category/**', '**/tag/**'])
  • useSitemap: Discover and use sitemap.xml to find URLs (default: false)
  • openaiApiKey: OpenAI API key for AI summarization (optional but recommended).
    • Option 1: Enter directly in the input (stored as secret)
    • Option 2: Set as environment variable OPENAI_API_KEY in Apify Console
    • Option 3: Use Apify Secrets (recommended for production)
  • useOpenAITopics: Use OpenAI to extract topics (more accurate, uses API credits, default: false)
  • openaiModel: OpenAI model to use (gpt-4o-mini, gpt-4, gpt-3.5-turbo, default: gpt-4o-mini)
  • webhookUrl (deprecated): Single webhook URL - use webhookUrls instead
  • webhookUrls: List of webhook URLs for change notifications (e.g., Slack, Discord webhooks)
  • minChangeThreshold: Minimum similarity difference to consider as change (0-1, default: 0.1)
  • minLastModified: Only process pages modified after this date (ISO 8601 format, e.g., 2024-01-01T00:00:00Z)
  • respectRobotsTxt: Respect robots.txt rules when crawling (recommended: true, default: true)
  • cleanupOldEntries: Automatically clean up old entries from Key-Value Store older than specified days (default: false)
    • Important: This cleans the Key-Value Store (where historical data is stored), NOT the dataset
    • The dataset is append-only and cannot be cleaned automatically
    • Use this to free up storage space in the Key-Value Store
  • cleanupDays: Delete Key-Value Store entries older than this many days (only if cleanupOldEntries is true, default: 30)
    • Entries are checked based on their crawledAt timestamp
    • Invalid entries (without crawledAt) are also deleted
  • customUserAgent: Custom user agent string (leave empty to use default realistic user agent)
  • requestTimeout: Timeout for each page request in seconds (default: 60, 10-300)
  • proxyConfiguration: Proxy settings for anti-bot protection

How Change Detection Works

The Actor detects new pages and changes by comparing current crawl data with previous runs. Here's how it works:

Persistence Between Runs

  1. Key-Value Store (KVS): Each page's data is stored under a key based on the URL hash: page_<url_hash>
  2. Cross-run persistence: The Actor uses a persistent named Key-Value Store (so data survives between runs) and reads/writes snapshots there.
  3. Data Comparison: On each run, the Actor:
    • Retrieves previous data from the persistent KVS
    • Compares current content with previous content
    • Detects changes based on similarity scores and topic shifts
    • Updates KVS with new data for the next run
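
A minimal sketch of this read-compare-write cycle using the Apify SDK (the store name, key derivation, and field values below are illustrative, not the Actor's actual identifiers):

import { Actor } from 'apify';
import crypto from 'node:crypto';

await Actor.init();

// Named store so snapshots survive between runs (name is illustrative).
const store = await Actor.openKeyValueStore('competitor-content-radar');

const url = 'https://example.com/blog/post';
// Key format follows the documented page_<url_hash> convention; hash choice is illustrative.
const key = `page_${crypto.createHash('sha256').update(url).digest('hex').slice(0, 32)}`;

const previous = await store.getValue(key); // null on the very first run
const current = {
  title: 'Example Post',
  contentHash: crypto.createHash('sha256').update('page text...').digest('hex'),
  topics: ['example'],
  crawledAt: new Date().toISOString(),
};

// ...compare `previous` with `current`, push a dataset item describing the change...
await store.setValue(key, current); // snapshot used by the next run

await Actor.exit();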

Detection Logic

  • New Page: previousData === null → No data found in KVS
  • Updated Page: Content similarity < (1 - minChangeThreshold)
  • Topic Shift: Content similarity < 0.7 AND topic overlap < 0.5
  • Unchanged: Content similarity >= (1 - minChangeThreshold)
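
Expressed as code, these rules amount to roughly the following (an illustrative reading of the documented thresholds, not the Actor's actual source):

// Illustrative classification using the thresholds documented above.
function classifyChange(previousData, similarity, topicOverlap, minChangeThreshold = 0.1) {
  if (previousData === null) return 'new';                         // no snapshot in the KVS yet
  if (similarity < 0.7 && topicOverlap < 0.5) return 'topic_shift';
  if (similarity < 1 - minChangeThreshold) return 'updated';
  return 'unchanged';
}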

What Gets Stored

For each page, the following data is stored in the Key-Value Store:

  • title: Page title
  • content: First 5000 characters of text content
  • contentHash: SHA-256 hash of the content
  • topics: Detected topics (array)
  • metaDescription: Meta description
  • h1: Main H1 heading
  • h2s: First 10 H2 headings (array)
  • internalLinks: Internal links found on the page (array)
  • crawledAt: Timestamp when page was crawled (ISO 8601)
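
Put together, a stored snapshot looks roughly like this (values are illustrative; the /history endpoint below returns the same object under storedData):

{
  "title": "Example Post",
  "content": "First 5000 characters of text content...",
  "contentHash": "a1b2c3d4e5f6...",
  "topics": ["topic1", "topic2"],
  "metaDescription": "Page meta description",
  "h1": "Main Heading",
  "h2s": ["Subheading 1", "Subheading 2"],
  "internalLinks": ["https://example.com/link1"],
  "crawledAt": "2024-01-15T10:00:00Z"
}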

Viewing Stored Data

You can view stored data in two ways:

  1. Apify Console: Go to your Actor → Storage → Key-Value Store → Browse keys starting with page_
  2. API Endpoints: Use /history and /keys endpoints (see API Endpoints section below)

Cleanup

  • Automatic Cleanup: Enable cleanupOldEntries: true to automatically remove entries older than cleanupDays (default: 30 days)
  • Manual Cleanup: Use the /cleanup endpoint to manually clean old entries
  • What Gets Cleaned: Only Key-Value Store entries, not dataset entries (Apify datasets are append-only)

Output

Dataset

The Actor outputs a dataset with the following fields:

  • url: Page URL
  • title: Page title
  • changeType: Type of change (new, updated, topic_shift, unchanged, error)
  • previousTitle: Previous page title (if updated)
  • previousContentHash: Hash of previous content
  • currentContentHash: Hash of current content
  • similarityScore: Content similarity score (0-1)
  • aiSummary: AI-generated summary of changes (enriched with RAG context if available)
  • topics: Detected topics/themes (array)
  • previousTopics: Previous topics (if updated, array)
  • crawledAt: Timestamp when page was crawled (ISO 8601)
  • competitorDomain: Domain of the competitor site
  • metaDescription: Meta description from page
  • h1: Main H1 heading
  • h2Count: Number of H2 headings on page
  • internalLinks: Internal links found on the page (array)
  • previousInternalLinks: Previous internal links (if updated, array)
  • newInternalLinks: New internal links added (array)
  • removedInternalLinks: Internal links removed (array)
  • error: Error message if processing failed

Key-Value Store (history)

The Actor stores page snapshots in the Key-Value Store to detect changes across runs:

  • page_<hash>: per-URL snapshots
  • domain_index_<domain>: per-domain list of recent pages for RAG context

Pricing

The Actor uses Pay per Event pricing model:

  • Actor Start: $0.00005 per run (fixed cost)
  • Dataset Item: $0.00001 per page crawled

Example costs:

  • Small crawl (10 pages): ~$0.00015
  • Medium crawl (100 pages): ~$0.001
  • Large crawl (1000 pages): ~$0.01
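
Each estimate follows from cost ≈ $0.00005 (Actor start) + pages × $0.00001 (dataset items); the 100-page crawl, for example, works out to $0.00005 + 100 × $0.00001 = $0.00105, i.e. roughly the $0.001 shown above.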

Pricing is transparent and scales with usage. No hidden fees.

Usage

Local Development

# Install dependencies
npm install
# Run locally
apify run
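
By default, apify run reads the Actor input from ./storage/key_value_stores/default/INPUT.json in your project directory, so you can place the minimal Quickstart JSON above in that file to test locally.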

Deploy to Apify

# Login to Apify
apify login
# Deploy Actor
apify push

Standby Mode

The Actor runs in standby mode by default, allowing it to:

  • Stay running and wait for HTTP requests
  • Be triggered on a schedule
  • Respond to webhook triggers
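
When deployed on Apify with Standby enabled, the endpoints listed below are served on the Actor's Standby URL shown in the Apify Console (the local examples use http://localhost:8080); depending on your settings you may also need to authenticate with your Apify API token. An illustrative call, with a placeholder URL:

curl "https://<your-standby-url>/status"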

API Endpoints

GET / - Health check and readiness probe

  • Returns: 200 OK with "Competitor Content Radar is ready"
  • Used by Apify platform for readiness checks

GET /health - Detailed health status

  • Returns: 200 OK with health metrics including memory usage and uptime

GET /metrics - Prometheus-style metrics

  • Query parameters:
    • limit (optional, default: 10000) - Maximum number of dataset items to process for metrics (1-50000, max: 50000)
  • Example with limit:
curl "http://localhost:8080/metrics?limit=5000"
  • Returns: 200 OK with metrics in Prometheus format

GET /status - Last crawl status

  • Query parameters:
    • limit (optional, default: 1) - Number of recent entries to return (1-100, max: 100)
  • Example with limit:
curl "http://localhost:8080/status?limit=10"
  • Returns: 200 OK with information about the last crawl

POST /crawl - Trigger a crawl

  • Request body: JSON input (same format as Actor input)
  • Response: 200 OK with { success: true, message: 'Crawl completed' }
  • Example:
curl -X POST http://localhost:8080/crawl \
  -H "Content-Type: application/json" \
  -d '{
    "competitorUrls": ["https://example.com"],
    "maxPagesPerSite": 50
  }'

GET /stats - Get crawl statistics

  • Query parameters:
    • limit (optional, default: 10000) - Maximum number of dataset items to process for statistics (1-50000, max: 50000)
  • Example with limit:
curl "http://localhost:8080/stats?limit=5000"
  • Response: 200 OK with statistics object
  • Example response:
{
  "overall": {
    "total": 100,
    "new": 15,
    "updated": 8,
    "topicShift": 2,
    "unchanged": 75,
    "error": 0
  },
  "byDomain": [
    {
      "domain": "example.com",
      "total": 50,
      "new": 8,
      "updated": 4,
      "topicShift": 1,
      "unchanged": 37,
      "error": 0
    }
  ],
  "lastUpdated": "2024-01-15T10:00:00Z"
}

GET /history?url= - Get historical data for a specific URL

  • Purpose: Retrieve the stored historical data for a page from the Key-Value Store
  • Query parameter:
    • url (required) - The exact URL to get history for (must be URL-encoded)
  • Response: 200 OK with historical data or 400 Bad Request if URL is missing
  • Use cases:
    • Check what data was stored for a specific page
    • Debug why a page is detected as "new" or "updated"
    • View previous content, topics, and metadata
  • Example:
# URL must be encoded
curl "http://localhost:8080/history?url=https%3A%2F%2Fexample.com%2Fpage"
  • Example response (when data found):
{
  "url": "https://example.com/page",
  "found": true,
  "storedData": {
    "title": "Page Title",
    "content": "First 5000 characters of page content...",
    "contentHash": "a1b2c3d4e5f6...",
    "topics": ["topic1", "topic2"],
    "metaDescription": "Page meta description",
    "h1": "Main Heading",
    "h2s": ["Subheading 1", "Subheading 2"],
    "internalLinks": ["https://example.com/link1", "https://example.com/link2"],
    "crawledAt": "2024-01-15T10:00:00Z"
  },
  "latestEntry": {
    "url": "https://example.com/page",
    "title": "Page Title",
    "changeType": "updated",
    "similarityScore": 0.85,
    "crawledAt": "2024-01-16T10:00:00Z"
  },
  "retrievedAt": "2024-01-16T11:00:00Z"
}
  • Example response (when no data found):
{
  "url": "https://example.com/new-page",
  "found": false,
  "message": "No historical data found for this URL"
}

GET /keys - List all stored page keys in Key-Value Store

  • Purpose: Get an overview of all pages stored in the Key-Value Store
  • Query parameters:
    • limit (optional, default: 100) - Maximum number of keys to return (1-1000, max: 1000)
    • domain (optional) - Filter by domain (currently returns all page keys, domain filtering coming soon)
  • Example with limit:
curl "http://localhost:8080/keys?limit=50"
  • Response: 200 OK with list of keys and metadata
  • Use cases:
    • See how many pages are being tracked
    • Find pages by title or topics
    • Monitor storage usage
    • Identify old entries that might need cleanup
  • Performance: For large stores (>1000 keys), use pagination by adjusting the limit parameter
  • Example response:
{
  "total": 150,
  "limit": 50,
  "keys": [
    {
      "key": "page_a1b2c3d4e5f6g7h8i9j0k1l2m3n4",
      "title": "Page Title",
      "crawledAt": "2024-01-15T10:00:00Z",
      "topics": ["topic1", "topic2"],
      "size": 1234
    },
    {
      "key": "page_b2c3d4e5f6g7h8i9j0k1l2m3n4o5",
      "title": "Another Page",
      "crawledAt": "2024-01-14T09:00:00Z",
      "topics": ["topic3"],
      "size": 987
    }
  ],
  "retrievedAt": "2024-01-16T11:00:00Z"
}
  • Note: The key field is the Key-Value Store key (format: page_<32-char-hash>). Use this key with /history endpoint or Apify Console to access full data.

POST /cleanup - Manually trigger cleanup of old entries from Key-Value Store

  • Purpose: Remove old entries from the Key-Value Store to free up storage space
  • Important: This only cleans the Key-Value Store (historical data), NOT the dataset. Apify datasets are append-only.
  • Request body: JSON with optional parameters
    • days (optional, default: 30) - Delete entries older than this many days (integer, 1-365)
    • dryRun (optional, default: false) - If true, only report what would be deleted without actually deleting
  • Response: 200 OK with cleanup results or 500 Internal Server Error if cleanup fails
  • What gets deleted:
    • Entries with crawledAt older than the cutoff date
    • Invalid entries (without crawledAt or unparseable data)
  • Processing: Cleanup processes entries in batches of 100 to avoid memory issues
  • Use cases:
    • Free up storage space in Key-Value Store
    • Remove stale data from old crawls
    • Test cleanup before enabling automatic cleanup
  • Example (dry run - safe, no deletion):
curl -X POST http://localhost:8080/cleanup \
  -H "Content-Type: application/json" \
  -d '{"days": 30, "dryRun": true}'
  • Example (actual cleanup):
curl -X POST http://localhost:8080/cleanup \
  -H "Content-Type: application/json" \
  -d '{"days": 30, "dryRun": false}'
  • Example response (dry run):
{
  "success": true,
  "dryRun": true,
  "cutoffDate": "2023-12-16T10:00:00Z",
  "checked": 150,
  "found": 25,
  "deleted": 0,
  "entries": [
    {
      "key": "page_a1b2c3d4...",
      "title": "Old Page",
      "crawledAt": "2023-12-10T10:00:00Z"
    }
  ],
  "message": "Found 25 entries older than 30 days (dry run)"
}
  • Example response (actual cleanup):
{
  "success": true,
  "dryRun": false,
  "cutoffDate": "2023-12-16T10:00:00Z",
  "checked": 150,
  "found": 25,
  "deleted": 25,
  "message": "Deleted 25 entries older than 30 days"
}
  • Recommendation: Always run with dryRun: true first to see what will be deleted

Example Input

Basic Example

{
  "competitorUrls": [
    "https://example.com/blog",
    "https://competitor.com/articles"
  ],
  "maxPagesPerSite": 50,
  "crawlDepth": 2,
  "minChangeThreshold": 0.15
}

Advanced Example with All Features

{
  "competitorUrls": [
    "https://example.com/blog"
  ],
  "maxPagesPerSite": 100,
  "crawlDepth": 3,
  "includePatterns": ["**/blog/**", "**/articles/**"],
  "excludePatterns": ["**/category/**", "**/tag/**"],
  "useSitemap": true,
  "openaiApiKey": "sk-...",
  "useOpenAITopics": true,
  "openaiModel": "gpt-4o-mini",
  "webhookUrls": [
    "https://hooks.slack.com/services/...",
    "https://discord.com/api/webhooks/..."
  ],
  "minChangeThreshold": 0.1,
  "minLastModified": "2024-01-01T00:00:00Z",
  "respectRobotsTxt": true,
  "customUserAgent": "",
  "requestTimeout": 60,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

OpenAI API Key Configuration

You have three options for configuring the OpenAI key:

Option 1: In the input (recommended for testing)

Enter the key directly in the "OpenAI API Key" input field. The key is stored as a secret and will not appear in the logs.

Option 2: Environment variable (recommended for production)

  1. Go to Apify Console → Your Actor → Settings → Environment variables
  2. Add OPENAI_API_KEY with your key as the value
  3. The key is used automatically if none is provided in the input

Option 3: Apify Secrets (best security)

  1. Go to Apify Console → Settings → Secrets
  2. Create a secret named OPENAI_API_KEY
  3. Use {{OPENAI_API_KEY}} in the input or configure it as an environment variable

Note: The Actor looks for the key in this order:

  1. Input openaiApiKey
  2. Environment variable OPENAI_API_KEY
  3. If neither is found, AI features are disabled

Troubleshooting

Common Issues

Pages not being crawled

  • Check includePatterns and excludePatterns - they might be filtering out URLs
  • Verify crawlDepth is sufficient
  • Check if maxPagesPerSite limit is reached
  • Verify respectRobotsTxt is not blocking URLs

No changes detected

  • Adjust minChangeThreshold - lower values detect smaller changes
  • Verify previous crawl data exists in Key-Value Store (use /keys endpoint or check Apify Console → Storage → Key-Value Store)
  • Check if pages are actually changing
  • Use /history?url=<your-url> to see what data is stored for a specific page
  • First run: On the first run, all pages will be marked as "new" since there's no previous data

OpenAI API errors

  • Verify API key is correct and has credits
  • Check rate limits - the Actor uses retry with backoff
  • Try a different model (gpt-3.5-turbo is faster/cheaper)

Webhook notifications not sent

  • Verify webhook URLs are correct
  • Check Actor logs for webhook errors
  • Ensure webhook endpoints accept POST requests with JSON
  • Webhooks support Slack, Discord, and custom JSON formats automatically
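
To check that an endpoint accepts JSON POSTs, you can send a test payload directly (the payload below is an illustrative Slack-style message, not the Actor's actual notification schema):

curl -X POST "https://hooks.slack.com/services/..." \
  -H "Content-Type: application/json" \
  -d '{"text": "Competitor Content Radar webhook test"}'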

High memory usage

  • Reduce maxPagesPerSite
  • Lower crawlDepth
  • Enable data compression (automatic)

Rate limiting issues

  • The Actor automatically adapts delays based on server responses
  • Respects Retry-After headers
  • Uses exponential backoff for 429 errors

Resources