Docs Changelog Drift Monitor
Pricing
from $9.00 / 1,000 results
Track developer portals and software release notes to feed accurate, structured update data into your RAG pipelines and LLMs.
Rating: 0.0 (0)
Developer: Taro Yamada
Actor stats: 0 bookmarked, 2 total users, 1 monthly active user
Last modified: 13 days ago
Docs & Changelog Drift Monitor API
Ensure your LLM applications, custom AI agents, and vector databases never hallucinate outdated technical specifications. As software ecosystems evolve, keeping internal knowledge bases accurate requires constant data extraction. This web scraper is built specifically to automate the tracking of documentation drift by actively monitoring release notes, migration guides, and developer API pages across your target websites. By running scheduled checks, it allows you to scrape crucial updates and detect new software releases, hidden deprecations, and version bumps before they break your downstream tools or AI models.
Developer relations teams, DevOps engineers, and AI application builders schedule this scraper to run continuously, seamlessly extracting precise details from unstructured web pages. Rather than manually hunting for missing technical details or searching through dense developer portals using a standard browser, you can automatically capture the exact text evidence of documentation changes. The scraped results are delivered as high-quality structured data, explicitly designed to integrate directly into RAG pipelines, automated wikis, or custom APIs. Every run provides actionable outputs, including the exact URL scraped, extracted raw text, version numbers identified, deprecation flags, and precise timestamps of the update. Schedule daily or weekly runs to maintain a living record of third-party ecosystem changes, ensuring your internal search and AI tools always serve the most up-to-date documentation.
Store Quickstart
- Start with store-input.example.json for a reliable first run across two public targets.
- If the output fits your workflow, switch to store-input.templates.json and choose one of:
  - Quickstart (2 targets, dataset) for first success
  - Recurring Product Docs Watchlist for recurring docs and migration monitoring
  - Webhook Docs Drift Digest when Slack, ticketing, or internal automation should receive only summary-first action items
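For orientation, a minimal dataset-delivery input in the same spirit might look like this (the single target shown is illustrative and is not the actual contents of store-input.example.json):

```json
{
  "targets": [
    {
      "id": "nextjs",
      "name": "Next.js",
      "repo": "vercel/next.js",
      "releaseNotesUrl": "https://github.com/vercel/next.js/releases.atom"
    }
  ],
  "delivery": "dataset",
  "datasetMode": "changes_only"
}
```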
Key Features
- 🛠️ Developer-focused — CLI-ready JSON output for piping into build/CI tooling
- ⚡ Fast parallel scanning — Concurrent fetches with backoff for high-throughput audits
- 📊 Changelog-aware — Detects version bumps, new releases, and deprecations between runs
- 🔔 Alert integrations — Webhook delivery to Slack/PagerDuty/Opsgenie for on-call visibility
- 🔒 Zero-credentials — Uses only public data — no package-registry API keys required
Use Cases
| Who | Why |
|---|---|
| Developers | Automate recurring data fetches without building custom scrapers |
| Data teams | Pipe structured output into analytics warehouses |
| Ops teams | Monitor changes via webhook alerts |
| Product managers | Track competitor/market signals without engineering time |
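For the data-team row above, a minimal sketch of flattening dataset rows into warehouse-ready CSV. The field names follow the Output table in this document; the sample items are stand-ins, not real run output:

```python
import csv
import io

def to_csv(items, fields=("targetId", "targetName", "status", "severity", "checkedAt")):
    """Flatten dataset items into a CSV string suitable for bulk-loading.

    Fields not listed in `fields` (e.g. nested objects) are dropped.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(fields), extrasaction="ignore")
    writer.writeheader()
    writer.writerows(items)
    return buf.getvalue()

# Illustrative rows only; a real run returns one summary-first row per target.
items = [{"targetId": "nextjs", "targetName": "Next.js", "status": "changed",
          "severity": "major", "checkedAt": "2026-04-04T14:33:02Z", "changes": []}]
print(to_csv(items))
```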
Input
| Field | Type | Default | Description |
|---|---|---|---|
| targets | array | prefilled | List of products or repos to monitor. Each item supports id, name, repo, criticality, owner, tags, releaseNotesUrl, chan |
| requestTimeoutSeconds | integer | 30 | Maximum time to wait for one public source request before the actor marks the surface as failed for this run. |
| userAgent | string | — | Optional custom User-Agent string for public HTTP requests. Leave empty to use the actor default identifier. |
| maxChars | integer | 35000 | Upper bound for extracted text per monitored surface before hashing and diff generation. |
| delivery | string | "dataset" | Choose whether summary-first target rows should be written to the dataset, posted to a webhook payload, or reserved for |
| datasetMode | string | "changes_only" | Controls which target rows are persisted: only action-needed rows, only changed rows, or every monitored target. |
| webhookUrl | string | — | Webhook destination for summary payload delivery when delivery is set to webhook. |
| notifyOnNoChange | boolean | false | If true, webhook delivery still fires even when no target crosses the change threshold in this run. |
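Combining the delivery-related fields above, a webhook run could be configured like this (a partial sketch; the URL is a placeholder, and a targets list is still required alongside these fields):

```json
{
  "delivery": "webhook",
  "webhookUrl": "https://hooks.example.com/docs-drift",
  "notifyOnNoChange": false,
  "datasetMode": "changes_only"
}
```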
Input Example
```json
{
  "targets": [
    {
      "id": "nextjs",
      "name": "Next.js",
      "repo": "vercel/next.js",
      "criticality": "high",
      "owner": "Frontend Platform",
      "tags": ["framework", "docs"],
      "releaseNotesUrl": "https://github.com/vercel/next.js/releases.atom",
      "migrationGuideUrl": "https://nextjs.org/docs/app/guides/upgrading/version-16",
      "docsPages": [
        {
          "id": "nextjs-caching",
          "name": "Caching Guide",
          "url": "https://nextjs.org/docs/app/guides/caching"
        }
      ]
    }
  ],
  "delivery": "dataset",
  "datasetMode": "changes_only",
  "snapshotKey": "docs-changelog-drift-nextjs",
  "diffMode": "line_summary",
  "summaryMaxLines": 12,
  "concurrency": 2
}
```
Output
| Field | Type | Description |
|---|---|---|
| meta | object | |
| recurringDigest | object | |
| actionNeeded | array | |
| results | array | |
| results[].targetId | string | |
| results[].targetName | string | |
| results[].repo | string | |
| results[].repoUrl | string (url) | |
| results[].criticality | string | |
| results[].owner | string | |
| results[].tags | array | |
| results[].status | string | |
| results[].severity | string | |
| results[].reason | string | |
| results[].executiveSummary | string | |
| results[].recommendedActions | array | |
| results[].signals | array | |
| results[].latestMarkers | array | |
| results[].targetSummary | object | |
| results[].changes | array | |
| results[].surfaces | array | |
| results[].checkedAt | timestamp | |
| results[].error | null | |
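A quick triage sketch over the fields above. The status and criticality values follow the totals shown in the Output Example (changed, unchanged, initial, ...); the sample rows are invented for illustration:

```python
def rows_needing_action(items):
    """Return (targetName, severity, reason) for changed rows,
    sorted so higher-criticality targets come first."""
    order = {"high": 0, "medium": 1, "low": 2}
    flagged = [r for r in items if r.get("status") == "changed"]
    flagged.sort(key=lambda r: order.get(r.get("criticality"), 3))
    return [(r["targetName"], r.get("severity"), r.get("reason")) for r in flagged]

# Invented sample rows mimicking the Output schema.
sample = [
    {"targetName": "Next.js", "status": "changed", "severity": "major",
     "criticality": "high", "reason": "new release marker detected"},
    {"targetName": "Crawlee", "status": "unchanged", "severity": None,
     "criticality": "medium", "reason": None},
]
print(rows_needing_action(sample))
```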
Output Example
```json
{
  "meta": {
    "generatedAt": "2026-04-04T14:33:02.189Z",
    "now": "2026-04-04T14:33:02.182Z",
    "input": {
      "targetCount": 2,
      "surfaceCount": 6,
      "delivery": "dataset",
      "datasetMode": "changes_only",
      "diffMode": "line_summary",
      "summaryMaxLines": 12,
      "concurrency": 2,
      "batchDelayMs": 0,
      "dryRun": false
    },
    "snapshot": {
      "key": "docs-changelog-drift-monitor-local",
      "loadedFrom": "local",
      "savedTo": "local"
    },
    "warnings": [
      "surface(nextjs-release-notes): includePatterns is empty, full extracted page text will be monitored",
      "surface(nextjs-migration-guide): includePatterns is empty, full extracted page text will be monitored",
      "surface(nextjs-caching): includePatterns is empty, full extracted page text will be monitored",
      "surface(nextjs-routing): includePatterns is empty, full extracted page text will be monitored",
      "surface(crawlee-changelog): includePatterns is empty, full extracted page text will be monitored",
      "surface(crawlee-upgrade): includePatterns is empty, full extracted page text will be monitored"
    ],
    "totals": {
      "targets": 2,
      "monitoredSurfaces": 6,
      "changedTargets": 0,
      "initialTargets": 2,
      "unchangedTargets": 0,
      "partialTargets": 0,
      "errorTargets": 0,
      "actionNeededTargets": 0,
      "changedSurfaces": 0,
      "initialSurfaces": 6,
      "unchangedSurfaces": 0,
```
API Usage
Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console → Settings → Integrations.
cURL
```bash
curl -X POST "https://api.apify.com/v2/acts/taroyamada~docs-changelog-drift-monitor/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "targets": [
      {
        "id": "nextjs",
        "name": "Next.js",
        "repo": "vercel/next.js",
        "criticality": "high",
        "owner": "Frontend Platform",
        "tags": ["framework", "docs"],
        "releaseNotesUrl": "https://github.com/vercel/next.js/releases.atom",
        "migrationGuideUrl": "https://nextjs.org/docs/app/guides/upgrading/version-16",
        "docsPages": [
          {
            "id": "nextjs-caching",
            "name": "Caching Guide",
            "url": "https://nextjs.org/docs/app/guides/caching"
          }
        ]
      }
    ],
    "delivery": "dataset",
    "datasetMode": "changes_only",
    "snapshotKey": "docs-changelog-drift-nextjs",
    "diffMode": "line_summary",
    "summaryMaxLines": 12,
    "concurrency": 2
  }'
```
Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("taroyamada/docs-changelog-drift-monitor").call(run_input={
    "targets": [
        {
            "id": "nextjs",
            "name": "Next.js",
            "repo": "vercel/next.js",
            "criticality": "high",
            "owner": "Frontend Platform",
            "tags": ["framework", "docs"],
            "releaseNotesUrl": "https://github.com/vercel/next.js/releases.atom",
            "migrationGuideUrl": "https://nextjs.org/docs/app/guides/upgrading/version-16",
            "docsPages": [
                {
                    "id": "nextjs-caching",
                    "name": "Caching Guide",
                    "url": "https://nextjs.org/docs/app/guides/caching"
                }
            ]
        }
    ],
    "delivery": "dataset",
    "datasetMode": "changes_only",
    "snapshotKey": "docs-changelog-drift-nextjs",
    "diffMode": "line_summary",
    "summaryMaxLines": 12,
    "concurrency": 2,
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
```
JavaScript / Node.js
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('taroyamada/docs-changelog-drift-monitor').call({
    "targets": [
        {
            "id": "nextjs",
            "name": "Next.js",
            "repo": "vercel/next.js",
            "criticality": "high",
            "owner": "Frontend Platform",
            "tags": ["framework", "docs"],
            "releaseNotesUrl": "https://github.com/vercel/next.js/releases.atom",
            "migrationGuideUrl": "https://nextjs.org/docs/app/guides/upgrading/version-16",
            "docsPages": [
                {
                    "id": "nextjs-caching",
                    "name": "Caching Guide",
                    "url": "https://nextjs.org/docs/app/guides/caching"
                }
            ]
        }
    ],
    "delivery": "dataset",
    "datasetMode": "changes_only",
    "snapshotKey": "docs-changelog-drift-nextjs",
    "diffMode": "line_summary",
    "summaryMaxLines": 12,
    "concurrency": 2
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```
Tips & Limitations
- Run nightly as part of your supply-chain monitoring to catch new vulnerabilities early.
- Pair with oss-vulnerability-monitor for CVE coverage layered on top of version tracking.
- For monorepos, run per-package rather than recursing — easier to triage alerts by team owner.
- Use snapshotKey to persist state between runs and only alert on diffs.
- Webhook delivery supports JSON payloads — pipe into your existing on-call routing.
FAQ
Is my build slowed down?
This actor runs on Apify infrastructure, not your CI runners. No impact on build times.
What's the freshness of data?
Depends on the source registry — typically 5–60 minutes behind upstream.
Can I filter by package ecosystem?
Yes — most DevOps actors accept an ecosystem or package-manager filter in their input schema.
Does this work with private registries?
No — this actor targets public registries (npm, PyPI, crates.io, etc.). Private registries require credential handling that's out of scope.
Can I integrate with GitHub Actions?
Yes — call this actor via Apify API inside a workflow job, parse the JSON output, and fail the build on threshold violations.
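A sketch of that GitHub Actions pattern (a hypothetical workflow; the input file name, jq filter, and failure condition are illustrative, and the Apify token is read from a repository secret):

```yaml
name: docs-drift-check
on:
  schedule:
    - cron: "0 6 * * *"   # daily at 06:00 UTC
jobs:
  drift:
    runs-on: ubuntu-latest
    steps:
      - name: Run actor and fetch results
        run: |
          curl -sf -X POST \
            "https://api.apify.com/v2/acts/taroyamada~docs-changelog-drift-monitor/run-sync-get-dataset-items?token=${{ secrets.APIFY_TOKEN }}" \
            -H "Content-Type: application/json" \
            -d @drift-input.json > results.json
      - name: Fail the job if any target changed
        run: |
          test "$(jq '[.[] | select(.status == "changed")] | length' results.json)" -eq 0
```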
Related Actors
DevOps & Tech Intel cluster — explore related Apify tools:
- 🌐 DNS Propagation Checker — Check DNS propagation across 8 global resolvers (Google, Cloudflare, Quad9, OpenDNS).
- 🔍 Subdomain Finder — Discover subdomains for any domain using Certificate Transparency logs (crt.sh).
- 🧹 CSV Data Cleaner — Clean CSV data: trim whitespace, remove empty rows, deduplicate by columns, sort.
- 📦 NPM Package Analyzer — Analyze npm packages: download stats, dependencies, licenses, deprecation status.
- 💬 Reddit Scraper — Scrape Reddit posts and comments from any subreddit via official JSON API.
- GitHub Release & Changelog Monitor API — Track GitHub releases, tags, release notes, and changelog drift over time with one summary-first repository row per repo.
- Tech Events Calendar API | Conferences + CFP — Aggregate tech conferences and CFPs across multiple sources into a deduplicated event calendar for DevRel and recruiting workflows.
- 🔒 OSS Vulnerability Monitor — Monitor open-source packages for known security vulnerabilities using OSV and GitHub Security Advisories.
Cost
Pay Per Event:
- actor-start: $0.01 (flat fee per run)
- dataset-item: $0.003 per output item
Example: 1,000 items = $0.01 + (1,000 × $0.003) = $3.01
No subscription required — you only pay for what you use.
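The example above follows directly from the pay-per-event formula; a tiny helper (illustrative only, not part of the actor) makes it reusable for budgeting:

```python
def run_cost(items, actor_start=0.01, per_item=0.003):
    """Pay-per-event cost: flat actor-start fee plus a per-dataset-item charge."""
    return actor_start + items * per_item

print(f"${run_cost(1000):.2f}")  # 1,000 items -> $3.01, matching the example above
```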
⭐ Was this helpful?
If this actor saved you time, please leave a ★ rating on Apify Store. It takes 10 seconds, helps other developers discover it, and keeps updates free.
Bug report or feature request? Open an issue on the Issues tab of this actor.