RFC Editor Index Scraper

Pricing

from $11.00 / 1,000 result items

Export RFC documents from the RFC Editor index. Query 9,000+ Internet standards by RFC number, status, stream, or title keyword. Pull title, authors, status, stream, publish date, abstract, format URLs, obsoletes, updates.

Rating: 0.0 (0)

Developer: ParseForge (Maintained by Community)

Actor stats: 0 bookmarked · 2 total users · 1 monthly active user · last modified 2 days ago

📜 RFC Editor Index Scraper

🚀 Export the full IETF standards catalog in seconds. Pull 9,700+ RFCs with titles, authors, status, stream, abstracts, and obsoletes/updates relationships. No login, no manual XML parsing, no broken anchor links.

🕒 Last updated: 2026-05-22 · 📊 23 fields per record · 📜 9,700+ RFCs · 🌐 6 streams · 🏷️ 8 status tiers

The RFC Editor Scraper exports the canonical Internet standards index and returns 23 fields per record, including RFC number, title, full author list, publication status, document stream, publish date, page count, abstract, DOI, keywords, obsoletes and updates relationships, errata flags, and direct links to HTML, PDF, TXT, and JSON formats. The underlying index is the authoritative catalog of every published Request for Comments since RFC 1 in 1969.

The catalog spans 9,700+ documents, six document streams (IETF, IAB, IRTF, Independent, Editorial, Legacy), and eight publication status tiers. This Actor turns the official document index into a downloadable dataset as CSV, Excel, JSON, or XML in under five minutes. Three query modes let you pull by RFC number, by keyword, or by status and stream.

| 🎯 Target Audience | 💡 Primary Use Cases |
| --- | --- |
| Protocol developers, network engineers, security researchers, standards bodies, technical writers, library scientists | Spec auditing, obsoletes-chain tracing, BCP discovery, citation management, internal knowledge bases, compliance checklists |

📋 What the RFC Editor Scraper does

Three retrieval workflows in a single run:

  • 🎯 Direct lookup. Pass a list of RFC numbers like [791, 2616, 9110] and get those records.
  • 🔎 Keyword search. Filter the index by title keyword (case-insensitive).
  • 🏷️ Status & stream filters. Restrict to Internet Standards, Proposed Standards, Best Current Practices, or any of the six streams.

Each record bundles identifiers (RFC ID, number, DOI), publishing metadata (status, stream, publish date, page count), authorship, full abstract text, relationship chains (obsoletes, obsoletedBy, updates, updatedBy), keyword tags, errata flag, and download links to HTML, PDF, plain text, and JSON formats.

💡 Why it matters: the official index is split across XML files, sub-series indexes, and a website with no bulk export. Tracing which RFC obsoletes another, or finding every BCP in the IETF stream, normally means writing parsers against undocumented XML. This Actor returns a clean structured dataset in one run.
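As a rough illustration of the obsoletes-chain tracing mentioned above, the sketch below follows `obsoletedBy` links until it reaches RFCs that nothing has replaced. It assumes `obsoletedBy` holds plain RFC numbers (mirroring the `obsoletes` example in the schema) and uses a small hand-written subset of records reduced to two fields; `current_successors` is a hypothetical helper, not part of the Actor.

```python
from collections import deque

# Hand-written subset of records, reduced to the two fields used here.
# Assumption: obsoletedBy is a list of RFC numbers.
records = [
    {"number": 2616, "obsoletedBy": [7230, 7231, 7232, 7233, 7234, 7235]},
    {"number": 7230, "obsoletedBy": [9110, 9112]},
    {"number": 7231, "obsoletedBy": [9110]},
    {"number": 7232, "obsoletedBy": [9110]},
    {"number": 7233, "obsoletedBy": [9110]},
    {"number": 7234, "obsoletedBy": [9111]},
    {"number": 7235, "obsoletedBy": [9110]},
    {"number": 9110, "obsoletedBy": []},
    {"number": 9111, "obsoletedBy": []},
    {"number": 9112, "obsoletedBy": []},
]


def current_successors(start, records):
    """Follow obsoletedBy links breadth-first; return the RFCs nothing has obsoleted."""
    by_number = {r["number"]: r for r in records}
    seen = {start}
    queue = deque([start])
    live = set()
    while queue:
        number = queue.popleft()
        # An RFC missing from the dataset is treated as a leaf.
        successors = by_number.get(number, {}).get("obsoletedBy", [])
        if not successors:
            live.add(number)
            continue
        for nxt in successors:
            if nxt not in seen:  # guard against revisiting shared successors
                seen.add(nxt)
                queue.append(nxt)
    return sorted(live)


print(current_successors(2616, records))  # [9110, 9111, 9112]
```

The `seen` set matters because several RFCs in a chain often point at the same successor, so a naive recursion would visit it repeatedly.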


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.


⚙️ Input

| Input | Type | Default | Behavior |
| --- | --- | --- | --- |
| maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000. |
| rfcNumbers | array | [] | Optional list of RFC numbers to fetch directly. Takes precedence over filters. |
| searchQuery | string | "" | Keyword filter against the title (case-insensitive). |
| status | enum | "" | One of 8 publication status tiers. |
| stream | enum | "" | One of 6 document streams: IETF, IAB, IRTF, Independent, Editorial, Legacy. |

Example: HTTP-related standards in the IETF stream.

{
  "maxItems": 50,
  "searchQuery": "HTTP",
  "stream": "IETF"
}

Example: fetch IP, HTTP, and TLS foundational specs.

{
  "rfcNumbers": [791, 9110, 8446]
}

⚠️ Good to Know: the obsoletes/updates fields reflect the catalog at fetch time. If you build a long-running RFC tracker, schedule a daily refresh so newly published documents and re-classified entries flow through. Status names follow the canonical RFC Editor vocabulary, not informal shorthand.
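For a long-running tracker, one way to use the daily refresh is to compare yesterday's records with today's. The sketch below is only an illustration: `diff_snapshots` is a hypothetical helper, the two snapshots are hand-written stand-ins for two runs, and the watched field names are taken from the schema in this document.

```python
def diff_snapshots(previous, current):
    """Compare two runs and report new RFC numbers plus records whose watched fields changed."""
    watched_fields = ("status", "obsoletedBy", "updatedBy")
    old = {r["number"]: r for r in previous}
    new_rfcs, changed = [], []
    for record in current:
        before = old.get(record["number"])
        if before is None:
            new_rfcs.append(record["number"])
            continue
        fields = [f for f in watched_fields if before.get(f) != record.get(f)]
        if fields:
            changed.append((record["number"], fields))
    return sorted(new_rfcs), sorted(changed)


# Hand-written stand-ins for two consecutive runs (not real Actor output).
yesterday = [
    {"number": 7230, "status": "PROPOSED STANDARD", "obsoletedBy": [], "updatedBy": []},
]
today = [
    {"number": 7230, "status": "PROPOSED STANDARD", "obsoletedBy": [9110, 9112], "updatedBy": []},
    {"number": 9110, "status": "INTERNET STANDARD", "obsoletedBy": [], "updatedBy": []},
]

new_rfcs, changed = diff_snapshots(yesterday, today)
print(new_rfcs)  # [9110]
print(changed)   # [(7230, ['obsoletedBy'])]
```

Keying both snapshots by `number` keeps the comparison independent of the order in which records were returned.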


📊 Output

Each RFC record contains 23 fields. Download the dataset as CSV, Excel, JSON, or XML.

🧾 Schema

| Field | Type | Example |
| --- | --- | --- |
| 🆔 rfcId | string | "RFC9110" |
| 🔢 number | number | 9110 |
| 📝 title | string | "HTTP Semantics" |
| 👥 authors | array | ["R. Fielding, Ed.", "M. Nottingham, Ed.", "J. Reschke, Ed."] |
| 🏷️ status | string | "INTERNET STANDARD" |
| 🌐 stream | string | "IETF" |
| 📅 publishDate | string | "June 2022" |
| 📄 pages | number | 203 |
| 📖 abstract | string | "The Hypertext Transfer Protocol (HTTP) is a stateless..." |
| 🔖 doi | string | "10.17487/RFC9110" |
| 🏷️ keywords | array | ["http", "semantics", "messaging"] |
| 🔁 obsoletes | array | [2818, 7230, 7231, 7232, 7233, 7235, 7538, 7615, 7694] |
| 🚫 obsoletedBy | array | [] |
| 📝 updates | array | [3864] |
| 🔄 updatedBy | array | [] |
| 🔗 seeAlso | array | ["STD97"] |
| 📦 formats | array | ["ASCII", "HTML", "PDF", "XML"] |
| 🌐 htmlUrl | string | "https://www.rfc-editor.org/rfc/rfc9110.html" |
| 📕 pdfUrl | string | "https://www.rfc-editor.org/rfc/rfc9110.pdf" |
| 📃 txtUrl | string | "https://www.rfc-editor.org/rfc/rfc9110.txt" |
| 🧾 jsonUrl | string | "https://www.rfc-editor.org/rfc/rfc9110.json" |
| ⚠️ errata | boolean | true |
| 🕒 scrapedAt | ISO 8601 | "2026-05-22T00:00:00.000Z" |
| ⚠️ error | string or null | null |


✨ Why choose this Actor

  • 📜 Full catalog. Every RFC published since 1969, from RFC 1 through the latest release, in one queryable dataset.
  • 🔁 Relationship chains. Obsoletes, obsoletedBy, updates, and updatedBy fields ready for graph traversal.
  • 🏷️ Stream and status filters. Slice the index by IETF, IRTF, IAB, Independent, Editorial, or Legacy streams and by 8 status tiers.
  • 📦 All formats linked. Direct download URLs for HTML, PDF, TXT, and JSON renderings of each document.
  • 🔖 DOI included. Citable identifiers for academic papers and bibliography managers.
  • 🔁 Always fresh. Every run reads the live index so newly published RFCs flow through automatically.
  • 🚫 No authentication. Public standards catalog. No login, no token.

📊 RFCs are the operating system of the Internet. Knowing which spec obsoletes which is the difference between a working implementation and a security advisory.


📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Filters | Setup |
| --- | --- | --- | --- | --- | --- |
| ⭐ RFC Editor Scraper (this Actor) | $5 free credit, then pay-per-use | 9,700+ RFCs | Live per run | numbers, keyword, status, stream | ⚡ 2 min |
| Raw XML index parsing | Free | Full | Live | None | 🐢 Days |
| GitHub mirror of rfc-editor-data | Free | Full but stale | Weekly | None | 🕒 Variable |
| Commercial standards database | $500+/year | RFCs + ISO + ITU | Daily | Many | ⏳ Hours |

Pick this Actor when you want clean structured RFC records, relationship chains, and zero parsing work.


🚀 How to use

  1. 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. 🌐 Open the Actor. Go to the RFC Editor Index Scraper page on the Apify Store.
  3. 🎯 Set input. Pass RFC numbers directly, or filter by keyword, status, and stream.
  4. 🚀 Run it. Click Start and let the Actor collect your dataset.
  5. 📥 Download. Grab results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.


💼 Business use cases

🛠️ Protocol Engineering

  • Trace obsoletes chains before shipping a new implementation
  • Discover every BCP relevant to a given technology
  • Track newly published RFCs against an internal feature matrix
  • Build a private knowledge base of standards your team relies on

🔐 Security & Compliance

  • Cross-reference CVE advisories with the underlying RFC text
  • Audit deprecated specs that are still in production
  • Build TLS, OAuth, and DNS hardening checklists from primary sources
  • Map vendor disclosures back to the affected standards

📊 Research & Analytics

  • Quantify standards evolution by year and stream
  • Build citation networks across the RFC corpus
  • Compute author productivity and co-authorship graphs
  • Track how long a PROPOSED STANDARD takes to reach INTERNET STANDARD
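
As a minimal sketch of the "by year and stream" analysis above, the snippet below counts records per (year, stream) pair. It assumes `publishDate` looks like "June 2022", as in the schema example, and that month names parse under the default English locale; the helper names are hypothetical and the four records are a hand-written subset.

```python
from collections import Counter
from datetime import datetime


def publish_year(publish_date):
    """Parse a publishDate such as "June 2022"; return the year, or None if it does not parse."""
    try:
        return datetime.strptime(publish_date, "%B %Y").year
    except (TypeError, ValueError):
        return None


def count_by_year_and_stream(records):
    """Count records per (year, stream), skipping records whose date cannot be parsed."""
    counts = Counter()
    for record in records:
        year = publish_year(record.get("publishDate"))
        if year is not None:
            counts[(year, record.get("stream"))] += 1
    return counts


# Hand-written subset reduced to the fields used here.
records = [
    {"number": 9110, "publishDate": "June 2022", "stream": "IETF"},
    {"number": 9111, "publishDate": "June 2022", "stream": "IETF"},
    {"number": 8446, "publishDate": "August 2018", "stream": "IETF"},
    {"number": 791, "publishDate": "September 1981", "stream": "Legacy"},
]

counts = count_by_year_and_stream(records)
print(counts[(2022, "IETF")])  # 2
```

Records with an unexpected date format are skipped rather than guessed at, so it is worth checking how many were dropped before drawing conclusions from the counts.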

📚 Technical Writing & Education

  • Generate teaching materials with verified spec quotes
  • Build a syllabus around foundational networking RFCs
  • Auto-fill bibliography entries with DOI and authors
  • Power a doc-search experience over the RFC corpus

🔌 Automating RFC Editor Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

  • 🟢 Node.js. Install the apify-client NPM package.
  • 🐍 Python. Use the apify-client PyPI package.
  • 📚 See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor on any cron interval. Daily or weekly refreshes keep an internal RFC mirror up to date without manual intervention.
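
A programmatic run with the Python client might look roughly like the sketch below. The Actor ID is a guess based on the names on this page and should be replaced with the one shown on the Store listing; `build_run_input` and `fetch_records` are hypothetical helpers. The input keys come from the Input table, and the client calls (`ApifyClient`, `actor(...).call`, `dataset(...).iterate_items`) are from the apify-client package.

```python
# Hypothetical Actor ID; copy the real one from the Actor's page on the Apify Store.
ACTOR_ID = "parseforge/rfc-editor-index-scraper"


def build_run_input(rfc_numbers=None, search_query="", status="", stream="", max_items=10):
    """Build the input object from the Input table, leaving out empty filters."""
    run_input = {"maxItems": max_items}
    if rfc_numbers:
        # Per the Input table, rfcNumbers takes precedence over the filters.
        run_input["rfcNumbers"] = list(rfc_numbers)
    if search_query:
        run_input["searchQuery"] = search_query
    if status:
        run_input["status"] = status
    if stream:
        run_input["stream"] = stream
    return run_input


def fetch_records(run_input, token):
    """Run the Actor once and return its dataset items as a list of dicts."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


print(build_run_input(search_query="HTTP", stream="IETF", max_items=50))
```

To execute a run, pass the built input and an API token to `fetch_records`, for example with the token read from an environment variable. The import sits inside the function so that building inputs does not require the client to be installed.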


🌟 Beyond business use cases

Standards data powers more than commercial workflows. The same records support research, education, civic projects, and personal initiatives.

πŸŽ“ Research and academia

  • Citation networks across the IETF corpus
  • Standards-evolution studies for computer science theses
  • Reproducible bibliographies with cited dataset pulls
  • Open-knowledge contributions to networking education

🎨 Personal and creative

  • Build a personal RFC reading list
  • Hobby projects on protocol visualization
  • Side projects exploring obsoletes graphs
  • Curated newsletters on newly published standards

🤝 Non-profit and civic

  • Public-interest tooling around open Internet standards
  • Civic transparency about protocol governance
  • Investigative journalism on protocol vulnerabilities
  • Volunteer documentation projects

🧪 Experimentation

  • Train spec-summarization or QA models
  • Validate documentation hypotheses against real RFC text
  • Prototype agent pipelines that answer spec questions
  • Seed graph databases with relationship chains


❓ Frequently Asked Questions

🧩 How does it work?

Pass RFC numbers directly or set keyword, status, and stream filters. The Actor reads the live RFC index, normalizes each entry, and emits one record per RFC. No browser automation, no captchas, no setup.

📐 How accurate are the relationship chains?

The obsoletes, obsoletedBy, updates, and updatedBy lists come straight from the official index. When a new RFC re-classifies an older one, the change flows through on your next run.

🔁 How often is the catalog refreshed?

The RFC Editor publishes updates as documents move through the process. Every run of this Actor fetches the latest index so your dataset reflects the current state.

🌐 What streams are supported?

All six: IETF, IAB, IRTF, Independent, Editorial, and Legacy. Filter to one stream or pull the whole catalog.

🔗 Does each record link to the downloadable formats?

Yes. Every record includes URLs for the HTML, PDF, plain text, and JSON renderings of the document.

🔖 Can I use the DOI to cite RFCs in academic papers?

Yes. The DOI field gives you a citable identifier that resolves directly to the canonical RFC page and works with reference managers like Zotero and Mendeley.
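
For reference managers that import BibTeX, a record can be turned into an entry along these lines. This is a sketch under assumptions: `to_bibtex` is a hypothetical helper, the `@misc` layout is one common convention rather than an official one, and the record is cut down from the schema example. It needs Python 3.9+ for `str.removesuffix`.

```python
def to_bibtex(record):
    """Render one scraper record as a BibTeX @misc entry (a sketch, not a full exporter)."""
    # "R. Fielding, Ed." -> "R. Fielding": a comma inside a BibTeX name reads as "Last, First".
    authors = " and ".join(a.removesuffix(", Ed.") for a in record["authors"])
    # publishDate looks like "June 2022"; split off the trailing year.
    month, _, year = record["publishDate"].rpartition(" ")
    fields = [
        ("author", authors),
        ("title", "{" + record["title"] + "}"),  # extra braces keep the capitalization
        ("howpublished", f"RFC {record['number']}"),
        ("year", year),
        ("month", month),
        ("doi", record["doi"]),
        ("url", "https://doi.org/" + record["doi"]),
    ]
    # Empty values (for example a missing month) are left out of the entry.
    body = ",\n".join(f"  {key} = {{{value}}}" for key, value in fields if value)
    return f"@misc{{rfc{record['number']},\n{body}\n}}"


# Cut down from the schema example in this document.
record = {
    "number": 9110,
    "title": "HTTP Semantics",
    "authors": ["R. Fielding, Ed.", "M. Nottingham, Ed.", "J. Reschke, Ed."],
    "publishDate": "June 2022",
    "doi": "10.17487/RFC9110",
}

print(to_bibtex(record))
```

The editor marker is dropped here for simplicity; a fuller exporter might map those names to an `editor` field instead.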

💼 Can I use this data commercially?

Yes. RFCs are public-domain or BCP-licensed depending on the stream and era. You are responsible for complying with any specific document's IPR statement.

💳 Do I need a paid Apify plan to use this Actor?

No. The free Apify plan is enough for testing and small pulls (10 records per run). A paid plan lifts the limit and gives you access to scheduling, higher concurrency, and larger datasets.

🔁 What happens if a run fails or gets interrupted?

Apify automatically retries transient errors. If a run still fails, inspect the log in the Runs tab, fix the input, and re-run. Partial datasets from failed runs are preserved so you never lose progress.

⚠️ Does it flag RFCs that have errata?

Yes. The errata boolean is set when the document has at least one reported erratum. Cross-check the RFC Editor errata page for the full report list.

🆘 What if I need help?

Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.


🔌 Integrate with any app

RFC Editor Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get run notifications in your channels
  • Airbyte - Pipe RFC records into your warehouse
  • GitHub - Trigger runs from commits and releases
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Update an internal standards mirror, or alert your team in Slack when a watched RFC is obsoleted.


💡 Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.


🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by the RFC Editor, the IETF, or any of its contributors. All trademarks mentioned are the property of their respective owners. Only publicly available open standards data is collected.