RFC Editor Index Scraper
Pricing
from $11.00 / 1,000 result items
Export RFC documents from the RFC Editor index. Query 9,000+ Internet standards by RFC number, status, stream, or title keyword. Pull title, authors, status, stream, publish date, abstract, format URLs, obsoletes, updates.
Developer
ParseForge
Export the full IETF standards catalog in seconds. Pull 9,700+ RFCs with titles, authors, status, stream, abstracts, and obsoletes/updates relationships. No login, no manual XML parsing, no broken anchor links.
Last updated: 2026-05-22 · 23 fields per record · 9,700+ RFCs · 6 streams · 8 status tiers
The RFC Editor Scraper exports the canonical Internet standards index and returns 23 fields per record, including RFC number, title, full author list, publication status, document stream, publish date, page count, abstract, DOI, keywords, obsoletes and updates relationships, errata flags, and direct links to HTML, PDF, TXT, and JSON formats. The underlying index is the authoritative catalog of every published Request for Comments since RFC 1 in 1969.
The catalog spans 9,700+ documents, six document streams (IETF, IAB, IRTF, Independent, Editorial, Legacy), and eight publication status tiers. This Actor turns the official document index into a downloadable dataset as CSV, Excel, JSON, or XML in under five minutes. Three query modes let you pull by RFC number, by keyword, or by status and stream.
| Target Audience | Primary Use Cases |
|---|---|
| Protocol developers, network engineers, security researchers, standards bodies, technical writers, library scientists | Spec auditing, obsoletes-chain tracing, BCP discovery, citation management, internal knowledge bases, compliance checklists |
What the RFC Editor Scraper does
Three retrieval workflows in a single run:
- Direct lookup. Pass a list of RFC numbers like `[791, 2616, 9110]` and get those records.
- Keyword search. Filter the index by title or abstract keyword (case-insensitive).
- Status & stream filters. Restrict to Internet Standards, Proposed Standards, Best Current Practices, or any of the six streams.
Each record bundles identifiers (RFC ID, number, DOI), publishing metadata (status, stream, publish date, page count), authorship, full abstract text, relationship chains (obsoletes, obsoletedBy, updates, updatedBy), keyword tags, errata flag, and download links to HTML, PDF, plain text, and JSON formats.
Why it matters: the official index is split across XML files, sub-series indexes, and a website with no bulk export. Tracing which RFC obsoletes another, or finding every BCP in the IRTF stream, normally means writing parsers against undocumented XML. This Actor returns a clean structured dataset in one run.
Full Demo
Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.
Input
| Input | Type | Default | Behavior |
|---|---|---|---|
| `maxItems` | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000. |
| `rfcNumbers` | array | `[]` | Optional list of RFC numbers to fetch directly. Takes precedence over filters. |
| `searchQuery` | string | `""` | Keyword filter against the title (case-insensitive). |
| `status` | enum | `""` | One of 8 publication status tiers. |
| `stream` | enum | `""` | One of 6 document streams: IETF, IAB, IRTF, Independent, Editorial, Legacy. |
Example: HTTP-related standards in the IETF stream.
{"maxItems": 50,"searchQuery": "HTTP","stream": "IETF"}
Example: fetch IP, HTTP, and TLS foundational specs.
{"rfcNumbers": [791, 9110, 8446]}
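The two inputs above can also be assembled in code before a run. What follows is a minimal sketch, not the Actor's own validation logic: `build_run_input` is a hypothetical helper, and the spellings in `STREAMS` are assumed to match the stream names listed in the input table.

```python
from __future__ import annotations

from typing import Any

# Stream names as listed in the input table. Whether the Actor accepts
# exactly these spellings is an assumption.
STREAMS = {"IETF", "IAB", "IRTF", "Independent", "Editorial", "Legacy"}


def build_run_input(
    max_items: int = 10,
    rfc_numbers: list[int] | None = None,
    search_query: str = "",
    status: str = "",
    stream: str = "",
) -> dict[str, Any]:
    """Assemble an input object, leaving out filters that are not set."""
    if max_items < 1:
        raise ValueError("maxItems must be at least 1")
    if stream and stream not in STREAMS:
        raise ValueError(f"unknown stream: {stream!r}")

    run_input: dict[str, Any] = {"maxItems": max_items}
    if rfc_numbers:
        # Direct lookup takes precedence over filters, so send only the
        # numbers (de-duplicated, original order kept).
        run_input["rfcNumbers"] = list(dict.fromkeys(rfc_numbers))
        return run_input
    if search_query:
        run_input["searchQuery"] = search_query
    if status:
        run_input["status"] = status
    if stream:
        run_input["stream"] = stream
    return run_input


print(build_run_input(max_items=50, search_query="HTTP", stream="IETF"))
```

Dropping the filters when `rfcNumbers` is present mirrors the documented precedence rule, so the object sent to the Actor never carries settings that would be ignored.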
Good to Know: the obsoletes/updates fields reflect the catalog at fetch time. If you build a long-running RFC tracker, schedule a daily refresh so newly published documents and re-classified entries flow through. Status names follow the canonical RFC Editor vocabulary, not informal shorthand.
Output
Each RFC record contains 23 fields. Download the dataset as CSV, Excel, JSON, or XML.
Schema
| Field | Type | Example |
|---|---|---|
| `rfcId` | string | "RFC9110" |
| `number` | number | 9110 |
| `title` | string | "HTTP Semantics" |
| `authors` | array | ["R. Fielding, Ed.", "M. Nottingham, Ed.", "J. Reschke, Ed."] |
| `status` | string | "INTERNET STANDARD" |
| `stream` | string | "IETF" |
| `publishDate` | string | "June 2022" |
| `pages` | number | 203 |
| `abstract` | string | "The Hypertext Transfer Protocol (HTTP) is a stateless..." |
| `doi` | string | "10.17487/RFC9110" |
| `keywords` | array | ["http", "semantics", "messaging"] |
| `obsoletes` | array | [2818, 7230, 7231, 7232, 7233, 7235, 7538, 7615, 7694] |
| `obsoletedBy` | array | [] |
| `updates` | array | [3864] |
| `updatedBy` | array | [] |
| `seeAlso` | array | ["STD97"] |
| `formats` | array | ["ASCII", "HTML", "PDF", "XML"] |
| `htmlUrl` | string | "https://www.rfc-editor.org/rfc/rfc9110.html" |
| `pdfUrl` | string | "https://www.rfc-editor.org/rfc/rfc9110.pdf" |
| `txtUrl` | string | "https://www.rfc-editor.org/rfc/rfc9110.txt" |
| `jsonUrl` | string | "https://www.rfc-editor.org/rfc/rfc9110.json" |
| `errata` | boolean | true |
| `scrapedAt` | ISO 8601 | "2026-05-22T00:00:00.000Z" |
| `error` | string or null | null |
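Because `obsoletedBy` is a list of RFC numbers, the relationship fields can be walked as a graph. The sketch below shows one way to find the documents that currently stand in for an older RFC, assuming records shaped like the schema above. `current_successors` is a hypothetical helper, and the sample records use made-up numbers rather than real relationships.

```python
from collections import deque


def current_successors(number, records_by_number):
    """Follow obsoletedBy links until reaching RFCs that nothing obsoletes.

    Numbers missing from the dataset are returned as-is, since their
    own status cannot be checked.
    """
    seen = {number}
    queue = deque([number])
    leaves = set()
    while queue:
        current = queue.popleft()
        record = records_by_number.get(current)
        replaced_by = record["obsoletedBy"] if record else []
        if not replaced_by:
            leaves.add(current)
            continue
        for successor in replaced_by:
            if successor not in seen:  # guard against cycles in bad data
                seen.add(successor)
                queue.append(successor)
    return sorted(leaves)


# Toy records with made-up numbers; only the fields used here are included.
records = [
    {"number": 1, "obsoletedBy": [2, 3]},
    {"number": 2, "obsoletedBy": [4]},
    {"number": 3, "obsoletedBy": [4]},
    {"number": 4, "obsoletedBy": []},
]
by_number = {r["number"]: r for r in records}

print(current_successors(1, by_number))  # [4]
print(current_successors(4, by_number))  # [4]
```

An RFC that has not been obsoleted is its own successor, which keeps the return type uniform for callers that map the function over a whole dataset.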
Sample records
Why choose this Actor
| Capability | Description |
|---|---|
| Full catalog | Every RFC published since 1969, from RFC 1 through the latest release, in one queryable dataset. |
| Relationship chains | Obsoletes, obsoletedBy, updates, and updatedBy fields ready for graph traversal. |
| Stream and status filters | Slice the index by IETF, IRTF, IAB, Independent, Editorial, or Legacy streams and by 8 status tiers. |
| All formats linked | Direct download URLs for HTML, PDF, TXT, and JSON renderings of each document. |
| DOI included | Citable identifiers for academic papers and bibliography managers. |
| Always fresh | Every run reads the live index so newly published RFCs flow through automatically. |
| No authentication | Public standards catalog. No login, no token. |
RFCs are the operating system of the Internet. Knowing which spec obsoletes which is the difference between a working implementation and a security advisory.
How it compares to alternatives
| Approach | Cost | Coverage | Refresh | Filters | Setup |
|---|---|---|---|---|---|
| RFC Editor Scraper (this Actor) | $5 free credit, then pay-per-use | 9,700+ RFCs | Live per run | numbers, keyword, status, stream | 2 min |
| Raw XML index parsing | Free | Full | Live | None | Days |
| GitHub mirror of rfc-editor-data | Free | Full but stale | Weekly | None | Variable |
| Commercial standards database | $500+/year | RFCs + ISO + ITU | Daily | Many | Hours |
Pick this Actor when you want clean structured RFC records, relationship chains, and zero parsing work.
How to use
- Sign up. Create a free account with $5 credit (takes 2 minutes).
- Open the Actor. Go to the RFC Editor Index Scraper page on the Apify Store.
- Set input. Pass RFC numbers directly, or filter by keyword, status, and stream.
- Run it. Click Start and let the Actor collect your dataset.
- Download. Grab results in the Dataset tab as CSV, Excel, JSON, or XML.
Total time from signup to downloaded dataset: 3-5 minutes. No coding required.
Business use cases
Automating RFC Editor Scraper
Control the scraper programmatically for scheduled runs and pipeline integrations:
- Node.js. Install the `apify-client` NPM package.
- Python. Use the `apify-client` PyPI package.
- See the Apify API documentation for full details.
The Apify Schedules feature lets you trigger this Actor on any cron interval. Daily or weekly refreshes keep an internal RFC mirror up to date without manual intervention.
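For the Python route, a run can be started and its dataset read back in a few lines. This is a hedged sketch rather than official usage for this Actor: it relies on the `apify-client` package's `ApifyClient.actor(...).call(...)` and `dataset(...).iterate_items()` methods, the Actor ID is a placeholder to replace with the one shown on the Actor's page, and `fetch_records` and `main` are hypothetical helper names.

```python
import os


def fetch_records(client, actor_id, run_input):
    """Start one Actor run, wait for it to finish, and return its dataset items.

    `client` is expected to behave like apify_client.ApifyClient.
    """
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the Actor run did not return a result")
    dataset = client.dataset(run["defaultDatasetId"])
    return list(dataset.iterate_items())


def main():
    # Imported here so fetch_records can be reused or tested without
    # the apify-client package installed.
    from apify_client import ApifyClient

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    # Placeholder ID: copy the real one from the Actor's page.
    actor_id = "<username>/<actor-name>"
    run_input = {"rfcNumbers": [791, 9110, 8446]}
    for record in fetch_records(client, actor_id, run_input):
        print(record["rfcId"], record["title"])
```

Calling `main()` with `APIFY_TOKEN` set in the environment runs the direct lookup from the input examples. Passing the client in as a parameter keeps `fetch_records` easy to exercise with a stub in tests.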
Beyond business use cases
Standards data powers more than commercial workflows. The same records support research, education, civic projects, and personal initiatives.
Ask an AI assistant about this scraper
Open a ready-to-send prompt about this ParseForge actor in the AI of your choice:
- ChatGPT
- Claude
- Perplexity
- Copilot
Frequently Asked Questions
How does it work?
Pass RFC numbers directly or set keyword, status, and stream filters. The Actor reads the live RFC index, normalizes each entry, and emits one record per RFC. No browser automation, no captchas, no setup.
How accurate are the relationship chains?
The obsoletes, obsoletedBy, updates, and updatedBy lists come straight from the official index. When a new RFC re-classifies an older one, the change flows through on your next run.
How often is the catalog refreshed?
The RFC Editor publishes updates as documents move through the process. Every run of this Actor fetches the latest index so your dataset reflects the current state.
What streams are supported?
All six: IETF, IAB, IRTF, Independent, Editorial, and Legacy. Filter to one stream or pull the whole catalog.
Do I get download links for each format?
Yes. Every record includes URLs for the HTML, PDF, plain text, and JSON renderings of the document.
Can I use the DOI to cite RFCs in academic papers?
Yes. The DOI field gives you a citable identifier that resolves directly to the canonical RFC page and works with reference managers like Zotero and Mendeley.
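As an illustration of the citation workflow, a record can be turned into a BibTeX entry that reference managers import. This is only a sketch: `to_bibtex` is a hypothetical helper, `@techreport` is one common convention for RFCs rather than a required one, and the code assumes `publishDate` always follows the "Month YYYY" form shown in the schema example.

```python
def to_bibtex(record):
    """Render one record as a BibTeX @techreport entry."""
    # Assumes the "Month YYYY" form from the schema example, e.g. "June 2022".
    month, _, year = record["publishDate"].partition(" ")
    # Drop the editor marker so BibTeX does not read it as part of the name.
    authors = [name.removesuffix(", Ed.") for name in record["authors"]]
    fields = {
        "author": " and ".join(authors),
        "title": "{" + record["title"] + "}",  # extra braces keep capitalization
        "type": "RFC",
        "number": str(record["number"]),
        "institution": "RFC Editor",
        "month": month,
        "year": year,
        "doi": record["doi"],
        "url": record["htmlUrl"],
    }
    body = ",\n".join(f"  {key} = {{{value}}}" for key, value in fields.items())
    return f"@techreport{{{record['rfcId'].lower()},\n{body}\n}}"


# Values taken from the schema examples; other fields are omitted.
sample = {
    "rfcId": "RFC9110",
    "number": 9110,
    "title": "HTTP Semantics",
    "authors": ["R. Fielding, Ed.", "M. Nottingham, Ed.", "J. Reschke, Ed."],
    "publishDate": "June 2022",
    "doi": "10.17487/RFC9110",
    "htmlUrl": "https://www.rfc-editor.org/rfc/rfc9110.html",
}

print(to_bibtex(sample))
```

The function uses `str.removesuffix`, so it needs Python 3.9 or later.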
Can I use this data commercially?
Yes. RFCs are public-domain or BCP-licensed depending on the stream and era. You are responsible for complying with any specific document's IPR statement.
Do I need a paid Apify plan to use this Actor?
No. The free Apify plan is enough for testing and small pulls (10 records per run). A paid plan lifts the limit and gives you access to scheduling, higher concurrency, and larger datasets.
What happens if a run fails or gets interrupted?
Apify automatically retries transient errors. If a run still fails, inspect the log in the Runs tab, fix the input, and re-run. Partial datasets from failed runs are preserved so you never lose progress.
Does it flag RFCs that have errata?
Yes. The errata boolean is set when the document has at least one reported erratum. Cross-check the RFC Editor errata page for the full report list.
What if I need help?
Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.
Integrate with any app
RFC Editor Scraper connects to any cloud service via Apify integrations:
- Make - Automate multi-step workflows
- Zapier - Connect with 5,000+ apps
- Slack - Get run notifications in your channels
- Airbyte - Pipe RFC records into your warehouse
- GitHub - Trigger runs from commits and releases
- Google Drive - Export datasets straight to Sheets
You can also use webhooks to trigger downstream actions when a run finishes. Update an internal standards mirror, or alert your team in Slack when a watched RFC is obsoleted.
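The "watched RFC is obsoleted" alert comes down to comparing two dataset snapshots. A minimal sketch of that comparison follows, assuming each snapshot is a list of records with `number` and `obsoletedBy` fields as in the schema. `newly_obsoleted` is a hypothetical helper, the snapshots are toy data with made-up numbers, and delivering the message to Slack is left to whichever integration you use.

```python
def newly_obsoleted(previous, current, watched):
    """Return {number: [new obsoleting RFCs]} for watched RFCs whose
    obsoletedBy list gained entries between two snapshots."""
    before = {r["number"]: set(r.get("obsoletedBy") or []) for r in previous}
    changes = {}
    for record in current:
        number = record["number"]
        if number not in watched:
            continue
        added = set(record.get("obsoletedBy") or []) - before.get(number, set())
        if added:
            changes[number] = sorted(added)
    return changes


# Toy snapshots with made-up numbers, standing in for two scheduled runs.
yesterday = [{"number": 10, "obsoletedBy": []}, {"number": 11, "obsoletedBy": []}]
today = [{"number": 10, "obsoletedBy": [12]}, {"number": 11, "obsoletedBy": []}]

for number, successors in newly_obsoleted(yesterday, today, watched={10, 11}).items():
    print(f"RFC {number} is now obsoleted by {successors}")
```

Diffing against the previous snapshot, instead of checking whether `obsoletedBy` is non-empty, avoids re-alerting on every run for documents that were already obsoleted.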
Recommended Actors
- GitHub Status History Scraper - GitHub uptime, incidents, and component history
- arXiv Scraper - Preprint papers across physics, math, and CS
- ClinicalTrials.gov Scraper - Registered medical trials with outcomes and sponsors
- OSF Scraper - Open Science Framework preregistrations and projects
- IP Geolocation Scraper - Bulk IPv4/IPv6 geolocation lookups
Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.
Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.
Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by the RFC Editor, the IETF, or any of its contributors. All trademarks mentioned are the property of their respective owners. Only publicly available open standards data is collected.