Public Transport Alerts Scraper: Real-time Transit Data
Pricing
$7.99/month + usage
Extract real-time service alerts, route updates, and transit delays from any public transport portal. Perfect for transit apps and commuters needing structured data on service changes, dates, and direct links. Features proxy support and fast JSON/CSV export.
Developer
Scrape Pilot
Public Transport Alerts Scraper | Service Disruptions, Delays & News from Any Transit Site
Scrape live service alerts, disruptions, delays, and notices from any public transport website, government portal, or news page, automatically classified by severity and status. Works on TfL, MTA, SNCF, Deutsche Bahn, BART, National Rail, energy exchanges, and any custom URL.
Table of Contents
- What Does This Actor Do?
- Quick Start: 3 Steps
- Why This Public Transport Alerts Scraper?
- Supported Websites
- Use Cases
- What Data You Get
- Severity & Status Auto-Detection
- 3-Layer Extraction System
- Input Parameters
- Example Input & Output
- Filtering: Keyword, Severity, Date
- Performance & Speed
- Cost Estimate
- Limitations
- Integrations
- FAQ
- Changelog
- Legal & Terms
What Does This Actor Do?
This public transport alerts scraper monitors service disruption pages, transit authority news portals, government alert feeds, and energy exchange announcement boards, and extracts structured alert records automatically from any URL you provide.
Every alert comes with:
- Service alert title: the disruption, delay, or notice headline
- Route or service name: auto-extracted from the title or domain
- Severity classification: High, Moderate, or Low (auto-detected from alert text)
- Status: Active, Planned, or Resolved (auto-detected)
- Dates: start date, end date, or date range extracted from alert text
- Alert body: full notice description where available
- Direct link: URL to the original alert page
- Source domain: which transit authority or portal published the alert
Paste in one URL or a list of transit portals. The actor parses the page, finds alert cards, classifies severity, extracts dates, and returns clean structured records, ready for dashboards, alert systems, or compliance logs.
Quick Start: 3 Steps
Step 1: Add your transit or news portal URL(s)
{"target_urls": ["https://tfl.gov.uk/travel-information/disruptions","https://www.mta.info/alerts","https://www.nationalrail.co.uk/service-disruptions"],"severity_filter": "High","max_results": 30}
Step 2: Click Run
The actor fetches each page, runs the 3-layer extraction system, auto-detects severity and status, extracts dates, and pushes structured records to the Dataset.
Step 3: Get your public transport alert data
{"route": "Jubilee Line","service_alert": "Jubilee Line: Severe delays due to signal failure at London Bridge","alert_body": "Severe delays on the Jubilee line due to a signal failure at London Bridge. GOOD SERVICE on all other lines.","dates": "2024-10-30","link": "https://tfl.gov.uk/tube/stop/940GZZLULNB/london-bridge-tube-station","severity": "High","status": "Active","source": "tfl.gov.uk","processed_at": "2024-10-30T10:15:00Z"}
Your transport alerts are in the Dataset tab. Export them as JSON, CSV, or Excel in one click.
Why This Public Transport Alerts Scraper?
| Feature | This Actor | Manual Monitoring | Generic Scrapers |
|---|---|---|---|
| Works on any transit website | Yes (any URL) | No (one site at a time) | Partial (site-specific only) |
| Auto-detects severity (High/Moderate/Low) | Yes (built-in keyword matching) | No | No |
| Auto-detects status (Active/Planned/Resolved) | Yes (built-in) | No | No |
| Auto-extracts date ranges from alert text | Yes (5 date patterns) | No | No |
| Auto-extracts route/service name | Yes (pattern + domain map) | No | No |
| 3-layer extraction (JSON-LD → HTML → links) | Yes (max coverage) | No | No |
| Multi-URL monitoring in one run | Yes (unlimited URLs) | No | Partial |
| Keyword filter | Yes (built-in) | No | No |
| Severity filter | Yes (built-in) | No | No |
| Date filter | Yes (built-in) | No | No |
| Energy exchange support (EPEX, EEX) | Yes | No | No |
| Residential proxy support | Yes (recommended) | No | Partial |
The only Apify actor that works as a universal public transport alerts scraper AND a general service disruption monitor, with automatic severity classification, status detection, and date extraction built in.
Supported Websites
This public transport alerts scraper is designed and tested to work on:
Public Transport Authorities
| Authority | Country | Website |
|---|---|---|
| TfL (Transport for London) | UK | tfl.gov.uk |
| National Rail | UK | nationalrail.co.uk |
| MTA | USA | mta.info |
| BART | USA | bart.gov |
| SNCF | France | sncf.com |
| Deutsche Bahn | Germany | bahn.de |
| Transport Ireland | Ireland | rte.ie |
Energy Exchanges & Market Portals
| Platform | Description |
|---|---|
| EPEX SPOT | European Power Exchange: market notices and system alerts |
| EEX | European Energy Exchange: trading announcements and updates |
Government & Public Service Portals
Any government or public authority website publishing service notices, disruptions, or announcements in standard HTML format.
News & Alert Portals
Any news website, industry portal, or announcement board using standard article cards, notice lists, or JSON-LD structured data.
Don't see your transit authority listed? This scraper works on any website with alert/notice content in HTML. Add the URL and run; the 3-layer extraction system handles unknown page structures automatically.
Use Cases
Real-Time Transit Disruption Monitoring
Monitor multiple public transport authority websites simultaneously. Get structured alerts for every delay, cancellation, and service disruption across your city or region, classified by severity so you can prioritize High alerts instantly.
Commuter Alert App Backend
Power a commuter alert application or internal notification system with live data scraped from official transit authority pages. Schedule hourly runs and push new High-severity alerts to users automatically via Zapier or Make.
Corporate Fleet & Travel Management
Monitor transport disruptions affecting employee commutes or business travel routes. Filter by keyword (specific line, station, or route) to track only the alerts relevant to your organization's locations.
Energy Market Alert Monitoring
Track EPEX SPOT and EEX market notices, system announcements, and trading disruptions alongside transit alerts. Useful for energy traders, grid operators, and utilities that need structured alert feeds from exchange portals.
Government & Regulatory Compliance
Scrape and archive service disruption notices from regulated public transport operators for compliance reporting, audit trails, or regulatory submissions. Each record includes a processed_at timestamp and a direct source link.
Service Quality & SLA Reporting
Collect transport service alerts across multiple operators into a single dataset. Analyze disruption frequency, severity distribution, and affected routes to produce service quality reports or SLA breach documentation.
Automated Escalation Workflows
Filter for "severity": "High" and "status": "Active" to build an automated escalation pipeline: fire Slack alerts, send SMS notifications, or create incident tickets whenever a major service disruption is detected.
Journalism & Public Information
Track transport disruptions across multiple authorities for reporting on infrastructure reliability, service quality, or regional transit performance. The structured output with dates and severity levels makes analysis straightforward.
What Data You Get
Every alert extracted by this public transport alerts scraper contains up to 9 fields:
| Field | Type | Description | Example |
|---|---|---|---|
| route | string | Auto-extracted route, line, or service name | "Jubilee Line" |
| service_alert | string | Alert headline or notice title (up to 300 chars) | "Jubilee Line: Severe delays..." |
| alert_body | string | Full alert description or notice text (up to 500 chars) | "Signal failure at London Bridge..." |
| dates | string | Date or date range extracted from alert text | "2024-10-30 to 2024-10-31" |
| link | string | Direct URL to the original alert or article | "https://tfl.gov.uk/..." |
| severity | string | Auto-classified severity level | "High" / "Moderate" / "Low" |
| status | string | Auto-detected alert status | "Active" / "Planned" / "Resolved" |
| source | string | Domain of the source website | "tfl.gov.uk" |
| processed_at | string | ISO 8601 timestamp of extraction | "2024-10-30T10:15:00Z" |
Severity & Status Auto-Detection
Every alert is automatically classified without manual tagging: the actor analyzes the alert title and body text using keyword matching.
Severity Classification
| Level | Keywords That Trigger It | What It Means |
|---|---|---|
| High | emergency, critical, cancelled, suspended, shutdown, failure, disruption, outage, closure, major, severe, force majeure | Serious impact, immediate attention required |
| Moderate | delay, delayed, maintenance, diversion, detour, modification, partial, limited, restricted, congestion | Partial impact, service running but affected |
| Low | advisory, information, notice, reminder, planned, scheduled, minor, improvement, announcement | Informational, minimal or no service impact |
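As a rough illustration of keyword-based severity classification, the sketch below uses the keyword lists from the table. The function and constant names are hypothetical, and two details are assumptions not stated in this document: levels are checked from most to least severe, and a keyword must appear at the start of a word. The "Low" default for unmatched text follows the behaviour described under Limitations.

```javascript
// Keyword lists copied from the severity table.
const SEVERITY_KEYWORDS = {
  High: ["emergency", "critical", "cancelled", "suspended", "shutdown", "failure",
    "disruption", "outage", "closure", "major", "severe", "force majeure"],
  Moderate: ["delay", "delayed", "maintenance", "diversion", "detour", "modification",
    "partial", "limited", "restricted", "congestion"],
  Low: ["advisory", "information", "notice", "reminder", "planned", "scheduled",
    "minor", "improvement", "announcement"],
};

// Assumption: a keyword counts when it starts a word, so "delay" also matches "delays".
const hasKeyword = (text, keyword) => new RegExp(`\\b${keyword}`, "i").test(text);

// Assumption: the most severe matching level wins; unmatched text falls back to "Low".
function classifySeverity(title, body = "") {
  const text = `${title} ${body}`;
  for (const level of ["High", "Moderate", "Low"]) {
    if (SEVERITY_KEYWORDS[level].some((kw) => hasKeyword(text, kw))) return level;
  }
  return "Low";
}

console.log(classifySeverity("Jubilee Line: Severe delays due to signal failure")); // "High"
```

Checking title and body together mirrors the approach described above, where an urgent body can raise the severity of a neutral-sounding title.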
Status Detection
| Status | Keywords That Trigger It |
|---|---|
| Resolved | completed, resolved, restored, ended, finished |
| Planned | planned, upcoming, future, scheduled for |
| Active | Default: all alerts not matching Resolved or Planned |
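Status detection can be sketched the same way. The names below are hypothetical, and the check order (Resolved before Planned) and word-start matching are assumptions. Word-start matching matters here: a plain substring test would find "ended" inside "suspended" and mislabel an active suspension as Resolved.

```javascript
// Keyword lists copied from the status table; order of checks is an assumption.
const STATUS_KEYWORDS = [
  ["Resolved", ["completed", "resolved", "restored", "ended", "finished"]],
  ["Planned", ["planned", "upcoming", "future", "scheduled for"]],
];

// Match keywords only at the start of a word to avoid hits inside longer words.
const startsWord = (text, keyword) => new RegExp(`\\b${keyword}`, "i").test(text);

function detectStatus(title, body = "") {
  const text = `${title} ${body}`;
  for (const [status, keywords] of STATUS_KEYWORDS) {
    if (keywords.some((kw) => startsWord(text, kw))) return status;
  }
  return "Active"; // default when nothing matches
}

console.log(detectStatus("Line suspended between Stratford and Liverpool Street")); // "Active"
```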
Date Extraction
The actor recognizes 5 date patterns including date ranges (from X to Y), ISO format (2024-10-30), European format (30/10/2024), and natural language (30 October 2024). When a range is found, both start and end dates are preserved in "dates".
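The sketch below shows one way such pattern-based date extraction could work for three of the formats named above. It is not the actor's code: the function name, the normalisation to YYYY-MM-DD, and the rule that the first two dates found form a range are all assumptions. The fallback to today's date follows the behaviour described in the FAQ.

```javascript
const MONTHS = ["january", "february", "march", "april", "may", "june", "july",
  "august", "september", "october", "november", "december"];
const pad = (n) => String(n).padStart(2, "0");

// Each entry pairs a regex with a function that turns a match into YYYY-MM-DD.
const DATE_PATTERNS = [
  [/\b(\d{4})-(\d{2})-(\d{2})\b/g, (m) => `${m[1]}-${m[2]}-${m[3]}`], // ISO
  [/\b(\d{1,2})\/(\d{1,2})\/(\d{4})\b/g, (m) => `${m[3]}-${pad(m[2])}-${pad(m[1])}`], // DD/MM/YYYY
  [new RegExp(`\\b(\\d{1,2}) (${MONTHS.join("|")}) (\\d{4})\\b`, "gi"), // 30 October 2024
    (m) => `${m[3]}-${pad(MONTHS.indexOf(m[2].toLowerCase()) + 1)}-${pad(m[1])}`],
];

function extractDates(text, today = new Date().toISOString().slice(0, 10)) {
  const found = [];
  for (const [regex, toIso] of DATE_PATTERNS) {
    for (const m of text.matchAll(regex)) found.push({ index: m.index, iso: toIso(m) });
  }
  if (found.length === 0) return today; // no recognizable date: fall back to today
  found.sort((a, b) => a.index - b.index);
  const [first, second] = found;
  // Assumption: two dates in one alert are read as "start to end".
  return second ? `${first.iso} to ${second.iso}` : first.iso;
}

console.log(extractDates("Closed from 30/10/2024 to 31/10/2024")); // "2024-10-30 to 2024-10-31"
```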
3-Layer Extraction System
This public transport alerts scraper uses three extraction layers in sequence, starting with the most structured and falling back to the most flexible:
Layer 1: JSON-LD Structured Data (Highest Accuracy)
Many modern government and transit websites publish alerts using JSON-LD markup (NewsArticle, Article, Event, AlertAction schema types). When found, the actor reads structured fields directly (title, description, date, and URL) with no HTML parsing required. This gives the most reliable and complete records.
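To make the JSON-LD layer concrete, here is a minimal sketch that pulls JSON-LD blocks out of raw HTML and maps them onto the output fields. The function name, the regex-based script lookup, and the choice of source properties (headline, name, description, datePublished, startDate, url) are assumptions for illustration, not the actor's implementation.

```javascript
// Schema types named in the text above.
const ALERT_TYPES = new Set(["NewsArticle", "Article", "Event", "AlertAction"]);

function extractJsonLdAlerts(html) {
  const scriptRe = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const alerts = [];
  for (const match of html.matchAll(scriptRe)) {
    let data;
    try {
      data = JSON.parse(match[1]);
    } catch {
      continue; // skip malformed blocks instead of failing the whole page
    }
    // A block may hold one object, an array, or an object with an "@graph" array.
    const nodes = Array.isArray(data) ? data : data?.["@graph"] ?? [data];
    for (const node of nodes) {
      if (!node || !ALERT_TYPES.has(node["@type"])) continue;
      alerts.push({
        service_alert: node.headline ?? node.name ?? "",
        alert_body: node.description ?? "",
        dates: node.datePublished ?? node.startDate ?? "",
        link: node.url ?? "",
      });
    }
  }
  return alerts;
}
```

Blocks whose type is not in the list (for example an Organization entry) are ignored, so site-wide metadata does not end up in the alert records.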
Layer 2: HTML Card Parsing (Standard Coverage)
When JSON-LD is not present, the actor searches for alert cards using 14 CSS selectors in priority order, from article tags and [class*='alert'] patterns to news list items and table rows. For each card found, it extracts the title, body text, link, and date using element-specific selectors.
Layer 3: Link Scan Fallback (Maximum Coverage)
When no structured cards are found (unusual page layouts, minimal HTML, custom frameworks), the actor falls back to scanning all links on the page. It filters out navigation links (login, home, about, contact) and returns meaningful alert links with whatever context text is available.
This 3-layer approach means this public transport alerts scraper handles pages from major transit authorities, bespoke government portals, and unusual page layouts that break single-method scrapers.
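The fallback order can be sketched as a loop that stops at the first layer returning any records. Everything here is illustrative: the function names, the regex-based link scan, and the exact navigation-word test are assumptions, and the first two layers are left as caller-supplied functions.

```javascript
// Navigation words named in the Layer 3 description.
const NAV_WORDS = ["login", "home", "about", "contact"];

// Simplified Layer 3: collect anchors and drop obvious navigation links.
function scanLinks(html) {
  const linkRe = /<a\s[^>]*href=["']([^"']+)["'][^>]*>([\s\S]*?)<\/a>/gi;
  const records = [];
  for (const [, href, inner] of html.matchAll(linkRe)) {
    const text = inner.replace(/<[^>]+>/g, "").trim();
    const isNav = NAV_WORDS.some(
      (w) => text.toLowerCase() === w || href.toLowerCase().includes(`/${w}`)
    );
    if (text && !isNav) records.push({ service_alert: text, link: href });
  }
  return records;
}

// Try each layer in order and return the first non-empty result.
function extractWithFallback(html, layers) {
  for (const { name, extract } of layers) {
    const records = extract(html);
    if (records.length > 0) return { layer: name, records };
  }
  return { layer: null, records: [] };
}
```

A caller would pass the layers in priority order, for example `[{ name: "json-ld", extract: parseJsonLd }, { name: "html-cards", extract: parseCards }, { name: "link-scan", extract: scanLinks }]`, where `parseJsonLd` and `parseCards` are placeholders for the first two layers.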
Input Parameters
{"target_urls": ["https://tfl.gov.uk/travel-information/disruptions","https://www.mta.info/alerts","https://www.epexspot.com/en/news"],"target_url": "","keyword": "Central Line","severity_filter": "High","date_from": "2024-10-01","max_results": 50,"proxyConfiguration": {"useApifyProxy": true,"apifyProxyGroups": ["RESIDENTIAL"]}}
| Parameter | Type | Default | Description |
|---|---|---|---|
| target_urls | array or string | [] | List of transit or news portal URLs to scrape. One per item or comma/newline-separated. |
| target_url | string | "" | Single URL shortcut, added to target_urls automatically |
| keyword | string | "" | Filter alerts by keyword, checked against title and body text (case-insensitive) |
| severity_filter | string | "" | Return only alerts of this severity: "High", "Moderate", or "Low". Leave empty for all. |
| date_from | string | "" | Return only alerts dated on or after this date (format: YYYY-MM-DD) |
| max_results | integer | 50 | Maximum total alerts to return across all URLs |
| proxyConfiguration | object | Off | Apify proxy config; RESIDENTIAL recommended for government and transit sites |
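Because target_urls accepts either an array or a delimited string, and target_url is merged in, it can help to see the normalisation spelled out. The helper below is a hypothetical sketch of that merging; removing duplicate URLs is an extra assumption.

```javascript
// Merge target_urls (array or comma/newline-separated string) and target_url
// into one clean list of URLs.
function normalizeTargetUrls(input) {
  const { target_urls: list = [], target_url: single = "" } = input;
  const raw = Array.isArray(list) ? [...list] : String(list).split(/[\n,]+/);
  if (single) raw.push(single);
  const cleaned = raw.map((u) => String(u).trim()).filter(Boolean);
  return [...new Set(cleaned)]; // assumption: duplicates are dropped, first occurrence kept
}

console.log(normalizeTargetUrls({
  target_urls: "https://tfl.gov.uk/travel-information/disruptions, https://www.mta.info/alerts",
}));
```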
Example Input & Output
Example 1: Monitor TfL and MTA for High-Severity Alerts
Input:
{"target_urls": ["https://tfl.gov.uk/travel-information/disruptions","https://www.mta.info/alerts"],"severity_filter": "High","max_results": 20}
Output (one TfL record):
{"route": "Central Line","service_alert": "Central Line: Part suspension - no service between Liverpool Street and Stratford","alert_body": "No service between Liverpool Street and Stratford due to an earlier person ill incident. GOOD SERVICE on all other sections.","dates": "2024-10-30","link": "https://tfl.gov.uk/tube/status/","severity": "High","status": "Active","source": "tfl.gov.uk","processed_at": "2024-10-30T08:45:00Z"}
Example 2: Track Planned Maintenance Across Multiple Rail Operators
Input:
{"target_urls": ["https://www.nationalrail.co.uk/travel-information/service-disruptions","https://www.bahn.de/service/information"],"keyword": "maintenance","max_results": 30}
Output: All alerts containing "maintenance" from National Rail and Deutsche Bahn, automatically classified by severity, with "status": "Planned" where the alert text indicates scheduled work.
Example 3: EPEX SPOT Market Notices
Input:
{"target_urls": ["https://www.epexspot.com/en/news"],"max_results": 20}
Output: Structured market notices and system announcements from EPEX SPOT, each with auto-detected severity, date, and direct link to the original notice.
Example 4: Keyword Filter for Specific Route
Input:
{"target_urls": ["https://tfl.gov.uk/travel-information/disruptions"],"keyword": "Jubilee","max_results": 10}
Output: Only alerts mentioning "Jubilee", which is useful for monitoring a specific line without filtering through all disruptions manually.
Filtering: Keyword, Severity, Date
Three independent filters can be used alone or together:
keyword: Case-insensitive substring match against the alert title and body text. Use it to narrow to a specific route ("Jubilee Line"), station ("London Bridge"), incident type ("signal failure"), or any term relevant to your monitoring needs.
severity_filter: Returns only alerts at the specified severity level. Combine with keyword for precision: "severity_filter": "High" + "keyword": "suspension" returns only urgent suspension alerts.
date_from: Returns only alerts dated on or after the specified date in YYYY-MM-DD format. Useful when scheduling recurring runs: set date_from to yesterday's date to get only new alerts since your last run.
Combining all three:
{"keyword": "Northern Line","severity_filter": "High","date_from": "2024-10-01"}
Returns only High-severity Northern Line alerts from October 2024 onwards.
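The combined behaviour of the three filters can be sketched as a single predicate over the output records. The function name is hypothetical, and the date comparison assumes the dates field begins with an ISO date (YYYY-MM-DD), as in the examples in this document.

```javascript
// Apply keyword, severity, and date filters; empty filter values are ignored.
function applyFilters(alerts, { keyword = "", severity_filter = "", date_from = "" } = {}) {
  const needle = keyword.toLowerCase();
  return alerts.filter((alert) => {
    const text = `${alert.service_alert ?? ""} ${alert.alert_body ?? ""}`.toLowerCase();
    if (needle && !text.includes(needle)) return false;
    if (severity_filter && alert.severity !== severity_filter) return false;
    // Assumption: the first 10 characters of `dates` are YYYY-MM-DD, so plain
    // string comparison orders dates correctly.
    if (date_from && String(alert.dates ?? "").slice(0, 10) < date_from) return false;
    return true;
  });
}
```

The same function can be reused downstream, for example to keep only records with "status": "Active" by adding one more check before an escalation step.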
Performance & Speed
| Configuration | URLs | Estimated Time |
|---|---|---|
| Single transit site, 20 alerts | 1 URL | ~10-20 seconds |
| 3 transit portals, 50 alerts | 3 URLs | ~30-60 seconds |
| 5 portals, 100 alerts | 5 URLs | ~1-2 minutes |
| 10 portals, 200 alerts | 10 URLs | ~2-4 minutes |
Each URL includes a 1.5-3 second random delay before the next request, which is safe for government and transit portals that rate-limit aggressive scrapers.
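A polite sequential fetch with a random pause could look like the sketch below. The function names and the injected `fetchPage` callback are hypothetical; only the 1.5 to 3 second range comes from this document.

```javascript
// Pick a delay between minMs and maxMs; `rng` is injectable for testing.
function randomDelayMs(minMs = 1500, maxMs = 3000, rng = Math.random) {
  return Math.round(minMs + rng() * (maxMs - minMs));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Fetch URLs one at a time, pausing between requests (no pause after the last one).
async function fetchSequentially(urls, fetchPage) {
  const pages = [];
  for (const [i, url] of urls.entries()) {
    pages.push(await fetchPage(url));
    if (i < urls.length - 1) await sleep(randomDelayMs());
  }
  return pages;
}
```

Under this scheme a 3-URL run spends roughly 3 to 6 seconds in delays alone, which is in line with the estimates in the table.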
A residential proxy is strongly recommended for government and transit authority websites. Many transit portals (TfL, SNCF, Deutsche Bahn) block datacenter IPs. A 403 response from any URL is logged with a clear message recommending that you enable the proxy.
Cost Estimate
Subscription: $7.99/month · Free Trial: 2 Hours (no credit card required)
| Run Type | Apify Compute Units | Approx. Compute Cost |
|---|---|---|
| 1 URL, 20 alerts | ~0.01-0.02 CU | < $0.01 |
| 5 URLs, 100 alerts | ~0.05-0.10 CU | < $0.01 |
| 10 URLs, 200 alerts | ~0.10-0.20 CU | ~$0.01 |
| Scheduled hourly (30-day month) | ~1-3 CU/month | ~$0.08-$0.24 |
| Scheduled daily (30-day month) | ~0.1-0.3 CU/month | < $0.02 |
This actor is extremely compute-efficient: most of the runtime is network wait time (fetching pages), not compute. The $7.99 subscription covers unlimited runs with negligible compute costs.
Limitations
Being transparent about what this actor cannot do:
- JavaScript-rendered alert pages: Some transit authority sites (particularly newer React/Vue apps) load alert content via JavaScript after page load. This actor fetches raw HTML, so pages that require JS execution to display alerts may return empty or partial results. A residential proxy helps when the page is being blocked, but it does not execute JavaScript, so JS-heavy pages remain a known limitation.
- Login-gated portals: Alerts behind authentication walls cannot be accessed.
- Real-time push/WebSocket feeds: This actor fetches on demand. It does not hold a persistent connection to transit systems. For near-real-time monitoring, schedule runs every 15-30 minutes.
- PDF or document attachments: Some portals publish alerts as PDF downloads. Only HTML-visible text is extracted.
- 100% date accuracy: Date extraction uses pattern matching on free text. Unusual date formats or dates embedded in complex sentence structures may not be parsed correctly. The extraction date (processed_at) is always accurate.
- Non-English severity keywords: Severity detection uses English keywords. French, German, or other language transit sites may return "Low" severity for high-impact alerts if the triggering words appear only in the non-English text.
- API-based transit feeds (GTFS-RT): This actor scrapes HTML web pages, not binary GTFS Realtime feeds. For GTFS-RT data, a different solution is required.
Integrations
Slack / Microsoft Teams Alerts via Zapier or Make
Schedule hourly runs with "severity_filter": "High". When new High-severity alerts appear, trigger a Zapier workflow to post formatted disruption notices to your team's Slack channel automatically.
Google Sheets Disruption Log
Export results to Google Sheets after each run. Use conditional formatting to highlight rows where severity = "High" in red, "Moderate" in yellow. Build a rolling disruption log across multiple transit authorities.
Apify API โ Scheduled Monitoring
// Trigger a public transport alerts scrape via the Apify API
const run = await fetch("https://api.apify.com/v2/acts/YOUR_ACTOR_ID/runs", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_TOKEN"
  },
  body: JSON.stringify({
    target_urls: ["https://tfl.gov.uk/travel-information/disruptions"],
    severity_filter: "High",
    max_results: 50
  })
});
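After a run has finished, its records can be read from the run's default dataset. The helper below only builds the dataset URL from the parsed run response; the function name is hypothetical, and it assumes the response carries the dataset ID under data.defaultDatasetId, as the Apify API does for run objects.

```javascript
// Build the URL for reading a run's default dataset as JSON.
// `runResponse` is the parsed JSON body returned when the run is started.
function datasetItemsUrl(runResponse, baseUrl = "https://api.apify.com/v2") {
  const datasetId = runResponse?.data?.defaultDatasetId;
  if (!datasetId) throw new Error("Run response has no defaultDatasetId");
  return `${baseUrl}/datasets/${encodeURIComponent(datasetId)}/items?format=json`;
}

// Hypothetical usage (network call, shown for orientation only):
// const items = await fetch(datasetItemsUrl(await run.json()), {
//   headers: { Authorization: "Bearer YOUR_TOKEN" },
// }).then((res) => res.json());
```

Items appear in the dataset as the actor pushes them, so reading before the run completes may return a partial list.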
n8n Incident Response Pipeline
Use the Apify node in n8n to run this actor on a 30-minute schedule. Route High-severity active alerts to an incident management tool (PagerDuty, Jira, ServiceNow) and Moderate alerts to a daily digest email.
Airtable Service Disruption Database
Push all extracted alerts to Airtable with severity, status, route, and date fields. Build a searchable disruption history across all monitored transit operators.
FAQ
Q: Does this work on any transit website, or only the listed ones?
A: Any website. The 3-layer extraction system (JSON-LD → HTML cards → link scan) is designed to handle unknown page structures. Paste in any transit authority, government portal, or news site URL and it will extract whatever alert-like content is available.
Q: Why is residential proxy recommended?
A: Government and transit authority websites (TfL, SNCF, Deutsche Bahn) often block datacenter IPs used by cloud scraping services. Residential proxy routes requests through real consumer IP addresses, which are treated as regular browser traffic. Without it, these sites may return 403 Forbidden.
Q: How accurate is the severity classification?
A: It works best on English-language transit alerts, because classification is keyword-based and the keyword lists cover the standard vocabulary used by transit authorities. Severity is detected from the alert title and body combined, which reduces false negatives from neutral-sounding titles with urgent body text.
Q: Can I monitor multiple cities in one run?
A: Yes. Add as many target_urls as needed: TfL for London, MTA for New York, SNCF for France, and Deutsche Bahn for Germany can all be monitored in a single run. Results are tagged with their source domain for easy filtering.
Q: How do I get only new alerts since my last run?
A: Use date_from set to yesterday's date. Each alert's dates field is compared against this filter before inclusion. Combine with a scheduled run for a rolling new-alerts-only feed.
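For scheduled runs, yesterday's date can be computed just before each run is triggered. This is a small sketch with a hypothetical helper name; it uses the UTC date, which is an assumption since the document does not say which time zone date_from is compared in.

```javascript
// Return yesterday's date as YYYY-MM-DD (UTC), suitable for the date_from input.
function yesterdayIso(now = new Date()) {
  const oneDayMs = 24 * 60 * 60 * 1000;
  return new Date(now.getTime() - oneDayMs).toISOString().slice(0, 10);
}

const input = { severity_filter: "High", date_from: yesterdayIso() };
console.log(input.date_from);
```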
Q: What happens if a page layout changes on a transit site?
A: The 3-layer extraction system is resilient to layout changes: if the primary HTML selector stops matching, the system falls back to link scanning. For major structural changes, the actor may require an update. Report issues via the Apify page; fixes are prioritized.
Q: Can I use this for energy market alerts from EPEX SPOT or EEX?
A: Yes. EPEX SPOT (epexspot.com/en/news) is included in the domain-to-route mapping, which labels records with "EPEX SPOT Market" as the route. EEX works the same way. Useful for energy traders and grid operators monitoring market notices alongside transit alerts.
Q: Why do some alerts show today's date instead of the actual alert date?
A: When no date can be extracted from the alert text (no recognizable date pattern found), the actor falls back to today's date. This ensures every record has a dates value, but it may not reflect the actual alert date for undated notices.
Changelog
v1.0.0 (Current)
- Public transport alerts scraping from any HTML website
- 3-layer extraction: JSON-LD → HTML card selectors → link scan fallback
- 14 CSS selector patterns for alert card detection
- Auto-severity detection: High / Moderate / Low (keyword-based)
- Auto-status detection: Active / Planned / Resolved (keyword-based)
- Auto-date extraction: 5 regex patterns including ranges, ISO, European, and natural language formats
- Auto-route extraction: named line/route patterns + domain-to-service mapping (TfL, MTA, SNCF, Deutsche Bahn, BART, National Rail, EPEX SPOT, EEX)
- Keyword filter across title and body text
- Severity filter (High / Moderate / Low)
- Date-from filter
- Multi-URL support: unlimited portals per run
- Deduplication by alert link across all pages
- 1.5-3 second random delay between URL requests
- 3 retry attempts on failed requests with backoff
- 403 detection with proxy recommendation in logs
- Results pushed to Dataset and Key-Value Store (results.json)
- Residential proxy support via curl_cffi Chrome 110 impersonation
- Coming next: Non-English severity keywords, GTFS-RT feed support
Legal & Terms
This public transport alerts scraper accesses publicly visible service disruption notices, news articles, and alert announcements: the same content visible to any user visiting the websites in a browser without logging in.
Please use responsibly:
- Only scrape publicly accessible pages โ do not attempt to access login-gated or restricted areas
- Respect the robots.txt and Terms of Service of each website you monitor
- Transit authority and government data is provided for informational purposes; always verify critical safety information directly with the issuing authority
- Do not republish scraped alert data as your own without attribution to the original source
- This actor is intended for monitoring, alerting, research, compliance, and legitimate operational workflows
Safety Note: For real-time safety-critical transport decisions, always verify alerts directly with the relevant transit authority. Do not rely solely on scraped data for emergency or safety-critical operations.
Support
- Page not parsing correctly? Share the URL via the Apify actor page. New site support and selector improvements are prioritized based on feedback
- Need non-English severity detection or GTFS-RT support? Drop a feature request. These are on the roadmap
- Works well for your monitoring workflow? A ⭐ review on the Apify Store helps others find this public transport alerts scraper and keeps it actively maintained
Public Transport Alerts Scraper · Built on Apify
TfL · MTA · SNCF · Deutsche Bahn · BART · National Rail · EPEX SPOT · Any Transit Site · Auto-Severity · Real-Time