Public Transport Alerts Scraper: Real-time Transit Data

Pricing

$7.99/month + usage

Extract real-time service alerts, route updates, and transit delays from any public transport portal. Perfect for transit apps and commuters needing structured data on service changes, dates, and direct links. Features proxy support and fast JSON/CSV export.

Developer

Scrape Pilot

Maintained by Community

Public Transport Alerts Scraper | Service Disruptions, Delays & News from Any Transit Site

Scrape live service alerts, disruptions, delays, and notices from any public transport website, government portal, or news page, automatically classified by severity and status. Works on TfL, MTA, SNCF, Deutsche Bahn, BART, National Rail, energy exchanges, and any custom URL.



🚨 What Does This Actor Do?

This public transport alerts scraper monitors service disruption pages, transit authority news portals, government alert feeds, and energy exchange announcement boards, extracting structured alert records automatically from any URL you provide.

Every alert comes with:

  • ✅ Service alert title: the disruption, delay, or notice headline
  • ✅ Route or service name: auto-extracted from the title or domain
  • ✅ Severity classification: High, Moderate, or Low (auto-detected from alert text)
  • ✅ Status: Active, Planned, or Resolved (auto-detected)
  • ✅ Dates: start date, end date, or date range extracted from alert text
  • ✅ Alert body: full notice description where available
  • ✅ Direct link: URL to the original alert page
  • ✅ Source domain: which transit authority or portal published the alert

Paste in one URL or a list of transit portals. The actor parses the page, finds alert cards, classifies severity, extracts dates, and returns clean structured records, ready for dashboards, alert systems, or compliance logs.


⚡ Quick Start: 3 Steps

Step 1: Add your transit or news portal URL(s)

{
  "target_urls": [
    "https://tfl.gov.uk/travel-information/disruptions",
    "https://www.mta.info/alerts",
    "https://www.nationalrail.co.uk/service-disruptions"
  ],
  "severity_filter": "High",
  "max_results": 30
}

Step 2: Click Run

The actor fetches each page, runs the 3-layer extraction system, auto-detects severity and status, extracts dates, and pushes structured records to the Dataset.

Step 3: Get your public transport alert data

{
  "route": "Jubilee Line",
  "service_alert": "Jubilee Line: Severe delays due to signal failure at London Bridge",
  "alert_body": "Severe delays on the Jubilee line due to a signal failure at London Bridge. GOOD SERVICE on all other lines.",
  "dates": "2024-10-30",
  "link": "https://tfl.gov.uk/tube/stop/940GZZLULNB/london-bridge-tube-station",
  "severity": "High",
  "status": "Active",
  "source": "tfl.gov.uk",
  "processed_at": "2024-10-30T10:15:00Z"
}

Your transport alerts are in the Dataset tab. Export as JSON, CSV, or Excel in one click.


🏆 Why This Public Transport Alerts Scraper?

| Feature | This Actor | Manual Monitoring | Generic Scrapers |
| --- | --- | --- | --- |
| Works on any transit website | ✅ Any URL | ❌ One site at a time | ⚠️ Site-specific only |
| Auto-detects severity (High/Moderate/Low) | ✅ Built-in keyword matching | ❌ | ❌ |
| Auto-detects status (Active/Planned/Resolved) | ✅ Built-in | ❌ | ❌ |
| Auto-extracts date ranges from alert text | ✅ 5 date patterns | ❌ | ❌ |
| Auto-extracts route/service name | ✅ Pattern + domain map | ❌ | ❌ |
| 3-layer extraction (JSON-LD → HTML → links) | ✅ Max coverage | ❌ | ❌ |
| Multi-URL monitoring in one run | ✅ Unlimited URLs | ❌ | ⚠️ |
| Keyword filter | ✅ Built-in | ❌ | ❌ |
| Severity filter | ✅ Built-in | ❌ | ❌ |
| Date filter | ✅ Built-in | ❌ | ❌ |
| Energy exchange support (EPEX, EEX) | ✅ Yes | ❌ | ❌ |
| Residential proxy support | ✅ Recommended | ❌ | ⚠️ |

The only Apify actor that works as a universal public transport alerts scraper AND a general service disruption monitor, with automatic severity classification, status detection, and date extraction built in.


🌍 Supported Websites

This public transport alerts scraper is designed and tested to work on:

🚇 Public Transport Authorities

| Authority | Country | Website |
| --- | --- | --- |
| TfL (Transport for London) | 🇬🇧 UK | tfl.gov.uk |
| National Rail | 🇬🇧 UK | nationalrail.co.uk |
| MTA | 🇺🇸 USA | mta.info |
| BART | 🇺🇸 USA | bart.gov |
| SNCF | 🇫🇷 France | sncf.com |
| Deutsche Bahn | 🇩🇪 Germany | bahn.de |
| Transport Ireland | 🇮🇪 Ireland | rte.ie |

⚡ Energy Exchanges & Market Portals

| Platform | Description |
| --- | --- |
| EPEX SPOT | European Power Exchange: market notices and system alerts |
| EEX | European Energy Exchange: trading announcements and updates |

🏛️ Government & Public Service Portals

Any government or public authority website publishing service notices, disruptions, or announcements in standard HTML format.

📰 News & Alert Portals

Any news website, industry portal, or announcement board using standard article cards, notice lists, or JSON-LD structured data.

Don't see your transit authority listed? This scraper works on any website with alert/notice content in HTML. Add the URL and run: the 3-layer extraction system handles unknown page structures automatically.


🎯 Use Cases

🚌 Real-Time Transit Disruption Monitoring

Monitor multiple public transport authority websites simultaneously. Get structured alerts for every delay, cancellation, and service disruption across your city or region, classified by severity so you can prioritize High alerts instantly.

📱 Commuter Alert App Backend

Power a commuter alert application or internal notification system with live data scraped from official transit authority pages. Schedule hourly runs and push new High-severity alerts to users automatically via Zapier or Make.

🏢 Corporate Fleet & Travel Management

Monitor transport disruptions affecting employee commutes or business travel routes. Filter by keyword (specific line, station, or route) to track only the alerts relevant to your organization's locations.

⚡ Energy Market Alert Monitoring

Track EPEX SPOT and EEX market notices, system announcements, and trading disruptions alongside transit alerts. Useful for energy traders, grid operators, and utilities that need structured alert feeds from exchange portals.

🏛️ Government & Regulatory Compliance

Scrape and archive service disruption notices from regulated public transport operators for compliance reporting, audit trails, or regulatory submissions. Each record includes a processed_at timestamp and a direct source link.

📊 Service Quality & SLA Reporting

Collect transport service alerts across multiple operators into a single dataset. Analyze disruption frequency, severity distribution, and affected routes to produce service quality reports or SLA breach documentation.

🤖 Automated Escalation Workflows

Filter for "severity": "High" and "status": "Active" to build an automated escalation pipeline: fire Slack alerts, send SMS notifications, or create incident tickets whenever a major service disruption is detected.
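
As a rough illustration of the escalation filter described above, the sketch below selects High and Active records from a list of dataset items and formats a one-line notice. The function names are hypothetical; only the field names come from the output schema in this README.

```python
from typing import Iterable


def select_for_escalation(records: Iterable[dict]) -> list[dict]:
    """Keep only records that are both High severity and Active."""
    return [
        r for r in records
        if r.get("severity") == "High" and r.get("status") == "Active"
    ]


def format_notice(record: dict) -> str:
    """Build a one-line message suitable for a chat or SMS notification."""
    return (
        f"[{record['severity']}] {record['route']}: "
        f"{record['service_alert']} ({record['link']})"
    )


# Sample records shaped like the actor's output (values are illustrative).
sample = [
    {"route": "Jubilee Line", "service_alert": "Severe delays",
     "severity": "High", "status": "Active", "link": "https://tfl.gov.uk/tube/status/"},
    {"route": "Central Line", "service_alert": "Planned closure",
     "severity": "High", "status": "Planned", "link": "https://tfl.gov.uk/tube/status/"},
]

for record in select_for_escalation(sample):
    print(format_notice(record))
```

The formatted string can then be handed to whatever notification channel the workflow uses.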

📰 Journalism & Public Information

Track transport disruptions across multiple authorities for reporting on infrastructure reliability, service quality, or regional transit performance. The structured output with dates and severity levels makes analysis straightforward.


📋 What Data You Get

Every alert extracted by this public transport alerts scraper contains up to 9 fields:

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| route | string | Auto-extracted route, line, or service name | "Jubilee Line" |
| service_alert | string | Alert headline or notice title (up to 300 chars) | "Jubilee Line: Severe delays..." |
| alert_body | string | Full alert description or notice text (up to 500 chars) | "Signal failure at London Bridge..." |
| dates | string | Date or date range extracted from alert text | "2024-10-30 to 2024-10-31" |
| link | string | Direct URL to the original alert or article | "https://tfl.gov.uk/..." |
| severity | string | Auto-classified severity level | "High" / "Moderate" / "Low" |
| status | string | Auto-detected alert status | "Active" / "Planned" / "Resolved" |
| source | string | Domain of the source website | "tfl.gov.uk" |
| processed_at | string | ISO 8601 timestamp of extraction | "2024-10-30T10:15:00Z" |

🔴 Severity & Status Auto-Detection

Every alert is automatically classified without manual tagging: the actor analyzes the alert title and body text using keyword matching.

Severity Classification

| Level | Keywords That Trigger It | What It Means |
| --- | --- | --- |
| High | emergency, critical, cancelled, suspended, shutdown, failure, disruption, outage, closure, major, severe, force majeure | Serious impact: immediate attention required |
| Moderate | delay, delayed, maintenance, diversion, detour, modification, partial, limited, restricted, congestion | Partial impact: service running but affected |
| Low | advisory, information, notice, reminder, planned, scheduled, minor, improvement, announcement | Informational: minimal or no service impact |
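
The keyword table above can be sketched as a small classifier. This is an illustration, not the actor's source: the precedence (High checked first, then Moderate, then Low), the default of "Low" when nothing matches, and the word-start matching rule are all assumptions.

```python
import re

# Keyword lists copied from the table above.
SEVERITY_KEYWORDS = {
    "High": ["emergency", "critical", "cancelled", "suspended", "shutdown",
             "failure", "disruption", "outage", "closure", "major", "severe",
             "force majeure"],
    "Moderate": ["delay", "delayed", "maintenance", "diversion", "detour",
                 "modification", "partial", "limited", "restricted", "congestion"],
    "Low": ["advisory", "information", "notice", "reminder", "planned",
            "scheduled", "minor", "improvement", "announcement"],
}


def contains_keyword(text: str, keywords: list[str]) -> bool:
    """True if any keyword starts a word in text (so "delay" also hits "delays")."""
    return any(re.search(r"\b" + re.escape(k), text) for k in keywords)


def classify_severity(title: str, body: str = "") -> str:
    """Return the highest severity whose keywords appear in title + body."""
    text = f"{title} {body}".lower()
    for level in ("High", "Moderate", "Low"):  # assumed precedence
        if contains_keyword(text, SEVERITY_KEYWORDS[level]):
            return level
    return "Low"  # assumed default when no keyword matches


print(classify_severity("Jubilee Line: Severe delays due to signal failure"))
```

Checking the levels from most to least severe means an alert that mentions both "minor" and "delays" is reported as Moderate rather than Low.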

Status Detection

| Status | Keywords That Trigger It |
| --- | --- |
| Resolved | completed, resolved, restored, ended, finished |
| Planned | planned, upcoming, future, scheduled for |
| Active | Default: all alerts not matching Resolved or Planned |
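
A comparable sketch for the status table, under the assumption that Resolved is checked before Planned (the order the table lists them) and that keywords must start a word:

```python
import re

# Keyword lists copied from the status table above.
STATUS_KEYWORDS = {
    "Resolved": ["completed", "resolved", "restored", "ended", "finished"],
    "Planned": ["planned", "upcoming", "future", "scheduled for"],
}


def detect_status(title: str, body: str = "") -> str:
    """Return Resolved or Planned on a keyword hit, otherwise Active."""
    text = f"{title} {body}".lower()
    for status in ("Resolved", "Planned"):  # assumed check order
        for keyword in STATUS_KEYWORDS[status]:
            # The leading \b stops "ended" from matching inside "suspended".
            if re.search(r"\b" + re.escape(keyword), text):
                return status
    return "Active"


print(detect_status("Works scheduled for Sunday"))
```

Plain substring matching would mark "Line suspended" as Resolved because "suspended" contains "ended", which is why this sketch anchors each keyword to the start of a word.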

Date Extraction

The actor recognizes 5 date patterns including date ranges (from X to Y), ISO format (2024-10-30), European format (30/10/2024), and natural language (30 October 2024). When a range is found, both start and end dates are preserved in "dates".
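
The four formats named above can be approximated with regular expressions as follows. The exact patterns the actor uses are not published, so these expressions, the normalization to ISO dates, and the function names are assumptions. The fallback to today's date follows the behaviour described in the FAQ.

```python
import re
from datetime import date, datetime

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
ISO = r"\d{4}-\d{2}-\d{2}"                       # 2024-10-30
EURO = r"\d{1,2}/\d{1,2}/\d{4}"                  # 30/10/2024
NATURAL = rf"\d{{1,2}} (?:{MONTHS}) \d{{4}}"     # 30 October 2024
ANY_DATE = rf"(?:{ISO}|{EURO}|{NATURAL})"

RANGE = re.compile(rf"from ({ANY_DATE}) to ({ANY_DATE})", re.IGNORECASE)
SINGLE = re.compile(ANY_DATE, re.IGNORECASE)


def to_iso(raw: str) -> str:
    """Normalize a matched date to YYYY-MM-DD; return it unchanged if invalid."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y", "%d %B %Y"):
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    return raw


def extract_dates(text: str) -> str:
    """Return "start to end" for a range, a single date, or today's date."""
    match = RANGE.search(text)
    if match:
        return f"{to_iso(match.group(1))} to {to_iso(match.group(2))}"
    match = SINGLE.search(text)
    if match:
        return to_iso(match.group(0))
    return date.today().isoformat()  # fallback for undated notices


print(extract_dates("Closed from 30/10/2024 to 31/10/2024"))
```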


🔧 3-Layer Extraction System

This public transport alerts scraper uses three extraction layers in sequence, starting with the most structured and falling back to the most flexible:

Layer 1: JSON-LD Structured Data (Highest Accuracy)

Many modern government and transit websites publish alerts using JSON-LD markup (NewsArticle, Article, Event, AlertAction schema types). When found, the actor reads structured fields directly (title, description, date, and URL) with no HTML parsing required. This gives the most reliable and complete records.
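
To make the JSON-LD layer concrete, here is a minimal standard-library sketch that collects `application/ld+json` script blocks and maps them to alert fields. The class and function names are hypothetical, and the mapping from schema.org properties (headline, description, datePublished, url) to output fields is an assumption about how such a layer could work.

```python
import json
from html.parser import HTMLParser

ALERT_TYPES = {"NewsArticle", "Article", "Event", "AlertAction"}


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buffer))


def extract_jsonld_alerts(html: str) -> list[dict]:
    """Return alert-like records found in the page's JSON-LD blocks."""
    collector = JsonLdCollector()
    collector.feed(html)
    alerts = []
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks
        for item in data if isinstance(data, list) else [data]:
            if not isinstance(item, dict):
                continue
            item_type = item.get("@type")
            if isinstance(item_type, str) and item_type in ALERT_TYPES:
                alerts.append({
                    "service_alert": item.get("headline") or item.get("name", ""),
                    "alert_body": item.get("description", ""),
                    "dates": item.get("datePublished") or item.get("startDate", ""),
                    "link": item.get("url", ""),
                })
    return alerts
```

When this returns an empty list, a pipeline along these lines would move on to the next layer.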

Layer 2: HTML Card Parsing (Standard Coverage)

When JSON-LD is not present, the actor searches for alert cards using 14 CSS selectors in priority order, from article tags and [class*='alert'] patterns to news list items and table rows. For each card found, it extracts the title, body text, link, and date using element-specific selectors.

Layer 3: Link Scan Fallback

When no structured cards are found (unusual page layouts, minimal HTML, custom frameworks), the actor falls back to scanning all links on the page. It filters out navigation links (login, home, about, contact) and returns meaningful alert links with whatever context text is available.
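
The link-scan fallback can be pictured with the sketch below, which gathers every anchor, drops those whose URL or text contains one of the navigation words listed above, and resolves relative links. The names are hypothetical and the substring-based navigation check is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

NAV_WORDS = ("login", "home", "about", "contact")


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self._href = None
        self._text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            self.links.append((self._href, text))
            self._href = None


def scan_links(html: str, base_url: str) -> list[dict]:
    """Return non-navigation links with their anchor text as the alert title."""
    collector = LinkCollector()
    collector.feed(html)
    results = []
    for href, text in collector.links:
        haystack = f"{href} {text}".lower()
        if not text or any(word in haystack for word in NAV_WORDS):
            continue
        results.append({"service_alert": text, "link": urljoin(base_url, href)})
    return results
```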

This 3-layer approach means this public transport alerts scraper handles pages from major transit authorities, bespoke government portals, and unusual page layouts that break single-method scrapers.


⚙️ Input Parameters

{
  "target_urls": [
    "https://tfl.gov.uk/travel-information/disruptions",
    "https://www.mta.info/alerts",
    "https://www.epexspot.com/en/news"
  ],
  "target_url": "",
  "keyword": "Central Line",
  "severity_filter": "High",
  "date_from": "2024-10-01",
  "max_results": 50,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| target_urls | array or string | [] | List of transit or news portal URLs to scrape. One per item or comma/newline-separated. |
| target_url | string | "" | Single URL shortcut, added to target_urls automatically |
| keyword | string | "" | Filter alerts by keyword, checked against title and body text (case-insensitive) |
| severity_filter | string | "" | Return only alerts of this severity: "High", "Moderate", or "Low". Leave empty for all. |
| date_from | string | "" | Return only alerts dated on or after this date (format: YYYY-MM-DD) |
| max_results | integer | 50 | Maximum total alerts to return across all URLs |
| proxyConfiguration | object | Off | Apify proxy config. RESIDENTIAL recommended for government and transit sites |
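
Because target_urls accepts either an array or a comma/newline-separated string, and target_url is merged into it, the input handling might look roughly like this. The function name is hypothetical, and dropping duplicate URLs is an assumption rather than documented behaviour.

```python
import re


def normalize_urls(target_urls, target_url: str = "") -> list[str]:
    """Merge target_urls (list or delimited string) and target_url into one list."""
    if isinstance(target_urls, str):
        candidates = re.split(r"[,\n]", target_urls)
    else:
        candidates = list(target_urls or [])
    if target_url:
        candidates.append(target_url)

    seen, urls = set(), []
    for candidate in candidates:
        url = candidate.strip()
        if url and url not in seen:  # skip blanks and repeats (assumed)
            seen.add(url)
            urls.append(url)
    return urls


print(normalize_urls("https://tfl.gov.uk/travel-information/disruptions,\nhttps://www.mta.info/alerts"))
```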

📦 Example Input & Output

Example 1: Monitor TfL and MTA for High-Severity Alerts

Input:

{
  "target_urls": [
    "https://tfl.gov.uk/travel-information/disruptions",
    "https://www.mta.info/alerts"
  ],
  "severity_filter": "High",
  "max_results": 20
}

Output (one TfL record):

{
  "route": "Central Line",
  "service_alert": "Central Line: Part suspension - no service between Liverpool Street and Stratford",
  "alert_body": "No service between Liverpool Street and Stratford due to an earlier person ill incident. GOOD SERVICE on all other sections.",
  "dates": "2024-10-30",
  "link": "https://tfl.gov.uk/tube/status/",
  "severity": "High",
  "status": "Active",
  "source": "tfl.gov.uk",
  "processed_at": "2024-10-30T08:45:00Z"
}

Example 2: Track Planned Maintenance Across Multiple Rail Operators

Input:

{
  "target_urls": [
    "https://www.nationalrail.co.uk/travel-information/service-disruptions",
    "https://www.bahn.de/service/information"
  ],
  "keyword": "maintenance",
  "max_results": 30
}

Output: All alerts containing "maintenance" from National Rail and Deutsche Bahn, automatically classified by severity, with "status": "Planned" where the alert text indicates scheduled work.


Example 3: EPEX SPOT Market Notices

Input:

{
  "target_urls": ["https://www.epexspot.com/en/news"],
  "max_results": 20
}

Output: Structured market notices and system announcements from EPEX SPOT, each with auto-detected severity, date, and direct link to the original notice.


Example 4: Keyword Filter for Specific Route

Input:

{
  "target_urls": ["https://tfl.gov.uk/travel-information/disruptions"],
  "keyword": "Jubilee",
  "max_results": 10
}

Output: Only alerts mentioning "Jubilee", useful for monitoring a specific line without filtering through all disruptions manually.


🔍 Filtering: Keyword, Severity, Date

Three independent filters can be used alone or together:

keyword: Case-insensitive substring match against the alert title and body text. Use to narrow to a specific route ("Jubilee Line"), station ("London Bridge"), incident type ("signal failure"), or any term relevant to your monitoring needs.

severity_filter: Returns only alerts at the specified severity level. Combine with keyword for precision: "severity_filter": "High" + "keyword": "suspension" returns only urgent suspension alerts.

date_from: Returns only alerts dated on or after the specified date in YYYY-MM-DD format. Useful when scheduling recurring runs: set date_from to yesterday's date to get only new alerts since your last run.

Combining all three:

{
  "keyword": "Northern Line",
  "severity_filter": "High",
  "date_from": "2024-10-01"
}

Returns only High-severity Northern Line alerts from October 2024 onwards.
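
The way the three filters combine can be sketched as a single predicate. This is an illustration under stated assumptions: the date comparison uses the first date in the dates field (the start of a range), and records whose date cannot be parsed are dropped when date_from is set. The function name is hypothetical.

```python
from datetime import date


def matches_filters(alert: dict, keyword: str = "",
                    severity_filter: str = "", date_from: str = "") -> bool:
    """Return True when the alert passes every filter that is set."""
    if keyword:
        text = f"{alert.get('service_alert', '')} {alert.get('alert_body', '')}"
        if keyword.lower() not in text.lower():
            return False
    if severity_filter and alert.get("severity") != severity_filter:
        return False
    if date_from:
        start = alert.get("dates", "")[:10]  # first date of "A to B" or a single date
        try:
            if date.fromisoformat(start) < date.fromisoformat(date_from):
                return False
        except ValueError:
            return False  # assumed: unparseable dates fail the date filter
    return True


alert = {
    "service_alert": "Northern Line: Severe delays",
    "alert_body": "Signal failure at Camden Town.",
    "severity": "High",
    "dates": "2024-10-30",
}
print(matches_filters(alert, "Northern Line", "High", "2024-10-01"))
```

An empty filter value is skipped, which matches the "leave empty for all" behaviour described in the input table.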


⚡ Performance & Speed

| Configuration | URLs | Estimated Time |
| --- | --- | --- |
| Single transit site, 20 alerts | 1 URL | ~10–20 seconds |
| 3 transit portals, 50 alerts | 3 URLs | ~30–60 seconds |
| 5 portals, 100 alerts | 5 URLs | ~1–2 minutes |
| 10 portals, 200 alerts | 10 URLs | ~2–4 minutes |

Each URL includes a 1.5–3 second random delay before the next request, which is safe for government and transit portals that rate-limit aggressive scrapers.

Residential proxy is strongly recommended for government and transit authority websites. Many transit portals (TfL, SNCF, Deutsche Bahn) block datacenter IPs. A 403 response from any URL is logged with a clear message to enable proxy.
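
The pacing described above, together with the three retry attempts noted in the changelog, could be structured like the following sketch. The exponential backoff schedule, the function names, and the injectable fetch and sleep callables are assumptions made for illustration; the actor's real HTTP client is not shown here.

```python
import random
import time


def fetch_with_retry(fetch, url, attempts=3, sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff on any exception."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except Exception:
            if attempt == attempts:
                raise  # give up after the final attempt
            sleep(2 ** attempt)  # assumed backoff: 2s, then 4s


def crawl(urls, fetch, sleep=time.sleep):
    """Fetch each URL in turn with a 1.5 to 3 second random pause between them."""
    pages = {}
    for index, url in enumerate(urls):
        pages[url] = fetch_with_retry(fetch, url, sleep=sleep)
        if index < len(urls) - 1:
            sleep(random.uniform(1.5, 3.0))
    return pages
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without waiting; a real run would leave the default `time.sleep` in place.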


💰 Cost Estimate

Subscription: $7.99/month · Free Trial: 2 Hours (no credit card required)

| Run Type | Apify Compute Units | Approx. Compute Cost |
| --- | --- | --- |
| 1 URL, 20 alerts | ~0.01–0.02 CU | < $0.01 |
| 5 URLs, 100 alerts | ~0.05–0.10 CU | < $0.01 |
| 10 URLs, 200 alerts | ~0.10–0.20 CU | ~$0.01 |
| Scheduled hourly (30-day month) | ~1–3 CU/month | ~$0.08–$0.24 |
| Scheduled daily (30-day month) | ~0.1–0.3 CU/month | < $0.02 |

This actor is extremely compute-efficient: most of the runtime is network wait time (fetching pages), not compute. The $7.99 subscription covers unlimited runs with negligible compute costs.
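
For budgeting, the table's figures can be turned into a quick monthly estimate. The rate of about $0.08 per compute unit used below is inferred from the hourly row (1–3 CU for roughly $0.08–$0.24) and is an assumption, not an official Apify price; the function name is hypothetical.

```python
def monthly_cost(cu_low: float, cu_high: float,
                 usd_per_cu: float = 0.08, subscription: float = 7.99):
    """Return (low, high) total monthly cost in USD: subscription plus compute."""
    low = round(subscription + cu_low * usd_per_cu, 2)
    high = round(subscription + cu_high * usd_per_cu, 2)
    return low, high


# Hourly schedule from the table: about 1 to 3 CU per month.
print(monthly_cost(1, 3))
```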


⚠️ Limitations

Being transparent about what this actor cannot do:

  • ❌ JavaScript-rendered alert pages: Some transit authority sites (particularly newer React/Vue apps) load alert content via JavaScript after page load. This actor fetches raw HTML, so pages that require JS execution to display alerts may return empty or partial results. A residential proxy can help when the site is also blocking datacenter IPs, but it does not render JavaScript, so JS-heavy pages remain a known limitation.
  • ❌ Login-gated portals: Alerts behind authentication walls cannot be accessed.
  • ❌ Real-time push/WebSocket feeds: This actor fetches on demand. It is not a persistent connection to transit systems. For near-real-time monitoring, schedule runs every 15–30 minutes.
  • ❌ PDF or document attachments: Some portals publish alerts as PDF downloads. Only HTML-visible text is extracted.
  • ❌ 100% date accuracy: Date extraction uses pattern matching on free text. Unusual date formats or dates embedded in complex sentence structures may not be parsed correctly. The extraction date (processed_at) is always accurate.
  • ❌ Non-English severity keywords: Severity detection uses English keywords. French, German, or other language transit sites may return "Low" severity for high-impact alerts if the triggering words appear only in the non-English text.
  • ❌ API-based transit feeds (GTFS-RT): This actor scrapes HTML web pages, not binary GTFS Realtime feeds. For GTFS-RT data, a different solution is required.

🔌 Integrations

Slack / Microsoft Teams Alerts via Zapier or Make

Schedule hourly runs with "severity_filter": "High". When new High-severity alerts appear, trigger a Zapier workflow to post formatted disruption notices to your team's Slack channel automatically.

Google Sheets Disruption Log

Export results to Google Sheets after each run. Use conditional formatting to highlight rows where severity = "High" in red, "Moderate" in yellow. Build a rolling disruption log across multiple transit authorities.

Apify API: Scheduled Monitoring

// Trigger a public transport alerts scrape via API
const run = await fetch("https://api.apify.com/v2/acts/YOUR_ACTOR_ID/runs", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_TOKEN"
  },
  body: JSON.stringify({
    target_urls: ["https://tfl.gov.uk/travel-information/disruptions"],
    severity_filter: "High",
    max_results: 50
  })
});
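
For Python-based pipelines, the same call can be sketched with the standard library. The helper names are hypothetical and `YOUR_ACTOR_ID` / `YOUR_TOKEN` are placeholders as in the snippet above. The second helper assumes you read results from the run's default dataset through the Apify dataset items endpoint once the run has finished.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, actor_input: dict) -> urllib.request.Request:
    """Build the POST request that starts an actor run with the given input."""
    return urllib.request.Request(
        f"{API_BASE}/acts/{actor_id}/runs",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def build_items_request(dataset_id: str, token: str) -> urllib.request.Request:
    """Build the GET request that downloads a dataset's items as JSON."""
    return urllib.request.Request(
        f"{API_BASE}/datasets/{dataset_id}/items?format=json",
        headers={"Authorization": f"Bearer {token}"},
    )


request = build_run_request("YOUR_ACTOR_ID", "YOUR_TOKEN", {
    "target_urls": ["https://tfl.gov.uk/travel-information/disruptions"],
    "severity_filter": "High",
    "max_results": 50,
})

# To actually send it (requires a valid token and network access):
# with urllib.request.urlopen(request) as response:
#     run = json.load(response)["data"]
#     dataset_id = run["defaultDatasetId"]
```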

n8n Incident Response Pipeline

Use the Apify node in n8n to run this actor on a 30-minute schedule. Route High-severity active alerts to an incident management tool (PagerDuty, Jira, ServiceNow) and Moderate alerts to a daily digest email.

Airtable Service Disruption Database

Push all extracted alerts to Airtable with severity, status, route, and date fields. Build a searchable disruption history across all monitored transit operators.


❓ FAQ

Q: Does this work on any transit website, or only the listed ones? A: Any website. The 3-layer extraction system (JSON-LD → HTML cards → link scan) is designed to handle unknown page structures. Paste in any transit authority, government portal, or news site URL and it will extract whatever alert-like content is available.

Q: Why is residential proxy recommended? A: Government and transit authority websites (TfL, SNCF, Deutsche Bahn) often block datacenter IPs used by cloud scraping services. Residential proxy routes requests through real consumer IP addresses, which are treated as regular browser traffic. Without it, these sites may return 403 Forbidden.

Q: How accurate is the severity classification? A: Very accurate for English-language transit alerts. The keyword lists cover the standard vocabulary used by transit authorities worldwide. Severity is detected from the alert title and body combined, which reduces false negatives from neutral-sounding titles with urgent body text.

Q: Can I monitor multiple cities in one run? A: Yes. Add as many target_urls as needed: TfL for London, MTA for New York, SNCF for France, and Deutsche Bahn for Germany can all be monitored in a single run. Results are tagged with their source domain for easy filtering.

Q: How do I get only new alerts since my last run? A: Use date_from set to yesterday's date. Each alert's dates field is compared against this filter before inclusion. Combine with a scheduled run for a rolling new-alerts-only feed.
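
A scheduler script could compute that date_from value just before each run, for example as below. The helper name and the `lookback_days` parameter are hypothetical.

```python
from datetime import date, timedelta


def date_from_for_run(today=None, lookback_days: int = 1) -> str:
    """Return the YYYY-MM-DD string for `lookback_days` before today."""
    today = today or date.today()
    return (today - timedelta(days=lookback_days)).isoformat()


actor_input = {
    "target_urls": ["https://tfl.gov.uk/travel-information/disruptions"],
    "date_from": date_from_for_run(),
}
print(actor_input["date_from"])
```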

Q: What happens if a page layout changes on a transit site? A: The 3-layer extraction system is resilient to layout changes: if the primary HTML selector stops matching, the system falls back to link scanning. For major structural changes, the actor may require an update. Report issues via the Apify page and fixes are prioritized.

Q: Can I use this for energy market alerts from EPEX SPOT or EEX? A: Yes. EPEX SPOT (epexspot.com/en/news) is included in the domain-to-route mapping, which labels records with "EPEX SPOT Market" as the route. EEX works the same way. Useful for energy traders and grid operators monitoring market notices alongside transit alerts.
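
The route labelling mentioned here (a named-line pattern first, then a domain-to-route map) can be sketched as follows. Only the "EPEX SPOT Market" label is stated in this README; the regular expression, the fallback to the bare hostname, and the function name are assumptions.

```python
import re
from urllib.parse import urlparse

# Only this mapping is documented above; other entries would be guesses.
DOMAIN_ROUTES = {"epexspot.com": "EPEX SPOT Market"}

# One or more capitalized words followed by "Line", e.g. "Jubilee Line".
LINE_PATTERN = re.compile(r"\b((?:[A-Z][\w&]*\s)+Line)\b")


def extract_route(title: str, url: str) -> str:
    """Prefer a named line in the title, else a label derived from the domain."""
    match = LINE_PATTERN.search(title)
    if match:
        return match.group(1)
    host = urlparse(url).netloc.lower().removeprefix("www.")
    return DOMAIN_ROUTES.get(host, host)  # assumed fallback: the hostname itself


print(extract_route("Market notice", "https://www.epexspot.com/en/news"))
```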

Q: Why do some alerts show today's date instead of the actual alert date? A: When no date can be extracted from the alert text (no recognizable date pattern found), the actor falls back to today's date. This ensures every record has a dates value, but it may not reflect the actual alert date for undated notices.


📜 Changelog

v1.0.0 (Current)

  • ✅ Public transport alerts scraping from any HTML website
  • ✅ 3-layer extraction: JSON-LD → HTML card selectors → link scan fallback
  • ✅ 14 CSS selector patterns for alert card detection
  • ✅ Auto-severity detection: High / Moderate / Low (keyword-based)
  • ✅ Auto-status detection: Active / Planned / Resolved (keyword-based)
  • ✅ Auto-date extraction: 5 regex patterns including ranges, ISO, European, and natural language formats
  • ✅ Auto-route extraction: named line/route patterns + domain-to-service mapping (TfL, MTA, SNCF, Deutsche Bahn, BART, National Rail, EPEX SPOT, EEX)
  • ✅ Keyword filter across title and body text
  • ✅ Severity filter (High / Moderate / Low)
  • ✅ Date-from filter
  • ✅ Multi-URL support: unlimited portals per run
  • ✅ Deduplication by alert link across all pages
  • ✅ 1.5–3 second random delay between URL requests
  • ✅ 3 retry attempts on failed requests with backoff
  • ✅ 403 detection with proxy recommendation in logs
  • ✅ Results pushed to Dataset and Key-Value Store (results.json)
  • ✅ Residential proxy support via curl_cffi Chrome 110 impersonation
  • 🔜 Coming next: Non-English severity keywords, GTFS-RT feed support

This public transport alerts scraper accesses publicly visible service disruption notices, news articles, and alert announcements: the same content visible to any user visiting the websites in a browser without logging in.

Please use responsibly:

  • Only scrape publicly accessible pages. Do not attempt to access login-gated or restricted areas
  • Respect the robots.txt and Terms of Service of each website you monitor
  • Transit authority and government data is provided for informational purposes. Always verify critical safety information directly with the issuing authority
  • Do not republish scraped alert data as your own without attribution to the original source
  • This actor is intended for monitoring, alerting, research, compliance, and legitimate operational workflows

Safety Note: For real-time safety-critical transport decisions, always verify alerts directly with the relevant transit authority. Do not rely solely on scraped data for emergency or safety-critical operations.


🤝 Support

  • Page not parsing correctly? Share the URL via the Apify actor page. New site support and selector improvements are prioritized based on feedback
  • Need non-English severity detection or GTFS-RT support? Drop a feature request. These are on the roadmap
  • Works well for your monitoring workflow? A ⭐ review on the Apify Store helps others find this public transport alerts scraper and keeps it actively maintained

Public Transport Alerts Scraper · Built on Apify
TfL · MTA · SNCF · Deutsche Bahn · BART · National Rail · EPEX SPOT · Any Transit Site · Auto-Severity · Real-Time