Hansard UK Parliament Debates Scraper

Pricing

from $23.63 / 1,000 results

Export the official transcripts of UK Parliament debates and speeches from Hansard. Filter by House (Commons or Lords), search term, member, and date range. Each record includes the full speech text, speaker, debate section, and a permalink to the official Hansard transcript.

Developer

ParseForge

Maintained by Community

🗣️ Hansard UK Parliament Debates Scraper

🚀 Export UK Parliament debate transcripts in seconds. Pull every spoken contribution from the House of Commons and House of Lords, filtered by topic, member, date, or department. Each record is a clean structured speech with full text, speaker, debate section, and a permalink to the official Hansard transcript. No sign-up, no manual paging, no parser to maintain.

🕒 Last updated: 2026-05-15 · 📊 17 fields per record · 🗣️ Millions of contributions · 📜 200+ years of debates · 🇬🇧 Both Houses

The Hansard UK Parliament Debates Scraper queries the official Hansard transcript catalogue and returns up to 17 structured fields per record, including the contribution ID, speaker name and member ID, House, debate section, sitting date, full speech text, word count, ordering metadata, and a deep permalink back to the official Hansard page.

The catalogue covers the official record of every spoken contribution in the UK Parliament, including ministerial statements, backbench speeches, oral questions, urgent questions, and full debates. Hansard has tracked the proceedings of the UK Parliament since 1803 and is the canonical record cited by historians, journalists, and political researchers.

| 🎯 Target Audience | 💡 Primary Use Cases |
| --- | --- |
| Political analysts and researchers, journalists, NLP and machine-learning teams, public-affairs and lobbying firms, civic-tech projects, academic political scientists, content creators | Speech and rhetoric analysis, member voting-context research, topic mining, NLP training corpora, ministerial statement monitoring, lobbyist due diligence, civic-tech transparency tools |

📋 What the Hansard UK Debates Scraper does

Six filtering workflows in a single run:

  • 🔎 Free-text search. Match a keyword or phrase across every spoken contribution (e.g. "climate change", "NHS funding", "AUKUS").
  • 🏛️ House filter. Restrict to House of Commons, House of Lords, or both.
  • 👤 Member filter. Substring match on the speaker name (e.g. "Keir Starmer", "Lord Hannan").
  • 📅 Date range. Scope to any sitting-date window with startDate and endDate.
  • 🏛️ Department filter. Substring match on the responsible government department (e.g. "Treasury", "Department for Education").
  • 🔢 Page-driven sample. Pull the latest contributions across all topics when no query is set.

Each record includes the contribution ID, the member's name and (when available) their member ID, the House, the section ("Commons Chamber", "Westminster Hall", etc.), the debate section title, the Hansard internal section code, the sitting date, the timecode of the contribution, the full speech text (HTML preserved), a word count, the order in the debate, the paragraph tag, and a deep permalink back to the official Hansard transcript page.

💡 Why it matters: Hansard transcripts power policy analysis, NLP corpora, civic transparency, and political journalism. Building your own pipeline means writing a paginated client, mapping debate identifiers to permalinks, normalising HTML across sittings, and refreshing daily. This Actor skips all of that and gives you a clean refreshed snapshot on every run.


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded Hansard dataset.


⚙️ Input

| Input | Type | Default | Behavior |
| --- | --- | --- | --- |
| searchTerm | string | "" | Keyword or phrase to search across UK Parliament transcripts. Empty = most recent contributions across all topics. |
| house | string | "Both" | One of Both, Commons, or Lords. |
| memberName | string | "" | Substring match on the speaker name. |
| startDate | string | "" | Earliest sitting date (YYYY-MM-DD). |
| endDate | string | "" | Latest sitting date (YYYY-MM-DD). |
| department | string | "" | Substring match on the responsible department. |
| maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000. |

Example: every Commons contribution mentioning "Heathrow" since 2026-01-01.

{
    "maxItems": 200,
    "searchTerm": "Heathrow",
    "house": "Commons",
    "startDate": "2026-01-01"
}

Example: latest 50 contributions by Keir Starmer in either House.

{
    "maxItems": 50,
    "memberName": "Keir Starmer"
}
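The input objects above can also be assembled and sanity-checked in code before a run. The following is a minimal sketch, assuming only the field names, defaults, and allowed values listed in the input table; the helper name build_run_input and its checks are illustrative and not part of the Actor.

```python
from datetime import datetime

# Allowed values for "house", taken from the input table.
ALLOWED_HOUSES = ("Both", "Commons", "Lords")


def _check_date(value):
    # strptime raises ValueError when the value is not a year-month-day date.
    return datetime.strptime(value, "%Y-%m-%d").date() if value else None


def build_run_input(search_term="", house="Both", member_name="",
                    start_date="", end_date="", department="", max_items=10):
    """Hypothetical helper: build the Actor input dict from keyword arguments."""
    if house not in ALLOWED_HOUSES:
        raise ValueError(f"house must be one of {ALLOWED_HOUSES}, got {house!r}")
    if not isinstance(max_items, int) or max_items < 1:
        raise ValueError("max_items must be a positive integer")
    start, end = _check_date(start_date), _check_date(end_date)
    if start and end and start > end:
        raise ValueError("start_date must not be later than end_date")
    optional = {
        "searchTerm": search_term,
        "memberName": member_name,
        "startDate": start_date,
        "endDate": end_date,
        "department": department,
    }
    run_input = {"maxItems": max_items, "house": house}
    # Leave out empty strings so the documented defaults apply.
    run_input.update({key: value for key, value in optional.items() if value})
    return run_input
```

Calling build_run_input(search_term="Heathrow", house="Commons", start_date="2026-01-01", max_items=200) yields the same object as the first example above.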

⚠️ Good to Know: the text field preserves the original Hansard markup, including column-number <span> tags and inline subscripts, which keeps each record faithful to the official transcript. If you need plain text, strip the HTML once in a downstream step.
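One way to do that downstream stripping is with Python's standard html.parser module. This is a sketch only: the exact tags and classes Hansard uses are not documented here, so it simply drops every tag, keeps all text content (including any column-number text inside <span> elements), and treats p, br, and div as word boundaries by assumption.

```python
from html.parser import HTMLParser

# Tags assumed to separate words; adjust after inspecting real records.
_BLOCK_TAGS = {"p", "br", "div"}


class _TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in _BLOCK_TAGS:
            self._chunks.append(" ")

    def handle_data(self, data):
        self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace left behind by removed markup.
        return " ".join("".join(self._chunks).split())


def strip_html(markup):
    """Return the text content of an HTML fragment with tags removed."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return parser.text()
```

Inline tags such as subscripts are removed without adding a space, so "CO<sub>2</sub>" becomes "CO2".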


📊 Output

Each contribution record carries up to 17 fields. Download the dataset as CSV, Excel, JSON, or XML.

🧾 Schema

| Field | Type | Example |
| --- | --- | --- |
| 🆔 contributionId | string | "0D8CEA45-19F1-4BF6-83D3-6688C26C01B9" |
| 👤 memberName | string | "Sarah Olney" |
| 👤 attributedTo | string | "Sarah Olney" |
| 🆔 memberId | number | 4591 |
| 🏛️ house | string | "Commons" |
| 📂 section | string | "Commons Chamber" |
| 📂 debateSection | string | " Heathrow Airport: Third Runway" |
| 🆔 debateSectionId | string | "15106B6A-3101-426D-89E3-0544452BD096" |
| 📂 hansardSection | string | "CP-CR1" |
| 📅 sittingDate | YYYY-MM-DD | "2026-05-14" |
| 🕒 timecode | string | "2026-05-14T15:03:57" |
| 📝 text | string | "The hon. Gentleman is absolutely right that we need to see the economic case..." |
| 🔢 wordCount | number | 806 |
| 🔢 orderInDebate | number | 6 |
| 🏷️ paragraphTag | string | "hs_Para" |
| 🔗 url | string | "https://hansard.parliament.uk/Commons/2026-05-14/debates/.../HeathrowAirport%3AThirdRunway#contribution-..." |
| 🕒 scrapedAt | ISO 8601 | "2026-05-15T20:10:51.113Z" |
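For typed downstream work, a record can be parsed into a small Python structure. The sketch below assumes the field names and example formats from the schema table; the Contribution class and parse_contribution function are hypothetical names, and only a subset of the 17 fields is mapped.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Contribution:
    """Hypothetical typed view over a subset of the schema fields."""
    contribution_id: str
    member_name: str
    house: str
    debate_section: str
    sitting_date: date
    word_count: int
    url: str
    scraped_at: datetime


def parse_contribution(record):
    # Rewrite a trailing "Z" as an explicit UTC offset so fromisoformat
    # accepts the timestamp on Python versions older than 3.11.
    scraped = record["scrapedAt"].replace("Z", "+00:00")
    return Contribution(
        contribution_id=record["contributionId"],
        member_name=record["memberName"],
        house=record["house"],
        # The schema example shows leading whitespace in debateSection.
        debate_section=record["debateSection"].strip(),
        sitting_date=date.fromisoformat(record["sittingDate"]),
        word_count=int(record["wordCount"]),
        url=record["url"],
        scraped_at=datetime.fromisoformat(scraped),
    )
```

Because records carry "up to" 17 fields, optional ones such as memberId are left out here; read them with record.get() if you need them.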


✨ Why choose this Actor

| | Capability |
| --- | --- |
| 🗣️ | Both Houses, full text. Spoken contributions from Commons and Lords with the complete speech body. |
| 🎯 | Multi-dimensional filters. Search term, House, member, date range, and department combine freely. |
| 🔗 | Permalinks per row. Every record links back to the canonical Hansard page anchor for citation. |
| 📜 | Historic depth. Indexed transcripts spanning decades of UK parliamentary debate. |
| ⚡ | Fast. 100 contributions in seconds, 10,000 records in a few minutes. |
| 🔁 | Always fresh. Every run hits the live transcript catalogue, so the dataset reflects the latest sittings. |
| 🚫 | No authentication. Public open-government data. No login needed. |

📊 Searchable Hansard transcripts are the foundation of every political-journalism dashboard, NLP corpus on UK politics, and lobbyist briefing pack.


📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Filters | Setup |
| --- | --- | --- | --- | --- | --- |
| ⭐ Hansard UK Debates Scraper (this Actor) | $5 free credit, then pay-per-use | Both Houses, full text | Live per run | search term, House, member, date, department | ⚡ 2 min |
| Commercial parliamentary monitoring | $10k - $80k/year | Comparable + voting records | Daily | Many | 🐢 Weeks (procurement) |
| TheyWorkForYou scraping | Free | Commons-leaning, derived | Daily | Few | 🕒 Days |
| Manual hansard.parliament.uk browsing | Free | Whole catalogue | Live | Site-side | ⏳ Forever |

Pick this Actor when you want structured speech-level records with permalinks and zero pipeline maintenance.


🚀 How to use

  1. 📝 Sign up. Create a free account with $5 of credit (takes 2 minutes).
  2. 🌐 Open the Actor. Go to the Hansard UK Parliament Debates Scraper page on the Apify Store.
  3. 🎯 Set input. Type a search term or member name, optionally pick a House and date range, and set maxItems.
  4. 🚀 Run it. Click Start and let the Actor collect your contributions.
  5. 📥 Download. Grab your dataset in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to a downloaded Hansard dataset: 3-5 minutes. No coding required.


💼 Business use cases

🏛️ Public Affairs & Lobbying

  • Monitor every mention of your client's industry across Parliament
  • Build briefing packs from a member's recent contributions
  • Track ministerial statements by department
  • Surface debate momentum on niche policy areas

📰 Political Journalism

  • Search every Commons speech for a specific quote
  • Trace how a policy has been debated over years
  • Build interactive dashboards of speech topics
  • Cross-reference Hansard with member directory data

🤖 NLP & Machine Learning

  • Train domain-specific UK political language models
  • Build topic-classification corpora with member metadata
  • Sentiment analysis on individual MPs over time
  • Question-answering systems that cite primary sources

📊 Civic Analytics & Research

  • Quantitative speech-pattern studies for academic papers
  • Word-frequency tracking on policy themes per session
  • Comparative analysis of Commons vs Lords language
  • Public dashboards showing debate volume by topic
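As an illustration of the debate-volume idea, the sketch below tallies contributions and words per debate section from a list of downloaded records. It assumes the debateSection and wordCount fields from the schema; the function name debate_volume and the "(untitled)" fallback label are illustrative choices.

```python
from collections import Counter


def debate_volume(records):
    """Return (debate section, contributions, total words), busiest first."""
    contributions = Counter()
    words = Counter()
    for record in records:
        # Fall back to a placeholder when the section title is missing.
        topic = (record.get("debateSection") or "").strip() or "(untitled)"
        contributions[topic] += 1
        words[topic] += int(record.get("wordCount") or 0)
    return [(topic, count, words[topic])
            for topic, count in contributions.most_common()]
```

The result is a plain list of tuples, which can be written straight to a CSV file or fed into a charting library.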

🔌 Automating Hansard Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

  • 🟢 Node.js. Install the apify-client NPM package.
  • 🐍 Python. Use the apify-client PyPI package.
  • 📚 See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor on any cron interval. Hourly or daily refreshes keep your political monitoring dashboards in sync with each new sitting.
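A programmatic run from Python might look like the sketch below. It assumes the apify-client 1.x interface, where ApifyClient(token).actor(id).call(run_input=...) returns the run as a dict and dataset(id).iterate_items() yields records; check the current client documentation before relying on it. The function name fetch_contributions is illustrative, and the Actor ID in the usage comment is a placeholder because this page does not state it.

```python
def fetch_contributions(client, actor_id, run_input):
    """Start a run, wait for it to finish, and return its dataset items.

    `client` is expected to behave like apify_client.ApifyClient
    (an assumption; it is passed in so the function is easy to test).
    """
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the Actor call returned no run record")
    if run.get("status") != "SUCCEEDED":
        raise RuntimeError(f"run ended with status {run.get('status')!r}")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage sketch (requires `pip install apify-client` and an API token):
#
#   import os
#   from apify_client import ApifyClient
#
#   client = ApifyClient(os.environ["APIFY_TOKEN"])
#   items = fetch_contributions(
#       client,
#       "<username>/<actor-name>",  # placeholder, copy the ID from the Store page
#       {"searchTerm": "Heathrow", "house": "Commons", "maxItems": 200},
#   )
```

Passing the client in, instead of constructing it inside the function, keeps the token handling in one place and lets the same function be reused by a scheduler or a webhook handler.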


🌟 Beyond business use cases

Open parliamentary transcripts power more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

  • Quantitative discourse analysis for political-science theses
  • Coursework on parliamentary procedure and rhetoric
  • Reproducible debate corpora for NLP research papers
  • Historical archives of policy framing over decades

🎨 Personal and creative

  • Side projects that visualise speech patterns by party
  • Newsletters that summarise yesterday's notable contributions
  • Word clouds of a session's most-debated topics
  • Hobbyist explorations of parliamentary humour

🤝 Non-profit and civic

  • Civic-tech tools that surface debates on a topic to citizens
  • Watchdog dashboards tracking member contribution rates
  • Investigative journalism on lobbying-aligned speeches
  • Accessibility projects that simplify parliamentary language

🧪 Experimentation

  • Train summarisation models on parliamentary debates
  • Build agent pipelines that brief journalists on yesterday's speeches
  • Prototype semantic-search tools across decades of debate
  • Stress-test NLP infrastructure with real, long-form text


❓ Frequently Asked Questions

🧩 How does it work?

Type a search term, optionally pick a member or date window, click Start, and the Actor pages through the official Hansard transcript catalogue, applies your filters, and emits a clean structured row per spoken contribution. No browser automation, no captchas, no setup.

📏 How accurate is the data?

Every record comes from the official Hansard catalogue used by hansard.parliament.uk itself, so the speech text, member, and debate references match the canonical record line for line.

🔁 How often is the dataset refreshed?

Hansard is updated as sittings are transcribed and published, typically within hours of a debate. Every run hits the live catalogue.

🏛️ Does it cover both Houses?

Yes. Set house to Both (default), Commons, or Lords.

👤 Can I get every speech by a single MP or peer?

Yes. Set the memberName filter to a substring of their name (e.g. "Starmer", "Lord Hannan"). Combine with startDate and endDate for a session-bounded view.

📅 How far back does the catalogue go?

The official Hansard archive runs back to the early 19th century, with full digital coverage of recent sessions and increasingly complete coverage going back decades.

📝 Why does the speech text contain HTML tags?

The text field preserves Hansard's original markup (column-number <span>s, inline subscripts, paragraph anchors) so the record stays faithful to the official transcript. Strip HTML downstream if you need plain text.

⏰ Can I schedule regular runs?

Yes. Use Apify Schedules to run this Actor on any cron interval (hourly during sittings, daily otherwise) and keep your political monitoring dashboards in sync.

💼 Can I use this data commercially?

Yes. Hansard transcripts are published under the Open Parliament Licence, which permits commercial reuse with attribution. You remain responsible for following the licence in your product, so review its terms for your specific application.

💳 Do I need a paid Apify plan to use this Actor?

No. The free Apify plan is enough for testing and small runs (10 records per run). A paid plan lifts the limit and gives you scheduling, higher concurrency, and larger datasets.

🆘 What if I need help?

Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.


🔌 Integrate with any app

Hansard UK Debates Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step monitoring workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get debate-mention alerts in your channels
  • Airbyte - Pipe transcripts into your warehouse
  • GitHub - Trigger runs from commits and releases
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Push fresh transcripts into your NLP pipeline, or alert your political-research team in Slack.
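On the receiving end of such a webhook, the handler mostly needs the dataset ID of the finished run. The sketch below assumes Apify's default webhook payload, in which eventType holds a value such as "ACTOR.RUN.SUCCEEDED" and resource holds the run object with its defaultDatasetId; if you configure a custom payload template, these keys will differ. The function name is illustrative.

```python
import json


def dataset_id_from_webhook(body):
    """Return the dataset ID for a succeeded run, or None for other events.

    `body` is the raw JSON request body; the key names are assumed from
    the default webhook payload.
    """
    payload = json.loads(body)
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")
```

The returned ID can then be used to download the fresh transcripts, or to build the message for a Slack alert.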


💡 Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.


🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by the UK Parliament, the House of Commons, the House of Lords, or the Hansard Society. All trademarks mentioned are the property of their respective owners. Only publicly available open Hansard transcript data is collected, under the Open Parliament Licence.