Care Quality Commission Reports Scraper

Pricing

from $15.00 / 1,000 results

Scrape CQC inspection reports for any UK care service — dentists, care homes, GPs, hospitals and more. Extracts full report content (HTML & PDF), ratings, registered manager, nominated individual, and complete provider details. Powered by Playwright for reliable browser-based extraction.

Developer

Alkausari M

Maintained by Community

CQC Inspection Reports Scraper

Scrape the latest inspection reports from the Care Quality Commission (CQC) — the independent regulator of health and social care in England. Extract full report content, ratings, provider details, and management information for any care service type, location, and search radius.


🔍 What This Actor Does

This actor uses a full browser (Playwright/Chromium) to navigate the CQC website and automatically:

  1. Searches CQC for providers matching your filters (service type, location, radius)
  2. Downloads the CSV export from the search results page
  3. Visits each provider's Reports page to find the most recent inspection
  4. Visits the provider's Overview page to extract management and organisational details
  5. Downloads and extracts the full report content — supporting both HTML and PDF report formats
  6. Saves everything to your Apify dataset

The start and end parameters give you precise control over which rows of the CSV to process, making it easy to run the actor in batches across large result sets.


📦 Output Fields

Each record in the dataset contains:

| Field | Description |
| --- | --- |
| report_url | Direct URL to the inspection report |
| report_type | HTML or PDF |
| title | Name of the care service |
| service_name | Service name as shown on the CQC overview page |
| organisation_name | The organisation that runs the service |
| registered_manager | Name of the registered manager |
| nominated_individual | Name of the nominated individual (where applicable) |
| content | Full report content: a raw HTML string, or a list of page text strings for PDFs |
| csv_row | Complete row from the CQC CSV export, including address, phone, postcode, service type, CQC IDs, and more |

CSV Row Sub-fields

The csv_row object contains all columns from the CQC search CSV export:

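Name, Address 1, Address 2, Town/City, County, Postcode, Phone number, Website, Local authority, Region, Report publication date, URL, Also known as, Specialisms/services, Service types, Provider name, Distance (miles away), CQC Provider ID, CQC Location ID

Header text in the export can be longer than the short names listed here. The sample output below, for instance, shows the location ID column as "CQC Location ID (for office use only)", so match keys against the actual export rather than this list.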

Sample Output

{
  "report_url": "https://www.cqc.org.uk/location/1-16546284835/reports/AP11124/overall",
  "report_type": "HTML",
  "title": "University of Bristol Dental School",
  "service_name": "University of Bristol Dental School",
  "organisation_name": "University of Bristol",
  "registered_manager": "Mr August Wouter Van Riessen",
  "nominated_individual": "Mrs Lucinda Parr",
  "content": "<header>...</header>...",
  "csv_row": {
    "Name": "University of Bristol Dental School",
    "Address 1": "1 Trinity Quay",
    "Town/City": "Bristol",
    "Postcode": "BS2 0PT",
    "Phone number": "01179289000",
    "Service types": "Dentist",
    "Provider name": "University of Bristol",
    "Report publication date": "2025-03-21T10:22:57Z",
    "CQC Location ID (for office use only)": "1-16546284835"
  }
}
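
Because each record nests the CSV columns under csv_row, spreadsheet-style analysis usually needs a flat row. The following is a minimal sketch of one way to flatten a record, not part of the actor itself. The function name flatten_record and the "csv_" key prefix are illustrative assumptions; the field names come from the sample output above.

```python
def flatten_record(record: dict) -> dict:
    """Merge top-level fields and csv_row columns into one flat dict.

    The bulky "content" field is dropped so the result fits a spreadsheet row.
    """
    flat = {
        key: value
        for key, value in record.items()
        if key not in ("csv_row", "content")
    }
    # Prefix CSV columns (hypothetical convention) to avoid key collisions.
    for column, value in (record.get("csv_row") or {}).items():
        flat[f"csv_{column}"] = value
    return flat


# Abbreviated copy of the sample record shown above.
record = {
    "report_url": "https://www.cqc.org.uk/location/1-16546284835/reports/AP11124/overall",
    "report_type": "HTML",
    "title": "University of Bristol Dental School",
    "registered_manager": "Mr August Wouter Van Riessen",
    "content": "<header>...</header>...",
    "csv_row": {"Town/City": "Bristol", "Postcode": "BS2 0PT"},
}

flat = flatten_record(record)
print(sorted(flat))
```

Dropping content is a design choice for tabular exports; keep it if the downstream step needs the report body.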

⚙️ Input Configuration

{
  "start_urls": [
    {
      "url": "https://www.cqc.org.uk/search/all?filters%5B0%5D=services:dentist&location-query=bristol&radius=25&ajax=0&page=1"
    }
  ],
  "start": 1,
  "end": 50
}

Input Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| start_urls | array | ✅ Yes | One or more CQC search result URLs. Use the CQC website to build your search (location, service type, radius) and paste the URL here |
| start | integer | ✅ Yes | Row index to start from in the CSV (1-based). Use 1 to start from the beginning |
| end | integer | ✅ Yes | Row index to stop at (inclusive). The actor will not process beyond the total number of rows available |
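
To make the start and end semantics concrete, here is a small sketch of how a 1-based, inclusive row window over a CSV could be selected. It illustrates the documented behaviour only; the actor's real implementation is not shown in this README, and select_rows and the sample CSV are hypothetical.

```python
import csv
import io


def select_rows(csv_text: str, start: int, end: int) -> list:
    """Return CSV rows start..end (1-based, inclusive) as dicts.

    An end value past the last row is clamped to the rows available.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Python slices are 0-based and end-exclusive, and clamp automatically.
    return rows[start - 1:end]


# Made-up three-row export for illustration.
sample_csv = (
    "Name,Postcode\n"
    "A Dental,BS1 1AA\n"
    "B Dental,BS2 2BB\n"
    "C Dental,BS3 3CC\n"
)

picked = select_rows(sample_csv, start=2, end=50)
print([row["Name"] for row in picked])
```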

Building Your Start URL

  1. Go to cqc.org.uk/search
  2. Apply your filters — service type, location, search radius
  3. Copy the URL from your browser and paste it as the start_urls value

The actor will automatically click the Download CSV button on that page and use those results as its input list.
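
If you prefer to generate start URLs in code rather than copy them from the browser, the sketch below rebuilds the example URL from the Input Configuration section. The parameter names (filters[0], location-query, radius, ajax, page) are taken from that example URL; whether CQC accepts other combinations is an assumption, so check generated URLs in a browser first. build_search_url is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.cqc.org.uk/search/all"


def build_search_url(service: str, location: str, radius: int, page: int = 1) -> str:
    """Build a CQC search URL shaped like the example in this README."""
    params = {
        "filters[0]": f"services:{service}",
        "location-query": location,
        "radius": radius,
        "ajax": 0,
        "page": page,
    }
    # safe=":" leaves the colon in "services:dentist" unescaped, matching
    # the example URL; the square brackets are still percent-encoded.
    return f"{BASE_URL}?{urlencode(params, safe=':')}"


url = build_search_url("dentist", "bristol", 25)
print(url)
```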


🚀 Key Features

  • Full browser automation — uses Playwright/Chromium to handle JavaScript-rendered pages and dynamic downloads that simple HTTP scrapers cannot access
  • CSV-driven processing — downloads the official CQC search CSV for reliable, structured provider discovery; no fragile HTML parsing of listing pages
  • Batch processing with start/end — process large result sets in chunks by setting start and end, making it easy to distribute work across multiple Actor runs
  • Dual report format support — handles both HTML reports (structured page content) and PDF reports (API-served UUIDs and direct file downloads)
  • Management data extraction — visits each provider's overview page to extract registered manager, nominated individual, and the full "who runs this service" breakdown
  • Resilient PDF fetching — retries PDF downloads up to 3 times with delay before giving up, handling transient network failures gracefully
  • Rich output — every record combines the full CSV row, management details from the overview page, and the complete report content in a single dataset entry
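
The "retry up to 3 times with delay" behaviour listed above follows a common pattern. As a rough sketch of that pattern (not the actor's actual code; fetch_with_retries, the 2-second default delay, and the simulated download are all assumptions):

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retries(
    fetch: Callable[[], T], attempts: int = 3, delay_seconds: float = 2.0
) -> T:
    """Call fetch() until it succeeds or the attempts run out."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception as error:  # broad on purpose for this sketch
            last_error = error
            if attempt < attempts:
                time.sleep(delay_seconds)  # pause before the next try
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error


# Simulated download that fails twice, then succeeds.
calls = {"count": 0}


def flaky_download() -> bytes:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return b"%PDF-1.7"


pdf_bytes = fetch_with_retries(flaky_download, attempts=3, delay_seconds=0)
print(calls["count"], pdf_bytes)
```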

💡 Tips & Best Practices

Targeting a specific service type

The CQC search supports filtering by service type. Some useful URL filter values:

  • Dentist: filters%5B0%5D=services:dentist
  • Care home: filters%5B0%5D=services:care-home
  • GP: filters%5B0%5D=services:gp
  • Hospital: filters%5B0%5D=services:hospital

Processing large result sets in batches

If your search returns 500 providers, run the actor multiple times with different start/end ranges, for example in batches of 100:

  • Run 1: start: 1, end: 100
  • Run 2: start: 101, end: 200
  • Run 3: start: 201, end: 300
  • Run 4: start: 301, end: 400
  • Run 5: start: 401, end: 500
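
The start/end pairs for any result-set size can be generated rather than typed by hand. A minimal sketch, assuming you already know the total row count from the CQC search page (batch_ranges is a hypothetical helper):

```python
def batch_ranges(total_rows: int, batch_size: int) -> list:
    """Return 1-based, inclusive (start, end) pairs covering total_rows."""
    if total_rows < 0 or batch_size < 1:
        raise ValueError("total_rows must be >= 0 and batch_size >= 1")
    return [
        (start, min(start + batch_size - 1, total_rows))
        for start in range(1, total_rows + 1, batch_size)
    ]


ranges = batch_ranges(500, 100)
for run_number, (start, end) in enumerate(ranges, start=1):
    print(f"Run {run_number}: start: {start}, end: {end}")
```

Each pair can be dropped into the start and end input fields of a separate run.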

Working with HTML report content

The content field for HTML reports contains the raw inner HTML of CQC's #main-content-wrapper. You can parse this downstream to extract specific ratings, section text, or judgement labels using a tool like BeautifulSoup or Cheerio.
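
As an illustration of downstream parsing, the sketch below pulls heading text out of an HTML content string. It uses Python's standard-library html.parser instead of BeautifulSoup so that it runs without extra dependencies. The sample markup is invented; the real structure inside #main-content-wrapper is not documented here, so treat the tag choices as assumptions.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect the text of h1-h3 elements, including nested inline tags."""

    HEADING_TAGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.headings = []
        self._chunks = None  # list of text pieces while inside a heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self._chunks = []

    def handle_data(self, data):
        if self._chunks is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._chunks is not None:
            text = " ".join("".join(self._chunks).split())  # tidy whitespace
            if text:
                self.headings.append(text)
            self._chunks = None


def extract_headings(content_html: str) -> list:
    parser = HeadingCollector()
    parser.feed(content_html)
    parser.close()
    return parser.headings


# Invented markup standing in for a record's "content" value.
sample_html = (
    "<header><h1>Example Dental Practice</h1></header>"
    "<section><h2>Overall <span>inspection</span></h2><p>Body text.</p></section>"
)
headings = extract_headings(sample_html)
print(headings)
```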

Working with PDF report content

For PDF reports, content is an array of strings — one string per page of the PDF. Pages with no extractable text are excluded automatically.
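
A short sketch of working with that array: joining it into one text blob and finding which entries mention a phrase. Because pages with no extractable text are dropped, a position in the array is not guaranteed to equal the page number in the original PDF. The function name and sample pages are hypothetical.

```python
def find_pages_mentioning(pages: list, phrase: str) -> list:
    """Return 1-based positions in the content array that contain phrase.

    Matching is case-insensitive. Positions index the array, which may skip
    pages that had no extractable text.
    """
    needle = phrase.casefold()
    return [
        position
        for position, text in enumerate(pages, start=1)
        if needle in text.casefold()
    ]


# Invented page texts standing in for a PDF record's "content" value.
pages = [
    "Overall rating for this service: Good",
    "Background to this inspection",
    "Is the service safe? Good",
]

full_text = "\n\n".join(pages)
matches = find_pages_mentioning(pages, "good")
print(matches)
```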


⚠️ Limitations

  • Only the most recent inspection report per provider is extracted (the first timeline item on the reports page). If the first item has no HTML or PDF link, the actor falls back to the second timeline item.
  • Report content is returned as raw HTML or raw text — the actor does not post-process or summarise the content.
  • The actor uses a real browser and includes delays to avoid overloading the CQC website. Run times will vary depending on result set size and report format.
  • CQC search results are location-based. Make sure your location-query and radius values in the start URL reflect the geographic scope you need.
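
The fallback rule in the first limitation can be pictured as follows. This is only a sketch of the described behaviour: the timeline item shape (dicts with optional html_url and pdf_url keys) is invented for illustration, and preferring HTML over PDF when both exist is an assumption the README does not confirm.

```python
from typing import Optional, Tuple


def pick_report_link(timeline: list) -> Optional[Tuple[str, str]]:
    """Return (report_type, url) from the first or second timeline item.

    Only the first two items are considered, mirroring the limitation above.
    """
    for item in timeline[:2]:
        if item.get("html_url"):
            return ("HTML", item["html_url"])
        if item.get("pdf_url"):
            return ("PDF", item["pdf_url"])
    return None


# Hypothetical timeline: the newest entry has no report link.
timeline = [
    {"title": "Inspection scheduled"},
    {"title": "Inspection report", "pdf_url": "https://example.org/report.pdf"},
    {"title": "Older report", "html_url": "https://example.org/older"},
]

choice = pick_report_link(timeline)
print(choice)
```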

📄 License

This actor is intended for legitimate research, compliance monitoring, and data analysis use cases. All data is sourced from the publicly available CQC website. Users are responsible for ensuring their use complies with applicable terms of service and data protection regulations.