Ofsted Reports Data Scraper

Pricing

from $10.00 / 1,000 results

Scrape Ofsted full inspection reports for children's homes. Extracts 18 structured fields from PDFs — judgement ratings, provider details, inspector info, home capacity and type — filtered by date. Exports to MySQL and/or Apify dataset.

Developer

Alkausari M

Maintained by Community

Extract structured data from Ofsted full inspection PDF reports for children's homes — at scale. Judgement ratings, provider details, inspectors, home capacity, specialism, dates — 18 fields per report, parsed directly from the source PDFs. Export to your MySQL database, your Apify dataset, or both.

Built and maintained by Alkausari M.


✦ Highlights

  • 📄 Full PDF parsing — 18 structured fields extracted from each report
  • 📅 Date-filtered crawling — target only reports in your inspection date range
  • 🗄 MySQL export — direct insert/update with ON DUPLICATE KEY UPDATE, no duplicates on re-runs
  • ♻️ Smart deduplication — startup checks your existing records and skips already-processed PDFs
  • 🔗 Direct PDF URL support — pass a files.ofsted.gov.uk URL to process a single report
  • 🛡 Resilient — auto-retry with exponential backoff; unparseable PDFs logged separately
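The retry behaviour named in the last highlight can be pictured with a small sketch. The Actor's real attempt count, delays, and handled exceptions are not documented here, so `retry_with_backoff`, `backoff_delays`, `max_attempts`, and `base_delay` are hypothetical names and values chosen for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(max_attempts: int, base_delay: float = 1.0, factor: float = 2.0) -> list[float]:
    """Waits between attempts: base, base*factor, base*factor^2, ..."""
    return [base_delay * factor ** i for i in range(max_attempts - 1)]


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,          # assumed value, not the Actor's setting
    base_delay: float = 1.0,        # assumed value, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn until it succeeds, doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delays = backoff_delays(max_attempts, base_delay)
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise               # out of attempts: surface the last error
            sleep(delays[attempt])
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting; a production version would usually catch only network or timeout errors rather than every `Exception`.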

⚙ How it works

  1. Paste a search URL from the Ofsted reports portal with your filters applied, or pass a direct PDF URL.
  2. Set a date range with latest_report_date_start and latest_report_date_end (YYYY-MM-DD).
  3. Click Start: the Actor finds matching providers, follows them to their Full Inspection reports, then downloads and parses each PDF.

Example input:

{
  "start_urls": [
    { "url": "https://reports.ofsted.gov.uk/search?q=&level_1_types=3&level_2_types%5B0%5D=11&status%5B0%5D=1&start=0&rows=10" }
  ],
  "latest_report_date_start": "2026-02-15",
  "latest_report_date_end": "2026-02-28",
  "max_depth": 3,
  "skip_db_export": false,
  "db_host": "your-db-host",
  "db_database": "your-database-name",
  "db_user": "your-db-user",
  "db_password": "your-db-password"
}

Set skip_db_export: true to use the Actor without any database — all data still lands in your Apify dataset (JSON, CSV, Excel, API).

MySQL tables

When MySQL export is enabled, two tables are used:

  • ofsted_reports — primary output, keyed on pdf_url. Records are inserted on first run, updated on re-runs.
  • ofsted_unsupported_reports — PDFs that don't match the expected Ofsted format (e.g. older layouts) are logged here for review rather than silently dropped.
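The insert-on-first-run, update-on-re-run behaviour of ofsted_reports corresponds to a MySQL upsert. The sketch below builds such a statement; only the table name and the pdf_url key come from this page, while `build_upsert` and the other column name are assumptions. It uses the `%s` placeholder style accepted by common Python MySQL drivers and relies on pdf_url having a unique or primary key.

```python
def build_upsert(table: str, columns: list[str], key: str = "pdf_url") -> str:
    """Build a parameterised MySQL upsert that updates every non-key column."""
    if key not in columns:
        raise ValueError(f"key column {key!r} must be in columns")
    non_key = [c for c in columns if c != key]
    if not non_key:
        raise ValueError("need at least one non-key column to update")

    col_list = ", ".join(f"`{c}`" for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    # VALUES(col) refers to the value this INSERT tried to write for that column.
    updates = ", ".join(f"`{c}` = VALUES(`{c}`)" for c in non_key)
    return (
        f"INSERT INTO `{table}` ({col_list}) VALUES ({placeholders}) "
        f"ON DUPLICATE KEY UPDATE {updates}"
    )


# Column name "registered_provider" is a guess at the schema, for illustration only.
sql = build_upsert("ofsted_reports", ["pdf_url", "registered_provider"])
```

Because the key column is excluded from the update list, re-running over the same PDF overwrites the other fields in place and never creates a second row.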

📦 What you get back

Each record represents one parsed inspection report:

{
  "PDF URL": "https://files.ofsted.gov.uk/v1/file/50298941",
  "Unique reference number": "2587763",
  "Registered provider": "Mercia Children Services Ltd",
  "Registered provider address": "Windsor House, Bayshill Road, Cheltenham, Gloucestershire GL50 3AT",
  "Provision sub-type": "Children's home",
  "Responsible individual": "Michael Lloyd",
  "Registered manager": "David Griffiths",
  "Inspection dates": "3 and 4 March 2026",
  "Inspection type": "Full inspection",
  "Overall experiences and progress": "good",
  "Help and protection": "good",
  "Leadership and management": "good",
  "Date of last inspection": "25 February 2025",
  "Overall judgement at last inspection": "good",
  "Enforcement action since last inspection": "None",
  "Inspectors": [
    { "name": "Helen Fee", "role": "Social Care Inspector" }
  ],
  "Home Capacity": "4",
  "Home Type": "social and emotional difficulties"
}
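
Several fields in the record above arrive as strings (dates, capacity), so downstream analysis usually starts with a small normalisation step. The following is one possible sketch; `parse_ofsted_date`, `normalise`, and the output key names are hypothetical. It assumes single dates follow the "25 February 2025" pattern shown in the sample and leaves the multi-day "Inspection dates" field alone.

```python
from datetime import date, datetime


def parse_ofsted_date(text: str) -> date:
    """Parse a single date such as '25 February 2025' (English month names)."""
    return datetime.strptime(text.strip(), "%d %B %Y").date()


def normalise(record: dict) -> dict:
    """Pick a few fields from one dataset record and convert their types."""
    return {
        "pdf_url": record["PDF URL"],
        "urn": record["Unique reference number"],
        "overall": record["Overall experiences and progress"].lower(),
        "last_inspection": parse_ofsted_date(record["Date of last inspection"]),
        "capacity": int(record["Home Capacity"]),
        "inspectors": [person["name"] for person in record["Inspectors"]],
    }


sample = {
    "PDF URL": "https://files.ofsted.gov.uk/v1/file/50298941",
    "Unique reference number": "2587763",
    "Overall experiences and progress": "good",
    "Date of last inspection": "25 February 2025",
    "Inspectors": [{"name": "Helen Fee", "role": "Social Care Inspector"}],
    "Home Capacity": "4",
}
row = normalise(sample)
```

Records for homes with no previous inspection may lack or vary these fields, so real code would want to guard the date and capacity conversions.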

📋 Input

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| start_urls | Array | Yes | | Ofsted search URL(s) or a direct files.ofsted.gov.uk PDF URL |
| latest_report_date_start | String | Yes | Today | Start of inspection date range (YYYY-MM-DD) |
| latest_report_date_end | String | Yes | Today | End of inspection date range (YYYY-MM-DD) |
| max_depth | Integer | No | 3 | 1 = listing only, 2 = provider pages, 3 = full PDF extraction |
| skip_db_export | Boolean | No | false | true = skip MySQL, save to Apify dataset only |
| db_host | String | If DB export | | MySQL host |
| db_database | String | If DB export | | MySQL database name |
| db_user | String | If DB export | | MySQL username |
| db_password | String | If DB export | | MySQL password |
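
The rules in the parameter list can be checked locally before starting a run. This is a minimal sketch, not the Actor's own validation: `validate_input` and its messages are hypothetical, and it treats the date fields as optional because a default of "Today" is listed for them.

```python
from datetime import datetime

DB_FIELDS = ("db_host", "db_database", "db_user", "db_password")
DATE_FIELDS = ("latest_report_date_start", "latest_report_date_end")


def validate_input(run_input: dict) -> list[str]:
    """Return a list of problems found in an Actor input; empty means OK."""
    errors = []
    if not run_input.get("start_urls"):
        errors.append("start_urls must contain at least one URL")

    parsed = {}
    for field in DATE_FIELDS:
        value = run_input.get(field)
        if value is None:
            continue  # assumed to fall back to the documented default (today)
        try:
            parsed[field] = datetime.strptime(value, "%Y-%m-%d").date()
        except (TypeError, ValueError):
            errors.append(f"{field} must be YYYY-MM-DD")

    if len(parsed) == 2 and parsed[DATE_FIELDS[0]] > parsed[DATE_FIELDS[1]]:
        errors.append("date range start is after its end")

    # DB settings are only needed when MySQL export is enabled (the default).
    if not run_input.get("skip_db_export", False):
        missing = [f for f in DB_FIELDS if not run_input.get(f)]
        if missing:
            errors.append("missing DB settings: " + ", ".join(missing))
    return errors
```

Returning every problem at once, rather than raising on the first, makes it easier to fix an input in a single pass.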

Direct PDF — single-report mode

{
  "start_urls": [{ "url": "https://files.ofsted.gov.uk/v1/file/50287454" }],
  "max_depth": 1,
  "skip_db_export": true
}
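
If you assemble start_urls programmatically, it can help to separate direct PDF links from search URLs first. The sketch below does this by hostname; how the Actor itself tells the two apart is not documented here, so `split_start_urls` and the hostname rule are assumptions.

```python
from urllib.parse import urlparse

PDF_HOST = "files.ofsted.gov.uk"


def split_start_urls(start_urls: list[dict]) -> tuple[list[str], list[str]]:
    """Split start_urls entries into (direct PDF URLs, everything else)."""
    pdf_urls, search_urls = [], []
    for entry in start_urls:
        url = entry["url"]
        host = (urlparse(url).hostname or "").lower()
        (pdf_urls if host == PDF_HOST else search_urls).append(url)
    return pdf_urls, search_urls


pdfs, searches = split_start_urls([
    {"url": "https://files.ofsted.gov.uk/v1/file/50287454"},
    {"url": "https://reports.ofsted.gov.uk/search?q=&level_1_types=3"},
])
```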

💡 Use cases

  • Research — academic and policy analysis of inspection trends across providers and regions.
  • Compliance monitoring — track ratings and enforcement actions across the providers you work with.
  • Sector consultancy — build a structured dataset of children's-home judgements for client reporting.
  • Scheduled syncs — set a rolling 7-day date window and schedule daily/weekly runs; dedup ensures no rework.
  • Data products — power dashboards and BI on top of clean, parsed Ofsted data via the Apify API.
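
For the scheduled-sync use case, the rolling date window can be computed just before each run. A minimal sketch, assuming the window is inclusive of both ends and ends today; `rolling_window` is a hypothetical helper, while the two key names are the Actor's documented input fields.

```python
from datetime import date, timedelta


def rolling_window(today: date, days: int = 7) -> dict:
    """Return date-range input fields covering the last `days` days, inclusive."""
    if days < 1:
        raise ValueError("days must be at least 1")
    start = today - timedelta(days=days - 1)
    return {
        "latest_report_date_start": start.isoformat(),
        "latest_report_date_end": today.isoformat(),
    }


window = rolling_window(date(2026, 2, 28))
```

Merging this dictionary into the rest of the input on each scheduled run keeps the range moving forward, and overlapping days between runs are harmless given the deduplication described above.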

📮 Support

Bugs, feature requests, or custom scraping work — open an issue on Apify or email alkausarimujahid@gmail.com.