Ofsted Reports Data Scraper
Pricing
from $10.00 / 1,000 results
Scrape Ofsted full inspection reports for children's homes. Extracts 18 structured fields from PDFs — judgement ratings, provider details, inspector info, home capacity and type — filtered by date. Exports to MySQL and/or Apify dataset.
Developer
Alkausari M
Ofsted Children's Home Inspection Reports Scraper
Extract structured data from Ofsted full inspection PDF reports for children's homes — including judgement ratings, provider details, inspector information, home capacity, and more — and export directly to your MySQL database or Apify dataset.
🔍 What This Actor Does
This actor crawls the Ofsted reports portal and automatically:
- Finds children's home providers matching your date filters
- Identifies Full Inspection reports on each provider's page
- Downloads and parses the inspection PDF
- Extracts 18 structured fields from each report
- Exports results to your MySQL database and/or Apify dataset
It is designed for researchers, compliance teams, consultants, and data companies who need reliable, structured access to Ofsted inspection data at scale.
📦 Output Fields
Each record contains the source PDF URL plus the 18 fields extracted from the report:
| Field | Description |
|---|---|
| PDF URL | Direct link to the source PDF on files.ofsted.gov.uk |
| Unique reference number | Ofsted URN, the unique identifier for the provider |
| Registered provider | Legal name of the registered provider |
| Registered provider address | Full registered address |
| Provision sub-type | e.g. Children's home |
| Responsible individual | Named responsible individual |
| Registered manager | Name of the registered manager (or "Post vacant") |
| Inspection dates | Dates the inspection took place |
| Inspection type | e.g. Full inspection |
| Overall experiences and progress | Judgement rating |
| Help and protection | Judgement rating |
| Leadership and management | Judgement rating |
| Date of last inspection | Date of the previous inspection |
| Overall judgement at last inspection | Previous overall judgement |
| Enforcement action since last inspection | Any enforcement actions taken |
| Inspector | Name of the inspecting officer |
| Role | Inspector's role/title |
| Home Capacity | Maximum number of children the home can accommodate |
| Home Type | Specialism, e.g. emotional and behavioural difficulties |
Sample Output
    [
      {
        "PDF URL": "https://files.ofsted.gov.uk/v1/file/50287454",
        "Unique reference number": "2586943",
        "Registered provider": "Horizon Care And Education Group Ltd",
        "Registered provider address": "C/O DWF Company Secretarial Services Limited, 1 Scott Place 2 Hardman Street, Manchester M3 3AA",
        "Provision sub-type": "Children's home",
        "Responsible individual": "N/A",
        "Registered manager": "Post vacant",
        "Inspection dates": "12 and 13 August 2025",
        "Inspection type": "Full inspection",
        "Overall experiences and progress": "requires improvement to be good",
        "Help and protection": "requires improvement to be good",
        "Leadership and management": "requires improvement to be good",
        "Date of last inspection": "28 January 2025",
        "Overall judgement at last inspection": "good",
        "Enforcement action since last inspection": "None",
        "Inspector": "Julia Tompson",
        "Role": "Social Care Inspector",
        "Home Capacity": "4",
        "Home Type": "social and emotional difficulties"
      }
    ]
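Every value in a record is a string, so downstream code will usually want typed fields. The following is a minimal sketch of one way to do that, not part of the actor itself. The added key names (`home_capacity_int`, `last_inspection_date`, `rating_rank`, `manager_vacant`) are hypothetical, and the "outstanding" and "inadequate" labels are assumed from Ofsted's usual four-point scale rather than taken from the sample above.

```python
from datetime import date, datetime

# Judgement wording ordered best to worst. Only "good" and
# "requires improvement to be good" appear in the sample record;
# the other two labels are an assumption.
RATING_ORDER = ["outstanding", "good", "requires improvement to be good", "inadequate"]


def parse_uk_date(text: str) -> date:
    """Parse a date written like '28 January 2025' (English month names)."""
    return datetime.strptime(text.strip(), "%d %B %Y").date()


def normalise(record: dict) -> dict:
    """Return a copy of one output record with typed convenience fields added."""
    out = dict(record)
    capacity = record["Home Capacity"].strip()
    out["home_capacity_int"] = int(capacity) if capacity.isdigit() else None
    out["last_inspection_date"] = parse_uk_date(record["Date of last inspection"])
    rating = record["Overall experiences and progress"].strip().lower()
    out["rating_rank"] = RATING_ORDER.index(rating) if rating in RATING_ORDER else None
    out["manager_vacant"] = record["Registered manager"].strip().lower() == "post vacant"
    return out


# A cut-down version of the sample record shown above.
sample = {
    "Registered manager": "Post vacant",
    "Overall experiences and progress": "requires improvement to be good",
    "Date of last inspection": "28 January 2025",
    "Home Capacity": "4",
}
typed = normalise(sample)
print(typed["last_inspection_date"], typed["rating_rank"], typed["manager_vacant"])
```

Fields such as "Inspection dates" ("12 and 13 August 2025") can span several days and are left as strings here.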
⚙️ Input Configuration
    {
      "start_urls": [
        {
          "url": "https://reports.ofsted.gov.uk/search?q=&level_1_types=3&level_2_types%5B0%5D=11&status%5B0%5D=1&start=0&rows=10"
        }
      ],
      "latest_report_date_start": "2026-02-15",
      "latest_report_date_end": "2026-02-28",
      "max_depth": 3,
      "skip_db_export": false,
      "db_host": "your-db-host",
      "db_database": "your-database-name",
      "db_user": "your-db-user",
      "db_password": "your-db-password"
    }
Input Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| start_urls | array | ✅ Yes | (none) | Ofsted search URL(s) to start crawling from. Can also be a direct PDF URL from files.ofsted.gov.uk |
| latest_report_date_start | string | ✅ Yes | Today | Filter: start of inspection date range (YYYY-MM-DD) |
| latest_report_date_end | string | ✅ Yes | Today | Filter: end of inspection date range (YYYY-MM-DD) |
| max_depth | integer | No | 3 | Crawl depth: 1 = listing pages only, 2 = provider pages, 3 = full PDF extraction |
| skip_db_export | boolean | No | false | Set to true to skip MySQL export and save to Apify dataset only |
| db_host | string | Conditional | (none) | MySQL host (required if skip_db_export is false) |
| db_database | string | Conditional | (none) | MySQL database name |
| db_user | string | Conditional | (none) | MySQL username |
| db_password | string | Conditional | (none) | MySQL password |
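The rules in the table can be checked before a run is started. The sketch below is an illustrative pre-flight check, not the actor's own validation. `validate_input` is a hypothetical helper. It assumes all four `db_*` fields are needed whenever `skip_db_export` is false, and that omitted dates fall back to today as the Default column suggests.

```python
from datetime import date

DB_FIELDS = ("db_host", "db_database", "db_user", "db_password")
DATE_FIELDS = ("latest_report_date_start", "latest_report_date_end")


def validate_input(cfg: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the input looks usable."""
    errors = []
    if not cfg.get("start_urls"):
        errors.append("start_urls must contain at least one URL")

    parsed = {}
    for key in DATE_FIELDS:
        # Assumption: a missing date defaults to today, as in the parameter table.
        value = cfg.get(key, date.today().isoformat())
        try:
            parsed[key] = date.fromisoformat(value)
        except (TypeError, ValueError):
            errors.append(f"{key} must be a YYYY-MM-DD date, got {value!r}")
    if len(parsed) == 2 and parsed[DATE_FIELDS[0]] > parsed[DATE_FIELDS[1]]:
        errors.append("latest_report_date_start is after latest_report_date_end")

    if cfg.get("max_depth", 3) not in (1, 2, 3):
        errors.append("max_depth must be 1, 2 or 3")

    # Assumption: every db_* field is required when MySQL export is enabled.
    if not cfg.get("skip_db_export", False):
        missing = [name for name in DB_FIELDS if not cfg.get(name)]
        if missing:
            errors.append("MySQL export is enabled but these fields are empty: " + ", ".join(missing))
    return errors


problems = validate_input({
    "start_urls": [{"url": "https://files.ofsted.gov.uk/v1/file/50287454"}],
    "skip_db_export": True,
})
print(problems or "input looks fine")
```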
🚀 Key Features
- Date-filtered crawling: target only reports published in your specified date range, no manual filtering required
- Full PDF parsing: extracts all structured fields directly from the Ofsted PDF format, not just metadata
- Deduplication: checks your existing MySQL records on startup and skips already-processed PDFs, making reruns safe and efficient
- Direct PDF support: pass a `files.ofsted.gov.uk` URL directly as a start URL to process a single report
- MySQL integration: inserts or updates records using `ON DUPLICATE KEY UPDATE`, so reruns never create duplicates
- Unsupported format handling: PDFs that don't match the expected format are logged to a separate `ofsted_unsupported_reports` table for review
- Automatic retries: failed requests are re-queued with up to 3 retry attempts and exponential backoff
- Dataset export: all results are always pushed to the Apify dataset regardless of DB settings
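To make the retry behaviour concrete, here is a generic sketch of "up to 3 retries with exponential backoff". It is not the actor's code. The 2-second base delay and the doubling factor are assumptions, and `call_with_retries` is a hypothetical helper name.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(max_retries: int = 3, base: float = 2.0) -> list[float]:
    """Waits before each retry: base, 2*base, 4*base, ... (assumed schedule)."""
    return [base * (2 ** attempt) for attempt in range(max_retries)]


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 3,
    base: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn once, then retry up to max_retries times if it raises."""
    delays = backoff_delays(max_retries, base)
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted, surface the last error
            sleep(delays[attempt])
    raise RuntimeError("unreachable")


# Demo: a request that fails twice, then succeeds. The sleeps are
# recorded in a list instead of actually waiting.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


waits: list[float] = []
result = call_with_retries(flaky, sleep=waits.append)
print(result, waits)
```

Passing the sleep function as a parameter keeps the helper easy to test without real delays.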
🗄️ Database Setup
If you are using MySQL export, the actor expects two tables. You can create them by uncommenting the relevant lines in the actor source before the first run:
    # Uncomment in main() to auto-create tables on first run:
    # create_ofsted_reports_table(conn)
    # create_unsupported_reports_table(conn)
- `ofsted_reports`: primary output table, keyed on `pdf_url`
- `ofsted_unsupported_reports`: holds PDFs that could not be parsed (e.g. older format reports), keyed on `pdf_url`
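Because both tables are keyed on `pdf_url`, a rerun can rewrite an existing row instead of inserting a second one. The sketch below shows how such a statement might be built. The actor's real schema is not documented here, so the snake_case column names produced by `to_column` are an assumption, and `build_upsert` is a hypothetical helper. The `%s` placeholders follow the parameter style used by common Python MySQL drivers.

```python
import re


def to_column(field: str) -> str:
    """Turn an output field name such as 'Home Capacity' into 'home_capacity' (assumed naming)."""
    return re.sub(r"[^a-z0-9]+", "_", field.lower()).strip("_")


def build_upsert(table: str, record: dict) -> tuple[str, list]:
    """Build a parameterised INSERT ... ON DUPLICATE KEY UPDATE statement.

    Every column except the key (pdf_url) is overwritten when the row exists.
    """
    columns = [to_column(name) for name in record]
    placeholders = ", ".join(["%s"] * len(columns))
    updates = ", ".join(f"{c} = VALUES({c})" for c in columns if c != "pdf_url")
    sql = (
        f"INSERT INTO {table} ({', '.join(columns)}) "
        f"VALUES ({placeholders}) "
        f"ON DUPLICATE KEY UPDATE {updates}"
    )
    return sql, list(record.values())


sql, params = build_upsert(
    "ofsted_reports",
    {"PDF URL": "https://files.ofsted.gov.uk/v1/file/50287454", "Home Capacity": "4"},
)
print(sql)
print(params)
```

The pair can then be passed to a driver cursor as `cursor.execute(sql, params)`. Recent MySQL 8.0 releases deprecate `VALUES()` inside the update clause in favour of row aliases, though it is still accepted.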
💡 Tips & Best Practices
Getting the right start URL
Navigate to the Ofsted search page, apply your filters (provider type, status, etc.), and copy the resulting URL. The actor will inject your date range parameters automatically.
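As a rough illustration of what injecting the date range could look like, the sketch below appends two query parameters to a copied search URL. It assumes the query parameter names match the actor's input field names (`latest_report_date_start` and `latest_report_date_end`); the actor's actual implementation may differ, and `with_date_range` is a hypothetical helper.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

DATE_PARAMS = ("latest_report_date_start", "latest_report_date_end")


def with_date_range(search_url: str, start: str, end: str) -> str:
    """Return search_url with the date-range parameters set, replacing any existing ones."""
    parts = urlsplit(search_url)
    # Keep every existing filter (including blank ones such as q=) except old date values.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in DATE_PARAMS
    ]
    query += [(DATE_PARAMS[0], start), (DATE_PARAMS[1], end)]
    return urlunsplit(parts._replace(query=urlencode(query)))


base_url = (
    "https://reports.ofsted.gov.uk/search?q=&level_1_types=3"
    "&level_2_types%5B0%5D=11&status%5B0%5D=1&start=0&rows=10"
)
dated_url = with_date_range(base_url, "2026-02-15", "2026-02-28")
print(dated_url)
```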
Running on a schedule
Set latest_report_date_start and latest_report_date_end to a rolling window (e.g. the past 7 days) and schedule the actor weekly. The deduplication logic ensures previously processed PDFs are skipped automatically.
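If the scheduled input is static JSON, something has to compute the rolling dates before each run. The sketch below builds such an input in Python; `rolling_window_input` is a hypothetical helper, and how the resulting dictionary reaches the actor (API call, client library, or manual paste) is left open.

```python
from datetime import date, timedelta
from typing import Optional


def rolling_window_input(base_input: dict, days: int = 7, today: Optional[date] = None) -> dict:
    """Copy base_input and set the date filters to the last `days` days, ending today."""
    today = today or date.today()
    run_input = dict(base_input)
    run_input["latest_report_date_start"] = (today - timedelta(days=days)).isoformat()
    run_input["latest_report_date_end"] = today.isoformat()
    return run_input


base = {
    "start_urls": [{
        "url": "https://reports.ofsted.gov.uk/search?q=&level_1_types=3"
               "&level_2_types%5B0%5D=11&status%5B0%5D=1&start=0&rows=10"
    }],
    "skip_db_export": True,
}
# A fixed "today" keeps the example reproducible; omit it in real use.
weekly = rolling_window_input(base, days=7, today=date(2026, 2, 28))
print(weekly["latest_report_date_start"], weekly["latest_report_date_end"])
```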
Skipping the database
Set skip_db_export: true to use the actor without any database — all data will be available in your Apify dataset and can be exported to JSON, CSV, or connected to other tools via the Apify platform.
Direct PDF processing
If you have a specific PDF URL (e.g. from a notification or email), you can pass it directly as a start URL:
    {
      "start_urls": [
        { "url": "https://files.ofsted.gov.uk/v1/file/50287454" }
      ],
      "skip_db_export": true
    }
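When a list of start URLs mixes search pages and direct PDF links, it can help to see which is which before submitting a run. The sketch below separates them by hostname. This is an assumed heuristic for illustration, not necessarily how the actor tells them apart, and `split_start_urls` is a hypothetical helper.

```python
from urllib.parse import urlsplit

PDF_HOST = "files.ofsted.gov.uk"


def is_direct_pdf(url: str) -> bool:
    """True when the URL points at the Ofsted file host rather than the search portal."""
    return urlsplit(url).hostname == PDF_HOST


def split_start_urls(start_urls: list[dict]) -> tuple[list[str], list[str]]:
    """Split start_urls entries into (direct PDF URLs, other URLs)."""
    pdfs = [item["url"] for item in start_urls if is_direct_pdf(item["url"])]
    others = [item["url"] for item in start_urls if not is_direct_pdf(item["url"])]
    return pdfs, others


pdfs, searches = split_start_urls([
    {"url": "https://files.ofsted.gov.uk/v1/file/50287454"},
    {"url": "https://reports.ofsted.gov.uk/search?q=&level_1_types=3"},
])
print(pdfs, searches)
```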
⚠️ Limitations
- Only processes Full Inspection reports for children's homes (Ofsted level 2 type 11). Other provision types or report formats are not currently supported.
- PDF parsing relies on consistent Ofsted report formatting. Older or non-standard PDFs are captured in the unsupported reports table rather than discarded silently.
- The actor waits 2 seconds between requests to avoid overloading the Ofsted server, so large date ranges take proportionally longer to crawl.
📄 License
This actor is provided for legitimate research, compliance monitoring, and data analysis use cases. All data is sourced from publicly available Ofsted reports. Users are responsible for ensuring their use complies with applicable terms of service and data protection regulations.