Ofsted Reports Data Scraper

Pricing

from $10.00 / 1,000 results

Scrapes Ofsted full inspection reports for children's homes. Extracts 18 structured fields from each PDF (judgement ratings, provider details, inspector info, home capacity and type), with results filtered by date. Exports to MySQL and/or an Apify dataset.

Rating

0.0 (0)

Developer

Alkausari M

Maintained by Community

Actor stats

Bookmarked: 0

Total users: 5

Monthly active users: 2

Last modified: a day ago

Ofsted Children's Home Inspection Reports Scraper

Extract structured data from Ofsted full inspection PDF reports for children's homes — including judgement ratings, provider details, inspector information, home capacity, and more — and export directly to your MySQL database or Apify dataset.


🔍 What This Actor Does

This actor crawls the Ofsted reports portal and automatically:

  1. Finds children's home providers matching your date filters
  2. Identifies Full Inspection reports on each provider's page
  3. Downloads and parses the inspection PDF
  4. Extracts 18 structured fields from each report
  5. Exports results to your MySQL database and/or Apify dataset

It is designed for researchers, compliance teams, consultants, and data companies who need reliable, structured access to Ofsted inspection data at scale.


📦 Output Fields

Each record contains the source PDF URL plus the 18 fields extracted from the report:

| Field | Description |
| --- | --- |
| PDF URL | Direct link to the source PDF on files.ofsted.gov.uk |
| Unique reference number | Ofsted URN, the unique identifier for the provider |
| Registered provider | Legal name of the registered provider |
| Registered provider address | Full registered address |
| Provision sub-type | e.g. Children's home |
| Responsible individual | Named responsible individual |
| Registered manager | Name of the registered manager (or "Post vacant") |
| Inspection dates | Dates the inspection took place |
| Inspection type | e.g. Full inspection |
| Overall experiences and progress | Judgement rating |
| Help and protection | Judgement rating |
| Leadership and management | Judgement rating |
| Date of last inspection | Date of the previous inspection |
| Overall judgement at last inspection | Previous overall judgement |
| Enforcement action since last inspection | Any enforcement actions taken |
| Inspector | Name of the inspecting officer |
| Role | Inspector's role/title |
| Home Capacity | Maximum number of children the home can accommodate |
| Home Type | Specialism, e.g. emotional and behavioural difficulties |

Sample Output

[
  {
    "PDF URL": "https://files.ofsted.gov.uk/v1/file/50287454",
    "Unique reference number": "2586943",
    "Registered provider": "Horizon Care And Education Group Ltd",
    "Registered provider address": "C/O DWF Company Secretarial Services Limited, 1 Scott Place 2 Hardman Street, Manchester M3 3AA",
    "Provision sub-type": "Children's home",
    "Responsible individual": "N/A",
    "Registered manager": "Post vacant",
    "Inspection dates": "12 and 13 August 2025",
    "Inspection type": "Full inspection",
    "Overall experiences and progress": "requires improvement to be good",
    "Help and protection": "requires improvement to be good",
    "Leadership and management": "requires improvement to be good",
    "Date of last inspection": "28 January 2025",
    "Overall judgement at last inspection": "good",
    "Enforcement action since last inspection": "None",
    "Inspector": "Julia Tompson",
    "Role": "Social Care Inspector",
    "Home Capacity": "4",
    "Home Type": "social and emotional difficulties"
  }
]
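
Because every value in the sample arrives as a string, downstream code usually needs a small normalisation step. The following is a minimal sketch of post-processing on the consumer side, not part of the actor itself. The helper names `capacity_as_int` and `below_good` are hypothetical, and the sketch assumes the judgement values follow the wording shown in the sample record.

```python
import json

# One record trimmed to the fields used below; values copied from the sample output.
records = json.loads("""
[
  {
    "Unique reference number": "2586943",
    "Overall experiences and progress": "requires improvement to be good",
    "Help and protection": "requires improvement to be good",
    "Leadership and management": "requires improvement to be good",
    "Home Capacity": "4"
  }
]
""")

JUDGEMENT_FIELDS = (
    "Overall experiences and progress",
    "Help and protection",
    "Leadership and management",
)
ACCEPTABLE = {"good", "outstanding"}


def capacity_as_int(record):
    """Return Home Capacity as an int, or None when it is missing or non-numeric."""
    value = str(record.get("Home Capacity", "")).strip()
    return int(value) if value.isdigit() else None


def below_good(record):
    """True when any judgement field is not 'good' or 'outstanding'.

    A missing judgement field also counts as below good in this sketch.
    """
    return any(
        str(record.get(field, "")).strip().lower() not in ACCEPTABLE
        for field in JUDGEMENT_FIELDS
    )


flagged = [r["Unique reference number"] for r in records if below_good(r)]
```

The same functions work unchanged on a JSON export downloaded from the Apify dataset, since the keys match the output field names.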

⚙️ Input Configuration

{
  "start_urls": [
    {
      "url": "https://reports.ofsted.gov.uk/search?q=&level_1_types=3&level_2_types%5B0%5D=11&status%5B0%5D=1&start=0&rows=10"
    }
  ],
  "latest_report_date_start": "2026-02-15",
  "latest_report_date_end": "2026-02-28",
  "max_depth": 3,
  "skip_db_export": false,
  "db_host": "your-db-host",
  "db_database": "your-database-name",
  "db_user": "your-db-user",
  "db_password": "your-db-password"
}

Input Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| start_urls | array | ✅ Yes | | Ofsted search URL(s) to start crawling from. Can also be a direct PDF URL from files.ofsted.gov.uk |
| latest_report_date_start | string | ✅ Yes | Today | Filter: start of inspection date range (YYYY-MM-DD) |
| latest_report_date_end | string | ✅ Yes | Today | Filter: end of inspection date range (YYYY-MM-DD) |
| max_depth | integer | No | 3 | Crawl depth: 1 = listing pages only, 2 = provider pages, 3 = full PDF extraction |
| skip_db_export | boolean | No | false | Set to true to skip MySQL export and save to Apify dataset only |
| db_host | string | Conditional | | MySQL host (required if skip_db_export is false) |
| db_database | string | Conditional | | MySQL database name |
| db_user | string | Conditional | | MySQL username |
| db_password | string | Conditional | | MySQL password |
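
The conditional rules above can be checked before a run is started. The sketch below is a hypothetical pre-flight check written for illustration. It is not the actor's own validation, and it assumes that all four `db_*` parameters become required when `skip_db_export` is false (the table states this explicitly only for `db_host`).

```python
from datetime import date

DB_KEYS = ("db_host", "db_database", "db_user", "db_password")
DATE_KEYS = ("latest_report_date_start", "latest_report_date_end")


def validate_input(actor_input):
    """Return a list of problems; an empty list means the input looks usable."""
    problems = []
    if not actor_input.get("start_urls"):
        problems.append("start_urls must contain at least one URL")

    parsed = {}
    for key in DATE_KEYS:
        value = actor_input.get(key)
        if value is None:
            continue  # the parameter table lists "Today" as the default
        try:
            parsed[key] = date.fromisoformat(value)
        except (TypeError, ValueError):
            problems.append(f"{key} must be YYYY-MM-DD, got {value!r}")

    if len(parsed) == 2 and parsed[DATE_KEYS[0]] > parsed[DATE_KEYS[1]]:
        problems.append("latest_report_date_start is after latest_report_date_end")

    # Assumption: every db_* value is needed whenever MySQL export is enabled.
    if not actor_input.get("skip_db_export", False):
        for key in DB_KEYS:
            if not actor_input.get(key):
                problems.append(f"{key} is required when skip_db_export is false")
    return problems
```

Calling `validate_input` on the example configuration above returns an empty list, while an input with only `start_urls` reports the four missing database settings.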

🚀 Key Features

  • Date-filtered crawling — target only reports published in your specified date range, no manual filtering required
  • Full PDF parsing — extracts all structured fields directly from the Ofsted PDF format, not just metadata
  • Deduplication — checks your existing MySQL records on startup and skips already-processed PDFs, making reruns safe and efficient
  • Direct PDF support — pass a files.ofsted.gov.uk URL directly as a start URL to process a single report
  • MySQL integration — inserts or updates records using ON DUPLICATE KEY UPDATE, so reruns never create duplicates
  • Unsupported format handling — PDFs that don't match the expected format are logged to a separate ofsted_unsupported_reports table for review
  • Automatic retries — failed requests are re-queued with up to 3 retry attempts and exponential backoff
  • Dataset export — all results are always pushed to the Apify dataset regardless of DB settings
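
To illustrate the retry behaviour described in the list above, here is a minimal sketch of three retries with exponential backoff. It is not the actor's code: the function name `fetch_with_retries`, the 1-second base delay, and the doubling factor are assumptions made for the example.

```python
import time


def fetch_with_retries(fetch, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() and retry on any exception, doubling the wait each time.

    With the defaults the waits are 1s, 2s, 4s; the fourth failure is re-raised.
    The sleep argument is injectable so the logic can be tested without waiting.
    """
    attempt = 0
    while True:
        try:
            return fetch()
        except Exception:
            if attempt >= max_retries:
                raise  # retries exhausted, surface the last error
            sleep(base_delay * 2 ** attempt)
            attempt += 1
```

Passing the sleep function as a parameter keeps the retry policy separate from real time, which makes the schedule easy to verify.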

🗄️ Database Setup

If you are using MySQL export, the actor expects two tables. You can create them by uncommenting the relevant lines in the actor source before the first run:

# Uncomment in main() to auto-create tables on first run:
# create_ofsted_reports_table(conn)
# create_unsupported_reports_table(conn)

ofsted_reports — primary output table, keyed on pdf_url

ofsted_unsupported_reports — holds PDFs that could not be parsed (e.g. older format reports), keyed on pdf_url
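
For orientation, the insert-or-update pattern against a table keyed on pdf_url might look like the sketch below. The helper `build_upsert` and the column name `urn` are hypothetical, since the actor's real column names are not documented here. The statement only behaves as an upsert if pdf_url is a primary or unique key, and the `%s` placeholders assume a driver such as PyMySQL or mysql-connector-python.

```python
def build_upsert(table, columns, key="pdf_url"):
    """Build an INSERT ... ON DUPLICATE KEY UPDATE statement for MySQL.

    Every column except the key is overwritten when a row with the same
    key already exists. Column names here are illustrative only.
    """
    column_list = ", ".join(f"`{c}`" for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    updates = ", ".join(f"`{c}` = VALUES(`{c}`)" for c in columns if c != key)
    return (
        f"INSERT INTO `{table}` ({column_list}) VALUES ({placeholders}) "
        f"ON DUPLICATE KEY UPDATE {updates}"
    )


# Hypothetical two-column example; pass the values separately to cursor.execute().
sql = build_upsert("ofsted_reports", ["pdf_url", "urn"])
```

Note that the `VALUES()` function in the update clause is deprecated as of MySQL 8.0.20 in favour of row aliases, although it is still accepted.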


💡 Tips & Best Practices

Getting the right start URL

Navigate to the Ofsted search page, apply your filters (provider type, status, etc.), and copy the resulting URL. The actor will inject your date range parameters automatically.
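
As a rough picture of what injecting the date range could involve, the sketch below appends two date parameters to a copied search URL. The actor's real logic is not documented here. The query parameter names are assumed to mirror the input names, the date format is passed through unchanged, and `with_date_range` is a hypothetical helper.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

DATE_PARAMS = ("latest_report_date_start", "latest_report_date_end")


def with_date_range(search_url, start, end):
    """Return search_url with the two date parameters set, replacing any existing ones."""
    parts = urlsplit(search_url)
    # keep_blank_values preserves empty parameters such as "q=".
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in DATE_PARAMS
    ]
    query += [(DATE_PARAMS[0], start), (DATE_PARAMS[1], end)]
    return urlunsplit(parts._replace(query=urlencode(query)))


url = with_date_range(
    "https://reports.ofsted.gov.uk/search?q=&level_1_types=3"
    "&level_2_types%5B0%5D=11&status%5B0%5D=1&start=0&rows=10",
    "2026-02-15",
    "2026-02-28",
)
```

The existing filters (provider type, status, paging) are carried over untouched, so only the copied URL and the two dates need to be supplied.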

Running on a schedule

Set latest_report_date_start and latest_report_date_end to a rolling window (e.g. the past 7 days) and schedule the actor weekly. The deduplication logic ensures previously processed PDFs are skipped automatically.
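
A rolling window can be computed just before each scheduled run. The following is a small sketch; the helper name `rolling_window` is hypothetical, and the result would be merged into the rest of the actor input by whatever tool triggers the run.

```python
from datetime import date, timedelta


def rolling_window(days=7, today=None):
    """Return the two date inputs covering the last `days` days up to today."""
    today = today or date.today()
    start = today - timedelta(days=days)
    return {
        "latest_report_date_start": start.isoformat(),  # YYYY-MM-DD
        "latest_report_date_end": today.isoformat(),
    }


# Fixed date for a reproducible example; omit `today` in a real schedule.
window = rolling_window(7, today=date(2026, 2, 28))
```

Consecutive weekly windows built this way share their boundary day, which is harmless here because already-processed PDFs are skipped.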

Skipping the database

Set skip_db_export: true to use the actor without any database — all data will be available in your Apify dataset and can be exported to JSON, CSV, or connected to other tools via the Apify platform.

Direct PDF processing

If you have a specific PDF URL (e.g. from a notification or email), you can pass it directly as a start URL:

{
  "start_urls": [{ "url": "https://files.ofsted.gov.uk/v1/file/50287454" }],
  "skip_db_export": true
}
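
When start URLs come from mixed sources, it can help to tell direct PDF links apart from search URLs before submitting them. The sketch below is one way to do that. The `/v1/file/` path prefix is inferred from the sample URL, and the check the actor actually performs may differ.

```python
from urllib.parse import urlsplit


def is_direct_pdf_url(url):
    """True for links shaped like https://files.ofsted.gov.uk/v1/file/<id>."""
    parts = urlsplit(url)
    return (
        parts.hostname == "files.ofsted.gov.uk"
        and parts.path.startswith("/v1/file/")
    )
```

For example, `is_direct_pdf_url("https://files.ofsted.gov.uk/v1/file/50287454")` is true, while a `reports.ofsted.gov.uk/search` URL is not.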

⚠️ Limitations

  • Only processes Full Inspection reports for children's homes (Ofsted level 2 type 11). Other provision types or report formats are not currently supported.
  • PDF parsing relies on consistent Ofsted report formatting. Older or non-standard PDFs are captured in the unsupported reports table rather than discarded silently.
  • The actor waits 2 seconds between requests to avoid overloading the Ofsted server.

📄 License

This actor is provided for legitimate research, compliance monitoring, and data analysis use cases. All data is sourced from publicly available Ofsted reports. Users are responsible for ensuring their use complies with applicable terms of service and data protection regulations.