BBC Good Food Recipe Scraper

Pricing

Pay per event

Enumerate and scrape the full BBC Good Food recipe catalogue (15K+ recipes) via sitemap discovery. Extracts structured recipe data including ingredients, instructions, UK nutrition panels, skill level, dietary tags, ratings, and schema.org/Recipe JSON-LD fields.

Developer

BowTiedRaccoon

Maintained by Community

Overview

The BBC Good Food Recipe Scraper enumerates the full BBC Good Food recipe catalogue (15,000+ recipes) through sitemap discovery and extracts structured data from each recipe page: ingredients, step-by-step instructions, the UK nutrition panel, BBC-specific skill levels, dietary tags, star ratings, and schema.org/Recipe JSON-LD fields.

BBC Good Food is one of the largest free recipe sites in the UK, with content covering everything from quick weeknight dinners to elaborate celebration cakes. Generic multi-site scrapers require you to supply the URLs yourself, and they drop BBC-specific fields. This actor discovers the entire corpus automatically and extracts the BBC-specific fields alongside the standard schema.org/Recipe data.

Features

  • Full sitemap enumeration: Walks the BBC Good Food sitemap index and collects every recipe URL across all quarterly recipe sitemaps (~15K+ recipes).
  • BYO URL mode: Supply specific recipe URLs via startUrls to scrape targeted recipes without a full crawl.
  • schema.org/Recipe extraction: Parses the embedded JSON-LD block on each page for all standard Recipe fields.
  • BBC-specific fields: Extracts skill level (Easy / More effort / A challenge), dietary tags (vegetarian, vegan, gluten-free, healthy, etc.), and the UK nutrition panel.
  • Respectful crawling: Honours the site's crawl-delay directive with conservative concurrency.
  • Incremental-friendly: Use maxItems to cap run size for incremental update workflows.
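
The schema.org/Recipe extraction listed above can be illustrated with a minimal standard-library sketch. It is not the actor's own code: the names JsonLdCollector and find_recipe are hypothetical, the sample HTML is made up for the demonstration, and it assumes the Recipe object sits either at the top level of a JSON-LD block, inside a list, or under an @graph key.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def find_recipe(html):
    """Return the first JSON-LD object whose @type includes "Recipe", or None."""
    collector = JsonLdCollector()
    collector.feed(html)
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks
        # A block may hold one object, a list of objects, or an @graph wrapper.
        if isinstance(data, dict):
            candidates = data.get("@graph", [data])
        elif isinstance(data, list):
            candidates = data
        else:
            candidates = []
        if not isinstance(candidates, list):
            candidates = [candidates]
        for node in candidates:
            if not isinstance(node, dict):
                continue
            types = node.get("@type", [])
            if isinstance(types, str):
                types = [types]
            if "Recipe" in types:
                return node
    return None


# Made-up page fragment for illustration only.
sample_html = """<html><head>
<script type="application/ld+json">{"@type": "BreadcrumbList"}</script>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Recipe",
 "name": "Easy chocolate cake", "recipeYield": "Serves 8-10"}
</script>
</head><body></body></html>"""

recipe = find_recipe(sample_html)
print(recipe["name"], "|", recipe["recipeYield"])
```

Using the standard-library parser keeps the sketch dependency-free; a production scraper would more likely use an HTML library it already depends on.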

Use Cases

  • Building recipe datasets for LLM fine-tuning or RAG pipelines.
  • Meal planning and nutrition app data ingestion.
  • Food-trend analytics using BBC's categorisation taxonomy and editorial dietary tags.
  • Competitive benchmarking for recipe content platforms.
  • Academic research on UK food culture and cooking trends.

How It Works

  1. Sitemap discovery: Fetches https://www.bbcgoodfood.com/sitemap.xml (a 260-child index) and filters to recipe-type sitemaps (e.g. 2026-Q2-recipe.xml).
  2. URL collection: Extracts all /recipes/<slug> URLs from matching sitemaps, capped at maxItems.
  3. Page extraction: Fetches each recipe page and parses the schema.org/Recipe JSON-LD block plus supplemental BBC DOM fields.
  4. Output: Stores one record per recipe in the Apify dataset.
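
Steps 1 and 2 can be sketched offline with the standard library. This is an illustration under stated assumptions, not the actor's implementation: the function names are hypothetical, the quarterly file-name pattern is inferred from the single example 2026-Q2-recipe.xml, and the /sitemaps/ path and the non-recipe sitemap in the sample XML are invented for the demonstration.

```python
import re
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Assumed naming pattern, inferred from the example 2026-Q2-recipe.xml.
RECIPE_SITEMAP = re.compile(r"/\d{4}-Q[1-4]-recipe\.xml$")
# Matches /recipes/<slug> only, so deeper paths under /recipes/ are skipped.
RECIPE_URL = re.compile(r"^https://www\.bbcgoodfood\.com/recipes/[^/?#]+$")


def locs(xml_text):
    """Return the text of every <loc> element in a sitemap or sitemap index."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iterfind(".//sm:loc", SITEMAP_NS) if el.text]


def recipe_sitemaps(index_xml):
    """Step 1: keep only the child sitemaps that look like recipe sitemaps."""
    return [url for url in locs(index_xml) if RECIPE_SITEMAP.search(url)]


def recipe_urls(sitemap_xmls, max_items=0):
    """Step 2: collect unique recipe URLs in order; max_items=0 means no cap."""
    seen, out = set(), []
    for xml_text in sitemap_xmls:
        for url in locs(xml_text):
            if RECIPE_URL.match(url) and url not in seen:
                seen.add(url)
                out.append(url)
                if max_items and len(out) >= max_items:
                    return out
    return out


# Hypothetical sample documents (paths and the "article" sitemap are invented).
index_xml = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://www.bbcgoodfood.com/sitemaps/2026-Q2-recipe.xml</loc></sitemap>
  <sitemap><loc>https://www.bbcgoodfood.com/sitemaps/2026-Q2-article.xml</loc></sitemap>
</sitemapindex>"""

recipe_xml = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.bbcgoodfood.com/recipes/easy-chocolate-cake</loc></url>
  <url><loc>https://www.bbcgoodfood.com/recipes/iced-tea</loc></url>
  <url><loc>https://www.bbcgoodfood.com/recipes/collection/cake-recipes</loc></url>
</urlset>"""

print(recipe_sitemaps(index_xml))
print(recipe_urls([recipe_xml], max_items=1))
```

Returning as soon as the cap is reached means later sitemaps never need to be fetched once maxItems URLs have been collected.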

Input

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| maxItems | Integer | Yes | Maximum number of recipes to scrape. Set to 0 for the full corpus (15K+). Default: 10. |
| startUrls | Array | No | Specific BBC Good Food recipe URLs to scrape. Skips sitemap discovery when provided. |

Example — Full sitemap run (capped)

{
  "maxItems": 500
}

Example — BYO URLs

{
  "startUrls": [
    { "url": "https://www.bbcgoodfood.com/recipes/easy-chocolate-cake" },
    { "url": "https://www.bbcgoodfood.com/recipes/iced-tea" }
  ],
  "maxItems": 10
}
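
If you generate inputs programmatically, a small helper can produce either shape shown above. This is a sketch only: build_run_input is a hypothetical name rather than part of the actor, and the URL prefix check is an assumption based on the example URLs.

```python
import json


def build_run_input(max_items=10, start_urls=None):
    """Build an input object using the two documented fields."""
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 0:
        raise ValueError("maxItems must be a non-negative integer (0 = full corpus)")
    run_input = {"maxItems": max_items}
    if start_urls:
        for url in start_urls:
            # Assumed check: recipe pages live under /recipes/ on bbcgoodfood.com.
            if not url.startswith("https://www.bbcgoodfood.com/recipes/"):
                raise ValueError(f"not a BBC Good Food recipe URL: {url}")
        # startUrls entries are objects with a "url" key, as in the example.
        run_input["startUrls"] = [{"url": url} for url in start_urls]
    return run_input


capped_run = build_run_input(max_items=500)
byo_run = build_run_input(
    max_items=10,
    start_urls=[
        "https://www.bbcgoodfood.com/recipes/easy-chocolate-cake",
        "https://www.bbcgoodfood.com/recipes/iced-tea",
    ],
)
print(json.dumps(byo_run, indent=2))
```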

Output

One record per recipe. All fields sourced from schema.org/Recipe JSON-LD unless noted.

| Field | Type | Description |
| --- | --- | --- |
| slug | String | URL slug (e.g. easy-chocolate-cake) |
| url | String | Full recipe page URL |
| name | String | Recipe title |
| author | String | Recipe author name |
| description | String | Short editorial description |
| recipe_category | String | Category (e.g. Cake, Dinner, Drink) |
| recipe_cuisine | String | Cuisine type (e.g. British, Italian); may be an empty string |
| recipe_yield | String | Serving yield (e.g. "Serves 8") |
| prep_time | String | Prep time as ISO 8601 duration (e.g. PT20M) |
| cook_time | String | Cook time as ISO 8601 duration |
| total_time | String | Total time as ISO 8601 duration |
| skill_level | String | BBC-specific skill rating: Easy / More effort / A challenge |
| recipe_ingredient | Array | List of ingredient strings |
| recipe_instructions | Array | List of step-by-step instruction strings |
| nutrition | String | JSON-encoded per-serving nutrition data, keyed by schema.org names (calories, fatContent, saturatedFatContent, carbohydrateContent, sugarContent, fiberContent, proteinContent, sodiumContent) |
| aggregate_rating | Number | Average star rating (1–5 scale); null if the recipe has no ratings yet |
| rating_count | Integer | Number of ratings; null if the recipe has no ratings yet |
| keywords | Array | Editorial keyword tags |
| dietary_tags | Array | BBC-specific dietary suitability tags (vegetarian, vegan, gluten-free, healthy, etc.) |
| image_urls | Array | Recipe image URLs |
| date_published | String | Publication date (ISO 8601) |

Example output record

{
  "slug": "easy-chocolate-cake",
  "url": "https://www.bbcgoodfood.com/recipes/easy-chocolate-cake",
  "name": "Easy chocolate cake",
  "author": "Miriam Nice",
  "description": "Master the chocolate cake with an airy, light sponge and rich buttercream filling...",
  "recipe_category": "Cake",
  "recipe_cuisine": "",
  "recipe_yield": "Serves 8-10",
  "prep_time": "PT30M",
  "cook_time": "PT25M",
  "total_time": "PT55M",
  "skill_level": "Easy",
  "recipe_ingredient": [
    "225g unsalted butter, softened",
    "225g golden caster sugar",
    "4 large eggs"
  ],
  "recipe_instructions": [
    "Heat oven to 190C/170C fan/gas 5. Butter two 20cm sandwich tins...",
    "Beat 225g softened unsalted butter and 225g golden caster sugar until fluffy..."
  ],
  "nutrition": "{\"calories\":\"546 calories\",\"fatContent\":\"31 grams fat\",\"saturatedFatContent\":\"19 grams saturated fat\",\"carbohydrateContent\":\"63 grams carbohydrates\",\"sugarContent\":\"51 grams sugar\",\"fiberContent\":\"1 grams fiber\",\"proteinContent\":\"5 grams protein\",\"sodiumContent\":\"0.5 milligram of sodium\"}",
  "aggregate_rating": 4.7,
  "rating_count": 2314,
  "keywords": ["Afternoon tea", "Celebration cake", "Chocolate cake"],
  "dietary_tags": [],
  "image_urls": ["https://images.immediate.co.uk/production/volatile/sites/30/2020/08/easy_chocolate_cake-b62f92c.jpg?resize=440,230"],
  "date_published": "2020-08-21T00:00:00+00:00"
}
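
The three time fields are ISO 8601 duration strings, so downstream code usually needs them as numbers. The sketch below converts them to minutes. duration_minutes is a hypothetical helper, and it assumes only day, hour, minute, and second components appear (no weeks, months, or fractional values).

```python
import re

# Accepts forms such as PT20M, PT1H30M, or P1DT2H.
_DURATION = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$")


def duration_minutes(value):
    """Convert an ISO 8601 duration string to minutes; None if empty or unparsable."""
    if not value:
        return None
    match = _DURATION.match(value)
    if not match or not any(match.groups()):
        return None
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return days * 1440 + hours * 60 + minutes + seconds / 60


# Values taken from the example record above.
record = {"prep_time": "PT30M", "cook_time": "PT25M", "total_time": "PT55M"}
prep = duration_minutes(record["prep_time"])
cook = duration_minutes(record["cook_time"])
total = duration_minutes(record["total_time"])
print(prep, cook, total)
```

In the example record, prep and cook time add up to the total, but that is worth checking per record rather than assuming.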

Notes

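  • Crawl-delay: BBC Good Food's robots.txt specifies a 12-second crawl delay, which the actor respects by running at low concurrency. Full-corpus runs (~15K recipes) will take many hours; as a rough upper bound, one request every 12 seconds works out to about 50 hours for 15,000 pages (15,000 × 12 s = 180,000 s).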
  • New recipes: Recipe sitemaps are split by quarter (e.g. 2026-Q2-recipe.xml). Run the actor periodically to capture newly published recipes.
  • Ratings on new recipes: Freshly published recipes may have no aggregate rating yet, in which case aggregate_rating and rating_count are null.
  • Nutrition format: The nutrition field is a JSON string. Parse it with JSON.parse(record.nutrition) to access individual nutrients.
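
For consumers working in Python, the last two notes can be handled as in the sketch below. parse_nutrition and rating_label are hypothetical helper names, and the number extraction assumes each nutrition value starts with a number followed by a unit phrase, as in the example record.

```python
import json
import re


def parse_nutrition(record):
    """Decode the JSON-encoded nutrition string and keep the leading number of each value."""
    raw = record.get("nutrition")
    if not raw:
        return {}
    parsed = json.loads(raw)
    amounts = {}
    for key, text in parsed.items():
        # Assumed value shape: "<number> <unit phrase>", e.g. "31 grams fat".
        match = re.match(r"\s*(\d+(?:\.\d+)?)", str(text))
        if match:
            amounts[key] = float(match.group(1))
    return amounts


def rating_label(record):
    """Format the rating, allowing for null values on freshly published recipes."""
    rating = record.get("aggregate_rating")
    count = record.get("rating_count")
    if rating is None or count is None:
        return "unrated"
    return f"{rating:.1f} ({count} ratings)"


# Trimmed copy of the example record, plus a made-up unrated record.
rated = {
    "nutrition": '{"calories":"546 calories","fatContent":"31 grams fat","sodiumContent":"0.5 milligram of sodium"}',
    "aggregate_rating": 4.7,
    "rating_count": 2314,
}
unrated = {"nutrition": "", "aggregate_rating": None, "rating_count": None}

print(parse_nutrition(rated))
print(rating_label(rated), "/", rating_label(unrated))
```

The units stay in the original strings, so keep the decoded dictionary around if you need to distinguish grams from milligrams.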