BBC Good Food Recipe Scraper
Pricing
Pay per event
Enumerate and scrape the full BBC Good Food recipe catalogue (~15K+ recipes) from sitemap discovery. Extracts structured recipe data including ingredients, instructions, UK nutrition panels, skill level, dietary tags, ratings, and schema.org/Recipe JSON-LD fields.
Developer
BowTiedRaccoon
Maintained by Community
Overview
The BBC Good Food Recipe Scraper enumerates and extracts the full BBC Good Food recipe catalogue (~15,000+ recipes) using sitemap discovery. It captures rich structured data from each recipe page including ingredients, step-by-step instructions, the UK nutrition panel, BBC-specific skill levels, dietary tags, star ratings, and schema.org/Recipe JSON-LD fields.
BBC Good Food is the largest free English-language recipe authority in the UK, with content covering everything from quick weeknight dinners to elaborate celebration cakes. Unlike generic multi-site scrapers that require you to supply URLs and drop BBC-specific fields, this actor discovers the entire corpus automatically and extracts every structured field the site provides.
Features
- Full sitemap enumeration: Walks the BBC Good Food sitemap index and collects every recipe URL across all quarterly recipe sitemaps (~15K+ recipes).
- BYO URL mode: Supply specific recipe URLs via `startUrls` to scrape targeted recipes without a full crawl.
- schema.org/Recipe extraction: Parses the embedded JSON-LD block on each page for all standard Recipe fields.
- BBC-specific fields: Extracts skill level (Easy / More effort / A challenge), dietary tags (vegetarian, vegan, gluten-free, healthy, etc.), and the UK nutrition panel.
- Respectful crawling: Honours the site's crawl-delay directive with conservative concurrency.
- Incremental-friendly: Use `maxItems` to cap run size for incremental update workflows.
Use Cases
- Building recipe datasets for LLM fine-tuning or RAG pipelines.
- Meal planning and nutrition app data ingestion.
- Food-trend analytics using BBC's categorisation taxonomy and editorial dietary tags.
- Competitive benchmarking for recipe content platforms.
- Academic research on UK food culture and cooking trends.
How It Works
- Sitemap discovery: Fetches `https://www.bbcgoodfood.com/sitemap.xml` (a 260-child index) and filters to recipe-type sitemaps (e.g. `2026-Q2-recipe.xml`).
- URL collection: Extracts all `/recipes/<slug>` URLs from matching sitemaps, capped at `maxItems`.
- Page extraction: Fetches each recipe page and parses the `schema.org/Recipe` JSON-LD block plus supplemental BBC DOM fields.
- Output: Stores one record per recipe in the Apify dataset.
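The discovery and URL-collection steps above can be sketched offline in Python. This is a minimal illustration, not the actor's actual code: the helper names (`extract_locs`, `recipe_sitemaps`, `recipe_urls`) and both regular expressions are assumptions inferred from the examples in this README, and fetching is left out so the functions operate on sitemap XML you have already downloaded.

```python
import re
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Assumption: recipe sitemaps are named like "2026-Q2-recipe.xml".
RECIPE_SITEMAP_RE = re.compile(r"/\d{4}-Q[1-4]-recipe\.xml$")
# Assumption: recipe pages sit directly under /recipes/<slug>.
RECIPE_URL_RE = re.compile(r"^https://www\.bbcgoodfood\.com/recipes/[^/?#]+/?$")


def extract_locs(xml_text: str) -> list:
    """Return every <loc> value from a sitemap or sitemap index document."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iterfind(".//sm:loc", SITEMAP_NS) if el.text]


def recipe_sitemaps(index_xml: str) -> list:
    """Filter a sitemap index down to the recipe-type child sitemaps."""
    return [url for url in extract_locs(index_xml) if RECIPE_SITEMAP_RE.search(url)]


def recipe_urls(sitemap_xml: str, max_items: int = 0) -> list:
    """Collect /recipes/<slug> URLs; max_items=0 means no cap."""
    urls = [url for url in extract_locs(sitemap_xml) if RECIPE_URL_RE.match(url)]
    return urls[:max_items] if max_items > 0 else urls
```

In a real run, `recipe_urls` would be applied to each child sitemap in turn, stopping once the running total reaches `maxItems`.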
Input
| Field | Type | Required | Description |
|---|---|---|---|
| `maxItems` | Integer | Yes | Maximum number of recipes to scrape. Set to 0 for the full corpus (15K+). Default: 10. |
| `startUrls` | Array | No | Specific BBC Good Food recipe URLs to scrape. Skips sitemap discovery when provided. |
Example — Full sitemap run (capped)
{"maxItems": 500}
Example — BYO URLs
{
  "startUrls": [
    { "url": "https://www.bbcgoodfood.com/recipes/easy-chocolate-cake" },
    { "url": "https://www.bbcgoodfood.com/recipes/iced-tea" }
  ],
  "maxItems": 10
}
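If you generate inputs programmatically, a small helper can assemble the same two shapes shown in the examples. The sketch below is hypothetical: `build_input` is not part of the actor, and its validation rules (rejecting negative caps and non-recipe URLs) are assumptions rather than checks the actor is documented to perform.

```python
import json
from typing import List, Optional

RECIPE_PREFIX = "https://www.bbcgoodfood.com/recipes/"


def build_input(max_items: int = 10, start_urls: Optional[List[str]] = None) -> dict:
    """Assemble an input object using the documented maxItems / startUrls fields."""
    if max_items < 0:
        raise ValueError("maxItems must be 0 (full corpus) or a positive integer")
    actor_input = {"maxItems": max_items}
    if start_urls:
        # Assumed sanity check: only accept BBC Good Food recipe URLs.
        bad = [url for url in start_urls if not url.startswith(RECIPE_PREFIX)]
        if bad:
            raise ValueError(f"not BBC Good Food recipe URLs: {bad}")
        actor_input["startUrls"] = [{"url": url} for url in start_urls]
    return actor_input


if __name__ == "__main__":
    print(json.dumps(build_input(500)))
    print(json.dumps(build_input(10, [RECIPE_PREFIX + "iced-tea"])))
```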
Output
One record per recipe. All fields sourced from schema.org/Recipe JSON-LD unless noted.
| Field | Type | Description |
|---|---|---|
| `slug` | String | URL slug (e.g. `easy-chocolate-cake`) |
| `url` | String | Full recipe page URL |
| `name` | String | Recipe title |
| `author` | String | Recipe author name |
| `description` | String | Short editorial description |
| `recipe_category` | String | Category (e.g. Cake, Dinner, Drink) |
| `recipe_cuisine` | String | Cuisine type (e.g. British, Italian) |
| `recipe_yield` | String | Serving yield (e.g. "Serves 8") |
| `prep_time` | String | Prep time as ISO 8601 duration (e.g. `PT20M`) |
| `cook_time` | String | Cook time as ISO 8601 duration |
| `total_time` | String | Total time as ISO 8601 duration |
| `skill_level` | String | BBC skill rating: Easy / More effort / A challenge |
| `recipe_ingredient` | Array | List of ingredient strings |
| `recipe_instructions` | Array | List of step-by-step instruction strings |
| `nutrition` | String | JSON-encoded per-serving nutrition data (kcal, fat, saturates, carbs, sugars, fibre, protein, salt) |
| `aggregate_rating` | Number | Average star rating (1–5 scale) |
| `rating_count` | Integer | Number of ratings |
| `keywords` | Array | Editorial keyword tags |
| `dietary_tags` | Array | Dietary suitability tags (vegetarian, vegan, gluten-free, healthy, etc.) |
| `image_urls` | Array | Recipe image URLs |
| `date_published` | String | Publication date (ISO 8601) |
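To make the relationship between the page's JSON-LD and these output fields concrete, here is a rough Python sketch of how a `schema.org/Recipe` block might be located and mapped. It is an illustration under assumptions, not the actor's implementation: `JsonLdCollector`, `find_recipe`, and `to_record` are hypothetical names, only a subset of fields is mapped, and the camelCase-to-snake_case mapping is inferred from the table above.

```python
import json
from html.parser import HTMLParser
from typing import Optional


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def find_recipe(html: str) -> Optional[dict]:
    """Return the first JSON-LD node whose @type is (or includes) Recipe."""
    collector = JsonLdCollector()
    collector.feed(html)
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict):
            candidates = data.get("@graph", [data])
        elif isinstance(data, list):
            candidates = data
        else:
            continue
        for node in candidates:
            if not isinstance(node, dict):
                continue
            types = node.get("@type")
            if types == "Recipe" or (isinstance(types, list) and "Recipe" in types):
                return node
    return None


def to_record(url: str, recipe: dict) -> dict:
    """Map a subset of schema.org/Recipe properties to the output field names."""
    rating = recipe.get("aggregateRating") or {}
    steps = recipe.get("recipeInstructions") or []
    if isinstance(steps, str):
        steps = [steps]
    return {
        "slug": url.rstrip("/").rsplit("/", 1)[-1],
        "url": url,
        "name": recipe.get("name"),
        "prep_time": recipe.get("prepTime"),
        "cook_time": recipe.get("cookTime"),
        "total_time": recipe.get("totalTime"),
        "recipe_ingredient": recipe.get("recipeIngredient") or [],
        "recipe_instructions": [
            step.get("text", "") if isinstance(step, dict) else str(step) for step in steps
        ],
        # Stored as a JSON string, matching the documented nutrition field.
        "nutrition": json.dumps(recipe.get("nutrition") or {}),
        # Missing ratings surface as None (null in the dataset).
        "aggregate_rating": rating.get("ratingValue"),
        "rating_count": rating.get("ratingCount"),
    }
```

Fields such as `skill_level` and `dietary_tags` are omitted here because the sketch reads only the JSON-LD block.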
Example output record
{
  "slug": "easy-chocolate-cake",
  "url": "https://www.bbcgoodfood.com/recipes/easy-chocolate-cake",
  "name": "Easy chocolate cake",
  "author": "Miriam Nice",
  "description": "Master the chocolate cake with an airy, light sponge and rich buttercream filling...",
  "recipe_category": "Cake",
  "recipe_cuisine": "",
  "recipe_yield": "Serves 8-10",
  "prep_time": "PT30M",
  "cook_time": "PT25M",
  "total_time": "PT55M",
  "skill_level": "Easy",
  "recipe_ingredient": [
    "225g unsalted butter, softened",
    "225g golden caster sugar",
    "4 large eggs"
  ],
  "recipe_instructions": [
    "Heat oven to 190C/170C fan/gas 5. Butter two 20cm sandwich tins...",
    "Beat 225g softened unsalted butter and 225g golden caster sugar until fluffy..."
  ],
  "nutrition": "{\"calories\":\"546 calories\",\"fatContent\":\"31 grams fat\",\"saturatedFatContent\":\"19 grams saturated fat\",\"carbohydrateContent\":\"63 grams carbohydrates\",\"sugarContent\":\"51 grams sugar\",\"fiberContent\":\"1 grams fiber\",\"proteinContent\":\"5 grams protein\",\"sodiumContent\":\"0.5 milligram of sodium\"}",
  "aggregate_rating": 4.7,
  "rating_count": 2314,
  "keywords": ["Afternoon tea", "Celebration cake", "Chocolate cake"],
  "dietary_tags": [],
  "image_urls": [
    "https://images.immediate.co.uk/production/volatile/sites/30/2020/08/easy_chocolate_cake-b62f92c.jpg?resize=440,230"
  ],
  "date_published": "2020-08-21T00:00:00+00:00"
}
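The time fields are ISO 8601 duration strings, so downstream code usually needs to convert them before sorting or filtering. The following Python sketch shows one way to do that; `duration_minutes` is a hypothetical helper, and it assumes only day, hour, minute, and second components appear (as in `PT30M`), not weeks, months, or fractional values.

```python
import re
from typing import Optional

# Covers durations such as PT30M, PT1H15M, or P1DT2H; other ISO 8601 forms are not handled.
_DURATION_RE = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def duration_minutes(value: Optional[str]) -> Optional[float]:
    """Convert an ISO 8601 duration string to minutes, or None if unparseable."""
    if not value or value in ("P", "PT"):
        return None
    match = _DURATION_RE.match(value)
    if not match:
        return None
    parts = {name: int(text) if text else 0 for name, text in match.groupdict().items()}
    return (
        parts["days"] * 24 * 60
        + parts["hours"] * 60
        + parts["minutes"]
        + parts["seconds"] / 60
    )


if __name__ == "__main__":
    record = {"prep_time": "PT30M", "cook_time": "PT25M", "total_time": "PT55M"}
    prep = duration_minutes(record["prep_time"])
    cook = duration_minutes(record["cook_time"])
    total = duration_minutes(record["total_time"])
    # For the example record, prep plus cook equals the stated total.
    print(prep + cook == total)
```

Empty or malformed values return `None` rather than raising, which keeps bulk processing of a large dataset from stopping on a single bad record.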
Notes
- Crawl-delay: BBC Good Food's `robots.txt` specifies a 12-second crawl delay. The actor respects this via low concurrency. Full-corpus runs (~15K recipes) will take several hours.
- New recipes: The sitemap is indexed quarterly (e.g. `2026-Q2-recipe.xml`). Run periodically to capture newly published recipes.
- Ratings on new recipes: Freshly published recipes may have no aggregate rating yet; `aggregate_rating` and `rating_count` will be `null`.
- Nutrition format: The `nutrition` field is a JSON string. Parse it with `JSON.parse(record.nutrition)` to access individual nutrients.
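For consumers working in Python rather than JavaScript, the equivalent of `JSON.parse` is `json.loads`. The sketch below decodes the `nutrition` string and pulls out the leading number from each value, and also shows one way to guard against `null` ratings. Both helpers (`parse_nutrition`, `has_rating`) are hypothetical, and the number extraction assumes values shaped like those in the example record (such as `"546 calories"`); units are not interpreted.

```python
import json
import re
from typing import Optional

_NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def parse_nutrition(raw: Optional[str]) -> dict:
    """Decode the JSON-encoded nutrition field into {key: leading number}."""
    if not raw:
        return {}
    data = json.loads(raw)
    values = {}
    for key, text in data.items():
        match = _NUMBER_RE.search(str(text))
        if match:
            # Keep only the numeric part; the unit stays in the original string.
            values[key] = float(match.group())
    return values


def has_rating(record: dict) -> bool:
    """True when both rating fields are present and not null."""
    return record.get("aggregate_rating") is not None and record.get("rating_count") is not None


if __name__ == "__main__":
    record = {
        "nutrition": json.dumps({"calories": "546 calories", "fatContent": "31 grams fat"}),
        "aggregate_rating": None,
        "rating_count": None,
    }
    print(parse_nutrition(record["nutrition"]))
    print(has_rating(record))
```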