Recipe Scraper (Universal / schema.org)
Pricing
from $3.00 / 1,000 results
Scrape any schema.org-compliant recipe site like Epicurious, BBC Good Food, Tasty, NYT Cooking, Serious Eats, Food Network, plus thousands of food blogs. Extracts ingredients, instructions, nutrition, ratings, prep/cook time, yield, author, and images via JSON-LD parsing.
Rating
5.0
(21)
Developer
Crawler Bros
Actor stats
Bookmarked: 21
Total users: 2
Monthly active users: 1
Last modified: 2 days ago
Scrape any schema.org-compliant recipe site. Extracts ingredients, step-by-step instructions, nutrition, ratings, prep + cook + total time, yield, author, images, and video URLs from a clean schema.org/Recipe JSON-LD parse.
Works on Epicurious, Tasty, BBC Good Food, NYT Cooking, Bon Appétit, Serious Eats, King Arthur Baking, Food Network, Smitten Kitchen, Budget Bytes, and thousands of food blogs that use the standard recipe schema (most WordPress recipe plugins emit it).
AllRecipes is not supported. It is fronted by Akamai Bot Manager, which blocks both datacenter and residential IPs. AllRecipes URLs are rejected upfront with a typed url_failed (reason: "unsupported_site") record. Use any of the supported sites above instead.
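An upfront host check could look roughly like the following minimal sketch. The set UNSUPPORTED_HOSTS, the function precheck, and the exact record shape (recordType "url_failed" plus url and reason) are illustrative assumptions. Only the reason value "unsupported_site" comes from this page.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical blocklist; AllRecipes is the only site this page marks unsupported.
UNSUPPORTED_HOSTS = {"allrecipes.com"}


def precheck(url: str) -> Optional[dict]:
    """Return a url_failed record for an unsupported site, or None to proceed."""
    host = (urlparse(url).hostname or "").lower()
    for blocked in UNSUPPORTED_HOSTS:
        # Match the bare domain and any subdomain such as www.allrecipes.com.
        if host == blocked or host.endswith("." + blocked):
            return {"recordType": "url_failed", "url": url, "reason": "unsupported_site"}
    return None
```

Rejecting before any network request means the run does not spend time or proxy traffic on a site that is known to block.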
What you get
Recipe records (recordType=recipe)
| Field | Description |
|---|---|
| recordType | Always recipe for these records |
| url | Canonical recipe URL |
| id | Same as url |
| platform | Site slug (epicurious, tasty, bbcgoodfood, nytcooking, bonappetit, seriouseats, …) |
| name | Recipe title |
| description | Short blurb (HTML stripped) |
| image | Hero image URL |
| author | {name, url}; url only when the site provides one |
| ratingValue | Average rating (0-5 scale) |
| ratingCount | Number of ratings |
| reviewCount | Number of written reviews |
| prepTimeMinutes | Prep time in minutes |
| cookTimeMinutes | Cook time in minutes |
| totalTimeMinutes | Total time in minutes (uses totalTime if present, else prep + cook) |
| recipeYield | Servings / pieces (e.g. 8 slices) |
| recipeCategory | Array (e.g. ["Dessert"]) |
| recipeCuisine | Array (e.g. ["American"]) |
| keywords | Array of free-form keyword tags |
| recipeIngredient | Array of ingredient strings |
| recipeInstructions | Array of step strings |
| nutrition | {calories, proteinContent, carbohydrateContent, fatContent, sodiumContent, …} |
| video | {url, name, thumbnailUrl} if present |
| datePublished | ISO publish date |
| dateModified | ISO last-modified date |
| scrapedAt | ISO 8601 UTC timestamp |
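The time fields come from schema.org prepTime, cookTime and totalTime, which are ISO 8601 durations such as PT1H15M. The sketch below assumes that format and shows one way to derive the minute values and the totalTimeMinutes fallback described in the table. duration_minutes and total_minutes are hypothetical helpers, not the actor's code. Treating a single missing part as zero, and ignoring week/month/year components, are also assumptions.

```python
import re
from typing import Any, Optional

# ISO 8601 durations as used by schema.org, e.g. "PT15M", "PT1H15M", "P0DT1H".
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+(?:\.\d+)?)D)?"
    r"(?:T(?:(?P<hours>\d+(?:\.\d+)?)H)?(?:(?P<minutes>\d+(?:\.\d+)?)M)?(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def duration_minutes(value: Any) -> Optional[int]:
    """Convert an ISO 8601 duration to whole minutes; None if missing or unparseable."""
    if not isinstance(value, str) or not value:
        return None
    match = _DURATION.match(value.strip().upper())
    if not match or not any(match.groupdict().values()):
        return None  # rejects bare "P" / "PT" as well as non-durations
    parts = {k: float(v) if v else 0.0 for k, v in match.groupdict().items()}
    total = parts["days"] * 1440 + parts["hours"] * 60 + parts["minutes"] + parts["seconds"] / 60
    return round(total)


def total_minutes(recipe: dict) -> Optional[int]:
    """Use totalTime if present, else prep + cook (a missing part counts as 0)."""
    total = duration_minutes(recipe.get("totalTime"))
    if total is not None:
        return total
    prep = duration_minutes(recipe.get("prepTime"))
    cook = duration_minutes(recipe.get("cookTime"))
    if prep is None and cook is None:
        return None
    return (prep or 0) + (cook or 0)
```

With prepTime PT15M and cookTime PT1H and no totalTime, this yields 75, matching the totals in the example output.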
Empty fields are dropped from every record at every depth.
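One way to implement "drop empty fields at every depth" is a recursive prune. This sketch assumes that null, empty strings, empty arrays and empty objects count as empty, while 0 and false are kept. The page does not say how the actor treats zeros, and prune is a hypothetical name.

```python
from typing import Any


def _is_empty(value: Any) -> bool:
    """None, "", [] and {} count as empty; 0 and False do not (assumption)."""
    return value is None or (isinstance(value, (str, list, dict)) and len(value) == 0)


def prune(value: Any) -> Any:
    """Recursively drop empty values from dicts and lists, at every depth."""
    if isinstance(value, dict):
        pruned = {key: prune(item) for key, item in value.items()}
        return {key: item for key, item in pruned.items() if not _is_empty(item)}
    if isinstance(value, list):
        pruned_items = [prune(item) for item in value]
        return [item for item in pruned_items if not _is_empty(item)]
    return value
```

Children are pruned before the parent is checked, so an object such as video that only contained empty fields disappears entirely.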
Input
| Parameter | Type | Default | Description |
|---|---|---|---|
| mode | Enum | byUrls | byUrls / byTag / bySitemap |
| recipeUrls | Array | ["https://www.epicurious.com/recipes/food/views/banana-bread"] | Recipe URLs (mode=byUrls) |
| tagUrls | Array | — | Epicurious tag URLs or slugs (mode=byTag), e.g. dessert, /ingredient/banana, or a full category URL |
| sitemapUrls | Array | — | Sitemap.xml URLs from any recipe site (mode=bySitemap) |
| minRating | Integer | — | Drop recipes with ratingValue below this. Scale 0-500, i.e. ratingValue × 100 (e.g. 400 = 4.0/5) |
| minRatingCount | Integer | — | Drop recipes with fewer ratings than this |
| maxTotalTimeMinutes | Integer | — | Drop recipes whose total time exceeds this |
| keywordIncludes | String | — | Drop recipes whose combined name and description text doesn't contain this keyword |
| proxy | Object | RESIDENTIAL | Optional; supported sites work without a proxy too |
| maxItems | Integer | 50 | Hard cap on emitted records (1-1000) |
Example input — by tag (Epicurious)
{"mode": "byTag","tagUrls": ["main-course", "/ingredient/banana"],"maxItems": 30}
Example input — by sitemap (BBC Good Food)
{"mode": "bySitemap","sitemapUrls": ["https://www.bbcgoodfood.com/sitemaps/2026-Q2-recipe.xml"],"maxItems": 50}
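In bySitemap mode, a sitemap can be either a urlset listing pages or a sitemapindex pointing at further sitemaps. The following is a minimal parsing sketch using only the standard library. It assumes the standard sitemaps.org namespace, and parse_sitemap is a hypothetical helper. For untrusted XML, a hardened parser such as defusedxml is the safer choice.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(xml_text: str) -> Tuple[List[str], List[str]]:
    """Return (page_urls, child_sitemap_urls).

    A <urlset> lists pages; a <sitemapindex> lists further sitemaps to fetch.
    """
    root = ET.fromstring(xml_text)
    locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text and el.text.strip()]
    if root.tag == SITEMAP_NS + "sitemapindex":
        return [], locs
    return locs, []
```

Page URLs that turn out not to be recipes would still be caught later, since pages without Recipe JSON-LD yield a no_recipe_jsonld failure.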
Example input — single Epicurious recipe (no proxy needed)
{"recipeUrls": ["https://www.epicurious.com/recipes/food/views/banana-bread"]}
Example input — bulk recipe URLs across multiple sites
{"recipeUrls": ["https://tasty.co/recipe/banana-bread","https://www.bbcgoodfood.com/recipes/banana-bread","https://cooking.nytimes.com/recipes/12166-banana-bread","https://www.epicurious.com/recipes/food/views/banana-bread"],"minRating": 400,"maxTotalTimeMinutes": 90}
Example input — keyword filter
{"recipeUrls": ["https://tasty.co/recipe/banana-bread"],"keywordIncludes": "banana","minRatingCount": 100}
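The four record filters could be applied to each recipe record roughly as follows. passes_filters is hypothetical. The 0-500 minRating scale comes from the input table; case-insensitive keyword matching and the handling of missing ratings or times are assumptions.

```python
from typing import Optional


def passes_filters(
    recipe: dict,
    min_rating: Optional[int] = None,
    min_rating_count: Optional[int] = None,
    max_total_time_minutes: Optional[int] = None,
    keyword_includes: Optional[str] = None,
) -> bool:
    """Apply the input filters to one recipe record (sketch; not the actor's code)."""
    if min_rating is not None:
        rating = recipe.get("ratingValue")
        # minRating uses a 0-500 scale, so 400 means ratingValue >= 4.0.
        # Assumption: recipes with no rating fail this filter.
        if rating is None or rating < min_rating / 100:
            return False
    if min_rating_count is not None and recipe.get("ratingCount", 0) < min_rating_count:
        return False
    if max_total_time_minutes is not None:
        total = recipe.get("totalTimeMinutes")
        # Assumption: recipes with unknown total time are kept.
        if total is not None and total > max_total_time_minutes:
            return False
    if keyword_includes:
        # Assumption: case-insensitive substring match on name + description.
        text = f"{recipe.get('name', '')} {recipe.get('description', '')}".lower()
        if keyword_includes.lower() not in text:
            return False
    return True
```

Dividing minRating by 100 rather than multiplying ratingValue by 100 avoids float artifacts such as 4.1 × 100 evaluating to slightly less than 410.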
Example output
{"recordType": "recipe","url": "https://www.epicurious.com/recipes/food/views/banana-bread","platform": "epicurious","name": "Banana Bread With Variations","description": "Use this versatile banana bread recipe as a base for fun mix-ins...","image": "https://assets.epicurious.com/photos/.../banana-bread.jpg","author": { "name": "Epicurious Editors" },"ratingValue": 4.5,"ratingCount": 423,"prepTimeMinutes": 15,"cookTimeMinutes": 60,"totalTimeMinutes": 75,"recipeYield": "1 loaf","recipeCategory": ["dessert"],"recipeCuisine": ["American"],"keywords": ["banana", "bread", "baking"],"recipeIngredient": ["1¾ cups all-purpose flour","3 large overripe bananas, mashed","..."],"recipeInstructions": ["Preheat oven to 350°F.","In a bowl, mash the bananas...","..."],"nutrition": {"calories": "320","carbohydrateContent": "55g","proteinContent": "5g"},"datePublished": "2009-03-30","scrapedAt": "2026-05-06T10:42:18Z"}
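Failed URLs arrive as url_failed records alongside recipe records, so a consumer usually needs to split them. The following small sketch assumes every item carries a recordType field ("recipe" or "url_failed") and that failures carry a reason. split_records is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def split_records(items: Iterable[dict]) -> Tuple[List[dict], Counter]:
    """Separate recipe records from failures and tally failures by reason."""
    recipes: List[dict] = []
    failures: Counter = Counter()
    for item in items:
        kind = item.get("recordType")
        if kind == "recipe":
            recipes.append(item)
        elif kind == "url_failed":
            failures[item.get("reason", "unknown")] += 1
    return recipes, failures
```

A tally such as {"no_recipe_jsonld": 12} makes it easy to spot a source list that mostly points at pages without recipe markup.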
Use cases
- Recipe SEO / content audits — Pull schema.org Recipe data across competitor sites for content gap analysis.
- Meal-planning apps — Build recipe libraries by ingesting curated URL lists.
- Nutrition trackers — Standardize ingredient + nutrition data across sources.
- AI training data — Construct labelled recipe datasets for ML pipelines (cooking instruction generation, ingredient extraction).
- Aggregator backends — Power recipe-bookmarking apps that index user-submitted URLs.
FAQ
Why isn't AllRecipes supported?
AllRecipes is fronted by Akamai Bot Manager, which aggressively blocks both datacenter and residential IP ranges, and every known bypass currently fails. AllRecipes URLs are therefore rejected upfront with a typed url_failed (reason: "unsupported_site") record, so users get clear feedback instead of silent failures or fake data.
Which sites are confirmed to work?
- ✅ Epicurious — works without proxy
- ✅ Tasty (BuzzFeed) — works without proxy
- ✅ BBC Good Food — works without proxy
- ✅ NYT Cooking — works without proxy (public recipes work; paywalled content is blocked)
- ✅ Bon Appétit, Serious Eats, Food Network, King Arthur Baking — work; residential proxy is the safe default
- ✅ Most WordPress food blogs — work without proxy (Smitten Kitchen, Budget Bytes, Pinch of Yum, Minimalist Baker, etc.)
- ❌ AllRecipes — not supported (rejected upfront)
What if a URL has no schema.org/Recipe JSON-LD?
The actor emits a url_failed record with reason: "no_recipe_jsonld". Most modern recipe sites embed it; sites that don't (older blogs, sites with custom layouts) won't yield data.
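The following sketch detects JSON-LD recipes using only the Python standard library. It assumes the Recipe node may sit at the top level, inside a JSON array, or inside an @graph, and that @type may be a string or a list. find_recipe and extract are hypothetical names, and the url_failed record shape is assumed.

```python
import json
from html.parser import HTMLParser
from typing import Any, Iterator, List, Optional


class _JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[str] = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data  # data may arrive in several chunks


def _walk(node: Any) -> Iterator[dict]:
    """Yield every object in a JSON-LD tree (covers top-level arrays and @graph)."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from _walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from _walk(item)


def _is_recipe(node: dict) -> bool:
    types = node.get("@type")
    return "Recipe" in (types if isinstance(types, list) else [types])


def find_recipe(html: str) -> Optional[dict]:
    collector = _JsonLdCollector()
    collector.feed(html)
    collector.close()
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # one malformed block should not hide a valid one
        for node in _walk(data):
            if _is_recipe(node):
                return node
    return None


def extract(url: str, html: str) -> dict:
    recipe = find_recipe(html)
    if recipe is None:
        return {"recordType": "url_failed", "url": url, "reason": "no_recipe_jsonld"}
    return recipe
```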
Do I get instructions as a single block or step-by-step?
Step-by-step. The actor parses HowToStep and HowToSection schema types into a flat array of step strings. HTML formatting is stripped.
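The following sketch shows one way to do that flattening. It assumes recipeInstructions may be a plain string, a list of strings, HowToStep objects with a text field, or HowToSection objects whose itemListElement holds further steps. Dropping section names to keep the array flat and splitting plain strings into one step per line are assumptions, not documented behavior.

```python
import html
import re
from typing import Any, List

_TAG = re.compile(r"<[^>]+>")
_SPACES = re.compile(r"\s+")
_SPACE_BEFORE_PUNCT = re.compile(r"\s+([.,;:!?])")


def _clean(text: str) -> str:
    """Strip HTML tags, decode entities such as &deg;, and tidy whitespace."""
    text = html.unescape(_TAG.sub(" ", text))
    text = _SPACES.sub(" ", text).strip()
    return _SPACE_BEFORE_PUNCT.sub(r"\1", text)


def flatten_instructions(node: Any) -> List[str]:
    """Flatten recipeInstructions into a list of plain-text steps."""
    if isinstance(node, str):
        # Assumption: a plain string holds one step per non-empty line.
        return [step for step in (_clean(line) for line in node.splitlines()) if step]
    if isinstance(node, list):
        steps: List[str] = []
        for item in node:
            steps.extend(flatten_instructions(item))
        return steps
    if isinstance(node, dict):
        if "itemListElement" in node:  # HowToSection: recurse into its steps
            return flatten_instructions(node["itemListElement"])
        step = _clean(str(node.get("text") or node.get("name") or ""))  # HowToStep
        return [step] if step else []
    return []
```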
What if the page returns blocked / 403?
The actor retries with exponential backoff. URLs that still return 403 after the retries emit a url_failed record with reason: "anti_bot_block".
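The following is a minimal retry loop with exponential backoff and jitter. The injected fetch callable, the attempt count, the delays, retrying 429 and 5xx responses, and the fallback reason http_<status> are all hypothetical. Only the anti_bot_block reason for 403 comes from this page.

```python
import random
import time
from typing import Callable, Optional, Tuple

# Hypothetical transport: fetch(url) returns (status_code, body).
Fetch = Callable[[str], Tuple[int, str]]


def fetch_with_backoff(
    url: str,
    fetch: Fetch,
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Optional[str], Optional[dict]]:
    """Return (html, None) on success or (None, url_failed record) on failure."""
    status = 0
    for attempt in range(max_attempts):
        status, body = fetch(url)
        if status == 200:
            return body, None
        retryable = status in (403, 429) or status >= 500
        if not retryable:
            break  # e.g. 404: retrying will not help
        if attempt < max_attempts - 1:
            # Exponential backoff with jitter: roughly 1s, 2s, 4s, ... scaled by 0.5-1.5x.
            sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
    reason = "anti_bot_block" if status == 403 else f"http_{status}"
    return None, {"recordType": "url_failed", "url": url, "reason": reason}
```

Injecting fetch and sleep keeps the retry policy independent of the HTTP client and lets it be exercised without real delays.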
How current is the data?
Live: every run hits the recipe site at request time. Schedule the actor for daily / weekly refreshes to track rating drift on a recipe portfolio.
Limitations
- The actor reads schema.org/Recipe JSON-LD only. Sites that don't embed it (or use a custom microformat) won't yield data.
- AllRecipes is not supported; its URLs are rejected upfront with a url_failed (reason: "unsupported_site") record.
- Some sites strip nutrition data or are inconsistent about prepTime/cookTime; missing fields are simply omitted.
- Per-comment / per-review data isn't included (only the aggregate ratingCount / reviewCount).
- Video transcripts and step-by-step photos are not captured.