Food.com Recipe Scraper

Pricing

Pay per event

Scrape recipes from Food.com, one of the largest English-language community recipe databases, with 500K+ recipes. The actor either enumerates the full sitemap or takes a list of specific URLs, and extracts ingredients, instructions, nutrition, ratings, reviews, and the tag taxonomy.

Developer

BowTiedRaccoon

Maintained by Community

Scrape recipes from Food.com, one of the largest English-language community recipe databases, with over 500,000 recipes, including ratings, reviews, and a rich tag taxonomy.

What it does

This actor enumerates Food.com's full sitemap (or accepts a direct list of recipe URLs) and extracts structured recipe data from each page. All core fields come from the embedded schema.org/Recipe JSON-LD block, supplemented with DOM extraction for Food.com-specific data (tag taxonomy, rating details, image gallery).
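As a rough illustration of the JSON-LD step, the sketch below pulls a schema.org/Recipe object out of a page's HTML using only the Python standard library. The names `JsonLdCollector` and `find_recipe_jsonld` and the sample HTML are made up for this example; the actor's real parsing code is not shown on this page and may work differently.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._chunks))
            self._in_jsonld = False


def find_recipe_jsonld(html):
    """Return the first JSON-LD object whose @type includes Recipe, or None."""
    collector = JsonLdCollector()
    collector.feed(html)
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than failing the whole page
        candidates = data if isinstance(data, list) else [data]
        for item in candidates:
            if not isinstance(item, dict):
                continue
            types = item.get("@type")
            types = types if isinstance(types, list) else [types]
            if "Recipe" in types:
                return item
    return None


# Hypothetical page fragment, used only to exercise the function.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Recipe",
 "name": "Example Soup", "prepTime": "PT15M",
 "recipeIngredient": ["1 onion", "2 cups stock"]}
</script>
</head><body></body></html>
"""

recipe = find_recipe_jsonld(SAMPLE_HTML)
print(recipe["name"], recipe["prepTime"])
```

Fields that are not part of the JSON-LD block (the tag taxonomy and image gallery mentioned above) would need separate DOM selectors, which are not documented here.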

Use cases

  • Build recommender training datasets (ratings + review counts at 500K+ scale)
  • Meal-plan and recipe-app databases
  • Food trend analytics and NLP corpora
  • RAG pipelines for culinary applications
  • Competitive ingredient and nutritional analysis

Input

| Field | Type | Description |
| --- | --- | --- |
| maxItems | integer | Maximum number of recipes to scrape. Set to 0 for the full ~500K corpus. Default: 10 |
| recipeUrls | array | Optional list of specific Food.com recipe URLs to scrape. If provided, sitemap enumeration is skipped. |

Example: specific URLs

{
  "maxItems": 5,
  "recipeUrls": [
    { "url": "https://www.food.com/recipe/jo-mamas-world-famous-spaghetti-22782" },
    { "url": "https://www.food.com/recipe/easy-homemade-chicken-soup-157877" }
  ]
}

Example: sitemap enumeration (first 1000 recipes)

{
  "maxItems": 1000
}
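When the input is generated from code rather than typed by hand, a small builder can catch malformed URLs before a run starts. The sketch below is one possible way to do that. `build_actor_input` and `RECIPE_URL_RE` are hypothetical names, and the URL pattern is inferred from the two example URLs above, not from a documented Food.com rule.

```python
import json
import re

# Assumed URL shape, inferred from the examples above: a slug ending in "-<numeric id>".
RECIPE_URL_RE = re.compile(r"^https://www\.food\.com/recipe/[^/?#]*-\d+/?$")


def build_actor_input(max_items=10, recipe_urls=None):
    """Build an input dict using the documented maxItems / recipeUrls fields."""
    if not isinstance(max_items, int) or max_items < 0:
        raise ValueError("maxItems must be a non-negative integer (0 = full corpus)")
    payload = {"maxItems": max_items}
    if recipe_urls:
        bad = [u for u in recipe_urls if not RECIPE_URL_RE.match(u)]
        if bad:
            raise ValueError(f"not Food.com recipe URLs: {bad}")
        # Each entry is wrapped as {"url": ...}, matching the example input.
        payload["recipeUrls"] = [{"url": u} for u in recipe_urls]
    return payload


payload = build_actor_input(
    max_items=5,
    recipe_urls=["https://www.food.com/recipe/jo-mamas-world-famous-spaghetti-22782"],
)
print(json.dumps(payload, indent=2))
```

Omitting `recipe_urls` yields an input with only `maxItems`, which corresponds to the sitemap enumeration mode.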

Output

Each record in the dataset corresponds to one recipe:

| Field | Type | Description |
| --- | --- | --- |
| recipe_id | string | Unique numeric recipe ID from the URL |
| url | string | Canonical recipe URL |
| name | string | Recipe name |
| author | string | Recipe author username |
| description | string | Full recipe description |
| recipe_category | string | Primary category (e.g. Dessert, Main Dish) |
| recipe_cuisine | string | Cuisine type if specified (e.g. Italian) |
| prep_time | string | Preparation time in ISO 8601 format (e.g. PT15M) |
| cook_time | string | Cook time in ISO 8601 format |
| total_time | string | Total time in ISO 8601 format |
| recipe_yield | string | Servings (e.g. "4 serving(s)") |
| recipe_ingredient | array | Ingredients as formatted strings |
| recipe_instructions | array | Step-by-step instruction strings |
| nutrition | object | Nutritional data: calories, fat_content, saturated_fat, cholesterol, sodium, carbohydrate, fiber, sugar, protein |
| aggregate_rating | number | Average star rating (0-5 scale) |
| rating_count | integer | Total number of ratings |
| review_count | integer | Total number of written reviews |
| keywords | string | Comma-separated keywords (occasion, diet, method tags) |
| tags | array | Food.com topic taxonomy tags |
| image_urls | array | Recipe photo URLs |
| date_published | string | Publication date (ISO 8601) |
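Two of these fields often need a little handling downstream: the time fields are ISO 8601 duration strings, and `recipe_id` is derived from the URL. The helpers below are a minimal sketch of both conversions. `duration_to_minutes` and `recipe_id_from_url` are hypothetical names, and the duration parser assumes only the time-only `PT…H…M…S` form seen in the examples (no days or fractional values).

```python
import re
from urllib.parse import urlparse

# Time-only ISO 8601 durations such as PT15M or PT1H20M (an assumed subset).
_DURATION_RE = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def duration_to_minutes(value):
    """Convert a duration like "PT1H20M" to minutes, or None if unparseable."""
    match = _DURATION_RE.match(value or "")
    if not match or not any(match.groups()):
        return None
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return (hours * 3600 + minutes * 60 + seconds) / 60


def recipe_id_from_url(url):
    """Return the trailing numeric ID of a recipe URL as a string, or None."""
    path = urlparse(url).path.rstrip("/")
    match = re.search(r"-(\d+)$", path)
    return match.group(1) if match else None


print(duration_to_minutes("PT1H20M"))  # 80.0
print(recipe_id_from_url(
    "https://www.food.com/recipe/jo-mamas-world-famous-spaghetti-22782"
))  # 22782
```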

Sample output record

{
  "recipe_id": "22782",
  "url": "https://www.food.com/recipe/jo-mamas-world-famous-spaghetti-22782",
  "name": "Jo Mama's World Famous Spaghetti",
  "author": "Sharlene~W",
  "description": "My kids will give up a steak dinner for this spaghetti...",
  "recipe_category": "Spaghetti",
  "recipe_cuisine": null,
  "prep_time": "PT20M",
  "cook_time": "PT1H",
  "total_time": "PT1H20M",
  "recipe_yield": "4 quarts, 10-14 serving(s)",
  "recipe_ingredient": ["2 lbs Italian sausage, casings removed", "..."],
  "recipe_instructions": ["In large, heavy stockpot, brown Italian sausage...", "..."],
  "nutrition": {
    "calories": "555.9",
    "fat_content": "26.3",
    "protein": "29.8"
  },
  "aggregate_rating": 5.0,
  "rating_count": 1376,
  "review_count": 1376,
  "keywords": "Pork,Meat,European,Kid Friendly,Weeknight,Stove Top,< 4 Hours,Easy",
  "tags": ["Spaghetti"],
  "image_urls": ["https://img.sndimg.com/food/image/upload/..."],
  "date_published": "2002-03-17T10:26Z"
}
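In the sample record, nutrition values arrive as strings and `keywords` is a single comma-separated string, so analytics code will usually want to normalize both. The sketch below shows one way to do it. `normalize_record` is a hypothetical helper, and the record used is a trimmed copy of the sample above.

```python
def normalize_record(record):
    """Return a copy with keywords as a list and nutrition values as floats."""
    out = dict(record)
    keywords = record.get("keywords") or ""
    out["keywords"] = [k.strip() for k in keywords.split(",") if k.strip()]
    out["nutrition"] = {}
    for key, value in (record.get("nutrition") or {}).items():
        try:
            out["nutrition"][key] = float(value)
        except (TypeError, ValueError):
            out["nutrition"][key] = None  # keep the key, mark the value as unusable
    return out


# Trimmed copy of the sample record, for illustration only.
sample = {
    "recipe_id": "22782",
    "nutrition": {"calories": "555.9", "fat_content": "26.3", "protein": "29.8"},
    "keywords": "Pork,Meat,European,Kid Friendly,Weeknight,Stove Top,< 4 Hours,Easy",
}

clean = normalize_record(sample)
print(clean["keywords"])
print(clean["nutrition"])
```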

Crawl approach

  1. Sitemap enumeration: Fetches https://www.food.com/sitemap.xml (a 24-child sitemap index with gzip-compressed child files) and collects all /recipe/ URLs.
  2. Page scraping: Each recipe page is fetched and parsed via the embedded schema.org/Recipe JSON-LD block for structured data, plus DOM extraction for Food.com-specific taxonomy and image gallery.
  3. Rate limiting: Automatic rate-limit detection and backoff — no manual configuration needed.
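Step 1 can be sketched with the standard library alone: read the `<loc>` entries of the sitemap index, decompress each gzip child, and keep the URLs that contain `/recipe/`. The function names below are hypothetical, and `fetch` stands for any callable that returns the response body as bytes. This is an illustration of the described approach under those assumptions, not the actor's own code.

```python
import gzip
import xml.etree.ElementTree as ET

# Standard sitemap protocol namespace.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def locs(xml_bytes):
    """Return every <loc> value from a sitemap index or a URL set."""
    root = ET.fromstring(xml_bytes)
    return [el.text.strip() for el in root.iterfind(".//sm:loc", SITEMAP_NS) if el.text]


def recipe_urls_from_child(raw_bytes):
    """Decompress a child sitemap if it is gzipped, then keep /recipe/ URLs."""
    if raw_bytes[:2] == b"\x1f\x8b":  # gzip magic number
        raw_bytes = gzip.decompress(raw_bytes)
    return [u for u in locs(raw_bytes) if "/recipe/" in u]


def enumerate_recipe_urls(fetch, index_url="https://www.food.com/sitemap.xml"):
    """Yield recipe URLs; `fetch` is any callable mapping a URL to response bytes."""
    for child_url in locs(fetch(index_url)):
        yield from recipe_urls_from_child(fetch(child_url))
```

Checking the gzip magic bytes rather than the file extension keeps the helper working whether or not the HTTP client has already decompressed the response.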

Performance

  • Memory: 512 MB
  • No proxy required: Food.com is reachable from datacenter IPs
  • Concurrency: 10 parallel requests
  • Full corpus (~500K recipes): a complete run takes longer than the default 4-hour timeout, so raise the run timeout before starting one
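The concurrency figure above amounts to capping the number of requests in flight. A minimal way to express that cap in Python is an `asyncio.Semaphore`, as sketched below. `run_bounded` and `fake_fetch` are hypothetical names, the URLs are placeholders, and the actor may enforce its limit through a different mechanism.

```python
import asyncio


async def run_bounded(items, worker, limit=10):
    """Run `worker(item)` for every item with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item):
        async with semaphore:
            return await worker(item)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(guarded(item) for item in items))


async def _demo():
    in_flight = 0
    peak = 0

    async def fake_fetch(url):
        # Stand-in for an HTTP request; records how many calls overlap.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return url

    # Placeholder URLs, not real recipes.
    urls = [f"https://www.food.com/recipe/example-{i}" for i in range(25)]
    results = await run_bounded(urls, fake_fetch, limit=10)
    return results, peak


results, peak = asyncio.run(_demo())
print(len(results), "fetched; peak concurrency", peak)
```

This version creates one task per URL up front, which is fine for a few thousand items; a full-corpus crawl would more likely feed a fixed pool of workers from a queue.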