URL to Menu: Restaurant Menu Scraper

Pricing

From $50.00 per 1,000 URLs with a successful menu extraction

AI-powered restaurant menu scraper. Give any restaurant URL and receive structured JSON output instantly. Handles HTML, PDF, and image menus with no setup. Perfect for food delivery apps, aggregators, nutrition tools, and data pipelines. Contact lee.salesmap@gmail.com for support and pricing.

Rating: 0.0 (0)

Developer: Salesmap Lee (maintained by Community)

Actor stats

  • Bookmarked: 1
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 2 days ago

Extracts fully structured menu data from restaurant websites — sections, dishes, prices, and dietary tags — using AI-powered document parsing. Results are pushed to the Apify dataset as clean JSON, or served synchronously via a REST API in standby mode.

What does URL to Menu: Restaurant Menu Scraper do?

Given one or more restaurant URLs, the actor:

  1. Crawls the site to find menu pages, linked PDFs, and menu images.
  2. Filters out non-menu content (vacancy pages, gallery images, allergen info) using AI.
  3. Extracts clean text from HTML pages, PDFs, and images using the AI engine.
  4. Structures all extracted text into a canonical menu JSON using AI.
  5. Pushes the result to the Apify dataset (batch mode) or returns it in the HTTP response (standby mode).

Why use URL to Menu: Restaurant Menu Scraper?

  • Handles PDFs and images — AI engine extracts text from scanned menus, photo menus, and PDF files.
  • Two output modes — batch dataset for bulk collection, REST API for real-time integration.
  • Status codes on every record — every result includes a status_code (200/400/404/422/500) and a one-line status_message so you can immediately see which URLs succeeded and why others failed.
  • Follows external menu links — if a restaurant links its menu to a third-party ordering platform, the actor follows that link securely.
  • Security hardened — URL validation, SSRF protection, and LLM prompt injection defence built in.
  • Graceful degradation — if a file fails processing, a partial result is returned rather than crashing.

How to use URL to Menu: Restaurant Menu Scraper

Batch mode (default)

  1. Open the actor in Apify Console.
  2. Under Input, add one or more restaurant URLs to the Restaurant URLs list.
  3. Optionally adjust Max Crawl Depth (default 3) and Max URLs per Run (default 10).
  4. Click Start. Results appear in the Dataset tab when the run finishes.
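
The same batch run can also be started from code. The sketch below assumes the official `apify-client` Python package is installed. `ACTOR_ID` is a placeholder, because this page does not state the actor's ID, and `run_batch` and `summarise` are illustrative names. The input field names come from the Input section.

```python
from collections import Counter

ACTOR_ID = "<username>/<actor-name>"  # placeholder: the real actor ID is not given on this page


def run_batch(token, urls, max_depth=3):
    """Start a batch run and return its dataset records (needs `pip install apify-client`)."""
    from apify_client import ApifyClient  # imported lazily so summarise() works without it

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input={"urls": urls, "maxDepth": max_depth})
    if run is None:
        raise RuntimeError("the actor run did not finish")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


def summarise(records):
    """Count records per status_code so failed URLs are visible at a glance."""
    return dict(Counter(record["status_code"] for record in records))
```

Calling `summarise(run_batch(token, ["https://seapalace.nl"]))` would give a quick overview of how many URLs succeeded before any further processing.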

Standby mode (REST API)

  1. Open the actor in Apify Console.
  2. Enable Standby Mode (REST API) in the input.
  3. Start the actor. It will stay running and expose an HTTP endpoint.
  4. Send requests:
curl -X POST https://<container-url>/scrape \
-H "Content-Type: application/json" \
-d '{"url": "https://seapalace.nl"}'

The actor returns the parsed menu JSON synchronously.
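
For callers that prefer Python over curl, the request can be sketched with the standard library alone. The function names and the 120-second timeout are assumptions made for illustration, and `base_url` stands for the same `<container-url>` placeholder used above.

```python
import json
import urllib.request


def build_scrape_request(base_url, restaurant_url):
    """Build the POST /scrape request shown in the curl example."""
    body = json.dumps({"url": restaurant_url}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/scrape",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scrape(base_url, restaurant_url, timeout=120):
    """Send the request and decode the JSON body.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    request = build_scrape_request(base_url, restaurant_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending keeps the payload easy to inspect without touching the network.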

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| urls | string[] | | Restaurant URLs to scrape (batch mode only, required). Each URL must start with http:// or https://. |
| maxDepth | integer | 3 | Crawl depth from homepage (1–5) |
| maxUrls | integer | 10 | Max URLs per batch run (1–50) |
| standbyMode | boolean | false | Run as persistent REST API server |
| idleTimeoutHours | integer | 1 | Standby mode only. Auto-shutdown after this many hours with no requests. Set to 0 to disable. |
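
Checking input locally before starting a run avoids paying the start charge for a run that would be rejected. This is a minimal sketch of the documented constraints only; `build_run_input` is a hypothetical helper, not part of the actor.

```python
def build_run_input(urls, max_depth=3, max_urls=10):
    """Validate batch-mode input against the documented ranges and return the input dict."""
    if not urls:
        raise ValueError("urls is required in batch mode")
    for url in urls:
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"URL must start with http:// or https://: {url!r}")
    if not 1 <= max_depth <= 5:
        raise ValueError("maxDepth must be between 1 and 5")
    if not 1 <= max_urls <= 50:
        raise ValueError("maxUrls must be between 1 and 50")
    return {"urls": list(urls), "maxDepth": max_depth, "maxUrls": max_urls}
```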

Output

Dataset columns

| Column | Type | Description |
| --- | --- | --- |
| restaurant_name | string | Extracted restaurant name |
| url | string | Input URL |
| status_code | number | Result code: 200 success, 400 invalid URL, 404 no menu found, 422 extraction failed, 500 scrape error |
| status_message | string | One-line description of the result or failure reason |
| confidence | string or null | Extraction quality: "high" (≥10 items, ≥70% priced), "medium" (≥3 items), "low" (1–2 items), null (failed or empty) |
| section_count | number | Number of menu sections |
| item_count | number | Total number of menu items |
| sections | string (JSON) | Full menu tree as a JSON string; sections may include a description field for set-menu notes |
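
The confidence thresholds can be read as a small decision rule. The sketch below is one interpretation of the table, not the actor's actual code: it assumes an item counts as "priced" when its price is not null, and that a menu with 10 or more items but fewer than 70% priced falls back to "medium".

```python
def confidence(items):
    """Classify extraction quality from a flat list of item dicts."""
    count = len(items)
    if count == 0:
        return None  # failed or empty extraction
    priced = sum(1 for item in items if item.get("price") is not None)
    if count >= 10 and priced / count >= 0.7:
        return "high"
    if count >= 3:
        return "medium"
    return "low"
```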

Parsing the sections column

import json
import pandas as pd
# Load the exported dataset, then decode each JSON string into a list of section dicts.
df = pd.read_json('dataset.json')
df['sections_parsed'] = df['sections'].apply(json.loads)
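
For analysis it is often handier to have one row per dish rather than one nested tree per restaurant. The following sketch flattens a single dataset record using the field names from the example output; `flatten_sections` is an illustrative helper name.

```python
import json


def flatten_sections(record):
    """Turn one dataset record into one row per menu item."""
    rows = []
    for section in json.loads(record["sections"]):
        for item in section.get("items", []):
            rows.append({
                "restaurant_name": record["restaurant_name"],
                "section": section["name"],
                "item": item["name"],
                "price": item.get("price"),
                "dietary_tags": item.get("dietary_tags", []),
            })
    return rows
```

The resulting list of dicts can be passed straight to `pd.DataFrame(rows)`.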

Example output (success)

{
"restaurant_name": "Sea Palace",
"url": "https://seapalace.nl",
"status_code": 200,
"status_message": "OK — 38 item(s) across 4 section(s)",
"confidence": "high",
"section_count": 4,
"item_count": 38,
"sections": "[{\"name\": \"Set Menu\", \"description\": \"From 2 persons €51.50 per person\", \"items\": [{\"name\": \"Har Gow\", \"description\": \"Steamed shrimp dumpling\", \"price\": 5.5, \"dietary_tags\": []}]}]"
}

Example output (failure)

{
"restaurant_name": "",
"url": "https://example-restaurant.com",
"status_code": 404,
"status_message": "No menu content found — site may not have a public menu or it is hosted externally",
"confidence": null,
"section_count": 0,
"item_count": 0,
"sections": "[]"
}

You can download the dataset in various formats such as JSON, HTML, CSV, or Excel from the Apify Console or via the dataset API.

REST API reference

Endpoint: POST /scrape

Request:

{"url": "https://restaurant.com"}

Success response (200):

{
"restaurant_name": "Sea Palace",
"url": "https://seapalace.nl",
"section_count": 4,
"item_count": 38,
"sections": "[...]"
}

Error responses:

| Status | Meaning |
| --- | --- |
| 400 | Missing/invalid url, invalid scheme, private IP, or injection keyword detected in URL |
| 422 | Scraping succeeded but no menu sections could be parsed |
| 429 | Rate limit exceeded (max 10 requests/min per IP) |
| 500 | Unexpected internal error |
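
A client that loops over many URLs can avoid 429 responses by spacing its own requests. The sketch below is a simple client-side throttle: a 6-second gap corresponds to 10 requests per minute. How the server counts its window is not documented here, so treat the interval as a conservative assumption; the class name is illustrative.

```python
import time


class MinIntervalThrottle:
    """Space calls at least `interval` seconds apart."""

    def __init__(self, interval=6.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock  # injectable for testing
        self._sleep = sleep
        self._last = None    # start time of the previous call, if any

    def wait(self):
        """Block until the next request is allowed, then record its start time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` immediately before each POST /scrape keeps a single client under the documented limit.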

Readiness probe: GET / returns 200 OK with body "ready" — used by Apify Standby for lifecycle management.

Security

All URLs are validated before any scraping begins:

  • Only http:// and https:// schemes are accepted.
  • URLs resolving to private/reserved IP ranges (RFC 1918, loopback, link-local) are blocked to prevent SSRF attacks.
  • URLs containing LLM instruction keywords (ignore, jailbreak, bypass, etc.) in the domain or path are rejected.
  • URLs longer than 2,048 characters are rejected.
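
The four checks above could be combined roughly as follows. This is an illustrative sketch rather than the actor's implementation: the keyword tuple contains only the three keywords named above, the function names are hypothetical, and the DNS resolver is injectable so the logic can be exercised without network access.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

BLOCKED_KEYWORDS = ("ignore", "jailbreak", "bypass")  # only the keywords named in the text
MAX_URL_LENGTH = 2048


def resolve_host(host):
    """Return every IP address the host resolves to (performs a DNS lookup)."""
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def is_url_allowed(url, resolver=resolve_host):
    """Apply the documented checks and return (allowed, reason)."""
    if len(url) > MAX_URL_LENGTH:
        return False, "URL too long"
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False, "invalid scheme or host"
    target = (parts.hostname + parts.path).lower()
    if any(keyword in target for keyword in BLOCKED_KEYWORDS):
        return False, "injection keyword in URL"
    try:
        addresses = resolver(parts.hostname)
    except OSError:
        return False, "host does not resolve"
    for address in addresses:
        ip = ipaddress.ip_address(address.split("%")[0])  # drop any IPv6 zone suffix
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            return False, "private or reserved address"
    return True, "ok"
```

Plain substring matching on keywords is deliberately strict and will also reject legitimate URLs that happen to contain one of the words.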

Scraped content is sanitised before being sent to any AI model:

  • HTML comments (<!-- ... -->) are stripped (common injection vector).
  • Known injection phrases (ignore all instructions, you are now, act as, etc.) are detected and redacted.
  • All scraped text is wrapped in XML fences (<untrusted_content source="…">) to structurally separate data from instructions.
  • Content is truncated to 8,000 characters per source to limit blast radius.
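
A minimal sketch of such a sanitisation step is shown below. The phrase tuple holds only the examples listed above, the order of the steps and the `[REDACTED]` marker are assumptions, and `sanitise` is a hypothetical function name.

```python
import re
from xml.sax.saxutils import quoteattr

INJECTION_PHRASES = ("ignore all instructions", "you are now", "act as")  # partial list from the text
MAX_CHARS = 8000


def sanitise(text, source):
    """Strip HTML comments, redact known phrases, truncate, and fence the scraped text."""
    text = re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)
    for phrase in INJECTION_PHRASES:
        # Word boundaries avoid redacting inside longer words such as "exact as".
        pattern = r"\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, "[REDACTED]", text, flags=re.IGNORECASE)
    text = text[:MAX_CHARS]
    return f"<untrusted_content source={quoteattr(source)}>\n{text}\n</untrusted_content>"
```

A production version would also need to handle scraped text that itself contains a closing `</untrusted_content>` tag, which this sketch does not attempt.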

Environment variables

Set in menu-scraper/.env for local runs:

| Variable | Purpose |
| --- | --- |
| ANTHROPIC_API_KEY | Required for the AI model API (menu filtering and structured extraction) |
| AI_ENGINE_API_KEY | Required for AI engine processing of PDFs and images |
| APIFY_TOKEN | Required to call Apify APIs |
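
For local runs, a quick check that all three variables are present can save a failed start. This sketch only inspects the process environment; it does not load the .env file itself, and the helper name is illustrative.

```python
import os

REQUIRED_VARS = ("ANTHROPIC_API_KEY", "AI_ENGINE_API_KEY", "APIFY_TOKEN")


def missing_env_vars(environ=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]
```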

Pricing

This actor uses the Pay Per Event model:

| Event | Description |
| --- | --- |
| actor-start | Charged once when the run begins, regardless of how many URLs are processed |
| task-completed | Charged once per restaurant URL that returns a complete structured menu (status 200). One charge covers the full menu: all sections and items found for that URL. URLs that fail or return no menu data are not charged. |
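
As a rough worked example of the two events, the cost of a run can be estimated from its dataset records. The per-URL price below is derived from the listed "from $50.00 / 1,000" figure, so the real price may be higher, and the actor-start fee is not stated on this page, so it is left as a parameter defaulting to zero.

```python
from decimal import Decimal

PRICE_PER_SUCCESS = Decimal("0.05")  # $50.00 / 1,000; listed as a "from" price


def estimate_cost(records, start_fee=Decimal("0")):
    """Estimate one run's charge: the start fee plus one event per status-200 record."""
    successes = sum(1 for record in records if record["status_code"] == 200)
    return start_fee + successes * PRICE_PER_SUCCESS
```

A run with three successful URLs and one 404 would therefore be estimated at $0.15 plus the start fee.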

FAQ

Why did a URL return status 404? The actor could not find any menu content on the site. This can happen if the menu is hosted on a separate platform, requires JavaScript to render, or is behind a login. Check the status_message field for details.

Why did a URL return status 422? The actor found pages but could not extract structured menu items from them. The content may be fully image-based, in an unsupported format, or the AI engine could not identify menu items in the text. Check the status_message for which stage failed.

Can I scrape more than 50 URLs at once? The batch mode cap is 50 URLs per run. For larger sets, split the URLs into multiple runs, or call standby mode in a loop while staying within its rate limit of 10 requests/min per IP.

Is scraping legal? Always respect the restaurant's Terms of Service and robots.txt. This actor is designed for legitimate menu data collection. Do not use it to scrape sites that prohibit automated access.

How do I report an issue or request a feature? Open an issue in the actor's Issues tab on Apify Console, or contact us directly at lee.salesmap@gmail.com.