URL to Menu: Restaurant Menu Scraper
Pricing
From $50.00 per 1,000 URLs with a successful menu extraction
AI-powered restaurant menu scraper. Provide any restaurant URL and receive its menu as structured JSON. Handles HTML, PDF, and image menus with no setup. Perfect for food delivery apps, aggregators, nutrition tools, and data pipelines. Contact lee.salesmap@gmail.com for support and pricing.
Developer
Salesmap Lee
Maintained by Community
Extracts fully structured menu data from restaurant websites — sections, dishes, prices, and dietary tags — using AI-powered document parsing. Results are pushed to the Apify dataset as clean JSON, or served synchronously via a REST API in standby mode.
What does URL to Menu: Restaurant Menu Scraper do?
Given one or more restaurant URLs, the actor:
- Crawls the site to find menu pages, linked PDFs, and menu images.
- Filters out non-menu content (vacancy pages, gallery images, allergen info) using AI.
- Extracts clean text from HTML pages, PDFs, and images using an AI engine.
- Structures all extracted text into a canonical menu JSON using AI.
- Pushes the result to the Apify dataset (batch mode) or returns it in the HTTP response (standby mode).
Why use URL to Menu: Restaurant Menu Scraper?
- Handles PDFs and images: an AI engine extracts text from scanned menus, photo menus, and PDF files.
- Two output modes: a batch dataset for bulk collection and a REST API for real-time integration.
- Status codes on every record: every result includes a status_code (200/400/404/422/500) and a one-line status_message, so you can immediately see which URLs succeeded and why others failed.
- Follows external menu links: if a restaurant links its menu to a third-party ordering platform, the actor follows that link securely.
- Security hardened: URL validation, SSRF protection, and LLM prompt injection defence are built in.
- Graceful degradation: if a file fails processing, a partial result is returned rather than the run crashing.
How to use URL to Menu: Restaurant Menu Scraper
Batch mode (default)
- Open the actor in Apify Console.
- Under Input, add one or more restaurant URLs to the Restaurant URLs list.
- Optionally adjust Max Crawl Depth (default 3) and Max URLs per Run (default 10).
- Click Start. Results appear in the Dataset tab when the run finishes.
Standby mode (REST API)
- Open the actor in Apify Console.
- Enable Standby Mode (REST API) in the input.
- Start the actor. It will stay running and expose an HTTP endpoint.
- Send requests:
```shell
curl -X POST https://<container-url>/scrape \
  -H "Content-Type: application/json" \
  -d '{"url": "https://seapalace.nl"}'
```
The actor returns the parsed menu JSON synchronously.
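For scripted access, the same request can be sent from Python. This is a minimal sketch using only the standard library; `build_scrape_request` and `scrape` are hypothetical helper names, `<container-url>` stays a placeholder for your standby URL, the optional bearer token is an assumption for endpoints that require an Apify API token, and the 300-second timeout is an arbitrary choice because extraction can take a while.

```python
import json
import urllib.request
from typing import Optional


def build_scrape_request(
    base_url: str, restaurant_url: str, token: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST /scrape request for the standby endpoint."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: only needed if your standby URL requires an Apify API token.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"url": restaurant_url}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/scrape", data=body, headers=headers, method="POST"
    )


def scrape(
    base_url: str,
    restaurant_url: str,
    token: Optional[str] = None,
    timeout: float = 300.0,
) -> dict:
    """Send the request and return the decoded menu JSON."""
    request = build_scrape_request(base_url, restaurant_url, token)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses (e.g. 422, 429).
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage: `menu = scrape("https://<container-url>", "https://seapalace.nl")`.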
Input
| Field | Type | Default | Description |
|---|---|---|---|
| urls | string[] | none | Restaurant URLs to scrape (batch mode only, required). Each URL must start with http:// or https://. |
| maxDepth | integer | 3 | Crawl depth from the homepage (1–5). |
| maxUrls | integer | 10 | Maximum URLs per batch run (1–50). |
| standbyMode | boolean | false | Run as a persistent REST API server. |
| idleTimeoutHours | integer | 1 | Standby mode only. Auto-shutdown after this many hours with no requests. Set to 0 to disable. |
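Before starting a run, it can help to check the input against the documented ranges locally. A minimal sketch, assuming the field names and limits in the table above; `build_run_input` is a hypothetical helper, not part of the actor.

```python
from typing import Dict, Iterable


def build_run_input(
    urls: Iterable[str],
    max_depth: int = 3,
    max_urls: int = 10,
    standby_mode: bool = False,
    idle_timeout_hours: int = 1,
) -> Dict:
    """Return an actor input dict, raising ValueError on out-of-range values."""
    urls = list(urls)
    if not standby_mode and not urls:
        raise ValueError("urls is required in batch mode")
    for url in urls:
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"URL must start with http:// or https://: {url}")
    if not 1 <= max_depth <= 5:
        raise ValueError("maxDepth must be between 1 and 5")
    if not 1 <= max_urls <= 50:
        raise ValueError("maxUrls must be between 1 and 50")
    if idle_timeout_hours < 0:
        raise ValueError("idleTimeoutHours must be 0 (disabled) or positive")
    return {
        "urls": urls,
        "maxDepth": max_depth,
        "maxUrls": max_urls,
        "standbyMode": standby_mode,
        "idleTimeoutHours": idle_timeout_hours,
    }
```

The resulting dict can be pasted into the Input tab as JSON or passed as the run input when starting the actor programmatically.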
Output
Dataset columns
| Column | Type | Description |
|---|---|---|
| restaurant_name | string | Extracted restaurant name |
| url | string | Input URL |
| status_code | number | Result code: 200 success, 400 invalid URL, 404 no menu found, 422 extraction failed, 500 scrape error |
| status_message | string | One-line description of the result or failure reason |
| confidence | string or null | Extraction quality: "high" (≥10 items, ≥70% priced), "medium" (≥3 items), "low" (1–2 items), null (failed or empty) |
| section_count | number | Number of menu sections |
| item_count | number | Total number of menu items |
| sections | string (JSON) | Full menu tree as a JSON string; sections may include a description field for set-menu notes |
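The confidence thresholds can be read as a small function. This is a sketch of the rules in the table above, not the actor's published code; it assumes "priced" means an item whose price is not null, and that the percentage is taken over all items across all sections.

```python
from typing import Iterable, Mapping, Optional


def confidence_label(items: Iterable[Mapping]) -> Optional[str]:
    """Map extracted menu items to "high", "medium", "low", or None."""
    items = list(items)
    total = len(items)
    if total == 0:
        return None  # failed or empty extraction
    priced = sum(1 for item in items if item.get("price") is not None)
    if total >= 10 and priced / total >= 0.7:
        return "high"
    if total >= 3:
        return "medium"  # also covers 10+ items with under 70% priced
    return "low"  # 1-2 items
```

Note that a large menu with mostly missing prices falls back to "medium" rather than "high".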
Parsing the sections column
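```python
import json

import pandas as pd

df = pd.read_json("dataset.json")
# "sections" holds a JSON string; json.loads turns each one into a list of section dicts
df["sections_parsed"] = df["sections"].apply(json.loads)
```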
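To analyse individual dishes, the parsed tree can be flattened into one row per item. A minimal sketch using only the standard library, assuming the section and item keys shown in the example output (name, description, items, price, dietary_tags); `flatten_menu` is a hypothetical helper.

```python
import json
from typing import Dict, List


def flatten_menu(sections_json: str) -> List[Dict]:
    """Turn the sections JSON string into one dict per menu item."""
    rows = []
    for section in json.loads(sections_json):
        for item in section.get("items", []):
            rows.append({
                "section": section.get("name"),
                "item": item.get("name"),
                "description": item.get("description"),
                "price": item.get("price"),
                # Join tags so the value fits in a single CSV/Excel cell.
                "dietary_tags": ", ".join(item.get("dietary_tags") or []),
            })
    return rows


example = (
    '[{"name": "Set Menu", "description": "From 2 persons €51.50 per person", '
    '"items": [{"name": "Har Gow", "description": "Steamed shrimp dumpling", '
    '"price": 5.5, "dietary_tags": []}]}]'
)
print(flatten_menu(example))
```

With pandas loaded, `pd.DataFrame(flatten_menu(s))` gives a tidy item table.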
Example output (success)
```json
{
  "restaurant_name": "Sea Palace",
  "url": "https://seapalace.nl",
  "status_code": 200,
  "status_message": "OK — 38 item(s) across 4 section(s)",
  "confidence": "high",
  "section_count": 4,
  "item_count": 38,
  "sections": "[{\"name\": \"Set Menu\", \"description\": \"From 2 persons €51.50 per person\", \"items\": [{\"name\": \"Har Gow\", \"description\": \"Steamed shrimp dumpling\", \"price\": 5.5, \"dietary_tags\": []}]}]"
}
```
Example output (failure)
```json
{
  "restaurant_name": "",
  "url": "https://example-restaurant.com",
  "status_code": 404,
  "status_message": "No menu content found — site may not have a public menu or it is hosted externally",
  "confidence": null,
  "section_count": 0,
  "item_count": 0,
  "sections": "[]"
}
```
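When a batch finishes, the status_code and status_message columns make it easy to see which URLs need attention. A sketch that counts results per status code and lists the failures, assuming the records are already loaded as a list of dicts (for example with `json.load` on a JSON export of the dataset); `summarize_results` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarize_results(records: List[Dict]) -> Tuple[Counter, List[Tuple[str, int, str]]]:
    """Count records per status_code and list (url, code, message) for non-200 results."""
    counts = Counter(record["status_code"] for record in records)
    failures = [
        (record["url"], record["status_code"], record["status_message"])
        for record in records
        if record["status_code"] != 200
    ]
    return counts, failures
```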
You can download the dataset in various formats such as JSON, HTML, CSV, or Excel from the Apify Console or via the dataset API.
REST API reference
Endpoint: POST /scrape
Request:
```json
{"url": "https://restaurant.com"}
```
Success response (200):
```json
{
  "restaurant_name": "Sea Palace",
  "url": "https://seapalace.nl",
  "section_count": 4,
  "item_count": 38,
  "sections": "[...]"
}
```
Error responses:
| Status | Meaning |
|---|---|
| 400 | Missing/invalid url, invalid scheme, private IP, or injection keyword detected in URL |
| 422 | Scraping succeeded but no menu sections could be parsed |
| 429 | Rate limit exceeded — max 10 requests/min per IP |
| 500 | Unexpected internal error |
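Because 429 signals the per-IP rate limit, a client sending many URLs should back off and retry. A sketch, assuming a caller-supplied `send` function that returns `(status, body)`; `post_with_retry` is a hypothetical helper. Only 429 is retried by default, since 400 and 422 describe the input or the site and will not change on retry. The 6-second base delay is an assumption derived from 60 s / 10 requests.

```python
import time
from typing import Any, Callable, Collection, Tuple

Response = Tuple[int, Any]


def post_with_retry(
    send: Callable[[dict], Response],
    payload: dict,
    max_attempts: int = 4,
    base_delay: float = 6.0,
    retry_on: Collection[int] = (429,),
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(payload), retrying retryable statuses with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send(payload)
        if status not in retry_on or attempt == max_attempts - 1:
            return status, body
        # Wait 6 s, 12 s, 24 s, ... before the next attempt.
        sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

In practice `send` would wrap `urllib.request.urlopen` and catch `urllib.error.HTTPError`, returning `(error.code, error.read())` so that 4xx/5xx responses reach the retry logic.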
Readiness probe: GET / returns 200 OK with body "ready" — used by Apify Standby for lifecycle management.
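If you script the standby workflow yourself, you can poll the readiness probe before sending scrape requests. A sketch using the standard library; `is_ready` and `wait_until_ready` are hypothetical helpers, and the attempt count and interval are arbitrary defaults.

```python
import time
import urllib.request
from typing import Callable


def is_ready(base_url: str, timeout: float = 5.0) -> bool:
    """True if GET / answers 200 with the body "ready"."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/", timeout=timeout) as response:
            return response.status == 200 and response.read().decode().strip() == "ready"
    except OSError:  # covers URLError, HTTPError, refused connections, and timeouts
        return False


def wait_until_ready(
    check: Callable[[], bool],
    attempts: int = 30,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check() until it succeeds or attempts run out."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(interval)
    return False
```

Usage: `wait_until_ready(lambda: is_ready("https://<container-url>"))`.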
Security
All URLs are validated before any scraping begins:
- Only `http://` and `https://` schemes are accepted.
- URLs resolving to private/reserved IP ranges (RFC 1918, loopback, link-local) are blocked to prevent SSRF attacks.
- URLs containing LLM instruction keywords (`ignore`, `jailbreak`, `bypass`, etc.) in the domain or path are rejected.
- URLs longer than 2,048 characters are rejected.
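These checks can be approximated with the standard library. This is an illustrative sketch, not the actor's code: the keyword list contains only the three examples given here, `validate_url` and `default_resolve` are hypothetical names, and the resolver is injectable so the function can be tested without DNS.

```python
import ipaddress
import socket
from typing import Callable, List, Optional
from urllib.parse import urlsplit

MAX_URL_LENGTH = 2048
# Only the examples from this README; the actor's full list is not published.
BLOCKED_KEYWORDS = ("ignore", "jailbreak", "bypass")


def default_resolve(host: str) -> List[str]:
    """Return every address the host resolves to (IPv4 and IPv6)."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def validate_url(url: str, resolve: Callable[[str], List[str]] = default_resolve) -> Optional[str]:
    """Return None if the URL passes, otherwise a short rejection reason."""
    if len(url) > MAX_URL_LENGTH:
        return "URL too long"
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return "invalid scheme"
    host = parts.hostname
    if not host:
        return "missing host"
    if any(keyword in (host + parts.path).lower() for keyword in BLOCKED_KEYWORDS):
        return "injection keyword in URL"
    try:
        addresses = resolve(host)
    except OSError:  # socket.gaierror is a subclass of OSError
        return "host does not resolve"
    for address in addresses:
        ip = ipaddress.ip_address(address)
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            return f"blocked address {ip}"
    return None
```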
Scraped content is sanitised before being sent to any AI model:
- HTML comments (`<!-- ... -->`) are stripped, as they are a common injection vector.
- Known injection phrases (`ignore all instructions`, `you are now`, `act as`, etc.) are detected and redacted.
- All scraped text is wrapped in XML fences (`<untrusted_content source="…">`) to structurally separate data from instructions.
- Content is truncated to 8,000 characters per source to limit the blast radius.
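The sanitisation steps can be sketched as follows. This is not the actor's implementation: the `[REDACTED]` marker is an assumption, the phrase list holds only the examples given here, and the sketch additionally neutralises any closing fence tag found in scraped text so it cannot end the fence early.

```python
import re
from xml.sax.saxutils import quoteattr

MAX_CHARS_PER_SOURCE = 8000
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
# Only the example phrases from this README; a real list would be longer.
INJECTION_PHRASES = re.compile(
    r"\b(?:ignore all instructions|you are now|act as)\b", re.IGNORECASE
)
CLOSING_FENCE = re.compile(r"</untrusted_content", re.IGNORECASE)


def sanitise(text: str, source: str) -> str:
    """Strip comments, redact known phrases, truncate, and wrap in an XML fence."""
    text = HTML_COMMENT.sub("", text)
    text = INJECTION_PHRASES.sub("[REDACTED]", text)
    # Stop scraped text from closing the fence itself.
    text = CLOSING_FENCE.sub("&lt;/untrusted_content", text)
    text = text[:MAX_CHARS_PER_SOURCE]
    return f"<untrusted_content source={quoteattr(source)}>\n{text}\n</untrusted_content>"
```

Redaction runs before truncation, so a phrase cut off at the 8,000-character boundary is never left half-matched and unredacted.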
Environment variables
Set in menu-scraper/.env for local runs:
| Variable | Purpose |
|---|---|
| ANTHROPIC_API_KEY | Required for the AI model API (menu filtering and structured extraction) |
| AI_ENGINE_API_KEY | Required for AI engine processing of PDFs and images |
| APIFY_TOKEN | Required to call Apify APIs |
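For local runs, a fail-fast check avoids starting a crawl that only fails at the AI step. A minimal sketch with a deliberately tiny `.env` parser (KEY=VALUE lines, `#` comments, optional surrounding quotes); it is not a full dotenv implementation, and `load_env_file`, `missing_env_vars`, and `check_environment` are hypothetical names.

```python
import os
from typing import Dict, List, Mapping, Optional

REQUIRED_ENV_VARS = ("ANTHROPIC_API_KEY", "AI_ENGINE_API_KEY", "APIFY_TOKEN")


def load_env_file(path: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: Dict[str, str] = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_env_vars(env: Mapping[str, str]) -> List[str]:
    """Names from REQUIRED_ENV_VARS that are absent or empty."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


def check_environment(env_file: Optional[str] = "menu-scraper/.env") -> None:
    """Raise early if a required key is missing; real env vars override the file."""
    env: Dict[str, str] = {}
    if env_file and os.path.exists(env_file):
        env.update(load_env_file(env_file))
    env.update(os.environ)
    missing = missing_env_vars(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```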
Pricing
This actor uses the Pay Per Event model:
| Event | Description |
|---|---|
| actor-start | Charged once when the run begins, regardless of how many URLs are processed |
| task-completed | Charged once per restaurant URL that returns a complete structured menu (status 200). One charge covers the full menu: all sections and items found for that URL. URLs that fail or return no menu data are not charged. |
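Since only status-200 URLs incur the task-completed charge, the cost of a run can be estimated from the number of successful extractions. At the listed rate of $50.00 per 1,000 successes (the listing says "from", so your rate may differ), 80 successful URLs cost 80 × $50.00 / 1,000 = $4.00, plus the one-off actor-start charge, whose amount is not stated here. A sketch of the same arithmetic; `estimate_cost_usd` is a hypothetical helper and `actor_start_fee` must be supplied by you.

```python
def estimate_cost_usd(
    successful_urls: int, actor_start_fee: float, price_per_1000: float = 50.0
) -> float:
    """actor-start fee plus one task-completed charge per status-200 URL."""
    if successful_urls < 0:
        raise ValueError("successful_urls cannot be negative")
    return round(actor_start_fee + successful_urls * price_per_1000 / 1000, 2)
```

Pass the actor-start price that applies to your account as `actor_start_fee`.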
FAQ
Why did a URL return status 404?
The actor could not find any menu content on the site. This can happen if the menu is hosted on a separate platform, requires JavaScript to render, or is behind a login. Check the status_message field for details.
Why did a URL return status 422?
The actor found pages but could not extract structured menu items from them. The content may be fully image-based, in an unsupported format, or the AI engine could not identify menu items in the text. Check the status_message for which stage failed.
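Can I scrape more than 50 URLs at once?
The batch mode cap is 50 URLs per run. For larger sets, split the list into multiple runs, or use standby mode and send one request per URL in a loop, keeping within the standby rate limit of 10 requests per minute per IP.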
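Splitting a large list into runs of at most 50 URLs takes only a few lines. A sketch in Python; `batches` and `run_inputs` are hypothetical helpers, and setting maxUrls to the chunk size is an assumption made so that the default of 10 does not cap a 50-URL run.

```python
from typing import Dict, Iterator, List, Sequence

BATCH_LIMIT = 50  # documented cap on URLs per batch run


def batches(urls: Sequence[str], size: int = BATCH_LIMIT) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` URLs."""
    if not 1 <= size <= BATCH_LIMIT:
        raise ValueError(f"size must be between 1 and {BATCH_LIMIT}")
    for start in range(0, len(urls), size):
        yield list(urls[start:start + size])


def run_inputs(urls: Sequence[str]) -> List[Dict]:
    """One actor input per batch; maxUrls matches the batch so no URL is dropped."""
    return [{"urls": chunk, "maxUrls": len(chunk)} for chunk in batches(urls)]
```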
Is scraping legal?
Always respect the restaurant's Terms of Service and robots.txt. This actor is designed for legitimate menu data collection. Do not use it to scrape sites that prohibit automated access.
How do I report an issue or request a feature?
Open an issue in the actor's Issues tab on Apify Console, or contact us directly at lee.salesmap@gmail.com.