# URL to Menu: Restaurant Menu Scraper (`salesmap-ai/url-to-menu-restaurant-menu-scraper`) Actor

AI-powered restaurant menu scraper. Give any restaurant URL and receive structured JSON output instantly. Handles HTML, PDF, and image menus with no setup. Perfect for food delivery apps, aggregators, nutrition tools, and data pipelines. Contact lee.salesmap@gmail.com for support and pricing.

- **URL**: https://apify.com/salesmap-ai/url-to-menu-restaurant-menu-scraper.md
- **Developed by:** [Salesmap Lee](https://apify.com/salesmap-ai) (community)
- **Categories:** AI, Agents, Automation
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, 1 bookmarks
- **User rating**: No ratings yet

## Pricing

From $50.00 per 1,000 URLs with a successful menu extraction

This Actor is paid per event and usage. You are charged both the fixed price for specific events and for Apify platform usage.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) and as [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## URL to Menu: Restaurant Menu Scraper

Extracts fully structured menu data from restaurant websites — sections, dishes, prices, and dietary tags — using AI-powered document parsing. Results are pushed to the Apify dataset as clean JSON, or served synchronously via a REST API in standby mode.

### What does URL to Menu: Restaurant Menu Scraper do?

Given one or more restaurant URLs, the actor:
1. Crawls the site to find menu pages, linked PDFs, and menu images.
2. Filters out non-menu content (vacancy pages, gallery images, allergen info) using AI.
3. Extracts clean text from HTML pages, PDFs, and images using an AI engine.
4. Structures all extracted text into a canonical menu JSON using AI.
5. Pushes the result to the Apify dataset (batch mode) or returns it in the HTTP response (standby mode).

### Why use URL to Menu: Restaurant Menu Scraper?

- **Handles PDFs and images** — AI engine extracts text from scanned menus, photo menus, and PDF files.
- **Two output modes** — batch dataset for bulk collection, REST API for real-time integration.
- **Status codes on every record** — every result includes a `status_code` (200/400/404/422/500) and a one-line `status_message` so you can immediately see which URLs succeeded and why others failed.
- **Follows external menu links** — if a restaurant links its menu to a third-party ordering platform, the actor follows that link securely.
- **Security hardened** — URL validation, SSRF protection, and LLM prompt injection defence built in.
- **Graceful degradation** — if a file fails processing, a partial result is returned rather than crashing.

### How to use URL to Menu: Restaurant Menu Scraper

#### Batch mode (default)

1. Open the actor in Apify Console.
2. Under **Input**, add one or more restaurant URLs to the **Restaurant URLs** list.
3. Optionally adjust **Maximum Crawl Depth** (default 3) and **Max URLs per Run** (default 10).
4. Click **Start**. Results appear in the **Dataset** tab when the run finishes.

#### Standby mode (REST API)

1. Open the actor in Apify Console.
2. Enable **Standby Mode (REST API)** in the input.
3. Start the actor. It will stay running and expose an HTTP endpoint.
4. Send requests:

```bash
curl -X POST https://<container-url>/scrape \
  -H "Content-Type: application/json" \
  -d '{"url": "https://seapalace.nl"}'
````

The actor returns the parsed menu JSON synchronously.

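For Python clients, the curl request above can be reproduced with the standard library. This is a minimal sketch that assumes the `POST /scrape` endpoint and `{"url": ...}` body shown above; `build_scrape_request` and `scrape_menu` are hypothetical helper names, the 300-second timeout is an assumed value, and the optional bearer token applies only if your Standby endpoint requires an Apify API token.

```python
import json
import urllib.request
from typing import Optional


def build_scrape_request(container_url: str, restaurant_url: str,
                         token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST /scrape request for one restaurant URL."""
    body = json.dumps({"url": restaurant_url}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: only needed if your Standby endpoint requires an Apify API token.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        container_url.rstrip("/") + "/scrape",
        data=body,
        headers=headers,
        method="POST",
    )


def scrape_menu(container_url: str, restaurant_url: str,
                token: Optional[str] = None, timeout: float = 300.0) -> dict:
    """Send the request and return the parsed menu JSON.

    Error responses (400, 422, 429, 500) raise urllib.error.HTTPError.
    """
    request = build_scrape_request(container_url, restaurant_url, token)
    # Generous (assumed) timeout, because the Actor scrapes synchronously.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage: `scrape_menu("https://<container-url>", "https://seapalace.nl")` returns the same JSON that the curl call prints.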
### Input

| Field | Type | Default | Description |
|---|---|---|---|
| `urls` | string\[] | — | Restaurant URLs to scrape (batch mode only, required). Each URL must start with `http://` or `https://`. |
| `maxDepth` | integer | 3 | Crawl depth from homepage (1–5) |
| `maxUrls` | integer | 10 | Max URLs per batch run (1–50) |
| `standbyMode` | boolean | false | Run as persistent REST API server |
| `idleTimeoutHours` | integer | 1 | Standby mode only. Auto-shutdown after this many hours with no requests. Set to `0` to disable. |

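Invalid input only shows up after a run has started, so checking it on the client first can save a run. The sketch below mirrors the limits in the table above and in the input schema (`idleTimeoutHours` allows 0 to 720). `validate_input` is a hypothetical helper and is not part of the Actor.

```python
from typing import Any, Dict, List

# Inclusive limits copied from the input table and input schema.
LIMITS = {
    "maxDepth": (1, 5),
    "maxUrls": (1, 50),
    "idleTimeoutHours": (0, 720),
}


def validate_input(run_input: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    problems: List[str] = []
    standby = bool(run_input.get("standbyMode", False))
    urls = run_input.get("urls")
    if not standby:  # `urls` is only used in batch mode
        if not isinstance(urls, list) or not urls:
            problems.append("urls: required non-empty list in batch mode")
        else:
            for url in urls:
                if not isinstance(url, str) or not url.startswith(("http://", "https://")):
                    problems.append(f"urls: {url!r} must start with http:// or https://")
    for field, (low, high) in LIMITS.items():
        if field in run_input:
            value = run_input[field]
            # bool is a subclass of int, so reject it explicitly.
            if not isinstance(value, int) or isinstance(value, bool) or not low <= value <= high:
                problems.append(f"{field}: must be an integer in [{low}, {high}]")
    return problems
```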
### Output

#### Dataset columns

| Column | Type | Description |
|---|---|---|
| `restaurant_name` | string | Extracted restaurant name |
| `url` | string | Input URL |
| `status_code` | number | Result code: 200 success, 400 invalid URL, 404 no menu found, 422 extraction failed, 500 scrape error |
| `status_message` | string | One-line description of the result or failure reason |
| `confidence` | string \| null | Extraction quality: `"high"` (≥10 items, ≥70% priced), `"medium"` (≥3 items), `"low"` (1–2 items), `null` (failed or empty) |
| `section_count` | number | Number of menu sections |
| `item_count` | number | Total number of menu items |
| `sections` | string (JSON) | Full menu tree as a JSON string — sections may include a `description` field for set-menu notes |

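The `confidence` rules in the table can be restated as a small function, which helps when re-scoring records after your own post-processing. This sketch covers only the documented thresholds. It assumes the tiers are checked from highest to lowest and that "priced" means an item whose `price` is not null. `count_items` and `menu_confidence` are hypothetical names.

```python
import json
from typing import Optional, Tuple


def count_items(sections_json: str) -> Tuple[int, int]:
    """Return (total items, items with a non-null price) from a `sections` string."""
    sections = json.loads(sections_json or "[]")
    items = [item for section in sections for item in section.get("items", [])]
    priced = sum(1 for item in items if item.get("price") is not None)
    return len(items), priced


def menu_confidence(item_count: int, priced_count: int) -> Optional[str]:
    """Map item counts to the documented confidence tiers."""
    if item_count <= 0:
        return None  # failed or empty extraction
    # ">= 70% priced", checked with integers to avoid float rounding
    if item_count >= 10 and priced_count * 100 >= item_count * 70:
        return "high"
    if item_count >= 3:
        return "medium"
    return "low"  # 1 or 2 items
```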
#### Parsing the `sections` column

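```python
import json

import pandas as pd

# Load a dataset exported from Apify as a JSON array
df = pd.read_json('dataset.json')

# `sections` holds a JSON-encoded string; decode it into a list of section dicts
df['sections_parsed'] = df['sections'].apply(json.loads)
```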

#### Example output (success)

```json
{
  "restaurant_name": "Sea Palace",
  "url": "https://seapalace.nl",
  "status_code": 200,
  "status_message": "OK — 38 item(s) across 4 section(s)",
  "confidence": "high",
  "section_count": 4,
  "item_count": 38,
  "sections": "[{\"name\": \"Set Menu\", \"description\": \"From 2 persons €51.50 per person\", \"items\": [{\"name\": \"Har Gow\", \"description\": \"Steamed shrimp dumpling\", \"price\": 5.5, \"dietary_tags\": []}]}]"
}
```
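Because `sections` is a JSON string, each record has to be decoded once before the dishes can be analysed. The sketch below is one way to flatten a record into one row per dish. It assumes the section and item keys shown in the example above (`name`, `description`, `items`, `price`, `dietary_tags`), and `flatten_menu` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List


def flatten_menu(record: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn one dataset record into one row per menu item."""
    rows = []
    for section in json.loads(record.get("sections") or "[]"):
        for item in section.get("items", []):
            rows.append({
                "restaurant_name": record.get("restaurant_name", ""),
                "url": record.get("url", ""),
                "section": section.get("name", ""),
                "item": item.get("name", ""),
                "description": item.get("description", ""),
                "price": item.get("price"),  # None when no price was extracted
                "dietary_tags": item.get("dietary_tags", []),
            })
    return rows
```

You can pass the returned list directly to `pandas.DataFrame(rows)` for item-level analysis.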

#### Example output (failure)

```json
{
  "restaurant_name": "",
  "url": "https://example-restaurant.com",
  "status_code": 404,
  "status_message": "No menu content found — site may not have a public menu or it is hosted externally",
  "confidence": null,
  "section_count": 0,
  "item_count": 0,
  "sections": "[]"
}
```
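Every record carries `status_code` and `status_message`, so you can triage a whole run without opening each item. This is a minimal sketch that assumes the dataset has been exported as a JSON array and loaded into a list of dicts. `summarize_statuses` is a hypothetical helper.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def summarize_statuses(records: List[Dict[str, Any]]) -> Tuple[Counter, List[Tuple[Any, Any, Any]]]:
    """Count records per status_code and collect (url, code, message) for non-200 records."""
    counts = Counter(record.get("status_code") for record in records)
    failures = [
        (record.get("url"), record.get("status_code"), record.get("status_message"))
        for record in records
        if record.get("status_code") != 200
    ]
    return counts, failures
```

For example, load the export with `json.load` and print `dict(counts)` to see how many URLs fall into each status. Then check `failures` for the 404 and 422 cases described in the FAQ.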

You can download the dataset in various formats such as JSON, HTML, CSV, or Excel from the Apify Console or via the dataset API.

### REST API reference

**Endpoint:** `POST /scrape`

**Request:**

```json
{"url": "https://restaurant.com"}
```

**Success response (200):**

```json
{
  "restaurant_name": "Sea Palace",
  "url": "https://seapalace.nl",
  "section_count": 4,
  "item_count": 38,
  "sections": "[...]"
}
```

**Error responses:**

| Status | Meaning |
|---|---|
| 400 | Missing/invalid `url`, invalid scheme, private IP, or injection keyword detected in URL |
| 422 | Scraping succeeded but no menu sections could be parsed |
| 429 | Rate limit exceeded — max 10 requests/min per IP |
| 500 | Unexpected internal error |

**Readiness probe:** `GET /` returns `200 OK` with body `"ready"` — used by Apify Standby for lifecycle management.

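Standby mode accepts at most 10 requests per minute per IP, so a client sending many URLs should pace itself instead of relying on 429 responses. The sketch below shows one client-side option: a sliding-window limiter with an injectable clock. The class name is hypothetical, and the 60-second sliding window is an assumption, because the README does not say how the server counts its window.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any `period`-second window."""

    def __init__(self, max_calls: int = 10, period: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.calls: Deque[float] = deque()

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period - (now - self.calls[0])

    def acquire(self, sleep: Callable[[float], None] = time.sleep) -> None:
        """Block until a call is allowed, then record it."""
        while (delay := self.wait_time()) > 0:
            sleep(delay)
        self.calls.append(self.clock())
```

Call `limiter.acquire()` before each `POST /scrape`. A 429 can still occur if other clients share your IP, so treat it as retryable.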
### Security

All URLs are validated before any scraping begins:

- Only `http://` and `https://` schemes are accepted.
- URLs resolving to private/reserved IP ranges (RFC 1918, loopback, link-local) are blocked to prevent SSRF attacks.
- URLs containing LLM instruction keywords (`ignore`, `jailbreak`, `bypass`, etc.) in the domain or path are rejected.
- URLs longer than 2,048 characters are rejected.

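A client can run similar checks before sending a URL so that bad input fails fast. The sketch below uses only the standard library. Its keyword list contains just the three examples named above, because the full list is not published, and `precheck_url` is a hypothetical name. It does not replace the Actor's server-side validation, since DNS answers can change between a client-side check and the actual fetch.

```python
import ipaddress
import socket
from typing import Tuple
from urllib.parse import urlsplit

MAX_URL_LENGTH = 2048
# Only the example keywords named above; the Actor's full list is not published.
INJECTION_KEYWORDS = ("ignore", "jailbreak", "bypass")


def precheck_url(url: str) -> Tuple[bool, str]:
    """Return (ok, reason), approximating the Actor's documented URL checks."""
    if len(url) > MAX_URL_LENGTH:
        return False, "URL is longer than 2,048 characters"
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False, "only http:// and https:// URLs with a host are accepted"
    # hostname is already lower-cased by urlsplit; the path may not be.
    if any(word in (parts.hostname + parts.path).lower() for word in INJECTION_KEYWORDS):
        return False, "instruction keyword found in domain or path"
    try:
        infos = socket.getaddrinfo(parts.hostname, None)
    except socket.gaierror:
        return False, "host name does not resolve"
    for *_, sockaddr in infos:
        # Strip an IPv6 zone suffix such as "%eth0" before parsing.
        address = ipaddress.ip_address(sockaddr[0].split("%")[0])
        if (address.is_private or address.is_loopback
                or address.is_link_local or address.is_reserved):
            return False, f"host resolves to non-public address {address}"
    return True, "ok"
```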
Scraped content is sanitised before being sent to any AI model:

- HTML comments (`<!-- ... -->`) are stripped (common injection vector).
- Known injection phrases (`ignore all instructions`, `you are now`, `act as`, etc.) are detected and redacted.
- All scraped text is wrapped in XML fences (`<untrusted_content source="…">`) to structurally separate data from instructions.
- Content is truncated to 8,000 characters per source to limit blast radius.

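The four sanitisation steps can be approximated as shown below. This is a sketch under several assumptions: the phrase list contains only the three examples given above, each match is replaced with a hypothetical `[REDACTED]` marker, and truncation happens before wrapping so the closing tag is never cut off. It also adds one extra step that the README does not describe: escaping any `</untrusted_content` in the scraped text, so the text cannot close the fence early. `sanitise_for_llm` is a made-up name, not the Actor's own function.

```python
import html
import re

MAX_CHARS = 8000
# Only the example phrases named above; the Actor's full list is not published.
INJECTION_PHRASES = ("ignore all instructions", "you are now", "act as")

_COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
_PHRASE_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in INJECTION_PHRASES) + r")\b",
    re.IGNORECASE,
)
_CLOSE_TAG_RE = re.compile(r"</\s*untrusted_content", re.IGNORECASE)


def sanitise_for_llm(text: str, source: str) -> str:
    """Strip comments, redact phrases, truncate, and fence scraped text."""
    text = _COMMENT_RE.sub("", text)           # 1. drop HTML comments
    text = _PHRASE_RE.sub("[REDACTED]", text)  # 2. redact known injection phrases
    text = text[:MAX_CHARS]                    # 3. cap each source at 8,000 characters
    # Extra step (assumption): stop scraped text from closing the fence early.
    text = _CLOSE_TAG_RE.sub("&lt;/untrusted_content", text)
    safe_source = html.escape(source, quote=True)
    return f'<untrusted_content source="{safe_source}">\n{text}\n</untrusted_content>'
```

The word boundaries in `_PHRASE_RE` keep ordinary menu text such as "exact as listed" from being redacted as "act as".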
### Environment variables

Set in `menu-scraper/.env` for local runs:

| Variable | Purpose |
|---|---|
| `ANTHROPIC_API_KEY` | Required for AI model API (menu filtering and structured extraction) |
| `AI_ENGINE_API_KEY` | Required for AI engine processing of PDFs and images |
| `APIFY_TOKEN` | Required to call Apify APIs |

### Pricing

This actor uses the [Pay Per Event](https://docs.apify.com/platform/actors/publishing/monetize#pay-per-event-pricing-model) model:

| Event | Description |
|---|---|
| `actor-start` | Charged once when the run begins, regardless of how many URLs are processed |
| `task-completed` | Charged once per restaurant URL that returns a complete structured menu (status 200). One charge covers the full menu — all sections and items found for that URL. URLs that fail or return no menu data are not charged. |

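As a worked example, at the listed starting price of $50.00 per 1,000 successful URLs, each `task-completed` event costs $0.05. A run over 20 URLs in which 15 return status 200 therefore incurs 15 × $0.05 = $0.75 in `task-completed` charges, plus one `actor-start` charge. The `actor-start` price is not stated in this README, so the sketch takes it as a parameter. Apify platform usage is billed separately on top. `estimate_event_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Listed "from" price; your actual price may differ.
TASK_COMPLETED_PER_1000 = Decimal("50.00")


def estimate_event_cost(successful_urls: int, actor_start_fee: Decimal = Decimal("0")) -> Decimal:
    """Estimate pay-per-event charges for one run (excludes platform usage)."""
    per_url = TASK_COMPLETED_PER_1000 / 1000  # $0.05 per successful URL
    return actor_start_fee + per_url * successful_urls
```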
### FAQ

**Why did a URL return status 404?**
The actor could not find any menu content on the site. This can happen if the menu is hosted on a separate platform, requires JavaScript to render, or is behind a login. Check the `status_message` field for details.

**Why did a URL return status 422?**
The actor found pages but could not extract structured menu items from them. The content may be fully image-based, in an unsupported format, or the AI engine could not identify menu items in the text. Check the `status_message` for which stage failed.

**Can I scrape more than 50 URLs at once?**
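The batch mode cap is 50 URLs per run. For larger sets, split the list across multiple runs, or run the Actor in standby mode and send one request per URL while staying under its limit of 10 requests per minute per IP.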

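Splitting a large list takes only a few lines. The sketch below drops duplicate URLs (a choice made here, not something the Actor requires), chunks the rest into inputs of at most 50, and sets `maxUrls` to match each chunk. `build_batch_inputs` is a hypothetical helper. You could pass each returned dict to `client.actor(...).call(run_input=...)`, as in the Python example in the API section.

```python
from typing import Any, Dict, Iterable, List

BATCH_LIMIT = 50  # maxUrls upper bound from the input schema


def build_batch_inputs(urls: Iterable[str], batch_size: int = BATCH_LIMIT,
                       max_depth: int = 3) -> List[Dict[str, Any]]:
    """Split URLs into run inputs that each respect the 50-URL cap."""
    if not 1 <= batch_size <= BATCH_LIMIT:
        raise ValueError(f"batch_size must be between 1 and {BATCH_LIMIT}")
    unique = list(dict.fromkeys(urls))  # drop duplicates, keep first-seen order
    return [
        {"urls": chunk, "maxUrls": len(chunk), "maxDepth": max_depth}
        for chunk in (unique[i:i + batch_size] for i in range(0, len(unique), batch_size))
    ]
```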
**Is scraping legal?**
Always respect the restaurant's Terms of Service and `robots.txt`. This actor is designed for legitimate menu data collection. Do not use it to scrape sites that prohibit automated access.

**How do I report an issue or request a feature?**
Open an issue in the actor's Issues tab on Apify Console, or contact us directly at **lee.salesmap@gmail.com**.

# Actor input Schema

## `urls` (type: `array`):

List of restaurant website URLs to scrape for menu content. Each URL must start with http:// or https://. Ignored in standby mode.

## `maxDepth` (type: `integer`):

How many link levels to follow from the restaurant homepage (1 = menu landing pages only, 3 = recommended).

## `maxUrls` (type: `integer`):

Maximum number of restaurant URLs to process in a single batch run (1–50). Ignored in standby mode.

## `standbyMode` (type: `boolean`):

When enabled, the actor runs as a persistent HTTP server instead of a batch job. Send POST /scrape with {"url": "https://..."} to scrape a single restaurant. Rate limited to 10 req/min per IP.

## `idleTimeoutHours` (type: `integer`):

Standby mode only. The actor automatically shuts down after this many hours with no incoming requests, preventing unnecessary compute costs. Set to 0 to disable (actor runs until manually stopped).

## Actor input object example

```json
{
  "urls": [
    "https://loetje.nl/"
  ],
  "maxDepth": 3,
  "maxUrls": 10,
  "standbyMode": false,
  "idleTimeoutHours": 1
}
```

# Actor output Schema

## `results` (type: `string`):

No description

## `rawPages` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "urls": [
        "https://loetje.nl/"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("salesmap-ai/url-to-menu-restaurant-menu-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "urls": ["https://loetje.nl/"] }

# Run the Actor and wait for it to finish
run = client.actor("salesmap-ai/url-to-menu-restaurant-menu-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "urls": [
    "https://loetje.nl/"
  ]
}' |
apify call salesmap-ai/url-to-menu-restaurant-menu-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=salesmap-ai/url-to-menu-restaurant-menu-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "URL to Menu: Restaurant Menu Scraper",
        "description": "AI-powered restaurant menu scraper. Give any restaurant URL and receive structured JSON output instantly. Handles HTML, PDF, and image menus with no setup. Perfect for food delivery apps, aggregators, nutrition tools, and data pipelines. Contact lee.salesmap@gmail.com for support and pricing.",
        "version": "1.0",
        "x-build-id": "OGT6baZBhZPVsBitp"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/salesmap-ai~url-to-menu-restaurant-menu-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-salesmap-ai-url-to-menu-restaurant-menu-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/salesmap-ai~url-to-menu-restaurant-menu-scraper/runs": {
            "post": {
                "operationId": "runs-sync-salesmap-ai-url-to-menu-restaurant-menu-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/salesmap-ai~url-to-menu-restaurant-menu-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-salesmap-ai-url-to-menu-restaurant-menu-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "urls"
                ],
                "properties": {
                    "urls": {
                        "title": "Restaurant URLs",
                        "type": "array",
                        "description": "List of restaurant website URLs to scrape for menu content. Each URL must start with http:// or https://. Ignored in standby mode.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxDepth": {
                        "title": "Maximum Crawl Depth",
                        "minimum": 1,
                        "maximum": 5,
                        "type": "integer",
                        "description": "How many link levels to follow from the restaurant homepage (1 = menu landing pages only, 3 = recommended).",
                        "default": 3
                    },
                    "maxUrls": {
                        "title": "Max URLs per Run",
                        "minimum": 1,
                        "maximum": 50,
                        "type": "integer",
                        "description": "Maximum number of restaurant URLs to process in a single batch run (1–50). Ignored in standby mode.",
                        "default": 10
                    },
                    "standbyMode": {
                        "title": "Standby Mode (REST API)",
                        "type": "boolean",
                        "description": "When enabled, the actor runs as a persistent HTTP server instead of a batch job. Send POST /scrape with {\"url\": \"https://...\"} to scrape a single restaurant. Rate limited to 10 req/min per IP.",
                        "default": false
                    },
                    "idleTimeoutHours": {
                        "title": "Idle Timeout (hours)",
                        "minimum": 0,
                        "maximum": 720,
                        "type": "integer",
                        "description": "Standby mode only. The actor automatically shuts down after this many hours with no incoming requests, preventing unnecessary compute costs. Set to 0 to disable (actor runs until manually stopped).",
                        "default": 1
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
