# EU AI Act & Regulation Monitor (RAG-Optimized) (`aelix/eu-ai-act-regulation-monitor`) Actor

Monitors EUR-Lex for EU AI-related legislation and delivers clean, structured Markdown/JSON enriched with CELEX IDs, version hashes, token counts, and vector-DB chunk hints. Ideal for RAG pipelines, legal AI assistants, and compliance dashboards.

Premium RAG-Ready Feed: $150.00 per 1,000 results.

- **URL**: https://apify.com/aelix/eu-ai-act-regulation-monitor.md
- **Developed by:** [Aelix](https://apify.com/aelix) (community)
- **Categories:** AI, Developer tools, Other
- **Stats:** 2 total users, 1 monthly user, 0.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $150.00 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## EU AI Act & Regulation Monitor (RAG-Optimized)

Stop feeding your AI agents messy HTML and irrelevant search results. This premium Apify Actor is built specifically for **LegalTech developers**, **Compliance Officers**, and **AI Startups** who need clean, structured, and highly relevant legislative data from the European Union's EUR-Lex portal.

---

### 🚀 Why This Actor is Different

Most scrapers return a massive dump of unreadable HTML. This Actor functions as a **commercial-grade data pipeline**, processing raw legal text into a format immediately ready for Retrieval-Augmented Generation (RAG) and LLM context windows.

- **Strict AI Relevance Filtering** — It doesn't just search for keywords; it parses the full body text of every document. If a document doesn't explicitly mention *"Artificial Intelligence"* or *"AI Act"* in the actual content, it is dropped before saving. You only pay for high-signal data.
- **Pure Markdown Output** — All EU navbars, footers, cookie banners, and HTML tables are aggressively stripped, leaving only clean, dense Markdown prose ready for tokenisation.
- **Built-in Chunking Hints** — The output automatically identifies every `Article` boundary and records its exact character index, enabling seamless splitting for Vector Database ingestion with zero additional parsing.
- **Token Count Included** — Every document ships with an `estimatedTokens` field (GPT-4 / `cl100k_base` encoding) so you can bin-pack context windows without loading the full text first.
- **Version Tracking** — Each document includes a `versionHash` (SHA-256 of the Markdown body). Run the Actor daily and your pipeline instantly knows if legislation has changed — without spending tokens re-reading unchanged text.

---

### 💰 Pricing

This Actor uses Apify's **Pay-per-Result** model. You are only charged for documents that pass the AI relevance filter and are written to the dataset.

| Volume | Cost |
|---|---|
| 1–1,000 documents | **$150.00 per 1,000** ($0.15 each) |
| Documents skipped by relevance filter | **Free** |
| Duplicate CELEX IDs across search terms | **Free** (deduplicated automatically) |

> You will never be charged for noise. If EUR-Lex returns a driving-licence directive that happens to use the word "AI" in a footnote, this Actor drops it silently.

---

### ⚙️ Configuration

| Parameter | Default | Description |
|---|---|---|
| `searchTerms` | *(7 AI law terms)* | Array of EUR-Lex queries run in sequence. Duplicate documents are saved only once across all terms. |
| `searchText` | `"artificial intelligence"` | Single-term fallback. Used only when `searchTerms` is empty. |
| `searchIn` | `ti-te` | Scope of the search: `ti` (title only), `te` (full text), or `ti-te` (title and full text). |
| `maxPages` | `20` | Number of EUR-Lex result pages to crawl **per search term**. |
| `maxDocuments` | `300` | Hard cap on **total unique documents** saved across all terms. Controls maximum spend. |
| `excludeCorrigenda` | `true` | Filters out correction notices so you only get primary legislation. |
| `startUrl` | *(blank)* | Advanced: override the EUR-Lex start URL entirely. |

---

### 🛠️ Output Schema

The JSON output is designed to be piped directly into your AI workflow:

```json
{
  "url": "https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32026R0697",
  "celexId": "32026R0697",
  "title": "REGULATION (EU) 2026/697 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL",
  "documentType": "Regulation",
  "publicationDate": "20.3.2026",
  "estimatedTokens": 12462,
  "versionHash": "70e9c9a76ed4f6ae",
  "scrapedAt": "2026-05-17T10:22:31.000Z",
  "markdown": "REGULATION (EU) 2026/697...\n\n#### Article 1\n\nThis Regulation applies to...",
  "metadata": {
    "chunkHints": [
      { "type": "article", "title": "Article 1", "index": 17813 },
      { "type": "article", "title": "Article 2", "index": 19204 }
    ],
    "totalChunks": 43,
    "wordCount": 9821,
    "suggestedSplitStrategy": "Split at each Article boundary (chunkHints where type=\"article\"). Average article ≈ 300–600 tokens — well within text-embedding-3 context."
  }
}
````

#### Key Fields

| Field | Type | Description |
|---|---|---|
| `celexId` | `string` | EUR-Lex CELEX identifier — the canonical EU document ID. |
| `title` | `string` | Official document title extracted from page metadata. |
| `documentType` | `string` | `Regulation`, `Directive`, `Decision`, etc. — derived from CELEX ID. |
| `publicationDate` | `string` | Publication date as it appears in the Official Journal. |
| `markdown` | `string` | Full document body as clean Markdown. No HTML, no nav chrome. |
| `estimatedTokens` | `number` | Token count using `cl100k_base` (GPT-4 / text-embedding-3). |
| `versionHash` | `string` | First 16 hex chars of SHA-256(markdown). Changes if the law is amended. |
| `metadata.chunkHints` | `array` | Ordered list of Article/Chapter/Section split points with character indexes. |
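
The `metadata.chunkHints` offsets can be used to slice `markdown` directly. The following is a minimal sketch rather than the Actor's own code: it assumes each `index` is a 0-based character offset into the `markdown` string, and the helper name `split_by_chunk_hints` and the `"preamble"` label are hypothetical.

```python
from typing import Any


def split_by_chunk_hints(item: dict[str, Any]) -> list[dict[str, Any]]:
    """Slice item["markdown"] at the offsets listed in metadata.chunkHints.

    Assumption: each hint's "index" is a 0-based character offset into the
    markdown string. Text before the first hint is kept as a "preamble" chunk.
    """
    markdown = item["markdown"]
    hints = sorted(item["metadata"]["chunkHints"], key=lambda h: h["index"])
    chunks: list[dict[str, Any]] = []

    # Everything before the first split point (title, recitals, ...).
    first_start = hints[0]["index"] if hints else len(markdown)
    preamble = markdown[:first_start].strip()
    if preamble:
        chunks.append({
            "celexId": item.get("celexId"),
            "type": "preamble",
            "title": "Preamble",
            "text": preamble,
        })

    # Each hint runs until the next hint, or to the end of the document.
    for pos, hint in enumerate(hints):
        end = hints[pos + 1]["index"] if pos + 1 < len(hints) else len(markdown)
        chunks.append({
            "celexId": item.get("celexId"),
            "type": hint["type"],
            "title": hint["title"],
            "text": markdown[hint["index"]:end].strip(),
        })
    return chunks


# Tiny synthetic document; real items come from the Actor's dataset.
body = "REGULATION (EU) ...\n\n#### Article 1\n\nScope.\n\n#### Article 2\n\nDefinitions."
sample = {
    "celexId": "32026R0697",
    "markdown": body,
    "metadata": {"chunkHints": [
        {"type": "article", "title": "Article 1", "index": body.index("#### Article 1")},
        {"type": "article", "title": "Article 2", "index": body.index("#### Article 2")},
    ]},
}
for chunk in split_by_chunk_hints(sample):
    print(chunk["title"], "->", len(chunk["text"]), "chars")
```

If the offsets turn out to be counted differently (for example in UTF-16 code units), documents containing characters outside the Basic Multilingual Plane would need an offset conversion before slicing.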

---

### 🔁 Recommended Usage Pattern

1. **Initial sweep** — Run once with default settings to build your baseline dataset (~200 EUR-Lex pages, up to 300 AI-relevant documents).
2. **Daily monitoring** — Schedule a lighter run (`maxPages: 3`, `maxDocuments: 30`) to catch new publications.
3. **Change detection** — Compare `versionHash` against your stored value. If it differs, re-embed that document. If it matches, skip it.
4. **RAG ingestion** — Split each document at `chunkHints` boundaries and upsert into your vector store with `celexId` + `title` + `publicationDate` as metadata filters.
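
Step 3 above can be sketched in a few lines. This is an illustration under stated assumptions, not part of the Actor: the functions `select_changed` and `remember` are hypothetical names, the in-memory `stored` dict stands in for whatever persistent store your pipeline uses, and the sample hashes are illustrative values.

```python
from typing import Any


def select_changed(
    items: list[dict[str, Any]], stored_hashes: dict[str, str]
) -> list[dict[str, Any]]:
    """Return items that are new or whose versionHash differs from the stored one."""
    return [
        item for item in items
        if stored_hashes.get(item["celexId"]) != item["versionHash"]
    ]


def remember(items: list[dict[str, Any]], stored_hashes: dict[str, str]) -> None:
    """Record the latest versionHash per CELEX ID after re-embedding."""
    for item in items:
        stored_hashes[item["celexId"]] = item["versionHash"]


# Hashes persisted from a previous run (hypothetical store, illustrative values).
stored = {"32026R0697": "70e9c9a76ed4f6ae"}
latest = [
    {"celexId": "32026R0697", "versionHash": "70e9c9a76ed4f6ae"},  # unchanged
    {"celexId": "32024R1689", "versionHash": "0123456789abcdef"},  # not seen before
]

changed = select_changed(latest, stored)
print([doc["celexId"] for doc in changed])  # ['32024R1689']
remember(changed, stored)  # call this only after the re-embedding succeeded
```

Only the documents returned by `select_changed` need to be re-split and re-embedded; everything else can be skipped without reading its `markdown` field.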

---

### 🏗️ Tech Stack

Built on **Crawlee 3 + Playwright** with fingerprint rotation, AWS WAF challenge handling, and stealth mode — engineered to reliably navigate EUR-Lex's bot-detection layers without brittle workarounds.

# Actor input Schema

## `searchTerms` (type: `array`):

One or more EUR-Lex search queries to run in sequence. Duplicate documents found across multiple terms are saved only once (deduplicated by CELEX ID). Leave blank to use the single 'Search Query' field below.

## `searchText` (type: `string`):

Used only when 'Search Terms' is empty. Supports EUR-Lex query syntax: double-quotes for phrases, OR/AND for boolean logic.

## `searchIn` (type: `string`):

Where within each document to apply the search query: `ti` (title only), `te` (full text), or `ti-te` (title and full text, the default).

## `maxPages` (type: `integer`):

Number of EUR-Lex result pages to crawl per search term. Each page contains ~10 results. With 7 default terms at 20 pages each, the Actor can sweep ~140 result pages (~1,400 results) in one run.

## `maxDocuments` (type: `integer`):

Hard cap on total unique documents saved across all search terms combined. Duplicate CELEX IDs are never double-counted. Use to control maximum spend.
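
Because `maxDocuments` caps the number of saved documents, it also bounds the cost of a run. The arithmetic can be sketched as follows, assuming that every saved document counts as exactly one billable result at the listed $150.00 per 1,000 results and that no other events are charged; the function name `max_spend_usd` is hypothetical.

```python
PRICE_PER_1000_RESULTS_USD = 150.00  # price listed for this Actor


def max_spend_usd(max_documents: int) -> float:
    """Upper bound on the cost of one run for a given maxDocuments value.

    Assumption: one saved document == one billable result, nothing else billed.
    """
    # The input schema allows maxDocuments between 1 and 1000.
    if not 1 <= max_documents <= 1000:
        raise ValueError("maxDocuments must be between 1 and 1000")
    return max_documents * PRICE_PER_1000_RESULTS_USD / 1000


print(max_spend_usd(300))  # default maxDocuments -> 45.0
print(max_spend_usd(30))   # lighter daily run -> 4.5
```

The actual charge is usually lower than this bound, since only documents that pass the relevance filter are written to the dataset.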

## `excludeCorrigenda` (type: `boolean`):

When true, filters out correction notices (Corrigenda) from search results.

## `startUrl` (type: `string`):

Advanced: override the EUR-Lex starting URL. Bypasses all search term settings. Leave blank for normal operation.

## Actor input object example

```json
{
  "searchTerms": [
    "\"artificial intelligence\"",
    "\"foundation model\""
  ],
  "searchText": "\"artificial intelligence\" OR \"machine learning\"",
  "searchIn": "ti-te",
  "maxPages": 20,
  "maxDocuments": 300,
  "excludeCorrigenda": true
}
```

# Actor output Schema

## `legislativeDocuments` (type: `string`):

Dataset of EU AI-relevant laws as clean Markdown with CELEX IDs, token counts, version hashes, and Article-level chunk hints for vector DB ingestion.
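
Since each dataset item carries an `estimatedTokens` count, documents can be grouped into context-window-sized batches without tokenising the text again. The sketch below uses a simple first-fit-decreasing heuristic; the function name `pack_by_tokens`, the 8,000-token budget, and the sample items are assumptions made for illustration.

```python
from typing import Any


def pack_by_tokens(
    items: list[dict[str, Any]], budget: int
) -> tuple[list[dict[str, Any]], list[str]]:
    """Group documents into batches whose estimatedTokens sum stays within budget.

    First-fit decreasing: largest documents first, each placed into the first
    batch with enough room. Documents larger than the budget are returned
    separately so they can be split into chunks instead.
    """
    batches: list[dict[str, Any]] = []  # {"remaining": int, "celexIds": [...]}
    oversized: list[str] = []
    for item in sorted(items, key=lambda d: d["estimatedTokens"], reverse=True):
        tokens = item["estimatedTokens"]
        if tokens > budget:
            oversized.append(item["celexId"])
            continue
        for batch in batches:
            if batch["remaining"] >= tokens:
                batch["remaining"] -= tokens
                batch["celexIds"].append(item["celexId"])
                break
        else:
            # No existing batch has room: open a new one.
            batches.append({"remaining": budget - tokens, "celexIds": [item["celexId"]]})
    return batches, oversized


# Illustrative items; only the two fields used here are shown.
docs = [
    {"celexId": "A", "estimatedTokens": 6000},
    {"celexId": "B", "estimatedTokens": 5000},
    {"celexId": "C", "estimatedTokens": 3000},
    {"celexId": "D", "estimatedTokens": 2000},
    {"celexId": "E", "estimatedTokens": 12462},
]
batches, oversized = pack_by_tokens(docs, budget=8000)
print([b["celexIds"] for b in batches])  # [['A', 'D'], ['B', 'C']]
print(oversized)                         # ['E']
```

Documents reported as oversized are natural candidates for Article-level splitting via `metadata.chunkHints`.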

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {};

// Run the Actor and wait for it to finish
const run = await client.actor("aelix/eu-ai-act-regulation-monitor").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {}

# Run the Actor and wait for it to finish
run = client.actor("aelix/eu-ai-act-regulation-monitor").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
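
The example above passes an empty input, so the Actor runs with its defaults. A lighter scheduled run might look like the sketch below. It reuses only the client calls shown above (`actor(...).call(run_input=...)` and `dataset(...).iterate_items()`); the helper names `build_daily_input` and `run_monitor` and the choice of search terms are assumptions for illustration.

```python
from typing import Any

ACTOR_ID = "aelix/eu-ai-act-regulation-monitor"


def build_daily_input(max_pages: int = 3, max_documents: int = 30) -> dict[str, Any]:
    """Build an input object for a small monitoring run.

    Bounds follow the input schema: maxPages 1-50, maxDocuments 1-1000.
    """
    if not 1 <= max_pages <= 50:
        raise ValueError("maxPages must be between 1 and 50")
    if not 1 <= max_documents <= 1000:
        raise ValueError("maxDocuments must be between 1 and 1000")
    return {
        "searchTerms": ['"artificial intelligence"', '"foundation model"'],
        "searchIn": "ti-te",
        "maxPages": max_pages,
        "maxDocuments": max_documents,
        "excludeCorrigenda": True,
    }


def run_monitor(client: Any, run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Run the Actor with the given input and return all dataset items.

    `client` is expected to be an apify_client.ApifyClient instance.
    """
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


print(build_daily_input())
```

To execute it, pass a real client, for example `run_monitor(ApifyClient("<YOUR_API_TOKEN>"), build_daily_input())`. Keeping the client as a parameter makes the function easy to test with a stub.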

## CLI example

```bash
echo '{}' |
apify call aelix/eu-ai-act-regulation-monitor --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=aelix/eu-ai-act-regulation-monitor",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "EU AI Act & Regulation Monitor (RAG-Optimized)",
        "description": "Monitors EUR-Lex for EU AI-related legislation and delivers clean, structured Markdown/JSON enriched with CELEX IDs, version hashes, token counts, and vector-DB chunk hints. Ideal for RAG pipelines, legal AI assistants, and compliance dashboards.\n\nPremium RAG-Ready Feed: $150.00 per 1,000 results.",
        "version": "1.3",
        "x-build-id": "oO0V1Pagw9hyCh0e0"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/aelix~eu-ai-act-regulation-monitor/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-aelix-eu-ai-act-regulation-monitor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/aelix~eu-ai-act-regulation-monitor/runs": {
            "post": {
                "operationId": "runs-sync-aelix-eu-ai-act-regulation-monitor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/aelix~eu-ai-act-regulation-monitor/run-sync": {
            "post": {
                "operationId": "run-sync-aelix-eu-ai-act-regulation-monitor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "searchTerms": {
                        "title": "Search Terms (multi-term sweep)",
                        "type": "array",
                        "description": "One or more EUR-Lex search queries to run in sequence. Duplicate documents found across multiple terms are saved only once (deduplicated by CELEX ID). Leave blank to use the single 'Search Query' field below.",
                        "default": [
                            "\"artificial intelligence\"",
                            "\"machine learning\"",
                            "\"automated decision\"",
                            "\"algorithmic\"",
                            "\"facial recognition\"",
                            "\"foundation model\"",
                            "\"high-risk system\""
                        ],
                        "items": {
                            "type": "string"
                        }
                    },
                    "searchText": {
                        "title": "Search Query (single term)",
                        "type": "string",
                        "description": "Used only when 'Search Terms' is empty. Supports EUR-Lex query syntax: double-quotes for phrases, OR/AND for boolean logic.",
                        "default": "\"artificial intelligence\""
                    },
                    "searchIn": {
                        "title": "Search In",
                        "enum": [
                            "ti",
                            "te",
                            "ti-te"
                        ],
                        "type": "string",
                        "description": "Where within each document to apply the search query.",
                        "default": "ti-te"
                    },
                    "maxPages": {
                        "title": "Max Result Pages (per search term)",
                        "minimum": 1,
                        "maximum": 50,
                        "type": "integer",
                        "description": "Number of EUR-Lex result pages to crawl per search term. Each page contains ~10 results. With 7 default terms at 20 pages each, the Actor can sweep ~1,400 result pages in one run.",
                        "default": 20
                    },
                    "maxDocuments": {
                        "title": "Max Documents (total across all terms)",
                        "minimum": 1,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Hard cap on total unique documents saved across all search terms combined. Duplicate CELEX IDs are never double-counted. Use to control maximum spend.",
                        "default": 300
                    },
                    "excludeCorrigenda": {
                        "title": "Exclude Corrigenda",
                        "type": "boolean",
                        "description": "When true, filters out correction notices (Corrigenda) from search results.",
                        "default": true
                    },
                    "startUrl": {
                        "title": "Override Start URL",
                        "type": "string",
                        "description": "Advanced: override the EUR-Lex starting URL. Bypasses all search term settings. Leave blank for normal operation."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
