# ANVISA Medicine Scraper (`david_craft/anvisa-raw-material-scraper`) Actor

Extracts complete ANVISA medicine data (presentations, manufacturers, ATC). Uses Playwright to automatically bypass Cloudflare/Dynatrace WAFs.

- **URL**: https://apify.com/david_craft/anvisa-raw-material-scraper.md
- **Developed by:** [David Mendonça](https://apify.com/david_craft) (community)
- **Categories:** Automation, Developer tools, Lead generation
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $1.00 / 1,000 results

This Actor is paid per event and usage. You are charged both the fixed price for specific events and for Apify platform usage.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## ANVISA Medicine Scraper 💊

Extracts complete data on registered medicines from [ANVISA](https://consultas.anvisa.gov.br/) (Brazil's National Health Surveillance Agency), including commercial presentations, domestic and international manufacturers, ATC classification, therapeutic class, and registration holder details.

### Why this Actor?

ANVISA's consultation portal is protected by a WAF (Cloudflare + Dynatrace), which blocks direct HTTP calls to the API — any request without valid session cookies gets a `403 Forbidden`.

This Actor solves the problem using **Playwright** (a real headless browser) that:

1. **Resolves WAF challenges automatically** — Cloudflare and Dynatrace are handled as part of normal browser navigation
2. **Intercepts JSON responses from the internal API** — more robust than CSS selector scraping, won't break if the UI changes
3. **Returns complete, structured data** — same depth of data available in each medicine's detail panel

### Input

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `startDate` | `string` | No | 7 days ago | Start date of the publication period (`DD/MM/YYYY`) |
| `endDate` | `string` | No | Today | End date of the publication period (`DD/MM/YYYY`) |
| `cnpj` | `string` | No | — | Registration holder's CNPJ (digits only). Filters by holder |
| `nomeProduto` | `string` | No | — | Product name text search (partial match supported) |
| `maxPages` | `integer` | No | `0` | Listing page limit (`0` = unlimited, each page = 10 products) |
| `maxRequestsPerCrawl` | `integer` | No | `1000` | Safety limit for HTTP requests per run |

#### Input examples

```json
{
    "startDate": "01/04/2025",
    "endDate": "30/04/2025",
    "maxPages": 1
}
````

Search for a specific registration holder:

```json
{
    "startDate": "01/01/2025",
    "endDate": "30/06/2025",
    "cnpj": "00000000000100"
}
```

Search by product name:

```json
{
    "nomeProduto": "paracetamol",
    "maxPages": 3
}
```
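The input rules above (dates in `DD/MM/YYYY`, a 7-day default window, `maxPages` of `0` meaning unlimited) can be sketched as a small Python helper. This is only an illustration: `build_input` and its parameters are hypothetical names, and stripping punctuation from the CNPJ is an assumption about what the Actor expects, based on the "digits only" note in the input table.

```python
import re
from datetime import date, timedelta

DATE_FORMAT = "%d/%m/%Y"  # DD/MM/YYYY, as required by startDate and endDate


def build_input(start=None, end=None, cnpj=None, nome_produto=None,
                max_pages=0, today=None):
    """Build an Actor input dict, mirroring the documented defaults."""
    today = today or date.today()
    end = end or today                            # documented default: today
    start = start or (today - timedelta(days=7))  # documented default: 7 days ago
    if start > end:
        raise ValueError("startDate must not be after endDate")
    if max_pages < 0:
        raise ValueError("maxPages must be >= 0 (0 means unlimited)")

    run_input = {
        "startDate": start.strftime(DATE_FORMAT),
        "endDate": end.strftime(DATE_FORMAT),
        "maxPages": max_pages,
    }
    if cnpj:
        # Assumption: the Actor wants the bare value, without ".", "/" or "-".
        run_input["cnpj"] = re.sub(r"[^0-9A-Za-z]", "", cnpj)
    if nome_produto:
        run_input["nomeProduto"] = nome_produto
    return run_input
```

For example, `build_input(date(2025, 1, 1), date(2025, 6, 30), cnpj="00.000.000/0001-00")` produces the same payload as the registration holder example above, plus an explicit `maxPages` of `0`.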

### Output

Each dataset item is a `Medicine` object with the following structure:

```json
{
    "anvisaRegistrationId": "100000001",
    "tradeName": "EXEMPLOMAX",
    "activeIngredient": "PARACETAMOL",
    "referenceMedicine": "TYLENOL",
    "atcCodes": ["N02BE01"],
    "therapeuticClasses": ["ANALGÉSICOS"],
    "regulatoryCategory": "Genérico",
    "registrationHolder": {
        "legalName": "PHARMA EXEMPLO LTDA.",
        "cnpj": "00000000000100",
        "authorizationNumber": "1000001"
    },
    "approvalDate": "2025-04-28",
    "expiryDate": "2035-04-28",
    "processNumber": "25351000000202500",
    "presentations": [
        {
            "registrationId": "1000000010010",
            "description": "500 MG COM CT BL AL PLAS INC X 20",
            "pharmaceuticalForms": ["COMPRIMIDO SIMPLES"],
            "routesOfAdministration": ["ORAL"],
            "destinations": ["Comercial"],
            "publicationDate": "2025-01-15",
            "validity": "54",
            "manufacturers": [
                {
                    "name": "FÁBRICA EXEMPLO S.A.",
                    "address": "RUA DAS INDÚSTRIAS, 123 - CIDADE/SP",
                    "country": "BRASIL",
                    "manufacturingStage": "FABRICAÇÃO DO PRODUTO TERMINADO",
                    "uniqueCode": "X000001"
                }
            ]
        }
    ]
}
```

#### Output fields

| Field | Type | Description |
|---|---|---|
| `anvisaRegistrationId` | `string` | ANVISA registration number (up to 13 digits) |
| `tradeName` | `string` | Trade name (brand) |
| `activeIngredient` | `string` | Active pharmaceutical ingredient (API) |
| `referenceMedicine` | `string\|null` | Reference (innovator) medicine |
| `atcCodes` | `string[]` | ATC codes (WHO classification) |
| `therapeuticClasses` | `string[]` | ANVISA therapeutic classes |
| `regulatoryCategory` | `string` | Regulatory category (Generic, Similar, New, Biological) |
| `registrationHolder` | `object` | Registration holder company (`legalName`, `cnpj`, `authorizationNumber`) |
| `approvalDate` | `string` | Registration/approval date (`YYYY-MM-DD`) |
| `expiryDate` | `string` | Registration expiry date (`YYYY-MM-DD`) |
| `processNumber` | `string` | ANVISA administrative process number |
| `presentations` | `array` | Commercial presentations (dosage, packaging, manufacturers) |

Each **presentation** contains:

| Field | Type | Description |
|---|---|---|
| `registrationId` | `string` | Presentation registration code |
| `description` | `string` | Full description (dosage + form + packaging) |
| `pharmaceuticalForms` | `string[]` | Pharmaceutical forms |
| `routesOfAdministration` | `string[]` | Approved routes of administration |
| `destinations` | `string[]` | Commercial destination (Commercial, Hospital, etc.) |
| `publicationDate` | `string` | Official Gazette publication date (`YYYY-MM-DD`) |
| `validity` | `string` | Shelf life in months |
| `manufacturers` | `array` | Manufacturers (domestic and international, unified) |

Each **manufacturer** contains:

| Field | Type | Description |
|---|---|---|
| `name` | `string` | Legal name |
| `address` | `string` | Manufacturing plant address |
| `country` | `string` | Country (`BRASIL` for domestic) |
| `manufacturingStage` | `string` | Manufacturing process stage |
| `uniqueCode` | `string` | Unique code in the ANVISA system |
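Because each `Medicine` nests presentations and manufacturers, tabular tools (CSV, BI dashboards) usually need one row per manufacturer. The following is a minimal sketch of such a flattening step in Python. The function name `flatten_medicine` and the output column names (`holderCnpj`, `presentationId`, `shelfLifeMonths`, and so on) are illustrative choices, not something the Actor produces.

```python
def _to_int(value):
    """Convert a value such as "54" to an int; return None if not possible."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def flatten_medicine(medicine):
    """Yield one flat dict per (presentation, manufacturer) pair."""
    holder = medicine.get("registrationHolder") or {}
    for presentation in medicine.get("presentations") or []:
        # Keep presentations that list no manufacturers as a single row.
        manufacturers = presentation.get("manufacturers") or [None]
        for manufacturer in manufacturers:
            manufacturer = manufacturer or {}
            yield {
                "anvisaRegistrationId": medicine.get("anvisaRegistrationId"),
                "tradeName": medicine.get("tradeName"),
                "holderCnpj": holder.get("cnpj"),
                "atcCodes": ";".join(medicine.get("atcCodes") or []),
                "presentationId": presentation.get("registrationId"),
                "description": presentation.get("description"),
                # `validity` is a string holding the shelf life in months.
                "shelfLifeMonths": _to_int(presentation.get("validity")),
                "manufacturerName": manufacturer.get("name"),
                "manufacturerCountry": manufacturer.get("country"),
                "manufacturingStage": manufacturer.get("manufacturingStage"),
            }
```

Applied to the sample item above, this yields a single row, since the example has one presentation with one manufacturer.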

### How it works

```
┌─────────────────────────────────────────────────────────┐
│  1. Playwright navigates to the ANVISA SPA              │
│     → WAF (Cloudflare/Dynatrace) resolved               │
│     → Session cookies captured by the browser            │
├─────────────────────────────────────────────────────────┤
│  2. DEFAULT route: Paginated listing API                 │
│     → fetch() via browser context (with WAF cookies)     │
│     → Filters out NOTIFICADO (notified-only) products    │
│     → Enqueues DETAIL for each REGISTERED product        │
│     → Next page if available                             │
├─────────────────────────────────────────────────────────┤
│  3. DETAIL route: Per-product detail API                 │
│     → fetch() via browser context (with WAF cookies)     │
│     → Maps JSON to Medicine structure                    │
│     → Saves to Dataset via pushData()                    │
└─────────────────────────────────────────────────────────┘
```

The scraper **does not rely on CSS selectors** — it intercepts JSON responses from the internal API that ANVISA's own Angular SPA consumes. This makes extraction resilient to visual changes in the portal.
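The two-route flow in the diagram can be simulated without a browser. The Python sketch below is not the Actor's code (which is TypeScript on Crawlee): the fetch functions are injected, and the response shapes (`items`, `hasNext`, `id`, `status`) are hypothetical stand-ins for whatever ANVISA's internal API actually returns. It also processes details after all listing pages, whereas a real crawler would interleave them.

```python
from collections import deque


def crawl(fetch_listing, fetch_detail, max_pages=0):
    """Simulate the DEFAULT (listing) and DETAIL routes.

    fetch_listing(page) -> {"items": [...], "hasNext": bool}  (hypothetical shape)
    fetch_detail(product_id) -> dict                           (hypothetical shape)
    """
    detail_queue = deque()
    page = 1
    while True:
        listing = fetch_listing(page)
        for item in listing["items"]:
            # Notified-only products are skipped; only registered ones
            # get a detail request enqueued.
            if item["status"] == "NOTIFICADO":
                continue
            detail_queue.append(item["id"])
        reached_limit = max_pages > 0 and page >= max_pages  # 0 = unlimited
        if reached_limit or not listing["hasNext"]:
            break
        page += 1

    results = []
    while detail_queue:
        results.append(fetch_detail(detail_queue.popleft()))
    return results
```

Passing fake fetch functions makes it easy to check how `maxPages` and the `NOTIFICADO` filter interact before spending credits on a real run.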

### 🔌 Integration & API

You can easily integrate this Actor into your own data pipelines, backend applications, or BI tools using the Apify API.

#### Starting the Actor via REST API

Trigger a run by sending a `POST` request to the Apify API, passing your parameters in the JSON body:

```bash
curl "https://api.apify.com/v2/acts/david_craft~anvisa-raw-material-scraper/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "startDate": "01/04/2025",
    "endDate": "30/04/2025",
    "maxPages": 1
  }'
```

> **Note:** Replace `YOUR_API_TOKEN` with your [Apify API Token](https://console.apify.com/account/integrations). The response contains the run's `id` and `defaultDatasetId`, which you need to fetch the results.

#### Fetching the Results

Once the run finishes, download the extracted data (in JSON, CSV, or Excel format) directly from the run's dataset:

```bash
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?format=json"
```
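The request above assumes the run has already finished. One way to wait for that from Python, using only the standard library, is to poll the run object until it reaches a terminal status and then read the dataset. This is a sketch rather than a production client: the helper names are made up for illustration, error handling and timeouts are omitted, and the official `apify-client` package covers the same ground more robustly.

```python
import json
import time
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
# Run statuses after which the run will not change any more.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"}


def run_url(run_id, token):
    """URL of the run object (GET /v2/actor-runs/{runId})."""
    query = urllib.parse.urlencode({"token": token})
    return f"{API_BASE}/actor-runs/{run_id}?{query}"


def items_url(dataset_id, token, fmt="json"):
    """URL of the dataset items in the requested format."""
    query = urllib.parse.urlencode({"format": fmt, "token": token})
    return f"{API_BASE}/datasets/{dataset_id}/items?{query}"


def get_json(url):
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def wait_and_fetch(run_id, token, poll_seconds=10):
    """Poll the run until it finishes, then return its dataset items."""
    while True:
        run = get_json(run_url(run_id, token))["data"]
        if run["status"] in TERMINAL_STATUSES:
            break
        time.sleep(poll_seconds)
    if run["status"] != "SUCCEEDED":
        raise RuntimeError(f"Run ended with status {run['status']}")
    return get_json(items_url(run["defaultDatasetId"], token))
```

Call `wait_and_fetch(run_id, token)` with the `id` returned when the run was started; it returns the list of `Medicine` items.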

For more details on integrating Apify Actors via Node.js, Python, or REST, refer to the [official Apify API documentation](https://docs.apify.com/api/v2).

### Tech stack

- [Crawlee](https://crawlee.dev) — Web scraping framework
- [Playwright](https://playwright.dev) — Browser automation
- [Apify SDK](https://docs.apify.com/sdk/js) — Actor platform
- TypeScript — Strict typing

### Limitations and considerations

- **Rate limiting**: The crawler runs with a max concurrency of 3 to be respectful to ANVISA's servers
- **WAF**: In rare cases, the WAF may require manual CAPTCHA solving. The automatic retry (3 attempts) usually handles it
- **Proxy**: In production on Apify, using a proxy is recommended to avoid IP-based blocks. The Actor attempts to configure a proxy automatically and works without one if no credits are available
- **Data volume**: Each listing page contains 10 products. For long date ranges, the volume can be large — use `maxPages` to limit during testing.
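To size a test run, the relationship between `maxPages` and `maxRequestsPerCrawl` can be estimated with simple arithmetic. The sketch below assumes that each listing page and each product detail counts as one request against `maxRequestsPerCrawl`, and that retries are not counted. Neither assumption is confirmed by this README, so treat the numbers as rough upper bounds.

```python
PAGE_SIZE = 10  # products per listing page, as stated above


def estimate_budget(max_pages, max_requests_per_crawl=1000):
    """Upper bounds for a run limited to `max_pages` listing pages."""
    if max_pages <= 0:
        raise ValueError("use a positive maxPages to get a finite estimate")
    listing_requests = max_pages
    detail_requests = max_pages * PAGE_SIZE  # at most one detail per product
    total = listing_requests + detail_requests
    return {
        "maxResults": detail_requests,
        "maxRequests": total,
        "fitsRequestLimit": total <= max_requests_per_crawl,
    }


def max_pages_within(max_requests_per_crawl=1000):
    """Largest maxPages whose worst case stays within the request limit."""
    return max_requests_per_crawl // (1 + PAGE_SIZE)
```

Under these assumptions, the default limit of 1000 requests covers at most 90 full listing pages (990 requests), so long date ranges with `maxPages` set to `0` may need a higher `maxRequestsPerCrawl`.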

### License

ISC

# Actor input Schema

## `startDate` (type: `string`):

Start date of the publication period in DD/MM/YYYY format. Defaults to 7 days ago if not provided.

## `endDate` (type: `string`):

End date of the publication period in DD/MM/YYYY format. Defaults to the current date if not provided.

## `cnpj` (type: `string`):

CNPJ of the company that holds the registration (alphanumeric characters only). Filters medicines by holder. Leave blank to search all.

## `nomeProduto` (type: `string`):

Product name for text search (e.g. 'paracetamol'). Partial matches are supported. Leave blank to search all.

## `maxPages` (type: `integer`):

Maximum number of listing pages to process (0 = unlimited). Each page contains 10 products.

## `maxRequestsPerCrawl` (type: `integer`):

Safety limit: maximum number of HTTP requests the crawler can make in a single run.

## Actor input object example

```json
{
  "startDate": "01/04/2025",
  "endDate": "30/04/2025",
  "maxPages": 0,
  "maxRequestsPerCrawl": 1000
}
```

# Actor output Schema

## `results` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startDate": "01/04/2025",
    "endDate": "30/04/2025"
};

// Run the Actor and wait for it to finish
const run = await client.actor("david_craft/anvisa-raw-material-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startDate": "01/04/2025",
    "endDate": "30/04/2025",
}

# Run the Actor and wait for it to finish
run = client.actor("david_craft/anvisa-raw-material-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startDate": "01/04/2025",
  "endDate": "30/04/2025"
}' |
apify call david_craft/anvisa-raw-material-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=david_craft/anvisa-raw-material-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "ANVISA Medicine Scraper",
        "description": "Extracts complete ANVISA medicine data (presentations, manufacturers, ATC). Uses Playwright to automatically bypass Cloudflare/Dynatrace WAFs.",
        "version": "0.1",
        "x-build-id": "fPMA2V76UBs61Aynl"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/david_craft~anvisa-raw-material-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-david_craft-anvisa-raw-material-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/david_craft~anvisa-raw-material-scraper/runs": {
            "post": {
                "operationId": "runs-sync-david_craft-anvisa-raw-material-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/david_craft~anvisa-raw-material-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-david_craft-anvisa-raw-material-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "startDate": {
                        "title": "Start Date",
                        "type": "string",
                        "description": "Start date of the publication period in DD/MM/YYYY format. Defaults to 7 days ago if not provided."
                    },
                    "endDate": {
                        "title": "End Date",
                        "type": "string",
                        "description": "End date of the publication period in DD/MM/YYYY format. Defaults to the current date if not provided."
                    },
                    "cnpj": {
                        "title": "Holder CNPJ",
                        "type": "string",
                        "description": "CNPJ of the company that holds the registration (alphanumeric characters only). Filters medicines by holder. Leave blank to search all."
                    },
                    "nomeProduto": {
                        "title": "Product Name",
                        "type": "string",
                        "description": "Product name for text search (e.g. 'paracetamol'). Partial matches are supported. Leave blank to search all."
                    },
                    "maxPages": {
                        "title": "Page Limit",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum number of listing pages to process (0 = unlimited). Each page contains 10 products.",
                        "default": 0
                    },
                    "maxRequestsPerCrawl": {
                        "title": "Max Requests per Crawl",
                        "minimum": 1,
                        "type": "integer",
                        "description": "Safety limit: maximum number of HTTP requests the crawler can make in a single run.",
                        "default": 1000
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
