Under maintenance

Pricing

from $1.00 / 1,000 results

ANVISA Medicine Scraper

Extracts complete ANVISA medicine data (presentations, manufacturers, ATC). Uses Playwright to automatically bypass Cloudflare/Dynatrace WAFs.

Rating: 0.0 (0)

Developer: David Mendonça

Last modified: 2 months ago

ANVISA Medicine Scraper 💊

Extracts complete data on registered medicines from ANVISA (Brazil's National Health Surveillance Agency), including commercial presentations, domestic and international manufacturers, ATC classification, therapeutic class, and registration holder details.

Why this Actor?

ANVISA's consultation portal is protected by a WAF (Cloudflare + Dynatrace), which blocks direct HTTP calls to the API — any request without valid session cookies gets a 403 Forbidden.

This Actor solves the problem by driving a real headless browser through Playwright, which:

Resolves WAF challenges automatically: Cloudflare and Dynatrace checks are handled as part of normal browser navigation
Intercepts JSON responses from the internal API, which is more robust than CSS selector scraping and less likely to break when the UI changes
Returns complete, structured data with the same depth as each medicine's detail panel

Input

Field	Type	Required	Default	Description
`startDate`	`string`	No	7 days ago	Start date of the publication period (`DD/MM/YYYY`)
`endDate`	`string`	No	Today	End date of the publication period (`DD/MM/YYYY`)
`cnpj`	`string`	No	—	Registration holder's CNPJ (digits only). Filters by holder
`nomeProduto`	`string`	No	—	Product name text search (partial match supported)
`maxPages`	`integer`	No	`0`	Listing page limit (`0` = unlimited, each page = 10 products)
`maxRequestsPerCrawl`	`integer`	No	`1000`	Safety limit for HTTP requests per run

Input examples

Search by publication period, limited to one listing page:

{
    "startDate": "01/04/2025",
    "endDate": "30/04/2025",
    "maxPages": 1
}

Search for a specific registration holder:

{
    "startDate": "01/01/2025",
    "endDate": "30/06/2025",
    "cnpj": "00000000000100"
}

Search by product name:

{
    "nomeProduto": "paracetamol",
    "maxPages": 3
}
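
The input rules from the table above can be checked before a run is started. The following is a minimal Python sketch, assuming only what the table states (`DD/MM/YYYY` dates, a digits-only `cnpj`, `maxPages` of `0` meaning unlimited). The helper name `build_input` is hypothetical and not part of the Actor, and the 14-digit CNPJ length check is an assumption based on the standard numeric CNPJ format.

```python
import re
from datetime import date
from typing import Optional

DATE_FORMAT = "%d/%m/%Y"  # DD/MM/YYYY, as required by startDate and endDate


def build_input(
    start: Optional[date] = None,
    end: Optional[date] = None,
    cnpj: Optional[str] = None,
    nome_produto: Optional[str] = None,
    max_pages: int = 0,
) -> dict:
    """Hypothetical helper: assemble an Actor input dict from typed values.

    Fields that are not given are left out, so the Actor applies its own
    documented defaults (7 days ago / today / unlimited pages).
    """
    if start and end and start > end:
        raise ValueError("startDate must not be after endDate")
    if max_pages < 0:
        raise ValueError("maxPages must be 0 (unlimited) or a positive integer")

    payload: dict = {}
    if start:
        payload["startDate"] = start.strftime(DATE_FORMAT)
    if end:
        payload["endDate"] = end.strftime(DATE_FORMAT)
    if cnpj is not None:
        digits = re.sub(r"\D", "", cnpj)  # drop dots, slash and dash
        if len(digits) != 14:  # assumption: standard 14-digit numeric CNPJ
            raise ValueError("CNPJ must contain 14 digits")
        payload["cnpj"] = digits
    if nome_produto:
        payload["nomeProduto"] = nome_produto
    if max_pages:
        payload["maxPages"] = max_pages
    return payload
```

Calling `build_input(start=date(2025, 4, 1), end=date(2025, 4, 30), max_pages=1)` produces the same JSON object as the first example in this section.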

Output

Each dataset item is a Medicine object with the following structure:

{
    "anvisaRegistrationId": "100000001",
    "tradeName": "EXEMPLOMAX",
    "activeIngredient": "PARACETAMOL",
    "referenceMedicine": "TYLENOL",
    "atcCodes": ["N02BE01"],
    "therapeuticClasses": ["ANALGÉSICOS"],
    "regulatoryCategory": "Genérico",
    "registrationHolder": {
        "legalName": "PHARMA EXEMPLO LTDA.",
        "cnpj": "00000000000100",
        "authorizationNumber": "1000001"
    },
    "approvalDate": "2025-04-28",
    "expiryDate": "2035-04-28",
    "processNumber": "25351000000202500",
    "presentations": [
        {
            "registrationId": "1000000010010",
            "description": "500 MG COM CT BL AL PLAS INC X 20",
            "pharmaceuticalForms": ["COMPRIMIDO SIMPLES"],
            "routesOfAdministration": ["ORAL"],
            "destinations": ["Comercial"],
            "publicationDate": "2025-01-15",
            "validity": "54",
            "manufacturers": [
                {
                    "name": "FÁBRICA EXEMPLO S.A.",
                    "address": "RUA DAS INDÚSTRIAS, 123 - CIDADE/SP",
                    "country": "BRASIL",
                    "manufacturingStage": "FABRICAÇÃO DO PRODUTO TERMINADO",
                    "uniqueCode": "X000001"
                }
            ]
        }
    ]
}

Output fields

Field	Type	Description
`anvisaRegistrationId`	`string`	ANVISA registration number (up to 13 digits)
`tradeName`	`string`	Trade name (brand)
`activeIngredient`	`string`	Active pharmaceutical ingredient (API)
`referenceMedicine`	`string\|null`	Reference (innovator) medicine
`atcCodes`	`string[]`	ATC codes (WHO classification)
`therapeuticClasses`	`string[]`	ANVISA therapeutic classes
`regulatoryCategory`	`string`	Regulatory category (Generic, Similar, New, Biological); values appear in Portuguese, as in `Genérico`
`registrationHolder`	`object`	Registration holder company (`legalName`, `cnpj`, `authorizationNumber`)
`approvalDate`	`string`	Registration/approval date (`YYYY-MM-DD`)
`expiryDate`	`string`	Registration expiry date (`YYYY-MM-DD`)
`processNumber`	`string`	ANVISA administrative process number
`presentations`	`array`	Commercial presentations (dosage, packaging, manufacturers)

Each presentation contains:

Field	Type	Description
`registrationId`	`string`	Presentation registration code
`description`	`string`	Full description (dosage + form + packaging)
`pharmaceuticalForms`	`string[]`	Pharmaceutical forms
`routesOfAdministration`	`string[]`	Approved routes of administration
`destinations`	`string[]`	Commercial destination (Commercial, Hospital, etc.); values appear in Portuguese, as in `Comercial`
`publicationDate`	`string`	Official Gazette publication date (`YYYY-MM-DD`)
`validity`	`string`	Shelf life in months
`manufacturers`	`array`	Manufacturers (domestic and international, unified)

Each manufacturer contains:

Field	Type	Description
`name`	`string`	Legal name
`address`	`string`	Manufacturing plant address
`country`	`string`	Country (`BRASIL` for domestic)
`manufacturingStage`	`string`	Manufacturing process stage
`uniqueCode`	`string`	Unique code in the ANVISA system
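
Because manufacturers are nested inside presentations, tabular tools such as spreadsheets or BI dashboards usually need each item flattened first. The following is a minimal Python sketch, assuming items shaped like the example in this section. The function name `flatten_medicine` and the output column names are hypothetical.

```python
from typing import Any, Dict, Iterator


def flatten_medicine(item: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one flat row per (presentation, manufacturer) pair."""
    holder = item.get("registrationHolder") or {}
    for presentation in item.get("presentations", []):
        validity = presentation.get("validity")
        base = {
            "anvisaRegistrationId": item["anvisaRegistrationId"],
            "tradeName": item["tradeName"],
            "holderCnpj": holder.get("cnpj"),
            "presentationId": presentation["registrationId"],
            # `validity` is documented as a string holding a number of months
            "shelfLifeMonths": int(validity) if validity and validity.isdigit() else None,
        }
        # Keep presentations that list no manufacturer as a single row
        manufacturers = presentation.get("manufacturers") or [None]
        for manufacturer in manufacturers:
            yield {
                **base,
                "manufacturer": manufacturer["name"] if manufacturer else None,
                "country": manufacturer["country"] if manufacturer else None,
            }
```

For the example item above, `list(flatten_medicine(item))` yields a single row, since the medicine has one presentation with one manufacturer.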

How it works

┌─────────────────────────────────────────────────────────┐
│  1. Playwright navigates to the ANVISA SPA              │
│     → WAF (Cloudflare/Dynatrace) resolved               │
│     → Session cookies captured by the browser            │
├─────────────────────────────────────────────────────────┤
│  2. DEFAULT route: Paginated listing API                 │
│     → fetch() via browser context (with WAF cookies)     │
│     → Filters out NOTIFICADO (notified-only) products    │
│     → Enqueues DETAIL for each REGISTERED product        │
│     → Next page if available                             │
├─────────────────────────────────────────────────────────┤
│  3. DETAIL route: Per-product detail API                 │
│     → fetch() via browser context (with WAF cookies)     │
│     → Maps JSON to Medicine structure                    │
│     → Saves to Dataset via pushData()                    │
└─────────────────────────────────────────────────────────┘

The scraper does not rely on CSS selectors — it intercepts JSON responses from the internal API that ANVISA's own Angular SPA consumes. This makes extraction resilient to visual changes in the portal.
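
The two-route flow in the diagram can be sketched in Python with the network calls injected as plain functions. This illustrates the control flow only and is not the Actor's TypeScript source: the listing shape (`items`, `hasNext`), the `status` and `processNumber` keys, and both callables are hypothetical stand-ins for ANVISA's internal API.

```python
from typing import Any, Callable, Dict, List

Page = Dict[str, Any]


def crawl(
    fetch_listing: Callable[[int], Page],
    fetch_detail: Callable[[str], Dict[str, Any]],
    max_pages: int = 0,
) -> List[Dict[str, Any]]:
    """Walk listing pages, skip NOTIFICADO products, collect product details.

    `max_pages` follows the documented input: 0 means no page limit.
    """
    results: List[Dict[str, Any]] = []
    page_number = 1
    while True:
        page = fetch_listing(page_number)  # DEFAULT route
        for product in page["items"]:
            if product["status"] == "NOTIFICADO":
                continue  # notified-only products are filtered out
            results.append(fetch_detail(product["processNumber"]))  # DETAIL route
        reached_limit = max_pages > 0 and page_number >= max_pages
        if reached_limit or not page["hasNext"]:
            return results
        page_number += 1
```

Injecting the two fetch functions keeps the pagination and filtering logic testable without a browser; in the Actor itself those calls run inside the Playwright page so that the WAF cookies are sent along.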

🔌 Integration & API

You can easily integrate this Actor into your own data pipelines, backend applications, or BI tools using the Apify API.

Starting the Actor via REST API

Trigger a run by sending a POST request to the Apify API, passing your parameters in the JSON body:

curl "https://api.apify.com/v2/acts/YOUR_USERNAME~anvisa-raw-material-scraper/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "startDate": "01/04/2025",
    "endDate": "30/04/2025",
    "maxPages": 1
  }'

Note: Replace YOUR_USERNAME~anvisa-raw-material-scraper with your actual Actor ID and provide your Apify API Token.
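
The same request can be built in Python with only the standard library. The following is a minimal sketch, assuming the endpoint and the `token` query parameter shown in the curl command above. `build_run_request` is a hypothetical helper, and the Actor ID and token are the same placeholders.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts an Actor run."""
    query = urllib.parse.urlencode({"token": token})
    # Keep the "~" between username and Actor name unescaped
    url = f"{API_BASE}/acts/{urllib.parse.quote(actor_id, safe='~')}/runs?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request(
    "YOUR_USERNAME~anvisa-raw-material-scraper",  # placeholder Actor ID
    "YOUR_API_TOKEN",  # placeholder token
    {"startDate": "01/04/2025", "endDate": "30/04/2025", "maxPages": 1},
)
# To actually start the run: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and body before spending any credits.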

Fetching the Results

Once the run finishes, download the extracted data (in JSON, CSV, or Excel format) directly from the run's dataset:

curl "https://api.apify.com/v2/datasets/DATASET_ID/items?format=json"
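
A minimal Python sketch of the same download, assuming the dataset endpoint shown above. The `format` values `json`, `csv`, and `xlsx` correspond to the three formats mentioned. `dataset_items_url` and `fetch_items` are hypothetical helpers, and the optional `token` argument is included only as an assumption, for datasets that are not readable without authentication.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional

FORMATS = {"json", "csv", "xlsx"}  # JSON, CSV, Excel


def dataset_items_url(dataset_id: str, fmt: str = "json", token: Optional[str] = None) -> str:
    """Return the URL that exports a dataset's items in the given format."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    params = {"format": fmt}
    if token:
        params["token"] = token
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{urllib.parse.urlencode(params)}"


def fetch_items(dataset_id: str, token: Optional[str] = None) -> List[dict]:
    """Download all items as parsed JSON (this performs a network call)."""
    with urllib.request.urlopen(dataset_items_url(dataset_id, "json", token)) as response:
        return json.load(response)
```

Each element returned by `fetch_items` is one Medicine object as described in the Output section.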

For more details on integrating Apify Actors via Node.js, Python, or REST, refer to the official Apify API documentation.

Tech stack

Crawlee — Web scraping framework
Playwright — Browser automation
Apify SDK — Actor platform
TypeScript — Strict typing

Limitations and considerations

Rate limiting: The crawler runs with a maximum concurrency of 3 to limit load on ANVISA's servers.
WAF: In rare cases, the WAF serves a CAPTCHA that requires manual solving. The automatic retry (3 attempts) usually gets past it.
Proxy: In production on Apify, using a proxy is recommended to avoid IP-based blocks. The Actor attempts to configure a proxy automatically and falls back to running without one if no credits are available.
Data volume: Each listing page contains 10 products. Long date ranges can produce a large volume, so use `maxPages` to limit runs during testing.
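
For a rough sense of scale before a run, the page size can be combined with the listed price. The following is a back-of-the-envelope Python sketch, assuming 10 products per listing page and the listed starting price of $1.00 per 1,000 results. It gives an upper bound on results, since notified-only products are skipped, and the price is quoted as "from", so the actual charge may differ. The function name is hypothetical.

```python
PRODUCTS_PER_PAGE = 10  # each listing page contains 10 products
PRICE_PER_1000_USD = 1.00  # listed starting price per 1,000 results


def estimate_run(max_pages: int) -> dict:
    """Upper bound on results, and their cost at the listed starting price."""
    if max_pages <= 0:
        raise ValueError("maxPages=0 means unlimited, so no bound can be estimated")
    max_results = max_pages * PRODUCTS_PER_PAGE
    return {
        "maxResults": max_results,
        "costAtListedPriceUsd": max_results / 1000 * PRICE_PER_1000_USD,
    }
```

For example, `estimate_run(3)` reports at most 30 results, which at the listed price comes to about three cents.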

License

ISC
