ANVISA Medicine Scraper

Pricing

from $1.00 / 1,000 results

Extracts complete ANVISA medicine data (presentations, manufacturers, ATC). Uses Playwright to automatically bypass Cloudflare/Dynatrace WAFs.

Developer

David Mendonça

Maintained by Community

ANVISA Medicine Scraper 💊

Extracts complete data on registered medicines from ANVISA (Brazil's National Health Surveillance Agency), including commercial presentations, domestic and international manufacturers, ATC classification, therapeutic class, and registration holder details.

Why this Actor?

ANVISA's consultation portal is protected by a WAF (Cloudflare + Dynatrace), which blocks direct HTTP calls to the API — any request without valid session cookies gets a 403 Forbidden.

This Actor solves the problem using Playwright (a real headless browser) that:

  1. Resolves WAF challenges automatically — Cloudflare and Dynatrace are handled as part of normal browser navigation
  2. Intercepts JSON responses from the internal API — more robust than CSS selector scraping, won't break if the UI changes
  3. Returns complete, structured data — same depth of data available in each medicine's detail panel

Input

Field                Type     Required  Default     Description
-------------------  -------  --------  ----------  -----------
startDate            string   No        7 days ago  Start date of the publication period (DD/MM/YYYY)
endDate              string   No        Today       End date of the publication period (DD/MM/YYYY)
cnpj                 string   No        -           Registration holder's CNPJ (digits only). Filters by holder
nomeProduto          string   No        -           Product name text search (partial match supported)
maxPages             integer  No        0           Listing page limit (0 = unlimited, each page = 10 products)
maxRequestsPerCrawl  integer  No        1000        Safety limit for HTTP requests per run
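
When the input is assembled programmatically, the documented defaults and formats can be reproduced on the client side. The following is a minimal sketch, not part of the Actor: `build_input` and its parameter names are hypothetical, and the 14-digit check assumes the traditional numeric CNPJ.

```python
from datetime import date, timedelta
from typing import Optional


def build_input(
    start: Optional[date] = None,
    end: Optional[date] = None,
    cnpj: Optional[str] = None,
    nome_produto: Optional[str] = None,
    max_pages: int = 0,
    today: Optional[date] = None,
) -> dict:
    """Hypothetical helper: build an Actor input dict from the fields in the table."""
    today = today or date.today()
    end = end or today                            # documented default: today
    start = start or (today - timedelta(days=7))  # documented default: 7 days ago
    if start > end:
        raise ValueError("startDate must not be after endDate")

    actor_input = {
        "startDate": start.strftime("%d/%m/%Y"),  # the Actor expects DD/MM/YYYY
        "endDate": end.strftime("%d/%m/%Y"),
        "maxPages": max_pages,                    # 0 = unlimited
    }
    if cnpj is not None:
        # The field takes digits only, so strip punctuation such as "." "/" "-".
        digits = "".join(ch for ch in cnpj if ch.isdigit())
        if len(digits) != 14:  # assumption: traditional 14-digit numeric CNPJ
            raise ValueError("expected a 14-digit CNPJ")
        actor_input["cnpj"] = digits
    if nome_produto:
        actor_input["nomeProduto"] = nome_produto
    return actor_input
```

Calling `build_input(cnpj="00.000.000/0001-00")` would produce an input covering the last seven days for that holder, with the CNPJ reduced to digits.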

Input examples

Fetch a single listing page for a date range:

{
  "startDate": "01/04/2025",
  "endDate": "30/04/2025",
  "maxPages": 1
}

Search for a specific registration holder:

{
  "startDate": "01/01/2025",
  "endDate": "30/06/2025",
  "cnpj": "00000000000100"
}

Search by product name:

{
  "nomeProduto": "paracetamol",
  "maxPages": 3
}

Output

Each dataset item is a Medicine object with the following structure:

{
  "anvisaRegistrationId": "100000001",
  "tradeName": "EXEMPLOMAX",
  "activeIngredient": "PARACETAMOL",
  "referenceMedicine": "TYLENOL",
  "atcCodes": ["N02BE01"],
  "therapeuticClasses": ["ANALGÉSICOS"],
  "regulatoryCategory": "Genérico",
  "registrationHolder": {
    "legalName": "PHARMA EXEMPLO LTDA.",
    "cnpj": "00000000000100",
    "authorizationNumber": "1000001"
  },
  "approvalDate": "2025-04-28",
  "expiryDate": "2035-04-28",
  "processNumber": "25351000000202500",
  "presentations": [
    {
      "registrationId": "1000000010010",
      "description": "500 MG COM CT BL AL PLAS INC X 20",
      "pharmaceuticalForms": ["COMPRIMIDO SIMPLES"],
      "routesOfAdministration": ["ORAL"],
      "destinations": ["Comercial"],
      "publicationDate": "2025-01-15",
      "validity": "54",
      "manufacturers": [
        {
          "name": "FÁBRICA EXEMPLO S.A.",
          "address": "RUA DAS INDÚSTRIAS, 123 - CIDADE/SP",
          "country": "BRASIL",
          "manufacturingStage": "FABRICAÇÃO DO PRODUTO TERMINADO",
          "uniqueCode": "X000001"
        }
      ]
    }
  ]
}

Output fields

Field                 Type         Description
--------------------  -----------  -----------
anvisaRegistrationId  string       ANVISA registration number (up to 13 digits)
tradeName             string       Trade name (brand)
activeIngredient      string       Active pharmaceutical ingredient (API)
referenceMedicine     string|null  Reference (innovator) medicine
atcCodes              string[]     ATC codes (WHO classification)
therapeuticClasses    string[]     ANVISA therapeutic classes
regulatoryCategory    string       Regulatory category (Generic, Similar, New, Biological)
registrationHolder    object       Registration holder company (legalName, cnpj, authorizationNumber)
approvalDate          string       Registration/approval date (YYYY-MM-DD)
expiryDate            string       Registration expiry date (YYYY-MM-DD)
processNumber         string       ANVISA administrative process number
presentations         array        Commercial presentations (dosage, packaging, manufacturers)

Each presentation contains:

Field                   Type      Description
----------------------  --------  -----------
registrationId          string    Presentation registration code
description             string    Full description (dosage + form + packaging)
pharmaceuticalForms     string[]  Pharmaceutical forms
routesOfAdministration  string[]  Approved routes of administration
destinations            string[]  Commercial destination (Commercial, Hospital, etc.)
publicationDate         string    Official Gazette publication date (YYYY-MM-DD)
validity                string    Shelf life in months
manufacturers           array     Manufacturers (domestic and international, unified)

Each manufacturer contains:

Field               Type    Description
------------------  ------  -----------
name                string  Legal name
address             string  Manufacturing plant address
country             string  Country (BRASIL for domestic)
manufacturingStage  string  Manufacturing process stage
uniqueCode          string  Unique code in the ANVISA system
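
Because manufacturers are nested inside presentations, which are nested inside each Medicine item, tabular tools often need the data flattened first. The sketch below shows one way to do that in Python, assuming items shaped like the example above. `manufacturer_rows` and the output column names `presentationId` and `shelfLifeMonths` are hypothetical, not part of the Actor.

```python
from datetime import date
from typing import Iterator


def manufacturer_rows(medicine: dict) -> Iterator[dict]:
    """Yield one flat row per (presentation, manufacturer) pair of a Medicine item."""
    for presentation in medicine.get("presentations", []):
        for maker in presentation.get("manufacturers", []):
            yield {
                "anvisaRegistrationId": medicine["anvisaRegistrationId"],
                "tradeName": medicine["tradeName"],
                "presentationId": presentation["registrationId"],
                # validity is documented as a string holding a number of months
                "shelfLifeMonths": int(presentation["validity"]),
                # output dates are documented as YYYY-MM-DD, which is ISO 8601
                "publicationDate": date.fromisoformat(presentation["publicationDate"]),
                "manufacturer": maker["name"],
                "country": maker["country"],
                "manufacturingStage": maker["manufacturingStage"],
            }


# Trimmed copy of the example item from the Output section.
sample = {
    "anvisaRegistrationId": "100000001",
    "tradeName": "EXEMPLOMAX",
    "presentations": [
        {
            "registrationId": "1000000010010",
            "publicationDate": "2025-01-15",
            "validity": "54",
            "manufacturers": [
                {
                    "name": "FÁBRICA EXEMPLO S.A.",
                    "country": "BRASIL",
                    "manufacturingStage": "FABRICAÇÃO DO PRODUTO TERMINADO",
                }
            ],
        }
    ],
}

rows = list(manufacturer_rows(sample))
```

Note that output dates use YYYY-MM-DD, while the input fields `startDate` and `endDate` use DD/MM/YYYY.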

How it works

┌─────────────────────────────────────────────────────────┐
│ 1. Playwright navigates to the ANVISA SPA               │
│    → WAF (Cloudflare/Dynatrace) resolved                │
│    → Session cookies captured by the browser            │
├─────────────────────────────────────────────────────────┤
│ 2. DEFAULT route: Paginated listing API                 │
│    → fetch() via browser context (with WAF cookies)     │
│    → Filters out NOTIFICADO (notified-only) products    │
│    → Enqueues DETAIL for each REGISTERED product        │
│    → Next page if available                             │
├─────────────────────────────────────────────────────────┤
│ 3. DETAIL route: Per-product detail API                 │
│    → fetch() via browser context (with WAF cookies)     │
│    → Maps JSON to Medicine structure                    │
│    → Saves to Dataset via pushData()                    │
└─────────────────────────────────────────────────────────┘

The scraper does not rely on CSS selectors — it intercepts JSON responses from the internal API that ANVISA's own Angular SPA consumes. This makes extraction resilient to visual changes in the portal.
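
The decision made by the DEFAULT route on each listing page (which products go to DETAIL, and whether another page follows) can be sketched as a pure function. This is an illustration only, written in Python rather than the Actor's own code: the item keys `id` and `status`, the `REGISTRADO` value, and the `total_pages` argument are assumptions, since the internal API payload is not documented here.

```python
from typing import List, Optional, Tuple


def plan_listing_page(
    items: List[dict],
    page: int,
    total_pages: int,
    max_pages: int = 0,
) -> Tuple[List[str], Optional[int]]:
    """Decide what to enqueue after one listing page (page numbers are 1-based).

    `items` uses hypothetical keys ("id", "status"); the real payload may differ.
    Returns the product ids to send to the DETAIL route and the next page
    number, or None when pagination should stop.
    """
    # Drop notified-only products; everything else gets a DETAIL request.
    detail_ids = [item["id"] for item in items if item["status"] != "NOTIFICADO"]

    has_more = page < total_pages
    within_limit = max_pages == 0 or page < max_pages  # maxPages = 0 means unlimited
    next_page = page + 1 if has_more and within_limit else None
    return detail_ids, next_page
```

With `max_pages=1`, the function returns no next page after the first listing page, which matches the first input example.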

🔌 Integration & API

You can easily integrate this Actor into your own data pipelines, backend applications, or BI tools using the Apify API.

Starting the Actor via REST API

Trigger a run by sending a POST request to the Apify API, passing your parameters in the JSON body:

curl "https://api.apify.com/v2/acts/YOUR_USERNAME~anvisa-raw-material-scraper/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "startDate": "01/04/2025",
    "endDate": "30/04/2025",
    "maxPages": 1
  }'

Note: Replace YOUR_USERNAME~anvisa-raw-material-scraper with your actual Actor ID and provide your Apify API Token.
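
The same request can be built from Python with only the standard library. This is a minimal sketch mirroring the curl call above; `build_run_request` is a hypothetical helper name, and the Actor ID and token values are placeholders you must supply.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode


def build_run_request(actor_id: str, token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts an Actor run."""
    url = (
        "https://api.apify.com/v2/acts/"
        f"{quote(actor_id, safe='~')}/runs?{urlencode({'token': token})}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),  # JSON body = Actor input
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request(
    "YOUR_USERNAME~YOUR_ACTOR_NAME",  # placeholder Actor ID
    "YOUR_API_TOKEN",                 # placeholder token
    {"startDate": "01/04/2025", "endDate": "30/04/2025", "maxPages": 1},
)
# To actually start the run: urllib.request.urlopen(request)
```

Sending the request with `urllib.request.urlopen(request)` should return a JSON document describing the run; check the Apify API documentation for the exact response fields, such as the ID of the run's default dataset.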

Fetching the Results

Once the run finishes, download the extracted data directly from the run's dataset. The format query parameter selects the output format (for example json, csv, or xlsx for Excel):

curl "https://api.apify.com/v2/datasets/DATASET_ID/items?format=json"
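
For scripts that download results in more than one format, the URL can be assembled by a small helper. This is a sketch under stated assumptions: `dataset_items_url` is a hypothetical name, the format list is limited to the three formats mentioned above, and the optional `token` parameter is included only for cases where your dataset requires authentication.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Formats named in this README: JSON, CSV, and Excel (xlsx).
FORMATS = ("json", "csv", "xlsx")


def dataset_items_url(dataset_id: str, fmt: str = "json", token: Optional[str] = None) -> str:
    """Return the dataset items URL for the given dataset ID and output format."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    params = {"format": fmt}
    if token:
        params["token"] = token
    return f"https://api.apify.com/v2/datasets/{quote(dataset_id, safe='')}/items?{urlencode(params)}"
```

The resulting URL can be passed to curl, `urllib.request.urlopen`, or a BI tool that accepts a CSV or Excel URL.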

For more details on integrating Apify Actors via Node.js, Python, or REST, refer to the official Apify API documentation.

Tech stack

Limitations and considerations

  • Rate limiting: The crawler runs with a max concurrency of 3 to be respectful to ANVISA's servers
  • WAF: In rare cases, the WAF may require manual CAPTCHA solving. The automatic retry (3 attempts) usually handles it
  • Proxy: In production on Apify, using a proxy is recommended to avoid IP-based blocks. The Actor attempts to configure a proxy automatically and works without one if no credits are available
  • Data volume: Each listing page contains 10 products. For long date ranges, the volume can be large — use maxPages to limit during testing.
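
To size a run before launching it, the page size can be combined with the maxRequestsPerCrawl limit. The arithmetic below is a rough sketch that assumes each listing page and each product detail lookup counts as one request against that limit; the README does not state this, so treat the numbers as an upper-bound estimate rather than the Actor's actual accounting.

```python
PRODUCTS_PER_PAGE = 10  # documented: each listing page contains 10 products


def estimate_requests(pages: int) -> int:
    """Upper bound: one listing request per page plus one detail request per product."""
    return pages * (1 + PRODUCTS_PER_PAGE)


def max_full_pages(max_requests: int = 1000) -> int:
    """Number of complete listing pages that fit under the request limit."""
    return max_requests // (1 + PRODUCTS_PER_PAGE)
```

Under these assumptions, three listing pages need at most 33 requests, and the default limit of 1000 covers about 90 full pages (roughly 900 products), so longer date ranges may need a higher maxRequestsPerCrawl.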

License

ISC