ANVISA Medicine Scraper
Pricing
from $1.00 / 1,000 results
Extracts complete ANVISA medicine data (presentations, manufacturers, ATC). Uses Playwright to automatically bypass Cloudflare/Dynatrace WAFs.
Developer
David Mendonça
Maintained by Community
ANVISA Medicine Scraper 💊
Extracts complete data on registered medicines from ANVISA (Brazil's National Health Surveillance Agency), including commercial presentations, domestic and international manufacturers, ATC classification, therapeutic class, and registration holder details.
Why this Actor?
ANVISA's consultation portal is protected by a WAF (Cloudflare + Dynatrace), which blocks direct HTTP calls to the API — any request without valid session cookies gets a 403 Forbidden.
This Actor solves the problem using Playwright (a real headless browser) that:
- Resolves WAF challenges automatically — Cloudflare and Dynatrace are handled as part of normal browser navigation
- Intercepts JSON responses from the internal API — more robust than CSS selector scraping, won't break if the UI changes
- Returns complete, structured data — same depth of data available in each medicine's detail panel
Input
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `startDate` | string | No | 7 days ago | Start date of the publication period (DD/MM/YYYY) |
| `endDate` | string | No | Today | End date of the publication period (DD/MM/YYYY) |
| `cnpj` | string | No | (none) | Registration holder's CNPJ (digits only). Filters by holder |
| `nomeProduto` | string | No | (none) | Product name text search (partial match supported) |
| `maxPages` | integer | No | 0 | Listing page limit (0 = unlimited, each page = 10 products) |
| `maxRequestsPerCrawl` | integer | No | 1000 | Safety limit for HTTP requests per run |
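The defaults in the table can be read as a small resolution step applied before the crawl starts. The following is a minimal sketch under stated assumptions, not the Actor's actual source: the `ActorInput` interface mirrors the field names above, while `formatDate` and `resolveInput` are hypothetical helpers introduced only for illustration, and the use of UTC for "today" is an assumption.

```typescript
// Input shape mirroring the table above; every field is optional.
interface ActorInput {
  startDate?: string; // DD/MM/YYYY
  endDate?: string; // DD/MM/YYYY
  cnpj?: string; // digits only
  nomeProduto?: string;
  maxPages?: number; // 0 = unlimited
  maxRequestsPerCrawl?: number;
}

// Hypothetical helper: formats a Date as DD/MM/YYYY using its UTC fields.
function formatDate(date: Date): string {
  const pad = (n: number): string => (n < 10 ? "0" + n : String(n));
  return pad(date.getUTCDate()) + "/" + pad(date.getUTCMonth() + 1) + "/" + date.getUTCFullYear();
}

// Hypothetical helper: fills in the documented defaults for missing fields.
function resolveInput(input: ActorInput, now: Date = new Date()): ActorInput {
  const sevenDaysAgo = new Date(now.getTime() - 7 * 24 * 60 * 60 * 1000);
  return {
    ...input,
    startDate: input.startDate ?? formatDate(sevenDaysAgo),
    endDate: input.endDate ?? formatDate(now),
    maxPages: input.maxPages ?? 0,
    maxRequestsPerCrawl: input.maxRequestsPerCrawl ?? 1000,
  };
}
```

Passing `now` explicitly keeps the helper deterministic, which makes the 7-day default easy to check in a unit test.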
Input examples
{"startDate": "01/04/2025","endDate": "30/04/2025","maxPages": 1}
Search for a specific registration holder:
{"startDate": "01/01/2025","endDate": "30/06/2025","cnpj": "00000000000100"}
Search by product name:
{"nomeProduto": "paracetamol","maxPages": 3}
Output
Each dataset item is a Medicine object with the following structure:
{
  "anvisaRegistrationId": "100000001",
  "tradeName": "EXEMPLOMAX",
  "activeIngredient": "PARACETAMOL",
  "referenceMedicine": "TYLENOL",
  "atcCodes": ["N02BE01"],
  "therapeuticClasses": ["ANALGÉSICOS"],
  "regulatoryCategory": "Genérico",
  "registrationHolder": {
    "legalName": "PHARMA EXEMPLO LTDA.",
    "cnpj": "00000000000100",
    "authorizationNumber": "1000001"
  },
  "approvalDate": "2025-04-28",
  "expiryDate": "2035-04-28",
  "processNumber": "25351000000202500",
  "presentations": [
    {
      "registrationId": "1000000010010",
      "description": "500 MG COM CT BL AL PLAS INC X 20",
      "pharmaceuticalForms": ["COMPRIMIDO SIMPLES"],
      "routesOfAdministration": ["ORAL"],
      "destinations": ["Comercial"],
      "publicationDate": "2025-01-15",
      "validity": "54",
      "manufacturers": [
        {
          "name": "FÁBRICA EXEMPLO S.A.",
          "address": "RUA DAS INDÚSTRIAS, 123 - CIDADE/SP",
          "country": "BRASIL",
          "manufacturingStage": "FABRICAÇÃO DO PRODUTO TERMINADO",
          "uniqueCode": "X000001"
        }
      ]
    }
  ]
}
Output fields
| Field | Type | Description |
|---|---|---|
| `anvisaRegistrationId` | string | ANVISA registration number (up to 13 digits) |
| `tradeName` | string | Trade name (brand) |
| `activeIngredient` | string | Active pharmaceutical ingredient (API) |
| `referenceMedicine` | string \| null | Reference (innovator) medicine |
| `atcCodes` | string[] | ATC codes (WHO classification) |
| `therapeuticClasses` | string[] | ANVISA therapeutic classes |
| `regulatoryCategory` | string | Regulatory category (Generic, Similar, New, Biological) |
| `registrationHolder` | object | Registration holder company (`legalName`, `cnpj`, `authorizationNumber`) |
| `approvalDate` | string | Registration/approval date (YYYY-MM-DD) |
| `expiryDate` | string | Registration expiry date (YYYY-MM-DD) |
| `processNumber` | string | ANVISA administrative process number |
| `presentations` | array | Commercial presentations (dosage, packaging, manufacturers) |
Each presentation contains:
| Field | Type | Description |
|---|---|---|
| `registrationId` | string | Presentation registration code |
| `description` | string | Full description (dosage + form + packaging) |
| `pharmaceuticalForms` | string[] | Pharmaceutical forms |
| `routesOfAdministration` | string[] | Approved routes of administration |
| `destinations` | string[] | Commercial destination (Commercial, Hospital, etc.) |
| `publicationDate` | string | Official Gazette publication date (YYYY-MM-DD) |
| `validity` | string | Shelf life in months |
| `manufacturers` | array | Manufacturers (domestic and international, unified) |
Each manufacturer contains:
| Field | Type | Description |
|---|---|---|
| `name` | string | Legal name |
| `address` | string | Manufacturing plant address |
| `country` | string | Country (BRASIL for domestic) |
| `manufacturingStage` | string | Manufacturing process stage |
| `uniqueCode` | string | Unique code in the ANVISA system |
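For consumers working in TypeScript, the three tables above translate directly into types. The interfaces below are a sketch derived only from the documented fields, not types exported by the Actor. `ManufacturerRow` and `flattenManufacturers` are hypothetical names, added to show one way the nested structure could be flattened into one row per presentation and manufacturer for CSV or BI use.

```typescript
interface Manufacturer {
  name: string;
  address: string;
  country: string; // "BRASIL" for domestic plants
  manufacturingStage: string;
  uniqueCode: string;
}

interface Presentation {
  registrationId: string;
  description: string;
  pharmaceuticalForms: string[];
  routesOfAdministration: string[];
  destinations: string[];
  publicationDate: string; // YYYY-MM-DD
  validity: string; // shelf life in months
  manufacturers: Manufacturer[];
}

interface Medicine {
  anvisaRegistrationId: string;
  tradeName: string;
  activeIngredient: string;
  referenceMedicine: string | null;
  atcCodes: string[];
  therapeuticClasses: string[];
  regulatoryCategory: string;
  registrationHolder: { legalName: string; cnpj: string; authorizationNumber: string };
  approvalDate: string; // YYYY-MM-DD
  expiryDate: string; // YYYY-MM-DD
  processNumber: string;
  presentations: Presentation[];
}

// Hypothetical flat row: one entry per (presentation, manufacturer) pair.
interface ManufacturerRow {
  anvisaRegistrationId: string;
  tradeName: string;
  presentationId: string;
  manufacturerName: string;
  country: string;
}

// Hypothetical helper: flattens the nested presentations/manufacturers arrays.
function flattenManufacturers(medicine: Medicine): ManufacturerRow[] {
  const rows: ManufacturerRow[] = [];
  for (const presentation of medicine.presentations) {
    for (const manufacturer of presentation.manufacturers) {
      rows.push({
        anvisaRegistrationId: medicine.anvisaRegistrationId,
        tradeName: medicine.tradeName,
        presentationId: presentation.registrationId,
        manufacturerName: manufacturer.name,
        country: manufacturer.country,
      });
    }
  }
  return rows;
}
```

A medicine with no presentations, or presentations with no manufacturers, simply yields no rows in this sketch.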
How it works
┌─────────────────────────────────────────────────────────┐
│ 1. Playwright navigates to the ANVISA SPA               │
│    → WAF (Cloudflare/Dynatrace) resolved                │
│    → Session cookies captured by the browser            │
├─────────────────────────────────────────────────────────┤
│ 2. DEFAULT route: Paginated listing API                 │
│    → fetch() via browser context (with WAF cookies)     │
│    → Filters out NOTIFICADO (notified-only) products    │
│    → Enqueues DETAIL for each REGISTERED product        │
│    → Next page if available                             │
├─────────────────────────────────────────────────────────┤
│ 3. DETAIL route: Per-product detail API                 │
│    → fetch() via browser context (with WAF cookies)     │
│    → Maps JSON to Medicine structure                    │
│    → Saves to Dataset via pushData()                    │
└─────────────────────────────────────────────────────────┘
The scraper does not rely on CSS selectors — it intercepts JSON responses from the internal API that ANVISA's own Angular SPA consumes. This makes extraction resilient to visual changes in the portal.
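The decision logic of the DEFAULT route (step 2) can be sketched as a pure function, which keeps it testable without a browser. Everything here is illustrative: the real listing API's field names are not documented in this README, so `ListingItem`, `ListingPage`, `ListingPlan`, and `planFromListing` are hypothetical, and the interaction between `maxPages` and pagination is an assumption based on the input table.

```typescript
// Hypothetical shapes; the actual listing API response may differ.
interface ListingItem {
  id: string; // identifier used to request the product's detail
  status: string; // e.g. "REGISTERED" or "NOTIFICADO"
}

interface ListingPage {
  items: ListingItem[];
  hasNextPage: boolean;
}

interface ListingPlan {
  detailRequests: string[]; // ids to enqueue on the DETAIL route
  nextPage: number | null; // next listing page to enqueue, if any
}

// Decides what to enqueue after one listing page has been fetched.
// maxPages = 0 means no page limit; pageNumber is 1-based.
function planFromListing(listing: ListingPage, pageNumber: number, maxPages: number): ListingPlan {
  // Notified-only products are skipped; the rest get a detail request.
  const detailRequests = listing.items
    .filter((item) => item.status !== "NOTIFICADO")
    .map((item) => item.id);

  const underLimit = maxPages === 0 || pageNumber < maxPages;
  const nextPage = listing.hasNextPage && underLimit ? pageNumber + 1 : null;

  return { detailRequests, nextPage };
}
```

In a Crawlee-based Actor, the returned ids and page number would typically be turned into requests on the crawler's queue; that wiring is omitted here.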
🔌 Integration & API
You can easily integrate this Actor into your own data pipelines, backend applications, or BI tools using the Apify API.
Starting the Actor via REST API
Trigger a run by sending a POST request to the Apify API, passing your parameters in the JSON body:
curl "https://api.apify.com/v2/acts/YOUR_USERNAME~anvisa-raw-material-scraper/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"startDate": "01/04/2025","endDate": "30/04/2025","maxPages": 1}'
Note: Replace `YOUR_USERNAME~anvisa-raw-material-scraper` with your actual Actor ID and provide your Apify API token.
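The same request can be assembled in TypeScript. This is a minimal sketch that only builds the request description from the endpoint shown in the curl example. `RunRequest` and `buildRunRequest` are hypothetical names, and the Actor ID and token are placeholders you would supply.

```typescript
interface RunRequest {
  url: string;
  method: "POST";
  headers: { "Content-Type": string };
  body: string;
}

// Hypothetical helper: builds the POST request that starts an Actor run.
function buildRunRequest(actorId: string, token: string, input: object): RunRequest {
  const url =
    "https://api.apify.com/v2/acts/" +
    encodeURIComponent(actorId) +
    "/runs?token=" +
    encodeURIComponent(token);
  return {
    url,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(input),
  };
}
```

In an environment with a global `fetch` (for example Node.js 18 or later), the result could be sent with `fetch(request.url, request)`.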
Fetching the Results
Once the run finishes, download the extracted data (in JSON, CSV, or Excel format) directly from the run's dataset:
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?format=json"
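To switch between the export formats mentioned above, only the `format` query parameter changes. The sketch below builds the URL for each. `DatasetFormat` and `buildDatasetItemsUrl` are hypothetical names, and `DATASET_ID` stands for the dataset ID of your run.

```typescript
// Formats named in this README; "xlsx" is the value for Excel exports.
type DatasetFormat = "json" | "csv" | "xlsx";

// Hypothetical helper: builds the dataset items URL for a given format.
function buildDatasetItemsUrl(datasetId: string, format: DatasetFormat = "json"): string {
  return (
    "https://api.apify.com/v2/datasets/" +
    encodeURIComponent(datasetId) +
    "/items?format=" +
    format
  );
}
```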
For more details on integrating Apify Actors via Node.js, Python, or REST, refer to the official Apify API documentation.
Tech stack
- Crawlee — Web scraping framework
- Playwright — Browser automation
- Apify SDK — Actor platform
- TypeScript — Strict typing
Limitations and considerations
- Rate limiting: The crawler runs with a max concurrency of 3 to be respectful to ANVISA's servers
- WAF: In rare cases, the WAF serves a CAPTCHA that would otherwise need manual solving. The automatic retry (3 attempts) usually gets past it without intervention
- Proxy: In production on Apify, using a proxy is recommended to avoid IP-based blocks. The Actor attempts to configure a proxy automatically and falls back to running without one if no proxy credits are available
- Data volume: Each listing page contains 10 products. For long date ranges, the volume can be large; use `maxPages` to limit it during testing.
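The limits above can be collected into one options object. This is a sketch, assuming the Actor passes them to Crawlee under its standard option names (`maxConcurrency`, `maxRequestRetries`, `maxRequestsPerCrawl`). Whether "3 attempts" means three retries or three total tries is not stated here, so the retry value is an assumption. `CrawlerLimits`, `buildCrawlerLimits`, and `maxListedProducts` are hypothetical names.

```typescript
interface CrawlerLimits {
  maxConcurrency: number;
  maxRequestRetries: number;
  maxRequestsPerCrawl: number;
}

// Hypothetical helper: gathers the documented limits into one object.
function buildCrawlerLimits(maxRequestsPerCrawl: number = 1000): CrawlerLimits {
  return {
    maxConcurrency: 3, // documented concurrency cap
    maxRequestRetries: 3, // assumption: "3 attempts" treated as 3 retries
    maxRequestsPerCrawl,
  };
}

// Hypothetical helper: upper bound on listed products for a page limit.
// Returns null when maxPages is 0 (unlimited). Notified-only products are
// filtered out later, so the number of saved items can be lower.
function maxListedProducts(maxPages: number): number | null {
  const productsPerPage = 10;
  return maxPages === 0 ? null : maxPages * productsPerPage;
}
```

For a test run with `maxPages` set to 3, this bound works out to at most 30 listed products.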
License
ISC