Bnm Amlcft Scraper
Pricing
Pay per usage
Developer: Htet Aung Shine
Bank Negara Malaysia AML/CFT Compliance Scraper
An Apify actor that scrapes Bank Negara Malaysia's (BNM) Anti-Money Laundering and Counter Financing of Terrorism (AML/CFT) regulatory documents, downloads PDF files, and extracts compliance policies for fintech companies.
Features
- Automated Web Scraping: Crawls BNM's AML/CFT pages to find all regulatory documents
- PDF Download & Processing: Downloads and processes PDF documents automatically
- Text Extraction: Extracts full text content from PDFs using pdf-parse
- Compliance Categorization: Automatically categorizes content into compliance areas:
  - AML (Anti-Money Laundering)
  - CFT (Counter Financing of Terrorism)
  - KYC (Know Your Customer)
  - CDD (Customer Due Diligence)
  - STR (Suspicious Transaction Reporting)
  - RBA (Risk-Based Approach)
  - SANCTIONS (Sanctions Compliance)
  - PEP (Politically Exposed Persons)
  - RECORD_KEEPING
  - TRAINING
  - GOVERNANCE
- Importance Assessment: Rates compliance sections by importance (high/medium/low)
- Regulatory Reference Extraction: Identifies regulatory references and citations
- Comprehensive Reporting: Generates detailed scraping statistics and compliance summaries
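The importance rating and reference extraction listed above could work roughly as follows. This is only a minimal sketch: the cue words, the citation pattern, and the function names `rateImportance` and `extractReferences` are assumptions made for illustration, not the actor's documented rules.

```typescript
type Importance = 'high' | 'medium' | 'low';

// Hypothetical cue words; the actor's real heuristics are not documented here.
const HIGH_CUES = ['must', 'shall', 'prohibited', 'mandatory'];
const MEDIUM_CUES = ['should', 'expected', 'recommended'];

// True when `word` appears in `text` as a whole word, ignoring case.
function containsWord(text: string, word: string): boolean {
  return new RegExp(`\\b${word}\\b`, 'i').test(text);
}

// Rate a section: obligation wording outranks advisory wording.
function rateImportance(text: string): Importance {
  if (HIGH_CUES.some((w) => containsWord(text, w))) return 'high';
  if (MEDIUM_CUES.some((w) => containsWord(text, w))) return 'medium';
  return 'low';
}

// Assumed citation shape: "Section 14A" or "Paragraph 10.2".
const REFERENCE_PATTERN = /\b(?:Section|Paragraph)\s+\d+(?:\.\d+)*[A-Z]?\b/g;

// Collect distinct citations in order of first appearance.
function extractReferences(text: string): string[] {
  const matches: string[] = text.match(REFERENCE_PATTERN) || [];
  return matches.filter((m, i) => matches.indexOf(m) === i);
}
```

A rule-based approach like this is easy to audit, which matters for compliance work, but it will miss citations written in other styles.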
Input Configuration
    {
      "startUrls": [{ "url": "https://www.bnm.gov.my/amlcft" }],
      "maxPdfsToDownload": 0,
      "extractFullText": true,
      "followLinks": true,
      "maxCrawlDepth": 2,
      "pdfKeywords": []
    }
Input Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| startUrls | Array | BNM AML/CFT page | List of URLs to start scraping from |
| maxPdfsToDownload | Integer | 0 (unlimited) | Maximum number of PDFs to download |
| extractFullText | Boolean | true | Whether to extract full text from PDFs |
| followLinks | Boolean | true | Whether to follow links to sub-pages |
| maxCrawlDepth | Integer | 2 | Maximum depth of links to follow |
| pdfKeywords | Array | [] | Filter PDFs by keywords in URL/link text |
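To make the defaults and the two less obvious parameters concrete, here is a minimal sketch of how the input might be resolved and applied. The helper names (`resolveInput`, `canDownloadMore`, `matchesKeywords`) are hypothetical, and the case-insensitive substring match is an assumption about how `pdfKeywords` filtering behaves.

```typescript
interface StartUrl {
  url: string;
}

interface ActorInput {
  startUrls: StartUrl[];
  maxPdfsToDownload: number;
  extractFullText: boolean;
  followLinks: boolean;
  maxCrawlDepth: number;
  pdfKeywords: string[];
}

// Defaults taken from the parameter table above.
const DEFAULT_INPUT: ActorInput = {
  startUrls: [{ url: 'https://www.bnm.gov.my/amlcft' }],
  maxPdfsToDownload: 0,
  extractFullText: true,
  followLinks: true,
  maxCrawlDepth: 2,
  pdfKeywords: [],
};

// Fill in any parameter the caller left out.
function resolveInput(partial: Partial<ActorInput>): ActorInput {
  return { ...DEFAULT_INPUT, ...partial };
}

// maxPdfsToDownload = 0 means "no limit".
function canDownloadMore(downloaded: number, input: ActorInput): boolean {
  return input.maxPdfsToDownload === 0 || downloaded < input.maxPdfsToDownload;
}

// An empty keyword list accepts every PDF; otherwise at least one keyword
// must appear in the URL or the link text (assumed: case-insensitive).
function matchesKeywords(url: string, linkText: string, keywords: string[]): boolean {
  if (keywords.length === 0) return true;
  const haystack = `${url} ${linkText}`.toLowerCase();
  return keywords.some((k) => haystack.indexOf(k.toLowerCase()) !== -1);
}
```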
Output
Dataset Output
Each PDF document is saved to the dataset with the following structure:
    {
      id: string;              // Unique document identifier
      filename: string;        // Original PDF filename
      sourceUrl: string;       // URL where PDF was downloaded from
      foundOnPage: string;     // Page where the PDF link was found
      linkText: string;        // Text of the download link
      title: string;           // Document title
      fileSize: number;        // File size in bytes
      scrapedAt: string;       // ISO timestamp of scraping
      lastModified: string;    // Last modified date from server
      pageCount: number;       // Number of pages in PDF
      fullText: string;        // Extracted text content
      complianceSections: [{
        title: string;         // Section title
        content: string;       // Section content
        category: string;      // Compliance category
        importance: string;    // high/medium/low
        references: string[];  // Regulatory references
      }];
      metadata: {
        author: string;
        creator: string;
        producer: string;
        creationDate: string;
        modificationDate: string;
        keywords: string;
        subject: string;
      };
      status: string;          // success/partial/failed
      error?: string;          // Error message if failed
    }
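As an illustration of consuming these records, the sketch below collects every high-importance section across a batch of dataset items. It models only the fields it needs, and `highImportanceSections` and `SectionHit` are hypothetical names introduced for this example. Skipping `failed` records is an assumption about what such records contain.

```typescript
interface ComplianceSection {
  title: string;
  content: string;
  category: string;
  importance: string; // high/medium/low
  references: string[];
}

// Subset of the dataset record shown above.
interface ScrapedDocument {
  id: string;
  filename: string;
  status: string; // success/partial/failed
  complianceSections: ComplianceSection[];
}

interface SectionHit {
  documentId: string;
  filename: string;
  section: ComplianceSection;
}

// Walk all usable documents and keep sections rated "high".
function highImportanceSections(docs: ScrapedDocument[]): SectionHit[] {
  const hits: SectionHit[] = [];
  for (const doc of docs) {
    if (doc.status === 'failed') continue; // assumed: failed records have no usable sections
    for (const section of doc.complianceSections) {
      if (section.importance === 'high') {
        hits.push({ documentId: doc.id, filename: doc.filename, section });
      }
    }
  }
  return hits;
}
```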
Key-Value Store Output
- SCRAPING_STATS: Scraping statistics
- FINAL_REPORT: Comprehensive final report with compliance summary
- PDF_{id}: Raw PDF files (binary)
- OUTPUT: Actor output summary
Local Development
Prerequisites
- Node.js 18+
- npm or yarn
Setup
    # Clone the repository
    cd apify-actor-bnm-amlcft

    # Install dependencies
    npm install

    # Build TypeScript
    npm run build

    # Run locally
    npm start
Running with Apify CLI
    # Install Apify CLI
    npm install -g apify-cli

    # Login to Apify
    apify login

    # Run the actor locally
    apify run

    # Push to Apify platform
    apify push
Usage Example
Basic Usage
    import Apify from 'apify';

    const run = await Apify.call('your-username/bnm-amlcft-scraper', {
      startUrls: [{ url: 'https://www.bnm.gov.my/amlcft' }],
      maxPdfsToDownload: 10,
      extractFullText: true,
    });

    console.log('Scraping results:', run.output);
Filtering by Keywords
    const run = await Apify.call('your-username/bnm-amlcft-scraper', {
      pdfKeywords: ['guideline', 'circular', 'policy'],
      maxCrawlDepth: 3,
    });
Integration with Veris Platform
This actor is designed to work with the Veris AI Compliance Analysis platform:
- Schedule Regular Runs: Set up scheduled runs to check for new regulatory documents
- Webhook Integration: Configure webhooks to notify the platform when new documents are found
- API Access: Use the Apify API to fetch results programmatically
- Dataset Export: Export datasets in JSON/CSV format for analysis
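For scheduled runs, the consuming side needs a way to tell which documents are new since the previous run. A minimal sketch, assuming the `id` field stays stable for the same PDF across runs (the source only says it is unique) and using the hypothetical helper `findNewDocuments`:

```typescript
// Subset of the dataset record used for change detection.
interface DocumentSummary {
  id: string;
  title: string;
  sourceUrl: string;
}

// Return the documents from the latest run whose id was not seen before.
function findNewDocuments(knownIds: string[], latest: DocumentSummary[]): DocumentSummary[] {
  const known = new Set<string>(knownIds);
  return latest.filter((doc) => !known.has(doc.id));
}
```

The caller would persist the ids it has already processed and pass them in as `knownIds` on the next run; only the returned documents then need to be forwarded to the platform.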
Example Integration Code
    import { ApifyClient } from 'apify-client';

    const client = new ApifyClient({
      token: 'YOUR_APIFY_TOKEN',
    });

    // Run the actor
    const run = await client.actor('your-username/bnm-amlcft-scraper').call({
      maxPdfsToDownload: 50,
    });

    // Get results
    const { items } = await client.dataset(run.defaultDatasetId).listItems();

    // Process compliance documents
    for (const doc of items) {
      await processComplianceDocument(doc);
    }
Architecture
    ┌─────────────────────────────────────────────────────────────┐
    │                         Apify Actor                         │
    ├─────────────────────────────────────────────────────────────┤
    │  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐      │
    │  │   main.ts   │───▶│ scraper.ts  │───▶│pdf-extractor│      │
    │  │   (Entry)   │    │  (Crawler)  │    │     .ts     │      │
    │  └─────────────┘    └─────────────┘    └─────────────┘      │
    │         │                  │                  │             │
    │         ▼                  ▼                  ▼             │
    │  ┌───────────────────────────────────────────────────┐      │
    │  │                     types.ts                      │      │
    │  │                (Type Definitions)                 │      │
    │  └───────────────────────────────────────────────────┘      │
    ├─────────────────────────────────────────────────────────────┤
    │                           Outputs                           │
    │  ┌───────────┐  ┌───────────┐  ┌───────────────────────┐    │
    │  │  Dataset  │  │ Key-Value │  │     Final Report      │    │
    │  │  (JSON)   │  │   Store   │  │   (Stats + Summary)   │    │
    │  └───────────┘  └───────────┘  └───────────────────────┘    │
    └─────────────────────────────────────────────────────────────┘
Compliance Categories
The actor automatically categorizes content into these compliance areas:
| Category | Description |
|---|---|
| AML | Anti-Money Laundering provisions |
| CFT | Counter Financing of Terrorism |
| KYC | Know Your Customer requirements |
| CDD | Customer Due Diligence |
| STR | Suspicious Transaction Reporting |
| RBA | Risk-Based Approach |
| SANCTIONS | Targeted Financial Sanctions |
| PEP | Politically Exposed Persons |
| RECORD_KEEPING | Record Retention Requirements |
| TRAINING | Staff Training Requirements |
| GOVERNANCE | Internal Controls & Governance |
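One plausible way to assign these categories is keyword matching on the section text. The sketch below is an assumption about the approach, not the actor's actual classifier; the keyword lists and the `categorize` function are invented for illustration.

```typescript
type Category =
  | 'AML' | 'CFT' | 'KYC' | 'CDD' | 'STR' | 'RBA'
  | 'SANCTIONS' | 'PEP' | 'RECORD_KEEPING' | 'TRAINING' | 'GOVERNANCE';

// Hypothetical trigger phrases per category (all lowercase).
const CATEGORY_KEYWORDS: Record<Category, string[]> = {
  AML: ['money laundering'],
  CFT: ['terrorism financing', 'financing of terrorism'],
  KYC: ['know your customer'],
  CDD: ['customer due diligence'],
  STR: ['suspicious transaction'],
  RBA: ['risk-based approach'],
  SANCTIONS: ['sanctions'],
  PEP: ['politically exposed'],
  RECORD_KEEPING: ['record keeping', 'record retention'],
  TRAINING: ['training'],
  GOVERNANCE: ['internal control', 'governance'],
};

// Return every category whose phrases occur in the text, in table order.
// A section can belong to several categories or to none.
function categorize(text: string): Category[] {
  const lower = text.toLowerCase();
  return (Object.keys(CATEGORY_KEYWORDS) as Category[]).filter((category) =>
    CATEGORY_KEYWORDS[category].some((kw) => lower.indexOf(kw) !== -1),
  );
}
```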
License
MIT License - See LICENSE file for details.
Support
For issues or feature requests, please create an issue in the repository or contact the Veris team.