Obsidian Mcp Actor
Pricing
$2.00 / 1,000 results
A lightweight Obsidian MCP Actor built for fast, local note automation. It parses, indexes, and transforms vault content with zero bloat. Perfect for workflows that need speed, clean structure, and reliable processing across markdown files, tags, metadata, and linked notes.
Developer: Antony mwangi
A secure, high-performance Apify Actor that bridges Obsidian note-taking workflows with intelligent web scraping automation via the Model Context Protocol (MCP). Automatically enrich your knowledge base with structured data from any website.
What's New in v2.0
Security Hardening
- Path traversal protection: All file operations validated against vault directory
- Content size limits: Prevents OOM with 10MB maximum content size
- Input validation: Sanitized URLs and filenames with clear error categories
- Robots.txt compliance: Automatically respects robots.txt rules
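As a rough illustration of the first two protections, the sketch below resolves a vault-relative path and rejects anything that escapes the vault root, then checks content size in bytes. `resolveInVault`, `assertContentSize`, and reading "10MB" as 10 * 1024 * 1024 bytes are assumptions for illustration, not the Actor's actual code.

```javascript
const path = require("node:path");

// Hypothetical helper: resolve a vault-relative path and reject anything
// that would land outside the vault directory (e.g. "../../etc/passwd").
function resolveInVault(vaultPath, relativePath) {
  const root = path.resolve(vaultPath);
  const target = path.resolve(root, relativePath);
  // path.relative() starts with ".." (or is absolute on another drive)
  // whenever the target lies outside the root.
  const rel = path.relative(root, target);
  if (rel === "" || rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`Path escapes vault: ${relativePath}`);
  }
  return target;
}

// Assumption: "10MB" means 10 * 1024 * 1024 bytes of UTF-8 text.
const MAX_CONTENT_BYTES = 10 * 1024 * 1024;

function assertContentSize(content) {
  const bytes = Buffer.byteLength(content, "utf8");
  if (bytes > MAX_CONTENT_BYTES) {
    throw new Error(`Content too large: ${bytes} bytes`);
  }
}
```

Comparing `path.relative()` output, rather than checking the raw input for `..`, also catches absolute paths and mixed separators that a string check would miss.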
Performance & Scalability
- Parallel processing: Concurrent image downloads (3-10x faster)
- Intelligent caching: Three cache strategies (memory, disk, Apify KV)
- Exponential backoff: Smart retry logic with jitter for rate-limited sites
- Stealth mode: Enhanced Playwright evasion for anti-bot protection
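Exponential backoff with jitter can be sketched as follows. This assumes "full jitter" (a random wait between zero and a capped, doubling bound) and treats `maxRetries` as the number of extra attempts after the first; `backoffDelay`, `withRetry`, `baseMs`, and `capMs` are hypothetical names, and the Actor's actual delays are not documented here.

```javascript
// Hypothetical backoff: the bound doubles per attempt (baseMs * 2^attempt),
// is capped at capMs, and the actual wait is a random value in [0, bound).
// `random` is injectable so the delay can be tested deterministically.
function backoffDelay(attempt, { baseMs = 500, capMs = 30000, random = Math.random } = {}) {
  const bound = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * bound);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls fn(attempt) up to maxRetries + 1 times, sleeping between failures,
// and rethrows the last error if every attempt fails.
async function withRetry(fn, { maxRetries = 3, ...backoff } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < maxRetries) await sleep(backoffDelay(attempt, backoff));
    }
  }
  throw lastError;
}
```

The random component spreads retries from many concurrent requests over time, so a rate-limited site does not see them all arrive again at the same moment.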
Modern Architecture
- Service-oriented design: Modular, testable, maintainable
- Strategy pattern: Pluggable scraping engines (Cheerio, with Playwright fallback)
- Real-time monitoring: Live WebSocket progress viewer
- TypeScript support: Full type definitions for core interfaces
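The Cheerio-then-Playwright fallback maps naturally onto a list of strategies. The sketch assumes each strategy exposes an async `scrape(url)` returning `{ html }`, and that an empty page or a thrown error moves on to the next strategy; the stub objects stand in for the project's `CheerioStrategy` and `PlaywrightStrategy`, whose real interfaces may differ.

```javascript
// Hypothetical strategy interface: { name, scrape(url) => Promise<{ html }> }.
// Strategies are tried in order; the first one returning non-empty HTML wins.
async function scrapeWithFallback(url, strategies, { minLength = 1 } = {}) {
  const errors = [];
  for (const strategy of strategies) {
    try {
      const result = await strategy.scrape(url);
      if (result && typeof result.html === "string" && result.html.trim().length >= minLength) {
        return { ...result, strategy: strategy.name };
      }
      errors.push(`${strategy.name}: empty result`);
    } catch (err) {
      errors.push(`${strategy.name}: ${err.message}`);
    }
  }
  throw new Error(`All strategies failed for ${url} (${errors.join("; ")})`);
}

// Stubs for illustration: a static fetch of a JavaScript app often returns
// an empty shell, while a real browser sees the rendered content.
const cheerioStub = { name: "cheerio", scrape: async () => ({ html: "   " }) };
const playwrightStub = { name: "playwright", scrape: async () => ({ html: "<h1>Rendered</h1>" }) };
```

Trying the cheap static fetch first, and launching a browser only when it comes back empty, keeps mostly static sites fast.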
Key Features
| Feature | Description |
|---|---|
| Dual Scraping Engines | Cheerio for speed, Playwright for JavaScript-heavy sites |
| Persistent Caching | Avoid re-scraping with disk-backed cache between runs |
| Intelligent Tagging | Extract tags from content, metadata, JSON-LD, and domains |
| Auto Internal Linking | Automatically link related notes by shared tags |
| Image Handling | Download and reference images with parallel processing |
| Template Support | Configure scraping via Obsidian template files |
| Live Progress | WebSocket viewer shows real-time scraping status |
| Security First | Path traversal protection, input validation, size limits |
| MCP Integration | Expose 5 tools to Claude/LLMs for AI-driven workflows |
| Performance Metrics | Track cache hit rates, processing times, and throughput |
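Auto-linking by shared tags can be pictured as an inverted index from each tag to the notes that carry it. A minimal sketch, assuming a note links to every other note sharing at least one tag and that links use Obsidian's `[[Note Name]]` wikilink syntax; `buildTagLinks`, `toWikilinks`, and the `{ name, tags }` note shape are hypothetical.

```javascript
// Hypothetical note shape: { name: string, tags: string[] }.
// Returns a Map from note name to the sorted names of notes sharing a tag.
function buildTagLinks(notes) {
  // Inverted index: tag -> set of note names carrying that tag
  const byTag = new Map();
  for (const note of notes) {
    for (const tag of note.tags) {
      if (!byTag.has(tag)) byTag.set(tag, new Set());
      byTag.get(tag).add(note.name);
    }
  }
  const links = new Map();
  for (const note of notes) {
    const related = new Set();
    for (const tag of note.tags) {
      for (const other of byTag.get(tag)) {
        if (other !== note.name) related.add(other);
      }
    }
    links.set(note.name, [...related].sort());
  }
  return links;
}

// Render related notes as a bulleted list of Obsidian wikilinks
const toWikilinks = (names) => names.map((n) => `- [[${n}]]`).join("\n");
```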
Quick Start
Single URL Scrape
{
  "url": "https://example.com/article",
  "vaultPath": "/Users/yourname/Documents/Obsidian",
  "folderPath": "research/articles",
  "tags": ["ai", "research"],
  "autoTag": true,
  "autoLink": true
}
Bulk Import with Caching
{
  "urls": [
    "https://site1.com/post",
    "https://site2.com/guide",
    "https://site3.com/tutorial"
  ],
  "vaultPath": "/Users/yourname/Documents/Obsidian",
  "bulkMode": true,
  "usePlaywright": false,
  "cache": "disk",
  "rateLimitDelay": 2000
}
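In a bulk run, `rateLimitDelay` spaces out requests. The sketch assumes URLs are processed one at a time, cached URLs are served without waiting, and the delay applies only between real network fetches; `bulkScrape`, the `fetchPage` callback, and the `Map` used as a cache are illustrative stand-ins, not the Actor's internals.

```javascript
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Process URLs sequentially: reuse cached content, and wait rateLimitDelay ms
// before every network fetch except the first one.
async function bulkScrape(urls, fetchPage, { rateLimitDelay = 2000, cache = new Map() } = {}) {
  const results = [];
  let fetchedBefore = false;
  for (const url of urls) {
    if (cache.has(url)) {
      results.push({ url, content: cache.get(url), cached: true });
      continue; // cache hit: no request, so no delay needed
    }
    if (fetchedBefore) await delay(rateLimitDelay);
    const content = await fetchPage(url);
    cache.set(url, content);
    fetchedBefore = true;
    results.push({ url, content, cached: false });
  }
  return results;
}
```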
JavaScript-Heavy Site
{
  "url": "https://react-app.example.com",
  "vaultPath": "/Users/yourname/Documents/Obsidian",
  "usePlaywright": true,
  "enableStealth": true,
  "playwrightTimeout": 45
}
Configuration Reference
Core Settings
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
url | string | No* | - | Single URL to scrape (use urls for bulk) |
urls | array | No* | [] | Array of URLs for bulk import |
vaultPath | string | Yes | - | Absolute path to your Obsidian vault |
folderPath | string | No | scraped | Subfolder path within vault |
noteName | string | No | Auto | Custom note filename (auto-sanitized) |
Processing Options
| Parameter | Type | Default | Description |
|---|---|---|---|
addMetadata | boolean | true | Include YAML front-matter |
tags | array | [] | Manual tags to apply |
autoTag | boolean | true | Enable intelligent auto-tagging |
autoLink | boolean | true | Create internal links between notes |
updateExisting | boolean | false | Allow overwriting existing notes |
templatePath | string | - | Obsidian template for config |
Performance & Reliability
| Parameter | Type | Default | Description |
|---|---|---|---|
usePlaywright | boolean | false | Use Chrome browser automation |
playwrightTimeout | number | 30 | Page load timeout (seconds) |
enableStealth | boolean | true | Apply anti-bot evasion |
maxRetries | number | 3 | Retry attempts per URL |
rateLimitDelay | number | 2000 | Delay between requests (ms) |
cache | string | memory | Cache type: memory, disk, apify |
downloadImages | boolean | false | Download images to vault |
concurrency | number | 3 | Parallel download workers |
* Either url or urls must be provided
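Taken together, the tables imply a normalization step before scraping. The sketch below fills in the documented defaults and enforces the "either `url` or `urls`" rule; `normalizeInput` is hypothetical, and combining `url` and `urls` into one list is an assumption about how single and bulk input are unified.

```javascript
// Defaults copied from the configuration tables above
const DEFAULTS = {
  urls: [],
  folderPath: "scraped",
  addMetadata: true,
  tags: [],
  autoTag: true,
  autoLink: true,
  updateExisting: false,
  usePlaywright: false,
  playwrightTimeout: 30, // seconds
  enableStealth: true,
  maxRetries: 3,
  rateLimitDelay: 2000, // milliseconds
  cache: "memory",
  downloadImages: false,
  concurrency: 3,
};

const CACHE_TYPES = new Set(["memory", "disk", "apify"]);

// Hypothetical normalizer: merge defaults, validate, and unify url/urls.
function normalizeInput(input) {
  const cfg = { ...DEFAULTS, ...input };
  if (!cfg.vaultPath) throw new Error("vaultPath is required");
  const urls = [...(cfg.url ? [cfg.url] : []), ...(cfg.urls ?? [])];
  if (urls.length === 0) throw new Error("Either url or urls must be provided");
  if (!CACHE_TYPES.has(cfg.cache)) throw new Error(`Unknown cache type: ${cfg.cache}`);
  for (const u of urls) {
    new URL(u); // throws TypeError on a malformed URL
  }
  return { ...cfg, urls };
}
```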
Generated Note Format
---
title: "Understanding Machine Learning"
url: https://example.com/ml-guide
scraped: 2024-01-15T10:30:00.000Z
tags: ["machine-learning", "ai", "research", "technology", "example"]
description: "A comprehensive guide to ML fundamentals"
author: "Jane Smith"
---

# Understanding Machine Learning

> Source: [https://example.com/ml-guide](https://example.com/ml-guide)
> Scraped: January 15, 2024

---

## Article Content

Full content converted to Markdown...

---

## Metadata

- **Author:** Jane Smith
- **Description:** A comprehensive guide to ML fundamentals
- **Canonical:** https://example.com/ml-guide
- **Robots:** index,follow
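The front matter block above can be generated with a small serializer. This sketch assumes the field order shown in the example; free-text fields are quoted with `JSON.stringify` (a JSON string is also a valid YAML double-quoted scalar), while `url` and the ISO timestamp are left unquoted as in the example. `buildFrontMatter` is hypothetical.

```javascript
// Hypothetical serializer for the YAML header of a generated note.
// Free-text values go through JSON.stringify so quotes and backslashes
// are escaped; url and scraped are emitted unquoted, matching the example.
function buildFrontMatter({ title, url, scraped, tags = [], description, author }) {
  const lines = ["---", `title: ${JSON.stringify(title)}`, `url: ${url}`, `scraped: ${scraped}`];
  lines.push(`tags: [${tags.map((t) => JSON.stringify(t)).join(", ")}]`);
  if (description) lines.push(`description: ${JSON.stringify(description)}`);
  if (author) lines.push(`author: ${JSON.stringify(author)}`);
  lines.push("---");
  return lines.join("\n");
}
```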
Advanced Usage
Template-Based Configuration
Create templates/scraper-config.md in your vault:
---
folderPath: "research/ai-papers"
autoTag: true
autoLink: true
tags: ["ai", "paper"]
usePlaywright: false
cache: "disk"
---

# AI Paper Scraper Template

This template automatically applies settings when referenced.
Usage:
{
  "url": "https://arxiv.org/abs/2401.12345",
  "vaultPath": "/path/to/vault",
  "templatePath": "templates/scraper-config"
}
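How the template is read is not documented. One plausible sketch parses the flat `key: value` lines between the `---` delimiters and lets explicit Actor input override template values; that precedence, `parseTemplateConfig`, and `applyTemplate` are assumptions. It covers only the subset used in the example (booleans, numbers, quoted strings, inline arrays); a real implementation would use a full YAML parser.

```javascript
// Parse one value from the flat YAML subset used in the template example.
function parseScalar(raw) {
  const v = raw.trim();
  if (v === "true") return true;
  if (v === "false") return false;
  if (/^-?\d+(\.\d+)?$/.test(v)) return Number(v);
  if (v.startsWith("[") || v.startsWith('"')) return JSON.parse(v); // inline array or quoted string
  return v; // bare string
}

// Hypothetical: read the front matter between the opening and closing "---".
function parseTemplateConfig(markdown) {
  const match = markdown.match(/^---\r?\n([\s\S]*?)\r?\n---/);
  if (!match) return {};
  const config = {};
  for (const line of match[1].split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx <= 0) continue; // skip blank or malformed lines
    config[line.slice(0, idx).trim()] = parseScalar(line.slice(idx + 1));
  }
  return config;
}

// Explicit Actor input wins over template values (assumed precedence).
const applyTemplate = (templateMd, input) => ({ ...parseTemplateConfig(templateMd), ...input });
```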
Caching Strategies
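// Paths relative to src/ (named exports assumed)
import { MemoryCache } from './lib/cache/MemoryCache.js';
import { PersistentCache } from './lib/cache/PersistentCache.js';
import { PersistentScrapeCache } from './lib/cache/PersistentScrapeCache.js';

// Memory cache (fast, ephemeral)
const memoryCache = new MemoryCache({ maxSize: 100 });
// Disk cache (persistent across runs)
const diskCache = new PersistentCache({ cacheDir: './storage' });
// Apify KV store (cloud, for scheduled actors)
const kvCache = new PersistentScrapeCache('my-scrape-cache');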
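To show what the memory strategy trades off, here is an illustrative LRU cache with per-entry expiry. It assumes `maxSize` counts entries and `ttl` is in milliseconds (the migration notes use `ttl: 3600000`, which is one hour in ms); `LruTtlCache` is a stand-in, not the project's `MemoryCache`.

```javascript
// Illustrative LRU cache with per-entry expiry. A Map iterates in insertion
// order, so deleting and re-inserting a key moves it to the "most recent" end.
class LruTtlCache {
  constructor({ maxSize = 100, ttl = Infinity, now = Date.now } = {}) {
    this.maxSize = maxSize;
    this.ttl = ttl; // milliseconds (assumed unit)
    this.now = now; // injectable clock for testing
    this.entries = new Map();
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: drop it
      return undefined;
    }
    this.entries.delete(key);
    this.entries.set(key, entry); // mark as most recently used
    return entry.value;
  }

  set(key, value) {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttl });
    if (this.entries.size > this.maxSize) {
      // Evict the least recently used entry (first key in iteration order)
      this.entries.delete(this.entries.keys().next().value);
    }
  }
}
```

Everything here lives in process memory, so unlike the disk and Apify KV strategies it is lost when the run ends.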
Real-Time Progress Viewer
Local Development:
npm install   # Install dependencies
npm run dev   # Start MCP server with live viewer
Apify Platform:
{
  "startResultsServer": true,
  "resultsServerPort": 8080
}
Then visit http://localhost:8080 in your browser.
MCP Server Integration
The Actor exposes 5 tools to Claude/LLMs:
# Install globally
npm install -g obsidian-mcp-actor

# Add to Claude config
{
  "mcpServers": {
    "obsidian": {
      "command": "obsidian-mcp-actor",
      "args": ["mcp-server"]
    }
  }
}
Available Tools:
- scrape_website: Scrape any URL
- extract_tags: Analyze content for tags
- validate_content: Check scrape quality
- convert_html_to_markdown: Transform content
- save_note: Save to Obsidian vault
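MCP clients invoke these tools with JSON-RPC 2.0 `tools/call` requests whose `params` hold the tool `name` and an `arguments` object. The sketch builds such a request for `save_note`; the argument names are assumptions, since the tools' input schemas are not listed here (a client can fetch them with a `tools/list` request).

```javascript
// Build a JSON-RPC 2.0 request for the MCP "tools/call" method.
let nextId = 1;

function toolCallRequest(name, args) {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/call", params: { name, arguments: args } };
}

const request = toolCallRequest("save_note", {
  title: "Understanding Machine Learning", // hypothetical argument name
  content: "# Understanding Machine Learning\n...", // hypothetical argument name
  folderPath: "research/articles", // hypothetical argument name
});

// Over the stdio transport each message is one line of JSON; JSON.stringify
// without indentation escapes embedded newlines, so the line stays intact.
const wireMessage = JSON.stringify(request) + "\n";
```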
AI Workflow Example:
"Claude, scrape the latest 5 articles from Hacker News, tag them by topic, and save to my trending folder with internal links."
Development Setup
# Clone repository
git clone https://github.com/yourusername/obsidian-mcp-actor.git
cd obsidian-mcp-actor

# Install dependencies
npm install

# Run TypeScript compilation
npm run build

# Run tests
npm test

# Start MCP server locally
npm run mcp-server
Project Structure
obsidian-mcp-actor/
├── src/
│   ├── main.js                  # Apify Actor entry point
│   ├── mcp-server.js            # MCP server entry point
│   └── lib/
│       ├── processor/           # Core business logic
│       │   ├── UnifiedScraper.js
│       │   ├── MarkdownConverter.js
│       │   ├── TagExtractor.js
│       │   └── ActorService.js
│       ├── scraper/             # Scraping strategies
│       │   ├── CheerioStrategy.js
│       │   └── PlaywrightStrategy.js
│       ├── vault/               # Obsidian operations
│       │   ├── NoteManager.js
│       │   └── LinkManager.js
│       ├── cache/               # Caching implementations
│       │   ├── MemoryCache.js
│       │   ├── PersistentCache.js
│       │   └── PersistentScrapeCache.js
│       ├── utils/               # Utilities
│       │   ├── url.js
│       │   ├── errors.js
│       │   ├── retry.js
│       │   └── stealth.js
│       └── server/              # WebSocket server
│           └── ResultsServer.js
├── test/                        # Unit and integration tests
├── input_schema.json            # Apify input schema
├── output_schema.json           # Apify output schema
└── package.json
Testing
# Run all tests
npm test

# Run with coverage
npm run test:coverage

# Run specific test file
npm test test/UnifiedScraper.test.js
Test Coverage Goals:
- Core scraping logic: >90%
- Security validation: 100%
- Vault operations: >85%
Deployment
Apify Platform
- Push to Apify:
apify push
- Configure Environment Variables:
APIFY_MEMORY_MBYTES=4096
APIFY_BUILD_TIMEOUT_SECS=300
- Schedule Runs:
apify schedule create my-schedule \
  --actor-id your-actor-id \
  --cron "0 9 * * *" \
  --input-json '{"urls": [...], "vaultPath": "/data"}'
Self-Hosted
# Docker
docker build -t obsidian-mcp-actor .
docker run -v /path/to/vault:/data -p 8080:8080 obsidian-mcp-actor
Migration from v1.x
Breaking Changes
For most users: No changes needed. The public API remains identical.
If you extended internals:
- Legacy functions in helpers.js are deprecated but still functional.
- Import from specific modules for new features:
// Old (still works)
import { scrapeWebsite } from './helpers.js';

// New (recommended)
import { UnifiedScraper } from './lib/processor/UnifiedScraper.js';

const scraper = new UnifiedScraper({ usePlaywright: true });
New Cache API
// Old
const cache = new ScrapeCache();

// New
const cache = new MemoryCache({ maxSize: 100, ttl: 3600000 });
Updated File Structure
Move custom code from main.js to lib/processor/ActorService.js for modularity.
Use Cases
| Use Case | Configuration |
|---|---|
| Research Paper Collection | usePlaywright: false, cache: "disk", folderPath: "papers/{year}" |
| News Monitoring | bulkMode: true, rateLimitDelay: 5000, updateExisting: true |
| Competitive Intelligence | enableStealth: true, downloadImages: true, autoTag: true |
| Course Materials | templatePath: "templates/course", addMetadata: true, autoLink: true |
| AI-Powered Curation | Enable MCP server, use Claude to orchestrate complex scraping tasks |
Performance Benchmarks
| Scenario | v1.x | v2.0 | Improvement |
|---|---|---|---|
| Single static page | 2.1s | 0.8s | 2.6x faster |
| Bulk 10 URLs | 45s | 18s | 2.5x faster |
| JS-heavy SPA | 15s | 12s | 1.25x faster |
| Image downloads (20) | 25s | 3s | 8.3x faster |
| Cache hit rate | 0% | 78% | 78% reuse |
Benchmarks on M1 Mac, 10 concurrent workers
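The image-download row reflects concurrent downloading. A minimal bounded worker pool, assuming `concurrency` caps the number of in-flight downloads and results keep input order; `mapWithConcurrency` and the task callback are hypothetical, not the Actor's download code.

```javascript
// Run task(item, index) for every item with at most `concurrency` tasks
// in flight; results are stored by index, so output order matches input.
async function mapWithConcurrency(items, task, concurrency = 3) {
  const results = new Array(items.length);
  let next = 0; // next unclaimed index; safe because callbacks run on one thread
  async function worker() {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i], i);
    }
  }
  // Start min(concurrency, items.length) workers that pull items until none remain
  const workers = Array.from({ length: Math.min(concurrency, items.length) }, worker);
  await Promise.all(workers);
  return results;
}
```

Each worker pulls the next item as soon as it finishes one, so a slow image does not hold up the rest of the queue.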
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
Development Guidelines
- Write tests for new features
- Follow existing code style (ESLint configured)
- Update TypeScript types
- Document public APIs with JSDoc
- Security-first: validate all inputs
License
MIT License - see LICENSE file for details.
Acknowledgments
- Built with Crawlee and Playwright
- Inspired by the Obsidian community's automation needs
- Model Context Protocol (MCP) by Anthropic
Made with ❤️ for researchers, knowledge workers, and automation enthusiasts
Transform your Obsidian vault into a self-updating knowledge base.
