Obsidian Mcp Actor
Status: Under maintenance

Pricing: $2.00 / 1,000 results


A lightweight Obsidian MCP Actor built for fast, local note automation. It parses, indexes, and transforms vault content with zero bloat. Perfect for workflows that need speed, clean structure, and reliable processing across markdown files, tags, metadata, and linked notes.


Rating: 0.0 (0 ratings)

Developer: Antony mwangi (Maintained by Community)

Actor stats:

  • Bookmarked: 0
  • Total users: 1
  • Monthly active users: 0
  • Last modified: a day ago


A secure, high-performance Apify Actor that bridges Obsidian note-taking workflows with intelligent web scraping automation via the Model Context Protocol (MCP). Automatically enrich your knowledge base with structured data from any website.


🎯 What's New in v2.0

๐Ÿ›ก๏ธ Security Hardening

  • Path traversal protection: All file operations validated against vault directory
  • Content size limits: Content is capped at 10 MB to prevent out-of-memory (OOM) errors
  • Input validation: Sanitized URLs and filenames with clear error categories
  • robots.txt compliance: robots.txt rules are respected automatically
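The path traversal and size checks above can be sketched roughly as follows. This is a minimal illustration, not the Actor's validator: it assumes POSIX-style paths relative to the vault root and reads "10 MB" as 10 * 1024 * 1024 bytes; `resolveInsideVault` and `assertContentSize` are hypothetical names.

```javascript
// Hypothetical helpers illustrating the two checks; not the Actor's own code.
// Assumes POSIX-style paths (forward slashes) relative to the vault root.
const MAX_CONTENT_BYTES = 10 * 1024 * 1024; // assumed reading of "10MB"

function resolveInsideVault(vaultRoot, relativePath) {
  if (relativePath.includes('\0')) throw new Error('Invalid path: NUL byte');
  if (relativePath.startsWith('/')) throw new Error('Absolute paths are not allowed');
  const parts = [];
  for (const segment of relativePath.split('/')) {
    if (segment === '' || segment === '.') continue; // skip empty and current-dir segments
    if (segment === '..') {
      // Climbing above the vault root is exactly the traversal we want to block.
      if (parts.length === 0) throw new Error(`Path escapes vault: ${relativePath}`);
      parts.pop();
    } else {
      parts.push(segment);
    }
  }
  const root = vaultRoot.replace(/\/+$/, '');
  return parts.length > 0 ? `${root}/${parts.join('/')}` : root;
}

function assertContentSize(content) {
  const bytes = new TextEncoder().encode(content).length; // UTF-8 byte length
  if (bytes > MAX_CONTENT_BYTES) {
    throw new Error(`Content is ${bytes} bytes; limit is ${MAX_CONTENT_BYTES}`);
  }
}
```

In Node.js code, combining `path.resolve` with `path.relative` is the more common way to express the same containment check.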

⚡ Performance & Scalability

  • Parallel processing: Concurrent image downloads (3-10x faster)
  • Intelligent caching: Three cache strategies (memory, disk, Apify KV)
  • Exponential backoff: Smart retry logic with jitter for rate-limited sites
  • Stealth mode: Enhanced Playwright evasion for anti-bot protection
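Exponential backoff with jitter can be sketched as below. The sketch uses the "full jitter" variant (a random delay between zero and an exponentially growing cap); whether the Actor uses this exact variant, and the `baseDelayMs`/`maxDelayMs` defaults, are assumptions. Only `maxRetries` (default 3) comes from the configuration reference, and `backoffDelay`/`withRetry` are hypothetical names.

```javascript
// Full-jitter backoff: attempt n waits a random time in [0, min(maxDelayMs, baseDelayMs * 2^n)).
// baseDelayMs and maxDelayMs defaults are assumptions; random is injectable for testing.
function backoffDelay(attempt, { baseDelayMs = 1000, maxDelayMs = 30000, random = Math.random } = {}) {
  const ceiling = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls fn, retrying up to maxRetries more times (default 3, per the config table)
// and sleeping a jittered, exponentially growing delay before each retry.
async function withRetry(fn, { maxRetries = 3, ...backoff } = {}) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await fn(attempt);
    } catch (err) {
      if (attempt >= maxRetries) throw err; // out of retries: surface the last error
      await sleep(backoffDelay(attempt, backoff));
    }
  }
}
```

The jitter spreads retries from parallel workers apart so they do not hit a rate-limited site in lockstep.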

๐Ÿ—๏ธ Modern Architecture

  • Service-oriented design: Modular, testable, maintainable
  • Strategy pattern: Pluggable scraping engines (Cheerio → Playwright fallback)
  • Real-time monitoring: Live WebSocket progress viewer
  • TypeScript support: Full type definitions for core interfaces
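The Cheerio → Playwright fallback can be sketched with plain strategy objects that share one `scrape(url)` method. The stand-in strategies below are hypothetical (they return canned text instead of loading Cheerio or Playwright), and treating empty extracted text as the signal to fall back is an assumption about how a JavaScript-heavy page would be detected.

```javascript
// Try each strategy in order; return the first one that yields non-empty text.
async function scrapeWithFallback(url, strategies) {
  const failures = [];
  for (const strategy of strategies) {
    try {
      const result = await strategy.scrape(url);
      // Assumption: empty text from the fast engine means the page needs a real browser.
      if (!result || !result.text || result.text.trim() === '') {
        throw new Error('empty content');
      }
      return { engine: strategy.name, ...result };
    } catch (err) {
      failures.push(`${strategy.name}: ${err.message}`);
    }
  }
  throw new Error(`All strategies failed for ${url} (${failures.join('; ')})`);
}

// Hypothetical stand-ins; real ones would wrap Cheerio and Playwright.
const cheerioStrategy = {
  name: 'cheerio',
  scrape: async () => ({ text: '' }), // static HTML of an SPA often has no article text
};
const playwrightStrategy = {
  name: 'playwright',
  scrape: async () => ({ text: 'Rendered article body' }),
};

scrapeWithFallback('https://react-app.example.com', [cheerioStrategy, playwrightStrategy])
  .then((result) => console.log(result.engine, result.text));
```

Ordering the list cheapest-first means the browser engine is only started when the static parser comes back empty.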

🔥 Key Features

| Feature | Description |
| --- | --- |
| 🤖 Dual Scraping Engines | Cheerio for speed, Playwright for JavaScript-heavy sites |
| 💾 Persistent Caching | Avoid re-scraping with a disk-backed cache between runs |
| 🏷️ Intelligent Tagging | Extract tags from content, metadata, JSON-LD, and domains |
| 🔗 Auto Internal Linking | Automatically link related notes by shared tags |
| 📸 Image Handling | Download and reference images with parallel processing |
| 📝 Template Support | Configure scraping via Obsidian template files |
| 📊 Live Progress | WebSocket viewer shows real-time scraping status |
| 🔐 Security First | Path traversal protection, input validation, size limits |
| 🎯 MCP Integration | Exposes 5 tools to Claude/LLMs for AI-driven workflows |
| 📈 Performance Metrics | Track cache hit rates, processing times, and throughput |
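Linking notes by shared tags amounts to building an inverted index from tag to notes. A minimal sketch, assuming each note is a `{ title, tags }` object and that links are rendered as Obsidian `[[wikilinks]]` under a "Related" heading; `buildTagLinks`, `relatedSection`, and the heading name are hypothetical.

```javascript
// notes: array of { title, tags }. Returns Map<title, Set<title>> of notes sharing >= 1 tag.
function buildTagLinks(notes) {
  const byTag = new Map(); // tag -> titles carrying that tag
  for (const note of notes) {
    for (const tag of note.tags) {
      if (!byTag.has(tag)) byTag.set(tag, []);
      byTag.get(tag).push(note.title);
    }
  }
  const links = new Map(notes.map((n) => [n.title, new Set()]));
  for (const titles of byTag.values()) {
    for (const a of titles) {
      for (const b of titles) {
        if (a !== b) links.get(a).add(b); // Set dedupes notes sharing several tags
      }
    }
  }
  return links;
}

// Render a "Related" section using Obsidian [[wikilink]] syntax.
function relatedSection(title, links) {
  const related = [...links.get(title)].sort();
  if (related.length === 0) return '';
  return ['## Related', ...related.map((t) => `- [[${t}]]`)].join('\n');
}
```

The pairwise loop is quadratic per tag, which is fine for typical vault sizes; a very common tag would be worth excluding from linking.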

🚀 Quick Start

Single URL Scrape

{
  "url": "https://example.com/article",
  "vaultPath": "/Users/yourname/Documents/Obsidian",
  "folderPath": "research/articles",
  "tags": ["ai", "research"],
  "autoTag": true,
  "autoLink": true
}

Bulk Import with Caching

{
  "urls": [
    "https://site1.com/post",
    "https://site2.com/guide",
    "https://site3.com/tutorial"
  ],
  "vaultPath": "/Users/yourname/Documents/Obsidian",
  "bulkMode": true,
  "usePlaywright": false,
  "cache": "disk",
  "rateLimitDelay": 2000
}
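In bulk mode, caching and `rateLimitDelay` interact: a cached URL needs no request, so the delay only matters between live requests. The sketch below assumes sequential processing and uses a `Map` as a stand-in for the disk cache; `bulkScrape` and the injected `scrape` function are hypothetical, not the Actor's API.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Processes URLs one at a time. Cached URLs are served without a request, and
// rateLimitDelay (ms) is waited only between consecutive live requests.
async function bulkScrape(urls, { scrape, cache = new Map(), rateLimitDelay = 2000 }) {
  const results = [];
  let madeLiveRequest = false;
  for (const url of urls) {
    if (cache.has(url)) {
      results.push({ url, fromCache: true, data: cache.get(url) });
      continue;
    }
    if (madeLiveRequest) await sleep(rateLimitDelay);
    const data = await scrape(url);
    madeLiveRequest = true;
    cache.set(url, data); // later runs sharing this cache skip the request
    results.push({ url, fromCache: false, data });
  }
  return results;
}
```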

JavaScript-Heavy Site

{
  "url": "https://react-app.example.com",
  "vaultPath": "/Users/yourname/Documents/Obsidian",
  "usePlaywright": true,
  "enableStealth": true,
  "playwrightTimeout": 45
}

📋 Configuration Reference

Core Settings

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| url | string | No* | - | Single URL to scrape (use urls for bulk) |
| urls | array | No* | [] | Array of URLs for bulk import |
| vaultPath | string | Yes | - | Absolute path to your Obsidian vault |
| folderPath | string | No | scraped | Subfolder path within vault |
| noteName | string | No | Auto | Custom note filename (auto-sanitized) |
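`noteName` is described as auto-sanitized. One plausible sanitizer, assuming the goal is a filename that is valid on common filesystems and safe inside Obsidian `[[links]]`: the character set, the 100-character cap, and the `Untitled` fallback are all assumptions, and `sanitizeNoteName` is a hypothetical name.

```javascript
// Filesystem-unsafe characters (\ / : * ? " < > |) plus characters that break
// Obsidian wikilinks (# ^ [ ]). The exact rule set is an assumption.
const FORBIDDEN_CHARS = /[\\\/:*?"<>|#^[\]]/g;
const CONTROL_CHARS = /[\u0000-\u001f\u007f]/g;

function sanitizeNoteName(name, maxLength = 100) {
  let clean = name
    .replace(FORBIDDEN_CHARS, ' ')
    .replace(CONTROL_CHARS, '')
    .replace(/^[.\s]+/, '') // no leading dots: avoids hidden files and ".." names
    .replace(/[.\s]+$/, '') // trailing dots and spaces are invalid on Windows
    .replace(/\s+/g, ' '); // collapse runs of whitespace
  if (clean.length > maxLength) clean = clean.slice(0, maxLength).trimEnd();
  return clean === '' ? 'Untitled' : clean;
}
```

Replacing forbidden characters with spaces (rather than deleting them) keeps word boundaries, so "ML: A Guide" becomes "ML A Guide" instead of "MLA Guide".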

Processing Options

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| addMetadata | boolean | true | Include YAML front matter |
| tags | array | [] | Manual tags to apply |
| autoTag | boolean | true | Enable intelligent auto-tagging |
| autoLink | boolean | true | Create internal links between notes |
| updateExisting | boolean | false | Allow overwriting existing notes |
| templatePath | string | - | Obsidian template for config |

Performance & Reliability

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| usePlaywright | boolean | false | Use Chrome browser automation |
| playwrightTimeout | number | 30 | Page load timeout (seconds) |
| enableStealth | boolean | true | Apply anti-bot evasion |
| maxRetries | number | 3 | Retry attempts per URL |
| rateLimitDelay | number | 2000 | Delay between requests (ms) |
| cache | string | memory | Cache type: memory, disk, or apify |
| downloadImages | boolean | false | Download images to vault |
| concurrency | number | 3 | Parallel download workers |

* Either url or urls must be provided
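The defaults in the tables above, together with the url/urls rule, can be expressed as a small normalization step. A sketch only: folding `url` into `urls` and rejecting non-HTTP(S) URLs are assumptions about the Actor's behavior, and `normalizeInput` is a hypothetical name.

```javascript
// Defaults copied from the configuration tables.
const DEFAULTS = {
  urls: [],
  folderPath: 'scraped',
  addMetadata: true,
  tags: [],
  autoTag: true,
  autoLink: true,
  updateExisting: false,
  usePlaywright: false,
  playwrightTimeout: 30, // seconds
  enableStealth: true,
  maxRetries: 3,
  rateLimitDelay: 2000, // milliseconds
  cache: 'memory',
  downloadImages: false,
  concurrency: 3,
};
const CACHE_TYPES = new Set(['memory', 'disk', 'apify']);

function normalizeInput(raw) {
  const input = { ...DEFAULTS, ...raw };
  if (typeof input.vaultPath !== 'string' || input.vaultPath === '') {
    throw new Error('vaultPath is required');
  }
  // Either url or urls must be provided; merge them into one list (assumed behavior).
  const urls = [...(input.url ? [input.url] : []), ...input.urls];
  if (urls.length === 0) throw new Error('Provide either url or urls');
  for (const u of urls) {
    const { protocol } = new URL(u); // throws TypeError on malformed URLs
    if (protocol !== 'http:' && protocol !== 'https:') {
      throw new Error(`Unsupported URL scheme: ${u}`);
    }
  }
  if (!CACHE_TYPES.has(input.cache)) {
    throw new Error(`cache must be one of: ${[...CACHE_TYPES].join(', ')}`);
  }
  return { ...input, urls };
}
```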


๐Ÿ“ Generated Note Format

---
title: "Understanding Machine Learning"
url: https://example.com/ml-guide
scraped: 2024-01-15T10:30:00.000Z
tags: ["machine-learning", "ai", "research", "technology", "example"]
description: "A comprehensive guide to ML fundamentals"
author: "Jane Smith"
---
# Understanding Machine Learning
> 🔗 Source: [https://example.com/ml-guide](https://example.com/ml-guide)
> 📅 Scraped: January 15, 2024
---
## Article Content
Full content converted to Markdown...
---
## Metadata
- **Author:** Jane Smith
- **Description:** A comprehensive guide to ML fundamentals
- **Canonical:** https://example.com/ml-guide
- **Robots:** index,follow
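Front matter like the example above can be emitted safely by relying on the fact that JSON strings and arrays are also valid YAML, so `JSON.stringify` handles quoting and escaping. A minimal sketch: `buildNote` and its argument shape are hypothetical, it omits the Metadata section, and unlike the sample it quotes the url value as well.

```javascript
function buildNote({ title, url, scraped, tags, description, author, body }) {
  // JSON.stringify produces double-quoted strings and flow sequences that YAML accepts as-is.
  const yaml = (value) => JSON.stringify(value);
  const frontMatter = [
    '---',
    `title: ${yaml(title)}`,
    `url: ${yaml(url)}`,
    `scraped: ${scraped.toISOString()}`,
    `tags: ${yaml(tags)}`,
  ];
  // Optional fields are written only when present.
  if (description) frontMatter.push(`description: ${yaml(description)}`);
  if (author) frontMatter.push(`author: ${yaml(author)}`);
  frontMatter.push('---');

  return [
    ...frontMatter,
    `# ${title}`,
    `> 🔗 Source: [${url}](${url})`,
    '---',
    body,
  ].join('\n');
}
```

Quoting every string avoids YAML surprises with titles that contain colons, quotes, or a leading `#`.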

🎓 Advanced Usage

Template-Based Configuration

Create templates/scraper-config.md in your vault:

---
folderPath: "research/ai-papers"
autoTag: true
autoLink: true
tags: ["ai", "paper"]
usePlaywright: false
cache: "disk"
---
# AI Paper Scraper Template
This template automatically applies settings when referenced.

Usage:

{
  "url": "https://arxiv.org/abs/2401.12345",
  "vaultPath": "/path/to/vault",
  "templatePath": "templates/scraper-config"
}
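Applying a template means reading its front matter and merging it with the run input. The sketch below handles only flat `key: value` lines like the template above, parsing each value as JSON when possible; `parseTemplate` and `applyTemplate` are hypothetical, and the precedence (explicit run input overrides template values) is an assumption, not documented behavior. A real implementation would more likely use a YAML parser such as the `yaml` or `js-yaml` npm packages.

```javascript
// Minimal front-matter reader for flat "key: value" templates.
// Values parse as JSON when possible (true, ["ai"], "disk"); otherwise the raw string is kept.
function parseTemplate(markdown) {
  const match = markdown.match(/^---\r?\n([\s\S]*?)\r?\n---/);
  if (!match) return {};
  const config = {};
  for (const line of match[1].split(/\r?\n/)) {
    const idx = line.indexOf(':');
    if (idx === -1) continue; // not a key: value line
    const key = line.slice(0, idx).trim();
    const raw = line.slice(idx + 1).trim();
    try {
      config[key] = JSON.parse(raw);
    } catch {
      config[key] = raw;
    }
  }
  return config;
}

// Assumed precedence: values passed explicitly in the run input win over the template.
function applyTemplate(templateMarkdown, input) {
  return { ...parseTemplate(templateMarkdown), ...input };
}
```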

Caching Strategies

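// Import paths follow the project structure (src/lib/cache/); named exports are assumed
import { MemoryCache } from './lib/cache/MemoryCache.js';
import { PersistentCache } from './lib/cache/PersistentCache.js';
import { PersistentScrapeCache } from './lib/cache/PersistentScrapeCache.js';
// Memory cache (fast, ephemeral)
const memoryCache = new MemoryCache({ maxSize: 100 });
// Disk cache (persistent across runs)
const diskCache = new PersistentCache({ cacheDir: './storage' });
// Apify KV store (cloud, for scheduled Actors)
const kvCache = new PersistentScrapeCache('my-scrape-cache');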
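The `maxSize` option above, and the `ttl` option used in the v1.x migration notes, suggest a size-bounded cache with expiry. A minimal sketch of that idea, using a `Map`'s insertion order for least-recently-used eviction: `TtlLruCache` is a hypothetical class, not the Actor's `MemoryCache`, and the injectable clock exists only to make expiry testable.

```javascript
// A Map iterates in insertion order, so re-inserting an entry on read moves it to the
// "newest" end and the first key is always the least recently used.
class TtlLruCache {
  constructor({ maxSize = 100, ttl = 3_600_000, now = Date.now } = {}) {
    this.maxSize = maxSize;
    this.ttl = ttl; // milliseconds
    this.now = now;
    this.entries = new Map(); // key -> { value, expiresAt }
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    this.entries.delete(key);
    if (entry.expiresAt <= this.now()) return undefined; // expired: leave it removed
    this.entries.set(key, entry); // refresh recency
    return entry.value;
  }

  set(key, value) {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttl });
    if (this.entries.size > this.maxSize) {
      const oldestKey = this.entries.keys().next().value;
      this.entries.delete(oldestKey); // evict the least recently used entry
    }
  }
}
```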

Real-Time Progress Viewer

Local Development:

npm install # Install dependencies
npm run dev # Start MCP server with live viewer

Apify Platform:

{
  "startResultsServer": true,
  "resultsServerPort": 8080
}

Then open http://localhost:8080 (or the port set in resultsServerPort) in your browser.


🤖 MCP Server Integration

The Actor can also run as an MCP server that exposes 5 tools to Claude and other LLM clients.

Install the package globally:

npm install -g obsidian-mcp-actor

Then add the server to your Claude config:

{
  "mcpServers": {
    "obsidian": {
      "command": "obsidian-mcp-actor",
      "args": ["mcp-server"]
    }
  }
}

Available Tools:

  1. scrape_website - Scrape any URL
  2. extract_tags - Analyze content for tags
  3. validate_content - Check scrape quality
  4. convert_html_to_markdown - Transform content
  5. save_note - Save to Obsidian vault
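Under the Model Context Protocol, a client such as Claude invokes each of these tools with a JSON-RPC 2.0 `tools/call` request that carries the tool `name` and its `arguments`. The sketch below builds the requests an AI workflow might chain together; the argument names (`url`, `content`, `folderPath`, `title`) are assumptions, since the tools' input schemas are not listed here, and `toolCall` is a hypothetical helper.

```javascript
let nextId = 1;

// Build an MCP tools/call request (JSON-RPC 2.0 envelope).
function toolCall(name, args) {
  return {
    jsonrpc: '2.0',
    id: nextId++,
    method: 'tools/call',
    params: { name, arguments: args },
  };
}

// A plausible chain for "scrape, tag, save"; argument shapes are hypothetical.
const requests = [
  toolCall('scrape_website', { url: 'https://example.com/article' }),
  toolCall('extract_tags', { content: '<scraped markdown>' }),
  toolCall('save_note', { folderPath: 'trending', title: 'Example article', content: '<markdown>' }),
];

for (const request of requests) console.log(JSON.stringify(request));
```

In practice the client sends these one at a time over the server's transport (stdio for the `command`/`args` setup above), feeding each tool's result into the next call.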

AI Workflow Example:

"Claude, scrape the latest 5 articles from Hacker News, tag them by topic, and save to my trending folder with internal links."


🔧 Development Setup

# Clone repository
git clone https://github.com/yourusername/obsidian-mcp-actor.git
cd obsidian-mcp-actor
# Install dependencies
npm install
# Run TypeScript compilation
npm run build
# Run tests
npm test
# Start MCP server locally
npm run mcp-server

Project Structure

obsidian-mcp-actor/
├── src/
│   ├── main.js              # Apify Actor entry point
│   ├── mcp-server.js        # MCP server entry point
│   └── lib/
│       ├── processor/       # Core business logic
│       │   ├── UnifiedScraper.js
│       │   ├── MarkdownConverter.js
│       │   ├── TagExtractor.js
│       │   └── ActorService.js
│       ├── scraper/         # Scraping strategies
│       │   ├── CheerioStrategy.js
│       │   └── PlaywrightStrategy.js
│       ├── vault/           # Obsidian operations
│       │   ├── NoteManager.js
│       │   └── LinkManager.js
│       ├── cache/           # Caching implementations
│       │   ├── MemoryCache.js
│       │   ├── PersistentCache.js
│       │   └── PersistentScrapeCache.js
│       ├── utils/           # Utilities
│       │   ├── url.js
│       │   ├── errors.js
│       │   ├── retry.js
│       │   └── stealth.js
│       └── server/          # WebSocket server
│           └── ResultsServer.js
├── test/                    # Unit and integration tests
├── input_schema.json        # Apify input schema
├── output_schema.json       # Apify output schema
└── package.json

🧪 Testing

# Run all tests
npm test
# Run with coverage
npm run test:coverage
# Run specific test file
npm test test/UnifiedScraper.test.js

Test Coverage Goals:

  • Core scraping logic: >90%
  • Security validation: 100%
  • Vault operations: >85%

📦 Deployment

Apify Platform

  1. Push to Apify:
     apify push
  2. Configure Environment Variables:
     APIFY_MEMORY_MBYTES=4096
     APIFY_BUILD_TIMEOUT_SECS=300
  3. Schedule Runs:
     apify schedule create my-schedule \
       --actor-id your-actor-id \
       --cron "0 9 * * *" \
       --input-json '{"urls": [...], "vaultPath": "/data"}'

Self-Hosted

# Docker
docker build -t obsidian-mcp-actor .
docker run -v /path/to/vault:/data -p 8080:8080 obsidian-mcp-actor

🔄 Migration from v1.x

Breaking Changes

For most users, no changes are needed: the public API is unchanged.

If you extended internals:

  • Legacy functions in helpers.js are deprecated but functional
  • Import from specific modules for new features:
    // Old (still works)
    import { scrapeWebsite } from './helpers.js';
    // New (recommended)
    import { UnifiedScraper } from './lib/processor/UnifiedScraper.js';
    const scraper = new UnifiedScraper({ usePlaywright: true });

New Cache API

// Old
const cache = new ScrapeCache();
// New
const cache = new MemoryCache({ maxSize: 100, ttl: 3600000 });

Updated File Structure

Move custom code from main.js to lib/processor/ActorService.js for modularity.


📚 Use Cases

| Use Case | Configuration |
| --- | --- |
| Research Paper Collection | usePlaywright: false, cache: "disk", folderPath: "papers/{year}" |
| News Monitoring | bulkMode: true, rateLimitDelay: 5000, updateExisting: true |
| Competitive Intelligence | enableStealth: true, downloadImages: true, autoTag: true |
| Course Materials | templatePath: "templates/course", addMetadata: true, autoLink: true |
| AI-Powered Curation | Enable the MCP server and use Claude to orchestrate complex scraping tasks |

📊 Performance Benchmarks

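| Scenario | v1.x | v2.0 | Improvement |
| --- | --- | --- | --- |
| Single static page | 2.1s | 0.8s | 2.6x faster |
| Bulk 10 URLs | 45s | 18s | 2.5x faster |
| JS-heavy SPA | 15s | 12s | 1.25x faster |
| Image downloads (20) | 25s | 3s | 8.3x faster |
| Cache hit rate | 0% | 78% | 78% reuse |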

Benchmarks were run on an M1 Mac with 10 concurrent workers.
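The Improvement column for the timing rows is the ratio of v1.x time to v2.0 time (speedup = t_v1 / t_v2); the cache hit rate row is a percentage, not a ratio. Recomputing the ratios from the table's own numbers makes the rounding explicit (for example, 2.1 / 0.8 = 2.625, listed as 2.6x):

```javascript
// Timings (seconds) taken from the benchmark table.
const timings = [
  { scenario: 'Single static page', v1: 2.1, v2: 0.8 },
  { scenario: 'Bulk 10 URLs', v1: 45, v2: 18 },
  { scenario: 'JS-heavy SPA', v1: 15, v2: 12 },
  { scenario: 'Image downloads (20)', v1: 25, v2: 3 },
];

// Speedup is old time divided by new time; values above 1 mean v2.0 is faster.
const speedups = timings.map(({ scenario, v1, v2 }) => ({ scenario, speedup: v1 / v2 }));

for (const { scenario, speedup } of speedups) {
  console.log(`${scenario}: ${speedup.toFixed(2)}x`);
}
```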


๐Ÿค Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Development Guidelines

  • Write tests for new features
  • Follow existing code style (ESLint configured)
  • Update TypeScript types
  • Document public APIs with JSDoc
  • Security-first: validate all inputs

📄 License

MIT License - see LICENSE file for details.

๐Ÿ™ Acknowledgments


Made with โค๏ธ for researchers, knowledge workers, and automation enthusiasts

Transform your Obsidian vault into a self-updating knowledge base.