
Bluesky & Mastodon Scraper - Decentralized Social Media
Last modified
2 days ago
Bluesky & Mastodon Scraper API - Decentralized Social Media Data Aggregator
Extract and monitor posts from Bluesky (AT Protocol) and Mastodon (Fediverse) with a unified, normalized JSON API. The most comprehensive social media scraper for decentralized networks - perfect for social listening, brand monitoring, market research, sentiment analysis, and AI training data collection.
Search by keywords • Track specific users • Unified data format • Real-time webhooks • Pay-per-post pricing
Features
- Multi-Platform Support: Scrape Bluesky and Mastodon simultaneously
- Keyword Search: Find posts mentioning specific terms or phrases
- Handle Tracking: Monitor specific users across platforms
- Date Range Filtering: Historical and real-time post collection
- Unified Schema: Normalized output format across all platforms
- Intelligent Deduplication: Automatic duplicate detection and removal
- Real-Time Webhooks: Send posts to your endpoints as they're discovered
- Language Filtering: Filter posts by language (BCP-47 codes)
- Pay-Per-Event Pricing: Only pay for posts collected, not compute time
- No Authentication Required: Works with public data (optional auth for higher limits)
Supported Platforms
Bluesky (AT Protocol)
- Full keyword search via searchActors workaround
- User feed tracking
- Quote posts, replies, reposts, likes
- Media attachments (images, videos, GIFs)
- Rich metadata (DIDs, handles, timestamps)
Mastodon (Fediverse)
- Multi-instance support (mastodon.social, mas.to, fosstodon.org, etc.)
- Full keyword search across instances
- User timeline tracking
- Boosts, replies, favorites
- Media attachments with alt text
- Instance-specific data
Use Cases
- Social Listening: Track brand mentions and industry keywords
- Market Research: Analyze trends and conversations in your niche
- Sentiment Analysis: Collect data for AI/ML sentiment models
- Brand Monitoring: Monitor your company and competitors
- Academic Research: Study social media behavior and network effects
- Content Discovery: Find engaging content for curation
- Influencer Tracking: Monitor key voices in your industry
Quick Start
Example 1: Search for AI-related posts
```json
{
  "platforms": ["bluesky", "mastodon"],
  "query": "artificial intelligence",
  "maxItems": 100,
  "languages": ["en"]
}
```
Example 2: Track specific users
```json
{
  "platforms": ["bluesky", "mastodon"],
  "handles": ["jay.bsky.social", "@gargron@mastodon.social"],
  "maxItems": 500
}
```
Example 3: Historical search with date range
```json
{
  "platforms": ["bluesky"],
  "query": "climate change",
  "since": "2025-09-01T00:00:00Z",
  "until": "2025-10-01T00:00:00Z",
  "maxItems": 1000
}
```
Example 4: Real-time monitoring with webhooks
```json
{
  "platforms": ["bluesky", "mastodon"],
  "query": "crypto",
  "emitWebhooks": true,
  "webhooks": [
    {
      "url": "https://your-api.com/webhook",
      "headers": {"Authorization": "Bearer YOUR_TOKEN"},
      "mode": "per_item",
      "platforms": ["bluesky"]
    }
  ]
}
```
Input Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `platforms` | Array | Yes | Platforms to scrape: `["bluesky", "mastodon"]` |
| `query` | String | Conditional | Keywords to search for |
| `handles` | Array | Conditional | Specific user handles to track |
| `since` | String | No | Start date (ISO 8601) |
| `until` | String | No | End date (ISO 8601) |
| `maxItems` | Integer | No | Max posts to collect (default: 1000) |
| `languages` | Array | No | Language codes (e.g., `["en", "de"]`) |
| `includeReplies` | Boolean | No | Include reply posts (default: false) |
| `emitWebhooks` | Boolean | No | Enable webhook delivery |
| `webhooks` | Array | No | Webhook endpoint configurations |
| `blueskyCredentials` | Object | No | Optional auth for higher rate limits |
| `mastodonInstances` | Array | No | Specific Mastodon instances to search |
| `maxConcurrency` | Integer | No | Concurrent requests (default: 5) |
| `dryRun` | Boolean | No | Test mode without storing data |
Note: You must provide either `query` OR `handles` (or both).
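The rule in the note above can be checked before a run is started. The following is a minimal sketch, not part of the actor: `validate_input` and `KNOWN_PLATFORMS` are hypothetical names, and the only rules it encodes are the ones stated in this README (the two documented platform names, and `query` or `handles` being present).

```python
from typing import Any

# The two platform identifiers documented in the parameter table.
KNOWN_PLATFORMS = {"bluesky", "mastodon"}


def validate_input(run_input: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means no problem was found."""
    problems = []

    # Flag platform names that this README does not document.
    unknown = [p for p in run_input.get("platforms", []) if p not in KNOWN_PLATFORMS]
    if unknown:
        problems.append(f"unknown platforms: {unknown}")

    # The note requires query, handles, or both.
    if not run_input.get("query") and not run_input.get("handles"):
        problems.append("provide query, handles, or both")

    return problems


if __name__ == "__main__":
    print(validate_input({"platforms": ["bluesky"], "query": "climate change"}))
    print(validate_input({"platforms": ["bluesky"]}))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.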
Output Schema
Each post is normalized to a unified format:
```json
{
  "platform": "bluesky",
  "postId": "at://did:plc:xyz/app.bsky.feed.post/3kff...",
  "url": "https://bsky.app/profile/jay.bsky.social/post/3kff...",
  "text": "Building the future of social media...",
  "language": "en",
  "author": {
    "handle": "jay.bsky.social",
    "did": "did:plc:xyz",
    "displayName": "Jay Graber",
    "profileUrl": "https://bsky.app/profile/jay.bsky.social"
  },
  "createdAt": "2025-10-08T10:30:00Z",
  "metrics": {
    "replies": 42,
    "reposts": 128,
    "likes": 567,
    "quotes": 23
  },
  "entities": {
    "hashtags": ["decentralization", "atproto"],
    "mentions": ["@handle1.bsky.social"]
  },
  "media": [
    {
      "type": "image",
      "url": "https://cdn.bsky.app/...",
      "alt": "Screenshot of the app"
    }
  ],
  "source": {
    "instance": null
  },
  "references": {
    "replyTo": null,
    "quotedPost": "at://did:plc:..."
  },
  "ingest_meta": {
    "first_seen_at": "2025-10-08T11:00:00Z",
    "adapter_version": "1.0.0"
  }
}
```
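Because every post shares this shape, downstream code can flatten it without platform-specific branches. The sketch below is one possible way to turn a post into a flat row (for a CSV export or a DataFrame, say). `to_row` is a hypothetical helper, and it assumes that optional sections such as `metrics`, `entities`, `media`, and `references` may be missing or null on some posts, which this README does not confirm either way.

```python
from datetime import datetime
from typing import Any


def to_row(post: dict[str, Any]) -> dict[str, Any]:
    """Flatten one normalized post into a single-level dict."""
    metrics = post.get("metrics") or {}
    entities = post.get("entities") or {}
    references = post.get("references") or {}

    # createdAt uses a trailing "Z"; swap it for an explicit offset so that
    # datetime.fromisoformat accepts it on older Python versions too.
    created_at = datetime.fromisoformat(post["createdAt"].replace("Z", "+00:00"))

    return {
        "platform": post["platform"],
        "handle": post["author"]["handle"],
        "created_at": created_at,
        "text": post.get("text", ""),
        "likes": metrics.get("likes", 0),
        "reposts": metrics.get("reposts", 0),
        "hashtags": " ".join(entities.get("hashtags", [])),
        "media_count": len(post.get("media") or []),
        "is_reply": references.get("replyTo") is not None,
    }
```

Treating absent sections as empty keeps the function usable on posts that carry no media or metrics.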
Authentication
Bluesky (Optional)
Works without authentication for public data. For higher rate limits:
```json
{
  "blueskyCredentials": {
    "identifier": "your-handle.bsky.social",
    "password": "your-app-password"
  }
}
```
Get app password: Settings → App Passwords → Add App Password
Mastodon
No authentication required for public posts.
Mastodon Instance Support
Auto-Detection
The actor automatically detects Mastodon instances from handles:
```json
{
  "handles": ["@user@mastodon.social", "@dev@fosstodon.org"]
}
```
Manual Configuration
Specify instances explicitly:
```json
{
  "mastodonInstances": ["mastodon.social", "mas.to", "fosstodon.org"]
}
```
Webhooks
Send posts to your endpoints in real-time:
```json
{
  "emitWebhooks": true,
  "webhooks": [
    {
      "url": "https://api.example.com/posts",
      "headers": {
        "Authorization": "Bearer YOUR_TOKEN",
        "Content-Type": "application/json"
      },
      "secret": "shared-secret-key",
      "mode": "per_item",
      "platforms": ["bluesky", "mastodon"]
    }
  ]
}
```
Webhook Modes:
- `per_item`: Send each post individually
- `batch`: Send posts in batches (coming soon)
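On the receiving side, an endpoint would typically check the configured `Authorization` header and then parse the body. The sketch below shows that logic as a framework-independent function. It rests on assumptions this README does not confirm: that a `per_item` delivery carries a single normalized post as its JSON body, and that the configured headers arrive unchanged. `handle_delivery` is a hypothetical name. How the `secret` field is applied is not described here, so the sketch does not attempt any signature check.

```python
import hmac
import json
from typing import Any, Optional


def handle_delivery(
    headers: dict[str, str], body: bytes, expected_token: str
) -> tuple[int, Optional[dict[str, Any]]]:
    """Return (HTTP status to reply with, parsed post or None)."""
    supplied = headers.get("Authorization", "")
    expected = f"Bearer {expected_token}"

    # Constant-time comparison avoids leaking the token through timing.
    if not hmac.compare_digest(supplied.encode("utf-8"), expected.encode("utf-8")):
        return 401, None

    try:
        post = json.loads(body)
    except ValueError:  # covers invalid JSON and invalid UTF-8
        return 400, None

    # Assumption: one normalized post per request, identified by postId.
    if not isinstance(post, dict) or "postId" not in post:
        return 422, None

    return 200, post
```

Keeping the check separate from any web framework makes it straightforward to unit-test and to reuse behind whichever HTTP server hosts the endpoint.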
Pricing
Pay-Per-Event Model: Only pay for posts you collect
- $0.002 per post ($2 per 1,000 posts)
- No compute time charges
- No setup fees
- Cancel anytime
Examples:
- 100 posts = $0.20
- 1,000 posts = $2.00
- 10,000 posts = $20.00
- 100,000 posts = $200.00
Simple, transparent pricing - you only pay for what you use.
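The per-post rate makes it easy to budget a run up front. As a small sketch (the function names are illustrative, and the only figure taken from this README is the $0.002 rate), the following computes the cost of a run and the largest `maxItems` value that fits a given budget:

```python
from decimal import Decimal

# Rate stated in the pricing section: $0.002 per post.
PRICE_PER_POST = Decimal("0.002")


def estimate_cost(posts: int) -> Decimal:
    """Cost in USD for collecting the given number of posts."""
    if posts < 0:
        raise ValueError("posts must be non-negative")
    return PRICE_PER_POST * posts


def max_posts_for_budget(budget_usd: str) -> int:
    """Largest post count whose cost does not exceed the budget."""
    budget = Decimal(budget_usd)
    if budget < 0:
        raise ValueError("budget must be non-negative")
    return int(budget // PRICE_PER_POST)


if __name__ == "__main__":
    print(f"${estimate_cost(10_000):.2f}")
    print(max_posts_for_budget("5.00"))
```

`Decimal` is used instead of `float` so that fractions of a cent do not pick up binary rounding error.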
Scheduling
Run every hour
```
0 * * * *
```
Run daily at midnight
```
0 0 * * *
```
Run every 15 minutes
```
*/15 * * * *
```
Deduplication
The actor automatically:
- Tracks seen posts with state management
- Skips duplicates across runs
- Cleans up old state entries (30+ days)
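The three behaviours listed above can be illustrated with a small in-memory sketch. This is not the actor's implementation: the state layout (a dict keyed by platform and `postId`), the `filter_new` name, and the choice to expire entries by first-seen time are all assumptions made for illustration. Only the 30-day retention figure comes from this README.

```python
from datetime import datetime, timedelta
from typing import Any

RETENTION = timedelta(days=30)  # retention period stated in the list above

SeenState = dict[tuple[str, str], datetime]


def filter_new(
    posts: list[dict[str, Any]], seen: SeenState, now: datetime
) -> list[dict[str, Any]]:
    """Return posts not seen before and record them in `seen` (mutated in place)."""
    # Clean up state entries older than the retention period.
    expired = [key for key, first_seen in seen.items() if now - first_seen > RETENTION]
    for key in expired:
        del seen[key]

    fresh = []
    for post in posts:
        # Post IDs are only unique within a platform, so key on both.
        key = (post["platform"], post["postId"])
        if key in seen:
            continue  # duplicate within this run or from an earlier run
        seen[key] = now
        fresh.append(post)
    return fresh
```

Persisting `seen` between runs (in a key-value store, for example) is what would let duplicates be skipped across runs and not only within one.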
Performance
- Speed: ~100-200 posts/minute per platform
- Rate Limits: Respects platform rate limits automatically
- Concurrency: Configurable (1-20 concurrent requests)
- Memory: ~256MB typical, ~512MB for large runs
Advanced Configuration
Language Filtering
```json
{
  "languages": ["en", "de", "ja", "es"]
}
```
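How the actor compares these codes with each post's `language` field is not specified here. One common convention is the basic filtering scheme from RFC 4647, in which a range such as `en` also matches more specific tags such as `en-US`. The sketch below implements that convention as an assumption, for filtering results on the client side. `matches_language` is a hypothetical helper.

```python
from typing import Optional


def matches_language(tag: Optional[str], ranges: list[str]) -> bool:
    """RFC 4647 basic filtering: exact match, or prefix match on a '-' boundary."""
    if not tag:
        return False  # posts without a language tag never match
    tag = tag.lower()
    for language_range in ranges:
        language_range = language_range.lower()
        if language_range == "*":
            return True  # wildcard range matches any tag
        if tag == language_range or tag.startswith(language_range + "-"):
            return True
    return False


if __name__ == "__main__":
    posts = [{"language": "en-US"}, {"language": "de"}, {"language": None}]
    kept = [p for p in posts if matches_language(p["language"], ["en", "ja"])]
    print(kept)
```

Matching on the `-` boundary keeps `en` from accidentally matching an unrelated tag that merely starts with the same letters.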
Date Range
```json
{
  "since": "2025-09-01T00:00:00Z",
  "until": "2025-10-01T00:00:00Z"
}
```
Include Replies
```json
{
  "includeReplies": true
}
```
Dry Run (Testing)
```json
{
  "dryRun": true
}
```
Dataset Views
The actor provides three pre-configured views in Apify Console:
- Overview: All posts with key metrics
- By Platform: Posts grouped by source
- Top Engagement: Sorted by likes/reposts
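The same groupings can be reproduced on exported data. A minimal sketch follows, assuming items follow the normalized output schema. The helper names are illustrative, and the score used for ranking (likes plus reposts) is an assumption, since the exact ordering the Console view applies is not documented here.

```python
from collections import defaultdict
from typing import Any

Post = dict[str, Any]


def by_platform(posts: list[Post]) -> dict[str, list[Post]]:
    """Group posts by their source platform."""
    groups: dict[str, list[Post]] = defaultdict(list)
    for post in posts:
        groups[post["platform"]].append(post)
    return dict(groups)


def top_engagement(posts: list[Post], limit: int = 10) -> list[Post]:
    """Return the `limit` posts with the highest likes + reposts."""

    def score(post: Post) -> int:
        metrics = post.get("metrics") or {}
        return metrics.get("likes", 0) + metrics.get("reposts", 0)

    return sorted(posts, key=score, reverse=True)[:limit]
```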
Search Tips
Keyword Search
- Prefer specific terms: "machine learning" rather than "AI"
- Combine keywords: "climate change policy"
- Use quotes for exact phrases (Bluesky only)
Handle Formats
- Bluesky: `jay.bsky.social` or `handle.domain.com`
- Mastodon: `@username@instance.social` or `instance.social/@username`
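To make the formats concrete, here is a rough sketch of how a handle string could be classified into platform, username, and instance. It is an illustration under assumptions, not the actor's detection logic: `parse_handle` is a hypothetical name, the regular expressions are deliberately permissive, and any bare domain-like string is treated as a Bluesky handle.

```python
import re
from typing import Optional

# "@username@instance"
_MASTODON_AT = re.compile(r"@([^@\s/]+)@([^@\s/]+)")
# "instance/@username", with an optional URL scheme in front
_MASTODON_URL = re.compile(r"(?:https?://)?([^@\s/]+)/@([^@\s/]+)")
# "name.domain.tld" with at least one dot
_BLUESKY = re.compile(r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def parse_handle(handle: str) -> dict[str, Optional[str]]:
    """Classify a handle string; raise ValueError if no format matches."""
    handle = handle.strip()

    match = _MASTODON_AT.fullmatch(handle)
    if match:
        return {"platform": "mastodon", "username": match.group(1), "instance": match.group(2)}

    match = _MASTODON_URL.fullmatch(handle)
    if match:
        return {"platform": "mastodon", "username": match.group(2), "instance": match.group(1)}

    if _BLUESKY.fullmatch(handle):
        return {"platform": "bluesky", "username": handle.lower(), "instance": None}

    raise ValueError(f"unrecognized handle format: {handle!r}")
```

Checking the two Mastodon forms first matters, because the instance part on its own would otherwise look like a Bluesky domain handle.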
Date Ranges
- Use ISO 8601 format: `2025-10-08T10:30:00Z`
- Timezone: Always UTC (Z suffix)
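Timestamps in this form can be generated rather than typed by hand. The sketch below converts any timezone-aware `datetime` to the UTC `Z` form and builds a `since`/`until` pair for a trailing window. Both helper names are illustrative, and only the format itself comes from the tips above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def to_utc_z(moment: datetime) -> str:
    """Format an aware datetime as ISO 8601 in UTC with a Z suffix."""
    if moment.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone before converting")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def last_days_window(days: int, now: Optional[datetime] = None) -> dict[str, str]:
    """Build since/until values covering the last `days` days."""
    end = now if now is not None else datetime.now(timezone.utc)
    return {"since": to_utc_z(end - timedelta(days=days)), "until": to_utc_z(end)}


if __name__ == "__main__":
    run_input = {"platforms": ["bluesky"], "query": "climate change", **last_days_window(7)}
    print(run_input)
```

Rejecting naive datetimes avoids silently treating local time as UTC, which would shift the window by the local offset.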
Limitations
- Bluesky: Keyword search uses searchActors workaround (may be slower than native search)
- Mastodon: Search quality depends on instance search capabilities
- Rate Limits: Public APIs have rate limits (authentication increases limits)
- Historical Data: Availability depends on platform retention policies
Support
- Email: kontakt@barrierefix.de
- Issues: Report bugs or request features
- Documentation: Full API docs in source code
License
MIT License - Free to use commercially and privately
Tags
`bluesky` `mastodon` `at-protocol` `fediverse` `social-media` `scraper` `aggregator` `decentralized` `web3` `social-listening` `brand-monitoring` `sentiment-analysis` `market-research` `data-collection` `apify`
Built by Barrierefix | Powered by Apify