Bluesky & Mastodon Scraper - Decentralized Social Media

Pricing

Pay per event

Developed by BarriereFix. Maintained by Community.

Extract and monitor posts from Bluesky (AT Protocol) and Mastodon (Fediverse). The most comprehensive social media scraper for decentralized networks - perfect for social listening, brand monitoring, market research, sentiment analysis, and AI training data collection.

Last modified

2 days ago

Bluesky & Mastodon Scraper API - Decentralized Social Media Data Aggregator

Extract and monitor posts from Bluesky (AT Protocol) and Mastodon (Fediverse) with a unified, normalized JSON API.

๐Ÿ” Search by keywords โ€ข ๐Ÿ‘ฅ Track specific users โ€ข ๐Ÿ“Š Unified data format โ€ข ๐Ÿช Real-time webhooks โ€ข ๐Ÿ’ฐ Pay-per-post pricing

๐Ÿš€ Features

  • Multi-Platform Support: Scrape Bluesky and Mastodon simultaneously
  • Keyword Search: Find posts mentioning specific terms or phrases
  • Handle Tracking: Monitor specific users across platforms
  • Date Range Filtering: Historical and real-time post collection
  • Unified Schema: Normalized output format across all platforms
  • Intelligent Deduplication: Automatic duplicate detection and removal
  • Real-Time Webhooks: Send posts to your endpoints as they're discovered
  • Language Filtering: Filter posts by language (BCP-47 codes)
  • Pay-Per-Event Pricing: Only pay for posts collected, not compute time
  • No Authentication Required: Works with public data (optional auth for higher limits)

Supported Platforms

Bluesky (AT Protocol)

  • Full keyword search via searchActors workaround
  • User feed tracking
  • Quote posts, replies, reposts, likes
  • Media attachments (images, videos, GIFs)
  • Rich metadata (DIDs, handles, timestamps)

Mastodon (Fediverse)

  • Multi-instance support (mastodon.social, mas.to, fosstodon.org, etc.)
  • Full keyword search across instances
  • User timeline tracking
  • Boosts, replies, favorites
  • Media attachments with alt text
  • Instance-specific data

Use Cases

  • Social Listening: Track brand mentions and industry keywords
  • Market Research: Analyze trends and conversations in your niche
  • Sentiment Analysis: Collect data for AI/ML sentiment models
  • Brand Monitoring: Monitor your company and competitors
  • Academic Research: Study social media behavior and network effects
  • Content Discovery: Find engaging content for curation
  • Influencer Tracking: Monitor key voices in your industry

Quick Start

Example 1: Search by keyword

```json
{
  "platforms": ["bluesky", "mastodon"],
  "query": "artificial intelligence",
  "maxItems": 100,
  "languages": ["en"]
}
```

Example 2: Track specific users

```json
{
  "platforms": ["bluesky", "mastodon"],
  "handles": ["jay.bsky.social", "@gargron@mastodon.social"],
  "maxItems": 500
}
```

Example 3: Historical search with date range

```json
{
  "platforms": ["bluesky"],
  "query": "climate change",
  "since": "2025-09-01T00:00:00Z",
  "until": "2025-10-01T00:00:00Z",
  "maxItems": 1000
}
```

Example 4: Real-time monitoring with webhooks

```json
{
  "platforms": ["bluesky", "mastodon"],
  "query": "crypto",
  "emitWebhooks": true,
  "webhooks": [
    {
      "url": "https://your-api.com/webhook",
      "headers": {"Authorization": "Bearer YOUR_TOKEN"},
      "mode": "per_item",
      "platforms": ["bluesky"]
    }
  ]
}
```

Input Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `platforms` | Array | Yes | Platforms to scrape: `["bluesky", "mastodon"]` |
| `query` | String | No | Keywords to search for |
| `handles` | Array | No | Specific user handles to track |
| `since` | String | No | Start date (ISO 8601) |
| `until` | String | No | End date (ISO 8601) |
| `maxItems` | Integer | No | Max posts to collect (default: 1000) |
| `languages` | Array | No | Language codes (e.g., `["en", "de"]`) |
| `includeReplies` | Boolean | No | Include reply posts (default: false) |
| `emitWebhooks` | Boolean | No | Enable webhook delivery |
| `webhooks` | Array | No | Webhook endpoint configurations |
| `blueskyCredentials` | Object | No | Optional auth for higher rate limits |
| `mastodonInstances` | Array | No | Specific Mastodon instances to search |
| `maxConcurrency` | Integer | No | Concurrent requests (default: 5) |
| `dryRun` | Boolean | No | Test mode without storing data |

Note: You must provide either `query` OR `handles` (or both).
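The rules above (`platforms` is required, and at least one of `query` or `handles` must be present) can be checked locally before starting a run. The following is a minimal sketch, not the actor's own validation: the function name `validate_input` and the error messages are hypothetical, and the actor may apply further checks that are not documented here.

```python
from typing import Any

# Platform names taken from the `platforms` row of the parameter table.
SUPPORTED_PLATFORMS = {"bluesky", "mastodon"}


def validate_input(run_input: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the input looks valid.

    Hypothetical helper: it only mirrors the rules stated in this README.
    """
    problems = []
    platforms = run_input.get("platforms") or []
    if not platforms:
        problems.append("platforms is required")
    unknown = sorted(set(platforms) - SUPPORTED_PLATFORMS)
    if unknown:
        problems.append(f"unsupported platforms: {unknown}")
    # The note above: query, handles, or both must be supplied.
    if not run_input.get("query") and not run_input.get("handles"):
        problems.append("provide query, handles, or both")
    return problems


if __name__ == "__main__":
    print(validate_input({"platforms": ["bluesky"], "query": "atproto"}))
    print(validate_input({"platforms": ["bluesky"]}))
```

Returning a list of problems instead of raising on the first one makes it possible to report every issue with an input in a single pass.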

Output Schema

Each post is normalized to a unified format:

```json
{
  "platform": "bluesky",
  "postId": "at://did:plc:xyz/app.bsky.feed.post/3kff...",
  "url": "https://bsky.app/profile/jay.bsky.social/post/3kff...",
  "text": "Building the future of social media...",
  "language": "en",
  "author": {
    "handle": "jay.bsky.social",
    "did": "did:plc:xyz",
    "displayName": "Jay Graber",
    "profileUrl": "https://bsky.app/profile/jay.bsky.social"
  },
  "createdAt": "2025-10-08T10:30:00Z",
  "metrics": {
    "replies": 42,
    "reposts": 128,
    "likes": 567,
    "quotes": 23
  },
  "entities": {
    "hashtags": ["decentralization", "atproto"],
    "mentions": ["@handle1.bsky.social"]
  },
  "media": [
    {
      "type": "image",
      "url": "https://cdn.bsky.app/...",
      "alt": "Screenshot of the app"
    }
  ],
  "source": {
    "instance": null
  },
  "references": {
    "replyTo": null,
    "quotedPost": "at://did:plc:..."
  },
  "ingest_meta": {
    "first_seen_at": "2025-10-08T11:00:00Z",
    "adapter_version": "1.0.0"
  }
}
```

๐Ÿ” Authentication

Bluesky (Optional)

Works without authentication for public data. For higher rate limits: ```json { "blueskyCredentials": { "identifier": "your-handle.bsky.social", "password": "your-app-password" } } ``` Get app password: Settings โ†’ App Passwords โ†’ Add App Password

Mastodon

No authentication required for public posts.

๐ŸŒ Mastodon Instance Support

Auto-Detection

The actor automatically detects Mastodon instances from handles: ```json { "handles": ["@user@mastodon.social", "@dev@fosstodon.org"] } ```
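To illustrate what instance detection from a handle can look like, here is a small sketch that covers the two Mastodon handle forms this README lists (`@username@instance.social` and `instance.social/@username`). It is an assumption about the general approach, not the actor's actual code, and `mastodon_instance` is a hypothetical name.

```python
from typing import Optional


def mastodon_instance(handle: str) -> Optional[str]:
    """Return the instance host of a Mastodon handle, or None if there is none.

    Hypothetical helper; the actor's real detection logic is not documented.
    """
    handle = handle.strip()
    if "/@" in handle:
        # Form: instance.social/@username (an optional URL scheme is dropped).
        host = handle.split("/@", 1)[0].split("://", 1)[-1]
        return host.lower() or None
    # Form: @username@instance.social
    parts = handle.lstrip("@").split("@")
    if len(parts) == 2 and parts[0] and parts[1]:
        return parts[1].lower()
    # Anything else (for example a Bluesky handle) carries no instance.
    return None


if __name__ == "__main__":
    for h in ["@user@mastodon.social", "fosstodon.org/@dev", "jay.bsky.social"]:
        print(h, "->", mastodon_instance(h))
```

A Bluesky handle such as `jay.bsky.social` contains no `@` separator, so the sketch returns `None` for it.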

Manual Configuration

Specify instances explicitly:

```json
{
  "mastodonInstances": ["mastodon.social", "mas.to", "fosstodon.org"]
}
```

๐Ÿช Webhooks

Send posts to your endpoints in real-time:

```json { "emitWebhooks": true, "webhooks": [ { "url": "https://api.example.com/posts", "headers": { "Authorization": "Bearer YOUR_TOKEN", "Content-Type": "application/json" }, "secret": "shared-secret-key", "mode": "per_item", "platforms": ["bluesky", "mastodon"] } ] } ```

Webhook Modes:

  • `per_item`: Send each post individually
  • `batch`: Send posts in batches (coming soon)
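On the receiving side, a `per_item` delivery can be handled with a few lines of code. The sketch below assumes that each request body is a single post in the unified output format; this README does not spell out the payload shape, so treat that as an assumption. How the `secret` field is used is also not documented here, so the sketch performs no signature check. `summarize_webhook_post` is a hypothetical name.

```python
import json


def summarize_webhook_post(body: bytes) -> str:
    """Turn one webhook request body into a short log line.

    Assumes the body is one post in the unified schema (an assumption).
    """
    post = json.loads(body)
    author = post.get("author", {}).get("handle", "unknown")
    likes = post.get("metrics", {}).get("likes", 0)
    text = post.get("text", "")[:80]  # keep log lines short
    return f"[{post.get('platform', '?')}] {author}: {text} ({likes} likes)"


if __name__ == "__main__":
    sample = json.dumps({
        "platform": "bluesky",
        "text": "Building the future of social media...",
        "author": {"handle": "jay.bsky.social"},
        "metrics": {"likes": 567},
    }).encode()
    print(summarize_webhook_post(sample))
```

Using `.get()` with defaults keeps the handler tolerant of posts that lack metrics or author details.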

Pricing

Pay-Per-Event Model: Only pay for posts you collect

  • $0.002 per post ($2 per 1,000 posts)
  • No compute time charges
  • No setup fees
  • Cancel anytime

Examples:

  • 100 posts = $0.20
  • 1,000 posts = $2.00
  • 10,000 posts = $20.00
  • 100,000 posts = $200.00

Simple, transparent pricing - you only pay for what you use.
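The pricing examples follow directly from the per-post rate, so a run's cost can be estimated up front from `maxItems`. A minimal sketch, assuming the $0.002 rate listed above is the only charge; `estimate_cost` is a hypothetical helper, and `Decimal` is used to avoid floating-point rounding in currency values.

```python
from decimal import Decimal

# USD per collected post, taken from the pricing list above.
PRICE_PER_POST = Decimal("0.002")


def estimate_cost(posts: int) -> Decimal:
    """Estimated cost in USD for collecting `posts` posts."""
    if posts < 0:
        raise ValueError("posts must be non-negative")
    return PRICE_PER_POST * posts


if __name__ == "__main__":
    for n in (100, 1_000, 10_000, 100_000):
        print(f"{n:>7} posts = ${estimate_cost(n):.2f}")
```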

Scheduling

Run every hour

```
0 * * * *
```

Run daily at midnight

```
0 0 * * *
```

Run every 15 minutes

```
*/15 * * * *
```

Deduplication

The actor automatically:

  • Tracks seen posts with state management
  • Skips duplicates across runs
  • Cleans up old state entries (30+ days)
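The three behaviours above amount to a seen-set with an expiry. The sketch below shows the general idea under stated assumptions: posts are keyed by their `postId`, state lives in memory, and the cutoff is exactly 30 days. The actor's real state store and key format are not documented here, and the class name `SeenPosts` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Retention window, based on the "30+ days" cleanup described above.
STATE_TTL = timedelta(days=30)


class SeenPosts:
    """In-memory record of post IDs that were already collected."""

    def __init__(self) -> None:
        self._first_seen: dict[str, datetime] = {}

    def is_new(self, post_id: str, now: datetime) -> bool:
        """Record the post and return True only the first time it is seen."""
        if post_id in self._first_seen:
            return False
        self._first_seen[post_id] = now
        return True

    def prune(self, now: datetime) -> int:
        """Drop entries older than STATE_TTL; return how many were removed."""
        stale = [pid for pid, seen in self._first_seen.items()
                 if now - seen > STATE_TTL]
        for pid in stale:
            del self._first_seen[pid]
        return len(stale)


if __name__ == "__main__":
    seen = SeenPosts()
    now = datetime.now(timezone.utc)
    print(seen.is_new("at://did:plc:xyz/app.bsky.feed.post/1", now))  # first sighting
    print(seen.is_new("at://did:plc:xyz/app.bsky.feed.post/1", now))  # duplicate
```

To skip duplicates across runs, the mapping would have to be persisted between runs, for example in a key-value store.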

Performance

  • Speed: ~100-200 posts/minute per platform
  • Rate Limits: Respects platform rate limits automatically
  • Concurrency: Configurable (1-20 concurrent requests)
  • Memory: ~256MB typical, ~512MB for large runs

๐Ÿ› ๏ธ Advanced Configuration

Language Filtering

```json { "languages": ["en", "de", "ja", "es"] } ```

Date Range

```json { "since": "2025-09-01T00:00:00Z", "until": "2025-10-01T00:00:00Z" } ```

Include Replies

```json { "includeReplies": true } ```

Dry Run (Testing)

```json { "dryRun": true } ```

Dataset Views

The actor provides three pre-configured views in Apify Console:

  1. Overview: All posts with key metrics
  2. By Platform: Posts grouped by source
  3. Top Engagement: Sorted by likes/reposts
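A ranking like the Top Engagement view can also be reproduced on exported data. The sketch below assumes that engagement means likes plus reposts from the `metrics` object of the unified schema; how the Console view actually weighs the two is not stated here, and `top_engagement` is a hypothetical helper.

```python
from typing import Any


def top_engagement(posts: list[dict[str, Any]], limit: int = 10) -> list[dict[str, Any]]:
    """Return the `limit` posts with the highest likes + reposts.

    The scoring rule is an assumption, not the view's documented formula.
    """
    def score(post: dict[str, Any]) -> int:
        metrics = post.get("metrics") or {}
        return metrics.get("likes", 0) + metrics.get("reposts", 0)

    # sorted() is stable, so ties keep their original order.
    return sorted(posts, key=score, reverse=True)[:limit]


if __name__ == "__main__":
    sample = [
        {"postId": "a", "metrics": {"likes": 5, "reposts": 1}},
        {"postId": "b", "metrics": {"likes": 567, "reposts": 128}},
    ]
    print([p["postId"] for p in top_engagement(sample)])
```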

๐Ÿ” Search Tips

  • Use specific terms: "machine learning" vs "AI"
  • Combine keywords: "climate change policy"
  • Use quotes for exact phrases (Bluesky only)

Handle Formats

  • Bluesky: `jay.bsky.social` or `handle.domain.com`
  • Mastodon: `@username@instance.social` or `instance.social/@username`

Date Ranges

  • Use ISO 8601 format: `2025-10-08T10:30:00Z`
  • Timezone: Always UTC (Z suffix)
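Values for `since` and `until` in the format above can be produced from any timezone-aware datetime. A minimal sketch using only the standard library; `to_utc_z` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def to_utc_z(moment: datetime) -> str:
    """Format a timezone-aware datetime as ISO 8601 in UTC with a Z suffix."""
    if moment.tzinfo is None:
        # A naive datetime has no defined offset, so refuse to guess.
        raise ValueError("naive datetime: attach a timezone first")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    # 12:30 at UTC+2 is the same instant as 10:30 UTC.
    local = datetime(2025, 10, 8, 12, 30, tzinfo=timezone(timedelta(hours=2)))
    print(to_utc_z(local))
```

Rejecting naive datetimes avoids silently treating local time as UTC, which would shift the requested date range.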

โš ๏ธ Limitations

  • Bluesky: Keyword search uses searchActors workaround (may be slower than native search)
  • Mastodon: Search quality depends on instance search capabilities
  • Rate Limits: Public APIs have rate limits (authentication increases limits)
  • Historical Data: Availability depends on platform retention policies

Support

  • Email: kontakt@barrierefix.de
  • Issues: Report bugs or request features
  • Documentation: Full API docs in source code

License

MIT License - Free to use commercially and privately

๐Ÿท๏ธ Tags

`bluesky` `mastodon` `at-protocol` `fediverse` `social-media` `scraper` `aggregator` `decentralized` `web3` `social-listening` `brand-monitoring` `sentiment-analysis` `market-research` `data-collection` `apify`


Built by Barrierefix | Powered by Apify