Bluesky Social Scraper — Posts, Profiles & Engagement

Under maintenance

Pricing

Pay per usage

Scrape Bluesky posts, profiles, and engagement metrics via the public AT Protocol API. Extract post text, likes, reposts, replies, images, and author data. No login needed. Export JSON, CSV, Excel.

Rating: 0.0 (0)

Developer: Ricardo Akiyoshi

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 7 hours ago

Bluesky Social Scraper

Scrape posts, profiles, and engagement data from Bluesky Social (bsky.app). Built on the AT Protocol public API — no login required, no authentication tokens, no scraping workarounds.

Why Bluesky?

Bluesky is a fast-growing decentralized social network built on the AT Protocol. Unlike Twitter/X, Bluesky exposes public read endpoints that anyone can query without authentication. This makes it well suited to data collection, brand monitoring, and social intelligence.

What It Does

This actor extracts structured data from Bluesky via two methods:

Primary: AT Protocol Public API (fast, reliable, structured)

  • Profile data: display name, handle, DID, bio, follower/following counts, avatar
  • Author feed: all posts from a user's timeline with full engagement metrics
  • Search: find posts matching keywords, hashtags, or phrases

Fallback: Web Scraping (for edge cases when API is unavailable)

  • CheerioCrawler-based extraction from bsky.app web pages
  • DOM parsing with React hydration data extraction

Post Data (per post)

| Field | Type | Description |
| --- | --- | --- |
| author | string | Display name of the post author |
| handle | string | Bluesky handle (e.g., jay.bsky.social) |
| text | string | Full post text content |
| likes | number | Like count |
| reposts | number | Repost count |
| replies | number | Reply count |
| postedAt | string | ISO 8601 timestamp |
| images | array | URLs of attached images |
| url | string | Direct web link to the post |
| isReply | boolean | Whether the post is a reply |
| hashtags | array | Hashtags extracted from rich text facets |
| mentions | array | Mentioned user DIDs |
| links | array | URLs mentioned in the post |
| languages | array | Language codes (e.g., ["en"]) |
| externalLink | object | Attached link card data (uri, title, description) |
| quotedPost | object | Quoted post summary (text, author, handle) |
| isRepost | boolean | Whether this is a repost by someone else |
| authorFollowers | number | Author's follower count at time of scraping |
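The hashtags, mentions, and links fields come from rich text facets on the post record. A minimal sketch of that extraction is shown below, assuming the standard `app.bsky.richtext.facet` feature types; `extract_facets` is a hypothetical helper, and the sample record (including the DID `did:plc:example`) is made up for illustration.

```python
def extract_facets(record: dict) -> dict:
    """Collect hashtags, mentioned DIDs, and link URIs from a post record's facets."""
    hashtags, mentions, links = [], [], []
    for facet in record.get("facets") or []:
        for feature in facet.get("features") or []:
            kind = feature.get("$type", "")
            if kind == "app.bsky.richtext.facet#tag":
                hashtags.append(feature["tag"])
            elif kind == "app.bsky.richtext.facet#mention":
                mentions.append(feature["did"])
            elif kind == "app.bsky.richtext.facet#link":
                links.append(feature["uri"])
    return {"hashtags": hashtags, "mentions": mentions, "links": links}


# Made-up sample record; byte offsets index into the UTF-8 encoded text.
sample_record = {
    "text": "Reading the #ATProtocol docs with @alice.example https://atproto.com",
    "facets": [
        {"index": {"byteStart": 12, "byteEnd": 23},
         "features": [{"$type": "app.bsky.richtext.facet#tag", "tag": "ATProtocol"}]},
        {"index": {"byteStart": 34, "byteEnd": 48},
         "features": [{"$type": "app.bsky.richtext.facet#mention", "did": "did:plc:example"}]},
        {"index": {"byteStart": 49, "byteEnd": 68},
         "features": [{"$type": "app.bsky.richtext.facet#link", "uri": "https://atproto.com"}]},
    ],
}

parsed = extract_facets(sample_record)
```

Reading facets rather than pattern-matching the text avoids false positives, since a facet exists only where the client marked a span as a tag, mention, or link.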

Profile Data (per handle)

FieldTypeDescription
handlestringBluesky handle
displayNamestringDisplay name
didstringDecentralized Identifier (DID)
descriptionstringBio/description text
followersCountnumberNumber of followers
followsCountnumberNumber of accounts followed
postsCountnumberTotal posts authored
avatarstringAvatar image URL
createdAtstringAccount creation date

Features

  • No authentication needed — Bluesky's AT Protocol API is public
  • Fast API-first approach — direct XRPC calls, no browser rendering
  • Web fallback — CheerioCrawler backup if API is unavailable
  • Automatic pagination — follows cursors through all feed pages
  • Post deduplication — tracks URIs to prevent duplicate entries
  • Rate limiting — polite crawling with jitter and burst protection
  • Exponential backoff — automatic retry with increasing delays
  • Reply filtering — optionally exclude replies for cleaner data
  • Rich text parsing — extracts hashtags, mentions, and links from facets
  • Image extraction — collects all attached image URLs
  • Quote post detection — identifies and extracts quoted post content
  • Repost detection — identifies reposts with original author attribution
  • Proxy support — configurable proxy for high-volume runs
  • Progress tracking — periodic progress logs with stats
  • Run summary — final summary record in the dataset
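Automatic pagination and post deduplication, as listed above, can be sketched together. The example assumes the response shape of `app.bsky.feed.getAuthorFeed` (a `feed` list plus an optional `cursor`); `paginate` and the in-memory `PAGES` fixture are hypothetical and stand in for real API calls.

```python
from typing import Callable, Iterator, Optional


def paginate(
    fetch_page: Callable[[Optional[str]], dict],
    max_posts: int = 200,
) -> Iterator[dict]:
    """Follow cursors page by page, skipping posts whose URI was already seen.

    max_posts == 0 means unlimited, matching the actor's input convention.
    """
    seen = set()
    cursor = None
    emitted = 0
    while True:
        page = fetch_page(cursor)
        for item in page.get("feed", []):
            uri = item["post"]["uri"]
            if uri in seen:
                continue  # duplicate across pages (e.g. a repost of an earlier item)
            seen.add(uri)
            yield item
            emitted += 1
            if max_posts and emitted >= max_posts:
                return
        cursor = page.get("cursor")
        if not cursor:
            return  # no cursor means the last page was reached


# Made-up two-page fixture keyed by cursor; "at://a/2" appears on both pages.
PAGES = {
    None: {"feed": [{"post": {"uri": "at://a/1"}}, {"post": {"uri": "at://a/2"}}], "cursor": "p2"},
    "p2": {"feed": [{"post": {"uri": "at://a/2"}}, {"post": {"uri": "at://a/3"}}]},
}

uris = [item["post"]["uri"] for item in paginate(PAGES.get)]
```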

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| profileHandles | array | [] | Bluesky @handles to scrape (e.g., jay.bsky.social) |
| searchTerms | array | [] | Keywords to search for in posts |
| maxPosts | integer | 200 | Maximum total posts to scrape (0 = unlimited) |
| includeReplies | boolean | false | Include reply posts in results |
| proxy | object | - | Proxy configuration for rate limit avoidance |
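Before starting a run, it can help to validate an input object against the table above. The sketch below applies the documented defaults; stripping a leading `@` and requiring at least one handle or search term are assumptions about sensible behavior, not confirmed actor rules, and `normalize_input` is a hypothetical helper.

```python
def normalize_input(raw: dict) -> dict:
    """Apply the documented defaults and basic sanity checks to an actor input."""
    # Assumption: handles may be given with or without a leading "@".
    handles = [h.strip().lstrip("@") for h in raw.get("profileHandles", []) if h.strip()]
    terms = [t.strip() for t in raw.get("searchTerms", []) if t.strip()]
    # Assumption: a run with nothing to scrape is treated as an error.
    if not handles and not terms:
        raise ValueError("Provide at least one profile handle or search term")

    max_posts = int(raw.get("maxPosts", 200))
    if max_posts < 0:
        raise ValueError("maxPosts must be 0 (unlimited) or a positive integer")

    normalized = {
        "profileHandles": handles,
        "searchTerms": terms,
        "maxPosts": max_posts,
        "includeReplies": bool(raw.get("includeReplies", False)),
    }
    if "proxy" in raw:
        normalized["proxy"] = raw["proxy"]  # passed through unchanged
    return normalized


run_input = normalize_input({"profileHandles": ["@jay.bsky.social "]})
```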

Example: Scrape Profiles

{
  "profileHandles": [
    "jay.bsky.social",
    "pfrazee.com",
    "bsky.app"
  ],
  "maxPosts": 500,
  "includeReplies": false
}

Example: Search Posts

{
  "searchTerms": [
    "artificial intelligence",
    "#buildinpublic",
    "startup funding"
  ],
  "maxPosts": 1000
}

Example: Combined Profiles and Search

{
  "profileHandles": ["jay.bsky.social"],
  "searchTerms": ["bluesky api", "AT Protocol"],
  "maxPosts": 300,
  "includeReplies": true
}

Output

Post Record

{
  "type": "post",
  "author": "Jay Graber",
  "handle": "jay.bsky.social",
  "text": "Excited to share our latest protocol update! The AT Protocol now supports...",
  "likes": 1523,
  "reposts": 412,
  "replies": 89,
  "postedAt": "2026-02-28T18:30:00.000Z",
  "images": [
    "https://cdn.bsky.app/img/feed_fullsize/plain/did:plc:xxx/bafkrei..."
  ],
  "url": "https://bsky.app/profile/jay.bsky.social/post/3kmxxxxxx",
  "isReply": false,
  "hashtags": ["ATProtocol", "decentralized"],
  "mentions": [],
  "links": ["https://atproto.com/blog/update"],
  "languages": ["en"],
  "externalLink": {
    "uri": "https://atproto.com/blog/update",
    "title": "AT Protocol Update",
    "description": "Latest changes to the AT Protocol specification."
  },
  "quotedPost": null,
  "isRepost": false,
  "authorFollowers": 125000,
  "scrapedAt": "2026-03-01T10:00:00.000Z"
}
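The `url` field in the record above is a bsky.app web link, while the API identifies posts by AT URI. One way to derive the web link is sketched below; `post_web_url` is a hypothetical helper, and the DID and record key in the example are made-up placeholders.

```python
def post_web_url(at_uri: str, handle: str) -> str:
    """Convert an AT URI (at://<did>/app.bsky.feed.post/<rkey>) to a bsky.app link."""
    prefix = "at://"
    if not at_uri.startswith(prefix):
        raise ValueError(f"Not an AT URI: {at_uri}")
    parts = at_uri[len(prefix):].split("/")
    # Expect exactly: authority (DID), collection, record key.
    if len(parts) != 3 or parts[1] != "app.bsky.feed.post":
        raise ValueError(f"Not a post URI: {at_uri}")
    rkey = parts[2]
    return f"https://bsky.app/profile/{handle}/post/{rkey}"


# Placeholder DID and record key, for illustration only.
example_url = post_web_url(
    "at://did:plc:example/app.bsky.feed.post/3kexamplekey",
    "jay.bsky.social",
)
```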

Profile Record

{
  "type": "profile",
  "handle": "jay.bsky.social",
  "displayName": "Jay Graber",
  "did": "did:plc:oky5czdrnfjpqslsw2a5iclo",
  "description": "CEO @bluesky. Building the AT Protocol.",
  "avatar": "https://cdn.bsky.app/img/avatar/plain/did:plc:xxx/bafkrei...",
  "followersCount": 125000,
  "followsCount": 1200,
  "postsCount": 3400,
  "createdAt": "2023-04-01T00:00:00.000Z",
  "url": "https://bsky.app/profile/jay.bsky.social",
  "scrapedAt": "2026-03-01T10:00:00.000Z"
}

Run Summary Record

{
  "type": "runSummary",
  "totalPostsSaved": 500,
  "profilesScraped": 3,
  "searchResultsScraped": 200,
  "handlesCompleted": ["jay.bsky.social", "pfrazee.com"],
  "searchesCompleted": ["AT Protocol"],
  "duplicatesSkipped": 12,
  "errors": 0,
  "elapsedSeconds": 45.3,
  "apiRequests": 28
}
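Because one dataset mixes post, profile, and run summary records, consumers will usually want to split items on the `type` field first. A minimal sketch follows, with made-up sample items and a hypothetical helper name `group_by_type`.

```python
from collections import defaultdict


def group_by_type(items):
    """Bucket dataset items by their "type" field (post, profile, runSummary)."""
    groups = defaultdict(list)
    for item in items:
        groups[item.get("type", "unknown")].append(item)
    return dict(groups)


# Made-up items shaped like the records above (most fields omitted).
sample_items = [
    {"type": "profile", "handle": "jay.bsky.social"},
    {"type": "post", "handle": "jay.bsky.social", "likes": 1523},
    {"type": "post", "handle": "jay.bsky.social", "likes": 12},
    {"type": "runSummary", "totalPostsSaved": 2},
]

groups = group_by_type(sample_items)
top_post = max(groups["post"], key=lambda post: post["likes"])
```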

Use Cases

  • Social Monitoring — Track mentions of your brand, product, or competitors on Bluesky
  • Brand Tracking — Monitor what key influencers and thought leaders are posting
  • Trend Analysis — Identify trending topics, hashtags, and conversations
  • Sentiment Analysis — Feed post text into NLP models for sentiment classification
  • Market Research — Understand what your target audience discusses and shares
  • Competitive Intelligence — Track competitor activity and engagement metrics
  • Content Strategy — Discover what content gets the most engagement
  • Influencer Discovery — Find high-engagement accounts in your niche
  • Academic Research — Collect social data for studies on decentralized networks
  • Crisis Monitoring — Real-time tracking of conversations around events or issues

Technical Details

  • Runtime: Node.js 18 on Apify platform
  • Primary method: AT Protocol XRPC public API (no auth required)
  • Fallback method: CheerioCrawler for bsky.app web pages
  • Rate limiting: 200-1500ms between API calls, burst cooldowns every 10 requests
  • Retries: 5 automatic retries with exponential backoff
  • Deduplication: URI-based tracking prevents duplicate posts
  • Memory: ~256 MB recommended for typical runs
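The rate limiting and retry figures above can be illustrated with a small delay calculator. Only the 200-1500 ms window, the burst size of 10, and the 5 retries come from this page; the cooldown length, the backoff base, and the cap are assumptions chosen for the sketch, and the function names are hypothetical.

```python
import random

MIN_DELAY_MS = 200      # from the list above
MAX_DELAY_MS = 1500     # from the list above
BURST_SIZE = 10         # cooldown after every 10 requests
MAX_RETRIES = 5         # from the list above


def polite_delay_ms(request_number: int, burst_cooldown_ms: int = 5000) -> float:
    """Random delay before a request, plus a cooldown after each burst.

    burst_cooldown_ms is an assumed value; the page does not state it.
    """
    delay = random.uniform(MIN_DELAY_MS, MAX_DELAY_MS)
    if request_number > 0 and request_number % BURST_SIZE == 0:
        delay += burst_cooldown_ms
    return delay


def backoff_delay_ms(attempt: int, base_ms: int = 1000, cap_ms: int = 30000) -> int:
    """Exponential backoff: base * 2^attempt, capped (base and cap are assumed)."""
    return min(cap_ms, base_ms * 2 ** attempt)


# Delay before each of the 5 retries, in milliseconds.
schedule = [backoff_delay_ms(attempt) for attempt in range(MAX_RETRIES)]
```

With the assumed base of 1000 ms, the five retries wait 1, 2, 4, 8, and 16 seconds, so a request that keeps failing is abandoned after about 31 seconds of waiting.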

AT Protocol API Endpoints Used

| Endpoint | Purpose |
| --- | --- |
| app.bsky.actor.getProfile | Fetch user profile data |
| app.bsky.feed.getAuthorFeed | Fetch posts from a user's feed |
| app.bsky.feed.searchPosts | Search posts by keyword |
| com.atproto.identity.resolveHandle | Resolve handle to DID |

All endpoints are served from the public Bluesky AppView host at https://public.api.bsky.app/xrpc/ and require no authentication.
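Each endpoint in the table is addressed by appending its name to the base URL above and passing parameters as a query string. The sketch below only builds the request URLs; `xrpc_url` is a hypothetical helper, and the parameter names (`actor`, `limit`, `q`, `handle`) follow the published lexicons for these methods.

```python
from urllib.parse import urlencode

BASE_URL = "https://public.api.bsky.app/xrpc/"


def xrpc_url(method: str, **params) -> str:
    """Build a GET URL for an XRPC query method, dropping unset parameters."""
    query = urlencode({key: value for key, value in params.items() if value is not None})
    return f"{BASE_URL}{method}?{query}" if query else f"{BASE_URL}{method}"


profile_url = xrpc_url("app.bsky.actor.getProfile", actor="jay.bsky.social")
feed_url = xrpc_url("app.bsky.feed.getAuthorFeed", actor="jay.bsky.social", limit=50)
search_url = xrpc_url("app.bsky.feed.searchPosts", q="AT Protocol", limit=25)
resolve_url = xrpc_url("com.atproto.identity.resolveHandle", handle="jay.bsky.social")
```

The resulting URLs can be passed to any HTTP client. Paginated methods also accept a `cursor` parameter, which `xrpc_url` omits when it is `None`.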

Pricing (Pay-Per-Event)

This actor uses Apify's pay-per-event model:

  • post-scraped: $0.003 per post extracted

You set a maximum budget per run, and the actor will never exceed it. Profile records and the run summary are included free.
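Since only post records are billed, a run's cost can be estimated from the post count alone. The sketch below uses the $0.003 rate quoted above and `Decimal` to avoid floating-point rounding; the function names are hypothetical, and any platform charges beyond the per-post event are not modeled.

```python
from decimal import Decimal, ROUND_DOWN

PRICE_PER_POST = Decimal("0.003")  # post-scraped event price from this page


def estimate_cost(posts: int) -> Decimal:
    """Cost in USD for a given number of extracted posts."""
    return PRICE_PER_POST * posts


def max_posts_for_budget(budget_usd: str) -> int:
    """Largest number of posts that fits within a budget, rounded down."""
    posts = Decimal(budget_usd) / PRICE_PER_POST
    return int(posts.to_integral_value(rounding=ROUND_DOWN))


cost_500 = estimate_cost(500)              # 500 posts at $0.003 each
posts_for_5 = max_posts_for_budget("5.00")  # posts affordable with $5.00
```

For example, 500 posts cost $1.50, and a $5.00 budget covers 1,666 posts.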

How to Run

On Apify Platform

  1. Go to the actor page on Apify Store
  2. Click "Start" or "Try for free"
  3. Enter profile handles or search terms
  4. Set max posts and reply preference
  5. Click "Run"
  6. Download results as JSON, CSV, or Excel

Locally

# Clone the repository
git clone <repo-url>
cd bluesky-scraper
# Install dependencies
npm install
# Create input file
mkdir -p ./storage/key_value_stores/default
echo '{"profileHandles": ["jay.bsky.social"], "maxPosts": 20}' > ./storage/key_value_stores/default/INPUT.json
# Run
npm start

Via Apify API

curl -X POST "https://api.apify.com/v2/acts/<ACTOR_ID>/runs" \
  -H "Authorization: Bearer <YOUR_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "profileHandles": ["jay.bsky.social"],
    "searchTerms": ["AT Protocol"],
    "maxPosts": 100,
    "includeReplies": false
  }'

Limitations

  • The AT Protocol public API may rate-limit high-volume requests. Use Apify proxies and reasonable maxPosts values.
  • Search results are limited by the Bluesky search index. Very recent posts may not appear immediately.
  • Web fallback extraction is less reliable than the API method and may miss engagement metrics.
  • The bsky.app web app is rendered with React, so its DOM structure may change without notice and break the web fallback.
  • Private/blocked accounts may return limited or no data.

Comparison with Twitter/X Scraping

| Feature | Bluesky Scraper | Twitter/X Scraper |
| --- | --- | --- |
| Authentication | None required | API keys or login |
| Rate limits | Generous public API | Strict, paid tiers |
| Data access | Full post + profile | Limited by API tier |
| Cost | PPE only | API subscription + PPE |
| Reliability | High (open protocol) | Varies (frequent changes) |

Support

Built by Sovereign AI. For issues, feature requests, or custom scraping needs:

Integration — Python

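from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Input keys follow the actor's input schema (searchTerms, maxPosts).
run = client.actor("sovereigntaylor/bluesky-scraper").call(run_input={
    "searchTerms": ["bluesky"],
    "maxPosts": 50,
})

# The dataset mixes record types, so print only post records.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    if item.get("type") == "post":
        print(f"@{item.get('handle', 'N/A')}: {item.get('text', '')}")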

Integration — JavaScript

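import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Input keys follow the actor's input schema (searchTerms, maxPosts).
const run = await client.actor('sovereigntaylor/bluesky-scraper').call({
    searchTerms: ['bluesky'],
    maxPosts: 50,
});

// The dataset mixes record types, so print only post records.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items
    .filter((item) => item.type === 'post')
    .forEach((item) => console.log(`@${item.handle ?? 'N/A'}: ${item.text ?? ''}`));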