Bluesky Scraper

Scrape Bluesky (bsky.app) posts, profiles, and search results using the public AT Protocol API. No authentication required.

Pricing

Pay per usage

Developer

George Kioko

Maintained by Community

Bluesky Scraper - Extract Posts, Profiles & Search Data from bsky.app

A fast, reliable Bluesky scraper built on the public AT Protocol API. Extract posts, user profiles, and search results from bsky.app without any authentication or browser automation. Just point it at the data you need and go.

The scraper handles pagination, rate limiting, and retries automatically -- so you get clean, structured JSON every time. Whether you're tracking a hashtag, monitoring a competitor, or building a dataset for research, this tool does the heavy lifting.

Key Features

  • Search posts by keyword, hashtag, or topic across all of Bluesky
  • Scrape user feeds -- get every post from a specific handle
  • Extract full profiles with follower counts, bios, and metadata
  • Search for users/actors matching your query
  • No authentication needed -- uses Bluesky's public AT Protocol API
  • Automatic pagination -- fetches up to 10,000 items per query
  • Smart rate limit handling with exponential backoff and retries
  • Rich post data including likes, reposts, replies, hashtags, images, and direct web URLs
  • Pay only for what you scrape -- $0.003 per item

How It Works

flowchart LR
    A["Your Input\n(keywords, handles)"] --> B["Bluesky Scraper"]
    B --> C{"Scrape Type?"}
    C -->|posts| D["app.bsky.feed.searchPosts\napp.bsky.feed.getAuthorFeed"]
    C -->|profiles| E["app.bsky.actor.getProfile\napp.bsky.feed.getAuthorFeed"]
    C -->|search| F["app.bsky.actor.searchActors"]
    D --> G["AT Protocol\nPublic API"]
    E --> G
    F --> G
    G --> H["Parse & Structure\nJSON Data"]
    H --> I["Apify Dataset\n(JSON, CSV, Excel)"]
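As a rough illustration of the step where the scraper calls the public API, the sketch below builds a request URL for the endpoints named in the flowchart. The host `api.bsky.app` is the one mentioned under Limitations; the helper name `build_xrpc_url` is hypothetical, and none of this is taken from the actor's own source.

```python
from urllib.parse import urlencode

# Assumed base URL for the public XRPC interface (host as named under Limitations).
XRPC_BASE = "https://api.bsky.app/xrpc"


def build_xrpc_url(method: str, **params) -> str:
    """Return the GET URL for an XRPC method, skipping parameters set to None."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{XRPC_BASE}/{method}?{query}" if query else f"{XRPC_BASE}/{method}"


# One URL per branch of the flowchart.
print(build_xrpc_url("app.bsky.feed.searchPosts", q="bluesky api", limit=50))
print(build_xrpc_url("app.bsky.feed.getAuthorFeed", actor="jay.bsky.team", limit=50))
print(build_xrpc_url("app.bsky.actor.searchActors", q="web scraping", limit=25))
```

Passing `cursor=None` on the first page and the returned cursor on later pages keeps the same helper usable for paginated calls.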

Input Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| scrapeType | string | Yes | "posts" | What to scrape: posts, profiles, or search |
| searchTerms | string[] | No | [] | Keywords to search for (e.g., ["artificial intelligence", "web scraping"]) |
| userHandles | string[] | No | [] | Bluesky handles to scrape (e.g., ["jay.bsky.team", "pfrazee.com"]) |
| maxResults | integer | No | 100 | Max items per search term or handle (1 to 10,000) |
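If you build the input programmatically, a small check before starting a run can catch mistakes early. The sketch below applies the documented defaults and ranges; `normalize_input` is a hypothetical helper, not part of the actor, and it assumes a missing `scrapeType` falls back to the listed default.

```python
VALID_SCRAPE_TYPES = {"posts", "profiles", "search"}


def normalize_input(raw: dict) -> dict:
    """Apply the documented defaults and reject out-of-range values."""
    scrape_type = raw.get("scrapeType", "posts")
    if scrape_type not in VALID_SCRAPE_TYPES:
        raise ValueError(f"scrapeType must be one of {sorted(VALID_SCRAPE_TYPES)}")

    max_results = raw.get("maxResults", 100)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(max_results, bool) or not isinstance(max_results, int):
        raise ValueError("maxResults must be an integer")
    if not 1 <= max_results <= 10_000:
        raise ValueError("maxResults must be between 1 and 10,000")

    cleaned = {"scrapeType": scrape_type, "maxResults": max_results}
    for key in ("searchTerms", "userHandles"):
        values = raw.get(key, [])
        if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
            raise ValueError(f"{key} must be a list of strings")
        cleaned[key] = values
    return cleaned


print(normalize_input({"searchTerms": ["bluesky api"]}))
```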

Scrape Type Behavior

| Mode | searchTerms | userHandles | What You Get |
| --- | --- | --- | --- |
| posts | Searches posts matching keywords | Fetches the user's feed | Post objects with engagement metrics |
| profiles | Searches for matching users | Gets full profile + their feed | Profile objects + post objects |
| search | Searches for matching users | -- | Profile objects only |
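One way to read the table is as a small dispatch function: each mode decides what happens to every search term and every handle. The sketch below is an interpretation of the table only; the action names are made up for illustration, and it assumes handles are ignored in `search` mode because the table lists no behavior for them.

```python
def plan_tasks(scrape_type: str, search_terms: list[str], user_handles: list[str]) -> list[tuple[str, str]]:
    """Return (action, value) pairs following the scrape type behavior table."""
    if scrape_type == "posts":
        return ([("search_posts", term) for term in search_terms]
                + [("author_feed", handle) for handle in user_handles])
    if scrape_type == "profiles":
        return ([("search_users", term) for term in search_terms]
                + [("profile_and_feed", handle) for handle in user_handles])
    if scrape_type == "search":
        # Assumption: the table shows "--" for userHandles, so handles are skipped.
        return [("search_users", term) for term in search_terms]
    raise ValueError(f"unknown scrapeType: {scrape_type!r}")


for task in plan_tasks("posts", ["bluesky api", "decentralized social"], ["jay.bsky.team"]):
    print(task)
```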

Example Input

{
    "scrapeType": "posts",
    "searchTerms": ["bluesky api", "decentralized social"],
    "userHandles": ["jay.bsky.team"],
    "maxResults": 50
}

Output Data

Post Output

Each scraped post includes engagement metrics, hashtags, images, and a direct link:

{
    "uri": "at://did:plc:abc123/app.bsky.feed.post/xyz789",
    "cid": "bafyreig...",
    "text": "Just shipped a new feature using the AT Protocol. The open social web is happening! #bluesky #atproto",
    "authorHandle": "developer.bsky.social",
    "authorDisplayName": "Dev Builder",
    "authorAvatar": "https://cdn.bsky.app/img/avatar/...",
    "likeCount": 42,
    "repostCount": 12,
    "replyCount": 7,
    "quoteCount": 3,
    "createdAt": "2026-03-25T14:30:00.000Z",
    "indexedAt": "2026-03-25T14:30:01.500Z",
    "hashtags": ["#bluesky", "#atproto"],
    "images": [
        {
            "alt": "Screenshot of the new feature",
            "thumb": "https://cdn.bsky.app/img/feed_thumbnail/...",
            "fullsize": "https://cdn.bsky.app/img/feed_fullsize/..."
        }
    ],
    "langs": ["en"],
    "webUrl": "https://bsky.app/profile/developer.bsky.social/post/xyz789"
}
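The `webUrl` and `hashtags` fields in the sample can be derived from the other fields. The sketch below shows one plausible derivation, checked against the sample values above; both helpers are hypothetical, and the regular expression is a naive stand-in for however the actor actually detects hashtags.

```python
import re


def post_web_url(at_uri: str, handle: str) -> str:
    """Build a bsky.app link from a post's AT URI and the author's handle."""
    prefix = "at://"
    if not at_uri.startswith(prefix):
        raise ValueError("expected an at:// URI")
    # An AT URI for a record has the shape at://<did>/<collection>/<record key>.
    _did, collection, rkey = at_uri[len(prefix):].split("/")
    if collection != "app.bsky.feed.post":
        raise ValueError("not a post record")
    return f"https://bsky.app/profile/{handle}/post/{rkey}"


def extract_hashtags(text: str) -> list[str]:
    """Naive hashtag extraction: '#' followed by one or more word characters."""
    return re.findall(r"#\w+", text)


print(post_web_url("at://did:plc:abc123/app.bsky.feed.post/xyz789", "developer.bsky.social"))
print(extract_hashtags("The open social web is happening! #bluesky #atproto"))
```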

Profile Output

Profile data includes follower/following counts, bio, and account metadata:

{
    "did": "did:plc:abc123",
    "handle": "jay.bsky.team",
    "displayName": "Jay Graber",
    "description": "CEO of Bluesky. Building the open social web.",
    "avatar": "https://cdn.bsky.app/img/avatar/...",
    "banner": "https://cdn.bsky.app/img/banner/...",
    "followersCount": 285000,
    "followsCount": 1200,
    "postsCount": 4500,
    "indexedAt": "2026-03-25T10:00:00.000Z",
    "createdAt": "2023-04-01T00:00:00.000Z",
    "labels": [],
    "webUrl": "https://bsky.app/profile/jay.bsky.team"
}

Use Cases

  • Brand monitoring -- Track mentions of your company, product, or competitors on Bluesky by running the scraper on a recurring basis
  • Market research -- Analyze sentiment and trends around topics in the growing Bluesky community
  • Journalism -- Gather public statements and posts from newsworthy accounts for reporting
  • Academic research -- Build datasets of social media discourse for NLP, network analysis, or sociological studies
  • Influencer discovery -- Find and evaluate Bluesky creators by follower count, engagement, and posting frequency
  • Content strategy -- Study what topics and formats perform best on the platform
  • Competitive intelligence -- Monitor what your industry peers are saying and how their audiences respond

Pricing

This actor uses pay-per-event pricing. You only pay for what you scrape:

| Event | Cost |
| --- | --- |
| item-scraped | $0.003 per item |

A typical run scraping 100 posts costs about $0.30. No monthly fees, no subscriptions -- just pay for results.
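The arithmetic behind that figure is simple enough to script when budgeting a run. The sketch below uses the $0.003 per-item price; the function names are hypothetical, and the upper bound assumes each search term or handle yields at most `maxResults` billed items.

```python
from decimal import Decimal

PRICE_PER_ITEM = Decimal("0.003")  # price of one item-scraped event


def estimate_cost(items: int) -> Decimal:
    """Cost in USD for a given number of scraped items."""
    if items < 0:
        raise ValueError("items must be non-negative")
    return PRICE_PER_ITEM * items


def max_run_cost(num_queries: int, max_results: int) -> Decimal:
    """Upper bound for a run: every search term or handle hits maxResults."""
    return estimate_cost(num_queries * max_results)


print(f"100 posts: ${estimate_cost(100):.2f}")
# Example Input: 2 search terms + 1 handle, maxResults = 50
print(f"Example input, at most: ${max_run_cost(3, 50):.2f}")
```

`Decimal` is used instead of `float` so that small per-item prices add up without rounding drift.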

How to Run

Via Apify Console

  1. Go to the Bluesky Scraper page on Apify Store
  2. Click Start (or Try for free)
  3. Fill in your search terms, handles, and scrape type
  4. Hit Run and download your data as JSON, CSV, or Excel

Via Apify API

curl -X POST "https://api.apify.com/v2/acts/george.the.developer~bluesky-scraper/runs?token=YOUR_API_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{
        "scrapeType": "posts",
        "searchTerms": ["bluesky analytics"],
        "maxResults": 200
    }'

Retrieve results once the run finishes:

curl "https://api.apify.com/v2/acts/george.the.developer~bluesky-scraper/runs/last/dataset/items?token=YOUR_API_TOKEN"

Via Apify CLI

# Install the Apify CLI
npm install -g apify-cli

# Run the actor
apify call george.the.developer/bluesky-scraper -i '{
    "scrapeType": "profiles",
    "userHandles": ["jay.bsky.team", "pfrazee.com"],
    "maxResults": 50
}'

Via Apify JavaScript Client

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('george.the.developer/bluesky-scraper').call({
    scrapeType: 'posts',
    searchTerms: ['decentralized social media'],
    maxResults: 100,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Via Apify Python Client

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("george.the.developer/bluesky-scraper").call(run_input={
    "scrapeType": "posts",
    "searchTerms": ["bluesky data extraction"],
    "maxResults": 100,
})

items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
print(f"Scraped {len(items)} posts")

Limitations

  • Public data only -- This scraper uses the public AT Protocol API. It cannot access private/blocked accounts or DMs.
  • Rate limits -- Bluesky's API enforces rate limits. The scraper handles 429 responses with automatic backoff, but extremely large scrapes may take longer.
  • No authentication -- Because no login is required, the scraper is limited to endpoints available via the public api.bsky.app XRPC interface.
  • Max 10,000 items per query -- Each search term or user handle is capped at 10,000 results per run.
  • Search result ordering -- Results are returned in the order provided by Bluesky's search API (relevance-based). Custom sorting is not available.
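For readers curious what "automatic backoff" on 429 responses typically looks like, here is a minimal sketch of exponential backoff with jitter. It is a generic pattern, not the actor's implementation: `RateLimited`, `with_backoff`, and the default delays are all assumptions, and retries for other server errors are left out.

```python
import random
import time


class RateLimited(Exception):
    """Raised by the caller-supplied fetch function on an HTTP 429 response."""


def with_backoff(fetch, max_retries=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call fetch(); on RateLimited, wait base_delay * 2**attempt (capped) and retry."""
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_retries:
                raise  # retries exhausted, surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% random jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

With the defaults, the waits grow roughly as 1 s, 2 s, 4 s, 8 s, 16 s, which is why very large scrapes can take noticeably longer once the API starts rate limiting.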

FAQ

Do I need a Bluesky account to use this scraper?

No. This scraper uses the public AT Protocol API, which requires no authentication. You don't need a Bluesky account, API key, or any credentials. Just provide your search terms or handles and run it.

What is the AT Protocol and why does it matter?

The AT Protocol (Authenticated Transfer Protocol) is the open, decentralized protocol that powers Bluesky. Because it's designed to be open, much of Bluesky's data is publicly accessible through standardized API endpoints -- which is what this scraper leverages. No reverse engineering or browser automation needed.

Can I scrape posts from a specific date range?

Currently, the scraper returns results in the order provided by Bluesky's search API. Date filtering is not directly supported as an input parameter, but you can filter results by the createdAt field after scraping. The search API tends to return recent and relevant posts first.
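A post-scrape filter on `createdAt` can be as short as the sketch below. It assumes timestamps in the format shown in the sample output (ISO 8601 with a trailing `Z`); `filter_by_date` is a hypothetical helper you would run on the downloaded items.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2026-03-25T14:30:00.000Z."""
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def filter_by_date(items: list[dict], start: datetime, end: datetime) -> list[dict]:
    """Keep items whose createdAt falls in the half-open range [start, end)."""
    return [item for item in items if start <= parse_ts(item["createdAt"]) < end]


items = [
    {"text": "older post", "createdAt": "2026-02-10T08:00:00.000Z"},
    {"text": "march post", "createdAt": "2026-03-25T14:30:00.000Z"},
]
march = filter_by_date(
    items,
    start=datetime(2026, 3, 1, tzinfo=timezone.utc),
    end=datetime(2026, 4, 1, tzinfo=timezone.utc),
)
print([item["text"] for item in march])
```

Both bounds must be timezone-aware, since Python refuses to compare aware and naive datetimes.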

How does this compare to using the Bluesky API directly?

This scraper wraps the raw AT Protocol API with automatic pagination, rate limit handling (with exponential backoff), retry logic for server errors, and clean data parsing. Instead of writing boilerplate code to handle cursors, HTTP errors, and data normalization, you get structured JSON output ready for analysis. It also runs on Apify's infrastructure, so you don't need to manage servers or worry about IP blocks.
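To give a sense of the cursor boilerplate this saves, the sketch below shows a generic cursor-following loop. It is an illustration under assumptions, not the actor's code: `fetch_page` is a hypothetical caller-supplied function that returns one page of items plus the next cursor, or `None` when there are no more pages.

```python
def paginate(fetch_page, max_results=100, page_size=100):
    """Collect up to max_results items by following the cursor returned with each page.

    fetch_page(cursor, limit) must return (items, next_cursor); next_cursor is
    None (or empty) when there is no further page.
    """
    collected = []
    cursor = None  # the first request carries no cursor
    while len(collected) < max_results:
        # Never ask for more than is still needed to reach max_results.
        limit = min(page_size, max_results - len(collected))
        items, cursor = fetch_page(cursor, limit)
        collected.extend(items[:limit])
        if not cursor or not items:
            break  # the API reported the end of the result set
    return collected
```

In practice `fetch_page` would wrap an HTTP call to an endpoint such as `app.bsky.feed.searchPosts` and pass the cursor through as a query parameter.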

This tool accesses Bluesky's public API -- the same endpoints any developer can call. It collects only publicly available data. As with any data collection, you should comply with applicable laws (GDPR, CCPA) and Bluesky's Terms of Service. This tool is intended for legitimate use cases like research, journalism, and brand monitoring.


Built with the Apify SDK and the AT Protocol public API.