Bluesky Scraper - Profiles, Posts, Followers, Search
Scrape Bluesky via the official AT Protocol: profiles, posts, post search, followers, following, threads & custom feeds. No proxy required, no anti-bot — official open API. App password optional.
Pricing
Pay per usage
Developer
Khalil Drissi
Maintained by Community
Bluesky Scraper — Profiles, Posts, Followers & Search via the Open AT Protocol
Scrape any public Bluesky data through the official AT Protocol API — no proxies, no anti-bot fights, no terms-of-service grey areas. One actor covers seven use cases: profile scraping, post feeds, keyword search, follower/following graphs, thread expansion, and custom feed scraping.
Features
| Feature | Detail |
|---|---|
| 7 scraping modes | profile · posts · search · followers · following · thread · feed |
| No proxy needed | Official public API; Bluesky does not block scrapers |
| Auth optional | Most modes work without a Bluesky account; search & custom feeds work best with an app password |
| Structured output | Typed JSON records with consistent field names; covers text, media, embeds, counts |
| Pagination | Automatically pages through all results up to your maxItems cap |
| Resilient | Per-item error isolation; exponential backoff on rate limits (HTTP 429) |
| Pay-per-event | Only pay for what you scrape — profiles, posts, or connections |
| MIT-licensed SDK | Built on the MarshalX atproto Python SDK |
Why Bluesky scraping beats Twitter / X scraping
| Aspect | Bluesky (this actor) | Twitter / X |
|---|---|---|
| API type | Official, documented AT Protocol | Reverse-engineered / unofficial |
| Anti-bot measures | None — open protocol | Heavy: CAPTCHAs, rate-limit bans, IP blocks |
| Proxy cost | Zero — direct connection | High — residential proxies often required |
| Legal standing | Public data, open protocol, no ToS conflict | Grey area; ToS explicitly prohibits scraping |
| Authentication | App password optional (free) | Paid API tiers ($100–$5,000/mo) |
| Data freshness | Real-time | Delayed or restricted on free tiers |
Use Cases
- AI training datasets — collect large-scale post corpora with text, language tags, and engagement signals.
- Social media analytics — track follower growth, post volume, engagement rates across accounts.
- Journalism & OSINT — search posts by keyword and date range, expand threads for context.
- Brand monitoring — monitor mentions of a brand, product, or topic across the network.
- AT Protocol research — study the social graph, feed algorithms, or labeling systems.
Input
Mode
Select exactly one mode per run. Combine modes by running the actor multiple times (trivially parallelizable on the Apify platform).
| Mode | What it returns | Required input fields |
|---|---|---|
| profile | Full profile records for one or more handles/DIDs | handles |
| posts | Recent posts by one or more accounts | handles, maxItems |
| search | Posts matching a keyword query | searchQuery, maxItems (+ auth recommended) |
| followers | Accounts that follow a handle | handles, maxItems |
| following | Accounts a handle follows | handles, maxItems |
| thread | Full reply tree for a post URL | postUrls |
| feed | Posts from a custom feed generator | feedUrls, maxItems (+ auth recommended) |
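The mode table above can be checked on the client side before starting a run. The following is a minimal sketch, not the actor's own validation: the helper name `missing_fields` is hypothetical, and it is stricter than the actor may be, since `mode` and `maxItems` have documented defaults.

```python
from typing import List

# Required fields per mode, transcribed from the mode table above.
REQUIRED_FIELDS = {
    "profile": ["handles"],
    "posts": ["handles", "maxItems"],
    "search": ["searchQuery", "maxItems"],
    "followers": ["handles", "maxItems"],
    "following": ["handles", "maxItems"],
    "thread": ["postUrls"],
    "feed": ["feedUrls", "maxItems"],
}


def missing_fields(actor_input: dict) -> List[str]:
    """Return the required fields that are absent or empty for the chosen mode.

    Hypothetical helper: the actor itself may fall back to defaults
    (for example maxItems = 100) instead of rejecting the input.
    """
    mode = actor_input.get("mode")
    if mode not in REQUIRED_FIELDS:
        raise ValueError(f"unknown mode: {mode!r}")
    return [
        name
        for name in REQUIRED_FIELDS[mode]
        if actor_input.get(name) in (None, "", [])
    ]
```

An empty list means the input names everything the table asks for; a non-empty list tells you which fields to add before submitting the run.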
All input fields
| Field | Type | Default | Description |
|---|---|---|---|
| mode | enum | profile | Scraping mode (required) |
| handles | string[] | — | Bluesky handles or DIDs; used by profile, posts, followers, following |
| searchQuery | string | — | Keyword/phrase to search (search mode) |
| postUrls | string[] | — | Post URLs or at:// URIs (thread mode) |
| feedUrls | string[] | — | Feed generator at:// URIs (feed mode) |
| maxItems | integer | 100 | Max records per handle/query/feed |
| searchSince | string | — | ISO date lower bound for search (e.g. 2024-01-01) |
| searchUntil | string | — | ISO date upper bound for search |
| searchLanguage | string | — | BCP-47 language filter for search (e.g. en) |
| searchSort | enum | latest | latest or top |
| threadDepth | integer | 6 | How many reply levels to expand (max 1000) |
| threadParentHeight | integer | 80 | How many parent levels to walk up (max 1000) |
| blueskyHandle | string | — | Your Bluesky handle (optional auth) |
| blueskyAppPassword | string | — | App password; see the Authentication section |
| proxyConfiguration | object | — | Optional Apify Proxy (rarely needed) |
Example inputs
Profile scrape (no auth needed):
{"mode": "profile","handles": ["bsky.app", "jay.bsky.team", "pfrazee.com"]}
Keyword search:
{"mode": "search","searchQuery": "open source AI","maxItems": 200,"searchSort": "latest","searchLanguage": "en","blueskyHandle": "alice.bsky.social","blueskyAppPassword": "xxxx-xxxx-xxxx-xxxx"}
Follower list:
{"mode": "followers","handles": ["bsky.app"],"maxItems": 500}
Thread expansion:
{"mode": "thread","postUrls": ["https://bsky.app/profile/bsky.app/post/3laahvvjbek2j"],"threadDepth": 10}
Recent posts by a user:
{"mode": "posts","handles": ["pfrazee.com"],"maxItems": 100}
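Thread mode accepts either a bsky.app post URL or an at:// URI in postUrls. If you collect post references from mixed sources, a small normalizer can help you deduplicate them before a run. This is a sketch under assumptions: `PostRef` and `parse_post_ref` are hypothetical names, and the patterns cover only the two shapes shown in this README.

```python
import re
from typing import NamedTuple, Optional


class PostRef(NamedTuple):
    actor: str  # handle or DID, exactly as it appeared in the input
    rkey: str   # record key, the last path segment of the post


# Shapes taken from the examples in this README; other URL forms are not handled.
_URL_RE = re.compile(r"^https://bsky\.app/profile/([^/]+)/post/([^/?#]+)")
_AT_RE = re.compile(r"^at://([^/]+)/app\.bsky\.feed\.post/([^/?#]+)$")


def parse_post_ref(value: str) -> Optional[PostRef]:
    """Extract (actor, rkey) from a post URL or at:// URI, or return None."""
    value = value.strip()
    for pattern in (_URL_RE, _AT_RE):
        match = pattern.match(value)
        if match:
            return PostRef(match.group(1), match.group(2))
    return None
```

Note that a handle and a DID for the same account yield different `actor` values, so deduplication by `PostRef` is only exact when the inputs use one form consistently.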
Output
Record families
The actor produces three types of records, all tagged with a mode field.
Profile record (mode: "profile")
| Field | Type | Example |
|---|---|---|
| mode | string | "profile" |
| scrapedAt | ISO datetime | "2024-11-15T12:34:56.789Z" |
| did | string | "did:plc:z72i7hdynmk6r22z27h6tvur" |
| handle | string | "bsky.app" |
| displayName | string \| null | "Bluesky" |
| description | string \| null | "What's up?" |
| followersCount | integer | 1234567 |
| followsCount | integer | 42 |
| postsCount | integer | 891 |
| avatar | url \| null | "https://cdn.bsky.app/img/avatar/..." |
| banner | url \| null | "https://cdn.bsky.app/img/banner/..." |
| createdAt | ISO datetime \| null | "2022-11-17T00:00:00.000Z" |
| indexedAt | ISO datetime \| null | "2024-01-01T09:00:00.000Z" |
| labels | string[] | [] |
| pinnedPostUri | string \| null | "at://did:plc:.../app.bsky.feed.post/..." |
| profileUrl | url | "https://bsky.app/profile/bsky.app" |
Example:
{"mode": "profile","scrapedAt": "2024-11-15T12:34:56.789000+00:00","did": "did:plc:z72i7hdynmk6r22z27h6tvur","handle": "bsky.app","displayName": "Bluesky","description": "What's up?","followersCount": 1423891,"followsCount": 48,"postsCount": 912,"avatar": "https://cdn.bsky.app/img/avatar/plain/did:plc:z72i7hdynmk6r22z27h6tvur/bafkreiabcd@jpeg","banner": null,"createdAt": "2022-11-17T00:00:00.000Z","indexedAt": "2024-01-01T09:00:00.000Z","labels": [],"pinnedPostUri": null,"profileUrl": "https://bsky.app/profile/bsky.app"}
Post record (mode ∈ posts / search / thread / feed)
| Field | Type | Example |
|---|---|---|
| mode | string | "posts" |
| scrapedAt | ISO datetime | "2024-11-15T12:34:56Z" |
| uri | string | "at://did:plc:.../app.bsky.feed.post/3la..." |
| cid | string | "bafyreia..." |
| authorDid | string | "did:plc:..." |
| authorHandle | string | "alice.bsky.social" |
| authorDisplayName | string \| null | "Alice" |
| text | string | "Hello Bluesky!" |
| createdAt | ISO datetime \| null | "2024-11-15T10:00:00.000Z" |
| indexedAt | ISO datetime \| null | "2024-11-15T10:00:01.000Z" |
| langs | string[] | ["en"] |
| replyCount | integer | 12 |
| repostCount | integer | 34 |
| likeCount | integer | 156 |
| quoteCount | integer | 5 |
| bookmarkCount | integer \| null | null |
| isRepost | boolean | false |
| isReply | boolean | false |
| replyParentUri | string \| null | null |
| replyRootUri | string \| null | null |
| images | array | [{"fullsize": "...", "thumb": "...", "alt": "..."}] |
| externalLink | object \| null | {"uri": "...", "title": "...", "description": "..."} |
| quotedPostUri | string \| null | null |
| quotedPostText | string \| null | null |
| video | object \| null | {"playlist": "...", "thumbnail": "..."} |
| postUrl | url \| null | "https://bsky.app/profile/alice.bsky.social/post/3la..." |
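The uri and postUrl fields in the table above describe the same post in two notations. If postUrl comes back null, or you only stored the uri, the web URL can usually be rebuilt from the at:// URI. The sketch below illustrates that mapping; it is an assumption that the actor derives postUrl the same way, and `at_uri_to_post_url` is a hypothetical helper name.

```python
from typing import Optional


def at_uri_to_post_url(uri: str, handle: Optional[str] = None) -> Optional[str]:
    """Map at://<did>/app.bsky.feed.post/<rkey> to a bsky.app post URL.

    If a handle is supplied (for example the record's authorHandle) it is
    used in the URL; otherwise the DID from the URI is used.
    Returns None for anything that is not a post URI.
    """
    prefix = "at://"
    if not uri.startswith(prefix):
        return None
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or parts[1] != "app.bsky.feed.post":
        return None
    authority, _collection, rkey = parts
    return f"https://bsky.app/profile/{handle or authority}/post/{rkey}"
```

For a post record, calling `at_uri_to_post_url(record["uri"], record["authorHandle"])` gives a link in the same shape as the postUrl example in the table.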
Connection record (mode ∈ followers / following)
| Field | Type | Example |
|---|---|---|
| mode | string | "followers" |
| scrapedAt | ISO datetime | "2024-11-15T12:34:56Z" |
| subjectDid | string | "did:plc:..." (the queried account) |
| subjectHandle | string | "bsky.app" (the queried account) |
| did | string | "did:plc:..." (the follower/followed) |
| handle | string | "bob.bsky.social" |
| displayName | string \| null | "Bob" |
| avatar | url \| null | "https://cdn.bsky.app/img/avatar/..." |
| description | string \| null | "Builder." |
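Connection records from the two modes point in opposite directions: in followers mode the listed account follows the subject, while in following mode the subject follows the listed account. When merging both into one social graph, it helps to normalise every record to a (follower, followed) pair first. The sketch below does that using only the fields in the table above; `to_follow_edge` and `build_graph` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def to_follow_edge(record: dict) -> Tuple[str, str]:
    """Return (follower_did, followed_did) for one connection record."""
    if record["mode"] == "followers":
        # The listed account follows the queried (subject) account.
        return record["did"], record["subjectDid"]
    if record["mode"] == "following":
        # The queried (subject) account follows the listed account.
        return record["subjectDid"], record["did"]
    raise ValueError(f"not a connection record: mode={record['mode']!r}")


def build_graph(records: Iterable[dict]) -> Dict[str, Set[str]]:
    """Adjacency map: follower DID -> set of DIDs that account follows."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for record in records:
        follower, followed = to_follow_edge(record)
        graph[follower].add(followed)
    return dict(graph)
```

Using sets means an edge that shows up in both a followers run and a following run is stored once.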
Pricing
This actor uses pay-per-event pricing — you only pay for records actually pushed to the dataset.
| Event | Price | When charged |
|---|---|---|
| Profile scraped | $0.002 | One full profile record (profile mode) |
| Post scraped | $0.0004 | One post record (posts, search, thread, feed) |
| Connection scraped | $0.0002 | One follower/following edge (followers, following) |
Example costs
| Task | Records | Cost |
|---|---|---|
| 1,000 profiles | 1,000 × $0.002 | $2.00 |
| 10,000 posts (keyword search) | 10,000 × $0.0004 | $4.00 |
| 50,000 follower records | 50,000 × $0.0002 | $10.00 |
| 500 profiles + 5,000 posts | 500 × $0.002 + 5,000 × $0.0004 | $3.00 |
Note: Pay-per-event pricing takes 14 days to take effect after monetization is configured.
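The example costs above follow directly from the per-event prices. To estimate a run before launching it, the arithmetic can be scripted as below. This is a sketch: the prices are copied from the pricing table and may change, and `estimate_cost` is a hypothetical helper that ignores any platform fees outside the per-event charges.

```python
from decimal import Decimal

# Per-record prices in USD, copied from the pricing table above.
PRICE_PER_RECORD = {
    "profile": Decimal("0.002"),
    "post": Decimal("0.0004"),
    "connection": Decimal("0.0002"),
}


def estimate_cost(**counts: int) -> Decimal:
    """Sum price x count over the given record kinds.

    Decimal is used instead of float so that small per-record prices
    add up without binary rounding noise.
    """
    total = Decimal("0")
    for kind, count in counts.items():
        if kind not in PRICE_PER_RECORD:
            raise ValueError(f"unknown record kind: {kind!r}")
        total += PRICE_PER_RECORD[kind] * count
    return total
```

For instance, `estimate_cost(profile=500, post=5000)` reproduces the last row of the example table.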
Authentication
Most modes (profile, posts, followers, following, thread) work without a Bluesky account.
For search and feed modes, the public AppView may require authentication. Provide:
- blueskyHandle: your Bluesky handle, e.g. alice.bsky.social
- blueskyAppPassword: an app password (NOT your main account password)
Creating an app password
1. Log in to bsky.app.
2. Go to Settings → App Passwords: https://bsky.app/settings/app-passwords
3. Click Add App Password, give it a name (e.g. "Apify Scraper"), and copy the generated code.
4. The format is xxxx-xxxx-xxxx-xxxx. Paste it into the blueskyAppPassword input field.
App passwords have limited scope (no DM access, no account deletion) and can be revoked individually at any time without affecting your account. Never enter your main Bluesky password.
Rate Limits & Throughput
Bluesky's public AppView offers generous rate limits for read-only access. The actor:
- Fetches up to 100 records per API request (the maximum page size).
- Automatically retries on HTTP 429 (rate limit) and 5xx errors with exponential backoff, honouring Retry-After headers when present.
- Isolates failures per item: one failed profile/post/edge does not stop the rest of the run.
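The retry behaviour described in the list above can be illustrated with a short sketch. It is not the actor's source code: `HttpError`, `call_with_backoff`, the five-attempt cap, and the one-second base delay are all assumptions chosen for the example, and real implementations often add random jitter to the delay.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class HttpError(Exception):
    """Hypothetical error type carrying the status and any Retry-After value."""

    def __init__(self, status: int, retry_after: Optional[float] = None):
        super().__init__(f"HTTP {status}")
        self.status = status
        self.retry_after = retry_after  # seconds, if the server sent the header


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry on 429 and 5xx, doubling the delay unless Retry-After is given."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return request()
        except HttpError as err:
            retryable = err.status == 429 or 500 <= err.status < 600
            if not retryable or attempt == max_attempts - 1:
                raise  # client errors and the final failure propagate
            if err.retry_after is not None:
                delay = err.retry_after  # the server's hint wins
            else:
                delay = base_delay * 2 ** attempt  # 1s, 2s, 4s, ...
            sleep(delay)
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the function testable without real waiting; per-item isolation then amounts to wrapping each item's call in its own try/except around this helper.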
Practical throughput: expect tens of thousands of records per run without hitting limits, depending on your account's tier and the specific endpoints used. For very large runs (100k+ records), add authentication to benefit from higher per-account limits.
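The 100-record page size and the maxItems cap combine into a simple cursor loop: request a page, emit its items, and stop when the cap is reached or the server returns no further cursor. The sketch below shows that loop in isolation. `paginate` and the `fetch_page(cursor, limit)` callback are hypothetical; in practice the callback would wrap whichever list endpoint the mode uses.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Page = Tuple[List[dict], Optional[str]]  # (items, next cursor or None)


def paginate(
    fetch_page: Callable[[Optional[str], int], Page],
    max_items: int,
    page_size: int = 100,
) -> Iterator[dict]:
    """Yield up to max_items records, following cursors page by page."""
    cursor: Optional[str] = None
    yielded = 0
    while yielded < max_items:
        # Never ask for more than is still needed to reach the cap.
        limit = min(page_size, max_items - yielded)
        items, cursor = fetch_page(cursor, limit)
        for item in items[:limit]:
            yield item
            yielded += 1
        if not cursor or not items:
            break  # the server has no more results
```

Shrinking the final request to the remaining count avoids fetching records that would be discarded, which matters when each pushed record is billed.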
FAQ
1. Will the actor slow down or get blocked at high volumes?
No blocking: this is an official API. If you hit a rate limit, the actor backs off automatically and retries. For sustained high-volume runs, provide an app password to use per-account limits (higher than per-IP limits).
2. Do I need a Bluesky account?
No, for most modes. Profile, posts, followers, following, and thread modes all work unauthenticated. Search and custom-feed modes may return a 401 if used without credentials — just add blueskyHandle + blueskyAppPassword in that case.
3. What data is public on Bluesky?
All posts, profiles, follower graphs, and custom feeds are public by design (AT Protocol is an open, federated network). DMs and muted/blocked relationships are not accessible via the public API.
4. Why might a field be null?
A few reasons: the post was deleted before scraping, the account is deactivated, the field is viewer-scoped and requires auth (e.g. bookmarkCount), or the field is simply optional in the AT Protocol schema (e.g. banner, displayName).
5. How is this better than scraping Twitter/X?
No reverse-engineering, no CAPTCHA, no residential proxy costs, no ToS violation. Bluesky's AT Protocol is documented and open; this actor uses the same API Bluesky's own apps use. See the comparison table at the top of this README.
Legal & Compliance
Bluesky is built on the open AT Protocol. All data scraped by this actor is publicly accessible on the network by design.
- Respect creators: only use scraped content in ways consistent with the rights of the people who created it.
- GDPR / CCPA: if you process personal data from EU or California residents, you are responsible for complying with applicable data-protection law (legal basis, retention limits, subject rights, etc.).
- Redistribution: do not redistribute or commercially exploit scraped content beyond what is permitted by applicable law and the relevant terms.
- Rate limits: do not intentionally bypass rate limits or attempt to extract data at a rate that degrades service for other users.