Bluesky Scraper | All-In-One | $1.5 / 1K

Extract Bluesky posts and full comment threads from searches, author feeds, profile pages, and direct post URLs. Built for enterprise-grade speed, rich data coverage, advanced filtering, and clean JSON for market intelligence, sentiment analysis, and analytics.

Pricing: $1.49 / 1,000 results

Rating: 5.0 (2 reviews)

Developer: Fatih Tahta (Maintained by Community)

Actor stats: 2 bookmarks · 60 total users · 20 monthly active users · last modified a day ago

Bluesky Scraper

Slug: fatihtahta/bluesky-scraper

Overview

Bluesky Scraper collects structured public data from https://bsky.app, including posts, replies, profiles, followers, and follows, with key attributes such as author identity, text content, timestamps, engagement counts, and record URLs. Bluesky is a public social network where conversations, creators, communities, and topic-specific discourse are visible through https://bsky.app, making it useful for research, monitoring, enrichment, and reporting workflows. The actor is built for automated, repeatable collection so teams can run the same input configuration on a schedule and receive machine-readable JSON records in a consistent format. The output is structured for downstream systems, helping reduce manual cleanup before analytics, storage, or enrichment. It is well suited to recurring public-data acquisition where stable record handling and operational consistency matter.

Why Use This Actor

  • Market research and analytics teams: collect structured public conversation and profile data for market intelligence, topic analysis, audience mapping, and operational reporting.
  • Product and content teams: monitor posts, replies, and account activity around keywords or specific authors to support editorial planning, community analysis, and content feedback loops.
  • Developers and data engineering teams: feed normalized JSON records into ETL jobs, warehouses, search systems, and downstream APIs with repeatable collection inputs.
  • Lead generation and enrichment teams: build public profile and network datasets that can support enrichment pipelines, account research, and contact prioritization workflows.
  • Monitoring and competitive tracking teams: run recurring searches or account-based collection for change tracking, trend detection, and alert-oriented monitoring workflows.

Common Use Cases

  • Market intelligence: track topic-level discussion, repost velocity, likes, reply volume, and public author activity for ongoing analysis.
  • Lead generation: collect profile, follower, or follows data to assemble targeted prospect lists from public accounts and communities.
  • Competitive monitoring: monitor named accounts or topic queries over time to spot message shifts, engagement changes, or notable posts.
  • Catalog and directory building: populate internal databases with normalized public profiles, network records, and post metadata.
  • Data enrichment: add current public attributes such as author names, bios, counts, URLs, and engagement metrics to internal CRM or BI records.
  • Recurring reporting: schedule repeated runs to refresh dashboards, weekly summaries, or alert pipelines with current public data.
  • Conversation review: collect posts with replies to analyze thread context, response patterns, or sentiment in public discussions.

Quick Start

  1. Choose the action that matches your goal, such as searching posts, collecting an author feed, getting followers, getting follows, or retrieving profile details.
  2. Add one or more queries values, or provide Bluesky pages in startUrls. You can also use both in the same run.
  3. Set maxItems to a small number for the first run so you can validate the output shape before scaling up.
  4. Run the actor in Apify Console and wait for the dataset to populate.
  5. Inspect the first few dataset items to confirm the record type, fields, and volume match your workflow.
  6. Increase the scope, enable optional enrichments, or schedule recurring runs once the initial validation looks correct.
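The validation run above can also be driven from code. This is a minimal sketch using the apify-client Python package; the token handling and the small maxItems value are illustrative, not part of the actor itself.

```python
# Sketch: start a small validation run of the actor from Python and fetch
# the resulting dataset items. The input mirrors the Quick Start advice of
# validating with a small maxItems before scaling up.
run_input = {
    "actionToPerform": "searchPosts",
    "queries": ["open source intelligence"],
    "maxItems": 5,  # keep the first run small to validate the output shape
    "sortOrder": "latest",
}

def run_validation(token: str) -> list[dict]:
    """Start the actor, wait for it to finish, and return the dataset items."""
    from apify_client import ApifyClient  # third-party: pip install apify-client
    client = ApifyClient(token)
    run = client.actor("fatihtahta/bluesky-scraper").call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Inspect the returned items for the expected kind values and field coverage before raising maxItems.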

Input Parameters

Use queries together with the action below to define the collection scope, or use startUrls to scrape Bluesky pages directly.

  • actionToPerform (string, default: searchPosts): Selects what to collect. Allowed values: searchPosts, getAuthorFeed, getFollowers, getFollows, getProfile.
  • queries (array of strings): One or more input values to process. Use keywords for searchPosts and Bluesky handles for user-focused actions such as getAuthorFeed, getFollowers, getFollows, and getProfile.
  • startUrls (array of URLs): One or more Bluesky URLs to scrape directly. Supports search pages, post pages, profile pages, followers pages, and follows/following pages. Can be combined with queries.
  • maxItems (integer): Maximum number of primary records to collect across all queries. Leave empty to collect as many matching records as are available. Minimum: 1.
  • sortOrder (string, default: latest): Sort order for searchPosts. Allowed values: latest, top. Applies only to keyword-based post search.
  • dateFrom / dateTo (string): Optional posting date window in YYYY-MM-DD format. The actor treats these as UTC dates, adds UTC search operators, and applies the same boundaries locally to post-like records.
  • language (string): Optional language selection. For post search, the actor appends lang:<code> to each query.
  • fromAuthor (string): Optional Bluesky handle or DID. For post search, the actor appends from:<handle> to each query.
  • mentionsAuthor (string): Optional Bluesky handle or DID. For post search, the actor appends mentions:<handle> to each query.
  • hashtags (array of strings): Optional hashtags to append to post search queries. Values can be entered with or without #.
  • domain (string): Optional linked domain filter. For post search, the actor appends domain:<domain>.
  • exactUrl (string): Optional exact shared URL to append to post search queries.
  • minLikes / minReposts / minReplies (integer): Optional local engagement filters applied after Bluesky returns post-like records.
  • includeReplies (boolean, default: true): When false, skips saved post records that are replies to another post. This does not enable or disable thread reply scraping.
  • includeReposts (boolean, default: true): When false, skips feed/search records returned because they were reposted by another account.
  • scrapeComments (boolean, default: false): When enabled, the actor also collects replies for each post in searchPosts and getAuthorFeed runs. This expands the dataset with separate comment records.
  • maxComments (integer, default: 50000): Maximum number of replies to collect per source post when scrapeComments is enabled. Minimum: 1.
  • sentiment_analysis (boolean, default: false): Adds sentiment_score and sentiment_label to supported post and comment records.
  • content_analysis (boolean, default: false): Adds content-category fields to supported post records for topic-oriented grouping and filtering.
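The local filters (minLikes, minReposts, minReplies, includeReplies, includeReposts) behave roughly as sketched below. This is an illustration of the documented rules, not the actor's internal code, so you can sanity-check a configuration against sample records.

```python
# Illustrative re-implementation of the documented local post filters.
def keep_post(record: dict,
              min_likes: int = 0,
              min_reposts: int = 0,
              min_replies: int = 0,
              include_replies: bool = True,
              include_reposts: bool = True) -> bool:
    if record.get("likeCount", 0) < min_likes:
        return False
    if record.get("repostCount", 0) < min_reposts:
        return False
    if record.get("replyCount", 0) < min_replies:
        return False
    # A record is a reply when it points at a parent post.
    if not include_replies and record.get("replyParentUri"):
        return False
    # A record is a repost when it carries a repost "reason" envelope.
    if not include_reposts and (record.get("reason") or {}).get("type") == "repost":
        return False
    return True
```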

Choosing Inputs

Choose actionToPerform first, then supply queries in the format that action expects. Keyword-based searches are best when you want discovery across public posts, while handle-based actions are better for known accounts and curated monitoring lists. Alternatively, use startUrls when you already have Bluesky pages to collect from.

For startUrls, search URLs run the search query embedded in the URL, post URLs collect the individual post thread, profile URLs collect both the profile record and the author feed, followers URLs collect followers, and follows/following URLs collect follows.

If your use case is broad discovery, use wider keyword inputs and leave optional enrichments off for a quick validation pass. If your use case is more targeted, narrower queries and account-specific actions usually produce cleaner datasets with less downstream filtering.

When maxItems is available for your run, start small to validate field coverage and record types, then increase the limit once you confirm the output matches your use case. For searchPosts, sortOrder changes whether the run prioritizes recency or more prominent results. For thread-level analysis, enable scrapeComments; for lighter monitoring runs, leave it off to keep the output focused on primary records.

Advanced search fields make Bluesky's native search operators easier to use. For example, language: "en", fromAuthor: "bsky.app", hashtags: ["science"], and domain: "npr.org" are appended to each search query as lang:en from:bsky.app #science domain:npr.org. Date filters use UTC: dateFrom: "2026-04-01" becomes since:2026-04-01T00:00:00.000Z, and dateTo: "2026-04-24" becomes until:2026-04-25T00:00:00.000Z so the full UTC day is included. Engagement, reply, and repost filters are applied locally after Bluesky returns records.

Example Inputs

Scenario: keyword-based post search with replies and enrichments

{
  "actionToPerform": "searchPosts",
  "queries": ["open source intelligence", "threat intel"],
  "maxItems": 40,
  "sortOrder": "latest",
  "scrapeComments": true,
  "maxComments": 25,
  "sentiment_analysis": true,
  "content_analysis": true
}

Scenario: language and date-filtered market research

{
  "actionToPerform": "searchPosts",
  "queries": ["science"],
  "language": "en",
  "dateFrom": "2026-04-01",
  "dateTo": "2026-04-24",
  "minLikes": 1,
  "includeReplies": false,
  "includeReposts": false,
  "maxItems": 15
}

Scenario: brand monitoring with author and mention filters

{
  "actionToPerform": "searchPosts",
  "queries": ["product feedback"],
  "fromAuthor": "example.com",
  "mentionsAuthor": "support.example.com",
  "sortOrder": "latest",
  "maxItems": 50
}

Scenario: domain tracking for shared links

{
  "actionToPerform": "searchPosts",
  "queries": ["climate"],
  "domain": "npr.org",
  "hashtags": ["science"],
  "sortOrder": "top",
  "maxItems": 100
}

Scenario: author feed monitoring for a known account

{
  "actionToPerform": "getAuthorFeed",
  "queries": ["404media.co"],
  "maxItems": 30,
  "scrapeComments": true,
  "maxComments": 15,
  "sentiment_analysis": true
}

Scenario: follower collection for enrichment or audience mapping

{
  "actionToPerform": "getFollowers",
  "queries": ["samleecole.bsky.social"],
  "maxItems": 100,
  "sentiment_analysis": false,
  "content_analysis": false
}

Scenario: scrape direct Bluesky URLs

{
  "startUrls": [
    { "url": "https://bsky.app/search?q=open%20source%20intelligence" },
    { "url": "https://bsky.app/profile/404media.co" },
    { "url": "https://bsky.app/profile/bsky.app/post/3kxyzexample" }
  ],
  "maxItems": 25,
  "scrapeComments": true,
  "maxComments": 10
}

Output

Output Destination

The actor writes results to an Apify dataset as JSON records. The dataset is designed for direct consumption by analytics tools, ETL pipelines, and downstream APIs without post-processing.

Each item contains a stable record envelope plus a type-specific payload. In this actor, the record-type discriminator is the top-level kind field.

Record Envelope (All Items)

  • kind (string, required): record type discriminator. Observed values include post, comment, profile, and follower.
  • id (string, required): stable record identifier emitted by the actor for the collected entity.
  • url (string, required): canonical public Bluesky URL for the record.

Recommended idempotency key: kind + ":" + id

Use that key for deduplication and upserts when syncing repeated runs into warehouses, CRMs, search indexes, or application databases. The stable envelope makes records easier to merge, deduplicate, and keep in sync across recurring runs.
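A minimal sketch of that upsert pattern, with a plain dict standing in for whatever warehouse, index, or database you sync into:

```python
# Deduplicate records across repeated runs using the recommended
# kind + ":" + id idempotency key. Later runs overwrite earlier copies
# of the same record, which keeps engagement counters current.
def upsert_records(store: dict[str, dict], items: list[dict]) -> dict[str, dict]:
    for item in items:
        key = f'{item["kind"]}:{item["id"]}'
        store[key] = item
    return store
```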

Examples

Example: post (kind = "post")

{
  "kind": "post",
  "query": "404media.co",
  "id": "3mk65zjpcpk25",
  "uri": "at://did:plc:pt47oe625rv5cnrkgvntwbiq/app.bsky.feed.post/3mk65zjpcpk25",
  "cid": "bafyreiffn73pycwsu7kwysldkimonzph63oqdgmqrjedwluywqnbqgfzfu",
  "authorHandle": "samleecole.bsky.social",
  "authorDid": "did:plc:pt47oe625rv5cnrkgvntwbiq",
  "authorName": "Sam Cole",
  "authorAvatar": "https://cdn.bsky.app/img/avatar/plain/did:plc:pt47oe625rv5cnrkgvntwbiq/bafkreif5ko3c7sbtv56cqhpqgwhiru54xozwiu6t5no5pr7xi2sbi4e3uy",
  "text": "super interesting series of experiments here, where researchers cosplayed as a vulnerable user in various scenarios, like romance-seeking or delusions of grandeur, and watched how chatbots responded over the course of 100+ turns: www.404media.co/delusion-usi...",
  "createdAt": "2026-04-23T13:55:07.278Z",
  "indexedAt": "2026-04-23T13:55:18.457Z",
  "labels": [],
  "languages": ["en"],
  "facets": [
    {
      "features": [
        {
          "$type": "app.bsky.richtext.facet#link",
          "uri": "https://www.404media.co/delusion-using-chatgpt-gemini-claude-grok-safety-ai-psychosis-study/"
        }
      ],
      "index": {
        "byteEnd": 261,
        "byteStart": 230
      }
    }
  ],
  "viewer": {
    "threadMuted": false
  },
  "reason": {
    "type": "repost",
    "by": {
      "did": "did:plc:vcepp6trx4vpe5ourxso4tjl",
      "handle": "404media.co",
      "displayName": "404 Media",
      "avatar": "https://cdn.bsky.app/img/avatar/plain/did:plc:vcepp6trx4vpe5ourxso4tjl/bafkreiee23yjug2vlf3b3dj6lws32iqoug3jd6y5ciwvgo5qf2rc2wgfli"
    },
    "indexedAt": "2026-04-23T14:18:09.851Z"
  },
  "embed": {
    "type": "external",
    "uri": "https://www.404media.co/delusion-using-chatgpt-gemini-claude-grok-safety-ai-psychosis-study/",
    "title": "Researchers Simulated a Delusional User to Test Chatbot Safety",
    "description": "Grok and Gemini encouraged delusions and isolated users, while the newer ChatGPT model and Claude hit the emotional brakes.",
    "thumb": "https://cdn.bsky.app/img/feed_thumbnail/plain/did:plc:pt47oe625rv5cnrkgvntwbiq/bafkreidcpedhwolkwimlcbndl2poywims55lzwhqdvi3336u4gocih6dku"
  },
  "replyCount": 3,
  "repostCount": 57,
  "likeCount": 189,
  "url": "https://bsky.app/profile/did:plc:pt47oe625rv5cnrkgvntwbiq/post/3mk65zjpcpk25",
  "content_category_label": "Artificial Intelligence",
  "content_category_path": ["Technology & Computing", "Artificial Intelligence"],
  "content_category_confidence": 0.92,
  "content_category_match_type": "exact_alias",
  "sentiment_score": 5,
  "sentiment_label": "positive"
}
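Rich-text details such as shared links live inside the facets array shown above. A small helper can pull them out, following the facet shape in the example (this is a downstream-processing sketch, not part of the actor):

```python
# Extract link URLs from a post record's rich-text facets.
def extract_links(post: dict) -> list[str]:
    links = []
    for facet in post.get("facets", []):
        for feature in facet.get("features", []):
            if feature.get("$type") == "app.bsky.richtext.facet#link":
                links.append(feature["uri"])
    return links
```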

Example: comment (kind = "comment")

{
  "kind": "comment",
  "query": "404media.co",
  "id": "3mk66abcxyz2q",
  "uri": "at://did:plc:replyexample123/app.bsky.feed.post/3mk66abcxyz2q",
  "cid": "bafyreibcommentexamplecid1234567890",
  "authorHandle": "reply-user.bsky.social",
  "authorDid": "did:plc:replyexample123",
  "authorName": "Reply User",
  "authorAvatar": "https://cdn.bsky.app/img/avatar/plain/did:plc:replyexample123/bafkreicomentavatar",
  "text": "This is a useful write-up. The thread adds a lot of context to the reporting.",
  "createdAt": "2026-04-23T14:02:11.000Z",
  "indexedAt": "2026-04-23T14:02:20.000Z",
  "labels": [],
  "languages": ["en"],
  "facets": [],
  "viewer": {
    "threadMuted": false
  },
  "threadgate": {
    "uri": "at://did:plc:replyexample123/app.bsky.feed.threadgate/3mk66abcxyz2q",
    "cid": "bafyreibthreadgateexample",
    "allowLists": []
  },
  "replyParentUri": "at://did:plc:pt47oe625rv5cnrkgvntwbiq/app.bsky.feed.post/3mk65zjpcpk25",
  "replyRootUri": "at://did:plc:pt47oe625rv5cnrkgvntwbiq/app.bsky.feed.post/3mk65zjpcpk25",
  "embed": {
    "type": "unknown",
    "sourceType": "app.bsky.embed.record#view"
  },
  "replyCount": 0,
  "repostCount": 1,
  "likeCount": 7,
  "url": "https://bsky.app/profile/did:plc:replyexample123/post/3mk66abcxyz2q",
  "sentiment_score": 3,
  "sentiment_label": "positive",
  "sourcePostId": "3mk65zjpcpk25",
  "sourcePostUri": "at://did:plc:pt47oe625rv5cnrkgvntwbiq/app.bsky.feed.post/3mk65zjpcpk25",
  "sourcePostUrl": "https://bsky.app/profile/did:plc:pt47oe625rv5cnrkgvntwbiq/post/3mk65zjpcpk25",
  "sourcePostAuthorHandle": "samleecole.bsky.social",
  "commentDepth": 1
}

Example: profile (kind = "profile")

{
  "kind": "profile",
  "query": "404media.co",
  "id": "did:plc:vcepp6trx4vpe5ourxso4tjl",
  "uri": "did:plc:vcepp6trx4vpe5ourxso4tjl",
  "authorHandle": "404media.co",
  "authorName": "404 Media",
  "avatar": "https://cdn.bsky.app/img/avatar/plain/did:plc:vcepp6trx4vpe5ourxso4tjl/bafkreiexampleavatar",
  "banner": "https://cdn.bsky.app/img/banner/plain/did:plc:vcepp6trx4vpe5ourxso4tjl/bafkreiexamplebanner",
  "text": "Independent journalism covering technology, media, and online culture.",
  "followersCount": 123456,
  "followsCount": 128,
  "postsCount": 5421,
  "createdAt": "2023-04-12T09:22:51.000Z",
  "indexedAt": "2026-04-24T08:15:00.000Z",
  "labels": [],
  "associated": {
    "lists": 2,
    "feedgens": 1,
    "starterPacks": 0,
    "labeler": false,
    "chatAllowIncoming": "all",
    "activitySubscriptionAllowSubscriptions": "followers"
  },
  "joinedViaStarterPack": {
    "uri": "at://did:plc:creator123/app.bsky.graph.starterpack/3abcstarter",
    "cid": "bafyreistarterpackcid123",
    "creatorDid": "did:plc:creator123",
    "creatorHandle": "creator.bsky.social",
    "listItemCount": 25,
    "joinedWeekCount": 10,
    "joinedAllTimeCount": 640,
    "indexedAt": "2026-04-20T10:00:00.000Z"
  },
  "viewer": {
    "muted": false,
    "blockedBy": false,
    "followingUri": "at://did:plc:viewer123/app.bsky.graph.follow/3followsubject",
    "followedByUri": "at://did:plc:vcepp6trx4vpe5ourxso4tjl/app.bsky.graph.follow/3followviewer",
    "knownFollowersCount": 3
  },
  "url": "https://bsky.app/profile/404media.co"
}

Example: follower (kind = "follower")

{
  "kind": "follower",
  "query": "404media.co",
  "id": "did:plc:followerexample123",
  "uri": "did:plc:followerexample123",
  "authorHandle": "analyst.bsky.social",
  "authorName": "Industry Analyst",
  "text": "Researching online communities, media systems, and platform behavior.",
  "avatar": "https://cdn.bsky.app/img/avatar/plain/did:plc:followerexample123/bafkreifolloweravatar",
  "createdAt": "2024-01-10T12:00:00.000Z",
  "labels": [],
  "indexedAt": "2026-04-24T08:10:00.000Z",
  "associated": {
    "lists": 0,
    "feedgens": 0,
    "starterPacks": 1,
    "labeler": false,
    "chatAllowIncoming": "none",
    "activitySubscriptionAllowSubscriptions": "followers"
  },
  "viewer": {
    "muted": false,
    "blockedBy": false,
    "followingUri": "at://did:plc:viewer123/app.bsky.graph.follow/3followanalyst",
    "knownFollowersCount": 1
  },
  "url": "https://bsky.app/profile/analyst.bsky.social"
}

Field Reference

post

  • kind (string, required): Record type. Always post.
  • query (string, required): Input query that produced the record.
  • id (string, required): Post identifier.
  • uri (string, required): AT URI for the post.
  • cid (string, optional): Content identifier for the post version.
  • authorHandle (string, required): Author handle.
  • authorDid (string, required): Author decentralized identifier.
  • authorName (string, optional): Author display name.
  • authorAvatar (string, optional): Author avatar URL.
  • text (string, optional): Post text.
  • createdAt (string, optional): Post creation timestamp in ISO 8601 format.
  • indexedAt (string, optional): Record indexing timestamp in ISO 8601 format.
  • labels (array, optional): Source labels attached to the record.
  • languages (array of strings, optional): Language codes detected or supplied for the post.
  • facets (array, optional): Rich-text facets such as links or tags.
  • viewer.likeUri / viewer.repostUri (string, optional): Viewer relationship URIs when available.
  • viewer.threadMuted / viewer.replyDisabled (boolean, optional): Viewer state flags for the post.
  • threadgate.uri / threadgate.cid (string, optional): Thread gate identifiers when present.
  • threadgate.allowLists (array, optional): Lists allowed by the thread gate.
  • feedContext (string, optional): Feed context value when present in author-feed output.
  • reason.type (string, optional): Reason type. Currently repost when present.
  • reason.by.did / reason.by.handle / reason.by.displayName / reason.by.avatar (string, optional): Basic details for the actor that caused the reason context.
  • reason.indexedAt (string, optional): Reason timestamp in ISO 8601 format.
  • replyParentUri / replyRootUri (string, optional): Parent and root post URIs for replies.
  • embed.type (string, optional): Embed discriminator such as images, external, video, recordWithMedia, record, or unknown.
  • embed.images[].alt / embed.images[].fullsize / embed.images[].thumb (string, optional): Image embed metadata.
  • embed.images[].aspectRatio.width / embed.images[].aspectRatio.height (number, optional): Image aspect-ratio values.
  • embed.uri / embed.title / embed.description / embed.thumb (string, optional): External embed metadata.
  • embed.cid / embed.playlist / embed.thumbnail / embed.alt (string, optional): Video embed metadata.
  • embed.aspectRatio.width / embed.aspectRatio.height (number, optional): Video aspect-ratio values.
  • embed.record.type (string, optional): Embedded record subtype.
  • embed.record.uri / embed.record.cid (string, optional): Embedded record identifiers.
  • embed.record.author.did / embed.record.author.handle / embed.record.author.displayName / embed.record.author.avatar (string, optional): Embedded post author details.
  • embed.record.text / embed.record.createdAt / embed.record.indexedAt (string, optional): Embedded post text and timestamps.
  • embed.record.labels / embed.record.languages / embed.record.facets (array, optional): Embedded post metadata collections.
  • embed.record.replyCount / embed.record.repostCount / embed.record.likeCount / embed.record.listItemCount / embed.record.joinedWeekCount / embed.record.joinedAllTimeCount (number, optional): Embedded record counters when relevant.
  • embed.record.did / embed.record.displayName / embed.record.description / embed.record.avatar / embed.record.authorDid / embed.record.purpose / embed.record.sourceType (string, optional): Embedded generator, list, blocked, or unknown-record metadata when available.
  • embed.record.creator.did / embed.record.creator.handle / embed.record.creator.displayName / embed.record.creator.avatar (string, optional): Embedded labeler creator details when present.
  • embed.record.embeds (array, optional): Nested embeds on embedded post records.
  • embed.media (object, optional): Media payload for recordWithMedia embeds.
  • replyCount / repostCount / likeCount (number, optional): Public engagement counters.
  • url (string, required): Canonical public post URL.
  • content_category_label (string, optional): Content category label when content analysis is enabled and a match is available.
  • content_category_path (array of strings, optional): Category path from broad to specific.
  • content_category_confidence (number, optional): Match confidence score when available.
  • content_category_match_type (string, optional): Match type used for the classification.
  • sentiment_score (number, optional): Numeric sentiment score when sentiment analysis is enabled.
  • sentiment_label (string, optional): Sentiment label such as positive, negative, or neutral.

comment

  • kind (string, required): Record type. Always comment.
  • query (string, required): Input query that produced the record.
  • id / uri / cid / authorHandle / authorDid / authorName / authorAvatar / text / createdAt / indexedAt / labels / languages / facets / viewer / threadgate / replyParentUri / replyRootUri / embed / replyCount / repostCount / likeCount / url (same types as post, optional where noted above): Core reply fields inherited from post records.
  • sourcePostId (string, optional): Identifier of the source post whose thread was collected.
  • sourcePostUri (string, optional): AT URI of the source post.
  • sourcePostUrl (string, optional): Canonical public URL of the source post.
  • sourcePostAuthorHandle (string, optional): Handle of the source post author.
  • content_category_label / content_category_path / content_category_confidence / content_category_match_type (string / array / number, optional): Comment text category fields when content analysis is enabled.
  • sourcePostContentCategoryLabel / sourcePostContentCategoryPath / sourcePostContentCategoryConfidence / sourcePostContentCategoryMatchType (string / array / number, optional): Category fields inherited from the source post when content analysis is enabled.
  • commentDepth (number, required): Reply depth within the collected thread.
  • sentiment_score / sentiment_label (number / string, optional): Sentiment fields when sentiment analysis is enabled.
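Because comment records carry sourcePostId, a mixed dataset of posts and comments can be regrouped into threads downstream. A minimal sketch (illustrative helper, not part of the actor):

```python
# Group comment records under their source post via sourcePostId.
def group_thread(items: list[dict]) -> dict[str, dict]:
    # Index posts by id, giving each an empty comments list.
    posts = {i["id"]: dict(i, comments=[]) for i in items if i.get("kind") == "post"}
    # Attach each comment to its source post when that post is in the dataset.
    for i in items:
        if i.get("kind") == "comment" and i.get("sourcePostId") in posts:
            posts[i["sourcePostId"]]["comments"].append(i)
    return posts
```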

profile

  • kind (string, required): Record type. Always profile.
  • query (string, required): Input handle that produced the record.
  • id (string, required): Profile DID.
  • uri (string, required): Profile DID, emitted as the canonical profile identifier.
  • authorHandle (string, required): Profile handle.
  • authorName (string, optional): Display name.
  • avatar / banner (string, optional): Public media URLs for the profile.
  • text (string, optional): Profile description or bio.
  • followersCount / followsCount / postsCount (number, optional): Public count fields.
  • createdAt / indexedAt (string, optional): Profile timestamps in ISO 8601 format.
  • labels (array, optional): Source labels attached to the profile.
  • associated.lists / associated.feedgens / associated.starterPacks (number, optional): Public associated-object counts.
  • associated.labeler (boolean, optional): Whether the account is associated with a labeler.
  • associated.chatAllowIncoming / associated.activitySubscriptionAllowSubscriptions (string, optional): Public account setting values when available.
  • joinedViaStarterPack.uri / joinedViaStarterPack.cid (string, optional): Starter-pack identifiers when present.
  • joinedViaStarterPack.creatorDid / joinedViaStarterPack.creatorHandle (string, optional): Starter-pack creator identifiers.
  • joinedViaStarterPack.listItemCount / joinedViaStarterPack.joinedWeekCount / joinedViaStarterPack.joinedAllTimeCount (number, optional): Starter-pack counters.
  • joinedViaStarterPack.indexedAt (string, optional): Starter-pack indexing timestamp.
  • viewer.muted / viewer.blockedBy (boolean, optional): Viewer state flags.
  • viewer.blockingUri / viewer.followingUri / viewer.followedByUri (string, optional): Viewer relationship URIs when available.
  • viewer.mutedByList.uri / viewer.mutedByList.cid / viewer.mutedByList.name / viewer.mutedByList.purpose / viewer.mutedByList.avatar (string, optional): Muted-list metadata when present.
  • viewer.mutedByList.listItemCount (number, optional): Muted-list item count.
  • viewer.blockingByList.uri / viewer.blockingByList.cid / viewer.blockingByList.name / viewer.blockingByList.purpose / viewer.blockingByList.avatar (string, optional): Blocking-list metadata when present.
  • viewer.blockingByList.listItemCount (number, optional): Blocking-list item count.
  • viewer.knownFollowersCount (number, optional): Count of known followers visible to the viewer.
  • url (string, required): Canonical public profile URL.

follower

  • kind (string, required): Record type. Always follower.
  • query (string, required): Input handle that produced the record.
  • id (string, required): Profile DID for the follower or followed account.
  • uri (string, required): DID emitted as the canonical profile identifier.
  • authorHandle (string, required): Handle of the returned account.
  • authorName (string, optional): Display name.
  • text (string, optional): Profile description or bio.
  • avatar (string, optional): Public avatar URL.
  • createdAt / indexedAt (string, optional): Timestamps in ISO 8601 format.
  • labels (array, optional): Source labels attached to the profile.
  • associated.lists / associated.feedgens / associated.starterPacks (number, optional): Public associated-object counts.
  • associated.labeler (boolean, optional): Whether the account is associated with a labeler.
  • associated.chatAllowIncoming / associated.activitySubscriptionAllowSubscriptions (string, optional): Public account setting values when available.
  • viewer.muted / viewer.blockedBy (boolean, optional): Viewer state flags.
  • viewer.blockingUri / viewer.followingUri / viewer.followedByUri (string, optional): Viewer relationship URIs when available.
  • viewer.mutedByList.uri / viewer.mutedByList.cid / viewer.mutedByList.name / viewer.mutedByList.purpose / viewer.mutedByList.avatar (string, optional): Muted-list metadata when present.
  • viewer.mutedByList.listItemCount (number, optional): Muted-list item count.
  • viewer.blockingByList.uri / viewer.blockingByList.cid / viewer.blockingByList.name / viewer.blockingByList.purpose / viewer.blockingByList.avatar (string, optional): Blocking-list metadata when present.
  • viewer.blockingByList.listItemCount (number, optional): Blocking-list item count.
  • viewer.knownFollowersCount (number, optional): Count of known followers visible to the viewer.
  • url (string, required): Canonical public profile URL.

Data Quality, Guarantees, And Handling

  • Structured records: results are normalized into predictable JSON objects for downstream use.
  • Best-effort extraction: fields may vary by region, session state, availability, or target-side presentation changes.
  • Optional fields: null-check in downstream code.
  • Deduplication: recommend kind + ":" + id.
  • Freshness: results reflect the publicly available data at run time.
  • Repeated runs: use the recommended idempotency key when syncing data into warehouses, CRMs, or search indexes.
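Null-checking optional fields before loading records downstream can be as simple as the sketch below; the output column names are illustrative, while the required envelope fields and the optional enrichment fields follow this document.

```python
# Defensively flatten a record for analytics: required envelope fields are
# read directly, optional fields fall back to explicit defaults.
def flatten_for_analytics(record: dict) -> dict:
    return {
        "key": f'{record["kind"]}:{record["id"]}',   # envelope fields are required
        "text": record.get("text") or "",
        "likes": record.get("likeCount", 0),
        "sentiment": record.get("sentiment_label"),  # None unless enrichment was on
    }
```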

Tips For Best Results

  • Start with a small maxItems value to validate the output shape before scaling up.
  • Use searchPosts when you want discovery across public discussions; use handle-based actions when you already know the accounts you want.
  • Enable scrapeComments only when thread context is important to your workflow.
  • Increase maxComments gradually so you can see how reply collection affects dataset size and run duration.
  • Use sortOrder: "latest" for monitoring and sortOrder: "top" for prominence-oriented review.
  • Turn on sentiment_analysis or content_analysis only when those enrichment fields are needed downstream.
  • Store records with kind + ":" + id so repeated runs can be merged cleanly over time.

How to Run on Apify

  1. Open the actor in Apify Console.
  2. Configure the available input fields for the target scope.
  3. Set the maximum number of outputs to collect.
  4. Click Start and wait for the run to finish.
  5. Review the dataset and export results in JSON, CSV, Excel, or another supported format.

Scheduling & Automation

Automated Data Collection

You can schedule runs to keep post, profile, or network datasets current without manual intervention. This is useful for recurring monitoring, reporting, and enrichment workflows.

  • Navigate to Schedules in Apify Console
  • Create a new schedule (daily, weekly, or custom cron)
  • Configure input parameters
  • Enable notifications for run completion
  • Add webhooks for automated processing

Integration Options

  • BI dashboards: track engagement, activity, topic coverage, and account-level change over time.
  • Data warehouses: load normalized JSON records into historical reporting and analytics models.
  • Webhooks: trigger ingestion, validation, or alerting workflows after each completed run.
  • API access: pull datasets programmatically into internal applications and ETL pipelines.
  • Google Sheets or Excel review workflows: share smaller validation runs with non-technical stakeholders for review and triage.
  • Enrichment pipelines: join public Bluesky records with CRM, research, or audience datasets.

Export Formats And Downstream Use

Apify datasets can be exported directly or consumed programmatically, which makes the actor practical for both operational review and automated delivery into downstream systems.

  • JSON: for APIs, applications, and data pipelines.
  • CSV or Excel: for spreadsheet workflows and manual review.
  • API access: for automated ingestion into internal systems.
  • BI and warehouses: for reporting, dashboards, and historical analysis.

Performance

Estimated run times:

  • Small runs (< 1,000 outputs): ~3–5 minutes
  • Medium runs (1,000–5,000 outputs): ~5–15 minutes
  • Large runs (5,000+ outputs): ~15–30 minutes

Execution time varies based on filters, result volume, and how much information is returned per record. Highly filtered runs may finish faster, while broader discovery runs or detail-rich records can take longer.

Limitations

  • Availability depends on what https://bsky.app publicly exposes at run time.
  • Some optional fields may be missing when a post or profile does not provide that information publicly.
  • Very broad searches can take longer and may require a higher maxItems setting to capture the desired volume.
  • Output field availability can change when Bluesky changes public record presentation or metadata exposure.
  • Results can vary across accounts, topics, and public visibility conditions.

Troubleshooting

  • No results returned: check the queries or startUrls values, confirm the action matches the input type when using queries, and verify that the target keyword, account, or page has matching public records.
  • Fewer results than expected: broaden the query set, raise maxItems, or confirm that enough matching public records exist.
  • Some fields are empty: optional fields depend on what each post, reply, or profile publicly provides.
  • Run takes longer than expected: reduce scope, lower maxItems for validation, or split broad collection into smaller runs.
  • Output changed: compare the current dataset with the field reference and share a small sample if support is needed.

FAQ

What data does this actor collect?

It collects public Bluesky posts, replies, profiles, followers, and follows, depending on the selected action and input.

Can I filter by keyword or account?

Yes. Use queries for keywords in searchPosts and Bluesky handles for account-based actions such as getAuthorFeed, getFollowers, getFollows, and getProfile. You can also use startUrls with Bluesky search, post, profile, followers, and follows pages.

Why did I receive fewer results than my limit?

maxItems is an upper bound, not a guarantee. The final count depends on how many matching public records are available for the chosen action, queries, and start URLs.

Can I collect replies as well as posts?

Yes. Enable scrapeComments to collect replies for searchPosts, getAuthorFeed, and direct post URL runs.
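An illustrative input for a keyword run that also collects replies. The field names (action, queries, maxItems, scrapeComments) follow this README; the values are examples only, and any fields not shown fall back to the actor's defaults.

```python
# Example input for a searchPosts run with reply collection enabled.
# Field names follow this README; values are illustrative.
import json

run_input = {
    "action": "searchPosts",
    "queries": ["data engineering"],
    "maxItems": 200,
    "scrapeComments": True,
}
print(json.dumps(run_input, indent=2))
```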

Can I schedule recurring runs?

Yes. Apify schedules can run the actor automatically on a recurring timetable.

How do I avoid duplicates across runs?

Use kind + ":" + id as the idempotency key when storing or syncing records.
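A minimal sketch of that deduplication step in Python; in practice the seen set would live in your database or key-value store rather than in memory.

```python
# Cross-run deduplication using the kind + ":" + id key recommended above.

def dedupe(records, seen=None):
    """Yield only records whose kind:id key has not been seen before."""
    seen = set() if seen is None else seen
    for rec in records:
        key = f'{rec["kind"]}:{rec["id"]}'
        if key not in seen:
            seen.add(key)
            yield rec

batch = [
    {"kind": "post", "id": "1", "text": "a"},
    {"kind": "post", "id": "1", "text": "a"},     # duplicate across runs
    {"kind": "profile", "id": "1", "text": "b"},  # same id, different kind
]
unique = list(dedupe(batch))
# unique keeps the first post record and the profile record
```

Combining kind with id matters because different record types (posts, profiles, follows) can share an id value without being duplicates.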

Can I export the data to CSV, Excel, or JSON?

Yes. Apify datasets support JSON export and tabular exports such as CSV and Excel-compatible formats.

Does this actor collect private data?

No. The actor is intended for publicly available information from https://bsky.app.

What should I include when reporting an issue?

Include the input you used with any sensitive values redacted, the Apify run ID, a short expected-versus-actual summary, and a small sample of the output if possible.

Compliance & Ethics

Responsible Data Collection

This actor collects publicly available Bluesky social data from https://bsky.app for legitimate business purposes, including:

  • Market intelligence research and trend analysis
  • Content and community monitoring
  • Data enrichment and reporting

Users are responsible for ensuring their collection and downstream use comply with applicable laws, regulations, and platform terms. This section is informational and not legal advice.

Best Practices

  • Use collected data in accordance with applicable laws, regulations, and the target site’s terms
  • Respect individual privacy and personal information
  • Use data responsibly and avoid disruptive or excessive collection
  • Do not use this actor for spamming, harassment, or other harmful purposes
  • Follow relevant data protection requirements where applicable (for example GDPR or CCPA)

Support

For help, use the actor page support channel or open an issue on the actor page. Include the input used with sensitive values redacted, the run ID, a short description of expected versus actual behavior, and an optional small output sample so the issue can be reproduced and reviewed efficiently.