Bluesky Scraper | All-In-One | $1.5 / 1K

Extract Bluesky posts and full comment threads from searches, author feeds, profile pages, and direct post URLs. Built for enterprise-grade speed, rich data coverage, advanced filtering, and clean JSON for market intelligence, sentiment analysis, and analytics.

Pricing: $1.49 / 1,000 results

Rating: 5.0 (2 reviews)

Developer: Fatih Tahta (Maintained by Community)

Actor stats: 2 bookmarks · 60 total users · 20 monthly active users · last modified a day ago

Bluesky Scraper

Slug: fatihtahta/bluesky-scraper

Overview

Bluesky Scraper collects structured public data from https://bsky.app, including posts, replies, profiles, followers, and follows, with key attributes such as author identity, text content, timestamps, engagement counts, and record URLs. Bluesky is a public social network where conversations, creators, communities, and topic-specific discourse are visible through https://bsky.app, making it useful for research, monitoring, enrichment, and reporting workflows. The actor is built for automated, repeatable collection so teams can run the same input configuration on a schedule and receive machine-readable JSON records in a consistent format. The output is structured for downstream systems, helping reduce manual cleanup before analytics, storage, or enrichment. It is well suited to recurring public-data acquisition where stable record handling and operational consistency matter.

Why Use This Actor

  • Market research and analytics teams: collect structured public conversation and profile data for market intelligence, topic analysis, audience mapping, and operational reporting.
  • Product and content teams: monitor posts, replies, and account activity around keywords or specific authors to support editorial planning, community analysis, and content feedback loops.
  • Developers and data engineering teams: feed normalized JSON records into ETL jobs, warehouses, search systems, and downstream APIs with repeatable collection inputs.
  • Lead generation and enrichment teams: build public profile and network datasets that can support enrichment pipelines, account research, and contact prioritization workflows.
  • Monitoring and competitive tracking teams: run recurring searches or account-based collection for change tracking, trend detection, and alert-oriented monitoring workflows.

Common Use Cases

  • Market intelligence: track topic-level discussion, repost velocity, likes, reply volume, and public author activity for ongoing analysis.
  • Lead generation: collect profile, follower, or follows data to assemble targeted prospect lists from public accounts and communities.
  • Competitive monitoring: monitor named accounts or topic queries over time to spot message shifts, engagement changes, or notable posts.
  • Catalog and directory building: populate internal databases with normalized public profiles, network records, and post metadata.
  • Data enrichment: add current public attributes such as author names, bios, counts, URLs, and engagement metrics to internal CRM or BI records.
  • Recurring reporting: schedule repeated runs to refresh dashboards, weekly summaries, or alert pipelines with current public data.
  • Conversation review: collect posts with replies to analyze thread context, response patterns, or sentiment in public discussions.

Quick Start

  1. Choose the action that matches your goal, such as searching posts, collecting an author feed, getting followers, getting follows, or retrieving profile details.
  2. Add one or more queries values, or provide Bluesky pages in startUrls. You can also use both in the same run.
  3. Set maxItems to a small number for the first run so you can validate the output shape before scaling up.
  4. Run the actor in Apify Console and wait for the dataset to populate.
  5. Inspect the first few dataset items to confirm the record type, fields, and volume match your workflow.
  6. Increase the scope, enable optional enrichments, or schedule recurring runs once the initial validation looks correct.
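The validation run above can also be driven from code. This is a minimal sketch using the apify-client Python package; the token handling and the small maxItems value are illustrative, not part of the actor itself.

```python
# Sketch: start a small validation run of the actor from Python and fetch
# the resulting dataset items. The input mirrors the Quick Start advice of
# validating with a small maxItems before scaling up.
run_input = {
    "actionToPerform": "searchPosts",
    "queries": ["open source intelligence"],
    "maxItems": 5,  # keep the first run small to validate the output shape
    "sortOrder": "latest",
}

def run_validation(token: str) -> list[dict]:
    """Start the actor, wait for it to finish, and return the dataset items."""
    from apify_client import ApifyClient  # third-party: pip install apify-client
    client = ApifyClient(token)
    run = client.actor("fatihtahta/bluesky-scraper").call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Inspect the returned items for the expected kind values and field coverage before raising maxItems.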

Input Parameters

Use queries together with the action below to define the collection scope, or use startUrls to scrape Bluesky pages directly.

  • actionToPerform (string, default: searchPosts): Selects what to collect. Allowed values: searchPosts, getAuthorFeed, getFollowers, getFollows, getProfile.
  • queries (array of strings): One or more input values to process. Use keywords for searchPosts and Bluesky handles for user-focused actions such as getAuthorFeed, getFollowers, getFollows, and getProfile.
  • startUrls (array of URLs): One or more Bluesky URLs to scrape directly. Supports search pages, post pages, profile pages, followers pages, and follows/following pages. Can be combined with queries.
  • maxItems (integer): Maximum number of primary records to collect across all queries. Leave empty to collect as many matching records as are available. Minimum: 1.
  • sortOrder (string, default: latest): Sort order for searchPosts. Allowed values: latest, top. Applies only to keyword-based post search.
  • dateFrom / dateTo (string): Optional posting date window in YYYY-MM-DD format. The actor treats these as UTC dates, adds UTC search operators, and applies the same boundaries locally to post-like records.
  • language (string): Optional language selection. For post search, the actor appends lang:<code> to each query.
  • fromAuthor (string): Optional Bluesky handle or DID. For post search, the actor appends from:<handle> to each query.
  • mentionsAuthor (string): Optional Bluesky handle or DID. For post search, the actor appends mentions:<handle> to each query.
  • hashtags (array of strings): Optional hashtags to append to post search queries. Values can be entered with or without #.
  • domain (string): Optional linked domain filter. For post search, the actor appends domain:<domain>.
  • exactUrl (string): Optional exact shared URL to append to post search queries.
  • minLikes / minReposts / minReplies (integer): Optional local engagement filters applied after Bluesky returns post-like records.
  • includeReplies (boolean, default: true): When false, skips saved post records that are replies to another post. This does not enable or disable thread reply scraping.
  • includeReposts (boolean, default: true): When false, skips feed/search records returned because they were reposted by another account.
  • scrapeComments (boolean, default: false): When enabled, the actor also collects replies for each post in searchPosts and getAuthorFeed runs. This expands the dataset with separate comment records.
  • maxComments (integer, default: 50000): Maximum number of replies to collect per source post when scrapeComments is enabled. Minimum: 1.
  • sentiment_analysis (boolean, default: false): Adds sentiment_score and sentiment_label to supported post and comment records.
  • content_analysis (boolean, default: false): Adds content-category fields to supported post records for topic-oriented grouping and filtering.
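The local filters (minLikes, minReposts, minReplies, includeReplies, includeReposts) behave roughly as sketched below. This is an illustration of the documented rules, not the actor's internal code, so you can sanity-check a configuration against sample records.

```python
# Illustrative re-implementation of the documented local post filters.
def keep_post(record: dict,
              min_likes: int = 0,
              min_reposts: int = 0,
              min_replies: int = 0,
              include_replies: bool = True,
              include_reposts: bool = True) -> bool:
    if record.get("likeCount", 0) < min_likes:
        return False
    if record.get("repostCount", 0) < min_reposts:
        return False
    if record.get("replyCount", 0) < min_replies:
        return False
    # A record is a reply when it points at a parent post.
    if not include_replies and record.get("replyParentUri"):
        return False
    # A record is a repost when it carries a repost "reason" envelope.
    if not include_reposts and (record.get("reason") or {}).get("type") == "repost":
        return False
    return True
```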

Choosing Inputs

Choose actionToPerform first, then supply queries in the format that action expects. Keyword-based searches are best when you want discovery across public posts, while handle-based actions are better for known accounts and curated monitoring lists. Alternatively, use startUrls when you already have Bluesky pages to collect from.

For startUrls, search URLs run the search query embedded in the URL, post URLs collect the individual post thread, profile URLs collect both the profile record and the author feed, followers URLs collect followers, and follows/following URLs collect follows.

If your use case is broad discovery, use wider keyword inputs and leave optional enrichments off for a quick validation pass. If your use case is more targeted, narrower queries and account-specific actions usually produce cleaner datasets with less downstream filtering.

When maxItems is available for your run, start small to validate field coverage and record types, then increase the limit once you confirm the output matches your use case. For searchPosts, sortOrder changes whether the run prioritizes recency or more prominent results. For thread-level analysis, enable scrapeComments; for lighter monitoring runs, leave it off to keep the output focused on primary records.

Advanced search fields make Bluesky's native search operators easier to use. For example, language: "en", fromAuthor: "bsky.app", hashtags: ["science"], and domain: "npr.org" are appended to each search query as lang:en from:bsky.app #science domain:npr.org. Date filters use UTC: dateFrom: "2026-04-01" becomes since:2026-04-01T00:00:00.000Z, and dateTo: "2026-04-24" becomes until:2026-04-25T00:00:00.000Z so the full UTC day is included. Engagement, reply, and repost filters are applied locally after Bluesky returns records.

Example Inputs

Scenario: keyword-based post search with replies and enrichments

{
  "actionToPerform": "searchPosts",
  "queries": ["open source intelligence", "threat intel"],
  "maxItems": 40,
  "sortOrder": "latest",
  "scrapeComments": true,
  "maxComments": 25,
  "sentiment_analysis": true,
  "content_analysis": true
}

Scenario: language and date-filtered market research

{
  "actionToPerform": "searchPosts",
  "queries": ["science"],
  "language": "en",
  "dateFrom": "2026-04-01",
  "dateTo": "2026-04-24",
  "minLikes": 1,
  "includeReplies": false,
  "includeReposts": false,
  "maxItems": 15
}

Scenario: brand monitoring with author and mention filters

{
  "actionToPerform": "searchPosts",
  "queries": ["product feedback"],
  "fromAuthor": "example.com",
  "mentionsAuthor": "support.example.com",
  "sortOrder": "latest",
  "maxItems": 50
}

Scenario: domain tracking for shared links

{
  "actionToPerform": "searchPosts",
  "queries": ["climate"],
  "domain": "npr.org",
  "hashtags": ["science"],
  "sortOrder": "top",
  "maxItems": 100
}

Scenario: author feed monitoring for a known account

{
  "actionToPerform": "getAuthorFeed",
  "queries": ["404media.co"],
  "maxItems": 30,
  "scrapeComments": true,
  "maxComments": 15,
  "sentiment_analysis": true
}

Scenario: follower collection for enrichment or audience mapping

{
  "actionToPerform": "getFollowers",
  "queries": ["samleecole.bsky.social"],
  "maxItems": 100,
  "sentiment_analysis": false,
  "content_analysis": false
}

Scenario: scrape direct Bluesky URLs

{
  "startUrls": [
    { "url": "https://bsky.app/search?q=open%20source%20intelligence" },
    { "url": "https://bsky.app/profile/404media.co" },
    { "url": "https://bsky.app/profile/bsky.app/post/3kxyzexample" }
  ],
  "maxItems": 25,
  "scrapeComments": true,
  "maxComments": 10
}

Output

Output Destination

The actor writes results to an Apify dataset as JSON records. The dataset is designed for direct consumption by analytics tools, ETL pipelines, and downstream APIs without post-processing.

Each item contains a stable record envelope plus a type-specific payload. In this actor, the record-type discriminator is the top-level kind field.

Record Envelope (All Items)

  • kind (string, required): record type discriminator. Observed values include post, comment, profile, and follower.
  • id (string, required): stable record identifier emitted by the actor for the collected entity.
  • url (string, required): canonical public Bluesky URL for the record.

Recommended idempotency key: kind + ":" + id

Use that key for deduplication and upserts when syncing repeated runs into warehouses, CRMs, search indexes, or application databases. The stable envelope makes records easier to merge, deduplicate, and keep in sync across recurring runs.
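A minimal sketch of that upsert pattern, with a plain dict standing in for whatever warehouse, index, or database you sync into:

```python
# Deduplicate records across repeated runs using the recommended
# kind + ":" + id idempotency key. Later runs overwrite earlier copies
# of the same record, which keeps engagement counters current.
def upsert_records(store: dict[str, dict], items: list[dict]) -> dict[str, dict]:
    for item in items:
        key = f'{item["kind"]}:{item["id"]}'
        store[key] = item
    return store
```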

Examples

Example: post (kind = "post")

{
  "kind": "post",
  "query": "404media.co",
  "id": "3mk65zjpcpk25",
  "uri": "at://did:plc:pt47oe625rv5cnrkgvntwbiq/app.bsky.feed.post/3mk65zjpcpk25",
  "cid": "bafyreiffn73pycwsu7kwysldkimonzph63oqdgmqrjedwluywqnbqgfzfu",
  "authorHandle": "samleecole.bsky.social",
  "authorDid": "did:plc:pt47oe625rv5cnrkgvntwbiq",
  "authorName": "Sam Cole",
  "authorAvatar": "https://cdn.bsky.app/img/avatar/plain/did:plc:pt47oe625rv5cnrkgvntwbiq/bafkreif5ko3c7sbtv56cqhpqgwhiru54xozwiu6t5no5pr7xi2sbi4e3uy",
  "text": "super interesting series of experiments here, where researchers cosplayed as a vulnerable user in various scenarios, like romance-seeking or delusions of grandeur, and watched how chatbots responded over the course of 100+ turns: www.404media.co/delusion-usi...",
  "createdAt": "2026-04-23T13:55:07.278Z",
  "indexedAt": "2026-04-23T13:55:18.457Z",
  "labels": [],
  "languages": ["en"],
  "facets": [
    {
      "features": [
        {
          "$type": "app.bsky.richtext.facet#link",
          "uri": "https://www.404media.co/delusion-using-chatgpt-gemini-claude-grok-safety-ai-psychosis-study/"
        }
      ],
      "index": {
        "byteEnd": 261,
        "byteStart": 230
      }
    }
  ],
  "viewer": {
    "threadMuted": false
  },
  "reason": {
    "type": "repost",
    "by": {
      "did": "did:plc:vcepp6trx4vpe5ourxso4tjl",
      "handle": "404media.co",
      "displayName": "404 Media",
      "avatar": "https://cdn.bsky.app/img/avatar/plain/did:plc:vcepp6trx4vpe5ourxso4tjl/bafkreiee23yjug2vlf3b3dj6lws32iqoug3jd6y5ciwvgo5qf2rc2wgfli"
    },
    "indexedAt": "2026-04-23T14:18:09.851Z"
  },
  "embed": {
    "type": "external",
    "uri": "https://www.404media.co/delusion-using-chatgpt-gemini-claude-grok-safety-ai-psychosis-study/",
    "title": "Researchers Simulated a Delusional User to Test Chatbot Safety",
    "description": "Grok and Gemini encouraged delusions and isolated users, while the newer ChatGPT model and Claude hit the emotional brakes.",
    "thumb": "https://cdn.bsky.app/img/feed_thumbnail/plain/did:plc:pt47oe625rv5cnrkgvntwbiq/bafkreidcpedhwolkwimlcbndl2poywims55lzwhqdvi3336u4gocih6dku"
  },
  "replyCount": 3,
  "repostCount": 57,
  "likeCount": 189,
  "url": "https://bsky.app/profile/did:plc:pt47oe625rv5cnrkgvntwbiq/post/3mk65zjpcpk25",
  "content_category_label": "Artificial Intelligence",
  "content_category_path": ["Technology & Computing", "Artificial Intelligence"],
  "content_category_confidence": 0.92,
  "content_category_match_type": "exact_alias",
  "sentiment_score": 5,
  "sentiment_label": "positive"
}
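Rich-text details such as shared links live inside the facets array shown above. A small helper can pull them out, following the facet shape in the example (this is a downstream-processing sketch, not part of the actor):

```python
# Extract link URLs from a post record's rich-text facets.
def extract_links(post: dict) -> list[str]:
    links = []
    for facet in post.get("facets", []):
        for feature in facet.get("features", []):
            if feature.get("$type") == "app.bsky.richtext.facet#link":
                links.append(feature["uri"])
    return links
```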

Example: comment (kind = "comment")

{
  "kind": "comment",
  "query": "404media.co",
  "id": "3mk66abcxyz2q",
  "uri": "at://did:plc:replyexample123/app.bsky.feed.post/3mk66abcxyz2q",
  "cid": "bafyreibcommentexamplecid1234567890",
  "authorHandle": "reply-user.bsky.social",
  "authorDid": "did:plc:replyexample123",
  "authorName": "Reply User",
  "authorAvatar": "https://cdn.bsky.app/img/avatar/plain/did:plc:replyexample123/bafkreicomentavatar",
  "text": "This is a useful write-up. The thread adds a lot of context to the reporting.",
  "createdAt": "2026-04-23T14:02:11.000Z",
  "indexedAt": "2026-04-23T14:02:20.000Z",
  "labels": [],
  "languages": ["en"],
  "facets": [],
  "viewer": {
    "threadMuted": false
  },
  "threadgate": {
    "uri": "at://did:plc:replyexample123/app.bsky.feed.threadgate/3mk66abcxyz2q",
    "cid": "bafyreibthreadgateexample",
    "allowLists": []
  },
  "replyParentUri": "at://did:plc:pt47oe625rv5cnrkgvntwbiq/app.bsky.feed.post/3mk65zjpcpk25",
  "replyRootUri": "at://did:plc:pt47oe625rv5cnrkgvntwbiq/app.bsky.feed.post/3mk65zjpcpk25",
  "embed": {
    "type": "unknown",
    "sourceType": "app.bsky.embed.record#view"
  },
  "replyCount": 0,
  "repostCount": 1,
  "likeCount": 7,
  "url": "https://bsky.app/profile/did:plc:replyexample123/post/3mk66abcxyz2q",
  "sentiment_score": 3,
  "sentiment_label": "positive",
  "sourcePostId": "3mk65zjpcpk25",
  "sourcePostUri": "at://did:plc:pt47oe625rv5cnrkgvntwbiq/app.bsky.feed.post/3mk65zjpcpk25",
  "sourcePostUrl": "https://bsky.app/profile/did:plc:pt47oe625rv5cnrkgvntwbiq/post/3mk65zjpcpk25",
  "sourcePostAuthorHandle": "samleecole.bsky.social",
  "commentDepth": 1
}

Example: profile (kind = "profile")

{
  "kind": "profile",
  "query": "404media.co",
  "id": "did:plc:vcepp6trx4vpe5ourxso4tjl",
  "uri": "did:plc:vcepp6trx4vpe5ourxso4tjl",
  "authorHandle": "404media.co",
  "authorName": "404 Media",
  "avatar": "https://cdn.bsky.app/img/avatar/plain/did:plc:vcepp6trx4vpe5ourxso4tjl/bafkreiexampleavatar",
  "banner": "https://cdn.bsky.app/img/banner/plain/did:plc:vcepp6trx4vpe5ourxso4tjl/bafkreiexamplebanner",
  "text": "Independent journalism covering technology, media, and online culture.",
  "followersCount": 123456,
  "followsCount": 128,
  "postsCount": 5421,
  "createdAt": "2023-04-12T09:22:51.000Z",
  "indexedAt": "2026-04-24T08:15:00.000Z",
  "labels": [],
  "associated": {
    "lists": 2,
    "feedgens": 1,
    "starterPacks": 0,
    "labeler": false,
    "chatAllowIncoming": "all",
    "activitySubscriptionAllowSubscriptions": "followers"
  },
  "joinedViaStarterPack": {
    "uri": "at://did:plc:creator123/app.bsky.graph.starterpack/3abcstarter",
    "cid": "bafyreistarterpackcid123",
    "creatorDid": "did:plc:creator123",
    "creatorHandle": "creator.bsky.social",
    "listItemCount": 25,
    "joinedWeekCount": 10,
    "joinedAllTimeCount": 640,
    "indexedAt": "2026-04-20T10:00:00.000Z"
  },
  "viewer": {
    "muted": false,
    "blockedBy": false,
    "followingUri": "at://did:plc:viewer123/app.bsky.graph.follow/3followsubject",
    "followedByUri": "at://did:plc:vcepp6trx4vpe5ourxso4tjl/app.bsky.graph.follow/3followviewer",
    "knownFollowersCount": 3
  },
  "url": "https://bsky.app/profile/404media.co"
}

Example: follower (kind = "follower")

{
  "kind": "follower",
  "query": "404media.co",
  "id": "did:plc:followerexample123",
  "uri": "did:plc:followerexample123",
  "authorHandle": "analyst.bsky.social",
  "authorName": "Industry Analyst",
  "text": "Researching online communities, media systems, and platform behavior.",
  "avatar": "https://cdn.bsky.app/img/avatar/plain/did:plc:followerexample123/bafkreifolloweravatar",
  "createdAt": "2024-01-10T12:00:00.000Z",
  "labels": [],
  "indexedAt": "2026-04-24T08:10:00.000Z",
  "associated": {
    "lists": 0,
    "feedgens": 0,
    "starterPacks": 1,
    "labeler": false,
    "chatAllowIncoming": "none",
    "activitySubscriptionAllowSubscriptions": "followers"
  },
  "viewer": {
    "muted": false,
    "blockedBy": false,
    "followingUri": "at://did:plc:viewer123/app.bsky.graph.follow/3followanalyst",
    "knownFollowersCount": 1
  },
  "url": "https://bsky.app/profile/analyst.bsky.social"
}

Field Reference

post

  • kind (string, required): Record type. Always post.
  • query (string, required): Input query that produced the record.
  • id (string, required): Post identifier.
  • uri (string, required): AT URI for the post.
  • cid (string, optional): Content identifier for the post version.
  • authorHandle (string, required): Author handle.
  • authorDid (string, required): Author decentralized identifier.
  • authorName (string, optional): Author display name.
  • authorAvatar (string, optional): Author avatar URL.
  • text (string, optional): Post text.
  • createdAt (string, optional): Post creation timestamp in ISO 8601 format.
  • indexedAt (string, optional): Record indexing timestamp in ISO 8601 format.
  • labels (array, optional): Source labels attached to the record.
  • languages (array of strings, optional): Language codes detected or supplied for the post.
  • facets (array, optional): Rich-text facets such as links or tags.
  • viewer.likeUri / viewer.repostUri (string, optional): Viewer relationship URIs when available.
  • viewer.threadMuted / viewer.replyDisabled (boolean, optional): Viewer state flags for the post.
  • threadgate.uri / threadgate.cid (string, optional): Thread gate identifiers when present.
  • threadgate.allowLists (array, optional): Lists allowed by the thread gate.
  • feedContext (string, optional): Feed context value when present in author-feed output.
  • reason.type (string, optional): Reason type. Currently repost when present.
  • reason.by.did / reason.by.handle / reason.by.displayName / reason.by.avatar (string, optional): Basic details for the actor that caused the reason context.
  • reason.indexedAt (string, optional): Reason timestamp in ISO 8601 format.
  • replyParentUri / replyRootUri (string, optional): Parent and root post URIs for replies.
  • embed.type (string, optional): Embed discriminator such as images, external, video, recordWithMedia, record, or unknown.
  • embed.images[].alt / embed.images[].fullsize / embed.images[].thumb (string, optional): Image embed metadata.
  • embed.images[].aspectRatio.width / embed.images[].aspectRatio.height (number, optional): Image aspect-ratio values.
  • embed.uri / embed.title / embed.description / embed.thumb (string, optional): External embed metadata.
  • embed.cid / embed.playlist / embed.thumbnail / embed.alt (string, optional): Video embed metadata.
  • embed.aspectRatio.width / embed.aspectRatio.height (number, optional): Video aspect-ratio values.
  • embed.record.type (string, optional): Embedded record subtype.
  • embed.record.uri / embed.record.cid (string, optional): Embedded record identifiers.
  • embed.record.author.did / embed.record.author.handle / embed.record.author.displayName / embed.record.author.avatar (string, optional): Embedded post author details.
  • embed.record.text / embed.record.createdAt / embed.record.indexedAt (string, optional): Embedded post text and timestamps.
  • embed.record.labels / embed.record.languages / embed.record.facets (array, optional): Embedded post metadata collections.
  • embed.record.replyCount / embed.record.repostCount / embed.record.likeCount / embed.record.listItemCount / embed.record.joinedWeekCount / embed.record.joinedAllTimeCount (number, optional): Embedded record counters when relevant.
  • embed.record.did / embed.record.displayName / embed.record.description / embed.record.avatar / embed.record.authorDid / embed.record.purpose / embed.record.sourceType (string, optional): Embedded generator, list, blocked, or unknown-record metadata when available.
  • embed.record.creator.did / embed.record.creator.handle / embed.record.creator.displayName / embed.record.creator.avatar (string, optional): Embedded labeler creator details when present.
  • embed.record.embeds (array, optional): Nested embeds on embedded post records.
  • embed.media (object, optional): Media payload for recordWithMedia embeds.
  • replyCount / repostCount / likeCount (number, optional): Public engagement counters.
  • url (string, required): Canonical public post URL.
  • content_category_label (string, optional): Content category label when content analysis is enabled and a match is available.
  • content_category_path (array of strings, optional): Category path from broad to specific.
  • content_category_confidence (number, optional): Match confidence score when available.
  • content_category_match_type (string, optional): Match type used for the classification.
  • sentiment_score (number, optional): Numeric sentiment score when sentiment analysis is enabled.
  • sentiment_label (string, optional): Sentiment label such as positive, negative, or neutral.

comment

  • kind (string, required): Record type. Always comment.
  • query (string, required): Input query that produced the record.
  • id / uri / cid / authorHandle / authorDid / authorName / authorAvatar / text / createdAt / indexedAt / labels / languages / facets / viewer / threadgate / replyParentUri / replyRootUri / embed / replyCount / repostCount / likeCount / url (same types as post, optional where noted above): Core reply fields inherited from post records.
  • sourcePostId (string, optional): Identifier of the source post whose thread was collected.
  • sourcePostUri (string, optional): AT URI of the source post.
  • sourcePostUrl (string, optional): Canonical public URL of the source post.
  • sourcePostAuthorHandle (string, optional): Handle of the source post author.
  • content_category_label / content_category_path / content_category_confidence / content_category_match_type (string / array / number, optional): Comment text category fields when content analysis is enabled.
  • sourcePostContentCategoryLabel / sourcePostContentCategoryPath / sourcePostContentCategoryConfidence / sourcePostContentCategoryMatchType (string / array / number, optional): Category fields inherited from the source post when content analysis is enabled.
  • commentDepth (number, required): Reply depth within the collected thread.
  • sentiment_score / sentiment_label (number / string, optional): Sentiment fields when sentiment analysis is enabled.
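Because comment records carry sourcePostId, a mixed dataset of posts and comments can be regrouped into threads downstream. A minimal sketch (illustrative helper, not part of the actor):

```python
# Group comment records under their source post via sourcePostId.
def group_thread(items: list[dict]) -> dict[str, dict]:
    # Index posts by id, giving each an empty comments list.
    posts = {i["id"]: dict(i, comments=[]) for i in items if i.get("kind") == "post"}
    # Attach each comment to its source post when that post is in the dataset.
    for i in items:
        if i.get("kind") == "comment" and i.get("sourcePostId") in posts:
            posts[i["sourcePostId"]]["comments"].append(i)
    return posts
```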

profile

  • kind (string, required): Record type. Always profile.
  • query (string, required): Input handle that produced the record.
  • id (string, required): Profile DID.
  • uri (string, required): Profile DID, emitted as the canonical profile identifier.
  • authorHandle (string, required): Profile handle.
  • authorName (string, optional): Display name.
  • avatar / banner (string, optional): Public media URLs for the profile.
  • text (string, optional): Profile description or bio.
  • followersCount / followsCount / postsCount (number, optional): Public count fields.
  • createdAt / indexedAt (string, optional): Profile timestamps in ISO 8601 format.
  • labels (array, optional): Source labels attached to the profile.
  • associated.lists / associated.feedgens / associated.starterPacks (number, optional): Public associated-object counts.
  • associated.labeler (boolean, optional): Whether the account is associated with a labeler.
  • associated.chatAllowIncoming / associated.activitySubscriptionAllowSubscriptions (string, optional): Public account setting values when available.
  • joinedViaStarterPack.uri / joinedViaStarterPack.cid (string, optional): Starter-pack identifiers when present.
  • joinedViaStarterPack.creatorDid / joinedViaStarterPack.creatorHandle (string, optional): Starter-pack creator identifiers.
  • joinedViaStarterPack.listItemCount / joinedViaStarterPack.joinedWeekCount / joinedViaStarterPack.joinedAllTimeCount (number, optional): Starter-pack counters.
  • joinedViaStarterPack.indexedAt (string, optional): Starter-pack indexing timestamp.
  • viewer.muted / viewer.blockedBy (boolean, optional): Viewer state flags.
  • viewer.blockingUri / viewer.followingUri / viewer.followedByUri (string, optional): Viewer relationship URIs when available.
  • viewer.mutedByList.uri / viewer.mutedByList.cid / viewer.mutedByList.name / viewer.mutedByList.purpose / viewer.mutedByList.avatar (string, optional): Muted-list metadata when present.
  • viewer.mutedByList.listItemCount (number, optional): Muted-list item count.
  • viewer.blockingByList.uri / viewer.blockingByList.cid / viewer.blockingByList.name / viewer.blockingByList.purpose / viewer.blockingByList.avatar (string, optional): Blocking-list metadata when present.
  • viewer.blockingByList.listItemCount (number, optional): Blocking-list item count.
  • viewer.knownFollowersCount (number, optional): Count of known followers visible to the viewer.
  • url (string, required): Canonical public profile URL.

follower

  • kind (string, required): Record type. Always follower.
  • query (string, required): Input handle that produced the record.
  • id (string, required): Profile DID for the follower or followed account.
  • uri (string, required): DID emitted as the canonical profile identifier.
  • authorHandle (string, required): Handle of the returned account.
  • authorName (string, optional): Display name.
  • text (string, optional): Profile description or bio.
  • avatar (string, optional): Public avatar URL.
  • createdAt / indexedAt (string, optional): Timestamps in ISO 8601 format.
  • labels (array, optional): Source labels attached to the profile.
  • associated.lists / associated.feedgens / associated.starterPacks (number, optional): Public associated-object counts.
  • associated.labeler (boolean, optional): Whether the account is associated with a labeler.
  • associated.chatAllowIncoming / associated.activitySubscriptionAllowSubscriptions (string, optional): Public account setting values when available.
  • viewer.muted / viewer.blockedBy (boolean, optional): Viewer state flags.
  • viewer.blockingUri / viewer.followingUri / viewer.followedByUri (string, optional): Viewer relationship URIs when available.
  • viewer.mutedByList.uri / viewer.mutedByList.cid / viewer.mutedByList.name / viewer.mutedByList.purpose / viewer.mutedByList.avatar (string, optional): Muted-list metadata when present.
  • viewer.mutedByList.listItemCount (number, optional): Muted-list item count.
  • viewer.blockingByList.uri / viewer.blockingByList.cid / viewer.blockingByList.name / viewer.blockingByList.purpose / viewer.blockingByList.avatar (string, optional): Blocking-list metadata when present.
  • viewer.blockingByList.listItemCount (number, optional): Blocking-list item count.
  • viewer.knownFollowersCount (number, optional): Count of known followers visible to the viewer.
  • url (string, required): Canonical public profile URL.

Data Quality, Guarantees, And Handling

  • Structured records: results are normalized into predictable JSON objects for downstream use.
  • Best-effort extraction: fields may vary by region, session state, availability, or target-side presentation changes.
  • Optional fields: null-check in downstream code.
  • Deduplication: recommend kind + ":" + id.
  • Freshness: results reflect the publicly available data at run time.
  • Repeated runs: use the recommended idempotency key when syncing data into warehouses, CRMs, or search indexes.
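Null-checking optional fields before loading records downstream can be as simple as the sketch below; the output column names are illustrative, while the required envelope fields and the optional enrichment fields follow this document.

```python
# Defensively flatten a record for analytics: required envelope fields are
# read directly, optional fields fall back to explicit defaults.
def flatten_for_analytics(record: dict) -> dict:
    return {
        "key": f'{record["kind"]}:{record["id"]}',   # envelope fields are required
        "text": record.get("text") or "",
        "likes": record.get("likeCount", 0),
        "sentiment": record.get("sentiment_label"),  # None unless enrichment was on
    }
```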

Tips For Best Results

  • Start with a small maxItems value to validate the output shape before scaling up.
  • Use searchPosts when you want discovery across public discussions; use handle-based actions when you already know the accounts you want.
  • Enable scrapeComments only when thread context is important to your workflow.
  • Increase maxComments gradually so you can see how reply collection affects dataset size and run duration.
  • Use sortOrder: "latest" for monitoring and sortOrder: "top" for prominence-oriented review.
  • Turn on sentiment_analysis or content_analysis only when those enrichment fields are needed downstream.
  • Store records with kind + ":" + id so repeated runs can be merged cleanly over time.

How to Run on Apify

  1. Open the actor in Apify Console.
  2. Configure the available input fields for the target scope.
  3. Set the maximum number of outputs to collect.
  4. Click Start and wait for the run to finish.
  5. Review the dataset and export results in JSON, CSV, Excel, or another supported format.

Scheduling & Automation

Automated Data Collection

You can schedule runs to keep post, profile, or network datasets current without manual intervention. This is useful for recurring monitoring, reporting, and enrichment workflows.

  • Navigate to Schedules in Apify Console
  • Create a new schedule (daily, weekly, or custom cron)
  • Configure input parameters
  • Enable notifications for run completion
  • Add webhooks for automated processing

Integration Options

  • BI dashboards: track engagement, activity, topic coverage, and account-level change over time.
  • Data warehouses: load normalized JSON records into historical reporting and analytics models.
  • Webhooks: trigger ingestion, validation, or alerting workflows after each completed run.
  • API access: pull datasets programmatically into internal applications and ETL pipelines.
  • Google Sheets or Excel review workflows: share smaller validation runs with non-technical stakeholders for review and triage.
  • Enrichment pipelines: join public Bluesky records with CRM, research, or audience datasets.

Export Formats And Downstream Use

Apify datasets can be exported directly or consumed programmatically, which makes the actor practical for both operational review and automated delivery into downstream systems.

  • JSON: for APIs, applications, and data pipelines.
  • CSV or Excel: for spreadsheet workflows and manual review.
  • API access: for automated ingestion into internal systems.
  • BI and warehouses: for reporting, dashboards, and historical analysis.

Performance

Estimated run times:

  • Small runs (< 1,000 outputs): ~3–5 minutes
  • Medium runs (1,000–5,000 outputs): ~5–15 minutes
  • Large runs (5,000+ outputs): ~15–30 minutes

Execution time varies based on filters, result volume, and how much information is returned per record. Highly filtered runs may finish faster, while broader discovery runs or detail-rich records can take longer.

Limitations

  • Availability depends on what https://bsky.app publicly exposes at run time.
  • Some optional fields may be missing when a post or profile does not provide that information publicly.
  • Very broad searches can take longer and may require a higher maxItems setting to capture the desired volume.
  • Output field availability can change when Bluesky changes public record presentation or metadata exposure.
  • Results can vary across accounts, topics, and public visibility conditions.

Troubleshooting

  • No results returned: check the queries or startUrls values, confirm the action matches the input type when using queries, and verify that the target keyword, account, or page has matching public records.
  • Fewer results than expected: broaden the query set, raise maxItems, or confirm that enough matching public records exist.
  • Some fields are empty: optional fields depend on what each post, reply, or profile publicly provides.
  • Run takes longer than expected: reduce scope, lower maxItems for validation, or split broad collection into smaller runs.
  • Output changed: compare the current dataset with the field reference and share a small sample if support is needed.

FAQ

What data does this actor collect?

It collects public Bluesky posts, replies, profiles, followers, and follows, depending on the selected action and input.

Can I filter by keyword or account?

Yes. Use queries for keywords in searchPosts and Bluesky handles for account-based actions such as getAuthorFeed, getFollowers, getFollows, and getProfile. You can also use startUrls with Bluesky search, post, profile, followers, and follows pages.

Why did I receive fewer results than my limit?

maxItems is an upper bound, not a guarantee. The final count depends on how many matching public records are available for the chosen action, queries, and start URLs.

Can I collect replies as well as posts?

Yes. Enable scrapeComments to collect replies for searchPosts, getAuthorFeed, and direct post URL runs.
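An illustrative input for a keyword run that also collects replies. The field names (action, queries, maxItems, scrapeComments) follow this README; the values are examples only, and any fields not shown fall back to the actor's defaults.

```python
# Example input for a searchPosts run with reply collection enabled.
# Field names follow this README; values are illustrative.
import json

run_input = {
    "action": "searchPosts",
    "queries": ["data engineering"],
    "maxItems": 200,
    "scrapeComments": True,
}
print(json.dumps(run_input, indent=2))
```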

Can I schedule recurring runs?

Yes. Apify schedules can run the actor automatically on a recurring timetable.

How do I avoid duplicates across runs?

Use kind + ":" + id as the idempotency key when storing or syncing records.
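A minimal sketch of that deduplication step in Python; in practice the seen set would live in your database or key-value store rather than in memory.

```python
# Cross-run deduplication using the kind + ":" + id key recommended above.

def dedupe(records, seen=None):
    """Yield only records whose kind:id key has not been seen before."""
    seen = set() if seen is None else seen
    for rec in records:
        key = f'{rec["kind"]}:{rec["id"]}'
        if key not in seen:
            seen.add(key)
            yield rec

batch = [
    {"kind": "post", "id": "1", "text": "a"},
    {"kind": "post", "id": "1", "text": "a"},     # duplicate across runs
    {"kind": "profile", "id": "1", "text": "b"},  # same id, different kind
]
unique = list(dedupe(batch))
# unique keeps the first post record and the profile record
```

Combining kind with id matters because different record types (posts, profiles, follows) can share an id value without being duplicates.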

Can I export the data to CSV, Excel, or JSON?

Yes. Apify datasets support JSON export and tabular exports such as CSV and Excel-compatible formats.

Does this actor collect private data?

No. The actor is intended for publicly available information from https://bsky.app.

What should I include when reporting an issue?

Include the input you used with any sensitive values redacted, the Apify run ID, a short expected-versus-actual summary, and a small sample of the output if possible.

Compliance & Ethics

Responsible Data Collection

This actor collects publicly available Bluesky social data from https://bsky.app for legitimate business purposes, including:

  • Market intelligence research and trend analysis
  • Content and community monitoring
  • Data enrichment and reporting

Users are responsible for ensuring their collection and downstream use comply with applicable laws, regulations, and platform terms. This section is informational and not legal advice.

Best Practices

  • Use collected data in accordance with applicable laws, regulations, and the target site’s terms
  • Respect individual privacy and personal information
  • Use data responsibly and avoid disruptive or excessive collection
  • Do not use this actor for spamming, harassment, or other harmful purposes
  • Follow relevant data protection requirements where applicable (for example GDPR or CCPA)

Support

For help, use the actor page support channel or open an issue on the actor page. Include the input used with sensitive values redacted, the run ID, a short description of expected versus actual behavior, and an optional small output sample so the issue can be reproduced and reviewed efficiently.