Farcaster Hub Scraper

Pricing

Pay per event

Protocol-native Farcaster data ingestion for research, analytics, and social graph analysis. Collect casts, reactions, follows, user profiles, and real-time events directly from Farcaster Hubs via HTTP API.

Rating

5.0

(2)

Developer

BarriereFix

Maintained by Community

Features

✅ Protocol-First Design - Direct Hub HTTP API integration (no third-party dependencies)
✅ Three Ingestion Modes - Deterministic backfill by FIDs, time-bounded studies, or incremental event tailing
✅ Comprehensive Data - Casts, reactions (likes/recasts), follows, user profiles, and events
✅ Optional Enrichment - Parse Frames/Mini-Apps metadata from embedded URLs
✅ State Checkpointing - Migration-safe, resumable runs with automatic state persistence
✅ Rate Limiting & Retries - Production-grade reliability with exponential backoff
✅ Neynar v2 Support - Optional integration with Neynar hosted hubs
✅ Multiple Views - Pre-configured dataset views for easy data exploration

Who Uses This Actor?

🎯 Target Users

📊 Web3 Data Analysts & Researchers (Dune, Flipside)

Export Farcaster data to SQL databases for analytics dashboards
Track protocol growth, user engagement trends, and network effects
Cross-reference social data with onchain transactions

🛠️ Farcaster Frame/Mini-App Developers

Monitor Frame engagement and interaction patterns
Track which users interact with your Mini-Apps
Analyze viral content and user acquisition funnels

📢 Web3 Marketing Agencies & Brands

Track influencer campaigns and brand mentions
Measure content reach and engagement rates
Identify key opinion leaders in the Farcaster ecosystem

🎓 Academic Researchers

Study decentralized social network dynamics
Analyze information diffusion and community formation
Research Web3 social graph topology

Use Cases by Persona

📊 For Data Analysts

Influencer Ranking Dashboard

{
  "mode": "byFids",
  "fids": [2, 3, 6833, 5650, 7890],
  "include": {"casts": true, "reactions": true, "userData": true},
  "maxRecords": 50000
}

→ Export to Dune to calculate engagement rates, follower growth, content velocity

Protocol Growth Metrics

{
  "mode": "tailEvents",
  "maxRecords": 100000
}

→ Stream all events to track daily active users, network growth, retention

🛠️ For Frame Developers

Frame Interaction Analysis

{
  "mode": "byFids",
  "fids": [2, 3, 6833],
  "include": {"casts": true, "reactions": true},
  "fetchEmbeds": true
}

→ Identify which casts contain your Frame and track engagement patterns. The fids above are placeholders; substitute the FIDs of the users who interacted with your Frame.

Real-Time Frame Monitoring

{
  "mode": "tailEvents",
  "tail": {"fromEventId": "latest"},
  "maxRecords": 10000
}

→ Get notified when users interact with your Frames in real-time

📢 For Marketing Agencies

Campaign Performance Tracking

{
  "mode": "byFids",
  "fids": [2, 3, 6833],
  "startTimestamp": 130000000,
  "stopTimestamp": 130100000,
  "include": {"casts": true, "reactions": true}
}

→ Measure campaign reach during a specific time window. The fids above are placeholders; substitute the FIDs of the brand account and its influencers.

Influencer Discovery

{
  "mode": "byFids",
  "fids": [2, 3, 6833],
  "include": {"links": true, "userData": true, "reactions": true}
}

→ Find high-engagement users in target communities. The fids above are placeholders; substitute the FIDs of a competitor's followers.

🎓 For Researchers

Social Network Topology Study

{
  "mode": "byFids",
  "discoverFids": true,
  "shardIds": [0, 1, 2],
  "include": {"links": true, "userData": true},
  "maxRecords": 500000
}

→ Build complete follow graph for network analysis

Information Diffusion Analysis

{
  "mode": "byTime",
  "fids": [2, 3, 6833],
  "startTimestamp": 100000000,
  "stopTimestamp": 100500000,
  "include": {"casts": true, "reactions": true}
}

→ Track how content spreads through the network over time. The fids above are placeholders; substitute the FIDs of your seed users.

Quick Start

Basic Example: Backfill by FIDs

{
  "hubBaseUrl": "https://hub.pinata.cloud",
  "mode": "byFids",
  "fids": [2, 3, 6833],
  "include": {
    "casts": true,
    "reactions": true,
    "links": true,
    "userData": true
  },
  "pageSize": 1000,
  "maxRecords": 10000
}

Time-Bounded Study

{
  "hubBaseUrl": "https://hub.pinata.cloud",
  "mode": "byTime",
  "fids": [2, 3],
  "startTimestamp": 100000000,
  "stopTimestamp": 100050000,
  "include": {
    "casts": true,
    "reactions": true
  }
}

Real-Time Event Tail

{
  "hubBaseUrl": "https://hub.pinata.cloud",
  "mode": "tailEvents",
  "tail": {
    "fromEventId": "0",
    "shardIndex": 0
  },
  "maxRecords": 1000
}

Auto-Discover FIDs via Shard Scan

{
  "hubBaseUrl": "https://hub.pinata.cloud",
  "mode": "byFids",
  "discoverFids": true,
  "shardIds": [0, 1],
  "include": {
    "casts": true,
    "userData": true
  },
  "maxRecords": 5000
}

With Frame/Mini-App Metadata Parsing

{
  "hubBaseUrl": "https://hub.pinata.cloud",
  "mode": "byFids",
  "fids": [2],
  "fetchEmbeds": true,
  "maxEmbedsPerRun": 100,
  "proxy": "RESIDENTIAL",
  "include": {
    "casts": true
  }
}

Input Configuration

Required Fields

Field	Type	Description	Default
`hubBaseUrl`	`string`	HTTP endpoint of Farcaster Hub	`https://hub.pinata.cloud`
`mode`	`enum`	Ingestion mode: `byFids`, `byTime`, `tailEvents`	`byFids`

Mode-Specific Fields

By FIDs Mode

Field	Type	Description	Default
`fids`	`array<integer>`	List of Farcaster IDs to scrape	`[]`
`discoverFids`	`boolean`	Auto-discover FIDs via shard scan	`false`
`shardIds`	`array<integer>`	Shard IDs to scan when discovering	`[]`

By Time Mode

Field	Type	Description	Default
`fids`	`array<integer>`	FIDs to scrape (required)	`[]`
`startTimestamp`	`integer`	Start time (Farcaster epoch seconds)	-
`stopTimestamp`	`integer`	Stop time (Farcaster epoch seconds)	-

Tail Events Mode

Field	Type	Description	Default
`tail.fromEventId`	`string`	Start from event ID (empty = start from 0)	`"0"`
`tail.shardIndex`	`integer`	Shard index to tail (optional)	-

Entity Filters

Field	Type	Description	Default
`include.casts`	`boolean`	Include cast messages	`true`
`include.reactions`	`boolean`	Include reactions (likes/recasts)	`true`
`include.links`	`boolean`	Include follows	`true`
`include.userData`	`boolean`	Include user profiles	`true`

Optional Features

Field	Type	Description	Default
`fetchEmbeds`	`boolean`	Parse embedded URLs for Frames/Mini-Apps	`false`
`maxEmbedsPerRun`	`integer`	Max embeds to fetch per run	`500`
`neynarApiKey`	`string`	Neynar v2 API key (optional)	-
`clientApi`	`boolean`	Enable Farcaster Client API (experimental)	`false`
`proxy`	`string`	Apify Proxy groups or custom URL	-

Performance & Limits

Field	Type	Description	Default
`pageSize`	`integer`	Records per page (max 1000)	`1000`
`maxRecords`	`integer`	Stop after N records (safety limit)	-
`requestPerMinute`	`integer`	Rate limit for Hub API calls	`600`

Output Schema

The actor produces normalized entities with the following types:

Cast Entity

{
  "entity_type": "cast",
  "fid": 2,
  "hash": "0x1234567890abcdef",
  "ts": 123456789,
  "ts_iso": "2025-01-15T10:30:00.000Z",
  "text": "Hello Farcaster!",
  "mentions": [3, 6833],
  "parent": {
    "castId": { "fid": 2, "hash": "0xabc..." }
  },
  "embeds": {
    "urls": ["https://example.com"],
    "castIds": []
  },
  "derived": {
    "urls": ["https://example.com"],
    "frame_meta": {
      "name": "My App",
      "url": "https://app.example.com"
    }
  },
  "ingest_source": "hub_http",
  "ingest_ts": "2025-01-15T10:31:00.000Z",
  "raw": { /* original Hub message */ }
}

Reaction Entity

{
  "entity_type": "reaction",
  "fid": 3,
  "type": "like",
  "target": {
    "castId": { "fid": 2, "hash": "0x1234..." }
  },
  "ts": 123456790,
  "ts_iso": "2025-01-15T10:31:00.000Z",
  "hash": "0xabcd...",
  "ingest_source": "hub_http",
  "ingest_ts": "2025-01-15T10:32:00.000Z",
  "raw": { /* original Hub message */ }
}

Link Entity (Follow)

{
  "entity_type": "link",
  "fid": 3,
  "targetFid": 2,
  "type": "follow",
  "ts": 123456791,
  "ts_iso": "2025-01-15T10:32:00.000Z",
  "hash": "0xdef...",
  "ingest_source": "hub_http",
  "ingest_ts": "2025-01-15T10:33:00.000Z",
  "raw": { /* original Hub message */ }
}

User Data Entity

{
  "entity_type": "user_data",
  "fid": 2,
  "username": "vitalik.eth",
  "display": "Vitalik",
  "pfp": "https://example.com/pfp.png",
  "bio": "Ethereum co-founder",
  "url": "https://vitalik.ca",
  "location": "Singapore",
  "github": "vbuterin",
  "twitter": "VitalikButerin",
  "ts": 123456792,
  "ts_iso": "2025-01-15T10:33:00.000Z",
  "ingest_source": "hub_http",
  "ingest_ts": "2025-01-15T10:34:00.000Z",
  "raw": [ /* original Hub messages */ ]
}

Event Entity (Tail Mode)

{
  "entity_type": "event",
  "event_id": "12345",
  "event_type": "MERGE_MESSAGE",
  "ts": 123456793,
  "ts_iso": "2025-01-15T10:34:00.000Z",
  "shard_index": 0,
  "message": { /* hydrated message if MERGE_MESSAGE */ },
  "ingest_source": "hub_http",
  "ingest_ts": "2025-01-15T10:35:00.000Z",
  "raw": { /* original Hub event */ }
}

Farcaster Timestamps

Important: Farcaster uses a custom epoch starting at 2021-01-01T00:00:00.000Z.

All entities include both ts (Farcaster epoch seconds) and ts_iso (ISO 8601) fields
Use ts_iso for human-readable timestamps and data analysis
Use ts for filtering Hub API requests

Example conversion:

Farcaster epoch 100000000 = 2024-03-03T09:46:40.000Z
Current time: isoToFarcasterEpoch(new Date().toISOString())
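
The conversion can be sketched in Python as follows. This is a minimal illustration that assumes only the epoch start stated above (2021-01-01T00:00:00.000Z); the function names are hypothetical and are not part of the actor.

```python
from datetime import datetime, timezone

# Farcaster epoch start (2021-01-01T00:00:00Z) expressed as Unix seconds.
FARCASTER_EPOCH_UNIX = 1609459200


def farcaster_to_iso(ts: int) -> str:
    """Convert Farcaster epoch seconds to an ISO 8601 UTC string."""
    dt = datetime.fromtimestamp(FARCASTER_EPOCH_UNIX + ts, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.000Z")


def iso_to_farcaster(iso: str) -> int:
    """Convert an ISO 8601 UTC string (trailing 'Z') to Farcaster epoch seconds."""
    dt = datetime.fromisoformat(iso.replace("Z", "+00:00"))
    return int(dt.timestamp()) - FARCASTER_EPOCH_UNIX


# Round trip: Farcaster seconds -> ISO string -> Farcaster seconds.
iso_value = farcaster_to_iso(100000000)
round_trip = iso_to_farcaster(iso_value)
```

Use the second function to build startTimestamp and stopTimestamp values from calendar dates.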

Ingestion Modes Explained

Mode 1: By FIDs (Deterministic Backfill)

Use Case: Research specific users, backfill known accounts

How it works:

For each FID in the input list (or discovered via shard scan):
- Fetch all casts with pagination
- Fetch all reactions (likes/recasts)
- Fetch all follows
- Fetch user profile data
Maintains checkpoint per FID (lastTs, lastPageToken) for resumable runs
Optionally discover FIDs by scanning specified shards
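
The per-FID pagination described above can be sketched as a small generator. This is an illustration, not the actor's code: it assumes each Hub response carries a messages list and a nextPageToken string (empty when there are no more pages), and the fetch_page callable stands in for an HTTP GET against an endpoint such as /v1/castsByFid.

```python
from typing import Callable, Dict, Iterator, Optional

# A page fetcher takes query parameters and returns the decoded JSON body.
Fetcher = Callable[[Dict[str, object]], dict]


def iter_messages(fetch_page: Fetcher, fid: int, page_size: int = 1000,
                  page_token: Optional[str] = None) -> Iterator[dict]:
    """Yield every message for one FID, following nextPageToken until exhausted."""
    while True:
        params: Dict[str, object] = {"fid": fid, "pageSize": page_size}
        if page_token:
            params["pageToken"] = page_token
        body = fetch_page(params)
        yield from body.get("messages", [])
        # An empty or missing token means the last page has been read.
        page_token = body.get("nextPageToken") or None
        if not page_token:
            break


# Demo with a stubbed two-page response instead of a live Hub.
_pages = {
    None: {"messages": [{"hash": "0x01"}, {"hash": "0x02"}], "nextPageToken": "t1"},
    "t1": {"messages": [{"hash": "0x03"}], "nextPageToken": ""},
}


def fake_fetch(params: Dict[str, object]) -> dict:
    return _pages[params.get("pageToken")]


hashes = [m["hash"] for m in iter_messages(fake_fetch, fid=2)]
```

Passing a saved page_token as the starting value is one way a resumable run could continue mid-pagination.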

Best for: User-centric analysis, follower studies, content backfills

Mode 2: By Time Window (Targeted Study)

Use Case: Time-bounded analysis (e.g., "all activity during an event")

How it works:

For each FID, fetch only messages within startTimestamp to stopTimestamp
Applies time filters to casts (Hub native support)
Filters reactions and links manually (Hub doesn't support time filters)
Faster than full backfill when studying specific time periods
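
The manual filtering step for reactions and links could look like the sketch below. It assumes each raw Hub message stores its Farcaster-epoch timestamp under data.timestamp and treats both window bounds as inclusive; both are assumptions, since the source does not specify them.

```python
from typing import Iterable, List


def filter_by_window(messages: Iterable[dict], start_ts: int, stop_ts: int) -> List[dict]:
    """Keep messages whose Farcaster-epoch timestamp lies in [start_ts, stop_ts]."""
    kept = []
    for msg in messages:
        ts = msg.get("data", {}).get("timestamp")
        if ts is not None and start_ts <= ts <= stop_ts:
            kept.append(msg)
    return kept


# Demo: only the middle message falls inside the window.
reactions = [
    {"data": {"timestamp": 99999999}},
    {"data": {"timestamp": 100000000}},
    {"data": {"timestamp": 100050001}},
]
in_window = filter_by_window(reactions, 100000000, 100050000)
```

Because every reaction and link must be fetched before it can be filtered, this step explains why byTime runs are slower for those entity types than for casts.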

Best for: Event analysis, temporal studies, A/B testing

Mode 3: Tail Events (Near-Real-Time)

Use Case: Live monitoring, incremental ingestion

How it works:

Poll /v1/events starting from fromEventId (or last checkpoint)
For MERGE_MESSAGE events, hydrate and push the message entity
Update lastEventId checkpoint per shard
Sleeps 5s between polls (configurable)
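
One polling step of this loop can be sketched as follows. The response shape (an events list whose items carry id, type, and mergeMessageBody.message) is an assumption modelled on the Hub HTTP events endpoint, and fetch_events is a hypothetical stand-in for the GET request to /v1/events.

```python
from typing import Callable, List, Tuple

# Takes the event ID to start from and returns the decoded JSON body.
FetchEvents = Callable[[int], dict]


def poll_once(fetch_events: FetchEvents, from_event_id: int) -> Tuple[List[dict], int]:
    """Fetch one batch of events; return (merged messages, next event ID to request)."""
    body = fetch_events(from_event_id)
    messages: List[dict] = []
    next_id = from_event_id
    for event in body.get("events", []):
        # Accept both "MERGE_MESSAGE" and "HUB_EVENT_TYPE_MERGE_MESSAGE" spellings.
        if str(event.get("type", "")).endswith("MERGE_MESSAGE"):
            messages.append(event["mergeMessageBody"]["message"])
        # Advance past every event seen so the next poll does not re-read it.
        next_id = max(next_id, int(event["id"]) + 1)
    return messages, next_id


def fake_fetch(from_id: int) -> dict:
    """Stubbed Hub holding two events, used instead of a live endpoint."""
    events = [
        {"id": 10, "type": "HUB_EVENT_TYPE_MERGE_MESSAGE",
         "mergeMessageBody": {"message": {"hash": "0xaa"}}},
        {"id": 11, "type": "HUB_EVENT_TYPE_PRUNE_MESSAGE"},
    ]
    return {"events": [e for e in events if e["id"] >= from_id]}


merged, checkpoint = poll_once(fake_fetch, 0)
```

A tailing run would call poll_once repeatedly, persist the returned checkpoint per shard, and sleep between calls.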

Important: Hubs prune events older than ~3 days. Run frequently (every 1-2 days) to avoid data loss.

Best for: Real-time dashboards, notifications, streaming pipelines

Optional Features

Frame/Mini-App Metadata Parsing

When fetchEmbeds: true, the actor will:

Extract all unique URLs from cast embeds
Fetch each URL (up to maxEmbedsPerRun limit)
Parse fc:miniapp:* and fc:frame:* meta tags
Enrich cast entities with derived.frame_meta object
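
The meta-tag parsing step can be illustrated with Python's standard html.parser module. This is a sketch under the assumption that the relevant tags are meta elements whose name or property attribute starts with fc:frame or fc:miniapp; the class and function names are hypothetical, and the actor's own parser may differ.

```python
from html.parser import HTMLParser
from typing import Dict


class FrameMetaParser(HTMLParser):
    """Collect <meta> tags whose name/property starts with fc:frame or fc:miniapp."""

    PREFIXES = ("fc:frame", "fc:miniapp")

    def __init__(self) -> None:
        super().__init__()
        self.meta: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        key = attr.get("name") or attr.get("property") or ""
        if key.startswith(self.PREFIXES) and attr.get("content") is not None:
            self.meta[key] = attr["content"]


def parse_frame_meta(html: str) -> Dict[str, str]:
    """Return a mapping of matching meta tag names to their content values."""
    parser = FrameMetaParser()
    parser.feed(html)
    return parser.meta


sample = (
    '<html><head>'
    '<meta name="fc:frame" content="vNext">'
    '<meta property="fc:frame:image" content="https://example.com/a.png">'
    '<meta name="description" content="ignored">'
    '</head></html>'
)
frame_meta = parse_frame_meta(sample)
```

Pages without any matching tags yield an empty mapping, so a cast can be left unenriched in that case.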

Use Proxy: Set proxy field to avoid rate limits (e.g., "RESIDENTIAL" for Apify Proxy)

Performance: Adds ~2-5s per URL. Use maxEmbedsPerRun to cap crawling time.

Neynar v2 Integration

Provide neynarApiKey to use Neynar's hosted Hub endpoints instead of direct Hub HTTP.

Benefits:

Faster, managed infrastructure
No self-hosted Hub required
Additional features (v2 only; v1 EOL March 31, 2025)

Records flagged: All entities get ingest_source: "neynar_v2"

Client API (Experimental)

Set clientApi: true to enable Warpcast-specific endpoints (e.g., trending, channels).

Warning: Non-protocol data. Records flagged as ingest_source: "client_api" to avoid confusion.

State Checkpointing & Resumability

The actor automatically persists state every 30 seconds and on Apify migration events:

Per-FID checkpoints: { lastTs, lastPageToken } for resuming mid-pagination
Per-Shard checkpoints: { lastEventId } for event tail mode
Migration-safe: Survives container restarts and platform migrations

To resume a run:

Start the actor with same input
State is automatically restored
Scraping continues from last checkpoint
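
The checkpoint shape above can be illustrated with a small save/load pair. This sketch persists state to a local JSON file purely for demonstration; on the Apify platform the state would normally live in the run's key-value store, and the file path and function names here are hypothetical.

```python
import json
import os
import tempfile
from typing import Dict


def save_state(path: str, state: Dict) -> None:
    """Write state atomically: dump to a temp file, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(state, fh)
    os.replace(tmp, path)


def load_state(path: str) -> Dict:
    """Return the saved state, or an empty skeleton on the first run."""
    if not os.path.exists(path):
        return {"fids": {}, "shards": {}}
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Demo: record one per-FID and one per-shard checkpoint, then restore them.
with tempfile.TemporaryDirectory() as workdir:
    state_path = os.path.join(workdir, "state.json")
    state = load_state(state_path)
    state["fids"]["2"] = {"lastTs": 100000000, "lastPageToken": "t1"}
    state["shards"]["0"] = {"lastEventId": "12345"}
    save_state(state_path, state)
    restored = load_state(state_path)
```

Writing to a temporary file and renaming it means an interrupted save leaves the previous checkpoint intact.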

Performance Tips

Use time filters: Narrow startTimestamp/stopTimestamp for faster runs
Batch FIDs: Process related users together to share dedup cache
Tune pageSize: Larger pages (1000) = fewer requests, but slower per-request
Set maxRecords: Safety limit prevents runaway costs
Monitor rate limits: Default 600 req/min is conservative; increase if Hub allows
Schedule tail runs: Run every 1-2 days to avoid event pruning

Limitations & Best Practices

Hub Event Pruning

Limitation: Hubs prune events older than ~3 days
Best Practice: Schedule tail runs every 1-2 days for continuous ingestion

Reaction/Link Time Filters

Limitation: Hub API doesn't support time filters for reactions/links
Workaround: Actor fetches all and filters manually in byTime mode (slower)

Embed Fetching

Limitation: Some URLs may be slow, dead, or behind auth
Best Practice: Use maxEmbedsPerRun cap and Apify Proxy to avoid timeouts

Rate Limiting

Default: 600 req/min (conservative)
Tuning: Increase requestPerMinute if your Hub supports higher rates
Public Hubs: May have stricter limits; monitor 429 responses
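
Handling 429 responses with exponential backoff can be sketched like this. The retry count, base delay, and function name are illustrative assumptions rather than the actor's actual settings; the request callable is a stand-in that returns a status code and a decoded body.

```python
import time
from typing import Callable, Tuple


def call_with_backoff(request: Callable[[], Tuple[int, dict]], max_retries: int = 5,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """Retry on HTTP 429, doubling the delay after each rate-limited attempt."""
    for attempt in range(max_retries + 1):
        status, body = request()
        if status != 429:
            return body
        if attempt == max_retries:
            break
        sleep(base_delay * (2 ** attempt))
    raise RuntimeError("rate limited: retries exhausted")


# Demo: two rate-limited responses followed by a success; delays are recorded, not slept.
responses = iter([(429, {}), (429, {}), (200, {"ok": True})])
delays = []
result = call_with_backoff(lambda: next(responses), sleep=delays.append)
```

Injecting the sleep function keeps the example fast to run and makes the delay schedule easy to inspect.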

Pricing & Compute

Approximate compute units (based on default settings):

Run Type	Records	Compute Units	Notes
Small backfill	<10k	~0.01	2-3 FIDs, no embeds
Medium backfill	100k	~0.5	10-20 FIDs, all entities
Large backfill	1M	~5	100+ FIDs or full shard scan
Tail (1 hour)	1k events	~0.005	Near-real-time streaming
With embeds	+100 URLs	+0.02 per 100	Crawlee overhead

Formula: ~0.5 CU per 100k records (without embeds)
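
For quick budgeting, the formula and the embed overhead from the table can be combined in a small helper. This is a rough estimate only: it assumes the two figures above scale linearly, and the function name is hypothetical.

```python
def estimate_compute_units(records: int, embed_urls: int = 0) -> float:
    """Rough CU estimate: 0.5 CU per 100k records plus 0.02 CU per 100 embed URLs."""
    return records / 100_000 * 0.5 + embed_urls / 100 * 0.02


large_backfill = estimate_compute_units(1_000_000)
medium_with_embeds = estimate_compute_units(100_000, embed_urls=100)
```

Actual consumption depends on Hub latency and the entity mix, so treat the result as an order-of-magnitude guide.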

Example Use Cases

Social Graph Analysis

{
  "mode": "byFids",
  "fids": [2, 3, 6833, 5650],
  "include": {
    "links": true,
    "userData": true
  }
}

Output: Follow relationships + user profiles for network analysis

Content Research

{
  "mode": "byTime",
  "fids": [2],
  "startTimestamp": 100000000,
  "stopTimestamp": 100050000,
  "include": {
    "casts": true,
    "reactions": true
  }
}

Output: All casts + reactions during a specific event

Real-Time Dashboard

{
  "mode": "tailEvents",
  "tail": { "fromEventId": "0" },
  "maxRecords": 10000
}

Output: Live stream of all protocol events (schedule every hour)

Frame/Mini-App Catalog

{
  "mode": "byFids",
  "fids": [2, 3],
  "fetchEmbeds": true,
  "maxEmbedsPerRun": 200,
  "include": {
    "casts": true
  }
}

Output: Casts with Frame/Mini-App metadata extracted

Troubleshooting

"Failed to connect to Hub"

Verify hubBaseUrl is correct and accessible
Check Hub is running and serving HTTP API on port 3381
Try public Hub: https://hub.pinata.cloud

"No data returned"

Verify FIDs exist and have activity
Check time window isn't too narrow (byTime mode)
Ensure include.* filters aren't excluding all data

"Max records limit reached"

Increase maxRecords or remove limit for full backfill
Use checkpointing to resume in multiple runs

"Rate limit errors (429)"

Decrease requestPerMinute
Add delays between runs
Use Neynar hosted Hub (better rate limits)

"Event tail missing data"

Events pruned >3 days ago
Schedule runs more frequently (every 1-2 days)
Use byFids mode for historical backfill

Data Views

The actor provides pre-configured dataset views:

Overview: All entities with key identifiers
Casts: Cast content, timestamps, and URLs
Reactions: Likes and recasts by FID
Follows: Follow relationships (social graph edges)
Users: User profiles and metadata

Access views in Apify Console → Dataset → Views tab

Support

Email: kontakt@barrierefix.de
Documentation: Farcaster Hub API Docs
Issues: Report bugs or request features via email

Version History

1.0.0 (2025-01) - Initial release
- Three ingestion modes (byFids, byTime, tailEvents)
- Hub HTTP API integration
- State checkpointing
- Optional Frame/Mini-App parsing
- Neynar v2 support

🔗 Explore More of Our Actors

📰 Content & Publishing

Actor	Description
Notion Marketplace Scraper	Scrape Notion templates and marketplace listings
Ghost Newsletter Scraper	Extract Ghost newsletter content and subscriber data
Google Play Reviews Scraper	Extract app reviews from Google Play Store

💬 Social Media & Community

Actor	Description
Reddit Scraper Pro	Monitor subreddits and track keywords with sentiment analysis
Discord Scraper Pro	Extract Discord messages and chat history for community insights
YouTube Comments Harvester	Comprehensive YouTube comments scraper with channel-wide enumeration
YouTube Contact Scraper	Extract YouTube channel contact information for outreach
YouTube Shorts Scraper	Scrape YouTube Shorts for viral content research

License

MIT License - Free for commercial and non-commercial use
