
Farcaster Hub Scraper
Pricing
Pay per event

Protocol-native Farcaster data ingestion for research, analytics, and social graph analysis. Collect casts, reactions, follows, user profiles, and real-time events directly from Farcaster Hubs via HTTP API.
Last modified
15 days ago
Features
✅ Protocol-First Design - Direct Hub HTTP API integration (no third-party dependencies)
✅ Three Ingestion Modes - Deterministic backfill by FIDs, time-bounded studies, or incremental event tailing
✅ Comprehensive Data - Casts, reactions (likes/recasts), follows, user profiles, and events
✅ Optional Enrichment - Parse Frames/Mini-Apps metadata from embedded URLs
✅ State Checkpointing - Migration-safe, resumable runs with automatic state persistence
✅ Rate Limiting & Retries - Production-grade reliability with exponential backoff
✅ Neynar v2 Support - Optional integration with Neynar hosted hubs
✅ Multiple Views - Pre-configured dataset views for easy data exploration
Who Uses This Actor?
🎯 Target Users
📊 Web3 Data Analysts & Researchers (Dune, Flipside)
- Export Farcaster data to SQL databases for analytics dashboards
- Track protocol growth, user engagement trends, and network effects
- Cross-reference social data with onchain transactions
🛠️ Farcaster Frame/Mini-App Developers
- Monitor Frame engagement and interaction patterns
- Track which users interact with your Mini-Apps
- Analyze viral content and user acquisition funnels
📢 Web3 Marketing Agencies & Brands
- Track influencer campaigns and brand mentions
- Measure content reach and engagement rates
- Identify key opinion leaders in the Farcaster ecosystem
🎓 Academic Researchers
- Study decentralized social network dynamics
- Analyze information diffusion and community formation
- Research Web3 social graph topology
Use Cases by Persona
📊 For Data Analysts
Influencer Ranking Dashboard
{"mode": "byFids","fids": [2, 3, 6833, 5650, 7890],"include": {"casts": true, "reactions": true, "userData": true},"maxRecords": 50000}
→ Export to Dune to calculate engagement rates, follower growth, content velocity
Protocol Growth Metrics
{"mode": "tailEvents","maxRecords": 100000}
→ Stream all events to track daily active users, network growth, retention
🛠️ For Frame Developers
Frame Interaction Analysis
{"mode": "byFids","fids": [list of users who interacted],"include": {"casts": true, "reactions": true},"fetchEmbeds": true}
→ Identify which casts contain your Frame, track engagement patterns
Real-Time Frame Monitoring
{"mode": "tailEvents","tail": {"fromEventId": "latest"},"maxRecords": 10000}
→ Get notified when users interact with your Frames in real-time
📢 For Marketing Agencies
Campaign Performance Tracking
{"mode": "byFids","fids": [brand_account, influencer1, influencer2],"startTimestamp": 130000000,"stopTimestamp": 130100000,"include": {"casts": true, "reactions": true}}
→ Measure campaign reach during specific time window
Influencer Discovery
{"mode": "byFids","fids": [competitor_followers],"include": {"links": true, "userData": true, "reactions": true}}
→ Find high-engagement users in target communities
🎓 For Researchers
Social Network Topology Study
{"mode": "byFids","discoverFids": true,"shardIds": [0, 1, 2],"include": {"links": true, "userData": true},"maxRecords": 500000}
→ Build complete follow graph for network analysis
Information Diffusion Analysis
{"mode": "byTime","fids": [seed_users],"startTimestamp": 100000000,"stopTimestamp": 100500000,"include": {"casts": true, "reactions": true}}
→ Track how content spreads through the network over time
Quick Start
Basic Example: Backfill by FIDs
{"hubBaseUrl": "https://hub.pinata.cloud","mode": "byFids","fids": [2, 3, 6833],"include": {"casts": true,"reactions": true,"links": true,"userData": true},"pageSize": 1000,"maxRecords": 10000}
Time-Bounded Study
{"hubBaseUrl": "https://hub.pinata.cloud","mode": "byTime","fids": [2, 3],"startTimestamp": 100000000,"stopTimestamp": 100050000,"include": {"casts": true,"reactions": true}}
Real-Time Event Tail
{"hubBaseUrl": "https://hub.pinata.cloud","mode": "tailEvents","tail": {"fromEventId": "0","shardIndex": 0},"maxRecords": 1000}
Auto-Discover FIDs via Shard Scan
{"hubBaseUrl": "https://hub.pinata.cloud","mode": "byFids","discoverFids": true,"shardIds": [0, 1],"include": {"casts": true,"userData": true},"maxRecords": 5000}
With Frame/Mini-App Metadata Parsing
{"hubBaseUrl": "https://hub.pinata.cloud","mode": "byFids","fids": [2],"fetchEmbeds": true,"maxEmbedsPerRun": 100,"proxy": "RESIDENTIAL","include": {"casts": true}}
Input Configuration
Required Fields
Field | Type | Description | Default |
---|---|---|---|
hubBaseUrl | string | HTTP endpoint of Farcaster Hub | https://hub.pinata.cloud |
mode | enum | Ingestion mode: byFids, byTime, tailEvents | byFids |
Mode-Specific Fields
By FIDs Mode
Field | Type | Description | Default |
---|---|---|---|
fids | array<integer> | List of Farcaster IDs to scrape | [] |
discoverFids | boolean | Auto-discover FIDs via shard scan | false |
shardIds | array<integer> | Shard IDs to scan when discovering | [] |
By Time Mode
Field | Type | Description | Default |
---|---|---|---|
fids | array<integer> | FIDs to scrape (required) | [] |
startTimestamp | integer | Start time (Farcaster epoch seconds) | - |
stopTimestamp | integer | Stop time (Farcaster epoch seconds) | - |
Tail Events Mode
Field | Type | Description | Default |
---|---|---|---|
tail.fromEventId | string | Start from event ID (empty = start from 0) | "0" |
tail.shardIndex | integer | Shard index to tail (optional) | - |
Entity Filters
Field | Type | Description | Default |
---|---|---|---|
include.casts | boolean | Include cast messages | true |
include.reactions | boolean | Include reactions (likes/recasts) | true |
include.links | boolean | Include follows | true |
include.userData | boolean | Include user profiles | true |
Optional Features
Field | Type | Description | Default |
---|---|---|---|
fetchEmbeds | boolean | Parse embedded URLs for Frames/Mini-Apps | false |
maxEmbedsPerRun | integer | Max embeds to fetch per run | 500 |
neynarApiKey | string | Neynar v2 API key (optional) | - |
clientApi | boolean | Enable Farcaster Client API (experimental) | false |
proxy | string | Apify Proxy groups or custom URL | - |
Performance & Limits
Field | Type | Description | Default |
---|---|---|---|
pageSize | integer | Records per page (max 1000) | 1000 |
maxRecords | integer | Stop after N records (safety limit) | - |
requestPerMinute | integer | Rate limit for Hub API calls | 600 |
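Before starting a long run, it can help to sanity-check the input locally. The sketch below is illustrative only: `validate_input` is a hypothetical helper, not part of the actor, and it checks only rules stated in the tables above (valid modes, `pageSize` up to 1000, `fids` required in byTime mode) plus two labeled assumptions.

```python
from typing import Any, Dict, List

VALID_MODES = {"byFids", "byTime", "tailEvents"}
MAX_PAGE_SIZE = 1000  # "Records per page (max 1000)" in the table above


def validate_input(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the input looks usable."""
    errors: List[str] = []
    mode = config.get("mode", "byFids")  # documented default
    if mode not in VALID_MODES:
        errors.append(f"unknown mode: {mode!r}")

    page_size = config.get("pageSize", MAX_PAGE_SIZE)
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        errors.append("pageSize must be between 1 and 1000")

    fids = config.get("fids", [])
    if mode == "byTime":
        if not fids:
            errors.append("byTime requires a non-empty fids list")
        start = config.get("startTimestamp")
        stop = config.get("stopTimestamp")
        # Assumption: when both bounds are given, start must not exceed stop.
        if start is not None and stop is not None and start > stop:
            errors.append("startTimestamp must not be after stopTimestamp")
    # Assumption: byFids needs either explicit FIDs or shard discovery.
    if mode == "byFids" and not fids and not config.get("discoverFids", False):
        errors.append("byFids requires fids or discoverFids: true")
    return errors
```

Returning a list of messages instead of raising on the first problem lets a caller report every issue in one pass.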
Output Schema
The actor produces normalized entities with the following types:
Cast Entity
{"entity_type": "cast","fid": 2,"hash": "0x1234567890abcdef","ts": 127477800,"ts_iso": "2025-01-15T10:30:00.000Z","text": "Hello Farcaster!","mentions": [3, 6833],"parent": {"castId": { "fid": 2, "hash": "0xabc..." }},"embeds": {"urls": ["https://example.com"],"castIds": []},"derived": {"urls": ["https://example.com"],"frame_meta": {"name": "My App","url": "https://app.example.com"}},"ingest_source": "hub_http","ingest_ts": "2025-01-15T10:31:00.000Z","raw": { /* original Hub message */ }}
Reaction Entity
{"entity_type": "reaction","fid": 3,"type": "like","target": {"castId": { "fid": 2, "hash": "0x1234..." }},"ts": 127477860,"ts_iso": "2025-01-15T10:31:00.000Z","hash": "0xabcd...","ingest_source": "hub_http","ingest_ts": "2025-01-15T10:32:00.000Z","raw": { /* original Hub message */ }}
Link Entity (Follow)
{"entity_type": "link","fid": 3,"targetFid": 2,"type": "follow","ts": 127477920,"ts_iso": "2025-01-15T10:32:00.000Z","hash": "0xdef...","ingest_source": "hub_http","ingest_ts": "2025-01-15T10:33:00.000Z","raw": { /* original Hub message */ }}
User Data Entity
{"entity_type": "user_data","fid": 2,"username": "vitalik.eth","display": "Vitalik","pfp": "https://example.com/pfp.png","bio": "Ethereum co-founder","url": "https://vitalik.ca","location": "Singapore","github": "vbuterin","twitter": "VitalikButerin","ts": 127477980,"ts_iso": "2025-01-15T10:33:00.000Z","ingest_source": "hub_http","ingest_ts": "2025-01-15T10:34:00.000Z","raw": [ /* original Hub messages */ ]}
Event Entity (Tail Mode)
{"entity_type": "event","event_id": "12345","event_type": "MERGE_MESSAGE","ts": 127478040,"ts_iso": "2025-01-15T10:34:00.000Z","shard_index": 0,"message": { /* hydrated message if MERGE_MESSAGE */ },"ingest_source": "hub_http","ingest_ts": "2025-01-15T10:35:00.000Z","raw": { /* original Hub event */ }}
Farcaster Timestamps
Important: Farcaster uses a custom epoch starting at 2021-01-01T00:00:00.000Z.
- All entities include both ts (Farcaster epoch seconds) and ts_iso (ISO 8601) fields
- Use ts_iso for human-readable timestamps and data analysis
- Use ts for filtering Hub API requests
Example conversion:
- Farcaster epoch 100000000 = 2024-03-03T09:46:40.000Z
- Current time: isoToFarcasterEpoch(new Date().toISOString())
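For downstream analysis outside the actor, the conversion can be reproduced in a few lines. This is a minimal sketch that relies only on the epoch stated above; the function names are hypothetical Python counterparts, not the actor's own helpers.

```python
from datetime import datetime, timedelta, timezone

# Farcaster epoch as stated in this README: 2021-01-01T00:00:00Z.
FARCASTER_EPOCH = datetime(2021, 1, 1, tzinfo=timezone.utc)


def farcaster_epoch_to_iso(ts: int) -> str:
    """Convert Farcaster epoch seconds to an ISO 8601 UTC string."""
    moment = FARCASTER_EPOCH + timedelta(seconds=ts)
    return moment.strftime("%Y-%m-%dT%H:%M:%S.000Z")


def iso_to_farcaster_epoch(iso: str) -> int:
    """Convert an ISO 8601 string to Farcaster epoch seconds."""
    # Replace a trailing "Z" so older Python versions can parse the offset.
    moment = datetime.fromisoformat(iso.replace("Z", "+00:00"))
    if moment.tzinfo is None:
        # Assumption: timestamps without an offset are treated as UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    return int((moment - FARCASTER_EPOCH).total_seconds())
```

Adding 1609459200 (the Unix time of the Farcaster epoch) to a ts value gives the equivalent Unix timestamp.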
Ingestion Modes Explained
Mode 1: By FIDs (Deterministic Backfill)
Use Case: Research specific users, backfill known accounts
How it works:
- For each FID in the input list (or discovered via shard scan):
  - Fetch all casts with pagination
  - Fetch all reactions (likes/recasts)
  - Fetch all follows
  - Fetch user profile data
- Maintains checkpoint per FID (lastTs, lastPageToken) for resumable runs
- Optionally discover FIDs by scanning specified shards
Best for: User-centric analysis, follower studies, content backfills
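The pagination-plus-checkpoint loop described above can be sketched as follows. This is not the actor's implementation: `fetch_page` is a hypothetical callable standing in for one Hub HTTP request, and the page shape (`messages`, `nextPageToken`) and the `done` flag are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Assumed page shape: {"messages": [...], "nextPageToken": "..."}.
FetchPage = Callable[[int, Optional[str]], Dict[str, Any]]


def iter_messages(
    fid: int,
    fetch_page: FetchPage,
    checkpoint: Dict[str, Any],
    max_records: Optional[int] = None,
) -> Iterator[Any]:
    """Yield messages for one FID, advancing the checkpoint after each full page."""
    if checkpoint.get("done"):
        return  # this FID was fully processed in an earlier run
    token = checkpoint.get("lastPageToken")
    emitted = 0
    while True:
        page = fetch_page(fid, token)
        for message in page.get("messages", []):
            if max_records is not None and emitted >= max_records:
                return  # stop mid-page; the page is re-fetched on resume
            yield message
            emitted += 1
        token = page.get("nextPageToken") or None
        checkpoint["lastPageToken"] = token
        if token is None:
            checkpoint["done"] = True
            return
```

Because the checkpoint only moves after a whole page has been consumed, a run interrupted mid-page fetches that page again on resume, so the consumer should deduplicate (for example by message hash).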
Mode 2: By Time Window (Targeted Study)
Use Case: Time-bounded analysis (e.g., "all activity during an event")
How it works:
- For each FID, fetch only messages within startTimestamp to stopTimestamp
- Applies time filters to casts (Hub native support)
- Filters reactions and links manually (Hub doesn't support time filters)
- Faster than full backfill when studying specific time periods
Best for: Event analysis, temporal studies, A/B testing
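The manual filtering step for reactions and links amounts to a client-side timestamp check. A minimal sketch, assuming each raw Hub message carries its Farcaster-epoch timestamp at `data.timestamp` and that both window bounds are inclusive (the README does not say which):

```python
from typing import Any, Dict, Iterable, List


def in_window(message: Dict[str, Any], start_ts: int, stop_ts: int) -> bool:
    """True when the message timestamp lies inside [start_ts, stop_ts]."""
    ts = message.get("data", {}).get("timestamp")
    return ts is not None and start_ts <= ts <= stop_ts


def filter_by_window(
    messages: Iterable[Dict[str, Any]], start_ts: int, stop_ts: int
) -> List[Dict[str, Any]]:
    """Keep only messages inside the window; messages without a timestamp are dropped."""
    return [m for m in messages if in_window(m, start_ts, stop_ts)]
```

Since every reaction and link still has to be downloaded before it can be discarded, this is why byTime runs save less time on those entity types than on casts.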
Mode 3: Tail Events (Near-Real-Time)
Use Case: Live monitoring, incremental ingestion
How it works:
- Poll /v1/events starting from fromEventId (or last checkpoint)
- For MERGE_MESSAGE events, hydrate and push the message entity
- Update lastEventId checkpoint per shard
- Sleeps 5s between polls (configurable)
Important: Hubs prune events older than ~3 days. Run frequently (every 1-2 days) to avoid data loss.
Best for: Real-time dashboards, notifications, streaming pipelines
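The poll, filter, checkpoint cycle above can be illustrated with a bounded loop. Everything about the transport is an assumption here: `fetch_events` is a hypothetical callable wrapping one `/v1/events` request, the page shape (`events`, `nextPageEventId`) and the event `type` field are illustrative, and message hydration is omitted.

```python
import time
from typing import Any, Callable, Dict, List

# Assumed page shape: {"events": [...], "nextPageEventId": "..."}.
FetchEvents = Callable[[str, int], Dict[str, Any]]


def tail_events(
    fetch_events: FetchEvents,
    checkpoint: Dict[str, Dict[str, str]],
    shard_index: int = 0,
    max_polls: int = 3,
    poll_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict[str, Any]]:
    """Poll a shard a fixed number of times and return its MERGE_MESSAGE events."""
    shard_state = checkpoint.setdefault(str(shard_index), {"lastEventId": "0"})
    merged: List[Dict[str, Any]] = []
    for poll in range(max_polls):
        page = fetch_events(shard_state["lastEventId"], shard_index)
        for event in page.get("events", []):
            if event.get("type") == "MERGE_MESSAGE":
                merged.append(event)
        next_id = page.get("nextPageEventId")
        if next_id is not None:
            # Persisting this value lets the next run resume where this one stopped.
            shard_state["lastEventId"] = str(next_id)
        if poll < max_polls - 1:
            sleep(poll_interval)
    return merged
```

Passing `sleep` as a parameter keeps the loop testable without real delays; a production loop would also stop once maxRecords is reached.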
Optional Features
Frame/Mini-App Metadata Parsing
When fetchEmbeds: true, the actor will:
- Extract all unique URLs from cast embeds
- Fetch each URL (up to maxEmbedsPerRun limit)
- Parse fc:miniapp:* and fc:frame:* meta tags
- Enrich cast entities with derived.frame_meta object
Use Proxy: Set the proxy field to avoid rate limits (e.g., "RESIDENTIAL" for Apify Proxy)
Performance: Adds ~2-5s per URL. Use maxEmbedsPerRun to cap crawling time.
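The meta-tag parsing step can be approximated with the standard library alone. This sketch is not the actor's parser: it simply collects every meta tag whose name or property starts with fc:frame or fc:miniapp into a flat dictionary, and how the actor maps those tags onto derived.frame_meta is not specified here.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class FrameMetaParser(HTMLParser):
    """Collect <meta> tags whose name/property starts with a Farcaster prefix."""

    PREFIXES = ("fc:frame", "fc:miniapp")

    def __init__(self) -> None:
        super().__init__()
        self.meta: Dict[str, str] = {}

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "meta":
            return
        attr_map = dict(attrs)
        key = attr_map.get("name") or attr_map.get("property")
        content = attr_map.get("content")
        if key and content is not None and key.startswith(self.PREFIXES):
            self.meta[key] = content


def parse_frame_meta(html: str) -> Dict[str, str]:
    """Return a mapping of fc:frame* / fc:miniapp* meta keys to their content."""
    parser = FrameMetaParser()
    parser.feed(html)
    return parser.meta
```

Self-closing tags such as `<meta ... />` are handled as well, because `HTMLParser` routes them through `handle_starttag` by default.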
Neynar v2 Integration
Provide neynarApiKey to use Neynar's hosted Hub endpoints instead of direct Hub HTTP.
Benefits:
- Faster, managed infrastructure
- No self-hosted Hub required
- Additional features (v2 only; v1 EOL March 31, 2025)
Records flagged: All entities get ingest_source: "neynar_v2"
Client API (Experimental)
Set clientApi: true to enable Warpcast-specific endpoints (e.g., trending, channels).
Warning: Non-protocol data. Records flagged as ingest_source: "client_api" to avoid confusion.
State Checkpointing & Resumability
The actor automatically persists state every 30 seconds and on Apify migration events:
- Per-FID checkpoints: { lastTs, lastPageToken } for resuming mid-pagination
- Per-Shard checkpoints: { lastEventId } for event tail mode
- Migration-safe: Survives container restarts and platform migrations
To resume a run:
- Start the actor with the same input
- State is automatically restored
- Scraping continues from last checkpoint
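To illustrate the checkpoint shape, here is a small sketch of saving and restoring state. It uses a local JSON file as a stand-in; where the actor actually stores its state is not described here, and the top-level keys `fids` and `shards` are assumptions chosen to mirror the per-FID and per-shard checkpoints listed above.

```python
import json
import os
import tempfile
from typing import Any, Dict


def empty_state() -> Dict[str, Any]:
    """Fresh state: no per-FID or per-shard checkpoints yet."""
    return {"fids": {}, "shards": {}}


def save_state(path: str, state: Dict[str, Any]) -> None:
    """Write state atomically so an interrupted write cannot corrupt the checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)  # atomic rename over the previous checkpoint


def load_state(path: str) -> Dict[str, Any]:
    """Load the last checkpoint, or start fresh when none exists."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        return empty_state()
```

Writing to a temporary file and renaming it means a reader sees either the old checkpoint or the new one, never a half-written file.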
Performance Tips
- Use time filters: Narrow startTimestamp/stopTimestamp for faster runs
- Batch FIDs: Process related users together to share dedup cache
- Tune pageSize: Larger pages (1000) = fewer requests, but slower per-request
- Set maxRecords: Safety limit prevents runaway costs
- Monitor rate limits: Default 600 req/min is conservative; increase if Hub allows
- Schedule tail runs: Run every 1-2 days to avoid event pruning
Limitations & Best Practices
Hub Event Pruning
- Limitation: Hubs prune events older than ~3 days
- Best Practice: Schedule tail runs every 1-2 days for continuous ingestion
Reaction/Link Time Filters
- Limitation: Hub API doesn't support time filters for reactions/links
- Workaround: Actor fetches all and filters manually in byTime mode (slower)
Embed Fetching
- Limitation: Some URLs may be slow, dead, or behind auth
- Best Practice: Use maxEmbedsPerRun cap and Apify Proxy to avoid timeouts
Rate Limiting
- Default: 600 req/min (conservative)
- Tuning: Increase requestPerMinute if your Hub supports higher rates
- Public Hubs: May have stricter limits; monitor 429 responses
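As a rough illustration of how a requests-per-minute budget and exponential backoff fit together, consider the sketch below. The base delay, cap, and retry count are assumptions chosen for the example, not the actor's documented values, and `RateLimitError` is a hypothetical exception that the caller's request function would raise on a 429 response.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised by the request function on HTTP 429."""


def min_interval_seconds(request_per_minute: int) -> float:
    """Smallest gap between requests that respects the per-minute budget."""
    return 60.0 / request_per_minute


def backoff_delays(retries: int, base: float = 1.0, cap: float = 60.0) -> List[float]:
    """Delays that double on each attempt and never exceed the cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def call_with_retries(
    request: Callable[[], T],
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry on RateLimitError with exponential backoff, then try one last time."""
    for delay in backoff_delays(max_retries):
        try:
            return request()
        except RateLimitError:
            sleep(delay)
    return request()  # final attempt; any error now propagates to the caller
```

With the default of 600 requests per minute, `min_interval_seconds(600)` gives 0.1 seconds between requests. Adding random jitter to each delay is a common refinement when several runs share one Hub.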
Pricing & Compute
Approximate compute units (based on default settings):
Run Type | Records | Compute Units | Notes |
---|---|---|---|
Small backfill | <10k | ~0.01 | 2-3 FIDs, no embeds |
Medium backfill | 100k | ~0.5 | 10-20 FIDs, all entities |
Large backfill | 1M | ~5 | 100+ FIDs or full shard scan |
Tail (1 hour) | 1k events | ~0.005 | Near-real-time streaming |
With embeds | +100 URLs | +0.02 per 100 | Crawlee overhead |
Formula: ~0.5 CU per 100k records (without embeds)
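The formula and the embed surcharge from the table can be combined into a quick estimator. This is only back-of-the-envelope arithmetic on the approximate figures above; `estimate_compute_units` is a hypothetical helper, and actual usage will vary with Hub latency and settings.

```python
CU_PER_100K_RECORDS = 0.5   # "~0.5 CU per 100k records (without embeds)"
CU_PER_100_EMBEDS = 0.02    # "+0.02 per 100" embed URLs


def estimate_compute_units(records: int, embed_urls: int = 0) -> float:
    """Rough compute-unit estimate from record and embed counts."""
    record_cost = records / 100_000 * CU_PER_100K_RECORDS
    embed_cost = embed_urls / 100 * CU_PER_100_EMBEDS
    return record_cost + embed_cost
```

For example, a 1M-record backfill without embeds comes out at about 5 CU, matching the "Large backfill" row.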
Example Use Cases
Social Graph Analysis
{"mode": "byFids","fids": [2, 3, 6833, 5650],"include": {"links": true,"userData": true}}
Output: Follow relationships + user profiles for network analysis
Content Research
{"mode": "byTime","fids": [2],"startTimestamp": 100000000,"stopTimestamp": 100050000,"include": {"casts": true,"reactions": true}}
Output: All casts + reactions during a specific event
Real-Time Dashboard
{"mode": "tailEvents","tail": { "fromEventId": "0" },"maxRecords": 10000}
Output: Live stream of all protocol events (schedule every hour)
Frame/Mini-App Catalog
{"mode": "byFids","fids": [2, 3],"fetchEmbeds": true,"maxEmbedsPerRun": 200,"include": {"casts": true}}
Output: Casts with Frame/Mini-App metadata extracted
Troubleshooting
"Failed to connect to Hub"
- Verify hubBaseUrl is correct and accessible
- Check Hub is running and serving HTTP API on port 3381
- Try public Hub: https://hub.pinata.cloud
"No data returned"
- Verify FIDs exist and have activity
- Check time window isn't too narrow (byTime mode)
- Ensure include.* filters aren't excluding all data
"Max records limit reached"
- Increase maxRecords or remove limit for full backfill
- Use checkpointing to resume in multiple runs
"Rate limit errors (429)"
- Decrease requestPerMinute
- Add delays between runs
- Use Neynar hosted Hub (better rate limits)
"Event tail missing data"
- Events pruned >3 days ago
- Schedule runs more frequently (every 1-2 days)
- Use byFids mode for historical backfill
Data Views
The actor provides pre-configured dataset views:
- Overview: All entities with key identifiers
- Casts: Cast content, timestamps, and URLs
- Reactions: Likes and recasts by FID
- Follows: Follow relationships (social graph edges)
- Users: User profiles and metadata
Access views in Apify Console → Dataset → Views tab
Support
- Email: kontakt@barrierefix.de
- Documentation: Farcaster Hub API Docs
- Issues: Report bugs or request features via email
Version History
- 1.0.0 (2025-01) - Initial release
  - Three ingestion modes (byFids, byTime, tailEvents)
  - Hub HTTP API integration
  - State checkpointing
  - Optional Frame/Mini-App parsing
  - Neynar v2 support
License
MIT License - Free for commercial and non-commercial use