Farcaster Hub Scraper

Pricing

Pay per event

Developed by

BarriereFix

Maintained by Community

Protocol-native Farcaster data ingestion for research, analytics, and social graph analysis. Collect casts, reactions, follows, user profiles, and real-time events directly from Farcaster Hubs via HTTP API.

Last modified

15 days ago

Features

✅ Protocol-First Design - Direct Hub HTTP API integration (no third-party dependencies)
✅ Three Ingestion Modes - Deterministic backfill by FIDs, time-bounded studies, or incremental event tailing
✅ Comprehensive Data - Casts, reactions (likes/recasts), follows, user profiles, and events
✅ Optional Enrichment - Parse Frames/Mini-Apps metadata from embedded URLs
✅ State Checkpointing - Migration-safe, resumable runs with automatic state persistence
✅ Rate Limiting & Retries - Production-grade reliability with exponential backoff
✅ Neynar v2 Support - Optional integration with Neynar hosted hubs
✅ Multiple Views - Pre-configured dataset views for easy data exploration

Who Uses This Actor?

🎯 Target Users

📊 Web3 Data Analysts & Researchers (Dune, Flipside)

  • Export Farcaster data to SQL databases for analytics dashboards
  • Track protocol growth, user engagement trends, and network effects
  • Cross-reference social data with onchain transactions

🛠️ Farcaster Frame/Mini-App Developers

  • Monitor Frame engagement and interaction patterns
  • Track which users interact with your Mini-Apps
  • Analyze viral content and user acquisition funnels

📢 Web3 Marketing Agencies & Brands

  • Track influencer campaigns and brand mentions
  • Measure content reach and engagement rates
  • Identify key opinion leaders in the Farcaster ecosystem

🎓 Academic Researchers

  • Study decentralized social network dynamics
  • Analyze information diffusion and community formation
  • Research Web3 social graph topology

Use Cases by Persona

📊 For Data Analysts

Influencer Ranking Dashboard

{
"mode": "byFids",
"fids": [2, 3, 6833, 5650, 7890],
"include": {"casts": true, "reactions": true, "userData": true},
"maxRecords": 50000
}

→ Export to Dune to calculate engagement rates, follower growth, content velocity

Protocol Growth Metrics

{
"mode": "tailEvents",
"maxRecords": 100000
}

→ Stream all events to track daily active users, network growth, retention

🛠️ For Frame Developers

Frame Interaction Analysis

{
"mode": "byFids",
"fids": [2, 3, 6833],
"include": {"casts": true, "reactions": true},
"fetchEmbeds": true
}

→ Identify which casts contain your Frame and track engagement patterns. The fids shown are examples; replace them with the FIDs of users who interacted with your Frame.

Real-Time Frame Monitoring

{
"mode": "tailEvents",
"tail": {"fromEventId": "latest"},
"maxRecords": 10000
}

→ Get notified when users interact with your Frames in real-time

📢 For Marketing Agencies

Campaign Performance Tracking

{
"mode": "byFids",
"fids": [2, 3, 6833],
"startTimestamp": 130000000,
"stopTimestamp": 130100000,
"include": {"casts": true, "reactions": true}
}

→ Measure campaign reach during a specific time window. The fids shown are examples; replace them with the FIDs of the brand account and its influencers.

Influencer Discovery

{
"mode": "byFids",
"fids": [2, 3, 6833],
"include": {"links": true, "userData": true, "reactions": true}
}

→ Find high-engagement users in target communities. The fids shown are examples; replace them with the FIDs of a competitor's followers.

🎓 For Researchers

Social Network Topology Study

{
"mode": "byFids",
"discoverFids": true,
"shardIds": [0, 1, 2],
"include": {"links": true, "userData": true},
"maxRecords": 500000
}

→ Build complete follow graph for network analysis

Information Diffusion Analysis

{
"mode": "byTime",
"fids": [2, 3],
"startTimestamp": 100000000,
"stopTimestamp": 100500000,
"include": {"casts": true, "reactions": true}
}

→ Track how content spreads through the network over time. The fids shown are examples; replace them with the FIDs of your seed users.

Quick Start

Basic Example: Backfill by FIDs

{
"hubBaseUrl": "https://hub.pinata.cloud",
"mode": "byFids",
"fids": [2, 3, 6833],
"include": {
"casts": true,
"reactions": true,
"links": true,
"userData": true
},
"pageSize": 1000,
"maxRecords": 10000
}

Time-Bounded Study

{
"hubBaseUrl": "https://hub.pinata.cloud",
"mode": "byTime",
"fids": [2, 3],
"startTimestamp": 100000000,
"stopTimestamp": 100050000,
"include": {
"casts": true,
"reactions": true
}
}

Real-Time Event Tail

{
"hubBaseUrl": "https://hub.pinata.cloud",
"mode": "tailEvents",
"tail": {
"fromEventId": "0",
"shardIndex": 0
},
"maxRecords": 1000
}

Auto-Discover FIDs via Shard Scan

{
"hubBaseUrl": "https://hub.pinata.cloud",
"mode": "byFids",
"discoverFids": true,
"shardIds": [0, 1],
"include": {
"casts": true,
"userData": true
},
"maxRecords": 5000
}

With Frame/Mini-App Metadata Parsing

{
"hubBaseUrl": "https://hub.pinata.cloud",
"mode": "byFids",
"fids": [2],
"fetchEmbeds": true,
"maxEmbedsPerRun": 100,
"proxy": "RESIDENTIAL",
"include": {
"casts": true
}
}

Input Configuration

Required Fields

| Field | Type | Description | Default |
|---|---|---|---|
| hubBaseUrl | string | HTTP endpoint of Farcaster Hub | https://hub.pinata.cloud |
| mode | enum | Ingestion mode: byFids, byTime, tailEvents | byFids |

Mode-Specific Fields

By FIDs Mode

| Field | Type | Description | Default |
|---|---|---|---|
| fids | array<integer> | List of Farcaster IDs to scrape | [] |
| discoverFids | boolean | Auto-discover FIDs via shard scan | false |
| shardIds | array<integer> | Shard IDs to scan when discovering | [] |

By Time Mode

| Field | Type | Description | Default |
|---|---|---|---|
| fids | array<integer> | FIDs to scrape (required) | [] |
| startTimestamp | integer | Start time (Farcaster epoch seconds) | - |
| stopTimestamp | integer | Stop time (Farcaster epoch seconds) | - |

Tail Events Mode

| Field | Type | Description | Default |
|---|---|---|---|
| tail.fromEventId | string | Start from event ID (empty = start from 0) | "0" |
| tail.shardIndex | integer | Shard index to tail (optional) | - |

Entity Filters

| Field | Type | Description | Default |
|---|---|---|---|
| include.casts | boolean | Include cast messages | true |
| include.reactions | boolean | Include reactions (likes/recasts) | true |
| include.links | boolean | Include follows | true |
| include.userData | boolean | Include user profiles | true |

Optional Features

| Field | Type | Description | Default |
|---|---|---|---|
| fetchEmbeds | boolean | Parse embedded URLs for Frames/Mini-Apps | false |
| maxEmbedsPerRun | integer | Max embeds to fetch per run | 500 |
| neynarApiKey | string | Neynar v2 API key (optional) | - |
| clientApi | boolean | Enable Farcaster Client API (experimental) | false |
| proxy | string | Apify Proxy groups or custom URL | - |

Performance & Limits

| Field | Type | Description | Default |
|---|---|---|---|
| pageSize | integer | Records per page (max 1000) | 1000 |
| maxRecords | integer | Stop after N records (safety limit) | - |
| requestPerMinute | integer | Rate limit for Hub API calls | 600 |

Output Schema

The actor produces normalized entities with the following types:

Cast Entity

{
"entity_type": "cast",
"fid": 2,
"hash": "0x1234567890abcdef",
"ts": 127477800,
"ts_iso": "2025-01-15T10:30:00.000Z",
"text": "Hello Farcaster!",
"mentions": [3, 6833],
"parent": {
"castId": { "fid": 2, "hash": "0xabc..." }
},
"embeds": {
"urls": ["https://example.com"],
"castIds": []
},
"derived": {
"urls": ["https://example.com"],
"frame_meta": {
"name": "My App",
"url": "https://app.example.com"
}
},
"ingest_source": "hub_http",
"ingest_ts": "2025-01-15T10:31:00.000Z",
"raw": { /* original Hub message */ }
}

Reaction Entity

{
"entity_type": "reaction",
"fid": 3,
"type": "like",
"target": {
"castId": { "fid": 2, "hash": "0x1234..." }
},
"ts": 127477860,
"ts_iso": "2025-01-15T10:31:00.000Z",
"hash": "0xabcd...",
"ingest_source": "hub_http",
"ingest_ts": "2025-01-15T10:32:00.000Z",
"raw": { /* original Hub message */ }
}

Link Entity

{
"entity_type": "link",
"fid": 3,
"targetFid": 2,
"type": "follow",
"ts": 127477920,
"ts_iso": "2025-01-15T10:32:00.000Z",
"hash": "0xdef...",
"ingest_source": "hub_http",
"ingest_ts": "2025-01-15T10:33:00.000Z",
"raw": { /* original Hub message */ }
}

User Data Entity

{
"entity_type": "user_data",
"fid": 2,
"username": "vitalik.eth",
"display": "Vitalik",
"pfp": "https://example.com/pfp.png",
"bio": "Ethereum co-founder",
"url": "https://vitalik.ca",
"location": "Singapore",
"github": "vbuterin",
"twitter": "VitalikButerin",
"ts": 127477980,
"ts_iso": "2025-01-15T10:33:00.000Z",
"ingest_source": "hub_http",
"ingest_ts": "2025-01-15T10:34:00.000Z",
"raw": [ /* original Hub messages */ ]
}

Event Entity (Tail Mode)

{
"entity_type": "event",
"event_id": "12345",
"event_type": "MERGE_MESSAGE",
"ts": 127478040,
"ts_iso": "2025-01-15T10:34:00.000Z",
"shard_index": 0,
"message": { /* hydrated message if MERGE_MESSAGE */ },
"ingest_source": "hub_http",
"ingest_ts": "2025-01-15T10:35:00.000Z",
"raw": { /* original Hub event */ }
}

Farcaster Timestamps

Important: Farcaster uses a custom epoch starting at 2021-01-01T00:00:00.000Z.

  • All entities include both ts (Farcaster epoch seconds) and ts_iso (ISO 8601) fields
  • Use ts_iso for human-readable timestamps and data analysis
  • Use ts for filtering Hub API requests

Example conversion:

  • Farcaster epoch 100000000 = 2024-03-03T09:46:40.000Z (UTC)
  • Current time: isoToFarcasterEpoch(new Date().toISOString())
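
The isoToFarcasterEpoch helper mentioned above is not defined in this README. A minimal sketch of how such a conversion could look is shown here; the function bodies and the name farcasterEpochToIso are assumptions for illustration, not the actor's actual code. Only the epoch start (2021-01-01T00:00:00.000Z) comes from the text above.

```typescript
// Farcaster epoch start (2021-01-01T00:00:00.000Z) in Unix seconds.
const FARCASTER_EPOCH_UNIX_SECONDS = Date.UTC(2021, 0, 1) / 1000; // 1609459200

// Convert Farcaster epoch seconds to an ISO 8601 string (UTC).
function farcasterEpochToIso(ts: number): string {
  return new Date((FARCASTER_EPOCH_UNIX_SECONDS + ts) * 1000).toISOString();
}

// Convert an ISO 8601 string to Farcaster epoch seconds (rounded down).
function isoToFarcasterEpoch(iso: string): number {
  const ms = Date.parse(iso);
  if (Number.isNaN(ms)) throw new Error(`Invalid ISO timestamp: ${iso}`);
  return Math.floor(ms / 1000) - FARCASTER_EPOCH_UNIX_SECONDS;
}

console.log(farcasterEpochToIso(100000000)); // 2024-03-03T09:46:40.000Z
console.log(isoToFarcasterEpoch("2024-03-03T09:46:40.000Z")); // 100000000
```

The same two functions can be used to turn a calendar window into startTimestamp and stopTimestamp values for byTime runs.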

Ingestion Modes Explained

Mode 1: By FIDs (Deterministic Backfill)

Use Case: Research specific users, backfill known accounts

How it works:

  1. For each FID in the input list (or discovered via shard scan):
    • Fetch all casts with pagination
    • Fetch all reactions (likes/recasts)
    • Fetch all follows
    • Fetch user profile data
  2. Maintains checkpoint per FID (lastTs, lastPageToken) for resumable runs
  3. Optionally discover FIDs by scanning specified shards

Best for: User-centric analysis, follower studies, content backfills
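
The per-FID pagination in step 1 can be sketched roughly as follows. This is an illustration under stated assumptions, not the actor's source: it assumes Hub list endpoints such as /v1/castsByFid return a messages array plus a nextPageToken that is empty or absent on the last page, and the names HubPage, FetchPage, fetchAllPages, and castsByFid are hypothetical.

```typescript
// Assumed shape of one page from a Hub HTTP list endpoint.
interface HubPage {
  messages: unknown[];
  nextPageToken?: string;
}

// Hypothetical page fetcher: receives a page token (undefined for the first page).
type FetchPage = (pageToken?: string) => Promise<HubPage>;

// Collect messages page by page until no token is returned
// or the maxRecords safety limit is reached.
async function fetchAllPages(fetchPage: FetchPage, maxRecords = Infinity): Promise<unknown[]> {
  const out: unknown[] = [];
  let token: string | undefined;
  do {
    const page = await fetchPage(token);
    out.push(...page.messages);
    token = page.nextPageToken || undefined; // a real run could checkpoint this token
  } while (token && out.length < maxRecords);
  return out.slice(0, maxRecords);
}

// Example fetcher for the casts of one FID, using the global fetch (Node 18+).
function castsByFid(hubBaseUrl: string, fid: number, pageSize = 1000): FetchPage {
  return async (pageToken) => {
    const url = new URL("/v1/castsByFid", hubBaseUrl);
    url.searchParams.set("fid", String(fid));
    url.searchParams.set("pageSize", String(pageSize));
    if (pageToken) url.searchParams.set("pageToken", pageToken);
    const res = await fetch(url.toString());
    if (!res.ok) throw new Error(`Hub returned HTTP ${res.status}`);
    return (await res.json()) as HubPage;
  };
}
```

Passing the fetcher in as a parameter keeps the loop independent of the network, so the same function could serve reactions, links, and user data with a different fetcher.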

Mode 2: By Time Window (Targeted Study)

Use Case: Time-bounded analysis (e.g., "all activity during an event")

How it works:

  1. For each FID, fetch only messages within startTimestamp to stopTimestamp
  2. Applies time filters to casts (Hub native support)
  3. Filters reactions and links manually (Hub doesn't support time filters)
  4. Faster than full backfill when studying specific time periods

Best for: Event analysis, temporal studies, A/B testing
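
The manual filtering in step 3 amounts to a timestamp comparison on each fetched message. A minimal sketch, assuming Hub messages carry their Farcaster-epoch timestamp at data.timestamp and that both window bounds are inclusive (the inclusive upper bound is an assumption; filterByWindow is a hypothetical name):

```typescript
// Minimal view of a Hub message: only the Farcaster-epoch timestamp is used.
interface TimedMessage {
  data: { timestamp: number };
}

// Keep messages whose timestamp lies within [start, stop], both inclusive.
function filterByWindow<T extends TimedMessage>(messages: T[], start: number, stop: number): T[] {
  return messages.filter((m) => m.data.timestamp >= start && m.data.timestamp <= stop);
}

const reactions = [
  { data: { timestamp: 99999999 } },  // before the window
  { data: { timestamp: 100000000 } }, // inside the window
  { data: { timestamp: 100050001 } }, // after the window
];
console.log(filterByWindow(reactions, 100000000, 100050000).length); // 1
```

Because every reaction and link for a FID still has to be downloaded before filtering, this step saves output volume but not Hub requests.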

Mode 3: Tail Events (Near-Real-Time)

Use Case: Live monitoring, incremental ingestion

How it works:

  1. Poll /v1/events starting from fromEventId (or last checkpoint)
  2. For MERGE_MESSAGE events, hydrate and push the message entity
  3. Update lastEventId checkpoint per shard
  4. Sleeps 5s between polls (configurable)

Important: Hubs prune events older than ~3 days. Run frequently (every 1-2 days) to avoid data loss.

Best for: Real-time dashboards, notifications, streaming pipelines
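
Steps 2 and 3 above can be sketched as a pure function that handles one polled batch. The event field names (id, type, mergeMessageBody.message) are assumptions about the Hub event JSON, event IDs are treated as ascending numbers for simple comparison, and processEventBatch is a hypothetical name rather than the actor's code.

```typescript
// Minimal assumed shape of a Hub event; real events carry more fields.
interface HubEvent {
  id: number;
  type: string;
  mergeMessageBody?: { message: unknown };
}

interface TailResult {
  messages: unknown[];  // messages extracted from MERGE_MESSAGE events
  lastEventId: number;  // checkpoint to resume from on the next poll
}

// Process one batch: collect merged messages and advance the checkpoint.
function processEventBatch(events: HubEvent[], lastEventId: number): TailResult {
  const messages: unknown[] = [];
  for (const ev of events) {
    if (ev.id <= lastEventId) continue; // already processed in an earlier poll
    if (ev.type.endsWith("MERGE_MESSAGE") && ev.mergeMessageBody) {
      messages.push(ev.mergeMessageBody.message);
    }
    lastEventId = ev.id;
  }
  return { messages, lastEventId };
}
```

A polling loop would call this after each request to /v1/events, persist the returned lastEventId per shard, and then sleep before the next poll.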

Optional Features

Frame/Mini-App Metadata Parsing

When fetchEmbeds: true, the actor will:

  1. Extract all unique URLs from cast embeds
  2. Fetch each URL (up to maxEmbedsPerRun limit)
  3. Parse fc:miniapp:* and fc:frame:* meta tags
  4. Enrich cast entities with derived.frame_meta object

Use Proxy: Set proxy field to avoid rate limits (e.g., "RESIDENTIAL" for Apify Proxy)

Performance: Adds ~2-5s per URL. Use maxEmbedsPerRun to cap crawling time.
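
Step 3 above (parsing fc:frame:* and fc:miniapp:* meta tags) might look roughly like the sketch below. It is deliberately simple and regex-based: it assumes double-quoted attributes, does not decode HTML entities, and parseFrameMeta is a hypothetical name. A real implementation would more likely use an HTML parser.

```typescript
// Extract <meta> tags whose name/property starts with "fc:frame" or "fc:miniapp".
function parseFrameMeta(html: string): Record<string, string> {
  const meta: Record<string, string> = {};
  for (const tag of html.match(/<meta\b[^>]*>/gi) ?? []) {
    const key = /\b(?:name|property)\s*=\s*"([^"]*)"/i.exec(tag)?.[1];
    const content = /\bcontent\s*=\s*"([^"]*)"/i.exec(tag)?.[1];
    if (key && content !== undefined && /^fc:(frame|miniapp)\b/.test(key)) {
      meta[key] = content;
    }
  }
  return meta;
}

// Illustrative input; the tag names and values are made up for the example.
const sampleHtml =
  '<head><meta property="fc:frame" content="vNext">' +
  '<meta name="fc:miniapp:name" content="My App">' +
  '<meta name="description" content="ignored"></head>';
console.log(parseFrameMeta(sampleHtml));
```

Tags that do not start with one of the two prefixes, such as the description tag in the sample, are skipped.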

Neynar v2 Integration

Provide neynarApiKey to use Neynar's hosted Hub endpoints instead of direct Hub HTTP.

Benefits:

  • Faster, managed infrastructure
  • No self-hosted Hub required
  • Additional features (v2 only; v1 EOL March 31, 2025)

Records flagged: All entities get ingest_source: "neynar_v2"

Client API (Experimental)

Set clientApi: true to enable Warpcast-specific endpoints (e.g., trending, channels).

Warning: Non-protocol data. Records flagged as ingest_source: "client_api" to avoid confusion.

State Checkpointing & Resumability

The actor automatically persists state every 30 seconds and on Apify migration events:

  • Per-FID checkpoints: { lastTs, lastPageToken } for resuming mid-pagination
  • Per-Shard checkpoints: { lastEventId } for event tail mode
  • Migration-safe: Survives container restarts and platform migrations

To resume a run:

  1. Start the actor with same input
  2. State is automatically restored
  3. Scraping continues from last checkpoint
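
The load-and-persist cycle described above can be sketched against a small storage interface. Everything here is illustrative: StateStore, STATE_KEY, loadState, and startAutoSave are hypothetical names, and on Apify the store role would be played by the run's default key-value store. Only the checkpoint fields and the 30-second interval come from the text above.

```typescript
// Checkpoint shapes as described in this section.
interface FidCheckpoint { lastTs?: number; lastPageToken?: string }
interface ShardCheckpoint { lastEventId: string }
interface RunState {
  fids: Record<string, FidCheckpoint>;
  shards: Record<string, ShardCheckpoint>;
}

// Hypothetical storage interface standing in for a key-value store.
interface StateStore {
  get(key: string): Promise<RunState | null>;
  set(key: string, value: RunState): Promise<void>;
}

const STATE_KEY = "STATE"; // assumed key name

// Restore the previous state, or start fresh on the first run.
async function loadState(store: StateStore): Promise<RunState> {
  return (await store.get(STATE_KEY)) ?? { fids: {}, shards: {} };
}

// Persist state every intervalMs; the returned function stops the timer
// and performs one final save (for example on a migration event).
function startAutoSave(store: StateStore, state: RunState, intervalMs = 30000): () => Promise<void> {
  const timer = setInterval(() => void store.set(STATE_KEY, state), intervalMs);
  return async () => {
    clearInterval(timer);
    await store.set(STATE_KEY, state);
  };
}
```

Keeping one checkpoint entry per FID and per shard means an interrupted run only repeats the page or event batch that was in flight.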

Performance Tips

  1. Use time filters: Narrow startTimestamp/stopTimestamp for faster runs
  2. Batch FIDs: Process related users together to share dedup cache
  3. Tune pageSize: Larger pages (1000) = fewer requests, but slower per-request
  4. Set maxRecords: Safety limit prevents runaway costs
  5. Monitor rate limits: Default 600 req/min is conservative; increase if Hub allows
  6. Schedule tail runs: Run every 1-2 days to avoid event pruning

Limitations & Best Practices

Hub Event Pruning

  • Limitation: Hubs prune events older than ~3 days
  • Best Practice: Schedule tail runs every 1-2 days for continuous ingestion

Time Filtering

  • Limitation: Hub API doesn't support time filters for reactions/links
  • Workaround: Actor fetches all and filters manually in byTime mode (slower)

Embed Fetching

  • Limitation: Some URLs may be slow, dead, or behind auth
  • Best Practice: Use maxEmbedsPerRun cap and Apify Proxy to avoid timeouts

Rate Limiting

  • Default: 600 req/min (conservative)
  • Tuning: Increase requestPerMinute if your Hub supports higher rates
  • Public Hubs: May have stricter limits; monitor 429 responses
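
To make the numbers concrete, the sketch below shows how a requests-per-minute budget translates into request spacing, and what an exponential backoff schedule for 429 responses could look like. The base delay and cap are illustrative assumptions, not the actor's actual settings, and both function names are hypothetical.

```typescript
// Minimum spacing between requests for a given requests-per-minute budget.
function minIntervalMs(requestPerMinute: number): number {
  return 60000 / requestPerMinute;
}

// Exponential backoff: baseMs, then 2x, 4x, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

console.log(minIntervalMs(600)); // 100
console.log([0, 1, 2, 3].map((a) => backoffDelayMs(a))); // [ 1000, 2000, 4000, 8000 ]
```

At the default of 600 requests per minute, requests are spaced about 100 ms apart; halving requestPerMinute doubles that spacing.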

Pricing & Compute

Approximate compute units (based on default settings):

| Run Type | Records | Compute Units | Notes |
|---|---|---|---|
| Small backfill | <10k | ~0.01 | 2-3 FIDs, no embeds |
| Medium backfill | 100k | ~0.5 | 10-20 FIDs, all entities |
| Large backfill | 1M | ~5 | 100+ FIDs or full shard scan |
| Tail (1 hour) | 1k events | ~0.005 | Near-real-time streaming |
| With embeds | +100 URLs | +0.02 per 100 | Crawlee overhead |

Formula: ~0.5 CU per 100k records (without embeds)
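
As a worked example of the formula, the sketch below combines the per-record rate with the embed surcharge from the figures above. It is a rough planning aid only; estimateComputeUnits is a hypothetical helper, and actual usage will vary with Hub latency and settings.

```typescript
// Rough estimate: ~0.5 CU per 100k records, plus ~0.02 CU per 100 fetched embed URLs.
function estimateComputeUnits(records: number, embedUrls = 0): number {
  return (records / 100000) * 0.5 + (embedUrls / 100) * 0.02;
}

console.log(estimateComputeUnits(1000000)); // 5
console.log(estimateComputeUnits(100000, 100).toFixed(2)); // 0.52
```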

Example Use Cases

Social Graph Analysis

{
"mode": "byFids",
"fids": [2, 3, 6833, 5650],
"include": {
"links": true,
"userData": true
}
}

Output: Follow relationships + user profiles for network analysis

Content Research

{
"mode": "byTime",
"fids": [2],
"startTimestamp": 100000000,
"stopTimestamp": 100050000,
"include": {
"casts": true,
"reactions": true
}
}

Output: All casts + reactions during a specific event

Real-Time Dashboard

{
"mode": "tailEvents",
"tail": { "fromEventId": "0" },
"maxRecords": 10000
}

Output: Live stream of all protocol events (schedule every hour)

Frame/Mini-App Catalog

{
"mode": "byFids",
"fids": [2, 3],
"fetchEmbeds": true,
"maxEmbedsPerRun": 200,
"include": {
"casts": true
}
}

Output: Casts with Frame/Mini-App metadata extracted

Troubleshooting

"Failed to connect to Hub"

  • Verify hubBaseUrl is correct and accessible
  • Check Hub is running and serving HTTP API on port 3381
  • Try public Hub: https://hub.pinata.cloud

"No data returned"

  • Verify FIDs exist and have activity
  • Check time window isn't too narrow (byTime mode)
  • Ensure include.* filters aren't excluding all data

"Max records limit reached"

  • Increase maxRecords or remove limit for full backfill
  • Use checkpointing to resume in multiple runs

"Rate limit errors (429)"

  • Decrease requestPerMinute
  • Add delays between runs
  • Use Neynar hosted Hub (better rate limits)

"Event tail missing data"

  • Hubs prune events older than ~3 days, so earlier events are no longer available
  • Schedule runs more frequently (every 1-2 days)
  • Use byFids mode for historical backfill

Data Views

The actor provides pre-configured dataset views:

  1. Overview: All entities with key identifiers
  2. Casts: Cast content, timestamps, and URLs
  3. Reactions: Likes and recasts by FID
  4. Follows: Follow relationships (social graph edges)
  5. Users: User profiles and metadata

Access views in Apify Console → Dataset → Views tab

Support

Version History

  • 1.0.0 (2025-01) - Initial release
    • Three ingestion modes (byFids, byTime, tailEvents)
    • Hub HTTP API integration
    • State checkpointing
    • Optional Frame/Mini-App parsing
    • Neynar v2 support

License

MIT License - Free for commercial and non-commercial use