U.S. Senate Trading Pipeline

Pricing

from $1.50 / 1,000 transaction records

Fetches U.S. Senate Periodic Transaction Reports (PTRs) directly from the official efdsearch.senate.gov source. Normalizes filings into a clean, deduplicated dataset with politician, ticker, asset, type, amount range, dates, and owner. No third-party vendors. Public domain data. STOCK Act compliant.

Developer

Fatih İlhan

Maintained by Community

Congress Trading Pipeline — API

Ingests U.S. Senate Periodic Transaction Reports (PTRs) directly from the Senate Electronic Financial Disclosures office, normalizes them, and exposes a JSON API compatible with the existing frontend — replacing the QuiverQuant dependency entirely.

No third-party data vendor required. No scraping. Source is public domain U.S. government disclosure data.


Prerequisites

  • Node.js 18+
  • No external services, databases, or API keys required for MVP

Setup

npm install
cp .env.example .env # edit as needed — all vars have defaults
npm run dev

Server starts on http://localhost:3001.
On first boot the scheduler runs the pipeline immediately, then every 6 hours.

Environment variables

Variable         Default                Description
PORT             3001                   HTTP listen port
DB_PATH          ./data/pipeline.db     SQLite file path (created automatically)
LOG_LEVEL        info                   debug / info / warn / error
NODE_ENV         development            Set to production for JSON-lines log output
CRON_SCHEDULE    0 */6 * * *            node-cron schedule expression
FETCH_DAYS_BACK  90                     Rolling window of PTRs to fetch
CRON_SECRET      (empty)                Shared secret for /api/cron and /api/sync-committees
FRONTEND_ORIGIN  http://localhost:3000  Allowed CORS origin when running standalone
LAST_RUN_PATH    ./data/last_run.json   Persisted last-run stats file

Pipeline architecture

┌──────────┐   ┌──────────┐   ┌───────────┐   ┌──────────┐   ┌──────────┐
│  Fetch   │──▶│  Parse   │──▶│ Transform │──▶│  Dedup   │──▶│  Store   │
│          │   │          │   │           │   │          │   │          │
│ Senate   │   │ JSON     │   │ type      │   │ key:     │   │ SQLite   │
│ EFD API  │   │ primary  │   │ amount    │   │ name +   │   │ INSERT   │
│ GET      │   │          │   │ dates     │   │ date +   │   │ OR       │
│ 100/page │   │ HTML     │   │ owner     │   │ asset +  │   │ IGNORE   │
│          │   │ fallback │   │ ticker    │   │ amount   │   │          │
└──────────┘   └──────────┘   └───────────┘   └──────────┘   └────┬─────┘
                                                                  │
                                                                  ▼
                                                           ┌──────────────┐
                                                           │ Express API  │
                                                           │    :3001     │
                                                           └──────────────┘
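
The dedup stage in the diagram keys each transaction on name, date, asset, and amount. A minimal sketch of how such a key could be computed follows. The helper names (`dedupKey`, `dedupe`), the field names, the normalization, and the SHA-256 hash are all assumptions for illustration, not the pipeline's confirmed implementation.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical helper: builds a stable dedup key from the four fields the
// diagram names (name + date + asset + amount). Field names are assumed.
function dedupKey(tx) {
  const parts = [
    tx.filer_name,
    tx.trade_date,
    tx.asset_name,
    `${tx.amount_low}-${tx.amount_high}`,
  ].map((p) => String(p ?? "").trim().toLowerCase());
  // Join with a NUL separator (unlikely to occur in the data), then hash.
  return createHash("sha256").update(parts.join("\u0000")).digest("hex");
}

// Keeps the first occurrence of each key, similar in effect to relying on
// INSERT OR IGNORE against a unique column.
function dedupe(transactions) {
  const seen = new Set();
  return transactions.filter((tx) => {
    const key = dedupKey(tx);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```

Hashing the normalized fields gives a fixed-length value that could double as a primary key, so repeated pipeline runs over the same rolling window would not create duplicate rows.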

Source endpoint: GET https://efts.senate.gov/LATEST/search-index
Pagination: 100 records/page, loops until hits.total exhausted
Fallback: if JSON parse yields empty asset_name on all rows, re-parses raw HTML
Retry: 3 attempts with exponential backoff + ±25% jitter on all HTTP calls
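
The pagination and retry rules above can be sketched as follows. This is an illustration under stated assumptions: `fetchPage(from, size)` is a hypothetical function standing in for the real HTTP call and is assumed to resolve to `{ total, records }`, and the 500 ms base delay is an invented default.

```javascript
// Sleep helper used between attempts.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Delay before the next attempt: baseMs * 2^attempt, scaled by a random
// factor in [0.75, 1.25) to get ±25% jitter. baseMs is an assumed value.
function backoffDelay(attempt, baseMs = 500, random = Math.random) {
  const jitter = 0.75 + random() * 0.5;
  return baseMs * 2 ** attempt * jitter;
}

// Runs fn up to `attempts` times, waiting with backoff between failures.
async function withRetry(fn, { attempts = 3, baseMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < attempts - 1) await sleep(backoffDelay(attempt, baseMs));
    }
  }
  throw lastError;
}

// Pages through results until `total` records have been collected.
async function fetchAll(fetchPage, { pageSize = 100, baseMs = 500 } = {}) {
  const all = [];
  let total = Infinity;
  while (all.length < total) {
    const page = await withRetry(() => fetchPage(all.length, pageSize), { baseMs });
    total = page.total;
    if (page.records.length === 0) break; // stop if the source returns nothing
    all.push(...page.records);
  }
  return all;
}
```

The jitter spreads retries out so that several failing requests do not all hit the source again at the same instant.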


API reference

GET /health

curl http://localhost:3001/health
{
"status": "ok",
"db_count": 847,
"last_run": "2026-04-29T14:23:00.000Z"
}

GET /api/refresh

Returns the timestamp of the most recently stored record. The frontend calls this on every page mount.

curl http://localhost:3001/api/refresh
{ "lastUpdated": "2026-04-29T14:23:00.000Z" }

lastUpdated is null if no records exist yet.


POST /api/refresh

Triggers a full pipeline run. Called when the user clicks "Refresh Data" in the frontend.

curl -X POST http://localhost:3001/api/refresh
{ "ok": true, "signals": 14, "lastUpdated": "2026-04-29T14:23:00.000Z" }

On failure:

{ "ok": false, "error": "Fetch failed: HTTP 503 Service Unavailable" }

GET /api/cron

Same pipeline run as POST /api/refresh, protected by CRON_SECRET. Called by an external scheduler (Cloudflare Worker, cron job, etc.).

curl -H "x-cron-secret: your-secret" http://localhost:3001/api/cron
# or
curl "http://localhost:3001/api/cron?secret=your-secret"
{
"ok": true,
"summary": {
"ingested": 340,
"newTrades": 14,
"signalsGenerated": 14,
"topScore": null,
"topScoreTicker": null,
"runAt": "2026-04-29T14:23:00.000Z"
}
}

Returns 401 if the secret is missing or wrong.
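
A minimal sketch of the secret check, accepting either the `x-cron-secret` header or the `secret` query parameter as in the curl examples. The function name `isCronAuthorized` is hypothetical, `req` is assumed to have the shape of an Express request (`headers`, `query`), and rejecting every request when `CRON_SECRET` is empty is an assumption rather than confirmed behavior.

```javascript
const { timingSafeEqual } = require("node:crypto");

// Hypothetical check for /api/cron and /api/sync-committees.
function isCronAuthorized(req, expected = process.env.CRON_SECRET) {
  const supplied = req.headers["x-cron-secret"] ?? req.query.secret;
  // Assumption: with no secret configured, reject everything.
  if (!expected || typeof supplied !== "string") return false;
  const a = Buffer.from(supplied);
  const b = Buffer.from(expected);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

In an Express-style handler this check would run at the top of the route, with the handler responding with status 401 whenever it returns false. The constant-time comparison avoids leaking how much of the secret matched.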


GET /api/sync-committees

Syncs congressional committee membership. Protected by CRON_SECRET. Run once on setup, then weekly.

curl -H "x-cron-secret: your-secret" http://localhost:3001/api/sync-committees
{ "ok": true, "synced": 0 }

GET /api/transactions

Queryable read endpoint. Returns transactions serialized to match the frontend Signal field names.

# All recent transactions (default limit 500)
curl http://localhost:3001/api/transactions
# Filter by ticker
curl "http://localhost:3001/api/transactions?ticker=AAPL"
# Filter by politician (LIKE match, case-insensitive)
curl "http://localhost:3001/api/transactions?politician=Pelosi"
# Date range
curl "http://localhost:3001/api/transactions?date_from=2026-04-01&date_to=2026-04-30"
# Type + owner + pagination
curl "http://localhost:3001/api/transactions?type=buy&owner=joint&limit=50&offset=0"
{
"count": 2,
"data": [
{
"id": "a3f...c1",
"filer_name": "Nancy Pelosi",
"filer_type": "congress",
"trade_type": "purchase",
"ticker": "NVDA",
"asset_name": "NVIDIA Corporation",
"asset_type": "Stock",
"amount_low": 1000001,
"amount_high": 5000000,
"amount_midpoint": 3000000,
"trade_date": "2026-04-29",
"filing_date": "2026-04-29",
"owner": "joint",
"is_active": true
}
]
}
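
The `amount_low`, `amount_high`, and `amount_midpoint` fields in the sample response could be derived from a disclosure range string along these lines. The function name `parseAmountRange`, the input format (`"$1,000,001 - $5,000,000"`), the floor-based midpoint, and the handling of single-value strings are all assumptions, chosen so that the sketch reproduces the sample values.

```javascript
// Hypothetical parser for a disclosure amount range.
function parseAmountRange(text) {
  // Pull out each run of digits and commas, then strip the commas.
  const numbers = (text.match(/\d[\d,]*/g) ?? []).map((n) =>
    Number(n.replace(/,/g, ""))
  );
  if (numbers.length === 0) return null;
  const low = numbers[0];
  // Assumption: a string with a single value is stored as low = high.
  const high = numbers[1] ?? low;
  return {
    amount_low: low,
    amount_high: high,
    // Floor keeps the midpoint an integer (3000000.5 becomes 3000000).
    amount_midpoint: Math.floor((low + high) / 2),
  };
}
```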

Query parameters:

Param       Type                           Description
politician  string                         Substring match (LIKE)
ticker      string                         Exact match, auto-uppercased
date_from   YYYY-MM-DD                     Inclusive lower bound on transaction_date
date_to     YYYY-MM-DD                     Inclusive upper bound on transaction_date
type        buy | sell                     Exact match
owner       self | joint | spouse | child  Exact match
limit       integer 1–1000                 Default 500
offset      integer ≥ 0                    Default 0

Invalid params return 400:

{ "error": { "date_from": ["Must be YYYY-MM-DD"] } }
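
A hand-rolled validator that produces the same error shape might look like the sketch below. The function name `validateQuery` and its return shape are hypothetical, and only the "Must be YYYY-MM-DD" message comes from the example above; the other messages are placeholders. The actual service may well use a schema library instead.

```javascript
const DATE_RE = /^\d{4}-\d{2}-\d{2}$/;

// Returns { ok: true, value } or { ok: false, error }, where `error` maps
// each invalid field to a list of messages.
function validateQuery(query) {
  const error = {};
  const value = { limit: 500, offset: 0 }; // documented defaults
  const fail = (field, message) => {
    (error[field] ??= []).push(message);
  };

  for (const field of ["date_from", "date_to"]) {
    if (query[field] === undefined) continue;
    if (DATE_RE.test(query[field])) value[field] = query[field];
    else fail(field, "Must be YYYY-MM-DD");
  }
  if (query.ticker !== undefined) value.ticker = String(query.ticker).toUpperCase();
  if (query.politician !== undefined) value.politician = String(query.politician);

  const enums = { type: ["buy", "sell"], owner: ["self", "joint", "spouse", "child"] };
  for (const [field, allowed] of Object.entries(enums)) {
    if (query[field] === undefined) continue;
    if (allowed.includes(query[field])) value[field] = query[field];
    else fail(field, `Must be one of: ${allowed.join(", ")}`); // placeholder text
  }

  const ranges = { limit: [1, 1000], offset: [0, Infinity] };
  for (const [field, [min, max]] of Object.entries(ranges)) {
    if (query[field] === undefined) continue;
    const n = Number(query[field]);
    if (Number.isInteger(n) && n >= min && n <= max) value[field] = n;
    else fail(field, "Must be an integer in range"); // placeholder text
  }
  return Object.keys(error).length ? { ok: false, error } : { ok: true, value };
}
```

Collecting messages per field, rather than failing on the first problem, lets a single 400 response report every invalid parameter at once.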

GET /api/debug

Dev diagnostics. No auth. Returns DB count and 2 sample records.

curl http://localhost:3001/api/debug

Cron schedule

Default: 0 */6 * * * (every 6 hours).

Change via CRON_SCHEDULE env var — any valid node-cron expression.

CRON_SCHEDULE="0 */2 * * *" npm run dev # every 2 hours
CRON_SCHEDULE="0 8 * * *" npm run dev # once daily at 08:00

Last run stats (timestamp, inserted, skipped, errors) are persisted to ./data/last_run.json after each run.
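
Persisting the stats could be as simple as the sketch below. The helper names (`saveLastRun`, `loadLastRun`) are hypothetical, and the exact file schema and the write-then-rename step are assumptions; only the default path and the stat names come from the text above.

```javascript
const fs = require("node:fs");
const path = require("node:path");

const DEFAULT_PATH = process.env.LAST_RUN_PATH || "./data/last_run.json";

// Writes run stats as JSON, creating the parent directory if needed.
function saveLastRun(stats, filePath = DEFAULT_PATH) {
  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  // Write to a temp file, then rename, so readers never see a partial file.
  const tmp = `${filePath}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(stats, null, 2));
  fs.renameSync(tmp, filePath);
}

// Returns the stored stats, or null if no run has been recorded yet.
function loadLastRun(filePath = DEFAULT_PATH) {
  try {
    return JSON.parse(fs.readFileSync(filePath, "utf8"));
  } catch (err) {
    if (err.code === "ENOENT") return null;
    throw err;
  }
}
```

For example, `saveLastRun({ timestamp: new Date().toISOString(), inserted: 14, skipped: 326, errors: 0 })` would record one run, with the numbers here being made-up values.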


Seeding and smoke test

Load 20 realistic fake records covering edge cases (null tickers, spouse/child owners, large amounts, same-day multi-trades, clusters):

npm run seed

Verify the running server responds correctly:

# Terminal 1
npm run dev
# Terminal 2
npm run smoke

Smoke test exits 0 on all pass, 1 on any failure.


Phase 2 roadmap

House of Representatives disclosures (efd.house.gov) use a different filing format and will be added after Senate coverage is stable. Planned additions:

  • PDF parsing for older PTRs that lack structured data
  • Ticker enrichment via OpenFIGI or a static CUSIP mapping table (resolving the ticker: null cases currently stored as-is)
  • A scoring engine that ranks transactions by conviction signal (cluster detection, filing delay, filer track record)
  • Telegram/email alerts for high-score transactions

Multi-tenant auth (Supabase RLS + Paddle billing) is tracked separately under the SaaS roadmap.


Data source

All data is sourced from the U.S. Senate Electronic Financial Disclosures system — a public government database. Senate PTR filings are required under the STOCK Act and are public domain. This pipeline does not scrape third-party aggregators.