🏛️ U.S. House Congress Trade Tracker
Pricing
from $1.00 / 1,000 transaction records
Track every U.S. House member stock trade automatically. Clean, structured data from official disclosures: member name, ticker, trade date, amount. Perfect for investors who follow congressional trades and for quant researchers. No PDF parsing needed.
Developer
Fatih İlhan
Maintained by Community
U.S. House Trading Pipeline
Nancy Pelosi files a $500k–$1M purchase of Nvidia options. Three days later it's on Reddit. Two weeks later it's on the news.
This pipeline delivers that filing, and every other House PTR (Periodic Transaction Report), as clean JSON within 24 hours of the official disclosure. No third-party aggregators. Direct from the Clerk of the House.
Sister project to senate-trading-pipeline. Same target schema, separate fetcher + PDF parser. Run either or both.
Who uses this
- Retail traders tracking which Congress members are buying or selling before major legislation: defense stocks before NDAA votes, pharma before drug pricing bills, tech before antitrust hearings
- Fintech developers building portfolio tools, alert systems, or dashboards on top of STOCK Act data
- Journalists and researchers monitoring congressional trading patterns, with no account, no paywall, and raw government data
- Quiver Quantitative / Capitol Trades users who want the raw feed instead of a third-party UI
Why this instead of Quiver or Capitol Trades? Both aggregate from the same source, the Clerk of the House. This pipeline pulls directly from the official ZIP archive. No middleman, no rate limits, no subscription. You own the pipeline.
What it produces
One row per individual transaction reported in a House PTR:
```json
{
  "id": "4d6016b44239f646476ffac6798f21ae3e32c8ed75ea6c5b50a0bbdf9e5d3296",
  "politician": "Mark Alford",
  "transaction_date": "2026-03-16",
  "filing_date": "2026-03-31",
  "ticker": "AMZN",
  "asset_name": "Amazon.com, Inc. - Common Stock",
  "asset_type": "Stock",
  "type": "sell",
  "amount_min": 1001,
  "amount_max": 15000,
  "owner": "self"
}
```
| Field | Type | Notes |
|---|---|---|
| id | string | SHA-256 of politician\|date\|asset\|amount_min\|amount_max; stable dedup key |
| politician | string | Filer name as it appears on the PTR |
| transaction_date | YYYY-MM-DD | Trade execution date |
| filing_date | YYYY-MM-DD | Date the PTR was submitted to the House Clerk |
| ticker | string \| null | null for bonds, municipals, structured notes |
| asset_name | string | Full asset description |
| asset_type | string | Stock, Stock Option, Mutual Fund, Corporate Bond, etc. |
| type | 'buy' \| 'sell' | Purchase → buy; Sale (Full)/Sale (Partial) → sell |
| amount_min | integer | Lower bound of reported amount range, USD |
| amount_max | integer \| null | Upper bound. null for unbounded "Over $X" disclosures |
| owner | 'self' \| 'joint' \| 'spouse' \| 'child' | Account owner per STOCK Act categories |
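The row shape and the dedup key can be sketched in TypeScript as follows. This is a minimal illustration, not the code in src/utils/dedup.ts: it assumes that "date" in the key means transaction_date, that "asset" means asset_name, and that a null amount_max is hashed as an empty string. The function name dedupId is hypothetical, and the sketch is not guaranteed to reproduce the ids the actor emits.

```typescript
import { createHash } from "node:crypto";

// One output row, mirroring the field table above.
interface Transaction {
  id: string;
  politician: string;
  transaction_date: string; // YYYY-MM-DD
  filing_date: string; // YYYY-MM-DD
  ticker: string | null;
  asset_name: string;
  asset_type: string;
  type: "buy" | "sell";
  amount_min: number;
  amount_max: number | null;
  owner: "self" | "joint" | "spouse" | "child";
}

// Hypothetical helper. Assumptions: "date" = transaction_date,
// "asset" = asset_name, null amount_max is hashed as "".
function dedupId(t: Omit<Transaction, "id">): string {
  const key = [
    t.politician,
    t.transaction_date,
    t.asset_name,
    t.amount_min,
    t.amount_max ?? "",
  ].join("|");
  // SHA-256 over the pipe-joined natural key, hex-encoded (64 characters).
  return createHash("sha256").update(key, "utf8").digest("hex");
}
```

Because the id depends only on the natural key, re-running the pipeline over an overlapping date window yields the same id for the same transaction, which is what makes it usable for dedup.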
How it works
```text
ZIP fetch        <YEAR>FD.zip from disclosures-clerk
   │
   ▼
XML parse        <YEAR>FD.xml, filter FilingType='P' + date window
   │
   ▼
PDF download     /ptr-pdfs/<YEAR>/<DocID>.pdf (~600ms each)
   │
   ▼
Text extract     pdf-parse + marker-anchored regex
   │
   ▼
Normalize        buy/sell + amount ranges + dates
   │
   ▼
Dedup (SHA-256) + Apify Dataset
```
1. ZIP fetch. A single HTTPS GET pulls the year-to-date ZIP from https://disclosures-clerk.house.gov/public_disc/financial-pdfs/<YEAR>FD.zip. No proxy needed: plain HTTPS, no Akamai, no terms gate.
2. XML index. Inside the ZIP is <YEAR>FD.xml listing every disclosure for the year. Filter to FilingType=P (Periodic Transaction Report) within the configured date window.
3. Per-PTR PDF fetch. Each XML entry has a DocID. Fetch https://disclosures-clerk.house.gov/public_disc/ptr-pdfs/<YEAR>/<DocID>.pdf for each one. Rate-limited to 600ms between requests.
4. Text extraction. pdf-parse reads the PDF and returns text. House PTRs are machine-generated, so the text is clean, but the layout has quirks (header null bytes, glued fields, comment-block bleed).
5. Marker-anchored parsing. Each transaction row in the PDF includes a (TICKER) [TYPE] marker. The parser anchors on these markers, walks backward for the asset name, forward for the transaction details, and emits one record per marker.
6. Normalize + dedup + push. Map source codes (P/S/S (partial), SP/DC/JT) to the canonical schema, hash the natural key for dedup, push to the default Apify dataset.
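The marker-anchored idea in step 5 can be sketched as below. This is a simplified illustration, not the regex in src/parser/housePdfParser.ts: it assumes each transaction sits on one line of extracted text, that tickers are uppercase, and that the type code is two letters (as in [ST]). The name splitOnMarkers and the RawRow shape are hypothetical.

```typescript
// Hypothetical intermediate shape for one marker hit.
interface RawRow {
  ticker: string;
  typeCode: string;
  assetName: string; // text before the marker on the same line
  details: string; // text after the marker on the same line
}

function splitOnMarkers(text: string): RawRow[] {
  // Assumed marker form: "(AMZN) [ST]" = ticker in parens, 2-letter code in brackets.
  const marker = /\(([A-Z][A-Z0-9.]{0,9})\)\s*\[([A-Z]{2})\]/g;
  const rows: RawRow[] = [];
  let m: RegExpExecArray | null;
  while ((m = marker.exec(text)) !== null) {
    const start = m.index;
    const end = start + m[0].length;
    // Walk backward to the start of the line for the asset name.
    const lineStart = text.lastIndexOf("\n", start) + 1;
    // Walk forward to the end of the line for the transaction details.
    const nextBreak = text.indexOf("\n", end);
    const lineEnd = nextBreak === -1 ? text.length : nextBreak;
    rows.push({
      ticker: m[1],
      typeCode: m[2],
      assetName: text.slice(lineStart, start).trim(),
      details: text.slice(end, lineEnd).trim(),
    });
  }
  return rows;
}
```

Anchoring on the marker rather than on column positions means the split still works when fields before or after the marker vary in width. Real PTR text has the quirks listed in step 4 (glued fields, comment-block bleed), which a one-line-per-row assumption would not handle.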
Older paper filings produce scanned-image PDFs that pdf-parse can't extract text from. The parser logs them as unparseable and continues. This affects about 5% of historical PTRs. OCR fallback is on the Phase 2 list.
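The code mapping in step 6 might look roughly like this. It is a sketch under stated assumptions, not the contents of src/transformer/normalize.ts: it assumes SP, DC, and JT stand for spouse, dependent child, and joint, that a blank owner code means self, and that amounts appear as "$1,001 - $15,000" or "Over $X". All three function names are hypothetical.

```typescript
type TradeType = "buy" | "sell";
type Owner = "self" | "joint" | "spouse" | "child";

// P = purchase; S and "S (partial)" = sale. Unknown codes return null
// so the caller can decide whether to skip or log the row.
function normalizeType(code: string): TradeType | null {
  const c = code.trim().toUpperCase();
  if (c === "P") return "buy";
  if (c === "S" || c === "S (PARTIAL)") return "sell";
  return null;
}

// Assumption: SP = spouse, DC = dependent child, JT = joint, blank = self.
function normalizeOwner(code: string | undefined): Owner {
  switch ((code ?? "").trim().toUpperCase()) {
    case "SP":
      return "spouse";
    case "DC":
      return "child";
    case "JT":
      return "joint";
    default:
      return "self";
  }
}

// Assumed input forms: "$1,001 - $15,000" or "Over $50,000,000".
function parseAmountRange(raw: string): { min: number; max: number | null } | null {
  const matches: string[] = raw.match(/\$[\d,]+/g) ?? [];
  const nums = matches.map((s) => Number(s.replace(/[$,]/g, "")));
  if (nums.length === 0) return null;
  // Unbounded disclosure: only a lower bound is reported.
  if (/^\s*over\b/i.test(raw)) return { min: nums[0], max: null };
  if (nums.length >= 2) return { min: nums[0], max: nums[1] };
  return null; // a single bare amount is ambiguous in this sketch
}
```

Returning null for anything unrecognized keeps bad rows out of the dataset instead of guessing a direction or an amount.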
Apify deployment
The actor lives at apify.com/seralifatih/congress-trading-pipeline-1.
To run it via API:
```bash
# Trigger a run
curl -X POST "https://api.apify.com/v2/acts/seralifatih~congress-trading-pipeline-1/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "fetchDaysBack": 30 }'

# Read the dataset
curl "https://api.apify.com/v2/datasets/<dataset-id>/items?token=YOUR_TOKEN&format=json"
```
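The same two calls can be made from TypeScript. The sketch below assumes Node 18+ (for the global fetch) and relies on the Apify run response carrying the dataset id at data.defaultDatasetId. The helper names (runUrl, itemsUrl, triggerRun, readItems) are illustrative and not part of this project.

```typescript
const API = "https://api.apify.com/v2";
const ACTOR = "seralifatih~congress-trading-pipeline-1";

// URL builders matching the two curl calls above.
function runUrl(token: string): string {
  return `${API}/acts/${ACTOR}/runs?token=${encodeURIComponent(token)}`;
}

function itemsUrl(datasetId: string, token: string): string {
  return `${API}/datasets/${encodeURIComponent(datasetId)}/items?token=${encodeURIComponent(token)}&format=json`;
}

// Starts a run and returns the id of its default dataset.
async function triggerRun(token: string, fetchDaysBack: number): Promise<string> {
  const res = await fetch(runUrl(token), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ fetchDaysBack }),
  });
  if (!res.ok) throw new Error(`Run request failed: HTTP ${res.status}`);
  const body = (await res.json()) as { data: { defaultDatasetId: string } };
  return body.data.defaultDatasetId;
}

// Reads whatever items the dataset currently holds.
async function readItems(datasetId: string, token: string): Promise<unknown[]> {
  const res = await fetch(itemsUrl(datasetId, token));
  if (!res.ok) throw new Error(`Dataset request failed: HTTP ${res.status}`);
  return (await res.json()) as unknown[];
}
```

The run is asynchronous, so a dataset read immediately after triggerRun may return few or no rows. Wait for the run to finish before treating the result as complete.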
Input schema
| Field | Type | Default | Description |
|---|---|---|---|
| fetchDaysBack | integer | 90 | Rolling window of PTRs to fetch, in days (1–365) |
| fromDate | string (YYYY-MM-DD) | (none) | Explicit start date. Overrides fetchDaysBack |
| toDate | string (YYYY-MM-DD) | today | Explicit end date |
| debugPtrLimit | integer | 0 | Diagnostic: fetch only the first N PTRs |
| debugPdfText | boolean | false | Log the first 2KB of any PDF where the regex finds 0 rows |
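How the three date inputs combine can be sketched as a small resolver. This is an illustration of the precedence described in the table, not the actor's code: it assumes the rolling window is counted back from the end date in UTC and that out-of-range fetchDaysBack values are clamped to 1–365 rather than rejected. The names ActorInput and resolveWindow are hypothetical.

```typescript
interface ActorInput {
  fetchDaysBack?: number;
  fromDate?: string; // YYYY-MM-DD
  toDate?: string; // YYYY-MM-DD
}

function resolveWindow(
  input: ActorInput,
  now: Date = new Date(),
): { from: string; to: string } {
  const iso = (d: Date): string => d.toISOString().slice(0, 10);
  // toDate defaults to today.
  const to = input.toDate ?? iso(now);
  // An explicit fromDate overrides fetchDaysBack.
  if (input.fromDate) return { from: input.fromDate, to };
  // Assumption: clamp to the documented 1-365 range; default is 90.
  const days = Math.min(365, Math.max(1, Math.trunc(input.fetchDaysBack ?? 90)));
  // Assumption: count back from the end date, in UTC.
  const start = new Date(`${to}T00:00:00Z`);
  start.setUTCDate(start.getUTCDate() - days);
  return { from: iso(start), to };
}
```

For example, `resolveWindow({ fetchDaysBack: 30, toDate: "2026-04-30" })` gives a window starting 2026-03-31, while adding `fromDate: "2026-01-01"` to the same input makes the window start on that date and ignores fetchDaysBack.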
Self-hosting
If you'd rather run it yourself:
```bash
git clone https://github.com/seralifatih/house-trading-pipeline
cd house-trading-pipeline
npm install
cp .env.example .env
npm run build
node dist/apify.js   # or wire your own runner around runPipeline()
```
The pipeline's main export is in src/scheduler/pipeline.ts:
```typescript
import { runPipeline } from './scheduler/pipeline.js';
import { SqliteStore } from './store/sqliteStore.js';

const stats = await runPipeline(SqliteStore.getInstance(), {
  fromDate: '2026-01-01',
  toDate: '2026-04-30',
});
console.log(stats); // { inserted, skipped, errors }
```
Storage is pluggable through the StoreAdapter interface in src/types/index.ts. The repo ships with a SQLite implementation for local runs and an Apify Dataset implementation for cloud runs. To add Postgres or another backend, implement the same interface.
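To give a feel for what a custom backend involves, here is an in-memory store written against a hypothetical adapter shape. The real StoreAdapter in src/types/index.ts is not shown in this README and may differ, so the method names has and save, the MemoryStore class, and the saveAll loop are all assumptions made for illustration.

```typescript
// Hypothetical shapes: check src/types/index.ts for the real ones.
interface Transaction {
  id: string;
  [field: string]: unknown;
}

interface StoreAdapter {
  has(id: string): Promise<boolean>;
  save(tx: Transaction): Promise<void>;
}

// Minimal backend keyed by the SHA-256 id, e.g. for tests.
class MemoryStore implements StoreAdapter {
  private rows = new Map<string, Transaction>();

  async has(id: string): Promise<boolean> {
    return this.rows.has(id);
  }

  async save(tx: Transaction): Promise<void> {
    this.rows.set(tx.id, tx);
  }

  get size(): number {
    return this.rows.size;
  }
}

// Dedup-then-save loop producing stats in the { inserted, skipped, errors } shape.
async function saveAll(store: StoreAdapter, txs: Transaction[]) {
  const stats = { inserted: 0, skipped: 0, errors: 0 };
  for (const tx of txs) {
    try {
      if (await store.has(tx.id)) {
        stats.skipped++;
        continue;
      }
      await store.save(tx);
      stats.inserted++;
    } catch {
      stats.errors++;
    }
  }
  return stats;
}
```

A Postgres version would follow the same pattern, with has and save backed by a lookup and an insert on the id column.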
Project layout
```text
src/
├── apify.ts                 Actor entry point: wires runPipeline + ApifyStore
├── fetcher/
│   └── houseFetcher.ts      ZIP download + XML index + per-PDF fetch
├── parser/
│   └── housePdfParser.ts    Marker-anchored regex extractor
├── transformer/
│   └── normalize.ts         Source codes → canonical schema
├── store/
│   ├── sqliteStore.ts       Local SQLite via better-sqlite3
│   └── apifyStore.ts        Apify Dataset via Apify SDK
├── scheduler/
│   └── pipeline.ts          Fetch → parse → normalize → dedup → save
├── utils/
│   ├── config.ts            Zod-validated env vars
│   ├── dedup.ts             SHA-256 ID generation
│   ├── retry.ts             Exponential backoff with jitter
│   └── logger.ts            JSON-lines structured logger
└── types/
    └── index.ts             RawTransaction, Transaction, StoreAdapter, schemas
```
Data source
Clerk of the U.S. House: Financial Disclosure Reports
Public domain government records published under the STOCK Act of 2012. The Clerk publishes a fresh ZIP daily containing every disclosure filed that year.
This pipeline does not scrape third-party aggregators. It pulls only from the official source.
Phase 2
- OCR fallback for scanned PDFs (older paper filings)
- Ticker enrichment for bond/muni rows where the source omits the ticker
- Cross-chamber merge actor that consumes both Senate + House datasets and emits a single Congress-wide stream
License
MIT. Use the actor or the source however you want.