πŸ›οΈ U.S. House Congress Trade Tracker avatar

πŸ›οΈ U.S. House Congress Trade Tracker

Pricing

from $1.00 / 1,000 transaction records

Track every U.S. House member stock trade automatically. Clean, structured data from official disclosures: member name, ticker, trade date, amount. Perfect for investors following congress trades & quant researchers. No PDF parsing needed.

Developer

Fatih İlhan

Maintained by Community

U.S. House Trading Pipeline

Nancy Pelosi files a $500k–$1M purchase of Nvidia options. Three days later it's on Reddit. Two weeks later it's on the news.

This pipeline delivers that filing, and every other House Periodic Transaction Report (PTR), as clean JSON within 24 hours of the official disclosure. No third-party aggregators. Direct from the Clerk of the House.

Sister project to senate-trading-pipeline. Same target schema, separate fetcher + PDF parser. Run either or both.

Who uses this

  • Retail traders tracking which Congress members are buying or selling before major legislation: defense stocks before NDAA votes, pharma before drug pricing bills, tech before antitrust hearings
  • Fintech developers building portfolio tools, alert systems, or dashboards on top of STOCK Act data
  • Journalists and researchers monitoring congressional trading patterns: no account, no paywall, raw government data
  • Quiver Quantitative / Capitol Trades users who want the raw feed instead of a third-party UI

Why this instead of Quiver or Capitol Trades? Both aggregate from the same source, the Clerk of the House. This pipeline pulls directly from the official ZIP archive. No middleman, no rate limits, no subscription. You own the pipeline.


What it produces

One row per individual transaction reported in a House PTR:

{
  "id": "4d6016b44239f646476ffac6798f21ae3e32c8ed75ea6c5b50a0bbdf9e5d3296",
  "politician": "Mark Alford",
  "transaction_date": "2026-03-16",
  "filing_date": "2026-03-31",
  "ticker": "AMZN",
  "asset_name": "Amazon.com, Inc. - Common Stock",
  "asset_type": "Stock",
  "type": "sell",
  "amount_min": 1001,
  "amount_max": 15000,
  "owner": "self"
}

| Field | Type | Notes |
| --- | --- | --- |
| id | string | SHA-256 of politician\|date\|asset\|amount_min\|amount_max; stable dedup key |
| politician | string | Filer name as it appears on the PTR |
| transaction_date | YYYY-MM-DD | Trade execution date |
| filing_date | YYYY-MM-DD | Date the PTR was submitted to the House Clerk |
| ticker | string \| null | null for bonds, municipals, structured notes |
| asset_name | string | Full asset description |
| asset_type | string | Stock, Stock Option, Mutual Fund, Corporate Bond, etc. |
| type | 'buy' \| 'sell' | Purchase → buy; Sale (Full)/Sale (Partial) → sell |
| amount_min | integer | Lower bound of reported amount range, USD |
| amount_max | integer \| null | Upper bound. null for unbounded "Over $X" disclosures |
| owner | 'self' \| 'joint' \| 'spouse' \| 'child' | Account owner per STOCK Act categories |
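
To make the field rules above concrete, here is a minimal sketch of how a raw row might be normalized and given its `id`. The helper names (`normalizeType`, `normalizeOwner`, `parseAmountRange`, `makeId`) are hypothetical and not taken from the repository. The code-to-owner mapping (SP = spouse, DC = child, JT = joint, blank = self), the choice of `transaction_date` and `asset_name` as the `date` and `asset` parts of the key, and the use of the stated dollar figure as `amount_min` for "Over $X" rows are all assumptions, so hashes produced here are not guaranteed to match the actor's output.

```typescript
import { createHash } from "node:crypto";

type TradeType = "buy" | "sell";
type Owner = "self" | "joint" | "spouse" | "child";

// Purchases map to "buy"; full and partial sales both map to "sell".
function normalizeType(code: string): TradeType {
  const c = code.trim().toUpperCase();
  if (c === "P") return "buy";
  if (c === "S" || c === "S (PARTIAL)") return "sell";
  throw new Error(`Unknown transaction code: ${code}`);
}

// Assumed mapping of owner codes; anything else is treated as "self".
function normalizeOwner(code: string | undefined): Owner {
  switch ((code ?? "").trim().toUpperCase()) {
    case "SP": return "spouse";
    case "DC": return "child";
    case "JT": return "joint";
    default: return "self";
  }
}

// Reads the dollar figures out of a range string.
// A string starting with "Over" has no upper bound, so max is null.
function parseAmountRange(text: string): { min: number; max: number | null } {
  const nums = (text.match(/\$[\d,]+/g) ?? []).map((s) => Number(s.replace(/[$,]/g, "")));
  if (nums.length === 0) throw new Error(`No dollar amount in: ${text}`);
  const unbounded = /^\s*over\b/i.test(text);
  return { min: nums[0], max: unbounded ? null : (nums[1] ?? nums[0]) };
}

// Hashes the pipe-joined natural key; a null max is serialized as an empty string (assumption).
function makeId(politician: string, date: string, asset: string, min: number, max: number | null): string {
  const key = [politician, date, asset, min, max ?? ""].join("|");
  return createHash("sha256").update(key, "utf8").digest("hex");
}
```

Because the `id` depends only on the natural key, re-running the pipeline over an overlapping date window yields the same `id` for the same transaction, which is what makes it usable as a dedup key.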

How it works

   ZIP fetch          XML parse         PDF download       Text extract     Normalize
┌──────────────┐  ┌────────────────┐  ┌───────────────┐  ┌──────────────┐  ┌──────────┐
│ <YEAR>FD.zip │─▶│ <YEAR>FD.xml   │─▶│ /ptr-pdfs/    │─▶│ pdf-parse    │─▶│ buy/sell │
│ from         │  │ filter         │  │ <YEAR>/       │  │ + marker-    │  │ + amount │
│ disclosures- │  │ FilingType='P' │  │ <DocID>.pdf   │  │ anchored     │  │ ranges   │
│ clerk        │  │ + date window  │  │ (~600ms each) │  │ regex        │  │ + dates  │
└──────────────┘  └────────────────┘  └───────────────┘  └──────────────┘  └──────────┘
                                                                                │
                                                                                ▼
                                                                       ┌──────────────────┐
                                                                       │ Dedup (SHA-256)  │
                                                                       │ + Apify Dataset  │
                                                                       └──────────────────┘

1. ZIP fetch. A single HTTPS GET pulls the year-to-date ZIP from https://disclosures-clerk.house.gov/public_disc/financial-pdfs/<YEAR>FD.zip. No proxy needed: it is plain HTTPS with no Akamai and no terms gate.

2. XML index. Inside the ZIP is <YEAR>FD.xml listing every disclosure for the year. Filter to FilingType=P (Periodic Transaction Report) within the configured date window.

3. Per-PTR PDF fetch. Each XML entry has a DocID. Fetch https://disclosures-clerk.house.gov/public_disc/ptr-pdfs/<YEAR>/<DocID>.pdf for each one. Rate-limited to 600ms between requests.

4. Text extraction. pdf-parse reads the PDF and returns text. House PTRs are machine-generated, so the text is clean, but the layout has quirks (header null bytes, glued fields, comment-block bleed).

5. Marker-anchored parsing. Each transaction row in the PDF includes a (TICKER) [TYPE] marker. The parser anchors on these markers, walks backward for the asset name, forward for the transaction details, and emits one record per marker.

6. Normalize + dedup + push. Map source codes (P/S/S (partial), SP/DC/JT) to the canonical schema, hash the natural key for dedup, push to the default Apify dataset.
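
The fetch side of the steps above (2 and 3) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in houseFetcher.ts: the `IndexEntry` shape and the function names are hypothetical, the XML is assumed to be already parsed into objects with the filing date converted to YYYY-MM-DD, and the date window is assumed to apply to the filing date. Only the two URL patterns and the 600ms spacing come from the description above.

```typescript
// Hypothetical shape of one parsed entry from <YEAR>FD.xml.
interface IndexEntry {
  docId: string;
  filingType: string; // "P" marks a Periodic Transaction Report
  filingDate: string; // assumed already converted to YYYY-MM-DD
}

const BASE = "https://disclosures-clerk.house.gov/public_disc";

function zipUrl(year: number): string {
  return `${BASE}/financial-pdfs/${year}FD.zip`;
}

function ptrPdfUrl(year: number, docId: string): string {
  return `${BASE}/ptr-pdfs/${year}/${docId}.pdf`;
}

// Keep only PTRs filed inside [from, to], inclusive.
// ISO dates sort correctly as plain strings, so no Date parsing is needed.
function selectPtrs(entries: IndexEntry[], from: string, to: string): IndexEntry[] {
  return entries.filter(
    (e) => e.filingType === "P" && e.filingDate >= from && e.filingDate <= to,
  );
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Fetch PDFs sequentially, pausing delayMs between consecutive requests.
async function fetchPtrPdfs(
  year: number,
  entries: IndexEntry[],
  delayMs = 600,
): Promise<Map<string, Uint8Array>> {
  const out = new Map<string, Uint8Array>();
  for (let i = 0; i < entries.length; i++) {
    if (i > 0) await sleep(delayMs);
    const res = await fetch(ptrPdfUrl(year, entries[i].docId));
    if (!res.ok) throw new Error(`HTTP ${res.status} for DocID ${entries[i].docId}`);
    out.set(entries[i].docId, new Uint8Array(await res.arrayBuffer()));
  }
  return out;
}
```

Sequential fetching with a fixed pause keeps the request rate predictable; a production fetcher would likely also wrap each request in retry logic.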

Older filings filed on paper produce scanned-image PDFs that pdf-parse can't extract text from. The parser logs them as unparseable and continues; these account for about 5% of historical PTRs. OCR fallback is on the Phase 2 list.
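
As a rough sketch of the marker-anchored idea and the skip-and-continue behavior described above: the regex, the `MarkerRow` shape, and the function names below are hypothetical, and the real parser in housePdfParser.ts handles more layout quirks than this. The sketch assumes each marker looks like `(AMZN) [ST]`, that the asset name precedes the marker on the same line, and that the transaction details follow it on that line.

```typescript
interface MarkerRow {
  ticker: string;
  typeCode: string;  // bracketed asset-type code, e.g. "ST"
  assetName: string; // text before the marker on its line
  details: string;   // text after the marker on its line
}

// Matches "(AMZN) [ST]"-style markers; the character classes are assumptions.
const MARKER_SOURCE = String.raw`\(([A-Z][A-Z0-9.\-]{0,9})\)\s*\[([A-Z]{2})\]`;

function extractRows(text: string): MarkerRow[] {
  const clean = text.replace(/\u0000/g, ""); // strip null bytes
  const re = new RegExp(MARKER_SOURCE, "g"); // fresh regex, so no shared lastIndex state
  const rows: MarkerRow[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(clean)) !== null) {
    const start = m.index;
    const end = start + m[0].length;
    // Walk backward to the start of the marker's line for the asset name.
    const lineStart = clean.lastIndexOf("\n", start - 1) + 1;
    // Walk forward to the end of the marker's line for the transaction details.
    const nl = clean.indexOf("\n", end);
    const lineEnd = nl === -1 ? clean.length : nl;
    rows.push({
      ticker: m[1],
      typeCode: m[2],
      assetName: clean.slice(lineStart, start).trim(),
      details: clean.slice(end, lineEnd).trim(),
    });
  }
  return rows;
}

// A scanned PDF yields no text and therefore no markers: log it and return an empty list.
function parseOrSkip(docId: string, text: string, log: (msg: string) => void): MarkerRow[] {
  const rows = extractRows(text);
  if (rows.length === 0) log(`unparseable PTR ${docId}: no markers found`);
  return rows;
}
```

Anchoring on the marker rather than on column positions means one record is emitted per marker even when neighboring fields are glued together in the extracted text.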


Apify deployment

The actor lives at apify.com/seralifatih/congress-trading-pipeline-1.

To run it via API:

# Trigger a run
curl -X POST "https://api.apify.com/v2/acts/seralifatih~congress-trading-pipeline-1/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "fetchDaysBack": 30 }'

# Read the dataset
curl "https://api.apify.com/v2/datasets/<dataset-id>/items?token=YOUR_TOKEN&format=json"

Input schema

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| fetchDaysBack | integer | 90 | Rolling window of PTRs to fetch (1-365) |
| fromDate | string (YYYY-MM-DD) | none | Explicit start date. Overrides fetchDaysBack |
| toDate | string (YYYY-MM-DD) | today | Explicit end date |
| debugPtrLimit | integer | 0 | Diagnostic: fetch only first N PTRs |
| debugPdfText | boolean | false | Log first 2KB of any PDF where regex finds 0 rows |
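
One way the three date inputs could combine into a single window is sketched below. The `ActorInput` type and `resolveWindow` function are hypothetical names, and counting `fetchDaysBack` backward from `toDate` (rather than from the current date) is an assumption; the actor's own resolution logic may differ.

```typescript
interface ActorInput {
  fetchDaysBack?: number; // 1-365, default 90
  fromDate?: string;      // YYYY-MM-DD, overrides fetchDaysBack
  toDate?: string;        // YYYY-MM-DD, defaults to today
}

const toIso = (d: Date): string => d.toISOString().slice(0, 10);

function resolveWindow(input: ActorInput, now: Date = new Date()): { from: string; to: string } {
  const to = input.toDate ?? toIso(now);
  // An explicit start date wins over the rolling window.
  if (input.fromDate) return { from: input.fromDate, to };
  const days = input.fetchDaysBack ?? 90;
  if (!Number.isInteger(days) || days < 1 || days > 365) {
    throw new RangeError("fetchDaysBack must be an integer from 1 to 365");
  }
  // Work in UTC so the result does not shift with the host time zone.
  const start = new Date(`${to}T00:00:00Z`);
  start.setUTCDate(start.getUTCDate() - days);
  return { from: toIso(start), to };
}
```

For example, `resolveWindow({ fetchDaysBack: 30, toDate: "2026-04-30" })` yields a window starting on 2026-03-31.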

Self-hosting

If you'd rather run it yourself:

git clone https://github.com/seralifatih/house-trading-pipeline
cd house-trading-pipeline
npm install
cp .env.example .env
npm run build
node dist/apify.js # or wire your own runner around runPipeline()

The pipeline's main export is in src/scheduler/pipeline.ts:

import { runPipeline } from './scheduler/pipeline.js';
import { SqliteStore } from './store/sqliteStore.js';

const stats = await runPipeline(SqliteStore.getInstance(), {
  fromDate: '2026-01-01',
  toDate: '2026-04-30',
});
console.log(stats); // { inserted, skipped, errors }

Storage is pluggable via the StoreAdapter interface in src/types/index.ts. The repo ships with a SQLite implementation for local runs and an Apify Dataset implementation for cloud runs. Add Postgres or whatever else by implementing the same interface.
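
To give a feel for what a custom backend involves, here is an in-memory sketch. The `StoreAdapter` shape shown here (`has` and `save`) is purely hypothetical; check src/types/index.ts for the real method names and signatures before implementing your own. `MemoryStore` and `saveAll` are likewise illustrative names.

```typescript
// Minimal record shape for the sketch; the real Transaction type has more fields.
interface Transaction {
  id: string;
  [field: string]: unknown;
}

// Hypothetical adapter contract; the real StoreAdapter may differ.
interface StoreAdapter {
  has(id: string): Promise<boolean>;
  save(tx: Transaction): Promise<void>;
}

// Keeps rows in a Map keyed by id; useful for tests and dry runs.
class MemoryStore implements StoreAdapter {
  private rows = new Map<string, Transaction>();

  async has(id: string): Promise<boolean> {
    return this.rows.has(id);
  }

  async save(tx: Transaction): Promise<void> {
    this.rows.set(tx.id, tx);
  }

  get size(): number {
    return this.rows.size;
  }
}

// Saves only records whose id is not already stored and counts the rest as skipped.
async function saveAll(
  store: StoreAdapter,
  txs: Transaction[],
): Promise<{ inserted: number; skipped: number }> {
  let inserted = 0;
  let skipped = 0;
  for (const tx of txs) {
    if (await store.has(tx.id)) {
      skipped++;
    } else {
      await store.save(tx);
      inserted++;
    }
  }
  return { inserted, skipped };
}
```

A Postgres or other backend would implement the same two operations against its own storage, leaving the rest of the pipeline unchanged.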


Project layout

src/
├── apify.ts                  Actor entry point: wires runPipeline + ApifyStore
├── fetcher/
│   └── houseFetcher.ts       ZIP download + XML index + per-PDF fetch
├── parser/
│   └── housePdfParser.ts     Marker-anchored regex extractor
├── transformer/
│   └── normalize.ts          Source codes → canonical schema
├── store/
│   ├── sqliteStore.ts        Local SQLite via better-sqlite3
│   └── apifyStore.ts         Apify Dataset via Apify SDK
├── scheduler/
│   └── pipeline.ts           Fetch → parse → normalize → dedup → save
├── utils/
│   ├── config.ts             Zod-validated env vars
│   ├── dedup.ts              SHA-256 ID generation
│   ├── retry.ts              Exponential backoff with jitter
│   └── logger.ts             JSON-lines structured logger
└── types/
    └── index.ts              RawTransaction, Transaction, StoreAdapter, schemas

Data source

Clerk of the U.S. House β€” Financial Disclosure Reports

Public domain government records published under the STOCK Act of 2012. The Clerk publishes a fresh ZIP daily containing every disclosure filed that year.

This pipeline does not scrape third-party aggregators. It pulls only from the official source.


Phase 2

  • OCR fallback for scanned PDFs (older paper filings)
  • Ticker enrichment for bond/muni rows where the source omits the ticker
  • Cross-chamber merge actor that consumes both Senate + House datasets and emits a single Congress-wide stream

License

MIT. Use the actor or the source however you want.