U.S. House Trading Pipeline
Pricing
from $1.00 / 1,000 transaction records
Fetches U.S. House Periodic Transaction Reports from the official disclosures-clerk.house.gov ZIP archive and parses per-filing PDFs into a clean, deduplicated transaction dataset. STOCK Act compliant. Public domain data, no third-party vendors.
Developer: Fatih İlhan
Fetches every U.S. House Periodic Transaction Report (PTR) directly from the official Clerk of the House Financial Disclosure ZIP archive, parses each filing's PDF, normalizes the rows, and pushes a clean transaction dataset to Apify.
Sister project to senate-trading-pipeline. Same target schema, separate fetcher + PDF parser. Run either or both.
Public domain data. No third-party vendors. STOCK Act compliant.
What it produces
One row per individual transaction reported in a House PTR:
```json
{
  "id": "4d6016b44239f646476ffac6798f21ae3e32c8ed75ea6c5b50a0bbdf9e5d3296",
  "politician": "Mark Alford",
  "transaction_date": "2026-03-16",
  "filing_date": "2026-03-31",
  "ticker": "AMZN",
  "asset_name": "Amazon.com, Inc. - Common Stock",
  "asset_type": "Stock",
  "type": "sell",
  "amount_min": 1001,
  "amount_max": 15000,
  "owner": "self"
}
```
| Field | Type | Notes |
|---|---|---|
| `id` | string | SHA-256 of `politician\|date\|asset\|amount_min\|amount_max` — stable dedup key |
| `politician` | string | Filer name as it appears on the PTR |
| `transaction_date` | string (`YYYY-MM-DD`) | Trade execution date |
| `filing_date` | string (`YYYY-MM-DD`) | Date the PTR was submitted to the House Clerk |
| `ticker` | string \| null | `null` for bonds, municipals, structured notes |
| `asset_name` | string | Full asset description |
| `asset_type` | string | Stock, Stock Option, Mutual Fund, Corporate Bond, etc. |
| `type` | `'buy'` \| `'sell'` | Purchase → `buy`; Sale (Full)/Sale (Partial) → `sell` |
| `amount_min` | integer | Lower bound of reported amount range, USD |
| `amount_max` | integer \| null | Upper bound; `null` for unbounded "Over $X" disclosures |
| `owner` | `'self'` \| `'joint'` \| `'spouse'` \| `'child'` | Account owner per STOCK Act categories |
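Because the `id` is a hash of the natural key, it can be reproduced client-side when merging this dataset with others. A minimal sketch, assuming the key fields are joined with `|` in the order the table lists them (the real `src/utils/dedup.ts` may order or serialize them differently):

```typescript
import { createHash } from 'node:crypto';

// Hypothetical reconstruction of the dedup key: '|'-joined natural-key
// fields, hashed with SHA-256. A null amount_max is serialized as "null".
function transactionId(tx: {
  politician: string;
  transaction_date: string;
  asset_name: string;
  amount_min: number;
  amount_max: number | null;
}): string {
  const key = [
    tx.politician,
    tx.transaction_date,
    tx.asset_name,
    tx.amount_min,
    tx.amount_max ?? 'null',
  ].join('|');
  return createHash('sha256').update(key).digest('hex');
}

// Same natural key → same id, so re-runs never produce duplicate rows.
const a = transactionId({
  politician: 'Mark Alford',
  transaction_date: '2026-03-16',
  asset_name: 'Amazon.com, Inc. - Common Stock',
  amount_min: 1001,
  amount_max: 15000,
});
console.log(a.length); // 64
```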
How it works
```
  ZIP fetch         XML parse           PDF download       Text extract      Normalize
┌──────────────┐  ┌────────────────┐  ┌───────────────┐  ┌──────────────┐  ┌──────────┐
│ <YEAR>FD.zip │─▶│ <YEAR>FD.xml   │─▶│ /ptr-pdfs/    │─▶│ pdf-parse    │─▶│ buy/sell │
│ from         │  │ filter         │  │ <YEAR>/       │  │ + marker-    │  │ + amount │
│ disclosures- │  │ FilingType='P' │  │ <DocID>.pdf   │  │ anchored     │  │ ranges   │
│ clerk        │  │ + date window  │  │ (~600ms each) │  │ regex        │  │ + dates  │
└──────────────┘  └────────────────┘  └───────────────┘  └──────────────┘  └──────────┘
                                                                                │
                                                                                ▼
                                                                      ┌──────────────────┐
                                                                      │ Dedup (SHA-256)  │
                                                                      │ + Apify Dataset  │
                                                                      └──────────────────┘
```
1. ZIP fetch. A single HTTPS GET pulls the year-to-date ZIP from https://disclosures-clerk.house.gov/public_disc/financial-pdfs/<YEAR>FD.zip. No proxy needed — plain HTTPS, no Akamai, no terms gate.
2. XML index. Inside the ZIP is <YEAR>FD.xml listing every disclosure for the year. Filter to FilingType=P (Periodic Transaction Report) within the configured date window.
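The index-filtering step reduces to a predicate over parsed XML entries. A sketch under stated assumptions: the element names `DocID`, `FilingType`, and `FilingDate` are hypothetical stand-ins for whatever the real `<YEAR>FD.xml` uses, and dates are assumed already normalized to `YYYY-MM-DD` (which compares correctly as plain strings):

```typescript
// Hypothetical shape of one entry from the <YEAR>FD.xml index.
interface FilingIndexEntry {
  DocID: string;
  FilingType: string; // 'P' = Periodic Transaction Report
  FilingDate: string; // YYYY-MM-DD after parsing
}

// Keep only PTRs whose filing date falls inside the configured window.
// ISO dates compare lexicographically, so plain string comparison works.
function selectPtrs(
  entries: FilingIndexEntry[],
  fromDate: string,
  toDate: string,
): FilingIndexEntry[] {
  return entries.filter(
    (e) => e.FilingType === 'P' && e.FilingDate >= fromDate && e.FilingDate <= toDate,
  );
}

const sample: FilingIndexEntry[] = [
  { DocID: '20026001', FilingType: 'P', FilingDate: '2026-03-31' },
  { DocID: '20026002', FilingType: 'A', FilingDate: '2026-03-30' }, // annual report, skipped
  { DocID: '20026003', FilingType: 'P', FilingDate: '2025-12-01' }, // outside window
];
console.log(selectPtrs(sample, '2026-01-01', '2026-12-31').map((e) => e.DocID)); // [ '20026001' ]
```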
3. Per-PTR PDF fetch. Each XML entry has a DocID. Fetch https://disclosures-clerk.house.gov/public_disc/ptr-pdfs/<YEAR>/<DocID>.pdf for each one. Rate-limited to 600ms between requests.
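The per-PDF fetch loop is just sequential download with a pause between requests. A sketch, with the downloader injected so it runs without the network and the 600 ms delay exposed as a parameter (the function name `fetchPtrPdfs` is illustrative, not the repo's):

```typescript
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

// Fetch each PTR PDF sequentially, pausing between requests so the Clerk's
// server is never hit in bursts. The pipeline uses delayMs = 600.
async function fetchPtrPdfs(
  docIds: string[],
  year: number,
  download: (url: string) => Promise<Uint8Array>,
  delayMs = 600,
): Promise<Map<string, Uint8Array>> {
  const out = new Map<string, Uint8Array>();
  for (const [i, docId] of docIds.entries()) {
    if (i > 0) await sleep(delayMs); // pace requests
    const url = `https://disclosures-clerk.house.gov/public_disc/ptr-pdfs/${year}/${docId}.pdf`;
    out.set(docId, await download(url));
  }
  return out;
}

// Stubbed downloader so the sketch runs offline.
const seen: string[] = [];
const stub = async (url: string) => { seen.push(url); return new Uint8Array([0x25, 0x50]); };
const pdfs = await fetchPtrPdfs(['20026001', '20026002'], 2026, stub, 1);
console.log(pdfs.size); // 2
```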
4. Text extraction. pdf-parse reads the PDF and returns text. House PTRs are machine-generated so the text is clean — but the layout has quirks (header null bytes, glued fields, comment-block bleed).
5. Marker-anchored parsing. Each transaction row in the PDF includes a (TICKER) [TYPE] marker. The parser anchors on these markers, walks backward for the asset name, forward for the transaction details, and emits one record per marker.
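The marker-anchoring idea can be sketched in a few lines. The row layout below is an assumption for illustration only, `"<owner code> <asset name> (<TICKER>) [<code>] …"`, and the regex is a simplified stand-in for the real extractor in `housePdfParser.ts`:

```typescript
// Anchor on "(TICKER) [CODE]" markers, then walk backward to the start of
// the line for the asset name. Assumed layout; real PTR text may differ.
const MARKER = /\(([A-Z.]{1,6})\)\s*\[([A-Z]{1,3})\]/g;

function extractMarkers(text: string): { ticker: string; assetName: string; code: string }[] {
  const rows: { ticker: string; assetName: string; code: string }[] = [];
  for (const m of text.matchAll(MARKER)) {
    const lineStart = text.lastIndexOf('\n', m.index) + 1;
    const assetName = text
      .slice(lineStart, m.index)
      .replace(/^(SP|DC|JT)\s+/, '') // strip a leading owner code, if present
      .trim();
    rows.push({ ticker: m[1], assetName, code: m[2] });
  }
  return rows;
}

const sample =
  'SP Amazon.com, Inc. - Common Stock (AMZN) [ST] S 03/16/2026\n' +
  'JT Apple Inc. (AAPL) [ST] P 03/20/2026\n';
console.log(extractMarkers(sample).map((r) => r.ticker)); // [ 'AMZN', 'AAPL' ]
```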
6. Normalize + dedup + push. Map source codes (P/S/S (partial), SP/DC/JT) to the canonical schema, hash the natural key for dedup, push to the default Apify dataset.
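The code mapping in step 6 can be sketched as two small functions. The code sets come from the step above (P / S / S (partial) for transaction type; SP / DC / JT for owner); treating a missing owner code as `'self'` is an assumption:

```typescript
type TxType = 'buy' | 'sell';
type Owner = 'self' | 'joint' | 'spouse' | 'child';

// P → buy; S and S (partial) → sell.
function normalizeType(code: string): TxType {
  return code.trim().toUpperCase().startsWith('P') ? 'buy' : 'sell';
}

// SP = spouse, DC = dependent child, JT = joint; no code → self (assumed).
function normalizeOwner(code: string | null): Owner {
  switch (code?.trim().toUpperCase()) {
    case 'SP': return 'spouse';
    case 'DC': return 'child';
    case 'JT': return 'joint';
    default:   return 'self';
  }
}

console.log(normalizeType('S (partial)')); // 'sell'
console.log(normalizeOwner('DC'));         // 'child'
```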
Older filings filed on paper produce scanned-image PDFs that pdf-parse can't extract from. The parser logs them as unparseable and continues — about 5% of historical PTRs. OCR fallback is on the Phase 2 list.
Apify deployment
The actor lives at apify.com/seralifatih/congress-trading-pipeline-1.
To run it via API:
```bash
# Trigger a run
curl -X POST "https://api.apify.com/v2/acts/seralifatih~congress-trading-pipeline-1/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "fetchDaysBack": 30 }'

# Read the dataset
curl "https://api.apify.com/v2/datasets/<dataset-id>/items?token=YOUR_TOKEN&format=json"
```
Input schema
| Field | Type | Default | Description |
|---|---|---|---|
| `fetchDaysBack` | integer | 90 | Rolling window of PTRs to fetch (1-365) |
| `fromDate` | string (`YYYY-MM-DD`) | — | Explicit start date. Overrides `fetchDaysBack` |
| `toDate` | string (`YYYY-MM-DD`) | today | Explicit end date |
| `debugPtrLimit` | integer | 0 | Diagnostic — fetch only the first N PTRs |
| `debugPdfText` | boolean | false | Log the first 2 KB of any PDF where the regex finds 0 rows |
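For example, a date-bounded backfill input combines `fromDate` and `toDate` (which take precedence over `fetchDaysBack`) and leaves the debug options at their defaults:

```json
{
  "fromDate": "2026-01-01",
  "toDate": "2026-03-31",
  "debugPtrLimit": 0,
  "debugPdfText": false
}
```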
Self-hosting
If you'd rather run it yourself:
```bash
git clone https://github.com/seralifatih/house-trading-pipeline
cd house-trading-pipeline
npm install
cp .env.example .env
npm run build
node dist/apify.js   # or wire your own runner around runPipeline()
```
The pipeline's main export is in src/scheduler/pipeline.ts:
```typescript
import { runPipeline } from './scheduler/pipeline.js';
import { SqliteStore } from './store/sqliteStore.js';

const stats = await runPipeline(SqliteStore.getInstance(), {
  fromDate: '2026-01-01',
  toDate: '2026-04-30',
});
console.log(stats); // { inserted, skipped, errors }
```
Storage is pluggable — StoreAdapter interface in src/types/index.ts. The repo ships with a SQLite implementation for local runs and an Apify Dataset implementation for cloud runs. Add Postgres or whatever else by implementing the same interface.
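As a sketch of what a new backend involves: the interface shape below is a guess at the contract, not the authoritative `StoreAdapter` from `src/types/index.ts`, and the `Transaction` type is trimmed to two fields for brevity. An in-memory adapter is enough for tests or dry runs:

```typescript
// Hypothetical slice of the StoreAdapter contract; the real definition
// lives in src/types/index.ts and may differ.
interface Transaction { id: string; politician: string; ticker: string | null }

interface StoreAdapter {
  has(id: string): Promise<boolean>;
  insert(tx: Transaction): Promise<void>;
}

// In-memory adapter: a Map keyed by the SHA-256 id, so inserting the same
// natural key twice is a no-op — the same dedup behavior the pipeline needs.
class MemoryStore implements StoreAdapter {
  private rows = new Map<string, Transaction>();
  async has(id: string): Promise<boolean> { return this.rows.has(id); }
  async insert(tx: Transaction): Promise<void> { this.rows.set(tx.id, tx); }
  get size(): number { return this.rows.size; }
}

const store = new MemoryStore();
await store.insert({ id: 'abc', politician: 'Mark Alford', ticker: 'AMZN' });
await store.insert({ id: 'abc', politician: 'Mark Alford', ticker: 'AMZN' }); // duplicate id
console.log(store.size); // 1
```

A Postgres adapter would implement the same two methods over an `INSERT … ON CONFLICT DO NOTHING` statement.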
Project layout
```
src/
├── apify.ts                 Actor entry point — wires runPipeline + ApifyStore
├── fetcher/
│   └── houseFetcher.ts      ZIP download + XML index + per-PDF fetch
├── parser/
│   └── housePdfParser.ts    Marker-anchored regex extractor
├── transformer/
│   └── normalize.ts         Source codes → canonical schema
├── store/
│   ├── sqliteStore.ts       Local SQLite via better-sqlite3
│   └── apifyStore.ts        Apify Dataset via Apify SDK
├── scheduler/
│   └── pipeline.ts          Fetch → parse → normalize → dedup → save
├── utils/
│   ├── config.ts            Zod-validated env vars
│   ├── dedup.ts             SHA-256 ID generation
│   ├── retry.ts             Exponential backoff with jitter
│   └── logger.ts            JSON-lines structured logger
└── types/
    └── index.ts             RawTransaction, Transaction, StoreAdapter, schemas
```
Data source
Clerk of the U.S. House — Financial Disclosure Reports
Public domain government records published under the STOCK Act of 2012. The Clerk publishes a fresh ZIP daily containing every disclosure filed that year.
This pipeline does not scrape third-party aggregators. It pulls only from the official source.
Phase 2
- OCR fallback for scanned PDFs (older paper filings)
- Ticker enrichment for bond/muni rows where the source omits the ticker
- Cross-chamber merge actor that consumes both Senate + House datasets and emits a single Congress-wide stream
License
MIT. Use the actor or the source however you want.