CourtListener RAG Extractor
Pricing
from $15.00 / 1,000 extracted opinions
Extract SCOTUS and U.S. federal appeals opinions from CourtListener into normalized RAG-ready JSON with fixed-token chunks, metadata, citations, and summary fallback. Built for legal AI and litigation research pipelines. $0.03 per opinion.
Developer
Devansh Tiwari
What does CourtListener RAG Extractor do?
CourtListener RAG Extractor pulls U.S. federal court opinions from CourtListener and normalizes them into RAG-ready JSON records with fixed-token chunks, citations, and metadata. It focuses on SCOTUS and all 13 federal Courts of Appeals in v1.
Use it when you need legal AI corpora that are directly usable in LangChain, LlamaIndex, OpenAI vector stores, Qdrant, Pinecone, Weaviate, or pgvector pipelines.
Why use it?
- Build legal-AI retrieval corpora without writing custom ETL around CourtListener REST APIs.
- Get standardized record shape across courts: opinion ID, cluster ID, case naming, docket, filing date, citations, and source URL.
- Keep chunk size consistent for embedding and reranking workflows (512 tokens with 50-token overlap).
- Support litigation analytics and case-law search features with structured citation metadata.
- Run as an Apify Actor with scheduling, API access, and integration-ready dataset outputs.
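The fixed-size chunking mentioned above (512 tokens with a 50-token overlap) can be pictured as a sliding window. The following is a minimal sketch, not the Actor's implementation: it assumes whitespace-separated words as stand-in tokens, since the tokenizer the Actor uses is not stated, and `chunk_tokens` is a hypothetical helper name.

```python
from typing import Dict, List, Union

Chunk = Dict[str, Union[int, str]]


def chunk_tokens(tokens: List[str], size: int = 512, overlap: int = 50) -> List[Chunk]:
    """Split tokens into windows of `size`; each window repeats the last
    `overlap` tokens of the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far the window advances each time
    chunks: List[Chunk] = []
    start = 0
    while start < len(tokens):
        window = tokens[start:start + size]
        chunks.append({"idx": len(chunks), "text": " ".join(window), "tokens": len(window)})
        if start + size >= len(tokens):
            break  # this window already reached the end of the text
        start += step
    return chunks


# Assumption: words stand in for tokens purely for illustration.
words = [f"w{i}" for i in range(1000)]
chunks = chunk_tokens(words)
print([c["tokens"] for c in chunks])
```

With these defaults the window advances 462 tokens at a time, so only the final chunk can be shorter than 512 tokens.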
How to use it?
- Open the Actor in Apify Console.
- Set your date range (`dateFrom`, `dateTo`) and optional court filters.
- Optionally add a CourtListener API key from https://www.courtlistener.com/api/ for faster and richer detail retrieval.
- Set `maxOpinions` to cap cost and runtime.
- Run the Actor and consume the dataset output via API or download.
Input fields
- `courtIds` (array): Court slugs. Empty means all 14 supported federal courts.
- `dateFrom` (string, required): Inclusive lower bound in `YYYY-MM-DD` format.
- `dateTo` (string, required): Inclusive upper bound in `YYYY-MM-DD` format.
- `searchQuery` (string): Optional CourtListener query syntax.
- `maxOpinions` (integer): Hard cap on returned opinions.
- `courtListenerApiKey` (secret string): Optional token for higher throughput and detail endpoint access.
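As an illustration of how these fields fit together, here is a minimal sketch that assembles and sanity-checks a run input before it is submitted. The field names come from the list above; the helper `build_run_input` and its validation rules are assumptions made for this example, not checks the Actor is documented to perform.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional


def build_run_input(
    date_from: str,
    date_to: str,
    court_ids: Optional[List[str]] = None,
    search_query: Optional[str] = None,
    max_opinions: Optional[int] = None,
    api_key: Optional[str] = None,
) -> Dict[str, Any]:
    """Build an input dict using the documented field names (hypothetical helper)."""
    # strptime raises ValueError if a date is not in YYYY-MM-DD form.
    start = datetime.strptime(date_from, "%Y-%m-%d").date()
    end = datetime.strptime(date_to, "%Y-%m-%d").date()
    if start > end:
        raise ValueError("dateFrom must not be later than dateTo")
    if max_opinions is not None and max_opinions < 1:
        raise ValueError("maxOpinions must be a positive integer")

    # An empty courtIds list means all supported courts, per the field description.
    run_input: Dict[str, Any] = {
        "dateFrom": date_from,
        "dateTo": date_to,
        "courtIds": court_ids or [],
    }
    if search_query:
        run_input["searchQuery"] = search_query
    if max_opinions is not None:
        run_input["maxOpinions"] = max_opinions
    if api_key:
        run_input["courtListenerApiKey"] = api_key
    return run_input


example_input = build_run_input("2026-01-01", "2026-01-31", court_ids=["scotus"], max_opinions=5)
print(example_input)
```

The resulting dict can be pasted into the Console's JSON input editor or sent as the run input when starting the Actor through the Apify API.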
Output schema
Each dataset item is one opinion record:
{
  "opinion_id": "11314034",
  "cluster_id": "10846667",
  "court": "scotus",
  "court_full": "Supreme Court of the United States",
  "case_name": "Enbridge Energy, LP v. Nessel",
  "case_name_short": null,
  "docket_number": "24-783",
  "date_filed": "2026-04-22",
  "citation_count": 21,
  "citations": ["608 U.S. ___"],
  "absolute_url": "https://www.courtlistener.com/opinion/11314034/enbridge-energy-lp-v-nessel/",
  "source": "summary",
  "chunks": [
    { "idx": 0, "text": "...", "tokens": 512 },
    { "idx": 1, "text": "...", "tokens": 213 }
  ]
}
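Most vector stores expect one document per chunk rather than one per opinion. The sketch below shows one possible way to flatten a record of this shape into per-chunk documents that carry the opinion metadata. The helper `to_documents`, the `opinion_id:idx` ID scheme, and the choice of metadata keys are assumptions for illustration and are not produced by the Actor.

```python
from typing import Any, Dict, List

# Assumption: these are the fields worth copying onto every chunk.
METADATA_KEYS = (
    "opinion_id", "court", "case_name", "docket_number",
    "date_filed", "absolute_url", "source",
)


def to_documents(record: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn one opinion record into one document per chunk (hypothetical helper)."""
    metadata = {key: record.get(key) for key in METADATA_KEYS}
    return [
        {
            # Hypothetical ID scheme: opinion ID plus chunk index.
            "id": f'{record["opinion_id"]}:{chunk["idx"]}',
            "text": chunk["text"],
            "metadata": {**metadata, "chunk_idx": chunk["idx"], "tokens": chunk["tokens"]},
        }
        for chunk in record.get("chunks", [])
    ]


# Trimmed copy of the sample record shown above.
record = {
    "opinion_id": "11314034",
    "court": "scotus",
    "case_name": "Enbridge Energy, LP v. Nessel",
    "docket_number": "24-783",
    "date_filed": "2026-04-22",
    "absolute_url": "https://www.courtlistener.com/opinion/11314034/enbridge-energy-lp-v-nessel/",
    "source": "summary",
    "chunks": [
        {"idx": 0, "text": "...", "tokens": 512},
        {"idx": 1, "text": "...", "tokens": 213},
    ],
}
docs = to_documents(record)
print(len(docs), docs[0]["id"])
```

Each resulting document can then be embedded and upserted with whichever client your vector store provides.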
Data table
| Field | Type | Description |
|---|---|---|
| opinion_id | string | CourtListener opinion ID |
| court | string | Court slug (scotus, ca1...cafc) |
| case_name | string | Canonical case name |
| docket_number | string | Docket number |
| date_filed | string | Filing date |
| citation_count | number | Citation count from cluster metadata when available |
| source | string | full_text or summary |
| absolute_url | string | Absolute CourtListener opinion URL |
Pricing / cost estimation
Price target is $0.03 per opinion, with a free trial of 5 results.
| Opinions | Estimated cost |
|---|---|
| 100 | $3 |
| 1,000 | $30 |
| 10,000 | $300 |
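The table follows directly from multiplying the opinion count by the $0.03 price target. As a rough sketch, the same arithmetic can be used to pick a `maxOpinions` value for a given budget. The helper names are hypothetical, and the calculation ignores the free trial and any platform usage charges, which are not specified here.

```python
from decimal import Decimal

PRICE_PER_OPINION = Decimal("0.03")  # price target stated above


def estimate_cost(opinions: int) -> Decimal:
    """Estimated cost in USD for a given number of opinions."""
    if opinions < 0:
        raise ValueError("opinions must be non-negative")
    return PRICE_PER_OPINION * opinions


def max_opinions_for_budget(budget_usd: str) -> int:
    """Largest opinion count whose estimated cost fits within the budget."""
    return int(Decimal(budget_usd) // PRICE_PER_OPINION)


for count in (100, 1_000, 10_000):
    print(count, estimate_cost(count))
print(max_opinions_for_budget("10"))
```

Using `Decimal` rather than floats keeps the per-opinion price exact when it is multiplied across large runs.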
Tips / Advanced
- Add `courtListenerApiKey` for stable throughput and richer detail endpoint access.
- Narrow with `searchQuery` for topic-specific corpora.
- Keep date windows smaller for incremental backfills.
- Start with low `maxOpinions` for schema checks, then scale.
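For the incremental-backfill tip, one approach is to split a long date range into short, non-overlapping windows and run the Actor once per window. The sketch below generates such windows. It relies on `dateFrom` and `dateTo` both being inclusive, as documented in the input fields; the helper `date_windows` and the 7-day default are illustrative assumptions.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def date_windows(date_from: str, date_to: str, days: int = 7) -> Iterator[Tuple[str, str]]:
    """Yield consecutive (dateFrom, dateTo) pairs covering the range without overlap."""
    if days < 1:
        raise ValueError("days must be at least 1")
    start = date.fromisoformat(date_from)
    end = date.fromisoformat(date_to)
    while start <= end:
        # Both bounds are inclusive, so a window of `days` days ends days - 1 later.
        window_end = min(start + timedelta(days=days - 1), end)
        yield start.isoformat(), window_end.isoformat()
        start = window_end + timedelta(days=1)


windows = list(date_windows("2026-01-01", "2026-01-20", days=7))
for window_from, window_to in windows:
    print(window_from, window_to)
```

Each pair can be used as the `dateFrom` and `dateTo` of a separate run, so a failed window can be retried without repeating the whole backfill.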
Limits
- v1 supports only SCOTUS + 13 federal Courts of Appeals.
- No district or state courts in v1.
- No majority/concurrence/dissent section separation in v1.
- No citation graph extraction in v1.
Legal disclaimer
This Actor extracts publicly available U.S. federal court opinions from CourtListener (operated by the Free Law Project). Output is not legal advice. Users are responsible for compliance with local professional-responsibility rules when using this data.
FAQ
Do I need a CourtListener API key? No, but it is strongly recommended. Without a key, the Actor uses conservative rate limits and may rely more heavily on summary-level fields.
What happens when opinion detail endpoints are unavailable? The run continues with available search metadata and summary fallback so records still remain schema-consistent.
Does this include citation graph relationships? No. v1 includes citation strings and counts, not graph topology.
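Because records that fall back to summary-level data keep the same schema, the `source` field is the way to tell them apart after a run. A minimal sketch, assuming the dataset items have already been loaded as a list of dicts; the records below are made-up placeholders, and `summary_only_ids` is a hypothetical helper.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List


def summary_only_ids(records: Iterable[Dict[str, Any]]) -> List[str]:
    """Return opinion IDs whose record was built from summary fallback."""
    return [r["opinion_id"] for r in records if r.get("source") == "summary"]


# Placeholder records for illustration only; real items carry the full schema.
records = [
    {"opinion_id": "a1", "source": "full_text"},
    {"opinion_id": "a2", "source": "summary"},
    {"opinion_id": "a3", "source": "summary"},
]
source_counts = Counter(r.get("source") for r in records)
print(dict(source_counts))
print(summary_only_ids(records))
```

A high share of `summary` records may be a sign that a rerun with a CourtListener API key is worth trying.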
Support
For feature requests or issue triage, open a ticket in this repo's Issues tab.