CourtListener RAG Extractor

Pricing

from $15.00 / 1,000 opinions extracted

Extract SCOTUS and U.S. federal appeals opinions from CourtListener into normalized RAG-ready JSON with fixed-token chunks, metadata, citations, and summary fallback. Built for legal AI and litigation research pipelines. $0.03 per opinion.

Rating

0.0

(0)

Developer

Devansh Tiwari

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 7 days ago

What does CourtListener RAG Extractor do?

CourtListener RAG Extractor pulls U.S. federal court opinions from CourtListener and normalizes them into RAG-ready JSON records with fixed-token chunks, citations, and metadata. It focuses on SCOTUS and all 13 federal Courts of Appeals in v1.

Use it when you need legal AI corpora that are directly usable in LangChain, LlamaIndex, OpenAI vector stores, Qdrant, Pinecone, Weaviate, or pgvector pipelines.

Why use it?

  • Build legal-AI retrieval corpora without writing custom ETL around CourtListener REST APIs.
  • Get standardized record shape across courts: opinion ID, cluster ID, case naming, docket, filing date, citations, and source URL.
  • Keep chunk size consistent for embedding and reranking workflows (512 tokens with 50-token overlap).
  • Support litigation analytics and case-law search features with structured citation metadata.
  • Run as an Apify Actor with scheduling, API access, and integration-ready dataset outputs.
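The fixed-size chunking with overlap mentioned above can be sketched as a sliding window. This is a minimal illustration, not the Actor's actual implementation: the tokenizer is not documented here, so the sketch takes an already-tokenized list, and the function name `chunk_tokens` is hypothetical. With a 512-token window and a 50-token overlap the window advances 462 tokens at a time, so a 675-token opinion yields chunks of 512 and 213 tokens, which matches the token counts in the sample record under "Output schema".

```python
from typing import Dict, List


def chunk_tokens(tokens: List[str], size: int = 512, overlap: int = 50) -> List[Dict]:
    """Split a token list into fixed-size windows that share `overlap` tokens.

    Assumption: tokens are plain strings joined by spaces; the Actor's real
    tokenizer and detokenization are not documented and may differ.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")

    stride = size - overlap  # how far the window advances each step
    chunks: List[Dict] = []
    for start in range(0, len(tokens), stride):
        window = tokens[start:start + size]
        chunks.append({
            "idx": len(chunks),
            "text": " ".join(window),
            "tokens": len(window),
        })
        # Stop once this window has reached the end of the document.
        if start + size >= len(tokens):
            break
    return chunks
```

Calling `chunk_tokens(tokens)` on a tokenized opinion returns items with the same `idx`, `text`, and `tokens` keys used by the dataset's `chunks` array.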

How to use it?

  1. Open the Actor in Apify Console.
  2. Set your date range (dateFrom, dateTo) and optional court filters.
  3. Optionally add a CourtListener API key from https://www.courtlistener.com/api/ for faster and richer detail retrieval.
  4. Set maxOpinions to cap cost and runtime.
  5. Run the Actor and consume the dataset output via API or download.

Input fields

  • courtIds (array): Court slugs. Empty means all 14 supported federal courts.
  • dateFrom (string, required): Inclusive lower bound in YYYY-MM-DD format.
  • dateTo (string, required): Inclusive upper bound in YYYY-MM-DD format.
  • searchQuery (string): Optional CourtListener query syntax.
  • maxOpinions (integer): Hard cap on returned opinions.
  • courtListenerApiKey (secret string): Optional token for higher throughput and detail endpoint access.
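As a rough sketch, the input fields above can be assembled and sanity-checked in Python before starting a run. The helper names `build_run_input` and `run_actor` are hypothetical, and the court slug list is an assumption derived from the "scotus, ca1...cafc" range in the data table. `run_actor` assumes the `apify-client` package is installed and that you pass the Actor's ID from its Apify Store page; that ID is not stated here.

```python
from datetime import date
from typing import Dict, List, Optional

# Assumed slugs: SCOTUS, ca1..ca11, cadc, cafc (14 courts in total).
SUPPORTED_COURTS = ["scotus"] + [f"ca{n}" for n in range(1, 12)] + ["cadc", "cafc"]


def build_run_input(
    date_from: str,
    date_to: str,
    court_ids: Optional[List[str]] = None,
    search_query: Optional[str] = None,
    max_opinions: Optional[int] = None,
    api_key: Optional[str] = None,
) -> Dict:
    """Validate values locally and return a dict keyed by the documented field names."""
    start, end = date.fromisoformat(date_from), date.fromisoformat(date_to)
    if start > end:
        raise ValueError("dateFrom must not be later than dateTo")
    unknown = sorted(set(court_ids or []) - set(SUPPORTED_COURTS))
    if unknown:
        raise ValueError(f"unsupported court slugs: {unknown}")

    run_input: Dict = {
        "dateFrom": start.isoformat(),
        "dateTo": end.isoformat(),
        "courtIds": list(court_ids or []),  # empty list = all supported courts
    }
    if search_query:
        run_input["searchQuery"] = search_query
    if max_opinions is not None:
        if max_opinions < 1:
            raise ValueError("maxOpinions must be a positive integer")
        run_input["maxOpinions"] = max_opinions
    if api_key:
        run_input["courtListenerApiKey"] = api_key
    return run_input


def run_actor(apify_token: str, actor_id: str, run_input: Dict) -> List[Dict]:
    """Start the Actor, wait for it to finish, and return its dataset items."""
    from apify_client import ApifyClient  # third-party: pip install apify-client

    client = ApifyClient(apify_token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

For a first schema check, something like `build_run_input("2026-04-01", "2026-04-30", court_ids=["scotus"], max_opinions=10)` keeps the run small and cheap.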

Output schema

Each dataset item is one opinion record:

{
  "opinion_id": "11314034",
  "cluster_id": "10846667",
  "court": "scotus",
  "court_full": "Supreme Court of the United States",
  "case_name": "Enbridge Energy, LP v. Nessel",
  "case_name_short": null,
  "docket_number": "24-783",
  "date_filed": "2026-04-22",
  "citation_count": 21,
  "citations": ["608 U.S. ___"],
  "absolute_url": "https://www.courtlistener.com/opinion/11314034/enbridge-energy-lp-v-nessel/",
  "source": "summary",
  "chunks": [
    { "idx": 0, "text": "...", "tokens": 512 },
    { "idx": 1, "text": "...", "tokens": 213 }
  ]
}
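Most vector stores expect one document per chunk rather than one per opinion. The sketch below flattens a record of the shape shown above into per-chunk documents that carry the opinion metadata. The function name `to_chunk_documents`, the `opinion_id:idx` ID scheme, and the choice of metadata keys are illustrative assumptions, not part of the Actor's output.

```python
from typing import Dict, List

# Record-level fields copied onto every chunk (an illustrative selection).
METADATA_KEYS = (
    "opinion_id", "cluster_id", "court", "case_name",
    "docket_number", "date_filed", "citations", "absolute_url", "source",
)


def to_chunk_documents(record: Dict) -> List[Dict]:
    """Turn one opinion record into one document per chunk."""
    metadata = {key: record.get(key) for key in METADATA_KEYS}
    documents = []
    for chunk in record.get("chunks", []):
        documents.append({
            # Hypothetical ID scheme: unique as long as opinion_id is unique.
            "id": f"{record['opinion_id']}:{chunk['idx']}",
            "text": chunk["text"],
            "metadata": {**metadata, "chunk_idx": chunk["idx"], "tokens": chunk["tokens"]},
        })
    return documents
```

Each returned document can then be embedded and upserted with whichever client your pipeline uses; keeping `absolute_url` and `citations` in the metadata lets retrieved chunks link back to the source opinion.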

Data table

| Field | Type | Description |
| --- | --- | --- |
| opinion_id | string | CourtListener opinion ID |
| court | string | Court slug (scotus, ca1...cafc) |
| case_name | string | Canonical case name |
| docket_number | string | Docket number |
| date_filed | string | Filing date |
| citation_count | number | Citation count from cluster metadata when available |
| source | string | full_text or summary |
| absolute_url | string | Absolute CourtListener opinion URL |

Pricing / cost estimation

Price target is $0.03 per opinion, with a free trial of 5 results.

| Opinions | Estimated cost |
| --- | --- |
| 100 | $3 |
| 1,000 | $30 |
| 10,000 | $300 |
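The estimates above are simply the opinion count multiplied by the per-opinion price. The sketch below reproduces that arithmetic and inverts it to pick a `maxOpinions` value for a given budget. Both function names are hypothetical, the price is a parameter so you can substitute whatever rate applies to your plan, and the free trial of 5 results is not modeled.

```python
from decimal import Decimal

PRICE_PER_OPINION = Decimal("0.03")  # the stated price target, in USD


def estimate_cost(opinions: int, price: Decimal = PRICE_PER_OPINION) -> Decimal:
    """Estimated cost in USD for a given number of extracted opinions."""
    if opinions < 0:
        raise ValueError("opinions must be non-negative")
    return opinions * price


def max_opinions_for_budget(budget: Decimal, price: Decimal = PRICE_PER_OPINION) -> int:
    """Largest maxOpinions value whose estimated cost stays within the budget."""
    if budget < 0:
        raise ValueError("budget must be non-negative")
    return int(budget // price)
```

For example, `max_opinions_for_budget(Decimal("10"))` gives a cap you can pass as `maxOpinions` to keep a run under roughly $10.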

Tips / Advanced

  • Add courtListenerApiKey for stable throughput and richer detail endpoint access.
  • Narrow with searchQuery for topic-specific corpora.
  • Keep date windows smaller for incremental backfills.
  • Start with low maxOpinions for schema checks, then scale.
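For incremental backfills, one option is to split a long date range into short windows and run the Actor once per window. Because `dateFrom` and `dateTo` are both inclusive, each window should start the day after the previous one ends so that no opinion is fetched twice. The helper name `date_windows` and the 7-day default are illustrative choices.

```python
from datetime import date, timedelta
from typing import List, Tuple


def date_windows(date_from: str, date_to: str, days: int = 7) -> List[Tuple[str, str]]:
    """Split an inclusive YYYY-MM-DD range into consecutive, non-overlapping windows."""
    if days < 1:
        raise ValueError("days must be at least 1")
    start, end = date.fromisoformat(date_from), date.fromisoformat(date_to)
    if start > end:
        raise ValueError("date_from must not be later than date_to")

    windows = []
    while start <= end:
        # Each window covers `days` calendar days, clipped to the overall end.
        stop = min(start + timedelta(days=days - 1), end)
        windows.append((start.isoformat(), stop.isoformat()))
        start = stop + timedelta(days=1)  # inclusive bounds: resume on the next day
    return windows
```

Each `(dateFrom, dateTo)` pair can be used as the date range for a separate run, which keeps individual runs small and easy to retry.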

Limits

  • v1 supports only SCOTUS + 13 federal Courts of Appeals.
  • No district or state courts in v1.
  • No majority/concurrence/dissent section separation in v1.
  • No citation graph extraction in v1.

This Actor extracts publicly available U.S. federal court opinions from CourtListener (operated by the Free Law Project). Output is not legal advice. Users are responsible for compliance with local professional-responsibility rules when using this data.

FAQ

Do I need a CourtListener API key? No, but it is strongly recommended. Without a key, the Actor uses conservative rate limits and may rely more heavily on summary-level fields.

What happens when opinion detail endpoints are unavailable? The run continues with the available search metadata and falls back to summary text, so records remain schema-consistent.

Does this include citation graph relationships? No. v1 includes citation strings and counts, not graph topology.
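Since summary fallback produces records with the same schema, the `source` field is the way to tell them apart after a run. A minimal sketch, assuming `source` takes the values `full_text` or `summary` as listed in the data table (the function name `split_by_source` is hypothetical):

```python
from typing import Dict, Iterable, List


def split_by_source(records: Iterable[Dict]) -> Dict[str, List[Dict]]:
    """Group dataset items by their `source` value."""
    groups: Dict[str, List[Dict]] = {"full_text": [], "summary": [], "other": []}
    for record in records:
        key = record.get("source")
        # Anything unexpected lands in "other" instead of being dropped.
        groups[key if key in ("full_text", "summary") else "other"].append(record)
    return groups
```

The `summary` group is a natural candidate list for a later re-run with a CourtListener API key.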

Support

To request a feature or report an issue, open a ticket in this repo's Issues tab.