# CourtListener RAG Extractor (`devanshlive/courtlistener-rag-extractor`) Actor

Extract SCOTUS and U.S. federal appeals opinions from CourtListener into normalized RAG-ready JSON with fixed-token chunks, metadata, citations, and summary fallback. Built for legal AI and litigation research pipelines. $0.03 per opinion.

- **URL**: https://apify.com/devanshlive/courtlistener-rag-extractor.md
- **Developed by:** [Devansh Tiwari](https://apify.com/devanshlive) (community)
- **Categories:** AI, Developer tools, Automation
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $15.00 / 1,000 opinions extracted

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## CourtListener RAG Extractor

### What does CourtListener RAG Extractor do?

CourtListener RAG Extractor pulls U.S. federal court opinions from CourtListener and normalizes them into RAG-ready JSON records with fixed-token chunks, citations, and metadata. It focuses on SCOTUS and all 13 federal Courts of Appeals in v1.

Use it when you need legal AI corpora that are directly usable in LangChain, LlamaIndex, OpenAI vector stores, Qdrant, Pinecone, Weaviate, or pgvector pipelines.

### Why use it?

- Build legal-AI retrieval corpora without writing custom ETL around CourtListener REST APIs.
- Get standardized record shape across courts: opinion ID, cluster ID, case naming, docket, filing date, citations, and source URL.
- Keep chunk size consistent for embedding and reranking workflows (512 tokens with 50-token overlap).
- Support litigation analytics and case-law search features with structured citation metadata.
- Run as an Apify Actor with scheduling, API access, and integration-ready dataset outputs.
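
The 512-token window with a 50-token overlap mentioned above can be pictured as a sliding window over the opinion text. The following is a minimal sketch of that idea, not the Actor's implementation: the function name `chunk_tokens` is hypothetical, and whitespace-separated words stand in for tokens because the Actor's tokenizer is not documented here.

```python
def chunk_tokens(tokens, size=512, overlap=50):
    """Split a token list into fixed-size windows that share `overlap` tokens.

    Assumption: `tokens` is a plain list of strings (whitespace words here);
    the Actor's real tokenizer may count tokens differently.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")

    step = size - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(tokens):
        window = tokens[start:start + size]
        chunks.append({
            "idx": len(chunks),
            "text": " ".join(window),
            "tokens": len(window),
        })
        if start + size >= len(tokens):
            break  # this window already reached the end of the text
        start += step
    return chunks


if __name__ == "__main__":
    words = [f"w{i}" for i in range(1000)]
    for chunk in chunk_tokens(words):
        print(chunk["idx"], chunk["tokens"])
```

With these defaults, a 1,000-token input yields three chunks of 512, 512, and 76 tokens, and each chunk begins with the last 50 tokens of the previous one.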

### How to use it?

1. Open the Actor in Apify Console.
2. Set your date range (`dateFrom`, `dateTo`) and optional court filters.
3. Optionally add a CourtListener API key from https://www.courtlistener.com/api/ for faster and richer detail retrieval.
4. Set `maxOpinions` to cap cost and runtime.
5. Run the Actor and consume the dataset output via API or download.

### Input fields

- `courtIds` (array): Court slugs. Empty means all 14 supported federal courts.
- `dateFrom` (string, required): Inclusive lower bound in `YYYY-MM-DD` format.
- `dateTo` (string, required): Inclusive upper bound in `YYYY-MM-DD` format.
- `searchQuery` (string): Optional CourtListener query syntax.
- `maxOpinions` (integer): Hard cap on returned opinions.
- `courtListenerApiKey` (secret string): Optional token for higher throughput and detail endpoint access.
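
Checking these fields before starting a run can help avoid paying for a misconfigured job. The sketch below builds an input object under the constraints listed in this document (the 14 court slugs, `YYYY-MM-DD` dates, and the 1 to 100000 range for `maxOpinions` from the input schema). The helper name `build_input` is hypothetical, and the Actor may apply additional validation of its own.

```python
from datetime import date

# Court slugs listed in the Actor's input schema.
SUPPORTED_COURTS = {
    "scotus", "ca1", "ca2", "ca3", "ca4", "ca5", "ca6", "ca7",
    "ca8", "ca9", "ca10", "ca11", "cadc", "cafc",
}


def build_input(date_from, date_to, court_ids=(), search_query="",
                max_opinions=50, api_key=None):
    """Return an Actor input dict, raising ValueError on obviously bad values."""
    start = date.fromisoformat(date_from)  # raises ValueError on malformed dates
    end = date.fromisoformat(date_to)
    if start > end:
        raise ValueError("dateFrom must not be later than dateTo")

    unknown = set(court_ids) - SUPPORTED_COURTS
    if unknown:
        raise ValueError(f"unsupported court slugs: {sorted(unknown)}")

    if not 1 <= max_opinions <= 100000:
        raise ValueError("maxOpinions must be between 1 and 100000")

    payload = {
        "courtIds": list(court_ids),
        "dateFrom": date_from,
        "dateTo": date_to,
        "searchQuery": search_query,
        "maxOpinions": max_opinions,
    }
    if api_key:
        payload["courtListenerApiKey"] = api_key  # optional field
    return payload


if __name__ == "__main__":
    print(build_input("2024-01-01", "2024-06-30", court_ids=["scotus"]))
```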

### Output schema

Each dataset item is one opinion record:

```json
{
    "opinion_id": "11314034",
    "cluster_id": "10846667",
    "court": "scotus",
    "court_full": "Supreme Court of the United States",
    "case_name": "Enbridge Energy, LP v. Nessel",
    "case_name_short": null,
    "docket_number": "24-783",
    "date_filed": "2026-04-22",
    "citation_count": 21,
    "citations": ["608 U.S. ___"],
    "absolute_url": "https://www.courtlistener.com/opinion/11314034/enbridge-energy-lp-v-nessel/",
    "source": "summary",
    "chunks": [
        { "idx": 0, "text": "...", "tokens": 512 },
        { "idx": 1, "text": "...", "tokens": 213 }
    ]
}
````
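
Most vector stores expect one row per chunk rather than one row per opinion. As a rough sketch, a record of the shape above could be flattened like this. The function name `to_chunk_records`, the `"<opinion_id>-<idx>"` ID format, and the choice of metadata keys are assumptions made for illustration, not part of the Actor's output.

```python
# Metadata copied from the opinion onto every chunk (an illustrative selection).
METADATA_KEYS = ("opinion_id", "court", "case_name", "date_filed",
                 "absolute_url", "source")


def to_chunk_records(opinion):
    """Flatten one opinion record into a list of per-chunk records."""
    base = {key: opinion.get(key) for key in METADATA_KEYS}
    records = []
    for chunk in opinion.get("chunks", []):
        records.append({
            # Hypothetical ID scheme: opinion ID plus chunk index.
            "id": f"{opinion['opinion_id']}-{chunk['idx']}",
            "text": chunk["text"],
            "metadata": {**base, "chunk_idx": chunk["idx"],
                         "tokens": chunk["tokens"]},
        })
    return records


if __name__ == "__main__":
    sample = {
        "opinion_id": "11314034",
        "court": "scotus",
        "case_name": "Enbridge Energy, LP v. Nessel",
        "date_filed": "2026-04-22",
        "absolute_url": "https://www.courtlistener.com/opinion/11314034/enbridge-energy-lp-v-nessel/",
        "source": "summary",
        "chunks": [
            {"idx": 0, "text": "...", "tokens": 512},
            {"idx": 1, "text": "...", "tokens": 213},
        ],
    }
    for record in to_chunk_records(sample):
        print(record["id"], record["metadata"]["tokens"])
```

Each resulting record can then be embedded and upserted with whatever client your vector store provides.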

### Data table

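| Field             | Type             | Description                                            |
| ----------------- | ---------------- | ------------------------------------------------------ |
| `opinion_id`      | string           | CourtListener opinion ID                               |
| `cluster_id`      | string           | CourtListener cluster ID                               |
| `court`           | string           | Court slug (`scotus`, `ca1` to `ca11`, `cadc`, `cafc`) |
| `court_full`      | string           | Full court name                                        |
| `case_name`       | string           | Canonical case name                                    |
| `case_name_short` | string or null   | Short case name, when available                        |
| `docket_number`   | string           | Docket number                                          |
| `date_filed`      | string           | Filing date (`YYYY-MM-DD`)                             |
| `citation_count`  | number           | Citation count from cluster metadata when available    |
| `citations`       | array of strings | Citation strings                                       |
| `absolute_url`    | string           | Absolute CourtListener opinion URL                     |
| `source`          | string           | `full_text` or `summary`                               |
| `chunks`          | array of objects | Text chunks, each with `idx`, `text`, and `tokens`     |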

### Pricing / cost estimation

Price target is **$0.03 per opinion**, with a **free trial of 5 results**.

| Opinions | Estimated cost |
| -------- | -------------- |
| 100      | $3             |
| 1,000    | $30            |
| 10,000   | $300           |
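
The table is simple multiplication, which the sketch below reproduces. It uses the $0.03 figure stated in this section; the Pricing header near the top of this page lists a different starting rate, so treat the constant as adjustable. How the 5 free results are applied (per run or per account) is not specified here, so `free_results` is an optional, hypothetical parameter that defaults to 0.

```python
from decimal import Decimal

# Per-opinion price taken from this section; adjust if your plan differs.
PRICE_PER_OPINION = Decimal("0.03")


def estimate_cost(opinions, free_results=0):
    """Estimate the event cost in USD for a given number of opinions."""
    billable = max(opinions - free_results, 0)
    return billable * PRICE_PER_OPINION


if __name__ == "__main__":
    for count in (100, 1000, 10000):
        print(f"{count} opinions: ${estimate_cost(count)}")
```

`Decimal` is used instead of floats so that the totals match the table exactly.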

### Tips / Advanced

- Add `courtListenerApiKey` for stable throughput and richer detail endpoint access.
- Narrow with `searchQuery` for topic-specific corpora.
- Keep date windows smaller for incremental backfills.
- Start with low `maxOpinions` for schema checks, then scale.
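
For incremental backfills, one option is to split a long date range into smaller windows and start one run per window. The sketch below is one way to do that under the assumption that both bounds are inclusive, as the input fields state. The helper name `date_windows` and the 30-day default are illustrative choices.

```python
from datetime import date, timedelta


def date_windows(date_from, date_to, days=30):
    """Split an inclusive date range into consecutive, non-overlapping windows."""
    if days < 1:
        raise ValueError("days must be at least 1")
    start = date.fromisoformat(date_from)
    end = date.fromisoformat(date_to)

    windows = []
    while start <= end:
        # Both bounds are inclusive, so a window of `days` days ends days-1 later.
        stop = min(start + timedelta(days=days - 1), end)
        windows.append((start.isoformat(), stop.isoformat()))
        start = stop + timedelta(days=1)  # next window begins the following day
    return windows


if __name__ == "__main__":
    for window_from, window_to in date_windows("2024-01-01", "2024-06-30"):
        print({"dateFrom": window_from, "dateTo": window_to})
```

Because the windows do not overlap, an opinion filed on a boundary date falls into exactly one window.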

### Limits

- v1 supports only SCOTUS + 13 federal Courts of Appeals.
- No district or state courts in v1.
- No majority/concurrence/dissent section separation in v1.
- No citation graph extraction in v1.

### Legal disclaimer

This Actor extracts publicly available U.S. federal court opinions from CourtListener (operated by the Free Law Project). Output is not legal advice. Users are responsible for compliance with local professional-responsibility rules when using this data.

### FAQ

**Do I need a CourtListener API key?**
No, but it is strongly recommended. Without a key, the Actor uses conservative rate limits and may rely more heavily on summary-level fields.

**What happens when opinion detail endpoints are unavailable?**
The run continues with available search metadata and summary fallback so records still remain schema-consistent.

**Does this include citation graph relationships?**
No. v1 includes citation strings and counts, not graph topology.
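
Since runs can fall back to summary-level text, it may be worth checking the `source` field before indexing. A minimal sketch, assuming `items` is the list of dataset records and that `source` takes the values `full_text` or `summary` as documented above; the function name `split_by_source` is hypothetical.

```python
from collections import Counter


def split_by_source(items):
    """Return (full-text records, counts per source value)."""
    counts = Counter(item.get("source", "unknown") for item in items)
    full_text = [item for item in items if item.get("source") == "full_text"]
    return full_text, dict(counts)


if __name__ == "__main__":
    demo = [
        {"opinion_id": "1", "source": "full_text"},
        {"opinion_id": "2", "source": "summary"},
        {"opinion_id": "3", "source": "full_text"},
    ]
    kept, counts = split_by_source(demo)
    print(len(kept), counts)
```

A high share of `summary` records may suggest rerunning with a CourtListener API key.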

### Support

For feature requests or issue triage, open a ticket in this repo's Issues tab.

# Actor input Schema

## `courtIds` (type: `array`):

CourtListener court slugs. Empty means all 14 federal courts (SCOTUS + 13 appeals).

## `dateFrom` (type: `string`):

Inclusive lower bound of `date_filed`.

## `dateTo` (type: `string`):

Inclusive upper bound of `date_filed`.

## `searchQuery` (type: `string`):

CourtListener search syntax. Empty means no keyword filter.

## `maxOpinions` (type: `integer`):

Hard cap on returned opinions. Unauthenticated runs use 1 request/sec. With API key, 5 requests/sec.
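
These rate limits put a floor on how long a run can take. The sketch below gives a rough lower bound only. The number of CourtListener requests needed per opinion is not documented here, so `requests_per_opinion` is a hypothetical parameter, and real runs will also spend time on chunking and retries.

```python
def min_runtime_seconds(max_opinions, has_api_key, requests_per_opinion=1):
    """Lower-bound runtime implied by the stated request rate limits."""
    rate = 5 if has_api_key else 1  # requests per second, as stated above
    return max_opinions * requests_per_opinion / rate


if __name__ == "__main__":
    print(min_runtime_seconds(1000, has_api_key=False))
    print(min_runtime_seconds(1000, has_api_key=True))
```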

## `courtListenerApiKey` (type: `string`):

Free API key from https://www.courtlistener.com/api/.

## Actor input object example

```json
{
  "courtIds": [
    "scotus"
  ],
  "dateFrom": "2024-01-01",
  "dateTo": "2024-06-30",
  "searchQuery": "",
  "maxOpinions": 50
}
```

# Actor output Schema

## `dataset` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "courtIds": [
        "scotus"
    ],
    "dateFrom": "2024-01-01",
    "dateTo": "2024-06-30",
    "searchQuery": "",
    "maxOpinions": 50
};

// Run the Actor and wait for it to finish
const run = await client.actor("devanshlive/courtlistener-rag-extractor").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "courtIds": ["scotus"],
    "dateFrom": "2024-01-01",
    "dateTo": "2024-06-30",
    "searchQuery": "",
    "maxOpinions": 50,
}

# Run the Actor and wait for it to finish
run = client.actor("devanshlive/courtlistener-rag-extractor").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "courtIds": [
    "scotus"
  ],
  "dateFrom": "2024-01-01",
  "dateTo": "2024-06-30",
  "searchQuery": "",
  "maxOpinions": 50
}' |
apify call devanshlive/courtlistener-rag-extractor --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=devanshlive/courtlistener-rag-extractor",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "CourtListener RAG Extractor",
        "description": "Extract SCOTUS and U.S. federal appeals opinions from CourtListener into normalized RAG-ready JSON with fixed-token chunks, metadata, citations, and summary fallback. Built for legal AI and litigation research pipelines. $0.03 per opinion.",
        "version": "0.1",
        "x-build-id": "Fh3ANaFJgtw7w4tSO"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/devanshlive~courtlistener-rag-extractor/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-devanshlive-courtlistener-rag-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/devanshlive~courtlistener-rag-extractor/runs": {
            "post": {
                "operationId": "runs-sync-devanshlive-courtlistener-rag-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/devanshlive~courtlistener-rag-extractor/run-sync": {
            "post": {
                "operationId": "run-sync-devanshlive-courtlistener-rag-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "dateFrom",
                    "dateTo"
                ],
                "properties": {
                    "courtIds": {
                        "title": "Court IDs",
                        "type": "array",
                        "description": "CourtListener court slugs. Empty means all 14 federal courts (SCOTUS + 13 appeals).",
                        "items": {
                            "type": "string",
                            "enum": [
                                "scotus",
                                "ca1",
                                "ca2",
                                "ca3",
                                "ca4",
                                "ca5",
                                "ca6",
                                "ca7",
                                "ca8",
                                "ca9",
                                "ca10",
                                "ca11",
                                "cadc",
                                "cafc"
                            ]
                        },
                        "default": []
                    },
                    "dateFrom": {
                        "title": "From date (YYYY-MM-DD)",
                        "type": "string",
                        "description": "Inclusive lower bound of date_filed.",
                        "default": "2024-01-01"
                    },
                    "dateTo": {
                        "title": "To date (YYYY-MM-DD)",
                        "type": "string",
                        "description": "Inclusive upper bound of date_filed.",
                        "default": "2024-06-30"
                    },
                    "searchQuery": {
                        "title": "Free-text search (optional)",
                        "type": "string",
                        "description": "CourtListener search syntax. Empty means no keyword filter.",
                        "default": ""
                    },
                    "maxOpinions": {
                        "title": "Max opinions per run",
                        "minimum": 1,
                        "maximum": 100000,
                        "type": "integer",
                        "description": "Hard cap on returned opinions. Unauthenticated runs use 1 request/sec. With API key, 5 requests/sec.",
                        "default": 50
                    },
                    "courtListenerApiKey": {
                        "title": "CourtListener API key (optional)",
                        "type": "string",
                        "description": "Free API key from https://www.courtlistener.com/api/."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
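
If you would rather not install a client library, the `run-sync-get-dataset-items` path from the specification above can be called with the Python standard library alone. This is a sketch under a few assumptions: the response body is JSON (the specification only declares a `200 OK`), the 300-second timeout is an arbitrary illustrative value, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.apify.com/v2"
RUN_SYNC_PATH = (
    "/acts/devanshlive~courtlistener-rag-extractor/run-sync-get-dataset-items"
)


def build_run_sync_request(token, actor_input):
    """Build the POST request; the token goes in the query string per the spec."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{BASE_URL}{RUN_SYNC_PATH}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sync(token, actor_input, timeout=300):
    """Send the request and parse the body (assumed to be JSON dataset items)."""
    request = build_run_sync_request(token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    request = build_run_sync_request(
        "<YOUR_API_TOKEN>",
        {"courtIds": ["scotus"], "dateFrom": "2024-01-01",
         "dateTo": "2024-06-30", "maxOpinions": 5},
    )
    print(request.get_method(), request.full_url)
```

For long runs, the asynchronous `/runs` path followed by a dataset fetch is likely a better fit than holding one HTTP connection open.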
