# Movie Script Finder & Extractor (`thescrapelab/screenplay-script-scraper`) Actor

Find publicly accessible movie scripts and screenplays, extract clean metadata, and output script text in separate chunk rows for research, indexing, and analysis.

- **URL**: https://apify.com/thescrapelab/screenplay-script-scraper.md
- **Developed by:** [Inus Grobler](https://apify.com/thescrapelab) (community)
- **Categories:** Developer tools, Automation, AI
- **Stats:** 3 total users, 2 monthly users, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

From $25.00 per 1,000 movie scripts

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, an API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation, available as a [Markdown index](https://docs.apify.com/llms.txt) and as [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Movie Script Finder & Extractor

`Movie Script Finder & Extractor` is an Apify Actor for finding publicly accessible screenplay pages, extracting clean movie script metadata, and writing script text as separate chunk rows instead of one huge field.

This Actor outputs script text in separate chunk rows. It does not place the entire script in one large field.

### Overview

- Public-only screenplay crawling
- Always low-memory by design
- Supports multiple scripts in a single run
- One metadata row per script
- Separate chunk rows for script text
- Cleaner output that omits unknown fields instead of filling rows with `null`

### Supported Sources

The Actor automatically tries all supported sources in this order:

1. `imsdb`
2. `dailyscript`
3. `simplyscripts`
4. `scriptslug`

Implementation status:

- `IMSDb`: fully implemented for index discovery, metadata extraction, and HTML script extraction
- `Daily Script`: implemented for HTML and TXT script extraction
- `SimplyScripts`: implemented for index discovery and metadata-first handling of HTML, TXT, PDF, and external links
- `Script Slug`: implemented for public metadata and PDF link extraction; PDF text extraction is not enabled in v1

### Input

The input is intentionally minimal. Use one of these:

- `movieName` for one best-match screenplay
- `searches` for multiple matching scripts, with chunk rows when public text is available

The public Store input intentionally exposes only those two fields so runs stay simple, fast, and predictable.

Quick examples:

```json
{
  "movieName": "The Matrix"
}
````

```json
{
  "searches": ["The Matrix", "Alien", "Christopher Nolan"]
}
```

The default Store example uses `movieName` only, so Apify's automated daily test gets a fast, non-empty result.

Key notes:

- You do not need to choose sources manually. The Actor uses all supported sources automatically.
- `movieName` returns the single top screenplay match for that movie title.
- `searches` returns multiple matching scripts, with chunk rows when public text is available and compact metadata rows when a source only exposes metadata.
- The Actor keeps defaults lightweight and low-memory automatically.
- If both fields are filled, `movieName` takes priority and the Actor logs a warning that `searches` was ignored.
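
The precedence rule above can be enforced on the caller's side before a run starts. The following is a minimal sketch, not part of the Actor: `build_input` is a hypothetical helper, and raising on empty input is this sketch's own choice rather than documented Actor behavior.

```python
from typing import Optional, Sequence


def build_input(movie_name: Optional[str] = None,
                searches: Optional[Sequence[str]] = None) -> dict:
    """Build an Actor input object that carries only one of the two fields."""
    name = (movie_name or "").strip()
    # Drop empty or whitespace-only search terms.
    terms = [s.strip() for s in (searches or []) if s and s.strip()]
    if name:
        # movieName takes priority, so searches would be ignored anyway.
        return {"movieName": name}
    if terms:
        return {"searches": terms}
    # Assumption: this helper treats empty input as a caller error.
    raise ValueError("Provide movie_name or at least one search term")


print(build_input(movie_name="The Matrix"))
print(build_input(searches=["The Matrix", "Alien"]))
```

Sending only one field keeps the run free of the "`searches` was ignored" warning.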

### Multiple Scripts Per Run

A single run can return multiple scripts: use `searches` to look up several movies or topics at once.

Use:

- `movieName` when you want one best-match screenplay with chunk rows
- `searches` when you want multiple script results in one run

Example:

```json
{
  "searches": ["The Matrix", "Alien", "Christopher Nolan"]
}
```

In that example, the Actor can return multiple distinct scripts in one run. If a public screenplay page is available, that script also gets its own `script_chunk` rows.
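
Because a `searches` run mixes rows from several scripts in one dataset, it can help to group them after download. The sketch below assumes that metadata and chunk rows of the same script share a `scriptId`, as in the examples in this README; `group_by_script` is a hypothetical helper name.

```python
from collections import defaultdict


def group_by_script(items):
    """Group dataset rows by scriptId into metadata and chunk lists."""
    grouped = defaultdict(lambda: {"metadata": None, "chunks": []})
    for row in items:
        script_id = row.get("scriptId")
        if script_id is None:
            # Assumption: rows without a scriptId (e.g. some error rows) are skipped.
            continue
        if row.get("type") == "script_metadata":
            grouped[script_id]["metadata"] = row
        elif row.get("type") == "script_chunk":
            grouped[script_id]["chunks"].append(row)
    return dict(grouped)


rows = [
    {"type": "script_metadata", "scriptId": "imsdb-the-matrix", "title": "The Matrix"},
    {"type": "script_chunk", "scriptId": "imsdb-the-matrix", "chunkIndex": 1},
    {"type": "script_metadata", "scriptId": "imsdb-alien", "title": "Alien"},
]
for script_id, parts in group_by_script(rows).items():
    print(script_id, len(parts["chunks"]))
```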

### Output Row Types

Every dataset row includes these base fields (the values below are illustrative):

```json
{
  "type": "script_metadata",
  "source": "imsdb",
  "scrapedAt": "2026-05-08T00:00:00.000Z"
}
```

The Actor emits four row types:

- `script_metadata`
- `script_chunk`
- `script_analysis`
- `error`

Unknown or unavailable values are omitted from success rows instead of being emitted as `null`.
For invalid or unsupported input URLs, `error` rows use `source: "unknown"`.

Default output behavior:

- `movieName` mode returns one script plus chunk rows
- `searches` mode returns multiple matching scripts
- chunk rows are included for matches with public script text
- metadata-only fallback rows stay compact when a source only exposes metadata or a public file link
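
A quick way to check what a run produced is to count rows per `type` and pull out the `error` rows. This is a small sketch over already-downloaded dataset items; `summarize_rows` is a hypothetical helper, and the only error-row fields it relies on are the base fields described above.

```python
from collections import Counter

# The four row types listed in this README.
KNOWN_TYPES = {"script_metadata", "script_chunk", "script_analysis", "error"}


def summarize_rows(items):
    """Return (counts per type, error rows, unexpected type names)."""
    counts = Counter(row.get("type", "missing") for row in items)
    errors = [row for row in items if row.get("type") == "error"]
    unexpected = sorted(set(counts) - KNOWN_TYPES)
    return counts, errors, unexpected


rows = [
    {"type": "script_metadata", "source": "imsdb"},
    {"type": "script_chunk", "source": "imsdb"},
    {"type": "script_chunk", "source": "imsdb"},
    {"type": "error", "source": "unknown"},
]
counts, errors, unexpected = summarize_rows(rows)
print(dict(counts), len(errors), unexpected)
```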

### Metadata Rows

One `script_metadata` row is written per script.

Typical fields include:

- `scriptId`
- `scriptUrl`
- `canonicalUrl` when different from `scriptUrl`
- `title`
- `writers`
- `genres`
- `scriptFormat`
- `hasScriptText`
- `chunkCount`
- `wordCount`
- `characterCount`
- `sceneCount`

Example:

```json
{
  "type": "script_metadata",
  "source": "imsdb",
  "scrapedAt": "2026-05-08T00:00:00.000Z",
  "scriptId": "imsdb-the-matrix",
  "scriptUrl": "https://imsdb.com/scripts/Matrix,-The.html",
  "title": "The Matrix",
  "writers": ["Larry Wachowski", "Andy Wachowski"],
  "genres": ["Action", "Sci-Fi", "Thriller"],
  "scriptFormat": "html",
  "hasScriptText": true,
  "chunkCount": 136,
  "wordCount": 23137,
  "characterCount": 143493,
  "sceneCount": 119
}
```

The metadata row never contains the full script text.
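
Since unknown values are omitted rather than emitted as `null`, consumers should treat most metadata fields as optional. The sketch below shows one way to do that; the `ScriptMetadata` class and `parse_metadata` function are hypothetical, and the assumption that `scriptId` and `source` are always present on metadata rows is this sketch's own.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ScriptMetadata:
    script_id: str
    source: str
    title: Optional[str] = None
    writers: List[str] = field(default_factory=list)
    genres: List[str] = field(default_factory=list)
    script_format: Optional[str] = None
    has_script_text: bool = False
    chunk_count: Optional[int] = None


def parse_metadata(row: dict) -> ScriptMetadata:
    """Convert a script_metadata row, tolerating omitted fields."""
    if row.get("type") != "script_metadata":
        raise ValueError("not a script_metadata row")
    return ScriptMetadata(
        script_id=row["scriptId"],  # assumed to be always present
        source=row["source"],
        title=row.get("title"),
        writers=list(row.get("writers", [])),
        genres=list(row.get("genres", [])),
        script_format=row.get("scriptFormat"),
        has_script_text=bool(row.get("hasScriptText", False)),
        chunk_count=row.get("chunkCount"),
    )


meta = parse_metadata({
    "type": "script_metadata",
    "source": "scriptslug",
    "scriptId": "example-id",  # hypothetical ID for illustration
    "title": "Example",
})
print(meta.title, meta.writers, meta.chunk_count)
```

Using `.get()` with defaults avoids `KeyError` on metadata-only results, where fields such as `chunkCount` may be absent.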

### Chunk Rows

When you use `movieName`, the Actor emits multiple `script_chunk` rows for that screenplay.

By default, the text is split into readable scene-style chunks instead of one giant script field.

Example first chunk:

```json
{
  "type": "script_chunk",
  "source": "imsdb",
  "scrapedAt": "2026-05-08T00:00:00.000Z",
  "scriptId": "imsdb-the-matrix",
  "scriptUrl": "https://imsdb.com/scripts/Matrix,-The.html",
  "title": "The Matrix",
  "chunkIndex": 1,
  "chunkMode": "scene",
  "chunkTitle": "Front Matter",
  "chunkText": "THE MATRIX\n\nWritten by Larry and Andy Wachowski ...",
  "chunkCharacterCount": 2823,
  "chunkWordCount": 447,
  "nextChunkIndex": 2
}
```

Example scene chunk:

```json
{
  "type": "script_chunk",
  "source": "imsdb",
  "scrapedAt": "2026-05-08T00:00:00.000Z",
  "scriptId": "imsdb-the-matrix",
  "scriptUrl": "https://imsdb.com/scripts/Matrix,-The.html",
  "title": "The Matrix",
  "chunkIndex": 2,
  "chunkMode": "scene",
  "chunkTitle": "INT. CHASE HOTEL - NIGHT",
  "sceneHeading": "INT. CHASE HOTEL - NIGHT",
  "chunkText": "INT. CHASE HOTEL - NIGHT\n... shortened placeholder text ...",
  "chunkCharacterCount": 964,
  "chunkWordCount": 161,
  "previousChunkIndex": 1,
  "nextChunkIndex": 3
}
```
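
If you need the chunks of one script back in reading order, sorting by `chunkIndex` is usually enough. The sketch below also checks that each chunk's `nextChunkIndex` points at the following chunk; that consistency rule and the blank-line separator are assumptions of this example, not documented guarantees, and `reassemble_script` is a hypothetical helper.

```python
def reassemble_script(chunks):
    """Join script_chunk rows of one script in chunkIndex order."""
    ordered = sorted(chunks, key=lambda c: c["chunkIndex"])
    for prev, cur in zip(ordered, ordered[1:]):
        # Assumption: nextChunkIndex, when present, names the following chunk.
        expected = prev.get("nextChunkIndex", cur["chunkIndex"])
        if expected != cur["chunkIndex"]:
            raise ValueError(f"Missing chunk after index {prev['chunkIndex']}")
    # Assumption: a blank line is an acceptable separator between chunks.
    return "\n\n".join(c["chunkText"] for c in ordered)


chunks = [
    {"chunkIndex": 2, "chunkText": "INT. CHASE HOTEL - NIGHT", "previousChunkIndex": 1},
    {"chunkIndex": 1, "chunkText": "THE MATRIX", "nextChunkIndex": 2},
]
print(reassemble_script(chunks))
```

Keep in mind the copyright notice further below before storing or redistributing reassembled text.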

### Analysis Rows

Advanced or internal runs may also include one lightweight `script_analysis` row per script.

Analysis is approximate and can include:

- `estimatedPageCount`
- `sceneHeadings`
- `topCharacters`
- `topLocations`
- `dialogueLineCount`
- `actionLineCount`
- `dialoguePercentageApprox`

Example:

```json
{
  "type": "script_analysis",
  "source": "imsdb",
  "scrapedAt": "2026-05-08T00:00:00.000Z",
  "scriptId": "imsdb-the-matrix",
  "scriptUrl": "https://imsdb.com/scripts/Matrix,-The.html",
  "title": "The Matrix",
  "wordCount": 23137,
  "characterCount": 143493,
  "estimatedPageCount": 129,
  "chunkCount": 136,
  "sceneCount": 119,
  "sceneHeadings": [
    "INT. CHASE HOTEL - NIGHT",
    "EXT. CHASE HOTEL - NIGHT"
  ],
  "topCharacters": [
    {
      "name": "MORPHEUS",
      "dialogueLineCount": 349,
      "approxWordCount": 1787
    }
  ],
  "topLocations": [
    {
      "location": "HALL",
      "count": 13
    }
  ],
  "dialogueLineCount": 1795,
  "actionLineCount": 1815,
  "dialoguePercentageApprox": 49.7
}
```
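
The README does not define how `dialoguePercentageApprox` is computed. One plausible reading, offered here only as an assumption, is the share of dialogue lines among dialogue plus action lines; that formula happens to reproduce the value in the example above.

```python
def dialogue_percentage(dialogue_lines: int, action_lines: int) -> float:
    """Share of dialogue lines among dialogue + action lines, in percent."""
    total = dialogue_lines + action_lines
    if total == 0:
        return 0.0
    return round(100 * dialogue_lines / total, 1)


# Values taken from the example row: 1795 dialogue lines, 1815 action lines.
print(dialogue_percentage(1795, 1815))  # 49.7
```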

### Runtime Behavior

The Actor always runs in low-memory mode.

Behavior:

- Uses a lightweight HTML crawler only
- Does not launch a browser
- Uses conservative retries
- Caps effective concurrency at `2`
- Pushes rows as soon as they are ready
- Avoids storing full scripts in metadata rows
- Avoids storing unknown values as `null` in success rows

### Pricing Note

If you monetize this Actor with Apify pay-per-event pricing, the intended simple setup is:

- one `movie_script` event per returned script
- optional very low-priced default dataset item pricing for row writes

Chunk rows are part of the same script result and are not meant to be priced as separate script-level units.
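
For a rough budget check, you can count distinct scripts in a finished run and multiply by the per-script price. The sketch below uses the "from $25.00 per 1,000 movie scripts" figure listed in the Pricing section as a default; actual charges depend on your plan and on how events are billed, so treat the result as an estimate. `estimate_script_cost` is a hypothetical helper.

```python
# Assumption: the listed starting price; your effective price may differ.
PRICE_PER_1000_SCRIPTS_USD = 25.00


def estimate_script_cost(items, price_per_1000=PRICE_PER_1000_SCRIPTS_USD):
    """Estimate cost from distinct scripts among script_metadata rows."""
    script_ids = {
        row["scriptId"]
        for row in items
        if row.get("type") == "script_metadata" and "scriptId" in row
    }
    # Chunk rows are not counted: they belong to the same script result.
    return len(script_ids) * price_per_1000 / 1000


rows = [
    {"type": "script_metadata", "scriptId": "imsdb-the-matrix"},
    {"type": "script_chunk", "scriptId": "imsdb-the-matrix", "chunkIndex": 1},
    {"type": "script_metadata", "scriptId": "imsdb-alien"},
]
print(f"${estimate_script_cost(rows):.3f}")
```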

### Run Locally

You can run this Actor on your own machine. It connects to your Apify account using the token in the `APIFY_TOKEN` environment variable.

1. Make sure `APIFY_TOKEN` is set in your shell.
2. Install dependencies:

```bash
npm install
```

3. Build the Actor:

```bash
npm run build
```

Optionally, for a quick CI-style validation of the Actor config and schemas, run:

```bash
npm test
```

4. Add your input to `storage/key_value_stores/default/INPUT.json`.

Example:

```json
{
  "movieName": "The Matrix"
}
```

5. Run locally:

```bash
apify run
```

If you want to deploy and run it in your Apify account:

```bash
apify login --token "$APIFY_TOKEN"
apify push
```

Then start a cloud run from the Apify Console or with:

```bash
apify call <actor-id>
```

### Performance Tips

- Use `movieName` when you want one full screenplay result
- Use `searches` when you want a lighter list of scripts
- Keep search phrases short and clear for best title matching

### Use Cases

- Public screenplay indexing
- Metadata enrichment
- Story structure analysis
- Writer study workflows
- Scene-level chunking for retrieval and annotation
- Cataloging public script collections

### Limitations

- V1 prioritizes public static HTML and TXT pages over difficult or inconsistent sources
- PDF text extraction is not enabled by default
- Script analysis is approximate and not screenplay-software accurate
- Some sources expose metadata pages that link to a separate script page; the Actor resolves those when possible
- Some public URLs are near-miss script paths; for IMSDb the Actor can recover some of these by matching against the public index
- Script Slug support is metadata-first in v1
- SimplyScripts external links are handled conservatively as metadata/link rows instead of full external-site crawling

### Legal And Ethical Scraping Notice

Movie scripts and screenplays may be copyrighted.

This Actor only accesses publicly available pages.

Users are responsible for ensuring their use complies with copyright law, website terms, robots.txt, and applicable regulations.

The Actor does not bypass logins, paywalls, CAPTCHAs, or access controls.

The Actor is intended for indexing, metadata extraction, research, and analysis workflows.

It is not a piracy or downloader tool.

### Troubleshooting

- If a movie title does not return the script you expect, try a more exact title
- If a source blocks or disallows crawling, the Actor skips or emits an error instead of bypassing protections
- If you see large PDF-only collections, expect metadata rows unless you later extend the Actor with explicit PDF extraction

# Actor input Schema

## `movieName` (type: `string`):

Enter one movie title to get the single best screenplay match with script chunks. If this is filled, it takes priority over `searches`.

## `searches` (type: `array`):

Enter several movie titles or search terms to find multiple matching scripts. When public script text is available, the Actor also returns `script_chunk` rows for each matched script. If a source only exposes metadata or a public file link, that result is still returned as a metadata row. If `movieName` is also filled, this list is ignored.

## Actor input object example

```json
{
  "movieName": "The Matrix"
}
```

# Actor output Schema

## `results` (type: `string`):

No description

## `debugHtml` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "movieName": "The Matrix"
};

// Run the Actor and wait for it to finish
const run = await client.actor("thescrapelab/screenplay-script-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "movieName": "The Matrix" }

# Run the Actor and wait for it to finish
run = client.actor("thescrapelab/screenplay-script-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "movieName": "The Matrix"
}' |
apify call thescrapelab/screenplay-script-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=thescrapelab/screenplay-script-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Movie Script Finder & Extractor",
        "description": "Find publicly accessible movie scripts and screenplays, extract clean metadata, and output script text in separate chunk rows for research, indexing, and analysis.",
        "version": "0.3",
        "x-build-id": "8i3agmprsyHnbulLR"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/thescrapelab~screenplay-script-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-thescrapelab-screenplay-script-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/thescrapelab~screenplay-script-scraper/runs": {
            "post": {
                "operationId": "runs-sync-thescrapelab-screenplay-script-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/thescrapelab~screenplay-script-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-thescrapelab-screenplay-script-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "movieName": {
                        "title": "Movie Name",
                        "type": "string",
                        "description": "Enter one movie title to get the single best screenplay match with script chunks. If this is filled, it takes priority over searches."
                    },
                    "searches": {
                        "title": "Search List",
                        "type": "array",
                        "description": "Enter several movie titles or search terms to find multiple matching scripts. When public script text is available, the actor also returns script_chunk rows for each matched script. If a source only exposes metadata or a public file link, that result is still returned as a metadata row. If movieName is also filled, this list will be ignored.",
                        "items": {
                            "type": "string"
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
