Movie Script Finder & Extractor

Pricing

from $25.00 per 1,000 movie scripts

Find publicly accessible movie scripts and screenplays, extract clean metadata, and output script text in separate chunk rows for research, indexing, and analysis.

Rating

0.0 (0)

Developer

Inus Grobler

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 3
  • Monthly active users: 2
  • Last modified: 7 days ago

Movie Script Finder & Extractor is an Apify Actor that finds publicly accessible screenplay pages, extracts clean movie script metadata, and writes script text as separate chunk rows instead of placing the entire script in one large field.

Overview

  • Public-only screenplay crawling
  • Always low-memory by design
  • Supports multiple scripts in a single run
  • One metadata row per script
  • Separate chunk rows for script text
  • Cleaner output that omits unknown fields instead of filling rows with null

Supported Sources

The Actor automatically tries all supported sources in this order:

  1. imsdb
  2. dailyscript
  3. simplyscripts
  4. scriptslug

Implementation status:

  • IMSDb: fully implemented for index discovery, metadata extraction, and HTML script extraction
  • Daily Script: implemented for HTML and TXT script extraction
  • SimplyScripts: implemented for index discovery and metadata-first handling of HTML, TXT, PDF, and external links
  • Script Slug: implemented for public metadata and PDF link extraction; PDF text extraction is not enabled in v1

Input

The input is intentionally minimal. Use one of these:

  • movieName for one best-match screenplay
  • searches for multiple matching scripts, with chunk rows when public text is available

The public Store input intentionally exposes only those two fields so runs stay simple, fast, and predictable.

Quick examples:

{
  "movieName": "The Matrix"
}

{
  "searches": ["The Matrix", "Alien", "Christopher Nolan"]
}

The default Store example uses movieName only, so Apify's automated daily test gets a fast, non-empty result.

Key notes:

  • You do not need to choose sources manually. The Actor uses all supported sources automatically.
  • movieName returns the single top screenplay match for that movie title.
  • searches returns multiple matching scripts, with chunk rows when public text is available and compact metadata rows when a source only exposes metadata.
  • The Actor keeps defaults lightweight and low-memory automatically.
  • If both fields are filled, movieName takes priority and the Actor logs a warning that searches was ignored.
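The precedence rule in the notes above can be sketched as follows. This is a minimal illustration in Python, not the Actor's actual code: the function name resolve_input, the returned tuple shape, and the warning text are all hypothetical.

```python
from typing import Any


def resolve_input(raw: dict[str, Any]) -> tuple[str, list[str], list[str]]:
    """Return (mode, queries, warnings) for a raw input object.

    Hypothetical helper: mirrors the documented rule that movieName
    takes priority over searches when both fields are filled.
    """
    warnings: list[str] = []
    movie_name = (raw.get("movieName") or "").strip()
    # Ignore empty or whitespace-only search phrases.
    searches = [s.strip() for s in raw.get("searches") or [] if s and s.strip()]

    if movie_name:
        if searches:
            warnings.append("searches was ignored because movieName is set")
        return "movieName", [movie_name], warnings
    if searches:
        return "searches", searches, warnings
    raise ValueError("Provide either movieName or searches")
```

With both fields filled, the sketch returns the movieName mode and a single warning, which matches the behavior described in the last note.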

Multiple Scripts Per Run

Use searches to look up multiple movies or topics in one run.

Use:

  • movieName when you want one best-match screenplay with chunk rows
  • searches when you want multiple script results in one run

Example:

{
  "searches": ["The Matrix", "Alien", "Christopher Nolan"]
}

In that example, the Actor can return multiple distinct scripts in one run. If a public screenplay page is available, that script also gets its own script_chunk rows.

Output Row Types

Every dataset row includes:

{
  "type": "script_metadata",
  "source": "imsdb",
  "scrapedAt": "2026-05-08T00:00:00.000Z"
}

The Actor emits four row types:

  • script_metadata
  • script_chunk
  • script_analysis
  • error

Unknown or unavailable values are omitted from success rows instead of being emitted as null. For invalid or unsupported input URLs, error rows use source: "unknown".
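The "omit instead of null" convention can be illustrated with a short Python sketch. The helper name compact_row is hypothetical and this is not the Actor's own code; it only shows the idea of dropping unknown values before a row is written.

```python
def compact_row(row: dict) -> dict:
    """Drop keys whose value is None so unknown fields are omitted, not null.

    Hypothetical helper. Falsy but known values such as 0 or an empty
    list are kept, because they carry information.
    """
    return {key: value for key, value in row.items() if value is not None}
```

On the consumer side, this means code should read optional fields with a default, for example row.get("writers", []), rather than assuming every key is present.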

By default:

  • movieName mode returns one script plus chunk rows
  • searches mode returns multiple matching scripts
  • chunk rows are included for matches with public script text
  • metadata-only fallback rows stay compact when a source only exposes metadata or a public file link
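Because one run can mix rows for several scripts, a consumer usually needs to regroup them. The Python sketch below is one possible way to do that on downloaded dataset items. The function name group_rows_by_script is hypothetical, and it assumes that rows without a scriptId (for example some error rows) can be skipped.

```python
from collections import defaultdict


def group_rows_by_script(rows):
    """Group dataset rows by scriptId, separating metadata from chunks."""
    scripts = defaultdict(lambda: {"metadata": None, "chunks": []})
    for row in rows:
        script_id = row.get("scriptId")
        if script_id is None:
            continue  # assumption: rows without a scriptId are not script data
        if row.get("type") == "script_metadata":
            scripts[script_id]["metadata"] = row
        elif row.get("type") == "script_chunk":
            scripts[script_id]["chunks"].append(row)
    return dict(scripts)
```

A script that only produced a metadata-only fallback row ends up with an empty chunks list, which makes the two cases easy to tell apart.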

Metadata Rows

One script_metadata row is written per script.

Typical fields include:

  • scriptId
  • scriptUrl
  • canonicalUrl when different from scriptUrl
  • title
  • writers
  • genres
  • scriptFormat
  • chunkCount
  • wordCount
  • characterCount
  • sceneCount

Example:

{
  "type": "script_metadata",
  "source": "imsdb",
  "scrapedAt": "2026-05-08T00:00:00.000Z",
  "scriptId": "imsdb-the-matrix",
  "scriptUrl": "https://imsdb.com/scripts/Matrix,-The.html",
  "title": "The Matrix",
  "writers": ["Larry Wachowski", "Andy Wachowski"],
  "genres": ["Action", "Sci-Fi", "Thriller"],
  "scriptFormat": "html",
  "hasScriptText": true,
  "chunkCount": 136,
  "wordCount": 23137,
  "characterCount": 143493,
  "sceneCount": 119
}

The metadata row never contains the full script text.

Chunk Rows

When you use movieName, the Actor emits multiple script_chunk rows for that screenplay.

By default, the text is split into readable scene-style chunks instead of one giant script field.

Example first chunk:

{
  "type": "script_chunk",
  "source": "imsdb",
  "scrapedAt": "2026-05-08T00:00:00.000Z",
  "scriptId": "imsdb-the-matrix",
  "scriptUrl": "https://imsdb.com/scripts/Matrix,-The.html",
  "title": "The Matrix",
  "chunkIndex": 1,
  "chunkMode": "scene",
  "chunkTitle": "Front Matter",
  "chunkText": "THE MATRIX\n\nWritten by Larry and Andy Wachowski ...",
  "chunkCharacterCount": 2823,
  "chunkWordCount": 447,
  "nextChunkIndex": 2
}

Example scene chunk:

{
  "type": "script_chunk",
  "source": "imsdb",
  "scrapedAt": "2026-05-08T00:00:00.000Z",
  "scriptId": "imsdb-the-matrix",
  "scriptUrl": "https://imsdb.com/scripts/Matrix,-The.html",
  "title": "The Matrix",
  "chunkIndex": 2,
  "chunkMode": "scene",
  "chunkTitle": "INT. CHASE HOTEL - NIGHT",
  "sceneHeading": "INT. CHASE HOTEL - NIGHT",
  "chunkText": "INT. CHASE HOTEL - NIGHT\n... shortened placeholder text ...",
  "chunkCharacterCount": 964,
  "chunkWordCount": 161,
  "previousChunkIndex": 1,
  "nextChunkIndex": 3
}
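
If you need the chunk rows back in reading order, the chunkIndex and nextChunkIndex fields shown above are enough to do it. The Python sketch below is illustrative only: the function name reassemble_script and the blank-line separator are assumptions, and the Actor itself does not ship this helper.

```python
def reassemble_script(rows, script_id, separator="\n\n"):
    """Join the script_chunk rows of one script in chunkIndex order.

    Raises ValueError if a chunk's nextChunkIndex does not point at the
    chunk that follows it, which would indicate a missing row.
    """
    chunks = sorted(
        (
            row
            for row in rows
            if row.get("type") == "script_chunk" and row.get("scriptId") == script_id
        ),
        key=lambda row: row["chunkIndex"],
    )
    # Check that consecutive chunks link to each other.
    for current, following in zip(chunks, chunks[1:]):
        if current.get("nextChunkIndex") != following["chunkIndex"]:
            raise ValueError(f"Gap after chunk {current['chunkIndex']}")
    return separator.join(row["chunkText"] for row in chunks)
```

The continuity check is optional, but it is a cheap way to notice a partially downloaded dataset before indexing the text.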

Analysis Rows

Advanced or internal runs may also include one lightweight script_analysis row per script.

Analysis is approximate and can include:

  • estimatedPageCount
  • sceneHeadings
  • topCharacters
  • topLocations
  • dialogueLineCount
  • actionLineCount
  • dialoguePercentageApprox

Example:

{
  "type": "script_analysis",
  "source": "imsdb",
  "scrapedAt": "2026-05-08T00:00:00.000Z",
  "scriptId": "imsdb-the-matrix",
  "scriptUrl": "https://imsdb.com/scripts/Matrix,-The.html",
  "title": "The Matrix",
  "wordCount": 23137,
  "characterCount": 143493,
  "estimatedPageCount": 129,
  "chunkCount": 136,
  "sceneCount": 119,
  "sceneHeadings": [
    "INT. CHASE HOTEL - NIGHT",
    "EXT. CHASE HOTEL - NIGHT"
  ],
  "topCharacters": [
    {
      "name": "MORPHEUS",
      "dialogueLineCount": 349,
      "approxWordCount": 1787
    }
  ],
  "topLocations": [
    {
      "location": "HALL",
      "count": 13
    }
  ],
  "dialogueLineCount": 1795,
  "actionLineCount": 1815,
  "dialoguePercentageApprox": 49.7
}
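
The example values are consistent with dialoguePercentageApprox being the share of dialogue lines among dialogue plus action lines: 1795 / (1795 + 1815) is about 49.7%. The formula is not stated anywhere in this description, so treat the Python sketch below as an assumption that happens to reproduce the example, not as the Actor's definition. The function name dialogue_percentage is hypothetical.

```python
def dialogue_percentage(dialogue_lines: int, action_lines: int) -> float:
    """Share of dialogue lines among dialogue + action lines, in percent.

    Assumed formula, rounded to one decimal place.
    """
    total = dialogue_lines + action_lines
    if total == 0:
        return 0.0  # avoid division by zero for empty scripts
    return round(100 * dialogue_lines / total, 1)
```

Calling dialogue_percentage(1795, 1815) gives 49.7, matching the analysis row above.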

Runtime Behavior

The Actor always runs in low-memory mode.

Behavior:

  • Uses a lightweight HTML crawler only
  • Does not launch a browser
  • Uses conservative retries
  • Caps effective concurrency at 2
  • Pushes rows as soon as they are ready
  • Avoids storing full scripts in metadata rows
  • Avoids storing unknown values as null in success rows

Pricing Note

If you monetize this Actor with Apify pay-per-event pricing, the intended simple setup is:

  • one movie_script event per returned script
  • optional very low-priced default dataset item pricing for row writes

Chunk rows are part of the same script result and are not meant to be priced as separate script-level units.
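
As a rough illustration of that setup, the Python sketch below counts script-level events from a list of dataset rows. It assumes one movie_script event per distinct scriptId among script_metadata rows, and it uses the listed starting price of $25.00 per 1,000 scripts as a default. Both are assumptions for illustration, the function name estimate_script_cost is hypothetical, and actual billing is determined by Apify and may include other charges.

```python
def estimate_script_cost(rows, price_per_thousand=25.00):
    """Return (script_count, estimated_cost_usd) for a run's rows.

    Assumes one movie_script event per distinct scriptId among
    script_metadata rows; chunk rows add no script-level events.
    """
    script_ids = {
        row["scriptId"]
        for row in rows
        if row.get("type") == "script_metadata" and "scriptId" in row
    }
    cost = round(len(script_ids) * price_per_thousand / 1000, 4)
    return len(script_ids), cost
```

Under these assumptions, a run that returns two scripts and any number of chunk rows counts as two script-level events.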

Run Locally

You can run this Actor on your own machine and authenticate with your Apify account through the APIFY_TOKEN environment variable.

  1. Make sure APIFY_TOKEN is set in your shell.
  2. Install dependencies:

npm install

  3. Build the Actor:

npm run build

For a quick CI-style validation of the actor config and schemas:

npm test

  4. Add your input to storage/key_value_stores/default/INPUT.json.

Example:

{
  "movieName": "The Matrix"
}

  5. Run locally:

apify run

If you want to deploy and run it in your Apify account:

apify login --token "$APIFY_TOKEN"
apify push

Then start a cloud run from the Apify Console or with:

apify call <actor-id>

Performance Tips

  • Use movieName when you want one full screenplay result
  • Use searches when you want a lighter list of scripts
  • Keep search phrases short and clear for best title matching

Use Cases

  • Public screenplay indexing
  • Metadata enrichment
  • Story structure analysis
  • Writer study workflows
  • Scene-level chunking for retrieval and annotation
  • Cataloging public script collections

Limitations

  • V1 prioritizes public static HTML and TXT pages over difficult or inconsistent sources
  • PDF text extraction is not enabled by default
  • Script analysis is approximate and not screenplay-software accurate
  • Some sources expose metadata pages that link to a separate script page; the Actor resolves those when possible
  • Some public URLs are near-miss script paths; for IMSDb the Actor can recover some of these by matching against the public index
  • Script Slug support is metadata-first in v1
  • SimplyScripts external links are handled conservatively as metadata/link rows instead of full external-site crawling

Movie scripts and screenplays may be copyrighted.

This Actor only accesses publicly available pages.

Users are responsible for ensuring their use complies with copyright law, website terms, robots.txt, and applicable regulations.

The Actor does not bypass logins, paywalls, CAPTCHAs, or access controls.

The Actor is intended for indexing, metadata extraction, research, and analysis workflows.

It is not a piracy or downloader tool.

Troubleshooting

  • If a movie title does not return the script you expect, try a more exact title
  • If a source blocks or disallows crawling, the Actor skips or emits an error instead of bypassing protections
  • If you see large PDF-only collections, expect metadata rows unless you later extend the Actor with explicit PDF extraction