Movie Script Finder & Extractor
Pricing
from $25.00 / 1,000 movie scripts
Find publicly accessible movie scripts and screenplays, extract clean metadata, and output script text in separate chunk rows for research, indexing, and analysis.
Developer
Inus Grobler
Maintained by Community
Movie Script Finder & Extractor is an Apify Actor for finding publicly accessible screenplay pages, extracting clean movie script metadata, and writing script text as separate chunk rows instead of one huge field.
This Actor outputs script text in separate chunk rows. It does not place the entire script in one large field.
Overview
- Public-only screenplay crawling
- Always low-memory by design
- Supports multiple scripts in a single run
- One metadata row per script
- Separate chunk rows for script text
- Cleaner output that omits unknown fields instead of filling rows with null
Supported Sources
The Actor automatically tries all supported sources in this order:
- imsdb
- dailyscript
- simplyscripts
- scriptslug
Implementation status:
- IMSDb: fully implemented for index discovery, metadata extraction, and HTML script extraction
- Daily Script: implemented for HTML and TXT script extraction
- SimplyScripts: implemented for index discovery and metadata-first handling of HTML, TXT, PDF, and external links
- Script Slug: implemented for public metadata and PDF link extraction; PDF text extraction is not enabled in v1
Input
The input is intentionally minimal. Use one of these:
- movieName for one best-match screenplay
- searches for multiple matching scripts, with chunk rows when public text is available
The public Store input intentionally exposes only those two fields so runs stay simple, fast, and predictable.
Quick examples:
{"movieName": "The Matrix"}
{"searches": ["The Matrix", "Alien", "Christopher Nolan"]}
The default Store example uses movieName only, so Apify's automated daily test gets a fast, non-empty result.
Key notes:
- You do not need to choose sources manually. The Actor uses all supported sources automatically.
- movieName returns the single top screenplay match for that movie title.
- searches returns multiple matching scripts, with chunk rows when public text is available and compact metadata rows when a source only exposes metadata.
- The Actor keeps defaults lightweight and low-memory automatically.
- If both fields are filled, movieName takes priority and the Actor logs a warning that searches was ignored.
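The precedence rule in the notes above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not the Actor's actual code: the function name resolve_queries, the logger name, and the error raised for empty input are all assumptions.

```python
import logging

logger = logging.getLogger("script_finder_sketch")  # hypothetical logger name


def resolve_queries(actor_input: dict) -> tuple[str, list[str]]:
    """Return (mode, queries) using the documented movieName-over-searches rule."""
    movie_name = (actor_input.get("movieName") or "").strip()
    # Drop empty or whitespace-only search phrases.
    searches = [s.strip() for s in (actor_input.get("searches") or []) if s and s.strip()]

    if movie_name:
        if searches:
            # Documented behaviour: movieName wins and searches is ignored with a warning.
            logger.warning("Both fields are filled: using movieName, ignoring searches")
        return "movieName", [movie_name]
    if searches:
        return "searches", searches
    # Assumption: how the real Actor treats empty input is not documented here.
    raise ValueError("Provide either movieName or searches")
```

For example, passing both fields yields the movieName mode with a single query, which mirrors the warning case described above.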
Multiple Scripts Per Run
Yes. Use searches to look up multiple movies or topics in one run.
Use:
- movieName when you want one best-match screenplay with chunk rows
- searches when you want multiple script results in one run
Example:
{"searches": ["The Matrix", "Alien", "Christopher Nolan"]}
In that example, the Actor can return multiple distinct scripts in one run. If a public screenplay page is available, that script also gets its own script_chunk rows.
Output Row Types
Every dataset row includes:
{"type": "script_metadata","source": "imsdb","scrapedAt": "2026-05-08T00:00:00.000Z"}
The Actor emits four row types:
- script_metadata
- script_chunk
- script_analysis
- error
Unknown or unavailable values are omitted from success rows instead of being emitted as null.
For invalid or unsupported input URLs, error rows use source: "unknown".
The default output is:
- movieName mode returns one script plus chunk rows
- searches mode returns multiple matching scripts
- chunk rows are included for matches with public script text
- metadata-only fallback rows stay compact when a source only exposes metadata or a public file link
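As a rough illustration of the row conventions described in this section (three common fields on every row, unknown values omitted instead of written as null), a row builder could look like the sketch below. The helper name build_row is hypothetical and the Actor's real implementation may differ.

```python
from datetime import datetime, timezone


def build_row(row_type: str, source: str, **fields) -> dict:
    """Build a dataset row with the common fields, skipping unknown (None) values."""
    scraped_at = (
        datetime.now(timezone.utc)
        .isoformat(timespec="milliseconds")
        .replace("+00:00", "Z")  # match the "...000Z" style used in the examples
    )
    row = {"type": row_type, "source": source, "scrapedAt": scraped_at}
    # Omit unknown values entirely rather than emitting them as null.
    row.update({key: value for key, value in fields.items() if value is not None})
    return row
```

Calling build_row("script_metadata", "imsdb", title="The Matrix", canonicalUrl=None) produces a row with no canonicalUrl key at all.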
Metadata Rows
One script_metadata row is written per script.
Typical fields include:
- scriptId
- scriptUrl
- canonicalUrl when different from scriptUrl
- title
- writers
- genres
- scriptFormat
- chunkCount
- wordCount
- characterCount
- sceneCount
Example:
{"type": "script_metadata","source": "imsdb","scrapedAt": "2026-05-08T00:00:00.000Z","scriptId": "imsdb-the-matrix","scriptUrl": "https://imsdb.com/scripts/Matrix,-The.html","title": "The Matrix","writers": ["Larry Wachowski", "Andy Wachowski"],"genres": ["Action", "Sci-Fi", "Thriller"],"scriptFormat": "html","hasScriptText": true,"chunkCount": 136,"wordCount": 23137,"characterCount": 143493,"sceneCount": 119}
The metadata row never contains the full script text.
Chunk Rows
When you use movieName, the Actor emits multiple script_chunk rows for that screenplay.
By default, the text is split into readable scene-style chunks instead of one giant script field.
Example first chunk:
{"type": "script_chunk","source": "imsdb","scrapedAt": "2026-05-08T00:00:00.000Z","scriptId": "imsdb-the-matrix","scriptUrl": "https://imsdb.com/scripts/Matrix,-The.html","title": "The Matrix","chunkIndex": 1,"chunkMode": "scene","chunkTitle": "Front Matter","chunkText": "THE MATRIX\\n\\nWritten by Larry and Andy Wachowski ...","chunkCharacterCount": 2823,"chunkWordCount": 447,"nextChunkIndex": 2}
Example scene chunk:
{"type": "script_chunk","source": "imsdb","scrapedAt": "2026-05-08T00:00:00.000Z","scriptId": "imsdb-the-matrix","scriptUrl": "https://imsdb.com/scripts/Matrix,-The.html","title": "The Matrix","chunkIndex": 2,"chunkMode": "scene","chunkTitle": "INT. CHASE HOTEL - NIGHT","sceneHeading": "INT. CHASE HOTEL - NIGHT","chunkText": "INT. CHASE HOTEL - NIGHT\\n... shortened placeholder text ...","chunkCharacterCount": 964,"chunkWordCount": 161,"previousChunkIndex": 1,"nextChunkIndex": 3}
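If you need the full text of a script again on the consumer side, the chunk rows can be grouped by scriptId and ordered by chunkIndex. The sketch below assumes you have already loaded the dataset rows into a list of dicts; the function name reassemble_scripts and the blank-line separator between chunks are illustrative choices, not part of the Actor.

```python
from collections import defaultdict


def reassemble_scripts(rows: list[dict]) -> dict[str, str]:
    """Join script_chunk rows back into one text per scriptId."""
    chunks_by_script: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        # Metadata, analysis, and error rows carry no chunkText, so skip them.
        if row.get("type") == "script_chunk":
            chunks_by_script[row["scriptId"]].append(row)

    return {
        script_id: "\n\n".join(
            chunk["chunkText"]
            for chunk in sorted(chunks, key=lambda c: c["chunkIndex"])
        )
        for script_id, chunks in chunks_by_script.items()
    }
```

Sorting by chunkIndex means the result does not depend on the order in which rows were written to the dataset.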
Analysis Rows
Advanced or internal runs may also include one lightweight script_analysis row per script.
Analysis is approximate and can include:
- estimatedPageCount
- sceneHeadings
- topCharacters
- topLocations
- dialogueLineCount
- actionLineCount
- dialoguePercentageApprox
Example:
{"type": "script_analysis","source": "imsdb","scrapedAt": "2026-05-08T00:00:00.000Z","scriptId": "imsdb-the-matrix","scriptUrl": "https://imsdb.com/scripts/Matrix,-The.html","title": "The Matrix","wordCount": 23137,"characterCount": 143493,"estimatedPageCount": 129,"chunkCount": 136,"sceneCount": 119,"sceneHeadings": ["INT. CHASE HOTEL - NIGHT","EXT. CHASE HOTEL - NIGHT"],"topCharacters": [{"name": "MORPHEUS","dialogueLineCount": 349,"approxWordCount": 1787}],"topLocations": [{"location": "HALL","count": 13}],"dialogueLineCount": 1795,"actionLineCount": 1815,"dialoguePercentageApprox": 49.7}
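The example values above are consistent with dialoguePercentageApprox being the share of dialogue lines among dialogue plus action lines, rounded to one decimal. That formula is an inference from the sample numbers, not something this README states, so treat the sketch below as an assumption; the function name is hypothetical.

```python
def dialogue_percentage(dialogue_lines: int, action_lines: int) -> float:
    """Assumed formula: dialogue lines as a percentage of dialogue + action lines."""
    total = dialogue_lines + action_lines
    if total == 0:
        return 0.0  # avoid division by zero for empty scripts
    return round(100 * dialogue_lines / total, 1)


# Sample counts from the example row: 1795 dialogue lines, 1815 action lines.
print(dialogue_percentage(1795, 1815))  # 49.7
```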
Runtime Behavior
The Actor always runs in low-memory mode.
Behavior:
- Uses a lightweight HTML crawler only
- Does not launch a browser
- Uses conservative retries
- Caps effective concurrency at 2
- Pushes rows as soon as they are ready
- Avoids storing full scripts in metadata rows
- Avoids storing unknown values as null in success rows
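To make two of the behaviours above concrete (a concurrency cap of 2 and pushing each row as soon as it is ready), here is a generic asyncio sketch. It is not the Actor's implementation: crawl_all, fetch, and push_row are hypothetical names, and the real Actor uses its own crawler.

```python
import asyncio
from typing import Awaitable, Callable, Iterable


async def crawl_all(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[dict]],      # hypothetical page fetcher
    push_row: Callable[[dict], Awaitable[None]],  # hypothetical dataset writer
    max_concurrency: int = 2,
) -> None:
    """Fetch pages with at most max_concurrency requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(url: str) -> None:
        async with semaphore:  # at most max_concurrency fetches run at once
            row = await fetch(url)
        # Push immediately instead of buffering all rows in memory.
        await push_row(row)

    await asyncio.gather(*(worker(url) for url in urls))
```

Pushing inside each worker keeps memory use flat, because no script has to wait in a buffer for the others to finish.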
Pricing Note
If you monetize this Actor with Apify pay-per-event pricing, the intended simple setup is:
- one movie_script event per returned script
- optional very low-priced default dataset item pricing for row writes
Chunk rows are part of the same script result and are not meant to be priced as separate script-level units.
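As a back-of-the-envelope check, the listed starting price of $25.00 per 1,000 movie scripts works out to $0.025 per returned script. The sketch below assumes one movie_script event per script at that rate and leaves out any optional dataset item pricing; the constant and function name are illustrative.

```python
from decimal import Decimal

# Listed starting price; the rate that applies to your account may differ.
PRICE_PER_1000_SCRIPTS = Decimal("25.00")


def estimate_script_cost(returned_scripts: int) -> Decimal:
    """Estimate movie_script event charges in USD for a run."""
    cost = PRICE_PER_1000_SCRIPTS * returned_scripts / 1000
    return cost.quantize(Decimal("0.001"))


print(estimate_script_cost(3))  # 0.075
```

Chunk rows do not change this estimate, since they belong to the same script result.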
Run Locally
You can run this Actor on your own machine and use your Apify account from the APIFY_TOKEN environment variable.
- Make sure APIFY_TOKEN is set in your shell.
- Install dependencies:
$ npm install
- Build the Actor:
$ npm run build
For a quick CI-style validation of the Actor config and schemas:
$ npm test
- Add your input to storage/key_value_stores/default/INPUT.json.
Example:
{"movieName": "The Matrix"}
- Run locally:
$ apify run
If you want to deploy and run it in your Apify account:
$ apify login --token "$APIFY_TOKEN"
$ apify push
Then start a cloud run from the Apify Console or with:
$ apify call <actor-id>
Performance Tips
- Use movieName when you want one full screenplay result
- Use searches when you want a lighter list of scripts
- Keep search phrases short and clear for best title matching
Use Cases
- Public screenplay indexing
- Metadata enrichment
- Story structure analysis
- Writer study workflows
- Scene-level chunking for retrieval and annotation
- Cataloging public script collections
Limitations
- V1 prioritizes public static HTML and TXT pages over difficult or inconsistent sources
- PDF text extraction is not enabled by default
- Script analysis is approximate and not screenplay-software accurate
- Some sources expose metadata pages that link to a separate script page; the Actor resolves those when possible
- Some public URLs are near-miss script paths; for IMSDb the Actor can recover some of these by matching against the public index
- Script Slug support is metadata-first in v1
- SimplyScripts external links are handled conservatively as metadata/link rows instead of full external-site crawling
Legal And Ethical Scraping Notice
Movie scripts and screenplays may be copyrighted.
This Actor only accesses publicly available pages.
Users are responsible for ensuring their use complies with copyright law, website terms, robots.txt, and applicable regulations.
The Actor does not bypass logins, paywalls, CAPTCHAs, or access controls.
The Actor is intended for indexing, metadata extraction, research, and analysis workflows.
It is not a piracy or downloader tool.
Troubleshooting
- If a movie title does not return the script you expect, try a more exact title
- If a source blocks or disallows crawling, the Actor skips or emits an error instead of bypassing protections
- If you see large PDF-only collections, expect metadata rows unless you later extend the Actor with explicit PDF extraction