TMDB Scraper: Movies, TV Shows, Cast & Episodes

Pricing

from $0.99 / 1,000 results

Scrape TMDB movie data, TV show metadata, actor profiles, cast and crew, seasons, episodes, ratings, posters, trailers, keywords, social links, and recommendations.

Rating: 0.0 (0 reviews)

Developer: Inus Grobler (maintained by Community)

Actor stats: 1 bookmarked, 13 total users, 2 monthly active users, last modified 7 days ago

TMDB Scraper: Movie, TV Show, Cast, Season and Episode Data

This TMDB scraper extracts movie data, TV show metadata, actor profiles, cast and crew details, seasons, episodes, ratings, posters, trailers, keywords, social links, and recommendations from public pages on The Movie Database (TMDB). It is designed for media teams, data analysts, content apps, and entertainment research workflows.

The Actor is built for users who need structured entertainment data without setting up a crawler. Provide TMDB search terms or public TMDB URLs, choose the record types you want, and export the results from the Apify dataset.

Main Use Cases

  • Build movie, TV show, actor, cast, and episode datasets.
  • Enrich streaming, catalog, media intelligence, or recommendation products.
  • Collect title metadata for research, lead lists, content planning, or QA.
  • Monitor selected TMDB title, person, season, search, or list pages.
  • Scrape TV season and episode lists for one exact series.
  • Discover likely titles from a natural-language prompt when AI mode is configured.

Data Extracted

Depending on the page type, records can include:

  • TMDB ID, TMDB URL, title or name, description, and source search term.
  • Movie release date, runtime, certification, status, budget, revenue, directors, cast, genres, keywords, ratings, posters, backdrops, trailer URL, social links, collection, and recommendations.
  • TV first air date, last air date, networks, status, type, season count, episode count, genres, ratings, posters, backdrops, trailer URL, and social links.
  • Person name, biography, known-for department, known credits, gender, birthday, deathday, place of birth, aliases, profile image, and social links.
  • Season number, season title, release date, episode count, poster URL, and parent show references.
  • Episode title, episode number, air date, runtime, score, overview, still image, and parent show references.

Dataset records use these entityType values: movie, tv, person, season, and episode.
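If you process the exported results in Python, a simple first step is to split the records by entityType. The sketch below assumes each item is a plain dict as returned from the Apify dataset; group_by_entity_type is a hypothetical helper written for this page, not part of the Actor, and the sample records are illustrative only.

```python
from collections import defaultdict
from typing import Any, Iterable


def group_by_entity_type(items: Iterable[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Split dataset records into one list per entityType value."""
    groups: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for item in items:
        # Records without an entityType are kept under "unknown" instead of being dropped.
        groups[item.get("entityType", "unknown")].append(item)
    return dict(groups)


if __name__ == "__main__":
    # Illustrative sample records (hypothetical, trimmed to a few fields).
    sample = [
        {"entityType": "movie", "tmdbId": 27205, "title": "Inception"},
        {"entityType": "person", "name": "Tom Hanks"},
        {"entityType": "movie", "title": "Another movie"},
    ]
    for entity_type, records in group_by_entity_type(sample).items():
        print(entity_type, len(records))
```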

Input Configuration

Use one or both of these input methods:

  • searchTerms: movie titles, TV show titles, or person names to search on TMDB.
  • startUrls: direct public TMDB URLs for movies, TV shows, people, seasons, or search pages. Some dynamic browse/list pages do not expose result links in static HTML.
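Because detail pages give more predictable output than dynamic browse pages, it can help to sort start URLs before a run. The sketch below is a hypothetical, path-based heuristic (classify_tmdb_url is not part of the Actor); it assumes TMDB's public URL layout of /movie/<id>, /tv/<id>, /person/<id>, /tv/<id>/season/<n>, and /search..., and reports bare browse paths such as /movie as "other".

```python
from urllib.parse import urlparse


def classify_tmdb_url(url: str) -> str:
    """Guess the page type of a TMDB URL from its path (heuristic only)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if host != "themoviedb.org" and not host.endswith(".themoviedb.org"):
        return "not-tmdb"
    parts = [part for part in parsed.path.split("/") if part]
    if not parts:
        return "other"
    head = parts[0]
    # /tv/<id>/season/<n> is a season page; deeper paths (e.g. episodes) are "other".
    if head == "tv" and len(parts) >= 3 and parts[2] == "season":
        return "season" if len(parts) == 4 else "other"
    # Detail pages need an ID segment; a bare /movie or /tv is a browse page.
    if head in ("movie", "tv", "person") and len(parts) >= 2:
        return head
    if head == "search":
        return "search"
    return "other"
```

URLs that come back as "other" are good candidates to replace with search terms or direct detail URLs.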

Important controls:

  • searchTypes: choose movie, tv, person, or a combination.
  • maxResultsPerSearchTerm: how many search results to follow for each term. Higher values can find more records but may also add less relevant pages.
  • maxItems: maximum number of unique dataset records to save. This is a ceiling, not a guarantee, because duplicate TMDB results are skipped.
  • scrapeFullCast: visit full cast and crew pages and save linked people as separate records.
  • scrapeEpisodes: visit TV season pages and save season and episode records.
  • strictExactSeriesMatch: TV-series only. Use exact TV-title matching when you want one specific show; this does not affect movie or person searches.
  • language: TMDB language code added to crawled URLs (for example, ?language=en-US). en-US usually gives the most complete labels.
  • maxConcurrency: number of TMDB pages fetched at the same time. The default was chosen to keep costs low, based on runs measured at 256 MB of memory; lower it if rate-limit retries appear.
  • maxRequestRetries: retries per failed or blocked page. Lower values reduce wasted compute on unavailable pages.
  • proxyConfiguration: Apify Proxy is recommended for reliability. Crawling without a proxy may be cheaper for small tests but can hit rate limits on larger runs.
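Several of these controls interact, so a quick local check before starting a run can catch common mistakes. The sketch below encodes only the rules stated above (use searchTerms and/or startUrls, searchTypes limited to movie, tv, and person, strictExactSeriesMatch applying only to TV searches, maxItems as a positive ceiling); check_run_input is a hypothetical helper, not part of the Actor, and it does not account for AI mode.

```python
from typing import Any

VALID_SEARCH_TYPES = {"movie", "tv", "person"}


def check_run_input(run_input: dict[str, Any]) -> list[str]:
    """Return warnings for common input mistakes; an empty list means none were found."""
    warnings: list[str] = []
    if not run_input.get("searchTerms") and not run_input.get("startUrls"):
        warnings.append("Provide searchTerms, startUrls, or both.")

    search_types = set(run_input.get("searchTypes", []))
    unknown = search_types - VALID_SEARCH_TYPES
    if unknown:
        warnings.append(f"Unknown searchTypes: {sorted(unknown)}")

    # strictExactSeriesMatch only affects TV-series searches.
    if run_input.get("strictExactSeriesMatch") and search_types and "tv" not in search_types:
        warnings.append("strictExactSeriesMatch has no effect unless searchTypes includes 'tv'.")

    max_items = run_input.get("maxItems")
    if max_items is not None and (not isinstance(max_items, int) or max_items < 1):
        warnings.append("maxItems should be a positive integer.")
    return warnings
```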

AI-assisted search is optional. When enableAiSearch is set to true, the Actor can turn aiQuery into likely TMDB titles before crawling, provided the Actor owner has configured AI credentials.

Example Input

{
  "searchTerms": ["Inception", "Breaking Bad", "Tom Hanks"],
  "searchTypes": ["movie", "tv", "person"],
  "maxResultsPerSearchTerm": 3,
  "maxItems": 25,
  "language": "en-US",
  "maxConcurrency": 5,
  "maxRequestRetries": 2,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

Example TV Episodes Input

{
  "searchTerms": ["Friends"],
  "searchTypes": ["tv"],
  "strictExactSeriesMatch": true,
  "scrapeEpisodes": true,
  "maxResultsPerSearchTerm": 1,
  "maxItems": 300,
  "language": "en-US"
}
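After an episode run, you may want the episodes arranged by season. The sketch below is a hypothetical post-processing step: the field names seasonNumber and episodeNumber are assumptions, not documented names, so check your dataset and pass the real keys if they differ. If episode records only reference their season through a parent reference, this grouping needs adjusting.

```python
from typing import Any, Iterable


def episodes_by_season(
    items: Iterable[dict[str, Any]],
    season_key: str = "seasonNumber",    # hypothetical field name; verify in your dataset
    episode_key: str = "episodeNumber",  # hypothetical field name; verify in your dataset
) -> dict[int, list[dict[str, Any]]]:
    """Group episode records by season number and sort each season by episode number."""
    seasons: dict[int, list[dict[str, Any]]] = {}
    for item in items:
        if item.get("entityType") != "episode":
            continue  # skip tv, season, person, and movie records
        season = item.get(season_key)
        # Episodes without a season value are collected under -1.
        seasons.setdefault(season if season is not None else -1, []).append(item)
    for episodes in seasons.values():
        episodes.sort(key=lambda episode: episode.get(episode_key) or 0)
    return dict(sorted(seasons.items()))
```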

Example Output

{
  "entityType": "movie",
  "tmdbId": 27205,
  "sourceSearchTerm": "Inception",
  "title": "Inception",
  "description": "Cobb, a skilled thief who commits corporate espionage by infiltrating the subconscious of his targets...",
  "releaseDate": "2010-07-15",
  "releaseYear": 2010,
  "duration": 148,
  "ratingAverage": 8.4,
  "ratingCount": 38000,
  "genres": ["Action", "Science Fiction", "Adventure"],
  "directors": ["Christopher Nolan"],
  "topCast": ["Leonardo DiCaprio", "Joseph Gordon-Levitt", "Elliot Page"],
  "posterUrl": "https://media.themoviedb.org/t/p/original/...",
  "trailerUrl": "https://www.youtube.com/watch?v=...",
  "tmdbUrl": "https://www.themoviedb.org/movie/27205?language=en-US"
}
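If you load movie records into typed code, a small wrapper makes missing fields explicit. The sketch below uses only field names from the example output above; MovieRecord is a hypothetical class written for this page, and reading duration as minutes is an inference from the Inception example (148), not a documented unit.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class MovieRecord:
    """Typed view of a movie dataset record, based on the example output fields."""
    tmdb_id: int
    title: str
    release_year: Optional[int] = None
    duration_minutes: Optional[int] = None  # assumed unit: minutes
    rating_average: Optional[float] = None
    genres: list[str] = field(default_factory=list)
    directors: list[str] = field(default_factory=list)
    tmdb_url: Optional[str] = None

    @classmethod
    def from_item(cls, item: dict[str, Any]) -> "MovieRecord":
        if item.get("entityType") != "movie":
            raise ValueError(f"expected a movie record, got {item.get('entityType')!r}")
        # Not every TMDB page has every field, so optional fields fall back to None or [].
        return cls(
            tmdb_id=item["tmdbId"],
            title=item["title"],
            release_year=item.get("releaseYear"),
            duration_minutes=item.get("duration"),
            rating_average=item.get("ratingAverage"),
            genres=list(item.get("genres") or []),
            directors=list(item.get("directors") or []),
            tmdb_url=item.get("tmdbUrl"),
        )
```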

How To Run On Apify

  1. Open the Actor on Apify.
  2. Add search terms or TMDB URLs in the Input tab.
  3. Set maxItems to the number of records you want.
  4. Keep full cast and episode scraping off unless you need those expanded records.
  5. Start the run and open the Dataset tab when it finishes.

For large runs, use narrower search terms, keep maxResultsPerSearchTerm modest, and use strictExactSeriesMatch only when scraping episodes for one exact TV series.

Exporting Results

Results are saved to the default Apify dataset while the Actor runs. You can export them from the Dataset tab in JSON, CSV, Excel, XML, or RSS formats. You can also access results through the Apify API.
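The Dataset tab already exports CSV. If you instead fetch items through the API (for example with iterate_items() from the apify-client library) and want a CSV with only selected columns, list fields such as genres need flattening first. The sketch below is a hypothetical helper (items_to_csv) using Python's standard csv module; the "; " separator is an arbitrary choice.

```python
import csv
import io
from typing import Any, Iterable


def items_to_csv(items: Iterable[dict[str, Any]], columns: list[str]) -> str:
    """Write selected fields of dataset items to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for item in items:
        row = {}
        for column in columns:
            value = item.get(column)
            # Join list fields (genres, directors, topCast) into one cell; missing fields become empty cells.
            row[column] = "; ".join(map(str, value)) if isinstance(value, list) else value
        writer.writerow(row)
    return buffer.getvalue()
```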

Python API Example

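from apify_client import ApifyClient

# Replace with your own Apify API token.
client = ApifyClient("YOUR_APIFY_TOKEN")

run_input = {
    "searchTerms": ["Inception", "Breaking Bad"],
    "searchTypes": ["movie", "tv"],
    "maxResultsPerSearchTerm": 3,
    "maxItems": 20,
    "language": "en-US",
}

# Start the Actor and wait for the run to finish.
run = client.actor("thescrapelab/Apify-tmdb-scraper").call(run_input=run_input)
if run is None:
    raise RuntimeError("The Actor run did not return run details.")

# Read the saved records from the run's default dataset.
dataset = client.dataset(run["defaultDatasetId"])
for item in dataset.iterate_items():
    print(item)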

Limits And Caveats

  • This Actor scrapes public TMDB web pages and is not endorsed or certified by TMDB.
  • TMDB page structure can change, which may affect some fields until the scraper is updated.
  • Some browse pages, such as dynamic popular-list pages, may not expose item links in the static HTML returned to this HTTP crawler. Use search terms or direct detail URLs when you need predictable output.
  • maxItems is a maximum saved-record count. Duplicate search results, empty pages, or strict filters can produce fewer records.
  • Field coverage is usually best with language: "en-US" because some metadata labels are language-dependent.
  • Full cast and episode scraping can create many linked requests. Use maxItems to keep runs controlled.
  • Large runs without a proxy may receive temporary rate limits. Use Apify Proxy or lower maxConcurrency if that happens.
  • AI mode depends on owner-configured AI credentials and should be treated as title discovery, not guaranteed recommendation accuracy.

Troubleshooting

The run returned fewer records than maxItems. This usually means search results overlapped, filters skipped some entity types, or there were fewer matching TMDB pages than the limit.

The run is slower than expected. Reduce maxResultsPerSearchTerm, disable full cast or episode scraping, or keep maxConcurrency near the default. If retry logs appear, lower concurrency.

Some fields are empty. Not every TMDB page contains every field, and some fields depend on the selected language.

The run shows blocked or 429 retry messages. Use Apify Proxy, reduce maxConcurrency, or narrow the input. Repeated retries increase cost without adding useful results.

I only want one TV show and its episodes. Use searchTypes: ["tv"], strictExactSeriesMatch: true, scrapeEpisodes: true, and maxResultsPerSearchTerm: 1.

Pricing

The Actor is best suited to pay-per-result pricing: each saved dataset item is worth roughly the same to the user, and results are pushed to the dataset as they are scraped. Small test runs stay inexpensive because the recommended event price is charged per dataset item, plus a very small Actor-start event if one is configured.

Measured runs during the June 2026 optimization showed that small and medium direct HTTP runs are very low cost, with peak memory well under 256 MB. Larger direct runs may become more expensive if rate limits cause retries, so proxy settings and conservative concurrency are recommended for reliability.

Recommended pricing: keep apify-default-dataset-item as the primary event at about $0.001 per saved result, keep the Actor-start event near $0.00005, and charge a separate AI event only when AI title discovery is enabled.
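The event arithmetic can be sketched as saved items × item price + runs × start price. The defaults below are the recommended prices from this section; the Store listing shows "from $0.99 / 1,000 results", so pass the actual prices if they differ. estimate_event_cost is a hypothetical helper, and it ignores any AI event.

```python
def estimate_event_cost(
    saved_items: int,
    runs: int = 1,
    price_per_item: float = 0.001,     # recommended dataset-item event price (USD)
    price_per_start: float = 0.00005,  # recommended Actor-start event price (USD)
) -> float:
    """Estimate pay-per-result event charges in USD, excluding any AI event."""
    return saved_items * price_per_item + runs * price_per_start
```

At these prices, one run that saves 1,000 results comes to about $1.00 in event charges.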

FAQ

Can I scrape TMDB movie data without a TMDB API key?

Yes. This Actor scrapes public TMDB web pages and does not require a TMDB API key from the user.

Does this scrape movies, TV shows, and people?

Yes. Use searchTypes to choose movies, TV shows, people, or any combination.

Can it scrape cast and crew?

Yes. Top cast is included on movie and TV detail records. Enable scrapeFullCast to visit full cast and crew pages and save linked people as separate records.

Can it scrape TV seasons and episodes?

Yes. Enable scrapeEpisodes. For best results, use exact TV-series matching with strictExactSeriesMatch.

Can I start from direct TMDB URLs?

Yes. Add movie, TV, person, season, or search URLs to startUrls. For dynamic browse pages, use search terms or direct detail URLs when the page returns no results.

Is this an official TMDB product?

No. This Actor uses public TMDB pages but is not endorsed or certified by TMDB.

How do I keep costs low?

Use specific search terms, keep maxResultsPerSearchTerm modest, set maxItems, avoid full cast and episode scraping unless needed, and keep concurrency conservative when rate limits appear.