IMDb Scraper

1 day trial then $50.00/month - No credit card required now

dtrungtin/imdb-scraper

Free IMDb API to extract and download data on movies, TV shows, video games, and other listings from IMDb. Delivers custom machine-readable IMDb datasets containing all information on your selected listings.

Features

Our free IMDb Scraper enables you to extract and download data about movies, video games, TV shows, streaming content, and personalities from IMDb.com.

Why scrape IMDb?

Launched over 30 years ago, IMDb now contains over 8 million titles and over 10 million personalities. It is the most comprehensive film and TV show database in the world.

IMDb datasets are frequently used to train AI models and recommendation systems, and several different datasets are freely available to download. You should also check out the IMDb developer page if you're interested in accessing data products directly from IMDb.

However, if your specific use case means that you need to scrape IMDb, our free IMDb Scraper effectively creates an unofficial IMDb API and gives you an alternative way to access live IMDb data directly from the website.

Tutorial

Our Beginner's Guide to Web Scraping explains how to get started with web scraping. If you skip ahead to the section on Web Scraping with Apify, you'll find a quick guide on how to use IMDb Scraper to scrape data about The Queen's Gambit. If you still have questions on how to use the scraper, just email support@apify.com.

Cost of usage

Note that, because of the startup time, it is much more efficient to run one longer scrape (at least one minute) than several shorter ones.

Based on our experience, you can get 1,000 results for as little as $0.30 if you run the IMDb Scraper on the Apify platform.
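
As a rough illustration of the arithmetic, the sketch below turns the quoted figure into a lower-bound estimate. It assumes the "as little as $0.30 per 1,000 results" rate scales linearly, which the text does not promise; the helper name `estimateMinimumCostUsd` is hypothetical and not part of the actor.

```javascript
// Assumption: $0.30 per 1,000 results is the best-case figure quoted above,
// so the value returned here is a lower bound, not a price quote.
const BEST_CASE_USD_PER_1000_RESULTS = 0.30;

// Hypothetical helper: estimates the minimum platform cost for a run.
function estimateMinimumCostUsd(resultCount) {
    if (!Number.isInteger(resultCount) || resultCount < 0) {
        throw new RangeError('resultCount must be a non-negative integer');
    }
    const cost = (resultCount / 1000) * BEST_CASE_USD_PER_1000_RESULTS;
    // Round to whole cents to hide floating-point noise.
    return Math.round(cost * 100) / 100;
}

console.log(estimateMinimumCostUsd(25000)); // 7.5
```

Actual costs depend on run length, proxy usage, and how many short runs are started, so treat the result as a floor rather than a forecast.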

Input

| Field | Type | Description | Default value |
| --- | --- | --- | --- |
| startUrls | array | List of request objects that will be deeply crawled. The URL can be top level, such as https://www.imdb.com/search/title/ or https://www.imdb.com/find?q=bond, or a detail URL, such as https://www.imdb.com/title/tt7286456. | [{ "url": "https://www.imdb.com/search/title/" }] |
| maxItems | number | Maximum number of actor pages that will be scraped | all found |
| extendOutputFunction | string | Function that takes a Cheerio handle ($) as argument and returns data that will be merged with the result output. More information in Extend output function | |
| proxyConfiguration | object | Proxy settings of the run. If you have access to Apify Proxy, leave the default settings. If not, you can set { "useApifyProxy": false } to disable proxy usage | { "useApifyProxy": true } |
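
To make the fields above concrete, here is a minimal sketch that assembles an input object from the documented defaults and checks the listed types. The helper `buildInput` is hypothetical and not part of the actor; the rules that `startUrls` must be non-empty and `maxItems` a positive integer are assumptions rather than documented constraints.

```javascript
// Defaults taken from the input description above.
const DEFAULT_INPUT = {
    startUrls: [{ url: 'https://www.imdb.com/search/title/' }],
    proxyConfiguration: { useApifyProxy: true },
};

// Hypothetical helper (not part of the actor): merges overrides into the
// documented defaults and checks the types listed for each field.
function buildInput(overrides = {}) {
    const input = { ...DEFAULT_INPUT, ...overrides };
    // Assumption: an empty startUrls list is treated as an error.
    if (!Array.isArray(input.startUrls) || input.startUrls.length === 0) {
        throw new TypeError('startUrls must be a non-empty array of request objects');
    }
    for (const request of input.startUrls) {
        // The URL constructor throws a TypeError if the URL is malformed.
        new URL(request.url);
    }
    // Assumption: maxItems, when set, should be a positive integer.
    if (input.maxItems !== undefined
        && !(Number.isInteger(input.maxItems) && input.maxItems > 0)) {
        throw new TypeError('maxItems must be a positive integer when set');
    }
    // The function is passed as source text, not as a function value.
    if (input.extendOutputFunction !== undefined
        && typeof input.extendOutputFunction !== 'string') {
        throw new TypeError('extendOutputFunction must be passed as a string');
    }
    return input;
}

const input = buildInput({
    startUrls: [{ url: 'https://www.imdb.com/find?q=bond' }],
    maxItems: 50,
});
console.log(JSON.stringify(input, null, 2));
```

Leaving `maxItems` out keeps the documented default of scraping everything that is found.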

Output

The IMDb Scraper output is stored in a dataset. Each item contains information about a movie, TV show, or other IMDb listing. For example:

{
  "title": "Turandot (1981)",
  "original title": "",
  "runtime": "",
  "certificate": "",
  "year": "1981",
  "rating": "",
  "ratingcount": "",
  "description": "",
  "stars": "Montserrat Caballé, Rémy Corazza, Fernand Dumont",
  "director": "Pierre Desfons",
  "genre": "Musical",
  "country": "France",
  "url": "https://www.imdb.com/title/tt0834174"
}
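
Every value in the example item is a string, and missing values appear as empty strings. The sketch below shows one possible way to convert such an item into typed values after downloading the dataset. The helper `normalizeItem` and its output field names are hypothetical, and stripping commas from numeric fields is an assumption about how counts might be formatted.

```javascript
// Sample item with fields copied from the example output above.
const rawItem = {
    title: 'Turandot (1981)',
    year: '1981',
    rating: '',
    ratingcount: '',
    stars: 'Montserrat Caballé, Rémy Corazza, Fernand Dumont',
    genre: 'Musical',
    url: 'https://www.imdb.com/title/tt0834174',
};

// Converts a numeric string to a number; empty or unparsable values become null.
function toNumberOrNull(value) {
    // Assumption: counts may contain thousands separators such as "1,234".
    const text = String(value ?? '').replace(/,/g, '').trim();
    if (text === '') return null;
    const parsed = Number(text);
    return Number.isNaN(parsed) ? null : parsed;
}

// Hypothetical helper: turns the string-only record into typed values.
function normalizeItem(raw) {
    return {
        title: raw.title,
        year: toNumberOrNull(raw.year),
        rating: toNumberOrNull(raw.rating),
        ratingCount: toNumberOrNull(raw.ratingcount),
        // Split the comma-separated cast list into an array of names.
        stars: raw.stars ? raw.stars.split(',').map((name) => name.trim()) : [],
        genre: raw.genre || null,
        url: raw.url,
    };
}

const normalized = normalizeItem(rawItem);
console.log(normalized.year, normalized.rating, normalized.stars.length);
```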

Extend output function

You can use this function to update the result output of this actor. This function gets a Cheerio handle $ as an argument so you can choose what data from the page you want to scrape. The output from this function will get merged with the result output.

The return value of this function has to be an object!

You can return fields to achieve three different things:

  • Add a new field - Return an object with a field that is not in the result output
  • Change a field - Return an existing field with a new value
  • Remove a field - Return an existing field with the value undefined
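
($) => {
    return {
        // New field: text of the story line block on the detail page
        "story line": $('#titleStoryLine div p span').text().trim(),
        // Existing field: overwrite the scraped value
        "original title": "NA",
        // Existing field: undefined removes it from the output
        url: undefined
    }
}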

This example will add a new field, story line, change the original title field, and remove the url field:

{
  "title": "Turandot (1981)",
  "story line": "",
  "original title": "NA",
  "runtime": "",
  "certificate": "",
  "year": "1981",
  "rating": "",
  "ratingcount": "",
  "description": "",
  "stars": "Montserrat Caballé, Rémy Corazza, Fernand Dumont",
  "director": "Pierre Desfons",
  "genre": "Musical",
  "country": "France"
}
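
The add, change, and remove rules described in this section can be sketched as a plain object merge. This is a hypothetical re-implementation for illustration only; the function name `mergeExtendedOutput` is invented here and the actor's actual merge code may differ.

```javascript
// Hypothetical sketch of the merge rules described above.
function mergeExtendedOutput(result, extension) {
    // The documentation requires the return value to be an object.
    if (extension === null || typeof extension !== 'object' || Array.isArray(extension)) {
        throw new TypeError('extendOutputFunction must return an object');
    }
    // New keys are added and existing keys are overwritten.
    const merged = { ...result, ...extension };
    // Keys explicitly set to undefined are removed from the output.
    for (const [key, value] of Object.entries(merged)) {
        if (value === undefined) delete merged[key];
    }
    return merged;
}

const scraped = {
    title: 'Turandot (1981)',
    'original title': '',
    url: 'https://www.imdb.com/title/tt0834174',
};
const extended = mergeExtendedOutput(scraped, {
    'story line': '',
    'original title': 'NA',
    url: undefined,
});
console.log(extended);
```

Returning a string, an array, or null would be rejected in this sketch, which mirrors the requirement that the return value be an object.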

Changelog

IMDb Scraper is regularly updated, so please check the changelog for fixes and improvements.

Developer
Maintained by Community
Actor metrics
  • 17 monthly users
  • 98.1% runs succeeded
  • 12.6 days response time
  • Created in Oct 2019
  • Modified 2 months ago