Fast Scraper is a blazingly fast web scraper powered by Rust on the backend. It lets you scrape static CSFD (csfd.cz) HTML pages extremely quickly, without rendering, while using only 128 MB of memory. With this scraper, you can maximize the efficiency of your credits on Apify. 🚀🚀🚀

For benchmarks, see https://apify.com/danielherman/fast-scraper.

Explanation of the input

The actor has some global parameters, described in the Input tab, and a list of requests. Each request has the following structure:

{
    "request_type": string, // required
    "url": string, // optional
    "id": string, // optional
    "headers": object, // optional
    "user-agent": string // optional
}

Only request_type is required. If request_type=Sitemap, the url is ignored; for any other request_type the url must be provided, otherwise the actor will panic. The id key is optional and is copied to the results; use it if you want to track requests by something other than the url. The order of scraped data in the results will most likely differ from the order of the requests. Both headers and user-agent are optional, and you can also set the user-agent directly in headers. Request-level headers and user-agent override the global headers and user-agent. Let's see an example:

{
    "requests": [
        {
            "request_type": "View",
            "url": "https://www.csfd.cz/film/68990-star-trek-hluboky-vesmir-devet/494608-serie-6/prehled/",
            "headers": {
                "dnt": "0",
                "priority": "u=0, i",
                "referer": "https://www.csfd.cz/"
            }
        }
    ],
    "headers": {
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
        "accept-language": "en-US,en;q=0.9",
        "dnt": "1",
        "priority": "u=0, i",
        "sec-ch-ua": "\"Chromium\";v=\"124\", \"Google Chrome\";v=\"124\", \"Not-A.Brand\";v=\"99\"",
        "sec-ch-ua-mobile": "?0",
        "sec-ch-ua-platform": "\"macOS\"",
        "sec-fetch-dest": "document",
        "sec-fetch-mode": "navigate",
        "sec-fetch-site": "none",
        "sec-fetch-user": "?1",
        "upgrade-insecure-requests": "1"
    },
    "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "force_cloud": false,
    "push_data_size": 500,
    "max_concurrency": 10,
    "max_request_retries": 3,
    "max_request_retry_timeout_ms": 10000,
    "request_retry_wait_ms": 5000
}

Here the request is sent with all the global headers and the global user_agent, except that the "dnt" (Do Not Track) header is changed from 1 to 0 and a "referer" header is added. The request-level "priority" header has the same value as the global one, so it changes nothing. Once you set global headers, you cannot delete them at the request level, only override them.
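
The override rule described above can be illustrated with a minimal Python sketch. It is not the actor's Rust implementation; the function name effective_headers and the lowercase comparison of header names are assumptions made for this example.

```python
from typing import Optional


def effective_headers(
    global_headers: dict[str, str],
    request_headers: Optional[dict[str, str]] = None,
) -> dict[str, str]:
    """Merge global and per-request headers; request values win on conflict."""
    # Assumption: header names are compared case-insensitively (lowercased).
    merged = {name.lower(): value for name, value in global_headers.items()}
    for name, value in (request_headers or {}).items():
        merged[name.lower()] = value
    return merged


# Shortened versions of the headers from the example input above.
global_headers = {"accept-language": "en-US,en;q=0.9", "dnt": "1", "priority": "u=0, i"}
request_headers = {"dnt": "0", "priority": "u=0, i", "referer": "https://www.csfd.cz/"}

result = effective_headers(global_headers, request_headers)
print(result["dnt"])      # 0
print(result["referer"])  # https://www.csfd.cz/
print(sorted(result))     # ['accept-language', 'dnt', 'priority', 'referer']
```

Note that a merge like this can only add or replace keys, which matches the rule that global headers cannot be deleted at the request level.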

Supported page types

The following request_type values exist:

Sitemap request_type=Sitemap: scrapes all URLs in the published csfd.cz sitemap. This can take a while, and it is done with a single request specification.
View request_type=View: returns information from the overview pages (/prehled) of movies, serials, series and episodes.
View reviews request_type=ViewReviews: returns the comments for a specific movie, serial, series or episode.
User request_type=User: returns information about the user.
User reviews request_type=UserReviews: returns the reviews written by the user.
Ratings request_type=Ratings: returns all the user ratings for a specific movie, serial, series or episode. PLANNED
Creator request_type=Creator: returns information about the creator. PLANNED
Program request_type=Program: returns the parsed TV program from https://www.csfd.cz/televize/program/. PLANNED

At the moment, only the View page type is supported. The rating and comment types will be added soon. You can now also scrape the whole sitemap with this scraper.

Sitemap

Make sure the actor's timeout is long enough (e.g. 3600 s). The sitemap is not scraped in parallel.

Input example:

{
    "requests": [
        {
            "request_type": "Sitemap"
        }
    ],
    "user_agent": "ApifyFastScraper/1.0",
    "force_cloud": false,
    "push_data_size": 500,
    "max_concurrency": 10,
    "max_request_retries": 3,
    "max_request_retry_timeout_ms": 10000,
    "request_retry_wait_ms": 5000
}

It fetches the whole published sitemap of csfd.cz, which also contains URLs under:

https://www.csfd.cz/film/
https://www.csfd.cz/tvurce/
https://www.csfd.cz/uzivatel/
https://www.csfd.cz/diskuze/
https://www.csfd.cz/akce/
https://www.csfd.cz/festival/
https://www.csfd.cz/kino/
https://www.csfd.cz/novinky/
https://www.csfd.cz/zanry/

Output example:

[
    {
        "id": "06de9c9d-b17f-44aa-a9f3-e87a6769fffd",
        "request_type": "Sitemap",
        "url": "https://www.csfd.cz/sitemap.xml",
        "data": {
            "Sitemap": [
                "https://www.csfd.cz/film/16-zurov/231-zurov-2/prehled/",
                "https://www.csfd.cz/film/16-zurov/703683-zurov/prehled/",
                "https://www.csfd.cz/film/16-zurov/703684-teorema-lobacevskogo/prehled/"
            ]
        }
    }
]
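
A Sitemap result can be fed back into the actor as a list of View requests. The sketch below is one possible way to do that in Python, assuming the output shape shown above; the function name view_requests_from_sitemap and the choice of reusing the URL as the id are hypothetical, and the /tvurce/ URL in the sample is added only to show the filtering.

```python
def view_requests_from_sitemap(item: dict) -> list[dict]:
    """Turn the /film/ URLs of one Sitemap result into View requests."""
    urls = item["data"]["Sitemap"]
    return [
        # Assumption: the URL doubles as the id so results can be matched later.
        {"request_type": "View", "url": url, "id": url}
        for url in urls
        if url.startswith("https://www.csfd.cz/film/")
    ]


# Shortened sample in the shape of the output example above.
sitemap_item = {
    "request_type": "Sitemap",
    "url": "https://www.csfd.cz/sitemap.xml",
    "data": {
        "Sitemap": [
            "https://www.csfd.cz/film/16-zurov/231-zurov-2/prehled/",
            "https://www.csfd.cz/film/16-zurov/703683-zurov/prehled/",
            "https://www.csfd.cz/tvurce/4060-mike-newell/",  # illustrative non-film URL
        ]
    },
}

requests = view_requests_from_sitemap(sitemap_item)
print(len(requests))               # 2
print(requests[0]["request_type"])  # View
```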

Views (film + prehled)

Pages of the type https://www.csfd.cz/film/<movie-id>/, https://www.csfd.cz/film/<movie-id>/<movie-id2>/, https://www.csfd.cz/film/<movie-id>/prehled/ or https://www.csfd.cz/film/<movie-id>/<movie-id2>/prehled/.

Set request_type=View. Input example:

{
    "requests": [
        {
            "request_type": "View",
            "url": "https://www.csfd.cz/film/17592-ctyri-svatby-a-jeden-pohreb/prehled/"
        }
    ],
    "user_agent": "ApifyFastScraper/1.0",
    "force_cloud": false,
    "push_data_size": 500,
    "max_concurrency": 10,
    "max_request_retries": 3,
    "max_request_retry_timeout_ms": 10000,
    "request_retry_wait_ms": 5000
}

The following two requests are equivalent:

"requests": [
    {
        "request_type": "View",
        "url": "https://www.csfd.cz/film/17592-ctyri-svatby-a-jeden-pohreb/prehled/"
    },
    {
        "request_type": "View",
        "url": "https://www.csfd.cz/film/17592/prehled/"
    }
]
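
Since the two URLs above are equivalent, it can be useful to reduce a URL to its numeric ids before deduplicating requests. The following Python sketch assumes that the leading number of each path segment after /film/ identifies the page; the function name film_ids is hypothetical.

```python
import re
from urllib.parse import urlparse


def film_ids(url: str) -> list[str]:
    """Return the numeric ids found in the path segments after /film/."""
    segments = urlparse(url).path.strip("/").split("/")
    if not segments or segments[0] != "film":
        return []
    ids = []
    for segment in segments[1:]:
        # A segment is either "<id>" or "<id>-<slug>"; stop at the first other segment.
        match = re.match(r"(\d+)(?:-|$)", segment)
        if not match:
            break
        ids.append(match.group(1))
    return ids


long_url = "https://www.csfd.cz/film/17592-ctyri-svatby-a-jeden-pohreb/prehled/"
short_url = "https://www.csfd.cz/film/17592/prehled/"
print(film_ids(long_url))   # ['17592']
print(film_ids(short_url))  # ['17592']
```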

Output example:

{
  "View": {
    "header_name": "Čtyři svatby a jeden pohřeb",
    "header_name_langs": [
      {
        "country": "Velká Británie",
        "title": "Four Weddings and a Funeral(více)"
      },
      {
        "country": "USA",
        "title": "Four Weddings and a Funeral"
      },
      {
        "country": "Slovensko",
        "title": "Štyri svadby a jeden pohreb(méně)"
      }
    ],
    "rating": "72%",
    "rating_votes_count": 14484,
    "rating_fanklub_count": 45,
    "origin": "Velká Británie / USA, 1994, 117 min(Alternativní 113 min)",
    "plot_full": "Snímek vypráví příběh Charlese (Hugh Grant), vtipného a okouzlujícího muže, který ve svých dvaatřiceti letech stále střídá partnerky jako na běžícím pásu. Jeho životem prošla spousta žen, které zbožňoval, ale s žádnou z nich nedokázal navázat hlubší vztah. Rezervovaný Angličan vystavěl kolem vlastního nitra tak nepropustnou zeď, že nyní nedokáže projevit své city. A čím více svateb společně se svými kamarády navštíví, tím méně se sám hrne do ženění. Až do oné osudné soboty, kdy v jednom kostele spatří Carrie (Andie MacDowellová) – tu nejzajímavější, nejkrásnější, nejdůvtipnější a také nejnedostupnější Američanku, jakou kdy v životě potkal. Charles se ze všech sil snaží, aby ji příliš neuháněl a hlavně se do ní nezamiloval - během jednoho pohřbu a tří dalších svateb…(Cinemax)",
    "genres": [
      "Komedie",
      "Romantický",
      "Drama"
    ],
    "creators": [
      {
        "name": "režie",
        "people": [
          {
            "name": "Mike Newell",
            "url": "/tvurce/4060-mike-newell/"
          }
        ]
      },
      {
        "name": "scénář",
        "people": [
          {
            "name": "Richard Curtis",
            "url": "/tvurce/6726-richard-curtis/"
          }
        ]
      },
      {
        "name": "kamera",
        "people": [
          {
            "name": "Michael Coulter",
            "url": "/tvurce/75908-michael-coulter/"
          }
        ]
      },
      {
        "name": "hudba",
        "people": [
          {
            "name": "Richard Rodney Bennett",
            "url": "/tvurce/63995-richard-rodney-bennett/"
          }
        ]
      },
      {
        "name": "hrají",
        "people": [
          {
            "name": "Hugh Grant",
            "url": "/tvurce/332-hugh-grant/"
          },
          {
            "name": "Andie MacDowell",
            "url": "/tvurce/130-andie-macdowell/"
          },
          {
            "name": "James Fleet",
            "url": "/tvurce/17860-james-fleet/"
          },
          {
            "name": "Simon Callow",
            "url": "/tvurce/12966-simon-callow/"
          },
          {
            "name": "John Hannah",
            "url": "/tvurce/803-john-hannah/"
          },
          {
            "name": "Kristin Scott Thomas",
            "url": "/tvurce/164-kristin-scott-thomas/"
          },
          {
            "name": "Elspet Gray",
            "url": "/tvurce/35549-elspet-gray/"
          },
          {
            "name": "Rowan Atkinson",
            "url": "/tvurce/349-rowan-atkinson/"
          },
          {
            "name": "Corin Redgrave",
            "url": "/tvurce/16584-corin-redgrave/"
          },
          {
            "name": "Anna Chancellor",
            "url": "/tvurce/12166-anna-chancellor/"
          },
          {
            "name": "Hannah Taylor-Gordon",
            "url": "/tvurce/23562-hannah-taylor-gordon/"
          },
          {
            "name": "Bernice Stegers",
            "url": "/tvurce/11078-bernice-stegers/"
          },
          {
            "name": "Jeremy Kemp",
            "url": "/tvurce/53343-jeremy-kemp/"
          },
          {
            "name": "Sophie Thompson",
            "url": "/tvurce/55128-sophie-thompson/"
          },
          {
            "name": "Charlotte Coleman",
            "url": "/tvurce/76910-charlotte-coleman/"
          },
          {
            "name": "David Haig",
            "url": "/tvurce/78156-david-haig/"
          },
          {
            "name": "Nicola Walker",
            "url": "/tvurce/111089-nicola-walker/"
          },
          {
            "name": "Struan Rodger",
            "url": "/tvurce/115678-struan-rodger/"
          },
          {
            "name": "Simon Kunz",
            "url": "/tvurce/145261-simon-kunz/"
          },
          {
            "name": "Duncan Kenworthy",
            "url": "/tvurce/205006-duncan-kenworthy/"
          },
          {
            "name": "Rosalie Crutchley",
            "url": "/tvurce/214678-rosalie-crutchley/"
          },
          {
            "name": "Rupert Vansittart",
            "url": "/tvurce/219917-rupert-vansittart/"
          },
          {
            "name": "Kenneth Griffith",
            "url": "/tvurce/277724-kenneth-griffith/"
          },
          {
            "name": "Philip Voss",
            "url": "/tvurce/298970-philip-voss/"
          },
          {
            "name": "Randall Paul",
            "url": "/tvurce/308830-randall-paul/"
          },
          {
            "name": "Sara Crowe",
            "url": "/tvurce/157205-sara-crowe/"
          },
          {
            "name": "Richard Butler",
            "url": "/tvurce/348163-richard-butler/"
          },
          {
            "name": "Nigel Hastings",
            "url": "/tvurce/368537-nigel-hastings/"
          },
          {
            "name": "Juliette James",
            "url": "/tvurce/529994-juliette-james/"
          },
          {
            "name": "Amanda Mealing",
            "url": "/tvurce/875966-amanda-mealing/"
          }
        ]
      },
      {
        "name": "produkce",
        "people": [
          {
            "name": "Duncan Kenworthy",
            "url": "/tvurce/205006-duncan-kenworthy/"
          },
          {
            "name": "Eric Fellner",
            "url": "/tvurce/150112-eric-fellner/"
          }
        ]
      },
      {
        "name": "střih",
        "people": [
          {
            "name": "Jon Gregory",
            "url": "/tvurce/241299-jon-gregory/"
          }
        ]
      },
      {
        "name": "scénografie",
        "people": [
          {
            "name": "Anna Pinnock",
            "url": "/tvurce/787919-anna-pinnock/"
          }
        ]
      },
      {
        "name": "masky",
        "people": [
          {
            "name": "Ann Buchanan",
            "url": "/tvurce/630463-ann-buchanan/"
          }
        ]
      },
      {
        "name": "kostýmy",
        "people": [
          {
            "name": "Lindy Hemming",
            "url": "/tvurce/254644-lindy-hemming/"
          }
        ]
      }
    ],
    "vod_content": [
      {
        "name": "Apple TV+",
        "ga_name": "vod-service-apple-tv|film|vod",
        "url": "https://tv.apple.com/cz/movie/four-weddings-and-a-funeral/umc.cmc.50uemm7f92zctyyjp8z6x1upu"
      },
      {
        "name": "Google Play",
        "ga_name": "vod-service-google-play|film|vod",
        "url": "https://play.google.com/store/movies/details/Four_Weddings_And_A_Funeral?id=ZnSmxlAWj4s&hl=cs&gl=cz"
      }
    ]
  }
}
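
A View result like the one above can be condensed into a few typed fields. This is a minimal Python sketch that assumes the output shape shown in the example; the function name summarize_view is hypothetical, and treating a missing rating as None is an assumption.

```python
def summarize_view(item: dict) -> dict:
    """Extract the title, numeric rating, directors and genres from a View result."""
    view = item["View"]
    # Map each creator group name (e.g. "režie") to the list of people names.
    people_by_role = {
        group["name"]: [person["name"] for person in group["people"]]
        for group in view.get("creators", [])
    }
    rating = view.get("rating")
    return {
        "title": view["header_name"],
        # "72%" -> 72; assumption: the rating may be absent for some pages.
        "rating_percent": int(rating.rstrip("%")) if rating else None,
        "directors": people_by_role.get("režie", []),
        "genres": view.get("genres", []),
    }


# Shortened version of the output example above.
view_item = {
    "View": {
        "header_name": "Čtyři svatby a jeden pohřeb",
        "rating": "72%",
        "genres": ["Komedie", "Romantický", "Drama"],
        "creators": [
            {"name": "režie", "people": [{"name": "Mike Newell", "url": "/tvurce/4060-mike-newell/"}]},
            {"name": "scénář", "people": [{"name": "Richard Curtis", "url": "/tvurce/6726-richard-curtis/"}]},
        ],
    }
}

summary = summarize_view(view_item)
print(summary["rating_percent"])  # 72
print(summary["directors"])       # ['Mike Newell']
```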

View reviews (film + recenze)

Pages of the type https://www.csfd.cz/film/<movie-id>/recenze/?page=<N> and https://www.csfd.cz/film/<movie-id>/<movie-id2>/recenze/?page=<N>.

For request_type=ViewReviews, make sure that the url contains film and recenze. You don't have to put ?page=<N> at the end of the url: it is replaced with page=1, the number of pages to scan is retrieved, and the scraper then scrapes all of them one at a time. Input example:

{
    "requests": [
        {
            "request_type": "ViewReviews",
            "url": "https://www.csfd.cz/film/1490468-survivor-cesko-slovensko/1486957-serie-3/recenze/"
        }
    ],
    "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "force_cloud": false,
    "push_data_size": 500,
    "max_concurrency": 10,
    "max_request_retries": 3,
    "max_request_retry_timeout_ms": 10000,
    "request_retry_wait_ms": 5000
}
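
The page handling described above (any ?page=<N> is replaced with page=1 and the pages are then visited one by one) can be sketched in Python as follows. This only illustrates the URL rewriting, not how the actor does it internally; the function name with_page is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(url: str, page: int, param: str = "page") -> str:
    """Return the url with the paging parameter set to the given page number."""
    parts = urlsplit(url)
    # Drop any existing paging parameter, keep all other query parameters.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    query.append((param, str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


base = "https://www.csfd.cz/film/1490468-survivor-cesko-slovensko/1486957-serie-3/recenze/"
print(with_page(base, 1))
# https://www.csfd.cz/film/1490468-survivor-cesko-slovensko/1486957-serie-3/recenze/?page=1
print(with_page(base + "?page=7", 1) == with_page(base, 1))  # True
```

The param argument is there because the ratings pages use a differently named parameter (pageRating) for the same purpose.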

The results are split into parts of roughly 1 MB or less, which makes sure they can be uploaded to the Apify store. The id stays the same across parts, and the part key indicates their order. Output example:

[
    {
        "id": "191f5d99-e101-455d-b833-9554e7b102e8",
        "request_type": "ViewReviews",
        "url": "https://www.csfd.cz/film/2294-vykoupeni-z-veznice-shawshank/recenze/",
        "data": {
            "part": 1,
            "content": [
                {
                    "user_name": "golfista",
                    "user_url": "/uzivatel/95-golfista/",
                    "star_rating": "5",
                    "comment": "\n Na velmi ošemetnou a těžko zodpověditelnou otázku \"který film je podle vás nejlepší\", mi dal do úst tímhle dílem Frank Darabont odpověď, za kterou se opravdu nebudu stydět. Pokud bych měl jenom jednu (možná dvě :) možnost, pak právě sem patří 6*. Bohužel jsem nestihl tenhle film v kině, ale vydáním na DVD jsem si ho konečně vychutnal i v originále a je to fakt nádhera (tím nechci hanět český dabing, který je mimochodem vynikající).\n",
                    "comment_html": "Na velmi ošemetnou a těžko zodpověditelnou otázku \"který film je podle vás nejlepší\", mi dal do úst tímhle dílem Frank Darabont odpověď, za kterou se opravdu nebudu stydět. Pokud bych měl jenom jednu (možná dvě :) možnost, pak právě sem patří 6*. Bohužel jsem nestihl tenhle film v kině, ale vydáním na DVD jsem si ho konečně vychutnal i v originále a je to fakt nádhera (tím nechci hanět český dabing, který je mimochodem vynikající).",
                    "date": "14.02.2003"
                },
                ...
            ]
        }
    }
]
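
Because one request can produce several result items that share an id, you may want to stitch the parts back together after downloading the dataset. The Python sketch below assumes the output shape shown above; the function name merge_parts is hypothetical, and the sample entries are shortened placeholders, not real data.

```python
from collections import defaultdict


def merge_parts(items: list[dict]) -> dict[str, list[dict]]:
    """Group result items by id and concatenate their content in part order."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for item in items:
        grouped[item["id"]].append(item["data"])
    return {
        request_id: [
            entry
            for part in sorted(parts, key=lambda data: data["part"])
            for entry in part["content"]
        ]
        for request_id, parts in grouped.items()
    }


# Placeholder items: two parts of the same request, listed out of order.
items = [
    {"id": "req-1", "data": {"part": 2, "content": [{"user_name": "user-c"}]}},
    {"id": "req-1", "data": {"part": 1, "content": [{"user_name": "user-a"}, {"user_name": "user-b"}]}},
]

merged = merge_parts(items)
print([entry["user_name"] for entry in merged["req-1"]])
# ['user-a', 'user-b', 'user-c']
```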

View ratings (film + prehled)

User (uzivatel)

For request_type=ViewRatings, make sure that the url contains film. You don't have to put ?pageRating=<N> at the end of the url: it is replaced with pageRating=1, the number of pages to scan is retrieved, and the scraper then scrapes all of them one at a time. Input example:

{
    "requests": [
        {
            "request_type": "ViewRatings",
            "url": "https://www.csfd.cz/film/425904-mizerove-na-zivot-a-na-smrt/prehled/"
        }
    ],
    "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "force_cloud": false,
    "push_data_size": 500,
    "max_concurrency": 10,
    "max_request_retries": 3,
    "max_request_retry_timeout_ms": 10000,
    "request_retry_wait_ms": 5000
}

Output example:

[
    {
        "id": "7749baa2-b364-4920-afbb-88907fa2f194",
        "request_type": "ViewRatings",
        "url": "https://www.csfd.cz/film/425904-mizerove-na-zivot-a-na-smrt/prehled/",
        "data": {
            "part": 1,
            "content": [
                {
                    "user_name": "POMO",
                    "user_url": "/uzivatel/1-pomo/",
                    "date": "Vloženo v 05.06.2024",
                    "star_rating": "3"
                },
                {
                    "user_name": "kleopatra",
                    "user_url": "/uzivatel/1263-kleopatra/",
                    "date": "Vloženo v 07.06.2024",
                    "star_rating": "4"
                },
                ...
            ]
        }
    }
]
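
The rating entries above carry the date and star rating as strings. A minimal Python sketch for converting them into typed values follows; it assumes the "Vloženo v DD.MM.YYYY" date format and the numeric star_rating seen in the example, and the function name parse_rating is hypothetical.

```python
from datetime import date, datetime


def parse_rating(entry: dict) -> tuple[str, int, date]:
    """Convert one rating entry into (user_name, stars, date)."""
    # Assumption: the date is prefixed with "Vloženo v " as in the example output.
    raw_date = entry["date"].removeprefix("Vloženo v ").strip()
    return (
        entry["user_name"],
        int(entry["star_rating"]),
        datetime.strptime(raw_date, "%d.%m.%Y").date(),
    )


entry = {
    "user_name": "POMO",
    "user_url": "/uzivatel/1-pomo/",
    "date": "Vloženo v 05.06.2024",
    "star_rating": "3",
}

parsed = parse_rating(entry)
print(parsed[1])              # 3
print(parsed[2].isoformat())  # 2024-06-05
```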

User reviews (uzivatel + recenze)

Pages of the type https://www.csfd.cz/uzivatel/<user-id>/recenze/?page=<N>.

For request_type=UserReviews, make sure that the url contains uzivatel and recenze. You don't have to put ?page=<N> at the end of the url: it is replaced with page=1, the number of pages to scan is retrieved, and the scraper then scrapes all of them one at a time. Input example:

{
    "requests": [
        {
            "request_type": "UserReviews",
            "url": "https://www.csfd.cz/uzivatel/195357-verbal/recenze/"
        }
    ],
    "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "force_cloud": false,
    "push_data_size": 500,
    "max_concurrency": 10,
    "max_request_retries": 3,
    "max_request_retry_timeout_ms": 10000,
    "request_retry_wait_ms": 5000
}

Output example:

[
    {
        "id": "e5475be7-a70c-4749-bfff-60ad68bdc38e",
        "request_type": "UserReviews",
        "url": "https://www.csfd.cz/uzivatel/195357-verbal/recenze/",
        "data": {
            "part": 1,
            "content": [
                {
                    "movie_name": "Mizerové: Na život a na smrt",
                    "movie_url": "/film/425904-mizerove-na-zivot-a-na-smrt/",
                    "star_rating": "5",
                    "comment": "\n Jak repujeme my, sportovně založení bílí Dolní Slezané „ Bembajs, bembajs, jak pro tebe du, narobiš pyču!!! “… A oni zas přišli, co nadělám! Šup do kina! Navíc se belgičtí uzenáči od minula o dost zlepšili, scénáristé zavzpomínali na staroškolské fláky, a Špatňáci jsou tak na zase plné kule tím, čím bývali za časů Míši Záliva ve starých dobrých devadesátkách. Tedy pořád docela freš Wilík a ubohý starý trapák Lórenc proti bandě konečně charismatických a bezskrupulózních zlolidí ve vinně potěšující akčně kokotmediální taškařici a lá Smrtonosné smrti. Míša si v tom zase štěknul a docela nechápu, proč si svou vypiplanou značku rovnou nezmáknul sám. Patrně má nahrabáno tolik, že už jen rybaří v Zálivu. Ale i tak furt klasicka blažena oddychovka jak cyp.\n",
                    "comment_html": "Jak repujeme my, sportovně založení bílí Dolní Slezané „<em>Bembajs, bembajs, jak pro tebe du, narobiš pyču!!!</em>“… A oni zas přišli, co nadělám! Šup do kina! Navíc se belgičtí uzenáči od minula o dost zlepšili, scénáristé zavzpomínali na staroškolské fláky, a Špatňáci jsou tak na zase plné kule tím, čím bývali za časů Míši Záliva ve starých dobrých devadesátkách. Tedy pořád docela freš Wilík a ubohý starý trapák Lórenc proti bandě konečně charismatických a bezskrupulózních zlolidí ve vinně potěšující akčně kokotmediální taškařici a lá Smrtonosné smrti. Míša si v tom zase štěknul a docela nechápu, proč si svou vypiplanou značku rovnou nezmáknul sám. Patrně má nahrabáno tolik, že už jen rybaří v Zálivu. Ale i tak furt klasicka blažena oddychovka jak cyp.",
                    "date": "14.06.2024"
                },
                ...
            ]
        }
    }
]

Your feedback

I am always working on improving the performance of my Actors. So if you’ve got any technical feedback for Fast Scraper or simply found a bug, please create an issue on the Actor’s Issues tab in Apify Console.
