User

Jan Čurn

jancurn

Co-founder and CEO of Apify. Used to be a computer scientist in a previous life.

Actor

HTML to PDF Converter

jancurn/url-to-pdf

Opens a web page in headless Chrome using Puppeteer and prints it to PDF. The input is a JSON object and the output is a PDF file.

120 stars
FEATURED
Actor

PDF to HTML Converter

jancurn/pdf-to-html

Converts a PDF document to HTML using the pdf2htmlEX tool.

33 stars
FEATURED
Actor

Broken Links Checker

jancurn/find-broken-links

Crawls a website and finds broken links. Unlike other similar SEO analysis tools, the actor also reports broken URL #fragments. The results are stored in a JSON and HTML report.

16 stars
FEATURED
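
The #fragment check mentioned in this entry can be illustrated with a minimal sketch. It is not the actor's code: it assumes the target page's HTML has already been downloaded, uses Python's standard library, and the names `AnchorCollector` and `fragment_is_broken` are hypothetical. It also ignores percent-encoded fragments and anchors created by JavaScript.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class AnchorCollector(HTMLParser):
    """Collects every value that a URL #fragment could point to."""

    def __init__(self):
        super().__init__()
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # An element id, or the legacy <a name="..."> form, is a fragment target.
            if value and (name == "id" or (tag == "a" and name == "name")):
                self.anchors.add(value)


def fragment_is_broken(url, target_html):
    """Return True if `url` has a #fragment that `target_html` does not define."""
    _, fragment = urldefrag(url)
    if not fragment:
        return False  # no fragment, so nothing to check
    collector = AnchorCollector()
    collector.feed(target_html)
    collector.close()
    return fragment not in collector.anchors
```

For example, a link to `https://example.com/docs#missing` would be reported as broken when the downloaded page defines no element with `id="missing"`.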
Actor

Algolia Webcrawler

jancurn/algolia-webcrawler

Crawls a website using one or more sitemaps and imports the data to Algolia search index. The text content is identified using simple CSS selectors.

12 stars
FEATURED
Actor

Naked domains analyzer

jancurn/analyze-domains

Crawls and downloads web pages running on a list of provided naked domains (e.g. "example.com"). The actor stores an HTML snapshot, screenshot, text body, and HTTP response headers of all the pages. It also extracts email addresses...

28 stars
Crawler

Data, what now?

Simple example showing how to scrape a list of posts from a personal blog.

17 downloads
Actor

Extract Metadata

jancurn/extract-metadata

A small, efficient actor that loads a web page, parses its HTML using the Cheerio library, and extracts metadata from the <HEAD> tag, such as the page title, description, and author.

16 stars
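
As a rough illustration of the kind of extraction this entry describes, here is a minimal sketch. It uses Python's standard `html.parser` rather than Cheerio, it works on an HTML string instead of loading a page, and the names `HeadMetadataParser` and `extract_metadata` are hypothetical. The actual actor may extract different fields.

```python
from html.parser import HTMLParser


class HeadMetadataParser(HTMLParser):
    """Collects the <title> text and <meta> name/content pairs found inside <head>."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_head = False
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self._in_head = True
        elif self._in_head and tag == "title":
            self._in_title = True
        elif self._in_head and tag == "meta":
            attrs = dict(attrs)
            # Standard tags use name="...", Open Graph tags use property="...".
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content") is not None:
                self.metadata[key.lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "head":
            self._in_head = False
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            # Title text may arrive in several chunks, so accumulate it.
            self.metadata["title"] = self.metadata.get("title", "") + data


def extract_metadata(html):
    """Return a dict of metadata found in the <head> of `html`."""
    parser = HeadMetadataParser()
    parser.feed(html)
    parser.close()
    if "title" in parser.metadata:
        parser.metadata["title"] = parser.metadata["title"].strip()
    return parser.metadata
```

Calling `extract_metadata(html)` on a page source returns a plain dictionary keyed by lowercased meta names, plus a `title` key when the page has one.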
Crawler

Machinery trader

Downloads a list of heavy-duty construction equipment for sale or rent, such as heavy-duty trucks, trailers, etc.

16 downloads
Actor

Send Email On Crawler Finish

jancurn/send-email-on-crawler-finish

Fetches information about a crawler run and sends it to the user by email. For example, this actor can be used to inform the user that the crawler run finished. To do that, simply put the following URL into "Finish webhook URL" se...

6 stars
Crawler

motor-talk.de discussions

Extracts texts from a German automotive discussion portal. For example, such a data set can be used by a machine learning system for sentiment analysis to figure out how people perceive various car models.

6 downloads
Actor

Example Analyze Dom Css

jancurn/example-analyze-dom-css

Example showing how to use headless Chromium with Puppeteer to open a web page, fetch the list of DOM nodes on the page and obtain CSS styling information for each HTML element. The actor uses the Chrome DevTools Protocol to acce...

5 stars
Crawler

Download CSS files

Downloads CSS files linked from a webpage.

5 downloads
Actor

Probe Resources Plus Webhook

jancurn/probe-resources-plus-webhook

Calls jancurn/probe-page-resources and then invokes a hard-coded webhook. The actor takes the same input as jancurn/probe-page-resources.

4 stars
Actor

Probe Page Resources

jancurn/probe-page-resources

Sequentially loads a list of URLs in headless Chrome and analyzes HTTP resources requested by each page. Source code at https://github.com/jancurn/act-probe-page-resources

4 stars
Actor

Cz President Election

jancurn/cz-president-election

Collects voting data from the Czech Statistical Office about the 2018 Czech presidential election.

4 stars
Actor

Example Sitemap Cheerio

jancurn/example-sitemap-cheerio

An example actor that first downloads a sitemap in XML format and then crawls each page from the sitemap using the fast CheerioCrawler from the Apify SDK.

3 stars
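
The first step this entry describes, turning an XML sitemap into a list of page URLs, might look roughly like the sketch below. It uses Python's standard `xml.etree.ElementTree` rather than the Apify SDK, assumes the sitemap text has already been downloaded and follows the sitemaps.org 0.9 schema, and the function name `urls_from_sitemap` is hypothetical. Crawling the resulting pages is left out.

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemaps that follow the sitemaps.org 0.9 protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return the page URLs listed in the <url><loc> elements of a sitemap."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text and loc.text.strip()
    ]
```

Each returned URL could then be queued for a crawler. Sitemap index files, which list other sitemaps instead of pages, are not handled here.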
Crawler

m.novinky.cz

Downloads a list of all news articles published on novinky.cz in the past week. Note that we're using the mobile version of the website because it has a simpler structure and is faster to load.

1 download
Crawler

Firemni_seminare_SIS

Company seminars

0 downloads