Founder and CEO of Apify. I still enjoy coding.
138 monthly users
53.1% runs succeeded
25 days response time
Hello world 👋🏻
This is a test.
A small efficient actor that loads a web page, parses its HTML using Cheerio library and extracts the following meta-data from the <HEAD> tag, such as page title, description, author etc.
Residential Proxy Probe
Find residential proxy sessions on Apify Proxy with target IP addresses geo-located in specific postal codes or DMAs.
Broken Link Checker
Crawls a website and finds broken links. Unlike other similar SEO analysis tools, the actor also reports broken URL #fragments. The results are stored in a JSON and HTML report.
Takes a screenshot of one or more web pages using the Chrome browser. The actor enables the setting of custom viewport size, page load timeout, delay, proxies, and output image format.
PDF to HTML Converter
Converts a PDF document to HTML using the pdf2htmlEX tool.
HTML to PDF Converter
Loads a web page in headless Chrome using Puppeteer and prints it to PDF. The input is a JSON object and output is a PDF file.
Naked Domains Analyzer
Crawls and downloads web pages running on a list of provided naked domains e.g. "example.com". The actor stores HTML snapshot, screenshot, text body, and HTTP response headers of all the pages. It also extracts email addresses, phones, social handles for Facebook, Twitter, LinkedIn, and Instagram.
Crawls a website using one or more sitemaps and imports the data to Algolia search index. The text content is identified using simple CSS selectors.
Selenium Custom Firefox POC
Uses Selenium to run custom build of Firefox that is harder to detect. The actor just saves a screenshot and snapshot of the HTML.
Probe Page Resources
Sequentially loads a list of URLs in headless Chrome and analyzes HTTP resources requested by each page. Source code at https://github.com/jancurn/act-probe-page-resources
Example Sitemap Cheerio
An example actor that first downloads a sitemap in XML format and the crawls each page from the sitemap using the fast CheerioCrawler from Apify SDK.
Probe Resources Plus Webhook
Calls jancurn/probe-page-resources and then invokes a hard-coded webhook. The act takes same input as jancurn/probe-page-resources