Deprecated

Pricing

Pay per usage

Website Backup

Creates a backup of any website by crawling it, so that you don't lose any content by accident. Ideal, for example, for a personal or company blog.

Rating: 0.0 (0)

Developer: Matej Hamas

Actor stats: 312 total users, last modified 3 months ago

Apify Actor - Website Backup

Description

The purpose of this Actor is to create website backups by recursively crawling websites. For example, we'd use it to make regular backups of https://blog.apify.com/ so that we don't lose any content by accident. Although such a backup cannot be restored automatically, it's better than losing the data completely.

Starting from the given URL entry points, the Actor recursively crawls the links it finds on the pages using the provided CSS selector and creates a separate MHTML snapshot of each page. Each snapshot is taken after the page has been fully rendered with the Puppeteer crawler and includes all of its content, such as images and CSS. Hence, it can be used on any HTML, JavaScript, or WordPress website that doesn't require authentication.
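The recursive crawl described above amounts to a breadth-first traversal from the entry points that backs up every reachable page exactly once. The following is a minimal Python sketch of that idea, assuming a hypothetical in-memory `site` mapping from each page URL to the links the CSS link selector would match on it. The real Actor fetches and renders pages with Puppeteer and saves an MHTML snapshot at the point where this sketch only records the URL.

```python
from collections import deque


def crawl(start_urls: list[str], site: dict[str, list[str]]) -> list[str]:
    """Return URLs in the order a breadth-first crawl would back them up.

    `site` is a hypothetical stand-in for the web: it maps each page URL
    to the links that the CSS link selector would match on that page.
    """
    queue = deque(dict.fromkeys(start_urls))  # drop duplicate entry points
    seen = set(queue)
    backed_up = []
    while queue:
        url = queue.popleft()
        # The real Actor would render the page and save its MHTML snapshot here.
        backed_up.append(url)
        for link in site.get(url, []):
            if link not in seen:  # enqueue every page only once
                seen.add(link)
                queue.append(link)
    return backed_up


site = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/", "https://example.com/c"],
    "https://example.com/b": ["https://example.com/c"],
}
print(crawl(["https://example.com/"], site))
```

Page `c` is linked from both `a` and `b`, and the link back to the start page creates a cycle, yet every page is backed up only once because already-seen URLs are never enqueued again.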

Input parameters

Field	Type	Description
startURLs	array	List of URL entry points.
linkSelector	string	CSS selector matching elements with `href` attributes whose links should be enqueued.
maxRequestsPerCrawl	integer	The maximum number of pages that the scraper will load. The scraper stops when this limit is reached. It's always a good idea to set this limit to prevent excess platform usage by misconfigured scrapers. Note that the actual number of pages loaded might be slightly higher than this value. If set to `0`, there is no limit.
maxCrawlingDepth	integer	Defines how many links away from the `startURLs` the scraper will descend. `0` means unlimited.
maxConcurrency	integer	Defines how many pages the scraper can process in parallel. The scraper automatically increases and decreases concurrency based on available system resources; use this option to set a hard limit.
customKeyValueStore	string	Name of a custom key-value store for saving results. If a key-value store with this name doesn't exist yet, it is created. The page snapshots are saved in this key-value store.
customDataset	string	Name of a custom dataset for saving metadata. If a dataset with this name doesn't exist yet, it is created. The metadata about the page snapshots is saved in this dataset.
proxyConfiguration	object	Choose to use no proxy, Apify Proxy, or provide custom proxy URLs.
sameOrigin	boolean	Only back up URLs with the same origin as one of the start URLs. For example, when turned on for a single start URL `https://blog.apify.com`, only links with the prefix `https://blog.apify.com` are backed up recursively.
timeoutForSingleUrlInSeconds	integer	Timeout in seconds for backing up a single URL. Try increasing this timeout if you see the error `Error: handlePageFunction timed out after X seconds.`
navigationTimeoutInSeconds	integer	Timeout in seconds within which the page navigation must finish. Try increasing this timeout if you see the error `Navigation timeout of XXX ms exceeded`.
searchParamsToIgnore	array	Names of URL search parameters (such as `source`, `sourceid`, etc.) that should be ignored in URLs when crawling.
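To illustrate how `sameOrigin`, `searchParamsToIgnore`, `maxCrawlingDepth`, and `maxRequestsPerCrawl` could interact, here is a minimal Python sketch of a breadth-first crawl over a hypothetical in-memory link graph. It assumes that start URLs sit at depth 0, that origins are compared by scheme and host (including any port), and it enforces the request limit exactly, whereas the real Actor may load slightly more pages than `maxRequestsPerCrawl`. The function and argument names are illustrative and are not taken from the Actor's source.

```python
from collections import deque
from collections.abc import Iterable
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize(url: str, params_to_ignore: set[str]) -> str:
    """Remove ignored search parameters (e.g. tracking params) from a URL."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in params_to_ignore]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def origin(url: str) -> tuple[str, str]:
    """Scheme plus host (and port, if any), with the host lowercased."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc.lower()


def crawl(start_urls: list[str], site: dict[str, list[str]], *,
          same_origin: bool = True, params_to_ignore: Iterable[str] = (),
          max_depth: int = 0, max_requests: int = 0) -> list[str]:
    """Breadth-first crawl over `site`, a hypothetical URL -> links mapping."""
    ignore = set(params_to_ignore)
    allowed = {origin(url) for url in start_urls}
    queue = deque()
    seen = set()
    for url in start_urls:
        url = normalize(url, ignore)
        if url not in seen:
            seen.add(url)
            queue.append((url, 0))  # start URLs are at depth 0

    backed_up = []
    while queue:
        if max_requests and len(backed_up) >= max_requests:
            break  # max_requests == 0 means no limit
        url, depth = queue.popleft()
        backed_up.append(url)  # the Actor would save a snapshot here
        if max_depth and depth >= max_depth:
            continue  # max_depth == 0 means unlimited depth
        for link in site.get(url, []):
            link = normalize(link, ignore)
            if same_origin and origin(link) not in allowed:
                continue  # skip links leaving the start URLs' origins
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return backed_up


site = {
    "https://blog.example.com/": [
        "https://blog.example.com/post-1?source=twitter",
        "https://other.example.org/",
        "https://blog.example.com/post-2",
    ],
    "https://blog.example.com/post-1": [
        "https://blog.example.com/post-1?source=rss",
        "https://blog.example.com/post-3",
    ],
}
print(crawl(["https://blog.example.com/"], site, params_to_ignore=["source"]))
print(crawl(["https://blog.example.com/"], site, params_to_ignore=["source"],
            max_depth=1))
```

With these settings the first call backs up the start page, `post-1`, `post-2`, and `post-3`: both `?source=` variants collapse into a single `post-1` URL and the external origin is skipped. With `max_depth=1`, `post-3` is left out because it is two links away from the start URL.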

Output

For each visited URL, a single zip file containing the MHTML snapshot and its metadata is stored in a key-value store (the default one, or the named one set by the `customKeyValueStore` input). The key of each zip file includes a timestamp, a hash of the URL, and the URL in human-readable form. Note that the Apify platform only allows certain characters in keys and limits their length to 256 characters, which is why characters such as `/` are removed. In addition to the key-value store, metadata about the crawled web pages is also stored in a dataset (the default one, or the named one set by `customDataset`).
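Below is a minimal Python sketch of producing one such zip file and its key. Everything concrete in it is an assumption for illustration and may differ from the Actor: the key layout (a UTC timestamp, the first 16 hex digits of a SHA-256 hash of the URL, and the sanitized URL, joined by `-`), the archive member names `snapshot.mhtml` and `metadata.json`, and the set of allowed key characters (`a-z`, `A-Z`, `0-9` and `!-_.'()`), which should be checked against the Apify key-value store documentation.

```python
import hashlib
import io
import json
import re
import zipfile
from datetime import datetime, timezone

MAX_KEY_LENGTH = 256
# Assumed set of allowed key characters; anything else (e.g. "/" or ":") is dropped.
DISALLOWED = re.compile(r"[^a-zA-Z0-9!\-_.'()]")


def make_key(url: str, when: datetime) -> str:
    """Hypothetical key layout: timestamp, short URL hash, readable URL."""
    timestamp = when.strftime("%Y%m%dT%H%M%SZ")
    url_hash = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    readable = DISALLOWED.sub("", url)
    # Timestamp and hash come first, so truncating from the end only
    # shortens the human-readable part of the key.
    return f"{timestamp}-{url_hash}-{readable}"[:MAX_KEY_LENGTH]


def make_zip(url: str, mhtml: bytes, when: datetime) -> bytes:
    """Pack the MHTML snapshot and a small metadata record into one zip."""
    metadata = {"url": url, "createdAt": when.isoformat()}
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("snapshot.mhtml", mhtml)
        archive.writestr("metadata.json", json.dumps(metadata))
    return buffer.getvalue()


when = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
url = "https://blog.apify.com/some-post/"
key = make_key(url, when)
data = make_zip(url, b"MIME-Version: 1.0\r\n", when)
print(key, len(data))
```

Placing the timestamp first makes keys sort chronologically, and the hash keeps two URLs distinct even when their sanitized, truncated readable parts happen to coincide.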

Compute unit consumption

An example run that backed up 323 web pages under blog.apify.com (https://blog.apify.com), configured with 8,192 MB of memory and lasting 12 minutes, consumed 1.6617 compute units.
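On the Apify platform, one compute unit (CU) corresponds to 1 GB of memory used for one hour, so the figures above can be sanity-checked with a few lines of Python. Assuming the full 8 GB allocation is billed for the whole run, a nominal 12 minutes gives 1.6 CU. The reported 1.6617 CU therefore suggests the run lasted closer to 12.5 minutes (the 12 minutes above is presumably rounded), and it works out to roughly 0.005 CU per backed-up page. The helper names are illustrative.

```python
def compute_units(memory_mb: float, duration_minutes: float) -> float:
    """CU = memory in GB x run time in hours (1 CU = 1 GB-hour)."""
    return (memory_mb / 1024) * (duration_minutes / 60)


def minutes_for(cu: float, memory_mb: float) -> float:
    """Invert the formula: run length that uses `cu` at this memory size."""
    return cu / (memory_mb / 1024) * 60


print(round(compute_units(8192, 12), 4))   # → 1.6
print(round(minutes_for(1.6617, 8192), 2))  # → 12.46
print(round(1.6617 / 323, 5))               # → 0.00514 CU per page
```

Because consumption scales linearly with both memory and time, the per-page figure is only a rough guide for estimating a backup of a different site; page weight and rendering time will shift it.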

Alternative Actors

Extract Any Website with Source Code

mikolabs/extract-any-website-with-source-code

Download complete websites and get them as ZIP archives. Perfect for creating offline backups, archiving websites, or downloading entire sites with all assets. Includes source code. For research purposes.

Mikolabs

Website Extractor

mikolabs/website-extractor

Mikolabs

Scrap Any Website with Source Code

mikolabs/web-extractor

Mikolabs

Bulk Image Downloader

onescales/bulk-image-downloader

The Bulk Image Downloader is a powerful Apify actor that extracts and downloads images from web pages or processes direct image URLs in bulk. Whether you need to download a single image or thousands of images from multiple websites, this tool handles it all efficiently.

One Scales

1.4K

5.0

(11)

Full Site Downloader | $4.99/Site | 1-Time Crawl | All Assets

hailey_apify/Full-Website-Downloader

Full-Website-Downloader - Automatically crawls entire websites including HTML and all static assets (CSS, JS, images, etc.), preserves complete structure and exports as ZIP package. Supports depth control and same-domain resource filtering.

Hailey

Website Availability Checker

mina_safwat/website-availability-checker

Need to monitor your websites and ensure they're always accessible? Check website availability, response times, and server status in real-time. You can use it to get instant alerts on downtime and monitor your online presence with reliable uptime monitoring!

Mina

AI Web to Markdown - LLM-Ready Extractor

wiry_kingdom/ai-web-to-markdown

Convert any URL into clean LLM-ready markdown. Strips ads, nav, footer. Preserves headings, lists, tables, code blocks. Returns token count. Perfect for RAG, fine-tuning, AI agents. 10x cheaper than Firecrawl.

Mohieldin Mohamed

Wayback Machine Scraper - Track Website Changes Over Time

ryanclinton/wayback-machine-search

Search the Internet Archive's Wayback Machine for historical snapshots of any website. Retrieve archived page metadata (including timestamps, URLs, MIME types, HTTP status codes, and content hashes) for up to 10,000 snapshots per run.

Ryan Clinton

113

Website Security & Vulnerability Audit

smart-digital/website-security-vulnerability-audit

Automated security and vulnerability audit for websites. Detects WordPress plugin vulnerabilities, checks for updates, and analyzes SSL/TLS, security headers, and CMS security.

My Smart Digital

5.0

(1)

Download from Any Website: YouTube, TikTok, IG & 1000

scrapepilot/download-from-any-website-youtube-tiktok-ig-1000

The most powerful way to download videos from any website on Apify. Extract direct download URLs, metadata, thumbnails, and audio from YouTube, TikTok, Instagram, Facebook, Twitter/X, Reddit, Vimeo, Twitch, and 1000+ more platforms. No browser. No login. Just results.