Website Backup

mhamas/website-backup

Enables you to create a backup of any website by crawling it, so that you don’t lose any content by accident. Ideal, for example, for your personal or company blog.

Author: Matej Hamas
  • Users: 111
  • Runs: 5,440
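
All of the options below are passed to the actor as a single input object. As a quick orientation, here is a minimal sketch of starting a run through the apify-client npm package, assuming an ES module context (top-level await); the token and input values are placeholders:

    import { ApifyClient } from 'apify-client';

    // Placeholder token; replace with your own Apify API token.
    const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

    // Start the actor and wait for the run to finish.
    const run = await client.actor('mhamas/website-backup').call({
        startURLs: [{ url: 'https://blog.apify.com' }],
        maxRequestsPerCrawl: 100,
        sameOrigin: true,
    });
    console.log(`Run ${run.id} finished with status ${run.status}`);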

Start URLs

startURLs

Optional

array

List of URL entry points. Each entry is an object of the form {"url": "http://www.example.com"}.
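
For illustration, a startURLs value with two entry points might look like this (the URLs are placeholders):

    const startURLs = [
        { url: 'https://blog.apify.com' },
        { url: 'https://www.example.com/docs' },
    ];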

Link selector

linkSelector

Optional

string

CSS selector matching elements with 'href' attributes that should be enqueued. For example, to enqueue URLs from <div class="my-class" href="..."> elements, you would enter div.my-class. Leave empty to ignore all links.
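
As a sketch, a selector that follows only links inside the main content of a page could look like this (the class name is hypothetical):

    // Enqueue only links inside the post body; 'div.post-content' is a made-up class.
    const linkSelector = 'div.post-content a[href]';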

Max pages per run

maxRequestsPerCrawl

Optional

integer

The maximum number of pages that the scraper will load. The scraper stops when this limit is reached. It's always a good idea to set this limit to prevent excess platform usage by misconfigured scrapers. Note that the actual number of pages loaded might be slightly higher than this value. If set to 0, there is no limit.

Max crawling depth

maxCrawlingDepth

Optional

integer

Defines how many links away from the start URLs the scraper will descend. 0 means unlimited depth.

Max concurrency

maxConcurrency

Optional

integer

Defines how many pages can be processed by the scraper in parallel. The scraper automatically increases and decreases concurrency based on available system resources. Use this option to set a hard limit.
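
The three limits above combine into a single input object; a plausible fragment with illustrative values:

    const input = {
        maxRequestsPerCrawl: 500, // stop after roughly 500 pages
        maxCrawlingDepth: 3,      // follow links at most 3 hops from the start URLs
        maxConcurrency: 10,       // hard cap on parallel page loads
    };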

Custom key value store

customKeyValueStore

Optional

string

Use a custom named key-value store for saving results. If a key-value store with this name doesn't exist yet, it is created. Snapshots of the pages will be saved in this key-value store.

Custom dataset

customDataset

Optional

string

Use a custom named dataset for saving metadata. If a dataset with this name doesn't exist yet, it is created. Metadata about the page snapshots will be saved in this dataset.
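
A sketch of pointing both storage options above at named storages and later reading the metadata back with apify-client (the store and dataset names are placeholders; named storages are typically addressed as 'username~storage-name'):

    const input = {
        customKeyValueStore: 'my-site-backup',    // page snapshots are saved here
        customDataset: 'my-site-backup-metadata', // snapshot metadata is saved here
    };

    // After the run, list the snapshot metadata records.
    const { items } = await client.dataset('YOUR_USERNAME~my-site-backup-metadata').listItems();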

Timeout (in seconds) for backing up a single URL

timeoutForSingleUrlInSeconds

Optional

integer

Timeout in seconds for backing up a single URL. Increase this timeout if you see an error like 'Error: handlePageFunction timed out after X seconds'.
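
For example, raising the per-URL timeout for a slow site might look like this (the value is illustrative):

    const input = { timeoutForSingleUrlInSeconds: 300 }; // allow up to 5 minutes per URL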

URL search parameters to ignore

searchParamsToIgnore

Optional

array

Names of URL search parameters (such as 'source', 'sourceid', etc.) that should be ignored in the URLs when crawling.
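
Conceptually, ignoring a search parameter means that URLs differing only in that parameter are treated as the same page. A minimal sketch of such normalization with the standard URL API (illustrative, not the actor's internal code):

    // Strip ignored search params so URL variants deduplicate to one page.
    function normalizeUrl(rawUrl: string, paramsToIgnore: string[]): string {
        const url = new URL(rawUrl);
        for (const name of paramsToIgnore) {
            url.searchParams.delete(name);
        }
        return url.toString();
    }

    // normalizeUrl('https://example.com/post?id=1&source=rss', ['source'])
    // => 'https://example.com/post?id=1'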

Only consider pages under the same domain as one of the provided URLs

sameOrigin

Optional

boolean

Only back up URLs with the same origin as one of the start URLs. For example, when turned on with the single start URL https://blog.apify.com, only links with the prefix https://blog.apify.com will be backed up recursively.
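
A sketch of the origin test described above, again using the standard URL API (illustrative, not the actor's internal code):

    // True when the candidate link shares an origin with any start URL.
    function isSameOrigin(candidate: string, startUrls: string[]): boolean {
        const origin = new URL(candidate).origin;
        return startUrls.some((start) => new URL(start).origin === origin);
    }

    // isSameOrigin('https://blog.apify.com/post', ['https://blog.apify.com']) // => true
    // isSameOrigin('https://apify.com/store', ['https://blog.apify.com'])     // => false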

Proxy configuration

proxyConfiguration

Optional

object

Choose to use no proxy, Apify Proxy, or provide custom proxy URLs.
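
Apify actors commonly accept a proxy configuration object in the shape sketched below; treat this as an assumption and check the actor's input schema for the exact fields:

    const input = {
        proxyConfiguration: {
            useApifyProxy: true, // route requests through Apify Proxy
            // apifyProxyGroups: ['RESIDENTIAL'],                      // optional; group name illustrative
            // proxyUrls: ['http://user:pass@proxy.example.com:8000'], // or supply custom proxy URLs
        },
    };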