Pricing

Pay per usage

Dataset Validity Checker

Developed by

Matěj Sochor

Automatically checks whether the default datasets created by runs of an actor differ too much from previously encountered ones, so that it can warn you about web scraping problems caused by, e.g., a change in a website's layout or other significant changes in the resulting data.

Last modified

3 years ago

Developer tools

Actor Id

actId (string, optional)

ID of the actor whose datasets the validity checker should process.

Task Id

taskId (string, optional)

ID of the task whose datasets the validity checker should process. Takes precedence over actId if both are set.
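
The precedence between the two IDs can be sketched as follows. This is only an illustration of the rule stated above, not the checker's actual code; the function name `resolve_target` and the returned tuple shape are hypothetical.

```python
from typing import Optional, Tuple


def resolve_target(checker_input: dict) -> Optional[Tuple[str, str]]:
    """Pick the entity to examine, preferring taskId over actId.

    Hypothetical helper: returns ("task", id), ("actor", id), or None
    when neither field is filled in.
    """
    task_id = checker_input.get("taskId")
    if task_id:
        # taskId supersedes actId, so actId is ignored here.
        return ("task", task_id)
    act_id = checker_input.get("actId")
    if act_id:
        return ("actor", act_id)
    return None
```

With both fields set, `resolve_target({"actId": "a1", "taskId": "t1"})` picks the task.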

User Token

token (string, optional)

Token of the user who owns the examined actor or task. If left empty, the token of the user who started the Dataset Validity Checker is used.

Warning Email

warningEmail (string, optional)

Email address to which warnings about invalid datasets should be sent.

Clear History

clearHistory (boolean, optional)

Set to true if you want the validity checker to discard all previously gathered information about datasets and start anew. Use this option if you change the actor in a way that significantly changes its results, or if the website changes significantly in a way that doesn't actually break your actor (e.g. the number of different items available for purchase at an e-shop changes drastically).

Default value of this property is false

Previous Datasets Considered

previousDatasetsTakenIntoAccount (integer, optional)

The number of previous datasets that are considered when determining whether a dataset is valid. If left empty, the value is 100.

Minimal Datasets

minimalDatasetCount (integer, optional)

The minimum number of datasets that must have been processed before further datasets are validated. Must not exceed the value of 'Previous Datasets Considered'. If left empty, the value is 10.
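
The defaults and the constraint between the two history settings could be checked before starting a run with something like the sketch below. The helper name `resolve_history_settings` is hypothetical, and raising `ValueError` is an assumption; the page does not say how the checker itself reacts to an inconsistent input.

```python
DEFAULT_PREVIOUS_DATASETS = 100  # default stated for previousDatasetsTakenIntoAccount
DEFAULT_MINIMAL_DATASETS = 10    # default stated for minimalDatasetCount


def resolve_history_settings(checker_input: dict) -> tuple:
    """Apply the documented defaults and check minimal <= previous.

    Hypothetical pre-flight check, not part of the actor itself.
    """
    previous = checker_input.get("previousDatasetsTakenIntoAccount")
    minimal = checker_input.get("minimalDatasetCount")
    if previous is None:
        previous = DEFAULT_PREVIOUS_DATASETS
    if minimal is None:
        minimal = DEFAULT_MINIMAL_DATASETS
    if minimal > previous:
        raise ValueError(
            "minimalDatasetCount must not exceed previousDatasetsTakenIntoAccount"
        )
    return previous, minimal
```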

Number Handling Policy

numberHandlingPolicy (enum, optional)

Governs which attributes the Dataset Validity Checker considers to be numbers. With 'Strict', only values stored as the number type are treated as numbers. With 'Loose', strings that contain numbers in non-scientific notation are also treated as numbers. The 'Strict' policy is generally better, but if your actor doesn't convert numbers to the proper type, 'Loose' should give you better results.

Value options:

"loose": string
"strict": string

Default value of this property is "loose"
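
The difference between the two policies can be illustrated with a small sketch. The regular expression is an assumption about what "non-scientific notation" covers (an optional sign, digits, and an optional decimal part); the checker's real parsing rules are not documented on this page, and `is_number` is a hypothetical name.

```python
import re

# Assumed shape of a "plain" number string: optional sign, digits,
# optional decimal part, and no exponent.
_PLAIN_NUMBER = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


def is_number(value, policy: str = "loose") -> bool:
    """Decide whether a dataset value counts as a number under a policy."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; do not treat it as a number.
        return False
    if isinstance(value, (int, float)):
        # Real number types count under both policies.
        return True
    if policy == "loose" and isinstance(value, str):
        # Loose additionally accepts numeric strings without an exponent.
        return _PLAIN_NUMBER.fullmatch(value.strip()) is not None
    return False
```

Under this sketch, a price scraped as the string "19.99" is a number only under the loose policy, while "1e5" is rejected by both.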

Starting At

startingAt (string, optional)

Sets the earliest run whose dataset will be processed by this run of the Dataset Validity Checker. This value is ignored if runs from a later time have already been processed. Must be an ISO 8601 compliant date/time in UTC.

Until

until (string, optional)

Sets the latest run whose dataset will be processed by this run of the Dataset Validity Checker. Must be an ISO 8601 compliant date/time in UTC.
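
A time window of this kind can be sketched with the standard library as shown below. The helper names `parse_utc` and `in_window` are hypothetical, and treating both bounds as inclusive is an assumption. The sketch also does not model the rule that 'Starting At' is ignored once later runs have been processed.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_utc(value: str) -> datetime:
    """Parse an ISO 8601 date/time string and normalize it to UTC."""
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Inputs are documented as UTC, so a naive value is read as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def in_window(run_started_at: str,
              starting_at: Optional[str] = None,
              until: Optional[str] = None) -> bool:
    """Check whether a run falls between startingAt and until (inclusive)."""
    started = parse_utc(run_started_at)
    if starting_at is not None and started < parse_utc(starting_at):
        return False
    if until is not None and started > parse_utc(until):
        return False
    return True
```

A value suitable for either field can be produced with `datetime.now(timezone.utc).isoformat()`.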

Average Multiplying Coefficient

averageMultiplyingCoefficient (string, optional)

Controls how much a dataset may differ from previously seen datasets and still be considered valid, expressed as a multiple of the average difference. The default value is 5.

Maximal Multiplying Coefficient

maximalMultiplyingCoefficient (string, optional)

Controls how much a dataset may differ from previously seen datasets and still be considered valid, expressed as a multiple of the maximal difference. The default value is 2.

Leniency Coefficient

leniencyCoefficient (string, optional)

Controls both 'Maximal Multiplying Coefficient' and 'Average Multiplying Coefficient' at the same time. It is multiplicative, so a value of 2 increases both of them by a factor of 2. The default value is 1.
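
How the three coefficients interact can be sketched as follows. The scaling by the leniency coefficient follows the description above. The `within_limits` check is purely an assumption made for illustration: this page does not say how the difference between datasets is measured, nor how the average and maximal limits are combined. Both function names are hypothetical.

```python
def effective_coefficients(checker_input: dict) -> tuple:
    """Return the (average, maximal) coefficients after applying leniency.

    The schema declares all three fields as strings, so they are parsed
    with float(); the fallbacks are the documented defaults.
    """
    average = float(checker_input.get("averageMultiplyingCoefficient", "5"))
    maximal = float(checker_input.get("maximalMultiplyingCoefficient", "2"))
    leniency = float(checker_input.get("leniencyCoefficient", "1"))
    return average * leniency, maximal * leniency


def within_limits(difference: float,
                  average_difference: float,
                  maximal_difference: float,
                  checker_input: dict) -> bool:
    """ASSUMPTION: a dataset passes only if it stays under both limits."""
    average_coef, maximal_coef = effective_coefficients(checker_input)
    return (difference <= average_coef * average_difference
            and difference <= maximal_coef * maximal_difference)
```

For example, `effective_coefficients({"leniencyCoefficient": "2"})` doubles the defaults of 5 and 2, which loosens both limits at once.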

Append to dataset

valek.josef/append-to-dataset

Utility actor that allows you to build a single large dataset from individual default datasets of other actor runs.

Josef Válek

Failed Runs Monitor

jannovotny/failed-runs-monitor

This actor lets you know about failed or timed-out runs of your actors and tasks via Slack or email. It can also notify you about successful runs with an empty dataset or runs that are running for too long, and it can check the JSON schema of dataset items.

Jan Novotný

Website-checker-starter

vaclavrut/website-checker-starter

Works with lukaskrivka/website-checker. This actor accepts multiple URLs on input, starts website-checker with 10 runs at a time, and stores all data in one dataset.

Vaclav Rut

Website Checker Workload

lukaskrivka/website-checker-workload

Creates reasonable workloads for analyzing any website with the Website Checker actor and combines the resulting data. This is the easiest way to analyze any website for compute unit usage and anti-scraping blocking.

Lukáš Křivka

Content Checker

jakubbalada/content-checker

Monitor a website or web page for content changes. Automatically saves before and after screenshots and sends an email notification when content changes are detected.

Jakub Balada

Scraper Results Checker

drobnikj/check-crawler-results

This actor checks results from Apify's scrapers or any other actor that stores its results in a dataset, and sends a notification if there are errors. It's designed to run from a webhook.

Jakub Drobník

Web Crawler

rigelbytes/webcrawler

This web crawler is designed to provide users with complete flexibility by allowing them to use their **own proxies**. The scraper collects all pages from the website and extracts the **MetaData**, **Title**, and **Content** of each page in Markdown.

Rigel Bytes

Get Countries Info By Code Scraper

dev_bodex/get-countries-info-by-code-scraper

This scraper is designed to retrieve detailed information about countries by their respective ISO country codes (e.g., "US" for the United States) or by their currency codes (e.g., "USD" for US Dollar).

Festus Befgrp

Extract Website With URL

mrahil/extract-website-with-url

The Extract Website with URL API allows users to extract structured data from any webpage by providing a URL. It retrieves HTML, metadata, tables, and images, returning data in JSON format. Ideal for web scraping, SEO analysis, and content extraction. Use it for e-commerce data, news scraping

Mohammed Rahil

Forward dataset as POST data

anchor/forward-dataset-webhook

This actor forwards the results of an Actor to an endpoint, so you don't have to fetch the results manually. It downloads the dataset and attaches it to the body of a POST request that you specify, acting as a new webhook. Simplify your Actor process!