Cern Opendata Actor

Under maintenance

Pricing

from $0.05 / 1,000 results

Harvests the CERN OpenData catalog

Rating

0.0

(0)

Developer

Maksim Kudriavtsev

Maintained by Community

Last modified

a month ago

CERN OpenData Harvester

The Actor collects the CERN OpenData catalog and works around the 10,000-result limit: it reads the public sitemaps, converts the record links into API calls, and, if necessary, fetches additional records by recid range. Normalized records are written to a dataset; raw API payloads can optionally be stored in a key-value store.
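The sitemap step described above can be sketched as follows. This is only an illustration, not the Actor's actual code: the function names `extract_locs` and `portal_to_api_url` are hypothetical, and the `/record/<id>` to `/api/records/<id>` mapping is an assumption inferred from the `portal_url` and `api_url` fields shown in the Output section.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Optional
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Assumed API prefix, inferred from the api_url example in the Output section.
API_BASE = "https://opendata.cern.ch/api/records"


def extract_locs(sitemap_xml: str) -> List[str]:
    """Return every <loc> URL found in a sitemap or sitemap index document."""
    root = ET.fromstring(sitemap_xml)
    return [el.text.strip() for el in root.iterfind(".//sm:loc", SITEMAP_NS) if el.text]


def portal_to_api_url(portal_url: str) -> Optional[str]:
    """Map a portal record link to an API URL, or None for non-record pages."""
    path = urlparse(portal_url).path
    match = re.fullmatch(r"/record/([^/]+)/?", path)
    if match is None:
        return None
    return f"{API_BASE}/{match.group(1)}"
```

For example, `portal_to_api_url("https://opendata.cern.ch/record/12345")` gives `https://opendata.cern.ch/api/records/12345`, while links that do not point at a record page are dropped by returning `None`.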

Input

Defined in .actor/input_schema.json. All fields are optional and have defaults:

sitemapIndexUrl (string, default https://opendata.cern.ch/sitemap.xml): Root sitemap index.
enableSitemapScan (boolean, default true): Enable sitemap crawling.
enableRecidScan (boolean, default true): Enable scanning by recid ranges.
maxWorkers (integer, default 12): Number of parallel downloads of API pages found via sitemaps.
retries (integer, default 5): Number of attempts on HTTP errors, including 429 and 5xx responses.
recidMax (integer, default 120000): Upper recid limit for the range scan.
recidStep (integer, default 500): Size of each recid range in search requests.
pageSize (integer, default 200): Search API page size.
skipIds (array of strings): IDs/slugs to skip (the default includes service pages).
storeRaw (boolean, default false): Whether to store raw API payloads in the key-value store.
rawKeyValuePrefix (string, default raw-cern-opendata/): Key prefix in the key-value store when storeRaw=true.
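To make the interaction between recidMax and recidStep concrete, here is a minimal sketch of how the defaults might be merged with user input and how the recid space could be split into windows. The names `DEFAULTS`, `resolve_input`, and `recid_windows` are hypothetical, the skipIds default is omitted because its contents are not listed here, and starting the windows at recid 1 with inclusive bounds is an assumption.

```python
from typing import Any, Dict, Iterator, Optional, Tuple

# Defaults copied from the field list above (skipIds omitted: its default is not listed).
DEFAULTS: Dict[str, Any] = {
    "sitemapIndexUrl": "https://opendata.cern.ch/sitemap.xml",
    "enableSitemapScan": True,
    "enableRecidScan": True,
    "maxWorkers": 12,
    "retries": 5,
    "recidMax": 120000,
    "recidStep": 500,
    "pageSize": 200,
    "storeRaw": False,
    "rawKeyValuePrefix": "raw-cern-opendata/",
}


def resolve_input(user_input: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Overlay user-supplied fields on the defaults; missing fields keep their default."""
    return {**DEFAULTS, **(user_input or {})}


def recid_windows(recid_max: int, recid_step: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) recid windows covering 1..recid_max (assumed layout)."""
    for start in range(1, recid_max + 1, recid_step):
        yield start, min(start + recid_step - 1, recid_max)
```

Under these assumptions the default settings produce 240 windows of 500 recids each, and a smaller recidStep trades more requests for fewer results per request.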

Output

Dataset (default)

Each record is a normalized object:

{
    "id": "12345",
    "title": "Some dataset",
    "type": "dataset",
    "experiment": "CMS",
    "availability": "open",
    "file_count": 4,
    "portal_url": "https://opendata.cern.ch/record/12345",
    "api_url": "https://opendata.cern.ch/api/records/12345",
    "created": "2020-01-01T00:00:00",
    "updated": "2020-05-01T00:00:00",
    "files": [{ "key": "...", "uri": "...", "size": 123, "availability": "...", "checksum": "...", "version_id": "...", "tags": [...] }], 
    "description": "...", 
    "keywords": [...], 
    "collections": [...], 
    "distribution": {...}, 
    "pids": {...}, 
    "publisher": "...", 
    "language": "...", 
    "run_period": "...", 
    "source": "sitemap|recid-range", 
    "bucket_url": "https://opendata.cern.ch/api/files/<bucket>"
}

The dataset's overview view shows: id, title, type, experiment, availability, file_count, portal_url, api_url, created, updated.
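A downstream consumer might reduce a record to the overview columns and total up its file sizes. The sketch below assumes records shaped like the example object above; `overview_row` and `total_file_size` are hypothetical helper names, and treating a missing or null size as 0 is an assumption.

```python
from typing import Any, Dict

# Columns listed for the overview view.
OVERVIEW_FIELDS = (
    "id", "title", "type", "experiment", "availability",
    "file_count", "portal_url", "api_url", "created", "updated",
)


def overview_row(record: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the overview columns; absent fields become None."""
    return {field: record.get(field) for field in OVERVIEW_FIELDS}


def total_file_size(record: Dict[str, Any]) -> int:
    """Sum the size of every entry in files, counting missing sizes as 0."""
    return sum(f.get("size") or 0 for f in record.get("files") or [])
```

Both helpers tolerate records without a files array, which may matter if some catalog entries carry no files.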
