Zenodo Scraper — Research Records, Datasets & Software

Pricing

Pay per usage

Scrape Zenodo.org (the CERN open research repository) for records, datasets, and software. Four modes: search with type and access filters, record details by DOI or ID, community browse, and recent submissions. Extracts titles, authors, DOIs, files, and usage stats. Uses the official Zenodo API with no authentication, at roughly 60 requests per minute.

Developer

OpenClaw Mara

Maintained by Community

Zenodo Scraper

Scrape the Zenodo open research repository: papers, datasets, software, presentations, posters, and more. Uses the official Zenodo REST API — no browser automation, no scraping hacks, just clean metadata.

What It Does

  • Search — full-text search across all Zenodo records, filterable by type, access right, and community
  • Record Details — fetch full metadata for specific records by numeric ID or DOI (e.g. 10.5281/zenodo.6050054)
  • Community — list all records in a specific Zenodo community (e.g. covid-19, biodiversity_literature_repository)
  • Recent — browse the newest uploads across Zenodo (optionally filtered by type)

Modes

search

Full-text search. Supports Zenodo's advanced query syntax, for example title:climate AND creators.name:smith.

Required: searchQuery

Optional: resourceType, accessRight, sortBy, maxResults
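
As a rough illustration of how a search input might translate into a request against Zenodo's public records endpoint, the sketch below builds the query URL without sending it. The mapping from the actor's input names (searchQuery, resourceType, sortBy) to the API's q, type, and sort parameters is an assumption and is not taken from the actor's source. build_search_url and its default page size are hypothetical.

```python
from urllib.parse import urlencode

# Public Zenodo records endpoint.
ZENODO_RECORDS_URL = "https://zenodo.org/api/records"


def build_search_url(search_query, resource_type=None, sort_by=None,
                     page=1, page_size=25):
    """Build a records search URL (hypothetical helper, assumed mapping)."""
    params = {"q": search_query, "page": page, "size": page_size}
    if resource_type:
        params["type"] = resource_type  # assumed: resourceType -> type
    if sort_by:
        params["sort"] = sort_by        # assumed: sortBy -> sort
    return f"{ZENODO_RECORDS_URL}?{urlencode(params)}"


url = build_search_url(
    "title:climate AND creators.name:smith",
    resource_type="dataset",
    sort_by="mostrecent",
)
print(url)
```

Note that urlencode percent-encodes the colons and turns spaces into plus signs, so the advanced query syntax can be passed as a plain string.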

record_details

Fetch full metadata for known records.

Required: recordIds — array of Zenodo IDs or DOIs
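
Because recordIds accepts both numeric IDs and DOIs, some normalization step is needed before a record can be looked up. The sketch below shows one way this could work, assuming Zenodo-minted DOIs follow the 10.5281/zenodo.<id> pattern seen in the example above. How the actor itself resolves DOIs is not documented here, and to_record_id is a hypothetical helper.

```python
import re

# Assumed shape of a Zenodo-minted DOI: 10.5281/zenodo.<numeric id>
_ZENODO_DOI = re.compile(r"^10\.5281/zenodo\.(\d+)$", re.IGNORECASE)


def to_record_id(identifier: str) -> str:
    """Return the numeric record ID for a Zenodo ID or Zenodo DOI string."""
    value = identifier.strip()
    if value.isdigit():
        return value  # already a numeric record ID
    match = _ZENODO_DOI.match(value)
    if match:
        return match.group(1)
    raise ValueError(f"not a Zenodo ID or Zenodo DOI: {identifier!r}")


print([to_record_id(x) for x in ["10.5281/zenodo.6050054", "7750637"]])
```

DOIs minted elsewhere do not carry the record ID and would need a separate lookup, which this sketch does not attempt.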

community

List records in a community.

Required: communityId (e.g. covid-19)

Optional: resourceType, accessRight, sortBy, maxResults

recent

Most recent uploads, optionally filtered by type.

Optional: resourceType, accessRight, maxResults
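
The required fields per mode listed above can be checked before starting a run. The following is a minimal sketch of such a check. REQUIRED_BY_MODE and missing_fields are hypothetical names, and the actor's own validation may behave differently.

```python
# Required input fields for each mode, as listed in this section.
REQUIRED_BY_MODE = {
    "search": ["searchQuery"],
    "record_details": ["recordIds"],
    "community": ["communityId"],
    "recent": [],
}


def missing_fields(actor_input: dict) -> list[str]:
    """Return the required fields that are absent or empty for the given mode."""
    mode = actor_input.get("mode")
    if mode not in REQUIRED_BY_MODE:
        raise ValueError(f"unknown mode: {mode!r}")
    return [name for name in REQUIRED_BY_MODE[mode] if not actor_input.get(name)]


print(missing_fields({"mode": "community", "maxResults": 200}))
```

An empty list means the input carries everything its mode requires.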

Output Fields

Each record includes:

  • id, doi, conceptrecid, conceptdoi
  • title, description (HTML stripped, first 2000 chars)
  • resourceType (publication, dataset, software, ...) and resourceSubtype
  • publicationDate, created, updated
  • accessRight, license, language, version
  • authors — array of { name, orcid, affiliation }
  • keywords, communities
  • relatedIdentifiers — DOIs, URLs, references
  • files — downloadable attachments with size, type, checksum, URL
  • stats — views, downloads, unique viewers
  • url (human-readable landing page) and apiUrl
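
The description field is described as HTML stripped and limited to the first 2000 characters. The sketch below shows one way to produce such a value with the standard library. It is an assumption about the approach, not the actor's actual code, and clean_description is a hypothetical name. Collapsing whitespace is an extra step added here for readability.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and ignores all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def clean_description(html_text: str, limit: int = 2000) -> str:
    """Strip HTML tags, collapse whitespace, and keep the first `limit` chars."""
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()
    text = " ".join("".join(parser.parts).split())
    return text[:limit]


print(clean_description("<p>Hello <b>world</b> &amp; more</p>"))
```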

Rate Limits

Zenodo allows ~60 requests/min unauthenticated. The actor sleeps ~1.2s between pages and backs off on HTTP 429.
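
The pacing described above can be sketched as a fixed pause between pages plus a retry loop on HTTP 429. The 1.2 second pause comes from the text. The exponential schedule (doubling from 2 seconds, up to 5 retries) is an assumption, since the source only says the actor backs off. fetch is a caller-supplied function returning (status_code, body), so the sketch stays independent of any HTTP library.

```python
import time


def fetch_with_backoff(fetch, url, max_retries=5, base_delay=2.0, sleep=time.sleep):
    """Call fetch(url), retrying on 429 with an assumed exponential backoff."""
    for attempt in range(max_retries + 1):
        status, body = fetch(url)
        if status != 429:
            return status, body
        if attempt < max_retries:
            sleep(base_delay * (2 ** attempt))  # 2s, 4s, 8s, ...
    raise RuntimeError(f"still rate limited after {max_retries} retries: {url}")


def fetch_pages(fetch, urls, page_delay=1.2, sleep=time.sleep):
    """Fetch pages in order, pausing page_delay seconds between requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(page_delay)  # stay under roughly 60 requests per minute
        results.append(fetch_with_backoff(fetch, url, sleep=sleep))
    return results
```

Passing sleep as a parameter makes the timing easy to test with a stub instead of real waits.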

Example Inputs

Search ML datasets:

{
  "mode": "search",
  "searchQuery": "machine learning benchmark",
  "resourceType": "dataset",
  "sortBy": "mostviewed",
  "maxResults": 100
}

Fetch specific records:

{
  "mode": "record_details",
  "recordIds": ["10.5281/zenodo.6050054", "7750637"]
}

COVID-19 community:

{
  "mode": "community",
  "communityId": "covid-19",
  "sortBy": "mostrecent",
  "maxResults": 200
}

License

MIT — use freely.