DataCite Metadata Scraper

Pricing

Pay per event

Comprehensive DataCite metadata scraper for extracting DOI metadata from DataCite API. Perfect for researchers, librarians, and data scientists who need automated access to scholarly publication metadata, research datasets, and digital object identifiers.

Developer

ParseForge

Maintained by Community

📚 DataCite Metadata Scraper

🚀 Collect scholarly DOI metadata and research dataset records in seconds. Filter by keyword, repository, publisher, resource type, and year. No coding, no DataCite account required.

🕒 Last updated: 2026-04-16 · 📊 10 fields · 📖 Millions of DOI records · 🔬 Academic and research data

The DataCite Metadata Scraper retrieves Digital Object Identifier (DOI) metadata from the DataCite registry, which indexes over 45 million DOIs across academic publications, research datasets, software, and other scholarly outputs. Each record includes the DOI, title, publisher, publication year, resource type, creation date, update date, and a resolvable URL. You can filter by keyword, specific DOI, repository (Zenodo, Dryad, Figshare, Dataverse), publisher, resource type, and publication year. Free users can collect up to 10 records per run, while paid users can retrieve up to 1,000,000.

Whether you are building a literature database for a systematic review, analyzing publication trends across institutions, tracking open data availability in your research field, or monitoring repository output over time, this tool replaces hours of manual DOI lookups with a single automated query. Results export to JSON, CSV, or Excel for immediate use in citation managers, bibliometric tools, or data analysis pipelines. The scraper handles pagination and rate limiting automatically, letting you focus on research instead of data collection.

| Target Audience | Use Cases |
| --- | --- |
| Academic Researchers | Build literature databases and track publications in specific fields |
| Research Librarians | Catalog DOI records and monitor repository output |
| Data Scientists | Analyze publication trends and research metadata at scale |
| Institutional Analysts | Track publication volume and output across departments |
| Science Policy Analysts | Study open data availability and repository growth |
| Bibliometric Researchers | Collect DOI metadata for citation and impact analysis |

📋 What the DataCite Metadata Scraper does

  • 📚 DOI records - retrieve the full Digital Object Identifier for each scholarly output, ready for citation or resolution
  • 🏷️ Titles - extract publication or dataset titles for cataloging and search
  • 📰 Publishers - capture the organization or institution that registered the DOI
  • 📅 Publication years - filter and sort by year to focus on recent research or historical trends
  • 🗂️ Resource types - classify records as datasets, articles, software, images, or other scholarly object types
  • 🔗 Resolvable URLs - get working DOI links that resolve to the full publication or dataset landing page

The scraper queries the DataCite REST API and iterates through paginated results using your specified filters. Each record is normalized with consistent field names and pushed to an Apify dataset in real time. You can look up a single DOI or search across the entire DataCite registry with keyword and faceted filters.
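
The normalization step described above can be sketched as follows. This is a minimal illustration, not the Actor's actual code: it assumes each API result is a JSON:API resource whose `attributes` object carries `doi`, `titles`, `publisher`, `publicationYear`, `types`, `created`, and `updated`, and it maps them onto the output field names listed in the schema. The function name `normalize_record` and the sample DOI are hypothetical.

```python
from typing import Any, Optional


def normalize_record(item: dict[str, Any]) -> dict[str, Any]:
    """Flatten one DataCite JSON:API resource into the Actor's output fields.

    Assumption: the attribute names below match the DataCite REST API
    response; the Actor's real mapping may differ.
    """
    attrs = item.get("attributes") or {}
    doi = attrs.get("doi") or item.get("id")

    # Titles arrive as a list of objects; keep the first title if there is one.
    titles = attrs.get("titles") or []
    title: Optional[str] = titles[0].get("title") if titles else None

    # The publisher may be a plain string or an object with a "name" key.
    publisher = attrs.get("publisher")
    if isinstance(publisher, dict):
        publisher = publisher.get("name")

    types = attrs.get("types") or {}
    return {
        "doi": doi,
        "doiUrl": f"https://doi.org/{doi}" if doi else None,
        "title": title,
        "publisher": publisher,
        "publicationYear": attrs.get("publicationYear"),
        "resourceType": types.get("resourceType"),
        "resourceTypeGeneral": types.get("resourceTypeGeneral"),
        "createdDate": attrs.get("created"),
        "updatedDate": attrs.get("updated"),
    }


# Hypothetical input, for illustration only (not a real DOI record).
sample = {
    "id": "10.1234/example",
    "attributes": {
        "doi": "10.1234/example",
        "titles": [{"title": "Example dataset"}],
        "publisher": {"name": "Example Repository"},
        "publicationYear": 2023,
        "types": {"resourceType": "Dataset", "resourceTypeGeneral": "Dataset"},
        "created": "2023-01-01T00:00:00Z",
        "updated": "2023-02-01T00:00:00Z",
    },
}
record = normalize_record(sample)
```

Missing attributes come back as `None` rather than raising, so a sparse record still produces a row with every field name present.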

💡 Why it matters: DataCite indexes DOIs from over 2,000 data centers worldwide. Manually searching and downloading metadata is tedious. This scraper gives you structured, filterable access to the registry in minutes.


🎬 Full Demo

🚧 Coming soon...


⚙️ Input

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| maxItems | integer | No | Maximum records to collect. Free: 10. Paid: up to 1,000,000. |
| query | string | No | Search term to find DOIs (e.g., "climate change", "machine learning"). |
| doi | string | No | Specific DOI to retrieve (e.g., 10.5281/zenodo.1234567). Returns only this record. |
| repositoryId | string | No | Filter by repository identifier (e.g., Zenodo, Dryad, Figshare). |
| publisher | string | No | Filter by publisher name. |
| resourceType | string | No | Filter by type: Dataset, Article, Software, Image, etc. |
| year | integer | No | Filter by publication year (4-digit, e.g., 2023). |
| sort | string | No | Sort order: by creation date, update date, or publication year. |
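
To make the input fields more concrete, here is one way they might translate into query parameters for the DataCite REST API's `/dois` endpoint. Treat this as a sketch under assumptions: the parameter names (`client-id`, `resource-type-id`, `published`, `sort`, `page[size]`) reflect the public DataCite API as best understood here, the publisher filter is assumed to go through a fielded query clause, and the Actor's real mapping is not documented on this page. `build_query_params` is a hypothetical helper.

```python
from typing import Any


def build_query_params(actor_input: dict[str, Any], page_size: int = 100) -> dict[str, str]:
    """Map Actor input fields to DataCite /dois query parameters (assumed names)."""
    params: dict[str, str] = {"page[size]": str(page_size)}

    # Combine the free-text query and the publisher filter into one query string.
    clauses = []
    if actor_input.get("query"):
        clauses.append(f"({actor_input['query']})")
    if actor_input.get("publisher"):
        # Assumption: publisher is matched with a fielded clause.
        clauses.append(f'publisher:"{actor_input["publisher"]}"')
    if clauses:
        params["query"] = " AND ".join(clauses)

    if actor_input.get("repositoryId"):
        # Assumption: repositories are addressed by their DataCite client ID.
        params["client-id"] = str(actor_input["repositoryId"])
    if actor_input.get("resourceType"):
        # Assumption: resource type IDs are lowercase (e.g. "dataset").
        params["resource-type-id"] = str(actor_input["resourceType"]).lower()
    if actor_input.get("year"):
        params["published"] = str(actor_input["year"])
    if actor_input.get("sort"):
        params["sort"] = str(actor_input["sort"])
    return params


params = build_query_params(
    {"query": "climate", "maxItems": 50, "resourceType": "Dataset", "year": 2023, "sort": "-created"}
)
```

Note that `maxItems` is deliberately left out of the parameters: it limits how many records the run collects overall, not how many a single page returns.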

Example 1: Climate research datasets

{
    "query": "climate",
    "maxItems": 50,
    "resourceType": "Dataset",
    "year": 2023,
    "sort": "-created"
}

Example 2: Look up a specific DOI

{
    "doi": "10.5281/zenodo.1234567",
    "maxItems": 1
}

⚠️ Good to Know: Free users are automatically limited to 10 items per run. When a specific DOI is provided, only that single record is returned. Leave the query field empty to browse all records with other filters applied.
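
The limits described above can be expressed as a small rule. The sketch below is illustrative only: the caps (10 for free users, 1,000,000 for paid) and the single-record DOI behavior come from this page, while the function name and the `default_items` fallback for a missing `maxItems` are assumptions.

```python
from typing import Any

FREE_LIMIT = 10
PAID_LIMIT = 1_000_000


def effective_max_items(
    actor_input: dict[str, Any], is_paid_user: bool, default_items: int = 10
) -> int:
    """Return how many records a run would collect under the documented limits.

    default_items is a hypothetical fallback used when maxItems is omitted;
    the Actor's real default is not stated here.
    """
    # A specific DOI always yields a single record.
    if actor_input.get("doi"):
        return 1

    cap = PAID_LIMIT if is_paid_user else FREE_LIMIT
    requested = actor_input.get("maxItems")
    if requested is None:
        requested = default_items
    # Clamp the request into the range 1..cap.
    return max(1, min(int(requested), cap))


free_run = effective_max_items({"query": "climate", "maxItems": 50}, is_paid_user=False)
paid_run = effective_max_items({"query": "climate", "maxItems": 50}, is_paid_user=True)
```

With these inputs a free run is cut to 10 records, while a paid run keeps the requested 50.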


📊 Output

🧾 Schema

| Emoji | Field | Type | Description |
| --- | --- | --- | --- |
| 📚 | doi | string | Digital Object Identifier for the record |
| 🔗 | doiUrl | string | Resolvable URL (https://doi.org/...) |
| 🏷️ | title | string | Title of the publication or dataset |
| 📰 | publisher | string | Organization that registered the DOI |
| 📅 | publicationYear | integer | Year of publication |
| 🗂️ | resourceType | string | Specific resource type (e.g., "Dataset") |
| 📊 | resourceTypeGeneral | string | General resource category |
| 🕐 | createdDate | string | Date the DOI was created in the registry |
| 🔄 | updatedDate | string | Date the record was last updated |
| ⚠️ | error | string | Error message if processing failed |
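
If you post-process the dataset yourself, the schema above is enough to separate failed records and write a CSV with a stable column order. The following is a minimal sketch using only the Python standard library; the helper names are hypothetical and the sample rows are made up for illustration.

```python
import csv
import io
from typing import Any

# Column order follows the schema table above.
FIELDS = [
    "doi", "doiUrl", "title", "publisher", "publicationYear",
    "resourceType", "resourceTypeGeneral", "createdDate", "updatedDate", "error",
]


def split_by_error(records: list[dict[str, Any]]) -> tuple[list[dict], list[dict]]:
    """Separate successful records from those carrying an error message."""
    ok = [r for r in records if not r.get("error")]
    failed = [r for r in records if r.get("error")]
    return ok, failed


def records_to_csv(records: list[dict[str, Any]]) -> str:
    """Serialize records to CSV text; missing fields become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up rows, for illustration only.
rows = [
    {"doi": "10.1234/a", "doiUrl": "https://doi.org/10.1234/a", "title": "A", "publicationYear": 2023},
    {"doi": "10.1234/b", "error": "lookup failed"},
]
ok_rows, failed_rows = split_by_error(rows)
csv_text = records_to_csv(ok_rows)
```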


✨ Why choose this Actor

| Feature | This Actor | Alternatives |
| --- | --- | --- |
| Repository-specific filtering (Zenodo, Dryad, Figshare) | Yes | No |
| Resource type filtering (dataset, article, software) | Yes | Limited |
| Publication year filtering | Yes | Yes |
| Publisher filtering | Yes | Rarely available |
| Single DOI lookup mode | Yes | Yes |
| Up to 1,000,000 records per run | Yes | Capped lower |
| Export to JSON, CSV, and Excel | Yes | JSON only |

📊 DataCite indexes over 45 million DOIs from 2,000+ data centers. This scraper lets you query the entire registry with keyword and faceted filters in a single run.


📈 How it compares to alternatives

| Capability | This Actor | Manual DOI Lookups | Generic API Scripts |
| --- | --- | --- | --- |
| Bulk metadata retrieval | Yes | One at a time | Requires coding |
| Faceted filtering (type, year, publisher, repo) | Yes | Limited | Manual implementation |
| Automatic pagination and rate limiting | Yes | N/A | Manual implementation |
| Scheduled recurring runs | Yes | No | Requires infrastructure |
| No coding required | Yes | Yes | No |
| Export to CSV, Excel, JSON | Yes | No | JSON only |

This scraper wraps the DataCite API with a user-friendly interface, automatic pagination, and built-in export options.


🚀 How to use

  1. Sign up - Create a free Apify account with $5 credit
  2. Find the Actor - Search for "DataCite Metadata Scraper" in the Apify Store
  3. Set your search criteria - Enter keywords, resource type, year, or a specific DOI
  4. Start the run - Click "Start" and watch results appear in real time
  5. Export your data - Download as JSON, CSV, or Excel from the dataset tab

🕒 Typical run time: 15 to 60 seconds for up to 100 records. Larger runs with 1,000+ records may take a few minutes depending on the query scope.


💼 Business use cases

Academic Research

  • Build literature databases for systematic reviews
  • Track publication output from specific repositories
  • Monitor new datasets in your research field
  • Collect DOI metadata for bibliometric analysis

Library and Information Science

  • Catalog DOI records across institutional repositories
  • Monitor open data availability by subject area
  • Track publisher output and growth over time
  • Build metadata indexes for discovery systems

Institutional Analytics

  • Track departmental publication and dataset output
  • Monitor which repositories your institution uses most
  • Analyze trends in resource types over time
  • Build reports on open data contributions by year

Science Policy

  • Study open data mandates and compliance rates
  • Track growth of data sharing across disciplines
  • Monitor repository adoption trends globally
  • Analyze the distribution of resource types by field
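
Several of the use cases above (output by year, trends in resource types) reduce to counting exported records by a pair of fields. A minimal sketch, assuming records shaped like the output schema; the function name and the sample rows are hypothetical.

```python
from collections import Counter
from typing import Any


def count_by_year_and_type(records: list[dict[str, Any]]) -> Counter:
    """Count records per (publicationYear, resourceTypeGeneral) pair."""
    counts: Counter = Counter()
    for record in records:
        # Skip rows that failed during processing.
        if record.get("error"):
            continue
        key = (record.get("publicationYear"), record.get("resourceTypeGeneral"))
        counts[key] += 1
    return counts


# Made-up rows, for illustration only.
rows = [
    {"doi": "10.1234/a", "publicationYear": 2022, "resourceTypeGeneral": "Dataset"},
    {"doi": "10.1234/b", "publicationYear": 2023, "resourceTypeGeneral": "Dataset"},
    {"doi": "10.1234/c", "publicationYear": 2023, "resourceTypeGeneral": "Software"},
    {"doi": "10.1234/d", "publicationYear": 2023, "resourceTypeGeneral": "Dataset"},
    {"doi": "10.1234/e", "error": "lookup failed"},
]
counts = count_by_year_and_type(rows)
```

The resulting counter can be sorted by key for a year-by-year report or passed to a plotting library.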


🌟 Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

  • Empirical datasets for papers, thesis work, and coursework
  • Longitudinal studies tracking changes across snapshots
  • Reproducible research with cited, versioned data pulls
  • Classroom exercises on data analysis and ethical scraping

🎨 Personal and creative

  • Side projects, portfolio demos, and indie app launches
  • Data visualizations, dashboards, and infographics
  • Content research for bloggers, YouTubers, and podcasters
  • Hobbyist collections and personal trackers

🤝 Non-profit and civic

  • Transparency reporting and accountability projects
  • Advocacy campaigns backed by public-interest data
  • Community-run databases for local issues
  • Investigative journalism on public records

🧪 Experimentation

  • Prototype AI and machine-learning pipelines with real data
  • Validate product-market hypotheses before engineering spend
  • Train small domain-specific models on niche corpora
  • Test dashboard concepts with live input


🔌 Automating DataCite Metadata Scraper

Node.js example:

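// Run as an ES module so that top-level await is available.
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start the Actor and wait for the run to finish.
const run = await client.actor('parseforge/datacite-metadata-scraper').call({
    query: 'climate change',
    maxItems: 100,
    resourceType: 'Dataset',
});

// Read the records the run pushed to its default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);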

Python example:

from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')

# Start the Actor and wait for the run to finish.
run = client.actor('parseforge/datacite-metadata-scraper').call(run_input={
    'query': 'climate change',
    'maxItems': 100,
    'resourceType': 'Dataset',
})

# Read the records the run pushed to its default dataset.
items = list(client.dataset(run['defaultDatasetId']).iterate_items())
print(items)

Schedules: Set up weekly or monthly runs to track new DOI registrations in your field. Combine with Google Sheets or Slack integrations to get notified when new records match your query.
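
For recurring runs, you usually only want to be notified about DOIs you have not seen before. One way to do that is to keep the DOIs from earlier runs and diff each new result set against them. The sketch below assumes records shaped like the output schema; `find_new_records` is a hypothetical helper and the sample values are made up. DOIs are compared case-insensitively, since DOI names are not case-sensitive.

```python
from typing import Any, Iterable


def find_new_records(
    previous_dois: Iterable[str], current_records: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Return records whose DOI did not appear in earlier runs."""
    seen = {doi.lower() for doi in previous_dois}
    new_records = []
    for record in current_records:
        doi = (record.get("doi") or "").lower()
        # Ignore rows without a DOI and duplicates within the same run.
        if doi and doi not in seen:
            seen.add(doi)
            new_records.append(record)
    return new_records


# Made-up values, for illustration only.
last_week = ["10.1234/A", "10.1234/b"]
this_week = [
    {"doi": "10.1234/a", "title": "Already seen"},
    {"doi": "10.1234/c", "title": "New record"},
    {"doi": "10.1234/C", "title": "Duplicate of the new record"},
]
fresh = find_new_records(last_week, this_week)
```

Only the `fresh` list then needs to be forwarded to Slack, Google Sheets, or whichever integration you use.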

🔌 Integrate with any app

  • Make - Automate DOI metadata workflows and sync with research databases
  • Zapier - Connect to 5,000+ apps and trigger actions on new DOI records
  • Slack - Get notifications when new publications match your query
  • Airbyte - Stream DOI metadata into your data warehouse
  • GitHub - Version control your scraper configurations
  • Google Drive - Export results directly to Google Sheets

| Actor | Description |
| --- | --- |
| Hugging Face Model Scraper | Collect model metadata and download stats from Hugging Face |
| PR Newswire Scraper | Collect press releases and research announcements |
| GSA eLibrary Scraper | Collect government contractor and vendor data |
| Greatschools Scraper | Extract school ratings and performance data |
| Smart Apify Actor Scraper | Scrape Apify actor metadata with 70+ fields |

💡 Pro Tip: Combine the DataCite Metadata Scraper with the Hugging Face Model Scraper to cross-reference published datasets with ML models trained on them.


🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue. We typically respond within 24 hours.


Disclaimer: This Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by DataCite, Zenodo, Dryad, Figshare, or any data center. All trademarks mentioned are the property of their respective owners.