DataCite Metadata Scraper
Pricing
Pay per event
Comprehensive DataCite metadata scraper for extracting DOI metadata from the DataCite API. Perfect for researchers, librarians, and data scientists who need automated access to scholarly publication metadata, research datasets, and digital object identifiers.
Developer
ParseForge
📚 DataCite Metadata Scraper
🚀 Collect scholarly DOI metadata and research dataset records in seconds. Filter by keyword, repository, publisher, resource type, and year. No coding, no DataCite account required.
🕒 Last updated: 2026-04-16 · 📊 10 fields · 📖 Millions of DOI records · 🔬 Academic and research data
The DataCite Metadata Scraper retrieves Digital Object Identifier (DOI) metadata from the DataCite registry, which indexes over 45 million DOIs across academic publications, research datasets, software, and other scholarly outputs. Each record includes the DOI, title, publisher, publication year, resource type, creation date, update date, and a resolvable URL. You can filter by keyword, specific DOI, repository (Zenodo, Dryad, Figshare, Dataverse), publisher, resource type, and publication year. Free users can collect up to 10 records per run, while paid users can retrieve up to 1,000,000.
Whether you are building a literature database for a systematic review, analyzing publication trends across institutions, tracking open data availability in your research field, or monitoring repository output over time, this tool replaces hours of manual DOI lookups with a single automated query. Results export to JSON, CSV, or Excel for immediate use in citation managers, bibliometric tools, or data analysis pipelines. The scraper handles pagination and rate limiting automatically, letting you focus on research instead of data collection.
| Target Audience | Use Cases |
|---|---|
| Academic Researchers | Build literature databases and track publications in specific fields |
| Research Librarians | Catalog DOI records and monitor repository output |
| Data Scientists | Analyze publication trends and research metadata at scale |
| Institutional Analysts | Track publication volume and output across departments |
| Science Policy Analysts | Study open data availability and repository growth |
| Bibliometric Researchers | Collect DOI metadata for citation and impact analysis |
📋 What the DataCite Metadata Scraper does
- 📚 DOI records - retrieve the full Digital Object Identifier for each scholarly output, ready for citation or resolution
- 🏷️ Titles - extract publication or dataset titles for cataloging and search
- 📰 Publishers - capture the organization or institution that registered the DOI
- 📅 Publication years - filter and sort by year to focus on recent research or historical trends
- 🗂️ Resource types - classify records as datasets, articles, software, images, or other scholarly object types
- 🔗 Resolvable URLs - get working DOI links that resolve to the full publication or dataset landing page
The scraper queries the DataCite REST API and iterates through paginated results using your specified filters. Each record is normalized with consistent field names and pushed to an Apify dataset in real time. You can look up a single DOI or search across the entire DataCite registry with keyword and faceted filters.
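The paginated loop described above can be sketched in Python. This is a minimal sketch, not the Actor's source: it assumes DataCite's JSON:API response shape (a `data` list per page plus a `links.next` URL for the following page), and `fetch_json` and `iter_datacite_records` are hypothetical helper names. The fetcher is passed in as a parameter so the loop can be exercised offline.

```python
import json
import urllib.request
from typing import Callable, Iterator, Optional


def fetch_json(url: str) -> dict:
    """Perform one HTTP GET and decode the JSON body (no retries here)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_datacite_records(
    first_url: str,
    max_items: int,
    fetch: Callable[[str], dict] = fetch_json,
) -> Iterator[dict]:
    """Yield raw records page by page until max_items or the last page.

    Assumes each page looks like {"data": [...], "links": {"next": "..."}},
    following DataCite's JSON:API response format.
    """
    url: Optional[str] = first_url
    yielded = 0
    while url and yielded < max_items:
        page = fetch(url)
        for record in page.get("data") or []:
            if yielded >= max_items:
                return
            yield record
            yielded += 1
        # No "next" link means this was the last page
        url = (page.get("links") or {}).get("next")
```

Because the loop only follows `links.next`, it works the same whether the first URL uses page numbers or, as we understand DataCite's documentation to recommend for very large result sets, cursor pagination (`page[cursor]=1`).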
💡 Why it matters: DataCite indexes DOIs from over 2,000 data centers worldwide. Manually searching and downloading metadata is tedious. This scraper gives you structured, filterable access to the registry in minutes.
🎬 Full Demo
🚧 Coming soon...
⚙️ Input
| Field | Type | Required | Description |
|---|---|---|---|
| maxItems | integer | No | Maximum records to collect. Free: 10. Paid: up to 1,000,000. |
| query | string | No | Search term to find DOIs (e.g., "climate change", "machine learning"). |
| doi | string | No | Specific DOI to retrieve (e.g., 10.5281/zenodo.1234567). Returns only this record. |
| repositoryId | string | No | Filter by repository identifier (e.g., Zenodo, Dryad, Figshare). |
| publisher | string | No | Filter by publisher name. |
| resourceType | string | No | Filter by type: Dataset, Article, Software, Image, etc. |
| year | integer | No | Filter by publication year (4-digit, e.g., 2023). |
| sort | string | No | Sort order: by creation date, update date, or publication year (e.g., "-created" for newest first). |
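As an illustration of how the input fields above might translate into a DataCite REST API request, here is a hypothetical `build_request_url` helper. The parameter names (`query`, `client-id`, `resource-type-id`, `published`, `sort`, `page[size]`) and the `/dois/{doi}` lookup path come from our reading of DataCite's public API documentation, not from this Actor's code. `publisher` is deliberately left out because this page does not say which DataCite parameter the Actor uses for it.

```python
from urllib.parse import quote, urlencode

DATACITE_API = "https://api.datacite.org/dois"  # public DataCite REST endpoint


def build_request_url(actor_input: dict, page_size: int = 100) -> str:
    """Translate Actor-style input into a DataCite REST API URL (a sketch)."""
    doi = actor_input.get("doi")
    if doi:
        # Single-DOI lookup: /dois/{doi}, keeping the "/" inside the DOI as-is
        return f"{DATACITE_API}/{quote(doi, safe='/')}"
    params = {}
    if actor_input.get("query"):
        params["query"] = actor_input["query"]
    if actor_input.get("repositoryId"):
        # Assumption: repositoryId is passed through as a DataCite client ID
        params["client-id"] = actor_input["repositoryId"]
    if actor_input.get("resourceType"):
        # Assumption: DataCite resource-type IDs are lowercase, e.g. "dataset"
        params["resource-type-id"] = actor_input["resourceType"].lower()
    if actor_input.get("year"):
        params["published"] = str(actor_input["year"])
    if actor_input.get("sort"):
        params["sort"] = actor_input["sort"]
    params["page[size]"] = str(page_size)
    return f"{DATACITE_API}?{urlencode(params)}"
```

For instance, the "Climate research datasets" input shown next would produce a URL containing `query=climate`, `resource-type-id=dataset`, `published=2023`, and `sort=-created`.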
Example 1: Climate research datasets
```json
{
  "query": "climate",
  "maxItems": 50,
  "resourceType": "Dataset",
  "year": 2023,
  "sort": "-created"
}
```
Example 2: Look up a specific DOI
```json
{
  "doi": "10.5281/zenodo.1234567",
  "maxItems": 1
}
```
⚠️ Good to Know: Free users are automatically limited to 10 items per run. When a specific DOI is provided, only that single record is returned. Leave the query field empty to browse all records with other filters applied.
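The limits in the note above can be expressed as a small, hypothetical helper. `effective_max_items`, the constant names, and the fallback used when `maxItems` is omitted are assumptions for illustration; only the 10-item free cap, the 1,000,000-item paid cap, and the single-record DOI mode come from this page.

```python
from typing import Optional

FREE_TIER_LIMIT = 10          # free users: at most 10 items per run
PAID_TIER_LIMIT = 1_000_000   # paid users: up to 1,000,000 items per run


def effective_max_items(actor_input: dict, is_paid: bool) -> int:
    """Hypothetical upper bound on how many records a run could return."""
    if actor_input.get("doi"):
        return 1  # a specific DOI returns only that single record
    cap = PAID_TIER_LIMIT if is_paid else FREE_TIER_LIMIT
    requested: Optional[int] = actor_input.get("maxItems")
    if requested is None:
        return cap  # assumption: omitting maxItems means "as many as allowed"
    return max(0, min(int(requested), cap))
```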
📊 Output
🧾 Schema
| Emoji | Field | Type | Description |
|---|---|---|---|
| 📚 | doi | string | Digital Object Identifier for the record |
| 🔗 | doiUrl | string | Resolvable URL (https://doi.org/...) |
| 🏷️ | title | string | Title of the publication or dataset |
| 📰 | publisher | string | Organization that registered the DOI |
| 📅 | publicationYear | integer | Year of publication |
| 🗂️ | resourceType | string | Specific resource type (e.g., "Dataset") |
| 📊 | resourceTypeGeneral | string | General resource category |
| 🕐 | createdDate | string | Date the DOI was created in the registry |
| 🔄 | updatedDate | string | Date the record was last updated |
| ⚠️ | error | string | Error message if processing failed |
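To show how a raw DataCite record could map onto the schema above, here is a hedged sketch. The source attribute names (`doi`, `titles`, `publisher`, `publicationYear`, `types`, `created`, `updated`) are assumed from DataCite's public JSON:API responses; the Actor's own mapping is not shown on this page, and `normalize_record` is a hypothetical name. Taking only the first title is a simplification.

```python
def normalize_record(raw: dict) -> dict:
    """Map one DataCite JSON:API record onto the output schema (a sketch)."""
    try:
        attrs = raw.get("attributes") or {}
        doi = attrs.get("doi") or raw.get("id")
        titles = attrs.get("titles") or []
        publisher = attrs.get("publisher")
        if isinstance(publisher, dict):
            # Some DataCite responses may return publisher as an object
            publisher = publisher.get("name")
        types = attrs.get("types") or {}
        return {
            "doi": doi,
            "doiUrl": f"https://doi.org/{doi}" if doi else None,
            "title": titles[0].get("title") if titles else None,
            "publisher": publisher,
            "publicationYear": attrs.get("publicationYear"),
            "resourceType": types.get("resourceType"),
            "resourceTypeGeneral": types.get("resourceTypeGeneral"),
            "createdDate": attrs.get("created"),
            "updatedDate": attrs.get("updated"),
        }
    except (AttributeError, IndexError, TypeError) as exc:
        # Fill the schema's "error" field instead of failing the whole run
        doi = raw.get("id") if isinstance(raw, dict) else None
        return {"doi": doi, "error": str(exc)}
```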
📦 Sample records
✨ Why choose this Actor
| Feature | This Actor | Alternatives |
|---|---|---|
| Repository-specific filtering (Zenodo, Dryad, Figshare) | Yes | No |
| Resource type filtering (dataset, article, software) | Yes | Limited |
| Publication year filtering | Yes | Yes |
| Publisher filtering | Yes | Rarely available |
| Single DOI lookup mode | Yes | Yes |
| Up to 1,000,000 records per run | Yes | Capped lower |
| Export to JSON, CSV, and Excel | Yes | JSON only |
📊 DataCite indexes over 45 million DOIs from 2,000+ data centers. This scraper lets you query the entire registry with keyword and faceted filters in a single run.
📈 How it compares to alternatives
| Capability | This Actor | Manual DOI Lookups | Generic API Scripts |
|---|---|---|---|
| Bulk metadata retrieval | Yes | One at a time | Requires coding |
| Faceted filtering (type, year, publisher, repo) | Yes | Limited | Manual implementation |
| Automatic pagination and rate limiting | Yes | N/A | Manual implementation |
| Scheduled recurring runs | Yes | No | Requires infrastructure |
| No coding required | Yes | Yes | No |
| Export to CSV, Excel, JSON | Yes | No | JSON only |
This scraper wraps the DataCite API with a user-friendly interface, automatic pagination, and built-in export options.
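The page does not say how the Actor implements rate limiting. A common approach, shown here as a hedged sketch, is to retry requests that fail with HTTP 429 (Too Many Requests) or a 5xx status using exponential backoff. `retry_with_backoff` is a hypothetical name, and the retry count and delays are arbitrary defaults.

```python
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on HTTP 429 or 5xx with exponentially growing waits."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    attempt = 0
    while True:
        try:
            return call()
        except urllib.error.HTTPError as err:
            retryable = err.code == 429 or 500 <= err.code < 600
            if not retryable or attempt >= max_retries:
                raise
            # Wait base_delay * 1, 2, 4, ... seconds before trying again
            sleep(base_delay * (2 ** attempt))
            attempt += 1
```

Any zero-argument callable that performs one HTTP request can be wrapped this way. A production version would typically also honor a `Retry-After` header when the server sends one.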
🚀 How to use
- Sign up - Create a free Apify account with $5 credit
- Find the Actor - Search for "DataCite Metadata Scraper" in the Apify Store
- Set your search criteria - Enter keywords, resource type, year, or a specific DOI
- Start the run - Click "Start" and watch results appear in real time
- Export your data - Download as JSON, CSV, or Excel from the dataset tab
🕒 Typical run time: 15 to 60 seconds for up to 100 records. Larger runs with 1,000+ records may take a few minutes depending on the query scope.
💼 Business use cases
- Academic Research
- Library and Information Science
- Institutional Analytics
- Science Policy
🌟 Beyond business use cases
Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.
🔌 Automating DataCite Metadata Scraper
Node.js example:
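```javascript
// Run as an ES module (e.g., a .mjs file) so top-level await works
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start the Actor and wait for the run to finish
const run = await client.actor('parseforge/datacite-metadata-scraper').call({
    query: 'climate change',
    maxItems: 100,
    resourceType: 'Dataset',
});

// Fetch the results from the run's default dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```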
Python example:
```python
from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')

# Start the Actor and wait for the run to finish
run = client.actor('parseforge/datacite-metadata-scraper').call(run_input={
    'query': 'climate change',
    'maxItems': 100,
    'resourceType': 'Dataset',
})

# Iterate over every item in the run's default dataset
items = list(client.dataset(run['defaultDatasetId']).iterate_items())
print(items)
```
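For the publication-trend use cases mentioned earlier, the `items` list returned above can be tallied per year with the standard library. `count_by_year` is a hypothetical helper that assumes each item follows the output schema (integer `publicationYear`, optional `error`).

```python
from collections import Counter
from typing import Dict, Iterable


def count_by_year(items: Iterable[dict]) -> Dict[int, int]:
    """Count records per publicationYear, skipping error rows and missing years."""
    counts = Counter(
        item["publicationYear"]
        for item in items
        if not item.get("error") and item.get("publicationYear") is not None
    )
    # Return years in ascending order for easy plotting or printing
    return dict(sorted(counts.items()))
```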
Schedules: Set up weekly or monthly runs to track new DOI registrations in your field. Combine with Google Sheets or Slack integrations to get notified when new records match your query.
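To act only on newly registered DOIs between scheduled runs, one option is to compare the latest run's DOIs against the previous run's. `new_since_last_run` is a hypothetical helper, and how you persist the previous run's items (a file, a key-value store, a spreadsheet) is left open. DOI names are case-insensitive, so the comparison lowercases them.

```python
from typing import Iterable, List


def new_since_last_run(previous: Iterable[dict], current: Iterable[dict]) -> List[dict]:
    """Return items from `current` whose DOI did not appear in `previous`."""
    seen = {item["doi"].lower() for item in previous if item.get("doi")}
    fresh = []
    for item in current:
        doi = item.get("doi")
        if doi and doi.lower() not in seen:
            fresh.append(item)
            seen.add(doi.lower())  # also drop duplicates within the current run
    return fresh
```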
🔌 Integrate with any app
- Make - Automate DOI metadata workflows and sync with research databases
- Zapier - Connect to 5,000+ apps and trigger actions on new DOI records
- Slack - Get notifications when new publications match your query
- Airbyte - Stream DOI metadata into your data warehouse
- GitHub - Version control your scraper configurations
- Google Drive - Export results directly to Google Sheets
🔗 Recommended Actors
| Actor | Description |
|---|---|
| Hugging Face Model Scraper | Collect model metadata and download stats from Hugging Face |
| PR Newswire Scraper | Collect press releases and research announcements |
| GSA eLibrary Scraper | Collect government contractor and vendor data |
| Greatschools Scraper | Extract school ratings and performance data |
| Smart Apify Actor Scraper | Scrape Apify actor metadata with 70+ fields |
💡 Pro Tip: Combine the DataCite Metadata Scraper with the Hugging Face Model Scraper to cross-reference published datasets with ML models trained on them.
🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue. We typically respond within 24 hours.
Disclaimer: This Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by DataCite, Zenodo, Dryad, Figshare, or any data center. All trademarks mentioned are the property of their respective owners.