DataCite Metadata Scraper
Pricing
Pay per event
Comprehensive DataCite metadata scraper for extracting DOI metadata from the DataCite API. Built for researchers, librarians, and data scientists who need automated access to scholarly publication metadata, research datasets, and digital object identifiers.
Developer
ParseForge
Maintained by Community
📚 DataCite Metadata Scraper
🚀 Collect scholarly DOI metadata and research dataset records in seconds. Filter by keyword, repository, publisher, resource type, and year. No coding, no DataCite account required.
🕒 Last updated: 2026-04-16 · 📊 10 fields · 📖 Millions of DOI records · 🔬 Academic and research data
The DataCite Metadata Scraper retrieves Digital Object Identifier (DOI) metadata from the DataCite registry, which indexes over 45 million DOIs across academic publications, research datasets, software, and other scholarly outputs. Each record includes the DOI, title, publisher, publication year, resource type, creation date, update date, and a resolvable URL. You can filter by keyword, specific DOI, repository (Zenodo, Dryad, Figshare, Dataverse), publisher, resource type, and publication year. Free users can collect up to 10 records per run, while paid users can retrieve up to 1,000,000.
Whether you are building a literature database for a systematic review, analyzing publication trends across institutions, tracking open data availability in your research field, or monitoring repository output over time, this tool replaces hours of manual DOI lookups with a single automated query. Results export to JSON, CSV, or Excel for immediate use in citation managers, bibliometric tools, or data analysis pipelines. The scraper handles pagination and rate limiting automatically, letting you focus on research instead of data collection.
| Target Audience | Use Cases |
|---|---|
| Academic Researchers | Build literature databases and track publications in specific fields |
| Research Librarians | Catalog DOI records and monitor repository output |
| Data Scientists | Analyze publication trends and research metadata at scale |
| Institutional Analysts | Track publication volume and output across departments |
| Science Policy Analysts | Study open data availability and repository growth |
| Bibliometric Researchers | Collect DOI metadata for citation and impact analysis |
📋 What the DataCite Metadata Scraper does
- 📚 DOI records - retrieve the full Digital Object Identifier for each scholarly output, ready for citation or resolution
- 🏷️ Titles - extract publication or dataset titles for cataloging and search
- 📰 Publishers - capture the organization or institution that registered the DOI
- 📅 Publication years - filter and sort by year to focus on recent research or historical trends
- 🗂️ Resource types - classify records as datasets, articles, software, images, or other scholarly object types
- 🔗 Resolvable URLs - get working DOI links that resolve to the full publication or dataset landing page
The scraper queries the DataCite REST API and iterates through paginated results using your specified filters. Each record is normalized with consistent field names and pushed to an Apify dataset in real time. You can look up a single DOI or search across the entire DataCite registry with keyword and faceted filters.
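The pagination loop described above can be sketched roughly as follows. This is a minimal illustration, not the Actor's actual source: `iter_records` and `fetch_page` are hypothetical names, and the sketch assumes each response page carries its records under `data` and a JSON:API style `links.next` URL for the following page. The page fetcher is injected so the loop can be tried offline.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_records(
    fetch_page: Callable[[str], Page],
    start_url: str,
    max_items: int,
) -> Iterator[Dict[str, Any]]:
    """Yield raw records page by page until max_items is reached or no next link remains."""
    url: Optional[str] = start_url
    yielded = 0
    while url and yielded < max_items:
        page = fetch_page(url)
        for record in page.get("data", []):
            if yielded >= max_items:
                return
            yield record
            yielded += 1
        # Assumption: the next page URL is exposed under "links.next".
        url = page.get("links", {}).get("next")


# Offline demo: two fake pages stand in for real HTTP responses (made-up DOIs).
FAKE_PAGES: Dict[str, Page] = {
    "page-1": {"data": [{"id": "10.1234/a"}, {"id": "10.1234/b"}], "links": {"next": "page-2"}},
    "page-2": {"data": [{"id": "10.1234/c"}], "links": {}},
}

for record in iter_records(FAKE_PAGES.__getitem__, "page-1", max_items=2):
    print(record["id"])
```

In a real client, `fetch_page` would wrap an HTTP GET plus whatever retry or rate-limit handling is needed; keeping it separate from the loop makes the stop condition easy to test.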
💡 Why it matters: DataCite indexes DOIs from over 2,000 data centers worldwide. Manually searching and downloading metadata is tedious. This scraper gives you structured, filterable access to the registry in minutes.
🎬 Full Demo
🚧 Coming soon...
⚙️ Input
| Field | Type | Required | Description |
|---|---|---|---|
| maxItems | integer | No | Maximum records to collect. Free: 10. Paid: up to 1,000,000. |
| query | string | No | Search term to find DOIs (e.g., "climate change", "machine learning"). |
| doi | string | No | Specific DOI to retrieve (e.g., 10.5281/zenodo.1234567). Returns only this record. |
| repositoryId | string | No | Filter by repository identifier (e.g., Zenodo, Dryad, Figshare). |
| publisher | string | No | Filter by publisher name. |
| resourceType | string | No | Filter by type: Dataset, Article, Software, Image, etc. |
| year | integer | No | Filter by publication year (4-digit, e.g., 2023). |
| sort | string | No | Sort order: by creation date, update date, or publication year. |
Example 1: Climate research datasets

    {
      "query": "climate",
      "maxItems": 50,
      "resourceType": "Dataset",
      "year": 2023,
      "sort": "-created"
    }

Example 2: Look up a specific DOI

    {
      "doi": "10.5281/zenodo.1234567",
      "maxItems": 1
    }

⚠️ Good to Know: Free users are automatically limited to 10 items per run. When a specific DOI is provided, only that single record is returned. Leave the query field empty to browse all records with other filters applied.
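If you build input objects programmatically, a small pre-flight check can catch mistakes before a run starts. The sketch below only mirrors the rules stated in the table and note above. `prepare_input` is a hypothetical helper, not part of the Actor, and two behaviors are assumptions: a missing `maxItems` defaults to the plan limit, and other filters are dropped when a `doi` is given.

```python
from typing import Any, Dict

FREE_LIMIT = 10          # free users: 10 items per run (from the note above)
PAID_LIMIT = 1_000_000   # documented upper bound for paid users


def prepare_input(raw: Dict[str, Any], paid: bool = False) -> Dict[str, Any]:
    """Return a cleaned copy of an input object that follows the documented rules."""
    # Drop empty fields so they are not sent as filters.
    cleaned = {k: v for k, v in raw.items() if v not in (None, "")}

    # Clamp maxItems to the plan limit (assumed default: the limit itself).
    limit = PAID_LIMIT if paid else FREE_LIMIT
    requested = int(cleaned.get("maxItems", limit))
    if requested < 1:
        raise ValueError("maxItems must be at least 1")
    cleaned["maxItems"] = min(requested, limit)

    # The table asks for a 4-digit publication year.
    year = cleaned.get("year")
    if year is not None and not (1000 <= int(year) <= 9999):
        raise ValueError("year must be a 4-digit number")

    # A specific DOI returns a single record; dropping other filters is an assumption.
    if "doi" in cleaned:
        return {"doi": cleaned["doi"], "maxItems": 1}
    return cleaned


print(prepare_input({"query": "climate", "maxItems": 50, "year": 2023}))
```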
📊 Output
🧾 Schema
| Emoji | Field | Type | Description |
|---|---|---|---|
| 📚 | doi | string | Digital Object Identifier for the record |
| 🔗 | doiUrl | string | Resolvable URL (https://doi.org/...) |
| 🏷️ | title | string | Title of the publication or dataset |
| 📰 | publisher | string | Organization that registered the DOI |
| 📅 | publicationYear | integer | Year of publication |
| 🗂️ | resourceType | string | Specific resource type (e.g., "Dataset") |
| 📊 | resourceTypeGeneral | string | General resource category |
| 🕐 | createdDate | string | Date the DOI was created in the registry |
| 🔄 | updatedDate | string | Date the record was last updated |
| ⚠️ | error | string | Error message if processing failed |
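To make the schema concrete, here is a rough sketch of how a raw DataCite API record could be mapped onto these ten fields. It is an illustration, not the Actor's implementation: `normalize` is a hypothetical function, the mapping from DataCite attributes (`titles`, `types`, `created`, `updated`) is an assumption about how the fields are derived, and the sample values are made up.

```python
from typing import Any, Dict


def normalize(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Map one raw DataCite record onto the flat output schema, or return an error row."""
    attrs = raw.get("attributes", {})
    try:
        doi = attrs["doi"]
        titles = attrs.get("titles") or []
        types = attrs.get("types") or {}
        return {
            "doi": doi,
            "doiUrl": f"https://doi.org/{doi}",
            # Assumption: the first listed title is used as the record title.
            "title": titles[0].get("title") if titles else None,
            "publisher": attrs.get("publisher"),
            "publicationYear": attrs.get("publicationYear"),
            "resourceType": types.get("resourceType"),
            "resourceTypeGeneral": types.get("resourceTypeGeneral"),
            "createdDate": attrs.get("created"),
            "updatedDate": attrs.get("updated"),
        }
    except (KeyError, AttributeError, TypeError) as exc:
        # Records that cannot be mapped get an error field instead of being dropped.
        return {"doi": raw.get("id"), "error": f"normalization failed: {exc!r}"}


# Made-up sample record for illustration only.
sample = {
    "id": "10.5281/zenodo.1234567",
    "attributes": {
        "doi": "10.5281/zenodo.1234567",
        "titles": [{"title": "Example dataset"}],
        "publisher": "Zenodo",
        "publicationYear": 2023,
        "types": {"resourceType": "Dataset", "resourceTypeGeneral": "Dataset"},
        "created": "2023-01-01T00:00:00Z",
        "updated": "2023-01-02T00:00:00Z",
    },
}
print(normalize(sample))
```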
📦 Sample records
✨ Why choose this Actor
| Feature | This Actor | Alternatives |
|---|---|---|
| Repository-specific filtering (Zenodo, Dryad, Figshare) | Yes | No |
| Resource type filtering (dataset, article, software) | Yes | Limited |
| Publication year filtering | Yes | Yes |
| Publisher filtering | Yes | Rarely available |
| Single DOI lookup mode | Yes | Yes |
| Up to 1,000,000 records per run | Yes | Capped lower |
| Export to JSON, CSV, and Excel | Yes | JSON only |
📊 DataCite indexes over 45 million DOIs from 2,000+ data centers. This scraper lets you query the entire registry with keyword and faceted filters in a single run.
📈 How it compares to alternatives
| Capability | This Actor | Manual DOI Lookups | Generic API Scripts |
|---|---|---|---|
| Bulk metadata retrieval | Yes | One at a time | Requires coding |
| Faceted filtering (type, year, publisher, repo) | Yes | Limited | Manual implementation |
| Automatic pagination and rate limiting | Yes | N/A | Manual implementation |
| Scheduled recurring runs | Yes | No | Requires infrastructure |
| No coding required | Yes | Yes | No |
| Export to CSV, Excel, JSON | Yes | No | JSON only |
This scraper wraps the DataCite API with a user-friendly interface, automatic pagination, and built-in export options.
🚀 How to use
- Sign up - Create a free Apify account with $5 credit
- Find the Actor - Search for "DataCite Metadata Scraper" in the Apify Store
- Set your search criteria - Enter keywords, resource type, year, or a specific DOI
- Start the run - Click "Start" and watch results appear in real time
- Export your data - Download as JSON, CSV, or Excel from the dataset tab
🕒 Typical run time: 15 to 60 seconds for up to 100 records. Larger runs with 1,000+ records may take a few minutes depending on the query scope.
💼 Business use cases
- Academic Research
- Library and Information Science
- Institutional Analytics
- Science Policy
🌟 Beyond business use cases
Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.
💰 How much does it cost?
Apify gives you $5 in free monthly credits on the Apify Free plan, enough to test DataCite Metadata Scraper and pull a real sample dataset. For ongoing usage:
- Starter plan ($49/month) - Recommended for individuals running DataCite Metadata Scraper regularly. Includes higher concurrency and larger datasets.
- Scale plan ($499/month) - Recommended for teams running DataCite Metadata Scraper at production scale.
Pay-Per-Event pricing means you only pay for what you actually use. Failed runs are never charged. See the Pricing tab on this Actor's page for exact event prices.
💡 Tips for using DataCite Metadata Scraper
- Start with a small maxItems (3-10) to validate output format before running larger jobs.
- Use Apify Schedules to run DataCite Metadata Scraper on a recurring basis and keep your dataset fresh.
- Export via Integrations: Apify connects to Google Sheets, Airbyte, Make, Zapier, and direct webhooks, so you can pipe your data anywhere.
- Monitor with webhooks: trigger downstream workflows the moment a run finishes.
- Re-run failed items: if any individual records error out, re-run with their inputs only. Failed events are not charged.
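The re-run tip can be sketched with the `error` field from the output schema. This is a minimal example under one assumption: a failed record still carries its `doi`, so it can be retried through the single-DOI lookup input. `split_by_error` and `retry_inputs` are hypothetical helper names.

```python
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def split_by_error(items: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Separate successful records from those that carry an error message."""
    ok: List[Record] = []
    failed: List[Record] = []
    for item in items:
        (failed if item.get("error") else ok).append(item)
    return ok, failed


def retry_inputs(failed: Iterable[Record]) -> List[Record]:
    """Build one single-DOI input per failed record (assumes the DOI is present)."""
    return [{"doi": item["doi"], "maxItems": 1} for item in failed if item.get("doi")]


# Made-up items for illustration only.
items = [
    {"doi": "10.1234/a", "title": "First record"},
    {"doi": "10.1234/b", "error": "processing failed"},
]
ok, failed = split_by_error(items)
print(len(ok), "ok;", "retry with:", retry_inputs(failed))
```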
⚖️ Is it legal to use DataCite Metadata Scraper?
Generally, yes. DataCite Metadata Scraper collects only publicly available metadata. In hiQ Labs v. LinkedIn, the US Ninth Circuit held that scraping publicly accessible data likely does not violate the Computer Fraud and Abuse Act, and public data collection is widely used for research, market analysis, and business intelligence.
However, you are responsible for:
- Respecting the source website's Terms of Service.
- Complying with GDPR, CCPA, and other applicable data-protection laws when personal data is involved.
- Not republishing copyrighted content without permission.
If you have specific compliance concerns, consult your legal team. See the Apify legal docs for more.
❓ Frequently Asked Questions
🔌 Automating DataCite Metadata Scraper
Node.js example:

    import { ApifyClient } from 'apify-client';

    const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

    // Start the Actor and wait for the run to finish.
    const run = await client.actor('parseforge/datacite-metadata-scraper').call({
        query: 'climate change',
        maxItems: 100,
        resourceType: 'Dataset',
    });

    // Read the records the run pushed to its default dataset.
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    console.log(items);

Python example:

    from apify_client import ApifyClient

    client = ApifyClient('YOUR_API_TOKEN')

    # Start the Actor and wait for the run to finish.
    run = client.actor('parseforge/datacite-metadata-scraper').call(run_input={
        'query': 'climate change',
        'maxItems': 100,
        'resourceType': 'Dataset',
    })

    # Collect the records from the run's default dataset.
    items = list(client.dataset(run['defaultDatasetId']).iterate_items())
    print(items)

Schedules: Set up weekly or monthly runs to track new DOI registrations in your field. Combine with Google Sheets or Slack integrations to get notified when new records match your query.
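For scheduled runs, spotting what is new usually means comparing the latest dataset against the previous one. The sketch below does this by DOI and is only an illustration: `new_records` is a hypothetical helper, and it assumes you have already loaded the items of both runs as lists of dictionaries. DOIs are compared in lowercase because DOI names are case-insensitive.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def new_records(previous: Iterable[Record], current: Iterable[Record]) -> List[Record]:
    """Return records from the current run whose DOI did not appear in the previous run."""
    seen = {item["doi"].lower() for item in previous if item.get("doi")}
    return [
        item
        for item in current
        if item.get("doi") and item["doi"].lower() not in seen
    ]


# Made-up items for illustration only.
last_week = [{"doi": "10.1234/A"}, {"doi": "10.1234/b"}]
this_week = [{"doi": "10.1234/a"}, {"doi": "10.1234/b"}, {"doi": "10.1234/c"}]

for record in new_records(last_week, this_week):
    print("new:", record["doi"])
```

The resulting list is what you would forward to a Slack or Google Sheets integration, rather than the full dataset.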
🔌 Integrate with any app
- Make - Automate DOI metadata workflows and sync with research databases
- Zapier - Connect to 5,000+ apps and trigger actions on new DOI records
- Slack - Get notifications when new publications match your query
- Airbyte - Stream DOI metadata into your data warehouse
- GitHub - Version control your scraper configurations
- Google Drive - Export results directly to Google Sheets
🔗 Recommended Actors
| Actor | Description |
|---|---|
| Hugging Face Model Scraper | Collect model metadata and download stats from Hugging Face |
| PR Newswire Scraper | Collect press releases and research announcements |
| GSA eLibrary Scraper | Collect government contractor and vendor data |
| Greatschools Scraper | Extract school ratings and performance data |
| Smart Apify Actor Scraper | Scrape Apify actor metadata with 70+ fields |
💡 Pro Tip: Combine the DataCite Metadata Scraper with the Hugging Face Model Scraper to cross-reference published datasets with ML models trained on them.
🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue. We typically respond within 24 hours.
Disclaimer: This Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by DataCite, Zenodo, Dryad, Figshare, or any data center. All trademarks mentioned are the property of their respective owners.