DataCite Metadata Scraper
Pricing
Pay per event
A DataCite metadata scraper that extracts DOI metadata from the DataCite API. It is aimed at researchers, librarians, and data scientists who need automated access to scholarly publication metadata, research datasets, and digital object identifiers.
Developer: ParseForge
🚀 Effortlessly collect comprehensive DOI metadata from DataCite with our advanced data collection tool.
Designed for researchers, librarians, data scientists, and academic professionals, this tool extracts detailed metadata from DataCite—the leading provider of Digital Object Identifiers (DOIs) for research publications, datasets, and scholarly works. Get critical information like publication details, creators, publishers, and more, all with no coding required.
Target Audience: Researchers, librarians, data scientists, academic professionals, publishers
Primary Use Cases: Research metadata collection, bibliographic analysis, dataset discovery, publication tracking
What Does DataCite Metadata Scraper Do?
This tool collects DOI metadata from the DataCite API, supporting search queries, specific DOI lookups, and advanced filtering. It delivers:
- Complete publication metadata (titles, creators, publishers)
- Publication years and resource types
- Detailed creator and contributor information
- Subject classifications and descriptions
- Related identifiers and funding references
- Geographic locations and dates
- And more
Business Value: Build comprehensive research databases, track publication trends, discover datasets, and analyze scholarly output with structured, machine-readable metadata.
Input
To start DataCite metadata collection, simply fill in the input form. You can collect DOI metadata based on:
- Search Query - Enter keywords to find DOIs (e.g., "climate change", "machine learning")
- DOI - Retrieve metadata for a specific Digital Object Identifier (e.g., 10.5281/zenodo.1234567)
- Repository ID - Filter by repository (e.g., zenodo, dryad)
- Publisher - Filter by publisher name
- Resource Type - Filter by type (e.g., Dataset, Software, Article)
- Year - Filter by publication year (e.g., 2023)
- Sort - Sort results (e.g., created, -created, updated, -updated)
- Max Items - Set the maximum number of DOIs to collect. Free users: Limited to 100. Paid users: Optional, max 1,000,000. Leave empty for unlimited (paid users only).
Here's what the input configuration looks like in JSON:
{
  "query": "climate",
  "maxItems": 10
}
Pro Tip: Combine multiple filters (query + repository + year) for more targeted results.
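To see how combined filters could translate into a single request against DataCite's public REST API, here is a minimal sketch. The mapping from the Actor's input fields to query parameters is an assumption made for illustration, since the Actor's internals are not described here. The parameter names (`query`, `client-id`, `resource-type-id`, `published`, `sort`, `page[size]`) are those of the public DataCite `/dois` endpoint, and `build_datacite_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

DATACITE_DOIS_ENDPOINT = "https://api.datacite.org/dois"


def build_datacite_url(query=None, repository_id=None, resource_type=None,
                       year=None, sort=None, page_size=25):
    """Build a DataCite /dois search URL from optional filters (illustrative)."""
    params = {}
    if query:
        params["query"] = query
    if repository_id:
        # Assumption: the Repository ID filter corresponds to DataCite's client-id.
        params["client-id"] = repository_id
    if resource_type:
        # Assumption: resource types are passed in lowercase form, e.g. "dataset".
        params["resource-type-id"] = resource_type.lower()
    if year:
        params["published"] = str(year)
    if sort:
        params["sort"] = sort
    params["page[size]"] = str(page_size)
    # urlencode percent-encodes the square brackets in "page[size]".
    return f"{DATACITE_DOIS_ENDPOINT}?{urlencode(params)}"


url = build_datacite_url(
    query="climate",
    repository_id="cern.zenodo",
    year=2023,
    sort="-created",
    page_size=10,
)
print(url)
```

Filters left as `None` are omitted from the URL, so the same helper covers both broad keyword searches and narrow, multi-filter ones.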
Output
After the Actor finishes its run, you'll get a dataset with the output. The size of the dataset depends on the number of results you requested. You can download the results as an Excel, HTML, XML, JSON, or CSV file.
Here's an example of scraped DataCite metadata you'll get if you decide to search for "climate":
{
  "doi": "10.25810/vc6g-1m45",
  "doiUrl": "https://doi.org/10.25810/vc6g-1m45",
  "title": "Media and Climate Change Observatory Monthly Summary: Moral failure and deadly negligence - Issue 107, November 2025",
  "creators": [
    {
      "name": "Boykoff, Max",
      "nameType": "Personal",
      "givenName": "Max",
      "familyName": "Boykoff",
      "nameIdentifiers": [],
      "affiliation": []
    }
  ],
  "publisher": "Center for Science and Technology Policy Research, Cooperative Institute for Research in Environmental Sciences, University of Colorado Boulder",
  "publicationYear": 2025,
  "resourceType": "Report",
  "resourceTypeGeneral": "Text",
  "url": "https://scholar.colorado.edu/concern/articles/7w62fb11r",
  "created": "2025-12-12T15:54:51Z",
  "updated": "2025-12-12T15:54:51Z",
  "subjects": [
    {
      "subject": "Climate Change",
      "subjectScheme": null,
      "valueUri": null,
      "lang": null
    }
  ],
  "descriptions": [
    {
      "description": "November media coverage of climate change or global warming in newspapers around the globe rose 23% from October 2025.",
      "descriptionType": "Abstract",
      "lang": null
    }
  ],
  "scrapedTimestamp": "2025-12-12T16:06:25.086Z"
}
What You Get: Complete DOI metadata including titles, creators, publishers, publication years, resource types, subjects, descriptions, and comprehensive bibliographic information
Download Options: CSV, Excel, or JSON formats for easy analysis in your research tools
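Because fields such as `creators` and `subjects` are nested lists, a JSON export usually needs flattening before it fits a spreadsheet. The sketch below shows one possible approach using only the field names visible in the sample record. `flatten_record` is a hypothetical helper, and joining multiple values with `"; "` is an arbitrary choice.

```python
import csv
import io


def flatten_record(record):
    """Reduce one output record to a flat dict suitable for a CSV row."""
    # "or []" guards against a field that is present but null.
    creators = "; ".join(c.get("name", "") for c in record.get("creators") or [])
    subjects = "; ".join(s.get("subject", "") for s in record.get("subjects") or [])
    return {
        "doi": record.get("doi"),
        "title": record.get("title"),
        "creators": creators,
        "publisher": record.get("publisher"),
        "publicationYear": record.get("publicationYear"),
        "resourceTypeGeneral": record.get("resourceTypeGeneral"),
        "subjects": subjects,
    }


# A subset of the fields from the sample output record.
sample = {
    "doi": "10.25810/vc6g-1m45",
    "title": "Media and Climate Change Observatory Monthly Summary: "
             "Moral failure and deadly negligence - Issue 107, November 2025",
    "creators": [{"name": "Boykoff, Max", "nameType": "Personal"}],
    "publicationYear": 2025,
    "resourceTypeGeneral": "Text",
    "subjects": [{"subject": "Climate Change", "subjectScheme": None}],
}

row = flatten_record(sample)

# Write the flattened row to an in-memory CSV to show the resulting layout.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row.keys()))
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```

Fields missing from a record (here, `publisher`) become empty cells rather than raising an error, which matters because not every DOI carries every metadata field.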
Why Choose the DataCite Metadata Scraper?
- Comprehensive Metadata: Extract all available DOI metadata fields in a single run
- Advanced Filtering: Use multiple filters (repository, publisher, resource type, year) to target specific research areas
- Real-Time Data: Access the latest DOI metadata directly from the DataCite API
- Scalable Collection: Process from 10 to 1,000,000+ DOIs efficiently
- User-Friendly: No coding needed—just configure filters and go
- Structured Output: Get machine-readable JSON data ready for analysis
Time Savings: Save hours of manual research compared to browsing individual DOI pages
Efficiency: Collect thousands of DOIs in minutes instead of days
How to Use
- Sign Up: Create a free account with $5 credit (takes 2 minutes)
- Find the Scraper: Visit the DataCite Metadata Scraper page
- Set Input: Add your search query, filters, and max items
- Run It: Click "Start" and let it collect your data
- Download Data: Get your results in the "Dataset" tab as CSV, Excel, or JSON
Total Time: 5 minutes setup, 10-30 minutes for data collection
No Technical Skills Required: Everything is point-and-click
Business Use Cases
Researchers:
- Build comprehensive bibliographic databases
- Track publication trends in your field
- Discover related research datasets
- Analyze scholarly output patterns
Librarians:
- Catalog DOI metadata for institutional repositories
- Monitor new publications in specific areas
- Build research resource collections
- Support reference services
Data Scientists:
- Discover research datasets for analysis
- Build training datasets from DOI metadata
- Analyze publication patterns and trends
- Extract structured data for machine learning
Publishers:
- Monitor competitor publications
- Track publication metrics
- Analyze market trends
- Support editorial decisions
Academic Professionals:
- Track research output
- Build publication databases
- Analyze citation networks
- Support grant applications
Using DataCite Metadata Scraper with the Apify API
For advanced users who want to automate this process, you can control the scraper programmatically with the Apify API. This allows you to schedule regular data collection and integrate with your existing research tools.
- Node.js: Install the apify-client NPM package
- Python: Use the apify-client PyPI package
- See the Apify API reference for full details
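As a rough illustration of the Python route, the sketch below starts a run with the `apify-client` package and reads the resulting dataset. The Actor ID `parseforge/datacite-metadata-scraper` is an assumed slug (copy the real one from the Actor page), `APIFY_TOKEN` is an assumed environment variable name, and `build_run_input` and `run_scraper` are hypothetical helpers. The input uses only the two fields shown in the Input section.

```python
import os

# Assumed slug; copy the actual Actor ID from the Actor's page.
ACTOR_ID = "parseforge/datacite-metadata-scraper"


def build_run_input(query, max_items=10):
    """Return an input dict using the two fields shown in the Input section."""
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    return {"query": query, "maxItems": max_items}


def run_scraper(token, run_input):
    """Start the Actor, wait for it to finish, and return the dataset items."""
    # Imported here so build_run_input works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    token = os.environ.get("APIFY_TOKEN")  # assumed variable name
    if token:
        for item in run_scraper(token, build_run_input("climate", 10)):
            print(item.get("doi"), "-", item.get("title"))
    else:
        print("Set APIFY_TOKEN to run this example.")
```

Keeping the input construction separate from the network call makes it easy to validate or log the input before spending credits on a run.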
Frequently Asked Questions
Q: How does it work?
A: DataCite Metadata Scraper is easy to use and requires no technical knowledge. Simply configure your search parameters and let the tool collect the data automatically from DataCite's public API.
Q: How accurate is the data?
A: We collect data directly from DataCite's official API in real-time, ensuring the most up-to-date and accurate metadata available.
Q: Can I schedule regular runs?
A: Yes! Use the Apify API to schedule daily, weekly, or monthly runs automatically. Perfect for ongoing research monitoring.
Q: What if I need help?
A: Our support team is available 24/7. Contact us through the Apify platform.
Q: Is my data secure?
A: Absolutely. All data is encrypted in transit and at rest. We never share your data with third parties.
Q: Can I filter by specific repositories?
A: Yes! Use the Repository ID filter to target specific repositories like Zenodo, Dryad, or any other DataCite member repository.
Q: What resource types can I filter for?
A: You can filter by any resource type including Dataset, Software, Article, Report, and more. Check DataCite's documentation for the complete list.
Q: Can I get metadata for a specific DOI?
A: Yes! Simply enter the DOI in the DOI field (e.g., 10.5281/zenodo.1234567) and the scraper will retrieve metadata for that specific DOI.
Integrate DataCite Metadata Scraper with any app and automate your workflow
Last but not least, DataCite Metadata Scraper can be connected with almost any cloud service or web app thanks to integrations on the Apify platform.
Alternatively, you can use webhooks to carry out an action whenever an event occurs, e.g. get a notification whenever DataCite Metadata Scraper successfully finishes a run.
🔗 Recommended Actors
Looking for more data collection tools? Check out these related actors:
| Actor | Description | Link |
|---|---|---|
| PR Newswire Scraper | Collects press releases and news content from PR Newswire | https://apify.com/parseforge/pr-newswire-scraper |
| Hugging Face Model Scraper | Extracts AI model information and metadata from Hugging Face | https://apify.com/parseforge/hugging-face-model-scraper |
| OpenCitations Scraper | Collects citation and reference data from OpenCitations API | https://apify.com/parseforge/open-citations-scraper |
| PubMed Scraper | Extracts biomedical literature metadata from PubMed | https://apify.com/parseforge/pubmed-scraper |
| ArXiv Scraper | Collects research paper metadata from arXiv preprint server | https://apify.com/parseforge/arxiv-scraper |
Pro Tip: 💡 Browse our complete collection of data collection actors to find the perfect tool for your business needs.
Need Help? Our support team is here to help you get the most out of this tool.
⚠️ Disclaimer: This Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by DataCite or any of its subsidiaries. All trademarks mentioned are the property of their respective owners.