DataCite Metadata Scraper
Pricing: Pay per event
Comprehensive DataCite metadata scraper for extracting DOI metadata from the DataCite API. Perfect for researchers, librarians, and data scientists who need automated access to scholarly publication metadata, research datasets, and digital object identifiers.
Developer: ParseForge
Collect scholarly publication metadata and research dataset information in seconds, without coding. This tool retrieves DOI metadata including titles, publishers, resource types, and publication years. It suits researchers building literature databases, librarians tracking open datasets, data scientists analyzing research trends, and institutions monitoring publication patterns across their repositories. Simple keyword or repository filters automate what would otherwise take hours of manual lookup. The tool queries the DataCite API directly, giving you access to millions of scholarly records.
The DataCite Metadata Scraper retrieves up to 1,000,000 DOI records per run with flexible filtering by query, repository, publisher, resource type, and year.
What Does It Do
- DOI - Download complete Digital Object Identifier records for academic articles, datasets, and research outputs
- Title - Extract publication titles to build searchable metadata catalogs or track specific research areas
- Publisher - Capture publisher information to analyze publication patterns across different organizations
- Publication Year - Filter by when research was published to focus on recent findings or historical trends
- Resource Type - Identify whether records are datasets, articles, software, images, or scholarly objects
- DOI URL - Get direct links to resolve each record and access full publication information online
- Created and Updated Dates - Track when records were created and last modified in the system
Input
- Search Query - Keyword to find matching DOIs, such as climate change, machine learning, or genomics
- Specific DOI - Retrieve metadata for a single Digital Object Identifier if you know the exact identifier
- Repository Filter - Narrow results to specific sources like Zenodo, Dryad, Figshare, or Dataverse
- Publisher - Filter results by publisher name to track output from specific organizations
- Resource Type - Limit results to datasets, articles, software, images, or other specific formats
- Publication Year - Filter by 4-digit year to focus on research from specific time periods
- Max Items - Free users can collect up to 100 results per run; paid users can collect up to 1,000,000
- Sort Order - Arrange results by creation date, update date, or publication year
Example input:
{"query": "climate", "maxItems": 50, "resourceType": "Dataset", "year": 2023, "sort": "-created"}
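A small helper can make the input rules above easier to check before starting a run. The sketch below is only an illustration: it uses the five keys that appear in the example input, takes the 100 and 1,000,000 caps from the Max Items description, and introduces a hypothetical `build_input` function and `paid` flag that are not part of the Actor itself.

```python
import json

FREE_TIER_MAX = 100        # free-tier cap per run, as stated under Max Items
PAID_TIER_MAX = 1_000_000  # paid-tier cap per run, as stated under Max Items


def build_input(query, max_items=100, resource_type=None, year=None,
                sort="-created", paid=False):
    """Build an input object using only the keys shown in the example input.

    `build_input` and `paid` are hypothetical names used for illustration.
    """
    cap = PAID_TIER_MAX if paid else FREE_TIER_MAX
    if not 1 <= max_items <= cap:
        raise ValueError(f"maxItems must be between 1 and {cap}")
    if year is not None and not 1000 <= year <= 9999:
        raise ValueError("year must be a 4-digit number")

    run_input = {"query": query, "maxItems": max_items, "sort": sort}
    # Optional filters are added only when the caller supplies them.
    if resource_type is not None:
        run_input["resourceType"] = resource_type
    if year is not None:
        run_input["year"] = year
    return run_input


if __name__ == "__main__":
    # Prints the same input as the example, pretty-printed as JSON.
    print(json.dumps(build_input("climate", 50, "Dataset", 2023), indent=2))
```

Validating locally like this is optional; the Actor's own input form presumably performs its own checks.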
Output
Each DOI record includes up to 10 data fields. Download as JSON, CSV, or Excel.
| DOI | DOI URL | Title |
|---|---|---|
| Publisher | Publication Year | Resource Type |
| Resource Type General | Created Date | Updated Date |
| Scraped Timestamp | | |
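To make the ten fields concrete, here is a minimal sketch of how a single record from DataCite's public REST API could be flattened into that shape. It rests on several assumptions: the record layout (`attributes.titles`, `attributes.types`, and so on) follows the DataCite JSON:API response format as I understand it and should be verified against the official documentation, the output key names such as `doiUrl` and `scrapedAt` are hypothetical, and the Actor's real field names and internal logic may differ.

```python
from datetime import datetime, timezone


def flatten_record(record, scraped_at=None):
    """Flatten one DataCite-style record into ten flat fields.

    Output key names are hypothetical; the Actor may use different ones.
    """
    attrs = record.get("attributes") or {}
    titles = attrs.get("titles") or []
    types = attrs.get("types") or {}
    doi = attrs.get("doi") or record.get("id")

    publisher = attrs.get("publisher")
    if isinstance(publisher, dict):
        # Assumption: the publisher may arrive as an object with a "name" key.
        publisher = publisher.get("name")

    return {
        "doi": doi,
        "doiUrl": f"https://doi.org/{doi}" if doi else None,
        "title": titles[0].get("title") if titles else None,
        "publisher": publisher,
        "publicationYear": attrs.get("publicationYear"),
        "resourceType": types.get("resourceType"),
        "resourceTypeGeneral": types.get("resourceTypeGeneral"),
        "created": attrs.get("created"),
        "updated": attrs.get("updated"),
        "scrapedAt": scraped_at or datetime.now(timezone.utc).isoformat(),
    }


# A made-up record for illustration only; it does not describe a real DOI.
SAMPLE = {
    "id": "10.1234/example",
    "attributes": {
        "doi": "10.1234/example",
        "titles": [{"title": "Example climate dataset"}],
        "publisher": "Example Repository",
        "publicationYear": 2023,
        "types": {"resourceType": "Dataset", "resourceTypeGeneral": "Dataset"},
        "created": "2023-01-01T00:00:00Z",
        "updated": "2023-06-01T00:00:00Z",
    },
}

if __name__ == "__main__":
    print(flatten_record(SAMPLE))
```

Missing attributes become `None` rather than raising, which matches the "up to 10 data fields" wording: not every record carries every field.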
Why Choose the DataCite Metadata Scraper?
| Feature | DataCite Scraper | Similar Tools |
|---|---|---|
| Search by keyword, DOI, repository, or publisher | ✔️ | ❌ |
| Filter by resource type (Dataset, Article, Software) | ✔️ | Partial |
| Filter by publication year | ✔️ | ❌ |
| Public access without authentication required | ✔️ | ❌ |
| Supports up to 1,000,000 results per run | ✔️ | ❌ |
| Sort by creation, update, or publication date | ✔️ | ❌ |
| Automatic duplicate detection and removal | ✔️ | Partial |
| Real-time progress tracking | ✔️ | ❌ |
| Export to JSON, CSV, Excel | ✔️ | ✔️ |
| Free tier of 100 results per run, paid tier up to 1,000,000 | ✔️ | Partial |
| Supports multiple repository filtering | ✔️ | ❌ |
| Simple point-and-click interface | ✔️ | ✔️ |
How to Use
No technical skills required. Follow these simple steps:
- Sign Up: Create a free account with $5 credit
- Find the Tool: Search for "DataCite Metadata Scraper" in the Apify Store and configure your input
- Run It: Click "Start" and watch your results appear
That's it. No coding, no setup, no complicated configuration. When the run finishes, export your data in CSV, Excel, or JSON format.
Business Use Cases
- Academic Researcher - Search for peer-reviewed articles about neural networks published since 2020 to build a comprehensive literature review in minutes instead of days
- University Librarian - Monitor research output from your institution by filtering Zenodo and Dataverse repositories to track open-access datasets being shared by faculty
- Data Scientist - Collect all publicly available datasets related to climate modeling from multiple repositories to identify training data for machine learning projects
FAQ
How does it work? The scraper retrieves DOI metadata based on your search criteria and automatically deduplicates results to give you clean data.
How accurate is the data? All metadata comes directly from DataCite, the non-profit DOI registration agency used by major repositories like Zenodo and Figshare. Accuracy depends on how each repository reports its information.
Can I schedule automatic runs? Yes. Once you've set up your input, you can schedule the actor to run daily, weekly, or monthly to monitor new publications and datasets matching your criteria.
Is this legal? Yes. DataCite metadata is public information maintained by DataCite. You are responsible for complying with each repository's terms of service and any applicable local laws when using the data.
Will DataCite block me? Unlikely. The DataCite service is designed for public access, and the scraper respects rate limits and uses standard requests. However, if you're planning very large-scale runs, contact DataCite support first.
How long does a run take? Typical runs of 100-500 items complete in 10-60 seconds depending on your internet connection and the complexity of your filters. Larger runs processing 10,000+ items may take 5-15 minutes.
Are there any limits? Free users can collect up to 100 results per run. Paid users can collect up to 1,000,000 results per run.
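The FAQ says results are deduplicated automatically but does not describe how. One plausible approach, sketched here as an assumption rather than the Actor's actual method, is to key each record on its DOI and keep the first occurrence. DOIs are case-insensitive, so the key is lowercased before comparison. The `dedupe_by_doi` name and the `doi` key are hypothetical.

```python
def dedupe_by_doi(records):
    """Return records with duplicate DOIs removed, keeping first occurrences.

    Records without a DOI are kept as-is because they cannot be compared.
    """
    seen = set()
    unique = []
    for record in records:
        key = (record.get("doi") or "").strip().lower()
        if key:
            if key in seen:
                continue  # same DOI already collected, skip the repeat
            seen.add(key)
        unique.append(record)
    return unique


if __name__ == "__main__":
    # Made-up DOIs for illustration only.
    rows = [
        {"doi": "10.1234/ABC", "title": "First copy"},
        {"doi": "10.1234/abc", "title": "Same DOI, different case"},
        {"doi": "10.1234/xyz", "title": "Another record"},
    ]
    for row in dedupe_by_doi(rows):
        print(row["doi"], "-", row["title"])
```

Keeping the first occurrence preserves the sort order requested in the input, which a set-based approach alone would lose.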
Integrate DataCite Metadata Scraper with any app
- Make - Automate workflows
- Zapier - Connect 5000+ apps
- GitHub - Version control integration
- Slack - Get notifications
- Airbyte - Data pipelines
- Google Drive - Export to spreadsheets
More ParseForge Actors
- Unpaywall Scraper - Discover open access research articles with powerful searching and filtering
- Crossref Scraper - Transform scholarly data collection with titles, authors, abstracts, and citations
- PLOS Journals Scraper - Extract article data from PLOS ONE and peer-reviewed content
- OpenAlex Scraper - Optimize academic research with comprehensive publication data and citation metrics
Browse our complete collection of data extraction tools for more.
Ready to Start?
Create a free account with $5 credit and collect your first 100 results for free. No coding, no setup.
Need Help?
- Check the FAQ section above for common questions
- Visit the Apify support page for documentation and tutorials
- Contact us through the Tally contact form to request a new scraper, propose a custom project, or report an issue
Disclaimer
This Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by DataCite or any of its subsidiaries. All trademarks mentioned are the property of their respective owners.