Pricing

Pay per event

DataCite Metadata Scraper

Comprehensive DataCite metadata scraper for extracting DOI metadata from the DataCite API. Perfect for researchers, librarians, and data scientists who need automated access to scholarly publication metadata, research datasets, and digital object identifiers.

Developer

ParseForge

Last modified

4 days ago

📚 DataCite Metadata Scraper

🚀 Collect scholarly DOI metadata and research dataset records in seconds. Filter by keyword, repository, publisher, resource type, and year. No coding, no DataCite account required.

Pull structured records from DataCite, with clean fields ready to export as CSV, JSON, JSONL, Excel, or XML for downstream pipelines.

Copy to your AI assistant

Copy this block into ChatGPT, Claude, Cursor, or any LLM to start using this actor.

parseforge/datacite-metadata-scraper on Apify. Call: ApifyClient("TOKEN").actor("parseforge/datacite-metadata-scraper").call(run_input={...}), then client.dataset(run["defaultDatasetId"]).list_items().items for results. Key inputs: maxItems (integer, default 10), query (string, default "climate"), doi (string), repositoryId (string), publisher (string), resourceType (string). Full actor spec: fetch build via GET https://api.apify.com/v2/acts/parseforge~datacite-metadata-scraper (Bearer TOKEN). Get token: https://console.apify.com/account/integrations
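The call described in that block can be sketched in Python with the `apify-client` package. This is a minimal sketch, not the actor's official client code: the input keys are the ones listed above, while the helper names `build_run_input` and `run_actor`, the `APIFY_TOKEN` environment variable, the choice to drop empty filters, and the example value `"Dataset"` for `resourceType` are assumptions made for illustration.

```python
import os

ACTOR_ID = "parseforge/datacite-metadata-scraper"
# Optional filter keys taken from the "Key inputs" list above.
OPTIONAL_KEYS = {"doi", "repositoryId", "publisher", "resourceType"}


def build_run_input(query="climate", max_items=10, **filters):
    """Assemble the actor input, skipping filters that were left empty."""
    unknown = set(filters) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"unsupported input keys: {sorted(unknown)}")
    run_input = {"query": query, "maxItems": max_items}
    run_input.update({key: value for key, value in filters.items() if value})
    return run_input


def run_actor(token, run_input):
    """Start the actor, wait for it to finish, and return the dataset items."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the actor run did not return run details")
    return client.dataset(run["defaultDatasetId"]).list_items().items


if __name__ == "__main__":
    token = os.environ.get("APIFY_TOKEN")  # assumed variable name
    if token:
        # "Dataset" is an assumed example value for resourceType.
        run_input = build_run_input(max_items=5, resourceType="Dataset")
        for record in run_actor(token, run_input):
            print(record.get("doi"), record.get("title"))
```

Keeping input assembly separate from the network call makes it possible to check the input offline before spending a run on it.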

The DataCite Metadata Scraper retrieves Digital Object Identifier (DOI) metadata from the DataCite registry, which indexes over 45 million DOIs across academic publications, research datasets, software, and other scholarly outputs. Core fields in each record include the DOI, title, publisher, publication year, resource type, creation date, update date, and a resolvable URL. You can filter by keyword, specific DOI, repository (for example Zenodo, Dryad, Figshare, or Dataverse), publisher, resource type, and publication year. Free users can collect up to 10 records per run, while paid users can retrieve up to 1,000,000.

Whether you are building a literature database for a systematic review, analyzing publication trends across institutions, tracking open data availability in your research field, or monitoring repository output over time, this tool replaces hours of manual DOI lookups with a single automated query. Results export to JSON, CSV, or Excel for immediate use in citation managers, bibliometric tools, or data analysis pipelines. The scraper handles pagination and rate limiting automatically, letting you focus on research instead of data collection.

Target Audience	Use Cases
Academic Researchers	Build literature databases and track publications in specific fields
Research Librarians	Catalog DOI records and monitor repository output
Data Scientists	Analyze publication trends and research metadata at scale
Institutional Analysts	Track publication volume and output across departments
Science Policy Analysts	Study open data availability and repository growth
Bibliometric Researchers	Collect DOI metadata for citation and impact analysis

📋 What the DataCite Metadata Scraper does

📚 DOI records - retrieve the full Digital Object Identifier for each scholarly output, ready for citation or resolution
🏷️ Titles - extract publication or dataset titles for cataloging and search
📰 Publishers - capture the organization or institution that registered the DOI
📅 Publication years - filter and sort by year to focus on recent research or historical trends
🗂️ Resource types - classify records as datasets, articles, software, images, or other scholarly object types
🔗 Resolvable URLs - get working DOI links that resolve to the full publication or dataset landing page

The scraper queries the DataCite REST API and iterates through paginated results using your specified filters. Each record is normalized with consistent field names and pushed to an Apify dataset in real time. You can look up a single DOI or search across the entire DataCite registry with keyword and faceted filters.
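The paginate-and-normalize loop described above might look roughly like the sketch below. The actor's internal code is not published here, so this is only an illustration: it assumes the JSON:API item shape that the DataCite REST API returns (an `attributes` object with `titles`, `types`, and so on), and `normalize_record`, `collect`, and the `fetch_page` callback are hypothetical names. The output keys reuse field names from the data fields list.

```python
def normalize_record(item):
    """Flatten one DataCite JSON:API item into consistently named fields."""
    attrs = item.get("attributes") or {}
    titles = attrs.get("titles") or [{}]
    types = attrs.get("types") or {}
    doi = attrs.get("doi") or item.get("id")
    return {
        "doi": doi,
        "doiUrl": f"https://doi.org/{doi}" if doi else None,
        "title": titles[0].get("title"),
        "publisher": attrs.get("publisher"),
        "publicationYear": attrs.get("publicationYear"),
        "resourceTypeGeneral": types.get("resourceTypeGeneral"),
        "created": attrs.get("created"),
        "updated": attrs.get("updated"),
        "url": attrs.get("url"),
    }


def collect(fetch_page, max_items):
    """Walk pages until max_items records are gathered or a page is empty.

    fetch_page is a hypothetical callback: it takes a 1-based page number
    and returns that page's list of raw items.
    """
    records, page = [], 1
    while len(records) < max_items:
        items = fetch_page(page)
        if not items:
            break
        records.extend(normalize_record(item) for item in items)
        page += 1
    return records[:max_items]
```

Passing the page fetcher in as a callback keeps the loop testable without network access; a real fetcher would also need to handle rate limiting and retries.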

💡 Why it matters: DataCite indexes DOIs from over 2,000 data centers worldwide. Manually searching and downloading metadata is tedious. This scraper gives you structured, filterable access to the registry in minutes.

📊 Data fields

Each record includes: container, contributors, created, creators, dates, descriptions, doi, doiUrl, formats, fundingReferences, geoLocations, language, publicationYear, publisher, registered, relatedIdentifiers, resourceType, resourceTypeGeneral, rightsList, schemaVersion, scrapedTimestamp, sizes, subjects, title, updated, url, version. All 27 field names come from a real production run, so what you see here is what lands in your dataset.
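For downstream use, a subset of these fields can be written to CSV with the standard library. The sketch below is illustrative only: the column selection and the `records_to_csv` helper are assumptions, and it assumes the chosen columns hold scalar values (nested fields such as `creators` or `subjects` would need their own serialization).

```python
import csv
import io

# Assumed selection of scalar columns from the field list above.
COLUMNS = ["doi", "title", "publisher", "publicationYear", "resourceTypeGeneral", "doiUrl"]


def records_to_csv(records, columns=COLUMNS):
    """Serialize the chosen columns to CSV text, ignoring all other fields."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # drop fields that are not in `columns`
        restval="",  # leave missing fields blank
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Calling `records_to_csv(items)` on the list returned from a run's dataset yields text that can be saved to a file or loaded into a spreadsheet.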

⚠️ Good to Know: Free users are automatically limited to 10 items per run. When a specific DOI is provided, only that single record is returned. To browse all records that match your other filters, leave the query field empty.

🚀 How to use

1. Sign up - Create a free Apify account with $5 credit
2. Find the Actor - Search for "DataCite Metadata Scraper" in the Apify Store
3. Set your search criteria - Enter keywords, resource type, year, or a specific DOI
4. Start the run - Click "Start" and watch results appear in real time
5. Export your data - Download as JSON, CSV, or Excel from the dataset tab

🕒 Typical run time: 15 to 60 seconds for up to 100 records. Larger runs with 1,000+ records may take a few minutes depending on the query scope.

🔗 Recommended Actors

Actor	Description
Hugging Face Model Scraper	Collect model metadata and download stats from Hugging Face
PR Newswire Scraper	Collect press releases and research announcements
GSA eLibrary Scraper	Collect government contractor and vendor data
Greatschools Scraper	Extract school ratings and performance data
Smart Apify Actor Scraper	Scrape Apify actor metadata with 70+ fields

💡 Pro Tip: Combine the DataCite Metadata Scraper with the Hugging Face Model Scraper to cross-reference published datasets with ML models trained on them.

Disclaimer: This Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by DataCite, Zenodo, Dryad, Figshare, or any data center. All trademarks mentioned are the property of their respective owners.

🆘 Need Help?

If you hit a bug, have questions about setup, or need a scraper we haven't built yet, open our contact form or write to parseforge@protonmail.com. We also take on paid custom data projects.

For faster answers, join our Discord. It's the best place to get support and suggest new actors.
