Crossref DOI Metadata Scraper

Export citation metadata for 155M+ DOIs from the Crossref Works API. Coverage includes research papers, book chapters, conference proceedings, and datasets with a Crossref-registered DOI. Search by query, filter by publisher, funder, type, or year range.

Pricing: Pay per event · Rating: 5.0 (1) · Developer: ParseForge

📖 Crossref DOI Metadata Scraper

🚀 Extract citation metadata for 155M+ DOIs from Crossref in seconds. Search by free text, title, or author, or look up a single DOI. No coding, no API keys required.

🕒 Last updated: 2026-04-16 · 📊 30+ fields · 📚 155M+ DOIs indexed · 🔍 Title, author, and free-text search

Crossref is the largest DOI registration agency, indexing over 155 million research papers, book chapters, conference proceedings, datasets, and preprints. This scraper connects to the Crossref Works API and returns structured citation metadata including titles, authors, publication dates, journals, DOIs, citation counts, abstracts, license information, and funding details. Whether you need metadata for a single DOI or want to search across the entire Crossref database, the scraper handles pagination and rate limiting automatically.

Researchers, librarians, and data analysts use this actor to build citation databases, verify publication records, analyze research trends, and enrich existing datasets with DOI metadata. Instead of querying the Crossref API manually and parsing JSON responses, you get clean, structured data exported as JSON, CSV, or Excel. Every record includes the full title, all authors with ORCID IDs when available, journal name, volume, issue, pages, publication date, license, funder information, and reference lists.

🎯 Target Audience	💡 Use Cases
Academic researchers	Build citation databases for literature reviews
University librarians	Verify and enrich publication records
Bibliometric analysts	Analyze citation patterns and research impact
Data scientists	Enrich datasets with DOI metadata
Publishers	Track citations and references across journals
Grant managers	Verify publication records from funded research

📋 What the Crossref Scraper does

🔍 Free-text search across titles, authors, and container titles in the 155M+ DOI database
📝 Title-specific search to find publications matching exact title keywords
👤 Author search to find all works by a specific researcher
🎯 Single DOI lookup to fetch full metadata for a specific publication
🔧 Filter strings to narrow results by type, date, ORCID, publisher, and more
📧 Polite pool access by providing an email for faster Crossref response times

The scraper queries the Crossref Works API, retrieves matching records, and extracts full citation metadata for each item. Results include the publication title, all authors (with ORCID IDs), journal or container title, volume, issue, pages, publication dates, DOI, license info, funder details, reference count, citation count, and direct links. Each record is timestamped and includes the content type (journal-article, book-chapter, etc.).

💡 Why it matters: Crossref's API returns complex nested JSON that requires parsing. This scraper flattens and normalizes the data, delivering clean records ready for spreadsheets, databases, or analysis tools. Add your email to get routed to Crossref's faster "polite pool."
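To illustrate what flattening involves, here is a minimal sketch, not the Actor's actual implementation. It assumes the raw Crossref work uses the keys `title`, `author`, `container-title`, `published`, `DOI`, `URL`, `is-referenced-by-count`, and `references-count`, and maps them to a subset of the output field names listed in the schema. The helper names (`first`, `format_date`, `flatten_work`) and the sample DOI are hypothetical.

```python
from __future__ import annotations

from typing import Any


def first(values: Any) -> str | None:
    """Return the first element of a list-valued Crossref field, or None."""
    return values[0] if isinstance(values, list) and values else None


def format_date(date_field: dict[str, Any] | None) -> str | None:
    """Turn {"date-parts": [[2017, 6, 12]]} into "2017-06-12"; partial dates are kept partial."""
    parts = (date_field or {}).get("date-parts") or [[]]
    head = [p for p in parts[0] if p is not None]
    if not head:
        return None
    return "-".join([f"{head[0]:04d}"] + [f"{p:02d}" for p in head[1:]])


def flatten_work(work: dict[str, Any]) -> dict[str, Any]:
    """Map one raw Crossref work to flat keys named after the Actor's output schema."""
    authors = [
        {
            # Person authors carry given/family; organizations carry a single name.
            "name": " ".join(filter(None, [a.get("given"), a.get("family")])) or a.get("name"),
            "orcid": a.get("ORCID"),
        }
        for a in work.get("author", [])
    ]
    return {
        "title": first(work.get("title")),
        "authors": authors,
        "publishedDate": format_date(work.get("published")),
        "containerTitle": first(work.get("container-title")),
        "doi": work.get("DOI"),
        "url": work.get("URL"),
        "type": work.get("type"),
        "citationCount": work.get("is-referenced-by-count"),
        "referenceCount": work.get("references-count"),
    }


# Made-up record for illustration only.
raw = {
    "DOI": "10.1234/example",
    "title": ["An Example Title"],
    "author": [{"given": "Ada", "family": "Lovelace", "ORCID": None}],
    "container-title": ["Journal of Examples"],
    "published": {"date-parts": [[2024, 3]]},
    "type": "journal-article",
    "is-referenced-by-count": 7,
    "references-count": 12,
}
record = flatten_work(raw)
print(record["title"], record["publishedDate"], record["citationCount"])
```

Missing keys become `None` rather than raising, which keeps every record the same shape for spreadsheet export.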

🎬 Full Demo

🚧 Coming soon...

⚙️ Input

Field	Type	Required	Description
maxItems	integer	No	Max records to collect. Free: up to 10. Paid: up to 1,000,000
query	string	No	Free text search across titles, authors, and journals
queryTitle	string	No	Match only within publication titles
queryAuthor	string	No	Match by author name
filter	string	No	Crossref filter string (e.g., "type:journal-article,from-pub-date:2024")
doi	string	No	Fetch metadata for a single DOI (overrides query)
email	string	No	Your email for Crossref's faster "polite pool"

Example 1: Free-text search

{
  "query": "attention is all you need",
  "maxItems": 10
}

Example 2: Author search filtered by type and start date

{
  "queryAuthor": "Hinton, Geoffrey",
  "filter": "type:journal-article,from-pub-date:2020",
  "email": "your@email.com",
  "maxItems": 100
}

⚠️ Good to Know: Providing an email address routes your requests to Crossref's "polite pool," which has faster response times and higher rate limits. The filter field accepts Crossref filter syntax. See Crossref API docs for all available filter options.
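For readers curious how these inputs might translate into a Crossref request, the sketch below builds a Works API URL from an input object. It is an assumption about the mapping, not the Actor's confirmed behavior: `PARAM_MAP`, `build_filter`, and `build_request_url` are hypothetical names, and mapping `queryTitle` to Crossref's `query.bibliographic` parameter is a guess. The `mailto` parameter is how Crossref identifies polite pool requests.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

BASE_URL = "https://api.crossref.org/works"
MAX_ROWS_PER_REQUEST = 1000  # Crossref caps the `rows` parameter at 1000

# Assumed mapping from this Actor's input fields to Crossref query parameters.
PARAM_MAP = {
    "query": "query",
    "queryTitle": "query.bibliographic",
    "queryAuthor": "query.author",
    "filter": "filter",
}


def build_filter(filters):
    """Join name/value pairs into Crossref filter syntax: name:value,name:value."""
    return ",".join(f"{name}:{value}" for name, value in filters.items())


def build_request_url(actor_input):
    """Build a Crossref Works API URL from an Actor-style input dict."""
    params = {}
    if actor_input.get("email"):
        params["mailto"] = actor_input["email"]  # opts in to the polite pool

    doi = actor_input.get("doi")
    if doi:
        # A DOI lookup overrides any search fields, as the input table states.
        url = f"{BASE_URL}/{quote(doi, safe='/')}"
        return f"{url}?{urlencode(params)}" if params else url

    for input_key, api_param in PARAM_MAP.items():
        if actor_input.get(input_key):
            params[api_param] = actor_input[input_key]
    params["rows"] = min(actor_input.get("maxItems", 20), MAX_ROWS_PER_REQUEST)
    return f"{BASE_URL}?{urlencode(params)}"


url = build_request_url({
    "queryAuthor": "Hinton, Geoffrey",
    "filter": build_filter({"type": "journal-article", "from-pub-date": "2020"}),
    "email": "your@email.com",
    "maxItems": 100,
})
print(url)
print(parse_qs(urlsplit(url).query))
```

Building the filter from a dictionary avoids hand-typing the comma and colon separators, which is where most malformed filter strings come from.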

📊 Output

🧾 Schema

Emoji	Field	Type	Description
📝	title	string	Full publication title
👥	authors	array	Author names with ORCID IDs when available
📅	publishedDate	string	Publication date
📖	containerTitle	string	Journal or book title
🆔	doi	string	Digital Object Identifier
🔗	url	string	Direct URL to the publication
📊	volume	string	Journal volume
📄	issue	string	Journal issue
📍	pages	string	Page range
🏢	publisher	string	Publisher name
🏷️	type	string	Content type (journal-article, book-chapter, etc.)
📊	citationCount	number	Number of times cited
📋	referenceCount	number	Number of references in the work
📄	abstract	string	Abstract text (when available)
🆔	issn	array	ISSN identifiers
🆔	isbn	array	ISBN identifiers
⚖️	license	array	License information and URLs
💰	funder	array	Funding organizations and grant numbers
🏷️	subject	array	Subject classifications
📅	depositedDate	string	Date deposited in Crossref
📅	indexedDate	string	Date indexed by Crossref
🔗	references	array	List of referenced DOIs
⏰	scrapedAt	string	Collection timestamp
⚠️	error	string	Error message if processing failed
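When working with downloaded JSON results, the schema above suggests two practical steps: setting aside records whose `error` field is populated, and serializing array fields before writing CSV. The following is a minimal sketch under those assumptions. `split_records` and `to_csv` are hypothetical helpers, the sample records are made up, and Apify's built-in CSV export may represent arrays differently.

```python
import csv
import io
import json


def split_records(items):
    """Separate usable records from those carrying an `error` message."""
    ok = [item for item in items if not item.get("error")]
    failed = [item for item in items if item.get("error")]
    return ok, failed


def to_csv(items, columns):
    """Write the chosen columns to CSV text, JSON-encoding list and dict values."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for item in items:
        row = {}
        for column in columns:
            value = item.get(column)
            if isinstance(value, (list, dict)):
                value = json.dumps(value, ensure_ascii=False)
            row[column] = value
        writer.writerow(row)
    return buffer.getvalue()


# Made-up records for illustration only.
items = [
    {"doi": "10.1234/a", "title": "First", "issn": ["1234-5678"], "citationCount": 3},
    {"doi": "10.1234/b", "error": "lookup failed"},
]
ok, failed = split_records(items)
csv_text = to_csv(ok, ["doi", "title", "issn", "citationCount"])
print(csv_text)
print(f"{len(failed)} record(s) failed")
```

JSON-encoding the array cells keeps them reversible: a later script can call `json.loads` on the `issn` column to recover the original list.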

✨ Why choose this Actor

Feature	Details
📚 155M+ DOIs	Access the full Crossref database of research publications
🔍 Multi-field search	Query by title, author, free text, or specific DOI
📊 Citation counts	Track how many times each work has been cited
💰 Funding data	Identify funders and grant numbers for each publication
🆔 ORCID support	Author identifiers included when available
⚖️ License info	Know the access rights for each publication
📧 Polite pool	Faster responses when you provide an email address

📊 Search across 155M+ DOIs and collect up to 1,000,000 records per run with full citation metadata.

📈 How it compares to alternatives

Feature	This Actor	Manual API Calls	Generic Scrapers
Automatic pagination	✅	Manual	❌
Polite pool routing	✅	Manual	❌
Citation count included	✅	✅	❌
Funding data extraction	✅	✅	❌
Structured JSON/CSV output	✅	JSON only	Varies
Bulk collection (up to 1M records)	✅	Manual	❌
Scheduled recurring runs	✅	❌	❌

Get structured citation metadata at scale without writing API code or managing pagination.
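For context on what "Manual" pagination in the table involves: Crossref supports deep paging through a `cursor` parameter that starts at `*` and is replaced by the `next-cursor` value from each response. The sketch below shows that loop with the HTTP call injected as a function so it can run offline. `iterate_works` and `FAKE_PAGES` are hypothetical, and this is not necessarily how the Actor paginates internally.

```python
from typing import Callable, Iterator


def iterate_works(fetch_page: Callable[[str], dict], max_items: int) -> Iterator[dict]:
    """Yield works page by page until max_items is reached or results run out.

    fetch_page(cursor) is assumed to return a parsed Crossref response shaped like
    {"message": {"items": [...], "next-cursor": "..."}}.
    """
    cursor = "*"  # Crossref's starting cursor for deep paging
    yielded = 0
    while yielded < max_items:
        message = fetch_page(cursor)["message"]
        items = message.get("items", [])
        if not items:
            return  # an empty page signals the end of the result set
        for item in items:
            yield item
            yielded += 1
            if yielded >= max_items:
                return
        cursor = message.get("next-cursor")
        if not cursor:
            return


# Stand-in for real HTTP responses, keyed by cursor value (made-up DOIs).
FAKE_PAGES = {
    "*": {"message": {"items": [{"DOI": "10.1234/a"}, {"DOI": "10.1234/b"}], "next-cursor": "c1"}},
    "c1": {"message": {"items": [{"DOI": "10.1234/c"}], "next-cursor": "c2"}},
    "c2": {"message": {"items": [], "next-cursor": "c2"}},
}

works = list(iterate_works(FAKE_PAGES.__getitem__, max_items=10))
print([w["DOI"] for w in works])
```

In real use, `fetch_page` would issue a GET request with the cursor as a query parameter and pause between calls to respect Crossref's rate limits.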

🚀 How to use

1. Create an Apify account - Sign up free with $5 credit
2. Open the Crossref DOI Metadata Scraper - Navigate to the actor page on Apify
3. Enter your search query - Type keywords, an author name, or a specific DOI
4. Add optional filters - Set date range, publication type, or provide your email for faster responses
5. Click Start - The actor collects matching records and delivers structured citation data

⏱️ A typical run with 10 records completes in under 30 seconds.

💼 Business use cases

🎓 Academic Research
Build citation databases for systematic reviews
Track publication records for tenure evaluations
Analyze citation patterns across research fields
Verify DOI metadata for bibliographies

📊 Bibliometric Analysis
Measure research impact by citation count
Map collaboration networks through co-authorship
Track publication trends by subject area
Compare publisher output across disciplines

📚 Library Services
Enrich catalog records with DOI metadata
Verify publication details for acquisitions
Build subject-specific reference collections
Track open access availability by license type

💰 Research Funding
Verify publication records of grant applicants
Track outputs from funded research programs
Identify high-impact journals for publication strategies
Monitor open access compliance by funder

🔌 Automating Crossref Scraper

Integrate the Crossref Scraper into your workflow using the Apify API or client libraries.

Node.js:

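import { ApifyClient } from 'apify-client';

// Authenticate with your Apify API token.
const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });
// Start the Actor and wait for the run to finish (top-level await requires an ES module).
const run = await client.actor("parseforge/crossref-scraper").call({
  query: "attention is all you need",
  maxItems: 50,
  email: "your@email.com"
});
// Read the results from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);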

Python:

from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient("YOUR_API_TOKEN")
# Start the Actor and wait for the run to finish.
run = client.actor("parseforge/crossref-scraper").call(run_input={
    "query": "attention is all you need",
    "maxItems": 50,
    "email": "your@email.com"
})
# Read the results from the run's default dataset.
items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
print(items)

Schedules: Set up recurring runs to track new publications matching your query, monitor citation count changes, or build growing bibliometric datasets. Configure daily, weekly, or monthly schedules from the Apify Console.
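One way to monitor citation count changes between scheduled runs is to compare the datasets from two runs by DOI. The sketch below is an assumption about how you might do that downstream. `citation_changes` is a hypothetical helper, the sample data is made up, and only the `doi` and `citationCount` fields from the output schema are used.

```python
def citation_changes(previous, current):
    """Compare two runs and report new DOIs and changed citation counts."""
    # DOIs are case-insensitive, so compare on the lowercased form.
    before = {
        r["doi"].lower(): r.get("citationCount") or 0
        for r in previous
        if r.get("doi")
    }
    changes = []
    for record in current:
        doi = record.get("doi")
        if not doi:
            continue  # skip error records without a DOI
        now = record.get("citationCount") or 0
        key = doi.lower()
        if key not in before:
            changes.append({"doi": doi, "status": "new", "delta": now})
        elif now != before[key]:
            changes.append({"doi": doi, "status": "changed", "delta": now - before[key]})
    return changes


# Made-up datasets standing in for last week's and this week's runs.
last_week = [
    {"doi": "10.1234/a", "citationCount": 10},
    {"doi": "10.1234/b", "citationCount": 4},
]
this_week = [
    {"doi": "10.1234/A", "citationCount": 13},
    {"doi": "10.1234/b", "citationCount": 4},
    {"doi": "10.1234/c", "citationCount": 1},
]
for change in citation_changes(last_week, this_week):
    print(change)
```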

🔌 Integrate with any app

🔗 Make (Integromat) - Connect citation data to Google Sheets, Notion, or any of 1,500+ apps
🔗 Zapier - Trigger workflows when new citation records are collected
🔗 Slack - Get notified when a Crossref data run completes
🔗 Airbyte - Stream citation metadata into your data warehouse
🔗 GitHub - Store citation datasets in repositories for version control
🔗 Google Drive - Automatically save CSV exports to shared folders

🔗 Recommended Actors

Actor	Description
PubMed Citation Scraper	Extract publication metadata from PubMed for biomedical research
OpenCitations Scraper	Collect citation networks and bibliographic metadata
Open Library Scraper	Search and download book data from the Internet Archive
NASA Reports Scraper	Collect technical reports from NASA's NTRS database
ROR Scraper	Collect research organization data from the Research Organization Registry

💡 Pro Tip: Combine the Crossref Scraper with the PubMed Scraper to get both citation metadata and full biomedical abstracts for the same publications.
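Combining the two datasets usually comes down to joining on DOI. The sketch below assumes the PubMed results expose `doi` and `abstract` fields, which is a guess about that Actor's output, and `normalize_doi`, `merge_by_doi`, and the `pubmedAbstract` key are hypothetical names. DOIs are normalized first because they are case-insensitive and often appear with a resolver prefix.

```python
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(value):
    """Lowercase a DOI and strip a leading resolver URL or 'doi:' prefix."""
    doi = value.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def merge_by_doi(crossref_items, pubmed_items):
    """Attach a PubMed abstract to each Crossref record that shares its DOI."""
    abstracts = {
        normalize_doi(p["doi"]): p.get("abstract")
        for p in pubmed_items
        if p.get("doi")
    }
    merged = []
    for item in crossref_items:
        key = normalize_doi(item["doi"]) if item.get("doi") else None
        merged.append({**item, "pubmedAbstract": abstracts.get(key)})
    return merged


# Made-up records for illustration only.
crossref_items = [{"doi": "10.1234/ABC", "title": "Example"}, {"doi": "10.1234/xyz"}]
pubmed_items = [{"doi": "https://doi.org/10.1234/abc", "abstract": "Example abstract."}]
merged = merge_by_doi(crossref_items, pubmed_items)
print(merged)
```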

🆘 Need Help? Open our contact form and we will get back to you within 24 hours. We are happy to help with custom setups, integrations, or feature requests.

Disclaimer: This actor is not affiliated with, endorsed by, or connected to Crossref. It accesses publicly available data through the Crossref Works API. Use responsibly and in accordance with Crossref's Metadata Terms of Use.
