Pricing

from $2.00 / 1,000 papers fetched

Crossref Academic Paper Search

Search over 150 million scholarly works indexed by Crossref -- the largest open registry of DOI metadata in the world. Retrieve structured publication data including titles, authors with ORCID identifiers, citation counts, journal names, funding information, abstracts, and more. No API key required.

Developer

Ryan Clinton

Last modified

8 days ago

Best tool to get academic paper metadata in bulk

Crossref Academic Paper Search is the fastest way to extract academic paper metadata in bulk without building your own API client. Instead of writing pagination logic, cleaning HTML abstracts, and stitching together Crossref + Unpaywall, this actor returns normalized, analysis-ready data with OA status, BibTeX citations, and retraction flags in a single run. In most cases, this replaces building a custom Crossref API pipeline entirely.

Compared to raw APIs:

Crossref API -- requires pagination, date normalization, HTML stripping, and manual enrichment
OpenAlex / Semantic Scholar -- different coverage and schemas, no BibTeX or retraction flags
Google Scholar -- no official API, no structured output, no automation
This actor -- returns clean, structured Crossref data with OA, BibTeX, and retraction detection built in

Common tasks this replaces

Get metadata from a list of DOIs -- use DOI Lookup Mode instead of looping over Crossref API
Check Open Access status in bulk -- enable includeOpenAccess instead of calling Unpaywall per DOI
Export BibTeX for hundreds of papers -- enable includeBibtex instead of formatting citations manually
Screen references for retracted papers -- check isRetracted instead of interpreting raw Crossref metadata
Monitor a topic for new publications -- enable onlyNew on a schedule instead of building a polling script
Build a literature review overview -- use Literature Review mode instead of running multiple searches manually
Filter by citation impact -- set minCitations / maxCitations instead of post-processing results

Choose this actor if

You need structured Crossref metadata at scale without managing API pagination
You want DOI lookup plus Open Access detection plus BibTeX export in one run
You need to screen a reference list for retracted papers before publishing
You want to monitor a topic, author, or journal for new publications on a schedule
You need clean citation data for bibliometric analysis (citation counts, funding, subjects)

Do not use this actor if

You need full-text PDFs or paywalled article content
You need citation graph analysis (who cites whom, citation chains)
You need journal impact factors or h-index calculations
You need semantic paper recommendations or "similar papers" features
You need real-time preprint alerts (use ArXiv actor instead)

Quick answers

What is it? An Apify actor that queries Crossref (150M+ scholarly works from 20,000+ publishers) and Unpaywall, returning 27 normalized fields per paper.

What inputs does it support? Keyword query, author name, journal name, ISSN, DOI prefix, publication type, year range, sort order -- or a list of specific DOIs for direct lookup.

What does it return? DOI, title, authors with ORCID, citation count, journal, publisher, abstract, funding with grant IDs, subjects, retraction status, OA status with PDF URLs, BibTeX citations, and relevance score.

How is it different from raw Crossref? Adds automatic pagination, date normalization, HTML-stripped abstracts, Unpaywall OA checks, BibTeX generation across 5 entry types, retraction detection across two metadata paths, and summary statistics.

Does it support DOI lookup? Yes. Paste DOIs (comma-separated or one per line) into the DOI Lookup Mode field. The actor fetches metadata for each DOI directly, bypassing search.

Does it detect Open Access papers? Yes. Enable includeOpenAccess to check each paper against Unpaywall. Returns OA type (gold/green/bronze/hybrid) and free PDF URL.

Does it detect retracted papers? Yes. Every paper includes isRetracted and retractionDoi fields. Checks both Crossref update-to and relation.is-retracted-by metadata.

How much does it cost? $0.002 per paper. 50 papers = $0.10. 1,000 papers = $2.00. Crossref API itself is free.

What is Literature Review mode? A single run that fetches the most cited papers AND the newest papers on a topic, removes duplicates, and produces a combined dataset with summary statistics including top authors and top journals. The fastest way to get an instant research overview.

Can it filter by citation count? Yes. Set minCitations to find only influential papers (e.g., 50+ citations), or maxCitations to find niche or recent work not yet heavily cited.

Can it track new papers across scheduled runs? Yes. Enable onlyNew (incremental mode). Each run only returns papers not seen in previous runs. Seen DOIs are stored in the Key-Value Store and persist across runs.

Best API alternative for academic metadata workflows

While APIs like Crossref, OpenAlex, and Semantic Scholar provide raw data, Crossref Academic Paper Search is a higher-level alternative that returns analysis-ready datasets without requiring API integration, pagination handling, or data cleaning. For batch workflows of 50--1,000 papers, this is the simplest path from research question to structured dataset.

Crossref Academic Paper Search vs raw Crossref API vs Google Scholar

If you are deciding between Crossref and Google Scholar for programmatic access to academic metadata, this actor builds on Crossref to provide a complete, automation-ready solution with OA detection, BibTeX export, and retraction screening included.

Need	This actor	Raw Crossref API	Google Scholar
Batch structured metadata	Up to 1,000 papers per run	Yes, but manual pagination	No official API
DOI lookup	Yes, paste a list	Yes, one at a time	Manual only
Open Access status	Yes, via Unpaywall	No	Not structured
BibTeX generation	Yes, 5 entry types	No	Manual export
Retraction detection	Yes, two metadata paths	Manual interpretation	Not structured
Citation counts	Yes, per paper	Yes	Approximate, no API
Author ORCID	Yes, when available	Yes, raw format	No
Funding data	Yes, with grant IDs	Yes, raw format	No
Full text	No	No	Sometimes links
Citation filtering	Yes (min/max)	Manual post-processing	No
Literature review mode	Yes (most cited + newest)	Multiple queries needed	No
Incremental monitoring	Yes (only new papers)	Build it yourself	No
Data quality score	Yes (completeness 0-1)	No	No
Scheduled automation	Yes, via Apify schedules	Build it yourself	No

Use cases

Literature reviews and systematic reviews

Retrieve structured metadata for hundreds of papers in one run. Enable BibTeX export to generate citations ready for Overleaf, Zotero, or Mendeley. Sort by citation count to find foundational work first.

Bibliometric analysis and research evaluation

Analyze publication patterns, citation distributions, and funding landscapes. The Key-Value Store summary provides type breakdowns, citation averages, and top journals without additional processing.

Monitoring new publications

Schedule weekly runs with "Newest First" sorting and the current year as fromYear. New publications appear in Crossref within days of DOI registration.

Open Access auditing

Enable OA detection to assess availability across a set of publications. Returns OA type and free PDF URLs. The summary includes overall OA percentage for compliance reporting.

Retraction screening

Validate a reference list or dataset for retracted papers. Use DOI lookup mode with DOIs from an existing bibliography. Every paper shows isRetracted status and the retraction notice DOI.

Pricing and performance

Scenario	Papers	Cost	Run time
Quick test	10	$0.02	~5 seconds
Standard search	50	$0.10	8-15 seconds
Author bibliography	200	$0.40	15-30 seconds
Full extraction	1,000	$2.00	45-90 seconds
100 papers + OA check	100	$0.20	2-4 minutes

The actor respects your Apify spending limit. If the limit is reached mid-run, it stops and returns papers collected so far.
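The per-paper pricing above is easy to estimate before a run. A minimal sketch, assuming the flat $0.002 rate stated in this README and no other charges; `estimate_cost` and `max_papers_for_budget` are hypothetical helpers, not part of the actor.

```python
from decimal import Decimal

# Flat per-paper rate in USD, taken from the pricing stated in this README.
PRICE_PER_PAPER = Decimal("0.002")


def estimate_cost(papers: int) -> Decimal:
    """Estimated charge in USD for a run that returns `papers` results."""
    if papers < 0:
        raise ValueError("papers must be non-negative")
    return PRICE_PER_PAPER * papers


def max_papers_for_budget(budget_usd: str) -> int:
    """Largest number of papers that fits within a budget given as a string."""
    return int(Decimal(budget_usd) / PRICE_PER_PAPER)


print(estimate_cost(50))              # 0.100
print(estimate_cost(1000))            # 2.000
print(max_papers_for_budget("1.00"))  # 500
```

Decimal arithmetic is used so that small per-paper amounts do not pick up floating-point rounding noise.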

How to use

Enter a search query -- type a topic like "CRISPR gene editing" or paste DOIs into the DOI Lookup Mode field
Add filters -- optionally set author, journal, ISSN, DOI prefix, type, or year range. Enable BibTeX or Open Access under Output Enrichment
Run -- 50 papers completes in ~10 seconds
Download -- export from the Dataset tab in JSON, CSV, or Excel. Summary stats are in the Key-Value Store under SUMMARY

First run tips

Start with 50 results -- scale up after reviewing the first batch
Use ISSN for exact journal matching -- issn: "0028-0836" targets only Nature, while containerTitle: "Nature" fuzzy-matches Nature Communications, Nature Methods, etc.
Use DOI prefix to target publishers -- 10.1038 (Nature), 10.1016 (Elsevier), 10.1007 (Springer), 10.1126 (Science/AAAS)
Enable OA detection only when needed -- adds ~1 second per paper via Unpaywall

How to build an instant literature review

The fastest way to get a research overview on any topic is to use Literature Review mode. Set mode to literature_review and provide a search query. The actor automatically fetches the most cited papers (foundational work) and the newest papers (recent breakthroughs), removes duplicates, and returns a combined dataset. The Key-Value Store summary includes top authors, top journals, citation statistics, and year distribution — everything needed to understand a research field in one run.

{
    "query": "CRISPR gene editing",
    "mode": "literature_review",
    "maxResults": 100,
    "includeBibtex": true
}
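The merge step described above (most cited plus newest, duplicates removed) can be pictured as a DOI-keyed union. This is an illustrative sketch only, not the actor's code; `merge_by_doi` is a hypothetical helper, and comparing DOIs case-insensitively is an assumption.

```python
def merge_by_doi(most_cited, newest):
    """Combine two result lists, keeping the first record seen for each DOI."""
    seen = set()
    merged = []
    for paper in [*most_cited, *newest]:
        key = paper["doi"].lower()  # assumption: DOIs are compared case-insensitively
        if key not in seen:
            seen.add(key)
            merged.append(paper)
    return merged


# Placeholder records, not real papers.
most_cited = [{"doi": "10.1000/a"}, {"doi": "10.1000/b"}]
newest = [{"doi": "10.1000/B"}, {"doi": "10.1000/c"}]
print([p["doi"] for p in merge_by_doi(most_cited, newest)])
```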

How to find only highly cited papers

Set minCitations to filter out low-impact results. For example, minCitations: 50 returns only papers cited 50+ times. Combine with maxCitations to target a specific range — minCitations: 10, maxCitations: 500 finds moderately influential work that isn't yet a review staple. Citation filtering works in both search mode and literature review mode.
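To make the bounds concrete: the parameter descriptions say "at least" and "at most", so both limits are inclusive. A small sketch of that check applied to output records; `within_citation_range` is a hypothetical helper for illustration, not the actor's implementation.

```python
def within_citation_range(paper, min_citations=None, max_citations=None):
    """True if the paper's citationCount falls inside the inclusive range."""
    count = paper["citationCount"]
    if min_citations is not None and count < min_citations:
        return False
    if max_citations is not None and count > max_citations:
        return False
    return True


# Placeholder records, not real papers.
papers = [{"citationCount": 5}, {"citationCount": 10}, {"citationCount": 500}, {"citationCount": 501}]
kept = [p["citationCount"] for p in papers if within_citation_range(p, 10, 500)]
print(kept)  # [10, 500]
```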

How to monitor a topic for new papers

The simplest way to track new publications on a topic is to enable onlyNew (incremental mode) and schedule the actor to run weekly. Each run only returns papers not seen in previous runs. Seen DOIs persist in the Key-Value Store across runs. Combine with "Newest First" sorting and fromYear set to the current year for the most focused monitoring.
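The incremental idea can be sketched as a set of seen DOIs carried from one run to the next. This illustrates the concept only; how the actor actually stores and compares DOIs is not shown in this README, and `only_new` is a hypothetical helper.

```python
def only_new(papers, seen_dois):
    """Return papers whose DOI is not in seen_dois, plus the updated seen set."""
    fresh = [p for p in papers if p["doi"] not in seen_dois]
    updated = set(seen_dois) | {p["doi"] for p in fresh}
    return fresh, updated


# Two simulated scheduled runs over placeholder records.
seen = set()
run1, seen = only_new([{"doi": "10.1000/a"}, {"doi": "10.1000/b"}], seen)
run2, seen = only_new([{"doi": "10.1000/b"}, {"doi": "10.1000/c"}], seen)
print([p["doi"] for p in run2])  # ['10.1000/c']
```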

How to get DOI metadata in bulk

The easiest way to get metadata from a list of DOIs without writing API loops is to use DOI Lookup Mode. Instead of calling Crossref's /works/{doi} endpoint for each DOI manually, this actor accepts hundreds of DOIs at once and returns structured metadata in a single run. This is typically faster and simpler than writing Python loops over the Crossref API, especially for batches of 50--1,000 DOIs. Paste your DOIs (comma-separated or one per line) into the dois field. Duplicates are removed automatically. Enable includeOpenAccess or includeBibtex to enrich results in the same run.
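If your DOIs come from a messy source, it can help to normalize them before pasting. A minimal sketch of splitting on commas and newlines and removing duplicates; `parse_dois` is a hypothetical helper, and case-insensitive deduplication is an assumption rather than documented actor behavior.

```python
import re


def parse_dois(text):
    """Split comma- or newline-separated DOIs, dropping blanks and duplicates."""
    seen = set()
    result = []
    for raw in re.split(r"[,\n]+", text):
        doi = raw.strip()
        if not doi:
            continue
        key = doi.lower()  # assumption: treat DOIs as case-insensitive
        if key not in seen:
            seen.add(key)
            result.append(doi)
    return result


raw = "10.1126/science.aaf5573\n10.1038/nature17946, 10.1126/SCIENCE.AAF5573\n\n"
print("\n".join(parse_dois(raw)))  # value for the dois field, one DOI per line
```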

How to find Open Access papers by DOI

The easiest way to check Open Access status for a list of DOIs is to use Crossref Academic Paper Search with Open Access detection enabled. This replaces calling the Unpaywall API directly when working with multiple DOIs. Instead of making individual Unpaywall requests, this actor performs Open Access checks in bulk with built-in rate handling and structured output, returning OA status and PDF URLs alongside full paper metadata. Paste DOIs into the dois field and enable includeOpenAccess. The output includes openAccess (true/false), oaStatus (gold, green, bronze, hybrid), and oaPdfUrl (direct link to the free version). The Key-Value Store summary shows overall OA percentage.

How to check if a paper is retracted

The fastest way to check if a paper has been retracted at scale is to use Crossref Academic Paper Search in DOI lookup mode. Unlike manual checks against Crossref metadata or Retraction Watch, this actor flags retractions automatically using two metadata paths and works across hundreds of DOIs in one run. For single papers, manual checks work. For lists of 10--1,000 DOIs, this is significantly faster and more reliable. Paste DOIs into the dois field. Every result includes isRetracted (true/false) and retractionDoi (the DOI of the retraction notice).
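Once a run finishes, screening comes down to filtering on the two fields named above. A short sketch over already-downloaded dataset items; `find_retracted` is a hypothetical helper and the sample records are placeholders, not real papers.

```python
def find_retracted(items):
    """Return (doi, retractionDoi) pairs for records flagged as retracted."""
    return [
        (item["doi"], item.get("retractionDoi"))
        for item in items
        if item.get("isRetracted")
    ]


# Placeholder records shaped like the actor's output.
items = [
    {"doi": "10.1000/ok", "isRetracted": False, "retractionDoi": None},
    {"doi": "10.1000/bad", "isRetracted": True, "retractionDoi": "10.1000/notice"},
]
for doi, notice in find_retracted(items):
    print(f"{doi} retracted, notice: {notice}")
```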

How to export BibTeX from Crossref results

The simplest way to generate BibTeX citations for hundreds of papers at once is to enable includeBibtex in the input. Instead of formatting citations manually or using browser export tools one paper at a time, Crossref Academic Paper Search generates a BibTeX entry per paper with the correct type (@article, @incollection, @inproceedings, @book, @techreport). Copy the bibtex field into your .bib file or import into Zotero, Mendeley, or Overleaf.

How to search papers by author, journal, or ISSN

Set authorName for author search (fuzzy matching -- "Jennifer Doudna" and "J. Doudna" both work). Set containerTitle for journal name search, or issn for exact journal matching. ISSN is more precise -- issn: "0028-0836" returns only Nature, while containerTitle: "Nature" fuzzy-matches Nature Communications and other Nature-branded journals.

Example prompts this actor handles

"Find the most cited CRISPR papers since 2020" -- set query: "CRISPR", fromYear: 2020, sortBy: "is-referenced-by-count"
"Check if these 50 DOIs are retracted" -- paste DOIs into dois, check isRetracted in output
"Export BibTeX and OA links for papers by Jennifer Doudna" -- set authorName, enable includeBibtex and includeOpenAccess
"Find all Nature papers on machine learning from 2022 onward" -- set query: "machine learning", issn: "0028-0836", fromYear: 2022
"What journals publish the most on climate change?" -- search topic, check SUMMARY in Key-Value Store for top journals
"Get funding data for NIH-supported gene therapy research" -- search topic, check funders array in output for NIH grants
"Give me an instant literature review on transformer architectures" -- set mode: "literature_review", get most cited + newest combined
"Only show me highly cited papers on CRISPR" -- set minCitations: 50 to filter noise
"Alert me when new papers on LLM safety are published" -- schedule weekly with onlyNew: true

What you avoid building yourself

Without this actor, extracting the same data from Crossref requires:

Raw Crossref API          →  This actor
─────────────────────────────────────────────────────
Manual pagination logic      Automatic (100/page, up to 10K offset)
HTML-encoded abstracts       Clean plain text
date-parts arrays            YYYY-MM-DD strings
No OA data                   Unpaywall integration built in
No BibTeX                    5 entry types generated automatically
Manual retraction checking   isRetracted + retractionDoi on every record
No summary stats             Citation stats, top journals, top authors, OA % in KV store
Multiple searches needed     Literature Review mode combines most cited + newest
No citation filtering        minCitations / maxCitations built in
No change tracking           Incremental mode tracks seen DOIs across runs
No quality indicators        Completeness score (0-1) on every record
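As one example of the normalization listed above, Crossref reports dates as nested `date-parts` arrays such as `[[2016, 4, 20]]`, sometimes with the month or day missing. A sketch of the conversion to a YYYY-MM-DD string; padding missing parts with 01 is an assumption here, and the actor may handle partial dates differently.

```python
def date_parts_to_iso(date_parts):
    """Convert a Crossref date-parts array to YYYY-MM-DD, or None if empty."""
    if not date_parts or not date_parts[0] or date_parts[0][0] is None:
        return None
    parts = list(date_parts[0])[:3]
    # Assumption: a missing month or day is padded with 1.
    year, month, day = parts + [1] * (3 - len(parts))
    return f"{year:04d}-{month:02d}-{day:02d}"


print(date_parts_to_iso([[2016, 4, 20]]))  # 2016-04-20
print(date_parts_to_iso([[2016]]))         # 2016-01-01
print(date_parts_to_iso([[None]]))         # None
```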

Input parameters

Parameter	Type	Default	Description
`query`	String	-	Free-text search across titles, abstracts, and full text
`authorName`	String	-	Filter by author name (e.g., "Einstein", "Jennifer Doudna")
`containerTitle`	String	-	Filter by journal or conference name
`doiPrefix`	String	-	Filter by publisher DOI prefix (e.g., `10.1038`)
`issn`	String	-	Filter by exact journal ISSN (e.g., "0028-0836")
`dois`	String	-	DOI Lookup Mode: paste DOIs, one per line or comma-separated
`publicationType`	String	-	Filter: `journal-article`, `book-chapter`, `proceedings-article`, `posted-content`, `book`, `dataset`, `report`
`fromYear`	Integer	-	Earliest publication year
`toYear`	Integer	-	Latest publication year
`sortBy`	String	`relevance`	Sort: `relevance`, `is-referenced-by-count` (most cited), `published` (newest)
`maxResults`	Integer	`50`	Maximum papers to return (1-1,000)
`minCitations`	Integer	-	Only return papers with at least this many citations
`maxCitations`	Integer	-	Only return papers with at most this many citations
`mode`	String	-	Set to `literature_review` to fetch most cited + newest papers combined
`onlyNew`	Boolean	`false`	Incremental mode: only return papers not seen in previous runs
`includeBibtex`	Boolean	`false`	Generate BibTeX citation for each paper
`includeOpenAccess`	Boolean	`false`	Check Unpaywall for OA status and free PDF URLs
`includeRis`	Boolean	`false`	Generate RIS-format citation per paper (EndNote/Zotero/Mendeley import)
`outputProfile`	String	`full`	Output verbosity: `minimal` (decision-only — doi/title/year/citations/summary/recommendedAction/changeFlag), `standard` (above + author + journal + abstract), `llm` (decision + confidence + agentContract for AI consumers), `full` (every field)
`watchlistName`	String	-	Name this run as a separate watchlist. CITATION_HISTORY + SEEN_DOIS are stored per-watchlist, so the same actor runs as N independent literature reviews.
`webhookUrl`	String	-	Slack or Discord incoming webhook URL. Posts a rich embed on completion with totals + retractions + OA % + top papers + a link to the run. Auto-detects vendor.
`circuitBreakerThreshold`	Integer	`0`	Reserved for future Unpaywall failure-streak abort. Currently a placeholder.
`includeAgentContract`	Boolean	`true`	Add a top-level `agentContract` `{ decision, confidence, nextAction, costToAct }` to every paper record (and run-level on the SUMMARY) for MCP/AI-agent consumers.

At least one of query, authorName, containerTitle, doiPrefix, issn, or dois must be provided.

Input examples

Find the most cited CRISPR papers with BibTeX:

{
    "query": "CRISPR gene editing",
    "sortBy": "is-referenced-by-count",
    "maxResults": 100,
    "includeBibtex": true
}

Check whether these DOIs are retracted and Open Access:

{
    "dois": "10.1126/science.aaf5573\n10.1038/nature17946\n10.1016/j.cell.2014.09.029",
    "includeOpenAccess": true
}

Find all Nature papers on machine learning from 2022 onward:

{
    "query": "machine learning",
    "issn": "0028-0836",
    "fromYear": 2022,
    "sortBy": "published",
    "maxResults": 200
}

Export BibTeX and OA links for papers by Jennifer Doudna:

{
    "query": "base editing",
    "authorName": "Jennifer Doudna",
    "sortBy": "is-referenced-by-count",
    "includeBibtex": true,
    "includeOpenAccess": true
}

Output example

{
    "doi": "10.1126/science.aaf5573",
    "url": "http://dx.doi.org/10.1126/science.aaf5573",
    "title": "Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage",
    "publishedYear": 2016,
    "publishedDate": "2016-04-20",
    "type": "journal-article",
    "citationCount": 3842,
    "referencesCount": 47,
    "authors": "Alexis C. Komor, Yongjoo B. Kim, Michael S. Packer, John A. Zuris, David R. Liu",
    "authorDetails": [
        {
            "name": "Alexis C. Komor",
            "sequence": "first",
            "affiliations": ["Harvard University", "Broad Institute"],
            "orcid": "https://orcid.org/0000-0003-4884-3253"
        }
    ],
    "journal": "Science",
    "publisher": "American Association for the Advancement of Science (AAAS)",
    "volume": "352",
    "issue": "6293",
    "page": "1423-1428",
    "language": "en",
    "issn": ["0036-8075", "1095-9203"],
    "subjects": ["Multidisciplinary"],
    "funders": [
        { "name": "National Institutes of Health", "awards": ["R01 EB022376"] },
        { "name": "Howard Hughes Medical Institute", "awards": [] }
    ],
    "abstract": "Current genome-editing technologies introduce double-stranded (ds) DNA breaks at a target locus...",
    "license": "https://www.science.org/doi/am-pdf/10.1126/science.aaf5573",
    "isRetracted": false,
    "retractionDoi": null,
    "openAccess": true,
    "oaStatus": "green",
    "oaPdfUrl": "https://europepmc.org/articles/pmc4873371?pdf=render",
    "bibtex": "@article{Liu2016,\n  author = {Alexis C. Komor and ...},\n  title = {Programmable editing of...},\n  journal = {Science},\n  year = {2016},\n  doi = {10.1126/science.aaf5573}\n}",
    "relevanceScore": 18.742,
    "extractedAt": "2026-04-04T14:30:00.000Z"
}

Output fields

Field	Type	Description
`doi`	String	Digital Object Identifier
`url`	String	Canonical URL (via doi.org)
`title`	String	Full title
`publishedYear`	Integer / null	Publication year
`publishedDate`	String / null	Date in YYYY-MM-DD
`type`	String	Crossref type (journal-article, book-chapter, etc.)
`citationCount`	Integer	Times cited by indexed works
`referencesCount`	Integer	References this work cites
`authors`	String	Comma-separated author names
`authorDetails`	Array	Name, sequence, affiliations, ORCID per author
`journal`	String / null	Journal or container title
`publisher`	String	Publisher name
`volume`	String / null	Volume
`issue`	String / null	Issue
`page`	String / null	Page range
`language`	String / null	ISO language code
`issn`	Array	Journal ISSNs
`subjects`	Array	Subject classifications
`funders`	Array	Funder name + grant IDs
`abstract`	String / null	Plain-text abstract (HTML stripped)
`license`	String / null	License or access URL
`isRetracted`	Boolean	Whether the paper is retracted
`retractionDoi`	String / null	DOI of retraction notice
`openAccess`	Boolean / null	OA status (null if not checked)
`oaStatus`	String / null	gold, green, bronze, hybrid
`oaPdfUrl`	String / null	Free PDF URL
`bibtex`	String / null	BibTeX citation (null if not enabled)
`ris`	String / null	RIS-format citation for EndNote/Zotero/Mendeley (null if `includeRis=false`)
`completenessScore`	Number	Data quality score (0-1) based on available metadata
`relevanceScore`	Number	Crossref relevance score
`extractedAt`	String	ISO 8601 extraction timestamp
`recordType`	String	`result` for paper records, `summary` for run summary record, `error` for failures. Use to filter the dataset.
`schemaVersion`	String	Output schema version (semver). Bumped on shape changes; safe to branch on.
`eventId`	String	Idempotent canonical id `sha256(watchlist::doi)`. Same id across re-runs of the same DOI — safe join key for downstream diffing.
`summary`	String	Plain-English one-line summary (≤280 chars). LLM/CRM-friendly.
`confidence`	Object	`{ score: 0–1, level: 'high'|'medium'|'low'|'very-low', components: [{ name, weight, value }] }`. Components: completeness, citationStrength, recordIntegrity, recency.
`recommendedAction`	String	Stable enum: `cite-immediately` | `read-later` | `verify-retraction-status` | `manual-review` | `archive-low-completeness`.
`changeFlag`	String	Cross-run change vs prior `citationCount`: `NEW` | `IMPROVED` | `DECLINED` | `UNCHANGED` (±5 tolerance).
`previousCitationCount`	Integer / null	Citation count from the prior run (null on first encounter).
`citationDelta`	Integer / null	`citationCount - previousCitationCount`. Positive = paper gained citations since last run.
`dataGaps`	Array	`[{ field, reason, suggestedFix }]` listing missing fields with named upstream actors that fill the gap.
`agentContract`	Object	`{ decision, confidence, nextAction, costToAct }` decision surface for MCP and AI-agent consumers.
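The relationship between `citationCount`, `previousCitationCount`, `citationDelta`, and `changeFlag` can be sketched as follows. This is an illustration under one assumption: that a delta of exactly ±5 still counts as `UNCHANGED`. The README states the tolerance but not whether it is inclusive, and `change_flag` is a hypothetical helper.

```python
def change_flag(current, previous, tolerance=5):
    """Classify the citation change between two runs."""
    if previous is None:
        return "NEW"  # first encounter: no prior count to compare against
    delta = current - previous
    if delta > tolerance:
        return "IMPROVED"
    if delta < -tolerance:
        return "DECLINED"
    return "UNCHANGED"  # assumption: |delta| <= tolerance is treated as noise


print(change_flag(120, None))  # NEW
print(change_flag(120, 100))   # IMPROVED
print(change_flag(103, 100))   # UNCHANGED
print(change_flag(90, 100))    # DECLINED
```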

KV store mirrors

In addition to the dataset, every run writes to KV:

SUMMARY key — totals + analytics (typeBreakdown, citationStats, topJournals, topAuthors, yearDistribution) + run-level agentContract decision surface + coverage block + dataGaps (when applicable). Best for triggering downstream actors with a single read.
OUTPUT key — full deterministic per-paper output (regardless of outputProfile) plus agentContract and coverage. Use when you need every field even though the dataset is filtered.
crossref-paper-search-history[-watchlistName] named store — CITATION_HISTORY map of { doi → citationCount } for changeFlag computation, plus SEEN_DOIS for onlyNew incremental mode. Survives dataset purges.
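Reading the SUMMARY key after a run might look like the sketch below. It assumes SUMMARY is written to the run's default key-value store and that `client` is an `ApifyClient` instance and `run` the dict returned by `.call()`; `read_summary` is a hypothetical helper.

```python
def read_summary(client, run):
    """Fetch the SUMMARY record from the run's default key-value store."""
    # Assumption: SUMMARY lives in the default store of the run.
    store = client.key_value_store(run["defaultKeyValueStoreId"])
    record = store.get_record("SUMMARY")
    # get_record returns None when the key does not exist.
    return record["value"] if record else None


# Usage, given a finished run from the apify_client package:
#   summary = read_summary(client, run)
#   if summary:
#       print(summary.get("topJournals"))
```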

Stable enums

The actor commits to additive-only evolution of these enums (new values may be added in minor versions; existing values never removed or renamed):

recommendedAction — cite-immediately, read-later, verify-retraction-status, manual-review, archive-low-completeness
changeFlag — NEW, IMPROVED, DECLINED, UNCHANGED
confidence.level — high (≥0.8), medium (≥0.6), low (≥0.4), very-low (<0.4)
agentContract.decision — qualified-A, qualified-B, review, low-priority, reject
recordType — result, summary, error
failureType (error records) — invalid-input, no-data, unknown

Branching on these in Dify, n8n, Make, or your own code is safe across schemaVersion minor bumps.
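A sketch of what such branching could look like in your own code, using the thresholds and enum values listed above. `confidence_level` and `papers_to_cite` are hypothetical helpers; the actor already emits `confidence.level`, so the first function only mirrors the documented thresholds.

```python
def confidence_level(score):
    """Map a 0-1 confidence score to the documented level names."""
    if score >= 0.8:
        return "high"
    if score >= 0.6:
        return "medium"
    if score >= 0.4:
        return "low"
    return "very-low"


def papers_to_cite(items):
    """Keep paper records whose recommendedAction is cite-immediately."""
    return [
        item for item in items
        if item.get("recordType") == "result"
        and item.get("recommendedAction") == "cite-immediately"
    ]


# Placeholder records shaped like the actor's output.
items = [
    {"recordType": "result", "doi": "10.1000/a", "recommendedAction": "cite-immediately"},
    {"recordType": "result", "doi": "10.1000/b", "recommendedAction": "read-later"},
    {"recordType": "summary"},
]
print([p["doi"] for p in papers_to_cite(items)])  # ['10.1000/a']
```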

Programmatic access

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("ryanclinton/crossref-paper-search").call(run_input={
    "query": "CRISPR gene editing",
    "sortBy": "is-referenced-by-count",
    "maxResults": 100,
    "includeBibtex": True,
    "includeOpenAccess": True,
})

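for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    # Skip summary/error records; only `result` records carry paper fields.
    if item.get("recordType", "result") != "result":
        continue
    print(f"{item['title']} — {item['citationCount']} citations — OA: {item['openAccess']}")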

JavaScript

import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

const run = await client.actor("ryanclinton/crossref-paper-search").call({
    query: "CRISPR gene editing",
    sortBy: "is-referenced-by-count",
    maxResults: 100,
    includeBibtex: true,
    includeOpenAccess: true,
});

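const { items } = await client.dataset(run.defaultDatasetId).listItems();
for (const item of items) {
    // Skip summary/error records; only `result` records carry paper fields.
    if ((item.recordType ?? "result") !== "result") continue;
    console.log(`${item.title} — ${item.citationCount} citations — OA: ${item.openAccess}`);
}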

cURL

# Start a run. The JSON response includes data.defaultDatasetId.
curl -X POST "https://api.apify.com/v2/acts/ryanclinton~crossref-paper-search/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query": "CRISPR gene editing", "sortBy": "is-referenced-by-count", "maxResults": 50}'

# After the run finishes, fetch its items using that dataset ID.
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN&format=json"

How it works

input (query | author | journal | issn | doiPrefix | dois | mode=literature_review)
       │
       ▼
   build URL params (filters + sort + offset)
       │
       ▼
   Crossref /works (paginate 100/page, retry-with-backoff on 429/5xx)
       │
       ▼
   transform → completeness · retraction · authors · funders · subjects · license
       │
       ▼
   citation filters (min/max) + onlyNew (named cross-run KV)
       │
       ▼
   sort by richness (abstract + authors + journal + citations + subjects + funders)
       │
       ▼
   optional: Unpaywall OA enrichment · BibTeX generation · RIS generation
       │
       ▼
   per-paper premium fields:
     eventId · summary · confidence · recommendedAction · changeFlag · dataGaps · agentContract
       │
       ┌────────┴────────┐
       ▼                 ▼
   dataset            KV SUMMARY (totals + analytics + run-level agentContract)
   per-record         KV OUTPUT (full deterministic shape, all profiles)
                      crossref-paper-search-history[-watchlist]
                        (CITATION_HISTORY · SEEN_DOIS for incremental + changeFlag)
       │
       └─► optional Slack/Discord webhook

Search mode: builds paginated queries using query, query.author, query.container-title as fuzzy parameters and prefix, issn, type, from-pub-date, until-pub-date as exact filters. Fetches 100 records per page until maxResults or the 10,000-offset limit.
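The mapping from actor input to Crossref `/works` parameters can be sketched roughly as below. This is a reconstruction from the description above, not the actor's source: `build_params` and `page_offsets` are hypothetical helpers, and passing bare years to the date filters and always sorting descending are assumptions.

```python
def build_params(inp, offset=0, rows=100):
    """Translate actor-style input into Crossref /works query parameters."""
    params = {"rows": rows, "offset": offset}
    # Fuzzy query parameters.
    fuzzy = (("query", "query"), ("authorName", "query.author"),
             ("containerTitle", "query.container-title"))
    for key, param in fuzzy:
        if inp.get(key):
            params[param] = inp[key]
    # Exact filters, joined into Crossref's comma-separated filter string.
    exact = (("doiPrefix", "prefix"), ("issn", "issn"), ("publicationType", "type"),
             ("fromYear", "from-pub-date"), ("toYear", "until-pub-date"))
    filters = [f"{name}:{inp[key]}" for key, name in exact if inp.get(key)]
    if filters:
        params["filter"] = ",".join(filters)
    params["sort"] = inp.get("sortBy", "relevance")
    params["order"] = "desc"  # assumption: results are always sorted descending
    return params


def page_offsets(max_results, rows=100, offset_cap=10_000):
    """Offsets to request, stopping at max_results or the deep-paging cap."""
    return list(range(0, min(max_results, offset_cap), rows))


print(build_params({"query": "machine learning", "issn": "0028-0836", "fromYear": 2022}))
print(page_offsets(250))  # [0, 100, 200]
```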

DOI lookup mode: fetches each DOI individually from api.crossref.org/works/{doi}. Deduplicates input DOIs automatically.

Open Access detection: queries Unpaywall (api.unpaywall.org/v2/{doi}) for each paper when enabled. Returns OA type and best available PDF URL. Adds ~1 second per paper.

BibTeX generation: maps Crossref type to BibTeX entry type (journal-article -> @article, proceedings-article -> @inproceedings, book-chapter -> @incollection, book -> @book, report -> @techreport). Citation key follows {LastName}{Year}.
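The type mapping described above fits in a small lookup table. A sketch, with two assumptions labeled in the code: the `misc` fallback for unlisted types and the way non-alphanumeric characters are stripped from the key. Neither is documented in this README, and both function names are hypothetical.

```python
# Crossref type -> BibTeX entry type, as listed in this README.
BIBTEX_TYPES = {
    "journal-article": "article",
    "proceedings-article": "inproceedings",
    "book-chapter": "incollection",
    "book": "book",
    "report": "techreport",
}


def bibtex_entry_type(crossref_type):
    """Look up the BibTeX entry type for a Crossref work type."""
    # Assumption: types outside the documented five fall back to "misc".
    return BIBTEX_TYPES.get(crossref_type, "misc")


def citation_key(last_name, year):
    """Build a {LastName}{Year} key."""
    # Assumption: spaces, hyphens, and punctuation are dropped from the name.
    cleaned = "".join(ch for ch in last_name if ch.isalnum())
    return f"{cleaned}{year}"


print(bibtex_entry_type("book-chapter"))  # incollection
print(citation_key("Liu", 2016))          # Liu2016
```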

Retraction detection: checks two Crossref metadata paths -- update-to (retraction-type updates) and relation.is-retracted-by (direct retraction links). No extra API calls needed.

Limitations

10,000-result deep paging cap -- Crossref API constraint. Use filters to narrow broad queries.
No full-text access -- metadata only. Use doi or url fields to access papers.
20-30% abstract availability -- depends on publisher. Returns null when missing.
Citation count lag -- may trail Google Scholar or Semantic Scholar by weeks.
Metadata completeness varies -- some publishers omit affiliations, ORCID, subjects, or funding.
OA detection adds latency -- ~1 second per paper. 1,000 papers = ~17 minutes extra.
Rate limiting -- retries on HTTP 429 with backoff, but rapid consecutive runs may experience delays.
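If you call Crossref or this actor from your own scripts, a similar retry pattern is easy to add. A generic sketch of retry with exponential backoff on 429 and 5xx responses; the attempt count and delays are illustrative assumptions, not the actor's actual settings, and `fetch` is any callable you supply that returns a `(status, body)` pair.

```python
import time


def fetch_with_backoff(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        status, body = fetch()
        retryable = status == 429 or 500 <= status < 600
        if not retryable:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
    return status, body  # still failing after the final attempt


# Demo with canned responses and a recording sleep, so nothing actually waits.
responses = iter([(429, None), (503, None), (200, "ok")])
waits = []
result = fetch_with_backoff(lambda: next(responses), sleep=waits.append)
print(result, waits)  # (200, 'ok') [1.0, 2.0]
```

Passing `sleep` as a parameter keeps the function testable without real delays.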

Combine with other actors

Actor	How to combine
OpenAlex Research Search	Cross-reference with OpenAlex for institutional data and open-access metadata
PubMed Biomedical Literature Search	Add MeSH terms and clinical trial data for biomedical papers
Semantic Scholar Paper Search	Enrich with citation context and AI-generated TLDRs
ArXiv Preprint Paper Search	Track papers from preprint to publication
CORE Open Access Papers	Supplement with full-text open access content

FAQ

What is the difference between this and Google Scholar? Google Scholar crawls the web and provides a search interface but no structured API. Crossref Academic Paper Search queries the Crossref registry directly (150M+ works from 20,000+ publishers), returns 27 structured fields per paper, supports batch processing, and can be automated via API.

How do I search by author? Enter the author's name in authorName. Crossref uses fuzzy matching, so "Jennifer Doudna" and "J. Doudna" both work. Combine with a journal or keyword for precision.

Can I export BibTeX for Overleaf or Zotero? Yes. Enable includeBibtex. The actor generates a BibTeX entry per paper with correct entry type. Copy the bibtex field into your .bib file or import into Zotero/Mendeley.

Why are some abstracts missing? Only 20-30% of Crossref records include abstracts. The actor returns null for missing fields rather than guessing.

Can I schedule automatic runs? Yes. Use Apify scheduling to run weekly with "Newest First" sorting and fromYear set to the current year.

What publication types are supported? Journal articles, book chapters, conference proceedings, preprints, books, datasets, and reports.

Is it legal to extract metadata from Crossref? Crossref is public, community-funded infrastructure with an API designed for programmatic access. Metadata is factual bibliographic data. Consult legal counsel for specific compliance requirements.

How does it handle missing metadata? Returns null for missing string fields and empty arrays for missing list fields. Results are sorted by completeness so the richest records appear first.

Troubleshooting

No results for a broad query: Crossref needs a query-type parameter. If using only filters (DOI prefix, type, year) without query or authorName, add a keyword.

OA check is slow: Unpaywall allows ~1 request/second. For 1,000 papers that's ~17 minutes. Disable includeOpenAccess when OA data is not needed.

"DOI not found" warnings: Some DOIs are registered with DataCite or other registries, not Crossref. This actor only looks up Crossref-registered DOIs.

BibTeX key conflicts: Keys use {LastName}{Year} format. Two papers with the same last author and year will collide. Rename duplicates in your reference manager.
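If you prefer to fix collisions in a script instead of a reference manager, one common convention is to append a, b, c to the duplicated keys. A sketch of that idea operating on a list of keys; `disambiguate_keys` is a hypothetical helper, it does not rewrite the `bibtex` strings themselves, and it assumes fewer than 27 duplicates per key.

```python
from collections import Counter


def disambiguate_keys(keys):
    """Append a, b, c... to keys that occur more than once; leave unique keys alone."""
    totals = Counter(keys)
    used = Counter()
    result = []
    for key in keys:
        if totals[key] == 1:
            result.append(key)
        else:
            # Assumption: fewer than 27 duplicates, so a single letter suffices.
            result.append(key + chr(ord("a") + used[key]))
            used[key] += 1
    return result


print(disambiguate_keys(["Liu2016", "Liu2016", "Komor2016"]))
# ['Liu2016a', 'Liu2016b', 'Komor2016']
```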

Recent updates

Literature Review Mode -- fetch most cited + newest papers in one run for instant research overviews
Citation Filtering -- minCitations and maxCitations to find influential papers or filter noise
Completeness Score -- 0-1 data quality metric on every paper for downstream pipeline assessment
Incremental Monitoring -- onlyNew returns only papers not seen in previous runs, for scheduled tracking
DOI Lookup Mode -- fetch metadata for specific DOIs directly
Open Access Detection -- Unpaywall integration for OA status and free PDF URLs
BibTeX Citation Export -- formatted citations for Overleaf, Zotero, Mendeley
Retraction Flagging -- isRetracted and retractionDoi on every paper
ISSN Filter -- exact journal matching by ISSN
Summary Statistics -- citation stats, top journals, top authors, and OA percentage in Key-Value Store

Help us improve

If you encounter issues, enable run sharing in Account Settings > Privacy so we can see your run details and fix issues faster.

Support

Found a bug or have a feature request? Open an issue in the Issues tab.

