Academic Papers Scraper - Citations & Metadata
Pricing
from $1.50 / 1,000 results
Search 150M+ scholarly works by keyword (or look up by DOI) for structured metadata: title, authors with ORCID, journal, publication date, type, publisher, citation count, subjects, ISSN, volume/issue/pages and URL. Fast and reliable via the public Crossref API.
Developer
ben
Maintained by Community
Academic Papers Scraper: Crossref Citations & Metadata
Search 150M+ scholarly works by keyword, or look up exact papers by DOI, and get clean, structured metadata for every result: title, authors, journal, publication date, work type, publisher, citation count, subjects, ISSN, volume, issue, pages, language, DOI and resolver URL. It turns a research question into a ready-to-analyze dataset in seconds, with no manual copy-pasting from publisher sites.
The actor is powered by the public REST API of Crossref, the registration agency that registers DOIs for most of the world's journals, so it is fast and reliable and needs no browser, no login and no API key. Export to JSON/CSV/Excel, run on a schedule, call via API, or connect to Make, Zapier or n8n.
What is the Academic Papers Scraper?
Crossref is the central index behind academic publishing: most journal articles, conference papers and book chapters with a DOI, along with many datasets and preprints, are registered there. This actor lets you query that index programmatically. Give it one or more topics (e.g. machine learning, crispr gene editing) and it returns the top matching works as structured rows, ranked by Crossref relevance. Prefer exact papers? Pass a list of DOIs and it looks each one up directly. You can also restrict results to recent years to focus on the current state of a field.
Because it runs against a clean JSON API rather than scraping HTML, results are consistent and well-typed, and runs avoid the anti-bot blocking that HTML scrapers hit. That makes it a good fit for reproducible literature reviews and automated research pipelines.
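The two query modes described above can be sketched as plain URL construction. This is a minimal illustration, not the actor's actual code: it assumes the public Crossref endpoint `https://api.crossref.org/works` with its `query`, `rows` and `filter=from-pub-date:` parameters, and the helper names `search_url` and `doi_url` are hypothetical.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Public Crossref REST endpoint for works (assumed to be what the actor queries).
CROSSREF_WORKS = "https://api.crossref.org/works"


def search_url(term: str, rows: int = 20, from_year: Optional[str] = None) -> str:
    """Build a keyword-search URL; Crossref sorts matches by relevance by default."""
    params = {"query": term, "rows": rows}
    if from_year:
        # Crossref filter syntax: from-pub-date accepts a bare year.
        params["filter"] = f"from-pub-date:{from_year}"
    return f"{CROSSREF_WORKS}?{urlencode(params)}"


def doi_url(doi: str) -> str:
    """Build a direct lookup URL for one DOI (slashes kept, other unsafe characters escaped)."""
    return f"{CROSSREF_WORKS}/{quote(doi, safe='/')}"
```

Fetching either URL with any HTTP client returns JSON; a search response carries the matching works under `message.items`, while a DOI lookup returns the single work under `message`.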
What data does it extract?
- DOI: the persistent identifier for the work
- Title of the paper
- Authors: full list of author names, plus an author_count
- Journal / container title
- Published date and year
- Type: journal-article, proceedings-article, book-chapter, dataset, posted-content, etc.
- Publisher: e.g. Springer, Elsevier, IEEE, Wiley
- Citations: Crossref is-referenced-by-count (how often the work is cited)
- Reference count: number of works it cites
- Subjects: subject/field classifications
- ISSN: journal serial number(s)
- Volume, issue and pages
- Language of the work
- URL: the DOI resolver link (https://doi.org/…)
- Query: the search term that surfaced the row
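As a rough illustration of how the fields listed above could be derived, the sketch below maps one raw Crossref work record onto the actor's row shape. The Crossref keys (`DOI`, `title`, `author`, `container-title`, `issued`, `is-referenced-by-count`, `references-count`, `page` and so on) are the API's own; the function names `first` and `to_row` are hypothetical, and the actor's real mapping may differ.

```python
from typing import Any, Dict, List, Optional


def first(values: Optional[List[str]]) -> Optional[str]:
    """Crossref wraps titles and container titles in lists; take the first entry."""
    return values[0] if values else None


def to_row(work: Dict[str, Any], query: str) -> Dict[str, Any]:
    """Flatten one Crossref work record into a single output row."""
    authors = []
    for a in work.get("author", []):
        name = " ".join(p for p in (a.get("given"), a.get("family")) if p)
        if name:
            authors.append(name)
    # "issued" holds date-parts such as [[2020, 9, 16]]; month and day may be absent.
    raw_parts = (work.get("issued", {}).get("date-parts") or [[]])[0]
    parts = [p for p in raw_parts if p is not None]
    published = "-".join(str(p) if i == 0 else f"{p:02d}" for i, p in enumerate(parts))
    doi = work.get("DOI")
    return {
        "doi": doi,
        "title": first(work.get("title")),
        "authors": authors,
        "author_count": len(authors),
        "journal": first(work.get("container-title")),
        "published_date": published or None,
        "year": parts[0] if parts else None,
        "type": work.get("type"),
        "publisher": work.get("publisher"),
        "citations": work.get("is-referenced-by-count"),
        "reference_count": work.get("references-count"),
        "subjects": work.get("subject", []),
        "issn": work.get("ISSN", []),
        "volume": work.get("volume"),
        "issue": work.get("issue"),
        "pages": work.get("page"),
        "language": work.get("language"),
        # Assumption: the resolver link is rebuilt from the DOI.
        "url": f"https://doi.org/{doi}" if doi else None,
        "query": query,
    }
```

Missing keys simply become `None` or an empty list, which keeps every row the same shape for CSV or spreadsheet export.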
Input
Run it two ways: search by keyword, or look up exact papers by DOI. You can combine both in one run.
| Field | Type | Description |
|---|---|---|
| searchTerms | array | Keywords/topics to search, e.g. machine learning. One or many. |
| dois | array | Optional: exact works to pull by DOI, e.g. 10.1038/nphys1170. |
| fromYear | string | Optional: only return works published from this year onwards, e.g. 2020. |
| maxPerTerm | integer | Max papers to return per search term. Default 20, up to 100. |
Example input
{"searchTerms": ["machine learning", "protein folding"],"fromYear": "2020","maxPerTerm": 50}
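Before starting a run, it can help to check an input object against the field table above. The following is a hedged sketch of such a check: the `ActorInput` class and `parse_input` function are hypothetical, and clamping `maxPerTerm` into the documented 1 to 100 range (rather than rejecting out-of-range values) is an assumption, not documented actor behavior.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ActorInput:
    search_terms: List[str] = field(default_factory=list)
    dois: List[str] = field(default_factory=list)
    from_year: Optional[str] = None
    max_per_term: int = 20  # documented default


def parse_input(raw: Dict[str, Any]) -> ActorInput:
    """Normalize a raw input dict that uses the field names from the input table."""
    terms = [t.strip() for t in raw.get("searchTerms", []) if t and t.strip()]
    dois = [d.strip() for d in raw.get("dois", []) if d and d.strip()]
    if not terms and not dois:
        raise ValueError("Provide at least one entry in searchTerms or dois")
    from_year = raw.get("fromYear") or None
    if from_year is not None and not (len(from_year) == 4 and from_year.isdigit()):
        raise ValueError("fromYear must be a four-digit year string such as '2020'")
    # Documented range: default 20, up to 100. Clamping is an assumption of this sketch.
    max_per_term = min(max(int(raw.get("maxPerTerm", 20)), 1), 100)
    return ActorInput(terms, dois, from_year, max_per_term)
```

Passing the example input through `parse_input` yields two search terms, `from_year` of `"2020"` and `max_per_term` of 50.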
Output
Each work is one clean row (view as a table, or export JSON / CSV / Excel):
{"doi": "10.1038/s41586-020-2649-2","title": "Array programming with NumPy","authors": ["Charles R. Harris", "K. Jarrod Millman", "Stéfan J. van der Walt"],"author_count": 26,"journal": "Nature","published_date": "2020-09-16","year": 2020,"type": "journal-article","publisher": "Springer Science and Business Media LLC","citations": 9712,"reference_count": 47,"subjects": ["Multidisciplinary"],"issn": ["0028-0836", "1476-4687"],"volume": "585","issue": "7825","pages": "357-362","language": "en","url": "https://doi.org/10.1038/s41586-020-2649-2","query": "machine learning"}
Use cases
- Literature reviews: gather the top-ranked papers on a topic (up to 100 per search term) with citations, authors and journals in one pass, then filter and sort in a spreadsheet.
- Research dashboards & bibliometrics: track publication output by journal, year, subject or publisher and visualize trends over time.
- Citation analysis: rank works by citation count to find the seminal papers in a field and spot rising research.
- RAG / LLM & app pipelines: feed structured paper metadata into retrieval systems, reference managers or your own research tools.
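The citation-analysis and bibliometrics use cases above come down to a few small operations on the exported rows. The sketch below assumes rows shaped like the output example (with `doi`, `year` and `citations` keys); the function names are hypothetical, and removing duplicate DOIs is included on the assumption that the same work can surface under more than one search term.

```python
from collections import Counter
from typing import Any, Dict, List

Row = Dict[str, Any]


def dedupe_by_doi(rows: List[Row]) -> List[Row]:
    """Keep the first row per DOI (compared case-insensitively); rows without a DOI are kept."""
    seen = set()
    unique = []
    for row in rows:
        key = (row.get("doi") or "").lower()
        if key and key in seen:
            continue
        seen.add(key)
        unique.append(row)
    return unique


def top_cited(rows: List[Row], n: int = 10) -> List[Row]:
    """Return the n most-cited rows; a missing citation count is treated as 0."""
    return sorted(rows, key=lambda r: r.get("citations") or 0, reverse=True)[:n]


def works_per_year(rows: List[Row]) -> Dict[int, int]:
    """Count rows per publication year, sorted by year, skipping rows with no year."""
    counts = Counter(r["year"] for r in rows if r.get("year"))
    return dict(sorted(counts.items()))
```

Loading a JSON export with `json.load` and passing the list through `dedupe_by_doi` before `top_cited` or `works_per_year` keeps works that matched several search terms from being counted twice.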
FAQ
How do I scrape academic papers? Enter one or more searchTerms (or dois), set maxPerTerm, and run the actor. You get structured rows with title, authors, journal, date, citations, subjects and DOI.
Do I need an API key or login? No. It uses the public Crossref API, so you only need to provide keywords or DOIs.
How many works are covered? 150M+ records across virtually every scholarly publisher that registers DOIs with Crossref.
Can I look up a specific paper? Yes. Pass one or more DOIs in dois and the actor pulls each exact record.
Can I restrict results to recent years? Yes. Set fromYear (e.g. 2020) to only return works published from that year onwards.
Does it include citation counts? Yes. The Crossref is-referenced-by-count is returned as citations, along with the outgoing reference_count.
Which publishers and document types are included? Everything registered with Crossref: journal articles, conference proceedings, book chapters, datasets, preprints and more, from Springer, Elsevier, IEEE, Wiley, Nature, PLOS and thousands of others.
Can I run it on a schedule or via API? Yes. Schedule recurring runs on Apify, call it via the API/SDK, or connect it to Make, Zapier or n8n.
How does pricing work? You pay per paper returned, from $1.50 per 1,000 results, with no subscription and no fixed monthly fee.
Is it legal? It uses the public, openly documented Crossref REST API, which is designed for exactly this kind of metadata retrieval. Use the data responsibly and in line with Crossref's terms.
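To make the pay-per-result pricing in this FAQ concrete, here is a small cost estimate. It is a sketch under stated assumptions: the listed "from $1.50 / 1,000 results" rate is treated as a floor, each DOI lookup is assumed to count as one result, and the function names are hypothetical.

```python
from decimal import Decimal

# Listed "from" price in USD per 1,000 results; the actual rate may be higher.
PRICE_PER_1000 = Decimal("1.50")


def max_results(search_terms: int, max_per_term: int = 20, dois: int = 0) -> int:
    """Upper bound on rows for a run: each term returns at most max_per_term rows."""
    return search_terms * max_per_term + dois


def estimate_cost(results: int, price_per_1000: Decimal = PRICE_PER_1000) -> Decimal:
    """Estimated cost in USD for a given number of returned rows."""
    return price_per_1000 * results / 1000
```

For the example input (two search terms with maxPerTerm set to 50), the run returns at most 100 rows, which works out to about $0.15 at the listed rate.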
You might also like
- arXiv Scraper: preprints in physics, CS, math & more.
- PubMed Scraper: biomedical & life-sciences literature.
- OpenAlex Scraper: open scholarly graph of works, authors & institutions.
- Wikipedia Scraper: clean article summaries & metadata.
Keywords: academic papers scraper, crossref api, crossref scraper, scholarly metadata, citation data, doi lookup, research papers, journal articles, bibliographic data, literature review, citation analysis, bibliometrics, science data, paper metadata, academic search, publication data, scholarly works, research dataset, doi metadata, journal scraper