Crossref Scholarly Works Scraper
Pricing
from $3.00 / 1,000 results
Extract scholarly works metadata from Crossref — DOIs, titles, authors, journals, publication dates, and citation counts. Filter by query, date range, and work type. No API key required.
Developer
Compute Edge
Maintained by Community
Extract scholarly works metadata from Crossref — DOIs, titles, authors, journals, publication dates, and citation counts. Query 135+ million scholarly articles, books, proceedings, and datasets via the Crossref REST API. Perfect for academic research, bibliometric analysis, literature reviews, and citation network studies.
What This Actor Does
This Actor provides an interface to the works data in the Crossref REST API, one of the largest open sources of scholarly metadata. It supports four search and filtering options:
- Free-Text Search — Search by keyword across titles, abstracts, and metadata (e.g., "machine learning", "COVID-19", "renewable energy")
- Publication Date Filtering — Restrict results to works published within a date range
- Work Type Filtering — Target specific work types (e.g., journal articles, books, proceedings, datasets)
- Pagination & Bulk Extraction — Automatically fetch up to 5,000 records per run using cursor-based pagination
Key Features
- 135+ million works — Access the complete Crossref dataset
- Rich metadata — DOI, title, authors, journal/container, publication date, citation counts, references
- Flexible filtering — Combine free-text search with date range and work type filters
- High-speed pagination — Cursor-based API ensures fast, stable bulk extracts
- No authentication required — Public API, free to use
- Error handling — Graceful fallback for missing or incomplete metadata
- Batch processing — Efficient extraction for large datasets
Popular Use Cases
| Use Case | Query Example | Work Type | Output |
|---|---|---|---|
| Literature Review | "climate change mitigation" | journal-article | Top 500 recent articles on climate solutions |
| Citation Network Analysis | "neural networks" | journal-article + proceedings | Papers by citation count for network mapping |
| Trend Tracking | "AI safety" | all types | New works published in last 30 days |
| Researcher Database | None (recent works) | all types | Latest 1,000 scholarly works across all fields |
| Book Discovery | "sustainable development" | book | Recent books on sustainability |
| Conference Proceedings | "machine learning" | proceedings | Peer-reviewed conference papers |
Getting Started
Step 1: Run the Actor
- Choose your input parameters (see below)
- Click Start
- Results appear in the Dataset tab
- Export as JSON or CSV via Apify UI
Step 2: Simple Example — Search Recent Works
To fetch 50 recent works (no search query):
- Query: (leave blank)
- Filter From Date: (leave blank)
- Work Type: (leave blank)
- Max Results: `50`
Results include title, authors, journal, publication date, and DOI for each work.
How to scrape Crossref scholarly works
Tutorial 1: Search for Papers on Machine Learning
Goal: Find the top 100 recent journal articles on machine learning.
Input configuration:
- Query: `machine learning`
- Work Type: `journal-article`
- Filter From Date: (leave blank for all time)
- Max Results: `100`
Expected output:
    [
      {
        "doi": "10.1038/nature12373",
        "title": "Deep Neural Networks Capture Context-Dependent Neural Activity in the Primate Visual System",
        "type": "journal-article",
        "publisher": "Nature Publishing Group",
        "journal": "Nature",
        "publishedDate": "2024-03-15",
        "authorsCount": 5,
        "firstAuthor": "Antolik Mark",
        "citationCount": 1240,
        "referenceCount": 45,
        "issn": "0028-0836",
        "url": "https://doi.org/10.1038/nature12373"
      },
      ...
    ]
Use case: Build a curated bibliography of the most-cited machine learning papers for a literature review or research project.
Tutorial 2: Track Recent Works in a Specific Domain
Goal: Monitor all scholarly works on renewable energy published in the last 90 days.
Input configuration:
- Query: `renewable energy`
- Filter From Date: `2026-03-21` (90 days before today)
- Work Type: (leave blank for all types)
- Max Results: `500`
Expected output:
    [
      {
        "doi": "10.1016/j.renene.2026.03.001",
        "title": "Advances in Perovskite Solar Cell Efficiency and Stability",
        "type": "journal-article",
        "publisher": "Elsevier",
        "journal": "Renewable Energy",
        "publishedDate": "2026-03-20",
        "authorsCount": 8,
        "firstAuthor": "Liu Chen",
        "citationCount": 0,
        "referenceCount": 67,
        "issn": "0960-1481",
        "url": "https://doi.org/10.1016/j.renene.2026.03.001"
      },
      ...
    ]
Use case: Stay current with emerging research in your domain. Track high-impact journals and new author collaborations. Feed into a data pipeline for weekly research digest emails.
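The `filterFromDate` value in this tutorial is a fixed calendar date, so a scheduled run has to recompute it each time. A minimal Python sketch of that calculation follows; the helper name `from_date` is hypothetical and not part of the Actor.

```python
from datetime import date, timedelta
from typing import Optional


def from_date(days_back: int, today: Optional[date] = None) -> str:
    """Return the YYYY-MM-DD date that lies `days_back` days before `today`."""
    if days_back < 0:
        raise ValueError("days_back must be non-negative")
    today = today or date.today()
    return (today - timedelta(days=days_back)).isoformat()


# 90 days before 2026-06-19 gives the date used in this tutorial.
print(from_date(90, date(2026, 6, 19)))  # 2026-03-21
```

The same helper covers a weekly schedule by passing `7` instead of `90`.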
Tutorial 3: Citation Network Analysis
Goal: Extract 200 highly-cited papers on artificial intelligence to map research influence.
Input configuration:
- Query: `artificial intelligence`
- Work Type: `journal-article`
- Filter From Date: (leave blank)
- Max Results: `200`
Expected output (sorted by citation count):
    [
      {
        "doi": "10.1145/3495243.3560528",
        "title": "Attention Is All You Need",
        "type": "journal-article",
        "publisher": "ACM",
        "journal": "Transactions on Machine Learning Research",
        "publishedDate": "2017-12-06",
        "authorsCount": 8,
        "firstAuthor": "Vaswani Ashish",
        "citationCount": 88450,
        "referenceCount": 72,
        "issn": "",
        "url": "https://doi.org/10.1145/3495243.3560528"
      },
      ...
    ]
Use case: Build a citation network graph showing how papers reference each other. Identify foundational works and research clusters. Track influence trajectories of key researchers.
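The input parameters listed in this document do not include a sort option, so ranking by citations may need to happen after export. A small sketch, assuming the records follow the Output Schema; `top_cited` is a hypothetical helper and the DOIs in the sample are placeholders.

```python
from typing import Any, Dict, List


def top_cited(records: List[Dict[str, Any]], n: int) -> List[Dict[str, Any]]:
    """Return the n records with the highest citationCount, highest first."""
    ranked = sorted(records, key=lambda r: r.get("citationCount") or 0, reverse=True)
    return ranked[:n]


sample = [
    {"doi": "10.1000/a", "citationCount": 12},
    {"doi": "10.1000/b", "citationCount": 340},
    {"doi": "10.1000/c", "citationCount": 0},
]
print([r["doi"] for r in top_cited(sample, 2)])  # ['10.1000/b', '10.1000/a']
```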
Input Parameters
All Modes
| Parameter | Type | Default | Required | Description |
|---|---|---|---|---|
| query | string | (blank) | No | Free-text search query (e.g., "machine learning", "COVID-19"). Leave blank to fetch recent works. Case-insensitive. |
| filterFromDate | string (YYYY-MM-DD) | (blank) | No | Only include works published on or after this date (e.g., "2024-01-01"). Leave blank for all dates. |
| workType | string | (blank) | No | Filter by work type. Common values: journal-article, book, proceedings, report, dataset. Leave blank for all types. |
| maxResults | integer | 50 | No | Maximum works to fetch (1–5,000). Default is 50. |
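When runs are started programmatically, it can help to validate the input before submitting it. The sketch below builds an input object from the four parameters in the table; `build_run_input` is a hypothetical helper, and omitting blank fields so that the Actor applies its defaults is an assumption rather than documented behavior.

```python
import re
from typing import Any, Dict

_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def build_run_input(query: str = "", filter_from_date: str = "",
                    work_type: str = "", max_results: int = 50) -> Dict[str, Any]:
    """Check values against the parameter table and return an input object."""
    if not 1 <= max_results <= 5000:
        raise ValueError("maxResults must be between 1 and 5000")
    if filter_from_date and not _DATE.match(filter_from_date):
        raise ValueError("filterFromDate must be YYYY-MM-DD")
    run_input: Dict[str, Any] = {"maxResults": max_results}
    # Assumption: a missing key behaves like a blank field in the UI.
    if query:
        run_input["query"] = query
    if filter_from_date:
        run_input["filterFromDate"] = filter_from_date
    if work_type:
        run_input["workType"] = work_type
    return run_input


print(build_run_input("quantum computing", "2024-01-01", "proceedings", 200))
```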
Common Work Types
- `journal-article`: Peer-reviewed journal articles
- `proceedings-article` or `proceedings`: Conference proceedings
- `book`: Complete books
- `book-chapter`: Chapters within books
- `report`: Technical reports, white papers
- `dataset`: Data publications
- `dissertation`: Theses and dissertations
- `component`: Article components (figures, tables, appendices)
Full list: Visit https://github.com/CrossRef/rest-api-doc#work-types
Output Schema
Each record contains:
| Field | Type | Example | Description |
|---|---|---|---|
| doi | string | 10.1038/nature12373 | Digital Object Identifier — unique identifier for the work |
| title | string | Deep Neural Networks Capture... | Title of the work |
| type | string | journal-article | Work type (journal-article, book, proceedings, etc.) |
| publisher | string | Nature Publishing Group | Publisher name |
| journal | string | Nature | Journal or container name (empty for books) |
| publishedDate | string | 2024-03-15 | Publication date (YYYY-MM-DD, YYYY-MM, or YYYY format) |
| authorsCount | integer | 5 | Number of authors |
| firstAuthor | string | Antolik Mark | First author's full name (family name followed by given name, as in the example records) |
| citationCount | integer | 1240 | Number of works that cite this work (from Crossref's `is-referenced-by-count`) |
| referenceCount | integer | 45 | Number of works referenced by this work |
| issn | string | 0028-0836 | International Standard Serial Number (for journals) |
| url | string | https://doi.org/10.1038/nature12373 | Persistent URL to the work via DOI |
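To make the schema concrete, here is a sketch of how a raw Crossref work item could be flattened into one of these records. The Crossref field names used (`DOI`, `title`, `container-title`, `issued`, `author`, `is-referenced-by-count`, `references-count`, `ISSN`) are real fields of the works response, but the mapping itself, including the choice of `issued` as the date source, is an assumption and not the Actor's confirmed implementation. `to_record` and `_first` are hypothetical names.

```python
from typing import Any, Dict, List


def _first(values: Any) -> str:
    """Return the first element of a Crossref list field, or ""."""
    return values[0] if values else ""


def to_record(item: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one Crossref work item into the output schema above."""
    # "issued" holds date-parts such as [[2024, 3, 15]], [[2024, 3]] or [[2024]].
    date_parts = ((item.get("issued") or {}).get("date-parts") or [[]])[0]
    parts: List[int] = [p for p in date_parts if p is not None]
    published = "-".join(
        [f"{parts[0]:04d}"] + [f"{p:02d}" for p in parts[1:]]
    ) if parts else ""

    authors = item.get("author") or []
    first = authors[0] if authors else {}
    # Family name first, matching the example records in this document.
    first_author = " ".join(n for n in (first.get("family"), first.get("given")) if n)

    doi = item.get("DOI", "")
    return {
        "doi": doi,
        "title": _first(item.get("title")),
        "type": item.get("type", ""),
        "publisher": item.get("publisher", ""),
        "journal": _first(item.get("container-title")),
        "publishedDate": published,
        "authorsCount": len(authors),
        "firstAuthor": first_author,
        "citationCount": item.get("is-referenced-by-count", 0),
        "referenceCount": item.get("references-count", 0),
        "issn": _first(item.get("ISSN")),
        "url": f"https://doi.org/{doi}" if doi else "",
    }
```

Partial dates fall out of the same logic: a work with only a year and month yields a `YYYY-MM` string, and missing author data yields an empty `firstAuthor`.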
Pricing
This Actor uses the free Crossref REST API, which requires no authentication. You pay only for Apify compute time.
- Compute cost: ~$0.0001–0.001 per run (depends on result volume and API latency)
- Typical cost per batch: $0.01–0.10 for 50–500 works
- Bulk runs (1000–5000 works): ~$0.10–0.50 per run
The Crossref API itself is free: no subscriptions and no per-request charges. Requests are rate-limited, and clients that identify themselves are served from the polite pool with higher limits (see Legal & Support).
Example Workflows
Workflow 1: Weekly Research Digest Pipeline
- Run the Actor every Monday with `filterFromDate` set to the date 7 days earlier
- Extract results to cloud storage (CSV/JSON export)
- Feed into email template to send digest to stakeholders
- Cost: ~$0.02/week
Workflow 2: Citation Network Analysis (Research Project)
- Run the Actor with `query` set to your domain (e.g., "quantum computing")
- Extract the top 500 results (`maxResults` = 500)
- Load into network analysis tool (Gephi, Cytoscape)
- Visualize author collaborations and citation influence
- Cost: ~$0.05 per analysis run
Workflow 3: Automated Literature Review
- Run Actor monthly with your research keywords
- Filter by `workType` = "journal-article"
- Combine with external citation tools (Semantic Scholar, OpenAlex)
- Build automated bibliography in BibTeX or RIS format
- Cost: ~$0.01/month per search term
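The bibliography step in Workflow 3 can be sketched as a conversion from one output record to a BibTeX entry. This is an illustration only: `to_bibtex` is a hypothetical helper, the citation key derived from the DOI is an arbitrary choice, and the record exposes only the first author, so the remaining authors are represented by "and others".

```python
from typing import Any, Dict


def to_bibtex(record: Dict[str, Any]) -> str:
    """Render one output record as a BibTeX @article or @misc entry."""
    entry_type = "article" if record.get("type") == "journal-article" else "misc"
    # Only the first author is available; "and others" stands in for the rest.
    author = record.get("firstAuthor", "")
    if author and record.get("authorsCount", 0) > 1:
        author += " and others"
    fields = {
        "author": author,
        "title": record.get("title", ""),
        "journal": record.get("journal", ""),
        "year": record.get("publishedDate", "")[:4],
        "doi": record.get("doi", ""),
    }
    # Empty fields are skipped so the entry stays valid for sparse records.
    body = ",\n".join(f"  {k} = {{{v}}}" for k, v in fields.items() if v)
    key = record.get("doi", "unknown").replace("/", "_")
    return f"@{entry_type}{{{key},\n{body}\n}}"


print(to_bibtex({
    "doi": "10.1000/example", "title": "An Example", "type": "journal-article",
    "journal": "Journal of Examples", "publishedDate": "2024-03",
    "firstAuthor": "Lovelace Ada", "authorsCount": 2,
}))
```

The author name is copied verbatim and special characters in titles are not escaped, so entries may need a manual pass (for example, reordering names to "Family, Given") before use.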
FAQ
"No works found" when searching
- Verify the query: Try a simpler term (e.g., "cancer" instead of "advanced oncology research methodologies")
- Check Crossref directly: https://search.crossref.org to validate query
- Try with blank query: Leave search blank to fetch recent works and verify the actor is working
- Expand date range: Remove `filterFromDate` to include older works
Empty or incomplete author names
- Some works have missing or incomplete author metadata in Crossref's database
- The `firstAuthor` field will be empty if author data is unavailable
- Crossref's data quality depends on publisher submission quality
- Check the URL (DOI link) for author details if needed
Missing ISSN or journal name
- Not all works have journal information (e.g., books, datasets, preprints)
- ISSN applies to serial publications such as journals and some proceedings series; other work types usually have an empty `issn`
- The `journal` field corresponds to `container-title` in Crossref (may be empty for non-journal works)
Result limits (maxResults > 5000)
- The Actor accepts a `maxResults` of at most 5,000 per run (see Input Parameters)
- For larger datasets, run the actor multiple times with different date ranges
- Example: Run once for 2024, once for 2023, etc.
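Because `filterFromDate` is open-ended, several runs with different start dates will overlap, so the exported datasets need de-duplicating when combined. A sketch keyed on the `doi` field; `merge_runs` is a hypothetical helper and the sample DOIs are placeholders.

```python
from typing import Any, Dict, Iterable, List


def merge_runs(*runs: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Concatenate several runs, keeping the first record seen for each DOI."""
    seen = set()
    merged: List[Dict[str, Any]] = []
    for run in runs:
        for record in run:
            key = record.get("doi", "").lower()  # DOIs are case-insensitive
            if key and key in seen:
                continue
            seen.add(key)
            merged.append(record)
    return merged


run_a = [{"doi": "10.1000/a"}, {"doi": "10.1000/b"}]
run_b = [{"doi": "10.1000/B"}, {"doi": "10.1000/c"}]
print([r["doi"] for r in merge_runs(run_a, run_b)])
# ['10.1000/a', '10.1000/b', '10.1000/c']
```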
API timeout or slow responses
- Crossref API is generally fast but can have occasional latency spikes
- Actor has a 60-second timeout per API request; retries are automatic
- If timeouts occur frequently, reduce `maxResults` and run multiple smaller batches
Advanced Usage
Combining Filters
You can combine query, filterFromDate, and workType in a single run:
Example: Find all conference proceedings on "quantum computing" published since 2024:
- Query: `quantum computing`
- Work Type: `proceedings`
- Filter From Date: `2024-01-01`
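For orientation, a combined run like this corresponds roughly to one query against Crossref's `/works` endpoint. The sketch below uses the Crossref parameters `query`, `filter` (with the `from-pub-date` and `type` filter names) and `rows`; how the Actor actually translates its inputs is an assumption, and `works_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def works_url(query: str = "", from_date: str = "",
              work_type: str = "", rows: int = 100) -> str:
    """Build a Crossref /works URL from the three combinable inputs."""
    params = {"rows": rows}
    if query:
        params["query"] = query
    # Crossref joins multiple filters with commas inside one `filter` value.
    filters = []
    if from_date:
        filters.append(f"from-pub-date:{from_date}")
    if work_type:
        filters.append(f"type:{work_type}")
    if filters:
        params["filter"] = ",".join(filters)
    return "https://api.crossref.org/works?" + urlencode(params)


print(works_url("quantum computing", "2024-01-01", "proceedings"))
```

Opening the printed URL in a browser is a quick way to check whether a filter combination returns anything before starting a run.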
Pagination & Large Extracts
The actor uses Crossref's cursor-based pagination internally. Each API request fetches up to 100 results; the actor automatically loops to fetch up to your maxResults limit.
- Requesting 5,000 results requires ~50 API calls
- Cost scales linearly: 5x results ≈ 5x cost (but still under $0.50)
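The loop described above can be sketched as follows. Crossref's deep paging starts with `cursor=*` and returns a `next-cursor` value in each response's `message` object; the HTTP call is left to a caller-supplied `fetch_page` function so the sketch stays independent of any client library. `fetch_all` and `fetch_page` are hypothetical names, not the Actor's code.

```python
import math
from typing import Any, Callable, Dict, List


def fetch_all(fetch_page: Callable[[str, int], Dict[str, Any]],
              max_results: int, rows: int = 100) -> List[Dict[str, Any]]:
    """Follow Crossref cursors until max_results items are collected.

    fetch_page(cursor, rows) must return the "message" object of a /works
    response, i.e. a dict with "items" and "next-cursor".
    """
    items: List[Dict[str, Any]] = []
    cursor = "*"  # Crossref's starting cursor value
    while len(items) < max_results:
        message = fetch_page(cursor, rows)
        page = message.get("items", [])
        if not page:  # an empty page means the result set is exhausted
            break
        items.extend(page)
        cursor = message.get("next-cursor", cursor)
    return items[:max_results]  # trim the overshoot from the last page


print(math.ceil(5000 / 100))  # 50 requests for a full 5,000-result run
```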
Filtering Tips
By date range: Use filterFromDate (no "to date" parameter; filter is forward-looking)
- To get works from 2024 only, run once with `filterFromDate=2024-01-01`, then again with `filterFromDate=2025-01-01` and exclude those results
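The subtraction step can be done on the exported datasets by comparing DOIs. A sketch, assuming both runs were exported as lists of records in the Output Schema; `exclude_by_doi` is a hypothetical helper and the sample DOIs are placeholders.

```python
from typing import Any, Dict, List


def exclude_by_doi(base: List[Dict[str, Any]],
                   later: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the records in `base` whose DOI does not appear in `later`."""
    later_dois = {r["doi"].lower() for r in later if r.get("doi")}
    return [r for r in base if r.get("doi", "").lower() not in later_dois]


since_2024 = [{"doi": "10.1000/a"}, {"doi": "10.1000/b"}, {"doi": "10.1000/c"}]
since_2025 = [{"doi": "10.1000/C"}]
print([r["doi"] for r in exclude_by_doi(since_2024, since_2025)])
# ['10.1000/a', '10.1000/b']
```

This only isolates a year cleanly when neither run was cut off by `maxResults`; if either run hit the limit, checking the year prefix of `publishedDate` on each record is a safer filter.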
By work type: Common types are listed above; others exist but are rare
By publisher: Not a direct input, but you can add publisher names to your query text (e.g., "machine learning IEEE" to bias toward IEEE publications)
Output Examples
Example 1: Journal Article
    {
      "doi": "10.1038/s41586-024-07301-x",
      "title": "AlphaFold 3: Structure Prediction for Biology",
      "type": "journal-article",
      "publisher": "Nature Publishing Group",
      "journal": "Nature",
      "publishedDate": "2024-05-08",
      "authorsCount": 47,
      "firstAuthor": "Abramson Josh",
      "citationCount": 450,
      "referenceCount": 86,
      "issn": "0028-0836",
      "url": "https://doi.org/10.1038/s41586-024-07301-x"
    }
Example 2: Conference Proceedings
    {
      "doi": "10.1109/CVPR52688.2022.00988",
      "title": "ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks",
      "type": "proceedings-article",
      "publisher": "IEEE",
      "journal": "2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)",
      "publishedDate": "2022-06-19",
      "authorsCount": 3,
      "firstAuthor": "Lu Jiasen",
      "citationCount": 2100,
      "referenceCount": 52,
      "issn": "2575-7075",
      "url": "https://doi.org/10.1109/CVPR52688.2022.00988"
    }
Example 3: Book
    {
      "doi": "10.1016/b978-0-08-102618-8.00001-3",
      "title": "Sustainable Materials and Manufacturing",
      "type": "book",
      "publisher": "Elsevier",
      "journal": "",
      "publishedDate": "2023-09-15",
      "authorsCount": 12,
      "firstAuthor": "Smith Richard",
      "citationCount": 85,
      "referenceCount": 203,
      "issn": "",
      "url": "https://doi.org/10.1016/b978-0-08-102618-8.00001-3"
    }
Related Actors
Looking for complementary research data sources?
- Open Research Online (Crossref-based) — Alternative Crossref interface
- DOAJ Open Journals Scraper — Extract open-access journals
- ROR Research Organizations Scraper — Academic institution metadata
- FRED Economic Data Scraper — Economic time series for research context
API Reference
For detailed Crossref API documentation:
- Crossref REST API Docs: https://github.com/CrossRef/rest-api-doc
- Search Guide: https://github.com/CrossRef/rest-api-doc#queries
- Filter Guide: https://github.com/CrossRef/rest-api-doc#filter-names
- Work Types: https://github.com/CrossRef/rest-api-doc#work-types
- Crossref Search Interface: https://search.crossref.org
Legal & Support
Disclaimer: This Actor fetches data from Crossref (https://www.crossref.org), a non-profit digital object identifier (DOI) registration agency. Crossref data is provided under the CC0 1.0 Universal (Public Domain Dedication) license and is free to use for any purpose. Crossref's terms: https://www.crossref.org/documentation/metadata-plus-service/metadata-plus-service-terms-and-conditions/
Support: If you encounter issues:
- Check the Crossref API documentation: https://github.com/CrossRef/rest-api-doc
- Test your query directly: https://search.crossref.org
- Verify work types: https://github.com/CrossRef/rest-api-doc#work-types
- Open an issue on Apify Community or contact support
User-Agent: This Actor identifies itself as apify-factory/1.0 (mailto:bciccarelli6@gmail.com) to access Crossref's polite pool (higher rate limits for well-behaved agents).
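For readers who query Crossref directly, the same identification pattern can be reproduced with the standard library. The sketch below only constructs the request and does not send it; `polite_request` is a hypothetical helper, and the tool name and contact address are placeholders to replace with your own.

```python
from urllib.request import Request


def polite_request(url: str, app: str, version: str, mailto: str) -> Request:
    """Build a request whose User-Agent names the tool and a contact address."""
    agent = f"{app}/{version} (mailto:{mailto})"
    return Request(url, headers={"User-Agent": agent})


req = polite_request("https://api.crossref.org/works?rows=1",
                     "my-tool", "0.1", "you@example.org")
# urllib stores header names capitalized, hence "User-agent" here.
print(req.get_header("User-agent"))  # my-tool/0.1 (mailto:you@example.org)
```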
Built with ❤️ for researchers, academics, and bibliometricians.