bioRxiv & medRxiv Preprint Scraper
Pricing
from $3.00 / 1,000 results
Scrape preprints from bioRxiv and medRxiv, the leading open-access preprint servers for biology and medicine. Search by date range, fetch by DOI, or retrieve published journal version information.
Rating: 5.0 (11)
Developer: Crawler Gang
Maintained by Community
Actor stats
- Bookmarked: 11
- Total users: 2
- Monthly active users: 1
- Last modified: 8 days ago
Scrape preprints from bioRxiv and medRxiv — the leading open-access preprint servers for biology and medicine — powered by the official bioRxiv/medRxiv API.
No account, no API key, and no proxy required. Works on the Apify free plan.
What It Does
- Search by date range: retrieve all preprints posted within a date window of any length; ranges longer than 90 days are automatically split into 90-day API chunks and paginated
- Fetch by DOI: look up one or more specific preprints using their DOI
- Published version info: check whether a preprint has been published in a journal and retrieve the journal DOI and name
- Filter by category: narrow results to a specific scientific field (neuroscience, genomics, immunology, etc.)
- Both servers: query bioRxiv, medRxiv, or both simultaneously
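The 90-day chunking described above can be sketched as follows. This is a minimal illustration, not the scraper's actual code: the function name `split_into_chunks` is hypothetical, and treating each chunk as 90 inclusive calendar days is an assumption.

```python
from datetime import date, timedelta


def split_into_chunks(start: date, end: date, max_days: int = 90):
    """Yield (chunk_start, chunk_end) pairs that cover start..end inclusive.

    Assumption: a chunk spans at most `max_days` calendar days, counting
    both endpoints. The real scraper may draw its boundaries differently.
    """
    if end < start:
        raise ValueError("end must not be before start")
    cursor = start
    while cursor <= end:
        chunk_end = min(cursor + timedelta(days=max_days - 1), end)
        yield cursor, chunk_end
        # The next chunk starts the day after this one ends, so no day
        # is fetched twice and none is skipped.
        cursor = chunk_end + timedelta(days=1)


chunks = list(split_into_chunks(date(2024, 1, 1), date(2024, 6, 30)))
for chunk_start, chunk_end in chunks:
    print(chunk_start.isoformat(), chunk_end.isoformat())
```

Because the chunks are contiguous and non-overlapping, results from each window can be concatenated without deduplicating by date.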
Use Cases
- Track new preprints in your research field
- Build a literature monitoring or alerting pipeline
- Analyze publishing trends across biomedical disciplines
- Identify preprints that have been formally published in journals
- Aggregate author/institution data for research network analysis
Input Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| mode | select | search | search (date range), byDoi (DOI lookup), or published (journal version info) |
| server | select | biorxiv | biorxiv, medrxiv, or both |
| dateFrom | date | 2024-01-01 | Start date (YYYY-MM-DD). Required for mode=search |
| dateTo | date | 2024-01-07 | End date (YYYY-MM-DD). Required for mode=search |
| dois | array | — | One or more DOIs to look up (required for mode=byDoi and mode=published) |
| category | select | All | Filter to a specific scientific category (mode=search only) |
| maxItems | integer | 50 | Maximum number of records to return (1–10000) |
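The rules in the table (which parameters each mode requires, and the allowed range for maxItems) can be checked before starting a run. The sketch below builds an input object from the documented parameter names. The helper `build_run_input` is hypothetical and is not part of the scraper.

```python
from datetime import date

VALID_MODES = {"search", "byDoi", "published"}
VALID_SERVERS = {"biorxiv", "medrxiv", "both"}


def build_run_input(mode="search", server="biorxiv", date_from=None,
                    date_to=None, dois=None, category=None, max_items=50):
    """Return an input dict using the parameter names from the table above."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if server not in VALID_SERVERS:
        raise ValueError(f"unknown server: {server!r}")
    if not 1 <= max_items <= 10000:
        raise ValueError("maxItems must be between 1 and 10000")

    run_input = {"mode": mode, "server": server, "maxItems": max_items}
    if mode == "search":
        if not date_from or not date_to:
            raise ValueError("dateFrom and dateTo are required for mode=search")
        # fromisoformat raises ValueError on anything that is not YYYY-MM-DD.
        if date.fromisoformat(date_from) > date.fromisoformat(date_to):
            raise ValueError("dateFrom must not be after dateTo")
        run_input["dateFrom"] = date_from
        run_input["dateTo"] = date_to
        if category:
            # category applies to mode=search only
            run_input["category"] = category
    else:
        if not dois:
            raise ValueError(f"dois is required for mode={mode}")
        run_input["dois"] = list(dois)
    return run_input


example = build_run_input(mode="search", date_from="2024-01-01",
                          date_to="2024-01-07", category="Neuroscience")
print(example)
```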
Supported Categories
Neuroscience, Bioinformatics, Genomics, Microbiology, Cell Biology, Biochemistry, Evolutionary Biology, Pharmacology and Toxicology, Immunology, Molecular Biology, Genetics, Cancer Biology, Scientific Communication, Pathology, Systems Biology, Ecology, Physiology, Epidemiology, Developmental Biology, Clinical Trials, Bioengineering, Plant Biology, Zoology, Biophysics, Synthetic Biology.
Output Fields
search and byDoi Modes
| Field | Type | Description |
|---|---|---|
| doi | string | Preprint DOI |
| title | string | Preprint title |
| authors | string | All authors as a single string |
| authorList | array | Authors as an array of strings |
| correspondingAuthor | string | Name of the corresponding author |
| institution | string | Corresponding author's institution |
| submittedDate | string | Date submitted (YYYY-MM-DD) |
| version | integer | Version number of the preprint |
| type | string | Preprint type (e.g. "new results") |
| license | string | License code (e.g. "cc_by", "cc0") |
| category | string | Scientific category |
| server | string | Source server (biorxiv or medrxiv) |
| abstractText | string | Full abstract text |
| jatsXmlUrl | string | URL to the JATS/XML version |
| previewUrl | string | URL to view the preprint on biorxiv/medrxiv |
| isPublished | boolean | Whether the preprint has a journal publication |
| publishedDoi | string | Journal publication DOI (if published) |
| scrapedAt | string | Timestamp when the record was scraped (ISO-8601) |
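The doi and version fields make it possible to collapse several revisions of one preprint into a single row. The source does not say whether a run can return more than one version of the same DOI, so the sketch below treats that as an assumption. The helper `latest_versions` is hypothetical and the sample records are made up for illustration.

```python
def latest_versions(records):
    """Keep only the highest-numbered version of each DOI."""
    latest = {}
    for record in records:
        doi = record["doi"]
        # Treat a missing version as 1 so that incomplete records still compare.
        if doi not in latest or record.get("version", 1) > latest[doi].get("version", 1):
            latest[doi] = record
    return list(latest.values())


# Made-up records for illustration only.
sample = [
    {"doi": "10.1101/2024.01.15.575123", "version": 1, "title": "Draft"},
    {"doi": "10.1101/2024.01.15.575123", "version": 2, "title": "Revised"},
    {"doi": "10.1101/2024.01.01.612345", "version": 1, "title": "Other"},
]
deduplicated = latest_versions(sample)
print([(r["doi"], r["version"]) for r in deduplicated])
```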
published Mode
| Field | Type | Description |
|---|---|---|
| doi | string | bioRxiv/medRxiv preprint DOI |
| title | string | Preprint title |
| authors | string | Authors string |
| category | string | Scientific category |
| server | string | Preprint server |
| isPublished | boolean | Whether a journal publication exists |
| publishedDoi | string | Journal publication DOI |
| publishedJournal | string | Journal name |
| publishedDate | string | Journal publication date |
| preprintDate | string | Date originally submitted as preprint |
| preprintDoi | string | Original preprint DOI |
| scrapedAt | string | Timestamp when the record was scraped |
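One use of the published-mode fields is measuring how long a preprint took to reach a journal. The sketch below assumes that preprintDate and publishedDate are both YYYY-MM-DD strings. The table does not state their format, so check it against real output first. The helper name and the sample record are hypothetical.

```python
from datetime import date


def publication_lag_days(record):
    """Return the days between preprint and journal publication, or None.

    Assumption: both dates are ISO strings (YYYY-MM-DD).
    """
    if not record.get("isPublished"):
        return None
    preprint = record.get("preprintDate")
    published = record.get("publishedDate")
    if not preprint or not published:
        return None
    return (date.fromisoformat(published) - date.fromisoformat(preprint)).days


# Made-up record for illustration only.
sample_record = {
    "doi": "10.1101/2024.01.15.575123",
    "isPublished": True,
    "preprintDate": "2024-01-15",
    "publishedDate": "2024-06-15",
}
lag = publication_lag_days(sample_record)
print(lag)
```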
Sample Output
Preprint Record
{
  "doi": "10.1101/2024.01.15.575123",
  "title": "A Study of Neural Circuits in the Hippocampus",
  "authors": "Smith J, Jones A, Brown C",
  "authorList": ["Smith J", "Jones A", "Brown C"],
  "correspondingAuthor": "Smith J",
  "institution": "Harvard University",
  "submittedDate": "2024-01-15",
  "version": 1,
  "type": "new results",
  "license": "cc_by",
  "category": "neuroscience",
  "server": "biorxiv",
  "abstractText": "This paper studies hippocampal circuits...",
  "jatsXmlUrl": "https://www.biorxiv.org/content/10.1101/2024.01.15.575123v1.source.xml",
  "previewUrl": "https://www.biorxiv.org/content/10.1101/2024.01.15.575123",
  "isPublished": false,
  "scrapedAt": "2026-05-23T10:00:00+00:00"
}
FAQ
Does this require an API key or account?
No. The bioRxiv/medRxiv API is completely public and free. No registration required.
What is the maximum date range I can query?
There is no fixed limit on the date range. The bioRxiv API returns up to 100 preprints per call with a 90-day window, so this scraper automatically splits larger date ranges into 90-day chunks and paginates through all of them.
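Pagination within one date window can be sketched as a cursor loop. The scraper's internals are not documented here, so everything in this example is an assumption: the URL pattern and the `collection` response key are a best recollection of the public bioRxiv API and should be checked against its documentation, and the names `fetch_page` and `fetch_all` are hypothetical.

```python
import json
from urllib.request import urlopen

PAGE_SIZE = 100  # per the answer above: up to 100 preprints per call


def fetch_page(server, date_from, date_to, cursor):
    """Fetch one page of records starting at offset `cursor`.

    Assumed endpoint pattern and response shape; verify before relying on it.
    """
    url = f"https://api.biorxiv.org/details/{server}/{date_from}/{date_to}/{cursor}"
    with urlopen(url, timeout=30) as response:
        return json.load(response).get("collection", [])


def fetch_all(server, date_from, date_to, max_items=50, fetch=fetch_page):
    """Collect up to `max_items` records by advancing the cursor page by page."""
    records = []
    cursor = 0
    while len(records) < max_items:
        page = fetch(server, date_from, date_to, cursor)
        if not page:
            break  # nothing left in this window
        records.extend(page)
        if len(page) < PAGE_SIZE:
            break  # a short page means this was the last one
        cursor += len(page)
    return records[:max_items]
```

The `fetch` parameter lets the network call be swapped for a stub, so the loop can be exercised without contacting the API.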
How do I fetch a specific preprint?
Use mode=byDoi and enter the DOI (e.g. 10.1101/2024.01.01.612345) in the dois field.
Can I check if preprints have been published?
Yes — use mode=published with a list of DOIs to retrieve journal publication information including the journal name and published DOI.
What categories are available?
bioRxiv covers biological sciences; medRxiv covers health sciences and clinical research. See the category dropdown in the input form for the full list.
Can I query both bioRxiv and medRxiv at once?
Yes — set server=both and the scraper will query both servers and combine results.
Why are some preprints missing fields like institution or abstractText?
These fields are only included when the data is available in the API response. Records with missing data will simply omit those fields rather than including null values.
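Because absent fields are omitted rather than set to null, code that reads the output should not index optional keys directly. A minimal sketch using `dict.get` with defaults follows. The helper `summarize` and its default values are illustrative choices, not part of the scraper.

```python
def summarize(record):
    """Build a flat summary that tolerates omitted optional fields."""
    return {
        "doi": record["doi"],  # assumed to be present on every record
        "title": record.get("title", ""),
        "institution": record.get("institution", "unknown"),
        "has_abstract": bool(record.get("abstractText")),
        # publishedDoi only appears when a journal version exists
        "published_doi": record.get("publishedDoi"),
    }


# Made-up sparse record for illustration only.
sparse = {"doi": "10.1101/2024.01.01.612345", "title": "Example preprint"}
summary = summarize(sparse)
print(summary)
```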
How many records can I retrieve per run?
Up to 10,000 records per run. For larger datasets, split the date range into narrower, consecutive windows and run the scraper once per window.