Biorxiv Preprint Scraper
Pricing
from $10.00 / 1,000 results
Scrapes preprint paper metadata from the bioRxiv API by date range and optional category.
Developer: Donny
bioRxiv Preprint Paper Scraper
What it does
This actor scrapes preprint paper metadata from the bioRxiv API. Unlike keyword-based search, the bioRxiv API operates on date ranges, allowing you to retrieve all preprints published within a specified period. You can optionally filter results by scientific category (e.g., neuroscience, bioinformatics, genomics). Each record includes DOI, title, authors, corresponding author, publication date, category, abstract, and publication status.
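The date-range model maps naturally onto paged URLs. The sketch below rests on several assumptions: the public endpoint `https://api.biorxiv.org/details/biorxiv/{start}/{end}/{cursor}`, a `category` query parameter, and pages of up to 100 records selected by a numeric cursor. The helpers `build_details_url` and `page_cursors` are hypothetical and do not come from the actor's source.

```python
from typing import List, Optional
from urllib.parse import urlencode

# Assumed public endpoint and page size of the bioRxiv "details" API.
BASE_URL = "https://api.biorxiv.org/details/biorxiv"
PAGE_SIZE = 100


def build_details_url(start_date: str, end_date: str, cursor: int = 0,
                      category: Optional[str] = None) -> str:
    """Return the URL for one page of preprints posted between start_date and end_date."""
    url = f"{BASE_URL}/{start_date}/{end_date}/{cursor}"
    if category:
        # Assumption: multi-word categories use underscores, e.g. "cell_biology".
        slug = category.strip().lower().replace(" ", "_")
        url += "?" + urlencode({"category": slug})
    return url


def page_cursors(total: int, max_results: int, page_size: int = PAGE_SIZE) -> List[int]:
    """Cursor offsets needed to collect min(total, max_results) records."""
    return list(range(0, min(total, max_results), page_size))


if __name__ == "__main__":
    for cursor in page_cursors(total=250, max_results=1000):
        print(build_details_url("2024-01-01", "2024-12-31", cursor, "neuroscience"))
```

In a real client, `total` would be read from the first page's response metadata. That detail is also an assumption here.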
Why use this actor
This actor provides a simple way to extract bioRxiv preprint metadata without writing code or managing infrastructure. It handles pagination, rate limiting, error recovery, and data normalization automatically, so researchers, analysts, and developers can skip manual data collection. The structured JSON output can be loaded into spreadsheets, databases, dashboards, and downstream data pipelines. Running it on the Apify platform adds automatic scheduling, monitoring, and proxy management.
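Error recovery and rate-limit handling are commonly implemented as retries with exponential backoff. The following is a generic sketch, not the actor's actual implementation. `with_retries` is a hypothetical helper, and a production version would retry only transient failures (timeouts, HTTP 429 or 5xx responses) rather than every exception.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(fetch: Callable[[], T], attempts: int = 4, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(), retrying on exceptions with exponential backoff (1s, 2s, 4s, ...)."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: re-raise the last error
            sleep(base_delay * 2 ** attempt)  # wait longer after each failure
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the backoff schedule testable without actually waiting.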
Input parameters
- startDate (string, optional): Start date in YYYY-MM-DD format. Default: "2024-01-01".
- endDate (string, optional): End date in YYYY-MM-DD format. Default: "2024-12-31".
- category (string, optional): Filter by scientific category (e.g., "neuroscience", "bioinformatics"). Leave empty for all categories.
- maxResults (integer, optional): Maximum number of preprints to return. Range: 1-1000. Default: 100.
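Here is a minimal sketch of how these parameters could be defaulted and validated. It assumes the documented defaults and the 1-1000 range for `maxResults`. The helper `normalize_input` is hypothetical. It rejects out-of-range values, though the actor itself may clamp them instead, and the check that `startDate` is not after `endDate` is an added assumption.

```python
import re
from datetime import date

# Documented defaults for the actor's input parameters.
DEFAULTS = {"startDate": "2024-01-01", "endDate": "2024-12-31", "category": "", "maxResults": 100}
_ISO_DAY = re.compile(r"\d{4}-\d{2}-\d{2}")


def _parse_day(value: str) -> date:
    """Parse a strict YYYY-MM-DD string; raise ValueError otherwise."""
    if not _ISO_DAY.fullmatch(value):
        raise ValueError(f"expected YYYY-MM-DD, got {value!r}")
    return date.fromisoformat(value)  # also rejects impossible dates such as 2024-02-30


def normalize_input(raw: dict) -> dict:
    """Fill in defaults for missing keys and validate the parameters."""
    cfg = {**DEFAULTS, **{k: v for k, v in raw.items() if v is not None}}
    start, end = _parse_day(cfg["startDate"]), _parse_day(cfg["endDate"])
    if start > end:
        # Assumption: an inverted range is treated as an input error.
        raise ValueError("startDate must not be after endDate")
    max_results = int(cfg["maxResults"])
    if not 1 <= max_results <= 1000:
        raise ValueError("maxResults must be between 1 and 1000")
    cfg["maxResults"] = max_results
    cfg["category"] = str(cfg["category"]).strip()  # empty string means all categories
    return cfg
```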
Output data
The actor outputs a dataset with the following fields:
- doi: Digital Object Identifier for the preprint
- title: Title of the preprint paper
- authors: Full author list
- correspondingAuthor: Name of the corresponding author
- date: Publication date on bioRxiv
- category: Scientific category (e.g., neuroscience)
- abstract: Full abstract text
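- published: Journal publication status: "NA" if the preprint has not been published in a journal, otherwise a reference to the published version (e.g., its DOI)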
Each record is validated and null-checked before being pushed to the dataset. Missing or unavailable fields are set to null rather than being omitted, ensuring consistent schema across all records.
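The null-filling behaviour can be sketched as a projection from a raw API record onto the fixed output schema. The raw field names (in particular `author_corresponding`) are assumptions about the bioRxiv API response, and `normalize_record` is a hypothetical helper.

```python
from typing import Any, Dict

# Assumed mapping from bioRxiv API field names to this actor's output fields.
FIELD_MAP = {
    "doi": "doi",
    "title": "title",
    "authors": "authors",
    "author_corresponding": "correspondingAuthor",
    "date": "date",
    "category": "category",
    "abstract": "abstract",
    "published": "published",
}


def normalize_record(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Project a raw record onto the output schema; missing or blank fields become None."""
    out: Dict[str, Any] = {}
    for api_key, out_key in FIELD_MAP.items():
        value = raw.get(api_key)
        if isinstance(value, str):
            value = value.strip() or None  # empty strings count as missing
        out[out_key] = value  # every output key is always present
    return out
```

Fields the schema does not list are dropped, so every record carries exactly the same keys.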
Example output
[{"doi": "10.1101/2024.01.15.575123","title": "Novel Neural Circuit Mechanisms in Mouse Hippocampus","authors": "Smith, J.; Johnson, K.; Williams, L.","correspondingAuthor": "Smith, J.","date": "2024-01-15","category": "neuroscience","abstract": "We describe a novel neural circuit mechanism...","published": "NA"}]
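Because `authors` is a single semicolon-separated string, downstream code usually needs to split it. This is a small sketch for post-processing records like the one above. `split_authors` and `is_journal_published` are hypothetical helpers, and treating "NA" as "not yet published in a journal" is inferred from the example.

```python
import json
from typing import Any, Dict, List, Optional


def split_authors(authors: Optional[str]) -> List[str]:
    """Split the semicolon-separated 'authors' string into individual names."""
    if not authors:
        return []
    return [name.strip() for name in authors.split(";") if name.strip()]


def is_journal_published(record: Dict[str, Any]) -> bool:
    """Assumption: "NA" (or null) means the preprint has no journal version yet."""
    return record.get("published") not in (None, "NA")


if __name__ == "__main__":
    # Trimmed copy of the example output above.
    sample = '[{"doi": "10.1101/2024.01.15.575123", "authors": "Smith, J.; Johnson, K.; Williams, L.", "published": "NA"}]'
    for rec in json.loads(sample):
        print(rec["doi"], split_authors(rec["authors"]), is_journal_published(rec))
```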
Pricing
This actor is priced based on usage:
- $0.01 per result returned in the dataset
- $0.005 per actor start (flat fee per run)
These costs cover Apify platform compute and proxy resources. For large-scale scraping jobs, use the maxResults parameter to control costs and stay within your budget. A run that returns 100 results costs about $1.01 (100 × $0.01 + $0.005 start fee).
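The per-run cost follows directly from the two listed rates: $0.005 per start plus $0.01 per result. The sketch below uses `Decimal` to avoid floating-point rounding and ignores any other platform charges. `estimate_cost` and `max_results_for_budget` are hypothetical helpers.

```python
from decimal import Decimal

# Rates listed in the Pricing section.
PRICE_PER_RESULT = Decimal("0.01")
PRICE_PER_START = Decimal("0.005")
MAX_RESULTS_PER_RUN = 1000  # upper bound of the maxResults parameter


def estimate_cost(results: int, runs: int = 1) -> Decimal:
    """Total charge for `runs` starts returning `results` results in total."""
    return PRICE_PER_START * runs + PRICE_PER_RESULT * results


def max_results_for_budget(budget: Decimal) -> int:
    """Largest maxResults value whose single-run cost stays within budget (0 if none)."""
    affordable = int((budget - PRICE_PER_START) / PRICE_PER_RESULT)  # truncates toward zero
    return max(0, min(affordable, MAX_RESULTS_PER_RUN))


if __name__ == "__main__":
    print(estimate_cost(100))  # 100 x $0.01 + $0.005
    print(max_results_for_budget(Decimal("5")))
```

For example, a $5.00 budget covers one run with maxResults set to 499, since 500 results would cost $5.005.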
More scrapers from brave_paradise
Check out other actors published by brave_paradise on the Apify Store for more data extraction tools covering scientific databases, developer communities, news aggregators, and government open-data APIs. They share the same approach to error handling, automatic pagination, and structured output.