medRxiv Scraper

Pricing

Pay per event

Extract comprehensive preprint data from medRxiv, including titles, authors, abstracts, full text, DOIs, citations, and metadata. Automate access to health-science preprints with structured outputs, ideal for researchers and analysts who need reliable, large-scale article data without manual work.

Developer

ParseForge

Maintained by Community

📄 medRxiv Scraper

🚀 Supercharge your health sciences research with our comprehensive medRxiv scraper! Automate daily collection of detailed preprint article data with advanced search and filtering capabilities.

Designed for researchers, academics, data analysts, and healthcare professionals, this tool pulls comprehensive preprint information from medRxiv.org—the leading health sciences preprint server. Get critical information like article titles, authors, abstracts, full text, DOIs, citations, funding statements, and all metadata, all with no coding required.

Target Audience: Researchers, academics, data analysts, healthcare professionals, medical librarians
Primary Use Cases: Literature reviews, research analysis, academic data collection, systematic reviews, competitive intelligence

What Does medRxiv Scraper Do?

This tool collects detailed preprint article data from medRxiv.org, supporting both search-based scraping and custom URL scraping. It delivers:

  • Complete article metadata - Titles, authors, publication dates, DOIs, subject areas, keywords
  • Full text content - Complete article text extracted from detail pages
  • Author information - Author names, affiliations, corresponding author details
  • Abstracts and summaries - Full abstract text for each article
  • Citation information - Complete citation data for academic referencing
  • Funding and declarations - Funding statements, competing interests, author declarations
  • Data availability - Data/code URLs and supplementary materials links
  • Version history - Article version tracking and publication dates
  • Related articles - Links to related research papers
  • PDF downloads - Direct links to article PDFs
  • And much more - Comprehensive preprint intelligence in one scrape

Business Value: Make informed research decisions, track medical research trends, and identify relevant studies with comprehensive, up-to-date preprint intelligence that saves hours of manual research.

How to use the medRxiv Scraper - Full Demo

Watch this 3-minute demo to see how easy it is to get started!

Input

To start scraping medRxiv, fill in the input form. You can scrape medRxiv using one of two methods (choose one):

Method 1: Search-Based Scraping (Recommended)

  • searchQuery - Enter a search term to find articles (e.g., "COVID-19", "autism", "diabetes treatment")
    • Required if startUrl is not provided
  • orderBy - Select how to sort results (optional):
    • "relevance" - Best match (default)
    • "oldest" - Oldest articles first
    • "newest" - Newest articles first
  • maxItems - Set the maximum number of articles to collect (optional):
    • Range: 1 to 1,000,000 articles
    • Required for free users, who are limited to 50 items per run
    • Paid users can leave it empty to collect every matching article, up to the 1,000,000 cap

The scraper will automatically search medRxiv and collect all matching articles with complete details.

Method 2: Custom URL Scraping 🔗

  • startUrl - A medRxiv.org search URL in one of the supported formats listed below
    • Required if searchQuery is not provided
    • Cannot be used together with searchQuery
  • orderBy - Select how to sort results (optional):
    • "relevance" - Best match (default)
    • "oldest" - Oldest articles first
    • "newest" - Newest articles first
  • maxItems - Set the maximum number of articles to collect (optional):
    • Range: 1 to 1,000,000 articles
    • Required for free users, who are limited to 50 items per run
    • Paid users can leave it empty to collect every matching article, up to the 1,000,000 cap

✅ Supported URL Formats:

Search Results Pages:

  • https://www.medrxiv.org/search/asd
  • https://www.medrxiv.org/search/covid-19
  • https://www.medrxiv.org/search/diabetes%20treatment

Paginated Search Results:

  • https://www.medrxiv.org/search/asd/page/2
  • https://www.medrxiv.org/search/covid-19?page=1

❌ Unsupported URL Formats:

  • URLs from other domains (must be medrxiv.org)
  • URLs that are not medRxiv search results pages
  • Individual article detail page URLs (use search instead)
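
The URL rules above can be checked before a run is started. The following is a minimal sketch, assuming the formats listed in this section are the complete set; `buildSearchUrl` and `isSupportedStartUrl` are hypothetical helper names for illustration and are not part of the Actor.

```javascript
// Hypothetical helpers for illustration; the Actor performs its own validation.
const MEDRXIV_HOSTS = new Set(['medrxiv.org', 'www.medrxiv.org']);

// Build a search results URL in the /search/<term> format listed above.
function buildSearchUrl(term) {
    return `https://www.medrxiv.org/search/${encodeURIComponent(term.trim())}`;
}

// Return true when the value looks like a medRxiv search results page.
function isSupportedStartUrl(value) {
    let url;
    try {
        url = new URL(value);
    } catch {
        return false; // not a parseable absolute URL
    }
    if (!MEDRXIV_HOSTS.has(url.hostname)) return false; // other domains are rejected
    const segments = url.pathname.split('/').filter(Boolean);
    // Expect /search/<term>, optionally followed by /page/<n>.
    return segments[0] === 'search' && segments.length >= 2;
}

console.log(buildSearchUrl('diabetes treatment'));
console.log(isSupportedStartUrl('https://www.medrxiv.org/search/asd/page/2'));
```

Article detail URLs (paths starting with `/content/`) fail the check, which matches the "use search instead" rule above.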

⚠️ Important Input Rules:

  1. Choose One Method: You must use either search-based scraping OR custom URL scraping, not both
  2. Required Fields:
    • Either searchQuery OR startUrl must be provided
    • maxItems is required for free users
  3. Mutual Exclusivity:
    • If using startUrl, you cannot use searchQuery
    • If using searchQuery, you cannot use startUrl
  4. Optional Fields: orderBy is available with both methods

Here is what a filled-out input configuration looks like in JSON:

Example 1: Search-Based Scraping

{
    "searchQuery": "autism spectrum disorder",
    "orderBy": "newest",
    "maxItems": 100
}

Example 2: Custom URL Scraping

{
    "startUrl": "https://www.medrxiv.org/search/covid-19",
    "orderBy": "relevance",
    "maxItems": 50
}

Example 3: Latest Research on a Topic

{
    "searchQuery": "diabetes treatment",
    "orderBy": "newest",
    "maxItems": 200
}

Pro Tips:

For Search-Based Scraping (Recommended):

  1. 🎯 Use specific search terms - More specific queries yield better results
  2. 📊 Sort by newest - Get the latest research first with orderBy: "newest"
  3. 🔍 Try different keywords - Experiment with synonyms and related terms
  4. Faster than manual URLs - No need to find and copy individual medRxiv URLs

For Custom URL Scraping:

  1. Go to medRxiv.org
  2. Use the search functionality to find articles on your topic
  3. Copy the search URL from your browser
  4. Paste it into the startUrl field

Output

After the Actor finishes its run, you'll get a dataset with the output. The size of the dataset depends on the number of results you requested. You can download the results as an Excel, HTML, XML, JSON, or CSV file.

Here's an example of the scraped data for a single medRxiv article:

{
    "url": "https://www.medrxiv.org/content/10.1101/2025.10.17.25338146v1",
    "title": "Highly scalable technology-assisted differential diagnostics of ASD",
    "authors": [
        "Irene Sophia Plank",
        "Jana C. Koehler",
        "Jonathan Eckelmann",
        "Afton M. Bierlich",
        "Richard Musil",
        "Nikolaos Koutsouleris",
        "Christine M. Falter-Wagner"
    ],
    "authorDetails": [
        {
            "name": "Irene Sophia Plank"
        },
        {
            "name": "Jana C. Koehler"
        },
        {
            "name": "Jonathan Eckelmann"
        },
        {
            "name": "Afton M. Bierlich"
        },
        {
            "name": "Richard Musil"
        },
        {
            "name": "Nikolaos Koutsouleris"
        },
        {
            "name": "Christine M. Falter-Wagner"
        }
    ],
    "publicationDate": "Posted October 19, 2025.",
    "doi": "https://doi.org/10.1101/2025.10.17.25338146",
    "abstract": "Diagnosing autism spectrum disorder (ASD) in adulthood is time-consuming and markedly complicated by the requirement to distinguish between ASD and differential diagnoses also associated with social interaction difficulties, such as Borderline Personality Disorder (BPD) – a distinction for which currently no valid screening or diagnostic tool exists.",
    "fullText": "Complete article text extracted from the full text page...",
    "subjectAreas": [
        "Psychiatry and Clinical Psychology"
    ],
    "keywords": [],
    "correspondingAuthor": {
        "name": "Christine M. Falter-Wagner",
        "email": "christine.falter-wagner@example.com"
    },
    "pdfUrl": "https://www.medrxiv.org/content/10.1101/2025.10.17.25338146v1.full.pdf",
    "citationInformation": "Highly scalable technology-assisted differential diagnostics of ASD",
    "relatedArticles": [],
    "versionHistory": [
        {
            "version": "1",
            "date": "October 19, 2025"
        }
    ],
    "licenseInformation": "The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.",
    "fundingStatement": "This research was supported by...",
    "competingInterestStatement": "The authors declare no competing interests.",
    "authorDeclarations": "All authors contributed to the study design and data collection...",
    "dataAvailability": "Data and code are available at the following repository...",
    "dataCodeUrl": "https://github.com/example/repository",
    "supplementaryMaterials": [],
    "scrapedTimestamp": "2025-12-03T16:54:35.000Z"
}
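
Some fields in the sample record arrive as display text rather than normalized values: publicationDate reads "Posted October 19, 2025." and doi is a full URL. The sketch below shows one way to tidy them for analysis, assuming every record follows the shape of the sample above; `parsePostedDate`, `bareDoi`, and `normalizeArticle` are hypothetical helper names, not part of the Actor.

```javascript
const MONTHS = ['january', 'february', 'march', 'april', 'may', 'june',
    'july', 'august', 'september', 'october', 'november', 'december'];

// Convert "Posted October 19, 2025." to "2025-10-19"; return null if no date is found.
function parsePostedDate(text) {
    const match = /([A-Za-z]+)\s+(\d{1,2}),\s*(\d{4})/.exec(text || '');
    if (!match) return null;
    const monthIndex = MONTHS.indexOf(match[1].toLowerCase());
    if (monthIndex === -1) return null;
    const month = String(monthIndex + 1).padStart(2, '0');
    const day = match[2].padStart(2, '0');
    return `${match[3]}-${month}-${day}`;
}

// Strip the resolver prefix so only the DOI itself remains.
function bareDoi(doiUrl) {
    return (doiUrl || '').replace(/^https?:\/\/(dx\.)?doi\.org\//i, '');
}

// Reduce a scraped record to a few analysis-friendly fields.
function normalizeArticle(item) {
    const authors = item.authors || [];
    const versions = item.versionHistory || [];
    return {
        doi: bareDoi(item.doi),
        title: item.title,
        postedOn: parsePostedDate(item.publicationDate),
        firstAuthor: authors.length > 0 ? authors[0] : null,
        authorCount: authors.length,
        versionCount: versions.length,
    };
}

console.log(normalizeArticle({
    title: 'Highly scalable technology-assisted differential diagnostics of ASD',
    authors: ['Irene Sophia Plank', 'Jana C. Koehler'],
    publicationDate: 'Posted October 19, 2025.',
    doi: 'https://doi.org/10.1101/2025.10.17.25338146',
    versionHistory: [{ version: '1', date: 'October 19, 2025' }],
}));
```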

What You Get:

  • 📄 Complete Article Data - Full titles, abstracts, and article text
  • 👥 Author Information - All authors with affiliations and contact details
  • 📅 Publication Dates - When articles were posted to medRxiv
  • 🔗 DOIs and Citations - Complete citation information for academic use
  • 💰 Funding Information - Funding statements and grant information
  • 📊 Subject Areas - Research categories and keywords
  • 📎 PDF Downloads - Direct links to article PDFs
  • 🔄 Version History - Track article versions and updates
  • 📦 Data Availability - Links to data repositories and code
  • ⚖️ Declarations - Competing interests and author declarations

Download Options: CSV, Excel, or JSON formats for easy analysis in your research tools
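
The same downloads are also available over HTTP through the Apify API's dataset items endpoint, which accepts a format query parameter. The sketch below only builds the URL; `datasetExportUrl` is a hypothetical helper name, and the dataset ID shown is a placeholder. Private datasets additionally need an API token, which is left out here.

```javascript
// Formats named in this README, using the API's identifiers (xlsx = Excel).
const EXPORT_FORMATS = ['json', 'csv', 'xlsx', 'xml', 'html'];

// Build a download URL for a run's dataset in the requested format.
function datasetExportUrl(datasetId, format = 'json', fields = []) {
    if (!EXPORT_FORMATS.includes(format)) {
        throw new Error(`Unsupported format: ${format}`);
    }
    const url = new URL(`https://api.apify.com/v2/datasets/${encodeURIComponent(datasetId)}/items`);
    url.searchParams.set('format', format);
    // Optionally restrict the export to selected top-level fields.
    if (fields.length > 0) url.searchParams.set('fields', fields.join(','));
    return url.toString();
}

// 'YOUR_DATASET_ID' is a placeholder for the run's default dataset ID.
console.log(datasetExportUrl('YOUR_DATASET_ID', 'csv', ['title', 'doi', 'publicationDate']));
```

Limiting the export to a few fields keeps CSV files manageable, since fullText can be very long.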

Why Choose the medRxiv Scraper?

  • 🎯 Comprehensive Data: Get all available article information in one scrape - metadata, full text, declarations, and more
  • 🔍 Flexible Search: Simply enter a search term or use custom URLs to find articles
  • 📊 Multiple Sort Options: Sort by relevance, newest, or oldest articles
  • 📄 Full Text Extraction: Get complete article text, not just abstracts
  • 👥 Complete Author Info: Access author names, affiliations, and contact details
  • 💰 Funding Data: Extract funding statements and grant information
  • 🔗 Data Repositories: Find links to data and code repositories
  • 📎 PDF Downloads: Direct links to article PDFs for offline reading
  • ⚡ User-Friendly: No coding needed—just input search term or URL and go
  • 🔄 Parallel Processing: Fetches multiple articles simultaneously for faster results

Time Savings: Save 6-8 hours per week compared to manual article collection
Cost Efficiency: Fraction of the cost of hiring a research assistant or using expensive academic databases

How to Use

  1. Sign Up: Create a free account with $5 in credit (takes 2 minutes)
  2. Find the Scraper: Visit the medRxiv Scraper page
  3. Set Input:
    • Option A (Recommended): Enter a search query and select sort order
    • Option B: Add your custom medRxiv search URL
    • Set max items (optional)
  4. Run It: Click "Start" and let it collect your data
  5. Download Data: Get your results in the "Dataset" tab as CSV, Excel, or JSON

Total Time: 3 minutes setup, 10-30 minutes for data collection
No Technical Skills Required: Everything is point-and-click

Business Use Cases

🔬 Researchers:

  • Conduct systematic literature reviews
  • Track research trends in specific fields
  • Collect data for meta-analyses
  • Monitor new publications in your area of interest
  • Build comprehensive research databases

👩‍🏫 Academics:

  • Stay current with latest research
  • Find relevant papers for teaching materials
  • Track citations and related work
  • Analyze research trends over time
  • Support grant applications with literature reviews

📊 Data Analysts:

  • Build research databases for analysis
  • Track publication patterns and trends
  • Analyze funding and collaboration networks
  • Create research intelligence dashboards
  • Support evidence-based decision making

🏥 Healthcare Professionals:

  • Stay updated on latest medical research
  • Find evidence for clinical decisions
  • Track research in specific medical conditions
  • Access preprints before peer review
  • Build personal research libraries

📚 Medical Librarians:

  • Help researchers find relevant articles
  • Build institutional research collections
  • Track new publications in specific fields
  • Support systematic review processes
  • Create curated research databases

Using medRxiv Scraper with the Apify API

For advanced users who want to automate this process, you can control the scraper programmatically with the Apify API. This allows you to schedule regular data collection and integrate with your existing research tools.

Example API Usage:

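// Node.js example (install the client first: npm install apify-client)
const { ApifyClient } = require('apify-client');

const client = new ApifyClient({
    token: 'YOUR_API_TOKEN',
});

async function main() {
    // Run with a search query and wait for the run to finish
    const searchRun = await client.actor('YOUR_ACTOR_ID').call({
        searchQuery: 'COVID-19 treatment',
        orderBy: 'newest',
        maxItems: 100,
    });

    // Read the scraped articles from the run's default dataset
    const { items } = await client.dataset(searchRun.defaultDatasetId).listItems();
    console.log(`Collected ${items.length} articles`);

    // Run with a custom URL instead (never combine it with searchQuery)
    await client.actor('YOUR_ACTOR_ID').call({
        startUrl: 'https://www.medrxiv.org/search/diabetes',
        orderBy: 'relevance',
        maxItems: 50,
    });
}

main().catch(console.error);

  • Node.js: Install the apify-client NPM package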
  • Python: Use the apify-client PyPI package
  • See the Apify API reference for full details

Frequently Asked Questions

Q: How accurate is the data? A: We collect data directly from medRxiv.org's official website in real-time, ensuring the most up-to-date and accurate preprint information available.

Q: Can I get full text of articles? A: Yes! The scraper extracts complete article text from detail pages, not just abstracts. Full text is available when the article has a full text page.

Q: What search terms work best? A: Use specific medical or research terms. For example, "autism spectrum disorder", "COVID-19 treatment", or "diabetes prevention" work well. More specific terms yield better results.

Q: Can I sort results by date? A: Yes! Use the orderBy parameter to sort by "newest" (latest first), "oldest" (oldest first), or "relevance" (best match).

Q: How many articles can I scrape? A: Free users can scrape up to 50 articles per run. Paid users can set maxItems as high as 1,000,000, or leave it empty to collect every matching article up to that cap.

Q: Can I schedule regular runs? A: Yes! Use the Apify API to schedule daily, weekly, or monthly runs automatically. Perfect for ongoing research monitoring.
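
As a rough illustration of the scheduling answer above, a schedule can be created with the apify-client package. This is a sketch under assumptions: the schedule name, cron expression, and timezone are hypothetical, and the field names follow the Apify schedule object as commonly documented, so check them against the Apify API reference before relying on them.

```javascript
// Build the schedule definition as plain data so it can be inspected before sending.
function buildDailySchedule(actorId, input) {
    return {
        name: 'medrxiv-daily',        // hypothetical schedule name
        cronExpression: '0 6 * * *',  // every day at 06:00
        timezone: 'UTC',
        isEnabled: true,
        isExclusive: true,            // assumed: do not start a run while the previous one is still going
        actions: [{
            type: 'RUN_ACTOR',
            actorId,
            runInput: {
                body: JSON.stringify(input),
                contentType: 'application/json; charset=utf-8',
            },
        }],
    };
}

// Send the definition to Apify. Requires: npm install apify-client
async function createDailySchedule(token, actorId, input) {
    const { ApifyClient } = require('apify-client');
    const client = new ApifyClient({ token });
    return client.schedules().create(buildDailySchedule(actorId, input));
}

// Inspect the payload without calling the API.
console.log(buildDailySchedule('YOUR_ACTOR_ID', {
    searchQuery: 'COVID-19 treatment',
    orderBy: 'newest',
    maxItems: 100,
}));
```

Sorting by "newest" with a modest maxItems keeps each scheduled run focused on recent preprints.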

Q: What if I need help? A: Our support team is available 24/7. Contact us through the Apify platform.

Q: Is my data secure? A: Absolutely. All data is encrypted in transit and at rest. We never share your data with third parties.

Integrate medRxiv Scraper with any app and automate your workflow

Last but not least, medRxiv Scraper can be connected with almost any cloud service or web app thanks to integrations on the Apify platform.

Alternatively, you can use webhooks to carry out an action whenever an event occurs, e.g. get a notification whenever medRxiv Scraper successfully finishes a run.

Looking for more data collection tools? Check out these related actors:

| Actor | Description | Link |
| --- | --- | --- |
| PubMed Scraper | Extracts research articles and citations from PubMed database | https://apify.com/parseforge/pubmed-scraper |
| PubMed Citation Scraper | Collects citation data and references from PubMed articles | https://apify.com/parseforge/pubmed-citation-scraper |
| arXiv Scraper | Extracts research papers from arXiv preprint server | https://apify.com/parseforge/arxiv-scraper |
| Semantic Scholar Scraper | Collects academic paper data from Semantic Scholar | https://apify.com/parseforge/semantic-scholar-scraper |
| Open Library Scraper | Extracts book and publication data from Open Library | https://apify.com/parseforge/open-library-scraper |

Pro Tip: 💡 Browse our complete collection of data collection actors to find the perfect tool for your research needs.

Need Help? Our support team is here to help you get the most out of this tool.


⚠️ Disclaimer: This Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by medRxiv or any of its subsidiaries. All trademarks mentioned are the property of their respective owners.