Pricing

Pay per usage

Academic Paper Scraper

Search arXiv and PubMed in one request. Returns unified paper data: titles, authors, abstracts, DOIs, and PDF links. Filter by keywords, authors, categories, and date range. Built-in rate limiting and cross-source deduplication. Export to JSON, CSV, or Excel.

Academic Research Paper Scraper

Apify actor that scrapes academic papers from arXiv and PubMed with a unified output format.

Features

Dual Source Support: Search both arXiv and PubMed simultaneously
Unified Output: Consistent paper format regardless of source
Smart Deduplication: Remove duplicates by DOI across sources
Flexible Filtering: Filter by title, author, categories, and date range
Rate Limit Compliance: Built-in throttling that follows each API's usage guidelines
PubMed API Key Support: Optional API key for faster PubMed access

Input Parameters

Parameter	Type	Required	Default	Description
`searchQuery`	string	Yes	-	Keywords, phrases, or terms to search
`sources`	array	No	`["arxiv", "pubmed"]`	Which databases to search
`titleFilter`	string	No	-	Filter papers by title keywords
`authorFilter`	string	No	-	Filter by author name
`categories`	array	No	-	arXiv categories (e.g., `cs.AI`, `physics.quant-ph`)
`dateFrom`	string	No	-	Start date (YYYY-MM-DD)
`dateTo`	string	No	-	End date (YYYY-MM-DD)
`maxResults`	integer	No	100	Max papers per source (1-10000)
`sortBy`	string	No	`relevance`	Sort order: `relevance`, `date_desc`, `date_asc`
`pubmedApiKey`	string	No	-	NCBI API key for faster rate limits
`unpaywallEmail`	string	No	-	Your email for Unpaywall API (free, no signup)
`includeAbstract`	boolean	No	true	Include full abstract text
`deduplicateByDoi`	boolean	No	true	Remove cross-source duplicates
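The defaults and ranges in the table can be checked on the client side before starting a run. The following is a minimal sketch, not part of the actor itself: `build_input` is a hypothetical helper, and the checks are only those the table implies (allowed sources, the 1-10000 range, the sort values, and YYYY-MM-DD dates).

```python
from datetime import datetime

ALLOWED_SOURCES = {"arxiv", "pubmed"}
ALLOWED_SORTS = {"relevance", "date_desc", "date_asc"}


def build_input(search_query, **options):
    """Hypothetical helper: return an input dict with the documented defaults applied."""
    if not isinstance(search_query, str) or not search_query.strip():
        raise ValueError("searchQuery is required")

    # Defaults copied from the parameter table.
    run_input = {
        "searchQuery": search_query.strip(),
        "sources": ["arxiv", "pubmed"],
        "maxResults": 100,
        "sortBy": "relevance",
        "includeAbstract": True,
        "deduplicateByDoi": True,
    }
    run_input.update(options)

    sources = run_input["sources"]
    if not sources or set(sources) - ALLOWED_SOURCES:
        raise ValueError("sources must be a non-empty subset of arxiv, pubmed")
    if not 1 <= run_input["maxResults"] <= 10000:
        raise ValueError("maxResults must be between 1 and 10000")
    if run_input["sortBy"] not in ALLOWED_SORTS:
        raise ValueError("sortBy must be relevance, date_desc, or date_asc")

    # strptime raises ValueError when a value does not parse as year-month-day.
    parsed = {
        key: datetime.strptime(run_input[key], "%Y-%m-%d").date()
        for key in ("dateFrom", "dateTo")
        if key in run_input
    }
    if len(parsed) == 2 and parsed["dateFrom"] > parsed["dateTo"]:
        raise ValueError("dateFrom must not be after dateTo")
    return run_input
```

For example, `build_input("CRISPR gene editing", sources=["pubmed"], maxResults=500)` returns a dict that keeps the default `sortBy` of `relevance`.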

Output Format

Each paper in the dataset includes:

{
  "id": "arxiv:2401.12345",
  "source": "arxiv",
  "doi": "10.1234/example",
  "arxivId": "2401.12345",
  "title": "Paper Title",
  "abstract": "Full abstract text...",
  "authors": [
    { "name": "John Doe", "affiliation": "University" }
  ],
  "publishedDate": "2024-01-15",
  "updatedDate": "2024-01-20",
  "categories": ["cs.AI", "cs.LG"],
  "journal": "Nature",
  "abstractUrl": "https://arxiv.org/abs/2401.12345",
  "pdfUrl": "https://arxiv.org/pdf/2401.12345.pdf"
}
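Records in this format can also be deduplicated after export, for instance when merging several runs. The sketch below shows one way to do it by DOI. It is an illustration, not the actor's internal logic: `normalize_doi` and `deduplicate_by_doi` are hypothetical names, and keeping the first record seen is an assumption.

```python
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(doi):
    """Lower-case a DOI and strip a leading resolver URL or 'doi:' prefix."""
    if not doi:
        return None
    value = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if value.startswith(prefix):
            value = value[len(prefix):]
            break
    return value or None


def deduplicate_by_doi(papers):
    """Keep the first record per DOI; records without a DOI are always kept."""
    seen = set()
    unique = []
    for paper in papers:
        key = normalize_doi(paper.get("doi"))
        if key is not None:
            if key in seen:
                continue  # same DOI already emitted by an earlier record
            seen.add(key)
        unique.append(paper)
    return unique
```

DOIs are case-insensitive, so the comparison is done on the lower-cased form. Records with no DOI cannot be matched this way and pass through unchanged.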

Example Usage

Search for AI papers

{
  "searchQuery": "transformer attention mechanism",
  "sources": ["arxiv", "pubmed"],
  "categories": ["cs.AI", "cs.LG", "cs.CL"],
  "maxResults": 50,
  "sortBy": "date_desc"
}

Search by author

{
  "searchQuery": "deep learning",
  "authorFilter": "Hinton",
  "dateFrom": "2020-01-01",
  "maxResults": 100
}

PubMed only with API key

{
  "searchQuery": "CRISPR gene editing",
  "sources": ["pubmed"],
  "pubmedApiKey": "your-ncbi-api-key",
  "maxResults": 500
}

arXiv Categories

Common categories you can filter by:

Computer Science: cs.AI, cs.CL, cs.CV, cs.LG, cs.NE, cs.RO
Physics: physics.quant-ph, physics.comp-ph
Mathematics: math.OC, math.ST
Statistics: stat.ML, stat.ME
Quantitative Biology: q-bio.BM, q-bio.GN, q-bio.NC

Full taxonomy: https://arxiv.org/category_taxonomy
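For orientation, the arXiv API expresses category filters with the `cat:` field prefix and combines clauses with `AND` and `OR`. The sketch below shows how a keyword search, an author filter, and a category list could be combined into one `search_query` string. How the actor builds its query internally is not documented here, so `arxiv_search_query` is a hypothetical helper and the term-by-term `AND` is an assumption.

```python
def arxiv_search_query(search_query, categories=(), author=None):
    """Hypothetical helper: build an arXiv API search_query expression."""
    # Each keyword must match somewhere in the record (assumed behaviour).
    clauses = [" AND ".join(f"all:{term}" for term in search_query.split())]
    if author:
        clauses.append(f"au:{author}")
    if categories:
        # A paper matches if it belongs to any one of the listed categories.
        clauses.append("(" + " OR ".join(f"cat:{c}" for c in categories) + ")")
    return " AND ".join(clauses)


print(arxiv_search_query("transformer attention", ["cs.AI", "cs.LG"]))
```

The resulting string still has to be URL-encoded before it is sent as the `search_query` parameter.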

PDF Link Availability

Why some papers lack PDF links:

Source	PDF Availability	Notes
arXiv	✅ Always available	arXiv is fully open access
PubMed with PMCID	✅ Available	Paper deposited in PubMed Central
PubMed without PMCID	⚠️ Often unavailable	Paywalled journal articles

Unpaywall Integration

When you provide `unpaywallEmail`, the actor queries Unpaywall to find open access versions of papers that lack PDF links. This can recover PDFs from:

Institutional repositories
Author preprint servers
Publisher open access copies

Limitations:

New papers (published within roughly the last 2-4 weeks): Unpaywall may not have indexed them yet
Paywalled papers with no OA version: No legal free PDF exists
Papers without DOI: Cannot be looked up in Unpaywall

For recent papers without PDFs, the `abstractUrl` field always provides a link to the paper's landing page.
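The Unpaywall lookup described in this section can be sketched as follows. The endpoint `https://api.unpaywall.org/v2/{doi}?email=...` and the `best_oa_location.url_for_pdf` response field belong to the public Unpaywall API. Everything else is an assumption made for illustration: the helper names are hypothetical, and `fetch_json` stands in for whatever HTTP client is used, which also keeps the example runnable offline.

```python
from urllib.parse import quote, urlencode

UNPAYWALL_BASE = "https://api.unpaywall.org/v2/"


def unpaywall_url(doi, email):
    """Build the Unpaywall lookup URL for one DOI."""
    return f"{UNPAYWALL_BASE}{quote(doi, safe='/')}?{urlencode({'email': email})}"


def pick_pdf_url(record):
    """Return the PDF URL of the best open access location, or None."""
    best = record.get("best_oa_location") or {}
    return best.get("url_for_pdf")


def resolve_pdf(paper, email, fetch_json):
    """Fill in pdfUrl when the paper has a DOI but no PDF link yet.

    fetch_json is any callable that takes a URL and returns the parsed
    JSON response as a dict (hypothetical; supplied by the caller).
    """
    if paper.get("pdfUrl") or not paper.get("doi"):
        return paper  # already has a PDF, or cannot be looked up without a DOI
    pdf = pick_pdf_url(fetch_json(unpaywall_url(paper["doi"], email)))
    return {**paper, "pdfUrl": pdf} if pdf else paper
```

Papers without a DOI are returned unchanged, which matches the third limitation listed above.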

Rate Limits

The actor respects API rate limits:

arXiv: 3-second delay between requests
PubMed: 3 requests/second (or 10/second with API key)
Unpaywall: 10 requests/second

Get a free PubMed API key at: https://www.ncbi.nlm.nih.gov/account/
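If you call the same APIs from your own code, the limits above translate into a minimum interval between requests. The sketch below is one simple way to enforce that and is not taken from the actor's source: `Throttle` and `MIN_INTERVAL` are hypothetical names, and the clock and sleep functions are injectable so the behaviour can be checked without waiting.

```python
import time

# Minimum seconds between requests, derived from the limits listed above.
MIN_INTERVAL = {
    "arxiv": 3.0,
    "pubmed": 1 / 3,
    "pubmed_with_key": 1 / 10,
    "unpaywall": 1 / 10,
}


class Throttle:
    """Block until at least min_interval seconds have passed since the last call."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no request has been made yet

    def wait(self):
        now = self._clock()
        if self._next_allowed is not None and self._next_allowed > now:
            delay = self._next_allowed - now
            self._sleep(delay)
            now += delay
        # The next request may start min_interval after this one.
        self._next_allowed = now + self.min_interval


arxiv_throttle = Throttle(MIN_INTERVAL["arxiv"])
# Call arxiv_throttle.wait() immediately before each arXiv request.
```

One `Throttle` instance per source keeps the slow arXiv interval from delaying PubMed or Unpaywall requests.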

Data Sources

arXiv API - Open access preprint server
PubMed E-utilities - Biomedical literature database

License

MIT License
