arXiv Research Paper Scraper
Pricing
from $3.00 / 1,000 results
Scrape research papers from arXiv.org: search by query, category, or author, or look up papers by arXiv ID. Returns title, authors, abstract, PDF URL, DOI, categories, and more. Uses the public arXiv Atom API. No login or proxy required.
Developer
Crawler Bros
Maintained by Community
Scrape research papers from arXiv.org — the world's largest preprint repository with 2M+ papers in physics, mathematics, computer science, quantitative biology, economics, and more.
Uses the official arXiv Atom API (http://export.arxiv.org/api/query). No login, no API key, no proxy required.
Features
- Search papers by free-text keyword (e.g. "neural networks", "quantum computing")
- Browse by category: 25+ subject categories (cs.AI, math.CO, physics.optics, etc.)
- Get papers by author: all papers by a specific researcher
- Fetch specific papers by arXiv ID (e.g. 1706.03762 for "Attention Is All You Need")
- Full metadata: title, authors, abstract, categories, PDF URL, DOI, journal reference, comment
- Pagination: retrieve up to 2,000 papers per run
- Rate-limited: pauses 0.5s between requests to the arXiv API
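The pagination and rate-limiting features listed above can be sketched roughly as follows. This is a minimal illustration, not the actor's actual code: the names `page_windows` and `fetch_all` and the page size of 100 are assumptions made for the example, while `search_query`, `start`, and `max_results` are the arXiv API's own paging parameters.

```python
import time
import urllib.parse
import urllib.request
from typing import Iterator, List, Tuple

API_URL = "http://export.arxiv.org/api/query"


def page_windows(max_items: int, page_size: int = 100) -> Iterator[Tuple[int, int]]:
    """Yield (start, max_results) pairs that together cover max_items results."""
    start = 0
    while start < max_items:
        yield start, min(page_size, max_items - start)
        start += page_size


def fetch_all(search_query: str, max_items: int, delay_s: float = 0.5) -> List[bytes]:
    """Fetch raw Atom pages one at a time, sleeping between requests."""
    pages: List[bytes] = []
    for i, (start, size) in enumerate(page_windows(max_items)):
        if i > 0:
            time.sleep(delay_s)  # pause between consecutive requests
        params = urllib.parse.urlencode(
            {"search_query": search_query, "start": start, "max_results": size}
        )
        with urllib.request.urlopen(f"{API_URL}?{params}", timeout=30) as resp:
            pages.append(resp.read())
    return pages
```

The default `delay_s` mirrors the 0.5s interval stated in the feature list and can be raised for gentler crawling. The sketch always requests every window; a real implementation would probably also stop once a page comes back with no entries.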
Input Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| mode | Select | searchPapers, getByCategory, getByAuthor, getById | searchPapers |
| query | String | Free-text query for searchPapers mode | "neural networks" |
| category | Select | Subject category for getByCategory mode | cs.AI |
| authorName | String | Author name for getByAuthor mode | (none) |
| arxivIds | Array | List of arXiv IDs for getById mode | (none) |
| sortBy | Select | relevance, submittedDate, lastUpdatedDate | submittedDate |
| sortOrder | Select | ascending, descending | descending |
| maxItems | Integer | Maximum papers to return (1–2000) | 50 |
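To make the table concrete, here is a hedged sketch of how the documented defaults and ranges could be checked before a run. The function `validate_input` is hypothetical and not part of the actor. In particular, raising an error when `authorName` or `arxivIds` is missing is an assumption about reasonable behaviour, not something the actor is documented to do.

```python
from typing import Any, Dict

MODES = {"searchPapers", "getByCategory", "getByAuthor", "getById"}
SORT_BY = {"relevance", "submittedDate", "lastUpdatedDate"}
SORT_ORDER = {"ascending", "descending"}

# Defaults copied from the parameter table.
DEFAULTS: Dict[str, Any] = {
    "mode": "searchPapers",
    "query": "neural networks",
    "category": "cs.AI",
    "sortBy": "submittedDate",
    "sortOrder": "descending",
    "maxItems": 50,
}

# The parameter each mode reads (assumed from the table descriptions).
MODE_PARAM = {
    "searchPapers": "query",
    "getByCategory": "category",
    "getByAuthor": "authorName",
    "getById": "arxivIds",
}


def validate_input(run_input: Dict[str, Any]) -> Dict[str, Any]:
    """Merge the documented defaults into run_input and check value ranges."""
    cfg = {**DEFAULTS, **run_input}
    if cfg["mode"] not in MODES:
        raise ValueError(f"unknown mode: {cfg['mode']!r}")
    if cfg["sortBy"] not in SORT_BY:
        raise ValueError(f"unknown sortBy: {cfg['sortBy']!r}")
    if cfg["sortOrder"] not in SORT_ORDER:
        raise ValueError(f"unknown sortOrder: {cfg['sortOrder']!r}")
    if not isinstance(cfg["maxItems"], int) or not 1 <= cfg["maxItems"] <= 2000:
        raise ValueError("maxItems must be an integer from 1 to 2000")
    needed = MODE_PARAM[cfg["mode"]]
    if not cfg.get(needed):
        raise ValueError(f"mode {cfg['mode']!r} needs a value for {needed!r}")
    return cfg
```

Calling `validate_input({})` returns the defaults from the table, so an empty input behaves like a `searchPapers` run for "neural networks" with 50 items.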
Output Fields
Each record contains:
| Field | Type | Description |
|---|---|---|
| arxivId | String | arXiv paper ID (e.g. 2401.12345) |
| title | String | Paper title |
| abstract | String | Full abstract |
| authors | Array | List of author names |
| categories | Array | Subject categories (e.g. ["cs.AI", "cs.LG"]) |
| published | String | Original submission date (ISO 8601) |
| updated | String | Last update date (ISO 8601) |
| pdfUrl | String | Direct PDF download link |
| abstractUrl | String | arXiv abstract page URL |
| doi | String | DOI if assigned to published version |
| journalRef | String | Journal reference if published |
| comment | String | Author comment (e.g. "15 pages, 5 figures") |
| scrapedAt | String | ISO 8601 timestamp of when the record was scraped |
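For readers curious how these fields relate to the underlying Atom feed, the sketch below maps one feed to records with the field names from the table. It is an illustration under stated assumptions rather than the actor's implementation: `parse_feed` is a hypothetical helper, and stripping the version suffix (such as `v5`) from `arxivId` is assumed from the example ID in the table. The element names (`summary`, `arxiv:doi`, `arxiv:journal_ref`, `arxiv:comment`, and the link with `title="pdf"`) are those used in arXiv's Atom responses.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}


def _text(entry: ET.Element, path: str) -> Optional[str]:
    """Return the whitespace-normalised text of the first match, or None."""
    node = entry.find(path, NS)
    if node is None or node.text is None:
        return None
    return " ".join(node.text.split())


def parse_feed(xml_bytes: bytes) -> List[Dict[str, Any]]:
    """Convert an arXiv Atom feed into a list of flat records."""
    root = ET.fromstring(xml_bytes)
    records = []
    for entry in root.findall("atom:entry", NS):
        abs_url = _text(entry, "atom:id") or ""
        raw_id = abs_url.split("/abs/")[-1]  # e.g. "1706.03762v5"
        pdf_url = None
        for link in entry.findall("atom:link", NS):
            if link.get("title") == "pdf":
                pdf_url = link.get("href")
        names = entry.findall("atom:author/atom:name", NS)
        records.append({
            "arxivId": re.sub(r"v\d+$", "", raw_id),  # assumption: drop version
            "title": _text(entry, "atom:title"),
            "abstract": _text(entry, "atom:summary"),
            "authors": [" ".join(n.text.split()) for n in names if n.text],
            "categories": [c.get("term") for c in entry.findall("atom:category", NS)],
            "published": _text(entry, "atom:published"),
            "updated": _text(entry, "atom:updated"),
            "pdfUrl": pdf_url,
            "abstractUrl": abs_url,
            "doi": _text(entry, "arxiv:doi"),
            "journalRef": _text(entry, "arxiv:journal_ref"),
            "comment": _text(entry, "arxiv:comment"),
            "scrapedAt": datetime.now(timezone.utc).isoformat(),
        })
    return records
```

Optional fields (`doi`, `journalRef`, `comment`) come back as `None` when the feed entry does not carry them.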
Supported Categories
Browse all papers in a subject area:
| Category | Description |
|---|---|
| cs.AI | Artificial Intelligence |
| cs.LG | Machine Learning |
| cs.CV | Computer Vision |
| cs.CL | Computation & Language (NLP) |
| cs.SE | Software Engineering |
| cs.CR | Cryptography & Security |
| math.CO | Combinatorics |
| math.ST | Statistics Theory |
| physics.optics | Optics |
| q-bio.GN | Genomics |
| econ.EM | Econometrics |
| stat.ML | Machine Learning (Statistics) |
| astro-ph.GA | Astrophysics of Galaxies |
| ...and more | See input schema for full list |
Example Use Cases
Search for recent AI papers
{"mode": "searchPapers", "query": "large language models", "sortBy": "submittedDate", "sortOrder": "descending", "maxItems": 100}
Browse computer vision papers
{"mode": "getByCategory", "category": "cs.CV", "maxItems": 50}
Get papers by a specific author
{"mode": "getByAuthor", "authorName": "LeCun", "maxItems": 30}
Fetch specific papers by ID
{"mode": "getById", "arxivIds": ["1706.03762", "2005.14165", "2303.08774"]}
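As a rough guide to what each mode asks of the arXiv API, the sketch below turns an input object like the ones above into a query URL. The mapping is an assumption about how such a scraper might work, not the actor's confirmed behaviour: `build_query_url` is a hypothetical helper, and passing a `query` through unchanged when it already contains a field prefix (a `:`) is a guess. The `all:`, `cat:`, and `au:` prefixes and the `id_list`, `sortBy`, and `sortOrder` parameters are part of the arXiv API itself.

```python
import urllib.parse
from typing import Any, Dict

API_URL = "http://export.arxiv.org/api/query"


def build_query_url(run_input: Dict[str, Any]) -> str:
    """Translate an actor-style input object into an arXiv API URL."""
    mode = run_input.get("mode", "searchPapers")

    if mode == "getById":
        ids = run_input["arxivIds"]
        params = {"id_list": ",".join(ids), "max_results": len(ids)}
        return f"{API_URL}?{urllib.parse.urlencode(params)}"

    if mode == "searchPapers":
        query = run_input.get("query", "neural networks")
        # Assumption: a query with a field prefix is already arXiv syntax.
        search = query if ":" in query else f'all:"{query}"'
    elif mode == "getByCategory":
        category = run_input.get("category", "cs.AI")
        search = f"cat:{category}"
    elif mode == "getByAuthor":
        author = run_input["authorName"]
        search = f'au:"{author}"'
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    params = {
        "search_query": search,
        "start": 0,
        "max_results": run_input.get("maxItems", 50),
        "sortBy": run_input.get("sortBy", "submittedDate"),
        "sortOrder": run_input.get("sortOrder", "descending"),
    }
    return f"{API_URL}?{urllib.parse.urlencode(params)}"
```

For instance, `build_query_url({"mode": "getByCategory", "category": "cs.CV", "maxItems": 50})` produces a URL whose `search_query` parameter decodes to `cat:cs.CV`.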
Data Source
Data is fetched from the official arXiv API (export.arxiv.org), which is freely accessible without registration or authentication.
arXiv is operated by Cornell University and serves as the primary preprint repository for physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering, and economics.
FAQs
Does this require an API key? No. The arXiv API is publicly accessible without authentication.
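Is there a rate limit? The actor automatically waits 0.5 seconds between requests. Note that arXiv's own API guidance asks clients making repeated calls to allow about 3 seconds between requests, so large runs should be scheduled considerately.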
How many papers can I scrape? Up to 2,000 papers per run. arXiv's API supports up to 30,000 results per query, but response times increase with page depth.
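Can I get papers from a specific date range? Yes. Use the searchPapers mode with arXiv's query syntax, for example: all:"neural networks" AND submittedDate:[202401010000 TO 202412312359]. The arXiv API expects both bounds in YYYYMMDDHHMM format (24-hour time, GMT).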
Are preprints included? Yes — arXiv is a preprint server, so most papers are preprints before or alongside journal publication.
What's the difference between published and updated? published is when the paper was first submitted to arXiv. updated is when the latest version was uploaded.
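For the date-range question above, a small helper can build the `submittedDate` clause so the bounds are always formatted the way the arXiv API manual describes (YYYYMMDDHHMM, in GMT). This is a sketch only: `date_range_clause` and `with_date_range` are hypothetical names, and whether the actor forwards the `query` string to arXiv unchanged is assumed from the FAQ answer.

```python
from datetime import datetime


def date_range_clause(start: datetime, end: datetime) -> str:
    """Format a submittedDate filter; both datetimes are treated as GMT."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    fmt = "%Y%m%d%H%M"
    return f"submittedDate:[{start.strftime(fmt)} TO {end.strftime(fmt)}]"


def with_date_range(base_query: str, start: datetime, end: datetime) -> str:
    """Append a date filter to an existing arXiv query string."""
    return f"{base_query} AND {date_range_clause(start, end)}"


query = with_date_range(
    'all:"neural networks"',
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 12, 31, 23, 59),
)
# query == 'all:"neural networks" AND submittedDate:[202401010000 TO 202412312359]'
```

The resulting string can be used as the `query` value in a searchPapers input.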