arXiv Scraper
Pricing
Pay per event
Export preprints from arXiv.org. Search 2.5M+ open-access papers across physics, mathematics, computer science, biology, economics, and quantitative finance. Query by keyword, author, category, or date range. Returns titles, authors, abstracts, categories, and PDF links.
Developer
BowTiedRaccoon
Maintained by Community
Export preprints and papers from arXiv.org, the leading open-access repository of 2.5 million+ scientific papers across physics, mathematics, computer science, biology, economics, and quantitative finance.
This actor queries the official arXiv Atom API (export.arxiv.org/api/query), the method arXiv officially supports for programmatic data access. No scraping, no JavaScript rendering, no account required.
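As a rough illustration of what a request to that endpoint looks like, the sketch below builds a query URL with the standard library. The `https` scheme, the helper name `build_query_url`, and its defaults are assumptions made for this example; `search_query`, `start`, `max_results`, `sortBy`, and `sortOrder` are the arXiv API's query parameters. This is not the actor's own code.

```python
from urllib.parse import urlencode

# Endpoint named in this README; the https scheme is an assumption.
API_ENDPOINT = "https://export.arxiv.org/api/query"


def build_query_url(search_query, start=0, max_results=10,
                    sort_by="submittedDate", sort_order="descending"):
    """Return an arXiv API URL for one page of results (hypothetical helper)."""
    params = {
        "search_query": search_query,
        "start": start,              # zero-based offset of the first result
        "max_results": max_results,  # number of results in this page
        "sortBy": sort_by,
        "sortOrder": sort_order,
    }
    # urlencode percent-encodes ':' and turns spaces into '+'.
    return f"{API_ENDPOINT}?{urlencode(params)}"


url = build_query_url("cat:cs.AI AND ti:diffusion", max_results=5)
print(url)
```

Fetching `url` with any HTTP client returns an Atom XML feed with one `<entry>` per paper.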
What you get
Each result includes:
- `arxiv_id`: the canonical short ID (e.g. `2301.12345`)
- `abs_url`: link to the abstract page
- `pdf_url`: direct PDF download link
- `title`: full paper title
- `abstract`: complete abstract / summary
- `authors`: comma-separated author names
- `primary_category`: primary subject category (e.g. `cs.AI`)
- `categories`: all subject categories, comma-separated
- `published`: original submission date (ISO 8601)
- `updated`: date of the latest version
- `comment`: author notes (page count, conference, etc.) if available
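To make the field list concrete, here is a minimal sketch of how one Atom `<entry>` could be mapped onto these fields with `xml.etree.ElementTree`. The sample feed is made up for illustration and is not a real paper. The mapping itself is an assumption about how such a record might be derived (for instance, stripping the version suffix to get `arxiv_id`), not the actor's actual implementation.

```python
import re
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}

# Made-up entry for illustration only; not a real paper.
SAMPLE_FEED = """
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:arxiv="http://arxiv.org/schemas/atom">
  <entry>
    <id>http://arxiv.org/abs/2301.12345v2</id>
    <updated>2023-02-10T12:00:00Z</updated>
    <published>2023-01-30T09:30:00Z</published>
    <title>An Example
      Title</title>
    <summary>An example abstract.</summary>
    <author><name>Ada Example</name></author>
    <author><name>Bob Sample</name></author>
    <arxiv:comment>12 pages, 3 figures</arxiv:comment>
    <link title="pdf" href="http://arxiv.org/pdf/2301.12345v2"
          rel="related" type="application/pdf"/>
    <arxiv:primary_category term="cs.AI"/>
    <category term="cs.AI"/>
    <category term="cs.LG"/>
  </entry>
</feed>
"""


def _text(element, path):
    """Return the child's text with whitespace runs collapsed, or None."""
    value = element.findtext(path, default=None, namespaces=NS)
    return " ".join(value.split()) if value else None


def parse_entry(entry):
    """Map one Atom <entry> to a flat record (assumed mapping)."""
    abs_url = _text(entry, "atom:id")
    # Assumption: the short ID is the abs URL tail without the version suffix.
    arxiv_id = re.sub(r"v\d+$", "", abs_url.rsplit("/abs/", 1)[-1])
    pdf_url = next(
        (link.get("href") for link in entry.findall("atom:link", NS)
         if link.get("title") == "pdf"),
        None,
    )
    primary = entry.find("arxiv:primary_category", NS)
    return {
        "arxiv_id": arxiv_id,
        "abs_url": abs_url,
        "pdf_url": pdf_url,
        "title": _text(entry, "atom:title"),
        "abstract": _text(entry, "atom:summary"),
        "authors": ", ".join(
            _text(author, "atom:name")
            for author in entry.findall("atom:author", NS)
        ),
        "primary_category": primary.get("term") if primary is not None else None,
        "categories": ", ".join(
            cat.get("term") for cat in entry.findall("atom:category", NS)
        ),
        "published": _text(entry, "atom:published"),
        "updated": _text(entry, "atom:updated"),
        "comment": _text(entry, "arxiv:comment"),  # None when absent
    }


root = ET.fromstring(SAMPLE_FEED.strip())
records = [parse_entry(entry) for entry in root.findall("atom:entry", NS)]
print(records[0])
```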
Search query syntax
The `searchQuery` field supports arXiv's full query language:
| Pattern | Example | Meaning |
|---|---|---|
| Plain keyword | machine learning | Keyword search across paper metadata (title, abstract, authors, and so on) |
| Title | ti:attention | Papers with "attention" in the title |
| Author | au:Hinton | Papers by Hinton |
| Abstract | abs:transformer | Papers with "transformer" in abstract |
| Category | cat:cs.AI | Papers in the cs.AI category |
| Boolean | cat:cs.LG AND ti:diffusion | Category AND title filter |
| Date range | submittedDate:[202301010000 TO 202312312359] | Papers from 2023 |
See the arXiv query language reference for the full syntax.
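Query expressions like the ones in the table can also be assembled programmatically, which helps avoid mistakes in the 12-digit date format. The helpers below (`submitted_between`, `all_of`) are hypothetical names introduced for this sketch; they only produce strings in the syntax shown above.

```python
from datetime import datetime


def submitted_between(start, end):
    """Build a submittedDate range clause in YYYYMMDDHHMM format."""
    fmt = "%Y%m%d%H%M"
    return f"submittedDate:[{start.strftime(fmt)} TO {end.strftime(fmt)}]"


def all_of(*clauses):
    """Join clauses with AND so that every clause must match."""
    return " AND ".join(clauses)


query = all_of(
    "cat:cs.LG",
    "ti:diffusion",
    submitted_between(datetime(2023, 1, 1, 0, 0), datetime(2023, 12, 31, 23, 59)),
)
print(query)
```

The resulting string can be passed as the `searchQuery` input.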
Common arXiv categories
| Category | Field |
|---|---|
| `cs.AI` | Artificial Intelligence |
| `cs.LG` | Machine Learning |
| `cs.CL` | Computation and Language (NLP) |
| `cs.CV` | Computer Vision |
| `physics.hep-th` | High Energy Physics Theory |
| `math.CO` | Combinatorics |
| `q-bio.NC` | Neurons and Cognition |
| `econ.GN` | General Economics |
Input parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `searchQuery` | string | required | arXiv query expression |
| `maxItems` | integer | 50 | Maximum number of papers to return |
| `sortBy` | string | submittedDate | Sort field: relevance, lastUpdatedDate, submittedDate |
| `sortOrder` | string | descending | ascending or descending |
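The defaults in the table can be read as the following normalization step, sketched here for callers who want to validate an input object before starting a run. The function name `normalize_input` and the specific checks (a non-empty query, `maxItems` of at least 1) are assumptions for this example; the actor's own validation is not documented here.

```python
ALLOWED_SORT_BY = {"relevance", "lastUpdatedDate", "submittedDate"}
ALLOWED_SORT_ORDER = {"ascending", "descending"}
DEFAULTS = {"maxItems": 50, "sortBy": "submittedDate", "sortOrder": "descending"}


def normalize_input(raw):
    """Apply the documented defaults and reject obviously invalid values."""
    query = str(raw.get("searchQuery", "")).strip()
    if not query:
        raise ValueError("searchQuery is required")

    # Values supplied by the caller override the defaults.
    merged = {**DEFAULTS, **raw, "searchQuery": query}

    max_items = merged["maxItems"]
    # Assumption: maxItems must be a positive integer (bool is excluded).
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    if merged["sortBy"] not in ALLOWED_SORT_BY:
        raise ValueError(f"sortBy must be one of {sorted(ALLOWED_SORT_BY)}")
    if merged["sortOrder"] not in ALLOWED_SORT_ORDER:
        raise ValueError(f"sortOrder must be one of {sorted(ALLOWED_SORT_ORDER)}")
    return merged


print(normalize_input({"searchQuery": "au:LeCun", "sortBy": "relevance"}))
```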
Usage examples
Fetch the 100 most recent cs.AI papers:
{"searchQuery": "cat:cs.AI","maxItems": 100,"sortBy": "submittedDate","sortOrder": "descending"}
Find papers by a specific author:
{"searchQuery": "au:LeCun","maxItems": 50,"sortBy": "relevance"}
Search for diffusion model papers from 2024:
{"searchQuery": "ti:diffusion AND submittedDate:[202401010000 TO 202412312359]","maxItems": 200}
Technical notes
- Uses the arXiv Atom API, arXiv's official programmatic interface
- Pagination is handled automatically; set `maxItems` to any number
- Rate-limited to ~1 request/second per arXiv usage guidelines
- No authentication required
- Results span all of arXiv's subject areas (2.5M+ papers total)
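The pagination and rate-limiting notes above can be sketched as a loop over `(start, count)` windows with a pause between requests. The page size of 100, the function names, and the `fetch_page` callback are all assumptions for illustration; the actor's internal page size and retry behavior are not documented here.

```python
import time


def page_windows(max_items, page_size=100):
    """Yield (start, count) pairs that cover max_items results."""
    start = 0
    while start < max_items:
        count = min(page_size, max_items - start)
        yield start, count
        start += count


def fetch_all(fetch_page, max_items, page_size=100,
              delay_seconds=1.0, sleep=time.sleep):
    """Collect up to max_items results, pausing between page requests.

    fetch_page(start, count) is a hypothetical callback that returns a list
    of results for one page, e.g. by calling the API and parsing the feed.
    """
    results = []
    for index, (start, count) in enumerate(page_windows(max_items, page_size)):
        if index:
            sleep(delay_seconds)  # roughly one request per second
        page = fetch_page(start, count)
        results.extend(page)
        if len(page) < count:
            break  # the query has no more results
    return results[:max_items]


# Usage with an in-memory stand-in for the API (230 fake results).
fake_results = list(range(230))
pauses = []
collected = fetch_all(
    lambda start, count: fake_results[start:start + count],
    max_items=250,
    sleep=pauses.append,  # record pauses instead of actually sleeping
)
print(len(collected), len(pauses))
```

Passing `sleep` as a parameter keeps the loop testable without real delays.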