arXiv Paper Scraper
Pricing
from $1.00 / 1,000 results
Scrape research papers from arXiv by search query or category. Get titles, abstracts, authors, categories, and PDF links via the public arXiv API.
Developer: Daniel (maintained by Community)
Scrapes academic papers from arXiv using the public arXiv API, which returns results as Atom XML feeds.
Features
- Search mode: free-text search across all arXiv papers
- Category mode: browse papers by arXiv category (e.g. `cs.AI`, `math.CO`, `physics.optics`)
- Configurable sorting (by relevance, last updated date, or submission date)
- Pagination with polite rate limiting (3 s between requests)
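The two modes and the sorting options map onto the query parameters documented in the arXiv API User's Manual (`search_query`, `start`, `max_results`, `sortBy`, `sortOrder`). The sketch below builds the URL for one page of results. It is a minimal illustration, not the actor's code: `build_query_url` is a hypothetical name, and using the `all:` prefix for search mode and the `cat:` prefix for category mode is an assumption about how the actor forms its queries.

```python
from urllib.parse import urlencode

# Query endpoint documented in the arXiv API User's Manual.
ARXIV_API = "http://export.arxiv.org/api/query"


def build_query_url(mode: str, query: str, category: str, sort_by: str,
                    sort_order: str, start: int, max_results: int) -> str:
    """Build the URL for one page of arXiv API results (hypothetical helper)."""
    if mode == "search":
        # Assumption: "all:" matches the terms against all searchable fields.
        search_query = f"all:{query}"
    elif mode == "category":
        # Assumption: "cat:" restricts results to one arXiv category, e.g. cs.AI.
        search_query = f"cat:{category}"
    else:
        raise ValueError(f"mode must be 'search' or 'category', got {mode!r}")
    params = {
        "search_query": search_query,
        "start": start,              # 0-based offset of the first result
        "max_results": max_results,  # number of results in this page
        "sortBy": sort_by,           # relevance | lastUpdatedDate | submittedDate
        "sortOrder": sort_order,     # ascending | descending
    }
    # urlencode percent-encodes ':' and turns spaces into '+'.
    return f"{ARXIV_API}?{urlencode(params)}"
```

For example, `build_query_url("category", "", "cs.AI", "submittedDate", "descending", 0, 10)` returns `http://export.arxiv.org/api/query?search_query=cat%3Acs.AI&start=0&max_results=10&sortBy=submittedDate&sortOrder=descending`.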
Input
| Field | Type | Default | Description |
|---|---|---|---|
| `mode` | string | `search` | `search` or `category` |
| `query` | string | `machine learning` | Search query (used in search mode) |
| `category` | string | `cs.AI` | arXiv category code (used in category mode) |
| `max_items` | integer | `10` | Maximum number of papers to scrape (1–1000) |
| `sort_by` | string | `relevance` | `relevance`, `lastUpdatedDate`, or `submittedDate` |
| `sort_order` | string | `descending` | `ascending` or `descending` |
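The table implies a small amount of input handling: missing fields fall back to their defaults, and values outside the documented options or range are invalid. A minimal sketch, assuming the actor rejects (rather than clamps) invalid values; `normalize_input`, `DEFAULTS`, and `ALLOWED` are hypothetical names.

```python
from typing import Any

# Defaults taken from the Input table.
DEFAULTS: dict[str, Any] = {
    "mode": "search",
    "query": "machine learning",
    "category": "cs.AI",
    "max_items": 10,
    "sort_by": "relevance",
    "sort_order": "descending",
}

# Allowed values for the enumerated fields.
ALLOWED: dict[str, set[str]] = {
    "mode": {"search", "category"},
    "sort_by": {"relevance", "lastUpdatedDate", "submittedDate"},
    "sort_order": {"ascending", "descending"},
}


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Fill in defaults and reject values outside the documented options (sketch)."""
    cfg = {**DEFAULTS, **raw}
    for field, allowed in ALLOWED.items():
        if cfg[field] not in allowed:
            raise ValueError(f"{field} must be one of {sorted(allowed)}, got {cfg[field]!r}")
    max_items = cfg["max_items"]
    # bool is a subclass of int, so exclude it explicitly.
    if not isinstance(max_items, int) or isinstance(max_items, bool) or not 1 <= max_items <= 1000:
        raise ValueError(f"max_items must be an integer in 1..1000, got {max_items!r}")
    return cfg
```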
Output
Each result contains:
- `arxiv_id`: arXiv paper ID (e.g. `2301.07041`)
- `title`: paper title
- `summary`: abstract
- `authors`: list of author names
- `categories`: list of arXiv categories
- `primary_category`: primary category
- `published`: publication date (ISO 8601)
- `updated`: last updated date (ISO 8601)
- `pdf_url`: direct PDF link
- `abs_url`: abstract page link
- `comment`: author comment (optional)
- `journal_ref`: journal reference (optional)
- `doi`: DOI (optional)
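Because the API returns Atom XML, each `<entry>` element has to be mapped onto the output fields listed above. A minimal sketch using Python's standard-library `xml.etree.ElementTree`, assuming the element names of the arXiv API's Atom responses (`<id>`, `<title>`, `<author><name>`, `<arxiv:primary_category>`, `<link title="pdf">`, and so on). The names `parse_entry` and `_text` are hypothetical; dropping the version suffix (e.g. `v2`) from `arxiv_id` is inferred from the example ID, and returning `None` for absent optional fields is an assumption. Titles and abstracts in the feed contain line breaks, so the sketch collapses internal whitespace.

```python
import re
import xml.etree.ElementTree as ET
from typing import Any, Optional

# XML namespaces used in arXiv API responses.
NS = {"atom": "http://www.w3.org/2005/Atom", "arxiv": "http://arxiv.org/schemas/atom"}


def _text(entry: ET.Element, path: str) -> Optional[str]:
    """Whitespace-collapsed text of a child element, or None if it is absent."""
    el = entry.find(path, NS)
    if el is None or el.text is None:
        return None
    return " ".join(el.text.split())


def parse_entry(entry: ET.Element) -> dict[str, Any]:
    """Map one Atom <entry> onto the output record (hypothetical helper)."""
    # <id> looks like http://arxiv.org/abs/2301.07041v2; keep the ID, drop "v2".
    raw_id = _text(entry, "atom:id") or ""
    arxiv_id = re.sub(r"v\d+$", "", raw_id.split("/abs/")[-1])

    links = entry.findall("atom:link", NS)
    pdf_url = next((ln.get("href") for ln in links if ln.get("title") == "pdf"), None)
    abs_url = next((ln.get("href") for ln in links if ln.get("rel") == "alternate"), None)
    primary = entry.find("arxiv:primary_category", NS)

    return {
        "arxiv_id": arxiv_id,
        "title": _text(entry, "atom:title"),
        "summary": _text(entry, "atom:summary"),
        "authors": [" ".join(n.text.split())
                    for n in entry.findall("atom:author/atom:name", NS) if n.text],
        "categories": [c.get("term") for c in entry.findall("atom:category", NS)],
        "primary_category": primary.get("term") if primary is not None else None,
        "published": _text(entry, "atom:published"),
        "updated": _text(entry, "atom:updated"),
        "pdf_url": pdf_url,
        "abs_url": abs_url,
        "comment": _text(entry, "arxiv:comment"),          # optional
        "journal_ref": _text(entry, "arxiv:journal_ref"),  # optional
        "doi": _text(entry, "arxiv:doi"),                  # optional
    }
```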
Local Usage
```bash
python -m src -i .actor/input.json
```
Example Input
```json
{
  "mode": "search",
  "query": "transformer architecture",
  "max_items": 20,
  "sort_by": "submittedDate",
  "sort_order": "descending"
}
```
Notes
- arXiv asks API clients to make no more than one request every 3 seconds. The scraper respects this by waiting 3 s between requests.
- Results are requested in pages of at most 100 per API request; pagination is handled automatically.
- arXiv categories list: https://arxiv.org/category_taxonomy
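The notes above combine into a simple pagination loop: results are requested in pages of up to 100, with a 3-second pause before every request after the first. The sketch below is illustrative only and assumes a hypothetical `fetch_page(start, max_results)` callback that performs one API request and returns the parsed entries; `page_plan` and `fetch_all` are likewise hypothetical names, not the actor's own functions.

```python
import time
from typing import Any, Callable, Iterator

PAGE_SIZE = 100      # results per API request, per the notes
DELAY_SECONDS = 3.0  # pause between consecutive requests


def page_plan(max_items: int, page_size: int = PAGE_SIZE) -> list[tuple[int, int]]:
    """Return (start, max_results) pairs that together cover max_items results."""
    return [(start, min(page_size, max_items - start))
            for start in range(0, max_items, page_size)]


def fetch_all(max_items: int,
              fetch_page: Callable[[int, int], list],
              sleep: Callable[[float], None] = time.sleep) -> Iterator[Any]:
    """Yield up to max_items entries, pausing between API requests (sketch)."""
    for i, (start, count) in enumerate(page_plan(max_items)):
        if i > 0:
            sleep(DELAY_SECONDS)  # keep to at most one request per 3 s
        entries = fetch_page(start, count)
        yield from entries
        if len(entries) < count:
            break  # a short page means there are no more matching results
```

With `max_items` at its upper bound of 1000, `page_plan` yields 10 requests and therefore 9 pauses, so a full run spends at least 27 seconds deliberately waiting. Passing `sleep` as a parameter keeps the loop testable without real delays, and stopping on a short page avoids requesting offsets past the end of the result set.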