arXiv Scraper - Scientific Papers, Abstracts & PDFs
Pricing
Pay per usage
An arXiv scraper built on the official arXiv API. Search 2M+ scientific papers in CS, physics, math and biology by keyword, title, author, abstract or category, and extract title, authors, abstract, categories, DOI, dates and PDF links. Built for AI/ML research, literature reviews and RAG datasets.
Developer
ben
Maintained by Community
Search arXiv.org, with 2M+ open-access scientific papers in physics, CS, math, biology, economics and more, via the official arXiv API.
Built for AI/ML research, literature reviews, RAG datasets and research analytics. The Actor calls the API directly, so no API key, proxy or browser is needed.
What you get
Per paper:
- title, arxiv_id
- authors, author_count
- abstract (full text)
- primary_category, categories
- published, updated
- doi, journal_ref, comment
- pdf_url, abstract_url
- scraped_at
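If you load the dataset into Python, a typed record makes the fields above easier to work with. The sketch below is an assumption, not the Actor's own schema: the class name Paper and the helper from_item are hypothetical, dates are assumed to be ISO 8601 strings, doi/journal_ref/comment are assumed to be null when arXiv has no value, and author_count is assumed to equal the number of authors.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Optional


@dataclass
class Paper:
    """One paper from the dataset (hypothetical class; field types are assumptions)."""
    arxiv_id: str
    title: str
    authors: list[str] = field(default_factory=list)
    abstract: str = ""
    primary_category: str = ""
    categories: list[str] = field(default_factory=list)
    published: str = ""  # assumed ISO 8601, e.g. "2026-05-28T17:59:57Z"
    updated: str = ""
    doi: Optional[str] = None  # assumed null for papers without a DOI
    journal_ref: Optional[str] = None
    comment: Optional[str] = None
    pdf_url: str = ""
    abstract_url: str = ""
    scraped_at: str = ""

    @property
    def author_count(self) -> int:
        # Derived from authors instead of trusting a separate stored count.
        return len(self.authors)

    @classmethod
    def from_item(cls, item: dict[str, Any]) -> "Paper":
        # Keep only keys that are dataclass fields; extras such as author_count are dropped.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in item.items() if k in known})
```

Deriving author_count instead of storing it keeps the two values from drifting apart if you edit the author list later.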
Why this Actor
| | arXiv Scraper | Manual search | Raw arXiv API |
|---|---|---|---|
| Clean flat JSON output | Yes | n/a | Atom XML to parse |
| Search + filters + paging | Yes | Slow | DIY |
| PDF + abstract links | Yes | Manual | Yes |
| Pay per result | Yes | n/a | n/a |
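To illustrate what "Atom XML to parse" means for the raw arXiv API, here is a minimal sketch that flattens an Atom feed into records shaped like this Actor's output. It is not the Actor's code. It relies on the Atom namespace (http://www.w3.org/2005/Atom) and arXiv's extension namespace (http://arxiv.org/schemas/atom) that the API uses for primary_category, doi, journal_ref and comment; the function name parse_feed is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}


def _text(elem, path):
    """Stripped text of the first matching child, or None if it is missing."""
    found = elem.find(path, NS)
    return found.text.strip() if found is not None and found.text else None


def _squash(s):
    # Titles and abstracts in the feed contain hard line breaks; collapse whitespace.
    return " ".join((s or "").split())


def parse_feed(xml_text: str) -> list[dict]:
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.findall("atom:entry", NS):
        abs_url = _text(entry, "atom:id")  # e.g. http://arxiv.org/abs/2605.30351v1
        pdf = entry.find("atom:link[@title='pdf']", NS)
        primary = entry.find("arxiv:primary_category", NS)
        authors = [_text(a, "atom:name") for a in entry.findall("atom:author", NS)]
        papers.append({
            "arxiv_id": abs_url.rsplit("/abs/", 1)[-1] if abs_url else None,
            "title": _squash(_text(entry, "atom:title")),
            "authors": authors,
            "author_count": len(authors),
            "abstract": _squash(_text(entry, "atom:summary")),
            "primary_category": primary.get("term") if primary is not None else None,
            "categories": [c.get("term") for c in entry.findall("atom:category", NS)],
            "published": _text(entry, "atom:published"),
            "updated": _text(entry, "atom:updated"),
            "doi": _text(entry, "arxiv:doi"),
            "journal_ref": _text(entry, "arxiv:journal_ref"),
            "comment": _text(entry, "arxiv:comment"),
            "pdf_url": pdf.get("href") if pdf is not None else None,
            "abstract_url": abs_url,
        })
    return papers
```

Even this simplified version has to handle namespaces, optional elements and whitespace cleanup, which is the work the Actor's flat JSON output saves you.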
Input
Use the simple fields, or a raw searchQuery for full arXiv syntax.
| Field | Type | Description |
|---|---|---|
| allFields | string | Keyword(s) matched across title, abstract and authors |
| title | string | Text the title must contain |
| author | string | Author name |
| abstract | string | Text the abstract must contain |
| category | string | arXiv category (e.g. cs.LG, cs.CL, cs.AI) |
| searchQuery | string | Raw query in full arXiv syntax; overrides the fields above |
| sortBy | string | Sort order: Relevance, Newest or Recently updated |
| maxResults | integer | Maximum number of papers to return |
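The simple fields correspond to arXiv API field prefixes (all, ti, au, abs, cat), and the API's own sortBy accepts relevance, submittedDate and lastUpdatedDate. The sketch below shows one plausible way to turn the input into an API request URL. The Actor's real translation is not documented here, so the mappings FIELD_PREFIXES and SORT_MAP, the AND-combination of fields, the lowercase sort keys and the helper names are all hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical mapping from this Actor's input fields to arXiv API field prefixes.
FIELD_PREFIXES = {
    "allFields": "all",
    "title": "ti",
    "author": "au",
    "abstract": "abs",
    "category": "cat",
}

# Hypothetical mapping from the sortBy options to the arXiv API's sortBy values.
SORT_MAP = {
    "relevance": "relevance",
    "newest": "submittedDate",
    "updated": "lastUpdatedDate",
}


def build_search_query(inp: dict) -> str:
    """AND together the simple fields; a raw searchQuery takes precedence."""
    if inp.get("searchQuery"):
        return inp["searchQuery"]
    clauses = []
    for key, prefix in FIELD_PREFIXES.items():
        value = (inp.get(key) or "").strip()
        if not value:
            continue
        # Quote multi-word values so arXiv matches them as a phrase.
        term = f'"{value}"' if " " in value else value
        clauses.append(f"{prefix}:{term}")
    return " AND ".join(clauses)


def build_api_url(inp: dict, start: int = 0, page_size: int = 100) -> str:
    """URL for one page of results from the arXiv API query endpoint."""
    sort_key = str(inp.get("sortBy", "relevance")).lower()
    params = {
        "search_query": build_search_query(inp),
        "start": start,
        "max_results": min(page_size, inp.get("maxResults", page_size)),
        "sortBy": SORT_MAP.get(sort_key, "relevance"),
        "sortOrder": "descending",
    }
    return "https://export.arxiv.org/api/query?" + urlencode(params)
```

For example, {"title": "attention", "category": "cs.CL"} becomes the query ti:attention AND cat:cs.CL.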
Example: newest LLM papers
{"allFields": "large language models","sortBy": "newest","maxResults": 100}
Example: one category, with advanced syntax
{"searchQuery": "cat:cs.CL AND abs:\"retrieval augmented\"","sortBy": "newest","maxResults": 200}
Sample output (abridged)
{"arxiv_id": "2605.30351v1","title": "VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Video","authors": ["Hidir Yesiltepe", "Jiazhen Hu"],"primary_category": "cs.CV","categories": ["cs.CV", "cs.AI"],"published": "2026-05-28T17:59:57Z","abstract": "Long-rollout causal video diffusion...","pdf_url": "https://arxiv.org/pdf/2605.30351v1","abstract_url": "https://arxiv.org/abs/2605.30351v1"}
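The arxiv_id in the output carries a version suffix (v1 above). When merging datasets from several runs, the same paper can show up as v1 and later as v2. The sketch below splits an identifier into base ID and version and keeps the newest version of each paper. The helper names are hypothetical; the regex covers both new-style IDs (YYMM.NNNNN) and old-style ones such as solv-int/9901001v2.

```python
import re
from typing import Optional

# Lazy base, then an optional trailing "v<digits>" version suffix.
ARXIV_ID = re.compile(r"^(?P<base>.+?)(?:v(?P<version>\d+))?$")


def split_arxiv_id(arxiv_id: str) -> tuple[str, Optional[int]]:
    """'2605.30351v1' -> ('2605.30351', 1); IDs without a suffix get version None."""
    m = ARXIV_ID.match(arxiv_id)
    version = m.group("version")
    return m.group("base"), int(version) if version else None


def latest_versions(items: list[dict]) -> list[dict]:
    """Keep one record per base ID: the one with the highest version number."""
    best: dict[str, tuple[int, dict]] = {}
    for item in items:
        base, version = split_arxiv_id(item["arxiv_id"])
        v = version or 0
        if base not in best or v > best[base][0]:
            best[base] = (v, item)
    return [item for _, item in best.values()]
```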
Use cases
- AI/ML research — track the latest papers in a field or category
- RAG / LLM datasets — build corpora of abstracts + PDF links by topic
- Literature reviews — gather and rank relevant papers fast
- Research analytics — analyse output by category, author and time
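For the analytics and RAG use cases, a few lines of standard-library Python over the dataset items go a long way. This is a minimal sketch: it assumes items are dicts shaped like the sample output, and the function names and the (text, metadata) document shape are hypothetical choices rather than anything the Actor prescribes.

```python
from collections import Counter


def papers_per_category_and_month(items: list[dict]) -> Counter:
    """Count papers by (primary_category, 'YYYY-MM' taken from the published date)."""
    counts: Counter = Counter()
    for item in items:
        month = (item.get("published") or "")[:7]  # '2026-05-28T17:59:57Z' -> '2026-05'
        counts[(item.get("primary_category"), month)] += 1
    return counts


def top_authors(items: list[dict], n: int = 10) -> list[tuple[str, int]]:
    """The n most frequent author names across all papers."""
    return Counter(a for item in items for a in item.get("authors", [])).most_common(n)


def to_rag_document(item: dict) -> tuple[str, dict]:
    """Flatten one paper into text to embed plus metadata to store alongside it."""
    text = f"{item.get('title', '')}\n\n{item.get('abstract', '')}"
    metadata = {k: item.get(k) for k in ("arxiv_id", "primary_category", "published", "pdf_url")}
    return text, metadata
```

Keeping pdf_url in the metadata lets a RAG pipeline cite or fetch the full paper for any abstract it retrieves.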
Pricing
Pay-per-result. You are charged only for the papers returned — empty runs cost nothing.
Notes & legal
- Uses the official arXiv API. Please respect arXiv's API terms and rate limits (the Actor waits between requests).
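arXiv's API terms ask clients to make no more than one request every three seconds, and results are paged with the API's start and max_results parameters. The sketch below shows a throttled paging loop of that kind; it is an illustration, not the Actor's code. fetch_page is a hypothetical callable you supply, the page size of 100 is an arbitrary choice, and sleep is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Iterator

DELAY_SECONDS = 3.0  # arXiv API terms: at most one request every three seconds


def page_offsets(max_results: int, page_size: int = 100) -> Iterator[tuple[int, int]]:
    """Yield (start, size) pairs that together cover max_results papers."""
    start = 0
    while start < max_results:
        size = min(page_size, max_results - start)
        yield start, size
        start += size


def fetch_all(
    fetch_page: Callable[[int, int], list],
    max_results: int,
    page_size: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    results: list = []
    for i, (start, size) in enumerate(page_offsets(max_results, page_size)):
        if i > 0:
            sleep(DELAY_SECONDS)  # wait between consecutive requests
        page = fetch_page(start, size)
        results.extend(page)
        if len(page) < size:  # short page: assume there are no more results
            break
    return results
```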
- Use data only for lawful purposes.
Related actors
More scrapers from the same author:
- OpenAlex Scraper — academic papers & citations
- PubMed Scraper — biomedical literature & citations
- Reddit Archive Scraper — years of historical posts & comments