ArXiv Paper Scraper
Pricing
from $2.00 / 1,000 results
Search and extract scientific papers from ArXiv.org across any field. Returns title, authors, full abstract, PDF link, arXiv ID, categories, and submission date. Ideal for AI research monitoring, RAG pipelines, literature reviews, and academic trend analysis. No API key needed.
Developer
Sheshinmcfly
Search and extract scientific papers from ArXiv.org — the largest open-access repository of preprints in physics, mathematics, computer science, AI, and more.
Returns full metadata including title, authors, abstract, categories, submission date, and PDF link. Perfect for AI research pipelines, RAG systems, and academic trend monitoring.
What data does it extract?
| Field | Description | Example |
|---|---|---|
| arxivId | ArXiv paper ID | "2604.18584" |
| title | Full paper title | "MathNet: a Global Multimodal Benchmark..." |
| authors | List of authors | ["Shaden Alshammari", "Kevin Wen"] |
| abstract | Full abstract text | "Mathematical problem solving remains..." |
| categories | ArXiv subject tags | ["cs.AI", "cs.LG", "cs.IR"] |
| primaryCategory | Primary category | "cs.AI" |
| submittedDate | Submission date | "20 April, 2026" |
| comments | Author comments | "ICLR 2026; 30 pages" |
| journalRef | Journal reference | "Proceedings of ICLR, 2026" |
| pdfUrl | Direct PDF link | "https://arxiv.org/pdf/2604.18584" |
| url | ArXiv abstract page | "https://arxiv.org/abs/2604.18584" |
| query | Search query used | "large language models" |
| extractedAt | Extraction timestamp | "2026-04-21T12:00:00Z" |
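For readers who consume the dataset from Python, the fields above can be mirrored in a small typed record. This is a minimal sketch, not part of the actor: the class name `ArxivPaper` and the `submitted()` helper are hypothetical, treating `comments` and `journalRef` as optional is an assumption, and the date parsing assumes the "day Month, year" format shown in the example with an English locale.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass
class ArxivPaper:
    """One dataset record, mirroring the field table (hypothetical helper type)."""
    arxivId: str
    title: str
    authors: list[str]
    abstract: str
    categories: list[str]
    primaryCategory: str
    submittedDate: str                 # e.g. "20 April, 2026"
    pdfUrl: str
    url: str
    query: str
    extractedAt: str                   # ISO 8601 timestamp string
    comments: Optional[str] = None     # assumption: may be absent for some papers
    journalRef: Optional[str] = None   # assumption: may be absent for some papers

    def submitted(self) -> date:
        """Parse submittedDate, assuming the 'day Month, year' format of the example."""
        return datetime.strptime(self.submittedDate, "%d %B, %Y").date()
```

Because the record keeps `submittedDate` as the original string, a date that does not match the assumed format only fails when `submitted()` is called, not when the record is loaded.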
Use cases
- RAG pipelines: Feed domain-specific papers into retrieval-augmented AI systems
- AI research monitoring: Track the latest publications in LLMs, computer vision, NLP
- Academic trend analysis: Identify hot topics and emerging research areas
- Literature review automation: Collect papers for a specific topic at scale
- LLM fine-tuning data: High-quality scientific text for model training
- Competitive intelligence: Monitor what research competitors are publishing
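As an illustration of the RAG use case, one record can be turned into a text passage plus metadata before it is embedded. This is only a sketch: the function name `to_rag_document` and the output layout (`id`, `text`, `metadata`) are assumptions chosen for illustration, not a format required by the actor or by any particular vector store.

```python
def to_rag_document(paper: dict) -> dict:
    """Build a passage (title + abstract) with metadata from one scraped record."""
    # Title and abstract together usually give the retriever enough context.
    text = f"{paper['title']}\n\n{paper['abstract']}"
    metadata = {
        "arxivId": paper["arxivId"],
        "authors": ", ".join(paper.get("authors", [])),
        "primaryCategory": paper.get("primaryCategory"),
        "url": paper.get("url"),
    }
    return {"id": paper["arxivId"], "text": text, "metadata": metadata}
```

Using `arxivId` as the document ID means re-running the same query later overwrites a passage instead of duplicating it, provided the index upserts by ID.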
How to use
- Open the actor and configure:
  - Search queries: One or more search terms (e.g. "diffusion models", "reinforcement learning")
  - Search field: All fields, title only, abstract only, or author
  - Sort by: Newest first or by relevance
  - Max results: Number of papers per query
- Click Start
- Download results as JSON, CSV, or Excel
Example output (JSON)
    {
      "arxivId": "2604.18584",
      "title": "MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval",
      "authors": ["Shaden Alshammari", "Kevin Wen", "Antonio Torralba"],
      "abstract": "Mathematical problem solving remains a challenging test of reasoning...",
      "categories": ["cs.AI", "cs.DL", "cs.IR", "cs.LG"],
      "primaryCategory": "cs.AI",
      "submittedDate": "20 April, 2026",
      "comments": "ICLR 2026; Website: http://mathnet.mit.edu",
      "journalRef": "Proceedings of ICLR, 2026",
      "pdfUrl": "https://arxiv.org/pdf/2604.18584",
      "url": "https://arxiv.org/abs/2604.18584",
      "query": "large language models",
      "extractedAt": "2026-04-21T12:00:00.000Z"
    }
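Once results are downloaded as JSON, a little post-processing is often useful. The sketch below assumes the download is a JSON array of records shaped like the example, and that the same paper can appear more than once when several search queries match it (an assumption, not documented behavior). The function names are hypothetical.

```python
import json
from collections import Counter


def dedupe_by_arxiv_id(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each arxivId, preserving input order."""
    first_seen: dict[str, dict] = {}
    for record in records:
        first_seen.setdefault(record["arxivId"], record)
    return list(first_seen.values())


def count_by_primary_category(records: list[dict]) -> Counter:
    """Count papers per primaryCategory, e.g. to spot which areas dominate."""
    return Counter(record["primaryCategory"] for record in records)


# Inline sample standing in for a downloaded results file.
sample = json.loads("""
[
  {"arxivId": "2604.18584", "primaryCategory": "cs.AI", "query": "large language models"},
  {"arxivId": "2604.18584", "primaryCategory": "cs.AI", "query": "benchmarks"}
]
""")
unique_papers = dedupe_by_arxiv_id(sample)
```

Deduplicating before counting keeps a paper matched by several queries from inflating its category.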
Pricing
This actor charges $0.002 USD per paper extracted. Extracting 100 papers costs approximately $0.20 USD.
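The arithmetic can be sketched as a small estimator. It assumes the flat $0.002 per-paper rate stated above and treats "Max results" as a per-query cap, so the figure is an upper bound: a query that returns fewer papers costs less. The function name is hypothetical, and any platform fees or plan-specific rates are not modeled.

```python
from decimal import Decimal

PRICE_PER_PAPER_USD = Decimal("0.002")  # per-paper rate stated in the pricing text


def estimate_max_cost_usd(num_queries: int, max_results_per_query: int) -> Decimal:
    """Upper-bound cost, assuming every query returns its full max results."""
    if num_queries < 0 or max_results_per_query < 0:
        raise ValueError("counts must be non-negative")
    return PRICE_PER_PAPER_USD * num_queries * max_results_per_query


# One query capped at 100 papers: 100 * 0.002 = 0.20 USD.
one_query_cost = estimate_max_cost_usd(1, 100)
```

`Decimal` is used instead of `float` so that small per-paper amounts add up exactly.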
Keywords
arxiv scraper, scientific paper extractor, research paper scraper, arxiv API, AI paper scraper, academic data extractor, preprint scraper, NLP research data, LLM training data, arxiv search scraper
Legal Disclaimer
This actor extracts publicly available open-access data only from ArXiv.org, in compliance with Chilean Law 19.628 on the Protection of Private Life (Ley 19.628 sobre Protección de la Vida Privada).
ArXiv is an open-access repository operated by Cornell University. All papers and metadata extracted are freely and publicly accessible without authentication.
What this actor does NOT collect:
- Names or personal data of any private individuals
- User accounts, submissions portals, or private information
- Any data not freely visible to anonymous visitors
What this actor collects:
- Paper titles, abstracts, and author names (public academic data)
- Subject categories and submission dates
- Public URLs and PDF links
Users are solely responsible for ensuring their use of this data complies with applicable laws and ArXiv's terms of use.