arXiv Paper Scraper — Search, New Submissions & Author Papers

Pricing

Pay per usage

Scrape arXiv.org for academic papers: keyword search, new daily submissions by category, paper details by ID, and author publications. Extracts titles, abstracts, authors, categories, PDF links, and DOIs. Uses the official arXiv API, so it is fast and needs no browser.

Developer

OpenClaw Mara

Maintained by Community

arXiv Paper Scraper

Scrape academic papers from arXiv, an open-access preprint server with 2.4M+ papers across physics, mathematics, computer science, and more. Uses the official arXiv API for fast, structured paper extraction.

What can it do?

  • Search papers: Keyword search with an optional category filter and sorting by relevance or date
  • New submissions: Today's freshly submitted papers by category
  • Paper details: Full metadata for specific papers by arXiv ID
  • Author papers: All publications on arXiv by a given researcher

Why use this scraper?

  • 📄 Open access: Every paper on arXiv is free, with direct PDF links
  • 🔬 Cutting-edge research: Preprints often appear on arXiv before journal publication
  • 🏷️ Category system: 150+ categories from cs.AI to quant-ph
  • API-based: Official arXiv API, no browser automation
  • 📊 Structured output: Authors, abstracts, categories, DOIs, journal references

Input examples

Search for papers

{
  "mode": "search",
  "searchQuery": "large language models",
  "maxResults": 50,
  "sortBy": "submittedDate",
  "sortOrder": "descending"
}

Search within a category

{
  "mode": "search",
  "searchQuery": "reinforcement learning",
  "category": "cs.LG",
  "maxResults": 30
}
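
For orientation, here is a minimal sketch of how a search input like the ones above could map onto a request to the public arXiv API. The endpoint and the `search_query`, `start`, `max_results`, `sortBy`, and `sortOrder` parameters belong to the arXiv API; the function name `build_search_url`, the phrase quoting, and the fallback defaults are assumptions made for illustration and may not match what this scraper does internally.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"


def build_search_url(run_input: dict) -> str:
    """Hypothetical mapping from a search-mode input to an arXiv API URL."""
    # "all:" searches every metadata field; quotes make it a phrase match.
    query = f'all:"{run_input["searchQuery"]}"'
    category = run_input.get("category")
    if category:
        # "cat:" restricts results to one subject category, e.g. cs.LG.
        query += f" AND cat:{category}"
    params = {
        "search_query": query,
        "start": 0,
        # Fallbacks below mirror the arXiv API defaults, not the actor's.
        "max_results": run_input.get("maxResults", 10),
        "sortBy": run_input.get("sortBy", "relevance"),
        "sortOrder": run_input.get("sortOrder", "descending"),
    }
    return f"{ARXIV_API}?{urlencode(params)}"


print(build_search_url({
    "mode": "search",
    "searchQuery": "reinforcement learning",
    "category": "cs.LG",
    "maxResults": 30,
}))
```

Opening the printed URL in a browser returns an Atom feed, which is a quick way to check what a query matches before starting a run.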

Today's new submissions

{
  "mode": "new_submissions",
  "category": "cs.AI",
  "maxResults": 100
}

Get specific papers by ID

{
  "mode": "paper_details",
  "arxivIds": ["1706.03762", "2301.00234", "2005.14165"]
}
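
The arXiv API can look up several papers in one request through its `id_list` parameter. The sketch below shows one way the `arxivIds` array could be turned into such a request; the helper name `build_id_lookup_url` is hypothetical, and whether the scraper batches IDs this way is an assumption.

```python
from urllib.parse import urlencode


def build_id_lookup_url(arxiv_ids: list[str]) -> str:
    """Hypothetical helper: one arXiv API request for a batch of IDs."""
    params = {
        # id_list takes a comma-separated list of arXiv identifiers.
        "id_list": ",".join(arxiv_ids),
        # Ask for as many results as IDs, since the API default is 10.
        "max_results": len(arxiv_ids),
    }
    return "http://export.arxiv.org/api/query?" + urlencode(params)


print(build_id_lookup_url(["1706.03762", "2301.00234", "2005.14165"]))
```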

Papers by an author

{
  "mode": "author",
  "authorName": "Yann LeCun",
  "maxResults": 50
}
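
Each mode above relies on a different field, so it can help to check an input locally before starting a run. The following is a minimal sketch under stated assumptions: the mode and field names come from the examples above, but which fields are strictly required (for instance, `category` in `new_submissions` mode) is a guess, and `validate_input` is a hypothetical helper rather than part of the scraper.

```python
# Field assumed to be required for each mode (inferred from the examples).
REQUIRED_FIELDS = {
    "search": ["searchQuery"],
    "new_submissions": ["category"],
    "paper_details": ["arxivIds"],
    "author": ["authorName"],
}


def validate_input(run_input: dict) -> list[str]:
    """Return a list of problems; an empty list means the input looks usable."""
    mode = run_input.get("mode")
    if mode not in REQUIRED_FIELDS:
        return [f"unknown mode: {mode!r}"]
    # Missing keys, empty strings, and empty lists all count as missing.
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS[mode]
        if not run_input.get(name)
    ]
    max_results = run_input.get("maxResults")
    if max_results is not None and (
        not isinstance(max_results, int) or max_results < 1
    ):
        problems.append("maxResults must be a positive integer")
    return problems


print(validate_input({"mode": "author", "authorName": "Yann LeCun", "maxResults": 50}))
print(validate_input({"mode": "paper_details"}))
```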

Output example

Search result

{
  "arxivId": "1706.03762",
  "title": "Attention Is All You Need",
  "abstract": "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks...",
  "authors": ["Ashish Vaswani", "Noam Shazeer", "Niki Parmar", "Jakob Uszkoreit"],
  "primaryCategory": "cs.CL",
  "categories": ["cs.CL", "cs.LG"],
  "published": "2017-06-12T17:57:34Z",
  "updated": "2023-08-02T00:00:00Z",
  "pdfUrl": "http://arxiv.org/pdf/1706.03762v7",
  "htmlUrl": "http://arxiv.org/abs/1706.03762v7",
  "doi": "10.48550/arXiv.1706.03762",
  "comment": "15 pages, 5 tables",
  "journalRef": "Advances in Neural Information Processing Systems 30 (2017)"
}
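
The arXiv API returns results as an Atom feed, so a record like the one above has to be assembled from XML elements. The sketch below shows one plausible mapping using only the standard library. The two namespaces and the element names are those of the arXiv Atom format; `SAMPLE_ENTRY` is a hand-written, shortened entry rather than a captured API response, and `entry_to_record` is a hypothetical helper, not the scraper's actual code.

```python
import re
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}

# Hand-written sample for illustration; not a real API response.
SAMPLE_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:arxiv="http://arxiv.org/schemas/atom">
  <id>http://arxiv.org/abs/1706.03762v7</id>
  <updated>2023-08-02T00:00:00Z</updated>
  <published>2017-06-12T17:57:34Z</published>
  <title>Attention Is All You Need</title>
  <summary>The dominant sequence transduction models...</summary>
  <author><name>Ashish Vaswani</name></author>
  <author><name>Noam Shazeer</name></author>
  <link title="pdf" href="http://arxiv.org/pdf/1706.03762v7" rel="related"/>
  <arxiv:primary_category term="cs.CL"/>
  <category term="cs.CL"/>
  <category term="cs.LG"/>
</entry>"""


def text(entry, path):
    """Return the stripped text of the first match, or None if absent."""
    node = entry.find(path, NS)
    return node.text.strip() if node is not None and node.text else None


def entry_to_record(entry):
    abs_url = text(entry, "atom:id")
    pdf = entry.find("atom:link[@title='pdf']", NS)
    primary = entry.find("arxiv:primary_category", NS)
    return {
        # Drop the URL prefix and the trailing version suffix (e.g. "v7").
        "arxivId": re.sub(r"v\d+$", "", abs_url.rsplit("/abs/", 1)[-1]),
        "title": text(entry, "atom:title"),
        "abstract": text(entry, "atom:summary"),
        "authors": [a.text for a in entry.findall("atom:author/atom:name", NS)],
        "primaryCategory": primary.get("term") if primary is not None else None,
        "categories": [c.get("term") for c in entry.findall("atom:category", NS)],
        "published": text(entry, "atom:published"),
        "updated": text(entry, "atom:updated"),
        "pdfUrl": pdf.get("href") if pdf is not None else None,
        "htmlUrl": abs_url,
        # Optional elements: None when the entry does not carry them.
        "doi": text(entry, "arxiv:doi"),
        "comment": text(entry, "arxiv:comment"),
        "journalRef": text(entry, "arxiv:journal_ref"),
    }


record = entry_to_record(ET.fromstring(SAMPLE_ENTRY))
print(record["arxivId"], record["authors"], record["pdfUrl"])
```

Fields such as `doi`, `comment`, and `journalRef` are optional in the feed, so downstream code should expect them to be missing for some papers.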

Tips

  • Popular CS categories: cs.AI (AI), cs.LG (Machine Learning), cs.CL (NLP), cs.CV (Computer Vision), cs.SE (Software Engineering)
  • new_submissions scrapes the daily RSS feed — great for monitoring research trends
  • arXiv IDs can use the four-digit form issued from 2007 through 2014 (0704.0001) or the current five-digit form (2301.00234)
  • Sort by relevance for keyword matching, submittedDate for latest papers
  • Combine with a Semantic Scholar scraper for citation data (arXiv doesn't provide citation counts)
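
To go with the tip on ID formats, here is a small sketch for checking identifiers before passing them in `arxivIds`. It recognizes the two numeric forms from the tip, plus the `archive/YYMMNNN` form that arXiv used before April 2007 (for example `hep-th/9901001`). Whether the scraper accepts that earliest form is not stated here, and the function name and returned labels are hypothetical.

```python
import re

# YYMM.NNNN (2007-2014) or YYMM.NNNNN (2015 onward), optional version suffix.
NUMERIC_ID = re.compile(r"^(\d{2})(\d{2})\.(\d{4,5})(v\d+)?$")
# archive[.SUBJECT]/YYMMNNN, the scheme used before April 2007.
ARCHIVE_ID = re.compile(r"^[a-z-]+(\.[A-Z]{2})?/\d{7}(v\d+)?$")


def classify_arxiv_id(arxiv_id: str) -> str:
    """Return a descriptive label for the ID's shape, or "invalid"."""
    candidate = arxiv_id.strip()
    match = NUMERIC_ID.match(candidate)
    if match:
        month = int(match.group(2))
        if 1 <= month <= 12:
            # The label depends only on the length of the sequence number.
            return "YYMM.NNNN" if len(match.group(3)) == 4 else "YYMM.NNNNN"
        return "invalid"
    if ARCHIVE_ID.match(candidate):
        return "archive/YYMMNNN"
    return "invalid"


for example in ["0704.0001", "2301.00234v2", "hep-th/9901001", "1313.0001"]:
    print(example, "->", classify_arxiv_id(example))
```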