ArXiv Paper Scraper

Pricing: from $2.00 / 1,000 results

Search and extract scientific papers from ArXiv.org across any field. Returns title, authors, full abstract, PDF link, arXiv ID, categories, and submission date. Ideal for AI research monitoring, RAG pipelines, literature reviews, and academic trend analysis. No API key needed.

Rating: 0.0 (0 reviews)

Developer: Sheshinmcfly (maintained by Community)

Actor stats: 1 bookmark, 2 total users, 0 monthly active users, last modified 5 hours ago

Search and extract scientific papers from ArXiv.org, one of the largest open-access repositories of preprints in physics, mathematics, computer science, AI, and more.

Each result includes the paper's arXiv ID, title, authors, abstract, categories, submission date, author comments, journal reference, and links to the PDF and abstract page. Useful for AI research pipelines, RAG systems, and academic trend monitoring.


What data does it extract?

| Field | Description | Example |
|---|---|---|
| arxivId | ArXiv paper ID | "2604.18584" |
| title | Full paper title | "MathNet: a Global Multimodal Benchmark..." |
| authors | List of authors | ["Shaden Alshammari", "Kevin Wen"] |
| abstract | Full abstract text | "Mathematical problem solving remains..." |
| categories | All ArXiv subject categories | ["cs.AI", "cs.LG", "cs.IR"] |
| primaryCategory | Primary category | "cs.AI" |
| submittedDate | Submission date (human-readable text) | "20 April, 2026" |
| comments | Author comments | "ICLR 2026; 30 pages" |
| journalRef | Journal reference | "Proceedings of ICLR, 2026" |
| pdfUrl | Direct PDF link | "https://arxiv.org/pdf/2604.18584" |
| url | ArXiv abstract page | "https://arxiv.org/abs/2604.18584" |
| query | Search query that returned the paper | "large language models" |
| extractedAt | Extraction timestamp (ISO 8601, UTC) | "2026-04-21T12:00:00Z" |
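In the examples above, `url` and `pdfUrl` are built directly from `arxivId`. The sketch below shows that relationship, assuming new-style identifiers of the form `YYMM.NNNN` or `YYMM.NNNNN` with an optional version suffix such as `v2`. The `paper_links` helper is hypothetical and not part of the actor, and it deliberately rejects old-style IDs such as `hep-th/9901001`.

```python
import re

# New-style arXiv identifiers: YYMM.NNNN (2007-2014) or YYMM.NNNNN (2015 onward),
# optionally followed by a version suffix such as "v2".
NEW_STYLE_ID = re.compile(r"^(\d{2})(\d{2})\.(\d{4,5})(v\d+)?$")


def paper_links(arxiv_id: str) -> dict[str, str]:
    """Return abstract-page and PDF URLs for a new-style arXiv ID.

    Raises ValueError for anything else, including old-style IDs,
    which this sketch does not handle.
    """
    match = NEW_STYLE_ID.match(arxiv_id)
    # Group 2 is the month part of YYMM; it must be 01-12.
    if match is None or not 1 <= int(match.group(2)) <= 12:
        raise ValueError(f"not a new-style arXiv ID: {arxiv_id!r}")
    return {
        "url": f"https://arxiv.org/abs/{arxiv_id}",
        "pdfUrl": f"https://arxiv.org/pdf/{arxiv_id}",
    }


print(paper_links("2604.18584"))
# {'url': 'https://arxiv.org/abs/2604.18584', 'pdfUrl': 'https://arxiv.org/pdf/2604.18584'}
```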

Use cases

  • RAG pipelines: Feed domain-specific papers into retrieval-augmented AI systems
  • AI research monitoring: Track the latest publications in LLMs, computer vision, and NLP
  • Academic trend analysis: Identify hot topics and emerging research areas
  • Literature review automation: Collect papers for a specific topic at scale
  • LLM training data: Scientific titles and abstracts as high-quality text for model training or fine-tuning
  • Competitive intelligence: Monitor what research competitors are publishing
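For the RAG use case, each record carries the `query` that returned it, so running several queries may (we assume) return the same paper more than once. A minimal sketch, assuming records shaped like the JSON example in this listing: keep the first record per `arxivId` and turn each paper into a text passage plus metadata. The `to_rag_documents` function and the sample records are illustrative only.

```python
def to_rag_documents(records: list[dict]) -> list[dict]:
    """Deduplicate scraper records by arxivId and build RAG-ready documents."""
    seen: set[str] = set()
    documents = []
    for record in records:
        paper_id = record["arxivId"]
        if paper_id in seen:
            continue  # same paper returned by another query; keep the first copy
        seen.add(paper_id)
        documents.append({
            "id": paper_id,
            # Title + abstract form the text that gets embedded and retrieved.
            "text": f"{record['title']}\n\n{record['abstract']}",
            "metadata": {
                "authors": record.get("authors", []),
                "categories": record.get("categories", []),
                "url": record.get("url"),
            },
        })
    return documents


# Hypothetical records: one paper returned by two different queries.
sample = [
    {"arxivId": "2604.18584", "title": "MathNet", "abstract": "Mathematical problem solving...", "query": "large language models"},
    {"arxivId": "2604.18584", "title": "MathNet", "abstract": "Mathematical problem solving...", "query": "benchmarks"},
]
docs = to_rag_documents(sample)
print(len(docs))  # 1
```

Deduplicating before embedding avoids storing the same abstract twice, which would otherwise crowd retrieval results with duplicates.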

How to use

  1. Open the actor and configure:
    • Search queries: One or more search terms (e.g. "diffusion models", "reinforcement learning")
    • Search field: All fields, title only, abstract only, or author
    • Sort by: Newest first or by relevance
    • Max results: Number of papers per query
  2. Click Start
  3. Download results as JSON, CSV, or Excel
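The same run can also be started from code. This sketch assumes the official `apify-client` Python package. The actor ID `sheshinmcfly/arxiv-paper-scraper`, the input keys (`searchQueries`, `searchField`, `sortBy`, `maxResults`), their allowed values, and the default of 50 results are all hypothetical, so check the actor's input schema for the real names before relying on it.

```python
# Hypothetical input keys and allowed values; confirm them in the actor's input schema.
SEARCH_FIELDS = {"all", "title", "abstract", "author"}
SORT_ORDERS = {"newest", "relevance"}


def build_run_input(
    queries: list[str],
    search_field: str = "all",
    sort_by: str = "newest",
    max_results: int = 50,
) -> dict:
    """Validate options and build a run-input dict for the scraper."""
    if not queries:
        raise ValueError("at least one search query is required")
    if search_field not in SEARCH_FIELDS:
        raise ValueError(f"unknown search field: {search_field!r}")
    if sort_by not in SORT_ORDERS:
        raise ValueError(f"unknown sort order: {sort_by!r}")
    if max_results < 1:
        raise ValueError("max_results must be positive")
    return {
        "searchQueries": list(queries),
        "searchField": search_field,
        "sortBy": sort_by,
        "maxResults": max_results,  # number of papers per query
    }


def run_scraper(token: str, run_input: dict,
                actor_id: str = "sheshinmcfly/arxiv-paper-scraper") -> list[dict]:
    """Start the actor, wait for it to finish, and return its dataset items."""
    # Imported here so build_run_input works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


run_input = build_run_input(["diffusion models", "reinforcement learning"], max_results=20)
print(run_input)
```

Keeping input validation separate from the network call means typos in option values fail immediately, before any paid run starts.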

Example output (JSON)

{
  "arxivId": "2604.18584",
  "title": "MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval",
  "authors": ["Shaden Alshammari", "Kevin Wen", "Antonio Torralba"],
  "abstract": "Mathematical problem solving remains a challenging test of reasoning...",
  "categories": ["cs.AI", "cs.DL", "cs.IR", "cs.LG"],
  "primaryCategory": "cs.AI",
  "submittedDate": "20 April, 2026",
  "comments": "ICLR 2026; Website: http://mathnet.mit.edu",
  "journalRef": "Proceedings of ICLR, 2026",
  "pdfUrl": "https://arxiv.org/pdf/2604.18584",
  "url": "https://arxiv.org/abs/2604.18584",
  "query": "large language models",
  "extractedAt": "2026-04-21T12:00:00.000Z"
}
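Because `submittedDate` is human-readable text such as "20 April, 2026" rather than an ISO date, sorting or filtering by date usually needs a conversion step. A minimal sketch, assuming every value follows the `D Month, YYYY` pattern shown above with English month names, and that a JSON export is a list of such objects. The added `submittedIso` field and both function names are hypothetical.

```python
import json
from datetime import datetime


def parse_submitted_date(value: str) -> str:
    """Convert e.g. "20 April, 2026" to the ISO date "2026-04-20"."""
    # %B matches full English month names under Python's default C locale.
    return datetime.strptime(value, "%d %B, %Y").date().isoformat()


def load_papers(raw_json: str) -> list[dict]:
    """Parse a JSON export (a list of records) and add an ISO submission date."""
    papers = json.loads(raw_json)
    for paper in papers:
        paper["submittedIso"] = parse_submitted_date(paper["submittedDate"])
    # ISO dates sort correctly as plain strings; newest first.
    return sorted(papers, key=lambda p: p["submittedIso"], reverse=True)


export = '[{"arxivId": "2604.18584", "submittedDate": "20 April, 2026"}]'
print(load_papers(export)[0]["submittedIso"])  # 2026-04-20
```

For a downloaded file, pass its contents, e.g. `load_papers(open("dataset.json").read())`, where `dataset.json` is a placeholder name.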

Pricing

This actor charges $0.002 USD per paper extracted. At that rate, extracting 100 papers costs $0.20 USD and 1,000 papers cost $2.00 USD.
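The cost is linear in the number of papers: cost = papers × $0.002. A small estimator, assuming the flat per-paper rate stated here; the "from" in the listing's price suggests other rates may apply, which this sketch ignores. `Decimal` is used so cents are not distorted by floating-point rounding.

```python
from decimal import Decimal

PRICE_PER_PAPER = Decimal("0.002")  # USD per paper, the rate stated above


def estimate_cost(papers: int) -> Decimal:
    """Estimated charge in USD for extracting `papers` results at the flat rate."""
    if papers < 0:
        raise ValueError("paper count cannot be negative")
    return PRICE_PER_PAPER * papers


for n in (100, 1_000, 25_000):
    print(n, estimate_cost(n))
```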


Keywords

arxiv scraper, scientific paper extractor, research paper scraper, arxiv API, AI paper scraper, academic data extractor, preprint scraper, NLP research data, LLM training data, arxiv search scraper


This actor extracts only publicly available, open-access data from ArXiv.org, in compliance with Chilean Law 19.628 on the Protection of Private Life (Ley 19.628 sobre Protección de la Vida Privada).

ArXiv is an open-access repository operated by Cornell University. All extracted papers and metadata are freely and publicly accessible without authentication.

What this actor does NOT collect:

  • Names or personal data of private individuals (author names are collected only as they appear in published paper metadata)
  • User accounts, the submission portal, or other private information
  • Any data not freely visible to anonymous visitors

What this actor collects:

  • Paper titles, abstracts, and author names (public academic data)
  • Subject categories and submission dates
  • Public URLs and PDF links

Users are solely responsible for ensuring their use of this data complies with applicable laws and ArXiv's terms of use.