arXiv Scraper - Scientific Papers, Abstracts & PDFs

arXiv Scraper for the official arXiv API. Search 2M+ scientific papers in CS, physics, math and biology by keyword, title, author, abstract or category. Extract title, authors, abstract, categories, DOI, dates and PDF links. For AI/ML research, literature reviews and RAG datasets.

Pricing: Pay per usage

Rating: 0.0 (0)

Developer: ben (Maintained by Community)

Actor stats: 0 bookmarked, 3 total users, 1 monthly active user, last modified 14 hours ago

arXiv Scraper — Scientific Papers, Abstracts & PDFs

Search arXiv.org — 2M+ open-access scientific papers in physics, CS, math, biology, economics and more — via the official arXiv API.

Built for AI/ML research, literature reviews, RAG datasets, and research analytics. Keyless, fast and reliable — no proxy or browser needed.

What you get

Per paper:

  • title, arxiv_id
  • authors, author_count
  • abstract (full text)
  • primary_category, categories
  • published, updated
  • doi, journal_ref, comment
  • pdf_url, abstract_url
  • scraped_at

Why this Actor

Feature | arXiv Scraper | Manual search | Raw arXiv API
Clean flat JSON output | Yes | | Atom XML to parse
Search + filters + paging | Yes | Slow | DIY
PDF + abstract links | Yes | Manual | Yes
Pay per result | Yes | |
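To make the "Atom XML to parse" row concrete, here is a minimal sketch of what flattening a raw arXiv API response could involve. It is not the Actor's own code. The sample feed is hand-written for illustration, and the helper names (`parse_feed`, `_text`) are hypothetical. The output keys follow the field list under "What you get".

```python
import xml.etree.ElementTree as ET

# XML namespaces used by arXiv API Atom responses.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}

# Hand-written, trimmed entry for illustration only (not real API output).
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:arxiv="http://arxiv.org/schemas/atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <title>An  Example
      Title</title>
    <summary>Example abstract text.</summary>
    <published>2020-01-01T00:00:00Z</published>
    <updated>2020-01-02T00:00:00Z</updated>
    <author><name>A. Author</name></author>
    <author><name>B. Author</name></author>
    <arxiv:primary_category term="cs.LG"/>
    <category term="cs.LG"/>
    <category term="cs.AI"/>
    <link href="http://arxiv.org/abs/0000.00000v1" rel="alternate" type="text/html"/>
    <link title="pdf" href="http://arxiv.org/pdf/0000.00000v1" rel="related" type="application/pdf"/>
  </entry>
</feed>"""


def _text(node, path):
    """Return the whitespace-normalised text of a child element, or None."""
    found = node.find(path, NS)
    if found is None or not found.text:
        return None
    return " ".join(found.text.split())


def parse_feed(xml_text):
    """Flatten every <entry> of an Atom feed into a plain dict."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.findall("atom:entry", NS):
        abstract_url = _text(entry, "atom:id")
        authors = [
            " ".join(name.text.split())
            for name in entry.findall("atom:author/atom:name", NS)
            if name.text
        ]
        primary = entry.find("arxiv:primary_category", NS)
        pdf_link = entry.find("atom:link[@title='pdf']", NS)
        papers.append({
            "arxiv_id": abstract_url.rsplit("/abs/", 1)[-1] if abstract_url else None,
            "title": _text(entry, "atom:title"),
            "authors": authors,
            "author_count": len(authors),
            "abstract": _text(entry, "atom:summary"),
            "primary_category": primary.get("term") if primary is not None else None,
            "categories": [c.get("term") for c in entry.findall("atom:category", NS)],
            "published": _text(entry, "atom:published"),
            "updated": _text(entry, "atom:updated"),
            "doi": _text(entry, "arxiv:doi"),
            "journal_ref": _text(entry, "arxiv:journal_ref"),
            "comment": _text(entry, "arxiv:comment"),
            "pdf_url": pdf_link.get("href") if pdf_link is not None else None,
            "abstract_url": abstract_url,
        })
    return papers
```

Calling `parse_feed(SAMPLE_FEED)` returns a one-element list. Optional fields that are absent from the entry, such as `doi`, come back as `None`.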

Input

Use the simple fields, or a raw searchQuery for full arXiv syntax.

Field | Type | Description
allFields | string | Keyword across title/abstract/authors
title | string | Title contains
author | string | Author name
abstract | string | Abstract contains
category | string | arXiv category (e.g. cs.LG, cs.CL, cs.AI)
searchQuery | string | Advanced raw query (overrides the above)
sortBy | string | Relevance / Newest / Recently updated
maxResults | integer | Max papers to return
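One plausible way the simple fields could translate into an arXiv query string is sketched below. The field prefixes (`all`, `ti`, `au`, `abs`, `cat`) come from the arXiv API query syntax. Everything else is an assumption made for illustration, not the Actor's documented behaviour: the function name `build_search_query`, joining terms with `AND`, and quoting multi-word values as phrases.

```python
# Simple input field -> arXiv query prefix.
PREFIXES = {
    "allFields": "all",
    "title": "ti",
    "author": "au",
    "abstract": "abs",
    "category": "cat",
}


def build_search_query(run_input: dict) -> str:
    """Build an arXiv search_query string from the Actor-style input fields.

    A non-empty searchQuery wins over the simple fields, mirroring the
    "overrides the above" note in the input table.
    """
    raw = (run_input.get("searchQuery") or "").strip()
    if raw:
        return raw

    terms = []
    for field, prefix in PREFIXES.items():
        value = (run_input.get(field) or "").strip().replace('"', "")
        if not value:
            continue
        # Assumption: multi-word values are meant as exact phrases.
        if " " in value:
            value = f'"{value}"'
        terms.append(f"{prefix}:{value}")

    if not terms:
        raise ValueError("provide at least one search field or a searchQuery")
    return " AND ".join(terms)
```

For example, `build_search_query({"abstract": "retrieval", "category": "cs.CL"})` yields `abs:retrieval AND cat:cs.CL`.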

Example: newest LLM papers

{
  "allFields": "large language models",
  "sortBy": "newest",
  "maxResults": 100
}

Example: a category, advanced syntax

{
  "searchQuery": "cat:cs.CL AND abs:retrieval augmented",
  "sortBy": "newest",
  "maxResults": 200
}
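For comparison, the same request made directly against the arXiv API might look like the sketch below. The endpoint and the `search_query`, `start`, `max_results`, `sortBy` and `sortOrder` parameters are part of the public arXiv API. The mapping from the Actor's sort labels to those parameters is an assumption, as is the helper name `build_request_url`.

```python
from urllib.parse import urlencode

API_URL = "http://export.arxiv.org/api/query"

# Assumed mapping from the Actor's sort labels to arXiv API parameters.
SORT_MAP = {
    "relevance": ("relevance", "descending"),
    "newest": ("submittedDate", "descending"),
    "recently updated": ("lastUpdatedDate", "descending"),
}


def build_request_url(search_query: str, sort_by: str = "relevance",
                      start: int = 0, max_results: int = 100) -> str:
    """Return a fully encoded arXiv API query URL."""
    key = sort_by.strip().lower()
    if key not in SORT_MAP:
        raise ValueError(f"unknown sortBy value: {sort_by!r}")
    sort_field, sort_order = SORT_MAP[key]
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
        "sortBy": sort_field,
        "sortOrder": sort_order,
    }
    # urlencode percent-encodes ':' and turns spaces into '+'.
    return f"{API_URL}?{urlencode(params)}"


url = build_request_url("cat:cs.CL AND abs:retrieval augmented",
                        sort_by="newest", max_results=200)
```

The response to such a URL is an Atom XML feed, which still has to be parsed before it resembles the flat JSON shown under "Sample output".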

Sample output

{
  "arxiv_id": "2605.30351v1",
  "title": "VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Video",
  "authors": ["Hidir Yesiltepe", "Jiazhen Hu"],
  "primary_category": "cs.CV",
  "categories": ["cs.CV", "cs.AI"],
  "published": "2026-05-28T17:59:57Z",
  "abstract": "Long-rollout causal video diffusion...",
  "pdf_url": "https://arxiv.org/pdf/2605.30351v1",
  "abstract_url": "https://arxiv.org/abs/2605.30351v1"
}
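Downstream code often needs the version-free arXiv ID and a real timestamp instead of strings. The sketch below shows one way to derive both from a record shaped like the sample above. The helper names (`split_arxiv_id`, `parse_timestamp`) are hypothetical, and the timestamp format is assumed from the single `published` value shown.

```python
import re
from datetime import datetime, timezone

# Optional trailing "vN" marks the paper version, e.g. "2605.30351v1".
_ID_RE = re.compile(r"^(?P<base>.+?)(?:v(?P<version>\d+))?$")


def split_arxiv_id(arxiv_id: str):
    """Split an arXiv ID into (base_id, version); version is None if absent."""
    match = _ID_RE.match(arxiv_id)
    if match is None:
        raise ValueError(f"not an arXiv ID: {arxiv_id!r}")
    version = match.group("version")
    return match.group("base"), int(version) if version else None


def parse_timestamp(value: str) -> datetime:
    """Parse a 'YYYY-MM-DDTHH:MM:SSZ' string into a UTC-aware datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


# Values copied from the sample output above.
record = {"arxiv_id": "2605.30351v1", "published": "2026-05-28T17:59:57Z"}
base_id, version = split_arxiv_id(record["arxiv_id"])
published = parse_timestamp(record["published"])
```

Keying a dataset on `base_id` rather than `arxiv_id` is one way to avoid storing several versions of the same paper as separate entries.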

Use cases

  • AI/ML research — track the latest papers in a field or category
  • RAG / LLM datasets — build corpora of abstracts + PDF links by topic
  • Literature reviews — gather and rank relevant papers fast
  • Research analytics — analyse output by category, author and time
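Two of the use cases above can be sketched in a few lines, assuming the run's dataset has already been loaded as a list of dicts in the sample-output shape. The function names and the demo records are illustrative only. The records are made up and are not real papers.

```python
from collections import Counter


def to_rag_document(record: dict) -> dict:
    """Turn one paper record into a text chunk plus metadata for indexing."""
    return {
        "id": record["arxiv_id"],
        "text": f'{record["title"]}\n\n{record["abstract"]}',
        "metadata": {
            "primary_category": record["primary_category"],
            "published": record["published"],
            "pdf_url": record["pdf_url"],
        },
    }


def count_by_primary_category(records: list) -> Counter:
    """Count papers per primary category."""
    return Counter(r["primary_category"] for r in records)


def count_by_month(records: list) -> Counter:
    """Count papers per 'YYYY-MM' month of first publication."""
    return Counter(r["published"][:7] for r in records)


# Made-up records for demonstration purposes.
demo_records = [
    {"arxiv_id": "0000.00001v1", "title": "Paper A", "abstract": "Abstract A.",
     "primary_category": "cs.CL", "published": "2020-01-05T00:00:00Z",
     "pdf_url": "https://arxiv.org/pdf/0000.00001v1"},
    {"arxiv_id": "0000.00002v1", "title": "Paper B", "abstract": "Abstract B.",
     "primary_category": "cs.CL", "published": "2020-02-10T00:00:00Z",
     "pdf_url": "https://arxiv.org/pdf/0000.00002v1"},
    {"arxiv_id": "0000.00003v1", "title": "Paper C", "abstract": "Abstract C.",
     "primary_category": "cs.LG", "published": "2020-02-11T00:00:00Z",
     "pdf_url": "https://arxiv.org/pdf/0000.00003v1"},
]

documents = [to_rag_document(r) for r in demo_records]
per_category = count_by_primary_category(demo_records)
per_month = count_by_month(demo_records)
```

Here the abstract serves as the retrievable text and the PDF link stays in the metadata, so full papers can be fetched later only for the hits that matter.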

Pricing

Pay-per-result. You are charged only for the papers returned — empty runs cost nothing.

  • Uses the official arXiv API. Please respect arXiv's API terms and rate limits (the Actor waits between requests).
  • Use data only for lawful purposes.
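The note about waiting between requests can be illustrated with a small paging loop. This is a sketch of the general pattern, not the Actor's implementation. The three-second default reflects arXiv's commonly cited guidance of at most one request every three seconds and should be checked against the current API terms. `fetch_page` is a hypothetical callable supplied by the caller.

```python
import time
from typing import Callable, Iterator, List, Tuple


def page_windows(max_results: int, page_size: int = 100) -> Iterator[Tuple[int, int]]:
    """Yield (start, count) pairs that cover max_results items."""
    start = 0
    while start < max_results:
        count = min(page_size, max_results - start)
        yield start, count
        start += count


def fetch_all(fetch_page: Callable[[int, int], List[dict]],
              max_results: int,
              page_size: int = 100,
              delay_seconds: float = 3.0,
              sleep: Callable[[float], None] = time.sleep) -> List[dict]:
    """Collect up to max_results items, pausing between consecutive requests."""
    results: List[dict] = []
    for index, (start, count) in enumerate(page_windows(max_results, page_size)):
        if index > 0:
            sleep(delay_seconds)  # be polite: never fire requests back to back
        batch = fetch_page(start, count)
        results.extend(batch)
        if len(batch) < count:
            break  # the source ran out of results early
    return results
```

Passing `sleep` as a parameter keeps the loop testable without real waiting. In production the default `time.sleep` applies.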
