Pricing

from $2.00 / 1,000 results

🔬 arXiv Scraper - Scientific Papers, Abstracts & PDFs

arXiv Scraper for the official arXiv API. Search 2M+ scientific papers in CS, physics, math and biology by keyword, title, author, abstract or category. Extract title, authors, abstract, categories, DOI, dates and PDF links. For AI/ML research, literature reviews and RAG datasets.

Developer

Ben

Last modified

12 days ago

🔬 arXiv Scraper — Scientific Papers, Abstracts & PDF Links as Clean Data

Extract scientific papers from arXiv.org as clean, structured records through the official arXiv API. arXiv hosts over 2 million open-access preprints in physics, computer science, math, biology and economics. Search by keyword, title, author, abstract or category (for example cs.LG) and get titles, authors, full abstracts, categories, DOIs, journal references and direct PDF links. No API key, proxy or browser is required. It is well suited to AI/ML research and to building RAG and LLM datasets from topic-filtered abstracts. Export to JSON, CSV or Excel, run on a schedule, call via API, or connect to Make, Zapier or n8n.

🔬 What is the arXiv Scraper?

It turns any arXiv search into a structured dataset. Use the simple fields (keyword, title, author, abstract and category) or a raw advanced query for full arXiv syntax, and it returns every matching paper, up to your maxResults cap, with authors, full abstract, categories, DOI, journal reference and direct links to the abstract page and PDF. Everything is flattened into clean rows so you can drop them straight into a literature review, research tracker or vector database. It parses arXiv's Atom feed, paginates automatically and pauses between requests as arXiv asks, so a single run can collect hundreds of papers without manual paging.
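The parse, paginate and wait loop described above can be sketched in a few lines of standard-library Python. This is an illustration only, not the Actor's actual source: the function names `parse_feed` and `fetch_all`, the page size and the three-second delay are assumptions, while the endpoint, the `start` / `max_results` parameters and the Atom namespaces come from the public arXiv API.

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
ARXIV = "{http://arxiv.org/schemas/atom}"
API_URL = "https://export.arxiv.org/api/query"


def _clean(text):
    """Collapse the line breaks and runs of spaces found in feed text."""
    return " ".join((text or "").split())


def parse_feed(xml_data):
    """Turn one Atom response (str or bytes) into a list of flat dicts."""
    root = ET.fromstring(xml_data)
    papers = []
    for entry in root.findall(ATOM + "entry"):
        abs_url = _clean(entry.findtext(ATOM + "id"))
        primary = entry.find(ARXIV + "primary_category")
        pdf_links = [link.get("href") for link in entry.findall(ATOM + "link")
                     if link.get("title") == "pdf"]
        papers.append({
            "arxiv_id": abs_url.rsplit("/abs/", 1)[-1],
            "title": _clean(entry.findtext(ATOM + "title")),
            "authors": [_clean(a.findtext(ATOM + "name"))
                        for a in entry.findall(ATOM + "author")],
            "abstract": _clean(entry.findtext(ATOM + "summary")),
            "primary_category": primary.get("term") if primary is not None else None,
            "doi": entry.findtext(ARXIV + "doi"),  # None when the author gave no DOI
            "pdf_url": pdf_links[0] if pdf_links else None,
            "abstract_url": abs_url,
        })
    return papers


def fetch_all(search_query, max_results=100, page_size=50, delay_seconds=3.0):
    """Page through results with start/max_results, pausing between calls."""
    papers = []
    while len(papers) < max_results:
        if papers:
            time.sleep(delay_seconds)  # assumed delay between successive requests
        params = urllib.parse.urlencode({
            "search_query": search_query,
            "start": len(papers),
            "max_results": min(page_size, max_results - len(papers)),
        })
        with urllib.request.urlopen(f"{API_URL}?{params}", timeout=30) as resp:
            page = parse_feed(resp.read())
        if not page:
            break  # the feed ran out of matches before the cap was reached
        papers.extend(page)
    return papers
```

Keeping `parse_feed` separate from the network loop means the parsing can be checked against a saved feed without calling arXiv.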

What data does it extract?

Identifier — arxiv_id
Title — the paper title
Authors — authors and author_count
Abstract — full abstract text
Categories — primary_category and categories
Dates — published and updated
Publication info — doi, journal_ref and comment
Links — pdf_url and abstract_url
Run metadata — scraped_at

⬇️ Input

Use the simple fields, or a raw searchQuery for full arXiv syntax:

| Field | Description |
| --- | --- |
| `allFields` | Keyword across title, abstract, authors and more |
| `title` | Only papers whose title contains this term |
| `author` | Filter by author name |
| `abstract` | Only papers whose abstract contains this term |
| `category` | arXiv category code, e.g. `cs.LG`, `cs.CL`, `cs.AI`, `physics.optics`, `math.PR` |
| `searchQuery` | Advanced raw query (overrides the simple fields above), e.g. `all:transformer AND cat:cs.LG` |
| `sortBy` | `relevance`, `newest` (submitted) or `updated` (recently updated) |
| `maxResults` | Cap the run (1–2000) |

Example input

{
  "allFields": "large language models",
  "category": "cs.CL",
  "sortBy": "newest",
  "maxResults": 100
}
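To make the relationship between the simple fields and raw arXiv syntax concrete, here is one plausible mapping from an input like the one above to arXiv API parameters. It is a sketch under assumptions: the helper name `build_query_params`, the AND-joining of fields and the phrase quoting are illustrative guesses, not the Actor's documented behaviour. The field prefixes (`all`, `ti`, `au`, `abs`, `cat`) and the `sortBy` / `sortOrder` values are those of the arXiv API itself.

```python
# Assumed mapping from the Actor's simple fields to arXiv field prefixes.
SIMPLE_FIELD_PREFIXES = {
    "allFields": "all",
    "title": "ti",
    "author": "au",
    "abstract": "abs",
    "category": "cat",
}

# Assumed mapping from the Actor's sortBy options to arXiv sortBy values.
SORT_PARAMS = {
    "relevance": "relevance",
    "newest": "submittedDate",
    "updated": "lastUpdatedDate",
}


def build_query_params(actor_input):
    """Build arXiv API parameters from Actor-style input (hypothetical helper)."""
    raw = (actor_input.get("searchQuery") or "").strip()
    if raw:
        search_query = raw  # a raw query overrides the simple fields
    else:
        clauses = []
        for field, prefix in SIMPLE_FIELD_PREFIXES.items():
            value = (actor_input.get(field) or "").strip()
            if not value:
                continue
            if " " in value:
                value = f'"{value}"'  # quote multi-word values as a phrase
            clauses.append(f"{prefix}:{value}")
        search_query = " AND ".join(clauses)
    return {
        "search_query": search_query,
        "sortBy": SORT_PARAMS[actor_input.get("sortBy", "relevance")],
        "sortOrder": "descending",
    }
```

Under these assumptions, the example input would become the query `all:"large language models" AND cat:cs.CL`, sorted by `submittedDate` in descending order.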

⬆️ Output

Every paper is one clean row (view as a table, or export JSON / CSV / Excel):

{
  "arxiv_id": "2605.30351v1",
  "title": "VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Video Generation",
  "authors": ["Hidir Yesiltepe", "Jiazhen Hu"],
  "author_count": 2,
  "abstract": "Long-rollout causal video diffusion models accumulate memory cost...",
  "primary_category": "cs.CV",
  "categories": ["cs.CV", "cs.AI"],
  "published": "2026-05-28T17:59:57Z",
  "updated": "2026-05-28T17:59:57Z",
  "doi": null,
  "journal_ref": null,
  "comment": "10 pages, 6 figures",
  "pdf_url": "https://arxiv.org/pdf/2605.30351v1",
  "abstract_url": "https://arxiv.org/abs/2605.30351v1",
  "scraped_at": "2026-06-26T15:30:00.000Z"
}
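Records of this shape are straightforward to post-process. The sketch below shows one way to prepare them for a vector database: it keeps only the most recently updated version of each paper and joins title and abstract into a single text field. The helper names `base_id` and `to_documents` and the output layout are hypothetical; the input keys are the fields listed in the record above.

```python
import re


def base_id(arxiv_id):
    """Drop a trailing version suffix such as 'v1' from an arXiv identifier."""
    return re.sub(r"v\d+$", "", arxiv_id)


def to_documents(records):
    """Keep the latest version of each paper and build text + metadata pairs."""
    latest = {}
    for rec in records:
        key = base_id(rec["arxiv_id"])
        # Timestamps in the same ISO 8601 UTC format compare correctly as strings.
        if key not in latest or rec["updated"] > latest[key]["updated"]:
            latest[key] = rec
    return [
        {
            "id": key,
            "text": f'{rec["title"]}\n\n{rec["abstract"]}',
            "metadata": {
                "arxiv_id": rec["arxiv_id"],
                "primary_category": rec.get("primary_category"),
                "pdf_url": rec.get("pdf_url"),
                "doi": rec.get("doi"),  # stays None for unpublished preprints
            },
        }
        for key, rec in latest.items()
    ]
```

Deduplicating on the version-less identifier matters when the same paper appears in several runs, since each revision carries its own `arxiv_id` suffix.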

💡 Use cases

🤖 AI/ML research: track the newest papers in a field or category like cs.LG, cs.CL or cs.AI and never miss a release.
🧠 RAG & LLM datasets: build topic-filtered corpora of full abstracts plus PDF links to index in a vector database.
🔍 Literature reviews: gather, sort and export every relevant paper on a subject in one run.
📊 Research analytics: analyse output by category, author and time, and link papers to their published DOIs.

❓ FAQ

How do I search arXiv with this Actor? Fill in the simple fields — allFields, title, author, abstract or category — choose a sortBy order, then Run. You get structured papers with title, authors, full abstract, categories, DOI and PDF links.

Can I use advanced arXiv query syntax? Yes. Put a raw query in searchQuery, for example cat:cs.CL AND abs:"retrieval augmented" or all:transformer AND cat:cs.LG, and it overrides the simple fields for full control. Wrap multi-word phrases in double quotes so arXiv matches them as a phrase.

What are arXiv categories and how do I use them? Categories are arXiv's subject codes, such as cs.LG (machine learning), cs.CL (NLP), cs.AI, physics.optics or math.PR. Put one in the category field to restrict results to that area.

Do I need an API key? No. The Actor uses the free, official arXiv API with no key, login, proxy or browser. It waits a few seconds between requests, as arXiv asks clients to do.

Does it include full abstracts and PDF links? Yes. Every record contains the complete abstract text plus a pdf_url and an abstract_url, which makes it well suited to building RAG and LLM datasets.

Can I get the newest papers first? Yes. Use sortBy: newest to sort by submission date or sortBy: updated for recently revised papers; relevance uses arXiv's default ranking.

How many papers can it return? Up to your maxResults cap (1–2000). It paginates through the results automatically.

Why are some fields like doi or journal_ref empty? Many arXiv papers are preprints that have not been formally published, so they have no doi or journal_ref, and comment is an optional note that authors may leave out. The Actor returns these fields when the author provides them and null otherwise.

Can I run it on a schedule or via API? Yes. Schedule recurring runs in Apify, call it via the API/SDK, or connect it to Make, Zapier or n8n to push results into your own tools.
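As a rough illustration of calling the Actor over HTTP, the sketch below uses only the Python standard library and Apify's synchronous run endpoint, which starts a run and returns its dataset items in one response. The actor ID and token shown in the usage note are placeholders, not this Actor's real identifier, and the helper names are hypothetical. The synchronous endpoint suits short runs; for large maxResults values an asynchronous run or the official client library may be a better fit.

```python
import json
import urllib.request

APIFY_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id, token, actor_input):
    """Prepare a POST that runs an Actor and returns its dataset items."""
    # The API expects 'username~actor-name' rather than 'username/actor-name'.
    api_actor_id = actor_id.replace("/", "~")
    return urllib.request.Request(
        f"{APIFY_BASE}/acts/{api_actor_id}/run-sync-get-dataset-items",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_actor(actor_id, token, actor_input, timeout=600):
    """Send the request and decode the JSON array of result records."""
    request = build_run_request(actor_id, token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())
```

A call might look like `run_actor("<username>/<actor-name>", "<APIFY_TOKEN>", {"category": "cs.LG", "sortBy": "newest", "maxResults": 50})`, with both placeholders replaced by your own values.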

Is scraping arXiv legal? The Actor uses the official arXiv API, which is provided for programmatic access to public papers. Use it responsibly, respect arXiv's API terms and rate limits, and follow applicable laws.

🔗 You might also like

OpenAlex Scraper — academic papers & citations
PubMed Scraper — biomedical literature & citations
GitHub Repository Intelligence — repo metadata, stars & activity

Keywords: arXiv scraper, arXiv API, scientific papers scraper, preprints, research papers, abstracts, PDF links, AI ML research, RAG dataset, LLM dataset, cs.LG, cs.CL, literature review tool, research metadata, paper search.

arXiv Scraper - Research Papers & Abstracts

antishock/arxiv-paper-scraper

Scrape arXiv research papers by keyword or category. Extract titles, authors, abstracts, dates, and DOIs. Perfect for academic research, literature reviews, and AI/ML paper discovery.

Ryan Zinburg

arXiv Papers Scraper

troy_007/arxiv-papers-scraper

Search and export arXiv research papers by query, category, or author — title, abstract, authors, categories, dates, PDF link, and DOI. Uses the official arXiv API.

Pathik Shah

arXiv Papers Scraper — AI & Research by Keyword or Category

hichemdev/arxiv-papers-scraper

Scrape arXiv research papers by keyword or category: title, authors, abstract, dates, categories, DOI and PDF link. Perfect for tracking AI/ML research.

Hichem Ben Moussa

arXiv Papers Scraper

neuton/arxiv-papers-scraper

Search arXiv papers by query, category, author, or ID. Extract titles, abstracts, authors, dates, DOI, PDF links, and categories via the official API.

Ashwin Prasad

ArXiv Paper Scraper

sheshinmcfly/arxiv-paper-scraper

Search and extract scientific papers from ArXiv.org. Returns title, authors, abstract, categories, submission date, and PDF link. Ideal for AI/ML research, RAG pipelines, academic trend monitoring, and systematic literature reviews. No API key required.

Sheshinmcfly

ArXiv Paper Search

gentle_cloud/arxiv-paper-search

Search and extract academic papers from ArXiv. Find papers by keyword, author, or category with full metadata including title, authors, abstract, categories, and PDF links.

Monkey Coder

arXiv Scraper

dami_studio/arxiv-scraper

Search arXiv via the official API and return structured paper metadata as JSON: title, abstract, authors, categories, DOI, dates, and abstract + PDF links. Best for literature reviews.

Dami's Studio

5.0

arXiv Papers Scraper

resounding_diplomacy/arxiv-papers-scraper

Scrape academic papers from arXiv by category, keyword, or author. Extract titles, authors, abstracts, PDF URLs, DOIs, categories, and more. Perfect for AI/ML research datasets.

alars num

arXiv Paper Scraper

technicaldost/arxiv-paper-scraper

Search and scrape academic papers from arXiv. Extract titles, authors, abstracts, categories, PDF links and publication dates by keyword, category or author. Ideal for research, literature reviews and building ML training datasets.

Technical Dost Solutions

arXiv Research Paper Scraper

crawlerbros/arxiv-research-paper-scraper

Scrape research papers from arXiv.org - search by query, category, or author; lookup by arXiv ID. Returns title, authors, abstract, PDF URL, DOI, categories, and more. Uses the public arXiv Atom API. No login or proxy required.

Crawler Bros