arXiv Preprint Scraper

Export preprints from arXiv.org. Search 2.5M+ open-access papers across physics, mathematics, computer science, biology, economics, and quantitative finance. Query by keyword, author, category, or date range. Pull titles, authors, abstracts, categories, DOIs, journal refs, and PDF links.

Pricing: Pay per event
Rating: 5.0 (1)
Developer: ParseForge
Last modified: 16 hours ago

📚 arXiv Scraper

🚀 Export open-access research in seconds. Query 2M+ preprints from arXiv by keyword, author, or category, and pull titles, abstracts, authors, DOIs, and PDF URLs into a clean dataset. No API key, no registration, no XML parsing.

The arXiv Scraper queries the public arXiv API (export.arxiv.org) and returns 15 fields per paper, including arxivId, title, authors, full abstract, primary and secondary categories, DOI, journal reference, publication and update dates, and a direct PDF URL. arXiv is the world's largest open-access preprint archive for physics, mathematics, computer science, quantitative biology, statistics, and economics.

The archive spans every major quantitative discipline, with 2+ million papers going back to 1991. This Actor turns an arXiv query into a structured dataset available as CSV, Excel, JSON, or XML in under five minutes. Filtering happens server-side in the arXiv API, and the Actor parses the Atom XML response for you, so you never write an XML parser yourself.
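For readers curious what "skipping the Atom XML parser" saves them, here is a minimal sketch of flattening an arXiv-style Atom feed into records with Python's standard library. It is not the Actor's code. The sample feed is made up, the mapping from Atom elements to field names is an assumption based on the field list on this page, and only a subset of the fields is shown.

```python
import xml.etree.ElementTree as ET

# XML namespaces used in arXiv Atom responses.
NS = {"atom": "http://www.w3.org/2005/Atom", "arxiv": "http://arxiv.org/schemas/atom"}

# Made-up sample entry; real responses come from export.arxiv.org.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom" xmlns:arxiv="http://arxiv.org/schemas/atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <updated>2020-01-02T00:00:00Z</updated>
    <published>2020-01-01T00:00:00Z</published>
    <title>An Example
      Title</title>
    <summary>Example abstract.</summary>
    <author><name>A. Author</name></author>
    <author><name>B. Author</name></author>
    <link href="http://arxiv.org/abs/0000.00000v1" rel="alternate" type="text/html"/>
    <link title="pdf" href="http://arxiv.org/pdf/0000.00000v1" rel="related" type="application/pdf"/>
    <arxiv:primary_category term="cs.LG" scheme="http://arxiv.org/schemas/atom"/>
    <category term="cs.LG" scheme="http://arxiv.org/schemas/atom"/>
    <category term="stat.ML" scheme="http://arxiv.org/schemas/atom"/>
  </entry>
</feed>"""


def _text(element, path):
    """Return whitespace-normalized text of a child element, or None if absent."""
    value = element.findtext(path, default=None, namespaces=NS)
    return " ".join(value.split()) if value else None


def parse_feed(xml_text):
    """Flatten every <entry> in an Atom feed into a plain dict (assumed field mapping)."""
    root = ET.fromstring(xml_text)
    records = []
    for entry in root.findall("atom:entry", NS):
        primary = entry.find("arxiv:primary_category", NS)
        pdf_links = [
            link.get("href")
            for link in entry.findall("atom:link", NS)
            if link.get("title") == "pdf"
        ]
        records.append({
            "absUrl": _text(entry, "atom:id"),
            "title": _text(entry, "atom:title"),
            "summary": _text(entry, "atom:summary"),
            "authors": [_text(a, "atom:name") for a in entry.findall("atom:author", NS)],
            "categories": [c.get("term") for c in entry.findall("atom:category", NS)],
            "primaryCategory": primary.get("term") if primary is not None else None,
            "published": _text(entry, "atom:published"),
            "updated": _text(entry, "atom:updated"),
            "doi": _text(entry, "arxiv:doi"),  # optional in the feed
            "pdfUrl": pdf_links[0] if pdf_links else None,
        })
    return records


records = parse_feed(SAMPLE_FEED)
```

Namespaces, optional elements, and multi-line titles all need handling, which is the work the Actor takes off your hands.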

🎯 Target Audience: Academic researchers, ML engineers, data scientists, literature review teams, citation tracking tools, competitive-intelligence analysts, journalists, educators
💡 Primary Use Cases: Literature reviews, citation graphs, trend tracking, paper discovery, LLM training corpora, author profiling, category monitoring

📋 What the arXiv Scraper does

Three filtering workflows in a single run:

🔍 Keyword search. Keyword queries across title, abstract, and other metadata fields using arXiv query syntax (e.g., the all: prefix).
👤 Author search. Pull every paper by a given author using the au: prefix.
📂 Category filter. Restrict by arXiv subject category (e.g., cs.LG, math.PR, physics.optics).
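The three filters above can be combined into one arXiv query string. The sketch below shows one way to assemble it and the matching export.arxiv.org request URL. The helper names `build_query` and `build_url` are hypothetical, and quoting multi-word values as phrases is an assumption about how you want them matched. The Actor itself only needs the query string.

```python
from urllib.parse import urlencode

API_ENDPOINT = "http://export.arxiv.org/api/query"


def _term(prefix, value):
    """Format one field-prefixed term, quoting values that contain spaces."""
    value = value.strip()
    if " " in value:
        value = f'"{value}"'  # assumption: treat multi-word values as a phrase
    return f"{prefix}:{value}"


def build_query(category=None, keyword=None, author=None):
    """Join the supplied filters with AND using arXiv field prefixes."""
    parts = []
    if category:
        parts.append(_term("cat", category))
    if keyword:
        parts.append(_term("all", keyword))
    if author:
        parts.append(_term("au", author))
    if not parts:
        raise ValueError("provide at least one of category, keyword, or author")
    return " AND ".join(parts)


def build_url(query, start=0, max_results=10):
    """Build an arXiv API request URL for the given query string."""
    params = {"search_query": query, "start": start, "max_results": max_results}
    return f"{API_ENDPOINT}?{urlencode(params)}"


query = build_query(category="cs.LG", keyword="diffusion")
url = build_url(query)
```

With these inputs, `query` is the same `cat:cs.LG AND all:diffusion` string used as the example in the How to use section.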

Each record includes the arxivId, title, author list, full abstract, primary category and all secondary categories, DOI, journal reference, comment field, publication and update timestamps, plus direct links to the abstract page and the PDF.

💡 Why it matters: arXiv is the default publication channel for machine learning, theoretical physics, and mathematics. Tracking new papers manually is slow, and the official API returns Atom XML that most teams do not want to parse. This Actor returns a flat JSON dataset ready for downstream ingestion.

📊 Data fields

Each record includes: absUrl, arxivId, authors, categories, comment, doi, journalRef, pdfUrl, primaryCategory, published, scrapedAt, summary, title, updated, version. All 15 field names come from a real production run, so what you see here is what lands in your dataset.
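If you ingest the dataset into your own pipeline, a quick schema check can catch surprises early. This is a minimal sketch that compares a record's keys against the 15 field names listed above. The function name `check_record` is hypothetical, and the sample record is made up.

```python
# The 15 field names listed on this page.
EXPECTED_FIELDS = frozenset({
    "absUrl", "arxivId", "authors", "categories", "comment",
    "doi", "journalRef", "pdfUrl", "primaryCategory", "published",
    "scrapedAt", "summary", "title", "updated", "version",
})


def check_record(record):
    """Return (missing, unexpected) field names for one dataset record."""
    keys = set(record)
    missing = sorted(EXPECTED_FIELDS - keys)
    unexpected = sorted(keys - EXPECTED_FIELDS)
    return missing, unexpected


# Made-up record that lacks "doi" and carries one extra key.
sample = {name: None for name in EXPECTED_FIELDS if name != "doi"}
sample["extra"] = 1
missing, unexpected = check_record(sample)
```

Whether empty fields such as doi appear as null or are omitted from a record is not stated here, so treat a "missing" result as a prompt to inspect the data rather than as an error.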

⚠️ Good to Know: arXiv enforces a rate limit on its public API. The Actor paces requests to stay within policy, so very large runs (10,000+ papers) naturally take longer. Plan accordingly for literature-review pipelines.
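To see why large runs take longer, here is a rough sketch of paced pagination. It is not the Actor's implementation: the page size and the three-second default delay are assumptions (the delay follows arXiv's general guidance for API clients), and `fetch_page` is a hypothetical callable you would supply.

```python
import time


def page_windows(max_items, page_size=100):
    """Yield (start, size) windows that together cover max_items records."""
    start = 0
    while start < max_items:
        yield start, min(page_size, max_items - start)
        start += page_size


def fetch_all(fetch_page, max_items, page_size=100, delay_seconds=3.0, sleep=time.sleep):
    """Collect up to max_items records, pausing between consecutive requests.

    fetch_page(start, size) is a caller-supplied function returning a list.
    """
    results = []
    for index, (start, size) in enumerate(page_windows(max_items, page_size)):
        if index > 0:
            sleep(delay_seconds)  # pace every request after the first
        page = fetch_page(start, size)
        results.extend(page)
        if len(page) < size:  # a short page means the result set is exhausted
            break
    return results


def estimated_wait_seconds(max_items, page_size=100, delay_seconds=3.0):
    """Lower bound on time spent waiting between requests."""
    pages = -(-max_items // page_size)  # ceiling division
    return max(pages - 1, 0) * delay_seconds
```

Under these assumed settings, 10,000 papers means 100 requests and about five minutes of waiting alone, before any network time.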

🚀 How to use

📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
🌐 Open the Actor. Go to the arXiv Scraper page on the Apify Store.
🎯 Set input. Enter an arXiv query (e.g., cat:cs.LG AND all:diffusion), pick a sort order, and set maxItems.
🚀 Run it. Click Start and let the Actor collect your papers.
📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.
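Coding is optional, but a few lines go a long way once the export is on disk. The sketch below counts papers per primary category from a JSON export. It assumes the export is a JSON array of records with a primaryCategory string. The helper names and the file name in the usage note are hypothetical.

```python
import json
from collections import Counter


def load_records(path):
    """Read a JSON export (assumed to be an array of record objects)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def count_by_primary_category(records):
    """Count records per primaryCategory, grouping blanks under 'unknown'."""
    return Counter(record.get("primaryCategory") or "unknown" for record in records)


# Made-up records standing in for a downloaded dataset.
demo = [
    {"title": "Paper A", "primaryCategory": "cs.LG"},
    {"title": "Paper B", "primaryCategory": "cs.LG"},
    {"title": "Paper C", "primaryCategory": "math.PR"},
    {"title": "Paper D"},
]
counts = count_by_primary_category(demo)
```

For a real run, replace `demo` with `load_records("dataset.json")`, pointing at wherever you saved the download.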

🔗 Recommended Actors

🤗 Hugging Face Model Scraper - ML model metadata, downloads, and benchmarks
📦 Hugging Face Datasets Scraper - Open datasets for ML and NLP research
👨‍🔬 Semantic Scholar Author Profiles Scraper - Author citations, h-index, and affiliations
🧬 PubMed Scraper - Biomedical literature from the NIH database
📖 Open Library Authors Scraper - Author records from the Internet Archive's Open Library

💡 Pro Tip: browse the complete ParseForge collection for more research and reference-data scrapers.

⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by arXiv, Cornell University, or the Simons Foundation. All trademarks mentioned are the property of their respective owners. Only publicly available open-access preprint metadata is collected.

🆘 Need Help?

If you hit a bug, have questions about setup, or need a scraper we haven't built yet, open our contact form or write to parseforge@protonmail.com. We also take on paid custom data projects.

For faster answers, join our Discord. It's the best place to get support and suggest new actors.

arXiv Scraper

jungle_synthesizer/arxiv-scraper

Export preprints from arXiv.org. Search 2.5M+ open-access papers across physics, mathematics, computer science, biology, economics, and quantitative finance. Query by keyword, author, category, or date range. Returns titles, authors, abstracts, categories, and PDF links.

BowTiedRaccoon

ArXiv Preprint Paper Search

ryanclinton/arxiv-paper-search

Search and extract preprint research papers from the ArXiv open-access repository. Query over 2.4 million academic papers across physics, mathematics, computer science, biology, economics, and more with structured JSON output, no API key required.

Ryan Clinton

arXiv Paper Scraper — Abstracts, Authors & Metadata

logiover/arxiv-paper-scraper

Scrape research paper metadata from arXiv.org, the world's largest open-access repository. Search by keyword across computer science, physics, mathematics, and biology. Returns titles, abstracts, authors, categories, PDF links, and DOIs. No API key required.

Logiover

arXiv Scraper

solidcode/arxiv-scraper

[💰 $2.5 / 1K] Search arXiv and extract paper metadata — titles, authors, abstracts, subject categories, DOIs, journal references, submission dates, and PDF links. Search by keyword, title, author, or category, or fetch specific papers by arXiv ID.

SolidCode

arXiv Paper Scraper

lulzasaur/arxiv-scraper

Search and scrape arXiv academic papers. Get titles, authors, abstracts, categories, PDF links, DOIs. Search by keyword, browse recent papers by category, or fetch by arXiv ID.

lulz bot

arXiv Paper Scraper

plantane/arxiv-scraper

Scrape research papers from arXiv by search query or category. Get titles, abstracts, authors, categories, and PDF links via the public arXiv API.

Daniel

arXiv Papers Scraper

resounding_diplomacy/arxiv-papers-scraper

Scrape academic papers from arXiv by category, keyword, or author. Extract titles, authors, abstracts, PDF URLs, DOIs, categories, and more. Perfect for AI/ML research datasets.

alars num

arXiv Paper Scraper

cloud9_ai/arxiv-paper-scraper

Scrape academic papers from arXiv.org. Search by keyword, browse categories, or get latest papers. Extract titles, abstracts, authors, PDF links, and citation data via arXiv API.

cloud9

ArXiv Preprint Paper Search

scrupulous_waterbird_m4w/arxiv-papers

Search and extract arXiv preprint papers by category, author, title, and date range. Returns title, authors, abstract, PDF URL, categories, primary category, and submission date as structured records.

Mori

Arxiv Papers Scraper

chimerical_quicklime/arxiv-papers-scraper

Search arXiv preprints via the public Atom API. Returns title, authors, abstract, categories, published date, updated date, DOI, journal reference, and PDF link. Filter by category, author, or keyword.