arXiv Scraper

Pricing

Pay per event

Export preprints from arXiv.org. Search 2.5M+ open-access papers across physics, mathematics, computer science, biology, economics, and quantitative finance. Query by keyword, author, category, or date range. Returns titles, authors, abstracts, categories, and PDF links.

Developer

BowTiedRaccoon

Maintained by Community

Export preprints and papers from arXiv.org, an open-access repository of more than 2.5 million scientific papers across physics, mathematics, computer science, biology, economics, and quantitative finance.

This actor queries the official arXiv API (export.arxiv.org/api/query), which returns Atom feeds and is the interface arXiv supports for programmatic data access. No HTML scraping, no JavaScript rendering, no account required.
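For orientation, a request to that endpoint can be assembled as shown here. This is a minimal sketch, not the actor's actual code: the helper name `build_arxiv_url` and the `https` scheme are assumptions, while `search_query`, `start`, `max_results`, `sortBy`, and `sortOrder` are the query parameters the arXiv API documents.

```python
from urllib.parse import urlencode

# Endpoint named in the text above; the https scheme is an assumption.
ARXIV_API = "https://export.arxiv.org/api/query"


def build_arxiv_url(search_query, start=0, max_results=50,
                    sort_by="submittedDate", sort_order="descending"):
    """Return a query URL for the arXiv API (hypothetical helper)."""
    params = {
        "search_query": search_query,
        "start": start,              # zero-based offset into the result set
        "max_results": max_results,  # number of entries in this response
        "sortBy": sort_by,
        "sortOrder": sort_order,
    }
    # urlencode percent-encodes ":" and turns spaces into "+".
    return f"{ARXIV_API}?{urlencode(params)}"


url = build_arxiv_url("cat:cs.AI")
```

Fetching `url` with any HTTP client would return an Atom XML feed, which the actor converts into the flat records described in the next section.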

What you get

Each result includes:

  • arxiv_id — the canonical short ID (e.g. 2301.12345)
  • abs_url — link to the abstract page
  • pdf_url — direct PDF download link
  • title — full paper title
  • abstract — complete abstract / summary
  • authors — comma-separated author names
  • primary_category — primary subject category (e.g. cs.AI)
  • categories — all subject categories, comma-separated
  • published — original submission date (ISO 8601)
  • updated — date of the latest version
  • comment — author notes (page count, conference, etc.) if available
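Because authors and categories arrive as comma-separated strings, downstream code will usually split them into lists. The following is a minimal sketch, assuming each dataset item is a dict keyed by the field names listed above; `normalize_record` is a hypothetical helper and the sample values are made up for illustration.

```python
from datetime import datetime


def _split(value):
    """Split a comma-separated string into a list of trimmed, non-empty parts."""
    return [part.strip() for part in (value or "").split(",") if part.strip()]


def normalize_record(item):
    """Return a copy of item with list and datetime fields (hypothetical helper)."""
    record = dict(item)
    record["authors"] = _split(item.get("authors"))
    record["categories"] = _split(item.get("categories"))
    for key in ("published", "updated"):
        value = item.get(key)
        if value:
            # fromisoformat() rejects a trailing "Z" before Python 3.11.
            record[key] = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return record


# Made-up sample item, not real arXiv data.
sample = {
    "arxiv_id": "2301.12345",
    "authors": "A. Author, B. Writer",
    "categories": "cs.AI, cs.LG",
    "published": "2023-01-30T12:00:00Z",
    "updated": "2023-02-01T08:30:00Z",
}
record = normalize_record(sample)
```

Splitting on commas assumes no author name itself contains a comma; names with suffixes such as "Jr." may need extra handling.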

Search query syntax

The searchQuery field accepts arXiv's query language:

| Pattern | Example | Meaning |
| --- | --- | --- |
| Plain keyword | machine learning | Matches the terms in any metadata field (the API does not search paper full text) |
| Title | ti:attention | Papers with "attention" in the title |
| Author | au:Hinton | Papers by authors named Hinton |
| Abstract | abs:transformer | Papers with "transformer" in the abstract |
| Category | cat:cs.AI | Papers in the cs.AI category |
| Boolean | cat:cs.LG AND ti:diffusion | Papers in cs.LG with "diffusion" in the title |
| Date range | submittedDate:[202301010000 TO 202312312359] | Papers submitted in 2023 (timestamps are YYYYMMDDHHMM, GMT) |

See the arXiv query language reference for the full syntax.
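The submittedDate timestamps are easy to mistype by hand, so it can help to generate them. This is a small sketch for composing such filters; `submitted_between` and `all_of` are hypothetical helper names, not part of the actor or of the arXiv API.

```python
from datetime import datetime


def submitted_between(start, end):
    """Format a submittedDate range clause from two datetimes (YYYYMMDDHHMM)."""
    fmt = "%Y%m%d%H%M"
    return f"submittedDate:[{start.strftime(fmt)} TO {end.strftime(fmt)}]"


def all_of(*clauses):
    """Join query clauses with the AND operator."""
    return " AND ".join(clauses)


query = all_of(
    "ti:diffusion",
    submitted_between(datetime(2024, 1, 1, 0, 0), datetime(2024, 12, 31, 23, 59)),
)
# query == "ti:diffusion AND submittedDate:[202401010000 TO 202412312359]"
```

The resulting string can be passed as the searchQuery input.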

Common arXiv categories

| Category | Field |
| --- | --- |
| cs.AI | Artificial Intelligence |
| cs.LG | Machine Learning |
| cs.CL | Computation and Language (NLP) |
| cs.CV | Computer Vision |
| hep-th | High Energy Physics - Theory |
| math.CO | Combinatorics |
| q-bio.NC | Neurons and Cognition |
| econ.GN | General Economics |

Input parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| searchQuery | string | (required) | arXiv query expression |
| maxItems | integer | 50 | Maximum number of papers to return |
| sortBy | string | submittedDate | Sort field: relevance, lastUpdatedDate, or submittedDate |
| sortOrder | string | descending | ascending or descending |
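If you build the input programmatically, a client-side check against the table above can catch mistakes before a run starts. This is a sketch under stated assumptions: `prepare_input` is a hypothetical helper, and the actor's own validation may differ.

```python
# Defaults and allowed values copied from the parameter table above.
DEFAULTS = {"maxItems": 50, "sortBy": "submittedDate", "sortOrder": "descending"}
SORT_FIELDS = {"relevance", "lastUpdatedDate", "submittedDate"}
SORT_ORDERS = {"ascending", "descending"}


def prepare_input(user_input):
    """Merge defaults into user_input and validate it (hypothetical helper)."""
    merged = {**DEFAULTS, **user_input}
    query = merged.get("searchQuery")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("searchQuery is required and must be a non-empty string")
    if not isinstance(merged["maxItems"], int) or merged["maxItems"] < 1:
        raise ValueError("maxItems must be a positive integer")
    if merged["sortBy"] not in SORT_FIELDS:
        raise ValueError(f"sortBy must be one of {sorted(SORT_FIELDS)}")
    if merged["sortOrder"] not in SORT_ORDERS:
        raise ValueError(f"sortOrder must be one of {sorted(SORT_ORDERS)}")
    return merged


actor_input = prepare_input({"searchQuery": "au:LeCun", "sortBy": "relevance"})
```

Here `actor_input` carries the defaults for maxItems and sortOrder alongside the supplied values.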

Usage examples

Fetch the 100 most recent cs.AI papers:

{
  "searchQuery": "cat:cs.AI",
  "maxItems": 100,
  "sortBy": "submittedDate",
  "sortOrder": "descending"
}

Find papers by a specific author:

{
  "searchQuery": "au:LeCun",
  "maxItems": 50,
  "sortBy": "relevance"
}

Search for diffusion model papers from 2024:

{
  "searchQuery": "ti:diffusion AND submittedDate:[202401010000 TO 202412312359]",
  "maxItems": 200
}

Technical notes

  • Uses the arXiv API (Atom feed responses), arXiv's official programmatic interface
  • Pagination is handled automatically; set maxItems to any number
  • Rate-limited to ~1 request/second per arXiv usage guidelines
  • No authentication required
  • Results span all of arXiv's subject areas (2.5M+ papers total)
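Automatic pagination against the arXiv API amounts to issuing successive requests with increasing `start` offsets until maxItems is covered, pausing between requests. The sketch below is an assumption about how such a loop can be structured, not the actor's source; `page_windows`, `fetch_all`, the page size of 100, and the caller-supplied `fetch_page` callback are illustrative.

```python
import time


def page_windows(max_items, page_size=100):
    """Yield (start, max_results) pairs that together cover max_items results."""
    start = 0
    while start < max_items:
        yield start, min(page_size, max_items - start)
        start += page_size


def fetch_all(fetch_page, max_items, page_size=100, delay_seconds=1.0):
    """Collect up to max_items results via fetch_page(start, max_results)."""
    results = []
    for index, (start, count) in enumerate(page_windows(max_items, page_size)):
        if index:
            time.sleep(delay_seconds)  # pause between requests, not before the first
        page = fetch_page(start, count)
        results.extend(page)
        if len(page) < count:
            break  # a short page means the result set is exhausted
    return results[:max_items]


windows = list(page_windows(250))
# windows == [(0, 100), (100, 100), (200, 50)]
```

In practice `fetch_page` would request one slice from the API and parse the Atom entries; keeping it as a callback lets the paging logic be tested without network access.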