arXiv Research Papers Tracker

Pricing

Pay per usage

Search and extract academic papers from arXiv by category, keyword, and date range. Returns each paper's title, authors, abstract, categories, published date, and PDF URL. Ideal for AI/ML research monitoring and training-data collection.

Developer

陈俊杰

Maintained by Community

An Apify Actor that searches and extracts academic papers from arXiv by category, keyword, and date range. Ideal for AI/ML research monitoring, literature reviews, and training-data collection.

Features

  • Category search — search one or more arXiv categories (e.g. cs.AI, cs.LG, stat.ML).
  • Keyword filtering — narrow results to papers whose title or abstract contains specific terms.
  • Pagination — automatically fetches up to 200 results with polite 3-second delays between pages.
  • Rich output — returns title, authors, abstract, categories, published/updated dates, PDF URL, and arXiv ID.
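
As a rough illustration of the category search, keyword filtering, and pagination features, the sketch below builds paged request URLs for the public arXiv API. It is not the Actor's source code: the helper names `build_search_query` and `page_urls`, the page size of 100, and the choice to require every keyword (AND) are assumptions made for this example. Only the 200-result cap is taken from the feature list.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"  # public arXiv API endpoint


def build_search_query(categories, keywords=()):
    """Join categories with OR; require each keyword in the title or abstract.

    Combining keywords with AND is an assumption, not confirmed Actor behavior.
    """
    cat_clause = " OR ".join(f"cat:{c}" for c in categories)
    clauses = [f"({cat_clause})"]
    for kw in keywords:
        clauses.append(f'(ti:"{kw}" OR abs:"{kw}")')
    return " AND ".join(clauses)


def page_urls(categories, keywords=(), max_results=50,
              page_size=100, sort_by="submittedDate"):
    """Return one request URL per page, never asking for more than 200 results."""
    total = min(max_results, 200)
    query = build_search_query(categories, keywords)
    urls = []
    for start in range(0, total, page_size):
        params = {
            "search_query": query,
            "start": start,
            "max_results": min(page_size, total - start),
            "sortBy": sort_by,
            "sortOrder": "descending",
        }
        urls.append(f"{ARXIV_API}?{urlencode(params)}")
    return urls
```

A caller would fetch each URL in turn and sleep for about three seconds between requests, in line with the delay mentioned in the feature list.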

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| categories | string | cs.AI,cs.LG,stat.ML | Comma-separated arXiv category codes |
| keywords | string | (optional) | Space-separated search terms (title/abstract) |
| max_results | integer | 50 | Maximum number of papers (≤ 200) |
| sort_by | enum | submittedDate | submittedDate or relevance |
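
To show how these fields fit together, here is a minimal sketch of input normalization using the defaults listed in the table. The function name `normalize_input` is hypothetical, and the lower bound of 1 on `max_results` and the `ValueError` on an unknown `sort_by` are assumptions. The Actor may handle invalid input differently.

```python
DEFAULT_CATEGORIES = "cs.AI,cs.LG,stat.ML"
ALLOWED_SORT = ("submittedDate", "relevance")


def normalize_input(raw):
    """Turn the raw Actor input dict into cleaned, typed values."""
    # Split the comma-separated category string and drop empty items.
    categories = [
        c.strip()
        for c in (raw.get("categories") or DEFAULT_CATEGORIES).split(",")
        if c.strip()
    ]
    # Keywords are space-separated; an absent field means no keyword filter.
    keywords = (raw.get("keywords") or "").split()
    # Clamp to the documented maximum of 200 (the minimum of 1 is an assumption).
    max_results = max(1, min(int(raw.get("max_results", 50)), 200))
    sort_by = raw.get("sort_by", "submittedDate")
    if sort_by not in ALLOWED_SORT:
        raise ValueError(f"sort_by must be one of {ALLOWED_SORT}, got {sort_by!r}")
    return {
        "categories": categories,
        "keywords": keywords,
        "max_results": max_results,
        "sort_by": sort_by,
    }
```

For example, an input of `{"categories": " cs.CL , cs.CV ", "max_results": 500}` would yield the two trimmed category codes and a `max_results` of 200.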

Output

Each result is a JSON object pushed to the Apify dataset with the following fields:

| Field | Type | Description |
| --- | --- | --- |
| id | string | arXiv identifier (e.g. 2101.12345) |
| url | string | arXiv abstract page URL |
| title | string | Paper title |
| authors | string[] | List of author names |
| abstract | string | Paper abstract / summary |
| categories | string | Comma-separated category codes |
| primary_category | string | Primary arXiv category |
| published | string | Original publication date (ISO‑8601) |
| updated | string | Last update date (ISO‑8601) |
| pdf_url | string | Direct link to the PDF |
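
The sketch below shows one way an entry from the arXiv API's Atom feed could be mapped onto these fields. It is an illustration only: `entry_to_record` is a hypothetical helper, the sample entry is made up, and stripping the version suffix from `id` is an assumption based on the example identifier in the table.

```python
import re
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}

# Made-up sample entry in the shape of an arXiv Atom feed item.
SAMPLE_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:arxiv="http://arxiv.org/schemas/atom">
  <id>http://arxiv.org/abs/2101.12345v2</id>
  <updated>2021-02-03T10:00:00Z</updated>
  <published>2021-01-29T18:30:00Z</published>
  <title>An Illustrative
  Paper Title</title>
  <summary>  Placeholder abstract text.
  </summary>
  <author><name>A. Author</name></author>
  <author><name>B. Author</name></author>
  <link title="pdf" href="http://arxiv.org/pdf/2101.12345v2" rel="related"/>
  <arxiv:primary_category term="cs.LG"/>
  <category term="cs.LG"/>
  <category term="stat.ML"/>
</entry>"""


def entry_to_record(entry):
    """Map one Atom <entry> element to a dict with the output fields."""
    def text(path):
        node = entry.find(path, NS)
        # Collapse line breaks and repeated spaces inside feed text.
        return " ".join((node.text or "").split()) if node is not None else ""

    abs_url = text("atom:id")
    # Assumption: drop the trailing version ("v2") to get the bare identifier.
    arxiv_id = re.sub(r"v\d+$", "", abs_url.rsplit("/abs/", 1)[-1])
    pdf = entry.find("atom:link[@title='pdf']", NS)
    primary = entry.find("arxiv:primary_category", NS)
    return {
        "id": arxiv_id,
        "url": abs_url,
        "title": text("atom:title"),
        "authors": [" ".join((n.text or "").split())
                    for n in entry.findall("atom:author/atom:name", NS)],
        "abstract": text("atom:summary"),
        "categories": ",".join(c.get("term", "")
                               for c in entry.findall("atom:category", NS)),
        "primary_category": primary.get("term", "") if primary is not None else "",
        "published": text("atom:published"),
        "updated": text("atom:updated"),
        "pdf_url": pdf.get("href", "") if pdf is not None else "",
    }


record = entry_to_record(ET.fromstring(SAMPLE_ENTRY))
```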

Common arXiv Category Codes

Computer Science (cs.*)

| Code | Description |
| --- | --- |
| cs.AI | Artificial Intelligence |
| cs.AR | Hardware Architecture |
| cs.CC | Computational Complexity |
| cs.CE | Computational Engineering, Finance, and Science |
| cs.CL | Computation and Language (NLP) |
| cs.CR | Cryptography and Security |
| cs.CV | Computer Vision and Pattern Recognition |
| cs.CY | Computers and Society |
| cs.DB | Databases |
| cs.DC | Distributed, Parallel, and Cluster Computing |
| cs.DL | Digital Libraries |
| cs.DS | Data Structures and Algorithms |
| cs.ET | Emerging Technologies |
| cs.GL | General Literature |
| cs.GT | Computer Science and Game Theory |
| cs.HC | Human-Computer Interaction |
| cs.IR | Information Retrieval |
| cs.IT | Information Theory |
| cs.LG | Machine Learning |
| cs.LO | Logic in Computer Science |
| cs.MA | Multiagent Systems |
| cs.NE | Neural and Evolutionary Computing |
| cs.NI | Networking and Internet Architecture |
| cs.PL | Programming Languages |
| cs.RO | Robotics |
| cs.SE | Software Engineering |
| cs.SI | Social and Information Networks |
| cs.SY | Systems and Control |

Statistics (stat.*)

| Code | Description |
| --- | --- |
| stat.AP | Applications |
| stat.CO | Computation |
| stat.ME | Methodology |
| stat.ML | Machine Learning |
| stat.TH | Statistics Theory |

Mathematics (math.*)

| Code | Description |
| --- | --- |
| math.NA | Numerical Analysis |
| math.OC | Optimization and Control |
| math.PR | Probability |
| math.ST | Statistics Theory |

Physics (physics.*) & Other

| Code | Description |
| --- | --- |
| physics.* | Various physics sub-disciplines |
| q-fin.* | Quantitative Finance |
| q-bio.* | Quantitative Biology |
| eess.* | Electrical Engineering and Systems Science |

See the full arXiv category list.

Local Development

# Clone / navigate to the project
cd ~/apify-actors/arxiv-papers-scraper
# Install dependencies
pip install -r requirements.txt
# Run the actor (requires Apify API token when using Apify platform features)
python -m src

To run with custom input via the Apify CLI:

apify run
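
For local runs, the Apify CLI typically reads the Actor input from `storage/key_value_stores/default/INPUT.json` inside the project directory. Assuming that layout applies to this project, the following sketch writes a custom input file before running the CLI. The helper name `write_local_input` and the example values are hypothetical.

```python
import json
from pathlib import Path


def write_local_input(project_dir, actor_input):
    """Write actor_input as INPUT.json in the default local key-value store."""
    # Assumed default location used by the Apify CLI for local input.
    store = Path(project_dir) / "storage" / "key_value_stores" / "default"
    store.mkdir(parents=True, exist_ok=True)
    path = store / "INPUT.json"
    path.write_text(json.dumps(actor_input, indent=2), encoding="utf-8")
    return path


# Example call (values are illustrative):
# write_local_input(".", {"categories": "cs.CL,cs.LG",
#                         "keywords": "retrieval",
#                         "max_results": 25,
#                         "sort_by": "submittedDate"})
```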

License

MIT