arXiv Paper Scraper

Pricing

from $1.00 / 1,000 results

Scrape research papers from arXiv by search query or category. Get titles, abstracts, authors, categories, and PDF links via the public arXiv API.

Developer

Daniel

Maintained by Community

Scrapes academic papers from arXiv using the public Atom API.

Features

  • Search mode — free-text search across all arXiv papers
  • Category mode — browse papers by arXiv category (e.g. cs.AI, math.CO, physics.optics)
  • Configurable sorting (by relevance, last updated, or submission date)
  • Pagination with polite rate limiting (3s between requests)
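The two modes and the sorting options map naturally onto the query parameters of the public arXiv API (`search_query`, `start`, `max_results`, `sortBy`, `sortOrder`). The sketch below shows one way that mapping could look. The helper name `build_query_url` and the use of the `all:` field prefix for free-text search are assumptions made for illustration; how the actor composes its queries internally is not documented here.

```python
from urllib.parse import urlencode

# Public arXiv API endpoint, which returns an Atom feed.
ARXIV_API = "http://export.arxiv.org/api/query"


def build_query_url(mode, query="machine learning", category="cs.AI",
                    start=0, max_results=10,
                    sort_by="relevance", sort_order="descending"):
    """Hypothetical helper: turn the actor's input fields into an API URL."""
    if mode == "search":
        # Assumption: free-text search goes through the "all:" field prefix.
        search_query = f"all:{query}"
    elif mode == "category":
        # "cat:" restricts results to a single arXiv category.
        search_query = f"cat:{category}"
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
        "sortBy": sort_by,
        "sortOrder": sort_order,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query_url("category", category="math.CO",
                          sort_by="submittedDate"))
```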

Input

Field | Type | Default | Description
mode | string | search | search or category
query | string | machine learning | Search query (for search mode)
category | string | cs.AI | arXiv category code (for category mode)
max_items | integer | 10 | Maximum papers to scrape (1–1000)
sort_by | string | relevance | relevance, lastUpdatedDate, or submittedDate
sort_order | string | descending | ascending or descending
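As a rough illustration of how these fields and defaults fit together, the following sketch fills in missing values and rejects anything outside the documented ranges. The function `normalize_input` and the constant names are hypothetical; the actor's own validation may differ.

```python
# Defaults and allowed values copied from the input fields listed above.
DEFAULTS = {
    "mode": "search",
    "query": "machine learning",
    "category": "cs.AI",
    "max_items": 10,
    "sort_by": "relevance",
    "sort_order": "descending",
}
ALLOWED = {
    "mode": {"search", "category"},
    "sort_by": {"relevance", "lastUpdatedDate", "submittedDate"},
    "sort_order": {"ascending", "descending"},
}


def normalize_input(raw):
    """Hypothetical helper: apply defaults, then validate the result."""
    cfg = {**DEFAULTS, **raw}
    for field, allowed in ALLOWED.items():
        if cfg[field] not in allowed:
            raise ValueError(f"{field} must be one of {sorted(allowed)}")
    max_items = cfg["max_items"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(max_items, bool) or not isinstance(max_items, int):
        raise ValueError("max_items must be an integer")
    if not 1 <= max_items <= 1000:
        raise ValueError("max_items must be between 1 and 1000")
    return cfg
```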

Output

Each result contains:

  • arxiv_id — arXiv paper ID (e.g. 2301.07041)
  • title — paper title
  • summary — abstract
  • authors — list of author names
  • categories — list of arXiv categories
  • primary_category — primary category
  • published — publication date (ISO 8601)
  • updated — last updated date (ISO 8601)
  • pdf_url — direct PDF link
  • abs_url — abstract page link
  • comment — author comment (optional)
  • journal_ref — journal reference (optional)
  • doi — DOI (optional)
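To make the field list concrete, here is a minimal sketch of extracting these fields from one `<entry>` element of the arXiv Atom feed with the standard library. The function `parse_entry` is hypothetical, and two choices are assumptions rather than confirmed behaviour of the actor: the version suffix (such as `v1`) is stripped from the ID to match the example above, and `abs_url` is taken from the entry's `<id>` element.

```python
import re
import xml.etree.ElementTree as ET

# XML namespaces used in the arXiv Atom feed.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}


def _text(entry, path):
    """Return whitespace-normalized text of a child element, or None."""
    node = entry.find(path, NS)
    if node is None or not node.text:
        return None
    return " ".join(node.text.split())


def parse_entry(entry):
    """Hypothetical helper: convert one Atom <entry> into a result dict."""
    abs_url = _text(entry, "atom:id")  # e.g. http://arxiv.org/abs/2301.07041v1
    raw_id = abs_url.rsplit("/abs/", 1)[-1] if abs_url else None
    # Assumption: drop the trailing version suffix from the ID.
    arxiv_id = re.sub(r"v\d+$", "", raw_id) if raw_id else None
    pdf_url = None
    for link in entry.findall("atom:link", NS):
        if link.get("title") == "pdf":
            pdf_url = link.get("href")
    primary = entry.find("arxiv:primary_category", NS)
    return {
        "arxiv_id": arxiv_id,
        "title": _text(entry, "atom:title"),
        "summary": _text(entry, "atom:summary"),
        "authors": [a.findtext("atom:name", namespaces=NS)
                    for a in entry.findall("atom:author", NS)],
        "categories": [c.get("term")
                       for c in entry.findall("atom:category", NS)],
        "primary_category": primary.get("term") if primary is not None else None,
        "published": _text(entry, "atom:published"),
        "updated": _text(entry, "atom:updated"),
        "pdf_url": pdf_url,
        "abs_url": abs_url,
        # Optional fields: None when the entry does not carry them.
        "comment": _text(entry, "arxiv:comment"),
        "journal_ref": _text(entry, "arxiv:journal_ref"),
        "doi": _text(entry, "arxiv:doi"),
    }
```

A whole feed could then be handled with `[parse_entry(e) for e in ET.fromstring(xml_text).findall("atom:entry", NS)]`, where `xml_text` is the response body.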

Local Usage

$ python -m src -i .actor/input.json

Example Input

{
  "mode": "search",
  "query": "transformer architecture",
  "max_items": 20,
  "sort_by": "submittedDate",
  "sort_order": "descending"
}

Notes

  • The arXiv API has a rate limit of ~1 request per 3 seconds. The scraper respects this.
  • The scraper fetches at most 100 results per API request; pagination is handled automatically.
  • arXiv categories list: https://arxiv.org/category_taxonomy
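The pagination and rate-limiting behaviour described in these notes can be sketched as a simple loop: split `max_items` into pages of at most 100 and wait 3 seconds between requests. The names `page_plan`, `fetch_all`, and the `fetch_page` callback are hypothetical and stand in for whatever the actor does internally; `fetch_page(start, size)` is assumed to return a list of results for that page.

```python
import time

PAGE_SIZE = 100       # results per API request, as stated in the notes
DELAY_SECONDS = 3.0   # pause between consecutive requests


def page_plan(max_items, page_size=PAGE_SIZE):
    """Yield (start, max_results) pairs that together cover max_items."""
    start = 0
    while start < max_items:
        yield start, min(page_size, max_items - start)
        start += page_size


def fetch_all(fetch_page, max_items, delay=DELAY_SECONDS, sleep=time.sleep):
    """Collect up to max_items results, sleeping between page requests."""
    results = []
    for i, (start, size) in enumerate(page_plan(max_items)):
        if i > 0:
            sleep(delay)  # stay within the ~1 request per 3 seconds limit
        batch = fetch_page(start, size)
        results.extend(batch)
        if len(batch) < size:
            break  # a short page means there are no more results
    return results[:max_items]
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.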