ArXiv Paper Scraper — Search and Extract Research Papers (JSON)

Pricing: Pay per usage

Search and scrape arXiv research papers. Get titles, abstracts, authors, categories, and PDF links. Monitor new papers by topic daily.

Rating: 0.0 (0)

Developer: Alex (Maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 15 hours ago

arXiv Paper Scraper — Extract Research Papers, Abstracts & Metadata

Extract research papers from arXiv.org at scale. Search by keyword or browse by category to get structured data including titles, authors, abstracts, categories, DOI references, and PDF links. Powered by the official arXiv API with built-in rate limiting and pagination.

Features

  • Keyword Search — find papers by any search term (e.g., "large language models", "quantum computing")
  • Category Browsing — extract papers from specific arXiv categories (cs.AI, cs.LG, math.CO, physics.optics)
  • Full Metadata — title, authors list, abstract, DOI, journal reference, and publication dates
  • PDF & Abstract Links — direct URLs to PDF downloads and abstract pages
  • Automatic Pagination — fetches up to 500 papers per query with configurable limits
  • Sorting Options — sort by submission date, last update, or relevance
  • Rate-Limited — respects arXiv's 3-second request delay policy
  • Default Fallback — returns latest AI papers when no input is provided
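As a rough illustration of how keyword search, category browsing, and the sorting options could map onto the public arXiv API, here is a minimal sketch. The endpoint and the `search_query`, `start`, `max_results`, `sortBy`, and `sortOrder` parameters come from the arXiv API itself; the `build_query_url` helper and its `kind` argument are hypothetical names used only for this example, and the actor's real request construction may differ.

```python
from urllib.parse import urlencode

# Public arXiv API endpoint. How the actor actually calls it is an assumption.
ARXIV_API = "http://export.arxiv.org/api/query"


def build_query_url(source, kind="search", start=0, max_results=50,
                    sort_by="submittedDate", sort_order="descending"):
    """Build an arXiv API URL for a keyword search or a category listing.

    `kind` is a hypothetical switch: "search" for keywords, "category" for
    an arXiv category code such as "cs.AI".
    """
    if kind == "category":
        search_query = f"cat:{source}"
    elif " " in source:
        # Quote multi-word terms so they are matched as a phrase.
        search_query = f'all:"{source}"'
    else:
        search_query = f"all:{source}"

    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
        "sortBy": sort_by,
        "sortOrder": sort_order,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


category_url = build_query_url("cs.AI", kind="category", max_results=10)
keyword_url = build_query_url("large language models", sort_by="relevance")
```

Opening either URL in a browser returns the raw feed, which is a quick way to preview what a given query or category would yield before starting a run.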

Output Example

{
  "arxivId": "2403.12345v1",
  "title": "Scaling Laws for Neural Language Models",
  "authors": ["John Smith", "Jane Doe"],
  "abstract": "We investigate the scaling behavior of Transformer language models...",
  "categories": ["cs.CL", "cs.AI", "cs.LG"],
  "primaryCategory": "cs.CL",
  "published": "2024-03-15T18:00:00Z",
  "updated": "2024-03-16T12:00:00Z",
  "doi": "10.1234/example",
  "journalRef": "Nature 2024",
  "pdfUrl": "http://arxiv.org/pdf/2403.12345v1",
  "abstractUrl": "http://arxiv.org/abs/2403.12345v1",
  "source": "search:large language models",
  "scrapedAt": "2026-03-18T10:00:00.000Z"
}
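When working with these records downstream, two small steps are often useful: separating the version suffix from `arxivId` (so revisions of the same paper can be deduplicated) and turning the timestamps into real datetimes. The sketch below uses the example record above; `split_arxiv_id` and `parse_utc` are hypothetical helper names and are not part of the actor.

```python
import re
from datetime import datetime

# A subset of the example record shown above.
record = {
    "arxivId": "2403.12345v1",
    "published": "2024-03-15T18:00:00Z",
    "updated": "2024-03-16T12:00:00Z",
}


def split_arxiv_id(arxiv_id):
    """Split '2403.12345v1' into ('2403.12345', 1).

    The version is None when the ID carries no 'vN' suffix.
    """
    match = re.fullmatch(r"(.+?)(?:v(\d+))?", arxiv_id)
    base, version = match.group(1), match.group(2)
    return base, int(version) if version else None


def parse_utc(timestamp):
    """Parse an ISO 8601 timestamp ending in 'Z' into an aware datetime."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


base_id, version = split_arxiv_id(record["arxivId"])
# Time between first submission and the latest update of this record.
revision_lag = parse_utc(record["updated"]) - parse_utc(record["published"])
```

Keying a database on `base_id` rather than the full `arxivId` keeps one row per paper even when later runs pick up a newer version.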

Use Cases

  • AI/ML Research Monitoring — track the latest papers in your field automatically
  • Literature Review — bulk-extract papers for systematic academic reviews
  • Training Data Collection — gather paper metadata for AI research tools
  • Trend Analysis — identify hot topics by analyzing publication volume across categories
  • Citation Tracking — monitor new publications from specific research groups
  • Knowledge Base Building — feed structured paper data into search engines or databases
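For the trend-analysis use case, a simple starting point is to count papers per month and primary category. The sketch below assumes records shaped like the output example; the `sample` list is made-up illustration data, not real scraper output, and `category_counts_by_month` is a hypothetical helper.

```python
from collections import Counter


def category_counts_by_month(records):
    """Count papers per (YYYY-MM, primaryCategory) pair."""
    counts = Counter()
    for rec in records:
        month = rec["published"][:7]  # "2024-03-15T18:00:00Z" -> "2024-03"
        counts[(month, rec["primaryCategory"])] += 1
    return counts


# Made-up records for illustration only.
sample = [
    {"published": "2024-03-15T18:00:00Z", "primaryCategory": "cs.CL"},
    {"published": "2024-03-20T09:30:00Z", "primaryCategory": "cs.CL"},
    {"published": "2024-03-21T11:00:00Z", "primaryCategory": "cs.LG"},
    {"published": "2024-04-02T08:00:00Z", "primaryCategory": "cs.CL"},
]

counts = category_counts_by_month(sample)
busiest = counts.most_common(1)[0]
```

Run against a daily scheduled scrape, the same counter gives a month-over-month view of which categories are growing.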

Input Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| searchQueries | array | [] | Keywords to search (e.g., "transformer architecture") |
| categories | array | [] | arXiv category codes (e.g., "cs.AI", "cs.LG") |
| maxPapersPerSource | integer | 50 | Max papers per query/category (1-500) |
| sortBy | string | submittedDate | Sort by: submittedDate, lastUpdatedDate, relevance |
| sortOrder | string | descending | ascending or descending |
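To make the defaults and allowed values above concrete, here is a minimal sketch of how an input object could be normalized before a run. It is not the actor's actual validation code: `normalize_input` is a hypothetical helper, clamping `maxPapersPerSource` into the 1-500 range (rather than rejecting out-of-range values) is an assumption, and so is the use of `cs.AI` as the fallback when no input is given.

```python
ALLOWED_SORT_BY = {"submittedDate", "lastUpdatedDate", "relevance"}
ALLOWED_SORT_ORDER = {"ascending", "descending"}


def normalize_input(raw=None):
    """Apply the documented defaults and basic checks to an input dict."""
    raw = raw or {}
    queries = [q.strip() for q in raw.get("searchQueries", []) if q.strip()]
    categories = [c.strip() for c in raw.get("categories", []) if c.strip()]

    if not queries and not categories:
        # Assumption: "latest AI papers" is modeled here as the cs.AI category.
        categories = ["cs.AI"]

    # Assumption: out-of-range limits are clamped instead of rejected.
    limit = max(1, min(500, int(raw.get("maxPapersPerSource", 50))))

    sort_by = raw.get("sortBy", "submittedDate")
    sort_order = raw.get("sortOrder", "descending")
    if sort_by not in ALLOWED_SORT_BY:
        raise ValueError(f"unsupported sortBy: {sort_by!r}")
    if sort_order not in ALLOWED_SORT_ORDER:
        raise ValueError(f"unsupported sortOrder: {sort_order!r}")

    return {
        "searchQueries": queries,
        "categories": categories,
        "maxPapersPerSource": limit,
        "sortBy": sort_by,
        "sortOrder": sort_order,
    }


run_input = normalize_input({
    "searchQueries": ["transformer architecture"],
    "categories": ["cs.LG"],
    "maxPapersPerSource": 100,
    "sortBy": "relevance",
})
```

Each entry in `searchQueries` and `categories` counts as its own source, so the input above could yield up to 200 papers in total.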

How It Works

The scraper queries the official arXiv API, which returns paper metadata as an Atom XML feed. It parses each entry into structured fields (authors, categories, links, and the other fields shown in the output example), then paginates automatically until it reaches the configured limit for each source. A 3-second delay between requests keeps it within arXiv's rate-limiting policy.
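The flow described above can be sketched with the Python standard library alone. This is an illustrative approximation, not the actor's source code: `parse_feed` and `fetch_all` are hypothetical names, the page size of 100 is an assumption, and `SAMPLE_FEED` is a hand-written entry that mimics the shape of an arXiv Atom response.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ATOM = "{http://www.w3.org/2005/Atom}"
ARXIV = "{http://arxiv.org/schemas/atom}"


def parse_feed(xml_text):
    """Turn an arXiv Atom feed into a list of dicts like the output example."""
    papers = []
    for entry in ET.fromstring(xml_text).findall(f"{ATOM}entry"):
        abs_url = entry.findtext(f"{ATOM}id", default="")
        pdf_url = next((link.get("href") for link in entry.findall(f"{ATOM}link")
                        if link.get("title") == "pdf"), None)
        primary = entry.find(f"{ARXIV}primary_category")
        papers.append({
            "arxivId": abs_url.rsplit("/abs/", 1)[-1],
            # Titles and abstracts can contain line breaks; collapse whitespace.
            "title": " ".join(entry.findtext(f"{ATOM}title", default="").split()),
            "authors": [a.findtext(f"{ATOM}name") for a in entry.findall(f"{ATOM}author")],
            "abstract": " ".join(entry.findtext(f"{ATOM}summary", default="").split()),
            "categories": [c.get("term") for c in entry.findall(f"{ATOM}category")],
            "primaryCategory": primary.get("term") if primary is not None else None,
            "published": entry.findtext(f"{ATOM}published"),
            "updated": entry.findtext(f"{ATOM}updated"),
            "doi": entry.findtext(f"{ARXIV}doi"),
            "journalRef": entry.findtext(f"{ARXIV}journal_ref"),
            "pdfUrl": pdf_url,
            "abstractUrl": abs_url,
        })
    return papers


def fetch_all(search_query, limit=50, page_size=100, delay=3.0):
    """Page through results, waiting `delay` seconds between requests."""
    papers = []
    while len(papers) < limit:
        if papers:
            time.sleep(delay)  # the 3-second pause between requests
        params = urlencode({
            "search_query": search_query,
            "start": len(papers),
            "max_results": min(page_size, limit - len(papers)),
        })
        with urllib.request.urlopen(f"http://export.arxiv.org/api/query?{params}") as resp:
            batch = parse_feed(resp.read())
        if not batch:  # no more results for this source
            break
        papers.extend(batch)
    return papers[:limit]


# Hand-written sample that mimics one entry of an arXiv response.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:arxiv="http://arxiv.org/schemas/atom">
  <entry>
    <id>http://arxiv.org/abs/2403.12345v1</id>
    <updated>2024-03-16T12:00:00Z</updated>
    <published>2024-03-15T18:00:00Z</published>
    <title>Scaling Laws for
      Neural Language Models</title>
    <summary>We investigate the scaling behavior...</summary>
    <author><name>John Smith</name></author>
    <author><name>Jane Doe</name></author>
    <link href="http://arxiv.org/abs/2403.12345v1" rel="alternate" type="text/html"/>
    <link title="pdf" href="http://arxiv.org/pdf/2403.12345v1" rel="related" type="application/pdf"/>
    <arxiv:primary_category term="cs.CL"/>
    <category term="cs.CL"/>
    <category term="cs.AI"/>
  </entry>
</feed>"""

papers = parse_feed(SAMPLE_FEED)
```

Parsing is kept separate from fetching so the field extraction can be checked offline against a saved feed; a call such as `fetch_all("cat:cs.AI", limit=150)` would then issue two requests with one pause between them.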