
Arxiv Paper Scraper

Pricing: Pay per usage


Rating: 0.0 (0 ratings)

Developer: Donny Nguyen (maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 3 days ago


Overview

arXiv Paper Scraper extracts academic paper metadata from arXiv.org using the official public API. It collects paper titles, complete author lists, abstracts, arXiv categories, publication dates, PDF download links, and DOIs (when available). This actor is designed for researchers, data scientists, and academic professionals who need to systematically collect and analyze research paper data from one of the largest open-access preprint repositories in the world.
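As a rough illustration of what calling arXiv's public API involves (not this actor's actual code), the sketch below builds a request URL for arXiv's query endpoint and downloads the Atom feed it returns, using only the Python standard library. The endpoint and the search_query, start, max_results, sortBy and sortOrder parameters belong to the arXiv API; the function names build_query_url and fetch_feed, and the choice to sort by submission date, are assumptions made for this example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# arXiv's public query endpoint; responses are Atom 1.0 XML feeds.
ARXIV_API = "http://export.arxiv.org/api/query"


def build_query_url(search_query: str, start: int = 0, max_results: int = 50) -> str:
    """Build a query URL for the arXiv API (hypothetical helper; sort order is an assumption)."""
    params = {
        "search_query": search_query,  # e.g. 'all:"large language models"'
        "start": start,                # zero-based offset, used for paging
        "max_results": max_results,    # number of entries in this page
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    # urlencode percent-encodes ':' and '"' and turns spaces into '+'.
    return f"{ARXIV_API}?{urlencode(params)}"


def fetch_feed(url: str, timeout: float = 30.0) -> str:
    """Download one Atom feed page as text (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, build_query_url('all:"large language models"', max_results=5) yields a URL whose search_query parameter is encoded as all%3A%22large+language+models%22; passing that URL to fetch_feed returns the raw XML feed.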

Features

  • Search arXiv for multiple keywords or topics in a single run
  • Filter results by arXiv categories (cs.AI, cs.CL, math.CO, etc.)
  • Extract comprehensive paper metadata including abstracts and author lists
  • Direct links to PDF downloads and abstract pages
  • Configurable result limits per search term
  • Uses the official arXiv API for reliable and structured data extraction
  • Built-in rate limiting to respect arXiv API guidelines
  • Fallback data ensures results are always returned
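Keyword search with category filters maps onto arXiv's query syntax, which supports field prefixes such as all: and cat:, the boolean operators AND, OR and ANDNOT, and parentheses for grouping. The hypothetical helpers below show one way to combine a search term with several categories; the actor may build its queries differently.

```python
def build_search_query(term: str, categories: list[str]) -> str:
    """Combine one keyword phrase with an OR-group of arXiv category filters."""
    # Quote multi-word terms so arXiv treats them as a phrase.
    query = f'all:"{term}"' if " " in term else f"all:{term}"
    if categories:
        cat_filter = " OR ".join(f"cat:{c}" for c in categories)
        query = f"{query} AND ({cat_filter})"
    return query


def build_queries(search_terms: list[str], categories: list[str]) -> dict[str, str]:
    """One query per search term, so each result can record which term found it."""
    return {term: build_search_query(term, categories) for term in search_terms}
```

Running one query per term, rather than OR-ing all terms into a single query, makes it straightforward to fill the searchTerm output field.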

Input Parameters

  • searchTerms (array, default: ["large language models", "transformer architecture"]) - Keywords to search
  • categories (array, default: ["cs.AI", "cs.CL"]) - arXiv category filters
  • maxResults (integer, default: 200) - Maximum number of papers to extract
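The sketch below assumes the actor receives its input as a JSON object with the field names and defaults listed above; it fills in missing fields and rejects obviously invalid values. resolve_input is a hypothetical helper for illustration, not part of the actor.

```python
import copy
import json

# Defaults as documented in the Input Parameters list.
DEFAULTS = {
    "searchTerms": ["large language models", "transformer architecture"],
    "categories": ["cs.AI", "cs.CL"],
    "maxResults": 200,
}


def resolve_input(raw: dict) -> dict:
    """Merge user input over the defaults and validate types (hypothetical helper)."""
    merged = copy.deepcopy(DEFAULTS)  # copy so callers cannot mutate the defaults
    merged.update(raw)
    for key in ("searchTerms", "categories"):
        value = merged[key]
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f"{key} must be an array of strings")
    limit = merged["maxResults"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int) or limit < 1:
        raise ValueError("maxResults must be a positive integer")
    return merged


run_input = resolve_input(json.loads('{"searchTerms": ["graph neural networks"], "maxResults": 50}'))
```

Here run_input keeps the default categories (cs.AI, cs.CL) while overriding the search terms and the result limit.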

Output Format

Each paper in the dataset includes:

  • title - Paper title
  • authors - Comma-separated list of authors
  • abstract - Paper abstract
  • categories - arXiv categories assigned to the paper
  • publishedDate - Original publication date
  • updatedDate - Last update date
  • pdfUrl - Direct link to PDF download
  • doi - Digital Object Identifier (if available)
  • arxivId - arXiv paper identifier
  • absUrl - Link to abstract page
  • searchTerm - The search term that found this paper
  • scrapedAt - Timestamp of data extraction
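To show how these fields relate to arXiv's Atom feed, here is a hedged sketch that maps each entry element to a record with the field names above. The element names (id, title, summary, author/name, category, published, updated, link with title="pdf", and arxiv:doi) follow arXiv's API feed format; representing categories as a list and deriving arxivId from the abstract URL are assumptions, and the actor's real parser may differ.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespaces used in arXiv API feeds.
NS = {"atom": "http://www.w3.org/2005/Atom", "arxiv": "http://arxiv.org/schemas/atom"}


def _text(elem: ET.Element, path: str) -> str:
    """Return an element's text with runs of whitespace collapsed ('' if missing)."""
    return " ".join(elem.findtext(path, default="", namespaces=NS).split())


def parse_entries(feed_xml: str, search_term: str) -> list[dict]:
    """Map each Atom <entry> to a record using this actor's output field names."""
    root = ET.fromstring(feed_xml)
    scraped_at = datetime.now(timezone.utc).isoformat()
    records = []
    for entry in root.findall("atom:entry", NS):
        abs_url = _text(entry, "atom:id")  # e.g. http://arxiv.org/abs/<id>v<version>
        pdf_url = next(
            (link.get("href", "") for link in entry.findall("atom:link", NS)
             if link.get("title") == "pdf"),
            "",
        )
        records.append({
            "title": _text(entry, "atom:title"),
            "authors": ", ".join(_text(a, "atom:name") for a in entry.findall("atom:author", NS)),
            "abstract": _text(entry, "atom:summary"),
            "categories": [c.get("term", "") for c in entry.findall("atom:category", NS)],
            "publishedDate": _text(entry, "atom:published"),
            "updatedDate": _text(entry, "atom:updated"),
            "pdfUrl": pdf_url,
            "doi": _text(entry, "arxiv:doi") or None,  # only present for some papers
            "arxivId": abs_url.rsplit("/abs/", 1)[-1],  # keeps the version suffix
            "absUrl": abs_url,
            "searchTerm": search_term,
            "scrapedAt": scraped_at,
        })
    return records
```

Collapsing whitespace matters because arXiv titles and abstracts often contain hard line breaks from the original submission.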

Use Cases

This scraper is perfect for conducting systematic literature reviews across research domains, tracking publication trends in AI and machine learning, building citation databases for academic projects, monitoring new research in specific arXiv categories, creating reading lists for research groups, and aggregating paper metadata for bibliometric analysis. The structured output enables easy import into reference managers and research databases.
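For trend tracking and bibliometric analysis, a few lines of standard-library Python can aggregate exported records. This sketch counts unique papers per category and month. It assumes publishedDate is an ISO 8601 string (for example 2024-01-15T00:00:00Z) and accepts categories either as a list or as a comma-separated string, since the exact export format is not specified here; papers_per_month is a hypothetical name.

```python
from collections import Counter


def papers_per_month(records: list[dict]) -> Counter:
    """Count unique papers per (category, 'YYYY-MM') of their publishedDate."""
    counts: Counter = Counter()
    seen: set[str] = set()
    for rec in records:
        paper_id = rec.get("arxivId")
        if paper_id:
            if paper_id in seen:
                continue  # same paper returned by another search term
            seen.add(paper_id)
        month = rec["publishedDate"][:7]  # assumes ISO 8601, e.g. "2024-01-15T..."
        cats = rec["categories"]
        if isinstance(cats, str):  # tolerate a comma-separated string
            cats = [c.strip() for c in cats.split(",") if c.strip()]
        for cat in cats:
            counts[(cat, month)] += 1
    return counts
```

Deduplicating by arxivId matters because the same paper can match several search terms and would otherwise be counted once per term.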

Pricing

This actor uses pay-per-event pricing at $0.30 per 1,000 papers scraped. Because the arXiv API itself is free, costs stay low, and there are no subscription fees or minimum commitments. At this rate, a typical run extracting 200 papers costs about $0.06.
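The per-paper rate makes run costs easy to estimate. This sketch assumes the only charge is the $0.30 per 1,000 papers stated above and uses Decimal to avoid floating-point rounding in currency amounts; estimate_cost is a hypothetical name.

```python
from decimal import Decimal

# Rate stated in the Pricing section: $0.30 per 1,000 papers.
PRICE_PER_1000_PAPERS = Decimal("0.30")


def estimate_cost(papers: int) -> Decimal:
    """Estimated charge in USD for a run that returns `papers` results."""
    if papers < 0:
        raise ValueError("papers must be non-negative")
    return PRICE_PER_1000_PAPERS * papers / 1000
```

For instance, estimate_cost(200) evaluates to 0.06 (six cents), and estimate_cost(10000) to 3.00.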

Limitations

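  • The arXiv API asks clients to wait 3 seconds between requests, so large runs take longer to complete
  • Result counts are bounded by the arXiv API, which serves at most 30,000 results per query in pages of at most 2,000, so very broad searches cannot be retrieved exhaustively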
  • Full-text content is not extracted, only metadata and abstracts
  • Some older papers may have incomplete metadata or missing DOIs

Built by consummate_mandala on Apify.