Arxiv Paper Scraper
Pricing
Pay per usage
Developer

Donny Nguyen
3 days ago
Last modified
Overview
arXiv Paper Scraper extracts academic paper metadata from arXiv.org using the official public API. It collects paper titles, complete author lists, abstracts, arXiv categories, publication dates, PDF download links, and DOI identifiers. This actor is designed for researchers, data scientists, and academic professionals who need to systematically collect and analyze research paper data from one of the largest open-access preprint repositories in the world.
Features
- Search arXiv papers by multiple keywords or topics simultaneously
- Filter results by arXiv categories (cs.AI, cs.CL, math.CO, etc.)
- Extract comprehensive paper metadata including abstracts and author lists
- Direct links to PDF downloads and abstract pages
- Configurable result limits per search term
- Uses the official arXiv API for reliable and structured data extraction
- Built-in rate limiting to respect arXiv API guidelines
- Fallback data ensures results are always returned
Input Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| searchTerms | array | ["large language models", "transformer architecture"] | Keywords to search |
| categories | array | ["cs.AI", "cs.CL"] | arXiv category filters |
| maxResults | integer | 200 | Maximum number of papers to extract |
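As a rough illustration of how these inputs could map onto the public arXiv API, the sketch below builds one query URL per search term and restricts it to the listed categories. The helper name `build_query_url` is hypothetical, and applying `maxResults` per term is an assumption; the actor's actual internals are not documented here.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"  # public arXiv API endpoint


def build_query_url(term, categories, max_results=200, start=0):
    """Build an arXiv API URL for one search term (hypothetical helper)."""
    # Quote the term so a multi-word keyword is matched as a phrase.
    query = f'all:"{term}"'
    if categories:
        # Any of the listed categories may match.
        cat_filter = " OR ".join(f"cat:{c}" for c in categories)
        query = f"{query} AND ({cat_filter})"
    params = {"search_query": query, "start": start, "max_results": max_results}
    return f"{ARXIV_API}?{urlencode(params)}"


# Default input values from the table above.
actor_input = {
    "searchTerms": ["large language models", "transformer architecture"],
    "categories": ["cs.AI", "cs.CL"],
    "maxResults": 200,
}

urls = [
    build_query_url(term, actor_input["categories"], actor_input["maxResults"])
    for term in actor_input["searchTerms"]
]
```

Each URL in `urls` can then be fetched with any HTTP client; the response is an Atom XML feed.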
Output Format
Each paper in the dataset includes:
- `title` - Paper title
- `authors` - Comma-separated list of authors
- `abstract` - Paper abstract
- `categories` - arXiv categories assigned to the paper
- `publishedDate` - Original publication date
- `updatedDate` - Last update date
- `pdfUrl` - Direct link to PDF download
- `doi` - Digital Object Identifier (if available)
- `arxivId` - arXiv paper identifier
- `absUrl` - Link to abstract page
- `searchTerm` - The search term that found this paper
- `scrapedAt` - Timestamp of data extraction
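To show how a record with these fields might be derived from an arXiv API response, here is a minimal sketch that parses an Atom feed with the standard library. The sample entry is a made-up placeholder, the function names are hypothetical, and representing `categories` as a list is an assumption; the actor's real output may differ in detail.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

NS = {"atom": "http://www.w3.org/2005/Atom", "arxiv": "http://arxiv.org/schemas/atom"}

# Made-up placeholder entry shaped like an arXiv Atom response (not a real paper).
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:arxiv="http://arxiv.org/schemas/atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <updated>2000-01-02T00:00:00Z</updated>
    <published>2000-01-01T00:00:00Z</published>
    <title>An Example
      Title</title>
    <summary>  Placeholder abstract text.  </summary>
    <author><name>Ada Example</name></author>
    <author><name>Bob Sample</name></author>
    <link title="pdf" href="http://arxiv.org/pdf/0000.00000v1" rel="related" type="application/pdf"/>
    <category term="cs.AI" scheme="http://arxiv.org/schemas/atom"/>
    <category term="cs.CL" scheme="http://arxiv.org/schemas/atom"/>
  </entry>
</feed>"""


def _text(entry, path):
    """Return whitespace-normalized text of a child element, or None if absent."""
    value = entry.findtext(path, default=None, namespaces=NS)
    return " ".join(value.split()) if value else None


def entry_to_record(entry, search_term):
    """Map one Atom <entry> to the output fields listed above."""
    abs_url = _text(entry, "atom:id")
    authors = [a.findtext("atom:name", default="", namespaces=NS).strip()
               for a in entry.findall("atom:author", NS)]
    pdf_url = next((link.get("href") for link in entry.findall("atom:link", NS)
                    if link.get("title") == "pdf"), None)
    return {
        "title": _text(entry, "atom:title"),
        "authors": ", ".join(authors),
        "abstract": _text(entry, "atom:summary"),
        "categories": [c.get("term") for c in entry.findall("atom:category", NS)],
        "publishedDate": _text(entry, "atom:published"),
        "updatedDate": _text(entry, "atom:updated"),
        "pdfUrl": pdf_url,
        "doi": _text(entry, "arxiv:doi"),  # None when the paper has no DOI
        "arxivId": abs_url.rsplit("/abs/", 1)[-1] if abs_url else None,
        "absUrl": abs_url,
        "searchTerm": search_term,
        "scrapedAt": datetime.now(timezone.utc).isoformat(),
    }


def parse_feed(xml_text, search_term):
    """Parse an Atom feed string into a list of records."""
    root = ET.fromstring(xml_text)
    return [entry_to_record(e, search_term) for e in root.findall("atom:entry", NS)]


records = parse_feed(SAMPLE_FEED, "example term")
```

Splitting the identifier on `/abs/` rather than the last slash keeps older identifiers that contain a slash intact.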
Use Cases
Typical uses include conducting systematic literature reviews across research domains, tracking publication trends in AI and machine learning, building citation databases for academic projects, monitoring new research in specific arXiv categories, creating reading lists for research groups, and aggregating paper metadata for bibliometric analysis. The structured output can be imported into reference managers and research databases.
Pricing
This actor uses pay-per-event pricing at $0.30 per 1,000 papers scraped. Because it relies on the free arXiv API, there are no upstream data costs, and no subscription fees or minimum commitments are required. At this rate, a typical run extracting 200 papers costs about $0.06.
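The per-paper rate makes run costs easy to estimate. The sketch below assumes the charge scales linearly with the number of papers and ignores any other platform fees; `estimate_cost` is a hypothetical helper, not part of the actor.

```python
from decimal import Decimal

PRICE_PER_1000_PAPERS = Decimal("0.30")  # USD, the stated pay-per-event rate


def estimate_cost(papers):
    """Estimated charge in USD for a number of scraped papers (linear assumption)."""
    return PRICE_PER_1000_PAPERS * papers / 1000


cost_200 = estimate_cost(200)    # Decimal("0.06"), i.e. six cents
cost_5000 = estimate_cost(5000)  # Decimal("1.50")
```

`Decimal` is used instead of floats so that small currency amounts stay exact.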
Limitations
- arXiv API has rate limits requiring 3-second delays between requests
- Search results are limited by the arXiv API capabilities
- Full-text content is not extracted, only metadata and abstracts
- Some older papers may have incomplete metadata or missing DOIs
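The 3-second delay between requests can be enforced with a small throttle. This is a minimal sketch, not the actor's own rate limiter: the `Throttle` class and `fetch_all` helper are hypothetical, and the clock and sleep functions are injectable only to make the behavior easy to test.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, min_interval=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, None before the first

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # Sleep only for the part of the interval not yet elapsed.
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch_all(urls, fetch, throttle):
    """Call fetch(url) for each URL, pausing between requests as needed."""
    results = []
    for url in urls:
        throttle.wait()
        results.append(fetch(url))
    return results
```

Here `fetch` is any callable that takes a URL and returns a response, so the throttle stays independent of the HTTP client in use.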
Built by consummate_mandala on Apify.