arXiv Scraper
Pricing: Pay per event
Comprehensive arXiv scraper for extracting scholarly article data across physics, math, CS, biology, finance, statistics, engineering, and economics. Automates access to arXiv's large preprint archive, providing structured metadata for researchers, academics, and data scientists.
Developer: ParseForge
arXiv Scraper
Collect research papers from arXiv.org, the world's largest preprint repository with 2.4 million scholarly articles. Whether you're building a literature review database, monitoring research trends, or conducting academic research, this tool gets you the data you need without coding. It's an easy way to download arXiv paper data as CSV, search arXiv by keyword without technical skills, or collect academic preprints for analysis.
The arXiv Scraper collects up to 1 million research papers in one run, including titles, abstracts, authors, submission dates, and PDF links, with flexible search filters for any research domain.
What Does It Do
- Paper Title and arXiv ID - identify unique papers and track them by official identifiers for citation and reference
- Author Names and Affiliations - build researcher profiles, track prolific contributors, and identify co-author networks for collaboration insights
- Full Abstract Text - analyze research summaries to filter papers by methodology, findings, or relevance before downloading full PDFs
- Subject Categories and Classifications - organize papers by field (physics, computer science, mathematics, etc.) to focus on your domain of interest
- Submission and Update Dates - monitor when papers were first published and revised to track research velocity and stay current with trends
- Direct PDF Links - access download URLs to automatically retrieve full papers for archiving or reading
- Comments and Journal References - capture publication status, peer review information, and journal placement to verify research impact
Input
- Search Query - Enter keywords like "machine learning", "quantum computing", or "protein folding" to find papers in seconds
- Search Category - Choose which archive to search (All, Physics, Mathematics, Computer Science, Quantitative Biology, Finance, Statistics, Electrical Engineering, or Economics)
- Sort Results - Order papers by announcement date, submission date, or relevance to match your research workflow
- Search Field - Narrow results to specific fields like title, author, abstract, or full text for precise searches
- Show Abstracts - Include or exclude paper abstracts in results to reduce data size for faster downloads
- Sort By - Choose between relevance, submission date, or last updated date sorting
- Max Items - Collect anywhere from 1 to 1,000,000 papers (free users are limited to 100 per run)
- Start URL - For advanced users: paste a custom arXiv.org search URL to apply your own filters and parameters
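For the Start URL option, a custom search URL can be assembled rather than copied by hand. The sketch below is only an illustration: the query parameter names (`query`, `searchtype`, `abstracts`, `order`, `size`) are assumptions based on how arXiv.org search URLs commonly look in the browser, and `build_search_url` is a hypothetical helper, not part of the Actor. Compare the result with a URL copied from arXiv's own search page before relying on it.

```python
from urllib.parse import urlencode

# Assumed base path of the arXiv.org search page.
ARXIV_SEARCH_BASE = "https://arxiv.org/search/"


def build_search_url(
    query: str,
    search_type: str = "all",
    show_abstracts: bool = True,
    order: str = "-announced_date_first",
    size: int = 50,
) -> str:
    """Return an arXiv search URL (parameter names are assumptions)."""
    params = {
        "query": query,
        "searchtype": search_type,
        "abstracts": "show" if show_abstracts else "hide",
        "order": order,
        "size": size,
    }
    # urlencode escapes spaces and special characters in the query.
    return f"{ARXIV_SEARCH_BASE}?{urlencode(params)}"


url = build_search_url("protein folding", search_type="title")
print(url)
```

The printed URL could then be pasted into the Start URL field in place of the individual search inputs.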
Example input:
{"searchQuery": "machine learning", "searchFor": "cs", "maxItems": 50, "sortBy": "submittedDate", "showAbstracts": true}
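If you generate inputs programmatically, it can help to check them before starting a run. The following is a minimal sketch that reuses the field names from the example input and the item limits stated in this README (1 to 1,000,000, or 100 on the free tier). The function `build_run_input` and its `free_tier` flag are hypothetical, and the set of values the Actor accepts for `searchFor` and `sortBy` is not listed here, so those are passed through unchecked.

```python
import json

MAX_ITEMS_LIMIT = 1_000_000  # documented upper bound per run
FREE_TIER_LIMIT = 100        # documented free-tier cap per run


def build_run_input(search_query, search_for, max_items=50,
                    sort_by="submittedDate", show_abstracts=True,
                    free_tier=False):
    """Build an input dict shaped like the example, validating maxItems."""
    if not search_query.strip():
        raise ValueError("search_query must not be empty")
    limit = FREE_TIER_LIMIT if free_tier else MAX_ITEMS_LIMIT
    if not 1 <= max_items <= limit:
        raise ValueError(f"max_items must be between 1 and {limit}")
    return {
        "searchQuery": search_query,
        "searchFor": search_for,
        "maxItems": max_items,
        "sortBy": sort_by,
        "showAbstracts": show_abstracts,
    }


run_input = build_run_input("machine learning", search_for="cs")
print(json.dumps(run_input, indent=2))
```

The resulting JSON matches the example input and can be pasted into the Actor's input editor.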
Output
Each paper includes up to 18 data fields. Download as JSON, CSV, or Excel.
| Paper Title | arXiv ID | Abstract |
|---|---|---|
| Author Names | Author Affiliations | Subject Categories |
| Submission Date | Last Updated Date | Comments |
| Detail Page URL | PDF Download URL | Journal Reference |
| DOI Identifier | License Type | Subject Classifications |
| Related Papers | Thumbnail Image | Scraped Timestamp |
| Error Messages | | |
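Because each paper includes "up to" 18 fields, records in a JSON export may not all share the same keys. The sketch below shows one way to flatten such records into CSV while tolerating missing fields. The key names in `sample` (`title`, `arxivId`, `authors`, `doi`) are hypothetical placeholders, since the exact output keys are not documented here, and `records_to_csv` is an illustrative helper rather than part of the Actor.

```python
import csv
import io


def records_to_csv(records):
    """Write dict records to CSV text, using the union of all keys as columns."""
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)  # keep first-seen column order

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    for record in records:
        # Join list values (for example author names) into one cell.
        row = {
            key: "; ".join(value) if isinstance(value, list) else value
            for key, value in record.items()
        }
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical records; real exports may use different key names.
sample = [
    {"title": "Paper A", "arxivId": "2401.00001", "authors": ["Ada L.", "Alan T."]},
    {"title": "Paper B", "arxivId": "2401.00002", "doi": "10.0000/example"},
]
csv_text = records_to_csv(sample)
print(csv_text)
```

Missing values are written as empty cells, so every row has the same number of columns.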
Why Choose the arXiv Scraper?
| Feature | Our Actor | Similar Tools |
|---|---|---|
| Direct search by keyword, category, and field | ✔️ | ❌ |
| Collect up to 1 million papers in one run | ✔️ | ❌ |
| Export to CSV, Excel, or JSON | ✔️ | Partial |
| Sort by relevance, submission date, or update date | ✔️ | ❌ |
| Extract full abstracts for analysis | ✔️ | Partial |
| Collect author names and affiliations | ✔️ | Partial |
| Get direct PDF download links | ✔️ | Partial |
| Retrieve DOI and journal references | ✔️ | ❌ |
| Filter by multiple subject categories | ✔️ | ❌ |
| Built-in duplicate detection | ✔️ | ❌ |
| Free tier available (up to 100 papers) | ✔️ | ✔️ |
| Residential proxy support | ✔️ | ❌ |
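The table mentions built-in duplicate detection. How the Actor implements it is not described, so the following is only a sketch of one common approach when merging results from several runs yourself: treating different versions of a paper (for example `2401.00001v1` and `2401.00001v2`) as the same record by stripping the version suffix from the arXiv ID. The `arxivId` key and both helper functions are hypothetical.

```python
import re

# arXiv IDs may end with a version suffix such as "v2".
_VERSION_SUFFIX = re.compile(r"v\d+$")


def base_arxiv_id(arxiv_id: str) -> str:
    """Return the arXiv ID without its trailing version suffix."""
    return _VERSION_SUFFIX.sub("", arxiv_id.strip())


def deduplicate(records, id_key="arxivId"):
    """Keep the first record seen for each version-independent arXiv ID."""
    seen = set()
    unique = []
    for record in records:
        key = base_arxiv_id(record[id_key])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


papers = [
    {"arxivId": "2401.00001v1", "title": "Paper A"},
    {"arxivId": "2401.00001v2", "title": "Paper A (revised)"},
    {"arxivId": "hep-th/9901001", "title": "Paper B"},
]
unique_papers = deduplicate(papers)
print([paper["arxivId"] for paper in unique_papers])
```

Keeping the first occurrence is a design choice; if the latest revision matters more, sort the records by update date (newest first) before deduplicating.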
How to Use
No technical skills required. Follow these simple steps:
- Sign Up: Create a free account with $5 credit
- Find the Tool: Search for "arXiv Scraper" in the Apify Store and configure your search query or URL
- Run It: Click "Start" and watch your papers appear
That's it. No coding, no setup, no complicated configuration. Now you can export your data in CSV, Excel, or JSON format.
Business Use Cases
- Research Scientist - Monitor new papers in your field every week to stay current with breakthroughs and adapt your research direction before competitors do
- Literature Review Manager - Collect 500+ papers on a topic with abstracts to build a searchable database and eliminate days of manual paper hunting
- AI/ML Engineer - Track transformer architecture papers, fine-tuning techniques, and benchmark improvements to compare your models against state-of-the-art approaches
FAQ
How does the scraper work? It submits your search query with filters to arXiv and collects paper metadata including titles, authors, abstracts, and download links. Each paper's information is formatted into structured data ready for analysis.
How accurate is the data? The data is pulled directly from arXiv, so you get exactly what appears on the website at the time of the run, including submission dates, author lists, and abstracts.
Can I schedule this to run automatically? Yes. Use the scheduling feature in your Apify account to collect papers daily, weekly, or monthly. You can also integrate it with Make or Zapier for automated workflows.
Is scraping arXiv legal? Yes. arXiv is a public repository designed for open access and research sharing, and its data is publicly available. However, you're responsible for complying with arXiv's terms of service and any local data laws in your country.
Will arXiv block my IP? Unlikely. arXiv allows automated access for research purposes and does not actively block scrapers. We recommend using residential proxies if you're collecting large volumes (100,000+ papers) to avoid any potential rate limiting.
How long does a run take? It depends on how many papers you're collecting. Expect roughly 1-2 seconds per paper. Collecting 100 papers takes 2-3 minutes, while 1,000 papers takes 20-30 minutes.
Are there any limits? Free users can collect up to 100 results per run. Paid users can collect up to 1,000,000 results per run.
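The run-time figures in the FAQ follow from the rough rate of 1-2 seconds per paper. As a quick sanity check before a large run, the arithmetic can be sketched as below. The function name `estimate_run_minutes` is hypothetical, and the rate is only the approximate figure quoted above, so actual run times may differ.

```python
def estimate_run_minutes(paper_count, seconds_per_paper=(1.0, 2.0)):
    """Return a (low, high) run-time estimate in minutes."""
    low_rate, high_rate = seconds_per_paper
    return paper_count * low_rate / 60, paper_count * high_rate / 60


for count in (100, 1000):
    low, high = estimate_run_minutes(count)
    print(f"{count} papers: about {low:.1f} to {high:.1f} minutes")
# 100 papers: about 1.7 to 3.3 minutes
# 1000 papers: about 16.7 to 33.3 minutes
```

The quoted ranges of 2-3 minutes and 20-30 minutes sit inside these bounds.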
Integrate arXiv Scraper with any app
- Make - Automate workflows
- Zapier - Connect 5000+ apps
- GitHub - Version control integration
- Slack - Get notifications
- Airbyte - Data pipelines
- Google Drive - Export to spreadsheets
More ParseForge Actors
- Google News Scraper - Monitor the news automatically with flexible date filtering and multi-language support
- Open Library Authors Scraper - Discover rich author profiles and book lists from Open Library
- Indeed Scraper - Collect detailed job listings for market research and recruiting intelligence
- Fisher Scientific Product Scraper - Collect product data from Fisher Scientific for procurement teams
- Huawei App Gallery Scraper - Explore app downloads, ratings, and trends across multiple regions
Browse our complete collection of data extraction tools for more.
Ready to Start?
Create a free account with $5 credit and collect your first 100 results for free. No coding, no setup.
Need Help?
- Check the FAQ section above for common questions
- Visit the Apify support page for documentation and tutorials
- Contact us via the Tally contact form to request a new scraper, propose a custom project, or report an issue
Disclaimer
This Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by arXiv or Cornell University. All trademarks mentioned are the property of their respective owners.