arXiv Scraper

Comprehensive arXiv scraper for extracting scholarly article data across physics, math, CS, biology, finance, statistics, engineering, and economics. Automates access to arXiv’s large preprint archive, providing structured metadata for researchers, academics, and data scientists.

Pricing: Pay per event

Rating: 5.0 (1)

Developer: ParseForge (Maintained by Community)

Actor stats: 0 bookmarked, 7 total users, 4 monthly active users, last modified 2 days ago

📚 arXiv Scraper

Collect research papers from arXiv.org, the world's largest preprint repository with 2.4 million scholarly articles. Whether you're building a literature review database, monitoring research trends, or conducting academic research, this tool gets you the data you need without coding. It's an easy way to export arXiv paper data to CSV, search arXiv research by keyword without technical skills, or collect academic preprints for analysis.

The arXiv Scraper collects up to 1 million research papers in one run, including titles, abstracts, authors, submission dates, and PDF links, with flexible search filters for any research domain.

✨ What Does It Do

  • 📄 Paper Title and arXiv ID - identify unique papers and track them by official identifiers for citation and reference
  • 👥 Author Names and Affiliations - build researcher profiles, track prolific contributors, and identify co-author networks for collaboration insights
  • 📖 Full Abstract Text - analyze research summaries to filter papers by methodology, findings, or relevance before downloading full PDFs
  • 🏷️ Subject Categories and Classifications - organize papers by field (physics, computer science, mathematics, etc.) to focus on your domain of interest
  • 📅 Submission and Update Dates - monitor when papers were first published and revised to track research velocity and stay current with trends
  • 🔗 Direct PDF Links - access download URLs to automatically retrieve full papers for archiving or reading
  • 📝 Comments and Journal References - capture publication status, peer review information, and journal placement to verify research impact

🔧 Input

  • Search Query - Enter keywords like "machine learning", "quantum computing", or "protein folding" to find papers in seconds
  • Search Category - Choose which archive to search (All, Physics, Mathematics, Computer Science, Quantitative Biology, Finance, Statistics, Electrical Engineering, or Economics)
  • Sort Results - Order papers by announcement date, submission date, or relevance to match your research workflow
  • Search Field - Narrow results to specific fields like title, author, abstract, or full text for precise searches
  • Show Abstracts - Include or exclude paper abstracts in results to reduce data size for faster downloads
  • Sort By - Choose between relevance, submission date, or last updated date sorting
  • Max Items - Collect anywhere from 1 to 1,000,000 papers (free users limited to 100 per run)
  • Start URL - For advanced users: paste a custom arXiv.org search URL directly to use your own filters and parameters
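For the Start URL option, a custom search URL can be composed programmatically instead of copied from the browser. The sketch below is only an illustration: the query parameters (`query`, `searchtype`, `abstracts`, `order`, `size`) mirror what the arXiv.org search form appears to produce, and whether the Actor honors each of them is an assumption. The helper name `build_start_url` is hypothetical.

```python
from urllib.parse import urlencode

# Base address of the arXiv search page.
ARXIV_SEARCH_BASE = "https://arxiv.org/search/"


def build_start_url(query: str,
                    search_type: str = "all",
                    show_abstracts: bool = True,
                    order: str = "-announced_date_first",
                    page_size: int = 50) -> str:
    """Compose an arXiv search URL (parameter names are assumptions
    based on the URLs the arXiv search form generates)."""
    params = {
        "query": query,
        "searchtype": search_type,                      # e.g. "all", "title", "author", "abstract"
        "abstracts": "show" if show_abstracts else "hide",
        "order": order,                                 # leading "-" means newest first
        "size": page_size,                              # results per page
    }
    # urlencode escapes spaces as "+" and leaves "-" and "_" untouched.
    return f"{ARXIV_SEARCH_BASE}?{urlencode(params)}"


url = build_start_url("quantum computing", search_type="title")
print(url)
```

The printed URL can be pasted into the Start URL field. Opening it in a browser first is a quick way to confirm that it returns the results you expect.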

Example input:

{
    "searchQuery": "machine learning",
    "searchFor": "cs",
    "maxItems": 50,
    "sortBy": "submittedDate",
    "showAbstracts": true
}
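If you prefer to start runs from a script, the same input can be built and checked in Python before it is sent. This is a minimal sketch, not the Actor's documented interface: the field names come from the example input above, the limits come from the Input list, and `build_run_input` and `run_actor` are hypothetical helpers. `run_actor` uses the `apify-client` package and needs your own API token and the Actor's ID, neither of which is given on this page.

```python
import json

MAX_ITEMS_LIMIT = 1_000_000   # upper bound stated in the Input section
FREE_TIER_LIMIT = 100         # per-run cap for free users


def build_run_input(search_query, search_for="cs", max_items=50,
                    sort_by="submittedDate", show_abstracts=True,
                    free_tier=False):
    """Return an input dict shaped like the example input above."""
    if not search_query.strip():
        raise ValueError("searchQuery must not be empty")
    if not 1 <= max_items <= MAX_ITEMS_LIMIT:
        raise ValueError(f"maxItems must be between 1 and {MAX_ITEMS_LIMIT}")
    if free_tier:
        # Free users are limited to 100 papers per run.
        max_items = min(max_items, FREE_TIER_LIMIT)
    return {
        "searchQuery": search_query,
        "searchFor": search_for,
        "maxItems": max_items,
        "sortBy": sort_by,
        "showAbstracts": show_abstracts,
    }


def run_actor(token, actor_id, run_input):
    """Start a run and return its dataset items.
    Requires `pip install apify-client`; token and actor_id are placeholders."""
    from apify_client import ApifyClient  # imported lazily so the sketch runs without it

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


run_input = build_run_input("machine learning")
print(json.dumps(run_input, indent=2))
```

`run_actor` is deliberately not called here, since it needs real credentials and starts a billable run.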

📊 Output

Each paper includes up to 18 data fields. Download as JSON, CSV, or Excel.

📄 Paper Title | 🆔 arXiv ID | 📖 Abstract
👥 Author Names | 🏢 Author Affiliations | 🏷️ Subject Categories
📅 Submission Date | 🔄 Last Updated Date | 📝 Comments
🔗 Detail Page URL | 📥 PDF Download URL | 🎓 Journal Reference
📊 DOI Identifier | 📜 License Type | 🎯 Subject Classifications
🔗 Related Papers | 🖼️ Thumbnail Image | ⏰ Scraped Timestamp
⚠️ Error Messages
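Once results are exported as JSON, a few lines of Python are enough for a first pass over them. The sketch below merges records from several runs by arXiv ID and counts papers per subject category. The key names `arxivId` and `categories` are assumptions, since this page lists the fields by label only, and the sample records are made-up placeholders rather than real papers.

```python
from collections import Counter

# Made-up placeholder records; the keys "arxivId" and "categories" are assumed names.
sample_items = [
    {"arxivId": "0000.00001", "title": "Placeholder A", "categories": ["cs.LG", "stat.ML"]},
    {"arxivId": "0000.00002", "title": "Placeholder B", "categories": ["cs.LG"]},
    {"arxivId": "0000.00001", "title": "Placeholder A", "categories": ["cs.LG", "stat.ML"]},
]


def dedupe_by_id(items, id_key="arxivId"):
    """Keep the first record seen for each ID; records without an ID are kept as-is."""
    seen = set()
    unique = []
    for item in items:
        paper_id = item.get(id_key)
        if paper_id is not None and paper_id in seen:
            continue
        if paper_id is not None:
            seen.add(paper_id)
        unique.append(item)
    return unique


def papers_per_category(items, categories_key="categories"):
    """Count how many records list each subject category."""
    counts = Counter()
    for item in items:
        counts.update(item.get(categories_key) or [])
    return counts


unique_items = dedupe_by_id(sample_items)
category_counts = papers_per_category(unique_items)
print(len(unique_items), dict(category_counts))
```

For a real export, replace `sample_items` with the list loaded by `json.load` from the downloaded file, and adjust the key names to match the actual field names in your dataset.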

💎 Why Choose the arXiv Scraper?

Feature | Our Actor | Similar Tools
Direct search by keyword, category, and field | ✔️ | ❌
Collect up to 1 million papers in one run | ✔️ | ❌
Export to CSV, Excel, or JSON | ✔️ | Partial
Sort by relevance, submission date, or update date | ✔️ | ❌
Extract full abstracts for analysis | ✔️ | Partial
Collect author names and affiliations | ✔️ | Partial
Get direct PDF download links | ✔️ | Partial
Retrieve DOI and journal references | ✔️ | ❌
Filter by multiple subject categories | ✔️ | ❌
Built-in duplicate detection | ✔️ | ❌
Free tier available (up to 100 papers) | ✔️ | ✔️
Residential proxy support | ✔️ | ❌

📋 How to Use

No technical skills required. Follow these simple steps:

  1. Sign Up: Create a free account with $5 credit
  2. Find the Tool: Search for "arXiv Scraper" in the Apify Store and configure your search query or URL
  3. Run It: Click "Start" and watch your papers appear

That's it. No coding, no setup, no complicated configuration. Now you can export your data in CSV, Excel, or JSON format.

🎯 Business Use Cases

  • 📊 Research Scientist - Monitor new papers in your field every week to stay current with breakthroughs and adapt your research direction before competitors do
  • 📈 Literature Review Manager - Collect 500+ papers on a topic with abstracts to build a searchable database and eliminate days of manual paper hunting
  • 🤖 AI/ML Engineer - Track transformer architecture papers, fine-tuning techniques, and benchmark improvements to compare your models against state-of-the-art approaches

❓ FAQ

🔍 How does the scraper work? It submits your search query with filters to arXiv and collects paper metadata including titles, authors, abstracts, and download links. Each paper's information is formatted into structured data ready for analysis.

📊 How accurate is the data? The data is pulled directly from arXiv, so each record matches what appears on the website at the time of the run, including submission dates, author lists, and abstracts.

📅 Can I schedule this to run automatically? Yes. Use the scheduling feature in your Apify account to collect papers daily, weekly, or monthly. You can also integrate it with Make or Zapier for automated workflows.

⚖️ Is scraping arXiv legal? Yes. arXiv is a public repository designed for open access and research sharing. The data is publicly available and designed to be accessed. However, you're responsible for complying with arXiv's terms of service and any local data laws in your country.

🛡️ Will arXiv block my IP? Unlikely. arXiv allows automated access for research purposes and does not actively block scrapers. We recommend using residential proxies if you're collecting large volumes (100,000+ papers) to avoid any potential rate limiting.

⚡ How long does a run take? It depends on how many papers you're collecting. Expect roughly 1-2 seconds per paper. Collecting 100 papers takes 2-3 minutes, while 1,000 papers takes 20-30 minutes.
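The run-time figures above follow from the per-paper rate, and the same arithmetic can be applied to any run size. The sketch below assumes the rate stays at 1-2 seconds per paper regardless of volume, which may not hold for very large runs. The function names are illustrative only.

```python
def estimate_runtime_seconds(n_papers, sec_per_paper=(1.0, 2.0)):
    """Return (low, high) run-time estimates in seconds for n_papers."""
    low, high = sec_per_paper
    return n_papers * low, n_papers * high


def format_duration(seconds):
    """Render a duration in the largest sensible unit."""
    if seconds < 3600:
        return f"{seconds / 60:.1f} min"
    if seconds < 86400:
        return f"{seconds / 3600:.1f} h"
    return f"{seconds / 86400:.1f} days"


for n in (100, 1_000, 1_000_000):
    low, high = estimate_runtime_seconds(n)
    print(f"{n:>9,} papers: {format_duration(low)} to {format_duration(high)}")
```

At this rate, 100 papers come out at 1.7 to 3.3 minutes and 1,000 papers at 16.7 to 33.3 minutes, in line with the figures quoted above. The 1,000,000-paper maximum would take roughly 11.6 to 23.1 days, so very large collections are worth splitting into several narrower runs.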

⚠️ Are there any limits? Free users can collect up to 100 results per run. Paid users can collect up to 1,000,000 results per run.

💡 More ParseForge Actors

Browse our complete collection of data extraction tools for more.

🚀 Ready to Start?

Create a free account with $5 credit and collect your first 100 results for free. No coding, no setup.

🆘 Need Help?

  • Check the FAQ section above for common questions
  • Visit the Apify support page for documentation and tutorials
  • Contact us through the Tally contact form to request a new scraper, propose a custom project, or report an issue

⚠️ Disclaimer

This Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by arXiv or Cornell University. All trademarks mentioned are the property of their respective owners.