arXiv Scraper
Pricing: Pay per event
Comprehensive arXiv scraper for extracting scholarly article data across physics, math, CS, biology, finance, statistics, engineering, and economics. Automates access to arXiv's large preprint archive, providing structured metadata for researchers, academics, and data scientists.
Rating: 5.0 (1)
Developer: ParseForge
Actor stats: 0 bookmarked, 7 total users, 4 monthly active users
Last modified: 7 days ago
Research Paper Data Made Easy
Collect comprehensive research paper data from arXiv.org in seconds without any technical expertise. Whether you're a researcher tracking developments in your field, a business analyst monitoring scientific trends, or an academic gathering data for literature reviews, this tool extracts complete article metadata automatically.
Search by keyword, research archive (Computer Science, Physics, Mathematics, etc.), or provide a direct URL. The tool handles everything else, delivering structured data with titles, authors, abstracts, PDFs, submission dates, categories, and citation information ready to download as CSV, Excel, or JSON.
Perfect for market research, competitive intelligence, academic research, and content analysis. No proxy needed. No coding required.
What It Does
- Extracts Core Article Information - Title, authors, arXiv ID, and direct PDF links so you can build literature databases without manual copy-paste
- Captures Complete Metadata - Submission dates, update dates, DOI numbers, and license information so you can track research timelines and citation accuracy
- Delivers Full Abstracts - Complete paper summaries without visiting each page, saving hours of manual browsing for literature reviews
- Collects Citation Details - Subject classifications, author comments, and formatted citation strings ready to drop into reference managers
- Supports Flexible Searching - Search 8 subject archives, or all of them at once, by keyword, author, title, or abstract so you find exactly the papers relevant to your work
- Scales to Any Project Size - Extract thousands of papers per run so large research datasets and competitive intelligence reports are fast to produce
- Exports to Any Tool - Download as CSV, Excel, or JSON so results flow directly into spreadsheets, databases, or analysis pipelines
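If your reference manager prefers BibTeX over the `citeAs` string, an entry can be assembled from the output fields documented under Output (`arxivId`, `title`, `authors`, `categories`, `submissionDate`). This is a minimal sketch, not a feature of the tool: the `@misc` layout with `eprint`, `archivePrefix`, and `primaryClass` follows common arXiv BibTeX practice, while the citation-key format, the function name, and the use of the first category as `primaryClass` are assumptions.

```python
def to_bibtex(item: dict) -> str:
    """Build an arXiv-style BibTeX @misc entry from one output item (hypothetical helper)."""
    year = item["submissionDate"][:4]
    # Assumption: the first listed category is the primary one
    primary = item["categories"][0] if item.get("categories") else ""
    # Hypothetical key convention: first author's last name + year + arXiv ID
    last_name = item["authors"][0].split()[-1] if item.get("authors") else "anon"
    key = f"{last_name.lower()}{year}_{item['arxivId'].replace('/', '_')}"
    fields = {
        "title": item["title"],
        "author": " and ".join(item.get("authors", [])),
        "year": year,
        "eprint": item["arxivId"],
        "archivePrefix": "arXiv",
        "primaryClass": primary,
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@misc{{{key},\n{body}\n}}"


sample = {
    "arxivId": "2401.12345",
    "title": "Deep Learning Advances in Computer Vision",
    "authors": ["John Smith", "Jane Doe"],
    "categories": ["cs.CV", "cs.AI"],
    "submissionDate": "2024-01-15T12:30:00Z",
}
print(to_bibtex(sample))
```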
Input
Configure your search using these fields:
How to Search (choose one):
- startUrl - Paste a direct arXiv search URL, for example https://arxiv.org/search/?query=quantum+computing&searchtype=all. This overrides all filters below.
- searchQuery - Enter keywords to find, for example "machine learning" or "neural networks".
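If you prefer startUrl but want to build the URL in a script, the sketch below (standard library only) reproduces the query-string format of the example URL above. The function name is hypothetical, and the optional `order` value is an assumption based on URLs produced by arXiv's own search page; copy a URL from your browser to confirm it before relying on it.

```python
from typing import Optional
from urllib.parse import urlencode

ARXIV_SEARCH = "https://arxiv.org/search/"


def build_search_url(query: str, searchtype: str = "all",
                     order: Optional[str] = None) -> str:
    """Return an arXiv search URL usable as the startUrl input (hypothetical helper)."""
    params = {"query": query, "searchtype": searchtype}
    if order:
        # Assumed arXiv sort value, e.g. "-announced_date_first" for newest first
        params["order"] = order
    # urlencode() uses quote_plus(), so spaces become "+" as in the example URL
    return ARXIV_SEARCH + "?" + urlencode(params)


print(build_search_url("quantum computing"))
# https://arxiv.org/search/?query=quantum+computing&searchtype=all
```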
Refine Your Search (optional):
- Search For (Archive) - Narrow to a specific archive: All, Physics, Mathematics, Quantitative Biology, Computer Science, Quantitative Finance, Statistics, Electrical Engineering, or Economics. Default: All
- Search Field - Search within a specific field: All Fields, Title, Author(s), Abstract, Comments, Journal Reference, ACM Classification, MSC Classification, Report Number, arXiv ID, DOI, ORCID, License, or Author ID. Default: All Fields
- Sort Order - Sort results by: Announcement date (newest first), Announcement date (oldest first), Submission date (newest first), Submission date (oldest first), or Relevance. Default: Relevance
- Show Abstracts - Controls whether abstracts are shown on the arXiv search results pages the tool visits. It does not change the output, because abstracts are always extracted from each paper's detail page. Default: Off
Limits:
- Max Items - Maximum number of articles to collect. Default: 10. Clear the field to collect all matching articles.
Example Input
{
  "searchQuery": "machine learning",
  "searchFor": "cs",
  "maxItems": 50
}
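For API or scheduled runs, it can help to assemble this input in code. A minimal sketch, assuming only the JSON keys that appear on this page (`searchQuery`, `startUrl`, `searchFor`, `maxItems`). The helper name is hypothetical, and options such as Search Field or Sort Order are left out because their JSON key names are not documented here.

```python
import json
from typing import Optional


def build_run_input(search_query: Optional[str] = None,
                    start_url: Optional[str] = None,
                    search_for: Optional[str] = None,
                    max_items: Optional[int] = 10) -> dict:
    """Assemble the actor input from the documented keys (hypothetical helper)."""
    if not (search_query or start_url):
        raise ValueError("Provide either search_query or start_url")
    run_input: dict = {}
    if start_url:
        # startUrl overrides all other filters, so no filters are added
        run_input["startUrl"] = start_url
    else:
        run_input["searchQuery"] = search_query
        if search_for:
            run_input["searchFor"] = search_for
    if max_items is not None:
        # When omitted, the actor's own default applies
        run_input["maxItems"] = max_items
    return run_input


print(json.dumps(build_run_input("machine learning", search_for="cs", max_items=50)))
# {"searchQuery": "machine learning", "searchFor": "cs", "maxItems": 50}
```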
Output
The tool returns structured data with each article containing:
{
  "arxivId": "2401.12345",
  "title": "Deep Learning Advances in Computer Vision",
  "authors": ["John Smith", "Jane Doe", "Alex Chen"],
  "abstract": "This paper presents novel approaches to object detection using transformer-based architectures. We demonstrate significant improvements over existing methods...",
  "submissionDate": "2024-01-15T12:30:00Z",
  "lastUpdatedDate": "2024-01-20T08:15:00Z",
  "categories": ["cs.CV", "cs.AI"],
  "detailUrl": "https://arxiv.org/abs/2401.12345",
  "pdfUrl": "https://arxiv.org/pdf/2401.12345.pdf",
  "comments": "14 pages, 8 figures, submitted to CVPR 2024",
  "doi": "https://doi.org/10.48550/arXiv.2401.12345",
  "license": "http://creativecommons.org/licenses/by/4.0/",
  "subjectClassifications": ["cs.CV", "cs.LG", "cs.AI"],
  "submittedBy": "john.smith@university.edu",
  "citeAs": "arXiv:2401.12345 [cs.CV] (2024-01-15)",
  "scrapedTimestamp": "2024-02-15T10:45:30Z"
}
What Each Field Means:
- arxivId - Unique identifier for the paper on arXiv
- title - The article's full title
- authors - List of all authors who contributed
- abstract - Complete summary of the research and findings
- submissionDate - When the paper was first submitted
- lastUpdatedDate - Most recent revision date
- categories - Primary research categories (e.g., cs.CV for Computer Vision)
- detailUrl - Link to the arXiv abstract page
- pdfUrl - Direct link to download the PDF
- comments - Author notes about the paper
- doi - Digital Object Identifier for permanent citation
- license - The paper's usage license
- subjectClassifications - All subject categories the paper belongs to
- submittedBy - Who submitted the paper
- citeAs - Formatted citation string for academic references
- scrapedTimestamp - When this data was collected
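A minimal sketch of reading one dataset item with the field names above. Assumptions: timestamps are ISO 8601 with a trailing `Z`, as in the sample, and the first entry of `categories` is the primary category (the sample's `citeAs` value points that way, but this page does not guarantee it). The helper names are hypothetical.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11,
    # so swap it for an explicit UTC offset to support older versions too
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize(item: dict) -> dict:
    submitted = parse_ts(item["submissionDate"])
    # Fall back to the submission date when a paper was never revised
    updated = parse_ts(item.get("lastUpdatedDate") or item["submissionDate"])
    authors = item.get("authors") or []
    categories = item.get("categories") or []
    return {
        "arxivId": item["arxivId"],
        "firstAuthor": authors[0] if authors else None,
        # Assumption: the first listed category is the primary one
        "primaryCategory": categories[0] if categories else None,
        "daysToLastRevision": (updated - submitted).days,
    }


sample = {
    "arxivId": "2401.12345",
    "authors": ["John Smith", "Jane Doe", "Alex Chen"],
    "categories": ["cs.CV", "cs.AI"],
    "submissionDate": "2024-01-15T12:30:00Z",
    "lastUpdatedDate": "2024-01-20T08:15:00Z",
}
print(summarize(sample))
# {'arxivId': '2401.12345', 'firstAuthor': 'John Smith', 'primaryCategory': 'cs.CV', 'daysToLastRevision': 4}
```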
Download your results as CSV, Excel, or JSON through the Apify platform.
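If you post-process the JSON export yourself, list fields such as `authors` need flattening before they fit in a single CSV column. One way to do it with Python's standard `csv` module, joining list values with `; ` (an arbitrary separator; the function name is hypothetical):

```python
import csv
import io

# Fields that hold lists in the sample output
LIST_FIELDS = ("authors", "categories", "subjectClassifications")


def items_to_csv(items: list, columns: list) -> str:
    buf = io.StringIO()
    # extrasaction="ignore" drops keys not in `columns`; missing keys become ""
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for item in items:
        row = dict(item)
        for key in LIST_FIELDS:
            if isinstance(row.get(key), list):
                row[key] = "; ".join(row[key])
        writer.writerow(row)
    return buf.getvalue()


items = [{"arxivId": "2401.12345",
          "title": "Deep Learning Advances in Computer Vision",
          "authors": ["John Smith", "Jane Doe", "Alex Chen"]}]
print(items_to_csv(items, ["arxivId", "title", "authors"]))
```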
Why Choose the arXiv Scraper?
No Technical Barriers - Point, click, and collect. No programming skills or command-line tools needed. Works completely in your browser.
Comprehensive Research Data - Get titles, authors, abstracts, PDFs, publication dates, categories, and citation information. All the metadata researchers and analysts need in one place.
Fast and Reliable - arXiv.org has no bot protection, so the scraper runs at full speed without proxies or delays. Results are consistent and accurate.
Flexible Searching - Search 8 subject archives (or all at once) across Computer Science, Physics, Mathematics, and beyond. Filter by title, author, abstract, DOI, and more. Sort by date or relevance.
Export Anywhere - Download as CSV for spreadsheets, Excel for analysis, or JSON for integration with other tools. Get your data in whatever format you need.
Batch Processing - Collect thousands of papers in a single run, scaling up to 1 million articles at once for large research projects.
Always Current - Access papers published today. arXiv is constantly updated with new research across all fields.
How to Use
No Technical Skills Required.
- Sign Up - Create a free Apify account, which includes $5 of credit
- Find the Tool - Search for "arXiv Scraper" in the Apify Actors marketplace
- Set Your Search - Enter your search query, choose your archive, and set the maximum number of articles to collect
- Run - Click Start and let the tool collect all matching articles with their complete metadata
- Download - When the run completes, download your results as CSV, Excel, or JSON and import them directly into spreadsheets or databases
That's it. No code. No installation. Less than 5 minutes from start to downloaded data.
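If you do want to start runs from a script, the Apify API can run the actor and return its dataset items in one call. The sketch below builds the HTTP request for the synchronous `run-sync-get-dataset-items` endpoint using only the standard library. The actor ID `parseforge~arxiv-scraper` is a placeholder assumption, so copy the real one from the actor's API tab, and keep your token out of source control.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
# Hypothetical placeholder: copy the real actor ID from the actor's API tab
ACTOR_ID = "parseforge~arxiv-scraper"


def build_run_request(token: str, run_input: dict) -> urllib.request.Request:
    """Build a POST request that runs the actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_actor(token: str, run_input: dict) -> list:
    # Sends the request; needs network access and a valid Apify token
    request = build_run_request(token, run_input)
    with urllib.request.urlopen(request, timeout=600) as resp:
        return json.load(resp)
```

The synchronous endpoint suits small runs; for large `maxItems` values, starting an asynchronous run and fetching its dataset afterwards is usually the safer design.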
Business Use Cases
Academic Researchers and Scholars
- Track the latest research in your field automatically without daily website visits
- Build literature review datasets with complete citation information automatically formatted
- Monitor competing research groups by extracting all their recent publications
- Identify research trends by analyzing abstracts and publication dates across thousands of papers
Market and Competitive Intelligence Teams
- Monitor technology adoption trends by tracking research in emerging fields (AI, quantum computing, blockchain)
- Identify academic research backing new innovations before they reach the market
- Track research affiliations to understand which universities and companies lead in specific domains
- Analyze publication velocity to gauge the maturity and momentum of research areas
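Publication velocity can be approximated by counting papers per submission month. A minimal sketch, assuming `submissionDate` strings start with `YYYY-MM` as in the sample output; the function name is hypothetical.

```python
from collections import Counter


def papers_per_month(items: list) -> dict:
    """Count papers per submission month, keyed by "YYYY-MM"."""
    # Slicing assumes ISO 8601 strings like "2024-01-15T12:30:00Z"
    counts = Counter(item["submissionDate"][:7]
                     for item in items if item.get("submissionDate"))
    return dict(sorted(counts.items()))


items = [
    {"submissionDate": "2024-01-15T12:30:00Z"},
    {"submissionDate": "2024-01-30T09:00:00Z"},
    {"submissionDate": "2024-02-02T18:45:00Z"},
]
print(papers_per_month(items))
# {'2024-01': 2, '2024-02': 1}
```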
Business Analysts and Strategy Teams
- Extract research data for market opportunity analysis in emerging technologies
- Analyze author affiliations to identify key opinion leaders and research institutions
- Build datasets for predicting which research areas will drive future business growth
- Create competitive intelligence reports showing academic backing for different technology approaches
Data Scientists and ML Engineers
- Quickly assemble datasets of paper metadata for natural language processing or text analysis projects
- Build training data for machine learning models that work with academic literature
- Access comprehensive paper information including PDFs for automated document processing
- Extract author networks and collaboration patterns for social network analysis
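A simple starting point for collaboration analysis is a weighted co-authorship edge list built from the `authors` field. This sketch matches authors by name string only, so two people with the same name are merged and one person listed under different name spellings is split; treat the result as an approximation. The function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(items: list) -> Counter:
    """Count how often each pair of authors appears on the same paper."""
    edges = Counter()
    for item in items:
        # set() removes repeated names; sorted() makes each pair order-independent
        authors = sorted(set(item.get("authors", [])))
        for pair in combinations(authors, 2):
            edges[pair] += 1
    return edges


items = [
    {"authors": ["John Smith", "Jane Doe", "Alex Chen"]},
    {"authors": ["Jane Doe", "John Smith"]},
]
print(coauthor_edges(items).most_common(1))
# [(('Jane Doe', 'John Smith'), 2)]
```

The resulting `(author, author) -> count` mapping can be loaded into a graph library of your choice as weighted edges.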
Content Teams and Publishers
- Identify trending research topics to pitch to your audience
- Gather metadata for creating content aggregators or research newsletters
- Extract publication dates and author information for editorial planning
- Monitor specific research areas for new story angles and content opportunities
FAQ
How does the tool work? The arXiv Scraper connects to arXiv.org's search system, finds papers matching your criteria, and extracts detailed metadata including titles, authors, abstracts, PDF links, and submission information. It runs on Apify's servers, so you don't need to install anything. Just set your search parameters and the tool handles everything.
How accurate is the data? The data comes directly from arXiv.org, so it is as accurate as the information researchers submit when posting their papers. Extracted fields such as dates, authors, abstracts, and PDF links are taken from the official arXiv pages as published.
Can I schedule regular runs? Yes. Set up scheduled runs on Apify to automatically collect the latest papers matching your criteria daily, weekly, or monthly. Perfect for tracking new research or building growing datasets over time.
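With scheduled runs, each run may return papers you already collected. A minimal sketch for keeping only new papers, assuming `arxivId` is stable across runs and that a local JSON file (the hypothetical `seen_path`) is an acceptable place to remember earlier IDs; the function name is also hypothetical.

```python
import json
from pathlib import Path


def new_papers(items: list, seen_path: Path) -> list:
    """Return items whose arxivId was not seen before, and remember them."""
    seen = set(json.loads(seen_path.read_text())) if seen_path.exists() else set()
    fresh = []
    for item in items:
        paper_id = item["arxivId"]
        if paper_id not in seen:  # also skips duplicates within this batch
            seen.add(paper_id)
            fresh.append(item)
    seen_path.write_text(json.dumps(sorted(seen)))
    return fresh
```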
What's the difference between startUrl and searchQuery? Use searchQuery if you want the tool to build the search for you (simpler). Use startUrl if you've already created a search on arXiv.org and want to scrape those exact results - just copy and paste the URL.
Can I download the PDFs automatically? This tool extracts the PDF links for all papers. You get a direct URL to each paper's PDF, which you can download manually or integrate with other tools to automate PDF collection.
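To automate PDF collection from the `pdfUrl` field, a loop like the following is one option. The helper names are hypothetical, and the default three-second pause between downloads is a conservative assumption rather than a documented limit; check arXiv's own guidance on automated access before downloading in bulk.

```python
import re
import time
import urllib.request
from pathlib import Path


def pdf_filename(arxiv_id: str) -> str:
    # Old-style IDs such as "hep-th/9901001" contain a slash; make them file-safe
    return re.sub(r"[^A-Za-z0-9._-]", "_", arxiv_id) + ".pdf"


def download_pdfs(items: list, out_dir: Path, delay_s: float = 3.0) -> None:
    out_dir.mkdir(parents=True, exist_ok=True)
    for item in items:
        target = out_dir / pdf_filename(item["arxivId"])
        if target.exists():
            continue  # skip files fetched by an earlier run
        urllib.request.urlretrieve(item["pdfUrl"], target)
        time.sleep(delay_s)  # pause between requests to go easy on arxiv.org
```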
Which archives can I search? You can search across all arXiv categories: Computer Science, Physics, Mathematics, Quantitative Biology, Quantitative Finance, Statistics, Electrical Engineering, and Economics. Or search all archives at once.
What if I need help? Check the FAQ above or visit the Apify documentation. For technical issues or custom scraping projects, contact us using the form below.
Integrate arXiv Scraper with any app
Connect this actor with your favorite tools:
- Make - Automate workflows
- Zapier - Connect 5000+ apps
- GitHub - Version control integration
- Slack - Get notifications
- Airbyte - Data pipelines
- Google Drive - Export to spreadsheets
Recommended Actors
Other data extraction tools from ParseForge you may find useful:
- Hugging Face Model Scraper - Extract AI and ML model metadata
- PR Newswire Scraper - Collect press releases and news
- AWS Marketplace Scraper - Gather AWS solution data
- Stripe App Marketplace Scraper - Extract app and integration information
Browse our complete collection of data extraction tools for more research and business intelligence scrapers.
Need Help?
Have questions about using the arXiv Scraper? Check the FAQ section above for answers to common questions. Visit the Apify documentation for platform guides and tutorials. Need a custom scraping solution or have a technical issue? Contact us using the form below.
Contact
Contact us to request a new scraper, propose a custom data project, or report a technical issue with this actor at https://tally.so/r/BzdKgA
Disclaimer
This Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by arXiv.org or its parent organization. All trademarks mentioned are the property of their respective owners.