arXiv Daily Digest Scraper
Pricing
Pay per usage
Scrape arXiv papers by search query or category. Extract titles, authors, abstracts, and PDF links from recent submissions.
Developer
Donny Nguyen (maintained by Community)
arXiv Daily Digest
Monitor arXiv for new papers by topic, author, or keyword. This actor extracts comprehensive paper metadata from arXiv.org including titles, authors, abstracts, categories, and PDF links.
Features
- Search by Keywords: Query arXiv using any search terms
- Monitor Categories: Track specific arXiv categories (cs.AI, cs.LG, physics, etc.)
- Custom URLs: Provide your own arXiv search or list page URLs
- Date Filtering: Only include papers from the last N days
- Comprehensive Metadata: Extract title, authors, abstract, categories, arXiv ID, PDF URL, and publication date
Input Parameters
- Search Queries (stringList): Keywords or phrases to search on arXiv (e.g., "machine learning", "quantum computing")
- arXiv Categories (stringList): Category codes to monitor (e.g., "cs.AI", "cs.LG", "physics.gen-ph")
- Start URLs (requestListSources): Custom arXiv URLs to scrape
- Max Results (integer, default: 50): Maximum number of papers to extract per URL
- Days Back (integer, default: 7): Only include papers from the last N days (0 = no filter)
- Use Residential Proxy (boolean, default: false): Use residential proxies for better reliability
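As a rough illustration of how these parameters fit together, the sketch below assembles an input object using the documented defaults. The keys `searchQueries`, `categories`, `maxResults`, and `daysBack` are taken from the Example Configuration section; `startUrls` and `useResidentialProxy` are assumed key names for the Start URLs and Use Residential Proxy parameters and may differ in the actor's real input schema. `build_input` is a hypothetical helper, not part of the actor.

```python
from typing import Any, Dict, List, Optional


def build_input(
    search_queries: Optional[List[str]] = None,
    categories: Optional[List[str]] = None,
    start_urls: Optional[List[str]] = None,
    max_results: int = 50,   # documented default
    days_back: int = 7,      # documented default; 0 disables the date filter
    use_residential_proxy: bool = False,
) -> Dict[str, Any]:
    """Assemble an input dict for the actor (hypothetical helper)."""
    if max_results < 1:
        raise ValueError("max_results must be at least 1")
    if days_back < 0:
        raise ValueError("days_back must be 0 (no filter) or a positive number of days")

    return {
        "searchQueries": list(search_queries or []),
        "categories": list(categories or []),
        # Assumed key name and shape: one object with a "url" field per entry.
        "startUrls": [{"url": u} for u in (start_urls or [])],
        "maxResults": max_results,
        "daysBack": days_back,
        # Assumed key name.
        "useResidentialProxy": use_residential_proxy,
    }


config = build_input(search_queries=["quantum computing"], categories=["cs.AI"])
```

Validating the numeric fields before a run is a choice made for this sketch; the actor may apply its own checks.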
Output
Each paper includes:
{
  "title": "Paper Title",
  "authors": ["Author 1", "Author 2"],
  "abstract": "Paper abstract text...",
  "categories": ["cs.AI", "cs.LG"],
  "arxivId": "2401.12345",
  "pdfUrl": "https://arxiv.org/pdf/2401.12345.pdf",
  "publishedDate": "1 Jan 2024",
  "url": "https://arxiv.org/abs/2401.12345",
  "scrapedAt": "2024-01-15T10:30:00.000Z"
}
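For downstream processing it can help to load each record into a typed structure. The following is a minimal sketch that assumes every record carries the fields shown above and that `publishedDate` always uses the "1 Jan 2024" style of the sample; real output may vary. The `Paper` class and `parse_paper` function are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Any, Dict, List


@dataclass
class Paper:
    title: str
    authors: List[str]
    abstract: str
    categories: List[str]
    arxiv_id: str
    pdf_url: str
    published: date
    url: str
    scraped_at: datetime


def parse_paper(record: Dict[str, Any]) -> Paper:
    """Convert one output record into a Paper (hypothetical helper)."""
    return Paper(
        title=record["title"],
        authors=list(record["authors"]),
        abstract=record["abstract"],
        categories=list(record["categories"]),
        arxiv_id=record["arxivId"],
        pdf_url=record["pdfUrl"],
        # Assumes the day / abbreviated month / year style of the sample record.
        published=datetime.strptime(record["publishedDate"], "%d %b %Y").date(),
        url=record["url"],
        # A trailing "Z" is only accepted by fromisoformat() from Python 3.11 on,
        # so it is rewritten as an explicit UTC offset first.
        scraped_at=datetime.fromisoformat(record["scrapedAt"].replace("Z", "+00:00")),
    )


sample = {
    "title": "Paper Title",
    "authors": ["Author 1", "Author 2"],
    "abstract": "Paper abstract text...",
    "categories": ["cs.AI", "cs.LG"],
    "arxivId": "2401.12345",
    "pdfUrl": "https://arxiv.org/pdf/2401.12345.pdf",
    "publishedDate": "1 Jan 2024",
    "url": "https://arxiv.org/abs/2401.12345",
    "scrapedAt": "2024-01-15T10:30:00.000Z",
}

paper = parse_paper(sample)
print(paper.published.isoformat())  # 2024-01-01
```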
Use Cases
- Stay updated on research in your field
- Track specific research topics or authors
- Build research paper databases
- Monitor new publications in specific categories
- Automated literature review workflows
Example Configuration
{
  "searchQueries": ["machine learning", "deep learning"],
  "categories": ["cs.AI", "cs.LG"],
  "maxResults": 50,
  "daysBack": 7
}
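To make the configuration more concrete, the sketch below shows one plausible way such a configuration could translate into arXiv pages to visit: one search results page per query and one recent-listing page per category. This mapping is an assumption for illustration only; the README does not state which URLs the actor actually requests. The helper names `search_url`, `category_url`, and `urls_for_config` are hypothetical.

```python
from typing import Any, Dict, List
from urllib.parse import quote, urlencode

ARXIV_BASE = "https://arxiv.org"


def search_url(query: str) -> str:
    """arXiv search results page for a free-text query."""
    return f"{ARXIV_BASE}/search/?" + urlencode({"query": query, "searchtype": "all"})


def category_url(category: str) -> str:
    """arXiv recent-submissions listing page for a category code."""
    return f"{ARXIV_BASE}/list/{quote(category)}/recent"


def urls_for_config(config: Dict[str, Any]) -> List[str]:
    """Expand a configuration into candidate page URLs (assumed mapping)."""
    urls = [search_url(q) for q in config.get("searchQueries", [])]
    urls += [category_url(c) for c in config.get("categories", [])]
    return urls


example = {
    "searchQueries": ["machine learning", "deep learning"],
    "categories": ["cs.AI", "cs.LG"],
    "maxResults": 50,
    "daysBack": 7,
}

for url in urls_for_config(example):
    print(url)
# https://arxiv.org/search/?query=machine+learning&searchtype=all
# https://arxiv.org/search/?query=deep+learning&searchtype=all
# https://arxiv.org/list/cs.AI/recent
# https://arxiv.org/list/cs.LG/recent
```

Under this reading, the example configuration yields four pages, and the Max Results limit of 50 would apply to each of them separately.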
Notes
- arXiv is an open-access repository, and this actor is designed to respect its terms of use
- Date filtering is based on the submission/announcement date
- The actor handles both search result pages and category list pages
- Results are limited per URL to avoid excessive scraping
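The date filter described in these notes can be reproduced on already-scraped records. The sketch below is not the actor's own implementation: it assumes `publishedDate` follows the "1 Jan 2024" style of the sample output, and it treats the cutoff day as inclusive, which is a guess. `filter_recent` is a hypothetical helper name.

```python
from datetime import date, datetime, timedelta
from typing import Any, Dict, Iterable, List, Optional


def filter_recent(
    papers: Iterable[Dict[str, Any]],
    days_back: int,
    today: Optional[date] = None,
) -> List[Dict[str, Any]]:
    """Keep papers published within the last `days_back` days (0 = no filter)."""
    if days_back < 0:
        raise ValueError("days_back must be 0 or positive")
    if days_back == 0:
        return list(papers)

    reference = today or date.today()
    cutoff = reference - timedelta(days=days_back)

    kept = []
    for paper in papers:
        # Assumed date format, taken from the sample output record.
        published = datetime.strptime(paper["publishedDate"], "%d %b %Y").date()
        if published >= cutoff:  # inclusive cutoff is an assumption
            kept.append(paper)
    return kept


records = [
    {"arxivId": "2401.12345", "publishedDate": "14 Jan 2024"},
    {"arxivId": "2401.00001", "publishedDate": "1 Jan 2024"},
]

recent = filter_recent(records, days_back=7, today=date(2024, 1, 15))
print([p["arxivId"] for p in recent])  # ['2401.12345']
```

Passing `today` explicitly keeps the function deterministic, which is convenient when re-running a digest for a past date.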
Performance
- Speed: Fast (Cheerio-based, no browser overhead)
- Cost: Low (datacenter proxies sufficient for arXiv)
- Memory: 256-512 MB recommended
Built with Apify SDK and Crawlee using CheerioCrawler for efficient scraping.