arXiv Metadata Collector: Metadata, PDF, Authors & Abstract
Pricing
$12.99/month + usage
Collect arXiv research paper metadata, including title, authors, abstract, PDF links, DOI, and categories. Supports keyword search, proxy integration, and structured dataset output for AI, ML, and academic research.
Developer
Scrape Pilot
📚 arXiv Research Paper Scraper
Search and extract real academic research papers from arXiv.org — the world's largest preprint repository with 2.3+ million papers across Computer Science, Physics, Mathematics, Statistics, Economics, Biology and more.
Uses the official arXiv API — free, public, always up to date. No scraping, no bot detection.
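For readers unfamiliar with the arXiv API, the sketch below shows roughly what a request and its Atom response look like. It is not this Actor's implementation: the helper names `build_query_url` and `parse_feed` are hypothetical, and `SAMPLE_FEED` is a hand-written, heavily abbreviated stand-in for a real response (the timestamp is truncated to midnight for illustration).

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"  # public arXiv API endpoint
ATOM = "{http://www.w3.org/2005/Atom}"           # Atom XML namespace


def build_query_url(search_query: str, start: int = 0, max_results: int = 10) -> str:
    """Build a query URL using the API's search_query/start/max_results parameters."""
    params = {"search_query": search_query, "start": start, "max_results": max_results}
    return f"{ARXIV_API}?{urlencode(params)}"


def parse_feed(xml_text: str) -> list[dict]:
    """Extract a few fields from each <entry> of an Atom feed."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.findall(f"{ATOM}entry"):
        papers.append({
            # Titles may be wrapped across lines in the feed; collapse whitespace.
            "title": " ".join(entry.findtext(f"{ATOM}title", "").split()),
            "authors": [a.findtext(f"{ATOM}name", "") for a in entry.findall(f"{ATOM}author")],
            "arxiv_url": entry.findtext(f"{ATOM}id", ""),
            # Keep only the YYYY-MM-DD part of the ISO timestamp.
            "published": entry.findtext(f"{ATOM}published", "")[:10],
        })
    return papers


# Hand-written sample, NOT a real API response.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/1706.03762v1</id>
    <published>2017-06-12T00:00:00Z</published>
    <title>Attention Is All
      You Need</title>
    <author><name>Ashish Vaswani</name></author>
    <author><name>Noam Shazeer</name></author>
  </entry>
</feed>"""

if __name__ == "__main__":
    print(build_query_url("all:electron", max_results=5))
    print(parse_feed(SAMPLE_FEED))
```

To fetch live data, the URL could be passed to `urllib.request.urlopen` and the decoded body handed to `parse_feed`.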
🎯 Quick Start — Demo Mode
Get 10 landmark AI papers in 2 seconds:
- Toggle Demo Mode → ON
- Click Run Actor
- Get real papers: Transformers, BERT, GPT-3, LLaMA, GAN, ViT, DDPM, RAG
📦 Output Fields
| Field | Type | Description | Example |
|---|---|---|---|
| title | string | Full paper title | Attention Is All You Need |
| authors | array | All author names | ["Vaswani", "Shazeer", ...] |
| abstract | string | Full paper abstract | We propose a new simple network... |
| pdf_url | string | Direct PDF download URL | https://arxiv.org/pdf/1706.03762 |
| arxiv_url | string | arXiv abstract page URL | https://arxiv.org/abs/1706.03762 |
| arxiv_id | string | arXiv paper ID | 1706.03762 |
| published | string | First submission date | 2017-06-12 |
| updated | string | Last update date | 2023-08-02 |
| primary_category | string | Main arXiv category | cs.CL |
| categories | array | All subject categories | ["cs.CL","cs.LG"] |
| doi | string | DOI (if published in journal) | 10.48550/arXiv.1706.03762 |
| journal_ref | string | Journal/conference reference | NeurIPS 2017 |
| comment | string | Author comments / notes | 15 pages, 5 figures |
| source | string | Data source | arXiv |
📋 Example Output
{"title": "Attention Is All You Need","authors": ["Ashish Vaswani", "Noam Shazeer", "Niki Parmar"],"abstract": "The dominant sequence transduction models are based on complex recurrent...","pdf_url": "https://arxiv.org/pdf/1706.03762","arxiv_url": "https://arxiv.org/abs/1706.03762","arxiv_id": "1706.03762","published": "2017-06-12","updated": "2023-08-02","primary_category": "cs.CL","categories": ["cs.CL", "cs.LG"],"doi": "10.48550/arXiv.1706.03762","journal_ref": "Advances in Neural Information Processing Systems 30, 2017","comment": "15 pages, 5 figures","source": "arXiv"}
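When consuming records like the one above downstream, it can help to load them into a typed structure. The following is a minimal sketch, assuming the field names from the Output Fields table. The `Paper` class and `load_paper` helper are hypothetical, and treating `doi`, `journal_ref`, and `comment` as optional is an assumption, since not every preprint has them.

```python
import json
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class Paper:
    title: str
    authors: list[str]
    abstract: str
    pdf_url: str
    arxiv_url: str
    arxiv_id: str
    published: str
    updated: str
    primary_category: str
    categories: list[str] = field(default_factory=list)
    # Assumed optional: these may be missing or null for some papers.
    doi: Optional[str] = None
    journal_ref: Optional[str] = None
    comment: Optional[str] = None
    source: str = "arXiv"


def load_paper(raw: str) -> Paper:
    """Parse one JSON record, ignoring any keys the dataclass does not know."""
    data = json.loads(raw)
    known = {f.name for f in fields(Paper)}
    return Paper(**{k: v for k, v in data.items() if k in known})


# The example record from this README.
EXAMPLE = """{"title": "Attention Is All You Need",
 "authors": ["Ashish Vaswani", "Noam Shazeer", "Niki Parmar"],
 "abstract": "The dominant sequence transduction models are based on complex recurrent...",
 "pdf_url": "https://arxiv.org/pdf/1706.03762",
 "arxiv_url": "https://arxiv.org/abs/1706.03762",
 "arxiv_id": "1706.03762",
 "published": "2017-06-12",
 "updated": "2023-08-02",
 "primary_category": "cs.CL",
 "categories": ["cs.CL", "cs.LG"],
 "doi": "10.48550/arXiv.1706.03762",
 "journal_ref": "Advances in Neural Information Processing Systems 30, 2017",
 "comment": "15 pages, 5 figures",
 "source": "arXiv"}"""

if __name__ == "__main__":
    paper = load_paper(EXAMPLE)
    print(paper.arxiv_id, paper.primary_category, len(paper.authors))
```

Dropping unknown keys keeps the loader tolerant if the output later gains extra fields.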
🗂️ Supported Categories
| Code | Subject |
|---|---|
| cs.LG | Machine Learning |
| cs.AI | Artificial Intelligence |
| cs.CL | Natural Language Processing |
| cs.CV | Computer Vision |
| cs.RO | Robotics |
| stat.ML | Statistics / ML |
| math | Mathematics |
| physics | Physics |
| econ | Economics |
| q-bio | Quantitative Biology |
| q-fin | Quantitative Finance |
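The table mixes full category codes (such as `cs.LG`) with bare archive names (such as `math`). How the Actor resolves a bare archive name is not documented here. One plausible reading, sketched below with a hypothetical `matches_category` helper for filtering output records, is an exact match on the code or a prefix match on the archive. Note that this simple prefix rule would not catch physics-group archives with their own names, such as `astro-ph` or `hep-th`.

```python
def matches_category(paper_categories: list[str], wanted: str) -> bool:
    """Return True if any category equals `wanted` or sits under it as an archive.

    Assumption: "math" should match "math.CO", "math.PR", and so on.
    """
    for cat in paper_categories:
        if cat == wanted or cat.startswith(wanted + "."):
            return True
    return False


if __name__ == "__main__":
    print(matches_category(["cs.CL", "cs.LG"], "cs.LG"))  # exact code
    print(matches_category(["math.CO"], "math"))          # archive prefix
    print(matches_category(["stat.ML"], "math"))          # no match
```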
⚙️ Input Examples
Search by keyword
{"search_query": "large language models", "max_results": 50}
Search by category + date
{"category": "cs.LG", "date_from": "2024-01-01", "sort_by": "date", "max_results": 100}
Search by author
{"author": "Geoffrey Hinton", "max_results": 30}
Search in titles only
{"search_query": "transformer", "title_only": true, "max_results": 20}
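To give a sense of how inputs like these could translate into an arXiv API `search_query` string, here is a rough sketch. The mapping is an assumption, not the Actor's documented behavior. It uses the field prefixes from the arXiv API manual (`all:`, `ti:`, `au:`, `cat:`) and the `submittedDate:[... TO ...]` range filter. The `build_search_query` helper and its `today` parameter are hypothetical.

```python
from datetime import date
from typing import Optional


def build_search_query(params: dict, today: Optional[date] = None) -> str:
    """Translate Actor-style input into an arXiv API search_query (assumed mapping)."""
    clauses = []

    text = params.get("search_query")
    if text:
        # ti: restricts the match to titles; all: searches every field.
        prefix = "ti" if params.get("title_only") else "all"
        clauses.append(f'{prefix}:"{text}"')

    if params.get("author"):
        clauses.append(f'au:"{params["author"]}"')

    if params.get("category"):
        clauses.append(f'cat:{params["category"]}')

    if params.get("date_from"):
        # The API expects YYYYMMDDHHMM bounds; the upper bound is assumed to be today.
        start = params["date_from"].replace("-", "") + "0000"
        end = (today or date.today()).strftime("%Y%m%d") + "2359"
        clauses.append(f"submittedDate:[{start} TO {end}]")

    return " AND ".join(clauses)


if __name__ == "__main__":
    print(build_search_query({"search_query": "transformer", "title_only": True}))
    print(build_search_query({"author": "Geoffrey Hinton"}))
    print(build_search_query({"category": "cs.LG", "date_from": "2024-01-01"},
                             today=date(2024, 6, 30)))
```

The resulting string would still need URL encoding before being sent. A `sort_by` value of `"date"` would presumably correspond to the API's `sortBy=submittedDate` parameter, though that is also a guess.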
🎯 Use Cases
| Use Case | Example |
|---|---|
| Academic Research | Track latest papers in your field |
| Literature Review | Build bibliography for thesis or paper |
| AI/ML Monitoring | Stay updated on new model releases |
| Citation Analysis | Collect paper metadata for analysis |
| Dataset Building | Create research paper datasets |
| Competitor Research | Track papers from specific institutions |
💰 Pricing
$12.99/month plus usage, with no cap on the number of arXiv papers extracted.