arXiv Metadata Collector— Metadata, PDF, Authors & Abstract

Pricing

$12.99/month + usage

Scrape arXiv research papers with metadata including title, authors, abstract, PDF links, DOI, and categories. Supports keyword search, proxy integration, and structured dataset output for AI, ML, and academic research use.

Developer

Scrape Pilot

Maintained by Community

Last modified: 2 days ago

📚 arXiv Research Paper Scraper

Search and extract real academic research papers from arXiv.org — the world's largest preprint repository with 2.3+ million papers across Computer Science, Physics, Mathematics, Statistics, Economics, Biology and more.

Uses the official arXiv API — free, public, always up to date. No scraping, no bot detection.


🎯 Quick Start — Demo Mode

Get 10 landmark AI papers in 2 seconds:

  1. Toggle Demo Mode ON
  2. Click Run Actor
  3. Get real papers: Transformers, BERT, GPT-3, LLaMA, GAN, ViT, DDPM, RAG

📦 Output Fields

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| title | string | Full paper title | Attention Is All You Need |
| authors | array | All author names | ["Vaswani", "Shazeer", ...] |
| abstract | string | Full paper abstract | We propose a new simple network... |
| pdf_url | string | Direct PDF download URL | https://arxiv.org/pdf/1706.03762 |
| arxiv_url | string | arXiv abstract page URL | https://arxiv.org/abs/1706.03762 |
| arxiv_id | string | arXiv paper ID | 1706.03762 |
| published | string | First submission date | 2017-06-12 |
| updated | string | Last update date | 2023-08-02 |
| primary_category | string | Main arXiv category | cs.CL |
| categories | array | All subject categories | ["cs.CL","cs.LG"] |
| doi | string | DOI (if published in journal) | 10.48550/arXiv.1706.03762 |
| journal_ref | string | Journal/conference reference | NeurIPS 2017 |
| comment | string | Author comments / notes | 15 pages, 5 figures |
| source | string | Data source | arXiv |
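
One way to sanity-check dataset items against the fields listed above is a small type check. This is a minimal sketch and not part of the actor: `EXPECTED_TYPES`, `NULLABLE`, and `schema_problems` are hypothetical names, and treating `doi`, `journal_ref`, and `comment` as possibly null is an assumption, since the listing does not say how missing values are represented.

```python
from typing import Any

# Field name -> expected Python type, copied from the Output Fields list.
EXPECTED_TYPES: dict[str, type] = {
    "title": str,
    "authors": list,
    "abstract": str,
    "pdf_url": str,
    "arxiv_url": str,
    "arxiv_id": str,
    "published": str,
    "updated": str,
    "primary_category": str,
    "categories": list,
    "doi": str,
    "journal_ref": str,
    "comment": str,
    "source": str,
}

# Assumption: fields that depend on author-supplied metadata may be None.
NULLABLE = {"doi", "journal_ref", "comment"}


def schema_problems(record: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the record fits."""
    problems: list[str] = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in record:
            problems.append(f"missing field: {field}")
            continue
        value = record[field]
        if value is None and field in NULLABLE:
            continue
        if not isinstance(value, expected):
            problems.append(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems
```

Running `schema_problems` over each item before loading a dataset into an analysis pipeline makes a renamed or missing field show up early rather than as a downstream error.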

📋 Example Output

{
  "title": "Attention Is All You Need",
  "authors": ["Ashish Vaswani", "Noam Shazeer", "Niki Parmar"],
  "abstract": "The dominant sequence transduction models are based on complex recurrent...",
  "pdf_url": "https://arxiv.org/pdf/1706.03762",
  "arxiv_url": "https://arxiv.org/abs/1706.03762",
  "arxiv_id": "1706.03762",
  "published": "2017-06-12",
  "updated": "2023-08-02",
  "primary_category": "cs.CL",
  "categories": ["cs.CL", "cs.LG"],
  "doi": "10.48550/arXiv.1706.03762",
  "journal_ref": "Advances in Neural Information Processing Systems 30, 2017",
  "comment": "15 pages, 5 figures",
  "source": "arXiv"
}
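
Since the actor is described as using the official arXiv API, which returns Atom XML, a record shaped like the example above could plausibly be derived from one Atom entry as sketched below. This is an illustration under stated assumptions, not the actor's actual code: `parse_feed` and `entry_to_record` are hypothetical names, `SAMPLE_FEED` is a hand-written, abbreviated entry (its timestamps are illustrative), and building the URLs from the version-less ID is a guess based on the example output.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

ATOM = "{http://www.w3.org/2005/Atom}"
ARXIV = "{http://arxiv.org/schemas/atom}"

# Hand-written, abbreviated sample; not a verbatim API response.
SAMPLE_FEED = """
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:arxiv="http://arxiv.org/schemas/atom">
  <entry>
    <id>http://arxiv.org/abs/1706.03762v7</id>
    <updated>2023-08-02T00:00:00Z</updated>
    <published>2017-06-12T00:00:00Z</published>
    <title>Attention Is All
      You Need</title>
    <summary>The dominant sequence transduction models...</summary>
    <author><name>Ashish Vaswani</name></author>
    <author><name>Noam Shazeer</name></author>
    <arxiv:comment>15 pages, 5 figures</arxiv:comment>
    <arxiv:primary_category term="cs.CL"/>
    <category term="cs.CL"/>
    <category term="cs.LG"/>
  </entry>
</feed>
""".strip()


def entry_to_record(entry: ET.Element) -> dict:
    """Map one Atom <entry> onto the output fields listed in this README."""

    def text(tag: str) -> Optional[str]:
        node = entry.find(tag)
        if node is None or node.text is None:
            return None
        # Titles and abstracts are often line-wrapped; collapse whitespace.
        return " ".join(node.text.split())

    # The entry id looks like http://arxiv.org/abs/1706.03762v7.
    versioned_id = (text(ATOM + "id") or "").rsplit("/abs/", 1)[-1]
    arxiv_id = re.sub(r"v\d+$", "", versioned_id)
    primary = entry.find(ARXIV + "primary_category")
    names = entry.findall(f"{ATOM}author/{ATOM}name")

    return {
        "title": text(ATOM + "title"),
        "authors": [" ".join(n.text.split()) for n in names if n.text],
        "abstract": text(ATOM + "summary"),
        "pdf_url": f"https://arxiv.org/pdf/{arxiv_id}",
        "arxiv_url": f"https://arxiv.org/abs/{arxiv_id}",
        "arxiv_id": arxiv_id,
        "published": (text(ATOM + "published") or "")[:10],
        "updated": (text(ATOM + "updated") or "")[:10],
        "primary_category": primary.get("term") if primary is not None else None,
        "categories": [c.get("term") for c in entry.findall(ATOM + "category")],
        "doi": text(ARXIV + "doi"),  # None when the entry has no DOI element
        "journal_ref": text(ARXIV + "journal_ref"),
        "comment": text(ARXIV + "comment"),
        "source": "arXiv",
    }


def parse_feed(xml_text: str) -> list:
    """Parse an Atom feed string into a list of output-style records."""
    root = ET.fromstring(xml_text)
    return [entry_to_record(e) for e in root.findall(ATOM + "entry")]


if __name__ == "__main__":
    print(parse_feed(SAMPLE_FEED)[0]["title"])
```

Optional elements such as the DOI, journal reference, and comment come back as `None` in this sketch when the entry omits them; the actor may represent missing values differently.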

🗂️ Supported Categories

| Code | Subject |
| --- | --- |
| cs.LG | Machine Learning |
| cs.AI | Artificial Intelligence |
| cs.CL | Natural Language Processing |
| cs.CV | Computer Vision |
| cs.RO | Robotics |
| stat.ML | Statistics / ML |
| math | Mathematics |
| physics | Physics |
| econ | Economics |
| q-bio | Quantitative Biology |
| q-fin | Quantitative Finance |

⚙️ Input Examples

Search by keyword

{"search_query": "large language models", "max_results": 50}

Search by category + date

{"category": "cs.LG", "date_from": "2024-01-01", "sort_by": "date", "max_results": 100}

Search by author

{"author": "Geoffrey Hinton", "max_results": 30}

Search in titles only

{"search_query": "transformer", "title_only": true, "max_results": 20}
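
The input examples above might translate into arXiv API requests roughly as follows. This is a hedged sketch of one plausible mapping, not the actor's documented behaviour: `build_query_url` is a hypothetical helper, the `date_to` key is an assumption that does not appear in the inputs above, and the default of 10 results is arbitrary. The field prefixes (`all:`, `ti:`, `au:`, `cat:`), the `submittedDate:[... TO ...]` range filter, and the `sortBy`/`sortOrder` parameters are those of the public arXiv API.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

API_ENDPOINT = "http://export.arxiv.org/api/query"


def build_query_url(params: dict) -> str:
    """Turn an actor-style input dict into an arXiv API query URL."""
    clauses = []

    keywords = params.get("search_query")
    if keywords:
        # ti: restricts matching to titles, all: searches every field.
        field = "ti" if params.get("title_only") else "all"
        clauses.append(f'{field}:"{keywords}"')

    author = params.get("author")
    if author:
        clauses.append(f'au:"{author}"')

    category = params.get("category")
    if category:
        clauses.append(f"cat:{category}")

    date_from = params.get("date_from")
    if date_from:
        # The API expects YYYYMMDDHHMM bounds (GMT) on submittedDate.
        today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
        date_to = params.get("date_to") or today  # hypothetical input key
        start = date_from.replace("-", "") + "0000"
        end = date_to.replace("-", "") + "2359"
        clauses.append(f"submittedDate:[{start} TO {end}]")

    if not clauses:
        raise ValueError("at least one search criterion is required")

    query = {
        "search_query": " AND ".join(clauses),
        "start": 0,
        "max_results": params.get("max_results", 10),
    }
    if params.get("sort_by") == "date":
        query["sortBy"] = "submittedDate"
        query["sortOrder"] = "descending"

    return API_ENDPOINT + "?" + urlencode(query)


if __name__ == "__main__":
    print(build_query_url(
        {"search_query": "transformer", "title_only": True, "max_results": 20}
    ))
```

How the actor treats archive-level codes such as `math` or `physics`, and whether it pages through results beyond a single request, is not stated in this listing, so the sketch leaves both out.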

🎯 Use Cases

| Use Case | Example |
| --- | --- |
| Academic Research | Track latest papers in your field |
| Literature Review | Build bibliography for thesis or paper |
| AI/ML Monitoring | Stay updated on new model releases |
| Citation Analysis | Collect paper metadata for analysis |
| Dataset Building | Create research paper datasets |
| Competitor Research | Track papers from specific institutions |

🔑 Keywords

arXiv scraper, academic paper scraper, research paper extractor, arXiv API, scientific paper scraper, PDF downloader academic, machine learning papers, AI papers dataset, arxiv.org scraper, preprint scraper, citation extractor


💰 Pricing

$12.99/month + usage: unlimited arXiv paper extraction.