Pricing

from $2.00 / 1,000 results

ArXiv Paper Scraper

Search and extract scientific papers from ArXiv.org. Returns title, authors, abstract, categories, submission date, and PDF link. Ideal for AI research, RAG pipelines, and academic trend monitoring.

Developer

Sheshinmcfly

Last modified: 10 days ago

What data does it extract?

Field	Description	Example
`arxivId`	ArXiv paper ID	`"2604.18584"`
`title`	Full paper title	`"MathNet: a Global Multimodal Benchmark..."`
`authors`	List of authors	`["Shaden Alshammari", "Kevin Wen"]`
`abstract`	Full abstract text	`"Mathematical problem solving remains..."`
`categories`	ArXiv subject tags	`["cs.AI", "cs.LG", "cs.IR"]`
`primaryCategory`	Primary category	`"cs.AI"`
`submittedDate`	Submission date	`"2026-04-20T00:00:00Z"`
`updatedDate`	Last update date	`"2026-04-21T00:00:00Z"`
`publishedDate`	Publication date	`"2026-04-20T00:00:00Z"`
`comments`	Author comments	`"ICLR 2026; 30 pages"`
`journalRef`	Journal reference	`"Proceedings of ICLR, 2026"`
`doi`	DOI if available	`"10.48550/arXiv.2604.18584"`
`pdfUrl`	Direct PDF link	`"https://arxiv.org/pdf/2604.18584"`
`url`	ArXiv abstract page	`"https://arxiv.org/abs/2604.18584"`
`citationCount`	Total citations (Semantic Scholar)	`142`
`influentialCitationCount`	Influential citations	`18`
`tldr`	AI-generated 1-sentence summary	`"A new benchmark for math reasoning..."`
`relevanceScore`	Computed relevance 0–100	`72`
`query`	Search query used	`"large language models"`
`extractedAt`	Extraction timestamp	`"2026-04-21T12:00:00Z"`

Use cases

RAG pipelines: Feed domain-specific papers into retrieval-augmented AI systems
AI research monitoring: Track the latest publications in LLMs, computer vision, NLP
Academic trend analysis: Identify hot topics and emerging research areas
Literature review automation: Collect papers for a specific topic at scale
LLM fine-tuning data: High-quality scientific text for model training
Competitive intelligence: Monitor what research competitors are publishing

How to use

1. Open the actor and configure:
   - Mode: search (keyword), category (e.g. cs.AI), or id (specific paper IDs)
   - Search queries: One or more search terms (e.g. "diffusion models", "reinforcement learning")
   - Search field: All fields, title only, abstract only, or author
   - Sort by: Submission date (the default), relevance, or last updated date, in descending or ascending order
   - Max results: Number of papers to return (up to 500 per run)
   - Semantic Scholar enrichment: Enable to add citation counts and AI-generated TLDRs
2. Click Start.
3. Download results as JSON, CSV, or Excel.

Agent-ready via x402: AI agents can run this actor directly with USDC on Base — no Apify account needed. See x402 protocol docs.

Input parameters

Parameter	Type	Default	Description
`mode`	string	`"search"`	`search`, `category`, or `id`
`queries`	array	`["large language models"]`	Search terms (mode=search)
`searchField`	string	`"all"`	`all`, `title`, `abstract`, or `author`
`categories`	array	`["cs.AI"]`	ArXiv category codes (mode=category)
`idList`	array	`[]`	ArXiv IDs to fetch (mode=id)
`sortBy`	string	`"submittedDate"`	`submittedDate`, `relevance`, or `lastUpdatedDate`
`sortOrder`	string	`"descending"`	`descending` or `ascending`
`maxResults`	integer	`50`	Max papers to return (up to 500)
`minYear`	integer	`0`	Filter: exclude papers before this year
`maxYear`	integer	`0`	Filter: exclude papers after this year
`includeSemanticScholar`	boolean	`true`	Enrich with citation counts and TLDRs
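
The parameters above can be assembled and sanity-checked before starting a run. The following is a minimal sketch, not the actor's own code: `build_run_input` and `run_actor` are hypothetical helper names, the validation rules are simply read off the table, and the actor ID passed to `run_actor` is a placeholder you would need to copy from the actor's page. `run_actor` assumes the `apify-client` Python package is installed.

```python
from typing import Any

ALLOWED_MODES = {"search", "category", "id"}
ALLOWED_SEARCH_FIELDS = {"all", "title", "abstract", "author"}
ALLOWED_SORT_BY = {"submittedDate", "relevance", "lastUpdatedDate"}
ALLOWED_SORT_ORDER = {"descending", "ascending"}
MAX_RESULTS_LIMIT = 500  # per-run cap stated in the parameter table


def build_run_input(mode: str = "search", **overrides: Any) -> dict[str, Any]:
    """Return an input dict based on the documented defaults, with basic checks."""
    run_input: dict[str, Any] = {
        "mode": mode,
        "queries": ["large language models"],
        "searchField": "all",
        "categories": ["cs.AI"],
        "idList": [],
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "maxResults": 50,
        "minYear": 0,
        "maxYear": 0,
        "includeSemanticScholar": True,
    }
    unknown = set(overrides) - set(run_input)
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    run_input.update(overrides)

    if run_input["mode"] not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {sorted(ALLOWED_MODES)}")
    if run_input["searchField"] not in ALLOWED_SEARCH_FIELDS:
        raise ValueError(f"searchField must be one of {sorted(ALLOWED_SEARCH_FIELDS)}")
    if run_input["sortBy"] not in ALLOWED_SORT_BY:
        raise ValueError(f"sortBy must be one of {sorted(ALLOWED_SORT_BY)}")
    if run_input["sortOrder"] not in ALLOWED_SORT_ORDER:
        raise ValueError(f"sortOrder must be one of {sorted(ALLOWED_SORT_ORDER)}")
    if not 1 <= run_input["maxResults"] <= MAX_RESULTS_LIMIT:
        raise ValueError(f"maxResults must be between 1 and {MAX_RESULTS_LIMIT}")
    return run_input


def run_actor(actor_id: str, token: str, run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Start a run and collect its dataset items (requires `pip install apify-client`).

    `actor_id` is a placeholder: use the ID shown on the actor's page.
    """
    from apify_client import ApifyClient  # imported lazily so validation works without it

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

For example, `build_run_input(mode="category", categories=["cs.CL"], maxResults=200)` keeps every other default from the table and rejects values such as `maxResults=501` before any paid run is started.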

Example output (JSON)

{
  "arxivId": "2604.18584",
  "title": "MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval",
  "authors": ["Shaden Alshammari", "Kevin Wen", "Antonio Torralba"],
  "abstract": "Mathematical problem solving remains a challenging test of reasoning...",
  "categories": ["cs.AI", "cs.DL", "cs.IR", "cs.LG"],
  "primaryCategory": "cs.AI",
  "submittedDate": "2026-04-20T00:00:00Z",
  "publishedDate": "2026-04-20T00:00:00Z",
  "comments": "ICLR 2026; Website: http://mathnet.mit.edu",
  "journalRef": "Proceedings of ICLR, 2026",
  "doi": "10.48550/arXiv.2604.18584",
  "pdfUrl": "https://arxiv.org/pdf/2604.18584",
  "url": "https://arxiv.org/abs/2604.18584",
  "citationCount": 142,
  "influentialCitationCount": 18,
  "tldr": "A new multimodal benchmark for mathematical reasoning and retrieval across diverse problem types.",
  "relevanceScore": 72,
  "query": "large language models",
  "extractedAt": "2026-04-21T12:00:00.000Z"
}
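
For the RAG use case, a record like the one above can be flattened into a text-plus-metadata document. This is a rough sketch under stated assumptions: the output shape (`id`, `text`, `metadata`) and the helper names `parse_timestamp` and `to_rag_document` are illustrative and not part of the actor, and the sample record is a trimmed copy of the example output.

```python
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    """Parse the ISO 8601 timestamps used in the output (trailing 'Z' means UTC)."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def to_rag_document(record: dict) -> dict:
    """Combine title and abstract into one text field and keep a few metadata keys."""
    text = f"{record['title']}\n\n{record['abstract']}"
    return {
        "id": record["arxivId"],
        "text": text,
        "metadata": {
            "authors": record.get("authors", []),
            "primaryCategory": record.get("primaryCategory"),
            "submitted": parse_timestamp(record["submittedDate"]).date().isoformat(),
            "url": record.get("url"),
        },
    }


# Trimmed copy of the example record shown above.
sample_record = {
    "arxivId": "2604.18584",
    "title": "MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval",
    "authors": ["Shaden Alshammari", "Kevin Wen", "Antonio Torralba"],
    "abstract": "Mathematical problem solving remains a challenging test of reasoning...",
    "primaryCategory": "cs.AI",
    "submittedDate": "2026-04-20T00:00:00Z",
    "url": "https://arxiv.org/abs/2604.18584",
}

document = to_rag_document(sample_record)
```

Keeping the arXiv ID as the document ID makes it straightforward to deduplicate papers that are returned by more than one query.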

Pricing

This actor charges $0.002 USD per paper extracted. Extracting 100 papers costs approximately $0.20 USD.
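
The arithmetic behind that figure can be sketched as follows. `estimate_cost_usd` is a hypothetical helper; it uses only the per-paper price stated above and ignores any platform fees or plan discounts, which are not described here.

```python
from decimal import Decimal

PRICE_PER_PAPER_USD = Decimal("0.002")  # per-paper price stated above


def estimate_cost_usd(paper_count: int) -> Decimal:
    """Estimate the charge for a run that extracts `paper_count` papers."""
    if paper_count < 0:
        raise ValueError("paper_count must not be negative")
    return PRICE_PER_PAPER_USD * paper_count


cost_100 = estimate_cost_usd(100)    # 100 papers: 0.20 USD, as quoted above
cost_1000 = estimate_cost_usd(1000)  # 1,000 papers: 2.00 USD, the listed rate
```

`Decimal` is used instead of `float` so that small per-paper prices add up without binary rounding error.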

FAQ

What is the difference between search, category, and id mode?

`search` finds papers by keyword across all fields (or only title, abstract, or author). `category` returns the latest papers in a specific ArXiv subject (e.g. `cs.AI`, `math.ST`). `id` fetches exact papers by their ArXiv ID (e.g. `2312.00752`).

How many papers can I extract per run?

Up to 500 papers per run. For larger batches, start several runs, each covering a different date range set with `minYear` and `maxYear`.
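
One way to split a large collection job is one run input per calendar year. This is a sketch under an assumption: it treats `minYear` and `maxYear` as inclusive bounds, which is how the parameter table describes them. `yearly_run_inputs` is a hypothetical helper, and each generated run is still subject to the 500-paper cap.

```python
def yearly_run_inputs(base_input: dict, first_year: int, last_year: int) -> list[dict]:
    """Return one copy of `base_input` per year, each restricted to that single year."""
    if first_year > last_year:
        raise ValueError("first_year must not be later than last_year")
    return [
        {**base_input, "minYear": year, "maxYear": year}
        for year in range(first_year, last_year + 1)
    ]


base = {"mode": "search", "queries": ["diffusion models"], "maxResults": 500}
batches = yearly_run_inputs(base, 2022, 2024)  # three inputs: 2022, 2023, 2024
```

If a single year still holds more than 500 matching papers, narrowing the query or the category is the remaining lever, since the filters shown in the table work at year granularity.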

What are citation counts and TLDRs?

When `includeSemanticScholar` is enabled, each paper is enriched with citation counts and an AI-generated one-sentence summary (TLDR) from the Semantic Scholar API. Enrichment adds roughly 0.4 s per paper.

Does it require an API key or login?

No. ArXiv's public API (export.arxiv.org) is free and requires no authentication. Enrichment uses Semantic Scholar's free tier.
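
For orientation, a request to the public ArXiv API looks roughly like the URL built below. This is only a sketch: the actor's actual requests are not documented here, and the mapping from the actor's `searchField` values to ArXiv field prefixes (`all`, `ti`, `abs`, `au`) is an assumption. `build_arxiv_query_url` is a hypothetical helper; the query parameters it sets (`search_query`, `start`, `max_results`, `sortBy`, `sortOrder`) are those of the ArXiv API.

```python
from urllib.parse import urlencode

ARXIV_API_URL = "http://export.arxiv.org/api/query"

# Assumed mapping from the actor's searchField values to ArXiv field prefixes.
FIELD_PREFIXES = {"all": "all", "title": "ti", "abstract": "abs", "author": "au"}


def build_arxiv_query_url(
    query: str,
    search_field: str = "all",
    sort_by: str = "submittedDate",
    sort_order: str = "descending",
    max_results: int = 50,
    start: int = 0,
) -> str:
    """Build a query URL for the public ArXiv API; no request is sent."""
    prefix = FIELD_PREFIXES[search_field]
    params = {
        "search_query": f'{prefix}:"{query}"',  # quoted so the phrase is matched as a whole
        "start": start,
        "max_results": max_results,
        "sortBy": sort_by,
        "sortOrder": sort_order,
    }
    return f"{ARXIV_API_URL}?{urlencode(params)}"


url = build_arxiv_query_url("diffusion models", search_field="title", max_results=10)
```

The API responds with an Atom XML feed, so a client still has to parse entries into fields such as title, authors, and abstract.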

What ArXiv categories are supported?

All ArXiv categories: `cs.*` (Computer Science), `math.*`, `physics.*`, `stat.*`, `q-bio.*`, `econ.*`, and more. Full list at arxiv.org/category_taxonomy.

Other actors you may like

StackOverflow Scraper — search developer Q&A by keyword or tag.
Trustpilot Reviews Scraper — extract reviews and ratings from Trustpilot.
SEC EDGAR Scraper — company filings and financial disclosures from the SEC.
FinViz Stock Screener — stock screener with gainers, losers, and sector filters.

Keywords

arxiv scraper, scientific paper extractor, research paper scraper, arxiv API, AI paper scraper, academic data extractor, preprint scraper, NLP research data, LLM training data, arxiv search scraper

Legal Disclaimer

This actor extracts only publicly available, open-access data from ArXiv.org, in compliance with Chilean Law 19.628 on the Protection of Private Life (Ley 19.628 sobre Protección de la Vida Privada).

ArXiv is an open-access repository operated by Cornell University. All papers and metadata extracted are freely and publicly accessible without authentication.

What this actor does NOT collect:

Names or personal data of any private individuals
User accounts, submission portals, or private information
Any data not freely visible to anonymous visitors

What this actor collects:

Paper titles, abstracts, and author names (public academic data)
Subject categories and submission dates
Public URLs and PDF links

Users are solely responsible for ensuring their use of this data complies with applicable laws and ArXiv's terms of use.
