Pricing

$1.00 / 1,000 papers scraped

arXiv Scraper: Search Research Preprints

Scrape arXiv research papers by keyword, category (cs.AI, cs.LG, quant-ph) or author. Returns titles, abstracts, authors, dates, DOIs & PDF links as clean JSON. No API key. Use it as an MCP server in Claude, ChatGPT & AI agents for research monitoring.

Developer

The Mine Works

Last modified

4 days ago

📚 arXiv Paper Scraper: Search Research Preprints, No API Key

Overview

arXiv Paper Scraper turns any arXiv search into structured JSON. Query by keyword, arXiv category (cs.AI, cs.LG, cs.CL, quant-ph, q-bio.GN), author, or a fielded expression such as ti:transformer AND cat:cs.AI. Every matching preprint comes back with its title, abstract, authors, categories, submitted and updated dates, DOI where available, and a direct PDF link. No API key, no rate-limit surprises.

It's the fastest way to build a research monitoring feed, seed a RAG system with primary literature, or run a daily digest for a research team.

Reliability posture: blocked, empty, or failed runs are never charged. You only pay for a paper record that was actually delivered.

✅ No API key required | ✅ Full abstracts | ✅ PDF links included | ✅ MCP-ready for AI agents

Features

Full-text & fielded search: Free text or arXiv prefixes (ti:, au:, abs:, cat:).
Category filter: Any arXiv category (cs.AI, math.AG, q-bio.GN, stat.ML).
Sort control: Newest first by submission or by relevance.
Date range: Filter by submission date for monitoring pipelines.
Direct PDFs: PDF URL on every record for downstream ingestion.

How it works

The actor calls the public arXiv search API using either a free-text query or a fielded expression (ti: title, au: author, abs: abstract, cat: category). Results are sorted by your choice (newest first by submission date, or by relevance) and paginated up to your maxResults budget.

Every returned entry is flattened to a clean JSON row with the arXiv ID, full abstract, author list, category tags, submission and last-updated dates, DOI where the paper has been published, and the direct PDF URL for ingestion.
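
As a rough illustration of the request described above, the sketch below builds a query URL for arXiv's public API endpoint (export.arxiv.org/api/query). The helper name buildArxivUrl, the dateTo parameter, and the way the inputs are combined (including wrapping free text as a quoted all: phrase) are assumptions made for illustration; the actor's actual internals are not documented here.

```javascript
// Hypothetical helper: maps actor-style input onto an arXiv API query URL.
// How the fields are combined is an assumption, not the actor's documented behavior.
function buildArxivUrl({ query, category, sortBy = 'submittedDate', dateFrom, dateTo, maxResults = 100, start = 0 }) {
  // Pass fielded expressions through; wrap free text as an all-fields phrase.
  const isFielded = /\b(ti|au|abs|cat|all):/.test(query);
  const clauses = [isFielded ? `(${query})` : `all:"${query}"`];
  if (category) clauses.push(`cat:${category}`);
  if (dateFrom) {
    // arXiv date filters take YYYYMMDDHHMM bounds in GMT; the upper bound defaults to today.
    const today = new Date().toISOString().slice(0, 10).replace(/-/g, '');
    clauses.push(`submittedDate:[${dateFrom}0000 TO ${dateTo ?? today}2359]`);
  }
  const params = new URLSearchParams({
    search_query: clauses.join(' AND '),
    start: String(start),            // offset of the first result, used for pagination
    max_results: String(maxResults), // page size
    sortBy,                          // 'submittedDate' or 'relevance'
    sortOrder: 'descending',
  });
  return `https://export.arxiv.org/api/query?${params}`;
}

// Example: the same values as the sample input configuration.
console.log(buildArxivUrl({
  query: 'large language models',
  category: 'cs.CL',
  dateFrom: '20250101',
  maxResults: 100,
}));
```

Paginating would then amount to calling the helper again with start increased by the page size until the maxResults budget is used up.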

🧾 Input configuration

{
  "query": "large language models",
  "category": "cs.CL",
  "sortBy": "submittedDate",
  "dateFrom": "20250101",
  "maxResults": 100
}

📤 Output format

{
  "arxiv_id": "2607.13034v1",
  "title": "Do AI Agents Know When a Task Is Simple? Toward Complexity-Aware Reasoning and Execution",
  "abstract": "Large language model (LLM) agents increasingly automate multi-step engineering and informatics workflows, yet they rarely ask how much effort a task actually requires. They often follow a maximum-context-first strategy, re-reading files and dependencies they have already seen, turning a one-line edit into a small code-base audit. We argue the missing capability is task-aware execution-scope estimation: judging a task's difficulty, the information it truly needs, and the shortest reliable path before committing budget. We formalize minimum-sufficient execution and the Agent Cognitive Redundancy Ratio (ACRR), and propose E3 (Estimate, Execute, Expand): the agent estimates an initial operating point, executes a minimum viable path, and expands scope only when verification fails. On MSE-Bench, a deterministic benchmark of 121 edits in a capability-controlled simulator, E3 matches the strongest baseline's 100% success while cutting cost by 85%, tokens by 91%, and inspected files by 92% ...",
  "authors": ["Junjie Yin", "Xinyu Feng"],
  "categories": ["cs.AI", "cs.CL", "cs.SE", "eess.SY"],
  "primary_category": "cs.AI",
  "published_date": "2026-07-14T17:59:31Z",
  "updated_date": "2026-07-14T17:59:31Z",
  "pdf_url": "https://arxiv.org/pdf/2607.13034v1",
  "url": "https://arxiv.org/abs/2607.13034v1",
  "scraped_at": "2026-07-15T04:16:16.482Z"
}

Each paper record contains the following fields (doi appears only when one is available, so it is absent from the sample above):

| Field | Description |
| --- | --- |
| 🆔 `arxiv_id` | arXiv paper identifier, including version suffix (e.g. `2607.13034v1`) |
| 📄 `title` | Paper title |
| 📝 `abstract` | Full abstract text |
| 👥 `authors` | Array of author names |
| 🏷️ `categories` | arXiv category codes (e.g. `cs.AI`, `quant-ph`) |
| 🎯 `primary_category` | Primary arXiv category |
| 📅 `published_date` | Original submission timestamp, ISO 8601 |
| 🔁 `updated_date` | Last-updated timestamp if the paper was revised |
| 🔗 `doi` | Digital Object Identifier, present only once the paper has been formally published elsewhere |
| 📥 `pdf_url` | Direct PDF download URL |
| 🌐 `url` | Human-readable abstract page URL |
| 🕒 `scraped_at` | ISO 8601 timestamp of when the record was captured |
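
Because arxiv_id includes a version suffix, the same paper can show up as v1 in one run and v2 in a later one. The sketch below is one possible way to collapse such duplicates when merging datasets; parseArxivId and keepLatestVersions are hypothetical helper names, not part of the actor.

```javascript
// Hypothetical helper: split "2607.13034v1" into a base ID and a numeric version.
function parseArxivId(arxivId) {
  const match = /^(.*?)(?:v(\d+))?$/.exec(arxivId);
  // Treat a missing suffix as version 1 (an assumption for this sketch).
  return { baseId: match[1], version: match[2] ? Number(match[2]) : 1 };
}

// Hypothetical helper: keep only the highest version seen for each base ID.
function keepLatestVersions(records) {
  const latest = new Map();
  for (const record of records) {
    const { baseId, version } = parseArxivId(record.arxiv_id);
    const seen = latest.get(baseId);
    if (!seen || version > seen.version) latest.set(baseId, { version, record });
  }
  return [...latest.values()].map((entry) => entry.record);
}

// doi is optional, so guard before using it.
function recordsWithDoi(records) {
  return records.filter((record) => Boolean(record.doi));
}
```

The result keeps the order in which each paper was first seen, which is usually what a merged monitoring feed wants.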

💼 Common use cases

Research monitoring: Pull every new cs.CL paper each morning and email a digest to the team. Track a specific author or lab by author search on a schedule.

RAG & AI research tools: Seed a retrieval system with abstracts and PDFs across a topic or category. Feed a coding assistant with the latest methods papers for a domain.

Literature reviews: Build a corpus of relevant preprints for a survey or systematic review. Cluster papers by category and date for a market or method landscape.

Competitive AI intel: Watch which methods a specific lab publishes and how frequently. Spot new benchmarks and datasets before they show up in blog posts.
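
For the digest and clustering use cases above, a minimal sketch of turning output records into a plain-text summary grouped by primary_category might look like this. The function name buildDigest and the digest layout are illustrative assumptions; only the record fields come from the output format.

```javascript
// Hypothetical helper: group records by primary_category and render a text digest.
function buildDigest(records) {
  const byCategory = new Map();
  for (const record of records) {
    const key = record.primary_category ?? 'uncategorized';
    if (!byCategory.has(key)) byCategory.set(key, []);
    byCategory.get(key).push(record);
  }

  const lines = [];
  for (const [category, papers] of byCategory) {
    lines.push(`${category} (${papers.length})`);
    for (const paper of papers) {
      // Shorten long author lists to the first three names.
      const authors = paper.authors.length > 3
        ? `${paper.authors.slice(0, 3).join(', ')} et al.`
        : paper.authors.join(', ');
      lines.push(`- ${paper.title}`);
      lines.push(`  ${authors} | ${paper.url}`);
    }
    lines.push(''); // blank line between categories
  }
  return lines.join('\n').trimEnd();
}
```

The returned string can be dropped into an email body or chat message; sending it is left to whatever scheduler or mailer the team already uses.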

🚀 Getting started

Open the actor and enter a query (free text or a fielded expression).
Optionally set an arXiv category (e.g. cs.CL, cs.LG, quant-ph).
Pick sortBy: submittedDate for newest first, or relevance.
Set dateFrom (YYYYMMDD) to define the start of the monitoring window, and set maxResults to cap the number of records.
Click Start. Records stream to the dataset as pages parse.
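
For a scheduled monitoring run, the dateFrom value from the steps above has to move forward each day. A small sketch of computing it is shown below; dateFromDaysAgo is a hypothetical helper, and treating the date as UTC is an assumption about how the actor interprets it.

```javascript
// Hypothetical helper: format the UTC date `days` days before `now` as YYYYMMDD.
function dateFromDaysAgo(days, now = new Date()) {
  const msPerDay = 24 * 60 * 60 * 1000;
  const target = new Date(now.getTime() - days * msPerDay);
  // toISOString() is always UTC, e.g. "2025-02-28T12:00:00.000Z".
  return target.toISOString().slice(0, 10).replace(/-/g, '');
}

// Input for a daily run covering roughly the last 24 hours.
const monitoringInput = {
  query: 'large language models',
  category: 'cs.CL',
  sortBy: 'submittedDate',
  dateFrom: dateFromDaysAgo(1),
  maxResults: 100,
};
```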

FAQ

Do I need an arXiv API key? No. arXiv's search API is fully public and requires no credentials.

How do fielded queries work? Use ti: for title, au: for author, abs: for abstract, cat: for category, combined with AND / OR. Example: ti:transformer AND cat:cs.AI.
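
When queries are generated from code, a small builder keeps the prefixes consistent. The sketch below is illustrative only: fieldedQuery is a hypothetical helper, and quoting multi-word values follows arXiv's phrase syntax on the assumption that the actor passes the expression through unchanged.

```javascript
// Hypothetical helper: join prefix/value pairs into a fielded expression.
function fieldedQuery(fields, operator = 'AND') {
  const allowed = ['ti', 'au', 'abs', 'cat'];
  return Object.entries(fields)
    // Drop unknown prefixes and empty values.
    .filter(([prefix, value]) => allowed.includes(prefix) && value)
    // Quote values that contain whitespace so they are treated as a phrase.
    .map(([prefix, value]) => (/\s/.test(value) ? `${prefix}:"${value}"` : `${prefix}:${value}`))
    .join(` ${operator} `);
}

console.log(fieldedQuery({ ti: 'transformer', cat: 'cs.AI' }));
// prints: ti:transformer AND cat:cs.AI
```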

How much does it cost? You pay per paper returned, at $1.00 per 1,000 papers, on a pay-as-you-go basis. No subscription, no monthly minimum.
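
As a quick budgeting aid, the listed price of $1.00 per 1,000 papers can be turned into an estimate like the one below. The function name estimateCostUsd is hypothetical, and the figure covers only the per-paper price stated on this page.

```javascript
// Listed price: $1.00 per 1,000 papers returned.
const PRICE_PER_1000_PAPERS_USD = 1.0;

// Hypothetical helper: estimated charge for a given number of delivered papers.
function estimateCostUsd(paperCount) {
  return (paperCount / 1000) * PRICE_PER_1000_PAPERS_USD;
}

// A daily run capped at 100 papers, over 30 days, delivers at most 3,000 papers.
const monthlyEstimate = estimateCostUsd(100 * 30);
console.log(monthlyEstimate); // prints 3
```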

Can I use it in an AI agent? Yes. It's exposed as an MCP tool. See below.

Use in Claude, ChatGPT & any MCP agent

https://mcp.apify.com/?tools=themineworks/arxiv-preprint-search

Or call it programmatically with the Apify client:

import { ApifyClient } from 'apify-client';

// Authenticate with your Apify API token.
const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

// Start the actor with the search input and wait for the run to finish.
const run = await client.actor('themineworks/arxiv-preprint-search').call({
  query: 'large language models',
  category: 'cs.CL',
  sortBy: 'submittedDate',
  maxResults: 25,
});

// Read the paper records from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

🛠️ Complete your research intel pipeline

Got the preprints. Now widen the corpus:

Crossref Scholarly Metadata: pull the published-journal version and citations.
PubMed NCBI Scraper: pull biomedical literature on the same topic.
Google Trends Scraper: check public interest for the topic over time.

Typical flow: arXiv surfaces the newest preprints, Crossref links them to the published version, and PubMed adds biomedical coverage.

Found a bug or have a feature request? Open an issue on the actor's Apify Console page or reach out through the Apify profile.
