Pricing

from $3.50 / 1,000 arXiv papers saved

arXiv Papers Monitor for Research Alerts

Monitor arXiv papers by query, category, author, or date. Export titles, abstracts, authors, links, PDFs, categories, and agent-friendly summaries for research monitoring, literature review, and AI paper workflows.

Developer

Skootle

Last modified

7 days ago

TL;DR

AI engineers and ML researchers waste 30+ minutes a day refreshing arxiv.org/list/cs.AI/recent and copy-pasting abstracts into spreadsheets. This delivers a clean daily diff of new arXiv papers in your tracked categories (cs.AI, cs.LG, cs.CL, cs.CV, stat.ML, math.OC, q-bio, more), deduplicated by arxivId, with full abstract, every author, PDF URL, DOI, and an LLM-ready markdown card per record. Watchlist mode emits only papers new since the last run, so a daily schedule feeds a RAG pipeline, vector DB, weekly research email, or Slack digest with zero duplicates and ISO 8601 timestamps your downstream sort logic can trust.

Try it on a small dataset (the 10-paper default fits the free $5 trial credit), then let us know what you think in a review.

What does arXiv Papers Monitor do?

It calls the public arXiv API on your behalf and turns the raw Atom feed into clean JSON your code can use immediately. Each paper record includes:

arxivId (e.g. 2304.12345) and the version-aware arxivIdVersion (2304.12345v2)
Full title and full abstract
Every authors[].name and (when arXiv provides it) authors[].affiliation
primaryCategory plus the full categories[] list, with an isCrossListed flag
submittedDate and updatedDate in ISO 8601
Direct pdfUrl and abstractPageUrl
doi, journalRef, and the author's own comment (often "NeurIPS 2026 spotlight" or page count)
agentMarkdown: a 5-line markdown card formatted for Claude / Codex / Slack / a CRM ticket

One API call replaces the manual workflow of opening arxiv.org, choosing a category, paging through 50 abstracts at a time, copy-pasting fields into a spreadsheet, and chasing PDF links. We collapse that to a JSON dataset you can pipe into a vector DB, an LLM agent, an alerting system, or a research dashboard.
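
To make the Atom-to-JSON step concrete, here is a minimal Python sketch of how one feed entry could be mapped onto the fields listed above. It is not the actor's actual code: the helper names (`_text`, `entry_to_record`) and the sample entry are hypothetical, only a subset of fields is covered, and the ID split assumes new-style arXiv IDs such as 2304.12345v2.

```python
import xml.etree.ElementTree as ET

# XML namespaces used by the arXiv Atom feed.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}

# Hypothetical entry, shaped like one <entry> from the arXiv API response.
SAMPLE_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:arxiv="http://arxiv.org/schemas/atom">
  <id>http://arxiv.org/abs/2304.12345v2</id>
  <published>2023-04-24T17:00:00Z</published>
  <updated>2023-05-02T09:30:00Z</updated>
  <title>An  Example
    Title</title>
  <summary>  A short abstract.  </summary>
  <author><name>Ada Example</name></author>
  <arxiv:primary_category term="cs.AI"/>
  <category term="cs.AI"/>
  <category term="cs.LG"/>
  <link title="pdf" href="http://arxiv.org/pdf/2304.12345v2" rel="related"/>
</entry>"""


def _text(node, path):
    """Return whitespace-normalized text of a child element, or None."""
    child = node.find(path, NS)
    if child is None or child.text is None:
        return None
    return " ".join(child.text.split())


def entry_to_record(entry):
    """Map one Atom <entry> element to a flat dict (subset of fields)."""
    # Assumes new-style IDs; old-style IDs like hep-th/9901001 need extra care.
    id_version = _text(entry, "atom:id").rsplit("/", 1)[-1]
    arxiv_id = id_version.rsplit("v", 1)[0]
    primary = entry.find("arxiv:primary_category", NS)
    categories = [c.get("term") for c in entry.findall("atom:category", NS)]
    pdf_url = next(
        (link.get("href") for link in entry.findall("atom:link", NS)
         if link.get("title") == "pdf"),
        None,
    )
    return {
        "arxivId": arxiv_id,
        "arxivIdVersion": id_version,
        "title": _text(entry, "atom:title"),
        "abstract": _text(entry, "atom:summary"),
        "authors": [
            {"name": _text(a, "atom:name"),
             "affiliation": _text(a, "arxiv:affiliation")}
            for a in entry.findall("atom:author", NS)
        ],
        "primaryCategory": primary.get("term") if primary is not None else None,
        "categories": categories,
        "isCrossListed": len(categories) > 1,
        "submittedDate": _text(entry, "atom:published"),
        "updatedDate": _text(entry, "atom:updated"),
        "pdfUrl": pdf_url,
        "doi": _text(entry, "arxiv:doi"),
        "journalRef": _text(entry, "arxiv:journal_ref"),
        "comment": _text(entry, "arxiv:comment"),
    }


record = entry_to_record(ET.fromstring(SAMPLE_ENTRY))
```

Missing optional elements (DOI, journal reference, comment, affiliation) come back as None, which matches the nullable fields in the output format.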

Why scrape arXiv?

arXiv is where every AI, ML, vision, and NLP paper lands first, often weeks or months before peer review. If your job is "what was published this week in X," refreshing arxiv.org/list/cs.AI/recent and copy-pasting abstracts into a spreadsheet eats 30+ minutes a day.

Feed a RAG pipeline, drive a weekly research newsletter, watch a specific lab or topic, or build training corpora, all from one daily diff. The buyers here are AI engineers wiring research retrieval, ML researchers tracking sub-fields, and editors of weekly AI newsletters who need a clean "what's new since yesterday" feed.

Who needs this?

AI agent builders who wire research-paper retrieval into RAG pipelines and need clean text plus PDF URLs without writing an Atom parser
ML researchers tracking three or four sub-fields and wanting a daily digest of new submissions in their categories
AI journalists chasing weekly stories who need to spot trending architectures, models, and lab outputs as they appear
M&A and corp-dev analysts profiling AI startups by tracking which authors and labs are publishing what
Recruiters sourcing ML talent by pulling first-author lists from hot subfields (RLHF, MoE, agents, vision-language)
Data scientists at LLM labs building reproduction pipelines who need full abstracts and DOIs, not just titles
Conference reviewers and editors who want a structured, per-category submission feed for trend analysis

If your job involves "what was published on arXiv this week in X," you are the buyer.

How to use arXiv Papers Monitor

Open the actor in Apify Console.
Pick your categories (e.g. ["cs.AI","cs.CL"]) or type a query ("retrieval augmented generation").
Optionally set submittedAfter to limit to recent papers, or flip watchlistMode on for a daily-new feed.
Click Start. The default run (maxItems: 10) takes about 30 seconds.
Download the dataset as JSON, CSV, or Excel, or pull it via the API at https://api.apify.com/v2/acts/skootle~arxiv-papers/runs/last/dataset/items.
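
If you pull the last run's dataset over HTTP, a small helper can build the request URL. The sketch below is an assumption-laden example: `dataset_items_url` is a hypothetical helper, and the `token`, `format`, and `limit` query parameters should be checked against the Apify API documentation before you rely on them.

```python
from urllib.parse import urlencode

# Endpoint quoted in the steps above.
BASE = "https://api.apify.com/v2/acts/skootle~arxiv-papers/runs/last/dataset/items"


def dataset_items_url(token, fmt="json", limit=None):
    """Build the dataset-items URL for the actor's most recent run."""
    params = {"token": token, "format": fmt}
    if limit is not None:
        params["limit"] = limit
    return f"{BASE}?{urlencode(params)}"


# Placeholder token; substitute your own Apify API token.
url = dataset_items_url("<YOUR_APIFY_TOKEN>", fmt="csv", limit=10)
```

Fetch the resulting URL with any HTTP client, and keep the token out of logs and shared links.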

How much will scraping arXiv cost?

Pay-per-result pricing. You only pay for papers actually saved, plus a one-time start fee per run.

Plan	Per paper	Run start
FREE	$0.005	$0.005
BRONZE	$0.0045	$0.005
SILVER	$0.004	$0.005
GOLD	$0.0035	$0.005
PLATINUM	$0.003	$0.005
DIAMOND	$0.003	$0.005

Typical daily watchlist run for one researcher (50 new papers across cs.AI + cs.CL): about $0.26 on FREE, $0.16 on PLATINUM. A weekly bulk pull of 1000 papers is about $5 on FREE, $3 on PLATINUM. The $5 free Apify credit covers roughly 1000 records on the FREE tier.
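
The figures above follow from papers × per-paper price + one start fee per run. A short Python sketch of that arithmetic, using the prices from the table (`estimate_cost` is a hypothetical helper, not part of the actor):

```python
# Per-paper prices and the run start fee, copied from the pricing table above.
PER_PAPER = {
    "FREE": 0.005,
    "BRONZE": 0.0045,
    "SILVER": 0.004,
    "GOLD": 0.0035,
    "PLATINUM": 0.003,
    "DIAMOND": 0.003,
}
RUN_START = 0.005


def estimate_cost(papers, plan="FREE", runs=1):
    """Estimated USD cost for `papers` saved records spread over `runs` runs."""
    return papers * PER_PAPER[plan] + runs * RUN_START


daily_free = estimate_cost(50, "FREE")           # 50 * 0.005 + 0.005
weekly_platinum = estimate_cost(1000, "PLATINUM")  # 1000 * 0.003 + 0.005
print(f"{daily_free:.3f} {weekly_platinum:.3f}")
```

A month of daily 50-paper runs on FREE would be `estimate_cost(50 * 30, "FREE", runs=30)`, since the start fee applies once per run.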

Is it legal to scrape arXiv?

arXiv runs an official, public, unauthenticated query API explicitly intended for programmatic access. We honor their published rate limit (1 request per 3 seconds) and identify ourselves with a descriptive User-Agent header. arXiv's Terms of Use cover non-commercial use directly; for commercial redistribution of paper content, follow up with arXiv directly and consult your own counsel.

This actor pulls only the metadata + abstract that arXiv exposes through the public API. It does not download PDFs, does not bypass any auth, and does not touch withdrawn papers.

Examples

1. Daily new cs.AI papers

{
  "categories": ["cs.AI"],
  "sortBy": "submittedDate",
  "sortOrder": "descending",
  "maxItems": 50,
  "watchlistMode": true
}

Schedule daily, point the dataset webhook at Slack or a vector DB.

2. RAG-themed papers from the last 30 days

{
  "query": "retrieval augmented generation",
  "submittedAfter": "2026-04-09",
  "submittedBefore": "2026-05-09",
  "maxItems": 200
}

3. NLP + ML cross-listed papers

{
  "categories": ["cs.CL", "cs.LG"],
  "sortBy": "submittedDate",
  "maxItems": 100
}

4. Specific lab tracking via lab keyword in title or abstract

{
  "query": "DeepMind OR Anthropic",
  "categories": ["cs.AI", "cs.LG"],
  "maxItems": 100
}

5. Diffusion-model survey

{
  "query": "diffusion model",
  "sortBy": "relevance",
  "maxItems": 100
}

6. Math optimization for ML

{
  "categories": ["math.OC", "stat.ML"],
  "submittedAfter": "2026-01-01",
  "maxItems": 200
}

7. Computational neuroscience

{
  "categories": ["q-bio.NC", "cs.NE"],
  "maxItems": 50
}

8. Title-only feed for fast indexing

{
  "categories": ["cs.CV"],
  "includeAbstract": false,
  "maxItems": 1000
}

Input parameters

Field	Type	Description
`query`	string	Free-text search across title + abstract
`categories`	string[]	arXiv category codes (cs.AI, cs.LG, cs.CL, cs.CV, stat.ML, math.OC, physics.*, q-bio.*, more)
`submittedAfter`	string (ISO date)	Earliest submission date
`submittedBefore`	string (ISO date)	Latest submission date
`sortBy`	enum	submittedDate, lastUpdatedDate, or relevance
`sortOrder`	enum	descending or ascending
`maxItems`	int	Max papers per run (default 10, max 2000)
`includeAbstract`	bool	Toggle full abstract vs title-only (default true)
`watchlistMode`	bool	Emit only new papers since the last run
`proxyConfiguration`	object	Optional residential proxy for very large bulk runs

arXiv output format

`arxiv_paper` record

Field	Type	Notes
`recordType`	string	Always `"arxiv_paper"`
`outputSchemaVersion`	string	`"2026-05-10"`. Bumps on schema change.
`arxivId`	string	`"2304.12345"` (no version)
`arxivIdVersion`	string	`"2304.12345v2"`
`doi`	string | null	DOI when assigned
`title`	string	Full title
`abstract`	string	Full abstract, whitespace-normalized
`authors`	object[]	`{ name, affiliation }` per author
`authorCount`	int	Length of `authors`
`primaryCategory`	string	e.g. `"cs.AI"`
`categories`	string[]	All assigned categories
`submittedDate`	string	ISO 8601
`updatedDate`	string	ISO 8601
`pdfUrl`	string	Direct PDF URL
`abstractPageUrl`	string	arxiv.org abs page
`journalRef`	string | null	"Nature 612, 2026" style reference if accepted
`comment`	string | null	Author note ("NeurIPS 2026 spotlight", page count, etc.)
`estimatedReadMinutes`	int	Abstract word count / 200
`isCrossListed`	bool	True when `categories.length > 1`
`agentMarkdown`	string	LLM-ready 5-line card
`fieldCompletenessScore`	int	0-100, 10 fields evaluated
`scrapedAt`	string	ISO 8601

Sample record

{
  "recordType": "arxiv_paper",
  "outputSchemaVersion": "2026-05-10",
  "arxivId": "2605.06667",
  "arxivIdVersion": "2605.06667v1",
  "doi": null,
  "title": "ActCam: Zero-Shot Joint Camera and 3D Motion Control for Video Generation",
  "abstract": "For artistic applications, video generation requires fine-grained control...",
  "authors": [
    { "name": "Omar El Khalifi", "affiliation": null },
    { "name": "Thomas Rossi", "affiliation": null }
  ],
  "authorCount": 9,
  "primaryCategory": "cs.CV",
  "categories": ["cs.CV", "cs.AI", "cs.LG"],
  "submittedDate": "2026-05-07T17:59:58Z",
  "updatedDate": "2026-05-07T17:59:58Z",
  "pdfUrl": "https://arxiv.org/pdf/2605.06667v1",
  "abstractPageUrl": "https://arxiv.org/abs/2605.06667v1",
  "journalRef": null,
  "comment": "SIGGRAPH 2026",
  "estimatedReadMinutes": 2,
  "isCrossListed": true,
  "agentMarkdown": "📄 ActCam: Zero-Shot Joint Camera and 3D Motion Control for Video Generation (2605.06667)\n👥 Omar El Khalifi + 8 more\n📅 Submitted 2026-05-07 · Category cs.CV\n📊 2 min read · Cross-listed\n🔗 https://arxiv.org/pdf/2605.06667v1",
  "fieldCompletenessScore": 80,
  "scrapedAt": "2026-05-09T20:50:00Z"
}

The authors array in this sample is abbreviated to two entries; authorCount reflects all nine authors on the paper.

During the actor run

No authentication required. The actor honors arXiv's published 1-request-per-3-seconds rate limit and identifies itself with a descriptive User-Agent, so the source stays available for everyone. A 1000-paper pull typically completes in about 30 seconds.

A run summary lands at the OUTPUT key, a markdown digest of the top 5 papers at AGENT_BRIEFING, and (with watchlistMode: true) the rolling 50,000-id dedupe window at WATCHLIST_STATE.

FAQ

How is this different from arXiv's free API?

The free API returns raw Atom XML with namespaced tags. You write an XML parser, you write a paginator that respects the 3-second rate limit, you write a normalizer for affiliation / journal_ref / comment fields, and you write the watchlist diff yourself. Then you maintain it. We give you typed JSON, idempotent IDs, watchlist mode, an agent-ready markdown card per record, and a versioned schema so your downstream pipeline does not silently break.

What about HuggingFace papers or PapersWithCode?

Different sources, different scope. HuggingFace Papers is curated and lags arXiv. PapersWithCode focuses on code-attached papers. Use this actor for the firehose, then enrich with HF / PWC if you need code-availability signals. We will likely ship companion actors for both in v0.2.

Can I track only papers from specific universities or labs?

Indirectly. arXiv's API does not expose a clean "affiliation" filter, but you can query by lab keywords ("Anthropic", "DeepMind", "Stanford NLP") and the term will match titles and abstracts. For author-list filtering, post-process the dataset (authors[].name) downstream.
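
As one way to do that downstream post-processing, the Python sketch below filters dataset records by author name. `papers_by_author` and the sample records are hypothetical; only the `authors[].name` field comes from the output format above.

```python
def papers_by_author(records, names):
    """Keep records where any author name contains one of `names` (case-insensitive)."""
    needles = [n.lower() for n in names]
    return [
        r for r in records
        if any(
            needle in (author.get("name") or "").lower()
            for author in r.get("authors", [])
            for needle in needles
        )
    ]


# Hypothetical records with the same shape as the actor's output.
records = [
    {"arxivId": "0000.00001", "authors": [{"name": "Ada Example", "affiliation": None}]},
    {"arxivId": "0000.00002", "authors": [{"name": "Grace Sample", "affiliation": None}]},
]
matches = papers_by_author(records, ["ada example"])
```

Substring matching on names is crude (it will not reconcile initials or transliterations), so treat the result as a candidate list rather than a definitive author match.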

How does watchlist mode work?

Flip watchlistMode: true. The actor reads WATCHLIST_STATE from the key-value store, runs the search, and emits only papers whose arxivId it has not delivered before. After each run it appends the newly seen IDs back to state (rolling window of 50,000). Pair with a daily Apify schedule for a clean "what's new" feed.
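
The diff-and-append logic described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actor's implementation: the in-memory `OrderedDict` stands in for whatever format WATCHLIST_STATE actually uses, and `diff_and_update` is a hypothetical name.

```python
from collections import OrderedDict


def diff_and_update(seen, papers, window=50_000):
    """Return papers whose arxivId is not in `seen`, then record them.

    `seen` is an OrderedDict of already-delivered ids, oldest first.
    Once it grows past `window` entries, the oldest ids are dropped.
    """
    new_papers = []
    for paper in papers:
        paper_id = paper["arxivId"]
        if paper_id in seen:
            continue  # delivered in an earlier run (or earlier in this one)
        seen[paper_id] = None
        new_papers.append(paper)
    while len(seen) > window:
        seen.popitem(last=False)  # evict the oldest id
    return new_papers


# Two simulated runs with a tiny window to show the eviction.
state = OrderedDict()
first = diff_and_update(state, [{"arxivId": "a"}, {"arxivId": "b"}], window=2)
second = diff_and_update(state, [{"arxivId": "b"}, {"arxivId": "c"}], window=2)
```

In this sketch, an id evicted from the window would be emitted again if it reappeared in a later search. The 50,000-id window makes that unlikely for daily category feeds.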

Can I use this with Python?

Yes. pip install apify-client, call client.actor("skootle/arxiv-papers").call(run_input=...), then iterate client.dataset(run["defaultDatasetId"]).iterate_items().
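
Spelled out, the calls above might look like the sketch below. `fetch_papers` is a hypothetical wrapper; it takes the client as an argument so the same function can be exercised with a stub in tests. The token string is a placeholder.

```python
def fetch_papers(client, run_input):
    """Run the actor and return its dataset items as a list.

    `client` is expected to be an apify_client.ApifyClient instance.
    """
    run = client.actor("skootle/arxiv-papers").call(run_input=run_input)
    if run is None:
        # Defensive check in case the call yields no run record.
        raise RuntimeError("Actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage (requires `pip install apify-client` and an Apify API token):
#
#   from apify_client import ApifyClient
#
#   client = ApifyClient("<YOUR_APIFY_TOKEN>")
#   papers = fetch_papers(client, {"categories": ["cs.AI"], "maxItems": 10})
#   for paper in papers:
#       print(paper["arxivId"], paper["title"])
```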

Can I integrate with Make / Zapier / n8n / Slack?

Yes. Apify exposes webhook triggers on dataset items and run completion. n8n and Make have native Apify connectors; Zapier works through the standard webhook bridge.

Why does this cost more than free arXiv scrapers?

If you are wiring this into a customer-facing product or a daily AI-agent pipeline, the per-record cost ($0.0035 at GOLD) buys you reliability free actors do not provide: versioned schema, idempotent IDs, watchlist diff, daily Apify auto-test reliability, and a maintenance commitment. Free actors break monthly when the source changes a tag name, you do not get notified, and your pipeline silently goes empty.

What rate limits should I worry about?

arXiv asks for at most 1 request per 3 seconds. We honor that automatically. With 100 papers per page, a 1000-paper pull takes roughly 30 seconds plus arXiv processing time.
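
The 30-second figure is simply 1000 / 100 = 10 requests at 3 seconds each. For readers who would otherwise write this pacing themselves, here is a minimal Python sketch. `fetch_page` is a hypothetical callable that returns one page of results; the actor's own pagination code is not public and may differ.

```python
import math
import time


def estimate_seconds(papers, page_size=100, delay=3.0):
    """Rough lower bound on runtime: one `delay` per page requested."""
    return math.ceil(papers / page_size) * delay


def fetch_all(fetch_page, total, page_size=100, delay=3.0, sleep=time.sleep):
    """Collect up to `total` items, waiting `delay` seconds between requests.

    `fetch_page(start, count)` must return a list of at most `count` items.
    """
    results = []
    for start in range(0, total, page_size):
        if start:
            sleep(delay)  # pace every request after the first
        wanted = min(page_size, total - start)
        batch = fetch_page(start, wanted)
        results.extend(batch)
        if len(batch) < wanted:
            break  # the source ran out of results early
    return results


# Dry run with a fake page fetcher and a recorded (not real) sleep.
slept = []
items = fetch_all(lambda start, count: list(range(start, start + count)),
                  total=250, sleep=slept.append)
```

Passing `sleep` as a parameter keeps the pacing testable without actually waiting.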

Does this download the full PDF?

No, only metadata and abstract. The pdfUrl field gives you the direct PDF link if your downstream needs the full text.

Why choose arXiv Papers Monitor

Watchlist mode emits only what's new since the last run. A rolling 50,000-id window means your RAG pipeline ingests each paper exactly once.
Reliability free actors can't deliver. Free arXiv scrapers break monthly when source tags change. You don't get notified, and your pipeline silently goes empty. The per-record cost ($0.0035 at GOLD) buys daily auto-test reliability and 24-48 hour fix turnaround.
Sub-minute runtime, no rate-limit babysitting. Pure HTTP against the official arXiv API, no HTML parsing, no headless browser, 1000 papers in about 30 seconds.
Drop-in for LLM agents. agentMarkdown card baked into every record, plus a per-run AGENT_BRIEFING.md digest of the top 5 papers ready for Slack or a daily LLM context window.
Schema doesn't break your pipeline: it is versioned and bumped on every breaking change.
Re-runs are safe to dedupe by ID: arxivId-keyed records upsert cleanly across runs.
AI agents can self-filter sparse rows via fieldCompletenessScore (0-100, 10 fields evaluated).
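
The last two points can be combined in a downstream store. The Python sketch below upserts records keyed by arxivId and skips sparse rows below a chosen fieldCompletenessScore. `upsert` and the in-memory dict are hypothetical stand-ins for your own database layer.

```python
def upsert(store, records, min_score=0):
    """Insert or refresh records in `store` (a dict keyed by arxivId).

    Records scoring below `min_score` are skipped. An existing record is
    replaced only by one with an equal or later updatedDate.
    """
    for record in records:
        if record.get("fieldCompletenessScore", 0) < min_score:
            continue
        current = store.get(record["arxivId"])
        # ISO 8601 UTC timestamps in one format sort correctly as strings.
        if current is None or record["updatedDate"] >= current["updatedDate"]:
            store[record["arxivId"]] = record
    return store


# Hypothetical records: a re-run delivers a newer version of the same paper.
store = {}
upsert(store, [
    {"arxivId": "0000.00001", "updatedDate": "2026-05-07T17:59:58Z",
     "fieldCompletenessScore": 80},
    {"arxivId": "0000.00002", "updatedDate": "2026-05-07T18:00:00Z",
     "fieldCompletenessScore": 40},
], min_score=60)
upsert(store, [
    {"arxivId": "0000.00001", "updatedDate": "2026-05-09T10:00:00Z",
     "fieldCompletenessScore": 90},
], min_score=60)
```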

Your feedback

Hit a bug or want a feature? Open an issue on the Issues tab rather than the reviews page, and we will fix it fast (typically within 48 hours).

Other Skootle actors you might want to check

skootle/hackernews-watchlist: watch for new HN stories matching keywords or domains
skootle/github-trending: daily trending repos by language with stargazer + commit signals
skootle/reddit-subreddit-monitor: new posts in any subreddit with watchlist diff
skootle/sec-edgar-filings: public SEC filings normalized for AI agents

Support and contact

Found a bug or need a new field? Open an issue. For commercial use questions, email jamie.kester@gmail.com.
