Pricing

$1.00 / 1,000 result items

arXiv Scraper: Papers, Authors, Categories & Search

Scrape arxiv.org via the official Atom API. Free-text search, lookup by author / title / category, paper detail by id, and the latest papers in any category. Returns title, abstract, authors, DOI, PDF link. No auth and no proxy setup required. Pay only per result item.

Developer

Perconey

Last modified: 2 months ago

What does arXiv Scraper do?

arXiv Scraper pulls research papers from arxiv.org via the official Atom API. It covers the latest papers in any category, free-text search, and lookups by author, title, category, or id, and returns the full title, abstract, authors, DOI, journal reference, and PDF link. arxiv.org is the canonical preprint server for AI / ML / CS / math / physics / quantitative biology, with over 2.4 million papers. The actor calls the documented public API directly: no browser, no auth, and no proxy setup on your side.

Try it instantly: pick getLatestPapers, leave category at cs.AI, click Start. You get the 30 newest AI papers (title, abstract, authors, PDF link) in under 5 seconds for $0.03.

Why use arXiv Scraper?

AI / ML researchers: Daily digest of new papers in your category. Schedule getLatestPapers for cs.AI / cs.CL / cs.LG and never miss a release.
Trend analysts: Track which sub-fields are accelerating. Combine getPapersByCategory with sortBy=submittedDate to see week-over-week paper-count deltas.
Recruiters / scouts: getPapersByAuthor returns everything a researcher published, with publication dates and co-authors. Ideal for hiring pipelines.
Content marketers in tech: Pull abstracts of trending papers and remix into blog content / newsletters. The summary field is rich and license-friendly.
AI agent developers: Wire the actor into your knowledge pipeline so your agent always has the latest research summaries to ground on.
Academic librarians: Bulk-export your institution's authors. The actor paginates politely (3 s between batches per arXiv guidelines) so multi-thousand-result exports are safe.
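
The week-over-week comparison mentioned for trend analysts can be sketched as below. This is a minimal illustration, assuming dataset items shaped like the Output example (each paper carrying a `published` timestamp in `YYYY-MM-DDTHH:MM:SSZ` form). The helper names `weekly_counts` and `week_over_week` are hypothetical and not part of the actor.

```python
from collections import Counter
from datetime import datetime


def weekly_counts(items):
    """Count papers per ISO (year, week) using each item's `published` field."""
    counts = Counter()
    for item in items:
        if item.get("_type") != "paper":
            continue  # skip error items
        published = datetime.strptime(item["published"], "%Y-%m-%dT%H:%M:%SZ")
        iso = published.isocalendar()
        counts[(iso[0], iso[1])] += 1
    return counts


def week_over_week(counts):
    """Return (week, count, delta) rows in chronological order.

    The delta compares against the previous week that has data;
    weeks with zero papers are not filled in.
    """
    rows = []
    previous = None
    for week in sorted(counts):
        count = counts[week]
        delta = None if previous is None else count - previous
        rows.append((week, count, delta))
        previous = count
    return rows


# Made-up sample items, for illustration only.
sample = [
    {"_type": "paper", "published": "2025-01-02T15:30:00Z"},
    {"_type": "paper", "published": "2025-01-03T09:00:00Z"},
    {"_type": "paper", "published": "2025-01-07T12:00:00Z"},
    {"_type": "error"},
]
for row in week_over_week(weekly_counts(sample)):
    print(row)
```

Running this on a scheduled getPapersByCategory export gives one row per week, which is usually enough to spot a sub-field that is speeding up.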

How to use arXiv Scraper

1. Open the Input tab.
2. Pick an action from the dropdown. getLatestPapers is the simplest starting point.
3. For getLatestPapers, set category (default cs.AI). Use any arXiv category code like cs.CL, cs.LG, stat.ML, math.OC, q-bio.QM.
4. For search / by-author / by-title / by-category / paper-detail actions, fill queries.
5. Tune maxItems (default 30).
6. Click Start.

Query format by action

| Action | Query format |
| --- | --- |
| getLatestPapers | leave empty (use category field) |
| searchPapers | free-text (e.g. `large language model`) |
| getPapersByAuthor | author surname (e.g. `Bengio`, `LeCun`, `Hinton`) |
| getPapersByCategory | arXiv category code (e.g. `cs.AI`) |
| getPapersByTitle | exact title phrase (e.g. `attention is all you need`) |
| getPaperDetail | arXiv id (e.g. `2501.00001` or `2501.00001v2`) |
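
For readers curious how these query formats could translate into arXiv API requests, here is a rough sketch. The actor's actual request construction is not documented here, so the action-to-prefix mapping and the function name `build_arxiv_url` are assumptions. The parameter names (`search_query`, `id_list`, `start`, `max_results`, `sortBy`, `sortOrder`) and the field prefixes (`all`, `au`, `ti`, `cat`) come from arXiv's public API documentation.

```python
from urllib.parse import urlencode

ARXIV_API = "https://export.arxiv.org/api/query"

# Assumed mapping from actor actions to arXiv search-field prefixes.
FIELD_PREFIX = {
    "searchPapers": "all",
    "getPapersByAuthor": "au",
    "getPapersByCategory": "cat",
    "getLatestPapers": "cat",
    "getPapersByTitle": "ti",
}


def build_arxiv_url(action, query, start=0, max_results=100, sort_by="submittedDate"):
    """Build an arXiv API URL for one action/query pair (illustrative only)."""
    params = {"start": start, "max_results": max_results}
    if action == "getPaperDetail":
        # Id lookups use id_list instead of a search expression.
        params["id_list"] = query
    else:
        prefix = FIELD_PREFIX[action]
        # Title lookups are quoted so the words are matched as a phrase.
        term = f'"{query}"' if action == "getPapersByTitle" else query
        params["search_query"] = f"{prefix}:{term}"
        params["sortBy"] = sort_by
        params["sortOrder"] = "descending"
    return f"{ARXIV_API}?{urlencode(params)}"


print(build_arxiv_url("getPapersByAuthor", "Bengio"))
print(build_arxiv_url("getPapersByTitle", "attention is all you need"))
print(build_arxiv_url("getPaperDetail", "2501.00001v2", max_results=1))
```

For getLatestPapers the query argument would be the value of the category field, sorted by submission date in descending order.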

Input

| Field | Required | Description |
| --- | --- | --- |
| `action` | yes | Which lookup to run. One of the six actions listed under Query format. |
| `queries` | sometimes | Required for all actions except getLatestPapers. |
| `category` | no | getLatestPapers only. arXiv category code. Default `cs.AI`. |
| `maxItems` | no | Max items per query. Default 30. The arXiv API caps a query at 30,000 results; the actor paginates in batches of 100 with the recommended 3 s delay. |
| `sortBy` | no | `submittedDate` (default), `relevance`, or `lastUpdatedDate`. |
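
The rules in the table can be checked before starting a run. The sketch below builds an input object from the documented fields and defaults. It assumes `queries` is a list of strings, which the table does not state explicitly, and the function name `build_run_input` is hypothetical.

```python
VALID_ACTIONS = {
    "getLatestPapers",
    "searchPapers",
    "getPapersByAuthor",
    "getPapersByCategory",
    "getPapersByTitle",
    "getPaperDetail",
}
VALID_SORT = {"submittedDate", "relevance", "lastUpdatedDate"}


def build_run_input(action, queries=None, category="cs.AI",
                    max_items=30, sort_by="submittedDate"):
    """Return an input dict for the actor, validated against the table above."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if sort_by not in VALID_SORT:
        raise ValueError(f"unknown sortBy: {sort_by!r}")
    if max_items < 1:
        raise ValueError("maxItems must be at least 1")

    run_input = {"action": action, "maxItems": max_items, "sortBy": sort_by}
    if action == "getLatestPapers":
        # Only this action reads the category field.
        run_input["category"] = category
    else:
        if not queries:
            raise ValueError(f"queries is required for {action}")
        run_input["queries"] = list(queries)  # assumed to be a list of strings
    return run_input


print(build_run_input("getLatestPapers"))
print(build_run_input("getPapersByAuthor", queries=["Bengio"], max_items=100))
```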

Output

Every item carries _type=paper (or error) plus _action.

```json
{
    "_type": "paper",
    "_action": "getLatestPapers",
    "arxiv_id": "2501.00001v1",
    "version": 1,
    "title": "Toward Foundation Models for Cell-Level Biology",
    "summary": "We present a new family of foundation models for single-cell genomics ...",
    "authors": ["Jane Doe", "John Smith", "Alex Researcher"],
    "author_count": 3,
    "categories": ["q-bio.QM", "cs.LG"],
    "primary_category": "q-bio.QM",
    "published": "2025-01-02T15:30:00Z",
    "updated": "2025-01-08T09:12:00Z",
    "doi": null,
    "journal_ref": null,
    "comment": "https://github.com/lab/foundation-cells",
    "pdf_url": "https://arxiv.org/pdf/2501.00001v1",
    "abs_url": "https://arxiv.org/abs/2501.00001v1"
}
```
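
Because every item carries `_type`, a downloaded dataset can be split into papers and errors before further processing. The following is a small sketch over items shaped like the example above. The helper names `split_items` and `short_reference` are hypothetical, and the fields of error items beyond `_type` and `_action` are not documented here, so the sample error item contains only those two.

```python
def split_items(items):
    """Separate paper items from error items using the `_type` field."""
    papers = [item for item in items if item.get("_type") == "paper"]
    errors = [item for item in items if item.get("_type") == "error"]
    return papers, errors


def short_reference(paper):
    """Format 'First Author et al. (YYYY). Title. abs_url' from a paper item."""
    authors = paper["authors"]
    lead = authors[0] if authors else "Unknown"
    if paper["author_count"] > 1:
        lead += " et al."
    year = paper["published"][:4]  # timestamps start with the year
    return f"{lead} ({year}). {paper['title']}. {paper['abs_url']}"


# Made-up sample items, for illustration only.
items = [
    {
        "_type": "paper",
        "_action": "getLatestPapers",
        "title": "Toward Foundation Models for Cell-Level Biology",
        "authors": ["Jane Doe", "John Smith", "Alex Researcher"],
        "author_count": 3,
        "published": "2025-01-02T15:30:00Z",
        "abs_url": "https://arxiv.org/abs/2501.00001v1",
    },
    {"_type": "error", "_action": "getPaperDetail"},
]

papers, errors = split_items(items)
for paper in papers:
    print(short_reference(paper))
print(f"{len(errors)} error item(s)")
```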

You can download the dataset in JSON, CSV, XML, Excel, RSS or HTML format from the Output tab.

Data fields

| Type | Key fields |
| --- | --- |
| `paper` | arxiv_id, version, title, summary, authors, author_count, categories, primary_category, published, updated, doi, journal_ref, comment, pdf_url, abs_url |

Pricing

Pay-per-result: $0.001 per paper. No flat monthly fee.

Cost examples:

Daily 30 newest cs.AI papers: $0.03
1,000 papers by an author: $1.00
5,000 cs.CL papers from the last year for a literature review: $5.00
One paper detail lookup: $0.001
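
The cost examples above follow directly from the per-item price. A minimal sketch of the arithmetic, using `Decimal` to avoid floating-point rounding (the function name `run_cost` is illustrative):

```python
from decimal import Decimal

PRICE_PER_ITEM = Decimal("0.001")  # USD per result item, as stated above


def run_cost(item_count):
    """Return the cost in USD for a run that yields `item_count` result items."""
    return PRICE_PER_ITEM * item_count


examples = [
    ("Daily 30 newest cs.AI papers", 30),
    ("1,000 papers by an author", 1000),
    ("5,000 cs.CL papers", 5000),
    ("One paper detail lookup", 1),
]
for label, count in examples:
    print(f"{label}: ${run_cost(count):.3f}")
```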

Tips

Proxy is enabled by default. arXiv aggressively rate-limits per outbound IP, and the Apify cloud egress pool is shared across many users, so requests from a single shared IP can hit a 429 within seconds. The actor uses the Apify proxy by default to rotate IPs per request. Disable it via proxyConfiguration.useApifyProxy: false only if you're sure of your own IP.
Pagination is rate-limited. arXiv asks for 3 s between requests, so 30,000 papers take ~15 minutes wall-clock minimum. Plan timeouts accordingly.
Category codes are case-sensitive. Use the arXiv taxonomy: https://arxiv.org/category_taxonomy. Common ones: cs.AI, cs.CL (NLP), cs.CV (Vision), cs.LG (ML), stat.ML.
Author search matches surnames. Bengio returns Yoshua + Samy + others. Use full names with quotes for disambiguation: "Yoshua Bengio".
Comment field often has GitHub links. arxiv:comment is where authors typically paste their code-repo URL. Useful for crawling implementations.
Versions matter. A paper id like 2501.00001 returns the latest version. Pin to a specific revision with 2501.00001v2.
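
The version tip can be applied when post-processing results. The sketch below splits an id into its base and version, and keeps only the newest version of each paper in a list of items. It handles new-style ids only (`YYMM.NNNN` or `YYMM.NNNNN`, optionally followed by `vN`); older ids such as `hep-th/9901001` are out of scope. The function names are hypothetical.

```python
import re

# New-style arXiv ids: four digits, a dot, four or five digits, optional vN.
ARXIV_ID = re.compile(r"^(?P<base>\d{4}\.\d{4,5})(?:v(?P<version>\d+))?$")


def split_arxiv_id(arxiv_id):
    """Return (base_id, version) where version is an int or None if unpinned."""
    match = ARXIV_ID.match(arxiv_id)
    if match is None:
        raise ValueError(f"not a new-style arXiv id: {arxiv_id!r}")
    version = match.group("version")
    return match.group("base"), int(version) if version else None


def latest_versions(papers):
    """Keep only the highest-version item for each base id."""
    latest = {}
    for paper in papers:
        base, version = split_arxiv_id(paper["arxiv_id"])
        rank = version or 0  # unpinned ids rank lowest
        if base not in latest or rank > latest[base][0]:
            latest[base] = (rank, paper)
    return [paper for _, paper in latest.values()]


print(split_arxiv_id("2501.00001v2"))
print(split_arxiv_id("2501.00001"))
```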

FAQ, disclaimers, support

Is this legal? The actor calls arxiv.org's official documented public API, identifies itself with a clear User-Agent, and honors the recommended 3 s inter-request delay. arXiv explicitly supports automated access.

Why is pagination slow? arXiv asks API clients to wait 3 s between requests. We honor that. For large pulls, schedule the actor overnight.
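
To plan timeouts, the minimum time spent waiting can be estimated from the batch size and delay. This sketch assumes the batch size of 100 and the 3 s delay stated in the Input section, counts only the pauses between consecutive requests, and ignores network and processing time, so real runs take longer. The function name is illustrative.

```python
import math


def min_pagination_seconds(total_items, batch_size=100, delay_s=3.0):
    """Lower bound on wall-clock time spent in inter-request delays."""
    requests = math.ceil(total_items / batch_size)
    pauses = max(requests - 1, 0)  # no pause is needed after the last request
    return pauses * delay_s


for n in (30, 1000, 5000, 30000):
    seconds = min_pagination_seconds(n)
    print(f"{n} items: at least {seconds:.0f} s ({seconds / 60:.1f} min)")
```

For 30,000 items this gives 300 requests and 299 pauses, or roughly 15 minutes before any request latency is added.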

What about citation counts? arXiv does not expose citation counts via its API. For citation metrics you would need Semantic Scholar or Google Scholar (the latter has no public API). Open an issue if this matters for your use case.

What about the full paper text? The actor returns the abstract plus a PDF link. To get the full text, download the PDF via the pdf_url field.

Bug or feature request? Open an Issue on the actor's Issues tab. I usually respond within a day.

Need a scraper for Hacker News, Stack Overflow, dev.to, Lemmy, Mastodon, Bluesky, Substack? See my other actors at https://apify.com/perconey.
