Pricing

from $2.50 / 1,000 results

arXiv Scraper

Search arXiv and extract paper metadata: titles, authors, abstracts, subject categories, DOIs, journal references, submission dates, and PDF links. Search by keyword, title, author, or category, or fetch specific papers by arXiv ID.

Developer

SolidCode

Last modified: 23 days ago

Why This Scraper?

~40 subject categories across 8 disciplines — pick from a labeled list spanning computer science, statistics, mathematics, physics, quantitative biology, quantitative finance, economics, and electrical engineering. Select cs.LG, stat.ML, math.PR, quant-ph and more with a checkbox — no codes to memorize.
Field-specific search, not just keywords: match words in the title, the author name, or the abstract as separate inputs, then combine them. Find papers with "transformer" in the title, written by Vaswani, in the cs.CL category, all in one run.
Direct arXiv-ID lookup, including legacy IDs — paste a list of IDs to fetch exact papers. Handles both modern (2310.06825) and legacy slash-style (cond-mat/0011267) identifiers, so decades-old preprints come back just as cleanly as last week's.
Full author affiliations — every author arrives as a structured record with name and institutional affiliation when the paper lists one, ready for co-author and institution analysis.
DOI and journal reference for published-version cross-linking — when authors register a DOI or cite the published venue, both fields land in the row, letting you join preprints to their peer-reviewed counterparts.
Direct PDF and abstract-page links on every paper — a pdfUrl for the full text and an absUrl for the human-readable landing page, so downstream tools can fetch or link without rebuilding URLs.
Sort by relevance, submission date, or last-updated date — newest-first or oldest-first, so you can surface the freshest preprints or build a chronological corpus.
Up to 50,000 papers per run — set the result cap to zero to sweep an entire topic, with a built-in safety ceiling so a broad query never runs away.

Use Cases

Academic Literature & Systematic Review

Assemble a complete reading list for a topic, sorted by relevance or recency
Narrow a survey to a single subject category to cut cross-field noise
Pull every preprint by a specific author for a focused author study
Track the latest submissions in a field by sorting on submission date

Research Trend & Citation Analysis

Measure publication volume in an emerging sub-field over time
Map which institutions are most active via author affiliations
Detect bursts of activity by sweeping recent submissions in a category
Build a chronological corpus to chart how terminology shifts year over year

Competitive R&D Intelligence

Monitor what a competing lab or research group is publishing on a topic
Benchmark output across institutions using affiliation data
Spot new directions before they reach peer-reviewed journals
Watch a category daily for the newest preprints in your space

ML & AI Dataset Building

Harvest abstracts at scale to train or fine-tune domain models
Build a labeled corpus by subject category for classification tasks
Collect title-abstract pairs for summarization and retrieval datasets
Gather a topic-specific text set for embeddings and semantic search

Bibliographic Database Enrichment

Cross-reference preprints to published versions via DOI and journal reference
Fill in missing abstracts, categories, and dates in an existing catalog
Resolve legacy slash-style IDs to current metadata
Enrich a reference manager export with affiliations and revision dates

Grant & Patent Prior-Art Search

Surface the earliest preprints describing a technique for prior-art review
Document the state of the art in a field for a grant proposal
Trace an idea back to its first submission date on arXiv
Compile a dated evidence trail across multiple subject categories

Getting Started

Basic Keyword Search

The simplest possible run — one topic, 50 papers:

{
    "searchQuery": "large language models",
    "maxResults": 50
}

Field-Specific Search by Category

Find recent computer-vision (cs.CV) and machine-learning (cs.LG) papers whose title mentions diffusion, newest first:

{
    "title": "diffusion",
    "categories": ["cs.CV", "cs.LG"],
    "sortBy": "submittedDate",
    "sortOrder": "descending",
    "maxResults": 200
}

Fetch Specific Papers by ID

Pull exact papers — modern and legacy IDs together — ignoring all search fields:

{
    "arxivIds": ["2310.06825", "1706.03762", "cond-mat/0011267"]
}
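Before pasting a long list into arxivIds, it can help to check that each entry looks like one of the two identifier styles. The following is a minimal sketch using only the Python standard library; the helper names classify_arxiv_id and build_id_input are hypothetical and not part of the actor. The patterns follow arXiv's published identifier schemes, and whether the actor accepts a trailing version suffix such as v2 is not stated here, so treat that part as an assumption.

```python
import re

# Modern scheme: YYMM.NNNN (through 2014) or YYMM.NNNNN (from 2015),
# optionally followed by a version suffix such as v2.
MODERN_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")
# Legacy scheme: archive[.SUBJECT]/YYMMNNN, e.g. cond-mat/0011267 or math.GT/0309136.
LEGACY_ID = re.compile(r"^[a-z]+(-[a-z]+)*(\.[A-Z]{2})?/\d{7}(v\d+)?$")


def classify_arxiv_id(raw: str) -> str:
    """Return 'modern', 'legacy', or 'unrecognized' for one identifier string."""
    candidate = raw.strip()
    if MODERN_ID.match(candidate):
        return "modern"
    if LEGACY_ID.match(candidate):
        return "legacy"
    return "unrecognized"


def build_id_input(raw_ids: list[str]) -> dict:
    """Keep recognized IDs (deduplicated, order preserved) for the arxivIds field."""
    seen: set[str] = set()
    kept: list[str] = []
    for raw in raw_ids:
        candidate = raw.strip()
        if classify_arxiv_id(candidate) != "unrecognized" and candidate not in seen:
            seen.add(candidate)
            kept.append(candidate)
    return {"arxivIds": kept}
```

Passing the three IDs from the example above through build_id_input returns them unchanged, while stray whitespace, duplicates, and malformed entries are dropped before the run starts.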

Author and Abstract Search Combined

Every preprint by a given author that mentions reinforcement learning in the abstract, most recently updated first:

{
    "author": "Yann LeCun",
    "abstract": "reinforcement learning",
    "categories": ["cs.AI", "cs.LG", "stat.ML"],
    "sortBy": "lastUpdatedDate",
    "maxResults": 500
}

Input Reference

Search

Combine any of these fields, or paste arXiv IDs to fetch exact papers.

Parameter	Type	Default	Description
`searchQuery`	string	`"large language models"`	Free-text search across the paper's metadata (title, abstract, authors). Advanced users can use field prefixes like `ti:`, `au:`, `abs:`, `cat:` and boolean operators.
`title`	string	null	Only include papers whose title contains these words.
`author`	string	null	Only include papers by this author (e.g. "Yann LeCun" or "Hinton").
`abstract`	string	null	Only include papers whose abstract contains these words.
`categories`	array	`[]`	Restrict results to selected arXiv subject areas. Choose from ~40 labeled categories across 8 disciplines; leave empty to search all subjects.
`arxivIds`	array	`[]`	Fetch specific papers by arXiv ID (e.g. `2310.06825` or legacy `cond-mat/0011267`). When set, the search fields above are ignored.

Results

Parameter	Type	Default	Description
`maxResults`	integer	`50`	Maximum papers to return. Set to `0` to fetch all matches, with a safety cap of 50,000 so very broad searches don't run indefinitely. Ignored when fetching by ID.
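`sortBy`	select	`Relevance`	Order results by Relevance, Submission date, or Last updated date. The JSON examples in Getting Started pass the date options as `submittedDate` and `lastUpdatedDate`.
`sortOrder`	select	`Newest first (descending)`	Newest first (descending) or Oldest first (ascending); the JSON examples in Getting Started pass `descending`. Most useful when sorting by date.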
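To estimate how many rows a run will return, the maxResults rule can be sketched as below. This is an illustration only: effective_limit is a hypothetical helper, total_matches stands for however many papers match the query, and the sketch assumes the 50,000 ceiling also applies to non-zero values above it, which the table does not state explicitly.

```python
SAFETY_CAP = 50_000  # ceiling quoted in the Results table


def effective_limit(max_results: int, total_matches: int) -> int:
    """Rows a search run would return under the documented maxResults rule."""
    if max_results < 0 or total_matches < 0:
        raise ValueError("max_results and total_matches must be non-negative")
    # 0 means "fetch all matches", still bounded by the safety cap.
    requested = SAFETY_CAP if max_results == 0 else min(max_results, SAFETY_CAP)
    # A run can never return more papers than actually match the query.
    return min(requested, total_matches)
```

For example, a query with 120,000 matches and maxResults set to 0 would stop at 50,000 rows, while the default of 50 against 12 matches would return 12.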

Output

Each paper is one row in the dataset, with authors nested as a list of records. Here is a representative result:

{
    "arxivId": "1706.03762",
    "version": 7,
    "title": "Attention Is All You Need",
    "authors": [
        { "name": "Ashish Vaswani", "affiliation": "Google Brain" },
        { "name": "Noam Shazeer", "affiliation": "Google Brain" },
        { "name": "Niki Parmar", "affiliation": "Google Research" }
    ],
    "abstract": "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder...",
    "primaryCategory": "cs.CL",
    "categories": ["cs.CL", "cs.LG"],
    "publishedDate": "2017-06-12T17:57:34Z",
    "updatedDate": "2023-08-02T00:41:18Z",
    "doi": "10.48550/arXiv.1706.03762",
    "journalRef": "Advances in Neural Information Processing Systems 30 (2017)",
    "comments": "15 pages, 5 figures",
    "pdfUrl": "https://arxiv.org/pdf/1706.03762v7",
    "absUrl": "https://arxiv.org/abs/1706.03762v7"
}

Core Fields

Field	Type	Description
`title`	string	Paper title, whitespace-normalized
`authors`	object[]	One record per author: `{ name, affiliation }` (affiliation included when the paper lists it)
`abstract`	string	Full abstract text
`primaryCategory`	string	Primary arXiv subject code (e.g. `cs.CL`)
`categories`	string[]	All subject codes on the paper
`comments`	string\|null	Author comments (e.g. "15 pages, 5 figures")
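For institution analysis, the authors records can be tallied once the dataset is exported as JSON. A minimal sketch, assuming rows shaped like the representative result above; count_affiliations is a hypothetical helper, and it treats a missing or null affiliation the same way because the table only says the field is included when the paper lists one.

```python
from collections import Counter


def count_affiliations(rows: list[dict]) -> Counter:
    """Count papers per institution across exported dataset rows."""
    counts: Counter = Counter()
    for row in rows:
        # Use a set so an institution is counted once per paper,
        # even when several co-authors share it.
        institutions = {
            author["affiliation"].strip()
            for author in row.get("authors", [])
            if author.get("affiliation")
        }
        counts.update(institutions)
    return counts
```

Calling count_affiliations(rows).most_common(10) then lists the ten institutions that appear on the most papers in the export.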

Identifiers & Cross-References

Field	Type	Description
`arxivId`	string	arXiv identifier without version (e.g. `1706.03762`)
`version`	integer	Version number (`v7` → `7`)
`doi`	string\|null	DOI when the authors registered one
`journalRef`	string\|null	Journal reference / citation when the paper is published

Dates & Links

Field	Type	Description
`publishedDate`	string	First-submitted timestamp (ISO 8601)
`updatedDate`	string	Last-updated timestamp (ISO 8601)
`pdfUrl`	string	Direct link to the full-text PDF
`absUrl`	string	Link to the arXiv abstract landing page
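The two timestamps support simple trend work such as counting papers per year or measuring how long a paper kept being revised. A minimal sketch, assuming every timestamp uses the exact form shown in the representative result (UTC with a trailing Z and whole seconds); the function names are hypothetical.

```python
from datetime import datetime, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # e.g. 2017-06-12T17:57:34Z


def parse_timestamp(value: str) -> datetime:
    """Parse a publishedDate or updatedDate string into an aware UTC datetime."""
    return datetime.strptime(value, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)


def revision_span_days(row: dict) -> int:
    """Whole days between first submission and the latest update."""
    published = parse_timestamp(row["publishedDate"])
    updated = parse_timestamp(row["updatedDate"])
    return (updated - published).days


def papers_per_year(rows: list[dict]) -> dict:
    """Count rows by the year of first submission, sorted by year."""
    counts: dict = {}
    for row in rows:
        year = parse_timestamp(row["publishedDate"]).year
        counts[year] = counts.get(year, 0) + 1
    return dict(sorted(counts.items()))
```

Grouping on publishedDate rather than updatedDate keeps a paper in the year it first appeared, which is usually what a publication-volume chart needs.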

Tips for Best Results

Use field prefixes for precision. In searchQuery you can write ti:transformer to match only titles or cat:cs.CL to scope a subject — power users can build advanced boolean queries like ti:transformer AND abs:translation in a single field.
Narrow by category to cut noise. A broad term like "networks" spans biology, physics, and computer science. Selecting one or two subject categories sharpens results dramatically and lowers your result count.
Sort by submission date for the freshest preprints. Set sortBy to Submission date with Newest first to surface the very latest work in a field — ideal for daily monitoring and trend tracking.
Fetch by ID when you know exactly what you want. Pasting arXiv IDs is the fastest, most precise path — it skips search entirely and returns those exact papers, legacy slash-style IDs included.
Start small, then scale. Run with maxResults of 25–50 to confirm the data matches your needs, then raise the cap or set it to 0 to sweep a whole topic.
Keep DOI and journal reference for cross-linking. When present, these fields let you match a preprint to its peer-reviewed version — invaluable for bibliographic enrichment and citation work.
Combine title, author, and abstract for laser-focused queries. The three field inputs are AND-joined, so a name in author plus a phrase in abstract returns only papers that satisfy both.
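The prefix syntax from the first tip can also be assembled programmatically when queries are generated from a list of topics. A minimal sketch: build_search_query is a hypothetical helper that mirrors the AND-joining described above. Quoting multi-word phrases and OR-joining several categories inside parentheses follow the arXiv API's query syntax; whether the actor passes them through unchanged is an assumption.

```python
def quote_term(term: str) -> str:
    """Wrap multi-word terms in double quotes so they stay one phrase."""
    term = term.strip()
    return f'"{term}"' if " " in term else term


def build_search_query(title=None, author=None, abstract=None, categories=()) -> str:
    """Compose a prefixed boolean query string for the searchQuery field."""
    clauses = []
    if title:
        clauses.append(f"ti:{quote_term(title)}")
    if author:
        clauses.append(f"au:{quote_term(author)}")
    if abstract:
        clauses.append(f"abs:{quote_term(abstract)}")
    category_terms = [f"cat:{code}" for code in categories]
    if len(category_terms) == 1:
        clauses.append(category_terms[0])
    elif category_terms:
        # Assumption: a paper in any one of the selected categories should match.
        clauses.append("(" + " OR ".join(category_terms) + ")")
    # Field clauses are AND-joined, matching the behavior described in the tips.
    return " AND ".join(clauses)
```

With title="transformer" and abstract="translation" this produces ti:transformer AND abs:translation, the same string shown in the first tip.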

Pricing

From $2.50 per 1,000 results (the Gold rate; $3.00 per 1,000 with no discount). This is a flat per-result rate that undercuts comparable arXiv extractors, and the only additional charge is the per-run start fee noted below the table. Bronze, Silver, and Gold subscribers pay progressively less; the table below shows total cost at each discount tier.

Results	No discount	Bronze	Silver	Gold
100	$0.30	$0.28	$0.265	$0.25
1,000	$3.00	$2.80	$2.65	$2.50
10,000	$30.00	$28.00	$26.50	$25.00
100,000	$300.00	$280.00	$265.00	$250.00

A "result" is any paper row in the output dataset. No compute or time-based charges — you pay per result, plus a small fixed per-run start fee.
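The table reduces to one multiplication: results times the tier's rate per 1,000. A minimal sketch for budgeting a run, using the rates from the table; estimate_cost is a hypothetical helper, and it leaves out the per-run start fee because the amount is not given here. Decimal is used so the cents come out exact.

```python
from decimal import Decimal

# Price per 1,000 results at each discount tier, taken from the pricing table.
RATE_PER_1000 = {
    "none": Decimal("3.00"),
    "bronze": Decimal("2.80"),
    "silver": Decimal("2.65"),
    "gold": Decimal("2.50"),
}


def estimate_cost(results: int, tier: str = "none") -> Decimal:
    """Per-result cost in USD for a run, excluding the per-run start fee."""
    if results < 0:
        raise ValueError("results must be non-negative")
    return RATE_PER_1000[tier] * results / 1000
```

Combined with a maxResults value, this gives an upper bound on the per-result charge before a run is started.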

Integrations

Export data in JSON, CSV, Excel, XML, or RSS. Connect to 1,500+ apps via:

Zapier / Make / n8n — Workflow automation
Google Sheets — Direct spreadsheet export
Slack / Email — Notifications on new results
Webhooks — Trigger custom workflows on run completion
Apify API — Full programmatic access
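For programmatic access, a run can be started over HTTP with any of the input objects from Getting Started. The sketch below uses only the Python standard library and Apify's synchronous run endpoint, which runs an actor and returns its dataset items in one response. The actor ID and token are placeholders you must supply (the ID uses the username~actor-name form), build_run_request and fetch_rows are hypothetical helper names, and the synchronous endpoint is best suited to small runs.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Prepare a POST that runs an actor and returns its dataset items as JSON."""
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items?format=json"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def fetch_rows(request: urllib.request.Request, timeout: float = 300.0) -> list:
    """Send the prepared request and decode the returned rows (network call)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A typical call would be fetch_rows(build_run_request(actor_id, token, {"searchQuery": "large language models", "maxResults": 50})), with actor_id and token filled in from your own account.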

Legal & Ethical Use

arXiv content is openly accessible, and this actor is designed for legitimate academic research, literature review, bibliometrics, and dataset building. Each paper on arXiv is distributed under its own license chosen by the authors — respect those individual licenses when reusing abstracts or full text. Users are responsible for complying with applicable laws and arXiv's terms of use, including making reasonable-rate requests. Do not use extracted data for spam, harassment, or any illegal purpose.
