arXiv Paper & Author Scraper

Deprecated. See alternative Actors.

Pricing: Pay per usage


Extract academic papers, abstracts, and author details from arXiv using the official API. Ideal for research monitoring, literature reviews, and building academic datasets.


Rating: 0.0 (0 ratings)

Developer: Automly

Last modified: a month ago

Why use this actor?

Official API reliability — Uses the arXiv export API for stable, structured data without scraping complexity.
Research monitoring — Track new papers in specific fields or by keyword.
Literature reviews — Collect abstracts, authors, and categories for systematic analysis.
Academic lead generation — Build lists of researchers and their affiliations by topic.
RAG & AI pipelines — Feed paper abstracts and metadata into vector databases for semantic search.

Features

Search papers by free-text query or arXiv category codes
Filter by date range (last week, last month, last year, or custom range)
Sort by relevance, submission date, or last updated date
Extract full abstracts and author lists with affiliations
Output authors as separate records for easy analysis
Respects arXiv's polite usage policy with built-in rate limiting

Input

Field	Type	Default	Description
searchQuery	string	—	arXiv search query, e.g. `machine learning` or `cat:cs.AI`
categories	array	—	List of arXiv category codes, e.g. `["cs.AI", "cs.LG"]`
dateRange	string	—	`lastWeek`, `lastMonth`, `lastYear`, or `YYYY-MM-DD TO YYYY-MM-DD`
maxResults	integer	100	Maximum papers to return (1–500)
extractAuthors	boolean	true	Include author records as separate rows
extractAbstract	boolean	true	Include paper abstracts
sortBy	string	relevance	`relevance`, `lastUpdatedDate`, or `submittedDate`
sortOrder	string	descending	`ascending` or `descending`
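The `dateRange` presets and the custom `YYYY-MM-DD TO YYYY-MM-DD` form both have to become a date filter in the arXiv query. A minimal sketch of one way to do that, assuming the arXiv API's `submittedDate:[YYYYMMDDHHMM TO YYYYMMDDHHMM]` range syntax; `date_range_clause` is a hypothetical helper, and treating `lastMonth` and `lastYear` as 30 and 365 days is an assumption, not the actor's documented behavior.

```python
from datetime import date, timedelta

# Assumed lengths of the relative presets (hypothetical; the actor may differ).
PRESET_DAYS = {"lastWeek": 7, "lastMonth": 30, "lastYear": 365}


def date_range_clause(date_range: str, today: date) -> str:
    """Translate a dateRange value into an arXiv submittedDate filter (sketch)."""
    if date_range in PRESET_DAYS:
        start, end = today - timedelta(days=PRESET_DAYS[date_range]), today
    else:
        # Custom form: "YYYY-MM-DD TO YYYY-MM-DD"
        parts = date_range.split(" TO ")
        if len(parts) != 2:
            raise ValueError(f"unsupported dateRange: {date_range!r}")
        start, end = (date.fromisoformat(p.strip()) for p in parts)
    if start > end:
        raise ValueError("start date is after end date")
    # Cover whole days: 00:00 on the start date through 23:59 on the end date.
    return f"submittedDate:[{start:%Y%m%d}0000 TO {end:%Y%m%d}2359]"


print(date_range_clause("2024-01-01 TO 2024-01-31", date.today()))
# submittedDate:[202401010000 TO 202401312359]
```

Passing `today` explicitly keeps the relative presets deterministic, which makes the function easy to test.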

Example input

{
  "searchQuery": "large language models",
  "categories": ["cs.CL", "cs.AI"],
  "dateRange": "lastMonth",
  "maxResults": 50,
  "extractAuthors": true,
  "sortBy": "submittedDate",
  "sortOrder": "descending"
}
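An input like the one above eventually has to become an arXiv API request. The sketch below shows one plausible translation. It assumes the public `http://export.arxiv.org/api/query` endpoint and its `search_query`, `start`, `max_results`, `sortBy`, and `sortOrder` parameters. Three choices are guesses about the actor's behavior: wrapping plain text as an `all:"..."` phrase, OR-ing the categories, and AND-ing the two parts. `build_query_url` is a hypothetical name, and `dateRange` is left out for brevity.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"
SORT_FIELDS = {"relevance", "lastUpdatedDate", "submittedDate"}
SORT_ORDERS = {"ascending", "descending"}


def build_query_url(search_query=None, categories=None, start=0, max_results=100,
                    sort_by="relevance", sort_order="descending"):
    """Build one arXiv API request URL from actor-style input (hypothetical helper)."""
    if sort_by not in SORT_FIELDS or sort_order not in SORT_ORDERS:
        raise ValueError("unsupported sort option")
    clauses = []
    if search_query:
        # Field-prefixed queries such as "cat:cs.AI" pass through unchanged;
        # plain text is searched as a phrase across all fields.
        clauses.append(search_query if ":" in search_query else f'all:"{search_query}"')
    if categories:
        clauses.append("(" + " OR ".join(f"cat:{c}" for c in categories) + ")")
    if not clauses:
        raise ValueError("searchQuery or categories is required")
    params = {
        "search_query": " AND ".join(clauses),
        "start": start,
        "max_results": max_results,
        "sortBy": sort_by,
        "sortOrder": sort_order,
    }
    # Keep ':' readable; spaces become '+', quotes and parentheses are percent-escaped.
    return f"{ARXIV_API}?{urlencode(params, safe=':')}"


url = build_query_url("large language models", ["cs.CL", "cs.AI"],
                      max_results=50, sort_by="submittedDate")
print(url)
```

The actor's `sortBy` and `sortOrder` values already match the arXiv API's own parameter values, so they can be passed through after validation.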

Output

Each output record has a `type` field, either `paper` or `author`, that identifies which kind of record it is.

Paper

Field	Type	Description
type	string	`paper`
arxivId	string	arXiv identifier
url	string	arXiv abstract page URL
pdfUrl	string	Direct PDF URL
title	string	Paper title
abstract	string	Paper abstract
publishedAt	string	ISO 8601 submission date
updatedAt	string	ISO 8601 last update date
authors	array	List of `{name, affiliation}` objects
categories	array	arXiv category codes
primaryCategory	string	Primary arXiv category
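The Paper fields line up closely with the entries in the arXiv API's Atom feed. A minimal sketch of that mapping using the standard library's `xml.etree.ElementTree`, assuming the Atom namespace and the `http://arxiv.org/schemas/atom` extension namespace used by the API. `parse_entry` is a hypothetical helper and the sample entry is made up. Keeping the version suffix (`v1`) in `arxivId` is a guess about the actor's format.

```python
import xml.etree.ElementTree as ET

NS = {"atom": "http://www.w3.org/2005/Atom", "arxiv": "http://arxiv.org/schemas/atom"}


def _text(elem, path):
    """Return the whitespace-normalized text of a child element, or None."""
    node = elem.find(path, NS)
    return " ".join(node.text.split()) if node is not None and node.text else None


def parse_entry(entry):
    """Map one Atom <entry> to a dict with the Paper output fields (sketch)."""
    abs_url = _text(entry, "atom:id")  # e.g. http://arxiv.org/abs/2401.00001v1
    pdf = entry.find("atom:link[@title='pdf']", NS)
    primary = entry.find("arxiv:primary_category", NS)
    return {
        "type": "paper",
        "arxivId": abs_url.rsplit("/abs/", 1)[-1],
        "url": abs_url,
        "pdfUrl": pdf.get("href") if pdf is not None else None,
        "title": _text(entry, "atom:title"),
        "abstract": _text(entry, "atom:summary"),
        "publishedAt": _text(entry, "atom:published"),
        "updatedAt": _text(entry, "atom:updated"),
        "authors": [
            {"name": _text(a, "atom:name"), "affiliation": _text(a, "arxiv:affiliation")}
            for a in entry.findall("atom:author", NS)
        ],
        "categories": [c.get("term") for c in entry.findall("atom:category", NS)],
        "primaryCategory": primary.get("term") if primary is not None else None,
    }


# Made-up feed with one entry, shaped like an arXiv API response.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom" xmlns:arxiv="http://arxiv.org/schemas/atom">
<entry>
<id>http://arxiv.org/abs/2401.00001v1</id>
<updated>2024-01-02T00:00:00Z</updated>
<published>2024-01-01T00:00:00Z</published>
<title>An Example
  Title</title>
<summary>  An example abstract. </summary>
<author><name>Ada Lovelace</name><arxiv:affiliation>Example University</arxiv:affiliation></author>
<author><name>Alan Turing</name></author>
<link href="http://arxiv.org/abs/2401.00001v1" rel="alternate" type="text/html"/>
<link title="pdf" href="http://arxiv.org/pdf/2401.00001v1" rel="related" type="application/pdf"/>
<arxiv:primary_category term="cs.CL" scheme="http://arxiv.org/schemas/atom"/>
<category term="cs.CL" scheme="http://arxiv.org/schemas/atom"/>
<category term="cs.AI" scheme="http://arxiv.org/schemas/atom"/>
</entry>
</feed>"""

feed = ET.fromstring(SAMPLE_FEED)
papers = [parse_entry(e) for e in feed.findall("atom:entry", NS)]
print(papers[0]["title"], papers[0]["primaryCategory"])  # An Example Title cs.CL
```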

Author

Field	Type	Description
type	string	`author`
arxivId	string	Associated paper identifier
paperTitle	string	Associated paper title
name	string	Author name
affiliation	string	Author affiliation
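With `extractAuthors` enabled, each paper is flattened into one Author record per listed author. A sketch of that flattening, assuming paper records shaped like the Paper table. `author_records` and `names_by_affiliation` are hypothetical helpers. The second one shows how the rows might feed the researcher-list use case, and the paper record is made up.

```python
from collections import defaultdict


def author_records(paper):
    """Expand one paper record into one author row per listed author."""
    return [
        {
            "type": "author",
            "arxivId": paper["arxivId"],
            "paperTitle": paper["title"],
            "name": author["name"],
            # arXiv affiliations are optional, so this may be None.
            "affiliation": author.get("affiliation"),
        }
        for author in paper.get("authors", [])
    ]


def names_by_affiliation(rows):
    """Group author names by affiliation, skipping rows without one."""
    groups = defaultdict(set)
    for row in rows:
        if row["affiliation"]:
            groups[row["affiliation"]].add(row["name"])
    return {aff: sorted(names) for aff, names in groups.items()}


# Made-up paper record for illustration.
paper = {
    "type": "paper",
    "arxivId": "2401.00001v1",
    "title": "An Example Title",
    "authors": [
        {"name": "Ada Lovelace", "affiliation": "Example University"},
        {"name": "Alan Turing", "affiliation": None},
    ],
}
rows = author_records(paper)
print(names_by_affiliation(rows))  # {'Example University': ['Ada Lovelace']}
```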

Limits and caveats

The actor requests results from the arXiv API in pages of up to 100 and paginates automatically.
A 3-second delay is enforced between requests to respect arXiv's polite usage policy.
Only publicly available papers are returned.
Author affiliations are only available when provided by the submitter.
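The two request rules above combine into a simple paging loop: at most 100 results per request and a 3-second pause between requests. A sketch under those assumptions. `fetch_page` stands in for whatever performs one HTTP request and parses the response. Both `fetch_page` and `fetch_all` are hypothetical names, not the actor's code.

```python
import time

PAGE_SIZE = 100      # results per request, as described in the limits above
DELAY_SECONDS = 3.0  # polite pause between consecutive requests


def fetch_all(fetch_page, max_results, page_size=PAGE_SIZE, delay=DELAY_SECONDS,
              sleep=time.sleep):
    """Collect up to max_results entries by paging through the API.

    fetch_page(start, count) is a hypothetical callable that performs one request
    and returns a list of entries (fewer than `count` once results run out).
    """
    results = []
    start = 0
    while len(results) < max_results:
        if start > 0:
            sleep(delay)  # wait before every request after the first
        count = min(page_size, max_results - len(results))
        page = fetch_page(start, count)
        results.extend(page)
        if len(page) < count:
            break  # the query has no more results
        start += len(page)
    return results


# Stand-in fetcher (no network): pretend the query matches 230 papers.
def fake_fetch(start, count, total=230):
    return list(range(start, min(start + count, total)))


pauses = []
papers = fetch_all(fake_fetch, max_results=250, sleep=pauses.append)
print(len(papers), pauses)  # 230 [3.0, 3.0]
```

Injecting `sleep` lets the loop be exercised without actually waiting. A real run would keep the default `time.sleep`.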

Pricing

This actor uses Pay Per Event pricing. You are charged only for successfully extracted data.

Event	Price	Description
Paper scraped	$0.003	Each paper successfully extracted
Author scraped	$0.001	Each author record successfully extracted

Tiered discounts apply based on your Apify subscription level. A small actor-start fee may also apply.
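The per-event prices multiply out directly, which helps when budgeting a run. A sketch of the arithmetic that ignores the tiered discounts and start fee mentioned above. `estimate_cost` is a hypothetical helper, and the four-authors-per-paper average is an assumption for the example. Author events presumably accrue only when `extractAuthors` is enabled. `Decimal` avoids floating-point rounding in currency sums.

```python
from decimal import Decimal

# Per-event prices from the pricing table (USD); discounts and start fees not modeled.
PRICE_PER_PAPER = Decimal("0.003")
PRICE_PER_AUTHOR = Decimal("0.001")


def estimate_cost(papers: int, authors: int) -> Decimal:
    """Rough pre-discount cost of one run, excluding any actor-start fee."""
    return papers * PRICE_PER_PAPER + authors * PRICE_PER_AUTHOR


# 50 papers with an assumed average of 4 authors each (200 author records):
print(estimate_cost(50, 200))  # 0.350
```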

FAQ

Do I need an arXiv account? No. The arXiv API is completely open and requires no authentication.

Can I download the full PDF? The actor does not download PDFs itself. It returns a direct PDF URL in the `pdfUrl` field, which you can download separately.
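A minimal sketch of that separate download step using only the standard library. It assumes records shaped like the Paper output. `download_pdfs` and `pdf_filename` are hypothetical helpers. Reusing the 3-second spacing for PDF requests is an assumption carried over from the API limits, not a documented rule for PDFs.

```python
import os
import time
import urllib.request


def pdf_filename(arxiv_id):
    """Turn an ID such as 2401.00001v1 or hep-th/9901001v1 into a safe file name."""
    return arxiv_id.replace("/", "_") + ".pdf"


def download_pdfs(papers, dest_dir, delay=3.0, opener=urllib.request.urlopen,
                  sleep=time.sleep):
    """Download each paper's pdfUrl into dest_dir, pausing between requests."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for i, paper in enumerate(p for p in papers if p.get("pdfUrl")):
        if i:
            sleep(delay)  # assumed polite spacing between PDF requests
        path = os.path.join(dest_dir, pdf_filename(paper["arxivId"]))
        with opener(paper["pdfUrl"]) as resp, open(path, "wb") as fh:
            fh.write(resp.read())
        saved.append(path)
    return saved


# Example call (performs real HTTP requests):
# download_pdfs(papers, "pdfs")
```

Old-style identifiers contain a slash (for example `hep-th/9901001v1`), so the file name replaces it to avoid creating subdirectories.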

What categories are available? arXiv uses codes like cs.AI (Artificial Intelligence), cs.LG (Machine Learning), cs.CL (Computation and Language), physics.gen-ph, math.ST, etc. See the full list at arxiv.org.

How recent is the data? Data reflects the current arXiv index at the time of the run. New papers appear only after arXiv's next scheduled announcement, so they are not available immediately after submission.

Alternative Actors

arXiv Scraper: Papers, Authors, Categories & Search

perconey/arxiv-scraper

Scrape arxiv.org via the official Atom API: full-text search, search by author, title, or category, paper details by ID, and the latest papers in any category. Returns title, abstract, authors, DOI, and PDF link. No auth, no proxies. Pay only per result item.

Perconey

arXiv Paper Scraper — Abstracts, Authors & Metadata

logiover/arxiv-paper-scraper

Scrape research paper metadata from arXiv.org, the world's largest open-access repository. Search by keyword across computer science, physics, mathematics, and biology. Returns titles, abstracts, authors, categories, PDF links, and DOIs. No API key required.

Logiover

arXiv Paper Scraper — Citations, Authors, ORCID, Analytics

brilliant_gum/arxiv-scraper

Scrape academic papers from arXiv via the official Atom API. Filter by category, date, query, or author. Includes citation data, ORCID IDs from Semantic Scholar, citation network graph, and built-in analytics (authors, categories, timeline). Four output formats. Proxies included.

Yuliia Kulakova

arXiv Search & Paper Scraper

scrapeworks/arxiv-search

Search arXiv and get clean structured JSON for each paper: title, authors, abstract, categories, DOI, PDF link, and dates. Built for research, datasets, and AI pipelines.

Nicolas van Arkens

ArXiv Research Paper Scraper

datapilot/arxiv-research-paper-scraper

arXiv Research Paper Scraper retrieves academic paper metadata from the arXiv API based on a keyword. It extracts titles, abstracts, authors with affiliations, DOI, categories, submission dates, and PDF links. Supports proxy usage and outputs structured JSON results for research and data analysis.

Data Pilot

arXiv Paper Scraper — Search Academic Papers & Abstracts

puskin/arxiv-scraper

Search and retrieve academic papers from arXiv by keyword, author, or category. Extracts titles, authors, abstracts, and download links via the free arXiv API — no authentication needed.

Giovanni Bucci

arXiv Scraper

artificially/arxiv-scraper

Search and extract academic papers from arXiv.org. Get paper titles, authors, abstracts, categories, and PDF links for AI/ML, physics, math, and more.

Artificially

Arxiv Paper Intelligence

viralanalyzer/arxiv-paper-intelligence

Search and extract arXiv papers, abstracts, authors, and citations. Track research trends across any scientific field. AI-powered analysis.

viralanalyzer

Rating: 5.0 (3 ratings)

URL to BibTeX Converter

crawlerbros/url-to-bibtex-converter

Convert any URL (academic papers, articles, books, web pages) to properly formatted BibTeX citations. Automatically extracts metadata from arXiv, PubMed, IEEE, ACM, and general web pages. Supports multiple citation types.

Crawler Bros

Rating: 5.0 (2 ratings)

arXiv Papers Scraper Pro — Research Papers, Authors, Citations

diverse_venture/arxiv-papers-scraper

Search and scrape arXiv research papers. Returns titles, abstracts, authors, categories, DOIs, and PDF download links. Filter by keywords (cat:cs.LG, all:transformer, au:author_name). Up to 500 papers per run. No auth required. Ideal for AI researchers and academic data mining.