arXiv Papers Scraper with AI Topic Tags

Search arXiv.org for academic papers by keyword, author, or category. Get clean structured data with optional AI topic tagging via Claude. Perfect for literature reviews, research monitoring, and academic datasets.

Developer

Andrei

Maintained by Community

What this actor does

arXiv hosts more than 2 million papers, but its search interface is clunky and offers no direct way to export results. This actor solves that:

  • Full arXiv search syntax — search by keyword, title, abstract, authors, or category
  • Category filter — restrict to specific fields (cs.AI, math.PR, physics.bio-ph, etc.)
  • AI topic tagging — Claude reads each abstract and assigns 3-5 relevant tags (optional, BYOK)
  • Citation extraction — pulls cited references from paper metadata when available
  • Retry logic — handles arXiv API rate limits and transient errors gracefully

Quick start

Just search for something:

{
  "searchQuery": "transformer attention mechanism",
  "maxResults": 20
}

That's it. The actor will return up to 20 papers matching your query with full metadata.

Input fields

  • searchQuery (required) — Search terms (keyword, author, title, or arXiv ID)
  • category — Filter by arXiv category code (cs.AI, math.ST, etc., leave empty for all)
  • maxResults — Number of papers to fetch (default 50, max 1000)
  • sortBy — Sort by relevance, lastUpdatedDate, or submittedDate (default relevance)
  • enableAiTags — Generate AI topic tags for each paper (default false)
  • anthropicApiKey — Your Anthropic API key (BYOK, required if AI tags enabled)
  • extractCitations — Pull cited references metadata when available (default true)
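To make the input fields more concrete, here is a minimal Python sketch of how they could map onto an arXiv API query URL. The `search_query`, `start`, `max_results`, `sortBy`, and `sortOrder` parameters and the `all:` / `cat:` prefixes are part of the public arXiv API. The function name `build_arxiv_query_url`, the rule for combining `searchQuery` with `category`, and the fixed descending sort order are assumptions made for illustration, not a description of the actor's internals.

```python
from urllib.parse import urlencode

ARXIV_API_URL = "http://export.arxiv.org/api/query"
ALLOWED_SORT = {"relevance", "lastUpdatedDate", "submittedDate"}


def build_arxiv_query_url(search_query, category="", max_results=50,
                          sort_by="relevance", start=0):
    """Translate actor-style input fields into an arXiv API query URL.

    Hypothetical helper: the real actor may build its query differently.
    """
    query = search_query.strip()
    if not query:
        raise ValueError("searchQuery is required")
    if sort_by not in ALLOWED_SORT:
        raise ValueError(f"sortBy must be one of {sorted(ALLOWED_SORT)}")
    if not 1 <= max_results <= 1000:
        raise ValueError("maxResults must be between 1 and 1000")

    # Assumption: a query without a field prefix (ti:, au:, abs:, ...) is
    # searched across all fields, one AND-ed term per word.
    if ":" not in query:
        query = " AND ".join(f"all:{term}" for term in query.split())

    # Assumption: the category filter is AND-ed on with the cat: prefix.
    if category:
        query = f"({query}) AND cat:{category}"

    params = {
        "search_query": query,
        "start": start,
        "max_results": max_results,
        "sortBy": sort_by,
        "sortOrder": "descending",
    }
    return f"{ARXIV_API_URL}?{urlencode(params)}"
```

Calling `build_arxiv_query_url("transformer attention", category="cs.AI", max_results=20)` yields a URL whose decoded `search_query` is `(all:transformer AND all:attention) AND cat:cs.AI`. A query that already uses arXiv field prefixes, such as `ti:diffusion AND au:ho`, is passed through unchanged.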

Output format

Each item in the dataset:

{
  "id": "2412.01234",
  "title": "Attention Is All You Need: A Survey",
  "authors": ["Vaswani A.", "Shazeer N."],
  "abstract": "The dominant sequence transduction models...",
  "publishedDate": "2024-12-01",
  "updatedDate": "2024-12-15",
  "pdfUrl": "https://arxiv.org/pdf/2412.01234.pdf",
  "absUrl": "https://arxiv.org/abs/2412.01234",
  "categories": ["cs.LG", "cs.AI"],
  "primaryCategory": "cs.LG",
  "doi": "10.xxxx/yyyy",
  "comment": "Accepted at NeurIPS 2024",
  "journalRef": null,
  "aiTags": ["transformer architecture", "attention mechanism", "survey paper"],
  "citationCount": 12
}

The aiTags field is present only when enableAiTags is set to true.
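For readers curious how an arXiv Atom entry relates to the fields above, the following sketch parses one entry with Python's standard library. The XML namespaces and element names (`id`, `published`, `summary`, `arxiv:primary_category`, and so on) follow the public arXiv API response format. The helper names `entry_to_item` and `parse_feed`, the hand-written sample feed with placeholder authors, and the exact field mapping are assumptions for illustration and may differ from what the actor does. Author names are kept as the feed returns them, and `aiTags` and `citationCount` are left out because they do not come from the Atom feed.

```python
import re
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}

# Hand-written sample that mimics the shape of an arXiv API response.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:arxiv="http://arxiv.org/schemas/atom">
  <entry>
    <id>http://arxiv.org/abs/2412.01234v2</id>
    <updated>2024-12-15T09:30:00Z</updated>
    <published>2024-12-01T18:00:00Z</published>
    <title>Attention Is All You Need:
      A Survey</title>
    <summary>The dominant sequence transduction models...</summary>
    <author><name>A. Author</name></author>
    <author><name>B. Author</name></author>
    <arxiv:comment>Accepted at NeurIPS 2024</arxiv:comment>
    <link href="http://arxiv.org/abs/2412.01234v2" rel="alternate" type="text/html"/>
    <link title="pdf" href="http://arxiv.org/pdf/2412.01234v2" rel="related" type="application/pdf"/>
    <arxiv:primary_category term="cs.LG"/>
    <category term="cs.LG"/>
    <category term="cs.AI"/>
  </entry>
</feed>"""


def _text(node, path):
    """Return whitespace-normalized text of a child element, or None."""
    child = node.find(path, NS)
    if child is None or not child.text:
        return None
    return " ".join(child.text.split())


def entry_to_item(entry):
    """Map one Atom <entry> to a dict shaped like the dataset items."""
    raw_id = _text(entry, "atom:id") or ""
    # Drop the URL prefix and the trailing version suffix (v1, v2, ...).
    arxiv_id = re.sub(r"v\d+$", "", raw_id.rsplit("/abs/", 1)[-1])
    pdf_url = next(
        (link.get("href") for link in entry.findall("atom:link", NS)
         if link.get("title") == "pdf"),
        None,
    )
    primary = entry.find("arxiv:primary_category", NS)
    return {
        "id": arxiv_id,
        "title": _text(entry, "atom:title"),
        "authors": [_text(a, "atom:name") for a in entry.findall("atom:author", NS)],
        "abstract": _text(entry, "atom:summary"),
        "publishedDate": (_text(entry, "atom:published") or "")[:10] or None,
        "updatedDate": (_text(entry, "atom:updated") or "")[:10] or None,
        "pdfUrl": pdf_url,
        "absUrl": f"https://arxiv.org/abs/{arxiv_id}",
        "categories": [c.get("term") for c in entry.findall("atom:category", NS)],
        "primaryCategory": primary.get("term") if primary is not None else None,
        "doi": _text(entry, "arxiv:doi"),
        "comment": _text(entry, "arxiv:comment"),
        "journalRef": _text(entry, "arxiv:journal_ref"),
    }


def parse_feed(xml_text):
    """Parse a whole Atom feed into a list of item dicts."""
    root = ET.fromstring(xml_text)
    return [entry_to_item(e) for e in root.findall("atom:entry", NS)]
```

Optional elements such as `arxiv:doi` and `arxiv:journal_ref` come back as `None` when the feed omits them, which matches the `null` shown for `journalRef` in the example item.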

Use cases

Literature review — Pull all papers on your research topic from the last 6 months in one query.

Research monitoring — Schedule daily runs to track new arXiv submissions in your field.

Dataset building — Collect abstracts and metadata for training NLP models on academic text.

Trend analysis — Aggregate AI tags across thousands of papers to spot emerging research topics.

Citation tracking — Build citation graphs from extracted references for bibliometric studies.
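As an illustration of the trend analysis use case, the sketch below counts how many papers carry each AI tag in a list of dataset items. It assumes the items have already been loaded as Python dicts in the output format shown earlier; the function name `top_tags` is hypothetical and not part of the actor.

```python
from collections import Counter


def top_tags(items, n=10):
    """Return the n most frequent AI tags as (tag, paper_count) pairs."""
    counts = Counter()
    for item in items:
        # aiTags is missing when AI tagging was disabled for the run.
        # A set ensures each paper counts a tag at most once.
        tags = {tag.strip().lower() for tag in item.get("aiTags") or []}
        counts.update(tags)
    # Sort by descending count, then alphabetically so ties are stable.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:n]
```

Tags are lowercased and stripped before counting so that "Attention Mechanism" and "attention mechanism" land in the same bucket. Running this on exports from scheduled runs and comparing the rankings over time gives a rough view of which topics are growing.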

Technical notes

  • Uses arXiv's official Atom API — fully ToS-compliant, no scraping
  • Automatic retry with exponential backoff for rate limits (arXiv's API terms ask for no more than one request every 3 seconds)
  • AI tagging uses Claude Haiku 4.5 (fast and cheap, ~$0.001 per paper)
  • arXiv metadata, including abstracts, is released under CC0 1.0 (public domain dedication); the full paper texts keep the license chosen by their authors
  • Citation extraction works only for papers with structured reference metadata
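The retry behaviour mentioned in the notes above can be sketched as follows. This is a generic exponential backoff helper, not the actor's actual code: the name `retry_with_backoff`, the default delays, the jitter factor, and the choice to retry on `OSError` (which covers `urllib.error.URLError` and `HTTPError`) are all assumptions picked for illustration.

```python
import random
import time


def retry_with_backoff(fetch, max_attempts=5, base_delay=3.0, max_delay=60.0,
                       jitter=0.1, retry_on=(OSError,), sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the wait after each failure.

    Hypothetical helper. `sleep` is injectable so tests do not have to wait.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fetch()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Waits grow as base_delay * 1, 2, 4, ... and are capped.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # A little random jitter keeps parallel runs from retrying in sync.
            sleep(delay + random.uniform(0, delay * jitter))
```

With the defaults, a request that fails twice and then succeeds waits roughly 3 and then 6 seconds before the third attempt. Errors outside `retry_on` propagate immediately, so bugs in the calling code are not hidden behind retries.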

Pricing

Currently free during early access. Pay-per-paper pricing will be enabled later.

Support

Found a bug or have a feature request? Contact the developer through the actor's page on Apify.