arXiv Scraper

Pricing

from $0.70 / 1,000 results

Search and extract academic papers from arXiv.org. Get paper titles, authors, abstracts, categories, and PDF links for AI/ML, physics, math, and more.

Rating: 0.0 (0)

Developer: Artificially (Maintained by Community)

Actor stats: 0 bookmarked, 1 total user, 0 monthly active users, last modified 5 days ago

arXiv Papers Scraper - Enhanced

Search and extract academic papers from arXiv.org with citation analysis, author profiles, and impact metrics via Semantic Scholar integration.

Features

  • Full-text Search: Search across all arXiv papers
  • Category Filtering: Filter by arXiv category (cs.AI, physics, math, etc.)
  • Sorting Options: Sort by relevance, submission date, or update date
  • Complete Metadata: Title, authors, abstract, categories, dates
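
The search, category, and sorting features above correspond closely to parameters of the public arXiv API. The sketch below shows one way such a search could be expressed as an arXiv API query URL. It is an illustration only: the function name `build_arxiv_query_url` is hypothetical, and the source does not state how the actor builds its requests internally.

```python
from urllib.parse import urlencode

# Public arXiv API endpoint.
ARXIV_API = "http://export.arxiv.org/api/query"

# Sort keys accepted by the arXiv API.
ARXIV_SORT_KEYS = {"relevance", "lastUpdatedDate", "submittedDate"}


def build_arxiv_query_url(search_query, category=None, max_results=100,
                          sort_by="submittedDate"):
    """Build an arXiv API URL (hypothetical helper, not the actor's code)."""
    if sort_by not in ARXIV_SORT_KEYS:
        raise ValueError(f"sort_by must be one of {sorted(ARXIV_SORT_KEYS)}")
    # "all:" searches every field; "cat:" restricts results to one category.
    query = f'all:"{search_query}"'
    if category:
        query += f" AND cat:{category}"
    params = {
        "search_query": query,
        "start": 0,
        "max_results": max_results,
        "sortBy": sort_by,
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_arxiv_query_url("large language models", category="cs.CL",
                                max_results=50))
```

The URL is only constructed here, not fetched, so the sketch can be run without network access.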

Citation Analysis (NEW)

  • Citation Counts: Total citations from Semantic Scholar
  • Influential Citations: Citations in which the paper significantly influenced the citing work, as classified by Semantic Scholar
  • Citation Velocity: Recent citation momentum
  • Citations Per Year: Historical citation distribution
  • Highly Influential Flag: Identify breakthrough papers

Author Profiles (NEW)

  • h-Index: Author's impact metric (the largest h such that the author has h papers with at least h citations each)
  • Total Citations: Lifetime citation count
  • Paper Count: Publication volume
  • Affiliations: Current institutional affiliations
  • Semantic Scholar Links: Direct profile links
  • References: Papers cited by each result
  • Related Papers: AI-recommended similar papers
  • Venue Information: Publication venue if applicable
  • Fields of Study: Semantic Scholar topic classification

Impact Scoring (NEW)

  • Calculated Impact Score: Combined metric considering citations, author h-index, and momentum
  • Results sorted by impact: Most influential papers first

Use Cases

  • Build research paper datasets with citation metrics
  • Identify high-impact papers in your field
  • Find influential authors and their work
  • Track citation trends over time
  • Literature review with impact analysis
  • Research team evaluation

Input

Field                  Type     Required  Default        Description
searchQuery            string   Yes       -              Search terms
category               string   No        -              arXiv category filter
maxPapers              number   No        100            Maximum papers
sortBy                 string   No        submittedDate  Sort order
includeCitations       boolean  No        true           Fetch citation metrics
includeAuthorProfiles  boolean  No        true           Fetch author h-index and stats
includeReferences      boolean  No        false          Fetch paper bibliography
maxReferences          number   No        10             References per paper
includeRelatedPapers   boolean  No        false          Fetch similar papers
maxRelatedPapers       number   No        5              Related papers per result

Example Input

{
  "searchQuery": "large language models",
  "category": "cs.CL",
  "maxPapers": 50,
  "includeCitations": true,
  "includeAuthorProfiles": true,
  "includeRelatedPapers": true,
  "sortBy": "submittedDate"
}
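
When the input is assembled programmatically, it can help to apply the documented defaults and catch misspelled field names before starting a run. The following is a minimal sketch under that assumption. The helper `build_run_input` is hypothetical and not part of the actor; the default values are copied from the Input table.

```python
# Defaults copied from the Input table; "searchQuery" is required and
# "category" has no default, so neither appears here.
DEFAULTS = {
    "maxPapers": 100,
    "sortBy": "submittedDate",
    "includeCitations": True,
    "includeAuthorProfiles": True,
    "includeReferences": False,
    "maxReferences": 10,
    "includeRelatedPapers": False,
    "maxRelatedPapers": 5,
}


def build_run_input(search_query, **overrides):
    """Return a complete input dict (hypothetical helper)."""
    if not search_query or not search_query.strip():
        raise ValueError("searchQuery is required")
    allowed = set(DEFAULTS) | {"category"}
    unknown = set(overrides) - allowed
    if unknown:
        # Catch typos such as "maxPaper" before a paid run is started.
        raise ValueError(f"Unknown input fields: {sorted(unknown)}")
    return {"searchQuery": search_query, **DEFAULTS, **overrides}


if __name__ == "__main__":
    run_input = build_run_input(
        "large language models",
        category="cs.CL",
        maxPapers=50,
        includeRelatedPapers=True,
    )
    print(run_input)
```

The resulting dictionary has the same shape as the example input and could be supplied as the run input when starting the actor.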

Output

Each paper produces a result with:

{
  "arxivId": "2401.12345",
  "title": "Advances in Large Language Models: A Survey",
  "authors": ["John Smith", "Jane Doe"],
  "authorProfiles": [
    {
      "name": "John Smith",
      "authorId": "12345678",
      "hIndex": 45,
      "citationCount": 15000,
      "paperCount": 120,
      "affiliations": ["Stanford University"],
      "url": "https://www.semanticscholar.org/author/12345678"
    }
  ],
  "abstract": "This paper surveys recent advances...",
  "categories": ["cs.CL", "cs.AI"],
  "categoryDescriptions": ["Computation and Language (NLP)", "Artificial Intelligence"],
  "citations": {
    "totalCitations": 1250,
    "influentialCitations": 89,
    "citationVelocity": 125.5,
    "citationsPerYear": {
      "2023": 450,
      "2024": 800
    },
    "isHighlyInfluential": true
  },
  "references": [
    {
      "title": "Attention Is All You Need",
      "authors": ["Ashish Vaswani"],
      "citationCount": 75000,
      "arxivId": "1706.03762"
    }
  ],
  "relatedPapers": [
    {
      "title": "GPT-4 Technical Report",
      "citationCount": 5000,
      "url": "https://arxiv.org/abs/2303.08774"
    }
  ],
  "impactScore": 85.3,
  "venue": "NeurIPS 2024",
  "fieldsOfStudy": ["Computer Science", "Linguistics"],
  "pdfUrl": "https://arxiv.org/pdf/2401.12345.pdf",
  "arxivUrl": "https://arxiv.org/abs/2401.12345",
  "scrapedAt": "2024-01-20T12:00:00Z"
}
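
Because each result is nested, building a tabular dataset usually means flattening it first. The sketch below reduces one result to a flat row suitable for a CSV file or data frame. The function `flatten_paper` and the derived column `maxAuthorHIndex` are illustrative assumptions. The input field names follow the example output, and fields that are absent (for instance when `includeCitations` is false) are assumed to be missing or null.

```python
def flatten_paper(paper):
    """Flatten one result into a single-level dict (hypothetical helper)."""
    citations = paper.get("citations") or {}
    profiles = paper.get("authorProfiles") or []
    # Collect the h-index values that are actually present.
    h_indexes = [p["hIndex"] for p in profiles if p.get("hIndex") is not None]
    return {
        "arxivId": paper.get("arxivId"),
        "title": paper.get("title"),
        "authors": "; ".join(paper.get("authors") or []),
        "categories": ", ".join(paper.get("categories") or []),
        "totalCitations": citations.get("totalCitations"),
        "influentialCitations": citations.get("influentialCitations"),
        # Derived column (an assumption): highest h-index among the authors.
        "maxAuthorHIndex": max(h_indexes, default=None),
        "impactScore": paper.get("impactScore"),
        "pdfUrl": paper.get("pdfUrl"),
    }


if __name__ == "__main__":
    sample = {
        "arxivId": "2401.12345",
        "title": "Advances in Large Language Models: A Survey",
        "authors": ["John Smith", "Jane Doe"],
        "authorProfiles": [{"name": "John Smith", "hIndex": 45}],
        "categories": ["cs.CL", "cs.AI"],
        "citations": {"totalCitations": 1250, "influentialCitations": 89},
        "impactScore": 85.3,
        "pdfUrl": "https://arxiv.org/pdf/2401.12345.pdf",
    }
    print(flatten_paper(sample))
```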

Cost

This actor uses pay-per-result pricing:

Cost Type   Amount
Start fee   $0.05 per run
Per paper   $0.001

No API key required: the actor uses the public arXiv and Semantic Scholar APIs.

Example Cost Calculation

  • 100 papers: $0.05 + (100 × $0.001) = $0.15
  • 1,000 papers: $0.05 + (1,000 × $0.001) = $1.05
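
The same arithmetic can be written as a small helper for budgeting runs. This is a sketch that assumes the start fee and per-paper price from the Cost table apply exactly as listed. The function names are hypothetical, and `Decimal` is used so that the cent amounts do not pick up floating-point rounding error.

```python
from decimal import Decimal

START_FEE = Decimal("0.05")   # per run, from the Cost table
PER_PAPER = Decimal("0.001")  # per paper, from the Cost table


def estimate_cost(papers):
    """Estimated cost in USD for one run returning `papers` results."""
    if papers < 0:
        raise ValueError("papers must be non-negative")
    return START_FEE + PER_PAPER * papers


def papers_within_budget(budget):
    """Largest number of papers one run can return for `budget` USD."""
    budget = Decimal(str(budget))
    if budget < START_FEE:
        return 0
    return int((budget - START_FEE) // PER_PAPER)


if __name__ == "__main__":
    print(estimate_cost(100))
    print(estimate_cost(1000))
    print(papers_within_budget("1.00"))
```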

Tips

  1. Impact sorting: Results are automatically sorted by calculated impact score

  2. Highly influential papers: Look for isHighlyInfluential: true for breakthrough papers

  3. Author quality: Check author h-index to identify papers from established researchers

  4. Citation velocity: High velocity indicates trending/hot papers

  5. Related papers: Enable includeRelatedPapers for comprehensive literature discovery
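
The tips above can be combined into a simple post-processing filter over the results. The sketch below is one possible approach. The function `shortlist` and its thresholds are hypothetical and would need tuning per field; the field names follow the example output.

```python
def shortlist(papers, min_h_index=0, min_velocity=0.0, influential_only=False):
    """Filter results using the tips above, then sort by impact score.

    Hypothetical helper: the thresholds are illustrative, not recommended values.
    """
    selected = []
    for paper in papers:
        citations = paper.get("citations") or {}
        if influential_only and not citations.get("isHighlyInfluential", False):
            continue
        if (citations.get("citationVelocity") or 0.0) < min_velocity:
            continue
        # Use the highest h-index among the listed author profiles.
        profiles = paper.get("authorProfiles") or []
        top_h = max((p.get("hIndex") or 0 for p in profiles), default=0)
        if top_h < min_h_index:
            continue
        selected.append(paper)
    # Highest impact first; a missing score is treated as 0.
    return sorted(selected, key=lambda p: p.get("impactScore") or 0.0,
                  reverse=True)


if __name__ == "__main__":
    papers = [
        {"title": "A", "impactScore": 40.0,
         "citations": {"citationVelocity": 10.0, "isHighlyInfluential": False},
         "authorProfiles": [{"hIndex": 12}]},
        {"title": "B", "impactScore": 85.3,
         "citations": {"citationVelocity": 125.5, "isHighlyInfluential": True},
         "authorProfiles": [{"hIndex": 45}]},
    ]
    for paper in shortlist(papers, min_h_index=10):
        print(paper["title"], paper["impactScore"])
```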

Rate Limits

  • arXiv: 3-second delay between requests (handled automatically)
  • Semantic Scholar: 1-second delay (handled automatically)
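
For readers who call these APIs directly, a fixed delay between requests can be enforced with a small limiter like the one sketched below. This illustrates the general technique only and is not the actor's actual implementation. The class name `MinIntervalLimiter` is hypothetical, and the clock and sleep functions are injectable so that the behaviour can be checked without real waiting.

```python
import time


class MinIntervalLimiter:
    """Block so that consecutive calls are at least `min_interval` seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


if __name__ == "__main__":
    # Delays taken from the Rate Limits list above.
    arxiv_limiter = MinIntervalLimiter(3.0)
    scholar_limiter = MinIntervalLimiter(1.0)
    for _ in range(2):
        scholar_limiter.wait()  # call before each Semantic Scholar request
        print("request slot at", time.monotonic())
```

Keeping one limiter per API lets the two delays be applied independently, so a slow arXiv request does not hold back Semantic Scholar calls.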

Support

  • Built by: Artificially
  • Issues: Report bugs or request features via Apify Console