Deprecated

Pricing

from $0.70 / 1,000 results

arXiv Scraper

Search and extract academic papers from arXiv.org. Get paper titles, authors, abstracts, categories, and PDF links for AI/ML, physics, math, and more.

Rating: 0.0 (0)
Developer: Artificially
Last modified: 2 months ago

arXiv Papers Scraper - Enhanced

Search and extract academic papers from arXiv.org with citation analysis, author profiles, and impact metrics via Semantic Scholar integration.

Features

Core Search

Full-text Search: Search across all arXiv papers
Category Filtering: Filter by arXiv category (cs.AI, physics, math, etc.)
Sorting Options: Sort by relevance, submission date, or update date
Complete Metadata: Title, authors, abstract, categories, dates

Citation Analysis (NEW)

Citation Counts: Total citations from Semantic Scholar
Influential Citations: Citations that significantly impacted the field
Citation Velocity: Recent citation momentum
Citations Per Year: Historical citation distribution
Highly Influential Flag: Identify breakthrough papers

Author Profiles (NEW)

h-Index: Author's impact metric
Total Citations: Lifetime citation count
Paper Count: Publication volume
Affiliations: Current institutional affiliations
Semantic Scholar Links: Direct profile links

References: Papers cited by each result
Related Papers: AI-recommended similar papers
Venue Information: Publication venue if applicable
Fields of Study: Semantic Scholar topic classification

Impact Scoring (NEW)

Calculated Impact Score: Combined metric considering citations, author h-index, and momentum
Results sorted by impact: Most influential papers first

Use Cases

Build research paper datasets with citation metrics
Identify high-impact papers in your field
Find influential authors and their work
Track citation trends over time
Literature review with impact analysis
Research team evaluation

Input

Field	Type	Required	Default	Description
`searchQuery`	string	Yes	-	Search terms
`category`	string	No	-	arXiv category filter (e.g. `cs.AI`)
`maxPapers`	number	No	`100`	Maximum number of papers to return
`sortBy`	string	No	`submittedDate`	Sort order: relevance, submission date, or update date
`includeCitations`	boolean	No	`true`	Fetch citation metrics
`includeAuthorProfiles`	boolean	No	`true`	Fetch author h-index and stats
`includeReferences`	boolean	No	`false`	Fetch paper bibliography
`maxReferences`	number	No	`10`	References per paper
`includeRelatedPapers`	boolean	No	`false`	Fetch similar papers
`maxRelatedPapers`	number	No	`5`	Related papers per result

Example Input

{
    "searchQuery": "large language models",
    "category": "cs.CL",
    "maxPapers": 50,
    "includeCitations": true,
    "includeAuthorProfiles": true,
    "includeRelatedPapers": true,
    "sortBy": "submittedDate"
}
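As a rough illustration of how the defaults in the Input table combine with the fields you set, here is a minimal Python sketch. The `build_input` helper and the `DEFAULTS` dict are hypothetical names introduced for this example and are not part of the Actor. The default values are copied from the table above.

```python
# Defaults copied from the Input table. DEFAULTS and build_input are
# hypothetical helpers for illustration, not part of the Actor itself.
DEFAULTS = {
    "maxPapers": 100,
    "sortBy": "submittedDate",
    "includeCitations": True,
    "includeAuthorProfiles": True,
    "includeReferences": False,
    "maxReferences": 10,
    "includeRelatedPapers": False,
    "maxRelatedPapers": 5,
}


def build_input(search_query, category=None, **overrides):
    """Return a run-input dict with the table defaults filled in."""
    if not search_query or not search_query.strip():
        raise ValueError("searchQuery is required")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    run_input = {"searchQuery": search_query, **DEFAULTS, **overrides}
    if category is not None:
        run_input["category"] = category  # optional, has no default
    return run_input


# Reproduces the Example Input above, plus the defaults it leaves implicit.
example = build_input(
    "large language models",
    category="cs.CL",
    maxPapers=50,
    includeRelatedPapers=True,
)
```

Fields left out of the call keep their table defaults, so `example["includeReferences"]` stays `False` and `example["maxReferences"]` stays `10`.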

Output

Each paper produces a result with:

{
    "arxivId": "2401.12345",
    "title": "Advances in Large Language Models: A Survey",
    "authors": ["John Smith", "Jane Doe"],
    "authorProfiles": [
        {
            "name": "John Smith",
            "authorId": "12345678",
            "hIndex": 45,
            "citationCount": 15000,
            "paperCount": 120,
            "affiliations": ["Stanford University"],
            "url": "https://www.semanticscholar.org/author/12345678"
        }
    ],
    "abstract": "This paper surveys recent advances...",
    "categories": ["cs.CL", "cs.AI"],
    "categoryDescriptions": ["Computation and Language (NLP)", "Artificial Intelligence"],
    "citations": {
        "totalCitations": 1250,
        "influentialCitations": 89,
        "citationVelocity": 125.5,
        "citationsPerYear": {
            "2023": 450,
            "2024": 800
        },
        "isHighlyInfluential": true
    },
    "references": [
        {
            "title": "Attention Is All You Need",
            "authors": ["Ashish Vaswani"],
            "citationCount": 75000,
            "arxivId": "1706.03762"
        }
    ],
    "relatedPapers": [
        {
            "title": "GPT-4 Technical Report",
            "citationCount": 5000,
            "url": "https://arxiv.org/abs/2303.08774"
        }
    ],
    "impactScore": 85.3,
    "venue": "NeurIPS 2024",
    "fieldsOfStudy": ["Computer Science", "Linguistics"],
    "pdfUrl": "https://arxiv.org/pdf/2401.12345.pdf",
    "arxivUrl": "https://arxiv.org/abs/2401.12345",
    "scrapedAt": "2024-01-20T12:00:00Z"
}
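Once results are downloaded as a list of records shaped like the one above, they can be filtered and ranked locally. The sketch below is one possible approach. The helper names are hypothetical, and it assumes (this is not confirmed by the Actor's documentation) that `citations` and `authorProfiles` may be missing when the corresponding `include*` options are turned off.

```python
def citation_count(paper):
    """Total citations, or 0 when the citations block is absent."""
    return (paper.get("citations") or {}).get("totalCitations", 0)


def top_papers(items, min_citations=0):
    """Keep papers at or above a citation threshold, highest impactScore first."""
    kept = [p for p in items if citation_count(p) >= min_citations]
    return sorted(kept, key=lambda p: p.get("impactScore", 0.0), reverse=True)


def established_authors(paper, min_h_index=40):
    """Names of authors whose h-index meets the (arbitrary) threshold."""
    profiles = paper.get("authorProfiles") or []
    return [a["name"] for a in profiles if a.get("hIndex", 0) >= min_h_index]


# Trimmed copy of the sample record above; only the fields used here are kept.
sample = {
    "arxivId": "2401.12345",
    "title": "Advances in Large Language Models: A Survey",
    "authors": ["John Smith", "Jane Doe"],
    "authorProfiles": [{"name": "John Smith", "hIndex": 45}],
    "citations": {"totalCitations": 1250, "isHighlyInfluential": True},
    "impactScore": 85.3,
}

# Hypothetical placeholder record with no citation or profile data.
bare = {"arxivId": "0000.00000", "title": "Placeholder", "authors": []}

ranked = top_papers([bare, sample])
well_cited = top_papers([bare, sample], min_citations=1000)
```

Here `ranked` puts the sample record first because of its `impactScore`, and `well_cited` contains only the sample record.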

Cost

This actor uses pay-per-result pricing:

Cost Type	Amount
Start fee	$0.05 per run
Per paper	$0.001

No API key required: the Actor uses the public arXiv and Semantic Scholar APIs.

Example Cost Calculation

100 papers: $0.05 + (100 × $0.001) = $0.15
1,000 papers: $0.05 + (1,000 × $0.001) = $1.05
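The arithmetic above generalizes to any run size. A minimal sketch, using the start fee and per-paper rate from the Cost table (the `estimate_cost` function name is made up for this example, and `Decimal` is used only to avoid floating-point rounding noise):

```python
from decimal import Decimal

START_FEE = Decimal("0.05")   # per run, from the Cost table
PER_PAPER = Decimal("0.001")  # per paper, from the Cost table


def estimate_cost(papers: int) -> Decimal:
    """Estimated cost in USD of one run that returns `papers` results."""
    if papers < 0:
        raise ValueError("papers must be non-negative")
    return START_FEE + PER_PAPER * papers


print(estimate_cost(100))   # 0.150
print(estimate_cost(1000))  # 1.050
```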

Tips

Impact sorting: Results are automatically sorted by calculated impact score
Highly influential papers: Look for `isHighlyInfluential: true` for breakthrough papers
Author quality: Check author h-index to identify papers from established researchers
Citation velocity: High velocity indicates trending/hot papers
Related papers: Enable `includeRelatedPapers` for comprehensive literature discovery

Rate Limits

arXiv: 3-second delay between requests (handled automatically)
Semantic Scholar: 1-second delay (handled automatically)
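The Actor applies these delays itself, so nothing is needed when running it. For readers who query the two APIs directly, the same spacing can be approximated with a small throttle. This is a sketch of one way to do it, not the Actor's implementation, and the `MinDelay` class is a hypothetical helper.

```python
import time


class MinDelay:
    """Enforce a minimum gap between successive calls to wait()."""

    def __init__(self, seconds, clock=time.monotonic, sleep=time.sleep):
        self.seconds = seconds
        self._clock = clock    # injectable so the class can be tested
        self._sleep = sleep
        self._last = None      # time of the previous request, if any

    def wait(self):
        """Block until at least `seconds` have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.seconds - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Delays taken from the Rate Limits list above.
arxiv_gap = MinDelay(3.0)
scholar_gap = MinDelay(1.0)
```

Call `arxiv_gap.wait()` immediately before each arXiv request and `scholar_gap.wait()` before each Semantic Scholar request. The first call returns at once, and later calls sleep only for whatever part of the gap has not already elapsed.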

Support

Built by: Artificially
Issues: Report bugs or request features via Apify Console

arXiv Search Scraper 📚

easyapi/arxiv-search-scraper

Extract comprehensive research paper data from arXiv search results. Get detailed metadata including titles, authors, abstracts, categories and more. Perfect for academic research monitoring, trend analysis and building paper databases. 🎓📚

EasyApi

5.0

(1)

arXiv Scraper

parseforge/arxiv-scraper

Comprehensive arXiv scraper for extracting scholarly article data across physics, math, CS, biology, finance, statistics, engineering, and economics. Automates access to arXiv’s large preprint archive, providing structured metadata for researchers, academics, and data scientists.

ParseForge

5.0

(1)

ArXiv Paper Scraper

nexgendata/arxiv-scraper

Extract research papers, abstracts, authors, and citations from arXiv.org. Perfect for academic research monitoring, literature reviews, and scientific trend analysis.

Stephan Corbeil

Arxiv Paper Scraper

technicaldost/arxiv-paper-scraper

Technical Dost Solutions

ArXiv Scraper

automation-lab/arxiv-scraper

Scrape ArXiv research papers — titles, authors, abstracts, subjects, submission dates, and PDF links.

Stas Persiianenko

arXiv Paper Scraper

cloud9_ai/arxiv-paper-scraper

Scrape academic papers from arXiv.org. Search by keyword, browse categories, or get latest papers. Extract titles, abstracts, authors, PDF links, and citation data via arXiv API.

cloud9

arXiv Pro Scraper - API & Full Text

exuberant_promotion/arxiv-pro-scraper

A professional, low-cost arXiv scraper that uses the official API to find papers, then downloads, cleans, and chunks the full PDF text—creating AI-ready datasets in one click.

Lukas

ArXiv Academic Paper Scraper

fortuitous_pirate/arxiv-scraper

Scrape academic papers from ArXiv. Extract titles, authors, abstracts, categories, and PDF links. Essential for research and literature reviews.

Fortuitous Pirate

Arxiv Paper Intelligence

viralanalyzer/arxiv-paper-intelligence

Search and extract ArXiv papers, abstracts, authors, and citations. Track research trends across any scientific field. AI-powered analysis.

viralanalyzer

5.0

(1)

Article Content Extractor 📄

easyapi/article-content-extractor

Extract clean article content, metadata and structured information from any web page. Supports multiple URLs and returns well-formatted JSON with title, description, content, author, publish date and more. 🔍📄

EasyApi

5.0

(1)