arXiv Scraper
Pricing: from $0.70 / 1,000 results
Search and extract academic papers from arXiv.org. Get paper titles, authors, abstracts, categories, and PDF links for AI/ML, physics, math, and more.
Developer: Artificially
arXiv Papers Scraper - Enhanced
Search and extract academic papers from arXiv.org with citation analysis, author profiles, and impact metrics via Semantic Scholar integration.
Features
Core Search
- Full-text Search: Search across all arXiv papers
- Category Filtering: Filter by arXiv category (cs.AI, physics, math, etc.)
- Sorting Options: Sort by relevance, submission date, or update date
- Complete Metadata: Title, authors, abstract, categories, dates
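If you want to reproduce or spot-check a search, the inputs above map naturally onto the public arXiv API query interface (`http://export.arxiv.org/api/query` with its `search_query`, `start`, `max_results`, `sortBy`, and `sortOrder` parameters). The sketch below assumes the actor builds an `all:` phrase query and ANDs it with a `cat:` category filter; the function name `build_arxiv_query_url` is hypothetical, and the actor's actual query construction is not documented here.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"

# sortBy values accepted by the arXiv API: relevance, submission date, update date.
SORT_FIELDS = {"relevance", "submittedDate", "lastUpdatedDate"}


def build_arxiv_query_url(search_query, category=None, max_papers=100,
                          sort_by="submittedDate", start=0):
    """Build an arXiv API URL for a phrase search with an optional category filter."""
    if sort_by not in SORT_FIELDS:
        raise ValueError(f"unsupported sortBy: {sort_by!r}")
    # 'all:' searches every metadata field; the quotes keep the phrase together.
    query = f'all:"{search_query}"'
    if category:
        # 'cat:' restricts results to one arXiv category, e.g. cs.CL.
        query = f"{query} AND cat:{category}"
    params = {
        "search_query": query,
        "start": start,            # offset for paging through results
        "max_results": max_papers,
        "sortBy": sort_by,
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"


print(build_arxiv_query_url("large language models", category="cs.CL", max_papers=50))
```

This only builds the URL. Fetching it returns an Atom feed, which still has to be parsed into titles, authors, and abstracts.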
Citation Analysis (NEW)
- Citation Counts: Total citations from Semantic Scholar
- Influential Citations: Citations that significantly impacted the field
- Citation Velocity: Recent citation momentum
- Citations Per Year: Historical citation distribution
- Highly Influential Flag: Identify breakthrough papers
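A per-year breakdown like `citationsPerYear` can be derived from Semantic Scholar's Graph API, which lists a paper's citing papers at `/graph/v1/paper/{paper_id}/citations` and accepts arXiv IDs with an `ARXIV:` prefix. This sketch assumes that endpoint and its `data[].citingPaper.year` response shape; the helper names are hypothetical, and the actor may compute the field differently. Pagination is left to the caller, who passes in the list of page responses.

```python
from collections import Counter

S2_GRAPH = "https://api.semanticscholar.org/graph/v1"


def citations_url(arxiv_id, offset=0, limit=1000):
    """URL for one page of the papers citing an arXiv paper, requesting only their year."""
    return (f"{S2_GRAPH}/paper/ARXIV:{arxiv_id}/citations"
            f"?fields=year&offset={offset}&limit={limit}")


def citations_per_year(pages):
    """Count citing papers per publication year across paginated /citations responses."""
    counts = Counter()
    for page in pages:
        for item in page.get("data", []):
            year = (item.get("citingPaper") or {}).get("year")
            if year is not None:  # some citing papers have no known year
                counts[str(year)] += 1
    # String keys, sorted by year, to match the citationsPerYear shape in the output sample.
    return dict(sorted(counts.items()))
```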
Author Profiles (NEW)
- h-Index: Author's impact metric
- Total Citations: Lifetime citation count
- Paper Count: Publication volume
- Affiliations: Current institutional affiliations
- Semantic Scholar Links: Direct profile links
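The h-index in the output appears to come straight from Semantic Scholar's author data (the `hIndex` field in the sample output), so the actor presumably reads it rather than computing it. For reference, the h-index is the largest h such that the author has h papers cited at least h times each; the hypothetical helper below computes it from per-paper citation counts.

```python
def h_index(citation_counts):
    """Largest h such that at least h papers have h or more citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break  # counts only decrease from here, so no larger h is possible
    return h


print(h_index([10, 8, 5, 4, 3]))  # 4
```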
Related Content (NEW)
- References: Papers cited by each result
- Related Papers: AI-recommended similar papers
- Venue Information: Publication venue if applicable
- Fields of Study: Semantic Scholar topic classification
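References and related papers map naturally onto two Semantic Scholar endpoints: the Graph API's `/graph/v1/paper/{paper_id}/references`, where each item wraps a `citedPaper`, and the Recommendations API's `/recommendations/v1/papers/forpaper/{paper_id}`. The sketch below assumes those endpoints and field names (`externalIds.ArXiv` supplies the arXiv ID), passes `maxReferences` and `maxRelatedPapers` as the `limit` parameter, and reshapes a references response into records modeled on the actor's sample output. All function names are hypothetical.

```python
S2_API = "https://api.semanticscholar.org"


def references_url(arxiv_id, max_references=10):
    """Graph API URL for the papers cited by one arXiv paper."""
    fields = "title,authors,citationCount,externalIds"
    return (f"{S2_API}/graph/v1/paper/ARXIV:{arxiv_id}/references"
            f"?fields={fields}&limit={max_references}")


def related_papers_url(arxiv_id, max_related=5):
    """Recommendations API URL for papers similar to one arXiv paper."""
    return (f"{S2_API}/recommendations/v1/papers/forpaper/ARXIV:{arxiv_id}"
            f"?fields=title,citationCount,url&limit={max_related}")


def to_reference_records(response):
    """Reshape a /references response into records like the actor's `references` items."""
    records = []
    for item in response.get("data", []):
        paper = item.get("citedPaper") or {}
        external_ids = paper.get("externalIds") or {}
        records.append({
            "title": paper.get("title"),
            "authors": [a.get("name") for a in paper.get("authors") or []],
            "citationCount": paper.get("citationCount"),
            # Not every cited paper is on arXiv, so this may be None.
            "arxivId": external_ids.get("ArXiv"),
        })
    return records
```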
Impact Scoring (NEW)
- Calculated Impact Score: Combined metric considering citations, author h-index, and momentum
- Results sorted by impact: Most influential papers first
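The impact-score formula is not published here. The sketch below is a hypothetical stand-in showing one way to fold the three named signals (total citations, the strongest author h-index, and citation velocity) into a 0 to 100 score: log-scale the heavy-tailed counts, cap each part at 1, and take a weighted sum. The weights and caps are invented for illustration and will not reproduce the actor's `impactScore` values.

```python
import math

# Hypothetical weights and caps; the actor's real formula is not documented.
WEIGHTS = {"citations": 0.5, "h_index": 0.3, "velocity": 0.2}


def _log_scale(value, cap):
    """Map a non-negative count onto [0, 1] on a log scale, saturating at `cap`."""
    if value <= 0:
        return 0.0
    return min(math.log1p(value) / math.log1p(cap), 1.0)


def impact_score(total_citations, max_author_h_index, citation_velocity):
    """Weighted 0-100 score from citations, best author h-index, and velocity."""
    parts = {
        "citations": _log_scale(total_citations, cap=10_000),
        "h_index": min(max_author_h_index / 100, 1.0),
        "velocity": _log_scale(citation_velocity, cap=1_000),
    }
    return round(100 * sum(WEIGHTS[k] * v for k, v in parts.items()), 1)


papers = [
    {"arxivId": "a", "score": impact_score(1250, 45, 125.5)},
    {"arxivId": "b", "score": impact_score(3, 5, 0.5)},
]
papers.sort(key=lambda p: p["score"], reverse=True)  # most influential first
```

Log scaling keeps a single paper with tens of thousands of citations from drowning out every other signal.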
Use Cases
- Build research paper datasets with citation metrics
- Identify high-impact papers in your field
- Find influential authors and their work
- Track citation trends over time
- Literature review with impact analysis
- Research team evaluation
Input
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| searchQuery | string | Yes | - | Search terms |
| category | string | No | - | arXiv category filter (e.g. cs.AI, cs.CL) |
| maxPapers | number | No | 100 | Maximum number of papers to return |
| sortBy | string | No | submittedDate | Sort order: relevance, submission date, or update date |
| includeCitations | boolean | No | true | Fetch citation metrics |
| includeAuthorProfiles | boolean | No | true | Fetch author h-index and stats |
| includeReferences | boolean | No | false | Fetch each paper's bibliography |
| maxReferences | number | No | 10 | Maximum references per paper |
| includeRelatedPapers | boolean | No | false | Fetch similar papers |
| maxRelatedPapers | number | No | 5 | Maximum related papers per result |
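The defaults in the table amount to a merge: any field the caller omits takes its documented default. The hypothetical client-side helper below (`resolve_input` is not part of the actor) makes that explicit, which is handy for logging exactly which settings a run will use. Rejecting unknown field names is a choice of this sketch, not documented actor behavior.

```python
# Defaults copied from the input table; searchQuery and category have none.
DEFAULTS = {
    "maxPapers": 100,
    "sortBy": "submittedDate",
    "includeCitations": True,
    "includeAuthorProfiles": True,
    "includeReferences": False,
    "maxReferences": 10,
    "includeRelatedPapers": False,
    "maxRelatedPapers": 5,
}
OPTIONAL_WITHOUT_DEFAULT = {"category"}


def resolve_input(raw):
    """Merge user input over the documented defaults and check field names."""
    if not raw.get("searchQuery"):
        raise ValueError("searchQuery is required")
    known = set(DEFAULTS) | OPTIONAL_WITHOUT_DEFAULT | {"searchQuery"}
    unknown = set(raw) - known
    if unknown:  # catches typos such as "maxPaper"
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    return {**DEFAULTS, **raw}
```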
Example Input
```json
{
  "searchQuery": "large language models",
  "category": "cs.CL",
  "maxPapers": 50,
  "includeCitations": true,
  "includeAuthorProfiles": true,
  "includeRelatedPapers": true,
  "sortBy": "submittedDate"
}
```
Output
Each paper produces a result with:
```json
{
  "arxivId": "2401.12345",
  "title": "Advances in Large Language Models: A Survey",
  "authors": ["John Smith", "Jane Doe"],
  "authorProfiles": [
    {
      "name": "John Smith",
      "authorId": "12345678",
      "hIndex": 45,
      "citationCount": 15000,
      "paperCount": 120,
      "affiliations": ["Stanford University"],
      "url": "https://www.semanticscholar.org/author/12345678"
    }
  ],
  "abstract": "This paper surveys recent advances...",
  "categories": ["cs.CL", "cs.AI"],
  "categoryDescriptions": ["Computation and Language (NLP)", "Artificial Intelligence"],
  "citations": {
    "totalCitations": 1250,
    "influentialCitations": 89,
    "citationVelocity": 125.5,
    "citationsPerYear": {"2023": 450, "2024": 800},
    "isHighlyInfluential": true
  },
  "references": [
    {
      "title": "Attention Is All You Need",
      "authors": ["Ashish Vaswani"],
      "citationCount": 75000,
      "arxivId": "1706.03762"
    }
  ],
  "relatedPapers": [
    {
      "title": "GPT-4 Technical Report",
      "citationCount": 5000,
      "url": "https://arxiv.org/abs/2303.08774"
    }
  ],
  "impactScore": 85.3,
  "venue": "NeurIPS 2024",
  "fieldsOfStudy": ["Computer Science", "Linguistics"],
  "pdfUrl": "https://arxiv.org/pdf/2401.12345.pdf",
  "arxivUrl": "https://arxiv.org/abs/2401.12345",
  "scrapedAt": "2024-01-20T12:00:00Z"
}
```
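Because every item follows this shape, common follow-up questions can be answered offline once the dataset is downloaded. The hypothetical helper below keeps papers whose `citations.isHighlyInfluential` flag is set, optionally requires at least one author at or above an h-index threshold (from `authorProfiles[].hIndex`), and orders the result by `impactScore`. It assumes the items are already loaded as Python dicts (for example from a JSON export) and treats missing sections as empty, since `citations` or `authorProfiles` could be absent when the corresponding options are disabled.

```python
def highly_influential(items, min_h_index=0):
    """Papers flagged isHighlyInfluential, best impactScore first.

    A paper is kept only if its strongest author profile has hIndex >= min_h_index;
    papers without author profiles count as h-index 0.
    """
    selected = []
    for item in items:
        citations = item.get("citations") or {}
        if not citations.get("isHighlyInfluential"):
            continue
        profiles = item.get("authorProfiles") or []
        best_h = max((p.get("hIndex") or 0 for p in profiles), default=0)
        if best_h >= min_h_index:
            selected.append(item)
    return sorted(selected, key=lambda it: it.get("impactScore") or 0, reverse=True)
```

For example, `highly_influential(items, min_h_index=30)` narrows a large search to flagged papers that have at least one well-established author.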
Cost
This actor uses pay-per-result pricing plus a fixed start fee per run:
| Cost Type | Amount |
|---|---|
| Start fee | $0.05 per run |
| Per paper | $0.001 |
No API key is required: the actor uses the public arXiv and Semantic Scholar APIs.
Example Cost Calculation
- 100 papers: $0.05 + (100 × $0.001) = $0.15
- 1,000 papers: $0.05 + (1,000 × $0.001) = $1.05
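The same arithmetic as a small helper, useful for budgeting larger jobs. It uses only the two rates from the cost table (with `Decimal` to avoid floating-point rounding on currency) and does not model any lower rate implied by the headline "from $0.70 / 1,000 results" price; `estimate_cost` is a hypothetical name.

```python
from decimal import Decimal

START_FEE = Decimal("0.05")   # USD per run, from the cost table
PER_PAPER = Decimal("0.001")  # USD per paper result


def estimate_cost(papers, runs=1):
    """Estimated charge in USD for `runs` runs that return `papers` results in total."""
    return START_FEE * runs + PER_PAPER * papers


print(estimate_cost(100))   # 0.150
print(estimate_cost(1000))  # 1.050
```

Because each run pays the start fee, ten runs of 100 papers (`estimate_cost(1000, runs=10)`) cost $1.50 against $1.05 for one run of 1,000.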
Tips
- Impact sorting: Results are automatically sorted by calculated impact score.
- Highly influential papers: Look for `isHighlyInfluential: true` in the `citations` object to find breakthrough papers.
- Author quality: Check author h-index to identify papers from established researchers.
- Citation velocity: High velocity indicates trending papers.
- Related papers: Enable `includeRelatedPapers` for comprehensive literature discovery.
Rate Limits
- arXiv: 3-second delay between requests (handled automatically)
- Semantic Scholar: 1-second delay (handled automatically)
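The actor applies these delays itself. If you call the same APIs directly (for example, to refresh citation counts for papers already in your dataset), a minimum-interval throttle like the sketch below keeps you at the same pace. `MinIntervalThrottle` is a hypothetical name; the clock and sleep functions are injectable so the timing can be tested without real waiting. It is not thread-safe, so use one instance per API within a single thread.

```python
import time


class MinIntervalThrottle:
    """Enforce a minimum delay between consecutive requests to one API."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Sleep just long enough that min_interval seconds separate requests."""
        if self._last is not None:
            remaining = self._last + self.min_interval - self._clock()
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


# Delays taken from the rate limits listed above.
arxiv_throttle = MinIntervalThrottle(3.0)
semantic_scholar_throttle = MinIntervalThrottle(1.0)
```

Call `arxiv_throttle.wait()` immediately before each arXiv request; the first call returns at once, and later calls sleep only for whatever part of the interval has not already elapsed.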
Support
- Built by: Artificially
- Issues: Report bugs or request features via Apify Console