arXiv Scraper
Pricing
Pay per event
Comprehensive arXiv scraper for extracting scholarly article data across physics, math, CS, biology, finance, statistics, engineering, and economics. Automates access to arXiv's large preprint archive, providing structured metadata for researchers, academics, and data scientists.
Developer
ParseForge
ArXiv Citations Network Scraper
Collect citation networks from academic papers on arXiv without complex setup. The scraper returns paper metadata, full author lists, and citation relationships, with no authentication needed. Researchers and analysts use it to discover research trends, map paper networks, and audit citation impact across their datasets.
The ArXiv Citations Network Scraper collects metadata for academic papers, their citations, and their references from arXiv and Semantic Scholar, returning up to 22 data fields per paper.
What Does It Do
- Title - Organize papers by publication title for quick reference in research summaries
- Citation Count - Track how many times a paper was cited to measure research impact
- Authors - Review all contributors on each paper to identify collaborators and subject experts
- Publication Date - Sort papers chronologically to spot emerging topics in your research area
- Abstract - Read paper summaries to assess relevance before deep dives
- Depth - Identify how far into the citation network each paper sits for analysis layering
- PDF URL - Download papers directly for offline reading and note-taking
Input
- ArXiv Paper IDs - Comma-separated list of arXiv IDs (e.g., 1706.03762, 2301.07041), used instead of startUrl
- Start URL - Single arXiv paper URL (e.g., https://arxiv.org/abs/1706.03762) from which the paper ID is parsed
- Max Items - Maximum number of papers to collect. Free users are capped at 100; paid users can request up to 1,000,000
- Citation Graph Depth - Number of levels to traverse in the citation network: 1 = direct citations only, 2 = citations of citations. Higher values increase run time
- Include References - Fetch papers cited by your seed papers (outgoing references)
- Include Citations - Fetch papers that cite your seed papers (incoming citations)
- Max Concurrency - Number of parallel requests to Semantic Scholar. Keep it at 1-2 to avoid rate limits
- Request Delay (ms) - Milliseconds to pause between requests. Increase it if you are rate-limited
Example JSON:
{"arxivIds": ["1706.03762"],"maxItems": 100,"maxDepth": 2,"includeReferences": true,"includeCitations": true,"maxConcurrency": 1,"requestDelayMs": 1000}
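The input above can also be assembled programmatically. The following is a minimal sketch, assuming new-style arXiv IDs (such as 1706.03762) and assuming that a version suffix like `v2` can be dropped. The helper names `extract_arxiv_id` and `build_input` are hypothetical and not part of the Actor; only the JSON keys come from the example above. Old-style IDs (such as `hep-th/9901001`) are not handled.

```python
import re
from typing import Iterable, Optional

# New-style arXiv IDs: four digits, a dot, then four or five digits,
# optionally followed by a version suffix such as "v2".
_ARXIV_ID = re.compile(r"(?<!\d)(\d{4}\.\d{4,5})(?!\d)(v\d+)?")


def extract_arxiv_id(url_or_id: str) -> Optional[str]:
    """Return the bare arXiv ID found in a URL or ID string, or None."""
    match = _ARXIV_ID.search(url_or_id)
    return match.group(1) if match else None


def build_input(seeds: Iterable[str], max_items: int = 100, max_depth: int = 1) -> dict:
    """Build an input dict shaped like the example JSON (hypothetical helper)."""
    ids = []
    for seed in seeds:
        arxiv_id = extract_arxiv_id(seed)
        # Skip unparseable seeds and keep each ID only once, in input order.
        if arxiv_id and arxiv_id not in ids:
            ids.append(arxiv_id)
    return {
        "arxivIds": ids,
        "maxItems": max_items,
        "maxDepth": max_depth,
        "includeReferences": True,
        "includeCitations": True,
        "maxConcurrency": 1,      # conservative value suggested in the input notes
        "requestDelayMs": 1000,
    }


config = build_input(["https://arxiv.org/abs/1706.03762", "2301.07041v2"], max_depth=2)
```

The resulting `config` dict can be serialized with `json.dumps` and pasted into the input editor.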
Output
Each paper record includes up to 22 data fields. Download as JSON, CSV, or Excel.
| Paper ID | ArXiv ID | Title |
|---|---|---|
| Abstract | Venue | Publication Year |
| Citation Count | Reference Count | DOI |
| PDF URL | Authors | Depth |
| Relation Type | Source ArXiv ID | Citations Array |
| References Array | Scraped At | Error Status |
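Once exported as JSON, the records can be reduced to a directed edge list for network analysis. The sketch below is illustrative only: the key names `arxivId`, `sourceArxivId`, and `relationType`, and the relation values `"citation"` and `"reference"`, are assumptions inferred from the field names in the table, so check them against a real export before relying on them. The sample records are made up for the example and are not scraper output.

```python
from typing import Iterable, List, Tuple


def to_edges(records: Iterable[dict]) -> List[Tuple[str, str]]:
    """Convert records into sorted, de-duplicated (citing, cited) ID pairs."""
    edges = set()
    for rec in records:
        paper = rec.get("arxivId")          # assumed key name
        source = rec.get("sourceArxivId")   # assumed key name
        relation = rec.get("relationType")  # assumed key name
        if not paper or not source:
            continue  # seed papers, or records without an arXiv ID
        if relation == "citation":
            edges.add((paper, source))      # the paper cites the seed
        elif relation == "reference":
            edges.add((source, paper))      # the seed cites the paper
    return sorted(edges)


# Illustrative sample records (hypothetical shape).
records = [
    {"arxivId": "1706.03762", "depth": 0},
    {"arxivId": "1810.04805", "sourceArxivId": "1706.03762",
     "relationType": "citation", "depth": 1},
    {"arxivId": "1409.0473", "sourceArxivId": "1706.03762",
     "relationType": "reference", "depth": 1},
]
edges = to_edges(records)
```

Each pair reads "first ID cites second ID", which is the orientation most graph libraries expect for a citation graph.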
Why Choose the ArXiv Scraper?
| Feature | ArXiv Scraper | Similar Tools |
|---|---|---|
| Citation and reference network traversal | ✔️ | ❌ |
| Configurable depth for multi-level citation maps | ✔️ | ❌ |
| Zero authentication required | ✔️ | Partial |
| Access to Semantic Scholar metadata | ✔️ | ❌ |
| Automatic PDF URL generation | ✔️ | ❌ |
| Full author name and count data | ✔️ | Partial |
| Rate limit handling with auto-retry | ✔️ | Partial |
| Export to JSON, CSV, Excel | ✔️ | ✔️ |
| Configurable request delays | ✔️ | ❌ |
| Duplicate paper deduplication | ✔️ | ❌ |
How to Use
No technical skills required. Follow these simple steps:
- Sign Up: Create a free account with $5 credit
- Find the Tool: Search for "ArXiv Citations Network Scraper" in the Apify Store and configure your input
- Run It: Click "Start" and watch your results appear
That's it. No coding, no setup, no complicated configuration. Now you can export your data in CSV, Excel, or JSON format.
Business Use Cases
- Academic Researchers - Map citation networks across papers in your field to identify foundational works and emerging research clusters
- Literature Review Analysts - Audit citation counts and publication venues to prioritize papers for systematic reviews
- Data Scientists - Collect metadata on thousands of papers to train models for citation prediction and research trend detection
FAQ
How does it work? The scraper sends queries to Semantic Scholar using arXiv IDs, collects paper metadata and citation relationships, then traverses the citation graph to the depth you specify.
How accurate is the citation data? Semantic Scholar covers millions of academic papers, but arXiv papers may have incomplete citation records if they are very new or less frequently cited. The scraper returns all available data.
Can I schedule runs? Yes, you can schedule tasks on Apify to run hourly, daily, or on any custom interval. Use the Apify platform scheduler for recurring research updates.
Is scraping arXiv and Semantic Scholar legal? arXiv and Semantic Scholar publish metadata openly and encourage research use. However, you are responsible for complying with their terms of service and respecting rate limits.
Will arXiv or Semantic Scholar block me? The scraper respects rate limits and includes automatic delays. Keep concurrency low (1-2) and delays at 1000 ms or higher to avoid blocking.
How long does a run take? A single paper with citations takes 5-15 seconds depending on depth and network size. Collecting 100 papers at depth 2 typically takes 1-2 minutes.
Are there any limits? Free users can collect up to 100 results per run. Paid users can collect up to 1,000,000 results per run.
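To make the traversal described under "How does it work?" more concrete, here is a minimal breadth-first sketch with a depth limit, an item cap, and deduplication. It is not the Actor's actual implementation. The `neighbors` callable is a hypothetical stand-in for a metadata lookup, and the convention that seed papers sit at depth 0 is an assumption.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple


def traverse(
    seed_ids: Iterable[str],
    neighbors: Callable[[str], Iterable[Tuple[str, str]]],
    max_depth: int = 1,
    max_items: int = 100,
) -> List[dict]:
    """Breadth-first walk of a citation graph.

    `neighbors` maps a paper ID to (relation_type, neighbor_id) pairs.
    It is supplied by the caller; in a real run it would wrap an API lookup.
    """
    seeds = list(dict.fromkeys(seed_ids))       # de-duplicate, keep order
    seen = set(seeds)
    queue = deque((pid, 0, None, None) for pid in seeds)
    results = []
    while queue and len(results) < max_items:   # stop once the item cap is hit
        paper_id, depth, relation, source = queue.popleft()
        results.append({"id": paper_id, "depth": depth,
                        "relationType": relation, "source": source})
        if depth >= max_depth:
            continue                            # do not expand beyond max_depth
        for rel, nxt in neighbors(paper_id):
            if nxt not in seen:                 # each paper is collected once
                seen.add(nxt)
                queue.append((nxt, depth + 1, rel, paper_id))
    return results


# Toy in-memory graph standing in for real lookups (made-up IDs).
graph = {
    "A": [("citation", "B"), ("reference", "C")],
    "B": [("citation", "D"), ("reference", "A")],
    "C": [],
    "D": [("citation", "E")],
}
papers = traverse(["A"], lambda pid: graph.get(pid, []), max_depth=2)
```

Because the number of papers reached can grow quickly with each extra level, keeping the depth low and the item cap tight is the simplest way to bound run time.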
Integrate ArXiv Citations Network Scraper with any app
- Make - Automate workflows
- Zapier - Connect 5000+ apps
- GitHub - Version control integration
- Slack - Get notifications
- Airbyte - Data pipelines
- Google Drive - Export to spreadsheets
More ParseForge Actors
- medRxiv Scraper - Collect medical research papers and citations from medRxiv
- Houzz Scraper - Extract home design and renovation project data
- Carparts.com Scraper - Download automotive parts and availability data
- Revzilla Scraper - Collect motorcycle gear and apparel listings
- Indexmundi Scraper - Extract country statistics and economic indicators
Browse our complete collection of data extraction tools for more.
Ready to Start?
Create a free account with $5 credit and collect your first 100 results for free. No coding, no setup.
Need Help?
- Check the FAQ section above for common questions
- Visit the Apify support page for documentation and tutorials
- Contact us via the Tally contact form to request a new scraper, propose a custom project, or report an issue
Disclaimer
This Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by arXiv, Semantic Scholar, or any of their subsidiaries. All trademarks mentioned are the property of their respective owners.