arXiv Scraper

Pricing

Pay per event

Comprehensive arXiv scraper for extracting scholarly article data across physics, math, CS, biology, finance, statistics, engineering, and economics. Automates access to arXiv’s large preprint archive, providing structured metadata for researchers, academics, and data scientists.

Rating: 5.0 (1)

Developer: ParseForge

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 8
  • Monthly active users: 2
  • Last modified: 21 hours ago

📚 ArXiv Citations Network Scraper

Collect citation networks for academic papers on arXiv without complex setup. The scraper returns paper metadata, full author lists, and citation relationships, with no authentication required. Researchers and analysts use it to discover research trends, map paper networks, and audit citation impact across their datasets.

The ArXiv Citations Network Scraper draws on arXiv and Semantic Scholar to collect metadata for seed papers, the papers that cite them, and the papers they reference, returning up to 22 data fields per paper.

✨ What Does It Do

  • 📝 Title - Organize papers by publication title for quick reference in research summaries
  • 📊 Citation Count - Track how many times a paper was cited to measure research impact
  • 🔗 Authors - Review all contributors on each paper to identify collaborators and subject experts
  • 📅 Publication Date - Sort papers chronologically to spot emerging topics in your research area
  • 📄 Abstract - Read paper summaries to assess relevance before deep dives
  • 🎯 Depth - Identify how far into the citation network each paper sits for analysis layering
  • 🖼️ PDF URL - Download papers directly for offline reading and note-taking

🔧 Input

  • ArXiv Paper IDs - Comma-separated list of arXiv IDs (e.g., 1706.03762, 2301.07041) to use instead of startUrl
  • Start URL - Single ArXiv paper URL (e.g., https://arxiv.org/abs/1706.03762) to parse for the paper ID
  • Max Items - Maximum number of papers to collect. Free users are capped at 100; paid users can request up to 1 million
  • Citation Graph Depth - How many levels of the citation network to traverse: 1 = direct citations only, 2 = citations of citations. Higher values increase run time
  • Include References - Fetch papers cited by your seed papers (outgoing references)
  • Include Citations - Fetch papers that cite your seed papers (incoming citations)
  • Max Concurrency - Number of parallel requests to Semantic Scholar. Keep it at 1-2 to avoid rate limits
  • Request Delay (ms) - Milliseconds to pause between requests. Increase it if you are rate-limited

Example JSON:

{
  "arxivIds": ["1706.03762"],
  "maxItems": 100,
  "maxDepth": 2,
  "includeReferences": true,
  "includeCitations": true,
  "maxConcurrency": 1,
  "requestDelayMs": 1000
}
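
If you prefer to assemble the input in code, the fields above can be built and sanity-checked before a run. The following is a minimal sketch, not the Actor's own validation logic: the function names `arxiv_id_from_url` and `build_run_input` are hypothetical, the key names are taken from the example JSON, and the ID pattern only covers new-style arXiv identifiers.

```python
import re

# New-style arXiv identifiers such as 1706.03762 or 2301.07041v2.
# Old-style IDs (e.g. hep-th/9901001) are not handled in this sketch.
ARXIV_ID = re.compile(r"\d{4}\.\d{4,5}(?:v\d+)?")


def arxiv_id_from_url(url: str) -> str:
    """Extract the paper ID from a URL like https://arxiv.org/abs/1706.03762."""
    match = re.search(r"arxiv\.org/(?:abs|pdf)/(" + ARXIV_ID.pattern + r")", url)
    if match is None:
        raise ValueError(f"no arXiv ID found in {url!r}")
    return match.group(1)


def build_run_input(ids_csv: str = "", start_url: str = "",
                    max_items: int = 100, max_depth: int = 1) -> dict:
    """Assemble a run input; explicit IDs take precedence over the start URL."""
    ids = [part.strip() for part in ids_csv.split(",") if part.strip()]
    if not ids and start_url:
        ids = [arxiv_id_from_url(start_url)]
    if not ids:
        raise ValueError("provide at least one arXiv ID or a start URL")
    bad = [i for i in ids if not ARXIV_ID.fullmatch(i)]
    if bad:
        raise ValueError(f"malformed arXiv IDs: {bad}")
    return {
        "arxivIds": ids,
        "maxItems": max_items,
        "maxDepth": max_depth,
        "includeReferences": True,
        "includeCitations": True,
        "maxConcurrency": 1,      # conservative defaults from the example JSON
        "requestDelayMs": 1000,
    }
```

The resulting dictionary mirrors the example JSON, so it could be pasted into the input editor or passed as the run input through whichever Apify client you use.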

📊 Output

Each paper record includes up to 22 data fields. Download as JSON, CSV, or Excel.

| 📝 Paper ID | 📚 ArXiv ID | 📖 Title |
| 📄 Abstract | 🎯 Venue | 📅 Publication Year |
| 📊 Citation Count | 📖 Reference Count | 📎 DOI |
| 🔗 PDF URL | 👥 Authors | 📏 Depth |
| 🔄 Relation Type | 🌐 Source ArXiv ID | 📦 Citations Array |
| 📚 References Array | 🕐 Scraped At | ⚠️ Error Status |
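
Once exported, the Depth, Relation Type, and Source ArXiv ID fields should be enough to rebuild the citation graph from the flat records. The sketch below shows one way to do that. The JSON key names (`arxivId`, `paperId`, `sourceArxivId`, `relationType`, `depth`, `error`), the relation values `"citation"` and `"reference"`, and the sample records are all assumptions made for illustration; check a real dataset item for the actual names before relying on them.

```python
from collections import defaultdict


def citation_edges(records):
    """Return a set of (citing, cited) pairs derived from flat paper records."""
    edges = set()
    for rec in records:
        # Assumed keys: fall back to the generic paper ID when there is no arXiv ID.
        paper = rec.get("arxivId") or rec.get("paperId")
        source = rec.get("sourceArxivId")
        if not paper or not source or rec.get("error"):
            continue  # seed papers and failed lookups contribute no edge
        if rec.get("relationType") == "citation":
            edges.add((paper, source))   # this record cites the source paper
        elif rec.get("relationType") == "reference":
            edges.add((source, paper))   # the source paper cites this record
    return edges


def titles_by_depth(records):
    """Group paper titles by their distance from the seed papers."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec.get("depth", 0)].append(rec.get("title"))
    return dict(groups)


# Made-up records for illustration only; these are not real scraper output.
SAMPLE = [
    {"arxivId": "1706.03762", "title": "seed", "depth": 0},
    {"arxivId": "0000.00001", "title": "older", "depth": 1,
     "relationType": "reference", "sourceArxivId": "1706.03762"},
    {"arxivId": "0000.00002", "title": "newer", "depth": 1,
     "relationType": "citation", "sourceArxivId": "1706.03762"},
]
```

Storing edges as (citing, cited) pairs keeps the direction explicit, which matters when references and citations are collected in the same run.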

💎 Why Choose the ArXiv Scraper?

| Feature | ArXiv Scraper | Similar Tools |
| --- | --- | --- |
| Citation and reference network traversal | ✔️ | ❌ |
| Configurable depth for multi-level citation maps | ✔️ | ❌ |
| Zero authentication required | ✔️ | Partial |
| Access to Semantic Scholar metadata | ✔️ | ❌ |
| Automatic PDF URL generation | ✔️ | ❌ |
| Full author name and count data | ✔️ | Partial |
| Rate limit handling with auto-retry | ✔️ | Partial |
| Export to JSON, CSV, Excel | ✔️ | ✔️ |
| Configurable request delays | ✔️ | ❌ |
| Duplicate paper deduplication | ✔️ | ❌ |

📋 How to Use

No technical skills required. Follow these simple steps:

  1. Sign Up: Create a free account with $5 credit
  2. Find the Tool: Search for "ArXiv Citations Network Scraper" in the Apify Store and configure your input
  3. Run It: Click "Start" and watch your results appear

That's it. No coding, no setup, no complicated configuration. Now you can export your data in CSV, Excel, or JSON format.

🎯 Business Use Cases

  • 📊 Academic Researchers - Map citation networks across papers in your field to identify foundational works and emerging research clusters
  • 💼 Literature Review Analysts - Audit citation counts and publication venues to prioritize papers for systematic reviews
  • 🔬 Data Scientists - Collect metadata on thousands of papers to train models for citation prediction and research trend detection

❓ FAQ

🔍 How does it work? The scraper sends queries to Semantic Scholar using arXiv IDs, collects paper metadata and citation relationships, then traverses the citation graph to the depth you specify.
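
The traversal described in this answer can be pictured as a breadth-first walk with deduplication. The sketch below is an illustration under stated assumptions, not the Actor's actual code: `crawl` and `neighbors` are hypothetical names, `neighbors` stands in for whatever lookup returns the citing or referenced papers of a given ID, and counting seed papers toward the item cap is a guess.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(seeds: Iterable[str], neighbors: Callable[[str], Iterable[str]],
          max_depth: int, max_items: int) -> dict[str, int]:
    """Breadth-first walk; returns {paper_id: depth} with seeds at depth 0."""
    depth_of: dict[str, int] = {}
    queue: deque[str] = deque()
    for seed in seeds:
        if seed not in depth_of and len(depth_of) < max_items:
            depth_of[seed] = 0
            queue.append(seed)
    while queue:
        paper = queue.popleft()
        if depth_of[paper] >= max_depth:
            continue  # do not expand papers already at the requested depth
        for other in neighbors(paper):
            if other in depth_of:
                continue  # already collected, so skip the duplicate
            if len(depth_of) >= max_items:
                return depth_of  # item cap reached
            depth_of[other] = depth_of[paper] + 1
            queue.append(other)
    return depth_of


# Toy graph standing in for the remote lookup; the letters are placeholders.
GRAPH = {"A": ["B", "C"], "B": ["D"], "C": ["D", "A"], "D": ["E"]}
found = crawl(["A"], lambda p: GRAPH.get(p, []), max_depth=2, max_items=10)
```

Because the walk is breadth-first, each paper is recorded at its shortest distance from a seed, which matches the way the Depth field is described.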

📊 How accurate is the citation data? Semantic Scholar covers millions of academic papers, but arXiv papers may have incomplete citation records if they are very new or less frequently cited. The scraper returns all available data.

📅 Can I schedule runs? Yes, you can schedule tasks on Apify to run hourly, daily, or on any custom interval. Use the Apify platform scheduler for recurring research updates.

⚖️ Is scraping ArXiv and Semantic Scholar legal? ArXiv and Semantic Scholar publish metadata openly and encourage research use. However, you are responsible for complying with their terms of service and respecting rate limits.

🛡️ Will ArXiv or Semantic Scholar block me? The scraper respects rate limits and includes automatic delays. Keep concurrency low (1-2) and the request delay at 1000 ms or higher to avoid being blocked.
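
For readers wondering what delay-plus-retry handling looks like in practice, here is a minimal sketch of a fixed pause before each request with exponential backoff after a rate-limit error. It is not the Actor's implementation: `RateLimited`, `call_with_retry`, and the doubling policy are assumptions chosen for illustration, and `fetch` stands in for any function that performs one request.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical signal a fetcher raises on an HTTP 429 response."""


def call_with_retry(fetch: Callable[[], T], delay_ms: int = 1000,
                    max_retries: int = 3,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Pause before every attempt and double the pause after each rate limit."""
    pause = delay_ms / 1000
    for attempt in range(max_retries + 1):
        sleep(pause)
        try:
            return fetch()
        except RateLimited:
            if attempt == max_retries:
                raise  # give up after the final retry
            pause *= 2  # exponential backoff before the next attempt
    raise ValueError("max_retries must be zero or greater")
```

The `sleep` parameter is injectable so the backoff schedule can be checked without actually waiting; in normal use the default `time.sleep` applies.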

⚑ How long does a run take? A single paper with citations takes 5-15 seconds depending on depth and network size. Collecting 100 papers at depth 2 typically takes 1-2 minutes.

⚠️ Are there any limits? Free users can collect up to 100 results per run. Paid users can collect up to 1,000,000 results per run.

🔗 Integrate ArXiv Citations Network Scraper with any app

💡 More ParseForge Actors

Browse our complete collection of data extraction tools for more.

🚀 Ready to Start?

Create a free account with $5 credit and collect your first 100 results for free. No coding, no setup.

🆘 Need Help?

  • Check the FAQ section above for common questions
  • Visit the Apify support page for documentation and tutorials
  • Contact us through the Tally contact form to request a new scraper, propose a custom project, or report an issue

⚠️ Disclaimer

This Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by ArXiv, Semantic Scholar, or any of their subsidiaries. All trademarks mentioned are the property of their respective owners.