arXiv Preprint Scraper

Pricing

Pay per event

Export preprints from arXiv.org. Search 2.5M+ open-access papers across physics, mathematics, computer science, biology, economics, and quantitative finance. Query by keyword, author, category, or date range. Pull titles, authors, abstracts, categories, DOIs, journal refs, and PDF links.

Rating: 5.0 (1)

Developer: ParseForge (maintained by Community)

Actor stats: 0 bookmarked · 12 total users · 4 monthly active users · last modified 4 hours ago

📚 ArXiv Citation Scraper

🚀 Collect citation networks from ArXiv papers in minutes. Enter paper IDs or URLs and get citation trees with references and citing papers. Export paper metadata, authors, abstracts, and citation links. No coding, no API key required.

🕒 Last updated: 2026-04-16 · 📊 15+ fields per paper · 🔍 ID + URL input · 🔗 Citation + reference trees · 🚫 No auth required

The ArXiv Citation Scraper builds citation networks from ArXiv papers, returning 15+ fields per paper: title, authors, abstract, ArXiv ID, publication date, categories, PDF URL, and lists of references and citing papers. Configure citation depth and toggle references vs citations.

ArXiv hosts over 2 million preprints. This Actor traverses citation links to build structured networks for bibliometric analysis, literature reviews, or research discovery.

| 🎯 Target Audience | 💡 Primary Use Cases |
| --- | --- |
| Academic researchers, data scientists, bibliometric analysts, ML engineers, science journalists | Citation network analysis, literature reviews, research discovery, impact tracking, bibliometrics |

📋 What the ArXiv Citation Scraper does

Citation tree traversal:

  • 🆔 ArXiv ID input. Enter paper IDs (e.g., "2311.09735").
  • 🔗 URL input. Paste ArXiv paper URLs.
  • 📚 References. Papers cited BY the input paper.
  • 📊 Citations. Papers that CITE the input paper.
  • 🌳 Depth control. Configure how many levels deep to traverse.

Each paper record includes title, authors, abstract, ArXiv ID, categories, publication date, PDF URL, reference list, and citation list.

💡 Why it matters: building citation networks manually means clicking through each paper's reference list one by one. This Actor traverses citation trees automatically and returns structured data for network analysis.
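
The depth and item limits can be pictured as a breadth-first walk over the citation graph. The sketch below is not the Actor's implementation. It is a minimal illustration on a toy in-memory graph, and `TOY_REFERENCES`, `traverse`, and the single-letter paper IDs are all hypothetical. It assumes the input paper sits at depth 0 and each level of links adds 1.

```python
from collections import deque

# Hypothetical toy graph: paper ID -> IDs of the papers it references.
TOY_REFERENCES = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": [],
    "E": [],
}


def traverse(start_id, neighbors, max_depth=1, max_items=10):
    """Breadth-first walk that stops at max_depth levels or max_items papers."""
    seen = {start_id}
    queue = deque([(start_id, 0)])
    visited = []  # (paper_id, depth) pairs in the order they were reached
    while queue and len(visited) < max_items:
        paper_id, depth = queue.popleft()
        visited.append((paper_id, depth))
        if depth == max_depth:
            continue  # do not expand papers that sit on the depth limit
        for linked_id in neighbors.get(paper_id, []):
            if linked_id not in seen:  # each paper is visited at most once
                seen.add(linked_id)
                queue.append((linked_id, depth + 1))
    return visited


print(traverse("A", TOY_REFERENCES, max_depth=1))
print(traverse("A", TOY_REFERENCES, max_depth=2, max_items=4))
```

With depth 1 the walk returns the start paper and its two direct references. With depth 2 and a cap of 4 items it stops before reaching paper "E", which shows how an item cap can truncate a deeper tree.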


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to build a citation network.


⚙️ Input

| Input | Type | Default | Behavior |
| --- | --- | --- | --- |
| arxivIds | array | [] | ArXiv paper IDs (e.g., "2311.09735"). |
| startUrl | string | "" | ArXiv paper URL. |
| maxItems | integer | 10 | Max papers in the network. |
| maxDepth | integer | 1 | Citation tree depth (1 = direct, 2 = two levels). |
| includeReferences | boolean | true | Include papers cited by the input. |
| includeCitations | boolean | true | Include papers citing the input. |
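
If your paper list mixes bare IDs and URLs, it may help to normalize everything to IDs before filling `arxivIds`. The helper below is a hypothetical sketch, not part of the Actor. It assumes the usual arXiv ID shapes (new-style such as "2311.09735", old-style such as "hep-th/9901001") and drops any version suffix, which the Actor may or may not require.

```python
import re

# New-style IDs look like "2311.09735", optionally followed by a version ("v2").
# Old-style IDs look like "hep-th/9901001" or "math.GT/0309136".
_ID_PATTERN = re.compile(
    r"(\d{4}\.\d{4,5}|[a-z\-]+(?:\.[A-Z]{2})?/\d{7})(v\d+)?$"
)


def to_arxiv_id(value: str) -> str:
    """Return the bare arXiv ID found at the end of an ID or URL string."""
    text = value.strip()
    if text.endswith(".pdf"):
        text = text[: -len(".pdf")]  # tolerate PDF links that end in ".pdf"
    match = _ID_PATTERN.search(text)
    if match is None:
        raise ValueError(f"No arXiv ID found in {value!r}")
    return match.group(1)  # group 1 excludes the optional version suffix


for raw in ("https://arxiv.org/abs/1706.03762", "2311.09735v2"):
    print(raw, "->", to_arxiv_id(raw))
```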

Example: citation network for the GEO paper.

{
  "arxivIds": ["2311.09735"],
  "maxItems": 50,
  "maxDepth": 1,
  "includeReferences": true,
  "includeCitations": true
}

Example: deep reference tree.

{
  "startUrl": "https://arxiv.org/abs/1706.03762",
  "maxItems": 100,
  "maxDepth": 2,
  "includeReferences": true,
  "includeCitations": false
}

⚠️ Good to Know: deeper citation trees (maxDepth > 1) can grow exponentially with each level, up to the maxItems cap. Start with depth 1 and increase only if needed.
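
To get a feel for that growth, here is a rough upper-bound estimate. It assumes every paper links to the same number of other papers (the `branching` value of 30 is purely illustrative) and ignores overlap between reference lists, so real networks will usually be smaller.

```python
def max_network_size(branching: int, max_depth: int) -> int:
    """Upper bound on papers reached: 1 + b + b^2 + ... + b^max_depth."""
    return sum(branching ** level for level in range(max_depth + 1))


# Illustrative assumption: about 30 links per paper.
for depth in (1, 2, 3):
    print(f"maxDepth={depth}: up to {max_network_size(30, depth)} papers")
```

Under that assumption the bound is 31 papers at depth 1, 931 at depth 2, and 27,931 at depth 3, which is why a low `maxItems` value is a sensible guard for first runs.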


📊 Output

Each paper record contains 15+ fields. Download the dataset as CSV, Excel, JSON, or XML.

🧾 Schema

| Field | Type | Example |
| --- | --- | --- |
| 🆔 arxivId | string | "2311.09735" |
| 📝 title | string | "GEO: Generative Engine Optimization" |
| 👤 authors | array | ["Pranjal Aggarwal", "Vishvak Murahari"] |
| 📄 abstract | string | "We introduce GEO, a novel..." |
| 📅 publishedDate | string | "2023-11-16" |
| 📂 categories | array | ["cs.IR", "cs.CL"] |
| 📄 pdfUrl | string | "https://arxiv.org/pdf/2311.09735" |
| 📚 references | array | ["1706.03762", "2305.14314"] |
| 📊 citations | array | ["2401.12345"] |
| 🌳 depth | number | 0 |
| 🔗 url | string | "https://arxiv.org/abs/2311.09735" |
| 🕒 scrapedAt | ISO 8601 | "2026-04-16T00:00:00.000Z" |
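
For network analysis, the `references` and `citations` arrays can be flattened into a directed edge list. The sketch below assumes a JSON export shaped like the schema above. The sample records are hypothetical, most fields are omitted, and `to_edges` is an illustrative helper rather than something the Actor provides.

```python
import json

# Hypothetical export with two records; only the fields used here are shown.
SAMPLE_JSON = """
[
  {"arxivId": "2311.09735", "depth": 0,
   "references": ["1706.03762", "2305.14314"],
   "citations": ["2401.12345"]},
  {"arxivId": "1706.03762", "depth": 1,
   "references": [],
   "citations": ["2311.09735"]}
]
"""


def to_edges(records):
    """Return sorted, de-duplicated (citing_id, cited_id) pairs."""
    edges = set()
    for record in records:
        paper = record["arxivId"]
        for cited in record.get("references") or []:
            edges.add((paper, cited))   # this paper cites `cited`
        for citing in record.get("citations") or []:
            edges.add((citing, paper))  # `citing` cites this paper
    return sorted(edges)


records = json.loads(SAMPLE_JSON)
for citing, cited in to_edges(records):
    print(f"{citing} -> {cited}")
```

Using a set means a link that appears in both directions (as a reference on one record and a citation on another) is counted once. The resulting pairs can be loaded into whichever graph library you prefer.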


✨ Why choose this Actor

Capabilities:

  • 📚 2M+ ArXiv papers. Full ArXiv preprint archive.
  • 🌳 Citation tree traversal. Configurable depth for network building.
  • 🔗 References + citations. Both directions of the citation graph.
  • 📄 Full metadata. Title, authors, abstract, categories, PDF URL.
  • 🆔 ID and URL input. ArXiv IDs or full URLs.
  • Scalable. From single-paper lookups to deep network traversals.
  • 🚫 No authentication. Public ArXiv data.

📊 ArXiv hosts over 2 million open-access preprints. Structured citation network data supports bibliometric analysis, literature reviews, and research discovery workflows.


📈 How it compares to alternatives

| Approach | Cost | Coverage | Citation depth | PDF links | Setup |
| --- | --- | --- | --- | --- | --- |
| ⭐ ArXiv Citation Scraper (this Actor) | $5 free credit, then pay-per-use | Full ArXiv | Configurable | Yes | ⚡ 2 min |
| ArXiv API (direct) | Free | Full metadata | No citations | Yes | ⏳ Hours |
| Semantic Scholar API | Free with limits | Multi-source | Yes | Some | ⏳ Hours |
| Manual ArXiv browsing | Free | One at a time | Manual | Yes | 🕒 Hours |

Pick this Actor when you want ArXiv citation networks with configurable depth, without writing API client code.


🚀 How to use

  1. 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. 🌐 Open the Actor. Go to the ArXiv Citation Scraper page on the Apify Store.
  3. 🎯 Set input. Enter ArXiv IDs or URLs, set depth and direction.
  4. 🚀 Run it. Click Start.
  5. 📥 Download. Grab results in the Dataset tab.

⏱️ Total time: 3-5 minutes. No coding required.


💼 Business use cases

📊 Bibliometric Analysis

  • Build citation networks by topic
  • Track paper impact over time
  • Analyze author collaboration patterns
  • Study cross-field citation flows

📚 Literature Reviews

  • Map the reference landscape of a paper
  • Find related work through citations
  • Build reading lists from citation trees
  • Identify foundational papers by depth

🤖 ML & AI Research

  • Track model architecture citations
  • Map benchmark paper dependencies
  • Study methodology evolution
  • Build prior-art databases

🏢 R&D Intelligence

  • Monitor competitor publications
  • Track emerging research directions
  • Build technology landscape maps
  • Identify key researchers by citation count

🔌 Automating ArXiv Citation Scraper

  • 🟢 Node.js. Install the apify-client NPM package.
  • 🐍 Python. Use the apify-client PyPI package.
  • 📚 See the Apify API documentation for full details.
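
As a starting point for the Python route, here is a minimal sketch using the `apify-client` package. The Actor ID is a placeholder (copy the real one from the Actor's page on the Apify Store), reading the token from an `APIFY_TOKEN` environment variable is an assumption, and the helper names are hypothetical. The input field names and defaults follow the Input table in this README.

```python
import os

# Hypothetical Actor ID: replace it with the ID shown on the Actor's Store page.
ACTOR_ID = "parseforge/arxiv-citation-scraper"


def build_run_input(arxiv_ids, max_items=10, max_depth=1,
                    include_references=True, include_citations=True):
    """Assemble the Actor input using the field names from the Input table."""
    return {
        "arxivIds": list(arxiv_ids),
        "maxItems": max_items,
        "maxDepth": max_depth,
        "includeReferences": include_references,
        "includeCitations": include_citations,
    }


def run_scraper(run_input, token=None):
    """Start the Actor, wait for the run to finish, and return its dataset items."""
    from apify_client import ApifyClient  # pip install apify-client

    # Assumption: the API token is passed in or stored in APIFY_TOKEN.
    client = ApifyClient(token or os.environ["APIFY_TOKEN"])
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

A typical call would be `papers = run_scraper(build_run_input(["2311.09735"], max_items=50))`, after which each item in `papers` should be a dictionary with the fields listed in the schema.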


💡 Pro Tip: browse the complete ParseForge collection for more research and academic scrapers.


🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by ArXiv or Cornell University. All trademarks mentioned are the property of their respective owners. Only publicly available preprint metadata is collected.