arXiv Preprint Scraper
Pricing
Pay per event
Export preprints from arXiv.org. Search 2.5M+ open-access papers across physics, mathematics, computer science, biology, economics, and quantitative finance. Query by keyword, author, category, or date range. Pull titles, authors, abstracts, categories, DOIs, journal refs, and PDF links.
Developer
ParseForge

📚 ArXiv Citation Scraper
🚀 Collect citation networks from ArXiv papers in minutes. Enter paper IDs or URLs and get citation trees with references and citing papers. Export paper metadata, authors, abstracts, and citation links. No coding, no API key required.
🕒 Last updated: 2026-04-16 · 📊 15+ fields per paper · 🔍 ID + URL input · 🔗 Citation + reference trees · 🚫 No auth required
The ArXiv Citation Scraper builds citation networks from ArXiv papers, returning 15+ fields per paper: title, authors, abstract, ArXiv ID, publication date, categories, PDF URL, and lists of references and citing papers. Configure citation depth and toggle references vs citations.
ArXiv hosts over 2 million preprints. This Actor traverses citation links to build structured networks for bibliometric analysis, literature reviews, or research discovery.
| 🎯 Target Audience | 💡 Primary Use Cases |
|---|---|
| Academic researchers, data scientists, bibliometric analysts, ML engineers, science journalists | Citation network analysis, literature reviews, research discovery, impact tracking, bibliometrics |
📋 What the ArXiv Citation Scraper does
Citation tree traversal:
- 🆔 ArXiv ID input. Enter paper IDs (e.g., "2311.09735").
- 🔗 URL input. Paste ArXiv paper URLs.
- 📚 References. Papers cited BY the input paper.
- 📊 Citations. Papers that CITE the input paper.
- 🌳 Depth control. Configure how many levels deep to traverse.
Each paper record includes title, authors, abstract, ArXiv ID, categories, publication date, PDF URL, reference list, and citation list.
💡 Why it matters: building citation networks manually means clicking through each paper's reference list one by one. This Actor traverses citation trees automatically and returns structured data for network analysis.
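The traversal described above can be pictured as a breadth-first walk that stops at a depth limit and an item cap. The sketch below is illustrative only: the `traverse` function, the toy graph, and the single-letter paper IDs are hypothetical and are not taken from the Actor's source.

```python
from collections import deque


def traverse(graph, start_id, max_depth=1, max_items=10):
    """Breadth-first walk over a citation graph, capped by depth and item count.

    `graph` maps a paper ID to the list of IDs it links to (hypothetical toy data).
    Returns {paper_id: depth} in visit order.
    """
    depths = {start_id: 0}
    queue = deque([start_id])
    while queue:
        paper_id = queue.popleft()
        if depths[paper_id] >= max_depth:
            continue  # do not expand papers that already sit at the depth limit
        for neighbor in graph.get(paper_id, []):
            if neighbor in depths:
                continue  # already visited through another path
            if len(depths) >= max_items:
                return depths  # item cap reached, stop early
            depths[neighbor] = depths[paper_id] + 1
            queue.append(neighbor)
    return depths


# Toy graph: A links to B and C, B links to D, C links to D and E.
toy_graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
}

print(traverse(toy_graph, "A", max_depth=1))
print(traverse(toy_graph, "A", max_depth=2))
```

In this sketch, `max_depth` and `max_items` play the roles that maxDepth and maxItems play in the Actor input; how the Actor itself orders or truncates results may differ.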
🎬 Full Demo
🚧 Coming soon: a 3-minute walkthrough showing how to build a citation network.
⚙️ Input
| Input | Type | Default | Behavior |
|---|---|---|---|
| arxivIds | array | [] | ArXiv paper IDs (e.g., "2311.09735"). |
| startUrl | string | "" | ArXiv paper URL. |
| maxItems | integer | 10 | Max papers in the network. |
| maxDepth | integer | 1 | Citation tree depth (1 = direct, 2 = two levels). |
| includeReferences | boolean | true | Include papers cited by the input. |
| includeCitations | boolean | true | Include papers citing the input. |
Example: citation network for the GEO paper.
{"arxivIds": ["2311.09735"],"maxItems": 50,"maxDepth": 1,"includeReferences": true,"includeCitations": true}
Example: deep reference tree.
{"startUrl": "https://arxiv.org/abs/1706.03762","maxItems": 100,"maxDepth": 2,"includeReferences": true,"includeCitations": false}
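Because the Actor accepts both bare IDs and URLs, it can help to normalize inputs to one form before building a run. The helper below is a minimal sketch, assuming new-style IDs such as "2311.09735"; `extract_arxiv_id` is a hypothetical name, old-style IDs with a subject prefix are not handled, and whether the Actor itself accepts version suffixes is not stated in this README.

```python
import re

# New-style IDs look like "2311.09735", optionally followed by a version such as "v2".
_NEW_STYLE = re.compile(r"(\d{4}\.\d{4,5})(v\d+)?")


def extract_arxiv_id(value):
    """Return the bare new-style arXiv ID found in an ID string or URL, or None."""
    match = _NEW_STYLE.search(value)
    return match.group(1) if match else None


print(extract_arxiv_id("https://arxiv.org/abs/1706.03762"))
print(extract_arxiv_id("2311.09735"))
```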
⚠️ Good to Know: deeper citation trees (maxDepth > 1) grow exponentially. Start with depth 1 and increase if needed.
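To see why depth matters, a rough upper bound helps: if every paper links to about b others, a traversal to depth d can reach up to 1 + b + b² + ... + b^d papers. The sketch below computes that bound; the figure of 30 links per paper is an illustrative assumption, not a measured ArXiv statistic.

```python
def estimate_network_size(links_per_paper, max_depth):
    """Upper bound on papers reached: 1 + b + b^2 + ... + b^max_depth."""
    return sum(links_per_paper ** level for level in range(max_depth + 1))


# Assumed branching factor of 30 links per paper, for illustration only.
for depth in (1, 2, 3):
    print(depth, estimate_network_size(30, depth))
```

Under that assumption the bound is 31 papers at depth 1, 931 at depth 2, and 27,931 at depth 3, which is why maxItems is worth setting deliberately on deeper runs.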
📊 Output
Each paper record contains 15+ fields. Download the dataset as CSV, Excel, JSON, or XML.
🧾 Schema
| Field | Type | Example |
|---|---|---|
| 🆔 arxivId | string | "2311.09735" |
| 📝 title | string | "GEO: Generative Engine Optimization" |
| 👤 authors | array | ["Pranjal Aggarwal", "Vishvak Murahari"] |
| 📄 abstract | string | "We introduce GEO, a novel..." |
| 📅 publishedDate | string | "2023-11-16" |
| 📂 categories | array | ["cs.IR", "cs.CL"] |
| 📄 pdfUrl | string | "https://arxiv.org/pdf/2311.09735" |
| 📚 references | array | ["1706.03762", "2305.14314"] |
| 📊 citations | array | ["2401.12345"] |
| 🌳 depth | number | 0 |
| 🔗 url | string | "https://arxiv.org/abs/2311.09735" |
| 🕒 scrapedAt | ISO 8601 | "2026-04-16T00:00:00.000Z" |
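For network analysis, the references and citations arrays can be flattened into a directed edge list. The sketch below assumes both arrays hold ArXiv ID strings, as the example column suggests; `to_edges` is a hypothetical helper and the record reuses the example values from the schema rather than real output.

```python
def to_edges(records):
    """Turn paper records into sorted, de-duplicated (citing_id, cited_id) pairs."""
    edges = set()
    for record in records:
        paper_id = record["arxivId"]
        for cited in record.get("references", []):
            edges.add((paper_id, cited))   # this paper cites `cited`
        for citing in record.get("citations", []):
            edges.add((citing, paper_id))  # `citing` cites this paper
    return sorted(edges)


# Record built from the example values in the schema table (not real output).
records = [
    {
        "arxivId": "2311.09735",
        "references": ["1706.03762", "2305.14314"],
        "citations": ["2401.12345"],
    }
]

for citing_id, cited_id in to_edges(records):
    print(citing_id, "->", cited_id)
```

Storing edges as (citing, cited) pairs keeps both directions of the graph in one structure, so the result can be loaded into most graph libraries or written out as a two-column CSV.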
📦 Sample records
✨ Why choose this Actor
| | Capability |
|---|---|
| 📚 | 2M+ ArXiv papers. Full ArXiv preprint archive. |
| 🌳 | Citation tree traversal. Configurable depth for network building. |
| 🔗 | References + citations. Both directions of the citation graph. |
| 📄 | Full metadata. Title, authors, abstract, categories, PDF URL. |
| 🆔 | ID and URL input. ArXiv IDs or full URLs. |
| ⚡ | Scalable. From single-paper lookups to deep network traversals. |
| 🚫 | No authentication. Public ArXiv data. |
📊 ArXiv hosts over 2 million open-access preprints. Structured citation network data supports bibliometric analysis, literature reviews, and research discovery workflows.
📈 How it compares to alternatives
| Approach | Cost | Coverage | Citation depth | PDF links | Setup |
|---|---|---|---|---|---|
| ⭐ ArXiv Citation Scraper (this Actor) | $5 free credit, then pay-per-use | Full ArXiv | Configurable | Yes | ⚡ 2 min |
| ArXiv API (direct) | Free | Full metadata | No citations | Yes | ⏳ Hours |
| Semantic Scholar API | Free with limits | Multi-source | Yes | Some | ⏳ Hours |
| Manual ArXiv browsing | Free | One at a time | Manual | Yes | 🕒 Hours |
Pick this Actor when you want ArXiv citation networks with configurable depth, without writing API client code.
🚀 How to use
- 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
- 🌐 Open the Actor. Go to the ArXiv Citation Scraper page on the Apify Store.
- 🎯 Set input. Enter ArXiv IDs or URLs, set depth and direction.
- 🚀 Run it. Click Start.
- 📥 Download. Grab results in the Dataset tab.
⏱️ Total time: 3-5 minutes. No coding required.
💼 Business use cases
🔌 Automating ArXiv Citation Scraper
- 🟢 Node.js. Install the apify-client NPM package.
- 🐍 Python. Use the apify-client PyPI package.
- 📚 See the Apify API documentation for full details.
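A Python run might look roughly like the sketch below. It assumes the apify-client package is installed and that you have an Apify API token; the Actor ID is a placeholder because this page does not state it, and `build_run_input` and `run_citation_scraper` are hypothetical helper names. The input keys come from the Input table.

```python
def build_run_input(arxiv_ids, max_items=10, max_depth=1,
                    include_references=True, include_citations=True):
    """Assemble the Actor input using the field names from the Input table."""
    return {
        "arxivIds": list(arxiv_ids),
        "maxItems": max_items,
        "maxDepth": max_depth,
        "includeReferences": include_references,
        "includeCitations": include_citations,
    }


def run_citation_scraper(token, actor_id, run_input):
    """Start the Actor, wait for it to finish, and return the dataset items."""
    # Imported here so the input helper above works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


run_input = build_run_input(["2311.09735"], max_items=50)
print(run_input)

# Example call (placeholders, not real values):
# items = run_citation_scraper("<APIFY_TOKEN>", "<username>/<actor-name>", run_input)
```

Keeping the input builder separate from the network call makes it easy to check the payload before spending credits on a run.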
❓ Frequently Asked Questions
🔌 Integrate with any app
- Make - Automate workflows
- Zapier - Connect 5,000+ apps
- Slack - Get notifications
- Airbyte - Data pipelines
- GitHub - Trigger from commits
- Google Drive - Export to Sheets
🔗 Recommended Actors
- 📚 Semantic Scholar Scraper - Academic paper metadata
- 🏥 ClinicalTrials.gov Scraper - Clinical trial data
- 🤖 Hugging Face Model Scraper - AI model metadata
- 📊 FRED Scraper - Economic data
- 📊 Indexmundi Scraper - Global indicators
💡 Pro Tip: browse the complete ParseForge collection for more research and academic scrapers.
🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.
⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by ArXiv or Cornell University. All trademarks mentioned are the property of their respective owners. Only publicly available preprint metadata is collected.