Pricing: Pay per event

arXiv Preprint Scraper

Export preprints from arXiv.org. Search 2.5M+ open-access papers across physics, mathematics, computer science, biology, economics, and quantitative finance. Query by keyword, author, category, or date range. Pull titles, authors, abstracts, categories, DOIs, journal refs, and PDF links.

Rating: 5.0 (1)

Developer: ParseForge

Last modified: 4 hours ago

📚 ArXiv Citation Scraper

🚀 Collect citation networks from ArXiv papers in minutes. Enter paper IDs or URLs and get citation trees with references and citing papers. Export paper metadata, authors, abstracts, and citation links. No coding, no API key required.

🕒 Last updated: 2026-04-16 · 📊 15+ fields per paper · 🔍 ID + URL input · 🔗 Citation + reference trees · 🚫 No auth required

The ArXiv Citation Scraper builds citation networks from ArXiv papers, returning 15+ fields per paper: title, authors, abstract, ArXiv ID, publication date, categories, PDF URL, and lists of references and citing papers. Configure citation depth and toggle references vs citations.

ArXiv hosts over 2 million preprints. This Actor traverses citation links to build structured networks for bibliometric analysis, literature reviews, or research discovery.

🎯 Target Audience	💡 Primary Use Cases
Academic researchers, data scientists, bibliometric analysts, ML engineers, science journalists	Citation network analysis, literature reviews, research discovery, impact tracking, bibliometrics

📋 What the ArXiv Citation Scraper does

Citation tree traversal:

🆔 ArXiv ID input. Enter paper IDs (e.g., "2311.09735").
🔗 URL input. Paste ArXiv paper URLs.
📚 References. Papers cited BY the input paper.
📊 Citations. Papers that CITE the input paper.
🌳 Depth control. Configure how many levels deep to traverse.

Each paper record includes title, authors, abstract, ArXiv ID, categories, publication date, PDF URL, reference list, and citation list.

💡 Why it matters: building citation networks manually means clicking through each paper's reference list one by one. This Actor traverses citation trees automatically and returns structured data for network analysis.

🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to build a citation network.

⚙️ Input

Input	Type	Default	Behavior
`arxivIds`	array	`[]`	ArXiv paper IDs (e.g., "2311.09735").
`startUrl`	string	`""`	ArXiv paper URL.
`maxItems`	integer	`10`	Max papers in the network.
`maxDepth`	integer	`1`	Citation tree depth (1 = direct, 2 = two levels).
`includeReferences`	boolean	`true`	Include papers cited by the input.
`includeCitations`	boolean	`true`	Include papers citing the input.
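Both `arxivIds` and `startUrl` accept arXiv identifiers, and those identifiers come in several shapes. If you assemble inputs from mixed sources (bare IDs, `arXiv:` IDs, `/abs/` links, `/pdf/` links, versioned IDs), a small normalizer keeps them consistent before a run. The code below is a minimal Python sketch, not the Actor's own parsing logic. It assumes arXiv's published identifier formats: new-style `YYMM.NNNNN` since 2007, and old-style `archive/YYMMNNN`. The `normalize_arxiv_id` helper is hypothetical.

```python
import re
from urllib.parse import urlparse

# New-style IDs (since April 2007): YYMM.NNNN or YYMM.NNNNN, optional "vN" suffix.
NEW_STYLE = re.compile(r"^(\d{4}\.\d{4,5})(v\d+)?$")
# Old-style IDs: archive(.SUBJECT)/YYMMNNN, e.g. "hep-th/9901001" or "math.GT/0309136".
OLD_STYLE = re.compile(r"^([a-z-]+(?:\.[A-Z]{2})?/\d{7})(v\d+)?$")


def normalize_arxiv_id(value: str, keep_version: bool = False) -> str:
    """Return a bare arXiv ID from an ID, an "arXiv:" ID, or an arxiv.org abs/pdf URL."""
    text = value.strip()
    if text.lower().startswith("arxiv:"):
        text = text[len("arxiv:"):]
    elif text.lower().startswith(("http://", "https://")):
        path = urlparse(text).path  # e.g. "/abs/1706.03762v5" or "/pdf/1706.03762.pdf"
        for prefix in ("/abs/", "/pdf/"):
            if path.startswith(prefix):
                text = path[len(prefix):].removesuffix(".pdf")
                break
        else:
            raise ValueError(f"not an arXiv abs/pdf URL: {value!r}")
    for pattern in (NEW_STYLE, OLD_STYLE):
        match = pattern.match(text)
        if match:
            # group(1) is the base ID, group(2) the optional version suffix
            base, version = match.group(1), match.group(2) or ""
            return base + version if keep_version else base
    raise ValueError(f"unrecognized arXiv identifier: {value!r}")
```

For example, `normalize_arxiv_id("https://arxiv.org/pdf/1706.03762v5.pdf")` returns `"1706.03762"`. Version suffixes are dropped by default, so the same paper is not listed twice.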

Example: one-level citation network (references and citations) for the GEO paper, 2311.09735.

```json
{
    "arxivIds": ["2311.09735"],
    "maxItems": 50,
    "maxDepth": 1,
    "includeReferences": true,
    "includeCitations": true
}
```

Example: two-level reference tree for "Attention Is All You Need" (1706.03762), references only.

```json
{
    "startUrl": "https://arxiv.org/abs/1706.03762",
    "maxItems": 100,
    "maxDepth": 2,
    "includeReferences": true,
    "includeCitations": false
}
```

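⚠️ Good to Know: citation trees grow exponentially with `maxDepth`, because every paper at one level can add many papers at the next. Start with depth 1 and increase it only if needed. `maxItems` caps the total number of papers in the network.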
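To see why depth matters, consider a breadth-first walk. If each paper links to roughly b others, depth d can reach up to 1 + b + b^2 + ... + b^d papers before a size cap cuts it off. The Python sketch below illustrates that growth and a walk capped by both depth and size, over a toy in-memory graph. It shows the general technique, not the Actor's implementation: we do not know the order in which the Actor visits papers or exactly how it applies `maxItems`. The figure of 40 links per paper is an illustrative assumption.

```python
from collections import deque
from typing import Dict, List


def worst_case_size(branching: int, max_depth: int) -> int:
    """Upper bound on papers reached: 1 + b + b^2 + ... + b^max_depth."""
    return sum(branching ** level for level in range(max_depth + 1))


def capped_bfs(graph: Dict[str, List[str]], seeds: List[str],
               max_depth: int, max_items: int) -> Dict[str, int]:
    """Breadth-first walk returning {paper_id: depth}, with seeds at depth 0.

    graph maps a paper ID to its neighbours (references, citing papers, or both).
    Papers more than max_depth hops from a seed are never added, and the walk
    stops as soon as max_items papers have been collected.
    """
    depths: Dict[str, int] = {}
    queue: deque = deque()
    for seed in seeds:
        if seed not in depths and len(depths) < max_items:
            depths[seed] = 0
            queue.append(seed)
    while queue:
        paper = queue.popleft()
        if depths[paper] >= max_depth:
            continue  # leaf of the tree: do not expand further
        for neighbour in graph.get(paper, []):
            if len(depths) >= max_items:
                return depths  # size cap reached
            if neighbour not in depths:
                depths[neighbour] = depths[paper] + 1
                queue.append(neighbour)
    return depths


# Assuming ~40 links per paper: depth 1 can reach 41 papers, depth 2 up to 1,641.
print(worst_case_size(40, 1), worst_case_size(40, 2))
```

The jump from 41 to 1,641 papers is why a modest `maxItems` is a sensible safeguard whenever `maxDepth` is above 1.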

📊 Output

Each paper record contains 15+ fields. Download the dataset as CSV, Excel, JSON, or XML.

🧾 Schema

Field	Type	Example
🆔 `arxivId`	string	`"2311.09735"`
📝 `title`	string	`"GEO: Generative Engine Optimization"`
👤 `authors`	array	`["Pranjal Aggarwal", "Vishvak Murahari"]`
📄 `abstract`	string	`"We introduce GEO, a novel..."`
📅 `publishedDate`	string	`"2023-11-16"`
📂 `categories`	array	`["cs.IR", "cs.CL"]`
📄 `pdfUrl`	string	`"https://arxiv.org/pdf/2311.09735"`
📚 `references`	array	`["1706.03762", "2305.14314"]`
📊 `citations`	array	`["2401.12345"]`
🌳 `depth`	number	`0`
🔗 `url`	string	`"https://arxiv.org/abs/2311.09735"`
🕒 `scrapedAt`	ISO 8601	`"2026-04-16T00:00:00.000Z"`
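Each record carries both `references` and `citations`, so the dataset converts directly into a directed edge list for graph tools. The code below is a minimal Python sketch. It assumes both fields hold arXiv IDs, as in the schema example above. Every edge points from the citing paper to the cited one, and duplicates (the same link seen from both ends) are collapsed.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (citing arXiv ID, cited arXiv ID)


def citation_edges(records: Iterable[dict]) -> List[Edge]:
    """Turn paper records into a sorted, de-duplicated list of citation edges."""
    edges: Set[Edge] = set()
    for record in records:
        paper = record["arxivId"]
        for cited in record.get("references", []):
            edges.add((paper, cited))  # this paper cites each of its references
        for citing in record.get("citations", []):
            edges.add((citing, paper))  # each citing paper cites this one
    return sorted(edges)


record = {
    "arxivId": "2311.09735",
    "references": ["1706.03762", "2305.14314"],
    "citations": ["2401.12345"],
}
for citing, cited in citation_edges([record]):
    print(f"{citing} -> {cited}")
```

The resulting pairs can be passed straight to a graph library, for example NetworkX's `nx.DiGraph(edges)`.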

✨ Why choose this Actor

	Capability
📚	2M+ ArXiv papers. Full ArXiv preprint archive.
🌳	Citation tree traversal. Configurable depth for network building.
🔗	References + citations. Both directions of the citation graph.
📄	Full metadata. Title, authors, abstract, categories, PDF URL.
🆔	ID and URL input. ArXiv IDs or full URLs.
⚡	Scalable. From single-paper lookups to deep network traversals.
🚫	No authentication. Public ArXiv data.

📊 ArXiv hosts over 2 million open-access preprints. Structured citation network data is a practical starting point for bibliometric analysis, literature reviews, and research discovery.

📈 How it compares to alternatives

Approach	Cost	Coverage	Citation depth	PDF links	Setup
⭐ ArXiv Citation Scraper (this Actor)	$5 free credit, then pay-per-use	Full ArXiv	Configurable	Yes	⚡ 2 min
ArXiv API (direct)	Free	Full metadata	No citations	Yes	⏳ Hours
Semantic Scholar API	Free with limits	Multi-source	Yes	Some	⏳ Hours
Manual ArXiv browsing	Free	One at a time	Manual	Yes	🕒 Hours

Pick this Actor when you want ArXiv citation networks with configurable depth, without writing API client code.

🚀 How to use

📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
🌐 Open the Actor. Go to the ArXiv Citation Scraper page on the Apify Store.
🎯 Set input. Enter ArXiv IDs or URLs, set depth and direction.
🚀 Run it. Click Start.
📥 Download. Grab results in the Dataset tab.

⏱️ Total time: 3-5 minutes. No coding required.

💼 Business use cases

📊 Bibliometric Analysis

Build citation networks by topic
Track paper impact over time
Analyze author collaboration patterns
Study cross-field citation flows
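For analyzing collaboration patterns, one simple starting point is to count co-author pairs across the returned records. The code below is a minimal Python sketch. It matches authors by exact name string, so name variants (initials, accents) are treated as different people; you may need to relax that assumption.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable


def coauthor_pairs(records: Iterable[dict]) -> Counter:
    """Count how many papers each unordered pair of authors shares."""
    pairs: Counter = Counter()
    for record in records:
        # set() drops repeated names; sorted() gives each pair one canonical order
        authors = sorted(set(record.get("authors", [])))
        pairs.update(combinations(authors, 2))
    return pairs
```

Calling `coauthor_pairs(records).most_common(10)` then lists the ten most frequent collaborations in the dataset.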

📚 Literature Reviews

Map the reference landscape of a paper
Find related work through citations
Build reading lists from citation trees
Identify foundational papers by depth
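To surface foundational papers in a reference tree, a simple heuristic is to rank papers by how many collected records list them in `references`. The code below is a minimal Python sketch. The counts only reflect papers inside this run, which is bounded by `maxItems`; they are not global citation counts.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def most_referenced(records: Iterable[dict], top_n: int = 10) -> List[Tuple[str, int]]:
    """Rank arXiv IDs by how many records in the dataset reference them."""
    counts: Counter = Counter()
    for record in records:
        # set() so a duplicated entry in one reference list counts once
        counts.update(set(record.get("references", [])))
    return counts.most_common(top_n)
```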

🤖 ML & AI Research

Track model architecture citations
Map benchmark paper dependencies
Study methodology evolution
Build prior-art databases

🏢 R&D Intelligence

Monitor competitor publications
Track emerging research directions
Build technology landscape maps
Identify key researchers by citation count
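To identify key researchers by citation count, a rough sketch sums, for each author, the number of citing papers recorded for each of their papers. The result is only as complete as the `citations` lists the run returned, and it matches authors by exact name. Treat it as a ranking within your dataset, not a global metric.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def author_citation_totals(records: Iterable[dict]) -> List[Tuple[str, int]]:
    """Sum recorded citing papers per author, highest total first."""
    totals: Dict[str, int] = defaultdict(int)
    for record in records:
        n_citing = len(set(record.get("citations", [])))
        for author in set(record.get("authors", [])):
            totals[author] += n_citing
    # Sort by total descending; ties are broken alphabetically for stable output.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))
```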

🔌 Automating ArXiv Citation Scraper

🟢 Node.js. Install the apify-client NPM package.
🐍 Python. Use the apify-client PyPI package.
📚 See the Apify API documentation for full details.
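The code below is a minimal Python sketch using the `apify-client` package. `ApifyClient`, `actor(...).call(run_input=...)` and `dataset(...).iterate_items()` are that package's calls. Three things are assumptions, not documented behaviour. The Actor ID is a placeholder, because this listing does not state it; copy the real one from the Actor's page in Apify Console. The input checks in the hypothetical `build_run_input` helper are based on the input table, not on documented limits. The `APIFY_TOKEN` environment variable name is also our choice.

```python
from typing import Iterable, List

# Placeholder Actor ID (assumption): replace with the real "username/actor-name".
ACTOR_ID = "<username>/<actor-name>"


def build_run_input(arxiv_ids: Iterable[str] = (), start_url: str = "",
                    max_items: int = 10, max_depth: int = 1,
                    include_references: bool = True,
                    include_citations: bool = True) -> dict:
    """Assemble the Actor input using the field names from the input table."""
    ids = [i.strip() for i in arxiv_ids if i.strip()]
    # The checks below are assumptions, not limits documented by the Actor.
    if not ids and not start_url:
        raise ValueError("provide at least one arXiv ID or a start URL")
    if max_items < 1 or max_depth < 1:
        raise ValueError("maxItems and maxDepth must be at least 1")
    if not (include_references or include_citations):
        raise ValueError("enable references, citations, or both")
    return {
        "arxivIds": ids,
        "startUrl": start_url,
        "maxItems": max_items,
        "maxDepth": max_depth,
        "includeReferences": include_references,
        "includeCitations": include_citations,
    }


def run_actor(token: str, run_input: dict, actor_id: str = ACTOR_ID) -> List[dict]:
    """Start the Actor, wait for it to finish, and return its dataset items."""
    # Imported lazily so build_run_input() works without apify-client installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the Actor run did not return a run object")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

For example: `items = run_actor(os.environ["APIFY_TOKEN"], build_run_input(["2311.09735"], max_items=50))`. This reproduces the first JSON example above, with `startUrl` left at its documented default.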

🔌 Integrate with any app

Make - Automate workflows
Zapier - Connect 5,000+ apps
Slack - Get notifications
Airbyte - Data pipelines
GitHub - Trigger from commits
Google Drive - Export to Sheets

🔗 Recommended Actors

📚 Semantic Scholar Scraper - Academic paper metadata
🏥 ClinicalTrials.gov Scraper - Clinical trial data
🤖 Hugging Face Model Scraper - AI model metadata
📊 FRED Scraper - Economic data
📊 Indexmundi Scraper - Global indicators

💡 Pro Tip: browse the complete ParseForge collection for more research and academic scrapers.

🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.

⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by ArXiv or Cornell University. All trademarks mentioned are the property of their respective owners. Only publicly available preprint metadata is collected.

Google Scholar Search Scraper

ecomscrape/google-scholar-search-scraper

Extract comprehensive academic data from Google Scholar including research papers, citations, author information, and PDF links. Automate your literature review process with advanced scraping capabilities for researchers and academics.

ecomscrape

Google Scholar Scraper: Articles, Citations & PDFs

primeparse/google-scholar-scraper

Extract academic data from Google Scholar: titles, authors, years, citations, abstracts, PDF links. Supports queries, year filters (1900-2100), pagination (up to 5 pages). Rate-limited for safety. Ideal for research, citations, datasets, AI. Clean JSON output. Run on Apify with proxies.

PrimeParse

Google Scholar Scraper

marco.gullo/google-scholar-scraper

Scrape publication details from scholar.google.com. Add your query, time range, and optionally document type (PDF or HTML only). Extract information about articles such as titles, authors, links, related articles, and more.

Marco Gullo

1.7K

5.0

Google Scholar Scraper

crawlerbros/google-scholar-scraper

Scrape academic papers, articles, and citations from Google Scholar. Search by keywords with filters for year range, document type, sort order, and article type. Extract titles, authors, citations, links, and more.

Crawler Bros

5.0

Google Scholar Scraper

easyapi/google-scholar-scraper

Powerful Google Scholar scraper: collect up to 5,000 scholarly results per run, with flexible search options and citation filtering. Perfect for academic research, bibliometric analysis, and scientific trend tracking. 🎓🔍

EasyApi

351

2.5

PubMed Search Scraper

easyapi/pubmed-search-scraper

Scrape research papers and academic articles from PubMed based on search terms. Extract comprehensive article metadata including titles, authors, citations, abstracts, and more. Perfect for medical research and literature reviews.

EasyApi

ResearchGate Scraper

quickone/researchgate-scraper

Input a direct URL of a researchgate.net article and extract its title, authors, overview, citations, references, and other specifications. Export the data as HTML, JSON, CSV, or Excel.

Quick One

5.0

Google Scholar Scraper

george.the.developer/google-scholar-scraper

Scrape Google Scholar for academic papers, citations, author profiles. No API key needed. Extract titles, authors, abstracts, citation counts, PDF links, h-index, i10-index. Export JSON, CSV, Excel. Anti-bot protection with residential proxies, UA rotation, CAPTCHA detection.

George Kioko

5.0

ArXiv Preprint Paper Search

ryanclinton/arxiv-paper-search

Search and extract preprint research papers from the ArXiv open-access repository. Query over 2.4 million academic papers across physics, mathematics, computer science, biology, economics, and more with structured JSON output, no API key required.

ryan clinton

Examine.com Supplement Research

hanamira/examine-com-supplement-research

Extract evidence-based supplement research from Examine.com. Get efficacy ratings, dosage recommendations, safety data, drug interactions, and research summaries for any supplement or health condition.