Pricing

$10.00/month + usage

ArXiv Research Paper Scraper

arXiv Research Paper Scraper retrieves academic paper metadata from the arXiv API based on a keyword. It extracts titles, abstracts, authors with affiliations, DOI, categories, submission dates, and PDF links. It supports proxy usage and outputs structured JSON results for research and data analysis.

Developer

Data Pilot

Last modified: 4 months ago

🔥 Features

Comprehensive Paper Extraction – Retrieves titles, abstracts, authors, and PDF links for arXiv papers matching a keyword.
Metadata Enrichment – Adds DOI, categories, and submission dates to each record for in-depth analysis.
Author Affiliation Tracking – Extracts author names and affiliations, which helps with collaboration and network analysis.
Proxy Support – Can route requests through Apify's residential proxies to reduce blocked or failed requests.
Structured JSON Output – Returns structured records that are easy to load into research databases.
Batch Processing – Processes multiple keywords in a single run for efficient data collection.
Error Handling – Logs failures and applies fallback mechanisms when an extraction fails.
Dataset Integration – Pushes results to your Apify dataset for easy export and analysis.

⚙️ How It Works

The ArXiv Research Paper Scraper takes a search keyword as input and queries the arXiv API for matching papers. It parses the API response to extract metadata such as titles, abstracts, and author details. On success it returns structured paper records; on failure it returns error details.

Key Processing Steps:

Keyword Input – Parse and validate search keywords
API Query – Query arXiv API with search parameters
Response Parsing – Parse XML API response
Metadata Extraction – Extract title, abstract, authors
Author Affiliation Tracking – Extract author details and affiliations
DOI & Category Extraction – Extract DOI and research categories
PDF Link Generation – Generate PDF download links
Export – Push results to dataset in JSON format
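The steps above can be sketched roughly as follows. This is a minimal illustration, not the Actor's actual source: it assumes the public arXiv query endpoint and its Atom response format, and the names `build_query_url`, `parse_feed`, and `SAMPLE_FEED` are hypothetical. The sample feed is hand-written so the parsing can be tried without a network call.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}


def build_query_url(keyword, max_results=50):
    """Build an arXiv API URL that searches all fields for an exact phrase."""
    params = {
        "search_query": f'all:"{keyword}"',
        "start": 0,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def _text(element, path):
    """Return the whitespace-normalized text of a child element, or None."""
    value = element.findtext(path, default=None, namespaces=NS)
    return " ".join(value.split()) if value else None


def parse_feed(xml_text):
    """Convert an arXiv Atom feed into a list of paper records."""
    papers = []
    for entry in ET.fromstring(xml_text.strip()).findall("atom:entry", NS):
        url = _text(entry, "atom:id")
        # Prefer the explicit PDF link; otherwise derive it from the abstract URL.
        pdf_link = next(
            (link.get("href") for link in entry.findall("atom:link", NS)
             if link.get("title") == "pdf"),
            None,
        )
        if pdf_link is None and url:
            pdf_link = url.replace("/abs/", "/pdf/")
        published = _text(entry, "atom:published")
        papers.append({
            "title": _text(entry, "atom:title"),
            "abstract": _text(entry, "atom:summary"),
            "authors": [
                {"name": _text(a, "atom:name"),
                 "affiliation": _text(a, "arxiv:affiliation")}
                for a in entry.findall("atom:author", NS)
            ],
            "doi": _text(entry, "arxiv:doi"),  # None when the feed has no DOI
            "categories": [c.get("term") for c in entry.findall("atom:category", NS)],
            "submissionDate": published[:10] if published else None,
            "pdfLink": pdf_link,
            "url": url,
        })
    return papers


# Hand-written sample in the shape of an arXiv API response (illustrative only).
SAMPLE_FEED = """
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:arxiv="http://arxiv.org/schemas/atom">
  <entry>
    <id>http://arxiv.org/abs/2101.00001v1</id>
    <published>2021-01-01T00:00:00Z</published>
    <title>An Example
      Paper</title>
    <summary>A short abstract.</summary>
    <author>
      <name>Jane Smith</name>
      <arxiv:affiliation>Institute of Technology</arxiv:affiliation>
    </author>
    <category term="cs.AI" scheme="http://arxiv.org/schemas/atom"/>
    <link title="pdf" href="http://arxiv.org/pdf/2101.00001v1" rel="related" type="application/pdf"/>
  </entry>
</feed>
"""

papers = parse_feed(SAMPLE_FEED)
```

In a real run, the response body fetched from `build_query_url(...)` would take the place of `SAMPLE_FEED`. Note that the arXiv feed only carries a DOI or affiliation when the authors supplied one, so both fields may be empty.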

Key benefits:

Track publication trends and submission dates.
Analyze author networks and affiliations.
Build paper databases for literature reviews.
Research emerging topics and trends.
Find related papers and authors.

📥 Input

The scraper accepts the following input parameters:

| Field | Type | Default | Description |
|---|---|---|---|
| `keyword` | string | none (required) | The search keyword used to find arXiv papers (e.g., `"machine learning"`, `"quantum physics"`). |
| `useApifyProxy` | boolean | `true` | Enable Apify Proxy for requests. |
| `apifyProxyGroups` | array | `["RESIDENTIAL"]` | Proxy groups to use (e.g., `["RESIDENTIAL"]`). |

Example input JSON:

{
  "keyword": "artificial intelligence",
  "useApifyProxy": true,
  "apifyProxyGroups": ["RESIDENTIAL"]
}
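As a rough illustration of how the documented defaults could be applied before a run, the sketch below validates an input object and fills in missing optional fields. The function name `normalize_input` and its error messages are hypothetical; the Actor's own validation may differ.

```python
def normalize_input(raw):
    """Validate an input dict and apply the documented defaults."""
    keyword = raw.get("keyword")
    if not isinstance(keyword, str) or not keyword.strip():
        raise ValueError("'keyword' is required and must be a non-empty string")

    use_proxy = raw.get("useApifyProxy", True)
    if not isinstance(use_proxy, bool):
        raise ValueError("'useApifyProxy' must be a boolean")

    groups = raw.get("apifyProxyGroups", ["RESIDENTIAL"])
    if not isinstance(groups, list) or not all(isinstance(g, str) for g in groups):
        raise ValueError("'apifyProxyGroups' must be an array of strings")

    return {
        "keyword": keyword.strip(),
        "useApifyProxy": use_proxy,
        "apifyProxyGroups": list(groups),
    }


# Only the required field is given; the other two fall back to their defaults.
normalized = normalize_input({"keyword": "  artificial intelligence "})
```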

Example Search Keywords:

"machine learning" – ML papers
"quantum computing" – Quantum research
"neural networks" – Deep learning papers
"natural language processing" – NLP research
"computer vision" – CV papers

📤 Output

The scraper outputs one JSON record per paper. Each record includes:

| Field | Type | Description |
|---|---|---|
| `title` | string | Title of the paper. |
| `abstract` | string | Abstract of the paper. |
| `authors` | array | List of authors, each with a `name` and an `affiliation`. |
| `doi` | string | DOI of the paper. |
| `categories` | array | arXiv categories assigned to the paper. |
| `submissionDate` | string | Submission date of the paper. |
| `pdfLink` | string | Direct link to the PDF. |
| `url` | string | URL of the arXiv abstract page. |

Example output record:

{
  "title": "Example ArXiv Research Paper",
  "abstract": "This is an example abstract for the arXiv research paper...",
  "authors": [
    {
      "name": "John Doe",
      "affiliation": "University of Example"
    },
    {
      "name": "Jane Smith",
      "affiliation": "Institute of Technology"
    }
  ],
  "doi": "10.48550/arXiv.1234.5678",
  "categories": ["cs.AI", "stat.ML"],
  "submissionDate": "2025-02-14",
  "pdfLink": "https://arxiv.org/pdf/1234.5678.pdf",
  "url": "https://arxiv.org/abs/1234.5678"
}
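Records in this shape lend themselves to simple affiliation and co-author analysis. The sketch below is one possible approach, assuming each record carries the `authors` array shown above; the helper names `affiliation_counts` and `coauthor_pairs` are hypothetical and not part of the Actor.

```python
from collections import Counter
from itertools import combinations


def affiliation_counts(papers):
    """Count how many papers each institution appears on."""
    counts = Counter()
    for paper in papers:
        # A set ensures an institution is counted once per paper.
        counts.update({a["affiliation"] for a in paper.get("authors", [])
                       if a.get("affiliation")})
    return counts


def coauthor_pairs(papers):
    """Count how often each pair of authors shares a paper."""
    pairs = Counter()
    for paper in papers:
        names = sorted({a["name"] for a in paper.get("authors", []) if a.get("name")})
        pairs.update(combinations(names, 2))
    return pairs


# The example record from this page, reduced to the fields used here.
records = [{
    "title": "Example ArXiv Research Paper",
    "authors": [
        {"name": "John Doe", "affiliation": "University of Example"},
        {"name": "Jane Smith", "affiliation": "Institute of Technology"},
    ],
}]

by_institution = affiliation_counts(records)
pairs = coauthor_pairs(records)
```

The pair counts can be fed into a graph library as weighted edges if a full author network is needed.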

Example summary record:

{
  "summary": true,
  "keyword": "artificial intelligence",
  "total_papers": 100,
  "papers_returned": 50,
  "date_range": "2023-2025",
  "categories_found": 12,
  "authors_found": 250,
  "completed_at": "2025-02-14T12:35:00Z"
}
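How the Actor computes these summary fields is not documented here. The sketch below shows one plausible derivation from a list of paper records, assuming `authors_found` and `categories_found` are counts of distinct values and `date_range` spans the earliest and latest submission years. The function `summarize` is hypothetical, and `total_papers` is passed in because it cannot be derived from the returned records alone.

```python
from datetime import datetime, timezone


def summarize(keyword, papers, total_papers):
    """Build a summary record from paper records (illustrative assumptions)."""
    years = sorted(p["submissionDate"][:4] for p in papers if p.get("submissionDate"))
    categories = {c for p in papers for c in p.get("categories", [])}
    authors = {a["name"] for p in papers for a in p.get("authors", []) if a.get("name")}
    return {
        "summary": True,
        "keyword": keyword,
        "total_papers": total_papers,
        "papers_returned": len(papers),
        "date_range": f"{years[0]}-{years[-1]}" if years else None,
        "categories_found": len(categories),
        "authors_found": len(authors),
        "completed_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


sample_papers = [
    {"submissionDate": "2023-05-01", "categories": ["cs.AI", "stat.ML"],
     "authors": [{"name": "John Doe"}, {"name": "Jane Smith"}]},
    {"submissionDate": "2025-02-14", "categories": ["cs.AI"],
     "authors": [{"name": "Jane Smith"}]},
]
summary = summarize("artificial intelligence", sample_papers, total_papers=100)
```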

🧰 Technical Stack

API Integration: arXiv API – Official academic paper repository
HTTP Client: requests – API calls and data fetching
Data Parsing: XML parsing for API responses
JSON Processing: Structured data formatting
Proxy Support: Apify Proxy with RESIDENTIAL support
Platform: Apify Actor – serverless, scalable, integrated with Dataset
Deployment: One‑click run on Apify Console or via REST API

🎯 Use Cases

Literature Reviews – Find papers for comprehensive literature review.
Research Trend Analysis – Identify emerging trends in research fields.
Author Network Analysis – Map author networks and collaborations.
Citation Tracking – Track papers and their impact.
Dataset Creation for ML – Create datasets for machine learning research.
Academic Research – Conduct academic and meta-research studies.
Topic Modeling – Analyze research topics and trends.
Affiliation Analysis – Analyze research by institution and affiliation.
Time Series Analysis – Track research publication trends over time.
Field Analysis – Analyze specific research fields comprehensively.
Researcher Profiling – Build profiles of researchers and their work.
Collaboration Analysis – Identify collaboration patterns.
Emerging Technology Research – Track new technologies and methodologies.
Academic Benchmarking – Compare research output across institutions.

🚀 Quick Start

Open in Apify Console – visit the Actor page and click Try for free.
Enter search keyword – provide a research topic (e.g., "artificial intelligence").
Set proxy option – enabled by default for reliable access.
Click Start – the Actor will query the arXiv API.
View Results – check the dataset for extracted paper metadata.
Review Papers – examine titles, abstracts, authors, and affiliations.
Download PDFs – use provided PDF links to access full papers.
Export – download the results as JSON, CSV, or Excel for analysis.

You can also call this Actor programmatically via Apify SDK or REST API – ideal for automated literature review and academic research pipelines.
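A minimal sketch of a REST call using only the Python standard library is shown below. It assumes Apify's synchronous "run Actor and get dataset items" endpoint; the Actor ID `username/actor-name` and the token `YOUR_APIFY_TOKEN` are placeholders to be replaced with real values, and the helper names are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id, token, run_input):
    """Prepare a POST request that runs an Actor and returns its dataset items."""
    # The API expects "username~actor-name" rather than "username/actor-name".
    url = f"{API_BASE}/acts/{actor_id.replace('/', '~')}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_actor(actor_id, token, run_input, timeout=300):
    """Send the request and return the dataset items as a list of dicts."""
    request = build_run_request(actor_id, token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Placeholder values; nothing is sent until run_actor() is called.
request = build_run_request(
    "username/actor-name",
    "YOUR_APIFY_TOKEN",
    {"keyword": "artificial intelligence", "useApifyProxy": True,
     "apifyProxyGroups": ["RESIDENTIAL"]},
)
```

The synchronous endpoint suits short runs. For longer runs, starting the Actor asynchronously and reading the dataset afterwards is usually the safer pattern.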

💎 Why This Scraper?

| Feature | Benefit |
|---|---|
| ✅ Official API | Direct access to arXiv's official API. |
| ✅ Complete metadata | Get titles, abstracts, authors, DOI, PDFs. |
| ✅ Author affiliations | Track institution affiliations. |
| ✅ Research categories | Identify research field/category. |
| ✅ Structured output | JSON format ready for databases. |
| ✅ Proxy support | Reliable access with fallback. |
| ✅ Error handling | Robust error handling. |
| ✅ Apify ecosystem | Seamless integration with other Actors, triggers, and webhooks. |

📦 Changelog

Initial release of ArXiv Research Paper Scraper
arXiv API integration
Keyword-based paper search
Title and abstract extraction
Author and affiliation extraction
DOI and category extraction
Submission date parsing
PDF link generation
Structured JSON output
Batch processing for multiple keywords
Error handling with fallback mechanisms
Proxy support for reliability
Summary statistics and reporting
Automatic dataset integration
Full Apify Actor integration

🧑‍💻 Support & Feedback

Issues & Ideas: Open a ticket on the Apify Actor issue tracker
Contributions: Pull requests are welcome via the GitHub repository
Documentation: Visit Apify Docs for comprehensive platform guides
Community: Join the Apify community forum for discussions and support
Bug Reports: Submit detailed bug reports through the issue tracker
Feature Requests: Suggest new features to improve the scraper

💰 Pricing

Listed at $10.00/month plus Apify platform usage
Paid plans available for higher limits and priority support

Disclaimer: ArXiv Research Paper Scraper is provided as-is for research and academic purposes. Users are responsible for ensuring their usage complies with arXiv's policies and applicable laws. Always attribute papers appropriately and respect academic integrity standards.

🎉 Get Started Today

Begin researching papers now!

Use ArXiv Research Paper Scraper for:

📚 Literature Reviews
📊 Research Analysis
🔍 Topic Research
💡 Trend Analysis
📈 Academic Research

Perfect for:

Researchers
PhD Students
Academics
Data Scientists
Students

Last Updated: February 2025
Version: 1.0.0
Status: Active Development
Support: 24/7 Customer Support Available
Platform: Apify

For comprehensive academic research and paper analysis, explore our full suite of tools:

Smart Article Extractor
AI Blog Dataset Creator
Fast News Content Scraper
RAG Web Scraper
All-in-One Media Downloader
