Pricing

from $10.00 / 1,000 results

Arxiv Research Scraper

Scrapes research papers from the arXiv preprint repository. Searches across all scientific disciplines including physics, mathematics, computer science, and more.

Developer

Donny

Last modified

17 hours ago

arXiv Research Paper Scraper

What it does

This actor connects to a public API, fetches structured data based on your search criteria, and stores the results in a clean, normalized dataset on the Apify platform. It handles pagination automatically, so you can collect up to the configured maximum number of results without managing API limits or offsets yourself. Built-in error handling, request timeouts, and input validation are intended to keep data collection reliable from run to run.
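The automatic pagination described above can be pictured as splitting the requested total into a series of (start, count) requests. The following is a minimal sketch of that idea, not the actor's actual code: the function name page_offsets and the page size of 100 are assumptions made for illustration.

```python
from typing import Iterator, Tuple


def page_offsets(max_results: int, page_size: int = 100) -> Iterator[Tuple[int, int]]:
    """Yield (start, count) pairs that together cover max_results items.

    page_size is a hypothetical value; the real actor may use a different one.
    """
    if max_results < 0 or page_size <= 0:
        raise ValueError("max_results must be >= 0 and page_size must be > 0")
    start = 0
    while start < max_results:
        # The last page may be shorter than page_size.
        count = min(page_size, max_results - start)
        yield start, count
        start += count


if __name__ == "__main__":
    for start, count in page_offsets(250):
        print(f"request {count} results starting at offset {start}")
```

Each pair would correspond to one request to the upstream API, with the final request trimmed so the total never exceeds the requested maximum.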

Why use this actor

Manually querying APIs and handling pagination, rate limits, and data normalization is tedious and error-prone. This actor automates the entire process. Simply provide your search parameters, set the maximum number of results you want, and let the actor handle the rest. The data is stored in a structured dataset that you can export as JSON, CSV, or Excel. You can integrate this actor into larger workflows using the Apify API, schedule it for recurring data collection, or trigger it from your own applications via webhooks.

Input parameters

searchQuery (string, required): The search term to query arXiv papers. Default: "machine learning".
maxResults (integer, optional): Maximum number of results to return. Default: 100. Range: 1-1000.

All inputs are validated at startup with sensible defaults applied when values are missing. The actor will log warnings for any misconfigured options and continue with safe defaults rather than failing outright.
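The validation behavior described above (warn and fall back to a default instead of failing) might look roughly like the sketch below. The function name validate_input and the choice to replace an out-of-range maxResults with the default rather than clamping it are assumptions; only the parameter names, defaults, and range come from the list above.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger("arxiv_input")

DEFAULT_QUERY = "machine learning"
DEFAULT_MAX_RESULTS = 100
MIN_RESULTS, MAX_RESULTS = 1, 1000


def validate_input(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Return a cleaned input dict, logging a warning for each fallback."""
    query = raw.get("searchQuery")
    if not isinstance(query, str) or not query.strip():
        logger.warning("searchQuery missing or invalid; using %r", DEFAULT_QUERY)
        query = DEFAULT_QUERY

    max_results = raw.get("maxResults", DEFAULT_MAX_RESULTS)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_results, bool) or not isinstance(max_results, int):
        logger.warning("maxResults is not an integer; using %d", DEFAULT_MAX_RESULTS)
        max_results = DEFAULT_MAX_RESULTS
    elif not MIN_RESULTS <= max_results <= MAX_RESULTS:
        # Assumption: fall back to the default instead of clamping.
        logger.warning("maxResults out of range; using %d", DEFAULT_MAX_RESULTS)
        max_results = DEFAULT_MAX_RESULTS

    return {"searchQuery": query.strip(), "maxResults": max_results}


if __name__ == "__main__":
    print(validate_input({"searchQuery": "graph neural networks", "maxResults": 5000}))
```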

Output data

Each result in the dataset contains the following fields:

arxivId: The unique arXiv identifier for the paper
title: The title of the research paper
summary: The abstract or summary of the paper
authors: Comma-separated list of author names
published: The publication date
categories: Subject categories
pdfLink: Direct link to the PDF version

All string fields are null-checked for consistency: a missing or undefined value is stored as null rather than as an empty string.
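One way to picture the null-checking described above is a small normalization step applied to every record before it is stored. This is a sketch under stated assumptions, not the actor's implementation: the helper names clean_string and normalize_record are hypothetical, and treating whitespace-only strings as missing is an assumption.

```python
from typing import Any, Dict, Optional

# Field names taken from the output list above.
FIELDS = ("arxivId", "title", "summary", "authors", "published", "categories", "pdfLink")


def clean_string(value: Any) -> Optional[str]:
    """Return a stripped string, or None for missing, non-string, or blank values."""
    if not isinstance(value, str):
        return None
    stripped = value.strip()
    # Assumption: blank strings count as missing and become None (JSON null).
    return stripped or None


def normalize_record(raw: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Build a record with every expected field present, using None for gaps."""
    return {field: clean_string(raw.get(field)) for field in FIELDS}


if __name__ == "__main__":
    print(normalize_record({"arxivId": "2301.00001", "title": "  Example title  "}))
```

Because every record carries the same keys, downstream exports get a stable set of columns even when the source omits a value.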

Example output

{
    "arxivId": "2301.00001",
    "title": "Deep Learning for Natural Language Processing",
    "summary": "We present a novel approach...",
    "authors": "John Smith, Jane Doe",
    "published": "2023-01-15T00:00:00Z",
    "categories": "cs.CL",
    "pdfLink": "http://arxiv.org/pdf/2301.00001"
}
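Since authors arrives as a single comma-separated string and published as an ISO 8601 timestamp, a consumer will often want to convert both. The sketch below shows one way to do that with the standard library; the function name parse_record is hypothetical, and splitting on commas assumes that no individual author name contains a comma.

```python
import json
from datetime import datetime
from typing import Any, Dict

EXAMPLE = """{
    "arxivId": "2301.00001",
    "title": "Deep Learning for Natural Language Processing",
    "summary": "We present a novel approach...",
    "authors": "John Smith, Jane Doe",
    "published": "2023-01-15T00:00:00Z",
    "categories": "cs.CL",
    "pdfLink": "http://arxiv.org/pdf/2301.00001"
}"""


def parse_record(text: str) -> Dict[str, Any]:
    """Parse one dataset item, turning authors into a list and published into a datetime."""
    record = json.loads(text)
    # Null fields become an empty list or None instead of raising.
    authors_raw = record.get("authors") or ""
    authors = [name.strip() for name in authors_raw.split(",") if name.strip()]
    published_raw = record.get("published")
    # Replace the trailing "Z" so fromisoformat also works on Python versions before 3.11.
    published = (
        datetime.fromisoformat(published_raw.replace("Z", "+00:00"))
        if published_raw
        else None
    )
    return {**record, "authors": authors, "published": published}


if __name__ == "__main__":
    parsed = parse_record(EXAMPLE)
    print(parsed["authors"], parsed["published"].date())
```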

Pricing

This actor is available on the Apify platform with transparent usage-based pricing. Each run incurs a small startup cost of approximately $0.005 per start, plus roughly $0.01 per result collected. Actual costs depend on the number of results, API response times, and memory allocation. You can control costs by setting the maxResults parameter to limit the number of results collected per run. For high-volume use cases, consider running the actor on a schedule during off-peak hours to optimize platform resource usage.
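As a rough budgeting aid, the approximate figures quoted above can be combined into a simple estimate. The sketch below uses only those two numbers and ignores the other cost factors the text mentions (API response times, memory allocation), so it should be read as an estimate rather than a billing formula; the function name estimate_cost is hypothetical.

```python
from decimal import Decimal

# Approximate figures quoted in the pricing text; actual charges may differ.
START_FEE = Decimal("0.005")      # per actor start
PER_RESULT_FEE = Decimal("0.01")  # per result collected


def estimate_cost(results: int, runs: int = 1) -> Decimal:
    """Estimate the cost in USD of collecting `results` items over `runs` starts."""
    if results < 0 or runs < 0:
        raise ValueError("results and runs must be non-negative")
    return START_FEE * runs + PER_RESULT_FEE * results


if __name__ == "__main__":
    print(f"1,000 results in one run: ${estimate_cost(1000)}")
```

Under these figures, a single run collecting 1,000 results comes to about $10.005, in line with the listed price of $10.00 per 1,000 results plus one start fee.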

More scrapers from brave_paradise

Visit the brave_paradise profile on Apify to explore the full collection of specialized data scrapers and automation tools.
