Pricing

from $9.00 / 1,000 results

Arxiv Keyword Spider

Arxiv Keyword Spider efficiently scrapes arXiv.org for research papers using keywords, delivering comprehensive metadata like titles, authors, abstracts, and categories. Perfect for academic research, market analysis, and trend monitoring.

Rating

0.0

(0)

Developer

GetDataForMe

Last modified: 3 months ago

Description

Introduction

The Arxiv Keyword Spider is a powerful Apify Actor designed to scrape and extract research papers from arXiv.org based on user-defined keywords. It provides comprehensive metadata for each paper, including titles, authors, abstracts, and categorization, enabling efficient data collection for academic research, market analysis, and trend monitoring. This tool streamlines the process of gathering insights from the vast arXiv repository, saving time and effort for users seeking targeted scientific information.

Features

Keyword-Based Search: Perform precise queries on arXiv to retrieve relevant papers matching specific terms or topics.
Comprehensive Metadata Extraction: Captures essential details such as paper IDs, URLs, titles, authors, abstracts, and categories for thorough analysis.
High Reliability: Built on robust scraping technology to ensure accurate and consistent data retrieval from arXiv's dynamic content.
Scalable Performance: Handles large volumes of results efficiently, with options for pagination and filtering to manage output size.
Structured Output: Delivers data in clean JSON format, ready for integration into databases, analytics tools, or downstream processing.
Error Handling: Includes built-in mechanisms to manage rate limits, network issues, and incomplete data gracefully.
No Coding Required: User-friendly interface on Apify Store for easy configuration and execution without technical expertise.

Input Parameters

Parameter	Type	Required	Description	Example
Query	string	No	The keyword or phrase to search for in arXiv papers. Defaults to a basic term if not specified.	"machine learning"

Example Usage

To run the Arxiv Keyword Spider, configure the input parameters in the Apify console or via API. Here's an example input JSON:

{
  "Query": "artificial intelligence"
}

This will search for papers related to "artificial intelligence". The output will be a JSON array of objects, each representing a paper. Example output:

[
  {
    "arxiv_id": "2604.06123",
    "abstract_url": "https://arxiv.org/abs/2604.06123",
    "pdf_url": "https://arxiv.org/pdf/2604.06123",
    "title": "A Large-Scale Empirical Comparison of Meta-Learners and Causal Forests for Heterogeneous Treatment Effect Estimation in Marketing Uplift Modeling",
    "authors": [
      "Aman Singh"
    ],
    "abstract": "Estimating Conditional Average Treatment Effects (CATE) at the individual level is central to precision marketing, yet systematic benchmarking of uplift modeling methods at industrial scale remains limited. We present UpliftBench, an empirical evaluation of four CATE estimators: S-Learner, T-Learner, X-Learner (all with LightGBM base learners), and Causal Forest (EconML), applied to the Criteo Uplift v2.1 dataset comprising 13.98 million customer records. The near-random treatment assignment (propensity AUC = 0.509) provides strong internal validity for causal estimation. Evaluated via Qini coefficient and cumulative gain curves, the S-Learner achieves the highest Qini score of 0.376, with the top 20% of customers ranked by predicted CATE capturing 77.7% of all incremental conversions, a 3.9x improvement over random targeting. SHAP analysis identifies f8 as the dominant heterogeneous treatment effect (HTE) driver among the 12 anonymized covariates. Causal Forest uncertainty quantification reveals that 1.9% of customers are confident persuadables (lower 95% CI > 0) and 0.1% are confident sleeping",
    "primary_category": "stat.CO",
    "all_categories": [
      "stat.CO",
      "cs.LG",
      "econ.EM",
      "stat.ME"
    ],
    "actor_id": "nzczPpnpwdctoDoPa",
    "run_id": "Vuycmdfs1pUgS6tsg"
  },
  {
    "arxiv_id": "2604.01708",
    "abstract_url": "https://arxiv.org/abs/2604.01708",
    "pdf_url": "https://arxiv.org/pdf/2604.01708",
    "title": "OpenGo: An OpenClaw-Based Robotic",
    "authors": [
      "Hanbing Li",
      "Xuewei Cao",
      "Zhiwen Zeng",
      "Yuhan Wu",
      "Yanyong Zhang",
      "Yan Xia"
    ],
    "abstract": "Adaptation to complex tasks and multiple scenarios remains a significant challenge for a single robot agent. The ability to acquire organize, and switch between a wide range of skills in real time, particularly in dynamic environments, has become a fundamental requirement for embodied intelligence. We introduce OpenGo, an OpenClaw-powered embodied robotic",
    "primary_category": "cs.RO",
    "all_categories": [
      "cs.RO",
      "cs.AI"
    ],
    "actor_id": "nzczPpnpwdctoDoPa",
    "run_id": "Vuycmdfs1pUgS6tsg"
  },
  {
    "arxiv_id": "2603.29271",
    "abstract_url": "https://arxiv.org/abs/2603.29271",
    "pdf_url": "https://arxiv.org/pdf/2603.29271",
    "title": "ConInfer: Context-Aware Inference for Training-Free Open-Vocabulary Remote Sensing Segmentation",
    "authors": [
      "Wenyang Chen",
      "Zhanxuan Hu",
      "Yaping Zhang",
      "Hailong Ning",
      "Yonghang Tai"
    ],
    "abstract": "Training-free open-vocabulary remote sensing segmentation (OVRSS), empowered by vision-language models, has emerged as a promising paradigm for achieving category-agnostic semantic understanding in remote sensing imagery. Existing approaches mainly focus on enhancing feature representations or mitigating modality discrepancies to improve patch-level prediction accuracy. However, such independent prediction schemes are fundamentally misaligned with the intrinsic characteristics of remote sensing data. In real-world applications, remote sensing scenes are typically large-scale and exhibit strong spatial as well as semantic correlations, making isolated patch-wise predictions insufficient for accurate segmentation. To address this limitation, we propose ConInfer, a context-aware inference framework for OVRSS that performs joint prediction across multiple spatial units while explicitly modeling their inter-unit semantic dependencies. By incorporating global contextual cues, our method significantly enhances segmentation consistency, robustness, and generalization in complex remote sensing environments. Extensive experiments on multiple benchmark datasets demonstrate that our approach consistently surpasses state-of-the-art per-pixel VLM-based baselines such as SegEarth-OV, achieving average improvements of 2.80% and 6.13% on open-vocabulary semantic segmentation and object extraction tasks, respectively. The implementation code is available at: https://github.com/",
    "primary_category": "cs.CV",
    "all_categories": [
      "cs.CV"
    ],
    "actor_id": "nzczPpnpwdctoDoPa",
    "run_id": "Vuycmdfs1pUgS6tsg"
  }
]
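
Once a run finishes, the array above can be processed with ordinary JSON tooling. The following is a minimal Python sketch that tallies how often each category appears across the returned papers. The sample records are abbreviated from the example output, and the function name `count_categories` is hypothetical, not part of the Actor.

```python
from collections import Counter


def count_categories(papers):
    """Count how many papers list each arXiv category in all_categories."""
    counts = Counter()
    for paper in papers:
        # Records without the field are skipped rather than raising KeyError.
        counts.update(paper.get("all_categories", []))
    return counts


# Abbreviated copies of the three records shown in the example output.
sample = [
    {"arxiv_id": "2604.06123", "all_categories": ["stat.CO", "cs.LG", "econ.EM", "stat.ME"]},
    {"arxiv_id": "2604.01708", "all_categories": ["cs.RO", "cs.AI"]},
    {"arxiv_id": "2603.29271", "all_categories": ["cs.CV"]},
]

for category, total in count_categories(sample).most_common():
    print(category, total)
```

The same pattern works for `primary_category` or `authors` when the goal is a quick view of which topics or people dominate a result set.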

Use Cases

Academic Research: Quickly gather papers on emerging topics like AI or quantum computing for literature reviews.
Market Analysis: Monitor trends in fields such as machine learning or data science to inform business strategies.
Competitive Intelligence: Track publications from specific authors or institutions for industry insights.
Content Aggregation: Build datasets of abstracts and metadata for blogs, newsletters, or educational platforms.
Trend Monitoring: Identify popular categories and keywords in scientific discourse for forecasting.
Business Automation: Automate data collection for reports on technological advancements in sectors like robotics or statistics.

Installation and Usage

Search for "Arxiv Keyword Spider" in the Apify Store.
Click "Try for free" or "Run".
Configure input parameters (e.g., set your query keyword).
Click "Start" to begin extraction.
Monitor progress in the log.
Export results in your preferred format (JSON, CSV, Excel).
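
The console steps above can also be driven through the Apify REST API. The following is a minimal sketch rather than the developer's documented method. It assumes Apify's `run-sync-get-dataset-items` endpoint, an API token supplied by the caller, and the Actor ID that appears in the `actor_id` field of the example output. The function names are hypothetical.

```python
import json
import urllib.request

# Assumption: copied from the "actor_id" field in the example output.
ACTOR_ID = "nzczPpnpwdctoDoPa"
API_BASE = "https://api.apify.com/v2"


def build_run_request(query, token, actor_id=ACTOR_ID):
    """Build a POST request that runs the Actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    # The input body mirrors the example input JSON: a single "Query" field.
    body = json.dumps({"Query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def run_spider(query, token):
    """Send the request and decode the JSON array of paper records."""
    request = build_run_request(query, token)
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `run_spider("artificial intelligence", token)` with a valid token should return a list of paper records in the shape shown under Example Usage. Building the request separately from sending it keeps the URL and payload easy to inspect before any run is billed.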

Output Format

The output is a JSON array of objects, each containing metadata for a single arXiv paper. Key fields include:

arxiv_id: Unique identifier for the paper.
abstract_url and pdf_url: Direct links to the abstract and PDF.
title: Full title of the paper.
authors: Array of author names.
abstract: Summary text of the paper.
primary_category and all_categories: arXiv category codes (for example, cs.LG), with the primary category listed first in all_categories in the example output.
actor_id and run_id: Identifiers for the Apify run.

This structured format ensures easy parsing and integration.
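
As an illustration of parsing, the fields listed above map naturally onto a small typed record. This is a sketch only: the `Paper` class and `from_record` helper are hypothetical names, and the choice to treat `authors` and `all_categories` as optional is an assumption, not something the Actor guarantees.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Paper:
    """One output record, limited to the paper metadata fields."""
    arxiv_id: str
    abstract_url: str
    pdf_url: str
    title: str
    authors: Tuple[str, ...]
    abstract: str
    primary_category: str
    all_categories: Tuple[str, ...]

    @classmethod
    def from_record(cls, record):
        """Build a Paper from one decoded JSON object; run metadata is ignored."""
        return cls(
            arxiv_id=record["arxiv_id"],
            abstract_url=record["abstract_url"],
            pdf_url=record["pdf_url"],
            title=record["title"],
            # Assumption: list fields may be absent, so default to empty.
            authors=tuple(record.get("authors", [])),
            abstract=record["abstract"],
            primary_category=record["primary_category"],
            all_categories=tuple(record.get("all_categories", [])),
        )


# Values taken from the third record in the example output (abstract shortened).
record = {
    "arxiv_id": "2603.29271",
    "abstract_url": "https://arxiv.org/abs/2603.29271",
    "pdf_url": "https://arxiv.org/pdf/2603.29271",
    "title": "ConInfer: Context-Aware Inference for Training-Free Open-Vocabulary Remote Sensing Segmentation",
    "authors": ["Wenyang Chen", "Zhanxuan Hu", "Yaping Zhang", "Hailong Ning", "Yonghang Tai"],
    "abstract": "Training-free open-vocabulary remote sensing segmentation (OVRSS), ...",
    "primary_category": "cs.CV",
    "all_categories": ["cs.CV"],
    "actor_id": "nzczPpnpwdctoDoPa",
    "run_id": "Vuycmdfs1pUgS6tsg",
}
paper = Paper.from_record(record)
print(paper.arxiv_id, paper.primary_category, len(paper.authors))
```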

Error Handling

The Actor includes error handling for common issues such as network timeouts, invalid queries, and arXiv site changes. If a run fails, check the run log for details. For persistent problems, retry with an adjusted query or contact support.
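
For callers who start runs from their own scripts, retrying can also be handled on the client side. The following is a generic sketch of retry with exponential backoff. It does not describe the Actor's internal error handling, and the function name and default values are illustrative assumptions.

```python
import time


def call_with_retries(func, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call func(), retrying on network-style errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except OSError as error:
            # urllib.error.URLError and socket timeouts are OSError subclasses.
            last_error = error
            if attempt < attempts - 1:
                # Wait base_delay, then 2x, then 4x, ... between attempts.
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

The `sleep` parameter exists so the waiting can be replaced in tests. In normal use, `func` would be a zero-argument callable that starts the run or downloads its results.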

Rate Limiting and Best Practices

arXiv may impose rate limits; the Actor respects these to avoid bans. Best practices include using specific queries to limit the number of results, running during off-peak hours, and exporting data incrementally. Avoid very broad keywords, which can return very large result sets.
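
Incremental export can be sketched as paging through a finished run's dataset instead of downloading it in one request. The example below assumes Apify's dataset items endpoint with its `offset` and `limit` query parameters. The function names and the page size of 100 are illustrative choices, and the dataset ID must come from your own run.

```python
import json
import urllib.parse
import urllib.request


def dataset_page_url(dataset_id, offset, limit):
    """URL for one page of items from an Apify dataset."""
    params = urllib.parse.urlencode({"format": "json", "offset": offset, "limit": limit})
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{params}"


def fetch_dataset_page(dataset_id, token, offset, limit):
    """Download and decode one page of items (performs a network call)."""
    request = urllib.request.Request(
        dataset_page_url(dataset_id, offset, limit),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


def iter_items(fetch, page_size=100):
    """Yield items page by page; fetch(offset, limit) must return a list."""
    offset = 0
    while True:
        page = fetch(offset, page_size)
        yield from page
        # A short page means the dataset has been exhausted.
        if len(page) < page_size:
            return
        offset += len(page)
```

A typical call would be `iter_items(lambda offset, limit: fetch_dataset_page(dataset_id, token, offset, limit))`, which keeps only one page in memory at a time.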

Limitations and Considerations

Results depend on arXiv's search capabilities and may not include every relevant paper if the query is too vague. Abstracts and titles can be truncated in the output, as several are in the example above. Ensure compliance with arXiv's terms of use. For large datasets, consider pagination to manage memory.
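
Because truncation is not flagged in the output, a simple heuristic can help identify records whose full abstract should be re-read from `abstract_url`. The sketch below treats an abstract as probably truncated when it does not end in sentence-final punctuation. This is an assumption-based heuristic with hypothetical function names, and it will misfire on abstracts that legitimately end in a URL or a formula.

```python
# Endings that usually indicate a complete final sentence.
SENTENCE_ENDINGS = (".", "!", "?", '."', ".)")


def looks_truncated(text):
    """Guess whether a non-empty text was cut off mid-sentence."""
    stripped = text.rstrip()
    return bool(stripped) and not stripped.endswith(SENTENCE_ENDINGS)


def urls_to_recheck(papers):
    """Return abstract_url for every paper whose abstract looks truncated."""
    return [
        paper["abstract_url"]
        for paper in papers
        if looks_truncated(paper.get("abstract", ""))
    ]


# Abstract endings taken from the example output, plus one complete sentence.
sample = [
    {"abstract_url": "https://arxiv.org/abs/2604.06123",
     "abstract": "... and 0.1% are confident sleeping"},
    {"abstract_url": "https://arxiv.org/abs/2603.29271",
     "abstract": "The implementation code is available at: https://github.com/"},
    {"abstract_url": "https://arxiv.org/abs/0000.00000",  # placeholder record
     "abstract": "We present a complete abstract."},
]
print(urls_to_recheck(sample))
```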

Support

For custom/simplified outputs or bug reports, please contact:

Email: support@getdataforme.com
Subject line: "custom support"
Contact form: https://getdataforme.com/contact/

We're here to help you get the most out of this Actor!
