# arXiv Research Trend Monitor (`techionik9993/arxiv-research-trend-monitor`) Actor

Find arXiv papers for AI research monitoring, academic trend tracking, technical due diligence, startup research, and market intelligence.

This Actor uses the public arXiv API. It is fast, low-cost, and does not need a proxy by default.

- **URL**: https://apify.com/techionik9993/arxiv-research-trend-monitor.md
- **Developed by:** [Techionik](https://apify.com/techionik9993) (community)
- **Categories:** AI, Developer tools, News
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded, 1 bookmark
- **User rating**: No ratings yet

## Pricing

From $1.20 / 1,000 arXiv paper rows

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the price drops as your subscription plan tier rises.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
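
As a rough illustration of the per-event arithmetic, the sketch below estimates the charge for a given number of dataset rows. It assumes a flat rate of $1.20 per 1,000 rows, which is only the listed starting price, and it assumes rows are the only charged event. The helper name `estimate_cost_usd` is made up for this example.

```python
from decimal import Decimal

# Assumption: the listed "from" price. Discounted plans may pay less per row.
PRICE_PER_1000_ROWS_USD = Decimal("1.20")


def estimate_cost_usd(rows: int, price_per_1000: Decimal = PRICE_PER_1000_ROWS_USD) -> Decimal:
    """Estimate the pay-per-event charge for `rows` dataset rows."""
    if rows < 0:
        raise ValueError("rows must be non-negative")
    # Decimal avoids binary floating-point rounding in money arithmetic.
    return Decimal(rows) * price_per_1000 / Decimal(1000)


for rows in (50, 100, 2000):
    print(f"{rows} rows -> about ${estimate_cost_usd(rows):.2f}")
```

Under these assumptions, a run that writes 100 rows costs about $0.12 in row events.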

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, API, or MCP server.
"Actor" is always written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## arXiv Research Trend Monitor

Find arXiv papers for AI research monitoring, academic trend tracking, technical due diligence, startup research, and market intelligence.

This Actor uses the public arXiv API. It is fast, low-cost, and does not need a proxy by default.

### What It Returns

- Paper title, arXiv URL, and PDF URL
- Authors and categories
- Published and updated timestamps
- Source query or source category
- Relevance score and matched source terms
- Optional abstract text
- Monitoring mode for newly discovered papers

### Common Uses

- Track new AI/ML papers by topic
- Monitor research categories such as `cs.AI`, `cs.LG`, `cs.CL`, and `cs.CV`
- Build technical due diligence datasets
- Watch emerging academic trends
- Research startup/market themes
- Create alerts for new papers around a keyword

### Input Tips

Use focused research phrases:

- `large language model agents`
- `retrieval augmented generation`
- `computer vision transformer`
- `diffusion model`
- `robot learning`
- `graph neural network`

Use `maxResultsPerSource` when you provide several queries/categories and want balanced output.
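
To see why a per-source cap balances the output, consider the following minimal sketch. It is not the Actor's code: the function `apply_caps`, the in-order processing of sources, and the row shape are assumptions made for illustration. It treats `maxResultsPerSource = 0` as "no source cap" and stops once the overall `maxResults` cap is reached.

```python
def apply_caps(
    rows_by_source: dict[str, list[dict]],
    max_results: int,
    max_results_per_source: int = 0,
) -> list[dict]:
    """Apply an optional per-source cap, then the overall cap (hypothetical helper)."""
    output: list[dict] = []
    for rows in rows_by_source.values():
        # 0 means "no source cap", mirroring the input description.
        kept = rows if max_results_per_source == 0 else rows[:max_results_per_source]
        for row in kept:
            if len(output) >= max_results:
                return output
            output.append(row)
    return output
```

With three sources returning 80, 10, and 10 rows and `maxResults` set to 50, no source cap lets the first source fill all 50 rows, while a cap of 20 yields 20 + 10 + 10 = 40 rows spread across all three.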

### Result Limits

`maxResults` is a maximum cap, not a guarantee. The Actor only writes papers that pass your relevance filters.

The minimum `maxResults` is 50 so runs are commercially practical and users are charged for meaningful output.
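
The input schema describes `minimumRelevanceScore` as the minimum number of meaningful source words that must appear in the title, abstract, category, or author list. The sketch below is one plausible reading of that rule, not the Actor's actual scoring: the stop-word list, the tokenizer, and the field names `title`, `abstract`, `categories`, and `authors` are all assumptions.

```python
import re

# Hypothetical stop-word list: the Actor does not document which words count as "meaningful".
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def _words(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevance_score(source: str, paper: dict) -> tuple[int, list[str]]:
    """Count distinct meaningful source words found in the paper's metadata."""
    terms = [w for w in dict.fromkeys(_words(source)) if w not in STOP_WORDS]
    haystack = " ".join(
        [
            paper.get("title", ""),
            paper.get("abstract", ""),
            " ".join(paper.get("categories", [])),
            " ".join(paper.get("authors", [])),
        ]
    )
    present = set(_words(haystack))
    matched = [term for term in terms if term in present]
    return len(matched), matched


def passes_filter(source: str, paper: dict, minimum_relevance_score: int = 1) -> bool:
    """Keep a paper only if its score reaches the configured minimum."""
    score, _ = relevance_score(source, paper)
    return score >= minimum_relevance_score
```

Because this sketch matches whole words only, the query `graph neural network` scores 2 against a title containing "Graph Neural Networks". A stricter minimum therefore shrinks the output, which is why `maxResults` is a cap rather than a guarantee.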

### Monitoring

Enable `monitorChanges` to compare the current run against the previous snapshot in the Actor key-value store.

When `onlyChanges` is enabled, the dataset contains only newly detected papers. When it is disabled, the dataset contains the current full result set while the change snapshot is still stored for automation.
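
The interaction between `monitorChanges` and `onlyChanges` can be sketched as a set difference against the stored snapshot. This is an illustration under stated assumptions, not the Actor's implementation: the key `arxivUrl`, the function name, and the choice to merge new IDs into the old snapshot are hypothetical.

```python
def diff_against_snapshot(
    current_rows: list[dict],
    previous_ids: set[str],
    only_changes: bool,
) -> tuple[list[dict], set[str]]:
    """Return (rows to write to the dataset, snapshot to store for the next run)."""
    # A paper is "new" if its identifier was absent from the previous snapshot.
    new_rows = [row for row in current_rows if row["arxivUrl"] not in previous_ids]
    # onlyChanges=True writes only new papers; otherwise the full result set is written.
    dataset_rows = new_rows if only_changes else list(current_rows)
    # Assumption: the next snapshot keeps every identifier seen so far.
    next_snapshot = previous_ids | {row["arxivUrl"] for row in current_rows}
    return dataset_rows, next_snapshot
```

In both modes the snapshot is updated, so a later run can still detect what changed even when the dataset held the full result set.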

### Proxy Use

No proxy is needed by default. This Actor uses the public arXiv API and does not run a browser.

### Output

Each dataset row represents one arXiv paper. The primary output event is one dataset row.

### Notes

This Actor is not affiliated with arXiv or Cornell University. It reads public arXiv API results and stores structured rows in your Apify dataset.

# Actor input Schema

## `searchQueries` (type: `array`):

Research search terms. Use AI, ML, robotics, biotech, finance, security, or technical market phrases.

## `categories` (type: `array`):

Optional arXiv categories such as cs.AI, cs.LG, cs.CL, cs.CV, stat.ML, q-fin, or eess.

## `sortBy` (type: `string`):

Submitted date is best for monitoring fresh papers.

## `sortOrder` (type: `string`):

Descending returns newest or most relevant results first.

## `maxResults` (type: `integer`):

Maximum paper rows to write for the whole run. The minimum is 50 for better marketplace economics.

## `maxResultsPerSource` (type: `integer`):

Optional cap per query/category so one broad source does not fill the entire run. Use 0 for no source cap.

## `pagesPerSource` (type: `integer`):

How many arXiv API pages to scan per source. Each page requests up to 100 papers.

## `includeAbstract` (type: `boolean`):

Include paper abstract/summary in each dataset row.

## `minimumRelevanceScore` (type: `integer`):

Minimum number of meaningful source words that must appear in the title, abstract, category, or author list.

## `monitorChanges` (type: `boolean`):

Compare against the previous saved snapshot and detect newly found papers.

## `onlyChanges` (type: `boolean`):

When monitoring is enabled, write only newly detected papers to the dataset.

## Actor input object example

```json
{
  "searchQueries": [
    "diffusion model",
    "robot learning",
    "graph neural network"
  ],
  "categories": [
    "cs.AI",
    "cs.LG",
    "cs.CL"
  ],
  "sortBy": "submittedDate",
  "sortOrder": "descending",
  "maxResults": 100,
  "maxResultsPerSource": 0,
  "pagesPerSource": 3,
  "includeAbstract": true,
  "minimumRelevanceScore": 1,
  "monitorChanges": false,
  "onlyChanges": false
}
````
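
Before starting a run, it can help to check an input object on the client side. The sketch below mirrors the defaults, numeric bounds, and enum values listed in the OpenAPI specification later in this document. The helper names `build_input` and `max_papers_scanned` are made up, and treating each query and each category as one "source" is an assumption based on the field descriptions.

```python
import copy

# Defaults, bounds, and enums copied from the input schema in the OpenAPI specification.
DEFAULTS = {
    "searchQueries": [
        "large language model agents",
        "retrieval augmented generation",
        "computer vision transformer",
    ],
    "categories": [],
    "sortBy": "submittedDate",
    "sortOrder": "descending",
    "maxResults": 100,
    "maxResultsPerSource": 0,
    "pagesPerSource": 3,
    "includeAbstract": True,
    "minimumRelevanceScore": 1,
    "monitorChanges": False,
    "onlyChanges": False,
}
BOUNDS = {
    "maxResults": (50, 2000),
    "maxResultsPerSource": (0, 1000),
    "pagesPerSource": (1, 10),
    "minimumRelevanceScore": (0, 10),
}
ENUMS = {
    "sortBy": {"submittedDate", "lastUpdatedDate", "relevance"},
    "sortOrder": {"descending", "ascending"},
}
PAPERS_PER_PAGE = 100  # "Each page requests up to 100 papers."


def build_input(**overrides) -> dict:
    """Merge overrides into the defaults and reject out-of-schema values."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"Unknown input fields: {sorted(unknown)}")
    actor_input = {**copy.deepcopy(DEFAULTS), **overrides}
    for field, (low, high) in BOUNDS.items():
        if not low <= actor_input[field] <= high:
            raise ValueError(f"{field} must be between {low} and {high}")
    for field, allowed in ENUMS.items():
        if actor_input[field] not in allowed:
            raise ValueError(f"{field} must be one of {sorted(allowed)}")
    return actor_input


def max_papers_scanned(actor_input: dict) -> int:
    """Upper bound on papers requested from arXiv, assuming one source per query/category."""
    sources = len(actor_input["searchQueries"]) + len(actor_input["categories"])
    return sources * actor_input["pagesPerSource"] * PAPERS_PER_PAGE
```

For example, three queries plus one category at the default `pagesPerSource` of 3 would request at most 4 × 3 × 100 = 1,200 papers, of which at most `maxResults` rows are written.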

# Actor output Schema

## `results` (type: `string`):

Normal runs contain arXiv paper rows. When `onlyChanges` is enabled, rows contain only newly detected papers.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "searchQueries": [
        "large language model agents",
        "retrieval augmented generation",
        "computer vision transformer"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("techionik9993/arxiv-research-trend-monitor").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "searchQueries": [
        "large language model agents",
        "retrieval augmented generation",
        "computer vision transformer",
    ] }

# Run the Actor and wait for it to finish
run = client.actor("techionik9993/arxiv-research-trend-monitor").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "searchQueries": [
    "large language model agents",
    "retrieval augmented generation",
    "computer vision transformer"
  ]
}' |
apify call techionik9993/arxiv-research-trend-monitor --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=techionik9993/arxiv-research-trend-monitor",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "arXiv Research Trend Monitor",
        "description": "Find arXiv papers for AI research monitoring, academic trend tracking, technical due diligence, startup research, and market intelligence.\n\nThis Actor uses the public arXiv API. It is fast, low-cost, and does not need a proxy by default.",
        "version": "0.0",
        "x-build-id": "dp6OBCAZ8CoWC5cx1"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/techionik9993~arxiv-research-trend-monitor/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-techionik9993-arxiv-research-trend-monitor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/techionik9993~arxiv-research-trend-monitor/runs": {
            "post": {
                "operationId": "runs-sync-techionik9993-arxiv-research-trend-monitor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/techionik9993~arxiv-research-trend-monitor/run-sync": {
            "post": {
                "operationId": "run-sync-techionik9993-arxiv-research-trend-monitor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "searchQueries": {
                        "title": "Search queries",
                        "type": "array",
                        "description": "Research search terms. Use AI, ML, robotics, biotech, finance, security, or technical market phrases.",
                        "default": [
                            "large language model agents",
                            "retrieval augmented generation",
                            "computer vision transformer"
                        ],
                        "items": {
                            "type": "string"
                        }
                    },
                    "categories": {
                        "title": "arXiv categories",
                        "type": "array",
                        "description": "Optional arXiv categories such as cs.AI, cs.LG, cs.CL, cs.CV, stat.ML, q-fin, or eess.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "sortBy": {
                        "title": "Sort by",
                        "enum": [
                            "submittedDate",
                            "lastUpdatedDate",
                            "relevance"
                        ],
                        "type": "string",
                        "description": "Submitted date is best for monitoring fresh papers.",
                        "default": "submittedDate"
                    },
                    "sortOrder": {
                        "title": "Sort order",
                        "enum": [
                            "descending",
                            "ascending"
                        ],
                        "type": "string",
                        "description": "Descending returns newest or most relevant results first.",
                        "default": "descending"
                    },
                    "maxResults": {
                        "title": "Max results",
                        "minimum": 50,
                        "maximum": 2000,
                        "type": "integer",
                        "description": "Maximum paper rows to write for the whole run. The minimum is 50 for better marketplace economics.",
                        "default": 100
                    },
                    "maxResultsPerSource": {
                        "title": "Max results per source",
                        "minimum": 0,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Optional cap per query/category so one broad source does not fill the entire run. Use 0 for no source cap.",
                        "default": 0
                    },
                    "pagesPerSource": {
                        "title": "Pages per source",
                        "minimum": 1,
                        "maximum": 10,
                        "type": "integer",
                        "description": "How many arXiv API pages to scan per source. Each page requests up to 100 papers.",
                        "default": 3
                    },
                    "includeAbstract": {
                        "title": "Include abstract",
                        "type": "boolean",
                        "description": "Include paper abstract/summary in each dataset row.",
                        "default": true
                    },
                    "minimumRelevanceScore": {
                        "title": "Minimum relevance score",
                        "minimum": 0,
                        "maximum": 10,
                        "type": "integer",
                        "description": "Minimum number of meaningful source words that must appear in the title, abstract, category, or author list.",
                        "default": 1
                    },
                    "monitorChanges": {
                        "title": "Monitor changes since last run",
                        "type": "boolean",
                        "description": "Compare against the previous saved snapshot and detect newly found papers.",
                        "default": false
                    },
                    "onlyChanges": {
                        "title": "Output only new papers",
                        "type": "boolean",
                        "description": "When monitoring is enabled, write only newly detected papers to the dataset.",
                        "default": false
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
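
For projects without an official client, the `run-sync-get-dataset-items` path in the specification above can be called directly. The following sketch only builds the request with Python's standard library and does not send it. The helper name `build_run_sync_request` is made up here, and the token is passed as the `token` query parameter as the specification describes.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "techionik9993~arxiv-research-trend-monitor"


def build_run_sync_request(actor_input: dict, token: str) -> Request:
    """Build (but do not send) a POST request that runs the Actor synchronously."""
    query = urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items?{query}"
    body = json.dumps(actor_input).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


request = build_run_sync_request({"searchQueries": ["diffusion model"]}, "<YOUR_API_TOKEN>")
print(request.get_method(), request.full_url.split("?")[0])
```

Passing the result to `urllib.request.urlopen` would send it; per the summary in the specification, the response should then contain the run's dataset items. A real integration would also add a timeout and error handling, which this sketch omits.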
