# AI Papers Tracker (arXiv + PWC) (`ianymu/ai-papers-tracker`) Actor

Track new AI / agent / LLM research papers from arXiv + Papers With Code, filterable by keywords. Ranked by trending score (recency + match + category + code attached). Daily refresh for researchers and operators.

- **URL**: https://apify.com/ianymu/ai-papers-tracker.md
- **Developed by:** [Yanlong Mu](https://apify.com/ianymu) (community)
- **Categories:** AI, Developer tools
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor itself is free to use; you pay only for the Apify platform usage, which gets cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## AI Papers Tracker (arXiv + Papers With Code)

> **Track new AI / agent / LLM research papers from arXiv + Papers With Code, filtered by your own keyword list. Returns a ranked, daily-refreshable dataset of the most relevant recent work.**

### What does AI Papers Tracker do?

The AI / agent / LLM research output now runs at hundreds of new arXiv preprints per week, and **filtering signal from noise is impossible by hand**. This Actor tracks two of the highest-signal feeds — [arXiv](https://arxiv.org/) (cs.AI / cs.CL / cs.LG / cs.MA / cs.SE) and [Papers With Code](https://paperswithcode.com/) (papers that ship runnable code) — filters by your own keyword list (e.g. `llm agent`, `agent verification`, `code generation evaluation`), and returns a deduplicated, scored, ranked dataset.

You get the latest 30-day-window papers ranked by **trending score**: recency + how many of your keywords matched + relevant arXiv category + whether code is attached. Pair it with the Apify scheduler to run it daily and you have a personal arXiv digest that does not miss anything in your niche. Apify gives you free scheduling, REST API, webhooks, and integrations (Slack / Make / Zapier) on top.

### Why use AI Papers Tracker?

- **AI researchers**: a personalized "what's new in my subfield this week" digest, no manual scraping
- **Journalists & analysts**: discover papers worth covering before they hit Twitter
- **Founders & operators**: track benchmarks (agent coding, verification, eval) so your product does not lag the frontier
- **Investors**: map which labs / authors keep publishing in a thesis area
- **Course / curriculum builders**: weekly refresh of recommended reading

### How to use AI Papers Tracker

1. Open the **Input** tab
2. Edit the **Keywords** list — phrases that should appear in titles / abstracts (e.g. `RAG evaluation`, `tool-use agent`, `code review LLM`)
3. Set **Days back** (default 30) and **Max results** (default 30)
4. Click **Start**
5. Download the **Dataset** or the human-readable `ai-papers-report.md` from the **Storage** tab
6. Schedule it daily / weekly in the **Schedules** tab for a continuous feed

### Input

- **`keywords`** — array of phrases. Each phrase is searched independently across arXiv title / abstract + PWC search. Results are deduped by arXiv ID.
- **`daysBack`** — only include papers submitted in the last N days (1-365)
- **`maxResults`** — cap the dataset to the top N scored papers (1-200)

### Output

Each row in the dataset:

```json
{
  "title": "MAST: Multi-Agent System Failure Modes",
  "authors": ["Cemri et al."],
  "abstract": "First 400 chars of the abstract…",
  "arxivId": "2503.13657",
  "url": "https://arxiv.org/abs/2503.13657",
  "submittedAt": "2026-05-15",
  "categories": ["cs.AI", "cs.MA"],
  "source": "arxiv",
  "hasCode": true,
  "matchedKeywords": ["agent verification", "llm agent"],
  "trendingScore": 78
}
````

You can download the dataset in various formats such as JSON, HTML, CSV, or Excel.

### Data table

| Field | Type | Description |
|---|---|---|
| `title` | string | Paper title |
| `authors` | string\[] | Authors list (full for arXiv, first few for PWC) |
| `abstract` | string | Truncated to 400 chars |
| `arxivId` | string \| null | arXiv identifier (e.g. `2503.13657`) when available |
| `url` | string | Direct link to the paper page |
| `submittedAt` | string | Submission date in `YYYY-MM-DD` format |
| `categories` | string\[] | arXiv categories (e.g. `cs.AI`) |
| `source` | string | `arxiv` or `paperswithcode` |
| `hasCode` | boolean | Paper appears on Papers With Code (i.e. ships code) |
| `matchedKeywords` | string\[] | Which of your keywords matched this paper |
| `trendingScore` | number | 0-100 composite ranking score |
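
As an illustration of working with these fields, the sketch below keeps only rows that ship code and clear a score threshold, sorted best first. The function name `top_papers_with_code` and the `min_score` default are hypothetical choices for this example; only the field names come from the table above.

```python
def top_papers_with_code(rows, min_score=60):
    """Return rows with code attached and trendingScore >= min_score, best first.

    `rows` is a list of dataset items shaped like the rows described above.
    `min_score` is an illustrative threshold, not a value used by the Actor.
    """
    keep = [r for r in rows if r["hasCode"] and r["trendingScore"] >= min_score]
    return sorted(keep, key=lambda r: r["trendingScore"], reverse=True)
```

The `rows` list could be built from the items of a run's default dataset.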

### Trending score (max 100)

| Dimension | Max | What it measures |
|---|---|---|
| Recency | 50 | Closer to today within the window = higher |
| Keyword relevance | 32 | Number of distinct keywords matched (8 each) |
| Category fit | 10 | Counts cs.AI / cs.CL / cs.LG / cs.MA / cs.SE |
| Code attached | 8 | Paper appears on Papers With Code |
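
One way the four dimensions could combine is sketched below. The per-dimension maximums come from the table; everything else is an assumption for illustration. In particular, the linear recency decay and the all-or-nothing category rule are guesses, so this sketch will not necessarily reproduce the scores the Actor emits.

```python
from datetime import date

TRACKED_CATEGORIES = {"cs.AI", "cs.CL", "cs.LG", "cs.MA", "cs.SE"}


def trending_score(row, days_back=30, today=None):
    """Approximate a 0-100 trending score for one dataset row (hypothetical)."""
    today = today or date.today()
    age_days = (today - date.fromisoformat(row["submittedAt"])).days
    # Assumption: recency falls linearly from 50 (submitted today)
    # to 0 (submitted at the edge of the daysBack window).
    recency = 50 * min(1.0, max(0.0, 1 - age_days / days_back))
    # From the table: 8 points per distinct matched keyword, capped at 32.
    keywords = min(8 * len(set(row["matchedKeywords"])), 32)
    # Assumption: the full 10 points if any tracked category is present.
    category = 10 if TRACKED_CATEGORIES & set(row["categories"]) else 0
    # From the table: 8 points when code is attached.
    code = 8 if row["hasCode"] else 0
    return round(recency + keywords + category + code)
```

Passing `today` explicitly keeps the function deterministic, which helps when comparing scores across scheduled runs.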

### Pricing / Cost estimation

Pay-per-result: roughly **$0.05 per 10 papers** scored. A daily refresh tracking 4 keywords over 30 days typically returns 30-60 results, which comes to about $0.15-$0.30 per run, or **under $1/day**. The Apify free trial covers your first run end-to-end so you can validate output before subscribing.
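
The arithmetic behind that estimate can be sketched as follows. It takes the approximate rate quoted above at face value; the helper name `estimate_cost_usd` and its parameters are hypothetical, and actual billing depends on your Apify plan.

```python
# Approximate rate quoted in this README; treat it as an assumption,
# not an official price.
PRICE_PER_10_PAPERS_USD = 0.05


def estimate_cost_usd(results_per_run, runs_per_day=1, days=30):
    """Estimate total cost in USD for a schedule of runs."""
    per_run = results_per_run / 10 * PRICE_PER_10_PAPERS_USD
    return round(per_run * runs_per_day * days, 2)
```

For example, `estimate_cost_usd(60, days=1)` covers a single daily run at the upper end of the typical result count.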

### Tips / Advanced options

- **Cluster keywords semantically**: do not add both `LLM` and `large language model` — they will overlap ~80% and double your runtime.
- **arXiv rate limit**: this Actor sleeps 3.1 seconds between arXiv calls (arXiv's published rate-limit courtesy delay). With 8+ keywords expect a ~30s run.
- **Schedule weekly + diff**: for a real "what's new this week" digest, schedule weekly and diff the dataset against last week's run.
- **Combine with `mcp-server-catalog`**: this Actor catalogues the *research*; that one catalogues the *tools*. Together = full landscape view.
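
The weekly diff from the tips above could be done along these lines. This is a minimal sketch, assuming both runs' dataset items are already loaded as lists of dicts; `new_papers` is a hypothetical helper, not part of the Actor.

```python
def new_papers(this_week, last_week):
    """Return rows from this_week whose arxivId did not appear in last_week.

    Rows without an arXiv ID cannot be matched across runs, so this sketch
    always treats them as new.
    """
    seen = {row["arxivId"] for row in last_week if row.get("arxivId")}
    return [row for row in this_week if row.get("arxivId") not in seen]
```

Each list can be filled from a run's default dataset, for instance with `iterate_items()` from the Python client.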

### FAQ, disclaimers, and support

#### Why arXiv + Papers With Code specifically?

arXiv = full firehose of preprints (titles + abstracts indexed). Papers With Code = the subset that ships code (a strong quality + reproducibility signal). Together they cover ~99% of what an AI engineer or researcher actually needs to read.

#### Is the data current?

Both APIs return live data at run time. Schedule daily runs to keep your feed current.

#### Why is paper X not in the results?

Either (a) it was submitted outside the `daysBack` window, (b) none of your keywords matched the title or abstract, or (c) it is published in a non-cs.\* arXiv category (e.g. `stat.ML` — open an issue to request).

#### Legality

The arXiv API and Papers With Code REST API are both **public, documented APIs** with no rate limit beyond the arXiv 3-second courtesy delay (which this Actor respects). No login, no scraping of restricted content.

#### Support

Open an issue on the **Issues** tab of this Actor's Apify Console page, or via the author's GitHub.

Built by **Ian Mu** ([github.com/ianymu](https://github.com/ianymu)) — author of [`verify-before-stop`](https://github.com/ianymu/claude-verify-before-stop), a Claude Code harness that gates session-stop on real, verified output.

# Actor input Schema

## `keywords` (type: `array`):

Phrases to search in arXiv abstracts/titles and Papers With Code. Each becomes its own search; results are deduped by arXiv ID.
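
The per-keyword search and dedup step could be pictured roughly as below. This is a sketch, not the Actor's actual code: `merge_results` and the shape of `hits_by_keyword` are assumptions, and falling back to the URL for rows without an arXiv ID is a choice made here, not documented behavior.

```python
def merge_results(hits_by_keyword):
    """Merge per-keyword hit lists into one row per paper.

    hits_by_keyword maps each keyword to a list of row dicts (hypothetical
    shape). Each merged row records every keyword that surfaced it.
    """
    merged = {}
    for keyword, hits in hits_by_keyword.items():
        for hit in hits:
            # Assumption: key by arXiv ID, falling back to the URL.
            key = hit.get("arxivId") or hit["url"]
            row = merged.setdefault(key, {**hit, "matchedKeywords": []})
            if keyword not in row["matchedKeywords"]:
                row["matchedKeywords"].append(keyword)
    return list(merged.values())
```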

## `daysBack` (type: `integer`):

Only include papers submitted in the last N days

## `maxResults` (type: `integer`):

Cap the dataset to this many top-scored papers

## Actor input object example

```json
{
  "keywords": [
    "llm agent",
    "agent verification",
    "code generation evaluation",
    "agent coding benchmark"
  ],
  "daysBack": 30,
  "maxResults": 30
}
```
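
If you build this input programmatically, a small validator can catch out-of-range values before a run starts. The sketch below uses the bounds stated in this document (1-365 for `daysBack`, 1-200 for `maxResults`); the helper `build_input` is hypothetical, and rejecting an empty keyword list is an assumption, since the schema does not say whether `keywords` is required.

```python
def build_input(keywords, days_back=30, max_results=30):
    """Build and validate an input object for the Actor (illustrative)."""
    keywords = [k.strip() for k in keywords if k.strip()]
    if not keywords:
        # Assumption: an empty keyword list is treated as an error here.
        raise ValueError("at least one non-empty keyword is required")
    if not 1 <= days_back <= 365:
        raise ValueError("daysBack must be between 1 and 365")
    if not 1 <= max_results <= 200:
        raise ValueError("maxResults must be between 1 and 200")
    return {"keywords": keywords, "daysBack": days_back, "maxResults": max_results}
```

The returned dict can be passed as `run_input` to the Python client call shown in the API section.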

# Actor output Schema

## `results` (type: `string`):

No description

## `report` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

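// Prepare Actor input (values shown are the defaults from the input schema)
const input = {
    keywords: [
        'llm agent',
        'agent verification',
        'code generation evaluation',
        'agent coding benchmark',
    ],
    daysBack: 30,
    maxResults: 30,
};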

// Run the Actor and wait for it to finish
const run = await client.actor("ianymu/ai-papers-tracker").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

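# Prepare the Actor input (values shown are the defaults from the input schema)
run_input = {
    "keywords": [
        "llm agent",
        "agent verification",
        "code generation evaluation",
        "agent coding benchmark",
    ],
    "daysBack": 30,
    "maxResults": 30,
}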

# Run the Actor and wait for it to finish
run = client.actor("ianymu/ai-papers-tracker").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{}' |
apify call ianymu/ai-papers-tracker --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=ianymu/ai-papers-tracker",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "AI Papers Tracker (arXiv + PWC)",
        "description": "Track new AI / agent / LLM research papers from arXiv + Papers With Code, filterable by keywords. Ranked by trending score (recency + match + category + code attached). Daily refresh for researchers and operators.",
        "version": "0.0",
        "x-build-id": "MTLPcRzrrZ8VuDOYl"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/ianymu~ai-papers-tracker/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-ianymu-ai-papers-tracker",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/ianymu~ai-papers-tracker/runs": {
            "post": {
                "operationId": "runs-sync-ianymu-ai-papers-tracker",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/ianymu~ai-papers-tracker/run-sync": {
            "post": {
                "operationId": "run-sync-ianymu-ai-papers-tracker",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "keywords": {
                        "title": "Keywords to track",
                        "type": "array",
                        "description": "Phrases to search in arXiv abstracts/titles and Papers With Code. Each becomes its own search; results are deduped by arXiv ID.",
                        "default": [
                            "llm agent",
                            "agent verification",
                            "code generation evaluation",
                            "agent coding benchmark"
                        ],
                        "items": {
                            "type": "string"
                        }
                    },
                    "daysBack": {
                        "title": "Days back",
                        "minimum": 1,
                        "maximum": 365,
                        "type": "integer",
                        "description": "Only include papers submitted in the last N days",
                        "default": 30
                    },
                    "maxResults": {
                        "title": "Max results",
                        "minimum": 1,
                        "maximum": 200,
                        "type": "integer",
                        "description": "Cap the dataset to this many top-scored papers",
                        "default": 30
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
