# HuggingFace Papers Scraper (`dadhalfdev/huggingface-papers-scraper`) Actor

Scrape trending HuggingFace Papers by day, week, or month. Get titles, dates, submitters, organizations, upvotes, abstracts, summaries, PDFs, project links, and agent-ready commands for AI agents, RAG pipelines, research monitoring, and automation.

- **URL**: https://apify.com/dadhalfdev/huggingface-papers-scraper.md
- **Developed by:** [Marco Rodrigues](https://apify.com/dadhalfdev) (community)
- **Categories:** News, Other
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

$20.00 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
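
As a rough budgeting aid, the listed price can be turned into a per-run estimate. The sketch below assumes that every returned paper counts as exactly one billed result and that no other events are charged; `estimate_cost_usd` is a hypothetical helper, not part of any Apify library.

```python
# Listed price from the Pricing section: $20.00 per 1,000 results.
PRICE_PER_1000_RESULTS_USD = 20.00


def estimate_cost_usd(num_results: int) -> float:
    """Estimate the charge for a run, assuming one billed event per result."""
    if num_results < 0:
        raise ValueError("num_results must be non-negative")
    return round(num_results * PRICE_PER_1000_RESULTS_USD / 1000, 2)


# A full run of 100 papers under these assumptions:
print(estimate_cost_usd(100))  # 2.0
```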

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## 🤗 HuggingFace Papers Scraper

Track the latest AI research from [HuggingFace Papers](https://huggingface.co/papers) and turn trending papers into clean, structured data for agents, RAG systems, dashboards, and research workflows.

Choose a period (`Daily`, `Weekly`, or `Monthly`) plus an end date, and scrape up to **100 papers** with titles, dates, submitter details, organizations, upvotes, abstracts, summaries, PDF links, project pages, and the HuggingFace CLI command that agents can use to read the paper. The Actor starts from the end date and paginates to older papers.

![HuggingFace Papers](https://i.ibb.co/pBB8jM3H/Screenshot-From-2026-06-01-17-29-27.png)

### 💡 Perfect For

- **🤖 AI Agents:** Give agents fresh, structured research context with direct `pdf_url`, `project_url`, and `agent_command` fields.
- **📚 RAG Pipelines:** Index abstracts, summaries, metadata, and source URLs so assistants can answer questions about recent AI papers with citations.
- **🔬 Research Monitoring:** Track emerging models, benchmarks, datasets, and methods across daily, weekly, or monthly HuggingFace trends.
- **📈 Trend Analysis:** Compare upvotes, organizations, publication dates, and topics to spot fast-moving areas in AI.
- **⚙️ Automation Workflows:** Feed new paper metadata into Slack bots, Discord alerts, newsletters, spreadsheets, or internal agent workflows.
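
As one illustration of the trend-analysis use case, scraped records can be aggregated by organization. This is a minimal sketch: the three sample records are invented for demonstration, and only the `organization` and `upvotes` field names come from the output described in this README.

```python
from collections import Counter
from typing import Dict, List, Tuple


def upvotes_by_organization(records: List[Dict]) -> List[Tuple[str, int]]:
    """Sum upvotes per organization, highest total first."""
    totals: Counter = Counter()
    for record in records:
        # Treat a missing or null organization as "unknown".
        org = record.get("organization") or "unknown"
        totals[org] += record.get("upvotes") or 0
    return totals.most_common()


# Invented sample data, for illustration only.
sample = [
    {"organization": "org-a", "upvotes": 12},
    {"organization": "org-b", "upvotes": 5},
    {"organization": "org-a", "upvotes": 3},
]
print(upvotes_by_organization(sample))  # [('org-a', 15), ('org-b', 5)]
```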

### ✨ Why This Actor Matters

AI agents are only as useful as the context they can reliably access. HuggingFace Papers is one of the best places to discover what the AI community is reading right now, but agents and pipelines need structured fields, stable links, and normalized dates instead of raw HTML.

This Actor turns that fast-moving research feed into data that is easy to search, rank, summarize, embed, and route into automated systems.

### 📦 What's Inside The Data?

For every paper, the Actor returns:

- **Core metadata:** `url`, `title`, `published_date`, `submitted_date`
- **Submitter details:** `submitted_by`, `submitted_by_url`
- **Organization details:** `organization`, `organization_url`
- **Engagement:** `upvotes`
- **Research content:** `abstract`, `summary`
- **Useful links:** `pdf_url`, `project_url`
- **Agent-ready command:** `agent_command`, for example `hf papers read 2605.29486`

### 🚀 Quick Start

1. Open the Actor on Apify or run it locally.
2. Choose the `period`: `Daily`, `Weekly`, or `Monthly`.
3. Choose `end_date`. If `end_date` is omitted or set in the future, the Actor uses the current date.
4. Set `max_papers` to the number of papers you want, up to **100**.
5. Start the Actor and export the results as JSON, CSV, Excel, or through the Apify API.
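
The `end_date` fallback in step 3 can be sketched as below. This is an illustration of the documented behavior, not the Actor's actual source; `resolve_end_date` is a hypothetical helper, and `today` is passed in explicitly so the logic is easy to test.

```python
from datetime import date
from typing import Optional


def resolve_end_date(end_date: Optional[str], today: date) -> date:
    """Return the effective end date: the given date, capped at today."""
    if not end_date:
        # Omitted or empty: fall back to the current date.
        return today
    parsed = date.fromisoformat(end_date)  # expects YYYY-MM-DD
    # A date in the future is replaced by the current date.
    return min(parsed, today)


today = date(2026, 6, 1)
print(resolve_end_date(None, today))          # 2026-06-01
print(resolve_end_date("2030-01-01", today))  # 2026-06-01
print(resolve_end_date("2026-05-20", today))  # 2026-05-20
```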

### 🧑‍💻 Tech Details

**Input Example:**

```json
{
  "period": "Daily",
  "end_date": "2026-06-01",
  "max_papers": 100
}
````

The Actor builds the HuggingFace Papers URL from `period` and `end_date`, then paginates to older papers:

- `Daily` + `2026-06-01` -> `https://huggingface.co/papers/date/2026-06-01`
- `Weekly` + `2026-06-01` -> `https://huggingface.co/papers/week/2026-W23`
- `Monthly` + `2026-06-01` -> `https://huggingface.co/papers/month/2026-06`
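
The mapping above can be reproduced with a short helper. This sketch assumes ISO 8601 week numbering, which matches the `2026-W23` example but is not confirmed by the Actor's source; `build_listing_url` is a hypothetical name.

```python
from datetime import date

BASE_URL = "https://huggingface.co/papers"


def build_listing_url(period: str, end_date: str) -> str:
    """Build the listing URL for a period and a YYYY-MM-DD end date."""
    day = date.fromisoformat(end_date)
    if period == "Daily":
        return f"{BASE_URL}/date/{day.isoformat()}"
    if period == "Weekly":
        # Assumption: weeks follow ISO 8601 (Monday start, ISO year).
        iso_year, iso_week, _ = day.isocalendar()
        return f"{BASE_URL}/week/{iso_year}-W{iso_week:02d}"
    if period == "Monthly":
        return f"{BASE_URL}/month/{day.year}-{day.month:02d}"
    raise ValueError(f"Unsupported period: {period!r}")


for period in ("Daily", "Weekly", "Monthly"):
    print(build_listing_url(period, "2026-06-01"))
```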

**Output Example:**

```json
{
  "url": "https://huggingface.co/papers/2605.29486",
  "title": "PhoneWorld: Scaling Phone-Use Agent Environments",
  "published_date": "2026-05-28T00:00:00",
  "submitted_date": "2026-05-29T00:00:00",
  "submitted_by": "Zhengyang Tang",
  "submitted_by_url": "https://huggingface.co/tangzhy",
  "organization": "shanghai ailab",
  "organization_url": "https://huggingface.co/ShanghaiAiLab",
  "upvotes": 2,
  "abstract": "PhoneWorld is a pipeline that transforms real GUI trajectories and screenshots into controllable mobile environments, executable tasks, and automated verifiers, enabling scalable creation of phone-use benchmarks.",
  "summary": "A central bottleneck for phone-use agents is that controllable, reproducible environments covering real mobile behavior are hard to build at scale...",
  "pdf_url": "https://arxiv.org/pdf/2605.29486",
  "project_url": null,
  "agent_command": "hf papers read 2605.29486"
}
```
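
A record like the one above can be post-processed with the standard library alone. The sketch below uses a trimmed copy of the example record; `paper_id` and `submission_lag_days` are hypothetical helpers, and taking the paper ID from the last URL path segment is an assumption based on this single example.

```python
from datetime import datetime

# Trimmed copy of the output example above.
record = {
    "url": "https://huggingface.co/papers/2605.29486",
    "published_date": "2026-05-28T00:00:00",
    "submitted_date": "2026-05-29T00:00:00",
    "project_url": None,
    "agent_command": "hf papers read 2605.29486",
}


def paper_id(rec: dict) -> str:
    """Take the paper ID from the last path segment of the paper URL."""
    return rec["url"].rstrip("/").rsplit("/", 1)[-1]


def submission_lag_days(rec: dict) -> int:
    """Days between publication and submission to HuggingFace Papers."""
    published = datetime.fromisoformat(rec["published_date"])
    submitted = datetime.fromisoformat(rec["submitted_date"])
    return (submitted - published).days


print(paper_id(record))             # 2605.29486
print(submission_lag_days(record))  # 1
```

Fields such as `project_url` can be `null`, so downstream code should check for `None` before using them.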

**Parameters:**

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `period` | string | No | HuggingFace Papers period to scrape: `Daily`, `Weekly`, or `Monthly`. Default: `Daily`. |
| `end_date` | string | No | Latest date to scrape from. Format: `YYYY-MM-DD`. The Actor paginates to older papers from this date. If omitted or in the future, the Actor uses the current date. |
| `max_papers` | integer | No | Number of papers to collect from the listing. Min 10, max 100, default 100. |
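
Validating the input on the client side before starting a run can catch mistakes early. The following is a minimal sketch built from the constraints in the table above; `normalize_input` is a hypothetical helper, and the Actor may apply its own validation independently.

```python
import re
from typing import Any, Dict

ALLOWED_PERIODS = ("Daily", "Weekly", "Monthly")
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def normalize_input(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the documented defaults and reject out-of-range values."""
    period = raw.get("period", "Daily")
    if period not in ALLOWED_PERIODS:
        raise ValueError(f"period must be one of {ALLOWED_PERIODS}")

    max_papers = raw.get("max_papers", 100)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(max_papers, bool) or not isinstance(max_papers, int):
        raise ValueError("max_papers must be an integer")
    if not 10 <= max_papers <= 100:
        raise ValueError("max_papers must be between 10 and 100")

    result: Dict[str, Any] = {"period": period, "max_papers": max_papers}
    end_date = raw.get("end_date")
    if end_date:
        if not DATE_PATTERN.match(end_date):
            raise ValueError("end_date must use the YYYY-MM-DD format")
        result["end_date"] = end_date
    return result


print(normalize_input({}))  # {'period': 'Daily', 'max_papers': 100}
```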

# Actor input Schema

## `max_papers` (type: `integer`):

The maximum number of HuggingFace papers to scrape from the selected period.

## `period` (type: `string`):

The HuggingFace Papers time window to scrape from the selected end date.

## `end_date` (type: `string`):

The latest date to scrape from. The Actor paginates to older papers from this date. If empty or in the future, the Actor uses the current date.

## Actor input object example

```json
{
  "max_papers": 100,
  "period": "Daily"
}
```

# Actor output Schema

## `overview` (type: `string`):

Table view of scraped Hugging Face papers using the dataset 'overview' view.

## `results` (type: `string`):

All scraped Hugging Face paper records from the default dataset without view transformation.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {};

// Run the Actor and wait for it to finish
const run = await client.actor("dadhalfdev/huggingface-papers-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {}

# Run the Actor and wait for it to finish
run = client.actor("dadhalfdev/huggingface-papers-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{}' |
apify call dadhalfdev/huggingface-papers-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=dadhalfdev/huggingface-papers-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "HuggingFace Papers Scraper",
        "description": "Scrape trending HuggingFace Papers by day, week, or month. Get titles, dates, submitters, organizations, upvotes, abstracts, summaries, PDFs, project links, and agent-ready commands for AI agents, RAG pipelines, research monitoring, and automation.",
        "version": "0.1",
        "x-build-id": "c4TNJjDEgu1vEpYOA"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/dadhalfdev~huggingface-papers-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-dadhalfdev-huggingface-papers-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/dadhalfdev~huggingface-papers-scraper/runs": {
            "post": {
                "operationId": "runs-sync-dadhalfdev-huggingface-papers-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/dadhalfdev~huggingface-papers-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-dadhalfdev-huggingface-papers-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "max_papers": {
                        "title": "Maximum number of papers",
                        "minimum": 10,
                        "maximum": 100,
                        "type": "integer",
                        "description": "The maximum number of HuggingFace papers to scrape from the selected period.",
                        "default": 100
                    },
                    "period": {
                        "title": "Period",
                        "enum": [
                            "Daily",
                            "Weekly",
                            "Monthly"
                        ],
                        "type": "string",
                        "description": "The HuggingFace Papers time window to scrape from the selected end date.",
                        "default": "Daily"
                    },
                    "end_date": {
                        "title": "End Date",
                        "pattern": "^\\d{4}-\\d{2}-\\d{2}$",
                        "type": "string",
                        "description": "The latest date to scrape from. The Actor paginates to older papers from this date. If empty or in the future, the Actor uses the current date."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
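
For projects that call the REST API directly, the `run-sync-get-dataset-items` operation from the specification above can be prepared with the Python standard library. This is a sketch only: `build_run_sync_request` is a hypothetical helper, the token is a placeholder, and the request is built but not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "dadhalfdev~huggingface-papers-scraper"


def build_run_sync_request(token: str, actor_input: dict) -> Request:
    """Prepare a POST request for the run-sync-get-dataset-items endpoint."""
    # The spec declares the token as a required query parameter.
    query = urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    return Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_sync_request(
    "<YOUR_API_TOKEN>",
    {"period": "Weekly", "end_date": "2026-06-01", "max_papers": 10},
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request) and parse the response body.
```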
