# Hugging Face Papers Scraper (`parseforge/huggingface-papers-scraper`) Actor

Scrape AI and machine learning research papers from Hugging Face Papers. Get titles, abstracts, authors with affiliations, upvotes, publication dates, arXiv IDs, and community discussion counts. Search by keyword or browse daily papers.

- **URL**: https://apify.com/parseforge/huggingface-papers-scraper.md
- **Developed by:** [ParseForge](https://apify.com/parseforge) (community)
- **Categories:** Education, Developer tools, Other
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded, 1 bookmark
- **User rating**: No ratings yet

## Pricing

From $9.00 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the price decreases as your subscription plan level increases.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
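
As a rough budgeting aid, the listed starting price can be turned into a per-run estimate. The sketch below assumes a flat $9.00 per 1,000 results with no other billed events; the rate that applies to your plan may differ, since the listing only states a "from" price. `estimate_cost_usd` is a hypothetical helper and not part of any Apify library.

```python
from decimal import Decimal

# Assumption: flat list price of $9.00 per 1,000 results, no other billed events.
PRICE_PER_1000_USD = Decimal("9.00")


def estimate_cost_usd(num_results: int, price_per_1000: Decimal = PRICE_PER_1000_USD) -> Decimal:
    """Return a rough cost estimate in USD for a run that yields num_results items."""
    if num_results < 0:
        raise ValueError("num_results must be non-negative")
    return (price_per_1000 * num_results / 1000).quantize(Decimal("0.01"))


print(estimate_cost_usd(100))   # 0.90
print(estimate_cost_usd(2500))  # 22.50
```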

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, API, or MCP server.
"Actor" is always written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) or as [Markdown full text](https://docs.apify.com/llms-full.txt).


# README

![ParseForge Banner](https://raw.githubusercontent.com/ParseForge/apify-assets/main/banner.jpg)

## 📄 Hugging Face Papers Scraper

AI research moves fast. Hugging Face Papers curates the most discussed machine learning papers every day with community upvotes, author handles, and linked code repos. This tool pulls that curated feed plus search results into a structured dataset you can feed into a newsletter, research tracker, or literature review.

> **The Hugging Face Papers Scraper collects AI/ML research papers with titles, authors, abstracts, arXiv IDs, upvotes, GitHub repos, thumbnails, and keywords. Search by topic or grab the daily trending list.**

### ✨ What Does It Do

- 📚 **Paper metadata** - titles, abstracts, arXiv IDs, publication dates, and Hugging Face URLs
- 👥 **Author details** - full author list with Hugging Face handles and verification status
- ⭐ **Community signals** - upvotes, comment counts, and thumbnails
- 💻 **Code + project links** - GitHub repository URLs and project pages when authors link them
- 🏷️ **AI keywords and summaries** - auto-generated keywords and condensed summaries where available
- 🔍 **Two modes** - search by keyword or pull today's trending feed

### 🔧 Input

- **Search Query** - keyword to match paper titles and abstracts (e.g. `transformer`, `diffusion model`, `LLM`)
- **Mode** - `search` for keyword search or `trending` for daily curated papers
- **Max Items** - free users get 10 papers, paid users up to 1,000,000

```json
{
    "searchQuery": "diffusion model",
    "mode": "search",
    "maxItems": 100
}
````

### 📊 Output

Each paper record contains 15+ fields. Download as JSON, CSV, or Excel.

| 📌 Field | 📄 Description |
|----------|---------------|
| 🆔 arxivId | arXiv paper identifier |
| 📋 title | Paper title |
| 🔗 url | Hugging Face Papers page |
| 🔗 arxivUrl | arXiv abstract page |
| 📅 publishedAt | Publication date |
| ⬆️ upvotes | Community upvote count |
| 💬 numComments | Discussion comment count |
| 👥 authors | Array of authors with names, HF handles, and verification status |
| 📝 summary | Paper abstract |
| 🖼️ thumbnail | Paper preview image |
| 💻 githubRepo | Linked code repository |
| 🌐 projectPage | Linked project website |
| 🏷️ aiKeywords | Auto-generated topic keywords |

```json
{
    "arxivId": "2404.12345",
    "title": "Efficient Attention for Long-Context Language Models",
    "url": "https://huggingface.co/papers/2404.12345",
    "arxivUrl": "https://arxiv.org/abs/2404.12345",
    "publishedAt": "2026-04-09",
    "upvotes": 187,
    "numComments": 12,
    "numAuthors": 6,
    "firstAuthor": "Jane Smith",
    "authors": [
        { "name": "Jane Smith", "hfUser": "jsmith", "verified": true }
    ],
    "summary": "We introduce a novel attention mechanism...",
    "githubRepo": "https://github.com/example/long-attention",
    "projectPage": "https://example.github.io/long-attention",
    "aiKeywords": ["attention", "long-context", "efficiency"],
    "scrapedAt": "2026-04-10T12:00:00.000Z"
}
```
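
Because some fields, such as `githubRepo`, are only present when a link exists, downstream code may want to read records defensively. The sketch below maps one dataset item onto a small dataclass using the field names from the sample record above. The `Paper` class and `from_record` are hypothetical names, and the code assumes `arxivId` and `title` are always present.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Paper:
    arxiv_id: str
    title: str
    upvotes: int = 0
    num_comments: int = 0
    github_repo: Optional[str] = None
    author_names: list[str] = field(default_factory=list)

    @classmethod
    def from_record(cls, record: dict[str, Any]) -> "Paper":
        """Build a Paper from one dataset item, tolerating missing optional keys."""
        return cls(
            arxiv_id=record["arxivId"],  # assumed to always be present
            title=record["title"],       # assumed to always be present
            upvotes=int(record.get("upvotes") or 0),
            num_comments=int(record.get("numComments") or 0),
            github_repo=record.get("githubRepo") or None,
            author_names=[a.get("name", "") for a in record.get("authors") or []],
        )


sample = {
    "arxivId": "2404.12345",
    "title": "Efficient Attention for Long-Context Language Models",
    "upvotes": 187,
    "numComments": 12,
    "authors": [{"name": "Jane Smith", "hfUser": "jsmith", "verified": True}],
}
paper = Paper.from_record(sample)
print(paper.title, paper.upvotes, paper.github_repo)
```

Here `github_repo` ends up as `None` because the trimmed sample omits the key, which is the case the FAQ describes for papers without linked code.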

### 💎 Why Choose the Hugging Face Papers Scraper?

| Feature | Our Tool | Manual Browsing |
|---------|----------|-----------------|
| Daily trending feed | ✅ Yes | ✅ Yes |
| Keyword search | ✅ Yes | ⚠️ Limited UI |
| Bulk export | ✅ Up to 1M papers | ❌ One at a time |
| Author handles | ✅ Included | ⚠️ Click each profile |
| Linked code + project | ✅ Extracted | ⚠️ Scroll through page |
| Scheduled monitoring | ✅ Daily runs | ❌ Not possible |

### 📋 How to Use

1. **Sign Up** - [Create a free account with $5 credit](https://console.apify.com/sign-up?fpr=vmoqkp)
2. **Configure** - pick a keyword or trending mode and set your max items
3. **Run It** - click Start and get curated AI papers in seconds

No coding, no daily manual browsing.

### 🎯 Business Use Cases

- 📬 **Research newsletters** - auto-curate a weekly digest of the hottest ML papers
- 🧠 **AI labs** - build an internal literature tracker for new diffusion, LLM, or RL work
- 🎓 **PhD students** - monitor new papers in your subfield without daily site visits
- 📊 **Trend analysis** - track which topics are gaining community upvotes over time
- 💼 **Recruiters** - spot up-and-coming researchers by watching trending author handles
- 💻 **Dev tool makers** - find papers with open code to feature in your product
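
The newsletter use case can be sketched as a small formatting step over dataset items. The example ranks papers by `upvotes` and renders a Markdown list, appending the code link when `githubRepo` is set. `build_digest` is a hypothetical helper, and the two input records are made-up placeholders rather than real papers.

```python
from typing import Any


def build_digest(items: list[dict[str, Any]], top_n: int = 5) -> str:
    """Return a Markdown list of the top_n papers ranked by upvotes, highest first."""
    ranked = sorted(items, key=lambda p: p.get("upvotes") or 0, reverse=True)
    lines = []
    for paper in ranked[:top_n]:
        line = f"- [{paper['title']}]({paper['url']}) ({paper.get('upvotes') or 0} upvotes)"
        if paper.get("githubRepo"):
            line += f", code: {paper['githubRepo']}"
        lines.append(line)
    return "\n".join(lines)


# Placeholder records for illustration only.
items = [
    {"title": "Paper A", "url": "https://huggingface.co/papers/0000.00001", "upvotes": 12},
    {
        "title": "Paper B",
        "url": "https://huggingface.co/papers/0000.00002",
        "upvotes": 40,
        "githubRepo": "https://github.com/example/paper-b",
    },
]
print(build_digest(items, top_n=2))
```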

### ❓ FAQ

🤖 **What is Hugging Face Papers?**
Hugging Face Papers is a curated feed of AI/ML research papers with community voting, author profiles, and links to code repos and project pages.

🔍 **What's the difference between search and trending mode?**
Trending pulls today's curated daily papers chosen by the Hugging Face team and community. Search runs a keyword query across indexed papers.

⭐ **What does "upvotes" mean?**
Upvotes are community signals from Hugging Face users indicating which papers they think are most worth reading.

💻 **Are GitHub repos always available?**
Only when the paper's authors or the community have linked them. Many papers include code, but not all.

🔁 **Can I run this daily?**
Yes. Set up a scheduled run in trending mode to keep a daily archive of the most discussed ML work.
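
A daily archive will see the same paper on several days. One possible way to keep a history, assuming every item carries `arxivId`, `upvotes`, and `scrapedAt` as in the sample record, is to key the archive by `arxivId` and store the upvote count per scrape date. `merge_into_archive` is a hypothetical helper, the archive is a plain in-memory dict with persistence left out, and the upvote values are illustrative.

```python
from typing import Any

Archive = dict[str, dict[str, Any]]


def merge_into_archive(archive: Archive, items: list[dict[str, Any]]) -> Archive:
    """Add one run's items to the archive, recording upvotes per scrape date."""
    for item in items:
        entry = archive.setdefault(
            item["arxivId"], {"title": item.get("title"), "upvotes_by_date": {}}
        )
        scrape_date = item["scrapedAt"][:10]  # "YYYY-MM-DD" prefix of the ISO timestamp
        entry["upvotes_by_date"][scrape_date] = item.get("upvotes") or 0
    return archive


archive: Archive = {}
day_one = [{"arxivId": "2404.12345", "title": "Example", "upvotes": 150,
            "scrapedAt": "2026-04-09T12:00:00.000Z"}]
day_two = [{"arxivId": "2404.12345", "title": "Example", "upvotes": 187,
            "scrapedAt": "2026-04-10T12:00:00.000Z"}]
merge_into_archive(archive, day_one)
merge_into_archive(archive, day_two)
print(archive["2404.12345"]["upvotes_by_date"])
```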

### 🔗 Integrate Hugging Face Papers Scraper with any app

- [Make](https://docs.apify.com/platform/integrations/make) - automate paper digest generation
- [Zapier](https://docs.apify.com/platform/integrations/zapier) - push new papers to your reading list
- [Slack](https://docs.apify.com/platform/integrations/slack) - post daily AI paper summaries
- [Google Sheets](https://docs.apify.com/platform/integrations/drive) - track upvotes over time
- [Webhooks](https://docs.apify.com/platform/integrations/webhooks) - trigger workflows on completion

### 💡 Recommended Actors

Looking for more data collection tools? Check out these related Actors:

| Actor | Description | Link |
|-------|-------------|------|
| Hugging Face Model Scraper | Collect AI model metadata | [Link](https://apify.com/parseforge/hugging-face-model-scraper) |
| Apple App Store Scraper | App listings and ratings | [Link](https://apify.com/parseforge/apple-app-store-iphone-scraper) |
| Stripe App Marketplace Scraper | Stripe app listings | [Link](https://apify.com/parseforge/stripe-marketplace-scraper) |
| AWS Marketplace Scraper | AWS product listings | [Link](https://apify.com/parseforge/aws-marketplace-scraper) |
| HubSpot Marketplace Scraper | HubSpot app listings | [Link](https://apify.com/parseforge/hubspot-marketplace-scraper) |

**Pro Tip:** 💡 Browse the full [ParseForge catalog](https://apify.com/parseforge) to find more data tools.

### 🆘 Need Help?

- Check the FAQ section above for common questions
- Visit the [Apify documentation](https://docs.apify.com) for platform guides
- Contact us via the [Tally contact form](https://tally.so/r/BzdKgA)

### ⚠️ Disclaimer

> This Actor is an independent tool and is not affiliated with, endorsed by, or connected to Hugging Face, arXiv, or any paper author. It collects only publicly available paper metadata.

# Actor input Schema

## `maxItems` (type: `integer`):

Free users: limited to 10 items (preview). Paid users: optional, up to 1,000,000.

## `searchQuery` (type: `string`):

Search papers by keyword. Example: 'transformer', 'diffusion model', 'LLM'.

## `mode` (type: `string`):

`search` runs a keyword search; `trending` pulls the daily curated papers. Defaults to `search`.

## Actor input object example

```json
{
  "maxItems": 10,
  "searchQuery": "transformer",
  "mode": "search"
}
```
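
The constraints stated for these fields in the OpenAPI specification further down (`maxItems` between 1 and 1,000,000, `mode` either `search` or `trending` with `search` as the default) can be checked before a run is started. The following is a minimal sketch of such a pre-flight check. `validate_input` is a hypothetical helper, not part of the Apify client, and it enforces only what the schema states.

```python
from typing import Any

ALLOWED_MODES = {"search", "trending"}
MAX_ITEMS_LIMIT = 1_000_000


def validate_input(actor_input: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of actor_input with the default mode applied, or raise ValueError."""
    checked = dict(actor_input)
    checked.setdefault("mode", "search")
    if checked["mode"] not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {sorted(ALLOWED_MODES)}")
    max_items = checked.get("maxItems")
    if max_items is not None:
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(max_items, bool) or not isinstance(max_items, int):
            raise ValueError("maxItems must be an integer")
        if not 1 <= max_items <= MAX_ITEMS_LIMIT:
            raise ValueError(f"maxItems must be between 1 and {MAX_ITEMS_LIMIT}")
    if "searchQuery" in checked and not isinstance(checked["searchQuery"], str):
        raise ValueError("searchQuery must be a string")
    return checked


print(validate_input({"maxItems": 10, "searchQuery": "transformer"}))
```

The returned dict can be passed as the run input in the client examples in the API section.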

# Actor output Schema

## `overview` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "maxItems": 10,
    "searchQuery": "transformer"
};

// Run the Actor and wait for it to finish
const run = await client.actor("parseforge/huggingface-papers-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "maxItems": 10,
    "searchQuery": "transformer",
}

# Run the Actor and wait for it to finish
run = client.actor("parseforge/huggingface-papers-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "maxItems": 10,
  "searchQuery": "transformer"
}' |
apify call parseforge/huggingface-papers-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=parseforge/huggingface-papers-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Hugging Face Papers Scraper",
        "description": "Scrape AI and machine learning research papers from Hugging Face Papers. Get titles, abstracts, authors with affiliations, upvotes, publication dates, ArXiv IDs, and community discussion counts. Search by keyword or browse daily papers.",
        "version": "1.0",
        "x-build-id": "UdBsgKlyo2HFSz14H"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/parseforge~huggingface-papers-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-parseforge-huggingface-papers-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/parseforge~huggingface-papers-scraper/runs": {
            "post": {
                "operationId": "runs-sync-parseforge-huggingface-papers-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/parseforge~huggingface-papers-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-parseforge-huggingface-papers-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "maxItems": {
                        "title": "Max Items",
                        "minimum": 1,
                        "maximum": 1000000,
                        "type": "integer",
                        "description": "Free users: Limited to 10 items (preview). Paid users: Optional, max 1,000,000"
                    },
                    "searchQuery": {
                        "title": "Search Query",
                        "type": "string",
                        "description": "Search papers by keyword. Example: 'transformer', 'diffusion model', 'LLM'."
                    },
                    "mode": {
                        "title": "Mode",
                        "enum": [
                            "search",
                            "trending"
                        ],
                        "type": "string",
                        "description": "Search or trending.",
                        "default": "search"
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
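
For stacks without a client library, the `run-sync-get-dataset-items` path from the specification above can be called directly. The sketch below uses only the Python standard library. The endpoint, the `token` query parameter, and the JSON request body come from the spec, while `build_request` and `fetch_papers` are hypothetical names, the 300-second timeout is an arbitrary choice, and the response is assumed to be a JSON array of dataset items (the spec only documents an "OK" response).

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/parseforge~huggingface-papers-scraper/run-sync-get-dataset-items"


def build_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request described in the spec."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{BASE_URL}{ACTOR_PATH}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_papers(token: str, actor_input: dict, timeout: float = 300.0) -> list:
    """Send the request and decode the body, assumed to be a JSON array of items."""
    with urllib.request.urlopen(build_request(token, actor_input), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request makes no network call; fetch_papers() performs the actual run.
request = build_request("<YOUR_API_TOKEN>", {"mode": "trending", "maxItems": 10})
print(request.get_method(), request.full_url)
```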
