Hugging Face Papers Scraper

Pricing

Pay per event

Extract ML research papers from Hugging Face. Get titles, authors, abstracts, AI summaries, GitHub repos, star counts, upvotes, and keywords. Search by topic or get daily trending papers. Pure API — fast and cheap.

Developer

Stas Persiianenko

Maintained by Community

What does Hugging Face Papers Scraper do?

Hugging Face Papers Scraper extracts machine learning research papers from Hugging Face. Get titles, authors, abstracts, AI-generated summaries, GitHub repositories, star counts, upvotes, and keywords — all via a clean API, no browser needed.

Two scraping modes: search by keyword to find papers on any ML topic, or daily trending to get the hottest papers the community is upvoting right now. Pure HTTP — fast, cheap, and reliable.

This is the only Apify actor for Hugging Face Papers. Papers With Code (paperswithcode.com) now redirects to Hugging Face, which makes Hugging Face Papers a central source for ML research paper data.

Who is Hugging Face Papers Scraper for?

  • 🔬 ML researchers and PhD students — Track state-of-the-art progress in your subfield. Monitor new papers on transformers, reinforcement learning, or computer vision. Get GitHub repos automatically so you can reproduce results.

  • 📊 AI startup founders and VCs — Monitor research trends to spot emerging techniques. Track which labs are publishing on what topics. Use upvote counts as a proxy for community interest.

  • 🏗️ MLOps and AI engineers — Find papers with open-source implementations (GitHub links + star counts). Build automated pipelines to discover new models and techniques relevant to your stack.

  • 📰 AI newsletter writers and content creators — Get daily trending papers with AI summaries already written. Build automated newsletters tracking the hottest ML research.

Why use Hugging Face Papers Scraper?

  • 📡 Pure API-based — No browser needed. Fast, lightweight (256 MB), and cost-effective
  • 🤖 AI summaries included — Hugging Face provides AI-generated summaries and keywords for each paper
  • 🔗 GitHub links and star counts — Know immediately which papers have code implementations
  • 📅 Daily trending mode — Get the papers the ML community is upvoting right now
  • 🔍 Search mode — Find papers on any ML topic by keyword
  • 🔄 Apify platform — API access, scheduling, webhooks, export to JSON/CSV/Excel, integrate with 5,000+ apps
  • 💰 Pay per result — Only pay for the papers you extract, no flat fees

What data can you extract?

Field | Description | Example
📄 paperId | arXiv paper identifier | 2604.15145
📝 title | Full paper title | Attention Is All You Need
👥 authors | List of author names | ["Ashish Vaswani", "Noam Shazeer"]
📖 abstract | Full paper abstract | We propose a new simple network...
📅 publishedAt | Publication date (ISO 8601) | 2026-04-16T15:19:58.000Z
👍 upvotes | Community upvote count | 81
💬 numComments | Number of discussion comments | 12
🔗 paperUrl | Hugging Face paper page URL | https://huggingface.co/papers/1706.03762
📚 arxivUrl | arXiv abstract URL | https://arxiv.org/abs/1706.03762
🐙 githubUrl | GitHub repository URL | https://github.com/tensorflow/tensor2tensor
githubStars | GitHub repository star count | 15432
🤖 aiSummary | AI-generated paper summary | The Transformer architecture...
🏷️ aiKeywords | AI-extracted keywords | ["attention mechanism", "Transformer"]
🖼️ thumbnailUrl | Paper thumbnail image URL | https://cdn-thumbnails.huggingface.co/...
🕐 scrapedAt | Timestamp of extraction (ISO 8601) | 2026-04-19T08:15:04.254Z

How much does it cost to scrape Hugging Face papers?

Hugging Face Papers Scraper uses pay-per-event pricing:

Event | Cost
🚀 Run started | $0.001 per run
📄 Paper extracted | $0.002 per paper

Example costs:

  • 20 trending papers (1 day) → $0.001 + 20 × $0.002 = $0.041
  • 50 papers from search → $0.001 + 50 × $0.002 = $0.101
  • 200 papers (multi-query) → $0.001 + 200 × $0.002 = $0.401

With the free Apify plan ($5/month credits), you can extract roughly 2,400 to 2,500 papers per month, depending on how many separate runs you start (each run adds the $0.001 start fee).
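
The arithmetic in the examples above can be reproduced with a small helper. This is only a sketch: the two prices come from the pricing table, while the function names `estimate_cost` and `papers_within_budget` are made up for illustration and are not part of the actor or the Apify client.

```python
from decimal import Decimal

# Prices taken from the pricing table above.
RUN_START_FEE = Decimal("0.001")  # charged once per run
PER_PAPER_FEE = Decimal("0.002")  # charged per extracted paper


def estimate_cost(papers: int, runs: int = 1) -> Decimal:
    """Return the estimated charge in USD for `papers` papers spread over `runs` runs."""
    if papers < 0 or runs < 1:
        raise ValueError("papers must be >= 0 and runs must be >= 1")
    return runs * RUN_START_FEE + papers * PER_PAPER_FEE


def papers_within_budget(budget_usd: str, runs: int = 1) -> int:
    """Return how many papers fit into a budget after the run-start fees are paid."""
    remaining = Decimal(budget_usd) - runs * RUN_START_FEE
    return max(0, int(remaining // PER_PAPER_FEE))


print(estimate_cost(20))              # 0.041
print(estimate_cost(200))             # 0.401
print(papers_within_budget("5", 30))  # 2485
```

`Decimal` is used instead of floats so that sub-cent amounts add up exactly.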

How to scrape Hugging Face papers

  1. Go to Hugging Face Papers Scraper on Apify Store
  2. Click Try for free to open the actor in Apify Console
  3. Choose your scraping mode:
    • Search by keyword — Enter keywords like "transformer", "reinforcement learning"
    • Daily trending — Enter dates to get that day's trending papers
  4. Set the maximum number of papers per query
  5. Click Start and wait for results
  6. Download your data as JSON, CSV, or Excel

Search mode example input:

{
    "mode": "search",
    "searchQueries": ["transformer", "large language model"],
    "maxPapersPerQuery": 50,
    "includeDetails": true
}

Daily trending example input:

{
    "mode": "daily",
    "dates": ["2026-04-17", "2026-04-18"],
    "maxPapersPerQuery": 30
}

Input parameters

Parameter | Type | Default | Description
mode | string | search | Scraping mode: search (keyword search) or daily (trending papers)
searchQueries | string[] | ["transformer"] | Keywords to search (search mode only)
dates | string[] | [] (today) | Dates in YYYY-MM-DD format (daily mode only)
maxPapersPerQuery | integer | 50 | Maximum papers per keyword or date (1-200)
includeDetails | boolean | true | Fetch full paper details (AI summary, GitHub, keywords)
maxRequestRetries | integer | 3 | Retry attempts for failed requests
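
If you build inputs programmatically, it can help to check them against the documented parameters before starting a run. The sketch below is one possible way to do that. `build_input` is a hypothetical helper, and rejecting an empty keyword list is this sketch's own choice rather than documented actor behavior.

```python
import re
from datetime import date

VALID_MODES = {"search", "daily"}
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def build_input(mode="search", search_queries=None, dates=None,
                max_papers_per_query=50, include_details=True):
    """Build an actor input dict and validate it against the parameter table."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if not 1 <= max_papers_per_query <= 200:
        raise ValueError("maxPapersPerQuery must be between 1 and 200")

    actor_input = {
        "mode": mode,
        "maxPapersPerQuery": max_papers_per_query,
        "includeDetails": include_details,
    }
    if mode == "search":
        queries = [q.strip() for q in (search_queries or []) if q.strip()]
        if not queries:
            # Assumption of this sketch: an empty keyword list is treated as an error.
            raise ValueError("search mode needs at least one keyword")
        actor_input["searchQueries"] = queries
    else:
        for d in dates or []:
            if not DATE_PATTERN.fullmatch(d):
                raise ValueError(f"{d!r} is not in YYYY-MM-DD format")
            date.fromisoformat(d)  # also rejects impossible dates such as 2026-02-30
        actor_input["dates"] = list(dates or [])
    return actor_input


print(build_input("daily", dates=["2026-04-17", "2026-04-18"], max_papers_per_query=30))
```

The returned dict can be passed as `run_input` to the Python client call shown under "Using the Apify API".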

Output example

{
    "paperId": "2604.14268",
    "title": "HY-World 2.0: A Multi-Modal World Model",
    "authors": ["Team HY-World", "Chenjie Cao", "Xuhui Zuo"],
    "abstract": "We present HY-World 2.0, a multi-modal world model...",
    "publishedAt": "2026-04-15T00:00:00.000Z",
    "upvotes": 81,
    "numComments": 5,
    "paperUrl": "https://huggingface.co/papers/2604.14268",
    "arxivUrl": "https://arxiv.org/abs/2604.14268",
    "githubUrl": "https://github.com/Tencent-Hunyuan/HY-World-2.0",
    "githubStars": 1174,
    "aiSummary": "HY-World 2.0 presents a multi-modal world model...",
    "aiKeywords": ["multi-modal world model", "3D Gaussian Splatting"],
    "thumbnailUrl": "https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2604.14268.png",
    "scrapedAt": "2026-04-19T08:15:04.254Z"
}
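
Records shaped like the one above are easy to post-process. As an illustration, the sketch below keeps only papers that link a GitHub repository and sorts them by star count. `papers_with_code` is a hypothetical helper, the second and third sample records are invented, and the code assumes a missing repository shows up as an absent, null, or empty `githubUrl`.

```python
def papers_with_code(items, min_stars=0):
    """Return papers that link a GitHub repo, most-starred first."""
    with_repo = [
        p for p in items
        if p.get("githubUrl") and (p.get("githubStars") or 0) >= min_stars
    ]
    return sorted(with_repo, key=lambda p: p.get("githubStars") or 0, reverse=True)


items = [
    {"paperId": "2604.14268",
     "githubUrl": "https://github.com/Tencent-Hunyuan/HY-World-2.0",
     "githubStars": 1174},
    # Hypothetical record without a linked repository.
    {"paperId": "0000.00001", "githubUrl": None, "githubStars": None},
    # Hypothetical record with a small repository.
    {"paperId": "0000.00002",
     "githubUrl": "https://github.com/example/repo",
     "githubStars": 42},
]

print([p["paperId"] for p in papers_with_code(items, min_stars=100)])
# ['2604.14268']
```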

Tips for best results

  • 🎯 Start small — Use 5-10 papers per query for your first run. Scale up once you're happy with the output
  • 📅 Daily mode for monitoring — Schedule daily runs to track trending ML papers automatically
  • 🔍 Specific keywords — "vision transformer" gives more relevant results than just "AI"
  • Skip details for speed — Set includeDetails: false for faster runs when you only need titles and abstracts
  • 💾 Multiple queries — Search for multiple topics in one run to save on start event costs
  • 📆 Date ranges — In daily mode, pass multiple dates to get trending papers across a date range
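
For the date-range tip, the `dates` array can be generated rather than typed by hand. A minimal sketch using only the standard library follows; `date_range` is a hypothetical helper name, and the dates are arbitrary examples.

```python
from datetime import date, timedelta


def date_range(start: str, end: str) -> list[str]:
    """Return every date from start to end (inclusive) as YYYY-MM-DD strings."""
    first, last = date.fromisoformat(start), date.fromisoformat(end)
    if last < first:
        raise ValueError("end must not be before start")
    days = (last - first).days
    return [(first + timedelta(days=i)).isoformat() for i in range(days + 1)]


actor_input = {
    "mode": "daily",
    "dates": date_range("2026-04-13", "2026-04-17"),
    "maxPapersPerQuery": 30,
}
print(actor_input["dates"])
# ['2026-04-13', '2026-04-14', '2026-04-15', '2026-04-16', '2026-04-17']
```

Each date counts as a separate query, so `maxPapersPerQuery` applies per day and the per-paper charge scales with the number of dates.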

Integrations

  • 📊 Hugging Face Papers → Google Sheets — Track trending papers daily in a spreadsheet. Schedule the actor to run every morning and push results to Sheets via Apify integration
  • 🔔 Hugging Face Papers → Slack/Discord — Get notified when papers matching your keywords appear. Set up a webhook to post new high-upvote papers to your team channel
  • 📧 Hugging Face Papers → Email newsletter — Build an automated weekly ML research digest by scheduling the actor and connecting to Mailchimp via Make/Zapier
  • 🗃️ Hugging Face Papers → Airtable/Notion — Build a searchable research paper database that updates automatically
  • ⚙️ Webhooks — Trigger downstream processing when new papers are found. Parse AI summaries to categorize papers automatically
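
As a rough illustration of the Slack integration, the function below turns dataset items into a message payload for a Slack incoming webhook, which accepts a JSON body with a `text` field. The helper name `build_slack_message` and the thresholds are assumptions made for this sketch, and the sample records are abbreviated or invented.

```python
def build_slack_message(items, min_upvotes=50, limit=5):
    """Format the most-upvoted papers as a Slack incoming-webhook payload."""
    hot = sorted(
        (p for p in items if (p.get("upvotes") or 0) >= min_upvotes),
        key=lambda p: p.get("upvotes") or 0,
        reverse=True,
    )[:limit]
    if not hot:
        return None  # nothing above the threshold, so nothing to post
    # <url|label> is Slack's link syntax.
    lines = [f"• <{p['paperUrl']}|{p['title']}> ({p['upvotes']} upvotes)" for p in hot]
    return {"text": "Trending Hugging Face papers:\n" + "\n".join(lines)}


sample = [
    {"title": "HY-World 2.0: A Multi-Modal World Model",
     "paperUrl": "https://huggingface.co/papers/2604.14268",
     "upvotes": 81},
    # Hypothetical low-upvote record that should be filtered out.
    {"title": "Example Paper",
     "paperUrl": "https://huggingface.co/papers/0000.00001",
     "upvotes": 3},
]
print(build_slack_message(sample)["text"])
```

To send the message, POST the returned dict as JSON to your own webhook URL with any HTTP client.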

Using the Apify API

Node.js

import { ApifyClient } from 'apify-client';

// Authenticate with your Apify API token.
const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

// Start the actor and wait for the run to finish.
const run = await client.actor('automation-lab/huggingface-papers-scraper').call({
    mode: 'search',
    searchQueries: ['transformer'],
    maxPapersPerQuery: 50,
    includeDetails: true,
});

// Read the extracted papers from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Python

from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient("YOUR_APIFY_TOKEN")

# Start the actor and wait for the run to finish.
run = client.actor("automation-lab/huggingface-papers-scraper").call(run_input={
    "mode": "search",
    "searchQueries": ["transformer"],
    "maxPapersPerQuery": 50,
    "includeDetails": True,
})

# Read the extracted papers from the run's default dataset.
items = client.dataset(run["defaultDatasetId"]).list_items().items
print(items)

cURL

curl "https://api.apify.com/v2/acts/automation-lab~huggingface-papers-scraper/runs" \
    -X POST \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer YOUR_APIFY_TOKEN" \
    -d '{
        "mode": "search",
        "searchQueries": ["transformer"],
        "maxPapersPerQuery": 50,
        "includeDetails": true
    }'

Use with AI agents via MCP

Hugging Face Papers Scraper is available as a tool for AI assistants that support the Model Context Protocol (MCP).

Add the Apify MCP server to your AI client — this gives you access to all Apify actors, including this one:

Setup for Claude Code

claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/huggingface-papers-scraper"

Setup for Claude Desktop, Cursor, or VS Code

Add this to your MCP config file:

{
    "mcpServers": {
        "apify": {
            "type": "http",
            "url": "https://mcp.apify.com?tools=automation-lab/huggingface-papers-scraper"
        }
    }
}

Example prompts

  • "Find the 20 most recent papers about vision transformers on Hugging Face"
  • "Get today's trending ML papers and summarize the top 5 by upvotes"
  • "Search for papers about reinforcement learning from human feedback and list their GitHub repos"

Is it legal to scrape Hugging Face papers?

Hugging Face Papers Scraper accesses publicly available research paper metadata through Hugging Face's public API. All extracted data (paper titles, abstracts, author names) is publicly available information originally published on arXiv and other open-access repositories.

We follow ethical scraping practices — no authentication bypass, no rate limit abuse, and full compliance with robots.txt. The actor only reads publicly accessible API endpoints.

For GDPR compliance, the actor only extracts public research metadata (author names, paper titles, abstracts). No private user data is collected. If you have concerns about specific data, consult your legal team.

FAQ

How fast is Hugging Face Papers Scraper? Very fast. Since it uses Hugging Face's API directly (no browser), a typical run of 50 papers completes in under 30 seconds.

How much does it cost to monitor daily papers? A daily run fetching ~30 trending papers costs about $0.06 ($0.001 + 30 × $0.002 = $0.061). Monthly monitoring (30 daily runs) costs approximately $1.83.

Why do some papers have empty GitHub URLs? Not all papers have linked GitHub repositories. The githubUrl field is only populated when the paper's authors or community members have linked a code repository on Hugging Face.

Why are search results limited? Hugging Face's search API returns a maximum of ~100 results per query. For broader coverage, use multiple specific search queries instead of one generic term.
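
When several narrow queries are combined this way, the same paper may plausibly come back under more than one keyword. Whether the actor deduplicates across queries is not stated here, so the sketch below assumes it does not and merges results by `paperId`. `merge_unique` is a hypothetical helper and the sample records are abbreviated.

```python
def merge_unique(items):
    """Merge results from several queries, keeping the first record per paperId."""
    seen = {}
    for paper in items:
        # setdefault keeps the first occurrence; dicts preserve insertion order.
        seen.setdefault(paper["paperId"], paper)
    return list(seen.values())


results = [
    {"paperId": "1706.03762", "title": "Attention Is All You Need"},      # from "transformer"
    {"paperId": "2604.14268", "title": "HY-World 2.0: A Multi-Modal World Model"},
    {"paperId": "1706.03762", "title": "Attention Is All You Need"},      # from "attention mechanism"
]
print([p["paperId"] for p in merge_unique(results)])
# ['1706.03762', '2604.14268']
```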

Can I get papers from Papers With Code? Partly. Papers With Code (paperswithcode.com) now redirects to Hugging Face, so papers listed on Hugging Face Papers can be extracted with this actor. The actor returns only the fields listed under "What data can you extract?"; other Papers With Code data, such as benchmark leaderboards, is not part of its output.

What happened to paperswithcode.com? Papers With Code was acquired by Meta and is no longer available as a standalone site. Its URLs now redirect to huggingface.co/papers.
