Deprecated

Pricing

Pay per event

Hugging Face Papers Scraper

Extract ML research papers from Hugging Face. Get titles, authors, abstracts, AI summaries, GitHub repos, star counts, upvotes, and keywords. Search by topic or get daily trending papers. Pure API — fast and cheap.

Developer

Stas Persiianenko

Last modified

5 days ago

What does Hugging Face Papers Scraper do?

Hugging Face Papers Scraper extracts machine learning research papers from Hugging Face. Get titles, authors, abstracts, AI-generated summaries, GitHub repositories, star counts, upvotes, and keywords — all via a clean API, no browser needed.

Two scraping modes: search by keyword to find papers on any ML topic, or daily trending to get the hottest papers the community is upvoting right now. Pure HTTP — fast, cheap, and reliable.

This actor is built specifically for Hugging Face Papers. Papers With Code (paperswithcode.com) now redirects to Hugging Face, which makes Hugging Face Papers a central source for ML research paper data.

Who is Hugging Face Papers Scraper for?

🔬 ML researchers and PhD students — Track state-of-the-art progress in your subfield. Monitor new papers on transformers, reinforcement learning, or computer vision. Get GitHub repos automatically so you can reproduce results.
📊 AI startup founders and VCs — Monitor research trends to spot emerging techniques. Track which labs are publishing on what topics. Use upvote counts as a proxy for community interest.
🏗️ MLOps and AI engineers — Find papers with open-source implementations (GitHub links + star counts). Build automated pipelines to discover new models and techniques relevant to your stack.
📰 AI newsletter writers and content creators — Get daily trending papers with AI summaries already written. Build automated newsletters tracking the hottest ML research.

Why use Hugging Face Papers Scraper?

📡 Pure API-based — No browser needed. Fast, lightweight (256 MB), and cost-effective
🤖 AI summaries included — Hugging Face provides AI-generated summaries and keywords for each paper
🔗 GitHub links and star counts — Know immediately which papers have code implementations
📅 Daily trending mode — Get the papers the ML community is upvoting right now
🔍 Search mode — Find papers on any ML topic by keyword
🔄 Apify platform — API access, scheduling, webhooks, export to JSON/CSV/Excel, integrate with 5,000+ apps
💰 Pay per event — Pay a small run-start fee plus a per-paper charge for the papers you extract, with no flat monthly fee

What data can you extract?

Field	Description	Example
📄 `paperId`	ArXiv paper identifier	`2604.15145`
📝 `title`	Full paper title	`Attention Is All You Need`
👥 `authors`	List of author names	`["Ashish Vaswani", "Noam Shazeer"]`
📖 `abstract`	Full paper abstract	`We propose a new simple network...`
📅 `publishedAt`	Publication date (ISO 8601)	`2026-04-16T15:19:58.000Z`
👍 `upvotes`	Community upvote count	`81`
💬 `numComments`	Number of discussion comments	`12`
🔗 `paperUrl`	Hugging Face paper page URL	`https://huggingface.co/papers/1706.03762`
📚 `arxivUrl`	ArXiv abstract URL	`https://arxiv.org/abs/1706.03762`
🐙 `githubUrl`	GitHub repository URL	`https://github.com/tensorflow/tensor2tensor`
⭐ `githubStars`	GitHub repository star count	`15432`
🤖 `aiSummary`	AI-generated paper summary	`The Transformer architecture...`
🏷️ `aiKeywords`	AI-extracted keywords	`["attention mechanism", "Transformer"]`
🖼️ `thumbnailUrl`	Paper thumbnail image URL	`https://cdn-thumbnails.huggingface.co/...`
🕐 `scrapedAt`	Timestamp of extraction	`2026-04-19T08:15:04.254Z`

How much does it cost to scrape Hugging Face papers?

Hugging Face Papers Scraper uses pay-per-event pricing:

Event	Cost
🚀 Run started	$0.001 per run
📄 Paper extracted	$0.002 per paper

Example costs:

20 trending papers (1 day) → $0.001 + 20 × $0.002 = $0.041
50 papers from search → $0.001 + 50 × $0.002 = $0.101
200 papers (multi-query) → $0.001 + 200 × $0.002 = $0.401

With the free Apify plan ($5/month credits), you can extract approximately 2,400 papers per month.
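The arithmetic in the example costs can be reproduced with a small helper. This is a minimal sketch that assumes only the two event prices listed above; the function names `estimate_cost` and `papers_within_budget` are illustrative and not part of the actor. The number of papers that fits in a budget depends on how many runs you start, because each run adds a start fee.

```python
from decimal import Decimal

# Prices copied from the pricing table above.
RUN_START_USD = Decimal("0.001")
PER_PAPER_USD = Decimal("0.002")


def estimate_cost(num_papers: int, num_runs: int = 1) -> Decimal:
    """Estimated charge in USD for num_papers papers spread over num_runs runs."""
    if num_papers < 0 or num_runs < 1:
        raise ValueError("num_papers must be >= 0 and num_runs must be >= 1")
    return num_runs * RUN_START_USD + num_papers * PER_PAPER_USD


def papers_within_budget(budget_usd: str, num_runs: int = 1) -> int:
    """Largest number of papers that fits in the budget after run-start fees."""
    remaining = Decimal(budget_usd) - num_runs * RUN_START_USD
    return max(0, int(remaining // PER_PAPER_USD))


print(estimate_cost(20))   # 0.041
print(estimate_cost(50))   # 0.101
print(estimate_cost(200))  # 0.401
print(papers_within_budget("5.00", num_runs=30))
```

`Decimal` is used instead of `float` so that sums of small prices print exactly as written in the examples.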

How to scrape Hugging Face papers

1. Go to Hugging Face Papers Scraper on Apify Store
2. Click Try for free to open the actor in Apify Console
3. Choose your scraping mode:
   - Search by keyword — Enter keywords like "transformer", "reinforcement learning"
   - Daily trending — Enter dates to get that day's trending papers
4. Set the maximum number of papers per query
5. Click Start and wait for results
6. Download your data as JSON, CSV, or Excel

Search mode example input:

```json
{
    "mode": "search",
    "searchQueries": ["transformer", "large language model"],
    "maxPapersPerQuery": 50,
    "includeDetails": true
}
```

Daily trending example input:

```json
{
    "mode": "daily",
    "dates": ["2026-04-17", "2026-04-18"],
    "maxPapersPerQuery": 30
}
```

Input parameters

Parameter	Type	Default	Description
`mode`	string	`search`	Scraping mode: `search` (keyword search) or `daily` (trending papers)
`searchQueries`	string[]	`["transformer"]`	Keywords to search (search mode only)
`dates`	string[]	`[]` (today)	Dates in YYYY-MM-DD format (daily mode only)
`maxPapersPerQuery`	integer	`50`	Maximum papers per keyword or date (1-200)
`includeDetails`	boolean	`true`	Fetch full paper details (AI summary, GitHub, keywords)
`maxRequestRetries`	integer	`3`	Retry attempts for failed requests
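Before starting a run, it can help to check an input object against the constraints in the table above. The following is a client-side sketch only: `validate_input` is a hypothetical helper, and the actor's own validation may be stricter or more lenient than what is assumed here.

```python
from datetime import datetime

ALLOWED_MODES = {"search", "daily"}


def _is_iso_date(value) -> bool:
    """True if value is a zero-padded YYYY-MM-DD string for a real calendar date."""
    try:
        return datetime.strptime(value, "%Y-%m-%d").strftime("%Y-%m-%d") == value
    except (TypeError, ValueError):
        return False


def validate_input(run_input: dict) -> list[str]:
    """Return a list of problems found in an input dict (empty list means OK)."""
    problems = []

    mode = run_input.get("mode", "search")
    if mode not in ALLOWED_MODES:
        problems.append(f"mode must be 'search' or 'daily', got {mode!r}")

    max_papers = run_input.get("maxPapersPerQuery", 50)
    is_int = isinstance(max_papers, int) and not isinstance(max_papers, bool)
    if not is_int or not 1 <= max_papers <= 200:
        problems.append("maxPapersPerQuery must be an integer from 1 to 200")

    if mode == "search" and not run_input.get("searchQueries", ["transformer"]):
        problems.append("searchQueries must contain at least one keyword")

    for entry in run_input.get("dates", []):
        if not _is_iso_date(entry):
            problems.append(f"dates entry {entry!r} is not in YYYY-MM-DD format")

    return problems


print(validate_input({"mode": "daily", "dates": ["2026-04-17"], "maxPapersPerQuery": 30}))
print(validate_input({"mode": "weekly", "maxPapersPerQuery": 500}))
```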

Output example

```json
{
    "paperId": "2604.14268",
    "title": "HY-World 2.0: A Multi-Modal World Model",
    "authors": ["Team HY-World", "Chenjie Cao", "Xuhui Zuo"],
    "abstract": "We present HY-World 2.0, a multi-modal world model...",
    "publishedAt": "2026-04-15T00:00:00.000Z",
    "upvotes": 81,
    "numComments": 5,
    "paperUrl": "https://huggingface.co/papers/2604.14268",
    "arxivUrl": "https://arxiv.org/abs/2604.14268",
    "githubUrl": "https://github.com/Tencent-Hunyuan/HY-World-2.0",
    "githubStars": 1174,
    "aiSummary": "HY-World 2.0 presents a multi-modal world model...",
    "aiKeywords": ["multi-modal world model", "3D Gaussian Splatting"],
    "thumbnailUrl": "https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2604.14268.png",
    "scrapedAt": "2026-04-19T08:15:04.254Z"
}
```
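Records shaped like the output example can be ranked and summarized with a few lines of Python. This sketch assumes the field names shown above and treats a missing or null `upvotes` value as 0; `top_papers` and `one_line` are illustrative names, not part of the actor.

```python
def top_papers(items: list[dict], n: int = 5) -> list[dict]:
    """Return the n records with the most upvotes, highest first."""
    return sorted(items, key=lambda p: p.get("upvotes") or 0, reverse=True)[:n]


def one_line(paper: dict) -> str:
    """Format a record as 'upvotes  title (first author et al.)'."""
    authors = paper.get("authors") or []
    lead = authors[0] if authors else "unknown"
    suffix = " et al." if len(authors) > 1 else ""
    return f"{(paper.get('upvotes') or 0):>4}  {paper['title']} ({lead}{suffix})"


sample = [
    {"title": "Paper A", "upvotes": 3, "authors": ["X", "Y"]},
    {"title": "Paper B", "upvotes": 81, "authors": ["Z"]},
    {"title": "Paper C", "upvotes": None, "authors": []},
]
for paper in top_papers(sample, n=2):
    print(one_line(paper))
```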

Tips for best results

🎯 Start small — Use 5-10 papers per query for your first run. Scale up once you're happy with the output
📅 Daily mode for monitoring — Schedule daily runs to track trending ML papers automatically
🔍 Specific keywords — "vision transformer" gives more relevant results than just "AI"
⚡ Skip details for speed — Set `includeDetails: false` for faster runs when you only need titles and abstracts
💾 Multiple queries — Search for multiple topics in one run to save on start event costs
📆 Date ranges — In daily mode, pass multiple dates to get trending papers across a date range
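For the date-range tip, the `dates` array can be generated instead of typed by hand. A minimal sketch, assuming the YYYY-MM-DD format from the input parameters table; `date_range` is a hypothetical helper.

```python
from datetime import date, timedelta


def date_range(start: str, end: str) -> list[str]:
    """Every date from start to end inclusive, as YYYY-MM-DD strings."""
    first, last = date.fromisoformat(start), date.fromisoformat(end)
    if last < first:
        raise ValueError("end must not be earlier than start")
    days = (last - first).days + 1
    return [(first + timedelta(days=offset)).isoformat() for offset in range(days)]


# Input for one week of daily trending papers.
run_input = {
    "mode": "daily",
    "dates": date_range("2026-04-13", "2026-04-19"),
    "maxPapersPerQuery": 30,
}
print(run_input["dates"])
```

Each date counts as a separate query, so `maxPapersPerQuery` applies per day rather than to the whole range.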

Integrations

📊 Hugging Face Papers → Google Sheets — Track trending papers daily in a spreadsheet. Schedule the actor to run every morning and push results to Sheets via Apify integration
🔔 Hugging Face Papers → Slack/Discord — Get notified when papers matching your keywords appear. Set up a webhook to post new high-upvote papers to your team channel
📧 Hugging Face Papers → Email newsletter — Build an automated weekly ML research digest by scheduling the actor and connecting to Mailchimp via Make/Zapier
🗃️ Hugging Face Papers → Airtable/Notion — Build a searchable research paper database that updates automatically
⚙️ Webhooks — Trigger downstream processing when new papers are found. Parse AI summaries to categorize papers automatically
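As one possible shape for the Slack integration, the sketch below turns dataset items into a message for a Slack incoming webhook, which accepts a JSON body with a `text` field. The 50-upvote threshold, the function names, and the webhook URL you pass in are assumptions for illustration; Discord webhooks use a different payload and are not covered here.

```python
import json
import urllib.request


def build_slack_payload(items: list[dict], min_upvotes: int = 50):
    """Build a Slack message listing papers at or above min_upvotes, or None if there are none."""
    hot = [p for p in items if (p.get("upvotes") or 0) >= min_upvotes]
    hot.sort(key=lambda p: p.get("upvotes") or 0, reverse=True)
    if not hot:
        return None
    lines = [
        f"• <{p['paperUrl']}|{p['title']}> ({p.get('upvotes') or 0} upvotes)"
        for p in hot
    ]
    return {"text": "Trending Hugging Face papers:\n" + "\n".join(lines)}


def post_to_slack(webhook_url: str, payload: dict) -> int:
    """POST the payload to a Slack incoming webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


items = [
    {"title": "Paper A", "paperUrl": "https://huggingface.co/papers/2604.14268", "upvotes": 81},
    {"title": "Paper B", "paperUrl": "https://huggingface.co/papers/1706.03762", "upvotes": 12},
]
print(build_slack_payload(items))
```

`post_to_slack` is defined but not called here, so the example runs without network access.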

Using the Apify API

Node.js

```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

const run = await client.actor('automation-lab/huggingface-papers-scraper').call({
    mode: 'search',
    searchQueries: ['transformer'],
    maxPapersPerQuery: 50,
    includeDetails: true,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```

Python

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("automation-lab/huggingface-papers-scraper").call(run_input={
    "mode": "search",
    "searchQueries": ["transformer"],
    "maxPapersPerQuery": 50,
    "includeDetails": True,
})

items = client.dataset(run["defaultDatasetId"]).list_items().items
print(items)
```
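Once `items` has been fetched, it can be written to CSV locally instead of being printed. This is a sketch using only the standard library; the column selection in `FIELDS` and the `items_to_csv` name are assumptions, and list fields such as `authors` are joined with "; " so each paper stays on one row.

```python
import csv
import io

# Columns to export; any other fields on the record are ignored.
FIELDS = ["paperId", "title", "authors", "upvotes", "githubUrl", "githubStars", "paperUrl"]


def items_to_csv(items: list[dict], fields: list[str] = FIELDS) -> str:
    """Serialize records to CSV text, flattening list values and blanking missing ones."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for item in items:
        row = {}
        for field in fields:
            value = item.get(field)
            if isinstance(value, list):
                value = "; ".join(str(v) for v in value)
            row[field] = "" if value is None else value
        writer.writerow(row)
    return buffer.getvalue()


example = [{
    "paperId": "2604.14268",
    "title": "HY-World 2.0: A Multi-Modal World Model",
    "authors": ["Team HY-World", "Chenjie Cao", "Xuhui Zuo"],
    "upvotes": 81,
    "githubUrl": None,
}]
print(items_to_csv(example))
```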

cURL

```bash
curl "https://api.apify.com/v2/acts/automation-lab~huggingface-papers-scraper/runs" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_APIFY_TOKEN" \
  -d '{
    "mode": "search",
    "searchQueries": ["transformer"],
    "maxPapersPerQuery": 50,
    "includeDetails": true
  }'
```

Use with AI agents via MCP

Hugging Face Papers Scraper is available as a tool for AI assistants that support the Model Context Protocol (MCP).

Add the Apify MCP server to your AI client — this gives you access to all Apify actors, including this one:

Setup for Claude Code

```bash
claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/huggingface-papers-scraper"
```

Setup for Claude Desktop, Cursor, or VS Code

Add this to your MCP config file:

```json
{
    "mcpServers": {
        "apify": {
            "type": "http",
            "url": "https://mcp.apify.com?tools=automation-lab/huggingface-papers-scraper"
        }
    }
}
```

Example prompts

"Find the 20 most recent papers about vision transformers on Hugging Face"
"Get today's trending ML papers and summarize the top 5 by upvotes"
"Search for papers about reinforcement learning from human feedback and list their GitHub repos"

Is it legal to scrape Hugging Face?

Hugging Face Papers Scraper accesses publicly available research paper metadata through Hugging Face's public API. All extracted data (paper titles, abstracts, author names) is publicly available information originally published on arXiv and other open-access repositories.

We follow ethical scraping practices — no authentication bypass, no rate limit abuse, and full compliance with robots.txt. The actor only reads publicly accessible API endpoints.

For GDPR compliance, the actor only extracts public research metadata (author names, paper titles, abstracts). No private user data is collected. If you have concerns about specific data, consult your legal team.

FAQ

How fast is Hugging Face Papers Scraper? Very fast. Since it uses Hugging Face's API directly (no browser), a typical run of 50 papers completes in under 30 seconds.

How much does it cost to monitor daily papers? A daily run fetching ~30 trending papers costs about $0.06 ($0.001 + 30 × $0.002 = $0.061). Monthly monitoring (30 days) costs approximately $1.83.

Why do some papers have empty GitHub URLs? Not all papers have linked GitHub repositories. The githubUrl field is only populated when the paper's authors or community members have linked a code repository on Hugging Face.
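When only papers with code are of interest, records without a repository can be filtered out after the run. This sketch assumes a missing repository may show up as an absent key, `null`, or an empty string, since the exact representation is not stated here; `with_code` is an illustrative name.

```python
def with_code(items: list[dict], min_stars: int = 0) -> list[dict]:
    """Keep records that have a GitHub URL and at least min_stars stars."""
    return [
        paper for paper in items
        if paper.get("githubUrl") and (paper.get("githubStars") or 0) >= min_stars
    ]


papers = [
    {"paperId": "2604.14268", "githubUrl": "https://github.com/Tencent-Hunyuan/HY-World-2.0", "githubStars": 1174},
    {"paperId": "0000.00001", "githubUrl": None, "githubStars": None},
    {"paperId": "0000.00002", "githubUrl": ""},
    {"paperId": "0000.00003"},
]
print([p["paperId"] for p in with_code(papers)])
```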

Why are search results limited? Hugging Face's search API returns a maximum of ~100 results per query. For broader coverage, use multiple specific search queries instead of one generic term.
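Running several specific queries tends to return overlapping papers, so results are usually merged by `paperId` afterwards. A minimal sketch, assuming every record carries the `paperId` field from the output table; `merge_by_paper_id` is a hypothetical helper that keeps the first copy of each paper.

```python
def merge_by_paper_id(*result_sets: list[dict]) -> list[dict]:
    """Combine result lists, keeping the first record seen for each paperId."""
    merged: dict[str, dict] = {}
    for items in result_sets:
        for paper in items:
            merged.setdefault(paper["paperId"], paper)
    return list(merged.values())


vision = [{"paperId": "1", "title": "ViT"}, {"paperId": "2", "title": "Swin"}]
transformers = [{"paperId": "2", "title": "Swin"}, {"paperId": "3", "title": "BERT"}]
print([p["paperId"] for p in merge_by_paper_id(vision, transformers)])
```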

Can I get papers from Papers With Code? Papers With Code (paperswithcode.com) now redirects to Hugging Face, so this actor returns the papers that Hugging Face Papers lists. Anything Hugging Face does not list is not available through this actor.

What happened to paperswithcode.com? Papers With Code was acquired by Meta and has been fully integrated into Hugging Face. All paper URLs now redirect to huggingface.co/papers.

Other research and data scrapers

📚 ArXiv Scraper — Extract research papers directly from ArXiv with full abstracts and subjects
🔬 Google Scholar Scraper — Search academic papers, citations, and author profiles
📊 Google Trends Scraper — Track search interest trends over time

Hugging Face Papers Scraper

parseforge/huggingface-papers-scraper

Scrape AI and machine learning research papers from Hugging Face Papers. Get titles, abstracts, authors with affiliations, upvotes, publication dates, ArXiv IDs, and community discussion counts. Search by keyword or browse daily papers.

ParseForge

HuggingFace Daily Papers Scraper

tzmyk/huggingface-daily-papers-scraper

Scrapes AI/ML research papers from HuggingFace Daily Papers (huggingface.co/papers). Extracts title, authors, abstract, GitHub repo, star count, upvotes, AI summary, and keywords.

tzmyk

HuggingFaceTP

aligned_tripod/huggingfacetp

Scrapes trending research papers from HuggingFace, capturing each paper’s title, description, and URL. The scraper collects data from the listing page and visits individual paper pages for full abstracts, providing a structured dataset of the latest AI research.

amazing

Ai-ML-scraper

labrat011/ai-ml-scraper

Search AI/ML models, research papers, and trending papers from HuggingFace Hub and arXiv. No API key required.

mick_

HuggingFace Hub Scraper - Models, Datasets, Spaces & Authors

makework36/huggingface-hub-scraper

Scrape HuggingFace Hub: models, datasets, spaces. 30+ fields per record, trending filters, author profiles, parsed tags, web enrichment for emails & websites.

deusex machine

Hugging Face Scraper — AI Models, Datasets, Spaces & Papers

logiover/huggingface-hub-intelligence-scraper

Export every AI model, dataset, space and daily paper from the Hugging Face Hub. Filter by task, library (transformers, diffusers, GGUF), language, license, author. Sort by downloads, likes, trending. Sibling files + README. Public HF API, no token. For AI builders, ML research, RAG and VC AI intel.

Logiover

ML Contests Scraper

automation-lab/mlcontests-scraper

Scrape machine learning, data science, and robotics competitions from mlcontests.com

Stas Persiianenko

📄 ArXiv Scraper — Preprints & Research Data

nexgendata/arxiv-scraper

Extract papers from ArXiv — titles, abstracts, authors, categories & PDF links. Monitor new AI, physics, math & CS research. Build tracking & literature review tools. Pay per paper.

Stephan Corbeil

Kaggle Dataset Scraper — Search, Metadata & Trending

openclawmara/kaggle-dataset-scraper

Scrape Kaggle datasets marketplace. Modes: search by keyword/tag, dataset details (owner, license, file list, size, votes, downloads), trending, and user profiles. Extracts titles, descriptions, updated dates, usability scores. Ideal for ML dataset discovery and competitive landscape research.

OpenClaw Mara

Semantic Scholar Scraper - Cheap 📚🔎🤖

scrapestorm/semantic-scholar-scraper---cheap

🔎 Easily collect research papers from Semantic Scholar Provide one or multiple search keywords, paper URLs or author profiles and extract structured academic data such as 📄 Paper Title👨‍🔬 Authors 📅 Publication Year 🔗 Paper URL & more Perfect for academic research & AI research monitoring 📚

Storm_Scraper

5.0

(1)

Hugging Face Datasets Catalog — ML Training Data Intel

nexgendata/huggingface-datasets-catalog

Hugging Face dataset registry: downloads, likes, last_modified, task_categories, language, size_categories, license, tags, author. Filter by task/language/size. Sort by downloads/likes/trending/modified. ML researchers, MLOps, AI compliance.

Stephan Corbeil
