HuggingFace Daily Papers Scraper
Pricing
from $2.00 / 1,000 papers scraped
Scrapes AI/ML research papers from HuggingFace Daily Papers (huggingface.co/papers). Extracts title, authors, abstract, GitHub repo, star count, upvotes, AI summary, and keywords.
Developer: tzmyk
Last modified: 3 days ago
Scrape AI/ML research papers from HuggingFace Daily Papers — the go-to source for trending research in the AI community.
Extracts structured data including titles, authors, abstracts, GitHub repositories, star counts, upvotes, and AI-generated summaries and keywords.
What it does
- Scrapes today's trending papers or papers from a specific date range
- Extracts full abstracts, GitHub repo URLs, star counts, upvote counts
- Includes HuggingFace's AI-generated summary and keywords for each paper
- Supports both fast mode (list-only) and full detail mode (with abstract + AI metadata)
Use cases
- RAG / LLM data pipelines — feed fresh research papers into your vector database daily
- AI trend monitoring — track which topics are trending in the research community
- Competitive intelligence — monitor GitHub repos and star growth of new papers
- Research assistants — power AI agents with up-to-date academic content
- Newsletter automation — curate weekly AI research digests automatically
Input
| Field | Type | Default | Description |
|---|---|---|---|
| startDate | string | (none) | Fetch papers from this date (YYYY-MM-DD). Leave empty for today's trending papers. |
| endDate | string | (none) | Fetch papers up to this date (YYYY-MM-DD). Defaults to startDate. |
| maxPapers | integer | 50 | Maximum number of papers to scrape (1–500). |
| includeFullDetail | boolean | true | Fetch each paper's detail page for abstract, AI summary, keywords, and upvotes. |
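The constraints in the table can be checked before a run is started. The following is a minimal sketch, not part of the actor itself: `build_input` and `_check_date` are hypothetical helper names, and only the field names, defaults, and the 1–500 limit come from the table above.

```python
import re
from datetime import date
from typing import Any, Dict, Optional

_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def _check_date(name: str, value: str) -> None:
    # Enforce the YYYY-MM-DD shape, then let fromisoformat reject impossible dates.
    if not _DATE_RE.fullmatch(value):
        raise ValueError(f"{name} must be YYYY-MM-DD, got {value!r}")
    date.fromisoformat(value)


def build_input(
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
    max_papers: int = 50,
    include_full_detail: bool = True,
) -> Dict[str, Any]:
    """Build an input object using the field names and defaults from the table."""
    if not 1 <= max_papers <= 500:
        raise ValueError("maxPapers must be between 1 and 500")
    run_input: Dict[str, Any] = {
        "maxPapers": max_papers,
        "includeFullDetail": include_full_detail,
    }
    # Date fields are omitted when empty, which the table describes as
    # "today's trending papers".
    if start_date:
        _check_date("startDate", start_date)
        run_input["startDate"] = start_date
    if end_date:
        _check_date("endDate", end_date)
        run_input["endDate"] = end_date
    return run_input
```

Calling `build_input()` with no arguments yields the documented defaults; out-of-range or malformed values raise `ValueError` before anything is sent.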
Example inputs
Today's trending papers (fast mode):
{
  "maxPapers": 50,
  "includeFullDetail": false
}
Full detail for a specific date:
{
  "startDate": "2026-03-20",
  "includeFullDetail": true
}
Date range:
{
  "startDate": "2026-03-18",
  "endDate": "2026-03-20",
  "maxPapers": 100
}
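To make the date-range semantics concrete, here is a small sketch of how a `startDate`/`endDate` pair might expand into individual days. It assumes the end date is inclusive and that a missing `endDate` falls back to `startDate`, as the input table states; `dates_in_range` is a hypothetical helper, and the actor's actual logic may differ.

```python
from datetime import date, timedelta
from typing import List, Optional


def dates_in_range(start_date: str, end_date: Optional[str] = None) -> List[str]:
    """Return every day from start_date to end_date (assumed inclusive) as YYYY-MM-DD."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date) if end_date else start
    if end < start:
        raise ValueError("endDate must not be earlier than startDate")
    span = (end - start).days
    return [(start + timedelta(days=offset)).isoformat() for offset in range(span + 1)]


print(dates_in_range("2026-03-18", "2026-03-20"))
# ['2026-03-18', '2026-03-19', '2026-03-20']
```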
Output
Each paper is saved as a dataset item with the following fields:
{
  "id": "2603.19235",
  "title": "Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding",
  "publishedAt": "2026-03-19T17:59:58.000Z",
  "summary": "While Multimodal Large Language Models demonstrate impressive semantic capabilities...",
  "upvotes": 77,
  "githubRepo": "https://github.com/H-EmbodVis/VEGA-3D",
  "githubStars": 109,
  "authors": ["Xianjin Wu", "Dingkang Liang", "Tianrui Feng"],
  "arxivUrl": "https://arxiv.org/abs/2603.19235",
  "paperUrl": "https://huggingface.co/papers/2603.19235",
  "aiSummary": "A video diffusion model is repurposed as a latent world simulator...",
  "aiKeywords": ["multimodal large language models", "3D structural priors", "video diffusion model"],
  "scrapedAt": "2026-03-22T01:59:38.919Z"
}
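Once the items are downloaded, they are plain dictionaries with the fields shown above. As an illustration of downstream use, the sketch below ranks papers that have a GitHub repository by upvotes. `top_papers_with_code` is a hypothetical helper, and treating a missing or null `upvotes` as 0 is an assumption, since fast mode may not populate that field.

```python
from typing import Any, Dict, Iterable, List


def top_papers_with_code(
    items: Iterable[Dict[str, Any]],
    min_upvotes: int = 0,
    limit: int = 10,
) -> List[Dict[str, Any]]:
    """Return papers that link a GitHub repo, most upvoted first."""

    def upvotes(paper: Dict[str, Any]) -> int:
        # Assumption: a missing or null upvote count is treated as 0.
        return paper.get("upvotes") or 0

    with_code = [
        paper
        for paper in items
        if paper.get("githubRepo") and upvotes(paper) >= min_upvotes
    ]
    return sorted(with_code, key=upvotes, reverse=True)[:limit]
```

The same pattern extends to filtering on `aiKeywords` or `githubStars` for trend monitoring.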
Features
- No bot protection issues — HuggingFace serves clean HTML with no Cloudflare or CAPTCHA
- Structured JSON extraction — data parsed directly from Svelte hydration payloads for reliability
- Deduplication — papers are deduplicated across date ranges
- Graceful error handling — individual paper failures are logged and skipped without stopping the run
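The extraction, deduplication, and error-handling features above can be pictured with the following sketch. It is an illustration under stated assumptions, not the actor's code: it assumes the hydration payload is JSON stored in a `data-props` attribute and that the paper list sits under a `"papers"` key (both hypothetical here), and it skips failures per page rather than per paper.

```python
import json
import logging
from html.parser import HTMLParser
from typing import Any, Dict, Iterable, List

logger = logging.getLogger("papers")


class _PropsCollector(HTMLParser):
    """Collects the raw value of every data-props attribute (assumed payload location)."""

    def __init__(self) -> None:
        super().__init__()
        self.payloads: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "data-props" and value:
                # HTMLParser has already unescaped entities such as &quot;.
                self.payloads.append(value)


def extract_papers(page_html: str) -> List[Dict[str, Any]]:
    """Parse one list page; raises ValueError if a payload is not valid JSON."""
    collector = _PropsCollector()
    collector.feed(page_html)
    papers: List[Dict[str, Any]] = []
    for raw in collector.payloads:
        props = json.loads(raw)
        if isinstance(props, dict):
            papers.extend(props.get("papers", []))  # "papers" is a hypothetical key
    return papers


def collect_unique(pages: Iterable[str]) -> List[Dict[str, Any]]:
    """Merge several list pages, keeping the first occurrence of each paper id."""
    unique: Dict[str, Dict[str, Any]] = {}
    for index, page_html in enumerate(pages):
        try:
            papers = extract_papers(page_html)
        except ValueError as exc:
            # Log and continue so one bad page does not stop the run.
            logger.warning("skipping page %d: %s", index, exc)
            continue
        for paper in papers:
            if isinstance(paper, dict) and paper.get("id") is not None:
                unique.setdefault(paper["id"], paper)
    return list(unique.values())
```

Keying on `id` means a paper that appears on two dates in a range is stored once, in the order it was first seen.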
Notes
- `includeFullDetail: false` is significantly faster (1 list page vs. 1 list page + N detail pages)
- HuggingFace typically publishes 20–50 papers per day
- Papers older than ~2 weeks may not appear on the date archive pages
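The speed difference between the two modes follows from the page counts in the first note. A rough sketch of that arithmetic is below; `estimate_requests` is a hypothetical helper, and the assumption that a date range costs one list page per day is not stated in the source.

```python
def estimate_requests(num_dates: int, num_papers: int, include_full_detail: bool) -> int:
    """Estimate page fetches: one list page per date, plus one detail page per paper."""
    if num_dates < 1 or num_papers < 0:
        raise ValueError("num_dates must be >= 1 and num_papers >= 0")
    list_pages = num_dates  # assumption: one list page per day in the range
    detail_pages = num_papers if include_full_detail else 0
    return list_pages + detail_pages


print(estimate_requests(1, 50, include_full_detail=False))  # 1
print(estimate_requests(1, 50, include_full_detail=True))   # 51
```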
Support
Found a bug or have a feature request? Leave a review or reach out via the Apify platform.