HuggingFace Daily Papers Scraper

Pricing

from $2.00 / 1,000 papers scraped

Scrapes AI/ML research papers from HuggingFace Daily Papers (huggingface.co/papers). Extracts title, authors, abstract, GitHub repo, star count, upvotes, AI summary, and keywords.

Rating

0.0 (0)

Developer

tzmyk (maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 3 days ago

Scrape AI/ML research papers from HuggingFace Daily Papers — the go-to source for trending research in the AI community.

Extracts structured data including titles, authors, abstracts, GitHub repositories, star counts, upvotes, and AI-generated summaries and keywords.

What it does

  • Scrapes today's trending papers or papers from a specific date range
  • Extracts full abstracts, GitHub repo URLs, star counts, upvote counts
  • Includes HuggingFace's AI-generated summary and keywords for each paper
  • Supports both fast mode (list-only) and full detail mode (with abstract + AI metadata)

Use cases

  • RAG / LLM data pipelines — feed fresh research papers into your vector database daily
  • AI trend monitoring — track which topics are trending in the research community
  • Competitive intelligence — monitor GitHub repos and star growth of new papers
  • Research assistants — power AI agents with up-to-date academic content
  • Newsletter automation — curate weekly AI research digests automatically

Input

| Field | Type | Default | Description |
|---|---|---|---|
| startDate | string | (none) | Fetch papers from this date (YYYY-MM-DD). Leave empty for today's trending papers. |
| endDate | string | (none) | Fetch papers up to this date. Defaults to startDate. |
| maxPapers | integer | 50 | Maximum number of papers to scrape (1–500). |
| includeFullDetail | boolean | true | Fetch each paper's detail page for abstract, AI summary, keywords, and upvotes. |
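The constraints in the input table can be checked before a run is started. The following is a minimal sketch, not part of the actor: the `ScraperInput` class and its methods are hypothetical names, and rejecting a range whose end precedes its start is an assumption about what callers want rather than documented actor behavior.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


def _parse_day(value: str):
    """Parse a YYYY-MM-DD string; raises ValueError on malformed input."""
    return datetime.strptime(value, "%Y-%m-%d").date()


@dataclass
class ScraperInput:
    # Field names and defaults follow the input table; the class is hypothetical.
    startDate: Optional[str] = None
    endDate: Optional[str] = None
    maxPapers: int = 50
    includeFullDetail: bool = True

    def validate(self) -> None:
        if not 1 <= self.maxPapers <= 500:
            raise ValueError("maxPapers must be between 1 and 500")
        start = _parse_day(self.startDate) if self.startDate else None
        end = _parse_day(self.endDate) if self.endDate else start
        # Assumption: a range that ends before it starts is a caller mistake.
        if start is not None and end is not None and end < start:
            raise ValueError("endDate must not be earlier than startDate")

    def to_run_input(self) -> dict:
        """Build a JSON-ready dict, omitting date fields that were not set."""
        data = {
            "maxPapers": self.maxPapers,
            "includeFullDetail": self.includeFullDetail,
        }
        if self.startDate:
            data["startDate"] = self.startDate
        if self.endDate:
            data["endDate"] = self.endDate
        return data
```

Calling `validate()` first and then passing `to_run_input()` as the run input keeps malformed dates and out-of-range limits from consuming a paid run.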

Example inputs

Today's trending papers (fast mode):

{
  "maxPapers": 50,
  "includeFullDetail": false
}

Full detail for a specific date:

{
  "startDate": "2026-03-20",
  "includeFullDetail": true
}

Date range:

{
  "startDate": "2026-03-18",
  "endDate": "2026-03-20",
  "maxPapers": 100
}
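Inputs like these can also be submitted programmatically. The sketch below assumes the official `apify-client` Python package is installed and that you have an Apify API token; `run_scraper` is a hypothetical helper, and the actor ID is left as a parameter because it does not appear on this page (copy it from the actor's Store page).

```python
DATE_RANGE_INPUT = {
    "startDate": "2026-03-18",
    "endDate": "2026-03-20",
    "maxPapers": 100,
}


def run_scraper(run_input: dict, actor_id: str, token: str) -> list:
    """Start the actor, wait for it to finish, and return its dataset items."""
    # Third-party dependency, imported lazily: pip install apify-client
    from apify_client import ApifyClient

    client = ApifyClient(token)
    # call() blocks until the run finishes and returns the run record.
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return a run record")
    # Each dataset item is one paper, shaped as described under Output.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Example call (requires network access and a valid token):
#   papers = run_scraper(DATE_RANGE_INPUT, "<username>/<actor-name>", "<APIFY_TOKEN>")
```

Keeping the token out of source code, for example by reading it from an environment variable, is the usual practice.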

Output

Each paper is saved as a dataset item with the following fields:

{
  "id": "2603.19235",
  "title": "Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding",
  "publishedAt": "2026-03-19T17:59:58.000Z",
  "summary": "While Multimodal Large Language Models demonstrate impressive semantic capabilities...",
  "upvotes": 77,
  "githubRepo": "https://github.com/H-EmbodVis/VEGA-3D",
  "githubStars": 109,
  "authors": ["Xianjin Wu", "Dingkang Liang", "Tianrui Feng"],
  "arxivUrl": "https://arxiv.org/abs/2603.19235",
  "paperUrl": "https://huggingface.co/papers/2603.19235",
  "aiSummary": "A video diffusion model is repurposed as a latent world simulator...",
  "aiKeywords": ["multimodal large language models", "3D structural priors", "video diffusion model"],
  "scrapedAt": "2026-03-22T01:59:38.919Z"
}
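As an illustration of consuming these items, the sketch below keeps papers that have a GitHub repository and ranks them by upvotes, which fits the trend-monitoring and newsletter use cases. The function names are hypothetical, and the code assumes that fields not collected in a given run are either absent or null.

```python
def top_papers_with_code(items, min_upvotes=0, limit=10):
    """Return papers that link a GitHub repo, sorted by upvotes (highest first)."""
    with_code = [
        paper for paper in items
        if paper.get("githubRepo") and (paper.get("upvotes") or 0) >= min_upvotes
    ]
    ranked = sorted(with_code, key=lambda p: p.get("upvotes") or 0, reverse=True)
    return ranked[:limit]


def to_digest_line(paper):
    """Format one paper as a single plain-text line for a digest."""
    upvotes = paper.get("upvotes") or 0
    return f'{paper["title"]} ({upvotes} upvotes) {paper["paperUrl"]}'
```

Treating a missing `upvotes` value as 0 keeps the ranking usable for items that lack it, at the cost of pushing those papers to the bottom.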

Features

  • No bot protection issues — HuggingFace serves clean HTML with no Cloudflare or CAPTCHA
  • Structured JSON extraction — data parsed directly from Svelte hydration payloads for reliability
  • Deduplication — papers are deduplicated across date ranges
  • Graceful error handling — individual paper failures are logged and skipped without stopping the run
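The extraction, deduplication, and skip-on-failure behaviors listed above might look roughly like the following. This is a sketch under stated assumptions, not the actor's code: it assumes the hydration payload is HTML-escaped JSON stored in a `data-props` attribute, and both function names are hypothetical.

```python
import html
import json
import logging
import re

logger = logging.getLogger(__name__)

# Assumption: payloads live in data-props="..." with inner quotes escaped as &quot;.
DATA_PROPS_RE = re.compile(r'data-props="([^"]*)"')


def extract_payloads(page_html: str) -> list:
    """Decode every data-props attribute as JSON, skipping malformed ones."""
    payloads = []
    for raw in DATA_PROPS_RE.findall(page_html):
        try:
            payloads.append(json.loads(html.unescape(raw)))
        except json.JSONDecodeError as exc:
            # Log and continue so one bad payload does not stop the run.
            logger.warning("Skipping unparseable payload: %s", exc)
    return payloads


def dedupe_by_id(papers) -> list:
    """Keep the first occurrence of each paper id; drop items without an id."""
    seen = set()
    unique = []
    for paper in papers:
        paper_id = paper.get("id")
        if paper_id is None or paper_id in seen:
            continue
        seen.add(paper_id)
        unique.append(paper)
    return unique
```

Deduplicating on `id` matters for date ranges because a paper that stays on the list for several days would otherwise be emitted once per day.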

Notes

  • Setting includeFullDetail to false is significantly faster (1 list page vs. 1 list page + N detail pages), but the abstract, AI summary, keywords, and upvotes come from the detail pages
  • HuggingFace typically publishes 20–50 papers per day
  • Papers older than ~2 weeks may not appear on the date archive pages
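The speed difference between the two modes comes down to the number of pages fetched. A rough estimate, assuming one list page per day in the requested range (the notes above only state the single-day case) and using a hypothetical helper name:

```python
def estimate_requests(num_days: int, num_papers: int, include_full_detail: bool) -> int:
    """Estimate page fetches: list pages plus one detail page per paper."""
    if num_days < 1 or num_papers < 0:
        raise ValueError("num_days must be >= 1 and num_papers must be >= 0")
    list_pages = num_days  # assumption: one list page per day
    detail_pages = num_papers if include_full_detail else 0
    return list_pages + detail_pages
```

For a single day with 50 papers, this gives 1 request in fast mode and 51 in full detail mode.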

Support

Found a bug or have a feature request? Leave a review or reach out via the Apify platform.