Hugging Face Papers Scraper

Pricing

from $9.00 / 1,000 results

Scrape AI and machine learning research papers from Hugging Face Papers. Get titles, abstracts, authors with affiliations, upvotes, publication dates, arXiv IDs, and community discussion counts. Search by keyword or browse daily papers.

Developer

ParseForge

Maintained by Community

📄 Hugging Face Papers Scraper

🚀 Scrape trending and keyword-searched AI/ML papers from Hugging Face with titles, abstracts, authors, upvotes, arXiv IDs, and GitHub repos. Returns structured data in seconds.

🕒 Last updated: 2026-04-17

Every day, Hugging Face Papers surfaces the most discussed machine learning research with community upvotes, author profiles, and links to code repositories. This Actor pulls that curated feed or runs keyword searches across the entire index, returning structured records with titles, abstracts, arXiv identifiers, author details, GitHub links, project pages, AI-generated keywords, and community engagement metrics.

Whether you run an AI newsletter, track a research subfield for your lab, or want to spot emerging trends before they go mainstream, this scraper saves you hours of manual browsing. Set it on a daily schedule and let it build a living archive of the papers that matter to your work.

Target: Hugging Face Papers
Use cases: Research newsletters, literature reviews, ML trend tracking, academic monitoring

📋 What it does

  • 📚 Paper metadata. Titles, abstracts, arXiv IDs, publication dates, and direct Hugging Face URLs for every paper.
  • 👥 Author details. Full author lists with Hugging Face usernames and verification status included.
  • ⭐ Community engagement. Upvote counts, comment totals, and thumbnails so you can gauge which papers resonate.
  • 💻 Code and project links. GitHub repository URLs and project pages when authors have linked them.
  • 🔍 Two collection modes. Search by keyword across indexed papers or grab today's trending daily feed.

Each record includes the arXiv ID, paper title, abstract, publication date, full author list with HF handles, upvote and comment counts, thumbnail image, GitHub repo link, project page, and AI-generated keywords.

💡 Why it matters: Manually checking Hugging Face Papers every day and copying metadata into a spreadsheet takes 30+ minutes. This Actor does it in seconds and delivers a clean, structured dataset ready for analysis.


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.


⚙️ Input

| Input | Type | Default | Behavior |
| --- | --- | --- | --- |
| searchQuery | string | "transformer" | Keyword to match against paper titles and abstracts. Examples: "diffusion model", "LLM", "reinforcement learning". |
| mode | string | "search" | Collection mode. Use "search" for keyword search or "trending" for the daily curated feed. |
| maxItems | integer | 10 | Maximum number of papers to return. Free users are limited to 10. Paid users can request up to 1,000,000. |

Example: Search for diffusion model papers.

{
  "searchQuery": "diffusion model",
  "mode": "search",
  "maxItems": 50
}

Example: Grab today's trending papers.

{
  "mode": "trending",
  "maxItems": 25
}

⚠️ Good to Know: Hugging Face Papers indexes new publications daily. Trending mode returns papers curated by the HF team and community for the current day. Search mode queries across all indexed papers. Results are limited by what the Hugging Face API exposes.
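The input rules above can be checked before a run is started. The sketch below is one possible helper, not part of the Actor: `build_input` is a hypothetical name, the lower bound of 1 for `maxItems` is an assumption, and omitting `searchQuery` in trending mode simply mirrors the trending example rather than documented behavior.

```python
from typing import Any, Dict, Optional

VALID_MODES = ("search", "trending")   # the two modes listed in the input table
MAX_ITEMS_LIMIT = 1_000_000            # upper bound stated for paid users


def build_input(mode: str = "search",
                search_query: Optional[str] = "transformer",
                max_items: int = 10) -> Dict[str, Any]:
    """Build an input object shaped like the JSON examples above."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    # Assumption: at least one item must be requested.
    if not 1 <= max_items <= MAX_ITEMS_LIMIT:
        raise ValueError(f"maxItems must be between 1 and {MAX_ITEMS_LIMIT}")

    run_input: Dict[str, Any] = {"mode": mode, "maxItems": max_items}
    if mode == "search":
        if not search_query or not search_query.strip():
            raise ValueError("search mode needs a non-empty searchQuery")
        run_input["searchQuery"] = search_query.strip()
    # Assumption: trending mode carries no searchQuery, as in the example.
    return run_input
```

Calling `build_input("trending", None, 25)` reproduces the trending example, and an unknown mode fails early instead of starting a run.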


📊 Output

Each record contains 15+ fields. Download as CSV, Excel, JSON, or XML.

🧾 Schema

| Field | Type | Example |
| --- | --- | --- |
| 🆔 arxivId | string | "2404.12345" |
| 📋 title | string | "Efficient Attention for Long-Context Language Models" |
| 🔗 url | string | "https://huggingface.co/papers/2404.12345" |
| 🔗 arxivUrl | string | "https://arxiv.org/abs/2404.12345" |
| 📅 publishedAt | string | "2026-04-09" |
| ⬆️ upvotes | integer | 187 |
| 💬 numComments | integer | 12 |
| 👤 firstAuthor | string | "Jane Smith" |
| 👥 authors | array | [{"name": "Jane Smith", "hfUser": "jsmith", "verified": true}] |
| 📝 summary | string | "We introduce a novel attention mechanism..." |
| 💻 githubRepo | string | "https://github.com/example/long-attention" |
| 🌐 projectPage | string | "https://example.github.io/long-attention" |
| 🏷️ aiKeywords | array | ["attention", "long-context", "efficiency"] |
| 🖼️ thumbnail | string | "https://cdn-thumbnails.huggingface.co/..." |
| 🕐 scrapedAt | string | "2026-04-10T12:00:00.000Z" |
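If you load the exported JSON into Python, a small typed wrapper can make the schema easier to work with. The following is a sketch under assumptions: the class and function names are hypothetical, only a subset of the fields is modeled, and treating `authors`, `aiKeywords`, `githubRepo`, and `scrapedAt` as optional is a guess rather than documented behavior.

```python
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import Any, Dict, List, Optional


@dataclass
class Author:
    name: str
    hf_user: Optional[str] = None
    verified: bool = False


@dataclass
class Paper:
    arxiv_id: str
    title: str
    published_at: date
    upvotes: int
    num_comments: int
    authors: List[Author] = field(default_factory=list)
    ai_keywords: List[str] = field(default_factory=list)
    github_repo: Optional[str] = None
    scraped_at: Optional[datetime] = None


def parse_paper(record: Dict[str, Any]) -> Paper:
    """Convert one dataset record (a dict) into a Paper."""
    scraped = record.get("scrapedAt")
    return Paper(
        arxiv_id=record["arxivId"],
        title=record["title"],
        # publishedAt uses the YYYY-MM-DD form shown in the schema.
        published_at=date.fromisoformat(record["publishedAt"]),
        upvotes=int(record.get("upvotes") or 0),
        num_comments=int(record.get("numComments") or 0),
        authors=[
            Author(a["name"], a.get("hfUser"), bool(a.get("verified")))
            for a in record.get("authors") or []
        ],
        ai_keywords=list(record.get("aiKeywords") or []),
        github_repo=record.get("githubRepo"),
        # Swap the trailing "Z" for an explicit UTC offset so older
        # Python versions can parse the timestamp.
        scraped_at=(
            datetime.fromisoformat(scraped.replace("Z", "+00:00"))
            if scraped else None
        ),
    )
```

Parsing dates up front means later steps, such as sorting by publication date, work on real `date` objects rather than strings.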

✨ Why choose this Actor

  • 📚 Two collection modes. Search by keyword or pull the daily trending feed.
  • ⚡ Fast results. Papers arrive in seconds, compared with minutes of manual browsing.
  • 👥 Author metadata. Hugging Face usernames and verification status for every author.
  • 💻 Code links included. GitHub repos and project pages extracted automatically.
  • 🏷️ AI keywords. Machine-generated topic tags for easier filtering and categorization.
  • 📅 Schedule-ready. Set it on a daily cron to build a rolling archive of ML research.
  • 📊 Multiple export formats. Download results as CSV, Excel, JSON, or XML.

Hugging Face Papers features hundreds of new AI/ML papers every week, curated by the community and surfaced through upvotes. Staying current manually is a full-time job.


📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Setup |
| --- | --- | --- | --- | --- |
| ⭐ Hugging Face Papers Scraper (this Actor) | $5 free credit, then pay-per-use | All HF indexed papers | Live per run | ⚡ 2 min |
| Manual browsing | Free | Limited by time | Manual daily checks | 🕐 30 min/day |
| Official API integration | Free | Full access | Per request | 🔧 1-2 hours |
| Third-party data providers | $50-500/mo | Varies | Weekly or monthly | 📋 30 min |

Pick this Actor when you want structured, schedule-ready paper data without writing API integration code yourself.


🚀 How to use

  1. 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. 🌐 Open the Actor. Go to the Hugging Face Papers Scraper page on the Apify Store.
  3. 🎯 Set input. Choose a keyword and mode (search or trending), then set your max items.
  4. 🚀 Run it. Click Start and let the Actor collect your data.
  5. 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from sign-up to downloaded dataset: 3-5 minutes. No coding required.


💼 Business use cases

📬 Research Newsletters

  • Auto-curate weekly digests of trending ML papers
  • Filter by keyword to match your audience's interests
  • Include upvote counts to highlight community favorites
  • Link directly to arXiv and GitHub repos
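As an illustration of the newsletter workflow above, the sketch below ranks downloaded records by upvotes and formats the top entries as plain-text digest lines. `build_digest` is a hypothetical helper, and it assumes each record carries the `title`, `upvotes`, `arxivUrl`, and (optionally) `githubRepo` fields from the schema.

```python
from typing import Any, Dict, Iterable, List


def build_digest(records: Iterable[Dict[str, Any]], top_n: int = 5) -> List[str]:
    """Return one formatted line per paper, highest upvotes first."""
    ranked = sorted(records, key=lambda r: r.get("upvotes") or 0, reverse=True)
    lines = []
    for rank, rec in enumerate(ranked[:top_n], start=1):
        line = f"{rank}. {rec['title']} ({rec.get('upvotes') or 0} upvotes) {rec['arxivUrl']}"
        # Only mention code when the authors linked a repository.
        if rec.get("githubRepo"):
            line += f" | code: {rec['githubRepo']}"
        lines.append(line)
    return lines
```

The returned lines can be joined with newlines and pasted into an email template or a Slack message.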

🧠 Academic Research

  • Monitor new publications in your subfield daily
  • Build literature review datasets without manual searches
  • Track author output and collaboration patterns
  • Export to spreadsheets for bibliometric analysis

📊 Trend Analysis

  • Track which ML topics gain upvotes over time
  • Spot emerging research areas before they peak
  • Compare engagement across diffusion, LLM, and RL papers
  • Build time-series datasets of publication volume
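One way to compare engagement across topics, as described above, is to group records by their `aiKeywords` tags and aggregate upvotes per tag. This is a minimal sketch with a hypothetical function name. It assumes tags are comparable after lowercasing, which may not hold if the generated keywords vary in wording.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def engagement_by_keyword(records: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, float]]:
    """Aggregate paper count and upvotes for each aiKeywords tag."""
    totals = defaultdict(lambda: [0, 0])  # tag -> [paper count, upvote sum]
    for rec in records:
        # A set avoids double-counting a tag repeated within one record.
        for tag in {k.lower() for k in rec.get("aiKeywords") or []}:
            totals[tag][0] += 1
            totals[tag][1] += rec.get("upvotes") or 0
    return {
        tag: {"papers": count, "upvotes": upvotes, "mean_upvotes": upvotes / count}
        for tag, (count, upvotes) in totals.items()
    }
```

Running this on each day's dataset and storing the result alongside `scrapedAt` would give a simple time series per tag.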

💼 Talent Scouting

  • Identify active researchers by watching trending authors
  • Find engineers who open-source their paper code
  • Monitor verified Hugging Face contributors
  • Build prospect lists for recruiting outreach

🔌 Automating Hugging Face Papers Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

  • 🟢 Node.js. Install the apify-client NPM package.
  • 🐍 Python. Use the apify-client PyPI package.
  • 📚 See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor on any cron interval. Set a daily run in trending mode and never miss the papers the community is talking about.
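A programmatic run with the Python `apify-client` package might look like the sketch below. It follows the client's usual pattern of calling an Actor and then reading its default dataset. The Actor ID is a placeholder (this page does not state it, so copy the real one from the Actor's Store page), the `APIFY_TOKEN` environment variable name is an assumption, and `collect_papers` and `main` are hypothetical helper names.

```python
from typing import Any, Dict, List

# Placeholder, not a real ID: replace with the ID shown on the Actor's page.
ACTOR_ID = "<username>/<actor-name>"


def collect_papers(client: Any,
                   run_input: Dict[str, Any],
                   actor_id: str = ACTOR_ID) -> List[Dict[str, Any]]:
    """Start the Actor, wait for it to finish, and return its dataset items.

    `client` is expected to be an apify_client.ApifyClient instance.
    """
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("The Actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


def main() -> None:
    import os
    from apify_client import ApifyClient  # pip install apify-client

    # Assumption: the API token is stored in the APIFY_TOKEN variable.
    client = ApifyClient(os.environ["APIFY_TOKEN"])
    for paper in collect_papers(client, {"mode": "trending", "maxItems": 25}):
        print(paper.get("upvotes"), paper.get("title"))
```

Passing the client in as an argument keeps `collect_papers` easy to test with a stub. Call `main()` from a script or a scheduler once the placeholder ID and token are set.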


🔌 Integrate with any app

Hugging Face Papers Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get run notifications
  • Airbyte - Pipe data into your warehouse
  • GitHub - Trigger runs from commits
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes.


💡 Pro Tip: browse the complete ParseForge collection for more data scrapers and tools.


🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by Hugging Face or arXiv. All trademarks mentioned are the property of their respective owners. Only publicly available data is collected.