Hugging Face Papers Scraper

Pricing

from $9.00 / 1,000 results

Scrape AI and machine learning research papers from Hugging Face Papers. Get titles, abstracts, authors with affiliations, upvotes, publication dates, arXiv IDs, and community discussion counts. Search by keyword or browse daily papers.

Rating

0.0 (0)

Developer

ParseForge (Maintained by Community)

Actor stats

1 bookmarked, 4 total users, 3 monthly active users; last modified 15 hours ago

📄 Hugging Face Papers Scraper

🚀 Scrape trending and keyword-searched AI/ML papers from Hugging Face with titles, abstracts, authors, upvotes, arXiv IDs, and GitHub repos. Returns structured data in seconds.

🕒 Last updated: 2026-04-23

Every day, Hugging Face Papers surfaces the most discussed machine learning research with community upvotes, author profiles, and links to code repositories. This Actor pulls that curated feed or runs keyword searches across the entire index, returning structured records with titles, abstracts, arXiv identifiers, author details, GitHub links, project pages, AI-generated keywords, and community engagement metrics.

Whether you run an AI newsletter, track a research subfield for your lab, or want to spot emerging trends before they go mainstream, this scraper saves you hours of manual browsing. Set it on a daily schedule and let it build a living archive of the papers that matter to your work.

Target: Hugging Face Papers
Use Cases: Research newsletters, literature reviews, ML trend tracking, academic monitoring

📋 What it does

  • 📚 Paper metadata. Titles, abstracts, arXiv IDs, publication dates, and direct Hugging Face URLs for every paper.
  • 👥 Author details. Full author lists with Hugging Face usernames and verification status included.
  • ⭐ Community engagement. Upvote counts, comment totals, and thumbnails so you can gauge which papers resonate.
  • 💻 Code and project links. GitHub repository URLs and project pages when authors have linked them.
  • 🔍 Two collection modes. Search by keyword across indexed papers or grab today's trending daily feed.

Each record includes the arXiv ID, paper title, abstract, publication date, full author list with HF handles, upvote and comment counts, thumbnail image, GitHub repo link, project page, and AI-generated keywords.

💡 Why it matters: Manually checking Hugging Face Papers every day and copying metadata into a spreadsheet takes 30+ minutes. This Actor does it in seconds and delivers a clean, structured dataset ready for analysis.


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.


⚙️ Input

| Input | Type | Default | Behavior |
|---|---|---|---|
| searchQuery | string | "transformer" | Keyword to match against paper titles and abstracts. Examples: "diffusion model", "LLM", "reinforcement learning". |
| mode | string | "search" | Collection mode. Use "search" for keyword search or "trending" for the daily curated feed. |
| maxItems | integer | 10 | Maximum number of papers to return. Free users are limited to 10. Paid users can request up to 1,000,000. |

Example: Search for diffusion model papers.

{
  "searchQuery": "diffusion model",
  "mode": "search",
  "maxItems": 50
}

Example: Grab today's trending papers.

{
  "mode": "trending",
  "maxItems": 25
}
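
The same inputs can be assembled and sanity-checked in code before a run. The sketch below is a hypothetical helper (`build_run_input` is not part of the Actor); it only mirrors the documented field names, modes, and the 1,000,000 item ceiling. Requiring an explicit keyword in search mode is an assumption made here, since the Actor itself documents a default of "transformer".

```python
from typing import Optional

VALID_MODES = {"search", "trending"}
MAX_ITEMS_LIMIT = 1_000_000  # documented upper bound for paid users


def build_run_input(mode: str = "search",
                    search_query: Optional[str] = None,
                    max_items: int = 10) -> dict:
    """Build an input object using the documented field names."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    if not 1 <= max_items <= MAX_ITEMS_LIMIT:
        raise ValueError(f"maxItems must be between 1 and {MAX_ITEMS_LIMIT}")
    run_input = {"mode": mode, "maxItems": max_items}
    if mode == "search":
        # Assumption: insist on an explicit keyword instead of relying on the default.
        if not search_query or not search_query.strip():
            raise ValueError("searchQuery is required in search mode")
        run_input["searchQuery"] = search_query.strip()
    return run_input


print(build_run_input("search", "diffusion model", 50))
print(build_run_input("trending", max_items=25))
```

Validating locally catches a mistyped mode or an out-of-range item count before any credit is spent on a run.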

⚠️ Good to Know: Hugging Face Papers indexes new publications daily. Trending mode returns papers curated by the HF team and community for the current day. Search mode queries across all indexed papers. Results are limited by what the Hugging Face API exposes.


📊 Output

Each record contains 15+ fields. Download as CSV, Excel, JSON, or XML.

🧾 Schema

| Field | Type | Example |
|---|---|---|
| 🆔 arxivId | string | "2404.12345" |
| 📋 title | string | "Efficient Attention for Long-Context Language Models" |
| 🔗 url | string | "https://huggingface.co/papers/2404.12345" |
| 🔗 arxivUrl | string | "https://arxiv.org/abs/2404.12345" |
| 📅 publishedAt | string | "2026-04-09" |
| ⬆️ upvotes | integer | 187 |
| 💬 numComments | integer | 12 |
| 👤 firstAuthor | string | "Jane Smith" |
| 👥 authors | array | [{"name": "Jane Smith", "hfUser": "jsmith", "verified": true}] |
| 📝 summary | string | "We introduce a novel attention mechanism..." |
| 💻 githubRepo | string | "https://github.com/example/long-attention" |
| 🌐 projectPage | string | "https://example.github.io/long-attention" |
| 🏷️ aiKeywords | array | ["attention", "long-context", "efficiency"] |
| 🖼️ thumbnail | string | "https://cdn-thumbnails.huggingface.co/..." |
| 🕐 scrapedAt | string | "2026-04-10T12:00:00.000Z" |
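
For downstream code it can help to load each record into a typed structure. The sketch below is one possible mapping of the schema above onto a Python dataclass. The `Paper` class is hypothetical, it covers only a subset of the fields, and treating `githubRepo`, `projectPage`, and `aiKeywords` as optional is an assumption based on the note that links appear only when authors have added them.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Paper:
    arxiv_id: str
    title: str
    url: str
    published_at: date
    upvotes: int = 0
    num_comments: int = 0
    authors: List[dict] = field(default_factory=list)
    summary: str = ""
    github_repo: Optional[str] = None
    project_page: Optional[str] = None
    ai_keywords: List[str] = field(default_factory=list)

    @classmethod
    def from_record(cls, record: dict) -> "Paper":
        return cls(
            arxiv_id=record["arxivId"],
            title=record["title"],
            url=record["url"],
            # The schema example is a plain date; slicing also tolerates a full timestamp.
            published_at=date.fromisoformat(record["publishedAt"][:10]),
            upvotes=int(record.get("upvotes") or 0),
            num_comments=int(record.get("numComments") or 0),
            authors=list(record.get("authors") or []),
            summary=record.get("summary") or "",
            github_repo=record.get("githubRepo") or None,
            project_page=record.get("projectPage") or None,
            ai_keywords=list(record.get("aiKeywords") or []),
        )


# Values copied from the Example column of the schema table.
example = {
    "arxivId": "2404.12345",
    "title": "Efficient Attention for Long-Context Language Models",
    "url": "https://huggingface.co/papers/2404.12345",
    "publishedAt": "2026-04-09",
    "upvotes": 187,
    "numComments": 12,
    "authors": [{"name": "Jane Smith", "hfUser": "jsmith", "verified": True}],
    "githubRepo": "https://github.com/example/long-attention",
    "aiKeywords": ["attention", "long-context", "efficiency"],
}
paper = Paper.from_record(example)
print(paper.arxiv_id, paper.published_at.isoformat(), paper.upvotes)
```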


✨ Why choose this Actor

| | Capability |
|---|---|
| 📚 | Two collection modes. Search by keyword or pull the daily trending feed. |
| ⚡ | Fast results. Papers arrive in seconds, not minutes of manual browsing. |
| 👥 | Author metadata. Hugging Face usernames and verification status for every author. |
| 💻 | Code links included. GitHub repos and project pages extracted automatically. |
| 🏷️ | AI keywords. Machine-generated topic tags for easier filtering and categorization. |
| 📅 | Schedule-ready. Set it on a daily cron to build a rolling archive of ML research. |
| 📊 | Multiple export formats. Download results as CSV, Excel, JSON, or XML. |

Hugging Face Papers features hundreds of new AI/ML papers every week, curated by the community and surfaced through upvotes. Staying current manually is a full-time job.


📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Setup |
|---|---|---|---|---|
| ⭐ Hugging Face Papers Scraper (this Actor) | $5 free credit, then pay-per-use | All HF indexed papers | Live per run | ⚡ 2 min |
| Manual browsing | Free | Limited by time | Manual daily checks | 🕐 30 min/day |
| Official API integration | Free | Full access | Per request | 🔧 1-2 hours |
| Third-party data providers | $50-500/mo | Varies | Weekly or monthly | 📋 30 min |

Pick this Actor when you want structured, schedule-ready paper data without writing API integration code yourself.


🚀 How to use

  1. 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. 🌐 Open the Actor. Go to the Hugging Face Papers Scraper page on the Apify Store.
  3. 🎯 Set input. Choose a keyword and mode (search or trending), then set your max items.
  4. 🚀 Run it. Click Start and let the Actor collect your data.
  5. 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.


💼 Business use cases

📬 Research Newsletters

  • Auto-curate weekly digests of trending ML papers
  • Filter by keyword to match your audience's interests
  • Include upvote counts to highlight community favorites
  • Link directly to arXiv and GitHub repos

🧠 Academic Research

  • Monitor new publications in your subfield daily
  • Build literature review datasets without manual searches
  • Track author output and collaboration patterns
  • Export to spreadsheets for bibliometric analysis

📊 Trend Analysis

  • Track which ML topics gain upvotes over time
  • Spot emerging research areas before they peak
  • Compare engagement across diffusion, LLM, and RL papers
  • Build time-series datasets of publication volume
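
As a rough illustration of the engagement comparison described in this list, the sketch below sums upvotes per AI-generated keyword across a batch of records. The function name and the sample values are made up for the example; only the `aiKeywords` and `upvotes` field names come from the output schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def upvotes_by_keyword(records: Iterable[dict]) -> List[Tuple[str, int, int]]:
    """Return (keyword, paper_count, total_upvotes), highest total upvotes first."""
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for record in records:
        # Normalise and de-duplicate tags so a paper counts once per keyword.
        keywords = {kw.strip().lower() for kw in record.get("aiKeywords") or [] if kw.strip()}
        for kw in keywords:
            totals[kw][0] += 1
            totals[kw][1] += int(record.get("upvotes") or 0)
    rows = [(kw, count, upvotes) for kw, (count, upvotes) in totals.items()]
    # Sort by upvotes descending, then alphabetically for a stable order.
    return sorted(rows, key=lambda row: (-row[2], row[0]))


# Made-up sample records, for illustration only.
sample = [
    {"aiKeywords": ["attention", "long-context"], "upvotes": 187},
    {"aiKeywords": ["Attention", "diffusion"], "upvotes": 40},
    {"aiKeywords": [], "upvotes": 5},
]
for keyword, count, upvotes in upvotes_by_keyword(sample):
    print(f"{keyword}: {count} papers, {upvotes} upvotes")
```

Running the same aggregation on each scheduled snapshot and storing the rows with the run date would give the time series mentioned above.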

💼 Talent Scouting

  • Identify active researchers by watching trending authors
  • Find engineers who open-source their paper code
  • Monitor verified Hugging Face contributors
  • Build prospect lists for recruiting outreach


🌟 Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

  • Empirical datasets for papers, thesis work, and coursework
  • Longitudinal studies tracking changes across snapshots
  • Reproducible research with cited, versioned data pulls
  • Classroom exercises on data analysis and ethical scraping

🎨 Personal and creative

  • Side projects, portfolio demos, and indie app launches
  • Data visualizations, dashboards, and infographics
  • Content research for bloggers, YouTubers, and podcasters
  • Hobbyist collections and personal trackers

🤝 Non-profit and civic

  • Transparency reporting and accountability projects
  • Advocacy campaigns backed by public-interest data
  • Community-run databases for local issues
  • Investigative journalism on public records

🧪 Experimentation

  • Prototype AI and machine-learning pipelines with real data
  • Validate product-market hypotheses before engineering spend
  • Train small domain-specific models on niche corpora
  • Test dashboard concepts with live input

❓ Frequently Asked Questions

💳 Do I need a paid Apify plan to run this Actor?

No. You can start on the free Apify plan, which includes $5 in free monthly credit. That is enough to run this Actor several times and explore the output before committing to anything. Paid plans unlock higher limits, more concurrent runs, and larger datasets. Create a free Apify account to get started.

🚨 What happens if my run fails or returns no results?

Failed runs are not charged. If the source site changes, proxies get rate-limited, or a specific input matches nothing, re-run the Actor or open our contact form and we will investigate. You can also check the run log in the Apify Console to see why the run stopped.

📏 How many items can I scrape per run?

Free users are limited to 10 items per run so you can preview the output and confirm the Actor works for your use case. Paid users can raise maxItems up to 1,000,000 per run. Upgrade to a paid plan if you need full scale.

🕒 How fresh is the data?

Every run fetches live data at the moment of execution. There is no cache or delay: the records you get reflect what the source returned at that moment. Schedule the Actor to maintain a rolling snapshot of the data you need.

🧑‍💻 Can I call this Actor from my own code?

Yes. Apify exposes every Actor as a REST endpoint and ships official API client libraries for Node.js and Python. You can start a run, read the dataset, and handle webhooks from your own app in a few lines. All you need is your Apify API token.
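
As a rough sketch of such a call with the Python `apify-client` package: the helper below starts a run, waits for it to finish, and reads the resulting dataset. The Actor ID shown is a placeholder assumed for illustration (copy the real one from the Actor's page), and the `APIFY_TOKEN` environment variable name is likewise an assumption.

```python
import os
from typing import Any, Dict, List

# Placeholder ID, assumed for illustration only; copy the real one from the Actor's page.
ACTOR_ID = "parseforge/hugging-face-papers-scraper"


def fetch_papers(client: Any, run_input: Dict[str, Any], actor_id: str = ACTOR_ID) -> List[dict]:
    """Run the Actor, wait for it to finish, and return the dataset items."""
    run = client.actor(actor_id).call(run_input=run_input)
    if not run:
        raise RuntimeError("The Actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


def main() -> None:
    # Imported lazily so fetch_papers can be exercised without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    for paper in fetch_papers(client, {"mode": "trending", "maxItems": 10}):
        print(paper.get("upvotes"), paper.get("title"))
```

Call `main()` from a script once the token is set. Passing the client in as an argument keeps `fetch_papers` easy to test with a stand-in object.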

📤 How do I export the data?

Every Apify dataset can be downloaded in one click from the Apify Console as CSV, JSON, JSONL, Excel, HTML, XML, or RSS. You can also pull results programmatically via the Apify API or stream them into BigQuery, S3, and other destinations through built-in integrations.
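
For programmatic export, the Apify API serves dataset items from `/v2/datasets/{datasetId}/items` and takes a `format` query parameter. The helper below is a hypothetical convenience function that only builds the URL; the dataset ID in the usage line is a placeholder.

```python
from urllib.parse import urlencode

# Formats accepted by the dataset items endpoint ("xlsx" is the Excel export).
EXPORT_FORMATS = {"json", "jsonl", "csv", "xlsx", "html", "xml", "rss"}


def dataset_export_url(dataset_id: str, fmt: str = "csv", limit: int = 0) -> str:
    """Build the download URL for a dataset in the requested format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"Unsupported format {fmt!r}; choose from {sorted(EXPORT_FORMATS)}")
    params = {"format": fmt}
    if limit > 0:
        params["limit"] = str(limit)  # cap the number of items returned
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{urlencode(params)}"


print(dataset_export_url("YOUR_DATASET_ID", "csv", limit=100))
```

When the dataset is not public, send your API token in an `Authorization: Bearer` header rather than embedding it in the URL, so it does not end up in logs.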

📅 Can I schedule the Actor to run automatically?

Yes. Use the Apify scheduler to run the Actor on any cadence, from hourly to monthly. Results are saved to your dataset and can be delivered to webhooks, email, Slack, cloud storage, or automation tools such as Zapier and Make.


🔌 Automating Hugging Face Papers Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

  • 🟢 Node.js. Install the apify-client NPM package.
  • 🐍 Python. Use the apify-client PyPI package.
  • 📚 See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor on any cron interval. Set a daily run in trending mode and never miss the papers the community is talking about.
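
A daily schedule will return some papers more than once, because a paper can stay in the feed or match the same keyword on later days. One way to keep a rolling archive clean is to key records by `arxivId` and keep the most recently scraped copy. The sketch below is a minimal in-memory version; `merge_into_archive` is a hypothetical helper, and persisting the archive (file, database) is left out.

```python
from typing import Dict, Iterable


def merge_into_archive(archive: Dict[str, dict], new_records: Iterable[dict]) -> int:
    """Merge records into the archive in place; return how many arXiv IDs were new."""
    added = 0
    for record in new_records:
        arxiv_id = record.get("arxivId")
        if not arxiv_id:
            continue  # skip records without a usable key
        existing = archive.get(arxiv_id)
        if existing is None:
            added += 1
        # ISO 8601 UTC timestamps in one format compare correctly as plain strings.
        if existing is None or record.get("scrapedAt", "") >= existing.get("scrapedAt", ""):
            archive[arxiv_id] = record
    return added


# Made-up snapshots from two consecutive daily runs.
day_one = [{"arxivId": "2404.12345", "upvotes": 100, "scrapedAt": "2026-04-10T12:00:00.000Z"}]
day_two = [
    {"arxivId": "2404.12345", "upvotes": 187, "scrapedAt": "2026-04-11T12:00:00.000Z"},
    {"arxivId": "2404.99999", "upvotes": 3, "scrapedAt": "2026-04-11T12:00:00.000Z"},
]
archive: Dict[str, dict] = {}
print(merge_into_archive(archive, day_one), merge_into_archive(archive, day_two))
```

Keeping the latest copy means upvote and comment counts reflect the newest snapshot; store every snapshot instead if you want to track how engagement changes over time.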

🔌 Integrate with any app

Hugging Face Papers Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get run notifications
  • Airbyte - Pipe data into your warehouse
  • GitHub - Trigger runs from commits
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes.
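
A receiving service usually needs only the dataset ID of the finished run. The sketch below assumes Apify's default webhook payload, which carries an `eventType` and a `resource` object describing the run; if you customise the payload template, adjust the keys accordingly. The function name and the sample payload are made up for illustration.

```python
import json
from typing import Optional


def dataset_id_from_webhook(body: str) -> Optional[str]:
    """Return the default dataset ID for a succeeded run, or None otherwise."""
    payload = json.loads(body)
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None  # ignore failed, aborted, or timed-out runs
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")


# Trimmed, made-up payload in the assumed default shape.
sample_body = json.dumps({
    "eventType": "ACTOR.RUN.SUCCEEDED",
    "resource": {"id": "run123", "defaultDatasetId": "abc123"},
})
print(dataset_id_from_webhook(sample_body))
```

The returned ID can then be used to read the run's items through the Apify API or a client library.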


💡 Pro Tip: browse the complete ParseForge collection for more data scrapers and tools.


🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by Hugging Face or arXiv. All trademarks mentioned are the property of their respective owners. Only publicly available data is collected.