Hugging Face Papers Scraper
Pricing
from $9.00 / 1,000 results
Scrape AI and machine learning research papers from Hugging Face Papers. Get titles, abstracts, authors with affiliations, upvotes, publication dates, arXiv IDs, and community discussion counts. Search by keyword or browse daily papers.
Developer
ParseForge

Hugging Face Papers Scraper
Scrape trending and keyword-searched AI/ML papers from Hugging Face with titles, abstracts, authors, upvotes, arXiv IDs, and GitHub repos. Returns structured data in seconds.
Last updated: 2026-04-17
Every day, Hugging Face Papers surfaces the most discussed machine learning research with community upvotes, author profiles, and links to code repositories. This Actor pulls that curated feed or runs keyword searches across the entire index, returning structured records with titles, abstracts, arXiv identifiers, author details, GitHub links, project pages, AI-generated keywords, and community engagement metrics.
Whether you run an AI newsletter, track a research subfield for your lab, or want to spot emerging trends before they go mainstream, this scraper saves you hours of manual browsing. Set it on a daily schedule and let it build a living archive of the papers that matter to your work.
| Target | Hugging Face Papers |
|---|---|
| Use Cases | Research newsletters, literature reviews, ML trend tracking, academic monitoring |
What it does
- Paper metadata. Titles, abstracts, arXiv IDs, publication dates, and direct Hugging Face URLs for every paper.
- Author details. Full author lists with Hugging Face usernames and verification status included.
- Community engagement. Upvote counts, comment totals, and thumbnails so you can gauge which papers resonate.
- Code and project links. GitHub repository URLs and project pages when authors have linked them.
- Two collection modes. Search by keyword across indexed papers or grab today's trending daily feed.
Each record includes the arXiv ID, paper title, abstract, publication date, full author list with HF handles, upvote and comment counts, thumbnail image, GitHub repo link, project page, and AI-generated keywords.
Why it matters: Manually checking Hugging Face Papers every day and copying metadata into a spreadsheet takes 30+ minutes. This Actor does it in seconds and delivers a clean, structured dataset ready for analysis.
Full Demo
Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.
Input
| Input | Type | Default | Behavior |
|---|---|---|---|
| searchQuery | string | "transformer" | Keyword to match against paper titles and abstracts. Examples: "diffusion model", "LLM", "reinforcement learning". |
| mode | string | "search" | Collection mode. Use "search" for keyword search or "trending" for the daily curated feed. |
| maxItems | integer | 10 | Maximum number of papers to return. Free users are limited to 10. Paid users can request up to 1,000,000. |
Example: Search for diffusion model papers.
{"searchQuery": "diffusion model", "mode": "search", "maxItems": 50}
Example: Grab today's trending papers.
{"mode": "trending", "maxItems": 25}
Good to Know: Hugging Face Papers indexes new publications daily. Trending mode returns papers curated by the HF team and community for the current day. Search mode queries across all indexed papers. Results are limited by what the Hugging Face API exposes.
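The two example inputs above can also be built in code. The following is a minimal sketch that assembles an input object from the three documented fields. The helper name `build_input`, the lower bound of 1 for `maxItems`, and the choice to omit `searchQuery` in trending mode are assumptions made for illustration, not documented behavior of the Actor.

```python
from typing import Any, Dict, Optional

# The two modes listed in the input table.
VALID_MODES = {"search", "trending"}


def build_input(mode: str = "search",
                search_query: Optional[str] = "transformer",
                max_items: int = 10) -> Dict[str, Any]:
    """Build an input object from the documented fields (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    # Upper bound comes from the input table; the lower bound of 1 is an assumption.
    if not 1 <= max_items <= 1_000_000:
        raise ValueError("maxItems must be between 1 and 1,000,000")

    run_input: Dict[str, Any] = {"mode": mode, "maxItems": max_items}
    # Assumption: searchQuery is only meaningful in search mode, so it is
    # left out of the input when the trending feed is requested.
    if mode == "search":
        if not search_query:
            raise ValueError("search mode needs a non-empty searchQuery")
        run_input["searchQuery"] = search_query
    return run_input
```

Calling `build_input("trending", None, 25)` yields the same object as the trending example, which keeps scheduled jobs from sending a misspelled mode.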
Output
Each record contains 15+ fields. Download as CSV, Excel, JSON, or XML.
Schema
| Field | Type | Example |
|---|---|---|
| arxivId | string | "2404.12345" |
| title | string | "Efficient Attention for Long-Context Language Models" |
| url | string | "https://huggingface.co/papers/2404.12345" |
| arxivUrl | string | "https://arxiv.org/abs/2404.12345" |
| publishedAt | string | "2026-04-09" |
| upvotes | integer | 187 |
| numComments | integer | 12 |
| firstAuthor | string | "Jane Smith" |
| authors | array | [{"name": "Jane Smith", "hfUser": "jsmith", "verified": true}] |
| summary | string | "We introduce a novel attention mechanism..." |
| githubRepo | string | "https://github.com/example/long-attention" |
| projectPage | string | "https://example.github.io/long-attention" |
| aiKeywords | array | ["attention", "long-context", "efficiency"] |
| thumbnail | string | "https://cdn-thumbnails.huggingface.co/..." |
| scrapedAt | string | "2026-04-10T12:00:00.000Z" |
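For downstream code, the schema above can be mirrored as a typed record. This is a sketch only: the field names and types come from the table, while the `Author` and `PaperRecord` class names, the `papers_with_code` helper, and the assumption that `githubRepo` and `projectPage` may be null or absent when authors have not linked them are illustrative. The second sample record is invented for the example.

```python
from typing import List, Optional, TypedDict


class Author(TypedDict, total=False):
    name: str
    hfUser: Optional[str]
    verified: bool


class PaperRecord(TypedDict, total=False):
    arxivId: str
    title: str
    url: str
    arxivUrl: str
    publishedAt: str
    upvotes: int
    numComments: int
    firstAuthor: str
    authors: List[Author]
    summary: str
    githubRepo: Optional[str]   # assumed to be null/absent when no repo is linked
    projectPage: Optional[str]  # assumed to be null/absent when no page is linked
    aiKeywords: List[str]
    thumbnail: str
    scrapedAt: str


def papers_with_code(records: List[PaperRecord], min_upvotes: int = 0) -> List[PaperRecord]:
    """Keep papers that link a GitHub repo and meet the upvote floor, most upvoted first."""
    selected = [
        r for r in records
        if r.get("githubRepo") and r.get("upvotes", 0) >= min_upvotes
    ]
    return sorted(selected, key=lambda r: r.get("upvotes", 0), reverse=True)


# Illustrative data: the first record reuses the schema examples,
# the second is a hypothetical paper without a linked repository.
SAMPLE: List[PaperRecord] = [
    {
        "arxivId": "2404.12345",
        "title": "Efficient Attention for Long-Context Language Models",
        "upvotes": 187,
        "githubRepo": "https://github.com/example/long-attention",
    },
    {
        "arxivId": "0000.00000",
        "title": "Hypothetical paper without code",
        "upvotes": 300,
        "githubRepo": None,
    },
]

if __name__ == "__main__":
    for paper in papers_with_code(SAMPLE, min_upvotes=100):
        print(paper["arxivId"], paper["upvotes"])
```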
Sample records
Why choose this Actor
| Capability | Details |
|---|---|
| Two collection modes | Search by keyword or pull the daily trending feed. |
| Fast results | Results arrive in seconds, replacing minutes of manual browsing. |
| Author metadata | Hugging Face usernames and verification status for every author. |
| Code links included | GitHub repos and project pages extracted automatically. |
| AI keywords | Machine-generated topic tags for easier filtering and categorization. |
| Schedule-ready | Set it on a daily cron to build a rolling archive of ML research. |
| Multiple export formats | Download results as CSV, Excel, JSON, or XML. |
Hugging Face Papers features hundreds of new AI/ML papers every week, curated by the community and surfaced through upvotes. Staying current manually is a full-time job.
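The rolling archive mentioned in the table above needs some way to fold each day's results into what was collected earlier. One possible approach, sketched here under the assumption that `arxivId` uniquely identifies a paper, is to key the archive by that field and let the newest scrape overwrite older copies so upvote and comment counts stay fresh. The function names and the JSON file layout are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable

Record = Dict[str, Any]


def merge_into_archive(archive: Dict[str, Record],
                       new_records: Iterable[Record]) -> Dict[str, Record]:
    """Return a new archive where each arxivId maps to its most recent record."""
    merged = dict(archive)
    for record in new_records:
        arxiv_id = record.get("arxivId")
        if not arxiv_id:
            continue  # skip records that lack the key field
        merged[arxiv_id] = record  # newest scrape wins
    return merged


def load_archive(path: Path) -> Dict[str, Record]:
    """Read the archive file, or start empty if it does not exist yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_archive(path: Path, archive: Dict[str, Record]) -> None:
    """Write the archive as sorted, indented JSON so diffs stay readable."""
    path.write_text(json.dumps(archive, indent=2, sort_keys=True), encoding="utf-8")
```

A daily job would call `load_archive`, merge the run's dataset items, and then call `save_archive` on the result.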
How it compares to alternatives
| Approach | Cost | Coverage | Refresh | Setup |
|---|---|---|---|---|
| Hugging Face Papers Scraper (this Actor) | $5 free credit, then pay-per-use | All HF indexed papers | Live per run | 2 min |
| Manual browsing | Free | Limited by time | Manual daily checks | 30 min/day |
| Official API integration | Free | Full access | Per request | 1-2 hours |
| Third-party data providers | $50-500/mo | Varies | Weekly or monthly | 30 min |
Pick this Actor when you want structured, schedule-ready paper data without writing API integration code yourself.
How to use
- Sign up. Create a free account with $5 credit (takes 2 minutes).
- Open the Actor. Go to the Hugging Face Papers Scraper page on the Apify Store.
- Set input. Choose a keyword and mode (search or trending), then set your max items.
- Run it. Click Start and let the Actor collect your data.
- Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.
Total time from signup to downloaded dataset: 3-5 minutes. No coding required.
Business use cases
Automating Hugging Face Papers Scraper
Control the scraper programmatically for scheduled runs and pipeline integrations:
- Node.js. Install the apify-client NPM package.
- Python. Use the apify-client PyPI package.
- See the Apify API documentation for full details.
The Apify Schedules feature lets you trigger this Actor on any cron interval. Set a daily run in trending mode and never miss the papers the community is talking about.
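As a rough illustration of the Python route, the sketch below starts a run and reads back the resulting dataset with the `apify-client` package. The Actor ID is a placeholder because this page does not state it, and the helper names `make_client` and `collect_papers` are hypothetical. The client calls used (`actor(...).call(run_input=...)` and `dataset(...).iterate_items()`) should be checked against the version of `apify-client` you install.

```python
from typing import Any, Dict, List

# Placeholder: replace with the Actor's real ID from its Apify Store page.
ACTOR_ID = "<username>/<actor-name>"


def make_client(token: str):
    """Create an Apify client (requires `pip install apify-client`)."""
    from apify_client import ApifyClient  # imported lazily so the module loads without it
    return ApifyClient(token)


def collect_papers(client: Any, actor_id: str, run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Start a run, wait for it to finish, and return the dataset items."""
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("The Actor run did not return a run object")
    # Each run writes its results to a default dataset.
    dataset = client.dataset(run["defaultDatasetId"])
    return list(dataset.iterate_items())
```

With a real token and Actor ID, `collect_papers(make_client(token), ACTOR_ID, {"mode": "trending", "maxItems": 25})` would return the day's trending papers as a list of dictionaries. Passing the client in as an argument also makes the function easy to exercise with a stub.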
Frequently Asked Questions
Integrate with any app
Hugging Face Papers Scraper connects to any cloud service via Apify integrations:
- Make - Automate multi-step workflows
- Zapier - Connect with 5,000+ apps
- Slack - Get run notifications
- Airbyte - Pipe data into your warehouse
- GitHub - Trigger runs from commits
- Google Drive - Export datasets straight to Sheets
You can also use webhooks to trigger downstream actions when a run finishes.
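A webhook receiver usually needs to know which dataset the finished run produced. The sketch below assumes the webhook uses Apify's default payload, where `eventType` names the event and `resource` carries the run object including `defaultDatasetId`. If you customize the payload template, these keys may differ. The function name is hypothetical.

```python
from typing import Any, Dict, Optional


def dataset_id_from_webhook(payload: Dict[str, Any]) -> Optional[str]:
    """Return the run's default dataset ID for successful runs, else None.

    Assumes the default payload shape: {"eventType": ..., "resource": {...}}.
    """
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None  # ignore failed, aborted, or timed-out runs
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")
```

The returned ID can then be used to download the items and pass them to whatever downstream action the webhook is meant to trigger.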
Recommended Actors
- Hugging Face Model Scraper - Collect AI model metadata, downloads, and tags from the HF Hub
- Apple App Store Scraper - Scrape iPhone app listings, ratings, and reviews
- PR Newswire Scraper - Collect press releases and corporate news
- AWS Marketplace Scraper - Extract AWS product listings and pricing
- Stripe App Marketplace Scraper - Scrape Stripe app listings and integrations
Pro Tip: browse the complete ParseForge collection for more data scrapers and tools.
Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.
Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by Hugging Face or arXiv. All trademarks mentioned are the property of their respective owners. Only publicly available data is collected.