Pricing

from $9.00 / 1,000 results

Hugging Face Papers Scraper

Scrape AI and machine learning research papers from Hugging Face Papers. Get titles, abstracts, authors with affiliations, upvotes, publication dates, arXiv IDs, and community discussion counts. Search by keyword or browse daily papers.

Developer

ParseForge

Last modified: a month ago

📄 Hugging Face Papers Scraper

🚀 Scrape trending and keyword-searched AI/ML papers from Hugging Face with titles, abstracts, authors, upvotes, arXiv IDs, and GitHub repos. Returns structured data in seconds.

🕒 Last updated: 2026-04-23

Every day, Hugging Face Papers surfaces the most discussed machine learning research with community upvotes, author profiles, and links to code repositories. This Actor pulls that curated feed or runs keyword searches across the entire index, returning structured records with titles, abstracts, arXiv identifiers, author details, GitHub links, project pages, AI-generated keywords, and community engagement metrics.

Whether you run an AI newsletter, track a research subfield for your lab, or want to spot emerging trends before they go mainstream, this scraper saves you hours of manual browsing. Set it on a daily schedule and let it build a living archive of the papers that matter to your work.

Target	Hugging Face Papers
Use Cases	Research newsletters, literature reviews, ML trend tracking, academic monitoring

📋 What it does

📚 Paper metadata. Titles, abstracts, arXiv IDs, publication dates, and direct Hugging Face URLs for every paper.
👥 Author details. Full author lists with Hugging Face usernames and verification status included.
⭐ Community engagement. Upvote and comment counts so you can gauge which papers resonate, plus thumbnail images.
💻 Code and project links. GitHub repository URLs and project pages when authors have linked them.
🔍 Two collection modes. Search by keyword across indexed papers or grab today's trending daily feed.

Each record includes the arXiv ID, paper title, abstract, publication date, full author list with HF handles, upvote and comment counts, thumbnail image, GitHub repo link, project page, and AI-generated keywords.

💡 Why it matters: Manually checking Hugging Face Papers every day and copying metadata into a spreadsheet takes 30+ minutes. This Actor does it in seconds and delivers a clean, structured dataset ready for analysis.

🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.

⚙️ Input

Input	Type	Default	Behavior
searchQuery	string	"transformer"	Keyword to match against paper titles and abstracts. Examples: "diffusion model", "LLM", "reinforcement learning".
mode	string	"search"	Collection mode. Use "search" for keyword search or "trending" for the daily curated feed.
maxItems	integer	10	Maximum number of papers to return. Free users are limited to 10. Paid users can request up to 1,000,000.

Example: Search for diffusion model papers.

{
    "searchQuery": "diffusion model",
    "mode": "search",
    "maxItems": 50
}

Example: Grab today's trending papers.

{
    "mode": "trending",
    "maxItems": 25
}

⚠️ Good to Know: Hugging Face Papers indexes new publications daily. Trending mode returns papers curated by the HF team and community for the current day. Search mode queries across all indexed papers. Results are limited by what the Hugging Face API exposes.
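If you generate inputs from a script, it can help to validate them against the Input table before starting a run. The sketch below is a minimal, hypothetical helper (build_run_input is not part of the Actor). It assumes the limits from the table (10 items on the free plan, 1,000,000 on paid plans), chooses to clamp maxItems to the plan cap rather than fail, and assumes searchQuery can be omitted in trending mode, as in the trending example above.

```python
from typing import Any, Optional

# Limits copied from the Input table; the Actor itself may enforce them differently.
FREE_MAX_ITEMS = 10
PAID_MAX_ITEMS = 1_000_000
VALID_MODES = {"search", "trending"}


def build_run_input(
    mode: str = "search",
    search_query: Optional[str] = "transformer",
    max_items: int = 10,
    paid: bool = False,
) -> dict[str, Any]:
    """Return an input object shaped like the documented schema (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    if not isinstance(max_items, int) or max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    cap = PAID_MAX_ITEMS if paid else FREE_MAX_ITEMS
    # Design choice: clamp to the plan cap instead of rejecting the request.
    run_input: dict[str, Any] = {"mode": mode, "maxItems": min(max_items, cap)}
    # Assumption: searchQuery only matters in search mode, so it is left out for trending.
    if mode == "search":
        if not search_query:
            raise ValueError("search mode needs a non-empty searchQuery")
        run_input["searchQuery"] = search_query
    return run_input
```

For example, build_run_input("search", "diffusion model", 50, paid=True) produces the same object as the first JSON example above.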

📊 Output

Each record contains 15+ fields. Download as CSV, Excel, JSON, or XML.

🧾 Schema

Field	Type	Example
🆔 arxivId	string	`"2404.12345"`
📋 title	string	`"Efficient Attention for Long-Context Language Models"`
🔗 url	string	`"https://huggingface.co/papers/2404.12345"`
🔗 arxivUrl	string	`"https://arxiv.org/abs/2404.12345"`
📅 publishedAt	string	`"2026-04-09"`
⬆️ upvotes	integer	`187`
💬 numComments	integer	`12`
👤 firstAuthor	string	`"Jane Smith"`
👥 authors	array	`[{"name": "Jane Smith", "hfUser": "jsmith", "verified": true}]`
📝 summary	string	`"We introduce a novel attention mechanism..."`
💻 githubRepo	string	`"https://github.com/example/long-attention"`
🌐 projectPage	string	`"https://example.github.io/long-attention"`
🏷️ aiKeywords	array	`["attention", "long-context", "efficiency"]`
🖼️ thumbnail	string	`"https://cdn-thumbnails.huggingface.co/..."`
🕐 scrapedAt	string	`"2026-04-10T12:00:00.000Z"`
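When loading a JSON export into your own code, a typed wrapper makes the schema explicit. This is a minimal sketch built from the field names in the Schema table; the Paper and Author classes and the parse_paper and load_papers helpers are hypothetical. It assumes arxivId, title, url, arxivUrl, publishedAt, and scrapedAt are always present, and treats every other field as optional because code links and author handles only appear when they exist.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Author:
    name: str
    hfUser: Optional[str] = None  # assumption: missing when the author has no linked HF account
    verified: bool = False


@dataclass
class Paper:
    arxivId: str
    title: str
    url: str
    arxivUrl: str
    publishedAt: str
    upvotes: int
    numComments: int
    firstAuthor: Optional[str]
    authors: list[Author]
    summary: str
    githubRepo: Optional[str]
    projectPage: Optional[str]
    aiKeywords: list[str]
    thumbnail: Optional[str]
    scrapedAt: str


def parse_paper(raw: dict[str, Any]) -> Paper:
    """Convert one dataset item into a typed Paper, defaulting the optional fields."""
    authors = [
        Author(a.get("name", ""), a.get("hfUser"), bool(a.get("verified", False)))
        for a in raw.get("authors") or []
    ]
    return Paper(
        arxivId=raw["arxivId"],
        title=raw["title"],
        url=raw["url"],
        arxivUrl=raw["arxivUrl"],
        publishedAt=raw["publishedAt"],
        upvotes=int(raw.get("upvotes") or 0),
        numComments=int(raw.get("numComments") or 0),
        firstAuthor=raw.get("firstAuthor"),
        authors=authors,
        summary=raw.get("summary") or "",
        githubRepo=raw.get("githubRepo"),
        projectPage=raw.get("projectPage"),
        aiKeywords=list(raw.get("aiKeywords") or []),
        thumbnail=raw.get("thumbnail"),
        scrapedAt=raw["scrapedAt"],
    )


def load_papers(path: str) -> list[Paper]:
    """Read a JSON dataset export (an array of items) and parse every record."""
    with open(path, encoding="utf-8") as f:
        return [parse_paper(item) for item in json.load(f)]
```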

✨ Why choose this Actor

	Capability
📚	Two collection modes. Search by keyword or pull the daily trending feed.
⚡	Fast results. Papers arrive in seconds, not minutes of manual browsing.
👥	Author metadata. Hugging Face usernames and verification status for each author who has a linked account.
💻	Code links included. GitHub repos and project pages extracted automatically.
🏷️	AI keywords. Machine-generated topic tags for easier filtering and categorization.
📅	Schedule-ready. Set it on a daily cron to build a rolling archive of ML research.
📊	Multiple export formats. Download results as CSV, Excel, JSON, or XML.

Hugging Face Papers features hundreds of new AI/ML papers every week, curated by the community and surfaced through upvotes. Staying current manually is a full-time job.

📈 How it compares to alternatives

Approach	Cost	Coverage	Refresh	Setup
⭐ Hugging Face Papers Scraper (this Actor)	$5 free credit, then from $9.00 / 1,000 results	All HF indexed papers	Live per run	⚡ 2 min
Manual browsing	Free	Limited by time	Manual daily checks	🕐 30 min/day
Official API integration	Free	Full access	Per request	🔧 1-2 hours
Third-party data providers	$50-500/mo	Varies	Weekly or monthly	📋 30 min

Pick this Actor when you want structured, schedule-ready paper data without writing API integration code yourself.

🚀 How to use

📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
🌐 Open the Actor. Go to the Hugging Face Papers Scraper page on the Apify Store.
🎯 Set input. Choose a keyword and mode (search or trending), then set your max items.
🚀 Run it. Click Start and let the Actor collect your data.
📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.

💼 Business use cases

📬 Research Newsletters

Auto-curate weekly digests of trending ML papers
Filter by keyword to match your audience's interests
Include upvote counts to highlight community favorites
Link directly to arXiv and GitHub repos
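The newsletter workflow above boils down to filter, rank, and format. Here is a minimal sketch, assuming records are plain dicts from a JSON export; build_digest is a hypothetical helper. It matches keywords case-insensitively against title, summary, and aiKeywords (a simple substring match, not the Actor's own search), ranks by upvotes, and links arXiv plus the GitHub repo when present.

```python
from typing import Any


def build_digest(records: list[dict[str, Any]], keywords: list[str], top_n: int = 10) -> str:
    """Keep papers matching any keyword, rank by upvotes, and render plain-text digest lines."""
    wanted = [k.lower() for k in keywords]

    def matches(rec: dict[str, Any]) -> bool:
        # Substring match over title, abstract, and AI keywords.
        parts = [rec.get("title") or "", rec.get("summary") or "", *(rec.get("aiKeywords") or [])]
        haystack = " ".join(parts).lower()
        return any(k in haystack for k in wanted)

    picked = sorted(
        (r for r in records if matches(r)),
        key=lambda r: r.get("upvotes") or 0,
        reverse=True,
    )[:top_n]

    lines = []
    for rank, rec in enumerate(picked, start=1):
        line = f"{rank}. {rec.get('title')} ({rec.get('upvotes') or 0} upvotes) {rec.get('arxivUrl')}"
        if rec.get("githubRepo"):
            line += f" | code: {rec['githubRepo']}"
        lines.append(line)
    return "\n".join(lines)
```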

🧠 Academic Research

Monitor new publications in your subfield daily
Build literature review datasets without manual searches
Track author output and collaboration patterns
Export to spreadsheets for bibliometric analysis
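For the author-output and collaboration items above, the authors array is enough to build simple counts. This sketch uses hypothetical helpers (author_stats, to_csv) and assumes author names identify people, so two researchers who share a name will be merged. It counts papers per author and co-author pairs, then writes a CSV that any spreadsheet can open.

```python
import csv
import io
from collections import Counter
from itertools import combinations
from typing import Any


def author_stats(records: list[dict[str, Any]]) -> tuple[Counter, Counter]:
    """Count papers per author name and papers per (alphabetically ordered) co-author pair."""
    papers_per_author: Counter = Counter()
    coauthor_pairs: Counter = Counter()
    for rec in records:
        # Deduplicate and sort names so each pair has one canonical order.
        names = sorted({a["name"] for a in rec.get("authors") or [] if a.get("name")})
        papers_per_author.update(names)
        coauthor_pairs.update(combinations(names, 2))
    return papers_per_author, coauthor_pairs


def to_csv(counter: Counter) -> str:
    """Render author counts as CSV text, most prolific authors first."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["author", "papers"])
    for name, count in counter.most_common():
        writer.writerow([name, count])
    return buf.getvalue()
```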

📊 Trend Analysis

Track which ML topics gain upvotes over time
Spot emerging research areas before they peak
Compare engagement across diffusion, LLM, and RL papers
Build time-series datasets of publication volume
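A time series for trend analysis can be built by bucketing records by publication date and AI keyword. This is a minimal sketch with a hypothetical keyword_timeseries helper. Keep in mind that upvotes are a snapshot taken at scrapedAt, so comparing engagement across days works best with data collected on a regular schedule.

```python
from collections import defaultdict
from typing import Any


def keyword_timeseries(records: list[dict[str, Any]]) -> dict[tuple[str, str], dict[str, int]]:
    """Map (publishedAt day, keyword) to paper count and summed upvotes, sorted by key."""
    series: dict[tuple[str, str], dict[str, int]] = defaultdict(lambda: {"papers": 0, "upvotes": 0})
    for rec in records:
        day = (rec.get("publishedAt") or "")[:10]  # keep YYYY-MM-DD even if a time part is present
        if not day:
            continue
        # Lowercase and deduplicate so "LLM" and "llm" on one paper count once.
        for kw in {k.lower() for k in rec.get("aiKeywords") or []}:
            cell = series[(day, kw)]
            cell["papers"] += 1
            cell["upvotes"] += int(rec.get("upvotes") or 0)
    return dict(sorted(series.items()))
```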

💼 Talent Scouting

Identify active researchers by watching trending authors
Find engineers who open-source their paper code
Monitor verified Hugging Face contributors
Build prospect lists for recruiting outreach
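To combine the "open-source their code" and "verified contributors" signals above, you can filter on githubRepo and the authors array. A minimal sketch with a hypothetical open_source_authors helper; it assumes the verified flag means what the schema name suggests, which you should confirm against real data before relying on it.

```python
from collections import defaultdict
from typing import Any


def open_source_authors(records: list[dict[str, Any]], verified_only: bool = True) -> dict[str, list[str]]:
    """Return {hfUser: [arxivId, ...]} for authors of papers that link a GitHub repo."""
    found: dict[str, list[str]] = defaultdict(list)
    for rec in records:
        if not rec.get("githubRepo"):
            continue  # only papers with published code
        for author in rec.get("authors") or []:
            user = author.get("hfUser")
            if not user or (verified_only and not author.get("verified")):
                continue
            found[user].append(rec["arxivId"])
    return dict(found)
```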

🌟 Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

Empirical datasets for papers, thesis work, and coursework
Longitudinal studies tracking changes across snapshots
Reproducible research with cited, versioned data pulls
Classroom exercises on data analysis and ethical scraping

🎨 Personal and creative

Side projects, portfolio demos, and indie app launches
Data visualizations, dashboards, and infographics
Content research for bloggers, YouTubers, and podcasters
Hobbyist collections and personal trackers

🤝 Non-profit and civic

Transparency reporting and accountability projects
Advocacy campaigns backed by public-interest data
Community-run databases for local issues
Investigative journalism on public records

🧪 Experimentation

Prototype AI and machine-learning pipelines with real data
Validate product-market hypotheses before engineering spend
Train small domain-specific models on niche corpora
Test dashboard concepts with live input

🔌 Automating Hugging Face Papers Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

🟢 Node.js. Install the apify-client NPM package.
🐍 Python. Use the apify-client PyPI package.
📚 See the Apify API documentation for full details.
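A Python run could look roughly like the sketch below. It is written against the dict-returning interface of the apify-client package (1.x): client.actor(...).call(run_input=...) starts a run and waits for it, and client.dataset(...).iterate_items() reads the results. The Actor ID is a hypothetical placeholder; copy the real one from the Store page. fetch_papers takes the client as a parameter so it can be tested without network access.

```python
import os
from typing import Any, Optional

# Hypothetical Actor ID: replace it with the ID shown on the Actor's Apify Store page.
ACTOR_ID = "parseforge/huggingface-papers-scraper"


def fetch_papers(client: Any, run_input: dict[str, Any], actor_id: str = ACTOR_ID) -> list[dict[str, Any]]:
    """Start the Actor, wait for the run to finish, and return all dataset items.

    `client` is expected to behave like apify_client.ApifyClient.
    """
    # call() starts the run and blocks until it finishes; it can return None.
    run: Optional[dict[str, Any]] = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the Actor run did not return any run details")
    if run.get("status") != "SUCCEEDED":
        raise RuntimeError(f"the Actor run ended with status {run.get('status')!r}")
    # Each run stores its results in a default dataset; iterate_items() pages through it.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


def main() -> None:
    """Example entry point: needs `pip install apify-client` and an APIFY_TOKEN env var."""
    from apify_client import ApifyClient  # imported lazily so fetch_papers works without it

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    for paper in fetch_papers(client, {"mode": "trending", "maxItems": 25}):
        print(paper.get("upvotes"), paper.get("title"))
```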

The Apify Schedules feature lets you trigger this Actor on any cron interval. Set a daily run in trending mode and never miss the papers the community is talking about.
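Daily trending runs overlap, since a popular paper can appear on several days with growing upvotes. One way to build a rolling archive is to key records by arxivId and keep the newest snapshot. This is a sketch with hypothetical helpers (merge_into_archive, update_archive_file) that store the archive as a local JSON file; it relies on scrapedAt being an ISO-8601 UTC timestamp in a consistent format, as in the schema example, so plain string comparison orders snapshots correctly.

```python
import json
from pathlib import Path
from typing import Any


def merge_into_archive(
    archive: dict[str, dict[str, Any]], new_records: list[dict[str, Any]]
) -> tuple[dict[str, dict[str, Any]], int]:
    """Merge one run's records into an archive keyed by arxivId; return (archive, new paper count)."""
    merged = dict(archive)  # leave the caller's archive untouched
    added = 0
    for rec in new_records:
        key = rec["arxivId"]
        old = merged.get(key)
        if old is None:
            added += 1
        # Same-format ISO-8601 UTC timestamps sort correctly as strings.
        if old is None or rec.get("scrapedAt", "") >= old.get("scrapedAt", ""):
            merged[key] = rec
    return merged, added


def update_archive_file(path: str, new_records: list[dict[str, Any]]) -> int:
    """Load the archive file (if any), merge new records, write it back, and return the new count."""
    p = Path(path)
    archive = json.loads(p.read_text(encoding="utf-8")) if p.exists() else {}
    merged, added = merge_into_archive(archive, new_records)
    p.write_text(json.dumps(merged, indent=2, sort_keys=True), encoding="utf-8")
    return added
```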

🔌 Integrate with any app

Hugging Face Papers Scraper connects to other cloud services via Apify integrations, including:

Make - Automate multi-step workflows
Zapier - Connect with 5,000+ apps
Slack - Get run notifications
Airbyte - Pipe data into your warehouse
GitHub - Trigger runs from commits
Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes.
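A webhook receiver typically only needs the ID of the dataset a successful run produced. The sketch below assumes Apify's default webhook payload, which carries eventType and a resource object holding the run (including defaultDatasetId); check the payload your webhook actually sends, especially if you customized its template. dataset_id_from_webhook is a hypothetical helper you would call from any HTTP handler.

```python
import json
from typing import Optional, Union


def dataset_id_from_webhook(body: Union[str, bytes]) -> Optional[str]:
    """Return the run's default dataset ID for successful runs, else None."""
    payload = json.loads(body)  # json.loads accepts both str and UTF-8 bytes
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None  # ignore failed, aborted, or timed-out runs
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")
```

With the dataset ID in hand, a downstream job can fetch the items through the API clients listed above.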

🔗 Recommended Actors

🤖 Hugging Face Model Scraper - Collect AI model metadata, downloads, and tags from the HF Hub
🍎 Apple App Store Scraper - Scrape iPhone app listings, ratings, and reviews
📰 PR Newswire Scraper - Collect press releases and corporate news
🏪 AWS Marketplace Scraper - Extract AWS product listings and pricing
🔗 Stripe App Marketplace Scraper - Scrape Stripe app listings and integrations

💡 Pro Tip: browse the complete ParseForge collection for more data scrapers and tools.

🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.

⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by Hugging Face or arXiv. All trademarks mentioned are the property of their respective owners. Only publicly available data is collected.

You might also like

Hugging Face Scraper - Models, Datasets, Papers

logiover/huggingface-hub-intelligence-scraper

Hugging Face data export tool: scrape models, datasets & daily papers without a token. Export to CSV/JSON. A no-login Hugging Face API alternative.

Logiover

ArXiv Papers Scraper

leftwinglautus/arxiv-papers-scraper

Search and scrape academic papers from the arXiv API by keyword, category, or author.

Moeeze Hassan

arXiv Papers Scraper

resounding_diplomacy/arxiv-papers-scraper

Scrape academic papers from arXiv by category, keyword, or author. Extract titles, authors, abstracts, PDF URLs, DOIs, categories, and more. Perfect for AI/ML research datasets.

alars num

arXiv Papers Scraper Pro — Research Papers, Authors, Citations

diverse_venture/arxiv-papers-scraper

Search and scrape arXiv research papers. Returns titles, abstracts, authors, categories, DOIs, and PDF download links. Filter by keywords (cat:cs.LG, all:transformer, au:author_name). Up to 500 papers per run. No auth required. Ideal for AI researchers and academic data mining.

Chak Man Fung

arXiv Paper Scraper

cloud9_ai/arxiv-paper-scraper

Scrape academic papers from arXiv.org. Search by keyword, browse categories, or get latest papers. Extract titles, abstracts, authors, PDF links, and citation data via arXiv API.

cloud9

arXiv Paper Scraper

lulzasaur/arxiv-scraper

Search and scrape arXiv academic papers. Get titles, authors, abstracts, categories, PDF links, DOIs. Search by keyword, browse recent papers by category, or fetch by arXiv ID.

lulz bot

HuggingFace Daily Papers Scraper

tzmyk/huggingface-daily-papers-scraper

Scrapes AI/ML research papers from HuggingFace Daily Papers (huggingface.co/papers). Extracts title, authors, abstract, GitHub repo, star count, upvotes, AI summary, and keywords.

tzmyk

ArXiv Papers Scraper — Research Paper API

fast_api/arxiv-papers-scraper

Search and extract ArXiv research papers as structured JSON: titles, authors, abstracts, categories, dates, PDFs, and metadata. Built for AI research monitoring, literature review, RAG datasets, and academic intelligence.

Fast API

Hugging Face Insights Scraper — Models, Datasets & Spaces

brilliant_gum/huggingface-insights-scraper

Scrape Hugging Face models, datasets, spaces, and daily papers with downloads, likes, parameters, tags, and growth tracking between runs. Filter by pipeline, library, author, or keyword.

Yuliia Kulakova

ArXiv Paper Search

gentle_cloud/arxiv-paper-search

Search and extract academic papers from ArXiv. Find papers by keyword, author, or category with full metadata including title, authors, abstract, categories, and PDF links.

Monkey Coder
