HuggingFaceTP
Pricing
from $0.01 / 1,000 results
Scrapes trending research papers from HuggingFace, capturing each paper’s title, description, and URL. The scraper collects data from the listing page and visits individual paper pages for full abstracts, providing a structured dataset of the latest AI research.
Developer: amazing
HuggingFace Trending Papers Scraper
A lightweight and fast web scraper built on Apify that extracts trending AI research papers from the HuggingFace Papers Trending page. It collects essential research details by scraping both the listing page and individual paper pages for complete data.
🚀 Features
- ✅ Scrapes trending AI/ML research papers from HuggingFace
- ✅ Extracts paper titles, authors, abstracts, and publication dates
- ✅ Collects paper URLs and direct links to research papers
- ✅ Fast and efficient scraping with Playwright
- ✅ Easy to use via Apify Console
- ✅ Exports data in JSON, CSV, or Excel format
- ✅ Configurable number of papers to scrape
📊 Data Extracted
The scraper collects the following information for each paper:
| Field | Description |
|---|---|
| Paper Title | Full title of the research paper |
| Authors | List of paper authors |
| Abstract | Paper abstract/summary |
| Publication Date | When the paper was published |
| Paper URL | Link to the HuggingFace paper page |
| ArXiv URL | Direct link to the paper on ArXiv (if available) |
| Upvotes | Number of upvotes on HuggingFace |
| Comments | Number of comments/discussions |
| Scraped At | Timestamp when data was collected |
🛠️ How to Use
Option 1: Using Apify Console (No Coding Required)
- Create an Apify Account
  - Go to apify.com and sign up for free
- Import This Actor
  - Click on Actors → Create new
  - Choose this actor from the store or import via GitHub
- Configure Input
  - Set Max Papers (default: 50)
  - Optionally adjust other settings
- Run the Actor
  - Click the Start button
  - Wait for the scraper to complete (usually 1-3 minutes)
- Download Results
  - Go to Dataset tab
  - Click Export and choose your format (CSV, JSON, Excel)
Option 2: Using Apify API
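```javascript
// apify-client exposes ApifyClient as a named export.
const { ApifyClient } = require('apify-client');

const client = new ApifyClient({
  token: 'YOUR_APIFY_TOKEN',
});

const input = {
  maxPapers: 30,
};

// Top-level await is not available in CommonJS, so the calls run inside an async function.
(async () => {
  // Start the actor and wait for the run to finish.
  const run = await client.actor('YOUR_ACTOR_ID').call(input);
  // Fetch the scraped papers from the run's default dataset.
  const { items } = await client.dataset(run.defaultDatasetId).listItems();
  console.log(items);
})();
```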
Option 3: Scheduled Runs
Set up automatic daily/weekly scraping:
- Go to Schedules in Apify Console
- Click Create new
- Select this actor
- Choose frequency (daily, weekly, etc.)
- Save and activate
⚙️ Configuration Options
Input Parameters
```json
{
  "maxPapers": 50,
  "startUrls": [
    { "url": "https://huggingface.co/papers" }
  ],
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| maxPapers | Number | 50 | Maximum number of papers to scrape |
| startUrls | Array | HuggingFace Papers | URLs to start scraping from |
| proxyConfiguration | Object | Apify Proxy | Proxy settings to avoid blocking |
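The defaults in the table above can be applied to a partial input before starting a run. The following is a minimal sketch, assuming these three parameters are the only ones the actor accepts; `resolveInput` and `DEFAULT_INPUT` are hypothetical names for illustration and are not part of the actor or the Apify SDK.

```javascript
// Hypothetical helper: fills in the documented defaults for missing input fields.
// The default values mirror the Input Parameters example above.
const DEFAULT_INPUT = {
  maxPapers: 50,
  startUrls: [{ url: 'https://huggingface.co/papers' }],
  proxyConfiguration: { useApifyProxy: true },
};

function resolveInput(userInput = {}) {
  // Fields supplied by the caller override the defaults.
  const input = { ...DEFAULT_INPUT, ...userInput };

  // maxPapers must be a positive integer; anything else is rejected early.
  if (!Number.isInteger(input.maxPapers) || input.maxPapers < 1) {
    throw new TypeError('maxPapers must be a positive integer');
  }
  // startUrls must be a non-empty array of { url } objects.
  if (!Array.isArray(input.startUrls) || input.startUrls.length === 0) {
    throw new TypeError('startUrls must be a non-empty array');
  }
  for (const entry of input.startUrls) {
    new URL(entry.url); // throws TypeError on a malformed URL
  }
  return input;
}
```

Calling `resolveInput({ maxPapers: 30 })` returns an object with `maxPapers` set to 30 and the other two fields at their defaults, which can then be passed to the actor call.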
📦 Output Format
JSON Example
```json
[
  {
    "Paper Title": "Attention Is All You Need",
    "Authors": "Vaswani et al.",
    "Abstract": "The dominant sequence transduction models...",
    "Publication Date": "2023-12-01",
    "Paper URL": "https://huggingface.co/papers/1706.03762",
    "ArXiv URL": "https://arxiv.org/abs/1706.03762",
    "Upvotes": 1250,
    "Comments": 45,
    "Scraped At": "2025-12-06T09:45:00.000Z"
  }
]
```
CSV Example
```csv
Paper Title,Authors,Abstract,Publication Date,Paper URL,ArXiv URL,Upvotes,Comments,Scraped At
"Attention Is All You Need","Vaswani et al.","The dominant sequence...","2023-12-01","https://huggingface.co/papers/1706.03762","https://arxiv.org/abs/1706.03762",1250,45,"2025-12-06T09:45:00.000Z"
```
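If you fetch the JSON items through the API and want CSV without going through the Console export, the mapping is one row per record with the fields in the order shown above. Below is a rough sketch under that assumption; `toCsv` and `csvCell` are hypothetical helpers, and the quoting follows the common RFC 4180 convention (quote only when a cell contains a comma, quote, or line break), which may differ from what Apify's own exporter emits.

```javascript
// Field order taken from the CSV example above.
const FIELDS = [
  'Paper Title', 'Authors', 'Abstract', 'Publication Date',
  'Paper URL', 'ArXiv URL', 'Upvotes', 'Comments', 'Scraped At',
];

// Hypothetical helper: serializes one value as a CSV cell.
function csvCell(value) {
  if (value === null || value === undefined) return ''; // missing field -> empty cell
  const text = String(value);
  // Quote only when needed, doubling any embedded double quotes.
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Hypothetical helper: turns an array of paper records into CSV text.
function toCsv(items, fields = FIELDS) {
  const header = fields.map(csvCell).join(',');
  const rows = items.map((item) => fields.map((f) => csvCell(item[f])).join(','));
  return [header, ...rows].join('\n');
}
```

Records with missing fields produce empty cells rather than the string `undefined`, so partially scraped papers still yield well-formed rows.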
🔧 Technical Details
Built With
- Apify SDK - Actor framework
- Crawlee - Web crawling and scraping library
- Playwright - Headless browser automation
- Cheerio - HTML parsing
Requirements
- Node.js 18+
- Apify account (free tier available)
📈 Use Cases
- Research Tracking: Stay updated with trending AI research
- Content Curation: Aggregate papers for newsletters or blogs
- Academic Monitoring: Track specific research areas
- Data Analysis: Analyze trends in AI/ML research
- Literature Review: Collect papers for research projects
🚨 Rate Limiting & Best Practices
- The scraper uses Apify proxy by default to avoid blocking
- Respects HuggingFace's robots.txt
- Implements reasonable delays between requests
- Recommended: Run no more than once per hour
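The once-per-hour recommendation can also be enforced in whatever code triggers the run. The sketch below assumes the caller keeps the timestamp of its last run somewhere (a file, a database, a key-value store); `canRunNow` and `MIN_INTERVAL_MS` are hypothetical names, not actor settings.

```javascript
// Minimum spacing between runs, per the recommendation above.
const MIN_INTERVAL_MS = 60 * 60 * 1000; // one hour

// Hypothetical guard: returns true when at least one hour has passed since lastRunAt.
// lastRunAt may be a Date, an ISO 8601 string, or null/undefined if no run has happened yet.
function canRunNow(lastRunAt, now = new Date()) {
  if (!lastRunAt) return true; // never run before
  const elapsed = now.getTime() - new Date(lastRunAt).getTime();
  return elapsed >= MIN_INTERVAL_MS;
}
```

Passing `now` explicitly keeps the function easy to test; in production code the default of the current time is usually what you want.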
🐛 Troubleshooting
No Data Scraped
- Check if HuggingFace changed their page structure
- Verify proxy settings are enabled
- Increase wait time in settings
Partial Data
- Some papers may not have all fields available
- The scraper handles missing data gracefully
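Downstream code can apply the same tolerance when reading the dataset. The README does not specify how a missing field appears in the output, so this sketch assumes it may be an absent key, `null`, or an empty string; `normalizePaper` is a hypothetical helper.

```javascript
// Field names follow the Data Extracted table.
const TEXT_FIELDS = [
  'Paper Title', 'Authors', 'Abstract', 'Publication Date',
  'Paper URL', 'ArXiv URL', 'Scraped At',
];
const COUNT_FIELDS = ['Upvotes', 'Comments'];

// Hypothetical helper: returns a record with every expected field present.
// Missing or blank values become null so callers can test for them uniformly.
function normalizePaper(raw) {
  const paper = {};
  for (const field of TEXT_FIELDS) {
    const value = raw[field];
    paper[field] = typeof value === 'string' && value.trim() !== '' ? value.trim() : null;
  }
  for (const field of COUNT_FIELDS) {
    const value = raw[field];
    const n = Number(value);
    // Missing or non-numeric counts become null rather than 0,
    // so "unknown" is not mistaken for "zero".
    paper[field] = value != null && value !== '' && Number.isFinite(n) ? n : null;
  }
  return paper;
}
```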
Actor Fails
- Check the logs in the Run tab
- Ensure you have sufficient Apify credits
- Try reducing the `maxPapers` value
📝 Example Use Case: Daily AI Research Digest
- Schedule the actor to run daily at 9 AM
- Connect to Zapier/Make to send results to:
  - Notion database
  - Google Sheets
  - Slack channel
  - Email digest
- Filter papers by keywords in your own processing pipeline
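The keyword-filtering step of such a pipeline could look like the sketch below. It assumes the records use the field names from the Output Format section; `filterByKeywords` and `formatDigest` are hypothetical helpers, and case-insensitive substring matching over title and abstract is just one simple choice of matching rule.

```javascript
// Hypothetical helper: keeps papers whose title or abstract mentions any keyword.
function filterByKeywords(papers, keywords) {
  const needles = keywords.map((k) => k.toLowerCase());
  return papers.filter((paper) => {
    const haystack = `${paper['Paper Title'] ?? ''} ${paper['Abstract'] ?? ''}`.toLowerCase();
    return needles.some((needle) => haystack.includes(needle));
  });
}

// Hypothetical helper: renders the selected papers as a plain-text bullet list.
function formatDigest(papers) {
  // Most-upvoted first; papers without an upvote count sort last.
  const sorted = [...papers].sort((a, b) => (b.Upvotes ?? -1) - (a.Upvotes ?? -1));
  return sorted
    .map((p) => `- ${p['Paper Title']} (${p.Upvotes ?? 'n/a'} upvotes): ${p['Paper URL']}`)
    .join('\n');
}
```

The resulting text can be posted to Slack or placed in an email body as-is; for Notion or Google Sheets you would more likely pass the filtered records themselves.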
🤝 Contributing
Found a bug or want to suggest improvements?
- Open an issue in the repository
- Submit a pull request
- Contact support via Apify Console
📄 License
This actor is provided as-is under the MIT License.
💡 Tips
- Combine with other scrapers: Use alongside arXiv or Google Scholar scrapers for comprehensive coverage
- Set up alerts: Use Apify webhooks to get notified when new papers are found
- Custom filtering: Process the output with your own scripts to filter by topics/authors
- Data enrichment: Combine with citation APIs to get paper impact metrics
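When combining this actor's output with other sources, the records need a shared key, and the arXiv ID is a natural candidate. The sketch below assumes the other sources also expose an `ArXiv URL` field in the `https://arxiv.org/abs/<id>` or `/pdf/<id>` form; `arxivId` and `mergeByArxivId` are hypothetical helpers, and only new-style IDs (such as `1706.03762`) are recognized.

```javascript
// Hypothetical helper: extracts a new-style arXiv ID from a URL, or null if none is found.
function arxivId(url) {
  // Matches IDs such as 1706.03762, ignoring an optional version suffix like v5.
  const match = /arxiv\.org\/(?:abs|pdf)\/(\d{4}\.\d{4,5})(?:v\d+)?/i.exec(url ?? '');
  return match ? match[1] : null;
}

// Hypothetical helper: merges records from several sources, one result per arXiv ID.
function mergeByArxivId(...sources) {
  const merged = new Map();
  for (const item of sources.flat()) {
    const id = arxivId(item['ArXiv URL']);
    if (id === null) continue; // skip records without a recognizable arXiv link
    // Later sources fill in or override fields from earlier ones.
    merged.set(id, { ...(merged.get(id) ?? {}), ...item });
  }
  return [...merged.values()];
}
```

Records without an arXiv link are dropped by this merge, so keep the original arrays around if you need them.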
Note: This scraper is for educational and research purposes. Always respect website terms of service and rate limits. Use responsibly! 🎓
Last Updated: December 2025