Papers With Code Scraper - ML Research Paper Data
Pricing
Pay per usage
Scrape Papers With Code research data. Extract papers, benchmarks, datasets, code repos, and state-of-the-art results from ML research.
Developer: Donny Nguyen
Papers With Code Scraper - ML Research Data
What does Papers With Code Scraper - ML Research Data do?
This actor scrapes Papers With Code and extracts ML research papers, benchmarks, code repositories, and state-of-the-art (SOTA) results. It automates the extraction so you can collect structured data without writing any code, and delivers the results as JSON, CSV, or Excel, ready for analysis, integration, or loading into a database or data warehouse.
Why use Papers With Code Scraper - ML Research Data?
- No coding required: configure your inputs in the Apify Console and click Start. No programming knowledge is needed.
- Export in multiple formats: download your results as JSON, CSV, or Excel, or read them programmatically through the Apify API.
- Scheduled and automated runs: set up recurring schedules (hourly, daily, or weekly) to keep your data fresh, with email or webhook notifications when new data is ready.
- Built-in proxy rotation: the actor manages and rotates proxies automatically to reduce rate limiting and blocking.
- Scalable extraction: process hundreds or thousands of items in a single run. The actor manages concurrency, retries, error handling, and memory allocation for you.
- Reliable error handling: failed requests are retried automatically, and the actor continues processing the remaining items, so you still get partial results if some pages are unavailable.
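The retry-and-continue behavior described above follows a common pattern. The sketch below is a generic illustration of that pattern, not the actor's actual code: `fetch_all`, `max_attempts`, and `base_delay` are hypothetical names, and exponential backoff is assumed as the retry strategy.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> Tuple[Dict[str, T], List[str]]:
    """Fetch each URL with retries; return (results, failed_urls).

    Hypothetical helper illustrating retry-with-backoff plus partial results.
    """
    results: Dict[str, T] = {}
    failed: List[str] = []
    for url in urls:
        for attempt in range(1, max_attempts + 1):
            try:
                results[url] = fetch(url)
                break  # success: move on to the next URL
            except Exception:  # broad catch keeps the sketch short
                if attempt == max_attempts:
                    failed.append(url)  # give up on this URL but keep going
                else:
                    # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
                    time.sleep(base_delay * 2 ** (attempt - 1))
    return results, failed
```

A URL that keeps failing ends up in `failed_urls` while every other URL is still returned, which is what "partial results" means here.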
How to use Papers With Code Scraper - ML Research Data
- Navigate to the Papers With Code Scraper - ML Research Data page on Apify Store and click Try for free to open the actor in Apify Console.
- Configure the input in the visual editor on the Input tab: enter a search query or a list of direct paper URLs, and set the maximum number of results.
- Click Start to begin the extraction. The actor runs in the Apify cloud, and you can monitor its progress in real time on the Log tab.
- Once complete, view your results in the Output tab. The data is displayed in a formatted overview table for easy browsing and quick analysis.
- Download your data as JSON, CSV, or Excel using the export buttons, or access it programmatically via the Apify API or direct dataset endpoint URLs.
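To download a run's results without the Console, you can call the Apify API v2 dataset items endpoint directly. The sketch below only builds that URL; `dataset_items_url` is a hypothetical helper, and `YOUR_DATASET_ID` is a placeholder for the ID shown on the run's Storage tab. The `token` parameter is only needed when the dataset is not publicly readable.

```python
from typing import Optional
from urllib.parse import urlencode

# Base URL of the Apify API v2.
API_BASE = "https://api.apify.com/v2"

# Export formats mentioned in this README (the endpoint accepts others too).
SUPPORTED_FORMATS = ("json", "csv", "xlsx")


def dataset_items_url(dataset_id: str, fmt: str = "json", token: Optional[str] = None) -> str:
    """Build a download URL for a dataset's items in the given format."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    params = {"format": fmt}
    if token:
        params["token"] = token  # API token for private datasets
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode(params)}"


if __name__ == "__main__":
    print(dataset_items_url("YOUR_DATASET_ID", "csv"))
```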
Input configuration
| Field | Type | Description | Default |
|---|---|---|---|
| Search Query | string | Search term to find papers on Papers With Code (e.g., 'machine learning', 'transformer', 'object detection'). Used when | "machine learning" |
| Direct Paper URLs | array | Optional list of direct Papers With Code paper URLs to scrape (e.g., https://paperswithcode.com/paper/attention-is-all-y | - |
| Max Results | integer | Maximum number of papers to extract from search results. Only applies when using searchQuery. | 100 |
| Use Residential Proxy | boolean | Route requests through residential proxies to reduce the chance of being blocked, at the cost of more proxy bandwidth. | false |
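When starting runs through the API, the same fields are passed as a JSON input object. The sketch below assembles one with the defaults from the table. Only `searchQuery` is confirmed by this README; `paperUrls`, `maxResults`, and `useResidentialProxy` are hypothetical key names, so check the JSON view of the Input tab for the real ones before relying on them.

```python
import json
from typing import List, Optional

PAPER_URL_PREFIX = "https://paperswithcode.com/paper/"


def build_run_input(
    search_query: str = "machine learning",
    paper_urls: Optional[List[str]] = None,
    max_results: int = 100,
    use_residential_proxy: bool = False,
) -> dict:
    """Assemble an input object mirroring the table's fields and defaults."""
    if max_results < 1:
        raise ValueError("max_results must be a positive integer")
    urls = paper_urls or []
    for url in urls:
        if not url.startswith(PAPER_URL_PREFIX):
            raise ValueError(f"not a Papers With Code paper URL: {url}")
    return {
        "searchQuery": search_query,  # key confirmed by the README
        "paperUrls": urls,  # hypothetical key name
        "maxResults": max_results,  # hypothetical key name
        "useResidentialProxy": use_residential_proxy,  # hypothetical key name
    }


if __name__ == "__main__":
    print(json.dumps(build_run_input("object detection", max_results=50), indent=2))
```

Validating URLs before starting a run avoids paying for a run that cannot match any paper pages.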
Output data
The actor stores results in a structured dataset. Each item in the dataset represents one extracted record and contains the following key fields:
- URL (url)
- Title (title)
- Description (description)
- Data (data)
Each item also includes a scrapedAt timestamp (ISO 8601, UTC) recording when it was collected. You can use this field to track data freshness across multiple runs.
Example output:
{"url": "https://example.com/page","title": "Example title","description": "Example description","data": "Example data","scrapedAt": "2026-02-18T00:00:00.000Z"}
You can preview the data in the formatted Overview table on the Output tab, which displays the most important fields in an easy-to-read format. The full dataset with all fields is available for download or API access.
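When you merge results from several runs, the scrapedAt field lets you keep only the freshest copy of each record. A minimal sketch, assuming records shaped like the example output and using url as the unique key (an assumption; the README does not state a key). `latest_by_url` is a hypothetical helper.

```python
from datetime import datetime
from typing import Dict, Iterable, List


def parse_ts(value: str) -> datetime:
    # datetime.fromisoformat() before Python 3.11 rejects a trailing "Z",
    # so replace it with an explicit UTC offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def latest_by_url(items: Iterable[dict]) -> List[dict]:
    """Keep only the most recently scraped record for each url."""
    latest: Dict[str, dict] = {}
    for item in items:
        current = latest.get(item["url"])
        if current is None or parse_ts(item["scrapedAt"]) > parse_ts(current["scrapedAt"]):
            latest[item["url"]] = item
    return list(latest.values())


if __name__ == "__main__":
    runs = [
        {"url": "https://example.com/page", "title": "old", "scrapedAt": "2026-02-18T00:00:00.000Z"},
        {"url": "https://example.com/page", "title": "new", "scrapedAt": "2026-02-19T00:00:00.000Z"},
    ]
    print(latest_by_url(runs))
```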
Cost of usage
This actor is priced using Apify's Pay-Per-Event model. Each successfully extracted result costs approximately $0.003 per item ($3.00 per 1,000 results).
- Extracting 100 results costs approximately $0.30
- Extracting 1,000 results costs approximately $3.00
- On the free Apify plan ($5/month platform credit), the per-event charges allow approximately 1,666 results per month ($5.00 / $0.003); platform usage costs, charged separately, lower this figure somewhat
Platform usage costs (compute units for memory and CPU time) are charged separately by Apify at standard rates. Most runs of this actor complete quickly with minimal compute overhead, so the per-event charge represents the majority of the total cost.
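The estimates above are simple multiplication and division by the per-result price. The sketch below reproduces them using `Decimal` to avoid floating-point rounding; it covers per-event charges only and ignores platform usage costs. The helper names are hypothetical.

```python
from decimal import Decimal

PRICE_PER_RESULT = Decimal("0.003")  # USD per extracted result, as stated above


def event_cost(results: int) -> Decimal:
    """Per-event charge for a number of results (excludes platform usage)."""
    return results * PRICE_PER_RESULT


def results_for_budget(budget_usd: str) -> int:
    """Whole results a budget covers from per-event charges alone."""
    return int(Decimal(budget_usd) // PRICE_PER_RESULT)


print(event_cost(100))  # 0.300
print(event_cost(1000))  # 3.000
print(results_for_budget("5"))  # 1666
```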
Tips and advanced usage
This actor uses lightweight HTTP requests to extract data efficiently. It is fast and uses minimal resources, making it cost-effective for large-scale data extraction. The actor handles request retries, proxy rotation, and rate limiting automatically.
You can schedule this actor to run automatically at regular intervals using Apify Schedules. This is useful for tracking newly published papers, following benchmark and state-of-the-art updates, or keeping your dataset up to date without manual intervention. Schedules accept cron expressions for precise timing control.
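With a scheduled run (for example, the standard cron expression 0 6 * * *, which fires once a day at 06:00), a common follow-up step is to find papers that did not appear in the previous run. A minimal sketch, assuming both runs' items have already been downloaded as lists of dicts and that url identifies a paper; `new_papers` is a hypothetical helper.

```python
from typing import Iterable, List, Set


def new_papers(previous: Iterable[dict], current: Iterable[dict]) -> List[dict]:
    """Return records from the current run whose url was absent from the previous run."""
    seen: Set[str] = {item["url"] for item in previous}
    # Preserve the current run's order so output matches the dataset order.
    return [item for item in current if item["url"] not in seen]


if __name__ == "__main__":
    yesterday = [{"url": "u1"}, {"url": "u2"}]
    today = [{"url": "u2"}, {"url": "u3"}]
    print(new_papers(yesterday, today))
```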
For large-scale extraction or integration into automated workflows, use the Apify API to start runs programmatically and pull results directly into your data pipeline. The actor can also be connected to tools such as Google Sheets, Zapier, Make (formerly Integromat), and n8n, and you can use webhooks to trigger downstream actions when a run completes successfully.
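A minimal sketch of starting a run and reading its results with the official apify-client Python package (pip install apify-client). The actor ID below is a hypothetical guess, so copy the real one from the actor's Store page; the input passes only searchQuery, the one input key this README confirms. APIFY_TOKEN is simply the environment variable this sketch reads the API token from.

```python
import os
from typing import List

# Hypothetical actor ID: look up the real one on the actor's Store page.
ACTOR_ID = "donnycodesdefi/papers-with-code-scraper"


def run_scraper(token: str, search_query: str, actor_id: str = ACTOR_ID) -> List[dict]:
    """Start a run, wait for it to finish, and return its dataset items."""
    # Imported here so the module still loads without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    # .call() starts the run and blocks until it finishes.
    run = client.actor(actor_id).call(run_input={"searchQuery": search_query})
    if run is None:
        raise RuntimeError("run did not return run details")
    return client.dataset(run["defaultDatasetId"]).list_items().items


def titles(items: List[dict]) -> List[str]:
    """Extract paper titles, tolerating records without a title."""
    return [item.get("title", "") for item in items]


if __name__ == "__main__":
    token = os.environ.get("APIFY_TOKEN")
    if token:
        for title in titles(run_scraper(token, "transformer")):
            print(title)
    else:
        print("Set APIFY_TOKEN to run this example.")
```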
Related actors
Part of our AI & dev tools data collection suite. See also:
- Hugging Face Model Scraper
- Ollama Model Scraper
- Together AI Model Scraper
- Groq Model Scraper
- Fireworks AI Model Scraper
Browse all actors: apify.com/donnycodesdefi | GitHub: github.com/donnywin85