Pricing

Pay per usage

Papers With Code Scraper - ML Research Paper Data

Scrape Papers With Code research data. Extract papers, benchmarks, datasets, code repos, and state-of-the-art results from ML research.

Developer

Donny Nguyen

Last modified: a day ago

Papers With Code Scraper - ML Research Data

What does Papers With Code Scraper - ML Research Data do?

Scrape Papers With Code to extract ML papers, benchmarks, code repositories, and state-of-the-art (SOTA) results. This Apify actor automates the extraction, so you can collect structured research data without writing any code. Results are delivered as JSON, CSV, or Excel, ready for analysis, integration, or loading into your database or data warehouse.

Why use Papers With Code Scraper - ML Research Data?

No coding required: configure your inputs in the Apify Console and click Start.
Export in multiple formats: download your results as JSON, CSV, or Excel, or retrieve them programmatically through the Apify API.
Scheduled and automated runs: set up recurring schedules (hourly, daily, or weekly) to keep your data fresh, with automatic email or webhook notifications when new data is ready.
Built-in proxy rotation: the actor manages and rotates proxies automatically to reduce blocking and rate limiting.
Scalable extraction: process hundreds or thousands of items in a single run. The actor manages concurrency, retries, error handling, and memory allocation for you.
Reliable error handling: failed requests are retried automatically while the remaining items continue to be processed, so you still get partial results if some pages are unavailable.

How to use Papers With Code Scraper - ML Research Data

Navigate to the Papers With Code Scraper - ML Research Data page on Apify Store and click Try for free to open the actor in Apify Console.
Configure the input in the visual editor on the Input tab: enter a Search Query or a list of Direct Paper URLs, and set Max Results.
Click Start to begin the extraction. The actor runs in the Apify cloud, and you can monitor its progress in real time on the Log tab.
When the run finishes, view the results on the Output tab, where they appear in a formatted overview table for quick browsing.
Download the data as JSON, CSV, or Excel using the export buttons, or access it programmatically via the Apify API or the dataset's endpoint URLs.
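As a rough sketch of the programmatic route, the Python below builds a download URL for a run's dataset using the Apify API v2 dataset items endpoint and fetches it with only the standard library. The dataset ID and API token are placeholders you supply (the run's default dataset ID appears in its run details), and the helper names are hypothetical, not part of any Apify SDK.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_BASE = "https://api.apify.com/v2"

# Formats accepted by the dataset items endpoint's "format" parameter.
EXPORT_FORMATS = {"json", "jsonl", "csv", "xlsx", "xml", "html", "rss"}


def dataset_items_url(dataset_id: str, fmt: str = "json", clean: bool = True) -> str:
    """Return the URL that downloads every item of a dataset in the given format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported export format: {fmt!r}")
    params = {"format": fmt}
    if clean:
        # clean=true drops empty items and hidden fields (names starting with '#').
        params["clean"] = "true"
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode(params)}"


def fetch_items(dataset_id: str, token: str) -> list:
    """Download a dataset as JSON; the token goes in a Bearer header, not the URL."""
    req = Request(dataset_items_url(dataset_id), headers={"Authorization": f"Bearer {token}"})
    with urlopen(req) as resp:
        return json.load(resp)
```

Keeping the token in an Authorization header rather than a query parameter avoids leaking it into logs or shared links.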

Input configuration

Field	Type	Description	Default
Search Query	string	Search term to find papers on Papers With Code (e.g., 'machine learning', 'transformer', 'object detection'). Used when	"machine learning"
Direct Paper URLs	array	Optional list of direct Papers With Code paper URLs to scrape (e.g., https://paperswithcode.com/paper/attention-is-all-y	-
Max Results	integer	Maximum number of papers to extract from search results. Only applies when using searchQuery.	100
Use Residential Proxy	boolean	Enable residential proxy for better success rate. Uses more proxy bandwidth but avoids blocks.	false
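The table maps onto a JSON input object. This Python sketch assembles and sanity-checks one before a run, with defaults mirroring the table. Only the searchQuery key is named on this page; paperUrls, maxResults, and useResidentialProxy are hypothetical key names, so confirm the real ones in the JSON view of the Input tab.

```python
from typing import Iterable

# Only "searchQuery" is confirmed by this page; the other keys are hypothetical.
KEY_QUERY = "searchQuery"
KEY_URLS = "paperUrls"             # hypothetical
KEY_MAX = "maxResults"             # hypothetical
KEY_PROXY = "useResidentialProxy"  # hypothetical

PAPER_URL_PREFIX = "https://paperswithcode.com/paper/"


def build_input(
    query: str = "machine learning",
    urls: Iterable[str] = (),
    max_results: int = 100,
    residential: bool = False,
) -> dict:
    """Assemble an actor input dict, rejecting obviously invalid values."""
    urls = list(urls)
    for url in urls:
        if not url.startswith(PAPER_URL_PREFIX):
            raise ValueError(f"not a Papers With Code paper URL: {url}")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(max_results, bool) or not isinstance(max_results, int) or max_results < 1:
        raise ValueError("max_results must be a positive integer")
    run_input = {KEY_QUERY: query, KEY_MAX: max_results, KEY_PROXY: residential}
    if urls:
        run_input[KEY_URLS] = urls
    return run_input
```

Validating locally catches typos in paper URLs before any proxy bandwidth or per-result charges are spent.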

Output data

The actor stores results in a structured dataset. Each item in the dataset represents one extracted record and contains the following key fields:

URL (url)
Title (title)
Description (description)
Data (data)

Each item also includes a scrapedAt timestamp recording when it was collected. You can use this field to track data freshness across multiple runs.

Example output:

{
  "url": "https://example.com/page",
  "title": "Example title",
  "description": "Example description",
  "data": "Example data",
  "scrapedAt": "2026-02-18T00:00:00.000Z"
}
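The scrapedAt field makes freshness checks straightforward. Below is a small Python sketch that keeps only items collected within a given time window. It assumes items shaped like the example above, with scrapedAt as an ISO 8601 UTC string ending in Z; the function names are illustrative.

```python
import json
from datetime import datetime, timedelta, timezone


def parse_scraped_at(value: str) -> datetime:
    """Parse a timestamp such as '2026-02-18T00:00:00.000Z' into an aware datetime."""
    # Rewriting the trailing 'Z' lets datetime.fromisoformat accept it before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def fresh_items(items: list, max_age: timedelta, now: datetime) -> list:
    """Return the items whose scrapedAt is no older than max_age relative to now."""
    cutoff = now - max_age
    return [item for item in items if parse_scraped_at(item["scrapedAt"]) >= cutoff]


items = json.loads('[{"url": "https://example.com/page", "scrapedAt": "2026-02-18T00:00:00.000Z"}]')
now = datetime(2026, 2, 19, tzinfo=timezone.utc)
print(len(fresh_items(items, timedelta(days=2), now)))    # 1: collected one day before now
print(len(fresh_items(items, timedelta(hours=12), now)))  # 0: older than 12 hours
```

Passing now explicitly keeps the filter deterministic and easy to test; in a pipeline you would pass datetime.now(timezone.utc).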

You can preview the data in the formatted Overview table on the Output tab, which displays the most important fields in an easy-to-read format. The full dataset with all fields is available for download or API access.

Cost of usage

This actor is priced using Apify's Pay-Per-Event model: each successfully extracted result costs approximately $0.003 ($3.00 per 1,000 results).

Extracting 100 results costs approximately $0.30
Extracting 1,000 results costs approximately $3.00
On the free Apify plan ($5/month platform credit), you can extract approximately 1,666 results per month ($5 ÷ $0.003), assuming the entire credit goes to per-result charges
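The per-result arithmetic above is easy to reproduce. This Python sketch uses Decimal to avoid floating-point rounding and, like the figures above, counts only the per-result charge, not platform usage.

```python
from decimal import Decimal

# Per-result price quoted on this page, in USD.
PRICE_PER_RESULT = Decimal("0.003")


def estimate_cost(results: int) -> Decimal:
    """Per-event charge for a given number of extracted results."""
    return results * PRICE_PER_RESULT


def max_results_for_budget(budget_usd: Decimal) -> int:
    """Whole number of results a budget covers, ignoring platform usage."""
    return int(budget_usd // PRICE_PER_RESULT)


print(estimate_cost(100))                    # 0.300
print(estimate_cost(1000))                   # 3.000
print(max_results_for_budget(Decimal("5")))  # 1666
```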

Platform usage costs (compute units for memory and CPU time) are charged separately by Apify at standard rates. Most runs of this actor complete quickly with minimal compute overhead, so the per-event charge represents the majority of the total cost.

Tips and advanced usage

This actor uses lightweight HTTP requests to extract data efficiently. It is fast and uses minimal resources, making it cost-effective for large-scale data extraction. The actor handles request retries, proxy rotation, and rate limiting automatically.

You can schedule this actor to run automatically at regular intervals using Apify Schedules. This is useful for tracking newly published papers, watching benchmark and state-of-the-art results change, or keeping your dataset up to date without manual intervention. Schedules support cron expressions for precise timing control.
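For reference, hourly, daily, and weekly cadences map onto standard five-field cron expressions (minute, hour, day of month, month, day of week). The Python below lists one such mapping plus a deliberately minimal shape check. The specific times are arbitrary examples, and the check does not cover every syntax Apify Schedules may accept (such as named days or macros).

```python
import re

# Field order: minute hour day-of-month month day-of-week.
CRON_PRESETS = {
    "hourly": "0 * * * *",  # minute 0 of every hour
    "daily": "0 6 * * *",   # every day at 06:00
    "weekly": "0 6 * * 1",  # every Monday at 06:00
}

# Each field may contain only digits and the characters * / , -
_FIELD = re.compile(r"^[0-9*/,\-]+$")


def looks_like_cron(expr: str) -> bool:
    """Minimal shape check: exactly five fields built from numeric cron syntax."""
    fields = expr.split()
    return len(fields) == 5 and all(_FIELD.match(f) for f in fields)
```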

For large-scale extraction or integration into automated workflows, use the Apify API to start runs programmatically and retrieve results directly into your data pipeline. The actor also works with tools such as Google Sheets, Zapier, Make (formerly Integromat), and n8n for building automated data workflows. You can also use webhooks to trigger downstream actions when a run completes successfully.
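As one way to start a run from your own pipeline, the Python sketch below prepares a POST to the Apify API v2 run endpoint (/v2/acts/{actorId}/runs), sending the actor input as the JSON body. The actor ID is a placeholder in username~actor-name form because this page does not state the actor's exact ID; searchQuery is the only input key named on this page, and the helper names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

API_BASE = "https://api.apify.com/v2"


def start_run_request(actor_id: str, run_input: dict, token: str) -> Request:
    """Prepare (but do not send) a request that starts an actor run."""
    url = f"{API_BASE}/acts/{quote(actor_id, safe='~')}/runs"
    return Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
    )


def start_run(actor_id: str, run_input: dict, token: str) -> dict:
    """Send the request; the API wraps the run object in a "data" field."""
    with urlopen(start_run_request(actor_id, run_input, token)) as resp:
        return json.load(resp)["data"]
```

The returned run object includes a defaultDatasetId, which you can pass to the dataset items endpoint once the run has finished; separating request construction from sending keeps the request easy to inspect and test.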

Part of our AI & dev tools data collection suite. See also:

Browse all actors: apify.com/donnycodesdefi | GitHub: github.com/donnywin85

Ai-ML-scraper

labrat011/ai-ml-scraper

Search AI/ML models, research papers, and trending papers from HuggingFace Hub and arXiv. No API key required.

Mick

arXiv Scraper

artificially/arxiv-scraper

Search and extract academic papers from arXiv.org. Get paper titles, authors, abstracts, categories, and PDF links for AI/ML, physics, math, and more.

Artificially

GitHub Code Dataset Builder

consummate_mandala/github-code-dataset-builder

Build code datasets from GitHub repos. Extract files by language, license, stars, and topics for code LLM training.

Donny Nguyen

arXiv Paper Scraper

cloud9_ai/arxiv-paper-scraper

Scrape academic papers from arXiv.org. Search by keyword, browse categories, or get latest papers. Extract titles, abstracts, authors, PDF links, and citation data via arXiv API.

cloud9

Ai Code Review

vivid_astronaut/ai-code-review

Fabio Suizu

HuggingFaceTP

aligned_tripod/huggingfacetp

Scrapes trending research papers from HuggingFace, capturing each paper’s title, description, and URL. The scraper collects data from the listing page and visits individual paper pages for full abstracts, providing a structured dataset of the latest AI research.

amazing

Ai Training Data Curator

lanky_quantifier/ai-training-data-curator

Curate high-quality training datasets for AI/ML models. Extract, clean & format text data from websites, papers & forums. Perfect for LLM training, RAG systems & research.

Vhub Systems

Dataset to HuggingFace

flamboyant_leaf/DatasetToHuggingFace

Transfers data from Apify datasets to Hugging Face datasets. Bridges web scraping with ML platforms, enabling access to pre-trained models and collaborative tools. Customize transfer limits, streamline ML workflows, and leverage data versioning. Ideal for data scientists and ML researchers.