Semantic Scholar Scraper

Search and extract academic paper data from Semantic Scholar. Find papers, analyze citations, track references. 200M+ papers, no API key needed.

Pricing

Pay per event

Developer

Stas Persiianenko

Last modified: a month ago

📖 What does Semantic Scholar Scraper do?

Semantic Scholar Scraper extracts academic research data from Semantic Scholar's database of 200M+ papers. Four modes:

🔹 Search: Find papers by keyword, filter by year, field of study, and citation count
🔹 Details: Get full metadata for specific papers by ID or DOI
🔹 Citations: List all papers that cite a given paper (who cited this?)
🔹 References: List all papers referenced by a given paper (bibliography)

❓ Why use Semantic Scholar Scraper?

🔹 No API key needed for most modes: Details, Citations, and References use public endpoints. Search mode requires a free API key.
🔹 200M+ papers: Access one of the largest academic paper databases
🔹 Citation analysis: Track who cites a paper and build citation graphs
🔹 Open access detection: Find papers with free PDF access
🔹 Rich metadata: Authors, abstracts, venues, DOIs, ArXiv IDs, fields of study
🔹 Influential citations: Distinguish routine citations from influential ones

💡 Use cases

🔹 Literature reviews: Find all relevant papers on a topic with citation counts
🔹 Research tracking: Monitor new publications in your field
🔹 Citation analysis: Build citation networks and find influential papers
🔹 Academic SEO: Track citation impact of your publications
🔹 Competitive research: Monitor competitor institutions' publications
🔹 Dataset building: Create structured datasets of academic literature for ML/NLP

📊 Sample output

Paper data

Field	Example
title	Attention Is All You Need
year	2017
citationCount	120000
authors	Ashish Vaswani, Noam Shazeer, ...
venue	NeurIPS
doi	10.48550/arXiv.1706.03762
fieldsOfStudy	Computer Science
isOpenAccess	true

💰 How much does it cost to scrape Semantic Scholar?

Event	Price
Start (per run)	$0.005
Paper scraped	$0.001

Free plan estimate: ~200 papers per month on the Apify Free plan.
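
The two event prices in the table combine into a simple per-run estimate. The sketch below is only arithmetic on those two figures; the function name `estimate_run_cost` is made up for illustration, and any other platform charges are not modeled.

```python
from decimal import Decimal

# Event prices copied from the pricing table above (USD).
START_PRICE = Decimal("0.005")  # charged once per run
PAPER_PRICE = Decimal("0.001")  # charged per paper scraped


def estimate_run_cost(papers: int, runs: int = 1) -> Decimal:
    """Estimate event charges in USD for scraping `papers` papers over `runs` runs."""
    if papers < 0 or runs < 1:
        raise ValueError("papers must be >= 0 and runs must be >= 1")
    return START_PRICE * runs + PAPER_PRICE * papers


# One run at the default maxResults of 50 papers.
print(estimate_run_cost(50))
# 1,000 papers split across four runs.
print(estimate_run_cost(1000, runs=4))
```

Decimal is used instead of float so that fractions of a cent add up exactly.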

🔢 How to search academic papers

1. Go to the Semantic Scholar Scraper page on Apify
2. Select mode (search, details, citations, or references)
3. Enter keywords or paper IDs
4. Set filters (year, field of study, minimum citations)
5. Click "Start" and download results as JSON, CSV, or Excel

📥 Input parameters

Parameter	Type	Description
mode	string	search, details, citations, or references
searchTerms	string[]	Keywords to search (search mode)
paperIds	string[]	Paper IDs, DOIs, or ArXiv IDs (details/citations/references mode)
year	string	Year filter (e.g. "2023", "2020-2024")
fieldsOfStudy	string[]	Research fields to filter by
openAccessOnly	boolean	Only papers with free PDF (default: false)
minCitations	number	Minimum citation count filter
maxResults	number	Max papers per query (default: 50)
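
As a rough illustration of how these parameters fit together, the helper below assembles an input object and checks that the mode matches the kind of query supplied. `build_input` is a hypothetical helper, not part of the actor or the Apify client, and it assumes the filters may be sent in any mode, which the table does not state.

```python
VALID_MODES = {"search", "details", "citations", "references"}


def build_input(mode, *, search_terms=None, paper_ids=None, year=None,
                fields_of_study=None, open_access_only=False,
                min_citations=None, max_results=50):
    """Build an actor input dict using the parameter names from the table above."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")

    run_input = {
        "mode": mode,
        "maxResults": max_results,
        "openAccessOnly": open_access_only,
    }
    if mode == "search":
        if not search_terms:
            raise ValueError("search mode needs at least one search term")
        run_input["searchTerms"] = list(search_terms)
    else:
        if not paper_ids:
            raise ValueError(f"{mode} mode needs at least one paper ID")
        run_input["paperIds"] = list(paper_ids)

    # Optional filters are only included when they were set.
    if year is not None:
        run_input["year"] = year
    if fields_of_study:
        run_input["fieldsOfStudy"] = list(fields_of_study)
    if min_citations is not None:
        run_input["minCitations"] = min_citations
    return run_input


print(build_input("search", search_terms=["transformer neural network"],
                  year="2023-", min_citations=100))
```

The resulting dict can be passed as the run input in the API examples further down.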

📤 Output fields

type, paperId, title, year, citationCount, authors, authorIds, abstract, venue, publicationDate, doi, arxivId, url, pdfUrl, pdfLicense, fieldsOfStudy, isOpenAccess, influentialCitationCount, referenceCount, searchTerm, scrapedAt

💡 Tips

🔹 Paper IDs: You can use Semantic Scholar IDs (40-char hex), DOIs (prefix with DOI:), ArXiv IDs (prefix with ARXIV:), or Corpus IDs (prefix with CorpusId:).
🔹 Year ranges: Use 2020-2024 for a range, or 2023- for 2023 onwards.
🔹 Fields of study: Common values: Computer Science, Medicine, Physics, Biology, Chemistry, Mathematics, Economics, Psychology, Engineering.
🔹 Influential citations: The influentialCitationCount field counts citations that meaningfully build on the work (not just routine mentions).
🔹 Rate limits: Semantic Scholar allows ~1 request/second without an API key. For higher throughput, request a free key at semanticscholar.org/product/api.
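
The ID prefixes and year formats from the tips can be applied before a run with a small pre-processing step. This is a sketch only: `normalize_paper_id` and `is_valid_year_filter` are hypothetical helpers, and the patterns used to guess a bare DOI, a new-style ArXiv ID, or a numeric Corpus ID are assumptions rather than rules taken from the actor.

```python
import re

S2_ID = re.compile(r"[0-9a-fA-F]{40}")           # 40-char hex Semantic Scholar ID
BARE_DOI = re.compile(r"10\.\d{4,9}/\S+")        # assumed shape of a DOI
BARE_ARXIV = re.compile(r"\d{4}\.\d{4,5}(v\d+)?")  # assumed new-style ArXiv ID
YEAR_FILTER = re.compile(r"\d{4}(-(\d{4})?)?")   # 2023, 2020-2024, or 2023-
KNOWN_PREFIXES = ("DOI:", "ARXIV:", "CORPUSID:")


def normalize_paper_id(raw: str) -> str:
    """Add the prefix described in the tips when the ID type can be guessed."""
    value = raw.strip()
    if S2_ID.fullmatch(value) or value.upper().startswith(KNOWN_PREFIXES):
        return value  # already usable as-is
    if BARE_DOI.fullmatch(value):
        return f"DOI:{value}"
    if BARE_ARXIV.fullmatch(value):
        return f"ARXIV:{value}"
    if value.isdigit():
        return f"CorpusId:{value}"
    raise ValueError(f"unrecognized paper ID: {raw!r}")


def is_valid_year_filter(value: str) -> bool:
    """Check the three year formats mentioned in the tips."""
    return YEAR_FILTER.fullmatch(value) is not None


print(normalize_paper_id("10.48550/arXiv.1706.03762"))
print(normalize_paper_id("1706.03762"))
print(is_valid_year_filter("2023-"))
```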

🔗 Integrations

Export paper data to Google Sheets, Slack, Zapier, Make, or any webhook. Connect via the Apify API for automated research monitoring. Schedule weekly runs to track new publications.

💻 API usage

Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_TOKEN' });
const run = await client.actor('automation-lab/semantic-scholar-scraper').call({
    mode: 'search',
    searchTerms: ['transformer neural network'],
    year: '2023-',
    minCitations: 100,
    maxResults: 50,
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Python

from apify_client import ApifyClient

client = ApifyClient('YOUR_TOKEN')
run = client.actor('automation-lab/semantic-scholar-scraper').call(run_input={
    'mode': 'citations',
    'paperIds': ['649def34f8be52c8b66281af98ae884c09aef38b'],
    'maxResults': 100,
})
items = client.dataset(run['defaultDatasetId']).list_items().items
print(items)

cURL

curl "https://api.apify.com/v2/acts/automation-lab~semantic-scholar-scraper/runs" \
  -X POST -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{"mode": "search", "searchTerms": ["CRISPR gene editing"], "maxResults": 50}'

⚖️ Legality

Semantic Scholar Scraper accesses publicly available data through the official Semantic Scholar Academic Graph API. This API is provided by the Allen Institute for AI (AI2) and is designed for programmatic access to academic paper metadata. All data is derived from public academic publications.

Use with AI agents via MCP

Semantic Scholar Scraper is available as a tool for AI assistants via the Model Context Protocol (MCP).

Setup for Claude Code

claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/semantic-scholar-scraper"

Setup for Claude Desktop, Cursor, or VS Code

Add this to your MCP config file:

{
    "mcpServers": {
        "apify": {
            "url": "https://mcp.apify.com?tools=automation-lab/semantic-scholar-scraper"
        }
    }
}

Example prompts

"Find highly-cited papers about 'attention mechanism'"
"Search Semantic Scholar for NLP papers from 2025"
"Get all papers that cite the original transformer paper using Semantic Scholar"

How do I do a systematic literature review using automation?

Use Semantic Scholar Scraper in "search" mode to find all papers matching your topic keywords, filtering by year range and minimum citation count to surface the most relevant work. Then switch to "citations" mode for your key foundational papers to find all downstream work that builds on them. For each important paper, run "references" mode to extract its full bibliography. Export to CSV or JSON and load into a reference manager or spreadsheet. Scheduling weekly runs lets you automatically track new publications as they appear — critical for fast-moving fields like AI and biomedicine.
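
Because the search, citations, and references runs can return overlapping papers, a review workflow usually needs a merge step before export. The following is a minimal sketch of that step, assuming each dataset item is a dict carrying the `paperId`, `title`, `year`, and `citationCount` output fields; `merge_unique` and `to_csv` are illustrative names and the sample records are placeholders, not real papers.

```python
import csv
import io


def merge_unique(*batches):
    """Merge items from several runs, keeping the first copy of each paperId."""
    merged = {}
    for batch in batches:
        for item in batch:
            paper_id = item.get("paperId")
            if paper_id and paper_id not in merged:
                merged[paper_id] = item
    # Most-cited papers first; a missing count is treated as 0.
    return sorted(merged.values(),
                  key=lambda p: p.get("citationCount") or 0, reverse=True)


def to_csv(papers, columns=("paperId", "title", "year", "citationCount")):
    """Serialize the chosen columns to CSV text, ignoring other fields."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(papers)
    return buffer.getvalue()


# Placeholder records standing in for items from two separate runs.
search_items = [{"paperId": "p1", "title": "Paper A", "year": 2021, "citationCount": 5}]
citation_items = [
    {"paperId": "p1", "title": "Paper A", "year": 2021, "citationCount": 5},
    {"paperId": "p2", "title": "Paper B", "year": 2023, "citationCount": 9},
]
print(to_csv(merge_unique(search_items, citation_items)))
```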

What is the difference between citation count and influential citation count?

The citationCount field counts all papers that cite a given work, regardless of how it is used. The influentialCitationCount field counts only citations where the citing paper meaningfully builds on, uses, or extends the methods of the referenced work — as determined by Semantic Scholar's classification model. A high influential citation count indicates a foundational paper that other researchers actively build on, rather than one that is merely mentioned in passing. For identifying truly important work in a field, influentialCitationCount is a more meaningful signal than raw citation count.
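
One way to use both fields together is to rank results by influential citations and report what share of all citations they represent. This sketch assumes items are dicts with the `citationCount` and `influentialCitationCount` output fields; the function names and the sample numbers are made up for illustration.

```python
def influential_share(paper: dict) -> float:
    """Fraction of a paper's citations that are counted as influential."""
    total = paper.get("citationCount") or 0
    influential = paper.get("influentialCitationCount") or 0
    return influential / total if total else 0.0


def rank_by_influence(papers, min_citations=0):
    """Sort papers by influentialCitationCount, dropping rarely cited ones."""
    eligible = [p for p in papers if (p.get("citationCount") or 0) >= min_citations]
    return sorted(eligible,
                  key=lambda p: p.get("influentialCitationCount") or 0,
                  reverse=True)


# Illustrative numbers only, not real citation data.
papers = [
    {"title": "Paper A", "citationCount": 1000, "influentialCitationCount": 50},
    {"title": "Paper B", "citationCount": 400, "influentialCitationCount": 120},
    {"title": "Paper C", "citationCount": 3, "influentialCitationCount": 2},
]
for paper in rank_by_influence(papers, min_citations=10):
    print(paper["title"], f"{influential_share(paper):.0%}")
```

The `min_citations` cutoff keeps papers with only a handful of citations from dominating on share alone.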

How do I find open-access papers on a research topic?

Set openAccessOnly: true in the input parameters and the scraper will return only papers with a free PDF available. The output pdfUrl field gives you the direct download link for each paper. This is especially useful for building training datasets, automated literature pipelines, or research tools where you need actual paper content, not just metadata. Semantic Scholar indexes open-access papers from ArXiv, PubMed Central, and publisher agreements — covering a large fraction of recent scientific output. For preprints specifically, combine with ArXiv Scraper for complete coverage.
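
A short sketch of that workflow: an input with `openAccessOnly` enabled, followed by a helper that pulls the PDF links out of the returned items. `open_access_links` is a hypothetical helper and the sample items use placeholder URLs; only the field names come from the output field list.

```python
# Input for a search limited to papers with a free PDF.
run_input = {
    "mode": "search",
    "searchTerms": ["CRISPR gene editing"],
    "openAccessOnly": True,
    "maxResults": 50,
}


def open_access_links(items):
    """Return (title, pdfUrl) pairs for items that carry a PDF link."""
    return [(item.get("title"), item["pdfUrl"])
            for item in items if item.get("pdfUrl")]


# Placeholder items standing in for a run's dataset.
items = [
    {"title": "Paper A", "isOpenAccess": True, "pdfUrl": "https://example.org/a.pdf"},
    {"title": "Paper B", "isOpenAccess": False, "pdfUrl": None},
]
print(open_access_links(items))
```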

Can I build a citation graph or knowledge network from this data?

Yes. Start with a root paper using "details" mode to get its paper ID, then run "citations" and "references" modes to get the connected papers. Each result includes its own paper ID, so you can recursively expand the graph to any depth. Export each layer as JSON and use a graph tool like Gephi, NetworkX, or Neo4j to visualize and analyze the citation network. This is a common technique for mapping the intellectual lineage of a research field, identifying key papers that bridge sub-communities, and finding research gaps.
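
The recursive expansion described above amounts to a breadth-first traversal. The sketch below keeps the traversal separate from data fetching: `fetch_citing` is a hypothetical callable you would supply (for example, a wrapper that runs the actor in "citations" mode for one paper ID and returns the dataset items), and here it is replaced by an in-memory stand-in with placeholder IDs.

```python
from collections import deque


def expand_citation_graph(root_id, fetch_citing, max_depth=2):
    """Map each expanded paperId to the set of paperIds that cite it."""
    graph = {}
    seen = {root_id}
    queue = deque([(root_id, 0)])
    while queue:
        paper_id, depth = queue.popleft()
        if depth >= max_depth:
            continue  # deep enough: do not fetch citations for this paper
        citing_ids = {item["paperId"] for item in fetch_citing(paper_id)
                      if item.get("paperId")}
        graph[paper_id] = citing_ids
        for citing_id in citing_ids:
            if citing_id not in seen:  # avoid re-fetching and cycles
                seen.add(citing_id)
                queue.append((citing_id, depth + 1))
    return graph


# In-memory stand-in for real runs: B and C cite A, and C also cites B.
fake_results = {
    "A": [{"paperId": "B"}, {"paperId": "C"}],
    "B": [{"paperId": "C"}],
    "C": [],
}
graph = expand_citation_graph("A", lambda pid: fake_results.get(pid, []))
print(graph)
```

The resulting adjacency mapping can be converted to an edge list for tools such as Gephi, NetworkX, or Neo4j. Keeping `max_depth` small matters because each expanded paper means another run and more papers charged.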

FAQ

Q: Do I need an API key? A: Not for the details, citations, and references modes, which use public endpoints with a rate limit of ~1 request/second. Search mode requires a free API key, which you can request from Semantic Scholar and which also allows higher throughput.

Q: How many papers are in the database? A: Over 200 million papers from all fields of science, indexed from major publishers, ArXiv, PubMed, and other sources.

Q: Can I search by author? A: The search mode uses keyword matching on titles and abstracts. For author-specific searches, find a paper by that author first, then use the details or citations mode.

Q: What's the difference between citations and references? A: Citations are papers that cite the target paper (who cited this?). References are papers that the target paper cites (its bibliography).

Q: Does this include full paper text? A: No. The API provides metadata (title, abstract, authors, etc.) and the pdfUrl field links to open-access PDFs when available.

Q: The scraper returns a "429 Too Many Requests" error. A: Semantic Scholar's public API limits requests to approximately 1 per second. The scraper handles this automatically with retries, but very large scrapes (1000+ papers) may occasionally hit rate limits. If this persists, try reducing maxResults or splitting across multiple runs.
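
Splitting a large job across multiple runs can be as simple as batching the paper IDs. This is a sketch under the assumption that each batch becomes its own run input; `split_into_batches` is an illustrative helper, and the batch size and `maxResults` value are arbitrary.

```python
def split_into_batches(paper_ids, batch_size):
    """Split a list of paper IDs into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [paper_ids[i:i + batch_size]
            for i in range(0, len(paper_ids), batch_size)]


# Placeholder IDs; each batch would be submitted as a separate run.
paper_ids = [f"CorpusId:{n}" for n in range(1, 8)]
run_inputs = [
    {"mode": "citations", "paperIds": batch, "maxResults": 100}
    for batch in split_into_batches(paper_ids, batch_size=3)
]
for run_input in run_inputs:
    print(run_input)
```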

Q: Why are some paper abstracts missing? A: Not all papers in Semantic Scholar have abstracts indexed. This is especially common for older papers or papers from publishers who do not share abstract text. The abstract field will be null for these entries.

Q: How can I use this for an automated literature review or research pipeline? A: Combine search mode with the references and citations modes to build comprehensive citation networks. Schedule regular runs to track newly published papers in your field, export results to a knowledge management system, and use the citation data to identify influential research directions and research gaps.

Other academic research scrapers

🔹 ArXiv Scraper: Search and extract preprints from ArXiv
🔹 CrossRef Scraper: Extract DOI metadata and citation data
🔹 OpenAlex Scraper: Academic paper data from OpenAlex
🔹 ClinicalTrials Scraper: Extract clinical trial data from ClinicalTrials.gov
