Semantic Scholar Scraper
Pricing
Pay per event
Search and extract academic paper data from Semantic Scholar. Find papers, analyze citations, track references. 200M+ papers, no API key needed.
Developer
Stas Persiianenko
Search and extract academic paper data from Semantic Scholar. Find research papers, analyze citations, and track references. No API key needed.
📖 What does Semantic Scholar Scraper do?
Semantic Scholar Scraper extracts academic research data from Semantic Scholar's database of 200M+ papers. Four modes:
🔹 Search — Find papers by keyword, filter by year, field of study, and citation count
🔹 Details — Get full metadata for specific papers by ID or DOI
🔹 Citations — List all papers that cite a given paper (who cited this?)
🔹 References — List all papers referenced by a given paper (bibliography)
❓ Why use Semantic Scholar Scraper?
🔹 No API key needed — Uses the public Semantic Scholar Academic Graph API
🔹 200M+ papers — Access one of the largest academic paper databases
🔹 Citation analysis — Track who cites a paper and build citation graphs
🔹 Open access detection — Find papers with free PDF access
🔹 Rich metadata — Authors, abstracts, venues, DOIs, ArXiv IDs, fields of study
🔹 Influential citations — Distinguish routine citations from influential ones
💡 Use cases
🔹 Literature reviews — Find all relevant papers on a topic with citation counts
🔹 Research tracking — Monitor new publications in your field
🔹 Citation analysis — Build citation networks and find influential papers
🔹 Academic SEO — Track citation impact of your publications
🔹 Competitive research — Monitor competitor institutions' publications
🔹 Dataset building — Create structured datasets of academic literature for ML/NLP
📊 Sample output
Paper data
| Field | Example |
|---|---|
| title | Attention Is All You Need |
| year | 2017 |
| citationCount | 120000 |
| authors | Ashish Vaswani, Noam Shazeer, ... |
| venue | NeurIPS |
| doi | 10.48550/arXiv.1706.03762 |
| fieldsOfStudy | Computer Science |
| isOpenAccess | true |
💰 How much does it cost to scrape Semantic Scholar?
| Event | Price |
|---|---|
| Start (per run) | $0.005 |
| Paper scraped | $0.001 |
Free plan estimate: ~200 papers per month on the Apify Free plan.
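The pricing table can be turned into a quick estimate. The sketch below is only arithmetic on the two listed prices; the function name is made up for illustration, and it ignores any charges not shown in the table.

```python
# Prices copied from the table above (USD); update them if the listing changes.
START_PRICE = 0.005
PRICE_PER_PAPER = 0.001


def estimate_run_cost(papers: int, runs: int = 1) -> float:
    """Estimate the cost in USD of scraping `papers` papers across `runs` runs."""
    if papers < 0 or runs < 1:
        raise ValueError("papers must be >= 0 and runs must be >= 1")
    # Each run pays the start fee once; every scraped paper is billed separately.
    return round(runs * START_PRICE + papers * PRICE_PER_PAPER, 6)


print(estimate_run_cost(50))       # one run returning 50 papers
print(estimate_run_cost(1000, 4))  # 1,000 papers split across four runs
```

Splitting a job across more runs only adds the per-run start fee, so the paper count dominates the total for anything beyond a handful of results.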
🔢 How to search academic papers
- Go to the Semantic Scholar Scraper page on Apify
- Select mode (search, details, citations, or references)
- Enter keywords or paper IDs
- Set filters (year, field of study, minimum citations)
- Click "Start" and download results as JSON, CSV, or Excel
📥 Input parameters
| Parameter | Type | Description |
|---|---|---|
| mode | string | search, details, citations, or references |
| searchTerms | string[] | Keywords to search (search mode) |
| paperIds | string[] | Paper IDs, DOIs, or ArXiv IDs (details/citations/references mode) |
| year | string | Year filter (e.g. "2023", "2020-2024") |
| fieldsOfStudy | string[] | Research fields to filter by |
| openAccessOnly | boolean | Only papers with free PDF (default: false) |
| minCitations | number | Minimum citation count filter |
| maxResults | number | Max papers per query (default: 50) |
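As a rough illustration of how these parameters fit together, the helper below assembles an input dictionary using the key names from the table. The helper itself and its validation rules are assumptions made for this sketch; the Actor may accept or reject inputs differently.

```python
from typing import Any, Optional

VALID_MODES = {"search", "details", "citations", "references"}


def build_input(
    mode: str,
    search_terms: Optional[list[str]] = None,
    paper_ids: Optional[list[str]] = None,
    year: Optional[str] = None,
    fields_of_study: Optional[list[str]] = None,
    open_access_only: bool = False,
    min_citations: Optional[int] = None,
    max_results: int = 50,
) -> dict[str, Any]:
    """Assemble a run input using the parameter names from the table above."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    # Per the table: search mode takes searchTerms, the other modes take paperIds.
    if mode == "search" and not search_terms:
        raise ValueError("search mode needs at least one search term")
    if mode != "search" and not paper_ids:
        raise ValueError(f"{mode} mode needs at least one paper ID")

    optional = {
        "searchTerms": search_terms,
        "paperIds": paper_ids,
        "year": year,
        "fieldsOfStudy": fields_of_study,
        "minCitations": min_citations,
    }
    run_input: dict[str, Any] = {
        "mode": mode,
        "openAccessOnly": open_access_only,
        "maxResults": max_results,
    }
    # Leave out anything that was not set so the Actor's own defaults apply.
    run_input.update({k: v for k, v in optional.items() if v is not None})
    return run_input


print(build_input("search", search_terms=["graph neural networks"], year="2023-"))
```

The resulting dictionary can be passed as `run_input` to the Python client call shown in the API usage section.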
📤 Output fields
type, paperId, title, year, citationCount, authors, authorIds, abstract, venue, publicationDate, doi, arxivId, url, pdfUrl, pdfLicense, fieldsOfStudy, isOpenAccess, influentialCitationCount, referenceCount, searchTerm, scrapedAt
💡 Tips
🔹 Paper IDs — You can use Semantic Scholar IDs (40-char hex), DOIs (prefix with DOI:), ArXiv IDs (prefix with ARXIV:), or Corpus IDs (prefix with CorpusId:).
🔹 Year ranges — Use 2020-2024 for a range, or 2023- for 2023 onwards.
🔹 Fields of study — Common values: Computer Science, Medicine, Physics, Biology, Chemistry, Mathematics, Economics, Psychology, Engineering.
🔹 Influential citations — The influentialCitationCount field counts citations that meaningfully build on the work (not just routine mentions).
🔹 Rate limits — Semantic Scholar allows ~1 request/second without an API key. For higher throughput, request a free key at semanticscholar.org/product/api.
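The ID prefixes from the first tip can be applied automatically when identifiers arrive in mixed formats. The function below is a hypothetical helper, and its detection rules (a DOI starts with `10.`, a new-style ArXiv ID looks like `1706.03762`, a bare number is a Corpus ID) are heuristics assumed for this sketch. Old-style ArXiv IDs such as `hep-th/9901001` are not recognised.

```python
import re

_S2_ID = re.compile(r"^[0-9a-f]{40}$", re.IGNORECASE)  # 40-char hex Semantic Scholar ID
_ARXIV_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")     # new-style ArXiv ID (assumed pattern)
_PREFIXES = ("DOI:", "ARXIV:", "CorpusId:")


def normalize_paper_id(raw: str) -> str:
    """Add the prefix described in the tips above when it is missing."""
    value = raw.strip()
    if value.startswith(_PREFIXES) or _S2_ID.match(value):
        return value  # already prefixed, or a plain Semantic Scholar ID
    if value.startswith("10.") and "/" in value:
        return f"DOI:{value}"
    if _ARXIV_ID.match(value):
        return f"ARXIV:{value}"
    if value.isdigit():
        return f"CorpusId:{value}"
    raise ValueError(f"unrecognised paper identifier: {raw!r}")


print(normalize_paper_id("10.48550/arXiv.1706.03762"))
print(normalize_paper_id("1706.03762"))
```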
🔗 Integrations
Export paper data to Google Sheets, Slack, Zapier, Make, or any webhook. Connect via the Apify API for automated research monitoring. Schedule weekly runs to track new publications.
💻 API usage
Node.js

    import { ApifyClient } from 'apify-client';

    const client = new ApifyClient({ token: 'YOUR_TOKEN' });

    const run = await client.actor('automation-lab/semantic-scholar-scraper').call({
        mode: 'search',
        searchTerms: ['transformer neural network'],
        year: '2023-',
        minCitations: 100,
        maxResults: 50,
    });

    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    console.log(items);

Python

    from apify_client import ApifyClient

    client = ApifyClient('YOUR_TOKEN')

    run = client.actor('automation-lab/semantic-scholar-scraper').call(run_input={
        'mode': 'citations',
        'paperIds': ['649def34f8be52c8b66281af98ae884c09aef38b'],
        'maxResults': 100,
    })

    items = client.dataset(run['defaultDatasetId']).list_items().items
    print(items)

cURL

    curl "https://api.apify.com/v2/acts/automation-lab~semantic-scholar-scraper/runs" \
      -X POST \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer YOUR_TOKEN" \
      -d '{"mode": "search", "searchTerms": ["CRISPR gene editing"], "maxResults": 50}'

⚖️ Legality
Semantic Scholar Scraper accesses publicly available data through the official Semantic Scholar Academic Graph API. This API is provided by the Allen Institute for AI (AI2) and is designed for programmatic access to academic paper metadata. All data is derived from public academic publications.
Use with AI agents via MCP
Semantic Scholar Scraper is available as a tool for AI assistants via the Model Context Protocol (MCP).
Setup for Claude Code

    claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/semantic-scholar-scraper"

Setup for Claude Desktop, Cursor, or VS Code
Add this to your MCP config file:

    {
      "mcpServers": {
        "apify": {
          "url": "https://mcp.apify.com?tools=automation-lab/semantic-scholar-scraper"
        }
      }
    }

Example prompts
- "Find highly-cited papers about 'attention mechanism'"
- "Search Semantic Scholar for NLP papers from 2025"
- "Get all papers that cite the original transformer paper using Semantic Scholar"
How do I do a systematic literature review using automation?
Use Semantic Scholar Scraper in "search" mode to find all papers matching your topic keywords, filtering by year range and minimum citation count to surface the most relevant work. Then switch to "citations" mode for your key foundational papers to find all downstream work that builds on them. For each important paper, run "references" mode to extract its full bibliography. Export to CSV or JSON and load into a reference manager or spreadsheet. Scheduling weekly runs lets you automatically track new publications as they appear — critical for fast-moving fields like AI and biomedicine.
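Once the search, citations, and references runs have finished, their results usually overlap. A minimal sketch of the merge-and-export step, assuming each item is a dictionary with the `paperId` and `citationCount` output fields; the function names and the column selection are illustrative only.

```python
import csv
import io
from typing import Any, Iterable

# Illustrative column choice; the names come from the output fields list.
CSV_COLUMNS = ["paperId", "title", "year", "citationCount", "doi"]


def merge_results(*batches: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Combine items from several runs, keeping the first record per paperId."""
    merged: dict[str, dict[str, Any]] = {}
    for batch in batches:
        for item in batch:
            paper_id = item.get("paperId")
            if paper_id and paper_id not in merged:
                merged[paper_id] = item
    # Most-cited papers first; a missing count is treated as zero.
    return sorted(merged.values(), key=lambda p: p.get("citationCount") or 0, reverse=True)


def to_csv(papers: Iterable[dict[str, Any]]) -> str:
    """Serialise the selected columns to CSV text for a spreadsheet or reference manager."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_COLUMNS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(papers)
    return buffer.getvalue()


search_items = [{"paperId": "a", "title": "Paper A", "citationCount": 5}]
citing_items = [
    {"paperId": "b", "title": "Paper B", "citationCount": 50},
    {"paperId": "a", "title": "Paper A", "citationCount": 5},  # duplicate of the search hit
]
print(to_csv(merge_results(search_items, citing_items)))
```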
What is the difference between citation count and influential citation count?
The citationCount field counts all papers that cite a given work, regardless of how it is used. The influentialCitationCount field counts only citations where the citing paper meaningfully builds on, uses, or extends the methods of the referenced work — as determined by Semantic Scholar's classification model. A high influential citation count indicates a foundational paper that other researchers actively build on, rather than one that is merely mentioned in passing. For identifying truly important work in a field, influentialCitationCount is a more meaningful signal than raw citation count.
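One way to use the two fields together is to rank papers by the share of their citations that are influential. This is a sketch, not a method the scraper provides: the function names are invented here, and the `min_citations` cutoff is an assumption added so that papers with only a few citations do not dominate the ranking.

```python
from typing import Any


def influential_share(paper: dict[str, Any]) -> float:
    """Fraction of a paper's citations that are counted as influential."""
    total = paper.get("citationCount") or 0
    influential = paper.get("influentialCitationCount") or 0
    return influential / total if total else 0.0


def rank_by_influence(papers: list[dict[str, Any]], min_citations: int = 10) -> list[dict[str, Any]]:
    """Sort papers by influential share, skipping those below the citation cutoff."""
    eligible = [p for p in papers if (p.get("citationCount") or 0) >= min_citations]
    return sorted(eligible, key=influential_share, reverse=True)


sample = [
    {"title": "Widely mentioned", "citationCount": 1000, "influentialCitationCount": 20},
    {"title": "Actively built on", "citationCount": 200, "influentialCitationCount": 40},
    {"title": "Too new to judge", "citationCount": 3, "influentialCitationCount": 2},
]
for paper in rank_by_influence(sample):
    print(paper["title"], round(influential_share(paper), 2))
```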
How do I find open-access papers on a research topic?
Set openAccessOnly: true in the input parameters and the scraper will return only papers with a free PDF available. The output pdfUrl field gives you the direct download link for each paper. This is especially useful for building training datasets, automated literature pipelines, or research tools where you need actual paper content, not just metadata. Semantic Scholar indexes open-access papers from ArXiv, PubMed Central, and publisher agreements — covering a large fraction of recent scientific output. For preprints specifically, combine with ArXiv Scraper for complete coverage.
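Collecting the download links from a finished run can look like the following. The helper name is hypothetical, and the sketch checks `pdfUrl` directly rather than relying on `isOpenAccess`, since it is an assumption that the two always agree.

```python
from typing import Any


def open_access_pdfs(items: list[dict[str, Any]]) -> list[tuple[str, str]]:
    """Return (title, pdfUrl) pairs for the items that carry a PDF link."""
    return [
        (item.get("title") or "", item["pdfUrl"])
        for item in items
        if item.get("pdfUrl")
    ]


sample_items = [
    {"title": "Open paper", "isOpenAccess": True, "pdfUrl": "https://example.org/open.pdf"},
    {"title": "Closed paper", "isOpenAccess": False, "pdfUrl": None},
]
for title, url in open_access_pdfs(sample_items):
    print(title, url)
```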
Can I build a citation graph or knowledge network from this data?
Yes. Start with a root paper using "details" mode to get its paper ID, then run "citations" and "references" modes to get the connected papers. Each result includes its own paper ID, so you can recursively expand the graph to any depth. Export each layer as JSON and use a graph tool like Gephi, NetworkX, or Neo4j to visualize and analyze the citation network. This is a common technique for mapping the intellectual lineage of a research field, identifying key papers that bridge sub-communities, and finding research gaps.
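The recursive expansion described above can be sketched as a breadth-first traversal. To keep the example self-contained, the step that actually runs the scraper is replaced by a caller-supplied function, `fetch_neighbors`, which is hypothetical: in practice it would start a "citations" or "references" run for one paper and return the `paperId` values from the results.

```python
from collections import deque
from typing import Callable

# Hypothetical callback: given a paper ID, return the IDs of connected papers.
FetchNeighbors = Callable[[str], list[str]]


def expand_graph(root_id: str, fetch_neighbors: FetchNeighbors, max_depth: int = 2) -> dict[str, list[str]]:
    """Breadth-first expansion of a citation graph, up to max_depth hops from the root."""
    graph: dict[str, list[str]] = {}
    seen = {root_id}
    queue = deque([(root_id, 0)])
    while queue:
        paper_id, depth = queue.popleft()
        if depth >= max_depth:
            graph.setdefault(paper_id, [])  # frontier node: recorded but not expanded
            continue
        neighbors = fetch_neighbors(paper_id)
        graph[paper_id] = neighbors
        for neighbor in neighbors:
            if neighbor not in seen:  # each paper is fetched at most once
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return graph


# Stand-in data so the sketch runs without network access.
fake_links = {"A": ["B", "C"], "B": ["C", "D"], "C": [], "D": ["E"]}
print(expand_graph("A", lambda pid: fake_links.get(pid, []), max_depth=2))
```

The returned adjacency dictionary can be loaded into a graph tool; capping `max_depth` matters because each hop can multiply the number of runs and billed papers.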
FAQ
Q: Do I need an API key?
A: No. The Semantic Scholar API works without authentication, with a rate limit of ~1 request/second. For higher throughput, you can get a free API key from Semantic Scholar.
Q: How many papers are in the database?
A: Over 200 million papers from all fields of science, indexed from major publishers, ArXiv, PubMed, and other sources.
Q: Can I search by author?
A: The search mode uses keyword matching on titles and abstracts. For author-specific searches, find a paper by that author first, then use the details or citations mode.
Q: What's the difference between citations and references?
A: Citations are papers that cite the target paper (who cited this?). References are papers that the target paper cites (its bibliography).
Q: Does this include full paper text?
A: No. The API provides metadata (title, abstract, authors, etc.) and the pdfUrl field links to open-access PDFs when available.
Q: The scraper returns a "429 Too Many Requests" error.
A: Semantic Scholar's public API limits requests to approximately 1 per second. The scraper handles this automatically with retries, but very large scrapes (1000+ papers) may occasionally hit rate limits. If this persists, try reducing maxResults or splitting across multiple runs.
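Splitting a large job across multiple runs can be as simple as chunking the ID list. The sketch below only prepares one input dictionary per run; the function names and the default batch size of 100 are assumptions, not values recommended by the scraper.

```python
from typing import Any, Iterator


def batched(ids: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `ids` with at most `size` entries each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def split_into_runs(paper_ids: list[str], mode: str, batch_size: int = 100,
                    max_results: int = 100) -> list[dict[str, Any]]:
    """Build one run input per batch, using the documented input parameter names."""
    return [
        {"mode": mode, "paperIds": chunk, "maxResults": max_results}
        for chunk in batched(paper_ids, batch_size)
    ]


inputs = split_into_runs([f"CorpusId:{n}" for n in range(250)], mode="details")
print(len(inputs), [len(i["paperIds"]) for i in inputs])
```

Each dictionary can then be submitted as a separate run, one after another, so that no single run has to push through the rate limit for the whole list.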
Q: Why are some paper abstracts missing?
A: Not all papers in Semantic Scholar have abstracts indexed. This is especially common for older papers or papers from publishers who do not share abstract text. The abstract field will be null for these entries.
Q: How can I use this for an automated literature review or research pipeline?
A: Combine search mode with the references and citations modes to build comprehensive citation networks. Schedule regular runs to track newly published papers in your field, export results to a knowledge management system, and use the citation data to identify influential research directions and research gaps.
Other academic research scrapers
🔹 ArXiv Scraper — Search and extract preprints from ArXiv
🔹 CrossRef Scraper — Extract DOI metadata and citation data
🔹 OpenAlex Scraper — Academic paper data from OpenAlex
🔹 ClinicalTrials Scraper — Extract clinical trial data from ClinicalTrials.gov

