Semantic Scholar Scraper

Developer: Stas Persiianenko. Maintained by Community.

Search and extract academic paper data from Semantic Scholar. Find research papers, analyze citations, and track references. No API key needed.

📖 What does Semantic Scholar Scraper do?

Semantic Scholar Scraper extracts academic research data from Semantic Scholar's database of 200M+ papers. Four modes:

🔹 Search: Find papers by keyword, filter by year, field of study, and citation count
🔹 Details: Get full metadata for specific papers by ID or DOI
🔹 Citations: List all papers that cite a given paper (who cited this?)
🔹 References: List all papers referenced by a given paper (bibliography)

❓ Why use Semantic Scholar Scraper?

🔹 No API key needed: Uses the public Semantic Scholar Academic Graph API
🔹 200M+ papers: Access one of the largest academic paper databases
🔹 Citation analysis: Track who cites a paper and build citation graphs
🔹 Open access detection: Find papers with free PDF access
🔹 Rich metadata: Authors, abstracts, venues, DOIs, ArXiv IDs, fields of study
🔹 Influential citations: Distinguish routine citations from influential ones

💡 Use cases

🔹 Literature reviews: Find all relevant papers on a topic with citation counts
🔹 Research tracking: Monitor new publications in your field
🔹 Citation analysis: Build citation networks and find influential papers
🔹 Academic SEO: Track citation impact of your publications
🔹 Competitive research: Monitor competitor institutions' publications
🔹 Dataset building: Create structured datasets of academic literature for ML/NLP

📊 Sample output

Paper data

| Field | Example |
| --- | --- |
| title | Attention Is All You Need |
| year | 2017 |
| citationCount | 120000 |
| authors | Ashish Vaswani, Noam Shazeer, ... |
| venue | NeurIPS |
| doi | 10.48550/arXiv.1706.03762 |
| fieldsOfStudy | Computer Science |
| isOpenAccess | true |

💰 Pricing

| Event | Price |
| --- | --- |
| Start (per run) | $0.005 |
| Paper scraped | $0.001 |

Free plan estimate: ~200 papers per month on the Apify Free plan.
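
The two event prices combine into a simple per-run estimate. The sketch below assumes a run is billed exactly one start event plus one event per paper scraped, with no other platform charges; `estimate_run_cost` is a hypothetical helper for budgeting, not part of the actor.

```python
from decimal import Decimal

# Event prices copied from the pricing table in this README.
START_PRICE = Decimal("0.005")  # charged once per run
PAPER_PRICE = Decimal("0.001")  # charged per paper scraped


def estimate_run_cost(papers: int, runs: int = 1) -> Decimal:
    """Estimate the cost in USD of scraping `papers` papers over `runs` runs.

    Assumption: billing is one start event per run plus one event per paper.
    Plan discounts or other platform fees are not modelled.
    """
    if papers < 0 or runs < 1:
        raise ValueError("papers must be >= 0 and runs must be >= 1")
    return START_PRICE * runs + PAPER_PRICE * papers


if __name__ == "__main__":
    print(estimate_run_cost(50))       # one run at the default maxResults
    print(estimate_run_cost(1000, 4))  # 1,000 papers spread over 4 runs
```

Decimal is used instead of float so that sub-cent prices add up without rounding noise.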

🔢 How to search academic papers

  1. Go to the Semantic Scholar Scraper page on Apify
  2. Select mode (search, details, citations, or references)
  3. Enter keywords or paper IDs
  4. Set filters (year, field of study, minimum citations)
  5. Click "Start" and download results as JSON, CSV, or Excel

📥 Input parameters

| Parameter | Type | Description |
| --- | --- | --- |
| mode | string | search, details, citations, or references |
| searchTerms | string[] | Keywords to search (search mode) |
| paperIds | string[] | Paper IDs, DOIs, or ArXiv IDs (details/citations/references mode) |
| year | string | Year filter (e.g. "2023", "2020-2024") |
| fieldsOfStudy | string[] | Research fields to filter by |
| openAccessOnly | boolean | Only papers with free PDF (default: false) |
| minCitations | number | Minimum citation count filter |
| maxResults | number | Max papers per query (default: 50) |
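
Because the parameters depend on the mode, it can help to assemble the input in one place and fail early on obvious mistakes. The sketch below is one way to do that; `build_input` is a hypothetical helper, and its checks (search terms required in search mode, paper IDs required otherwise, only the year formats documented in this README) are assumptions about sensible usage, not the actor's own validation rules.

```python
import re
from typing import Any, Optional, Sequence

MODES = ("search", "details", "citations", "references")
# Accepts only the year formats shown in this README: "2023", "2023-", "2020-2024".
YEAR_FILTER = re.compile(r"\d{4}(-(\d{4})?)?")


def build_input(
    mode: str,
    search_terms: Sequence[str] = (),
    paper_ids: Sequence[str] = (),
    year: Optional[str] = None,
    fields_of_study: Sequence[str] = (),
    open_access_only: bool = False,
    min_citations: Optional[int] = None,
    max_results: int = 50,
) -> dict[str, Any]:
    """Assemble a run input dict, rejecting obviously bad combinations."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    if mode == "search" and not search_terms:
        raise ValueError("search mode needs at least one search term")
    if mode != "search" and not paper_ids:
        raise ValueError(f"{mode} mode needs at least one paper ID")
    if year is not None and not YEAR_FILTER.fullmatch(year):
        raise ValueError(f"unsupported year filter: {year!r}")

    run_input: dict[str, Any] = {
        "mode": mode,
        "openAccessOnly": open_access_only,
        "maxResults": max_results,
    }
    # Only the identifier list that matches the mode is sent.
    if mode == "search":
        run_input["searchTerms"] = list(search_terms)
    else:
        run_input["paperIds"] = list(paper_ids)
    # Optional filters are omitted entirely when not set.
    if year is not None:
        run_input["year"] = year
    if fields_of_study:
        run_input["fieldsOfStudy"] = list(fields_of_study)
    if min_citations is not None:
        run_input["minCitations"] = min_citations
    return run_input
```

The returned dict uses the parameter names from the table, so it can be passed as `run_input` to the Python client call shown under API usage.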

📤 Output fields

type, paperId, title, year, citationCount, authors, authorIds, abstract, venue, publicationDate, doi, arxivId, url, pdfUrl, pdfLicense, fieldsOfStudy, isOpenAccess, influentialCitationCount, referenceCount, searchTerm, scrapedAt
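
As an illustration of working with these fields, the sketch below picks the open-access papers that have a PDF link and ranks them by influential citations. It assumes each dataset item is a dict keyed by the field names above, that `isOpenAccess` is a boolean, that `pdfUrl` is a string or empty, and that the citation counts are numbers or missing; the README does not state these types. `open_access_by_influence` is a hypothetical helper and the sample items are made up.

```python
from typing import Any, Iterable


def open_access_by_influence(
    items: Iterable[dict[str, Any]], top: int = 10
) -> list[dict[str, Any]]:
    """Return up to `top` open-access papers with a PDF link, most influential first."""
    candidates = [p for p in items if p.get("isOpenAccess") and p.get("pdfUrl")]
    # Rank by influential citations, breaking ties by total citations.
    # Missing or null counts are treated as 0 (an assumption).
    candidates.sort(
        key=lambda p: (
            p.get("influentialCitationCount") or 0,
            p.get("citationCount") or 0,
        ),
        reverse=True,
    )
    return candidates[:top]


if __name__ == "__main__":
    # Made-up items for illustration only.
    sample = [
        {"title": "A", "isOpenAccess": True, "pdfUrl": "https://example.org/a.pdf",
         "influentialCitationCount": 3, "citationCount": 40},
        {"title": "B", "isOpenAccess": False, "pdfUrl": None,
         "influentialCitationCount": 90, "citationCount": 900},
        {"title": "C", "isOpenAccess": True, "pdfUrl": "https://example.org/c.pdf",
         "influentialCitationCount": 12, "citationCount": 30},
    ]
    for paper in open_access_by_influence(sample):
        print(paper["title"], paper["pdfUrl"])
```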

💡 Tips

🔹 Paper IDs: You can use Semantic Scholar IDs (40-char hex), DOIs (prefix with DOI:), ArXiv IDs (prefix with ARXIV:), or Corpus IDs (prefix with CorpusId:).
🔹 Year ranges: Use 2020-2024 for a range, or 2023- for 2023 onwards.
🔹 Fields of study: Common values: Computer Science, Medicine, Physics, Biology, Chemistry, Mathematics, Economics, Psychology, Engineering.
🔹 Influential citations: The influentialCitationCount field counts citations that meaningfully build on the work (not just routine mentions).
🔹 Rate limits: Semantic Scholar allows ~1 request/second without an API key. For higher throughput, request a free key at semanticscholar.org/product/api.
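
If your identifiers come from mixed sources, the prefixes from the Paper IDs tip can be added programmatically. The sketch below is a hypothetical `normalize_paper_id` helper; the detection rules (a DOI starts with "10.", a new-style ArXiv ID looks like "1706.03762", a bare number is a Corpus ID) are heuristics assumed for illustration, not rules taken from the actor.

```python
import re

S2_ID = re.compile(r"[0-9a-fA-F]{40}")          # 40-char hex Semantic Scholar ID
ARXIV_ID = re.compile(r"\d{4}\.\d{4,5}(v\d+)?")  # new-style ArXiv ID, optional version
PREFIXES = ("DOI:", "ARXIV:", "CorpusId:")


def normalize_paper_id(raw: str) -> str:
    """Return `raw` with the prefix described in the tips, guessing the ID type."""
    value = raw.strip()
    # Already prefixed, or a plain Semantic Scholar ID: pass through unchanged.
    if value.startswith(PREFIXES) or S2_ID.fullmatch(value):
        return value
    if value.startswith("10."):
        return "DOI:" + value
    if ARXIV_ID.fullmatch(value):
        return "ARXIV:" + value
    if value.isdigit():
        return "CorpusId:" + value
    raise ValueError(f"unrecognised paper identifier: {raw!r}")


if __name__ == "__main__":
    for raw in ("10.48550/arXiv.1706.03762", "1706.03762",
                "649def34f8be52c8b66281af98ae884c09aef38b"):
        print(normalize_paper_id(raw))
```

Raising on unknown formats, rather than passing them through, surfaces typos before a run is started and billed.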

🔗 Integrations

Export paper data to Google Sheets, Slack, Zapier, Make, or any webhook. Connect via the Apify API for automated research monitoring. Schedule weekly runs to track new publications.

💻 API usage

Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_TOKEN' });

// Start the actor in search mode and wait for the run to finish.
const run = await client.actor('automation-lab/semantic-scholar-scraper').call({
    mode: 'search',
    searchTerms: ['transformer neural network'],
    year: '2023-',
    minCitations: 100,
    maxResults: 50,
});

// Read the scraped papers from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Python

from apify_client import ApifyClient

client = ApifyClient('YOUR_TOKEN')

# Start the actor in citations mode and wait for the run to finish.
run = client.actor('automation-lab/semantic-scholar-scraper').call(run_input={
    'mode': 'citations',
    'paperIds': ['649def34f8be52c8b66281af98ae884c09aef38b'],
    'maxResults': 100,
})

# Read the citing papers from the run's default dataset.
items = client.dataset(run['defaultDatasetId']).list_items().items
print(items)

cURL

curl "https://api.apify.com/v2/acts/automation-lab~semantic-scholar-scraper/runs" \
  -X POST -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{"mode": "search", "searchTerms": ["CRISPR gene editing"], "maxResults": 50}'

βš–οΈ Legality

Semantic Scholar Scraper accesses publicly available data through the official Semantic Scholar Academic Graph API. This API is provided by the Allen Institute for AI (AI2) and is designed for programmatic access to academic paper metadata. All data is derived from public academic publications.

❓ FAQ

Q: Do I need an API key? A: No. The Semantic Scholar API works without authentication, with a rate limit of ~1 request/second. For higher throughput, you can get a free API key from Semantic Scholar.

Q: How many papers are in the database? A: Over 200 million papers from all fields of science, indexed from major publishers, ArXiv, PubMed, and other sources.

Q: Can I search by author? A: The search mode uses keyword matching on titles and abstracts. For author-specific searches, find a paper by that author first, then use the details or citations mode.

Q: What's the difference between citations and references? A: Citations are papers that cite the target paper (who cited this?). References are papers that the target paper cites (its bibliography).
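
The direction matters when you build a citation graph from the two modes. The sketch below turns the results of a citations or references run into directed (citing, cited) edges. It assumes each item carries a `paperId` field and that you track the target paper's ID yourself from the run input; `to_edges` is a hypothetical helper and the IDs in the demo are placeholders.

```python
from typing import Any, Iterable


def to_edges(
    target_id: str, items: Iterable[dict[str, Any]], mode: str
) -> list[tuple[str, str]]:
    """Return (citing, cited) pairs for the results of one target paper."""
    ids = [p["paperId"] for p in items if p.get("paperId")]
    if mode == "citations":
        # Each result cites the target paper.
        return [(pid, target_id) for pid in ids]
    if mode == "references":
        # The target paper cites each result.
        return [(target_id, pid) for pid in ids]
    raise ValueError(f"mode must be 'citations' or 'references', got {mode!r}")


if __name__ == "__main__":
    results = [{"paperId": "p1"}, {"paperId": "p2"}]  # placeholder IDs
    print(to_edges("target", results, "citations"))
    print(to_edges("target", results, "references"))
```

With both edge lists in the same (citing, cited) orientation, results from the two modes can be merged into one graph without flipping directions later.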

Q: Does this include full paper text? A: No. The API provides metadata (title, abstract, authors, etc.) and the pdfUrl field links to open-access PDFs when available.

🔹 ArXiv Scraper: Search and extract preprints from ArXiv
🔹 CrossRef Scraper: Extract DOI metadata and citation data
🔹 OpenAlex Scraper: Academic paper data from OpenAlex