Pricing

from $6.00 / 1,000 results

Semantic Scholar Scraper

[💰 $6 / 1K] Extract academic papers, abstracts, citations, references, authors, and open-access PDF links from Semantic Scholar's 200M+ database. Search by keyword, paper ID/DOI/URL, or author. Filter by year, field, and citations. No API key.

Developer

SolidCode

Last modified

2 months ago

Why This Scraper?

Three ways in, one dataset — keyword search across titles and abstracts, direct fetch by Semantic Scholar paper ID / DOI / arXiv ID / CorpusId / PMID / URL, and author lookup by ID or profile URL. Mix all three in a single run.
200M+ papers across 23 fields of study: filter to any combination of Computer Science, Medicine, Biology, Physics, Economics, Mathematics, Law, Linguistics, and 15 more. Filters are exact matches, not fuzzy "suggestions".
12 publication-type filters — narrow to peer-reviewed JournalArticle, Review, MetaAnalysis, ClinicalTrial, Conference, Dataset, Book, and more for systematic-review-grade precision.
Citation + reference graph expansion — opt in to pull every paper that cites a work, or every paper it references, as separate rows — capped per paper so even "Attention Is All You Need" stays bounded.
Author profiles with h-index — name, affiliations, paper count, total citations, h-index, and homepage as first-class records, plus an opt-in full publication list per author.
Identifier-rich rows — every paper carries its DOI, arXiv ID, native paper ID, author IDs, influential-citation count, and canonical URL, so you can join against PubMed, Crossref, or arXiv downstream.
Direct open-access PDF links — openAccessPdfUrl and an isOpenAccess flag surface free full text on every eligible paper, with an open-access-only filter to keep just the downloadable ones.
High-impact filtering — minimum-citation-count, year-range, and sort-by-citations-or-date controls let you surface the most-cited or most-recent work in a field instantly.
No API key, no sign-up — go from a keyword or a DOI to a structured dataset of up to 10,000 papers per query.

Use Cases

Literature Reviews & Systematic Reviews

Assemble a complete, deduplicated reading list for a new topic in minutes
Filter to Review and MetaAnalysis types for evidence-synthesis projects
Restrict to open-access PDFs to build a downloadable full-text corpus

Research Trend Analysis

Track publication volume in a field across a year range
Surface the most-cited papers of the last two years with citation sorting
Detect emerging sub-fields from a burst of recent open-access work

Citation Network Mapping

Expand a seminal paper's citing-paper graph to find follow-up research
Pull a paper's reference list to trace its intellectual lineage
Build directed citation edges between papers for bibliometric graphs

Competitive Research Intelligence

Monitor what a lab or institution is publishing by author ID
Benchmark researcher output with h-index, paper count, and total citations
Quantify a topic's influence with influential-citation counts

Academic Lead Generation

Find domain experts to quote, interview, or recruit via author profiles
Pull affiliations and homepages for outreach to corresponding researchers
Identify rising authors by citation growth in a specific field

Dataset Building for Machine Learning

Harvest titles + abstracts at scale for NLP and recommendation models
Build labeled corpora filtered by field of study and publication type
Collect open-access PDF links for full-text mining pipelines

Getting Started

Basic Keyword Search

The simplest run — one topic, 100 papers:

{
    "searchQueries": ["large language models"],
    "maxResults": 100
}
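
One way to submit this input programmatically is through the Apify REST API. The sketch below only prepares the request and does not send it. It assumes the synchronous "run and return dataset items" endpoint, and `ACTOR_ID` and the token are hypothetical placeholders, since this page does not state the actor's full ID.

```python
import json
import urllib.request

# Hypothetical placeholder: replace with the actor's real "username~actor-name" ID.
ACTOR_ID = "YOUR_USERNAME~semantic-scholar-scraper"


def build_run_request(run_input: dict, token: str, actor_id: str = ACTOR_ID) -> urllib.request.Request:
    """Prepare (but do not send) a synchronous run request that returns dataset items."""
    url = f"https://api.apify.com/v2/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_run_request(
    {"searchQueries": ["large language models"], "maxResults": 100},
    token="YOUR_APIFY_TOKEN",  # placeholder, not a real token
)
# To actually run it: rows = json.load(urllib.request.urlopen(request))
```

Synchronous runs can time out on long harvests, so larger jobs may be better served by the asynchronous run endpoints or the official client libraries.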

Filtered Search (Year + Field + Open Access)

Narrow to recent, high-impact, open-access computer science work and sort by citations:

{
    "searchQueries": ["retrieval augmented generation"],
    "yearFrom": 2023,
    "yearTo": 2025,
    "fieldsOfStudy": ["Computer Science"],
    "publicationTypes": ["JournalArticle", "Conference"],
    "openAccessOnly": true,
    "minCitationCount": 25,
    "sortBy": "citationCount",
    "maxResults": 200
}

Direct Fetch with Citation + Reference Graph

Fetch exact papers by DOI and arXiv ID, then pull who cites them and what they reference:

{
    "paperIds": [
        "10.1038/nature14539",
        "arXiv:1706.03762",
        "https://www.semanticscholar.org/paper/204e3073870fae3d05bcbc2f6a8e263d9b72e776"
    ],
    "includeCitations": true,
    "includeReferences": true,
    "maxCitationsPerPaper": 50
}

Author Profile Lookup

Pull author profiles by ID or URL, with their full publication lists:

{
    "authorIds": ["1741101", "https://www.semanticscholar.org/author/2061296"],
    "includeAuthorPapers": true,
    "maxResults": 200
}

To find an author ID, open any Semantic Scholar author page and copy the number after /author/ in the URL.
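
If author references arrive as a mix of bare IDs and URLs, the copy step described above can be automated. This is a minimal sketch. The helper name `author_id_from` is hypothetical, and tolerating a name slug between `/author/` and the number is an assumption about how some profile URLs look.

```python
import re


def author_id_from(value: str) -> str:
    """Return the numeric author ID from a bare ID or a profile URL."""
    value = value.strip()
    if value.isdigit():
        return value
    # Optional name slug (assumed), then the numeric ID as the last path segment.
    match = re.search(r"/author/(?:[^/?#]+/)?(\d+)(?:[/?#]|$)", value)
    if match is None:
        raise ValueError(f"no author ID found in {value!r}")
    return match.group(1)


author_ids = [
    author_id_from(v)
    for v in ["1741101", "https://www.semanticscholar.org/author/2061296"]
]
```

The actor accepts both forms directly, so this is only useful when you want a uniform ID column for joins or deduplication.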

Input Reference

Search & Input

Parameter	Type	Default	Description
`searchQueries`	string[]	`["large language models"]`	Keywords searched across paper titles and abstracts. Each query produces its own result set.
`paperIds`	string[]	`[]`	Fetch exact papers by Semantic Scholar paper ID, DOI, arXiv ID, CorpusId, PMID, or paper URL. One record per paper.
`authorIds`	string[]	`[]`	Author IDs (numeric) or full profile URLs. Returns an author-profile record with name, affiliations, h-index, and citation count.
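`maxResults`	integer	`100`	Maximum papers per search query. This caps the primary paper rows you are charged for; rows added by the opt-in citation, reference, and author-paper expansions count on top of it. Set to `0` for all available results (capped at 10,000 per query).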

Filters

Filters apply to search queries only, not to directly-fetched papers or authors.

Parameter	Type	Default	Description
`yearFrom`	integer	null	Only include papers published in this year or later (1900–2100).
`yearTo`	integer	null	Only include papers published in this year or earlier (1900–2100).
`fieldsOfStudy`	string[]	`[]`	Restrict to one or more of 23 research fields (Computer Science, Medicine, Biology, Physics, Economics, and more).
`publicationTypes`	string[]	`[]`	Restrict to one or more of 12 types: Review, JournalArticle, CaseReport, ClinicalTrial, Conference, Dataset, Editorial, LettersAndComments, MetaAnalysis, News, Study, Book.
`openAccessOnly`	boolean	`false`	Only return papers with a free, downloadable open-access PDF.
`minCitationCount`	integer	null	Only return papers cited at least this many times — ideal for surfacing high-impact work.
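`sortBy`	string	`"relevance"`	Sort order: relevance (default), most cited first (by citation count; the filtered-search example passes `"citationCount"`), or most recent first (by publication date). The quoted names "Relevance", "Most cited first", and "Most recent first" are the display labels for these options.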
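
Before starting a paid run, it can help to check filter values locally against the limits in the table above. The sketch below is an illustration, not the actor's own validation. The function name `check_filters` is hypothetical, and the year bounds and the 12 publication types are copied from this table.

```python
ALLOWED_PUBLICATION_TYPES = {
    "Review", "JournalArticle", "CaseReport", "ClinicalTrial", "Conference",
    "Dataset", "Editorial", "LettersAndComments", "MetaAnalysis", "News",
    "Study", "Book",
}


def check_filters(run_input: dict) -> list[str]:
    """Return human-readable problems found in the filter fields (empty if none)."""
    problems = []
    for key in ("yearFrom", "yearTo"):
        year = run_input.get(key)
        if year is not None and not 1900 <= year <= 2100:
            problems.append(f"{key} must be between 1900 and 2100, got {year}")
    year_from, year_to = run_input.get("yearFrom"), run_input.get("yearTo")
    if year_from is not None and year_to is not None and year_from > year_to:
        problems.append("yearFrom is later than yearTo")
    unknown = set(run_input.get("publicationTypes", [])) - ALLOWED_PUBLICATION_TYPES
    if unknown:
        problems.append(f"unknown publicationTypes: {sorted(unknown)}")
    min_citations = run_input.get("minCitationCount")
    if min_citations is not None and min_citations < 0:
        problems.append("minCitationCount cannot be negative")
    return problems
```

An empty list means the filter fields look consistent with the documented ranges; it does not guarantee the run returns results.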

Output Options

The citation, reference, and author-paper expansions each add one row per child item, which multiplies your result count and cost — leave them off unless you need the full graph.

Parameter	Type	Default	Description
`includeAbstracts`	boolean	`true`	Include the abstract text for each paper. Disable to shrink the dataset.
`includeReferences`	boolean	`false`	For each paper, also output the papers it cites (its reference list) as separate records.
`includeCitations`	boolean	`false`	For each paper, also output the papers that cite it as separate records.
`maxCitationsPerPaper`	integer	`50`	Caps how many citing/referenced papers are fetched per source paper (1–1000) when expansion is on.
`includeAuthorPapers`	boolean	`false`	When you provide author IDs, also output each author's publications as separate paper records.
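
Because each expansion adds one row per child item, a rough upper bound on rows per search query can be worked out before a run. The sketch below assumes `maxCitationsPerPaper` applies separately to the citation list and the reference list, which this page does not state explicitly. The function name is hypothetical.

```python
def max_rows_per_query(
    max_results: int,
    include_citations: bool = False,
    include_references: bool = False,
    max_citations_per_paper: int = 50,
) -> int:
    """Upper bound on output rows for one search query.

    Assumes the per-paper cap is applied to each enabled expansion separately.
    """
    expansions = int(include_citations) + int(include_references)
    return max_results * (1 + expansions * max_citations_per_paper)


# 10 primary papers with both expansions at the default cap of 50:
worst_case = max_rows_per_query(10, include_citations=True, include_references=True)
```

Real runs usually come in below this bound, since many papers have fewer citing or referenced papers than the cap.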

Output

Every row carries a recordType field — paper or author — so you can filter cleanly downstream. The dataset ships with two ready-made views: Papers and Author profiles.
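
As an illustration of filtering on `recordType` downstream, the following sketch splits a list of exported rows into papers and author profiles. The helper name `split_by_record_type` and the sample rows are made up for the example.

```python
from collections import defaultdict


def split_by_record_type(rows: list[dict]) -> dict[str, list[dict]]:
    """Group dataset rows by their recordType value."""
    groups = defaultdict(list)
    for row in rows:
        # Rows without the field are kept under "unknown" rather than dropped.
        groups[row.get("recordType", "unknown")].append(row)
    return dict(groups)


sample_rows = [
    {"recordType": "paper", "paperId": "p1"},
    {"recordType": "author", "authorId": "1741101"},
    {"recordType": "paper", "paperId": "p2"},
]
groups = split_by_record_type(sample_rows)
```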

Paper (`recordType: "paper"`)

{
    "recordType": "paper",
    "paperId": "204e3073870fae3d05bcbc2f6a8e263d9b72e776",
    "title": "Attention Is All You Need",
    "abstract": "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks...",
    "authors": [
        {"authorId": "40348417", "name": "Ashish Vaswani"},
        {"authorId": "1846258", "name": "Noam Shazeer"}
    ],
    "year": 2017,
    "publicationDate": "2017-06-12",
    "venue": "Neural Information Processing Systems",
    "publicationTypes": ["JournalArticle", "Conference"],
    "fieldsOfStudy": ["Computer Science"],
    "citationCount": 102543,
    "referenceCount": 41,
    "influentialCitationCount": 12876,
    "doi": "10.48550/arXiv.1706.03762",
    "arxivId": "1706.03762",
    "isOpenAccess": true,
    "openAccessPdfUrl": "https://arxiv.org/pdf/1706.03762.pdf",
    "url": "https://www.semanticscholar.org/paper/204e3073870fae3d05bcbc2f6a8e263d9b72e776",
    "sourceQuery": "transformer architecture",
    "parentPaperId": null,
    "parentAuthorId": null,
    "relation": null,
    "scrapedAt": "2026-06-02T10:30:00.000Z"
}

Core Fields

Field	Type	Description
`recordType`	string	Always `"paper"`
`title`	string	Paper title
`abstract`	string	Abstract text (`null` when abstracts are off or unavailable)
`authors`	object[]	Authors, each `{authorId, name}`
`year`	number	Publication year
`publicationDate`	string	ISO publication date when available
`venue`	string	Journal or conference name
`publicationTypes`	string[]	E.g. `["JournalArticle"]`
`fieldsOfStudy`	string[]	E.g. `["Computer Science"]`

Identifiers

Field	Type	Description
`paperId`	string	Native 40-character Semantic Scholar paper ID
`doi`	string	DOI when available
`arxivId`	string	arXiv ID when available
`url`	string	Canonical Semantic Scholar paper page

Metrics

Field	Type	Description
`citationCount`	number	Times this paper has been cited
`referenceCount`	number	Number of references in this paper
`influentialCitationCount`	number	Semantic Scholar's "influential" citation count

Open Access & Lineage

Field	Type	Description
`isOpenAccess`	boolean	Whether a free open-access PDF exists
`openAccessPdfUrl`	string	Direct PDF link when open access
`sourceQuery`	string	The search query that produced this row (`null` for direct fetches)
`parentPaperId`	string	Source paper ID on citation/reference child rows (`null` on primary rows)
`parentAuthorId`	string	Source author ID on author-publication child rows (`null` on primary rows)
`relation`	string	`"citation"`, `"reference"`, or `"authorPaper"` on child rows (`null` on primary rows)
`scrapedAt`	string	ISO 8601 timestamp
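
The lineage fields are enough to rebuild directed citation edges from the child rows. The sketch below reads `relation: "citation"` as "this row cites the parent" and `relation: "reference"` as "the parent cites this row", following the field descriptions above. The function name and the short sample IDs are hypothetical.

```python
def citation_edges(rows: list[dict]) -> list[tuple[str, str]]:
    """Return sorted, de-duplicated (citing_id, cited_id) pairs from child rows."""
    edges = set()
    for row in rows:
        if row.get("recordType") != "paper":
            continue
        parent, child = row.get("parentPaperId"), row.get("paperId")
        if not parent or not child:
            continue  # primary rows have no parent
        if row.get("relation") == "citation":
            edges.add((child, parent))   # child cites parent
        elif row.get("relation") == "reference":
            edges.add((parent, child))   # parent cites child
    return sorted(edges)


sample_rows = [
    {"recordType": "paper", "paperId": "A", "parentPaperId": None, "relation": None},
    {"recordType": "paper", "paperId": "B", "parentPaperId": "A", "relation": "citation"},
    {"recordType": "paper", "paperId": "C", "parentPaperId": "A", "relation": "reference"},
]
edges = citation_edges(sample_rows)
```

The resulting pairs can be loaded into any graph library as directed edges; `authorPaper` rows are skipped because they carry no paper-to-paper link.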

Author Profile (`recordType: "author"`)

{
    "recordType": "author",
    "authorId": "1741101",
    "name": "Geoffrey E. Hinton",
    "affiliations": ["University of Toronto", "Google"],
    "homepage": "https://www.cs.toronto.edu/~hinton/",
    "paperCount": 412,
    "citationCount": 631204,
    "hIndex": 178,
    "url": "https://www.semanticscholar.org/author/1741101",
    "scrapedAt": "2026-06-02T10:30:00.000Z"
}

Field	Type	Description
`recordType`	string	Always `"author"`
`authorId`	string	Numeric Semantic Scholar author ID
`name`	string	Author display name
`affiliations`	string[]	Listed affiliations
`homepage`	string	Homepage URL when available
`paperCount`	number	Number of papers attributed to the author
`citationCount`	number	Total citations across all papers
`hIndex`	number	Author h-index
`url`	string	Canonical Semantic Scholar author page
`scrapedAt`	string	ISO 8601 timestamp

When includeAuthorPapers is on, each author's publications are also emitted as paper rows alongside the profile, so an author's full body of work lands in the Papers view ready to filter and sort.

Tips for Best Results

Fetch by DOI or arXiv ID for guaranteed exact matches. Keyword search is fuzzy; a DOI like 10.1038/nature14539 or arXiv:1706.03762 resolves to exactly one paper, every time — perfect for verifying a known reference.
Narrow broad topics with filters. A bare query like "machine learning" returns a flood. Add a yearFrom, a fieldsOfStudy value, and a minCitationCount to surface a tight, high-signal set.
Use minCitationCount for impact triage. Set it to 50 or 100 to skip preprints and low-impact work when you only want established, well-cited literature.
Filter to Review and MetaAnalysis for evidence synthesis. These publication types are the backbone of systematic reviews and save hours of manual screening.
Turn off abstracts on large harvests. Setting includeAbstracts: false shrinks every row and speeds up runs when you only need metadata and metrics.
Keep maxResults modest when expanding the citation graph. includeCitations and includeReferences multiply rows per paper — pair them with a small maxResults (5–20) and a sensible maxCitationsPerPaper to keep runs predictable.
Sort by "Most recent first" for monitoring. Re-run a saved query on a schedule with date sorting to catch new publications in your field as they land.
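
For the scheduled-monitoring tip, one simple approach is to compare each run against the previous one by `paperId`. This is a sketch under the assumption that you keep the earlier rows yourself; the function name `new_papers` is hypothetical.

```python
def new_papers(previous_rows: list[dict], current_rows: list[dict]) -> list[dict]:
    """Return paper rows in current_rows whose paperId was not seen before."""
    seen = {
        row.get("paperId")
        for row in previous_rows
        if row.get("recordType") == "paper"
    }
    fresh = []
    for row in current_rows:
        paper_id = row.get("paperId")
        if row.get("recordType") == "paper" and paper_id and paper_id not in seen:
            seen.add(paper_id)  # also drops duplicates within the current run
            fresh.append(row)
    return fresh


last_week = [{"recordType": "paper", "paperId": "p1"}]
this_week = [
    {"recordType": "paper", "paperId": "p1"},
    {"recordType": "paper", "paperId": "p2"},
    {"recordType": "paper", "paperId": "p2"},
]
fresh = new_papers(last_week, this_week)
```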

Pricing

From $6 per 1,000 results at the Gold tier ($7.20 per 1,000 with no discount), with citation graphs and author metrics included in the same per-result price. Bronze, Silver, and Gold subscribers pay progressively less; the table below shows total cost at each discount tier.

Results	No discount	Bronze	Silver	Gold
100	$0.72	$0.68	$0.64	$0.60
1,000	$7.20	$6.80	$6.40	$6.00
10,000	$72.00	$68.00	$64.00	$60.00
100,000	$720.00	$680.00	$640.00	$600.00

No compute or time-based charges — you pay per result, plus a small fixed per-run start fee. A "result" is any row in the output dataset: a paper, an author profile, or a citing/referenced/author-paper row from the opt-in graph expansions (so enabling those expansions increases your result count). Platform fees depend on your Apify plan.
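
The per-result part of the bill can be estimated from the table above. The sketch below uses the listed rates per 1,000 results and leaves out the per-run start fee and platform fees, whose amounts are not given on this page. The tier keys and function name are hypothetical.

```python
from decimal import Decimal

# Rates per 1,000 results, taken from the pricing table.
RATE_PER_1000 = {
    "none": Decimal("7.20"),
    "bronze": Decimal("6.80"),
    "silver": Decimal("6.40"),
    "gold": Decimal("6.00"),
}


def estimate_result_cost(results: int, tier: str = "none") -> Decimal:
    """Per-result charge in USD, excluding the start fee and platform fees."""
    return (RATE_PER_1000[tier] * results / 1000).quantize(Decimal("0.01"))


# 10 primary papers plus up to 1,000 expansion rows, no discount:
cost = estimate_result_cost(1010)
```

Remember that every output row counts as a result, so feed this the expected total row count, not just `maxResults`.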

Integrations

Export data in JSON, CSV, Excel, XML, or RSS. Connect to 1,500+ apps via:

Zapier / Make / n8n — Workflow automation
Google Sheets — Direct spreadsheet export
Slack / Email — Notifications on new results
Webhooks — Trigger custom APIs on run completion
Apify API — Full programmatic access
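
For programmatic export, a finished run's dataset can be downloaded in one of the listed formats. The sketch below builds the download URL only. It assumes the Apify dataset items endpoint and its `format` query parameter, and the dataset ID is a placeholder; check the Apify API reference before relying on the exact values.

```python
from urllib.parse import urlencode

# Format values assumed from the Apify API; "xlsx" corresponds to Excel.
EXPORT_FORMATS = {"json", "csv", "xlsx", "xml", "rss"}


def dataset_export_url(dataset_id: str, fmt: str = "csv") -> str:
    """Build a download URL for a run's dataset in the requested format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    query = urlencode({"format": fmt})
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}"


url = dataset_export_url("YOUR_DATASET_ID", "xlsx")  # placeholder dataset ID
```

Datasets that are not public also need your API token, for example in an `Authorization: Bearer` header on the request.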

Legal & Ethical Use

This actor is designed for legitimate academic research, bibliometrics, literature review, and market intelligence. Users are responsible for complying with applicable laws and Semantic Scholar's terms of service, including making reasonable-rate requests and respecting content usage rules for any papers or PDFs linked from the dataset. Do not use extracted data for spam, harassment, or any illegal purpose.
