Pricing

$5.00 / 1,000 papers scraped

Semantic Scholar Scraper

Scrape Semantic Scholar for academic papers, citations, abstracts, and author profiles. Search by topic, author, or venue. Extract citation graphs, reference lists, and research trends. Essential for literature reviews, academic research, and AI/ML paper discovery.

Developer

OpenClaw Mara

Last modified

2 months ago

🧠 Semantic Scholar Scraper — Academic Papers with Citations & Influence

Structured data from 200M+ papers across every scientific field — with citation graphs, influential paper scores, and author networks. $0.005 per paper.

Scrape Semantic Scholar, the Allen Institute for AI's free academic search engine, for papers, abstracts, citation counts, author networks, and the "influential citations" signal that arXiv and Google Scholar do not provide. Uses the official Semantic Scholar Academic Graph API.

Perfect for citation-aware literature review, author network mapping, research impact analysis, "who cited whom" tracking, LLM training corpora with quality signals, and building academic knowledge graphs.

🚀 What does this Actor do?

Semantic Scholar indexes papers from arXiv, PubMed, ACL, ACM, IEEE, Springer, Elsevier, and thousands of other sources — and layers a citation graph on top. This Actor turns that graph into a programmable source in four modes:

search — Full-text search across every field of science.
paper_details — Fetch full metadata + references + citations for specific papers by ID, DOI, arXiv ID, or URL.
author — All publications by a researcher, with h-index, citation count, affiliation.
citations — Outbound references or inbound citations for a paper (walk the citation graph).

Every paper returns title, abstract, authors, venue, year, citationCount, influentialCitationCount, openAccessPdf link, and structured references — ready for a vector DB, a research dashboard, or a citation network graph.

💡 Use Cases

1. Citation-aware RAG for literature review

Pull the top-100 most-cited papers on a topic, embed abstracts, and build a RAG pipeline that cites real papers weighted by impact.

{
  "mode": "search",
  "searchQuery": "retrieval augmented generation",
  "maxResults": 100,
  "sortBy": "citationCount"
}

2. Citation network building

Start from a seminal paper and walk one or two hops out in citations mode. Build a graph of "papers that cited X" and load it into Neo4j, Graphiti, or another knowledge graph.

{
  "mode": "citations",
  "paperId": "204e3073870fae3d05bcbc2f6a8e263d9b72e776",
  "direction": "inbound",
  "maxResults": 200
}
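
A minimal sketch of turning `citations` results into graph edges. It assumes each returned item carries a `paperId` field as in the Output Example; the exact record shape for `citations` mode is not documented here, and `citation_edges` is a hypothetical helper, not part of the Actor.

```python
from typing import Iterable


def citation_edges(
    seed_id: str, items: Iterable[dict], direction: str = "inbound"
) -> list[tuple[str, str]]:
    """Return (citing_paper_id, cited_paper_id) pairs for one seed paper."""
    if direction not in ("inbound", "outbound"):
        raise ValueError(f"unknown direction: {direction!r}")
    edges = []
    for item in items:
        other = item.get("paperId")  # assumed field name
        if not other:
            continue  # skip records without a usable ID
        if direction == "inbound":
            edges.append((other, seed_id))  # the other paper cites the seed
        else:
            edges.append((seed_id, other))  # the seed cites the other paper
    return edges
```

Keeping the edge direction explicit (citing paper first, cited paper second) means inbound and outbound runs can be merged into one edge list before loading them into a graph store.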

3. Author / lab impact tracking

Pull a researcher's full publication history with h-index, total citations, and per-paper influential-citation counts to see which work has had the most impact.

{
  "mode": "author",
  "authorId": "1738125",
  "maxResults": 500
}

4. Enrichment of arXiv / DOI lists

You already have paper IDs from arXiv, Crossref, or a bibliography — enrich them with citation counts and influential-citation scores that arXiv doesn't ship.

{
  "mode": "paper_details",
  "paperIds": ["ARXIV:1706.03762", "DOI:10.1038/nature14539", "10.1126/science.aab0410"]
}
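
Because each paper is billed, it can help to sanity-check an ID list before submitting it. The sketch below classifies IDs by the formats this page lists as accepted. It is an illustrative pre-flight check under that assumption, not the Actor's own normalization logic; `classify_paper_id` and the regular expressions are hypothetical.

```python
import re

# Prefixes listed as accepted on this page, compared case-insensitively.
KNOWN_PREFIXES = ("ARXIV:", "DOI:", "PMID:", "PMCID:", "CORPUSID:", "DBLP:")
S2_PAPER_ID = re.compile(r"[0-9a-f]{40}")  # 40 hex characters (SHA-1)
RAW_DOI = re.compile(r"10\.\d{4,9}/\S+")   # loose DOI pattern, an assumption


def classify_paper_id(raw: str) -> str:
    """Label an ID as prefixed, url, s2_paper_id, raw_doi, or unknown."""
    value = raw.strip()
    if value.upper().startswith(KNOWN_PREFIXES):
        return "prefixed"
    if value.lower().startswith(("http://", "https://")):
        return "url"
    if S2_PAPER_ID.fullmatch(value.lower()):
        return "s2_paper_id"
    if RAW_DOI.fullmatch(value):
        return "raw_doi"
    return "unknown"


ids = ["ARXIV:1706.03762", "DOI:10.1038/nature14539", "10.1126/science.aab0410"]
suspicious = [i for i in ids if classify_paper_id(i) == "unknown"]
```

Anything that lands in `suspicious` is worth a manual look before the run starts.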

📊 Output Example

{
  "paperId": "204e3073870fae3d05bcbc2f6a8e263d9b72e776",
  "title": "Attention Is All You Need",
  "abstract": "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks...",
  "year": 2017,
  "venue": "Neural Information Processing Systems",
  "authors": [
    {"authorId": "40348417", "name": "Ashish Vaswani"},
    {"authorId": "1738125", "name": "Noam Shazeer"}
  ],
  "citationCount": 98432,
  "influentialCitationCount": 12874,
  "referenceCount": 40,
  "openAccessPdf": {"url": "https://arxiv.org/pdf/1706.03762", "status": "GREEN"},
  "externalIds": {"DOI": "10.48550/arXiv.1706.03762", "ArXiv": "1706.03762"},
  "fieldsOfStudy": ["Computer Science"],
  "publicationTypes": ["JournalArticle", "Conference"]
}
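
As a rough illustration of preparing a record like the one above for a vector DB, the sketch below flattens it into a metadata dict. The input field names follow the example record; the output key names (`pdfUrl`, `doi`) and the helper `to_metadata` are assumptions made for this example.

```python
def to_metadata(record: dict) -> dict:
    """Flatten one paper record into simple key/value metadata."""
    pdf = record.get("openAccessPdf") or {}      # may be missing or null
    external = record.get("externalIds") or {}
    return {
        "paperId": record["paperId"],
        "title": record.get("title", ""),
        "year": record.get("year"),
        "authors": [a["name"] for a in record.get("authors", [])],
        "citationCount": record.get("citationCount", 0),
        "influentialCitationCount": record.get("influentialCitationCount", 0),
        "pdfUrl": pdf.get("url"),
        "doi": external.get("DOI"),
    }
```

The abstract is left out on purpose: it would typically be the text that gets embedded, while this dict is stored next to the vector.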

⚙️ Input Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `mode` | enum | `search`, `paper_details`, `author`, or `citations` (required) |
| `searchQuery` | string | Keywords/phrase (search mode) |
| `paperIds` | array | IDs for `paper_details` mode. Accepts `ARXIV:<id>`, `DOI:<doi>`, raw DOI, S2 paperId, URL, PMID |
| `paperId` | string | Single paper ID for `citations` mode |
| `direction` | enum | `inbound` (papers citing this) or `outbound` (references of this) |
| `authorId` | string | Semantic Scholar authorId for `author` mode |
| `authorName` | string | Author name fallback when no ID known |
| `fieldsOfStudy` | array | Filter by `"Computer Science"`, `"Biology"`, `"Medicine"`, `"Physics"`, etc. |
| `yearFrom`, `yearTo` | int | Year range filter |
| `maxResults` | int | 1–1000 (default 100) |
| `sortBy` | enum | `relevance`, `citationCount`, `year` |
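
The parameters above can be checked client-side before a run is started. The sketch below is one possible validator; the per-mode required fields are inferred from the descriptions in the table and may differ from what the Actor actually enforces, and `build_input` is a hypothetical helper.

```python
MODES = {"search", "paper_details", "author", "citations"}

# Inferred from the parameter descriptions; the Actor may apply other rules.
REQUIRED_ANY = {
    "search": ("searchQuery",),
    "paper_details": ("paperIds",),
    "author": ("authorId", "authorName"),
    "citations": ("paperId",),
}


def build_input(mode: str, **params) -> dict:
    """Return an input dict, raising ValueError on obvious mistakes."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if not any(params.get(key) for key in REQUIRED_ANY[mode]):
        needed = ", ".join(REQUIRED_ANY[mode])
        raise ValueError(f"{mode} mode needs one of: {needed}")
    max_results = params.setdefault("maxResults", 100)  # documented default
    if not 1 <= max_results <= 1000:
        raise ValueError("maxResults must be between 1 and 1000")
    return {"mode": mode, **params}


run_input = build_input(
    "search", searchQuery="retrieval augmented generation", sortBy="citationCount"
)
```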

📤 Output Fields

| Field | Description |
| --- | --- |
| `paperId` | Semantic Scholar unique ID (SHA-1 hash) |
| `title`, `abstract` | Paper title and abstract text |
| `year`, `venue`, `publicationDate` | Publication metadata |
| `authors[]` | Ordered author list with `authorId` + `name` |
| `citationCount` | Total inbound citations |
| `influentialCitationCount` | Citations where this paper meaningfully influenced the citing work (the unique S2 signal) |
| `referenceCount` | Outbound reference count |
| `openAccessPdf` | Free PDF link + OA color (`GREEN`, `GOLD`, `BRONZE`, `HYBRID`) |
| `externalIds` | DOI, ArXiv, PubMed, PMC, CorpusId, DBLP |
| `fieldsOfStudy[]` | Automated subject classification |
| `publicationTypes[]` | Journal, Conference, Review, Book, etc. |

💰 Pricing & Performance

Pay-per-event: $0.005 per paper.
Typical monthly cost: roughly $2–$22 for literature-review pipelines (100–1,000 papers/week at $0.005 per paper).
Speed: ~60 papers/minute (the S2 API is rate-limited to 1 req/sec for unauthenticated use; the Actor handles pacing).
No auth required: the S2 Academic Graph API is free.
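
The cost and duration arithmetic can be sketched as follows, using the $0.005 per-paper price and the ~60 papers/minute figure quoted above. The 52/12 weeks-per-month conversion and the function names are assumptions made for this example.

```python
PRICE_PER_PAPER_USD = 0.005   # pay-per-event price quoted above
PAPERS_PER_MINUTE = 60        # approximate unauthenticated throughput
WEEKS_PER_MONTH = 52 / 12     # assumed average


def estimate_run(papers: int) -> tuple[float, float]:
    """Return (cost in USD, duration in minutes) for a single run."""
    return papers * PRICE_PER_PAPER_USD, papers / PAPERS_PER_MINUTE


def monthly_cost(papers_per_week: int) -> float:
    """Return the approximate monthly cost in USD for a weekly pipeline."""
    return papers_per_week * WEEKS_PER_MONTH * PRICE_PER_PAPER_USD


cost, minutes = estimate_run(1000)
print(f"1,000 papers: ${cost:.2f}, about {minutes:.0f} minutes")
```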

🔌 Integrations

Zapier / Make / n8n — weekly "top-cited new papers in field X" digest to Slack or Notion.
LangChain / LlamaIndex — citation-aware RAG: weight retrieval by influentialCitationCount.
Vector DBs (Pinecone, Weaviate, Qdrant, pgvector) — embed abstracts + store citation counts as metadata for impact-weighted search.
Neo4j / Graphiti — load citations output directly as graph edges to build citation networks.
arXiv Paper Scraper (companion) — arXiv ships papers first; S2 ships citations. Combine for a full picture.
Crossref Scraper (companion) — cross-validate DOIs and journal metadata.
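
One way to weight retrieval by `influentialCitationCount` is to rerank vector-search hits after the similarity search. The formula below (similarity scaled by a log-damped citation boost) is just one hypothetical weighting, not something the Actor, LangChain, or LlamaIndex prescribes; `alpha` and `impact_weighted` are illustrative names.

```python
import math


def impact_weighted(hits: list[tuple[float, dict]], alpha: float = 0.1) -> list[tuple[float, dict]]:
    """Rerank (similarity, record) pairs, boosting influential papers.

    alpha controls how strongly citations count; alpha=0 keeps pure similarity.
    """
    def score(hit: tuple[float, dict]) -> float:
        similarity, record = hit
        influential = record.get("influentialCitationCount", 0)
        # log1p damps the boost so heavily cited papers do not drown out relevance
        return similarity * (1 + alpha * math.log1p(influential))

    return sorted(hits, key=score, reverse=True)


hits = [
    (0.80, {"paperId": "a", "influentialCitationCount": 0}),
    (0.75, {"paperId": "b", "influentialCitationCount": 12874}),
]
reranked = impact_weighted(hits)
```

The log damping is a design choice: raw citation counts span several orders of magnitude, so multiplying by them directly would make similarity almost irrelevant.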

🏷️ Popular Fields of Study

Computer Science
Medicine
Biology
Physics
Mathematics
Engineering
Psychology
Economics
Chemistry
Environmental Science

Full S2 field taxonomy: https://api.semanticscholar.org/graph/v1

❓ FAQ

How is this different from arXiv Paper Scraper? arXiv gives you the paper; Semantic Scholar gives you the paper plus the citation graph. Use arXiv for fresh preprints, S2 for impact signal and "who cited whom" walks.

What's influentialCitationCount? S2's ML-derived signal for citations that actually build on the work (vs. drive-by citations for context). It's the single most useful number for "does this paper matter."

Does it work for non-English papers? S2 indexes multilingual papers, but abstracts are returned as stored (often in English even when the source is not). Works best for Western academic corpora.

Can I paginate past maxResults? Up to 1,000 per run. For larger corpora, partition by year or field and run multiple jobs.
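
Partitioning by year can be sketched as generating one input per year. This assumes `yearFrom` and `yearTo` are inclusive bounds, which this page does not state explicitly; `yearly_inputs` is a hypothetical helper.

```python
def yearly_inputs(query: str, year_from: int, year_to: int, max_results: int = 1000) -> list[dict]:
    """Build one search input per year so each run stays within maxResults."""
    return [
        {
            "mode": "search",
            "searchQuery": query,
            "yearFrom": year,   # assumed inclusive
            "yearTo": year,     # assumed inclusive
            "maxResults": max_results,
        }
        for year in range(year_from, year_to + 1)
    ]


jobs = yearly_inputs("retrieval augmented generation", 2020, 2022)
```

Each dict in `jobs` can then be submitted as a separate run; a year that still hits the 1,000 cap can be split further with `fieldsOfStudy`.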

What ID formats are accepted in paper_details? S2 paperId (SHA-1), DOI:<doi>, raw DOI, ARXIV:<id>, PMID:<id>, PMCID:<id>, CorpusId:<id>, DBLP:<id>, full URL. The Actor normalizes automatically.

Rate limits? S2 allows 1 req/sec unauthenticated. The Actor handles pacing; you just set maxResults and wait. For heavy workloads, bring your own S2 API key via input.

🔑 Keywords

Semantic Scholar scraper, academic paper scraper, citation graph API, research paper citations, paper impact metrics, influential citation count, literature review automation, citation network builder, research impact analysis, h-index scraper, author network mapping, academic knowledge graph, paper metadata enrichment, DOI enrichment, arXiv citation data, RAG over research papers, citation-aware retrieval, Allen AI Semantic Scholar, S2 Academic Graph API.

📝 Changelog

v0.1 — Initial release. 4 modes (search, paper_details, author, citations), field-of-study filters, influentialCitationCount support, OA PDF links.
