Semantic Scholar Scraper

Pricing: Pay per event

Extract detailed academic paper data from Semantic Scholar, including abstracts, citations, authors, and publication details. Ideal for researchers, academics, and analysts who need structured scholarly data for literature reviews, research workflows, and large-scale academic analysis.

Rating: 5.0 (1)

Developer: ParseForge (Maintained by Community)

Actor stats: 0 bookmarked · 20 total users · 4 monthly active users · last modified 5 hours ago


📚 Semantic Scholar Scraper

🚀 Collect academic paper data from Semantic Scholar in minutes. Search by keyword, author, venue, or year range. Export titles, abstracts, citations, authors, and PDF links. No coding, no API key required.

🕒 Last updated: 2026-04-16 · 📊 20+ fields per paper · 🔍 6 search filters · 📄 PDF availability · 🚫 No auth required

The Semantic Scholar Scraper collects academic paper data from Semantic Scholar, returning 20+ fields per paper: title, abstract, authors, citation count, reference count, year, venue, DOI, PDF URL, and paper URL. Filter by keyword, author, venue, year range, and PDF availability. Runs support up to 1,000,000 papers on a paid plan.

Semantic Scholar indexes over 200 million academic papers. This Actor queries its database with 6 filters and returns structured results ready for literature reviews, citation analysis, or research dashboards.

| 🎯 Target Audience | 💡 Primary Use Cases |
| --- | --- |
| Academic researchers, data scientists, R&D teams, librarians, science journalists, bibliometric analysts | Literature reviews, citation analysis, research trend tracking, author profiling, venue benchmarking |

📋 What the Semantic Scholar Scraper does

Six search filters:

  • 🔍 Keyword search. Free-text search across titles and abstracts.
  • 🔗 URL mode. Paste a direct Semantic Scholar search URL.
  • 👤 Author filter. Search by author name.
  • 📅 Year range. Min and max publication year.
  • 📄 PDF filter. Only papers with available PDFs.
  • 🏛️ Venue filter. Conference or journal name.

Each paper record includes title, abstract, authors (with IDs), citation count, reference count, year, venue, DOI, fields of study, PDF URL, and Semantic Scholar URL.

💡 Why it matters: searching for papers one at a time on Semantic Scholar or Google Scholar is slow and doesn't support bulk export. This Actor downloads structured academic data at scale for systematic reviews, bibliometric analysis, or research intelligence.


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.


⚙️ Input

| Input | Type | Default | Behavior |
| --- | --- | --- | --- |
| searchQuery | string | "" | Keyword search across titles and abstracts. |
| startUrl | string | "" | Direct Semantic Scholar URL. |
| author | string | "" | Author name filter. |
| yearMin | integer | - | Minimum publication year. |
| yearMax | integer | - | Maximum publication year. |
| hasPdf | boolean | false | Only papers with available PDFs. |
| venues | array | [] | Conference or journal names. |
| maxItems | integer | 10 | Max papers. Free: limited. Paid: up to 1,000,000. |

Example: recent AI papers with PDFs available.

{
  "searchQuery": "large language models",
  "yearMin": 2024,
  "hasPdf": true,
  "maxItems": 100
}

Example: papers by a specific author.

{
  "author": "Yoshua Bengio",
  "maxItems": 50
}
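
The input objects above can also be assembled programmatically. The following is a minimal Python sketch, not part of the Actor itself: the helper name `build_run_input` is hypothetical, the field names come from the Input table, and the two validation rules (year range ordering and a lower bound of 1 for `maxItems`) are assumptions about sensible input rather than documented Actor behavior.

```python
from typing import Any, Optional


def build_run_input(
    search_query: str = "",
    author: str = "",
    year_min: Optional[int] = None,
    year_max: Optional[int] = None,
    has_pdf: bool = False,
    venues: Optional[list[str]] = None,
    max_items: int = 10,
) -> dict[str, Any]:
    """Assemble an input object using the field names from the Input table."""
    # Assumption: an inverted year range is a caller mistake, so fail early.
    if year_min is not None and year_max is not None and year_min > year_max:
        raise ValueError("yearMin must not be greater than yearMax")
    # The upper bound is the documented paid-plan limit; the lower bound is assumed.
    if not 1 <= max_items <= 1_000_000:
        raise ValueError("maxItems must be between 1 and 1,000,000")

    run_input: dict[str, Any] = {"maxItems": max_items}
    # Only include filters that were actually set, so defaults stay untouched.
    if search_query:
        run_input["searchQuery"] = search_query
    if author:
        run_input["author"] = author
    if year_min is not None:
        run_input["yearMin"] = year_min
    if year_max is not None:
        run_input["yearMax"] = year_max
    if has_pdf:
        run_input["hasPdf"] = True
    if venues:
        run_input["venues"] = list(venues)
    return run_input
```

Calling `build_run_input(search_query="large language models", year_min=2024, has_pdf=True, max_items=100)` produces the same object as the first example above.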

📊 Output

Each paper record contains 20+ fields. Download the dataset as CSV, Excel, JSON, or XML.

🧾 Schema

| Field | Type | Example |
| --- | --- | --- |
| 📝 title | string | "Attention Is All You Need" |
| 📄 abstract | string | "We propose a new simple network..." |
| 👤 authors | array | [{ name, authorId }] |
| 📊 citationCount | number | 95000 |
| 📚 referenceCount | number | 38 |
| 📅 year | number | 2017 |
| 🏛️ venue | string | "NeurIPS" |
| 🔗 doi | string | "10.5555/3295222.3295349" |
| 📂 fieldsOfStudy | array | ["Computer Science"] |
| 📄 pdfUrl | string or null | "https://arxiv.org/pdf/1706.03762" |
| 🔗 url | string | "https://www.semanticscholar.org/paper/..." |
| 🕒 scrapedAt | ISO 8601 | "2026-04-16T00:00:00.000Z" |
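
For downstream code it can help to map each record onto a typed structure. The sketch below is one possible approach in Python, assuming records arrive as dictionaries with the field names listed in the schema. The `Paper` class and `parse_paper` function are hypothetical names introduced for illustration, and treating missing fields as empty or `None` is an assumption, not documented behavior.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Paper:
    """A subset of the schema fields, typed for downstream analysis."""
    title: str
    year: Optional[int]
    venue: str
    citation_count: int
    reference_count: int
    author_names: list[str]
    doi: Optional[str]
    pdf_url: Optional[str]


def parse_paper(record: dict[str, Any]) -> Paper:
    """Convert one dataset record into a Paper, tolerating missing fields."""
    authors = record.get("authors") or []
    return Paper(
        title=record.get("title") or "",
        year=record.get("year"),
        venue=record.get("venue") or "",
        # Assumption: a missing or null count is treated as zero.
        citation_count=record.get("citationCount") or 0,
        reference_count=record.get("referenceCount") or 0,
        author_names=[a["name"] for a in authors if a.get("name")],
        doi=record.get("doi"),
        # pdfUrl is documented as string or null, so None is passed through.
        pdf_url=record.get("pdfUrl"),
    )
```

A record parsed this way exposes `paper.citation_count` and `paper.pdf_url` as plain attributes, which keeps later filtering and sorting code free of repeated key lookups.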


✨ Why choose this Actor

| | Capability |
| --- | --- |
| 📚 | 200M+ papers indexed. Full Semantic Scholar database. |
| 🔍 | 6 search filters. Keyword, author, year, venue, PDF, and URL. |
| 📊 | Citation and reference counts. Quantitative impact metrics. |
| 📄 | PDF links. Direct download URLs when available. |
| 👤 | Author profiles. Name and Semantic Scholar ID per author. |
| | Scalable. From single-paper lookups to full topic sweeps. |
| 🚫 | No authentication. No API key needed. |


📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Filters | Setup |
| --- | --- | --- | --- | --- | --- |
| ⭐ Semantic Scholar Scraper (this Actor) | $5 free credit, then pay-per-use | 200M+ papers | Live per run | keyword, author, year, venue, PDF | ⚡ 2 min |
| Semantic Scholar API (direct) | Free with rate limits | Full | Real-time | Many | ⏳ Hours (API setup) |
| Google Scholar | Free | Broad | Manual | Limited | 🕒 Per search |
| Paid academic databases | $1,000-50,000/year | Multi-source | Varies | Many | 🐢 Weeks |

Pick this Actor when you want academic paper metadata on demand, with filters, without writing API client code.


🚀 How to use

  1. 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. 🌐 Open the Actor. Go to the Semantic Scholar Scraper page on the Apify Store.
  3. 🎯 Set input. Enter a search query, author, or year range.
  4. 🚀 Run it. Click Start.
  5. 📥 Download. Grab results in the Dataset tab.

⏱️ Total time: 3-5 minutes. No coding required.


💼 Business use cases

📊 Literature Reviews & Bibliometrics

  • Build systematic review datasets
  • Analyze citation networks by topic
  • Track research trends over time
  • Compare venue impact by field

🏢 R&D & Industry Research

  • Monitor competitor publications
  • Track emerging technologies by keyword
  • Build prior-art search databases
  • Identify expert authors by citation count
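
As an illustration of the last use case, the sketch below ranks authors by the summed citation counts of their papers in a downloaded dataset. It is a rough example under stated assumptions: records are dictionaries with the `authors` and `citationCount` fields from the schema, the function name `top_authors_by_citations` is hypothetical, and grouping by author name (rather than by `authorId`) will merge distinct authors who share a name.

```python
from collections import defaultdict
from typing import Any, Iterable


def top_authors_by_citations(
    records: Iterable[dict[str, Any]], limit: int = 10
) -> list[tuple[str, int]]:
    """Sum citationCount per author name and return the highest totals first."""
    totals: dict[str, int] = defaultdict(int)
    for record in records:
        # Assumption: a missing or null citation count contributes zero.
        citations = record.get("citationCount") or 0
        for author in record.get("authors") or []:
            name = author.get("name")
            if name:
                totals[name] += citations
    # Sort by total descending, then by name so ties are ordered deterministically.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]
```

Every co-author receives the full citation count of a shared paper here. Fractional credit or grouping by `authorId` are reasonable alternatives, depending on the analysis.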

🔌 Automating Semantic Scholar Scraper

  • 🟢 Node.js. Install the apify-client NPM package.
  • 🐍 Python. Use the apify-client PyPI package.
  • 📚 See the Apify API documentation for full details.
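
A run can be started from Python along the following lines. This is a minimal sketch, not an official snippet: it assumes the `apify-client` package's `ApifyClient`, `actor(...).call(run_input=...)`, and `dataset(...).iterate_items()` interfaces as documented at the time of writing. `ACTOR_ID` and the token string are placeholders to replace with the values from your own Apify account and this Actor's Store page, and `fetch_papers` and `main` are hypothetical helper names.

```python
from typing import Any

# Placeholder: copy the real Actor ID from this Actor's page on the Apify Store.
ACTOR_ID = "<ACTOR_ID>"


def fetch_papers(client: Any, actor_id: str, run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Start a run, wait for it to finish, and return the dataset items."""
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("The Actor run did not return run details")
    # Each finished run stores its results in a default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


def main() -> None:
    # Imported lazily so the helper above can be used without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient("<YOUR_API_TOKEN>")  # placeholder token
    run_input = {"searchQuery": "large language models", "maxItems": 10}
    for paper in fetch_papers(client, ACTOR_ID, run_input):
        print(paper.get("title"), paper.get("citationCount"))
```

Passing the client in as an argument keeps `fetch_papers` easy to exercise with a stub object before spending credits on a real run. Call `main()` once the placeholders are filled in.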


💡 Pro Tip: browse the complete ParseForge collection for more research and data scrapers.


🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by Semantic Scholar or the Allen Institute for AI. All trademarks mentioned are the property of their respective owners. Only publicly available academic metadata is collected.