Semantic Scholar Scraper
Pricing
Pay per event
Extract detailed academic paper data from Semantic Scholar, including abstracts, citations, authors, and publication details. Ideal for researchers, academics, and analysts who need structured scholarly data for literature reviews, research workflows, and large-scale academic analysis.
Developer: ParseForge

📚 Semantic Scholar Scraper
🚀 Collect academic paper data from Semantic Scholar in minutes. Search by keyword, author, venue, or year range. Export titles, abstracts, citations, authors, and PDF links. No coding, no API key required.
🕒 Last updated: 2026-04-16 · 📊 20+ fields per paper · 🔍 6 search filters · 📄 PDF availability · 🚫 No auth required
The Semantic Scholar Scraper collects academic paper data from Semantic Scholar, returning 20+ fields per paper: title, abstract, authors, citation count, reference count, year, venue, DOI, PDF URL, and paper URL. Filter by keyword, author, venue, year range, and PDF availability. Runs support up to 1,000,000 papers on a paid plan.
Semantic Scholar indexes over 200 million academic papers. This Actor queries its database with 6 filters and returns structured results ready for literature reviews, citation analysis, or research dashboards.
| 🎯 Target Audience | 💡 Primary Use Cases |
|---|---|
| Academic researchers, data scientists, R&D teams, librarians, science journalists, bibliometric analysts | Literature reviews, citation analysis, research trend tracking, author profiling, venue benchmarking |
📋 What the Semantic Scholar Scraper does
Six search filters:
- 🔍 Keyword search. Free-text search across titles and abstracts.
- 🔗 URL mode. Paste a direct Semantic Scholar search URL.
- 👤 Author filter. Search by author name.
- 📅 Year range. Min and max publication year.
- 📄 PDF filter. Only papers with available PDFs.
- 🏛️ Venue filter. Conference or journal name.
Each paper record includes title, abstract, authors (with IDs), citation count, reference count, year, venue, DOI, fields of study, PDF URL, and Semantic Scholar URL.
💡 Why it matters: searching for papers one at a time on Semantic Scholar or Google Scholar is slow, and the search pages don't support bulk export. This Actor downloads structured academic data at scale for systematic reviews, bibliometric analysis, or research intelligence.
🎬 Full Demo
🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.
⚙️ Input
| Input | Type | Default | Behavior |
|---|---|---|---|
| searchQuery | string | "" | Keyword search across titles and abstracts. |
| startUrl | string | "" | Direct Semantic Scholar URL. |
| author | string | "" | Author name filter. |
| yearMin | integer | - | Minimum publication year. |
| yearMax | integer | - | Maximum publication year. |
| hasPdf | boolean | false | Only papers with available PDFs. |
| venues | array | [] | Conference or journal names. |
| maxItems | integer | 10 | Max papers. Free: limited. Paid: up to 1,000,000. |
Example: recent AI papers with PDFs available.
{"searchQuery": "large language models", "yearMin": 2024, "hasPdf": true, "maxItems": 100}
Example: papers by a specific author.
{"author": "Yoshua Bengio", "maxItems": 50}
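If you build inputs programmatically, a small helper can catch obvious mistakes before a run starts. The sketch below is an illustration only: the field names and defaults come from the input table, while `build_input` and every validation rule in it (year ordering, the `maxItems` range, requiring at least one search criterion) are assumptions rather than documented Actor behavior.

```python
from typing import Any

# Field names and defaults are copied from the input table above.
# The validation rules below are assumptions, not documented Actor behavior.
DEFAULTS: dict[str, Any] = {
    "searchQuery": "",
    "startUrl": "",
    "author": "",
    "hasPdf": False,
    "venues": [],
    "maxItems": 10,
}
ALLOWED_KEYS = set(DEFAULTS) | {"yearMin", "yearMax"}


def build_input(**overrides: Any) -> dict[str, Any]:
    """Merge overrides onto the documented defaults and sanity-check them."""
    unknown = set(overrides) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")

    # Copy the default list so callers never share one mutable object.
    run_input = {**DEFAULTS, "venues": list(DEFAULTS["venues"]), **overrides}

    year_min = run_input.get("yearMin")
    year_max = run_input.get("yearMax")
    if year_min is not None and year_max is not None and year_min > year_max:
        raise ValueError("yearMin must not exceed yearMax")

    if not 1 <= run_input["maxItems"] <= 1_000_000:
        raise ValueError("maxItems must be between 1 and 1,000,000")

    # Assumed rule: a run needs at least one search criterion.
    if not (run_input["searchQuery"] or run_input["startUrl"]
            or run_input["author"] or run_input["venues"]):
        raise ValueError("provide searchQuery, startUrl, author, or venues")

    return run_input


if __name__ == "__main__":
    print(build_input(searchQuery="large language models",
                      yearMin=2024, hasPdf=True, maxItems=100))
```

The returned dictionary can be serialized to JSON and pasted into the Actor's input editor, or passed to an API client.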
📊 Output
Each paper record contains 20+ fields. Download the dataset as CSV, Excel, JSON, or XML.
🧾 Schema
| Field | Type | Example |
|---|---|---|
| 📝 title | string | "Attention Is All You Need" |
| 📄 abstract | string | "We propose a new simple network..." |
| 👤 authors | array | [{ name, authorId }] |
| 📊 citationCount | number | 95000 |
| 📚 referenceCount | number | 38 |
| 📅 year | number | 2017 |
| 🏛️ venue | string | "NeurIPS" |
| 🔗 doi | string | "10.5555/3295222.3295349" |
| 📂 fieldsOfStudy | array | ["Computer Science"] |
| 📄 pdfUrl | string or null | "https://arxiv.org/pdf/1706.03762" |
| 🔗 url | string | "https://www.semanticscholar.org/paper/..." |
| 🕒 scrapedAt | string (ISO 8601) | "2026-04-16T00:00:00.000Z" |
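For downstream analysis it can help to load records into a typed structure. The following is a minimal sketch, assuming records arrive as dictionaries keyed by the field names in the schema table; the `Paper` class and `top_cited` helper are hypothetical names, and the records in the demo are placeholders built from the example values above, not real scraper output.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Paper:
    """Subset of the schema above; missing or null values get safe defaults."""
    title: str
    year: Optional[int] = None
    venue: str = ""
    citation_count: int = 0
    reference_count: int = 0
    doi: Optional[str] = None
    pdf_url: Optional[str] = None  # pdfUrl may be null when no PDF is available
    authors: list[str] = field(default_factory=list)

    @classmethod
    def from_record(cls, record: dict[str, Any]) -> "Paper":
        return cls(
            title=record.get("title") or "",
            year=record.get("year"),
            venue=record.get("venue") or "",
            citation_count=record.get("citationCount") or 0,
            reference_count=record.get("referenceCount") or 0,
            doi=record.get("doi"),
            pdf_url=record.get("pdfUrl"),
            authors=[a.get("name", "") for a in record.get("authors") or []],
        )


def top_cited(records: list[dict[str, Any]], n: int = 10) -> list[Paper]:
    """Return the n most-cited papers, highest citation count first."""
    papers = [Paper.from_record(r) for r in records]
    return sorted(papers, key=lambda p: p.citation_count, reverse=True)[:n]


if __name__ == "__main__":
    # Placeholder records for illustration only.
    demo = [
        {"title": "Attention Is All You Need", "citationCount": 95000,
         "year": 2017, "pdfUrl": "https://arxiv.org/pdf/1706.03762"},
        {"title": "Placeholder paper", "citationCount": 12, "pdfUrl": None},
    ]
    for paper in top_cited(demo, n=2):
        print(paper.citation_count, paper.title, paper.pdf_url or "(no PDF)")
```

Treating `pdfUrl` as optional keeps the loader from failing on papers without an available PDF.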
📦 Sample records
✨ Why choose this Actor
| | Capability |
|---|---|
| 📚 | 200M+ papers indexed. Full Semantic Scholar database. |
| 🔍 | 6 search filters. Keyword, author, year, venue, PDF, and URL. |
| 📊 | Citation and reference counts. Quantitative impact metrics. |
| 📄 | PDF links. Direct download URLs when available. |
| 👤 | Author profiles. Name and Semantic Scholar ID per author. |
| ⚡ | Scalable. From single-paper lookups to full topic sweeps. |
| 🚫 | No authentication. No API key needed. |
📈 How it compares to alternatives
| Approach | Cost | Coverage | Refresh | Filters | Setup |
|---|---|---|---|---|---|
| ⭐ Semantic Scholar Scraper (this Actor) | $5 free credit, then pay-per-use | 200M+ papers | Live per run | keyword, author, year, venue, PDF | ⚡ 2 min |
| Semantic Scholar API (direct) | Free with rate limits | Full | Real-time | Many | ⏳ Hours (API setup) |
| Google Scholar | Free | Broad | Manual | Limited | 🕒 Per search |
| Paid academic databases | $1,000-50,000/year | Multi-source | Varies | Many | 🐢 Weeks |
Pick this Actor when you want academic paper metadata on demand, with filters, without writing API client code.
🚀 How to use
- 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
- 🌐 Open the Actor. Go to the Semantic Scholar Scraper page on the Apify Store.
- 🎯 Set input. Enter a search query, author, or year range.
- 🚀 Run it. Click Start.
- 📥 Download. Grab results in the Dataset tab.
⏱️ Total time: 3-5 minutes. No coding required.
💼 Business use cases
🔌 Automating Semantic Scholar Scraper
- 🟢 Node.js. Install the `apify-client` NPM package.
- 🐍 Python. Use the `apify-client` PyPI package.
- 📚 See the Apify API documentation for full details.
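As a rough sketch of the Python route, the snippet below starts a run and collects its dataset items with the `apify-client` package. It rests on a few assumptions: `fetch_papers` is a hypothetical helper name, the Actor ID must be copied from the Actor's own page (it is not stated here), and although no Semantic Scholar API key is needed, calling the Apify API does require your Apify API token.

```python
from typing import Any


def fetch_papers(run_input: dict[str, Any], actor_id: str, token: str) -> list[dict[str, Any]]:
    """Run the Actor once and return all dataset items as dictionaries."""
    # Imported lazily so this module loads even without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)

    # call() starts the run and waits for it to finish.
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return run details")

    # Results are stored in the run's default dataset.
    dataset = client.dataset(run["defaultDatasetId"])
    return list(dataset.iterate_items())
```

A typical call would look like `fetch_papers({"author": "Yoshua Bengio", "maxItems": 50}, actor_id, os.environ["APIFY_TOKEN"])`, where `actor_id` is the value shown on the Actor's page and `APIFY_TOKEN` is an environment variable you set yourself.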
❓ Frequently Asked Questions
🔌 Integrate with any app
- Make - Automate workflows
- Zapier - Connect 5,000+ apps
- Slack - Get notifications
- Airbyte - Data pipelines
- GitHub - Trigger from commits
- Google Drive - Export to Sheets
🔗 Recommended Actors
- 📚 Rate My Professors Scraper - Professor ratings
- 🏥 ClinicalTrials.gov Scraper - Clinical trial data
- 📰 PR Newswire Scraper - Press releases
- 📊 Indexmundi Scraper - Global indicators
- 🔗 Broken Link Checker - URL validation
💡 Pro Tip: browse the complete ParseForge collection for more research and data scrapers.
🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.
⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by Semantic Scholar or the Allen Institute for AI. All trademarks mentioned are the property of their respective owners. Only publicly available academic metadata is collected.

