Semantic Scholar Scraper
Extract detailed academic paper data from Semantic Scholar, including abstracts, citations, authors, and publication details. Ideal for researchers, academics, and analysts who need structured scholarly data for literature reviews, research workflows, and large-scale academic analysis.

Pricing: Pay per event

Rating: 5.0 (1 review)

Developer: ParseForge (Maintained by Community)

Actor stats: 0 bookmarks, 2 total users, 1 monthly active user, last modified 3 days ago

πŸ“š Semantic Scholar Scraper

πŸš€ Supercharge your academic research with our comprehensive Semantic Scholar scraper! Automate collection of detailed academic paper data with advanced filtering capabilities.

Extract comprehensive academic paper data from Semantic Scholar - one of the world's largest academic search engines. Search and collect detailed information about research papers including abstracts, citations, authors, publication details, and more. Perfect for researchers, academics, and data analysts who need structured academic data for literature reviews, research analysis, and academic intelligence gathering.

Target Audience: Researchers, academics, data analysts, students, librarians, research institutions
Primary Use Cases: Literature reviews, research analysis, academic intelligence gathering, citation analysis, publication tracking

What Does Semantic Scholar Scraper Do?

This tool collects detailed academic paper data from Semantic Scholar, supporting both search query-based scraping and custom URL scraping. It delivers:

  • Complete paper metadata - Titles, abstracts, TLDR summaries, publication dates, and years
  • Author information - Author names, IDs, affiliations, and author profile links
  • Citation metrics - Citation counts, reference counts, and influential citation counts
  • Publication details - Venues, journals, publication types, and open access status
  • Access information - Direct PDF links, paper URLs, DOI identifiers, and external IDs (ArXiv, PubMed, etc.)
  • Research classifications - Fields of study and research areas
  • Journal information - Journal names, volumes, and page numbers
  • And much more - Comprehensive academic intelligence in one scrape

Business Value: Make informed research decisions, track academic trends, and identify relevant papers with comprehensive, up-to-date academic intelligence that saves hours of manual research.

How to use the Semantic Scholar Scraper - Full Demo

[YouTube video embed or link]

Watch this 3-minute demo to see how easy it is to get started!

Input

To start Semantic Scholar web scraping, simply fill in the input form. You can scrape Semantic Scholar using two different methods (choose one):

Method 1: Search Query-Based Scraping πŸ”

  • searchQuery - Enter a research topic or paper title (e.g., "machine learning", "neural networks", "quantum computing")
    • Required if startUrl is not provided
    • Prefill value: "machine learning"
  • yearMin - Filter papers published on or after this year (optional)
    • Example: 2020
  • yearMax - Filter papers published on or before this year (optional)
    • Example: 2024
  • hasPdf - Only include papers that have an open access PDF available (optional)
    • Checkbox option, default: false
  • maxItems - Set the maximum number of papers to collect (up to 1,000,000). Leave empty for unlimited. Prefill value: 10.

Suggestion-Based Filters (Note: These are treated as suggestions by the API, not strict filters):

  • author - Filter papers by author name (optional)
    • Example: "John Smith"
    • Note: Results may include papers that match the search query but may not have the specified author
  • venues - Filter papers by publication venue (journal or conference) (optional)
    • Example: "Nature", "IEEE"
    • Note: Results may include papers that match the search query but may not match the specified venue

The scraper will automatically search Semantic Scholar and collect all matching papers.

Method 2: Custom URL Scraping πŸ”—

  • startUrl - Use Semantic Scholar search URLs in this format:
    • Required if searchQuery is not provided
    • Cannot be used together with searchQuery or any other API filters
    • Example: https://www.semanticscholar.org/search?q=machine+learning&sort=relevance
  • maxItems - Set the maximum number of papers to collect (up to 1,000,000). Leave empty for unlimited. Prefill value: 10.

βœ… Supported URL Formats:

Search Results Pages:

  • https://www.semanticscholar.org/search?q=machine+learning&sort=relevance
  • https://www.semanticscholar.org/search?q=neural+networks&year=2020-2024
  • https://www.semanticscholar.org/search?q=quantum+computing&openAccessPdf=

⚠️ Important Input Rules:

  1. Choose One Method: You must use either search query-based scraping OR custom URL scraping, not both
  2. Required Fields:
    • Either searchQuery OR startUrl must be provided
  3. Mutual Exclusivity:
    • If using startUrl, you cannot use searchQuery or any other API filters
    • If using searchQuery, you cannot use startUrl
  4. Suggestion-Based Filters: author and venues are treated as suggestions by the API and may not strictly filter results
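
For programmatic use, these rules can be checked before starting a run. Below is a minimal sketch; `validateInput` is a hypothetical helper for illustration, not part of the Actor itself:

```javascript
// Hypothetical pre-flight check of the input rules above.
// Returns a list of human-readable problems; an empty list means the input is valid.
function validateInput(input) {
    const errors = [];
    const hasQuery = Boolean(input.searchQuery);
    const hasUrl = Boolean(input.startUrl);

    // Rule 2: either searchQuery or startUrl must be provided.
    if (!hasQuery && !hasUrl) {
        errors.push('Provide either searchQuery or startUrl.');
    }

    // Rules 1 & 3: the two methods are mutually exclusive.
    if (hasQuery && hasUrl) {
        errors.push('searchQuery and startUrl cannot be used together.');
    }

    // Rule 3: startUrl cannot be combined with any other API filter.
    const apiFilters = ['yearMin', 'yearMax', 'hasPdf', 'author', 'venues'];
    if (hasUrl && apiFilters.some((key) => input[key] !== undefined)) {
        errors.push('startUrl cannot be combined with other API filters.');
    }

    return errors;
}
```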

Here's what the filled-out input configuration looks like in JSON:

Example 1: Search Query-Based Scraping

{
  "searchQuery": "machine learning",
  "yearMin": 2020,
  "yearMax": 2024,
  "hasPdf": true,
  "maxItems": 50
}

{
  "searchQuery": "neural networks",
  "yearMin": 2020,
  "yearMax": 2024,
  "hasPdf": true,
  "maxItems": 100
}

Example 2: Custom URL Scraping

{
  "startUrl": "https://www.semanticscholar.org/search?q=machine+learning&sort=relevance",
  "maxItems": 50
}

Example 3: Advanced Search with Filters

{
  "searchQuery": "quantum computing",
  "yearMin": 2022,
  "hasPdf": true,
  "author": "John Smith",
  "maxItems": 200
}

Pro Tips:

For Search Query-Based Scraping (Recommended):

  1. 🎯 Be specific with queries - Use precise research terms for best results
  2. πŸ“… Filter by year range - Focus on recent papers or specific time periods
  3. πŸ“„ Use PDF filter - Get only papers with available PDFs for easier access
  4. ⚑ Faster than manual search - No need to browse through multiple pages manually

For Custom URL Scraping:

  1. Go to Semantic Scholar
  2. Use the search functionality to find papers on your topic
  3. Apply any filters you want (year, open access, etc.)
  4. Copy the URL and paste it into the startUrl field

Output

After the Actor finishes its run, you'll get a dataset with the output. The size of the dataset depends on the number of results you've set. You can download the results as an Excel, HTML, XML, JSON, or CSV document.

Here's an example of scraped Semantic Scholar data you'll get if you decide to scrape academic papers:

{
  "paperId": "1234567890",
  "title": "Deep Learning for Natural Language Processing: A Comprehensive Survey",
  "authors": [
    {
      "name": "John Smith",
      "url": "https://www.semanticscholar.org/author/123456",
      "authorId": "123456",
      "affiliations": ["Stanford University"]
    },
    {
      "name": "Jane Doe",
      "url": "https://www.semanticscholar.org/author/789012",
      "authorId": "789012",
      "affiliations": ["MIT"]
    }
  ],
  "year": 2023,
  "publicationVenue": "Nature Machine Intelligence",
  "publicationDate": "2023-05-15",
  "abstract": "This paper presents a comprehensive survey of deep learning techniques for natural language processing...",
  "tldr": "A survey of deep learning methods for NLP tasks including transformers, attention mechanisms, and pre-trained models.",
  "citationCount": 245,
  "referenceCount": 89,
  "influentialCitationCount": 12,
  "isOpenAccess": true,
  "hasPdf": true,
  "detailUrl": "https://www.semanticscholar.org/paper/1234567890",
  "pdfUrl": "https://example.com/paper.pdf",
  "doi": "10.1038/s42256-023-00123-4",
  "corpusId": "1234567890",
  "externalIds": {
    "DOI": "10.1038/s42256-023-00123-4",
    "ArXiv": "2305.12345",
    "PubMed": "12345678",
    "PubMedCentral": "PMC1234567",
    "MAG": "123456789",
    "ACL": "2023.acl-main.123",
    "DBLP": "conf/nature/2023",
    "CorpusId": "1234567890"
  },
  "fieldsOfStudy": ["Computer Science", "Machine Learning", "Natural Language Processing"],
  "s2FieldsOfStudy": ["Computer Science"],
  "publicationTypes": ["JournalArticle"],
  "journal": {
    "name": "Nature Machine Intelligence",
    "volume": "5",
    "pages": "123-145"
  },
  "scrapedTimestamp": "2025-01-12T23:29:22.172Z"
}

What You Get:

  • πŸ“„ Complete Paper Information - Titles, abstracts, and TLDR summaries for quick understanding
  • πŸ‘₯ Detailed Author Data - Author names, IDs, affiliations, and profile links
  • πŸ“Š Citation Metrics - Total citations, references, and influential citation counts
  • πŸ”— Access Links - Direct PDF links, paper URLs, and DOI identifiers
  • πŸ›οΈ Publication Details - Venues, journals, publication types, and open access status
  • πŸ” Research Classifications - Fields of study and research areas
  • πŸ“š External Identifiers - ArXiv, PubMed, ACL, DBLP, and other database IDs
  • πŸ“… Publication Metadata - Years, dates, journal volumes, and page numbers

Download Options: CSV, Excel, or JSON formats for easy analysis in your research tools
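
Once exported as JSON, the records are easy to post-process with a few lines of code. A minimal sketch, assuming the field names shown in the output example above (`topCitedOpenAccess` is a hypothetical helper, not part of the Actor):

```javascript
// Hypothetical post-processing of downloaded JSON results:
// keep open-access papers with a PDF link and rank them by citation count.
function topCitedOpenAccess(papers, limit = 10) {
    return papers
        .filter((p) => p.isOpenAccess && p.pdfUrl)
        .sort((a, b) => b.citationCount - a.citationCount)
        .slice(0, limit)
        .map((p) => ({ title: p.title, citations: p.citationCount, pdf: p.pdfUrl }));
}
```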

Why Choose the Semantic Scholar Scraper?

  • 🎯 Comprehensive Data: Get all available paper information in one scrape - citations, abstracts, authors, and more
  • πŸ” Flexible Search: Search by query or use custom URLs with advanced filtering options
  • πŸ“… Year Filtering: Filter papers by publication year range for targeted research
  • πŸ“„ PDF Access: Filter for papers with available PDFs for easier access
  • πŸ‘₯ Author Information: Get complete author details including affiliations and profile links
  • πŸ“Š Citation Metrics: Access citation counts, reference counts, and influential citation metrics
  • πŸ”— Multiple Identifiers: Get DOI, ArXiv, PubMed, and other external database IDs
  • 🚫 No Duplicates: Automatically skips papers already in your dataset
  • ⚑ User-Friendly: No coding needed - just input your search query and go
  • πŸ”„ Sequential Processing: Processes papers one by one for maximum data quality

Time Savings: Save 4-6 hours per week compared to manual paper research
Cost Efficiency: Fraction of the cost of hiring a research assistant or using expensive academic databases

How to Use

  1. Sign Up: Create a free account with $5 in credit (takes 2 minutes)
  2. Find the Scraper: Visit the Semantic Scholar Scraper page
  3. Set Input:
    • Free users: Can process up to 50 items (maxItems required, maximum value: 50)
    • Paid users:
      • Option A (Recommended): Enter a search query and apply filters (year range, PDF availability, etc.)
      • Option B: Add your custom Semantic Scholar search URL
      • Set max items (optional, prefill value: 10, up to 1,000,000)
  4. Run It: Click "Start" and let it collect your data
  5. Download Data: Get your results in the "Dataset" tab as CSV, Excel, or JSON

Total Time: 3 minutes setup, 10-30 minutes for data collection
No Technical Skills Required: Everything is point-and-click

Business Use Cases

πŸ”¬ Researchers:

  • Conduct comprehensive literature reviews
  • Track citations and research impact
  • Find relevant papers for research projects
  • Monitor new publications in your field

πŸ‘¨β€πŸ« Academics:

  • Build reference databases for courses
  • Track publication trends in your discipline
  • Identify collaboration opportunities
  • Analyze research impact metrics

πŸ“š Librarians:

  • Build comprehensive paper collections
  • Support researchers with data access
  • Track publication trends and patterns
  • Create subject-specific databases

πŸ“Š Data Analysts:

  • Analyze academic publication trends
  • Build research intelligence databases
  • Track citation networks
  • Support policy decisions with data

πŸŽ“ Students:

  • Find papers for thesis and dissertation research
  • Build comprehensive reference lists
  • Track citations for academic writing
  • Discover relevant research in your field

Using Semantic Scholar Scraper with the Apify API

For advanced users who want to automate this process, you can control the scraper programmatically with the Apify API. This allows you to schedule regular data collection and integrate with your existing research tools.

Example API Usage:

// Node.js example using the apify-client package
const { ApifyClient } = require('apify-client');

const client = new ApifyClient({
    token: 'YOUR_API_TOKEN',
});

// Run with a search query
await client.actor('YOUR_ACTOR_ID').call({
    searchQuery: 'machine learning',
    yearMin: 2020,
    yearMax: 2024,
    hasPdf: true,
    maxItems: 100,
});

// Run with a custom URL
await client.actor('YOUR_ACTOR_ID').call({
    startUrl: 'https://www.semanticscholar.org/search?q=machine+learning&sort=relevance',
    maxItems: 50,
});

  • Node.js: Install the apify-client NPM package
  • Python: Use the apify-client PyPI package
  • See the Apify API reference for full details

Frequently Asked Questions

Q: How does it work? A: Semantic Scholar Scraper is easy to use and requires no technical knowledge. Simply enter your search query or paste a Semantic Scholar URL, configure your filters, and let the tool collect the data automatically.

Q: How accurate is the data? A: We collect data directly from Semantic Scholar's official API in real-time, ensuring the most up-to-date and accurate academic paper information available.

Q: Can I filter by specific authors or venues? A: Yes! You can use the author and venues filters. Note that these are treated as suggestions by the Semantic Scholar API, so results may include papers that match your search query but may not strictly match these filters.

Q: What URL formats are supported? A: We support Semantic Scholar search URLs. See the Input section for specific examples.

Q: Can I schedule regular runs? A: Yes! Use the Apify API to schedule daily, weekly, or monthly runs automatically. Perfect for ongoing research monitoring and publication tracking.

Q: What if I need help? A: Our support team is available 24/7. Contact us through the Apify platform.

Q: Is my data secure? A: Absolutely. All data is encrypted in transit and at rest. We never share your data with third parties.

Q: Are there limits for free users? A: Free users can process up to 50 items per run (maxItems is required and must be 50 or less). Paid users can process up to 1,000,000 items per run.

Integrate Semantic Scholar Scraper with any app and automate your workflow

Last but not least, Semantic Scholar Scraper can be connected with almost any cloud service or web app thanks to integrations on the Apify platform.

Alternatively, you can use webhooks to carry out an action whenever an event occurs, e.g. get a notification whenever Semantic Scholar Scraper successfully finishes a run.
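
As a sketch of what such a webhook might look like, the fragment below follows Apify's webhook fields (eventTypes, condition, requestUrl); the requestUrl is a placeholder you would replace with your own endpoint:

```json
{
  "eventTypes": ["ACTOR.RUN.SUCCEEDED"],
  "condition": { "actorId": "YOUR_ACTOR_ID" },
  "requestUrl": "https://example.com/apify-notifications"
}
```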

Looking for more data collection tools? Check out these related actors:

  • GSA eLibrary Scraper - Collects government publication data from GSA eLibrary - https://apify.com/parseforge/gsa-elibrary-scraper
  • PR Newswire Scraper - Extracts press releases and news data from PR Newswire - https://apify.com/parseforge/pr-newswire-scraper
  • Hugging Face Model Scraper - Collects AI model data from Hugging Face - https://apify.com/parseforge/hugging-face-model-scraper
  • Hubspot Marketplace Scraper - Extracts business app data from HubSpot marketplace - https://apify.com/parseforge/hubspot-marketplace-scraper
  • Smart Apify Actor Scraper - Collects comprehensive actor data from Apify with quality metrics - https://apify.com/parseforge/smart-apify-actor-scraper

Pro Tip: πŸ’‘ Browse our complete collection of data collection actors to find the perfect tool for your business needs.

Need Help? Our support team is here to help you get the most out of this tool.


⚠️ Disclaimer: This Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by Semantic Scholar or any of its subsidiaries. All trademarks mentioned are the property of their respective owners.