Semantic Scholar Scraper
Developer: ParseForge
Extract detailed academic paper data from Semantic Scholar, including abstracts, citations, authors, and publication details. Ideal for researchers, academics, and analysts who need structured scholarly data for literature reviews, research workflows, and large-scale academic analysis.
Semantic Scholar Scraper
Supercharge your academic research with our comprehensive Semantic Scholar scraper! Automate the collection of detailed academic paper data with advanced filtering capabilities.
Extract comprehensive academic paper data from Semantic Scholar - one of the world's largest academic search engines. Search and collect detailed information about research papers including abstracts, citations, authors, publication details, and more. Perfect for researchers, academics, and data analysts who need structured academic data for literature reviews, research analysis, and academic intelligence gathering.
Target Audience: Researchers, academics, data analysts, students, librarians, research institutions
Primary Use Cases: Literature reviews, research analysis, academic intelligence gathering, citation analysis, publication tracking
What Does Semantic Scholar Scraper Do?
This tool collects detailed academic paper data from Semantic Scholar, supporting both search query-based scraping and custom URL scraping. It delivers:
- Complete paper metadata - Titles, abstracts, TLDR summaries, publication dates, and years
- Author information - Author names, IDs, affiliations, and author profile links
- Citation metrics - Citation counts, reference counts, and influential citation counts
- Publication details - Venues, journals, publication types, and open access status
- Access information - Direct PDF links, paper URLs, DOI identifiers, and external IDs (ArXiv, PubMed, etc.)
- Research classifications - Fields of study and research areas
- Journal information - Journal names, volumes, and page numbers
- And much more - Comprehensive academic intelligence in one scrape
Business Value: Make informed research decisions, track academic trends, and identify relevant papers with comprehensive, up-to-date academic intelligence that saves hours of manual research.
Input
To start Semantic Scholar web scraping, simply fill in the input form. You can scrape Semantic Scholar using two different methods (choose one):
Method 1: Search Query-Based Scraping (Recommended)
- searchQuery - Enter a research topic or paper title (e.g., "machine learning", "neural networks", "quantum computing")
  - Required if startUrl is not provided
  - Prefill value: "machine learning"
- yearMin - Filter papers published on or after this year (optional)
  - Example: 2020
- yearMax - Filter papers published on or before this year (optional)
  - Example: 2024
- hasPdf - Only include papers that have an open access PDF available (optional)
  - Checkbox option, default: false
- maxItems - Set the maximum number of papers to collect (up to 1,000,000). Leave empty for unlimited. Prefill value: 10.
Suggestion-Based Filters (Note: These are treated as suggestions by the API, not strict filters):
- author - Filter papers by author name (optional)
  - Example: "John Smith"
  - Note: Results may include papers that match the search query but do not have the specified author
- venues - Filter papers by publication venue (journal or conference) (optional)
  - Example: "Nature", "IEEE"
  - Note: Results may include papers that match the search query but do not match the specified venue
The scraper will automatically search Semantic Scholar and collect all matching papers.
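As a rough sketch of how the year and PDF options narrow results, here is a hypothetical filter predicate (not the Actor's actual internals); the `year` and `hasPdf` fields mirror the output schema shown later:

```javascript
// Illustrative only: how yearMin, yearMax, and hasPdf narrow the result set.
// `paper` uses the same field names as the Actor's output items.
function matchesFilters(paper, { yearMin, yearMax, hasPdf } = {}) {
  if (yearMin !== undefined && paper.year < yearMin) return false; // too old
  if (yearMax !== undefined && paper.year > yearMax) return false; // too new
  if (hasPdf && !paper.hasPdf) return false; // no open access PDF available
  return true;
}
```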
Method 2: Custom URL Scraping
- startUrl - Use Semantic Scholar search URLs in this format:
  - Required if searchQuery is not provided
  - Cannot be used together with searchQuery or any other API filters
  - Example: https://www.semanticscholar.org/search?q=machine+learning&sort=relevance
- maxItems - Set the maximum number of papers to collect (up to 1,000,000). Leave empty for unlimited. Prefill value: 10.
Supported URL Formats:
Search Results Pages:
- https://www.semanticscholar.org/search?q=machine+learning&sort=relevance
- https://www.semanticscholar.org/search?q=neural+networks&year=2020-2024
- https://www.semanticscholar.org/search?q=quantum+computing&openAccessPdf=
Important Input Rules:
- Choose One Method: You must use either search query-based scraping OR custom URL scraping, not both
- Required Fields: Either searchQuery or startUrl must be provided
- Mutual Exclusivity: If using startUrl, you cannot use searchQuery or any other API filters; if using searchQuery, you cannot use startUrl
- Suggestion-Based Filters: author and venues are treated as suggestions by the API and may not strictly filter results
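The rules above can be expressed as a small validation function. This is an illustrative sketch, not part of the Actor; it only assumes the input field names documented in this section:

```javascript
// Hypothetical pre-flight check of the input rules above.
// Returns an error message, or null when the input is valid.
function validateInput(input) {
  const hasQuery = Boolean(input.searchQuery);
  const hasUrl = Boolean(input.startUrl);
  if (!hasQuery && !hasUrl) {
    return 'Either searchQuery or startUrl must be provided';
  }
  if (hasQuery && hasUrl) {
    return 'searchQuery and startUrl cannot be used together';
  }
  // startUrl is mutually exclusive with all API filters
  const apiFilters = ['yearMin', 'yearMax', 'hasPdf', 'author', 'venues'];
  if (hasUrl && apiFilters.some((f) => input[f] !== undefined)) {
    return 'startUrl cannot be combined with API filters';
  }
  return null;
}
```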
Here's what the filled-out input configuration looks like in JSON:
{
  "searchQuery": "machine learning",
  "yearMin": 2020,
  "yearMax": 2024,
  "hasPdf": true,
  "maxItems": 50
}
Example 1: Search Query-Based Scraping (Recommended)
{
  "searchQuery": "neural networks",
  "yearMin": 2020,
  "yearMax": 2024,
  "hasPdf": true,
  "maxItems": 100
}
Example 2: Custom URL Scraping
{
  "startUrl": "https://www.semanticscholar.org/search?q=machine+learning&sort=relevance",
  "maxItems": 50
}
Example 3: Advanced Search with Filters
{
  "searchQuery": "quantum computing",
  "yearMin": 2022,
  "hasPdf": true,
  "author": "John Smith",
  "maxItems": 200
}
Pro Tips:
For Search Query-Based Scraping (Recommended):
- Be specific with queries - Use precise research terms for best results
- Filter by year range - Focus on recent papers or specific time periods
- Use the PDF filter - Get only papers with available PDFs for easier access
- Faster than manual search - No need to browse through multiple pages manually
For Custom URL Scraping:
- Go to Semantic Scholar
- Use the search functionality to find papers on your topic
- Apply any filters you want (year, open access, etc.)
- Copy the URL and paste it into the startUrl field
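If you prefer to assemble the startUrl programmatically rather than copying it from the browser, a minimal sketch looks like this. The query parameter names (`q`, `sort`, `year`) are taken from the example URLs above; `buildSearchUrl` is a hypothetical helper, not part of the Actor:

```javascript
// Build a Semantic Scholar search URL from a query and optional filters.
function buildSearchUrl(query, { sort = 'relevance', yearRange } = {}) {
  const params = new URLSearchParams({ q: query, sort });
  if (yearRange) params.set('year', yearRange); // e.g. "2020-2024"
  return `https://www.semanticscholar.org/search?${params.toString()}`;
}
```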
Output
After the Actor finishes its run, you'll get a dataset with the output. The size of the dataset depends on the number of results you've set. You can download the results in Excel, HTML, XML, JSON, or CSV format.
Here's an example of scraped Semantic Scholar data you'll get if you decide to scrape academic papers:
{
  "paperId": "1234567890",
  "title": "Deep Learning for Natural Language Processing: A Comprehensive Survey",
  "authors": [
    {
      "name": "John Smith",
      "url": "https://www.semanticscholar.org/author/123456",
      "authorId": "123456",
      "affiliations": ["Stanford University"]
    },
    {
      "name": "Jane Doe",
      "url": "https://www.semanticscholar.org/author/789012",
      "authorId": "789012",
      "affiliations": ["MIT"]
    }
  ],
  "year": 2023,
  "publicationVenue": "Nature Machine Intelligence",
  "publicationDate": "2023-05-15",
  "abstract": "This paper presents a comprehensive survey of deep learning techniques for natural language processing...",
  "tldr": "A survey of deep learning methods for NLP tasks including transformers, attention mechanisms, and pre-trained models.",
  "citationCount": 245,
  "referenceCount": 89,
  "influentialCitationCount": 12,
  "isOpenAccess": true,
  "hasPdf": true,
  "detailUrl": "https://www.semanticscholar.org/paper/1234567890",
  "pdfUrl": "https://example.com/paper.pdf",
  "doi": "10.1038/s42256-023-00123-4",
  "corpusId": "1234567890",
  "externalIds": {
    "DOI": "10.1038/s42256-023-00123-4",
    "ArXiv": "2305.12345",
    "PubMed": "12345678",
    "PubMedCentral": "PMC1234567",
    "MAG": "123456789",
    "ACL": "2023.acl-main.123",
    "DBLP": "conf/nature/2023",
    "CorpusId": "1234567890"
  },
  "fieldsOfStudy": ["Computer Science", "Machine Learning", "Natural Language Processing"],
  "s2FieldsOfStudy": ["Computer Science"],
  "publicationTypes": ["JournalArticle"],
  "journal": {
    "name": "Nature Machine Intelligence",
    "volume": "5",
    "pages": "123-145"
  },
  "scrapedTimestamp": "2025-01-12T23:29:22.172Z"
}
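Once downloaded, items in this shape are easy to post-process. As an illustrative sketch (a hypothetical helper, assuming only the `title`, `year`, `citationCount`, and `authors` fields shown above), here is how you might rank papers by citations and flatten author names for a report:

```javascript
// Rank dataset items by citation count and flatten authors for reporting.
function topCitedPapers(items, limit = 5) {
  return [...items]
    .sort((a, b) => (b.citationCount ?? 0) - (a.citationCount ?? 0))
    .slice(0, limit)
    .map((p) => ({
      title: p.title,
      year: p.year,
      citations: p.citationCount ?? 0,
      authors: (p.authors ?? []).map((a) => a.name).join(', '),
    }));
}
```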
What You Get:
- Complete Paper Information - Titles, abstracts, and TLDR summaries for quick understanding
- Detailed Author Data - Author names, IDs, affiliations, and profile links
- Citation Metrics - Total citations, references, and influential citation counts
- Access Links - Direct PDF links, paper URLs, and DOI identifiers
- Publication Details - Venues, journals, publication types, and open access status
- Research Classifications - Fields of study and research areas
- External Identifiers - ArXiv, PubMed, ACL, DBLP, and other database IDs
- Publication Metadata - Years, dates, journal volumes, and page numbers
Download Options: CSV, Excel, or JSON formats for easy analysis in your research tools
Why Choose the Semantic Scholar Scraper?
- Comprehensive Data: Get all available paper information in one scrape - citations, abstracts, authors, and more
- Flexible Search: Search by query or use custom URLs with advanced filtering options
- Year Filtering: Filter papers by publication year range for targeted research
- PDF Access: Filter for papers with available PDFs for easier access
- Author Information: Get complete author details including affiliations and profile links
- Citation Metrics: Access citation counts, reference counts, and influential citation metrics
- Multiple Identifiers: Get DOI, ArXiv, PubMed, and other external database IDs
- No Duplicates: Automatically skips papers already in your dataset
- User-Friendly: No coding needed - just input your search query and go
- Sequential Processing: Processes papers one by one for maximum data quality
Time Savings: Save 4-6 hours per week compared to manual paper research
Cost Efficiency: Fraction of the cost of hiring a research assistant or using expensive academic databases
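The "No Duplicates" behavior mentioned above boils down to keeping the first item seen for each paperId. A minimal sketch of that idea (illustrative only, not the Actor's actual implementation):

```javascript
// Keep only the first occurrence of each paperId, in original order.
function dedupeByPaperId(items) {
  const seen = new Set();
  return items.filter((item) => {
    if (seen.has(item.paperId)) return false; // already collected
    seen.add(item.paperId);
    return true;
  });
}
```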
How to Use
- Sign Up: Create a free account with $5 credit (takes 2 minutes)
- Find the Scraper: Visit the Semantic Scholar Scraper page
- Set Input:
- Free users: Can process up to 50 items (maxItems required, maximum value: 50)
- Paid users:
- Option A (Recommended): Enter a search query and apply filters (year range, PDF availability, etc.)
- Option B: Add your custom Semantic Scholar search URL
- Set max items (optional, prefill value: 10, up to 1,000,000)
- Run It: Click "Start" and let it collect your data
- Download Data: Get your results in the "Dataset" tab as CSV, Excel, or JSON
Total Time: 3 minutes setup, 10-30 minutes for data collection
No Technical Skills Required: Everything is point-and-click
Business Use Cases
Researchers:
- Conduct comprehensive literature reviews
- Track citations and research impact
- Find relevant papers for research projects
- Monitor new publications in your field
Academics:
- Build reference databases for courses
- Track publication trends in your discipline
- Identify collaboration opportunities
- Analyze research impact metrics
Librarians:
- Build comprehensive paper collections
- Support researchers with data access
- Track publication trends and patterns
- Create subject-specific databases
Data Analysts:
- Analyze academic publication trends
- Build research intelligence databases
- Track citation networks
- Support policy decisions with data
Students:
- Find papers for thesis and dissertation research
- Build comprehensive reference lists
- Track citations for academic writing
- Discover relevant research in your field
Using Semantic Scholar Scraper with the Apify API
For advanced users who want to automate this process, you can control the scraper programmatically with the Apify API. This allows you to schedule regular data collection and integrate with your existing research tools.
Example API Usage:
// Node.js example
const { ApifyClient } = require('apify-client');

const client = new ApifyClient({
    token: 'YOUR_API_TOKEN',
});

// Run with a search query
await client.actor('YOUR_ACTOR_ID').call({
    searchQuery: 'machine learning',
    yearMin: 2020,
    yearMax: 2024,
    hasPdf: true,
    maxItems: 100,
});

// Run with a custom URL
await client.actor('YOUR_ACTOR_ID').call({
    startUrl: 'https://www.semanticscholar.org/search?q=machine+learning&sort=relevance',
    maxItems: 50,
});
- Node.js: Install the apify-client NPM package
- Python: Use the apify-client PyPI package
- See the Apify API reference for full details
Frequently Asked Questions
Q: How does it work? A: Semantic Scholar Scraper is easy to use and requires no technical knowledge. Simply enter your search query or paste a Semantic Scholar URL, configure your filters, and let the tool collect the data automatically.
Q: How accurate is the data? A: We collect data directly from Semantic Scholar's official API in real-time, ensuring the most up-to-date and accurate academic paper information available.
Q: Can I filter by specific authors or venues?
A: Yes! You can use the author and venues filters. Note that these are treated as suggestions by the Semantic Scholar API, so results may include papers that match your search query but may not strictly match these filters.
Q: What URL formats are supported? A: We support Semantic Scholar search URLs. See the Input section for specific examples.
Q: Can I schedule regular runs? A: Yes! Use the Apify API to schedule daily, weekly, or monthly runs automatically. Perfect for ongoing research monitoring and publication tracking.
Q: What if I need help? A: Our support team is available 24/7. Contact us through the Apify platform.
Q: Is my data secure? A: Absolutely. All data is encrypted in transit and at rest. We never share your data with third parties.
Q: Are there limits for free users? A: Free users can process up to 50 items per run (maxItems is required and must be 50 or less). Paid users can process up to 1,000,000 items per run.
Integrate Semantic Scholar Scraper with any app and automate your workflow
Last but not least, Semantic Scholar Scraper can be connected with almost any cloud service or web app thanks to integrations on the Apify platform.
Alternatively, you can use webhooks to carry out an action whenever an event occurs, e.g. get a notification whenever Semantic Scholar Scraper successfully finishes a run.
Recommended Actors
Looking for more data collection tools? Check out these related actors:
| Actor | Description | Link |
|---|---|---|
| GSA eLibrary Scraper | Collects government publication data from GSA eLibrary | https://apify.com/parseforge/gsa-elibrary-scraper |
| PR Newswire Scraper | Extracts press releases and news data from PR Newswire | https://apify.com/parseforge/pr-newswire-scraper |
| Hugging Face Model Scraper | Collects AI model data from Hugging Face | https://apify.com/parseforge/hugging-face-model-scraper |
| Hubspot Marketplace Scraper | Extracts business app data from HubSpot marketplace | https://apify.com/parseforge/hubspot-marketplace-scraper |
| Smart Apify Actor Scraper | Collects comprehensive actor data from Apify with quality metrics | https://apify.com/parseforge/smart-apify-actor-scraper |
Pro Tip: Browse our complete collection of data collection actors to find the perfect tool for your business needs.
Need Help? Our support team is here to help you get the most out of this tool.
Disclaimer: This Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by Semantic Scholar or any of its subsidiaries. All trademarks mentioned are the property of their respective owners.