Semantic Scholar Scraper

Extract detailed academic paper data from Semantic Scholar, including abstracts, citations, authors, and publication details. Ideal for researchers, academics, and analysts who need structured scholarly data for literature reviews, research workflows, and large-scale academic analysis.

Pricing: Pay per event
Rating: 5.0 (1)
Developer: ParseForge
Maintained by Community
Actor stats: 0 bookmarked, 14 total users, 2 monthly active users
Last modified: 15 days ago

📚 Semantic Scholar Scraper

Search and extract comprehensive academic paper data from Semantic Scholar without needing a paid subscription. Collect detailed information about research papers, including abstracts, citations, authors, publication details, and more. Perfect for researchers, academics, data analysts, and students who need to build datasets for literature reviews, academic intelligence gathering, and research analysis.

The Semantic Scholar Scraper collects up to 1,000,000 papers per run from Semantic Scholar, with 40+ data fields per paper. No coding or subscription is required.

✨ What Does It Do

  • πŸ“ Paper Title - Extract official paper titles for indexing and bibliographic organization
  • πŸ‘₯ Authors - Collect author names and their links to build author-paper relationships for network analysis
  • πŸ“Š Citation Count - Get total citation numbers to identify influential papers and measure research impact
  • πŸŽ“ Influential Citations - Track highly influential citations separately to understand breakthrough moments in academic fields
  • 🔗 Paper URL - Get direct links to paper detail pages for quick reference and further exploration
  • 📄 PDF URL - Access open access PDF links directly when available to read full papers instantly
  • 📅 Publication Date - Extract publication dates to filter papers by research timeline and track field evolution
  • 🏆 Publication Venue - Identify journal or conference names to assess publication prestige and venue impact
  • 📋 Abstract - Collect full abstracts to understand research scope without reading entire papers
  • 🌍 Fields of Study - Get subject classifications to organize papers by discipline and find related work
  • ✅ Open Access Status - Filter for freely available papers to reduce research costs
  • 💾 Research Metadata - Access DOI, corpus ID, reference counts, and publication types for complete citation management

🔧 Input

  • Search Query - Enter keywords to search for papers on Semantic Scholar. Examples: "machine learning", "quantum computing", "neural networks"
  • Start URL - Provide a Semantic Scholar search URL for custom searches with specific filters. Cannot be combined with other filters. Example: https://www.semanticscholar.org/search?q=machine+learning&sort=relevance
  • Minimum Year - Filter papers published on or after this year to focus on recent research. Example: 2020
  • Maximum Year - Filter papers published on or before this year to set an upper bound on publication date. Example: 2024
  • Has PDF - Check this box to only include papers with open access PDFs available for download
  • Author - Filter papers by author name. Note: This is treated as a suggestion, not a strict filter
  • Venues - Filter papers by publication venue such as journal or conference names. This is also a suggestion-based filter
  • Max Items - Specify the maximum number of papers to collect (1-50 for free users, up to 1,000,000 for paid users). Leave empty for unlimited
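The Start URL option takes an ordinary Semantic Scholar search URL. As a minimal sketch, the example URL above can be assembled with Python's standard library. The helper name `build_search_url` is hypothetical, and only the `q` and `sort` parameters shown in the example URL are used; other query parameters are not documented here.

```python
from urllib.parse import urlencode

# Base path taken from the example Start URL in the input list.
BASE_SEARCH_URL = "https://www.semanticscholar.org/search"


def build_search_url(query: str, sort: str = "relevance") -> str:
    """Return a Semantic Scholar search URL for the given keywords.

    Hypothetical helper: only the `q` and `sort` parameters from the
    documented example URL are used.
    """
    # urlencode applies quote_plus by default, so spaces become "+".
    return f"{BASE_SEARCH_URL}?{urlencode({'q': query, 'sort': sort})}"


print(build_search_url("machine learning"))
```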

Example input:

{
  "searchQuery": "machine learning",
  "yearMin": 2020,
  "yearMax": 2024,
  "hasPdf": true,
  "maxItems": 100
}
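The input rules above (Start URL cannot be combined with other filters, Max Items between 1 and 1,000,000) can be checked before a run is started. The sketch below is one possible way to do that, not the Actor's own validation. The keys `searchQuery`, `yearMin`, `yearMax`, `hasPdf`, and `maxItems` come from the example input; the `startUrl` key and the function name `build_run_input` are assumptions.

```python
import json
from typing import Optional

MAX_ITEMS_LIMIT = 1_000_000  # upper bound stated for paid users


def build_run_input(
    search_query: Optional[str] = None,
    start_url: Optional[str] = None,
    year_min: Optional[int] = None,
    year_max: Optional[int] = None,
    has_pdf: bool = False,
    max_items: Optional[int] = None,
) -> dict:
    """Assemble and sanity-check an input object (hypothetical helper)."""
    run_input: dict = {}
    filters_used = has_pdf or any(
        value is not None for value in (search_query, year_min, year_max)
    )
    if start_url is not None:
        # The Start URL option cannot be combined with other filters.
        if filters_used:
            raise ValueError("Start URL cannot be combined with other filters")
        run_input["startUrl"] = start_url  # key name is an assumption
    else:
        if not search_query:
            raise ValueError("Provide a search query or a start URL")
        run_input["searchQuery"] = search_query
        if year_min is not None and year_max is not None and year_min > year_max:
            raise ValueError("Minimum year must not exceed maximum year")
        if year_min is not None:
            run_input["yearMin"] = year_min
        if year_max is not None:
            run_input["yearMax"] = year_max
        if has_pdf:
            run_input["hasPdf"] = True
    if max_items is not None:
        if not 1 <= max_items <= MAX_ITEMS_LIMIT:
            raise ValueError(f"maxItems must be between 1 and {MAX_ITEMS_LIMIT}")
        run_input["maxItems"] = max_items
    return run_input


example = build_run_input(
    search_query="machine learning",
    year_min=2020,
    year_max=2024,
    has_pdf=True,
    max_items=100,
)
print(json.dumps(example, indent=2))
```

The printed JSON can be pasted into the Actor's input editor.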

📊 Output

Each paper includes up to 40 data fields covering complete bibliographic and academic metrics. Download as JSON, CSV, or Excel.

| 📝 Paper Title | 👥 Authors | 📊 Citations | 🎓 Influential Citations |
| 🔗 Paper URL | 📄 PDF URL | 📅 Publication Date | 🏆 Publication Venue |
| 📋 Abstract | 🌍 Fields of Study | ✅ Open Access | 💾 DOI |
| 🏢 Journal Name | 📈 Reference Count | 📚 Publication Types | 🆔 Paper ID |
| 🆔 Corpus ID | 🔎 Research Summary | 📌 External IDs | 📅 Scraped Timestamp |
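Once a run is exported as JSON, the records can be processed with a few lines of Python. The sketch below ranks papers by citation count. The key names `title`, `citationCount`, and `isOpenAccess` are assumptions, because the exact output keys are not listed here; check them against a real export first. The sample records are placeholders, not real papers.

```python
import json


def top_cited(papers, n=3, open_access_only=False):
    """Return the n most-cited records, optionally open access only.

    Key names are assumed; a missing or null citation count counts as 0.
    """
    rows = [p for p in papers if not open_access_only or p.get("isOpenAccess")]
    return sorted(rows, key=lambda p: p.get("citationCount") or 0, reverse=True)[:n]


# Placeholder records standing in for an exported dataset file.
sample = json.loads("""
[
  {"title": "Placeholder A", "citationCount": 12, "isOpenAccess": true},
  {"title": "Placeholder B", "citationCount": 40, "isOpenAccess": false},
  {"title": "Placeholder C", "citationCount": null, "isOpenAccess": true}
]
""")

for paper in top_cited(sample, n=2):
    print(paper["title"], paper["citationCount"])
```

With a real export, replace `sample` with the result of `json.load` on the downloaded file.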

💎 Why Choose the Semantic Scholar Scraper?

| Feature | Semantic Scholar Scraper | Similar Scrapers |
|---|---|---|
| No subscription required | ✔️ | ❌ |
| Collect up to 1,000,000 papers | ✔️ | Partial |
| Direct access to PDF URLs | ✔️ | ❌ |
| Author and venue filtering | ✔️ | ✔️ |
| Publication year filtering | ✔️ | ✔️ |
| Citation metrics included | ✔️ | Partial |
| Open access status indicator | ✔️ | ❌ |
| Research summary (TLDR) | ✔️ | ❌ |
| Fields of study classification | ✔️ | ❌ |
| Influential citation tracking | ✔️ | ❌ |
| Paper metadata export formats | ✔️ | Partial |
| Real-time publication indexing | ✔️ | Partial |

📋 How to Use

No technical skills required. Follow these simple steps:

  1. Sign Up: Create a free account with $5 credit
  2. Find the Tool: Search for "Semantic Scholar Scraper" in the Apify Store and configure your search query
  3. Run It: Click "Start" and watch your academic paper data appear

That's it. No coding, no setup, no complicated configuration. Now you can export your data in CSV, Excel, or JSON format.

🎯 Business Use Cases

  • πŸ“Š Academic Researchers - Search papers on specific topics with year filters to build targeted literature review datasets for dissertation research or grant proposals
  • πŸ“ˆ Data Analysts - Collect citation metrics and publication venues to analyze research trends, identify emerging fields, and benchmark academic influence within disciplines
  • πŸ’Ό Institutional Research Teams - Monitor faculty publications and research output across departments to evaluate productivity, identify collaboration patterns, and support tenure reviews

❓ FAQ

πŸ” How does it work? The Semantic Scholar Scraper searches for papers using your keywords and filters, then extracts complete bibliographic data including abstracts, authors, citations, and PDF links in seconds without requiring a subscription.

πŸ“Š How accurate is the data? Data comes directly from Semantic Scholar's database, which aggregates information from publishers, institutional repositories, and open access sources. Citation counts and publication details are as accurate as Semantic Scholar's index, which is continuously updated.

πŸ“… Can I schedule runs to monitor new papers? Yes. You can integrate this scraper with Make or Zapier to automatically run searches on a schedule, allowing you to monitor newly published papers in your field weekly or monthly.

βš–οΈ Is collecting this data legal? Semantic Scholar publishes its data publicly and allows automated access for research. As long as you comply with Semantic Scholar's terms of service and use the data for legitimate research purposes, collection is permitted. It's your responsibility to comply with local laws and Semantic Scholar's policies.

πŸ›‘οΈ Will Semantic Scholar block me? Semantic Scholar is designed to be openly accessible to researchers and does not require authentication. The scraper respects rate limits and uses standard practices, so blocks are unlikely. If you run into limits, the scraper will pause and retry automatically.
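"Pause and retry" is commonly implemented as exponential backoff. The sketch below illustrates that general pattern only; it is not the Actor's actual code, and the `RateLimited` exception, the `call_with_retry` name, and the delay values are all hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised when the server asks the client to slow down."""


def call_with_retry(
    fetch: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), pausing with exponentially growing delays on RateLimited."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Delay doubles each time: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Demo with a fake request that is rate limited twice, then succeeds.
# The sleep function is swapped for a recorder so the demo does not wait.
attempts = {"count": 0}
recorded_delays = []


def flaky_request() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited()
    return "ok"


result = call_with_retry(flaky_request, sleep=recorded_delays.append)
print(result, recorded_delays)
```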

⚑ How long does a run take? Typical runs take 10-30 seconds per 100 papers depending on your internet connection and Semantic Scholar's server response time. Large runs with 10,000+ papers may take several minutes but run efficiently in the background.

⚠️ Are there any limits? Free users can collect up to 50 results per run. Paid users can collect up to 1,000,000 results per run.

🔗 Integrate Semantic Scholar Scraper with any app

💡 More ParseForge Actors

Browse our complete collection of data extraction tools for more.

🚀 Ready to Start?

Create a free account with $5 credit and collect your first 50 papers for free. No coding, no setup.

🆘 Need Help?

  • Check the FAQ section above for common questions
  • Visit the Apify support page for documentation and tutorials
  • Contact us through the Tally contact form to request a new scraper, propose a custom project, or report an issue

⚠️ Disclaimer

This Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by Semantic Scholar or any of its subsidiaries. All trademarks mentioned are the property of their respective owners.