Academic Paper Scraper: Research & Citations
Pricing
from $20.00 / 1,000 results
Scrape academic papers, research articles, citations, author profiles, and h-index data from Google Scholar. Extract abstracts, publication dates, journal names, and citation counts for literature reviews.
Rating: 0.0 (0 ratings)
Developer: Stephan Corbeil (maintained by Community)
Actor stats: 0 bookmarked, 4 total users, 2 monthly active users, last modified 8 days ago
Academic Paper Scraper: Semantic Scholar, Connected Papers & Scite Alternative
Search and extract structured metadata from academic papers across major open repositories: title, authors, affiliations, abstract, DOI, citation count, publication venue, year, and reference list. Built as a pay-per-result alternative to the Semantic Scholar API (rate-limited), Connected Papers (UI-only), Scite ($20-100/mo), Web of Science (institutional license), and Scopus for systematic reviews, citation tracking, and research-trend monitoring.
Why Academic Paper Scraper Beats Semantic Scholar, Connected Papers, Scite & Web of Science
| Feature | NexGenData Academic Paper Scraper | Semantic Scholar API | Connected Papers | Scite | Web of Science |
|---|---|---|---|---|---|
| Cost | $0.005 / paper, pay-per-result | Free + rate-limited | Free (UI only) | $20-100 / month | Institutional license |
| Bulk search by keyword / topic | Yes | Yes (rate-limited) | UI only | Yes | Yes |
| Full reference list per paper | Yes | Yes | Visualization only | Yes | Yes |
| Citation counts | Yes | Yes | Yes | Yes | Yes |
| Cross-repository coverage | arXiv, PubMed, DOAJ, OpenAlex, etc. | Multi-source | Multi-source | Multi-source | Multi-source |
| Bulk export | JSON / CSV / Excel | DIY pagination | None | CSV (plan-gated) | Plan-gated |
| API access | Apify REST + SDKs | Yes (free + rate-limited) | None | Paid plan | Institutional |
| Auth required | Apify token | Optional API key | None | Account + plan | Institutional login |
| Monthly minimum | None | None | None | $20+ | Institutional |
Academic and R&D teams often choose this actor over the Semantic Scholar API because the free tier's rate limits become a bottleneck for any systematic review beyond roughly 5K papers. It is cheaper than Scite for bulk metadata extraction and a drop-in alternative to Web of Science for teams without an institutional license.
What You Get Per Paper
- title, abstract, doi, arxiv_id, pubmed_id, openalex_id
- authors: array of {name, affiliation, orcid} records
- published_date, year, venue, journal, conference
- citation_count, reference_count, influential_citation_count
- references: array of cited papers {title, doi, year}
- topics: MeSH terms / arXiv categories / OpenAlex concepts
- is_open_access, pdf_url, landing_page_url
- funding_acknowledgements: when extractable
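For spreadsheet-style exports, the nested fields (authors, topics) have to be collapsed into single cells. The sketch below shows one way to do that with Python's standard csv module. It assumes records shaped like the field list above and assumes topics arrive as a list of strings; the column selection and the sample record are made up for illustration, and missing fields are tolerated because some values are only present when extractable.

```python
import csv
import io

# Hypothetical column selection; names follow the field list above.
COLUMNS = ["title", "year", "venue", "doi", "citation_count", "authors", "topics", "is_open_access"]


def flatten_paper(paper: dict) -> dict:
    """Turn one nested paper record into a flat row."""
    authors = paper.get("authors") or []
    return {
        "title": paper.get("title", ""),
        "year": paper.get("year", ""),
        "venue": paper.get("venue", ""),
        "doi": paper.get("doi") or "",
        "citation_count": paper.get("citation_count", 0),
        # Nested {name, affiliation, orcid} records collapse into one "; "-separated cell.
        "authors": "; ".join(a.get("name", "") for a in authors),
        # Assumes topics is a list of strings.
        "topics": "; ".join(paper.get("topics") or []),
        "is_open_access": paper.get("is_open_access", False),
    }


def papers_to_csv(papers: list) -> str:
    """Serialize flattened records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    for paper in papers:
        writer.writerow(flatten_paper(paper))
    return buf.getvalue()


# Made-up sample record for illustration only.
sample = {
    "title": "Example Paper",
    "year": 2023,
    "venue": "Example Venue",
    "doi": "10.0000/example",
    "citation_count": 12,
    "authors": [
        {"name": "A. Author", "affiliation": None, "orcid": None},
        {"name": "B. Author", "affiliation": None, "orcid": None},
    ],
    "topics": ["Machine learning"],
    "is_open_access": True,
}
print(papers_to_csv([sample]))
```

Writing through csv.DictWriter rather than joining strings by hand means titles containing commas or quotes are escaped correctly.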
Use Cases
- Systematic reviews: pull all papers on a topic with full metadata for PRISMA workflows
- Citation tracking: monitor papers citing your work or competitor work weekly
- Grant writing: assemble prior-art bibliographies and recent-literature surveys
- Research-trend monitoring: detect rising topics in a field by citation velocity
- Pharma / biotech competitive intel: track competitor publications and KOL output
- AI / NLP training: bulk-export labeled paper abstracts for scientific-language models
- Patent prior art: surface academic publications relevant to a patent application
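"Citation velocity" in the research-trend use case has no single standard definition. The sketch below assumes one simple reading, citations per year since publication, and ranks topics by the mean velocity of the papers tagged with them, using the year, citation_count, and topics fields. The definition, the function names, and the sample data are illustrative assumptions, not the actor's own metric.

```python
from collections import defaultdict


def citation_velocity(paper: dict, current_year: int) -> float:
    """Citations per year since publication; a paper from this year counts as 1 year old."""
    age = max(current_year - paper["year"] + 1, 1)
    return paper.get("citation_count", 0) / age


def rising_topics(papers: list, current_year: int, top_n: int = 5) -> list:
    """Rank topics by the mean citation velocity of papers tagged with them."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for paper in papers:
        if paper.get("year") is None:
            continue  # velocity is undefined without a publication year
        v = citation_velocity(paper, current_year)
        for topic in paper.get("topics") or []:
            totals[topic] += v
            counts[topic] += 1
    means = {topic: totals[topic] / counts[topic] for topic in totals}
    # Highest mean velocity first.
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Made-up sample records for illustration only.
papers = [
    {"year": 2024, "citation_count": 40, "topics": ["LLMs"]},
    {"year": 2020, "citation_count": 50, "topics": ["GNNs"]},
    {"year": 2023, "citation_count": 30, "topics": ["LLMs", "GNNs"]},
]
print(rising_topics(papers, current_year=2024))  # [('LLMs', 27.5), ('GNNs', 12.5)]
```

Passing current_year explicitly keeps results reproducible across runs, and the +1 in the age avoids dividing by zero for papers published this year.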
Quick Start (Python)
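```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

# Start the actor and wait for the run to finish.
run = client.actor("nexgendata/academic-paper-scraper").call(run_input={
    "queries": ["large language models", "graph neural networks"],
    "year_from": 2023,           # earliest publication year
    "max_per_query": 500,        # cap on papers per query
    "include_references": True,  # also return each paper's reference list
})

# Read the results from the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["title"], item["citation_count"], item.get("doi"))  # doi may be absent
```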
Pricing: Pay Per Paper
- Actor start: $0.005
- Paper: $0.005
A 500-paper literature review costs $0.005 + 500 × $0.005 = $2.505. A weekly 100-paper citation tracker costs $0.505 per run. No monthly minimum.
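The arithmetic above is a start fee plus a per-paper fee for each run. A minimal estimator, assuming the listed $0.005 rates (the start fee is described further down as scaled to memory size, so treat it as a default rather than a guaranteed rate), might look like this. Decimal is used so that cent fractions such as $2.505 are not distorted by floating-point rounding.

```python
from decimal import Decimal

# Event prices as listed above; the start fee may scale with memory size.
START_FEE = Decimal("0.005")
PER_PAPER = Decimal("0.005")


def estimate_cost(papers: int, runs: int = 1) -> Decimal:
    """Estimated charge for `runs` runs that each return `papers` papers."""
    if papers < 0 or runs < 0:
        raise ValueError("papers and runs must be non-negative")
    return runs * (START_FEE + papers * PER_PAPER)


print(estimate_cost(500))          # one-off 500-paper literature review
print(estimate_cost(100, runs=4))  # four weekly 100-paper tracker runs
```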
Related NexGenData Academic + Research Actors
| Use case | Actor |
|---|---|
| arXiv preprint search + metadata | arxiv-scraper |
| PubMed biomedical search | pubmed-research-search |
| Google Scholar search | google-scholar-scraper |
| Academic research MCP server (AI / Claude) | academic-research-mcp-server |
| NIH RePORTER grants database | nih-reporter-grants-scraper |
| IRS 990 nonprofit research funding | irs-990-nonprofit-explorer-scraper |
| SEC EDGAR filings (corporate R&D) | sec-edgar-scraper |
| Federal Register rules (regulatory science) | federal-register-rules-scraper |
| Hacker News scraper (CS / ML discourse) | hacker-news-scraper |
FAQ
Q: Coverage scope?
A: Multi-source: arXiv, PubMed, DOAJ, OpenAlex, and Semantic Scholar's graph. Each paper is deduplicated across sources by DOI / arXiv ID.
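The actor's exact deduplication logic is not published. As a sketch of the idea, the code below keys each record on its DOI (lowercased, since DOIs are case-insensitive, and stripped of a resolver prefix), falls back to the arXiv ID, and keeps the record reporting more citations when two sources collide. The function names, the tie-break rule, and the sample records are illustrative assumptions.

```python
from typing import Optional

DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "doi:")


def dedup_key(paper: dict) -> Optional[str]:
    """Normalized DOI if present, else arXiv ID, else None."""
    doi = (paper.get("doi") or "").strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    if doi:
        return "doi:" + doi
    arxiv_id = (paper.get("arxiv_id") or "").strip().lower()
    return "arxiv:" + arxiv_id if arxiv_id else None


def dedupe(papers: list) -> list:
    """Keep one record per key; records without any ID are passed through."""
    by_key = {}
    unkeyed = []
    for paper in papers:
        key = dedup_key(paper)
        if key is None:
            unkeyed.append(paper)
        elif key not in by_key or paper.get("citation_count", 0) > by_key[key].get("citation_count", 0):
            # Assumed tie-break: prefer the source reporting more citations.
            by_key[key] = paper
    return list(by_key.values()) + unkeyed


# Made-up records: the first two are the same paper seen via two sources.
records = [
    {"source": "openalex", "doi": "10.1000/XYZ", "citation_count": 10},
    {"source": "semantic_scholar", "doi": "https://doi.org/10.1000/xyz", "citation_count": 12},
    {"source": "arxiv", "arxiv_id": "2301.00001", "citation_count": 0},
]
print([r["source"] for r in dedupe(records)])
```

One limitation of this simple version: a paper that carries a DOI in one source but only an arXiv ID in another is not merged; a fuller approach would also map arXiv IDs to DOIs where both are known.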
Q: How fresh are citation counts?
A: Pulled live per run from the source graph (Semantic Scholar / OpenAlex). Typically current to within 24-48h of the source's own refresh.
Q: Closed-access papers?
A: Metadata and abstract are returned. pdf_url is populated only when the paper is open access (is_open_access: true). For closed-access papers, follow the DOI link with your institutional access.
Q: Full-text mining?
A: This actor returns metadata + abstract. For full-text PDFs, use the pdf_url field with a separate PDF-parser actor.
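Following the two answers above, a downstream pipeline typically routes open-access papers to a PDF parser and closed-access papers to their DOI landing page. The helper below is a hypothetical sketch of that split using the is_open_access, pdf_url, and doi fields; https://doi.org/ is the standard DOI resolver. Papers with neither a usable PDF URL nor a DOI are skipped.

```python
def access_links(papers: list) -> tuple:
    """Split papers into open-access PDF URLs and DOI links for the rest."""
    pdf_urls = []
    doi_links = []
    for paper in papers:
        if paper.get("is_open_access") and paper.get("pdf_url"):
            pdf_urls.append(paper["pdf_url"])  # candidates for a PDF-parser step
        elif paper.get("doi"):
            # The DOI resolver redirects to the publisher's landing page.
            doi_links.append("https://doi.org/" + paper["doi"])
    return pdf_urls, doi_links


# Made-up records for illustration only.
papers = [
    {"is_open_access": True, "pdf_url": "https://example.org/a.pdf", "doi": "10.0000/a"},
    {"is_open_access": False, "pdf_url": None, "doi": "10.0000/b"},
    {"is_open_access": False, "pdf_url": None, "doi": None},
]
pdfs, dois = access_links(papers)
print(pdfs)
print(dois)
```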
Q: How does it differ from pubmed-research-search?
A: PubMed search is biomedical-only with deeper MeSH coverage. This actor is cross-discipline (CS, physics, bio, social sciences) via the OpenAlex + Semantic Scholar graph.
How NexGenData Pricing Works
Every NexGenData actor uses pay-per-event pricing: you pay only for the events you trigger, namely a start fee per run plus each result that actually lands in your dataset. No monthly minimum, no seat fees, no surprise overage bills.
- Actor Start: a single-event charge each time you spin the actor up (scaled to memory size)
- Result / item: charged per item written to the default dataset
- No charge for retries, internal proxy rotation, or failed sub-requests: those are absorbed by the platform
Apify Platform Bonus
New to Apify? Sign up with the NexGenData referral link: you get free platform credits on signup (enough for several thousand free results) and you help fund the maintenance of this actor fleet.
Integration Surface
Every actor in the NexGenData catalog can be triggered from:
- Apify console: point-and-click run
- Apify API: REST + webhooks
- Apify Python / JS SDKs: programmatic batch
- Zapier, Make.com, n8n: official integrations
- MCP: many actors are exposed as MCP tools for Claude / ChatGPT / Cursor agents
- Schedules: built-in cron for daily / weekly / monthly runs
- Webhooks: POST results to any HTTPS endpoint on dataset write
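For environments where the Apify SDK is not installed, the REST API can be called with the standard library alone. The sketch below targets Apify's run-sync-get-dataset-items endpoint, which starts a run, waits for it, and returns the dataset items as JSON; in API paths an actor ID "user/name" is written "user~name". The input fields are copied from the Quick Start; the helper names are hypothetical, and the sync endpoint has a server-side time limit, so large runs are better served by the asynchronous SDK flow.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(token: str, actor_id: str, run_input: dict) -> urllib.request.Request:
    """Build a POST to the run-sync-get-dataset-items endpoint."""
    path_id = actor_id.replace("/", "~")  # "user/name" -> "user~name" in API paths
    url = f"{API_BASE}/acts/{path_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


def run_and_fetch(token: str, actor_id: str, run_input: dict, timeout: float = 300) -> list:
    """Start a run, wait for it, and return the dataset items (requires network access)."""
    req = build_run_request(token, actor_id, run_input)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: run_and_fetch("YOUR_APIFY_TOKEN", "nexgendata/academic-paper-scraper", {"queries": ["graph neural networks"], "year_from": 2023, "max_per_query": 50}) returns a list of paper records.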
Support
NexGenData maintains 260+ Apify actors and ships updates regularly. Bug reports via the Apify console issues tab get a response within 24 hours. Roadmap requests are welcome; high-demand features ship in the next version.
Home: thenextgennexus.com
Full catalog: apify.com/nexgendata
