📄 Academic Paper Scraper: Research & Citations

Pricing

from $20.00 / 1,000 results

Scrape academic papers, research articles, citations, author profiles, and h-index data from Google Scholar. Extract abstracts, publication dates, journal names, and citation counts for literature reviews.

Developer

Stephan Corbeil

Maintained by Community

📚 Academic Paper Scraper: Semantic Scholar, Connected Papers & Scite Alternative

Search and extract structured metadata from academic papers across major open repositories: title, authors, affiliations, abstract, DOI, citation count, publication venue, year, and reference list. Built as a pay-per-result alternative to the Semantic Scholar API (rate-limited), Connected Papers (UI-only), Scite ($20-100/mo), Web of Science (institutional license), and Scopus for systematic reviews, citation tracking, and research-trend monitoring.

Why Academic Paper Scraper Beats Semantic Scholar, Connected Papers, Scite & Web of Science

| Feature | NexGenData Academic Paper Scraper | Semantic Scholar API | Connected Papers | Scite | Web of Science |
| --- | --- | --- | --- | --- | --- |
| Cost | $0.005 / paper, pay-per-result | Free + rate-limited | Free (UI only) | $20-100 / month | Institutional license |
| Bulk search by keyword / topic | Yes | Yes (rate-limited) | UI only | Yes | Yes |
| Full reference list per paper | Yes | Yes | Visualization only | Yes | Yes |
| Citation counts | Yes | Yes | Yes | Yes | Yes |
| Cross-repository coverage | arXiv, PubMed, DOAJ, OpenAlex, etc. | Multi-source | Multi-source | Multi-source | Multi-source |
| Bulk export | JSON / CSV / Excel | DIY pagination | None | CSV (plan-gated) | Plan-gated |
| API access | Apify REST + SDKs | Yes (free + rate-limited) | None | Paid plan | Institutional |
| Auth required | Apify token | Optional API key | None | Account + plan | Institutional login |
| Monthly minimum | None | None | None | $20+ | Institutional |

Academic and R&D teams typically pick this actor over the Semantic Scholar API because the free tier's rate limits make systematic reviews beyond roughly 5,000 papers impractical. For the bulk-metadata use case it is cheaper than Scite, and it is a drop-in alternative to Web of Science for teams without an institutional license.

What You Get Per Paper

  • title, abstract, doi, arxiv_id, pubmed_id, openalex_id
  • authors: array of {name, affiliation, orcid} records
  • published_date, year, venue, journal, conference
  • citation_count, reference_count, influential_citation_count
  • references: array of cited papers {title, doi, year}
  • topics: MeSH terms / arXiv categories / OpenAlex concepts
  • is_open_access, pdf_url, landing_page_url
  • funding_acknowledgements: when extractable
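
The field list above maps naturally onto typed records. The sketch below is one possible way to model a result item in Python. The field names come from the list; the types, the optionality of each field, and the `summarize` helper are assumptions made for illustration, not a published schema.

```python
from typing import List, Optional, TypedDict


class Author(TypedDict, total=False):
    name: str
    affiliation: Optional[str]
    orcid: Optional[str]


class Reference(TypedDict, total=False):
    title: str
    doi: Optional[str]
    year: Optional[int]


class Paper(TypedDict, total=False):
    # Field names follow the list above; the types are assumed.
    title: str
    abstract: Optional[str]
    doi: Optional[str]
    arxiv_id: Optional[str]
    year: Optional[int]
    citation_count: Optional[int]
    is_open_access: bool
    pdf_url: Optional[str]
    authors: List[Author]
    references: List[Reference]


def summarize(paper: Paper) -> str:
    """Return a one-line 'Lead author (year): title' summary (hypothetical helper)."""
    authors = paper.get("authors") or []
    if not authors:
        lead = "Unknown author"
    elif len(authors) == 1:
        lead = authors[0].get("name", "Unknown author")
    else:
        lead = f'{authors[0].get("name", "Unknown author")} et al.'
    year = paper.get("year")
    year_part = f" ({year})" if year else ""
    return f'{lead}{year_part}: {paper.get("title", "Untitled")}'
```

Using `total=False` keeps every key optional, which suits records where some identifiers (for example `arxiv_id` on a PubMed-only paper) may be absent.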

Use Cases

  • Systematic reviews: pull all papers on a topic with full metadata for PRISMA workflows
  • Citation tracking: monitor papers citing your work or competitor work weekly
  • Grant writing: assemble prior-art bibliographies + recent-literature surveys
  • Research-trend monitoring: detect rising topics in a field by citation velocity
  • Pharma / biotech competitive intel: track competitor publications and KOL output
  • AI / NLP training: bulk-export labeled paper abstracts for scientific-language models
  • Patent prior art: surface academic publications relevant to a patent application
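
For the research-trend use case, "citation velocity" is not defined on this page. A minimal sketch, assuming it means citations per year since publication and that each item carries the `citation_count` and `year` fields, could rank exported papers like this (both function names are hypothetical):

```python
from datetime import date
from typing import Dict, Iterable, List, Optional


def citation_velocity(citation_count: int, year: int, today: Optional[date] = None) -> float:
    """Citations per year since publication (an assumed definition)."""
    today = today or date.today()
    # Count the publication year as a full year so brand-new papers do not divide by zero.
    age_years = max(today.year - year + 1, 1)
    return citation_count / age_years


def rank_by_velocity(papers: Iterable[Dict], today: Optional[date] = None) -> List[Dict]:
    """Sort papers by descending velocity, skipping items missing either field."""
    scored = [
        (citation_velocity(p["citation_count"], p["year"], today), p)
        for p in papers
        if p.get("citation_count") is not None and p.get("year") is not None
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [paper for _, paper in scored]
```

A per-year rate favors recent papers that accumulate citations quickly over older papers with large totals, which is usually the point of trend monitoring.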

Quick Start (Python)

from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient("YOUR_APIFY_TOKEN")

# Start the actor and wait for the run to finish.
run = client.actor("nexgendata/academic-paper-scraper").call(run_input={
    "queries": ["large language models", "graph neural networks"],
    "year_from": 2023,
    "max_per_query": 500,
    "include_references": True,
})

# Print one line per paper stored in the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["title"], item["citation_count"], item["doi"])

Pricing: Pay Per Paper

  • Actor start: $0.005
  • Paper: $0.005

A 500-paper literature review = $2.505. A weekly 100-paper citation-tracker = $0.505/run. No monthly minimum.
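
The arithmetic behind those figures is one start fee per run plus the per-paper fee for each result. A small sketch, assuming the two list prices above apply unchanged (the start fee is described elsewhere as scaled to memory size, so it may differ in practice), with a hypothetical `estimate_run_cost` helper:

```python
from decimal import Decimal

ACTOR_START_USD = Decimal("0.005")  # charged once per run (list price above)
PER_PAPER_USD = Decimal("0.005")    # charged per paper written to the dataset


def estimate_run_cost(papers: int, runs: int = 1) -> Decimal:
    """Estimate the total cost in USD for `runs` runs of `papers` papers each."""
    if papers < 0 or runs < 0:
        raise ValueError("papers and runs must be non-negative")
    return runs * (ACTOR_START_USD + papers * PER_PAPER_USD)


print(estimate_run_cost(500))           # one 500-paper literature review
print(estimate_run_cost(100, runs=52))  # a weekly 100-paper tracker for a year
```

`Decimal` is used instead of `float` so that fractions of a cent add up exactly.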

| Use case | Actor |
| --- | --- |
| arXiv preprint search + metadata | arxiv-scraper |
| PubMed biomedical search | pubmed-research-search |
| Google Scholar search | google-scholar-scraper |
| Academic research MCP server (AI / Claude) | academic-research-mcp-server |
| NIH RePORTER grants database | nih-reporter-grants-scraper |
| IRS 990 nonprofit research funding | irs-990-nonprofit-explorer-scraper |
| SEC EDGAR filings (corporate R&D) | sec-edgar-scraper |
| Federal Register rules (regulatory science) | federal-register-rules-scraper |
| Hacker News scraper (CS / ML discourse) | hacker-news-scraper |

FAQ

Q: Coverage scope? A: Multi-source: arXiv, PubMed, DOAJ, OpenAlex, and Semantic Scholar's graph. Each paper is deduped across sources by DOI / arXiv ID.
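
If you merge results from several runs yourself, a similar DOI / arXiv ID rule can be applied client-side. The sketch below is an illustration of that idea, not the actor's internal implementation; the normalization steps and both function names are assumptions.

```python
from typing import Dict, Iterable, List, Optional


def dedup_key(paper: Dict) -> Optional[str]:
    """Build a comparison key from the DOI, falling back to the arXiv ID."""
    doi = (paper.get("doi") or "").strip().lower()
    if doi:
        # Strip common resolver prefixes so both spellings of a DOI compare equal.
        for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
            if doi.startswith(prefix):
                doi = doi[len(prefix):]
                break
        return f"doi:{doi}"
    arxiv_id = (paper.get("arxiv_id") or "").strip().lower()
    if arxiv_id:
        return f"arxiv:{arxiv_id}"
    return None  # no usable identifier


def dedupe(papers: Iterable[Dict]) -> List[Dict]:
    """Keep the first record per identifier; records without one are always kept."""
    seen = set()
    unique = []
    for paper in papers:
        key = dedup_key(paper)
        if key is not None:
            if key in seen:
                continue
            seen.add(key)
        unique.append(paper)
    return unique
```

One limitation of this sketch: a paper that appears once with only a DOI and once with only an arXiv ID is not merged, because the two keys never match.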

Q: Citation counts, how fresh? A: Pulled live per run from the source graph (Semantic Scholar / OpenAlex). Typically within 24-48h of the source's own refresh.

Q: Closed-access papers? A: Metadata + abstract are returned. PDF is included only when the paper is open-access (is_open_access: true). For closed-access PDFs, follow the doi URL with your institutional access.
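
That rule is easy to encode when post-processing results. A minimal sketch, assuming the `doi` field holds a bare DOI (no resolver prefix) and using a hypothetical `full_text_link` helper:

```python
from typing import Dict, Optional


def full_text_link(paper: Dict) -> Optional[str]:
    """Pick the best link for reading a paper's full text."""
    # Open-access papers: use the direct PDF when one is provided.
    if paper.get("is_open_access") and paper.get("pdf_url"):
        return paper["pdf_url"]
    # Closed-access papers: resolve the DOI and rely on institutional access.
    doi = paper.get("doi")
    if doi:
        return f"https://doi.org/{doi}"
    # Last resort: whatever landing page the record carries, possibly None.
    return paper.get("landing_page_url")
```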

Q: Full-text mining? A: This actor returns metadata + abstract. For full-text PDFs, use the pdf_url field with a separate PDF-parser actor.

Q: How does it differ from pubmed-research-search? A: PubMed search is biomedical-only with deeper MeSH coverage. This actor is cross-discipline (CS, physics, bio, social sciences) via the OpenAlex + Semantic Scholar graph.


How NexGenData Pricing Works

Every NexGenData actor uses pay-per-event pricing: you only pay for results that actually land in your dataset. No monthly minimum, no seat fees, no surprise overage bills.

  • Actor Start: a single-event charge each time you spin the actor up (scaled to memory size)
  • Result / item: charged per item written to the default dataset
  • No charge for retries, internal proxy rotation, or failed sub-requests; those are absorbed by the platform

Apify Platform Bonus

New to Apify? Sign up with the NexGenData referral link: you get free platform credits on signup (enough for several thousand free results) and you help fund the maintenance of this actor fleet.

Integration Surface

Every actor in the NexGenData catalog can be triggered from:

  • Apify console: point-and-click run
  • Apify API: REST + webhooks
  • Apify Python / JS SDKs: programmatic batch
  • Zapier, Make.com, n8n: official integrations
  • MCP: many actors are exposed as MCP tools for Claude / ChatGPT / Cursor agents
  • Schedules: built-in cron for daily / weekly / monthly runs
  • Webhooks: POST results to any HTTPS endpoint on dataset write
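
For the REST route, a run can be started without the SDK by POSTing the input JSON to the Apify API's run-actor endpoint. The sketch below only builds the request with the standard library and does not send it; the `build_run_request` helper is hypothetical, and the input keys are taken from the Quick Start example.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that starts an actor run."""
    # The API path uses "username~actor-name" where the store shows "username/actor-name".
    api_actor_id = actor_id.replace("/", "~")
    return urllib.request.Request(
        url=f"{API_BASE}/acts/{api_actor_id}/runs",
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_run_request(
    "nexgendata/academic-paper-scraper",
    "YOUR_APIFY_TOKEN",
    {"queries": ["graph neural networks"], "year_from": 2023, "max_per_query": 100},
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would start a billable run, so that step is left out here.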

Support

NexGenData maintains 260+ Apify actors and ships updates regularly. Bug reports via the Apify console issues tab get a response within 24 hours. Roadmap requests are welcome; high-demand features ship in the next version.

Home: thenextgennexus.com
Full catalog: apify.com/nexgendata