
Europe PMC Literature Scraper

Pricing: from $27.60 / 1,000 results


Scrape Europe PMC for biomedical research papers. Search by title, author, MeSH terms, journal. Get DOI, abstract, full-text URLs, citations, references, open-access status. No API key required.


Rating: 0.0 (0 ratings)

Developer: ParseForge · Maintained by Community

Actor stats: 0 bookmarked · 2 total users · 1 monthly active user · last modified 17 hours ago


🧬 Europe PMC Literature Scraper

🚀 Export the biomedical literature index in seconds. Search 40+ million records across PubMed, PubMed Central, life-science preprints, agricultural literature, and patents. Filter by title, author, MeSH term, DOI, journal, open access, or free-text. No API key, no registration.

🕒 Last updated: 2026-05-13 · 📊 40+ fields per record · 🧬 40M+ biomedical records · 📚 PubMed + PMC + preprints + patents

The Europe PMC Literature Scraper wraps the official Europe PMC REST API (ebi.ac.uk/europepmc/webservices/rest/search) and returns one row per article with 40+ fields, including DOI, PMID, PMCID, abstract, full-text URLs, MeSH terms, keywords, journal, citation count, open-access status, and licensing. The underlying corpus is published by Europe PMC, the European mirror of PubMed Central, maintained by EMBL-EBI and funded by 32 life-science research funders worldwide.

The index covers MEDLINE/PubMed, PubMed Central (full text), Agricola (USDA agricultural literature), bioRxiv and medRxiv preprints, patents, and Europe PMC-curated content. Free-text and field-qualified queries (TITLE, AUTH, MESH, DOI, PMID, AFFILIATION, JOURNAL) combine freely with boolean operators. This Actor returns structured records ready to download as CSV, Excel, JSON, or XML.

🎯 Target audience: biomedical researchers, systematic-review teams, bibliometrics analysts, pharma intelligence, scientific publishers, science journalists, OA advocacy, ML training pipelines.
💡 Primary use cases: literature reviews, MeSH-term mining, author publication tracking, journal impact studies, drug-target evidence harvesting, training-set assembly.

📋 What the Europe PMC Scraper does

One programmable interface to the full Europe PMC search service:

  • 🔍 Field-qualified queries. TITLE:, AUTH:, AFFILIATION:, JOURNAL:, MESH:, DOI:, PMID:, PMCID:, OPEN_ACCESS:, plus boolean operators (AND, OR, NOT) and quoted phrases.
  • 📚 Three response shapes. core returns the full record with abstract, full-text URLs, and metadata. lite returns compact fields. idlist returns IDs only for ultra-fast scans.
  • ⏱️ Sort options. Relevance (default), newest first, oldest first, or most cited.
  • 🔁 Cursor-mark pagination. Fully automatic. Walks the entire result set efficiently for large queries.

Output captures the publication metadata (PMID, PMCID, DOI, source, journal title, ISSN, volume, issue, page info, publication year and date), full author list, abstract text, affiliation, language, publication types, MeSH headings, keywords, grant count, citation count, full-text URLs, license, open-access flag, and indexing dates.

💡 Why it matters: Europe PMC is one of the largest open biomedical literature indexes, combining PubMed abstracts, PubMed Central full text, and preprints. The web UI is great for one-off lookups, but systematic reviews, bibliometric studies, and ML training-set assembly need flat rows. This Actor turns the search service into a downloadable dataset in one run.


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to build a MeSH query and export a literature review dataset.


⚙️ Input

Input | Type | Default | Behavior
query | string | "cancer immunotherapy" | Europe PMC query string. Supports field qualifiers and booleans.
maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000.
resultType | enum | "core" | core = full record, lite = compact, idlist = IDs only.
sort | enum | "" | Empty = relevance, date_desc = newest, date_asc = oldest, cited = most cited.
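If you build inputs in your own scripts, a small check against the documented defaults and enums catches typos before a run starts. The sketch below is a hypothetical helper (normalize_input is not part of the Actor); it assumes the plan caps behave like a silent clamp, whereas the Actor itself may instead truncate or reject oversize values.

```python
from typing import Any

# Allowed values and defaults as documented in the Input table.
ALLOWED_RESULT_TYPES = {"core", "lite", "idlist"}
ALLOWED_SORTS = {"", "date_desc", "date_asc", "cited"}
DEFAULTS: dict[str, Any] = {
    "query": "cancer immunotherapy",
    "maxItems": 10,
    "resultType": "core",
    "sort": "",
}


def normalize_input(raw: dict[str, Any], paid_plan: bool = False) -> dict[str, Any]:
    """Fill in defaults and reject values outside the documented enums."""
    run_input = {**DEFAULTS, **raw}
    if run_input["resultType"] not in ALLOWED_RESULT_TYPES:
        raise ValueError(f"unknown resultType: {run_input['resultType']!r}")
    if run_input["sort"] not in ALLOWED_SORTS:
        raise ValueError(f"unknown sort: {run_input['sort']!r}")
    max_items = int(run_input["maxItems"])
    if max_items < 1:
        raise ValueError("maxItems must be at least 1")
    # Assumption: clamp to the plan cap (10 on free, 1,000,000 on paid).
    cap = 1_000_000 if paid_plan else 10
    run_input["maxItems"] = min(max_items, cap)
    return run_input
```

For example, normalize_input({"query": "CRISPR", "maxItems": 50, "sort": "cited"}) keeps the query and sort, fills in resultType "core", and clamps maxItems to 10 on the free plan.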

Example: 50 most-cited CRISPR papers.

{
  "query": "CRISPR",
  "sort": "cited",
  "resultType": "core",
  "maxItems": 50
}

Example: up to 100 open-access papers by Jennifer Doudna that mention mRNA, newest first.

{
  "query": "AUTH:\"Doudna J\" AND mRNA AND OPEN_ACCESS:Y",
  "sort": "date_desc",
  "resultType": "core",
  "maxItems": 100
}

⚠️ Good to Know: field qualifiers are case-sensitive (use AUTH, not auth). Quoted phrases preserve word order ("breast cancer"). The MESH: qualifier matches MeSH headings exactly. Use the Europe PMC search UI at europepmc.org to prototype complex queries before plugging them in.
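Because qualifiers are case-sensitive, a quick lint pass over a query before a run can save a wasted run. This is a hypothetical helper, not part of the Actor; the qualifier list is taken from this README and is illustrative rather than exhaustive.

```python
import re

# Qualifiers listed in this README (illustrative, not exhaustive).
KNOWN_QUALIFIERS = {
    "TITLE", "AUTH", "AFFILIATION", "JOURNAL", "MESH",
    "DOI", "PMID", "PMCID", "OPEN_ACCESS", "FIRST_PDATE",
}

# A word immediately followed by a colon, e.g. "auth:" or "TITLE:".
_QUALIFIER_RE = re.compile(r"\b([A-Za-z_]+):")


def lowercase_qualifiers(query: str) -> list[str]:
    """Return known qualifiers written in the wrong case, e.g. 'auth' instead of 'AUTH'."""
    bad = []
    for match in _QUALIFIER_RE.finditer(query):
        name = match.group(1)
        if name != name.upper() and name.upper() in KNOWN_QUALIFIERS:
            bad.append(name)
    return bad
```

Running it on 'auth:"Doudna J" AND TITLE:crispr' flags only auth. The regex does not look inside quoted phrases specially, so a phrase containing something like "title:" would also be flagged.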


📊 Output

Each record contains 40+ fields. Download the dataset as CSV, Excel, JSON, or XML.

🧾 Schema (selected fields)

Field | Type | Example
📛 title | string | "RPA Combined With CRISPR/Cas12a for Rapid... MRSA Detection"
🆔 id | string | "42002396"
🏷️ source | string or null | "MED"
🔗 url | string | "https://europepmc.org/article/MED/42002396"
🆔 pmid | string | "42002396"
🆔 pmcid | string or null | "PMC13092367"
🆔 doi | string or null | "10.1002/jmr.70035"
👤 authorString | string or null | "Chen L, Luo J, Zhang H, Zhao P."
👥 authorList | string[] | ["Chen L", "Luo J", "Zhang H", "Zhao P"]
📓 journalTitle | string or null | "Journal of molecular recognition : JMR"
🆔 journalIssn | string or null | "0952-3499"
📅 pubYear | string or null | "2026"
📅 pubDate | ISO 8601 or null | "2026-05-01"
📝 abstractText | string or null | "The increasing issue of infections caused by..."
🏢 affiliation | string or null | "Department of Laboratory Medicine, Yuebei People's Hospital..."
🌐 language | string or null | "eng"
🏷️ publicationTypes | string[] | ["research-article", "Journal Article"]
🧬 meshTerms | string[] | ["CRISPR-Cas Systems", "Methicillin-Resistant Staphylococcus aureus", ...]
🏷️ keywords | string[] | ["Detection", "RPA", "MRSA", "Crispr/cas12a"]
💰 grantsCount | integer or null | 8
📊 citedByCount | integer or null | 0
📄 hasPDF | string or null | "Y"
🔓 isOpenAccess | string or null | "Y"
📜 license | string or null | "cc by"
🔗 fullTextUrls | string[] | ["https://doi.org/10.1002/jmr.70035", "https://europepmc.org/articles/PMC13092367", ...]
📅 firstIndexDate | ISO 8601 or null | "2026-04-20"
📅 firstPublicationDate | ISO 8601 or null | "2026-05-01"
🕒 scrapedAt | ISO 8601 | "2026-05-12T21:31:34.280Z"

Additional fields when present: journalVolume, issue, pageInfo, publicationStatus, hasBook, hasSuppl, hasReferences, hasTextMinedTerms, hasDbCrossReferences, inEPMC, inPMC, dateOfRevision.
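The Dataset tab already exports CSV, but if you pull the JSON items through the API and want your own column selection, the array fields (authorList, meshTerms, keywords, fullTextUrls) need flattening first. A minimal sketch, assuming the field names from the schema above; records_to_csv and the column choice are illustrative.

```python
import csv
import io

# A subset of the schema above; adjust to the columns you need.
COLUMNS = ["pmid", "pmcid", "doi", "title", "journalTitle", "pubYear",
           "citedByCount", "isOpenAccess", "license", "meshTerms", "keywords"]


def records_to_csv(records: list[dict]) -> str:
    """Flatten dataset items into CSV text; list fields are joined with '; '."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    for record in records:
        row = {}
        for col in COLUMNS:
            value = record.get(col)
            if isinstance(value, list):
                value = "; ".join(str(v) for v in value)
            # Missing keys and nulls become empty cells.
            row[col] = "" if value is None else value
        writer.writerow(row)
    return buf.getvalue()
```

Joining with "; " keeps each MeSH heading readable in a spreadsheet cell; switch to a JSON-encoded string if you need a lossless round trip.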



✨ Why choose this Actor

  • 🧬 40M+ records. PubMed, PMC, preprints, patents, Agricola: the full Europe PMC index.
  • 🔍 Field-qualified queries. TITLE, AUTH, MESH, DOI, PMID, AFFILIATION, JOURNAL, OPEN_ACCESS, plus booleans.
  • 📚 3 response shapes. Full record (core), compact (lite), or IDs only (idlist).
  • 📊 40+ fields per record. DOI, abstract, full-text URLs, MeSH, keywords, citation count, license.
  • 🔓 Open-access aware. isOpenAccess, license, and full-text URLs surfaced per record.
  • ⏱️ Cursor-mark pagination. Efficient, automatic walk across the entire result set.
  • Fast. 10 articles in under 3 seconds, 10,000 records in under a minute.
  • 🚫 No authentication. The Europe PMC REST API is free to query, so no API key is needed.

📊 Europe PMC is the canonical European mirror of biomedical literature. The REST API is the source; this Actor turns its responses into flat rows.


📈 How it compares to alternatives

Approach | Cost | Coverage | Refresh | Filters | Setup
⭐ Europe PMC Scraper (this Actor) | $5 free credit, then pay-per-use | 40M+ records | Live per run | Field qualifiers, booleans, open-access, sort | ⚡ 2 min
Manual export from europepmc.org | Free | 1,000 records per export | On demand | UI filters | 🐢 Slow, no automation
PubMed eutils (NCBI) | Free | PubMed only, rate-limited | Live | Custom XML | 🛠️ Hours of engineering
Commercial bibliographic databases | $$$$ | Curated | Vendor cadence | Vendor-specific | ⏳ Days

Pick this Actor when you want a programmable interface to the full Europe PMC index with consistent flat-row output.


🚀 How to use

  1. 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. 🌐 Open the Actor. Go to the Europe PMC Literature Scraper page on the Apify Store.
  3. 🔍 Build a query. Free-text or use field qualifiers (AUTH:"Doudna J" AND CRISPR).
  4. 📚 Pick a response shape. core for full metadata, lite for compact, idlist for IDs only.
  5. 🚀 Run it. Click Start and let the Actor collect your data.
  6. 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.


💼 Business use cases

💊 Pharma & Biotech Intelligence

  • Drug-target evidence harvesting
  • Competitor pipeline literature monitoring
  • Clinical-trial-adjacent reference sets
  • KOL (key opinion leader) discovery via affiliation

📚 Systematic Reviews

  • Inclusion / exclusion screening pools
  • MeSH-term traversal for review protocols
  • Cited-by tracking for snowball sampling
  • Open-access full-text URL collection

📊 Bibliometrics & Research Ops

  • Citation networks for impact studies
  • Journal-level publication trends
  • Institution affiliation analysis
  • Funding-source landscape reports

🤖 ML & NLP Training

  • Biomedical NER training corpora
  • Abstract-summarization training sets
  • MeSH-classification benchmark assembly
  • Author-disambiguation training data

🔌 Automating Europe PMC Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

  • 🟢 Node.js. Install the apify-client NPM package.
  • 🐍 Python. Use the apify-client PyPI package.
  • 📚 See the Apify API documentation for full details.
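A minimal Python sketch of a programmatic run, assuming the apify-client package's ApifyClient, actor(...).call(run_input=...), and dataset(...).iterate_items() calls; check the apify-client docs for your installed version. The Actor ID below is a hypothetical placeholder, so copy the real one from the Actor's API tab. Input keys follow the Input table.

```python
from typing import Any

# Hypothetical Actor ID: replace with the real ID from the Actor's page.
ACTOR_ID = "parseforge/europe-pmc-literature-scraper"


def build_run_input(query: str, max_items: int = 10,
                    result_type: str = "core", sort: str = "") -> dict[str, Any]:
    """Assemble the Actor input using the field names from the Input table."""
    return {"query": query, "maxItems": max_items,
            "resultType": result_type, "sort": sort}


def run_and_fetch(token: str, run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Start the Actor, wait for it to finish, and return its dataset items."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    dataset = client.dataset(run["defaultDatasetId"])
    return list(dataset.iterate_items())
```

Usage: run_and_fetch("<your Apify token>", build_run_input("CRISPR", 50, sort="cited")) mirrors the "50 most-cited CRISPR papers" example. The import sits inside the function so the input builder works even where apify-client is not installed.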

The Apify Schedules feature lets you trigger this Actor on any cron interval. Daily literature alerts on a saved query keep your research front-of-mind.
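For a daily alert, you probably want each run to see only newly published papers. One way, sketched below under assumptions, is to restrict a saved query with the FIRST_PDATE: qualifier (listed in the FAQ) using Europe PMC's bracketed range syntax [start TO end]. A Schedule passes the same stored input every time, so this assumes your own script computes the query and starts the run. daily_alert_query is a hypothetical helper.

```python
from datetime import date, timedelta


def daily_alert_query(base_query: str, run_date: date, days_back: int = 1) -> str:
    """Restrict a saved query to papers first published in the last `days_back` days."""
    start = run_date - timedelta(days=days_back)
    # Both ends of the range are inclusive, so consecutive daily windows overlap.
    window = f"FIRST_PDATE:[{start.isoformat()} TO {run_date.isoformat()}]"
    return f"({base_query}) AND {window}"
```

For example, daily_alert_query('MESH:"CRISPR-Cas Systems"', date(2026, 5, 13)) returns '(MESH:"CRISPR-Cas Systems") AND FIRST_PDATE:[2026-05-12 TO 2026-05-13]'. Because the windows overlap by a day, deduplicate on the id field when appending results.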


🌟 Beyond business use cases

The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

  • Reproducible literature-search appendices for papers
  • Open-data assignments for bibliometrics coursework
  • Cross-disciplinary citation network studies
  • Funding-impact evaluations for grant reports

🎨 Personal and creative

  • Side projects on biomedical knowledge graphs
  • Science-communication content backed by real papers
  • Reading-list builders for graduate students
  • Personal alerting on niche research topics

🤝 Non-profit and civic

  • Open-access advocacy benchmarks
  • Public-interest journalism on drug research
  • Patient-advocacy literature collections
  • NGO reports on global-health publication trends

🧪 Experimentation

  • Train domain-specific LLMs on biomedical abstracts
  • Validate RAG pipelines with real citations
  • Prototype agents that answer literature questions
  • Test recommender systems with citation graphs



❓ Frequently Asked Questions

🧩 How does it work?

The Actor hits the official Europe PMC REST search endpoint (ebi.ac.uk/europepmc/webservices/rest/search), uses cursor-mark pagination to walk through every result, and returns one structured row per article. No HTML scraping, no captcha, no setup.
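Cursor-mark pagination can be sketched directly against the public REST endpoint. This is not the Actor's source, only a minimal illustration of the same technique. It uses Europe PMC's query, resultType, format, pageSize (maximum 1000), and cursorMark parameters, starting from cursorMark "*" and following nextCursorMark from each response until a page comes back empty or the cursor stops changing. The function names are hypothetical, and the fetch function is injectable so the loop can be tested offline.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Iterator

SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def fetch_json(params: dict[str, str]) -> dict[str, Any]:
    """GET one page of search results from the Europe PMC REST API."""
    url = f"{SEARCH_URL}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_results(query: str, max_items: int, result_type: str = "core",
                 page_size: int = 1000,
                 fetch: Callable[[dict[str, str]], dict[str, Any]] = fetch_json,
                 ) -> Iterator[dict[str, Any]]:
    """Yield up to max_items records, following nextCursorMark page by page."""
    cursor = "*"  # starting cursor for a new result set
    yielded = 0
    while yielded < max_items:
        page = fetch({
            "query": query,
            "resultType": result_type,
            "format": "json",
            "pageSize": str(page_size),
            "cursorMark": cursor,
        })
        results = page.get("resultList", {}).get("result", [])
        for record in results:
            yield record
            yielded += 1
            if yielded >= max_items:
                return
        next_cursor = page.get("nextCursorMark")
        # An empty page or an unchanged cursor means the result set is exhausted.
        if not results or not next_cursor or next_cursor == cursor:
            return
        cursor = next_cursor
```

Unlike offset paging, the cursor never asks the server to skip past earlier results, which is why this approach stays efficient on large result sets.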

🔍 Which query qualifiers are supported?

TITLE:, AUTH:, AFFILIATION:, JOURNAL:, MESH:, DOI:, PMID:, PMCID:, OPEN_ACCESS:, FIRST_PDATE:, plus boolean operators AND, OR, NOT and quoted phrases. Mix freely. Example: AUTH:"Doudna J" AND (mRNA OR vaccine) AND OPEN_ACCESS:Y.
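Composing long queries by hand invites quoting mistakes. The hypothetical helpers below build clauses with upper-case qualifiers, quote multi-word values, and parenthesize OR groups so they combine safely with AND. They do not escape double quotes inside values.

```python
def phrase(text: str) -> str:
    """Quote a multi-word value so it is searched as a phrase."""
    return f'"{text}"' if " " in text else text


def field(qualifier: str, value: str) -> str:
    """Render a field-qualified clause such as AUTH:"Doudna J"."""
    return f"{qualifier.upper()}:{phrase(value)}"


def any_of(*terms: str) -> str:
    """OR the terms together inside parentheses."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*clauses: str) -> str:
    """AND the clauses together."""
    return " AND ".join(clauses)


# Rebuilds the FAQ example query.
query = all_of(field("auth", "Doudna J"), any_of("mRNA", "vaccine"),
               field("OPEN_ACCESS", "Y"))
```

Here query equals AUTH:"Doudna J" AND (mRNA OR vaccine) AND OPEN_ACCESS:Y, the same string as the example above.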

📚 What does each result type return?

  • core returns the full record with abstract, full-text URLs, MeSH terms, keywords, license, and all metadata.
  • lite returns compact fields (title, authors, journal, DOI, year) for fast scans.
  • idlist returns IDs only, useful for downstream batch queries against other services.
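To preview what each shape returns before a large run, you can open the equivalent REST search URL in a browser. A minimal sketch, assuming the public endpoint named in this FAQ and its query, resultType, format, and pageSize parameters; search_url is a hypothetical helper, and the exact fields in each shape are defined by Europe PMC.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def search_url(query: str, result_type: str = "core", page_size: int = 25) -> str:
    """Build a Europe PMC REST search URL for one of the three result types."""
    if result_type not in {"core", "lite", "idlist"}:
        raise ValueError(f"unknown resultType: {result_type!r}")
    params = {"query": query, "resultType": result_type,
              "format": "json", "pageSize": page_size}
    # urlencode percent-encodes colons and quotes and turns spaces into '+'.
    return f"{SEARCH_URL}?{urlencode(params)}"
```

For instance, search_url('MESH:"CRISPR-Cas Systems"', "idlist", 100) produces a URL whose query parameter is MESH%3A%22CRISPR-Cas+Systems%22.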

📊 How big is the Europe PMC index?

Europe PMC indexes over 40 million biomedical literature records as of 2026, including all of MEDLINE/PubMed, full-text PubMed Central content, life-science preprints from bioRxiv and medRxiv, Agricola, and patents.

🔓 How do I find only open-access papers?

Add AND OPEN_ACCESS:Y to your query. The output also includes per-record isOpenAccess, license, and fullTextUrls fields for fine-grained filtering.
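Once the data is downloaded, the per-record fields support finer filters than the query alone. A minimal sketch, assuming isOpenAccess holds "Y" for open-access records and license holds lower-case strings such as "cc by" (as in the schema example); open_access_records is a hypothetical helper.

```python
from typing import Any, Iterable, Optional


def open_access_records(records: Iterable[dict[str, Any]],
                        licenses: Optional[set[str]] = None) -> list[dict[str, Any]]:
    """Keep records flagged isOpenAccess == "Y", optionally limited to given licenses."""
    kept = []
    for record in records:
        if record.get("isOpenAccess") != "Y":
            continue
        # Compare in lower case; pass license names like {"cc by"}.
        license_ = (record.get("license") or "").lower()
        if licenses is not None and license_ not in licenses:
            continue
        kept.append(record)
    return kept
```

Calling open_access_records(items, {"cc by"}) keeps only CC BY articles, which is useful when shortlisting full text for reuse.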

⏱️ How does sorting work?

date_desc sorts by first publication date descending (newest first). date_asc sorts ascending (oldest first). cited sorts by citation count descending. Empty sort uses Europe PMC relevance ranking.
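When you merge datasets from several runs, the combined file loses the server-side order. The sketch below re-applies the documented sort modes locally using the citedByCount and firstPublicationDate output fields. It is a hypothetical helper: relevance ranking cannot be reproduced offline, so the empty mode simply keeps the existing order.

```python
from typing import Any


def sort_records(records: list[dict[str, Any]], mode: str = "") -> list[dict[str, Any]]:
    """Re-sort downloaded records to match the described sort modes."""
    if mode == "":
        return list(records)  # keep the relevance order the API returned
    if mode == "cited":
        return sorted(records, key=lambda r: r.get("citedByCount") or 0, reverse=True)
    if mode in ("date_desc", "date_asc"):
        # ISO 8601 dates sort correctly as strings; missing dates become ""
        # and land first ascending, last descending.
        return sorted(records, key=lambda r: r.get("firstPublicationDate") or "",
                      reverse=(mode == "date_desc"))
    raise ValueError(f"unknown sort mode: {mode!r}")
```

Python's sort is stable, so records with equal citation counts or dates keep their incoming relative order.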

⏰ Can I schedule regular runs?

Yes. Use Apify Schedules to run this Actor on any cron interval. Daily alerts on a saved query keep your research current.

📜 What licenses apply to the content?

Europe PMC content is freely available for research and educational use. Full-text articles may be under various open licenses (CC BY, CC BY-NC, etc.). The license field in each record tells you which one. Review the specific license before commercial redistribution of full text.

💼 Can I use this data commercially?

Bibliographic metadata (title, authors, DOI) is generally freely usable, but abstracts can remain under publisher copyright. Full-text reuse depends on the per-article license. Open-access articles under CC BY are the safest for commercial use.

💳 Do I need a paid Apify plan to use this Actor?

No. The free Apify plan is enough for testing and small runs (10 records per run). A paid plan lifts the limit to 1,000,000 records.

🔁 What happens if a run fails?

Apify automatically retries transient errors. If a run still fails, you can inspect the log in the Runs tab, fix the input (usually a malformed query), and re-run.

🆘 What if I need help?

Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.


🔌 Integrate with any app

Europe PMC Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Daily literature alerts in your channels
  • Airbyte - Pipe abstracts into your warehouse
  • GitHub - Trigger runs from commits and releases
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Push fresh abstracts into your knowledge base, or alert your team in Slack on new papers.
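On the receiving side, a webhook handler usually only needs the dataset ID of a finished run. The sketch assumes Apify's default webhook payload template, where eventType is a string such as "ACTOR.RUN.SUCCEEDED" and resource holds the run object with defaultDatasetId. Verify this against the payload your webhook actually sends. dataset_id_from_webhook is a hypothetical helper.

```python
import json
from typing import Any, Optional


def dataset_id_from_webhook(body: str) -> Optional[str]:
    """Return the run's dataset ID for a succeeded run, else None."""
    payload: dict[str, Any] = json.loads(body)
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")
```

With the ID in hand, your service can fetch the new items (for example with the apify-client dataset calls) and push fresh abstracts into a knowledge base or a Slack channel.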


💡 Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.


🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by Europe PMC, EMBL-EBI, the European Bioinformatics Institute, the National Center for Biotechnology Information, or any of the 32 funders supporting Europe PMC. All trademarks mentioned are the property of their respective owners. Only publicly available open data from the official Europe PMC REST API is collected.