OpenAIRE Scraper | Open Access Research Records

Pricing

from $19.00 / 1,000 results

Search OpenAIRE for open access publications, datasets, software, and funded projects with titles, authors, affiliations, DOI, abstracts, funders, and links. Power academic discovery, research analytics, bibliographic tooling, and science observatories with structured scholarly data.

Developer

ParseForge

Maintained by Community

🔬 OpenAIRE Research Publications Scraper

🚀 Export open access research papers from OpenAIRE with abstracts, DOIs, and author data in seconds. No login required. No API key. Pure open science.

🕒 Last updated: 2026-05-22 · 📊 12 fields per record · 📚 100M+ publications · 🌍 180+ countries

The OpenAIRE Research Publications Scraper extracts structured metadata from OpenAIRE Explore, the European Open Science platform aggregating research outputs from thousands of repositories, journals, and data sources worldwide. All data is pulled in real time from the OpenAIRE public API.

OpenAIRE indexes over 100 million research objects from PubMed, arXiv, Crossref, DBLP, Zenodo, and institutional repositories across 180+ countries. This Actor lets you query by keyword and year range to pull down publication metadata at scale, covering academic papers, conference proceedings, books, preprints, and datasets.

| 🎯 Target Audience | 💡 Primary Use Cases |
| --- | --- |
| Academic researchers, data scientists, science journalists, grant managers, librarians, startups | Literature reviews, NLP training corpora, publication trend tracking, grant compliance reporting, discovery system feeds, research landscape mapping |

📋 What the OpenAIRE Scraper does

  • 🔎 Keyword search. Full-text search across titles, abstracts, and metadata.
  • 📅 Year filtering. Narrow results to a publication year range with fromYear / toYear.
  • 🧾 Rich metadata. Title, DOI, authors, abstract, publisher, year, access type, language.
  • 🔁 Auto pagination. 25 results per request, paginated up to your maxItems target.
  • 🔗 Source link back. Every record points to the OpenAIRE Explore page for full detail.
  • 🛑 Clean free tier. Free users get a 10-record preview before upgrading.
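
The auto-pagination behaviour listed above can be pictured with a minimal sketch. It assumes a hypothetical `fetch_page(page, size)` callable standing in for whatever request the Actor makes internally; only the page size of 25 and the `maxItems` cap come from this README.

```python
from typing import Callable, Iterator

PAGE_SIZE = 25  # page size stated in this README


def paginate(fetch_page: Callable[[int, int], list[dict]],
             max_items: int) -> Iterator[dict]:
    """Yield records page by page until max_items is reached or results run out."""
    page, yielded = 1, 0
    while yielded < max_items:
        batch = fetch_page(page, PAGE_SIZE)
        if not batch:  # the source has no more results
            return
        for record in batch:
            if yielded >= max_items:
                return
            yield record
            yielded += 1
        page += 1


# Hypothetical stand-in for the real request: 60 fake records in total.
def fake_fetch(page: int, size: int) -> list[dict]:
    start = (page - 1) * size
    return [{"title": f"Paper {i}"} for i in range(start, min(start + size, 60))]


records = list(paginate(fake_fetch, max_items=40))
print(len(records))  # 40
```

The loop stops on whichever comes first: the `maxItems` target or an empty page, so a narrow query simply returns fewer records than requested.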

💡 Why it matters: OpenAIRE is one of the most comprehensive open science aggregators and is funded by the European Commission. Few other sources expose this breadth of research through a single queryable API.


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.


⚙️ Input

| Input | Type | Default | Behavior |
| --- | --- | --- | --- |
| searchQuery | string | "machine learning" | Keywords to search across titles, abstracts, and metadata. |
| fromYear | integer | null | Earliest publication year (inclusive). |
| toYear | integer | null | Latest publication year (inclusive). |
| maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000. |

Example: basic search.

{
  "searchQuery": "machine learning",
  "maxItems": 100
}

Example: filtered by year range.

{
  "searchQuery": "climate change",
  "fromYear": 2020,
  "toYear": 2024,
  "maxItems": 500
}

⚠️ Good to Know: the fromYear and toYear filters map to the dateofacceptance field in OpenAIRE, which represents when a paper was accepted for publication. This can differ slightly from the print publication date.
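
If you build inputs in code, a small helper can catch obvious mistakes before a run starts. The sketch below uses the field names from the input table; the validation rules themselves (non-empty query, `fromYear` not later than `toYear`, `maxItems` within the paid cap) are assumptions for illustration, and the Actor may validate differently.

```python
from typing import Optional

PAID_PLAN_CAP = 1_000_000  # upper limit stated in this README


def build_input(search_query: str, max_items: int = 10,
                from_year: Optional[int] = None,
                to_year: Optional[int] = None) -> dict:
    """Return an Actor input dict, raising ValueError on inconsistent values."""
    if not search_query.strip():
        raise ValueError("searchQuery must not be empty")
    if not 1 <= max_items <= PAID_PLAN_CAP:
        raise ValueError(f"maxItems must be between 1 and {PAID_PLAN_CAP}")
    if from_year is not None and to_year is not None and from_year > to_year:
        raise ValueError("fromYear must not be later than toYear")

    run_input = {"searchQuery": search_query, "maxItems": max_items}
    # Optional year bounds are only sent when they are set.
    if from_year is not None:
        run_input["fromYear"] = from_year
    if to_year is not None:
        run_input["toYear"] = to_year
    return run_input


print(build_input("climate change", max_items=500, from_year=2020, to_year=2024))
```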


📊 Output

Each record contains 12 fields. Download the dataset as CSV, Excel, JSON, or XML.

🧾 Schema

| Field | Type | Description |
| --- | --- | --- |
| 📌 title | string | Full publication title |
| 🔗 url | string | Link to OpenAIRE Explore page |
| 🆔 doi | string or null | Digital Object Identifier |
| 👥 authors | array | List of author full names |
| 📝 abstract | string or null | Abstract or description |
| 🏛️ publisher | string or null | Journal or publisher name |
| 📅 year | integer or null | Publication year |
| 🔓 accessType | string or null | Open Access, Closed Access, etc. |
| 🌐 language | string or null | Language of the publication |
| 🔑 openAireId | string | Internal OpenAIRE identifier |
| 🕒 scrapedAt | string | ISO 8601 timestamp |
| error | string or null | Error message if scraping failed |
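
When you consume the dataset in Python, it can help to map each item onto a typed structure. The following is a sketch based on the schema above; the class name `OpenAireRecord`, the `from_item` helper, and the placeholder values are illustrative assumptions, not part of the Actor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OpenAireRecord:
    # Fields the schema lists without "null"
    title: str
    url: str
    authors: list[str]
    open_aire_id: str
    scraped_at: str
    # Fields the schema marks as nullable
    doi: Optional[str] = None
    abstract: Optional[str] = None
    publisher: Optional[str] = None
    year: Optional[int] = None
    access_type: Optional[str] = None
    language: Optional[str] = None
    error: Optional[str] = None

    @classmethod
    def from_item(cls, item: dict) -> "OpenAireRecord":
        """Build a record from one dataset item, tolerating missing nullable keys."""
        return cls(
            title=item["title"],
            url=item["url"],
            authors=list(item.get("authors") or []),
            open_aire_id=item["openAireId"],
            scraped_at=item["scrapedAt"],
            doi=item.get("doi"),
            abstract=item.get("abstract"),
            publisher=item.get("publisher"),
            year=item.get("year"),
            access_type=item.get("accessType"),
            language=item.get("language"),
            error=item.get("error"),
        )


# Placeholder item for illustration only, not real OpenAIRE output.
item = {
    "title": "Example title",
    "url": "https://example.org/placeholder",
    "authors": ["Ada Example"],
    "openAireId": "placeholder-id",
    "scrapedAt": "2026-01-01T00:00:00Z",
    "doi": None,
    "year": 2024,
}
record = OpenAireRecord.from_item(item)
print(record.title, record.year, record.doi)
```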

📦 Sample record


✨ Why choose this Actor

  • 🌍 Global coverage. 100M+ records from 180+ countries.
  • 🔓 No auth required. Public API, zero login friction.
  • Fast pagination. 25 results per request, auto-paginated.
  • 📋 Rich metadata. DOI, abstract, authors, access type in one shot.
  • 🎯 Keyword precision. Full-text search across titles and abstracts.
  • 📅 Year filtering. Narrow to exact publication ranges.
  • 🆓 Free preview. 10 items free to verify output quality.

📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Structured | Setup |
| --- | --- | --- | --- | --- | --- |
| ⭐ OpenAIRE Scraper (this Actor) | $5 free credit, then pay-per-use | 100M+ records | Live per run | Yes, 12 fields | ⚡ 2 min |
| Manual OpenAIRE browse | Free | Full | Manual | No | 🐢 Hours |
| Semantic Scholar API | Free, rate limited | Partial | Real time | Partial | ⏳ Moderate |
| PubMed API | Free | Medicine only | Real time | Partial | ⏳ Moderate |
| Scopus API | Subscription | Broad | Real time | Yes | 🗝️ Auth gated |

Pick this Actor when you want the broadest open science index in a single structured pull.


🚀 How to use

  1. 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. 🌐 Open the Actor. Go to the OpenAIRE Research Publications Scraper page on the Apify Store.
  3. 🎯 Set input. Enter your searchQuery, optional year range, and maxItems.
  4. 🚀 Run it. Click Start and let the Actor collect your data.
  5. 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.


💼 Business use cases

📚 Academic Literature Reviews

  • Automate systematic literature reviews
  • Pull 500+ relevant papers in seconds
  • Seed reference managers with structured records
  • Filter by year at query time, and by language and access type after export

🧠 NLP and AI Training Data

  • Build text corpora from scientific abstracts
  • Keep only Open Access, English-language records when you need to redistribute
  • Train summarization and classification models
  • Power retrieval pipelines for research agents

💰 Grant Monitoring and Compliance

  • Track research output from EU-funded projects
  • Compile evidence of scientific activity
  • Build reporting dashboards for funders
  • Monitor topic-level output by year

📈 Publication Trend Tracking

  • Spot emerging research trends early
  • Compare publication volumes over time
  • Identify accelerating fields by keyword
  • Back articles with quantified research signals
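
Comparing publication volumes over time is mostly a matter of counting records per `year`. A minimal sketch, assuming the records have already been downloaded as a list of dicts with the `year` field from the output schema (the sample values are placeholders):

```python
from collections import Counter


def publications_per_year(records: list[dict]) -> dict[int, int]:
    """Count records per publication year, skipping records with no year."""
    counts = Counter(r["year"] for r in records if r.get("year") is not None)
    return dict(sorted(counts.items()))


# Placeholder records for illustration only, not real OpenAIRE output.
sample = [{"year": 2021}, {"year": 2022}, {"year": 2022}, {"year": None}, {}]
print(publications_per_year(sample))  # {2021: 1, 2022: 2}
```

Running the same query on a schedule and comparing these counts between runs gives a simple signal for which keywords are accelerating.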

🔌 Automating OpenAIRE Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

  • 🟢 Node.js. Install the apify-client NPM package.
  • 🐍 Python. Use the apify-client PyPI package.
  • 📚 See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor weekly to track new publications in your field over time.
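
As a starting point for the Python route, here is a hedged sketch using the `apify-client` package. The `ApifyClient`, `actor(...).call(run_input=...)`, and `dataset(...).iterate_items()` calls follow the client's documented usage at the time of writing; the `OPENAIRE_ACTOR_ID` environment variable is a hypothetical convention, and the real Actor ID has to be copied from the Actor's page because it is not stated here.

```python
import os


def run_openaire_search(actor_id: str, run_input: dict, token: str) -> list[dict]:
    """Start the Actor, wait for it to finish, and return all dataset items."""
    # Imported lazily so this module loads even where apify-client is not installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    token = os.environ.get("APIFY_TOKEN")
    actor_id = os.environ.get("OPENAIRE_ACTOR_ID")  # hypothetical variable name
    if token and actor_id:
        items = run_openaire_search(
            actor_id,
            {"searchQuery": "climate change", "fromYear": 2020, "toYear": 2024, "maxItems": 50},
            token,
        )
        print(f"Fetched {len(items)} records")
    else:
        print("Set APIFY_TOKEN and OPENAIRE_ACTOR_ID to run this example.")
```

Keeping the token and Actor ID in environment variables keeps credentials out of source control and lets the same script run locally or in a pipeline.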


🌟 Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

  • PhD literature bootstrapping for new topics
  • Reproducible bibliometric studies
  • Cross-disciplinary research mapping
  • Citation graph seed datasets

🎨 Personal and creative

  • Personal reading lists from niche topics
  • Curiosity-driven topic exploration
  • Hobbyist science writing projects
  • Open-access reading recommendations

🤝 Non-profit and civic

  • Science communication for the public
  • Open knowledge access initiatives
  • NGO research reviews on policy topics
  • Civic data literacy programs

🧪 Experimentation

  • Build AI research-discovery agents
  • Prototype academic search interfaces
  • Train scientific text classifiers
  • Explore long-tail interdisciplinary topics


❓ Frequently Asked Questions

📚 What does OpenAIRE index?

OpenAIRE aggregates publications from PubMed, arXiv, Crossref, DBLP, Zenodo, European institutional repositories, and thousands of open access journals. It covers science, technology, humanities, and social sciences.

🔓 Is this data freely available?

Yes. OpenAIRE is a public platform funded by the European Commission. The API is open and free to use without authentication.

📝 Why are some abstracts null?

Abstracts are not always deposited with the publication metadata. Many records sourced from Crossref or DBLP contain only title, author, and DOI. Open Access repositories tend to have more complete records.

📦 How many publications can I collect?

Free users get 10 per run as a preview. Paid users can collect up to 1,000,000 records per run. The OpenAIRE API has over 100 million indexed items.

📅 How accurate are the year filters?

The fromYear and toYear filters map to the dateofacceptance field in OpenAIRE. This can differ slightly from the print publication date.

🌐 Can I filter by language?

Not directly in the input, but the language field in the output lets you filter the downloaded dataset by language after the run completes.
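
A post-run filter can be a few lines of Python. This sketch assumes the dataset has been loaded as a list of dicts; the exact strings OpenAIRE returns in `language` and `accessType` are not specified here, so the values below are placeholders and the comparison is case-insensitive.

```python
from typing import Optional


def filter_records(records: list[dict], language: Optional[str] = None,
                   access_type: Optional[str] = None) -> list[dict]:
    """Keep records matching the given language and/or accessType."""
    def matches(value: Optional[str], wanted: Optional[str]) -> bool:
        if wanted is None:  # no constraint on this field
            return True
        return value is not None and value.strip().lower() == wanted.strip().lower()

    return [
        r for r in records
        if matches(r.get("language"), language)
        and matches(r.get("accessType"), access_type)
    ]


# Placeholder records; real field values may be spelled differently.
sample = [
    {"title": "A", "language": "English", "accessType": "Open Access"},
    {"title": "B", "language": "French", "accessType": "Open Access"},
    {"title": "C", "language": "English", "accessType": None},
]
english_open = filter_records(sample, language="english", access_type="open access")
print([r["title"] for r in english_open])  # ['A']
```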

🆔 Are DOIs always present?

No. DOIs are only present when the source repository deposited them. Conference papers and preprints often lack DOIs. Expect roughly 60-80% DOI coverage depending on the search topic.
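
To see where a particular query lands in that range, you can measure DOI coverage on the downloaded records. A minimal sketch with placeholder data:

```python
def doi_coverage(records: list[dict]) -> float:
    """Return the share of records with a non-empty doi, from 0.0 to 1.0."""
    if not records:
        return 0.0
    with_doi = sum(1 for r in records if r.get("doi"))
    return with_doi / len(records)


# Placeholder records for illustration only.
sample = [{"doi": "10.1234/example"}, {"doi": None}, {"doi": "10.1234/other"}, {"doi": "10.1234/third"}]
print(f"{doi_coverage(sample):.0%}")  # 75%
```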

⚡ How fast is it?

The Actor fetches approximately 25 records per API call, with a 300 ms delay between pages. Collecting 1,000 records means about 40 pages, so expect around 15-20 seconds.
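
The arithmetic behind that estimate can be sketched as follows. The page size and delay come from the answer above; the per-request latency is an assumed value chosen only for illustration, and the sketch assumes the delay applies between consecutive pages rather than after the last one.

```python
import math


def estimate_seconds(max_items: int, page_size: int = 25,
                     delay_s: float = 0.3, request_s: float = 0.0) -> float:
    """Rough run time: one request per page plus a pause between consecutive pages."""
    pages = math.ceil(max_items / page_size)
    pauses = max(pages - 1, 0)
    return pages * request_s + pauses * delay_s


# 40 pages with 39 pauses of 0.3 s each
print(f"{estimate_seconds(1000):.1f} s from delays alone")  # 11.7 s
# Adding an assumed 150 ms per request
print(f"{estimate_seconds(1000, request_s=0.15):.1f} s with request time")  # 17.7 s
```

The delays alone set a floor of roughly 12 seconds for 1,000 records; network time per request accounts for the rest of the quoted range.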

⏰ Can I run this on a schedule?

Yes. Use Apify Schedules to run weekly searches and track new publications in your field.

📜 Does this Actor comply with OpenAIRE's terms of service?

Yes. This Actor uses the official OpenAIRE public API in compliance with its terms of service.

📥 Can I export to Excel or Google Sheets?

Yes. Apify datasets export to CSV, JSON, Excel (XLSX), XML, and JSONL. CSV imports directly into Google Sheets.

🆘 What if I need help?

Our support team is here to help. Use the Tally form linked below to reach out.


🔌 Integrate with any app

OpenAIRE Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get run notifications in your channels
  • Airbyte - Pipe publication data into your warehouse
  • GitHub - Trigger runs from commits and releases
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes.


💡 Pro Tip: browse the complete ParseForge collection for more open-data and reference scrapers.


🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by OpenAIRE or the European Commission. All trademarks mentioned are the property of their respective owners. Only publicly available data from the OpenAIRE public API is collected. Data accuracy depends on what depositors provide to OpenAIRE.