OpenAIRE Scraper | Open Access Research Records
Pricing
from $19.00 / 1,000 results
Search OpenAIRE for open access publications, datasets, software, and funded projects with titles, authors, affiliations, DOI, abstracts, funders, and links. Power academic discovery, research analytics, bibliographic tooling, and science observatories with structured scholarly data.
Developer
ParseForge
Maintained by Community
🔬 OpenAIRE Research Publications Scraper
🚀 Export open access research papers from OpenAIRE with abstracts, DOIs, and author data in seconds. No login required. No API key. Pure open science.
🕒 Last updated: 2026-05-22 · 📊 12 fields per record · 📚 100M+ publications · 🌍 180+ countries
The OpenAIRE Research Publications Scraper extracts structured metadata from OpenAIRE Explore, the European Open Science platform aggregating research outputs from thousands of repositories, journals, and data sources worldwide. All data is pulled in real time from the OpenAIRE public API.
OpenAIRE indexes over 100 million research objects from PubMed, arXiv, Crossref, DBLP, Zenodo, and institutional repositories across 180+ countries. This Actor lets you query by keyword and year range to pull down publication metadata at scale, covering academic papers, conference proceedings, books, preprints, and datasets.
| 🎯 Target Audience | 💡 Primary Use Cases |
|---|---|
| Academic researchers, data scientists, science journalists, grant managers, librarians, startups | Literature reviews, NLP training corpora, publication trend tracking, grant compliance reporting, discovery system feeds, research landscape mapping |
📋 What the OpenAIRE Scraper does
- 🔎 Keyword search. Full-text search across titles, abstracts, and metadata.
- 📅 Year filtering. Narrow to exact publication date ranges with `fromYear`/`toYear`.
- 🧾 Rich metadata. Title, DOI, authors, abstract, publisher, year, access type, language.
- 🔁 Auto pagination. 25 results per request, paginated up to your `maxItems` target.
- 🔗 Source link back. Every record points to the OpenAIRE Explore page for full detail.
- 🛑 Clean free tier. Free users get a 10-record preview before upgrading.
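The auto-pagination behavior listed above can be pictured as a loop that requests 25-record pages until `maxItems` is reached or the source runs out of results. This is a minimal sketch, not the Actor's actual code; `fetch_page` is a hypothetical callable standing in for one API request.

```python
from typing import Callable, Dict, List

PAGE_SIZE = 25  # results per request, as stated in the feature list


def paginate(fetch_page: Callable[[int, int], List[Dict]], max_items: int) -> List[Dict]:
    """Collect up to max_items records by requesting fixed-size pages.

    fetch_page(page_number, page_size) is a hypothetical stand-in for one
    API request; it returns a list of records (empty when exhausted).
    """
    records: List[Dict] = []
    page = 1
    while len(records) < max_items:
        batch = fetch_page(page, PAGE_SIZE)
        if not batch:
            break  # no more results available
        # Trim the final page so the total never exceeds max_items.
        records.extend(batch[: max_items - len(records)])
        if len(batch) < PAGE_SIZE:
            break  # a short page means this was the last one
        page += 1
    return records
```

Passing the fetch function in as a parameter keeps the loop testable without network access.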
💡 Why it matters: OpenAIRE is one of the most comprehensive open science aggregators, funded by the European Commission. It exposes research outputs from thousands of sources through a single queryable API.
🎬 Full Demo
🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.
⚙️ Input
| Input | Type | Default | Behavior |
|---|---|---|---|
| searchQuery | string | "machine learning" | Keywords to search across titles, abstracts, and metadata. |
| fromYear | integer | null | Earliest publication year (inclusive). |
| toYear | integer | null | Latest publication year (inclusive). |
| maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000. |
Example: basic search.
{"searchQuery": "machine learning","maxItems": 100}
Example: filtered by year range.
{"searchQuery": "climate change","fromYear": 2020,"toYear": 2024,"maxItems": 500}
⚠️ Good to Know: the `fromYear` and `toYear` filters map to the `dateofacceptance` field in OpenAIRE, which represents when a paper was accepted for publication. This can differ slightly from the print publication date.
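When building the input programmatically, it can help to validate it before starting a run. The sketch below assembles the four documented input fields. The helper name `build_input` and its checks (non-empty query, positive `maxItems`, `fromYear` not after `toYear`) are assumptions about sensible input, not documented Actor behavior.

```python
from typing import Any, Dict, Optional


def build_input(
    search_query: str,
    max_items: int = 10,
    from_year: Optional[int] = None,
    to_year: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble an Actor input dict from the documented fields.

    The validation rules here are illustrative assumptions.
    """
    if not search_query.strip():
        raise ValueError("searchQuery must not be empty")
    if max_items < 1:
        raise ValueError("maxItems must be at least 1")
    if from_year is not None and to_year is not None and from_year > to_year:
        raise ValueError("fromYear must not be later than toYear")

    run_input: Dict[str, Any] = {"searchQuery": search_query, "maxItems": max_items}
    # Only include the year bounds that were actually provided.
    if from_year is not None:
        run_input["fromYear"] = from_year
    if to_year is not None:
        run_input["toYear"] = to_year
    return run_input
```

For instance, `build_input("climate change", 500, 2020, 2024)` produces the same object as the year-range example above.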
📊 Output
Each record contains 12 fields. Download the dataset as CSV, Excel, JSON, or XML.
🧾 Schema
| Field | Type | Description |
|---|---|---|
| 📌 title | string | Full publication title |
| 🔗 url | string | Link to OpenAIRE Explore page |
| 🆔 doi | string or null | Digital Object Identifier |
| 👥 authors | array | List of author full names |
| 📝 abstract | string or null | Abstract or description |
| 🏛️ publisher | string or null | Journal or publisher name |
| 📅 year | integer or null | Publication year |
| 🔓 accessType | string or null | Open Access, Closed Access, etc. |
| 🌐 language | string or null | Language of the publication |
| 🔑 openAireId | string | Internal OpenAIRE identifier |
| 🕒 scrapedAt | string | ISO 8601 timestamp |
| ❌ error | string or null | Error message if scraping failed |
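For typed downstream code, the schema above can be mirrored as a Python `TypedDict`. This is a sketch derived only from the table; the names `OpenAireRecord` and `missing_fields` are illustrative and not part of the Actor.

```python
from typing import List, Optional, Set, TypedDict


class OpenAireRecord(TypedDict):
    """One dataset item, mirroring the 12 fields in the schema table."""

    title: str
    url: str
    doi: Optional[str]
    authors: List[str]
    abstract: Optional[str]
    publisher: Optional[str]
    year: Optional[int]
    accessType: Optional[str]
    language: Optional[str]
    openAireId: str
    scrapedAt: str
    error: Optional[str]


EXPECTED_FIELDS: Set[str] = set(OpenAireRecord.__annotations__)


def missing_fields(record: dict) -> Set[str]:
    """Return the schema fields that are absent from a record."""
    return EXPECTED_FIELDS - set(record)
```

A quick `missing_fields(item)` check on the first few items is a cheap way to confirm that a downloaded dataset matches the documented shape.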
📦 Sample record
✨ Why choose this Actor
| Capability | |
|---|---|
| 🌍 | Global coverage. 100M+ records from 180+ countries. |
| 🔓 | No auth required. Public API, zero login friction. |
| ⚡ | Fast pagination. 25 results per request, auto-paginated. |
| 📋 | Rich metadata. DOI, abstract, authors, access type in one shot. |
| 🎯 | Keyword precision. Full-text search across titles and abstracts. |
| 📅 | Year filtering. Narrow to exact publication ranges. |
| 🆓 | Free preview. 10 items free to verify output quality. |
📈 How it compares to alternatives
| Approach | Cost | Coverage | Refresh | Structured | Setup |
|---|---|---|---|---|---|
| ⭐ OpenAIRE Scraper (this Actor) | $5 free credit, then pay-per-use | 100M+ records | Live per run | Yes, 12 fields | ⚡ 2 min |
| Manual OpenAIRE browse | Free | Full | Manual | No | 🐢 Hours |
| Semantic Scholar API | Free, rate limited | Partial | Real time | Partial | ⏳ Moderate |
| PubMed API | Free | Medicine only | Real time | Partial | ⏳ Moderate |
| Scopus API | Subscription | Broad | Real time | Yes | 🗝️ Auth gated |
Pick this Actor when you want the broadest open science index in a single structured pull.
🚀 How to use
- 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
- 🌐 Open the Actor. Go to the OpenAIRE Research Publications Scraper page on the Apify Store.
- 🎯 Set input. Enter your `searchQuery`, optional year range, and `maxItems`.
- 🚀 Run it. Click Start and let the Actor collect your data.
- 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.
⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.
💼 Business use cases
🔌 Automating OpenAIRE Scraper
Control the scraper programmatically for scheduled runs and pipeline integrations:
- 🟢 Node.js. Install the `apify-client` NPM package.
- 🐍 Python. Use the `apify-client` PyPI package.
- 📚 See the Apify API documentation for full details.
The Apify Schedules feature lets you trigger this Actor weekly to track new publications in your field over time.
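As a rough sketch of the Python route, the helper below runs an Actor and reads its dataset through an `apify-client` style client object. It assumes the dict-style run object with a `defaultDatasetId` key, as returned by `apify-client` 1.x. The helper name `collect_records` is illustrative, and the Actor ID is left as a placeholder because the source does not state it.

```python
from typing import Any, Dict, List


def collect_records(client: Any, actor_id: str, run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Run an Actor, wait for it to finish, and return its dataset items.

    `client` is expected to behave like apify_client.ApifyClient.
    """
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage (requires `pip install apify-client` and an Apify API token):
#   from apify_client import ApifyClient
#   client = ApifyClient("<YOUR_APIFY_TOKEN>")
#   records = collect_records(
#       client,
#       "<ACTOR_ID>",  # placeholder: copy the real ID from the Actor's page
#       {"searchQuery": "climate change", "fromYear": 2020, "toYear": 2024, "maxItems": 500},
#   )
```

Taking the client as an argument keeps the helper easy to test with a stub and leaves token handling to the caller.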
🌟 Beyond business use cases
Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.
❓ Frequently Asked Questions
📚 What does OpenAIRE index?
OpenAIRE aggregates publications from PubMed, arXiv, Crossref, DBLP, Zenodo, European institutional repositories, and thousands of open access journals. It covers science, technology, humanities, and social sciences.
🔓 Is this data freely available?
Yes. OpenAIRE is a public platform funded by the European Commission. The API is open and free to use without authentication.
📝 Why are some abstracts null?
Abstracts are not always deposited with the publication metadata. Many records sourced from Crossref or DBLP contain only title, author, and DOI. Open Access repositories tend to have more complete records.
📦 How many publications can I collect?
Free users get 10 per run as a preview. Paid users can collect up to 1,000,000 records per run. The OpenAIRE API has over 100 million indexed items.
📅 How accurate are the year filters?
The fromYear and toYear filters map to the dateofacceptance field in OpenAIRE. This can differ slightly from the print publication date.
🌐 Can I filter by language?
Not directly in the input, but the language field in the output lets you filter the downloaded dataset by language after the run completes.
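A post-run language filter can be as small as the sketch below. The exact values stored in `language` (full names versus codes) are not specified here, so the comparison string is an assumption to check against your own dataset; `filter_by_language` is an illustrative helper name.

```python
from typing import Dict, List


def filter_by_language(records: List[Dict], language: str) -> List[Dict]:
    """Keep records whose `language` matches, ignoring case.

    Records with a null or missing language are dropped.
    """
    wanted = language.casefold()
    return [r for r in records if (r.get("language") or "").casefold() == wanted]
```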
🆔 Are DOIs always present?
No. DOIs are only present when the source repository deposited them. Conference papers and preprints often lack DOIs. Expect roughly 60-80% DOI coverage depending on the search topic.
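To see where a particular result set falls in that range, DOI coverage can be measured directly on the downloaded records. This is a minimal sketch; `doi_coverage` is an illustrative helper, not part of the Actor.

```python
from typing import Dict, List


def doi_coverage(records: List[Dict]) -> float:
    """Return the fraction of records with a non-empty `doi` (0.0 if empty)."""
    if not records:
        return 0.0
    with_doi = sum(1 for r in records if r.get("doi"))
    return with_doi / len(records)
```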
⚡ How fast is it?
Approximately 25 records per API call with a 300ms delay between pages. Expect 1,000 records in around 15-20 seconds.
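The arithmetic behind that estimate can be sketched as follows. The page size and 300 ms delay come from the answer above; `request_s`, the average time per API request, is an assumed value for illustration and will vary in practice.

```python
import math


def estimate_seconds(
    n_records: int,
    page_size: int = 25,
    delay_s: float = 0.3,
    request_s: float = 0.1,  # assumed average request latency, not from the source
) -> float:
    """Rough run-time estimate: request time per page plus delays between pages."""
    pages = math.ceil(n_records / page_size)
    if pages == 0:
        return 0.0
    # Delays occur between pages, so there is one fewer delay than pages.
    return pages * request_s + (pages - 1) * delay_s
```

For 1,000 records this means 40 pages and 39 delays, so about 11.7 seconds of waiting alone. With request latencies between roughly 0.1 and 0.2 seconds, the total lands in the 15-20 second range quoted above.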
⏰ Can I run this on a schedule?
Yes. Use Apify Schedules to run weekly searches and track new publications in your field.
⚖️ Is this scraping legal?
Yes. This Actor uses the official OpenAIRE public API in compliance with their terms of service.
📥 Can I export to Excel or Google Sheets?
Yes. Apify datasets export to CSV, JSON, Excel (XLSX), XML, and JSONL. CSV imports directly into Google Sheets.
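If you start from the JSON export and want a spreadsheet-friendly file with the `authors` array flattened into one cell, a local conversion along these lines is one option. This is a sketch using only the Python standard library. The `"; "` separator and the `records_to_csv` name are illustrative choices.

```python
import csv
import io
from typing import Dict, List

FIELDS = [
    "title", "url", "doi", "authors", "abstract", "publisher",
    "year", "accessType", "language", "openAireId", "scrapedAt", "error",
]


def records_to_csv(records: List[Dict]) -> str:
    """Serialize records to CSV text, joining the authors list into one cell."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for record in records:
        # Missing or null fields become None, which csv writes as an empty cell.
        row = {name: record.get(name) for name in FIELDS}
        row["authors"] = "; ".join(record.get("authors") or [])
        writer.writerow(row)
    return buf.getvalue()
```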
🆘 What if I need help?
Our support team is here to help. Use the Tally form linked below to reach out.
🔌 Integrate with any app
OpenAIRE Scraper connects to any cloud service via Apify integrations:
- Make - Automate multi-step workflows
- Zapier - Connect with 5,000+ apps
- Slack - Get run notifications in your channels
- Airbyte - Pipe publication data into your warehouse
- GitHub - Trigger runs from commits and releases
- Google Drive - Export datasets straight to Sheets
You can also use webhooks to trigger downstream actions when a run finishes.
🔗 Recommended Actors
- 🩺 ClinicalTrials Scraper - Clinical trial registry data
- 📊 EPA AQS Air Quality Scraper - U.S. EPA air quality measurements
- 🔬 CDC WONDER Mortality Scraper - Public-health mortality datasets
- 📈 BLS Scraper - U.S. Bureau of Labor Statistics data
- 🏛️ FINRA BrokerCheck Scraper - FINRA broker and firm records
💡 Pro Tip: browse the complete ParseForge collection for more open-data and reference scrapers.
🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.
⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by OpenAIRE or the European Commission. All trademarks mentioned are the property of their respective owners. Only publicly available data from the OpenAIRE public API is collected. Data accuracy depends on what depositors provide to OpenAIRE.