Pricing

from $2.00 / 1,000 papers fetched

OpenAlex Research Paper Search

Search 250M+ academic papers, journal articles & scholarly works via OpenAlex API. Filter by keyword, publication year, citation count & open access. Returns authors, affiliations, DOI, concepts. Free, no API key.

Rating

0.0

(0)

Developer

ryan clinton

Actor stats

Last modified: 2 days ago

Why use OpenAlex Research Paper Search?

No API key or account needed -- OpenAlex is a free, open scholarly database. This actor handles all API communication, pagination, and data normalization out of the box.
Structured, analysis-ready output -- Raw API responses are cleaned and transformed into a consistent JSON schema with author names, affiliations, journal details, citation metrics, DOIs, and open access links.
Scalable extraction -- Retrieve up to 10,000 papers per run with automatic multi-page pagination. The actor manages rate limiting and page fetching transparently.
Cloud execution with scheduling -- Run on Apify infrastructure without installing anything locally. Schedule recurring searches to monitor new publications weekly or monthly.
Seamless integrations -- Output feeds directly into Google Sheets, Slack, Zapier, Make, webhooks, and other downstream tools via Apify's built-in integration ecosystem.
Low running cost -- Runs on 256 MB of memory, and small searches complete in seconds. The underlying OpenAlex API is free, so charges come only from the Apify platform.

Key features

Keyword search across the titles, abstracts, and full text of 250M+ scholarly works
Publication year filter to focus results on a specific year (e.g., only 2025 papers)
Citation threshold filter to return only papers with at least a minimum number of citations
Open access filter to return only freely available papers with direct PDF/download URLs
Flexible sorting by relevance score, citation count, or publication date
Rich metadata extraction -- authors, institutional affiliations, journal name, publisher, DOI, open access URL, and top 5 research concepts per paper
Configurable result limits from 1 to 10,000 papers per run
Automatic pagination with 200 results per API page for efficient large-scale collection
Concept tagging -- each paper includes the top 5 OpenAlex concepts ranked by relevance score
Deduplication of affiliations -- institutional affiliations are extracted and deduplicated across all authors

How to use

Apify Console

Go to the OpenAlex Research Paper Search actor page on Apify.
Click Start to open the input configuration.
Enter your Search Query -- this searches across titles, abstracts, and full text.
Optionally set Publication Year, Minimum Citations, or enable Open Access Only.
Choose a Sort By option: Relevance (default), Most Cited, or Most Recent.
Set Max Results to control how many papers to retrieve (default: 100).
Click Start and wait for the run to complete.
Download results from the Dataset tab in JSON, CSV, or Excel format.

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("kbV7IqCW7tszfXB96").call(run_input={
    "searchQuery": "CRISPR gene editing",
    "publicationYear": 2024,
    "minCitations": 10,
    "openAccessOnly": True,
    "sortBy": "cited_by_count:desc",
    "maxResults": 200
})

for paper in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{paper['title']} ({paper['citedByCount']} citations)")

JavaScript

import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

const run = await client.actor("kbV7IqCW7tszfXB96").call({
    searchQuery: "CRISPR gene editing",
    publicationYear: 2024,
    minCitations: 10,
    openAccessOnly: true,
    sortBy: "cited_by_count:desc",
    maxResults: 200,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((paper) => {
    console.log(`${paper.title} (${paper.citedByCount} citations)`);
});

Input parameters

Parameter	Type	Required	Default	Description
`searchQuery`	String	Yes	--	Keyword search across titles, abstracts, and full text. Example: `"machine learning healthcare"`
`publicationYear`	Integer	No	--	Filter results to a specific publication year. Example: `2024`
`minCitations`	Integer	No	--	Only include papers with at least this many citations. Example: `50`
`openAccessOnly`	Boolean	No	`false`	When enabled, returns only papers that are freely available as open access
`sortBy`	String	No	`relevance_score:desc`	Sort order: `relevance_score:desc`, `cited_by_count:desc`, or `publication_date:desc`
`maxResults`	Integer	No	`100`	Maximum number of papers to return, between 1 and 10,000
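
The defaults and ranges in the table can be applied before a run is started. The following is a minimal sketch, not the actor's own validation code: `normalize_input` and `ALLOWED_SORTS` are hypothetical names, and the checks simply mirror the table above.

```python
from typing import Any

# Sort values listed in the parameter table above.
ALLOWED_SORTS = {
    "relevance_score:desc",
    "cited_by_count:desc",
    "publication_date:desc",
}


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: apply the documented defaults and basic checks."""
    query = str(raw.get("searchQuery", "")).strip()
    if not query:
        raise ValueError("searchQuery is required")

    sort_by = raw.get("sortBy", "relevance_score:desc")
    if sort_by not in ALLOWED_SORTS:
        raise ValueError(f"unsupported sortBy: {sort_by!r}")

    max_results = int(raw.get("maxResults", 100))
    if not 1 <= max_results <= 10_000:
        raise ValueError("maxResults must be between 1 and 10,000")

    cleaned: dict[str, Any] = {
        "searchQuery": query,
        "openAccessOnly": bool(raw.get("openAccessOnly", False)),
        "sortBy": sort_by,
        "maxResults": max_results,
    }
    # Optional filters are only passed on when the caller set them.
    for key in ("publicationYear", "minCitations"):
        if raw.get(key) is not None:
            cleaned[key] = int(raw[key])
    return cleaned
```

The returned dictionary can be passed as `run_input` in the client examples above.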

Example input (JSON)

{
    "searchQuery": "transformer neural network architecture",
    "publicationYear": 2023,
    "minCitations": 25,
    "openAccessOnly": true,
    "sortBy": "cited_by_count:desc",
    "maxResults": 500
}

Tips for input configuration

Multi-word queries like "deep learning medical imaging" return more relevant results than single broad keywords.
Combine filters -- use publication year and minimum citations together to find high-impact recent papers.
Start small -- begin with 100 results to validate your query, then increase maxResults for comprehensive collection.
Sort by citations when exploring a new field to identify landmark papers and key references first.

Output

Each paper in the output dataset contains 14 structured fields:

{
    "openAlexId": "https://openalex.org/W2741809807",
    "doi": "https://doi.org/10.1038/s41586-021-03819-2",
    "title": "Highly accurate protein structure prediction with AlphaFold",
    "publicationYear": 2021,
    "citedByCount": 18542,
    "type": "article",
    "authors": [
        "John Jumper",
        "Richard Evans",
        "Alexander Pritzel",
        "Tim Green",
        "Michael Figurnov"
    ],
    "authorAffiliations": [
        "DeepMind Technologies",
        "European Molecular Biology Laboratory"
    ],
    "journalName": "Nature",
    "publisherName": "Springer Nature",
    "isOpenAccess": true,
    "oaUrl": "https://www.nature.com/articles/s41586-021-03819-2.pdf",
    "concepts": [
        "Protein structure prediction",
        "Computational biology",
        "Artificial intelligence",
        "Deep learning",
        "Structural biology"
    ],
    "extractedAt": "2026-02-19T14:30:00.000Z"
}

Output fields reference

Field	Type	Description
`openAlexId`	String	Unique OpenAlex identifier URL for the work
`doi`	String/null	Digital Object Identifier URL, if available
`title`	String	Full title of the paper
`publicationYear`	Integer	Year the paper was published
`citedByCount`	Integer	Total number of citations recorded in OpenAlex
`type`	String	Work type (e.g., `article`, `book-chapter`, `dissertation`, `preprint`)
`authors`	String[]	List of author display names
`authorAffiliations`	String[]	Deduplicated list of institutional affiliations across all authors
`journalName`	String/null	Name of the journal or venue
`publisherName`	String/null	Name of the publisher organization
`isOpenAccess`	Boolean	Whether the paper is freely available
`oaUrl`	String/null	Direct URL to the open access version, if available
`concepts`	String[]	Top 5 OpenAlex concepts ranked by relevance score
`extractedAt`	String	ISO 8601 timestamp of when the data was extracted
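
For typed pipelines, the 14 fields can be mirrored in a `TypedDict`. This is a sketch based only on the table above; the `Paper` class and the `missing_fields` helper are illustrative names, not part of the actor.

```python
from typing import Optional, TypedDict


class Paper(TypedDict):
    """One dataset item, mirroring the output fields reference."""
    openAlexId: str
    doi: Optional[str]
    title: str
    publicationYear: int
    citedByCount: int
    type: str
    authors: list[str]
    authorAffiliations: list[str]
    journalName: Optional[str]
    publisherName: Optional[str]
    isOpenAccess: bool
    oaUrl: Optional[str]
    concepts: list[str]
    extractedAt: str


REQUIRED_KEYS = frozenset(Paper.__annotations__)


def missing_fields(item: dict) -> set[str]:
    """Return the documented field names that are absent from an item."""
    return set(REQUIRED_KEYS - set(item))
```

Calling `missing_fields(item)` on each downloaded record is a cheap way to catch schema drift before loading data into a warehouse or dashboard.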

Use cases

Systematic literature reviews -- Search and collect thousands of papers on a specific topic with structured metadata for analysis in tools like Zotero, Mendeley, or custom databases.
Bibliometric analysis -- Track citation counts, identify top-cited papers, and analyze publication trends across years, journals, and institutions.
Research trend monitoring -- Schedule recurring runs to detect new publications in your field and receive alerts when papers matching your criteria appear.
Grant writing and proposals -- Quickly gather evidence of the research landscape in a specific area, including key authors, institutions, and publication volumes.
Academic data pipelines -- Feed structured paper data into data warehouses, dashboards, or machine learning models for research intelligence applications.
Open access discovery -- Filter for open access papers to build reading lists of freely available research without journal subscription barriers.
Competitive intelligence for R&D -- Monitor what competitors, partner institutions, or specific research groups are publishing and how often their work is cited.
Citation network analysis -- Use citation counts and concept tags to map relationships between research topics and identify emerging interdisciplinary fields.
Course material curation -- Educators can search for highly cited papers on specific topics to build reading lists and course bibliographies.
Journalism and science communication -- Reporters can quickly find authoritative, highly cited sources on scientific topics for fact-checking and story research.

API & integration

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("kbV7IqCW7tszfXB96").call(run_input={
    "searchQuery": "climate change mitigation strategies",
    "minCitations": 100,
    "sortBy": "cited_by_count:desc",
    "maxResults": 50
})

dataset = client.dataset(run["defaultDatasetId"])
for paper in dataset.iterate_items():
    authors = ", ".join(paper["authors"][:3])
    print(f"[{paper['citedByCount']}] {paper['title']} -- {authors}")

JavaScript

import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

const run = await client.actor("kbV7IqCW7tszfXB96").call({
    searchQuery: "climate change mitigation strategies",
    minCitations: 100,
    sortBy: "cited_by_count:desc",
    maxResults: 50,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((paper) => {
    const authors = paper.authors.slice(0, 3).join(", ");
    console.log(`[${paper.citedByCount}] ${paper.title} -- ${authors}`);
});

cURL

curl "https://api.apify.com/v2/acts/kbV7IqCW7tszfXB96/runs" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -d '{
    "searchQuery": "climate change mitigation strategies",
    "minCitations": 100,
    "sortBy": "cited_by_count:desc",
    "maxResults": 50
  }'

Available integrations

Google Sheets -- Export paper data to spreadsheets for collaborative literature review
Slack / Email -- Get notifications when new papers matching your criteria are published
Zapier / Make -- Route results into custom workflows for research tracking
Webhooks -- Push data to your own API endpoints for automated processing
Amazon S3 / Google Cloud Storage -- Archive results to cloud storage for long-term analysis

How it works

Input validation -- The actor validates the search query and configuration parameters.
URL construction -- Builds the OpenAlex API request URL with search terms, filters (year, citations, open access), sort order, and field selection.
API pagination -- Fetches results in pages of 200 using the OpenAlex polite pool (mailto parameter) for reliable throughput.
Data transformation -- Each raw API result is cleaned and normalized: authorships are flattened to author name arrays, institutional affiliations are extracted and deduplicated, top 5 concepts are selected by score, and journal/publisher metadata is extracted from primary location data.
Incremental storage -- Transformed papers are pushed to the Apify dataset in batches as each page completes.
Completion -- The actor stops when it reaches the requested maximum results, exhausts available matches, or hits the 10,000-result API limit.
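
The URL construction and pagination steps can be sketched as follows. This is an illustration under stated assumptions, not the actor's source: it uses the public OpenAlex `/works` endpoint with its `search`, `filter`, `sort`, `per-page`, `page`, and `mailto` query parameters, and the function names and the placeholder `mailto` address are hypothetical. The code only builds URLs; it makes no network requests.

```python
from math import ceil
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.openalex.org/works"
PER_PAGE = 200           # page size described above
API_RESULT_CAP = 10_000  # paginated-access limit described above


def build_filter(year: Optional[int] = None,
                 min_citations: Optional[int] = None,
                 open_access_only: bool = False) -> str:
    """Join the optional filters into one comma-separated filter string."""
    parts = []
    if year is not None:
        parts.append(f"publication_year:{year}")
    if min_citations:
        # "greater than N-1" is the same as "at least N".
        parts.append(f"cited_by_count:>{min_citations - 1}")
    if open_access_only:
        parts.append("open_access.is_oa:true")
    return ",".join(parts)


def page_urls(query: str,
              sort_by: str = "relevance_score:desc",
              max_results: int = 100,
              mailto: str = "you@example.com",  # placeholder address
              **filters) -> list[str]:
    """Return one request URL per page needed to cover max_results."""
    wanted = min(max_results, API_RESULT_CAP)
    params = {"search": query, "sort": sort_by,
              "per-page": PER_PAGE, "mailto": mailto}
    filter_value = build_filter(**filters)
    if filter_value:
        params["filter"] = filter_value
    pages = ceil(wanted / PER_PAGE)
    return [f"{BASE_URL}?{urlencode({**params, 'page': page})}"
            for page in range(1, pages + 1)]
```

For example, `page_urls("CRISPR gene editing", max_results=450, year=2024)` yields three URLs, since 450 results need three pages of 200.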

Input Query + Filters
        |
        v
  [Build API URL]
        |
        v
  [Fetch Page 1..N] -----> OpenAlex API (200 per page)
        |
        v
  [Transform Results]
   - Flatten authors
   - Deduplicate affiliations
   - Extract top 5 concepts
   - Map journal/publisher
        |
        v
  [Push to Dataset] -----> JSON / CSV / Excel
        |
        v
  [Next Page or Done]
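
The transform step in the diagram can be sketched in a few lines. The function below is a hypothetical reconstruction based on the output schema and on the shape of OpenAlex work objects (`authorships`, `concepts`, `primary_location`, `open_access`); the actor's real implementation may differ.

```python
from datetime import datetime, timezone


def transform_work(work: dict) -> dict:
    """Map one raw OpenAlex work object to the 14-field output record."""
    authorships = work.get("authorships") or []

    # Flatten authors to a list of display names.
    authors = [
        a["author"]["display_name"]
        for a in authorships
        if (a.get("author") or {}).get("display_name")
    ]

    # Deduplicate affiliations across all authors, keeping first-seen order.
    affiliations, seen = [], set()
    for a in authorships:
        for inst in a.get("institutions") or []:
            name = inst.get("display_name")
            if name and name not in seen:
                seen.add(name)
                affiliations.append(name)

    # Keep the five highest-scoring concepts.
    ranked = sorted(work.get("concepts") or [],
                    key=lambda c: c.get("score") or 0, reverse=True)
    concepts = [c["display_name"] for c in ranked[:5] if c.get("display_name")]

    source = (work.get("primary_location") or {}).get("source") or {}
    open_access = work.get("open_access") or {}

    return {
        "openAlexId": work.get("id"),
        "doi": work.get("doi"),
        "title": work.get("title"),
        "publicationYear": work.get("publication_year"),
        "citedByCount": work.get("cited_by_count", 0),
        "type": work.get("type"),
        "authors": authors,
        "authorAffiliations": affiliations,
        "journalName": source.get("display_name"),
        "publisherName": source.get("host_organization_name"),
        "isOpenAccess": bool(open_access.get("is_oa")),
        "oaUrl": open_access.get("oa_url"),
        "concepts": concepts,
        "extractedAt": datetime.now(timezone.utc).isoformat(),
    }
```

Using `or {}` and `or []` throughout keeps the function tolerant of works where OpenAlex returns `null` for a nested object.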

Performance & cost

The actor runs on 256 MB of memory and calls a free API, so platform compute costs stay low.

Scenario	Papers	Approx. Duration	Approx. Cost
Quick search	100	5-10 seconds	< $0.01
Medium batch	500	15-20 seconds	~$0.01
Large batch	1,000	20-40 seconds	~$0.01
Full extraction	5,000	2-4 minutes	~$0.02-0.03
Maximum run	10,000	4-8 minutes	~$0.03-0.05

The costs above reflect Apify platform compute charges only and do not include the per-result price shown under Pricing (from $2.00 per 1,000 papers fetched). The OpenAlex API itself is free. Actual costs depend on your Apify subscription plan and current platform pricing.

Limitations

10,000 results maximum -- The OpenAlex API limits paginated access to 10,000 results per query. For larger datasets, split searches by publication year or use multiple targeted queries.
No author-specific search -- The search query matches across titles, abstracts, and full text. There is no dedicated author name filter; include author names in the search query for approximate matching.
Single year filter -- Publication year filtering supports a single year only, not date ranges. Run separate queries for multi-year analysis.
Citation counts are cumulative -- Citation numbers reflect totals at the time of the API call and may differ from counts on publisher websites due to different data sources.
Concept coverage varies -- OpenAlex assigns concepts algorithmically. Newer or niche papers may have fewer or less precise concept tags.
Rate limiting -- While the actor uses the OpenAlex polite pool for higher throughput, very rapid successive runs may encounter temporary rate limits from the API.
English-centric search -- The full-text search works best with English-language queries. Non-English papers are indexed but search relevance may vary.
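
The 10,000-result cap and the single-year filter can both be worked around by running one search per year and merging the results. A minimal sketch, assuming a caller-supplied `run_actor` function that starts a run and returns its dataset items; `collect_by_year` and `run_actor` are hypothetical names.

```python
from typing import Callable, Iterable


def collect_by_year(base_input: dict,
                    years: Iterable[int],
                    run_actor: Callable[[dict], list[dict]]) -> list[dict]:
    """Run one search per publication year and merge the results.

    Papers are keyed by openAlexId so a work returned by more than
    one run is kept only once (first occurrence wins).
    """
    merged: dict[str, dict] = {}
    for year in years:
        run_input = {**base_input, "publicationYear": year}
        for paper in run_actor(run_input):
            merged.setdefault(paper["openAlexId"], paper)
    return list(merged.values())


# With the Python client shown earlier, run_actor could be written as:
#
#   def run_actor(run_input):
#       run = client.actor("kbV7IqCW7tszfXB96").call(run_input=run_input)
#       return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Each year still counts as a separate run, so the per-run limit of 10,000 papers applies to every year individually.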

Responsible use

Respect OpenAlex terms of service -- This actor accesses a free, open API. Use reasonable request volumes and avoid unnecessary repeated extraction of the same data.
Cite sources properly -- When using extracted paper data in research, publications, or applications, cite the original papers using the DOI links provided in the output.
Do not misrepresent data -- Citation counts and metadata are snapshots at extraction time. Do not present them as definitive or real-time metrics without noting the extraction date.
Schedule responsibly -- When setting up recurring runs, use reasonable intervals (daily or weekly) rather than continuous polling to be a good citizen of the OpenAlex API ecosystem.
Comply with copyright -- This actor extracts metadata about scholarly works, not full-text content. Accessing full papers via open access URLs is subject to each publisher's terms.

FAQ

Do I need an API key to use this actor? No OpenAlex API key is required. OpenAlex is a free, open academic database with no authentication, so the actor works with no additional setup. Running it through the Apify API or client libraries still requires your Apify API token.

How current is the OpenAlex data? OpenAlex updates its index continuously and typically includes new papers within days of publication. The database covers works from the 1600s through the present, with the most comprehensive coverage of modern scholarly literature.

What types of academic works does this actor find? OpenAlex indexes journal articles, conference papers, book chapters, dissertations, preprints, datasets, and other scholarly work types. The work type appears in the type field of each output record.

Can I search for papers by a specific author? Include the author's name in the search query for approximate matching. For more targeted author-based searches, consider using the OpenAlex Research Papers actor.

What is the maximum number of results per run? You can retrieve up to 10,000 papers per run. For larger datasets, run multiple searches with different filter combinations, such as splitting by publication year.

How are concepts assigned to papers? OpenAlex uses an automated algorithm to tag each paper with relevant concepts and a confidence score. This actor returns the top 5 concepts by score for each paper.

Can I filter by journal or publisher? Not directly through this actor's input. However, you can post-process the output dataset to filter by journalName or publisherName fields.
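
A post-processing filter of this kind might look like the sketch below. `filter_by_venue` is a hypothetical helper; it does a case-insensitive substring match and skips records where the field is null.

```python
from typing import Optional


def filter_by_venue(items: list[dict],
                    journal: Optional[str] = None,
                    publisher: Optional[str] = None) -> list[dict]:
    """Keep papers whose journalName / publisherName contain the given text."""

    def matches(value: Optional[str], wanted: Optional[str]) -> bool:
        if wanted is None:
            return True  # no constraint on this field
        return value is not None and wanted.casefold() in value.casefold()

    return [
        paper for paper in items
        if matches(paper.get("journalName"), journal)
        and matches(paper.get("publisherName"), publisher)
    ]
```

For example, `filter_by_venue(items, journal="nature")` keeps papers from any venue whose name contains "nature", regardless of capitalization.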

How do I get only open access papers? Set the openAccessOnly input parameter to true. The output will include only papers where isOpenAccess is true, and the oaUrl field will contain a direct link to the freely available version.

Can I schedule this actor to run automatically? Yes. Use Apify's built-in scheduling to run the actor daily, weekly, or at any custom interval. Combine with Slack or email integrations to receive notifications when new papers are found.

How accurate are the citation counts? Citation counts come from the OpenAlex database, which aggregates data from multiple sources. They are generally reliable for comparative analysis but may differ from counts on Google Scholar, Scopus, or Web of Science due to different coverage and update frequencies.

What happens if my search returns no results? The actor will complete successfully with an empty dataset. Try broadening your search query, removing filters, or checking for typos in your search terms.

Can I export results to CSV or Excel? Yes. After the run completes, go to the Dataset tab in Apify Console and download in JSON, CSV, Excel, XML, or RSS format. You can also access the data programmatically via the Apify API.

Related actors

Actor	Description
OpenAlex Research Papers	Alternative OpenAlex actor with additional search and filtering options
PubMed Biomedical Literature Search	Search biomedical and life science papers via the PubMed/NCBI database
Semantic Scholar Paper Search	Search papers using Semantic Scholar's AI-powered academic knowledge graph
Crossref Academic Paper Search	Search scholarly metadata via the Crossref DOI registry
ArXiv Preprint Paper Search	Search preprints on arXiv across physics, mathematics, computer science, and more
CORE Open Access Papers	Search millions of open access research papers from repositories worldwide
DBLP Publication Search	Search computer science publications from the DBLP bibliography database
Europe PMC Literature Search	Search European biomedical and life science literature via Europe PMC
