# OpenAlex Institutions Scraper (`parseforge/openalex-institutions-scraper`) Actor

Gather structured records from OpenAlex Institutions with names, identifiers, dates, descriptions, status flags and source links. Built for research, intelligence and operational dashboards. Run on demand or on a recurring schedule and feed every row into your favourite analytics or workflow stack.

- **URL**: https://apify.com/parseforge/openalex-institutions-scraper.md
- **Developed by:** [ParseForge](https://apify.com/parseforge) (community)
- **Categories:** Other, Automation
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor itself is free to use; you only pay for Apify platform usage, which gets cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

![ParseForge Banner](https://github.com/ParseForge/apify-assets/blob/ad35ccc13ddd068b9d6cba33f323962e39aed5b2/banner.jpg?raw=true)

## 🎓 OpenAlex Institutions Scraper

> 🚀 **Pull 124,000+ research institutions in seconds.** Universities, hospitals, companies, government labs, archives and facilities from the OpenAlex scholarly graph, with citation counts, ROR identifiers, geo coordinates and homepage URLs.

> 🕒 **Last updated:** 2026-05-27 · **📊 22 fields** per record · **124K+ institutions** · **Global coverage**

OpenAlex is the open replacement for Microsoft Academic Graph, indexing research institutions worldwide with citation analytics and rich metadata. This scraper turns the institutions endpoint into a clean dataset you can pull into a spreadsheet, Power BI, Tableau or your own database - no API key, no rate-limit headaches, no scraping HTML.

Every record is normalised to a ROR identifier and tagged with country, type, works count, h-index, i10 index and full geo data. Filter by country code, institution type or any keyword search, and stream as many as a million records into your downstream pipeline.

| 🎯 Target Audience | 💡 Primary Use Cases |
|---|---|
| Academic researchers | Build a directory of universities and labs in a country |
| Research intelligence teams | Benchmark institutional output and citation impact |
| Grant funders | Identify and segment partner research bodies |
| EdTech and SaaS sales | Source TAM lists of higher-ed and healthcare institutions |
| Data scientists | Enrich author and paper data with institutional metadata |

### 📋 What the OpenAlex Institutions Scraper does

- Streams institution records from the official OpenAlex `/institutions` endpoint
- Supports full-text `search`, ISO country code filter and institution `type` filter
- Returns normalised metadata: works count, citations, h-index, i10 index, ROR, GRID, Wikipedia, Wikidata
- Adds geo data: city, region, country, latitude and longitude
- No login, no API key, no usage quotas

> 💡 **Why it matters:** institutional metadata is the bedrock of every research analytics, TAM-building and academic partnership workflow. OpenAlex is the most complete open source, and this Actor delivers it as records your stack already understands.

### 🎬 Full Demo (_🚧 Coming soon_)

### ⚙️ Input

<table>
<tr><th>Field</th><th>Type</th><th>Description</th></tr>
<tr><td>search</td><td>string</td><td>Full-text search on institution name</td></tr>
<tr><td>maxItems</td><td>integer</td><td>How many records to return (free plan capped at 10)</td></tr>
<tr><td>country</td><td>string</td><td>ISO 2-letter country code, lowercase (us, gb, de, jp)</td></tr>
<tr><td>type</td><td>enum</td><td>education, healthcare, company, archive, nonprofit, government, facility, other</td></tr>
</table>

```json
{ "search": "stanford", "maxItems": 5 }
````

```json
{ "country": "de", "type": "education", "maxItems": 50 }
```
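
The input fields above can be assembled and checked before a run. The following is a minimal sketch, not part of the Actor itself: `build_input` is a hypothetical helper name, and the validation rules are inferred from the input table and schema (2-letter lowercase country code, the listed `type` values, `maxItems` between 1 and 1,000,000).

```python
from typing import Any, Dict, Optional

# Allowed values copied from the `type` row of the input table.
INSTITUTION_TYPES = {
    "education", "healthcare", "company", "archive",
    "nonprofit", "government", "facility", "other",
}
MAX_ITEMS_LIMIT = 1_000_000  # upper bound stated in the input schema


def build_input(
    search: Optional[str] = None,
    country: Optional[str] = None,
    institution_type: Optional[str] = None,
    max_items: int = 10,
) -> Dict[str, Any]:
    """Hypothetical helper: validate the filters and return an Actor input dict."""
    if not 1 <= max_items <= MAX_ITEMS_LIMIT:
        raise ValueError(f"maxItems must be between 1 and {MAX_ITEMS_LIMIT}")
    run_input: Dict[str, Any] = {"maxItems": max_items}
    if search:
        run_input["search"] = search.strip()
    if country:
        code = country.strip().lower()  # the Actor expects lowercase codes
        if len(code) != 2 or not (code.isascii() and code.isalpha()):
            raise ValueError(f"country must be a 2-letter ISO code, got {country!r}")
        run_input["country"] = code
    if institution_type:
        if institution_type not in INSTITUTION_TYPES:
            raise ValueError(f"unknown institution type: {institution_type!r}")
        run_input["type"] = institution_type
    return run_input
```

For example, `build_input(country="DE", institution_type="education", max_items=50)` produces the same fields as the second input example, with the country code lowercased.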

> ⚠️ **Good to Know:** OpenAlex caps `per-page` at 200 records and uses cursor pagination. The Actor handles cursors transparently - you only set `maxItems`.
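
For readers curious what cursor pagination involves, here is a rough sketch of the general technique against the public OpenAlex API. It is an illustration under stated assumptions, not the Actor's actual source: `build_url` and `iter_institutions` are hypothetical names, the `country_code` and `type` filter names and the `meta.next_cursor` field follow the public OpenAlex API, and `fetch_page` is any callable you supply that returns the parsed JSON for a URL.

```python
from typing import Any, Callable, Dict, Iterator, Optional
from urllib.parse import urlencode

BASE_URL = "https://api.openalex.org/institutions"
PER_PAGE = 200  # the per-page cap mentioned above


def build_url(cursor: str, search: Optional[str] = None,
              country: Optional[str] = None,
              institution_type: Optional[str] = None) -> str:
    """Build the URL for one page of results."""
    params: Dict[str, str] = {"per-page": str(PER_PAGE), "cursor": cursor}
    filters = []
    if country:
        filters.append(f"country_code:{country}")
    if institution_type:
        filters.append(f"type:{institution_type}")
    if filters:
        params["filter"] = ",".join(filters)
    if search:
        params["search"] = search
    return BASE_URL + "?" + urlencode(params)


def iter_institutions(fetch_page: Callable[[str], Dict[str, Any]],
                      max_items: int,
                      **filters: Optional[str]) -> Iterator[Dict[str, Any]]:
    """Yield up to max_items records, following meta.next_cursor between pages."""
    cursor: Optional[str] = "*"  # "*" requests the first page
    yielded = 0
    while cursor and yielded < max_items:
        page = fetch_page(build_url(cursor, **filters))
        results = page.get("results", [])
        if not results:
            break  # an empty page means there is nothing left to read
        for record in results:
            if yielded >= max_items:
                return
            yield record
            yielded += 1
        cursor = page.get("meta", {}).get("next_cursor")
```

Passing the fetcher in as a parameter keeps the paging logic separate from the HTTP layer, so it can be exercised with a stub before any real request is made.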

### 📊 Output

<table>
<tr><th>Field</th><th>Description</th></tr>
<tr><td>🖼 imageUrl</td><td>Logo thumbnail if available</td></tr>
<tr><td>📛 displayName</td><td>Canonical institution name</td></tr>
<tr><td>🏷 type</td><td>Institution type</td></tr>
<tr><td>🌍 countryCode</td><td>ISO 2-letter country code</td></tr>
<tr><td>📚 worksCount</td><td>Total publications</td></tr>
<tr><td>📈 citedByCount</td><td>Total citations</td></tr>
<tr><td>🏆 hIndex</td><td>Institutional h-index</td></tr>
<tr><td>🔗 rorUrl</td><td>ROR identifier URL</td></tr>
<tr><td>🌐 homepageUrl</td><td>Official homepage</td></tr>
<tr><td>📍 city / region / country / latitude / longitude</td><td>Geo metadata</td></tr>
<tr><td>🆔 acronyms / alternativeNames</td><td>Other known names</td></tr>
<tr><td>📚 wikipediaUrl / wikidataUrl / gridId</td><td>Cross-references</td></tr>
<tr><td>🔗 sourceUrl</td><td>OpenAlex canonical URL</td></tr>
<tr><td>🕒 scrapedAt</td><td>ISO timestamp</td></tr>
</table>
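
Once a run has finished, the output fields above lend themselves to simple post-processing. The sketch below assumes each dataset item is a dict keyed by the field names in the table (`city`, `citedByCount`, `displayName`); the helper names and the sample records are made up for illustration.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def filter_by_city(items: Iterable[Record], city: str) -> List[Record]:
    """Keep records whose `city` matches, ignoring case and missing values."""
    wanted = city.casefold()
    return [it for it in items if str(it.get("city") or "").casefold() == wanted]


def top_by_citations(items: Iterable[Record], limit: int = 10) -> List[Record]:
    """Return the `limit` most-cited records, treating a missing count as 0."""
    ranked = sorted(items, key=lambda it: it.get("citedByCount") or 0, reverse=True)
    return ranked[:limit]


# Made-up records for illustration only; real items come from the run's dataset.
SAMPLE: List[Record] = [
    {"displayName": "Example University", "city": "Berlin", "citedByCount": 1200},
    {"displayName": "Sample Clinic", "city": "Munich", "citedByCount": 300},
    {"displayName": "Demo Institute", "city": "berlin", "citedByCount": None},
]
```

Since the Actor filters by country rather than city, a helper like `filter_by_city` is one way to narrow results after the run.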

### ✨ Why choose this Actor

- 🆓 No API key, no auth, no rate-limit drama
- 📡 Direct hit on the official OpenAlex API
- 🧭 Cursor pagination handled automatically
- 🏷 ROR, GRID, Wikidata cross-references in every record
- 📦 Pull as structured records

### 📈 How it compares to alternatives

| Approach | Cost | Coverage | Setup time |
|---|---|---|---|
| Manual tabular downloads from ROR | Free | Names only | Hours |
| OpenAlex API directly | Free | Full | Code required |
| ParseForge OpenAlex Institutions Scraper | Pay per usage | Full + structured | Minutes |

### 🚀 How to use

1. [Create a free Apify account](https://console.apify.com/sign-up?fpr=vmoqkp) (includes $5 credit).
2. Open the OpenAlex Institutions Scraper.
3. Set `search`, `country` or `type` filters.
4. Click **Start**, then export the results in a tabular, spreadsheet or structured format (for example CSV, Excel or JSON).
5. Schedule daily, weekly or trigger from Make / Zapier.

### 💼 Business use cases

**Academic CRM enrichment** - match institution names against ROR to deduplicate and segment your contact database.

**Research partnership scouting** - find every healthcare institution in Germany and rank the results by citation impact.

**EdTech go-to-market** - build a target list of universities by country and discipline.

**Grant funding analytics** - benchmark institutional output across your portfolio.
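
As one illustration of the CRM enrichment case, the sketch below matches free-text organisation names against scraped records. It assumes `acronyms` and `alternativeNames` are lists of strings and that `rorUrl` is present on each record; the function names and the exact-match-after-normalisation strategy are hypothetical choices, and real data would likely need fuzzier matching.

```python
from typing import Any, Dict, Iterable, List, Optional


def normalise(name: str) -> str:
    """Lowercase the name and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.casefold().split())


def build_name_index(items: Iterable[Dict[str, Any]]) -> Dict[str, Optional[str]]:
    """Map every known name of an institution to its ROR URL."""
    index: Dict[str, Optional[str]] = {}
    for item in items:
        names: List[Optional[str]] = [item.get("displayName")]
        names += item.get("acronyms") or []
        names += item.get("alternativeNames") or []
        for name in names:
            if name:
                # setdefault keeps the first institution seen for a given name
                index.setdefault(normalise(name), item.get("rorUrl"))
    return index


def match_contacts(org_names: Iterable[str],
                   index: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Return the ROR URL for each organisation name, or None when unmatched."""
    return {name: index.get(normalise(name)) for name in org_names}
```

Contacts that resolve to the same ROR URL can then be treated as belonging to one institution, which is the deduplication step described above.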

### 🔌 Automating OpenAlex Institutions Scraper

Hook into Make, Zapier, n8n, Airbyte, Pipedream, Slack, Google Drive, GitHub Actions or any HTTP webhook to schedule recurring runs and pipe data straight to your warehouse.

### 🌟 Beyond business use cases

- **Research:** track the geographic spread of every healthcare facility on the planet.
- **Personal:** explore your alma mater's citation network.
- **Non-profit:** map underrepresented research institutions in the Global South.
- **Experimentation:** combine with OpenAlex works data to compute institutional rankings.

### 🤖 Ask an AI assistant about this scraper

Ask ChatGPT, Claude, Perplexity or Copilot: "How do I pull every healthcare institution in France from OpenAlex using the ParseForge Apify Actor?"

### ❓ Frequently Asked Questions

**Do I need an OpenAlex API key?**
No. OpenAlex is fully open. The Actor sends a polite User-Agent on your behalf.

**How many institutions are in OpenAlex?**
124,000+ as of 2026-05-26.

**Can I filter by city?**
Filter by country code; city is returned as metadata for downstream filtering.

**What's the difference between OpenAlex and ROR?**
ROR is an identifier registry. OpenAlex is the full research graph that uses ROR.

**Are h-index values official?**
They are OpenAlex's computed institutional h-index, recomputed monthly.

**Does the Actor follow lineage?**
Lineage is returned as a list of parent OpenAlex IDs per institution.

**Does the Actor work for non-Latin scripts?**
Yes. OpenAlex stores Japanese, Chinese, Arabic and Cyrillic names as alternatives.

**Can I export to Google Sheets?**
Yes, via the Apify Google Sheets integration.

**Is deduplication handled?**
OpenAlex already deduplicates against ROR. The Actor returns one record per ROR.

**Is the data refreshed?**
OpenAlex refreshes monthly. Re-run the Actor to pick up new institutions.
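
To see what a re-run actually added, two result sets can be compared. This is a small sketch assuming `sourceUrl` (the OpenAlex canonical URL from the output table) is a stable key across runs; `new_institutions` is a hypothetical helper name.

```python
from typing import Any, Dict, Iterable, List


def new_institutions(previous: Iterable[Dict[str, Any]],
                     current: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return records from `current` whose sourceUrl did not appear in `previous`."""
    seen = {it["sourceUrl"] for it in previous if it.get("sourceUrl")}
    # Records without a sourceUrl cannot be matched, so they are reported as new.
    return [it for it in current if it.get("sourceUrl") not in seen]
```

Feeding it the items from last month's dataset and this month's dataset yields only the institutions that were not present before.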

### 🔌 Integrate with any app

Apify, Make, Zapier, n8n, Pipedream, Slack, Airbyte, GitHub, Google Drive, Power Automate, AWS Lambda, REST webhook.

### 🔗 Recommended Actors

| Actor | What it does |
|---|---|
| [EU Clinical Trials Register Scraper](https://apify.com/parseforge/eu-clinical-trials-register-scraper) | Pull every EU CTIS clinical trial |
| [OurAirports Scraper](https://apify.com/parseforge/ourairports-scraper) | Global airport database |
| [FINRA BrokerCheck Scraper](https://apify.com/parseforge/finra-brokercheck-scraper) | Broker disclosure records |

> 💡 **Pro Tip:** browse the complete [ParseForge collection](https://apify.com/parseforge) for more research and business data scrapers.

**🆘 Need Help?** [Open our contact form](https://tally.so/r/BzdKgA)

> **⚠️ Disclaimer:** independent tool, not affiliated with OpenAlex. Only publicly available open data is collected.

# Actor input Schema

## `search` (type: `string`):

Full-text search on institution display name. Leave blank for all institutions sorted by works count.

## `maxItems` (type: `integer`):

Free users: Limited to 10 items (preview). Paid users: Optional, max 1,000,000

## `country` (type: `string`):

Filter by ISO 2-letter country code (lowercase). Leave blank for any country.

## `type` (type: `string`):

Filter by institution type. Leave blank for any.

## Actor input object example

```json
{
  "maxItems": 10,
  "country": "",
  "type": ""
}
```

# Actor output Schema

## `results` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "maxItems": 10
};

// Run the Actor and wait for it to finish
const run = await client.actor("parseforge/openalex-institutions-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "maxItems": 10 }

# Run the Actor and wait for it to finish
run = client.actor("parseforge/openalex-institutions-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "maxItems": 10
}' |
apify call parseforge/openalex-institutions-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=parseforge/openalex-institutions-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "OpenAlex Institutions Scraper",
        "description": "Gather structured records from Openalex Institutions with names, identifiers, dates, descriptions, status flags and source links. Loved by research, intelligence and operational dashboards. Run on demand or on a recurring schedule and feed every row into your favourite analytics or workflow stack.",
        "version": "0.1",
        "x-build-id": "GUa1yISfFcyKkAODG"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/parseforge~openalex-institutions-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-parseforge-openalex-institutions-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/parseforge~openalex-institutions-scraper/runs": {
            "post": {
                "operationId": "runs-sync-parseforge-openalex-institutions-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/parseforge~openalex-institutions-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-parseforge-openalex-institutions-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "search": {
                        "title": "Search term",
                        "type": "string",
                        "description": "Full-text search on institution display name. Leave blank for all institutions sorted by works count."
                    },
                    "maxItems": {
                        "title": "Max Items",
                        "minimum": 1,
                        "maximum": 1000000,
                        "type": "integer",
                        "description": "Free users: Limited to 10 items (preview). Paid users: Optional, max 1,000,000"
                    },
                    "country": {
                        "title": "Country",
                        "type": "string",
                        "description": "Filter by ISO 2-letter country code (lowercase). Leave blank for any country.",
                        "default": ""
                    },
                    "type": {
                        "title": "Institution type",
                        "enum": [
                            "",
                            "education",
                            "healthcare",
                            "company",
                            "archive",
                            "nonprofit",
                            "government",
                            "facility",
                            "other"
                        ],
                        "type": "string",
                        "description": "Filter by institution type. Leave blank for any.",
                        "default": ""
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
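
For projects without the client library, the `run-sync-get-dataset-items` path in the specification above can be called with the Python standard library alone. The following is a sketch under stated assumptions: `build_request` and `run_sync` are hypothetical helper names, and the response is assumed to be a JSON array of dataset items (the specification only documents a `200 OK`).

```python
import json
import urllib.request
from typing import Any, Dict, List
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/parseforge~openalex-institutions-scraper/run-sync-get-dataset-items"


def build_request(token: str, run_input: Dict[str, Any]) -> urllib.request.Request:
    """Prepare the POST request; the token goes in the query string as in the spec."""
    url = f"{API_BASE}{ACTOR_PATH}?{urlencode({'token': token})}"
    body = json.dumps(run_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sync(token: str, run_input: Dict[str, Any],
             timeout: float = 300.0) -> List[Dict[str, Any]]:
    """Run the Actor, wait for it to finish, and return the dataset items."""
    request = build_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call such as `run_sync("<YOUR_API_TOKEN>", {"maxItems": 10})` blocks until the run completes, so for large `maxItems` values the asynchronous `/runs` path may be the better fit.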
