# ORCID Researcher Profile Scraper (`automation-lab/orcid-researcher-profile-scraper`) Actor

🔎 Extract public ORCID researcher profiles, affiliations, funding, works, identifiers, keywords, and contact links from the official API.

- **URL**: https://apify.com/automation-lab/orcid-researcher-profile-scraper.md
- **Developed by:** [Stas Persiianenko](https://apify.com/automation-lab) (community)
- **Categories:** Education
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per event

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the price decreases on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## ORCID Researcher Profile Scraper

Search ORCID public profiles and extract structured researcher identity, affiliation, funding, publication, keyword, and public contact data from the official ORCID public API.

Use it to enrich academic CRMs, map institutional researchers, monitor public profile changes, and build research-intelligence datasets without browser automation or login flows.

### What does ORCID Researcher Profile Scraper do?

ORCID Researcher Profile Scraper turns ORCID public API search results and ORCID iDs into clean Apify dataset rows.

It can search by free-text terms, names, affiliations, keywords, or any ORCID-supported Lucene query.

It can also fetch a list of exact ORCID iDs or ORCID profile URLs.

Each output row represents one researcher profile.

The Actor normalizes sparse public ORCID records into predictable fields.

### Who is it for?

🎓 University research offices can enrich faculty and researcher databases.

📚 Scholarly publishers can validate author identities and public research links.

💼 Academic recruiters can discover researchers by affiliation or topic.

🧪 Grant intelligence teams can map public funding and work summaries.

🧩 CRM and data vendors can append ORCID identifiers to existing profiles.

### Why use this ORCID scraper?

It uses the official ORCID public API rather than scraping rendered web pages.

That keeps runs fast, low-cost, and reliable.

It does not require an ORCID account, OAuth token, captcha solving, or a browser.

It emits one stable dataset schema for easy exports to CSV, JSON, Excel, BigQuery, or your own API pipeline.

### What data can you extract?

The exact fields depend on what each researcher has made public in ORCID.

| Group | Example fields |
| --- | --- |
| Identity | ORCID iD, ORCID URL, given names, family name, credit name, display name |
| Profile | biography, keywords, websites, researcher URLs, public emails, countries |
| Identifiers | external identifiers with type, value, and URL |
| Affiliations | employments, educations, memberships, services |
| Research activity | funding summaries, work/publication summaries, counts |
| Metadata | last modified date, source, search query, fetch timestamp |

### How much does it cost to extract ORCID researcher profiles?

This Actor uses pay-per-event pricing.

There is a small start event per run and a per-profile event for each saved researcher profile.

Current validated pricing is $0.005 per run plus tiered per-profile pricing.

The BRONZE per-profile price is $0.00004907, with lower prices on higher Apify plans.

For a small test, keep `maxItems` at 10.

For enrichment jobs, raise `maxItems` after confirming your query returns the right population.
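
As a rough budgeting aid, the pricing above can be turned into a small estimator. This is a minimal sketch that assumes exactly one start event per run and one per-profile event per saved row, using the BRONZE price quoted here; the function name `estimate_cost_usd` is illustrative, and actual charges depend on your plan and current Store pricing.

```python
# Prices quoted in this README (BRONZE tier); higher plans pay less per profile.
RUN_START_USD = 0.005
BRONZE_PER_PROFILE_USD = 0.00004907


def estimate_cost_usd(profiles: int, runs: int = 1,
                      per_profile_usd: float = BRONZE_PER_PROFILE_USD) -> float:
    """Estimate cost as one start event per run plus one event per saved profile."""
    if profiles < 0 or runs < 1:
        raise ValueError("profiles must be >= 0 and runs must be >= 1")
    return runs * RUN_START_USD + profiles * per_profile_usd


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(f"{n:>5} profiles: ${estimate_cost_usd(n):.5f}")
```

At these prices the start event dominates small runs, so batching many profiles into one run is cheaper than many tiny runs.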

### Input options

The Actor accepts four main inputs.

`searchQuery` is an ORCID public API Lucene query.

`orcidIds` is an optional list of ORCID iDs or ORCID profile URLs.

`maxItems` caps the number of researcher profiles saved.

`detailDepth` controls how much nested public profile detail is normalized.
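
Before starting a run, it can help to check an input object locally. The sketch below is a hypothetical helper (`build_input` is not part of any Apify library); the 1 to 1000 range, the three `detailDepth` values, and the defaults of 25 and `activities` come from the input schema in this document, while the rule that at least one of `searchQuery` or `orcidIds` must be present is an assumption.

```python
ALLOWED_DEPTHS = ("profileOnly", "activities", "works")


def build_input(search_query=None, orcid_ids=None, max_items=25,
                detail_depth="activities"):
    """Return an Actor input dict, rejecting values outside the documented limits."""
    # Assumption: a run needs something to fetch, so require a query or explicit iDs.
    if not search_query and not orcid_ids:
        raise ValueError("provide searchQuery, orcidIds, or both")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(max_items, bool) or not isinstance(max_items, int):
        raise ValueError("maxItems must be an integer")
    if not 1 <= max_items <= 1000:
        raise ValueError("maxItems must be between 1 and 1000")
    if detail_depth not in ALLOWED_DEPTHS:
        raise ValueError(f"detailDepth must be one of {ALLOWED_DEPTHS}")

    run_input = {"maxItems": max_items, "detailDepth": detail_depth}
    if search_query:
        run_input["searchQuery"] = search_query
    if orcid_ids:
        run_input["orcidIds"] = list(orcid_ids)
    return run_input
```

The returned dict can be passed as `run_input` to the Python client call shown later in this README.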

### ORCID search query examples

Search by institution:

```json
{
  "searchQuery": "affiliation-org-name:\"Stanford University\"",
  "maxItems": 25,
  "detailDepth": "activities"
}
````

Search by name:

```json
{
  "searchQuery": "family-name:Smith AND given-names:Jane",
  "maxItems": 10,
  "detailDepth": "profileOnly"
}
```

Search by topic:

```json
{
  "searchQuery": "machine learning",
  "maxItems": 100,
  "detailDepth": "works"
}
```

### Fetch exact ORCID iDs

Use `orcidIds` when you already have profile identifiers.

```json
{
  "orcidIds": [
    "0000-0002-1825-0097",
    "https://orcid.org/0000-0002-9510-6777"
  ],
  "maxItems": 2,
  "detailDepth": "activities"
}
```

The Actor deduplicates ORCID iDs across search results and explicit IDs.
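
If you assemble `orcidIds` from mixed sources, a client-side cleanup pass can drop malformed entries before they reach the run. The Actor's own normalization is not documented here, so the following is only a sketch of the same idea: it accepts bare iDs or `https://orcid.org/` URLs, verifies the ISO 7064 MOD 11-2 check digit that ORCID iDs carry, and removes duplicates while preserving order. The function names are illustrative.

```python
import re

# Bare iD or orcid.org URL; the final character may be a digit or X.
ORCID_RE = re.compile(
    r"^(?:https?://orcid\.org/)?(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$", re.IGNORECASE
)


def checksum_ok(orcid_id: str) -> bool:
    """Verify the ISO 7064 MOD 11-2 check digit (the 16th character)."""
    digits = orcid_id.replace("-", "")
    total = 0
    for ch in digits[:15]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[15] == expected


def normalize_orcid_ids(values):
    """Return unique, checksum-valid bare iDs in first-seen order."""
    seen = set()
    cleaned = []
    for value in values:
        match = ORCID_RE.match(value.strip())
        if not match:
            continue  # not an ORCID iD or ORCID URL
        orcid_id = match.group(1).upper()
        if checksum_ok(orcid_id) and orcid_id not in seen:
            seen.add(orcid_id)
            cleaned.append(orcid_id)
    return cleaned
```

Both example iDs used in this README pass the check-digit test, so they survive this filter unchanged.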

### Detail depth

`profileOnly` extracts public identity, biography, keywords, URLs, emails, countries, and summary counts.

`activities` adds affiliations and funding summaries.

`works` adds publication/work summaries.

Choose `works` for research intelligence exports.

Choose `profileOnly` for fast identity enrichment.

### Output example

```json
{
  "orcidId": "0000-0002-1825-0097",
  "orcidUri": "https://orcid.org/0000-0002-1825-0097",
  "displayName": "Example Researcher",
  "keywords": ["machine learning"],
  "employmentsCount": 2,
  "worksCount": 18,
  "source": "orcid-public-api",
  "detailDepth": "works",
  "fetchedAt": "2026-06-29T00:00:00.000Z"
}
```

Nested arrays contain affiliations, funding summaries, works, and external identifiers.
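
Because many fields can be missing, downstream code is safer when it reads rows defensively. The sketch below flattens dataset items into CSV using only the field names shown in the example above; the column selection and the helper names `to_flat_row` and `to_csv` are illustrative choices, not part of the Actor.

```python
import csv
import io

# Columns taken from the output example above; extend as needed.
COLUMNS = ["orcidId", "displayName", "keywords",
           "employmentsCount", "worksCount", "fetchedAt"]


def to_flat_row(item: dict) -> dict:
    """Map one dataset item to flat CSV cells, tolerating missing fields."""
    row = {}
    for column in COLUMNS:
        value = item.get(column)
        if isinstance(value, list):
            value = "; ".join(str(v) for v in value)  # e.g. keywords
        row[column] = "" if value is None else value
    return row


def to_csv(items) -> str:
    """Render dataset items as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for item in items:
        writer.writerow(to_flat_row(item))
    return buffer.getvalue()
```

Missing fields become empty cells rather than raising `KeyError`, which matches how sparse public ORCID records tend to look.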

### Tips for better ORCID results

Use specific institution names for affiliation searches.

Quote multi-word organizations in the ORCID Lucene query.

Start with `maxItems` set to 10 to validate query quality.

Use exact ORCID iDs when you need deterministic enrichment.

Expect sparse fields because ORCID users control public visibility.
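
The quoting tip is easy to get wrong when queries are built from user input. Here is a small, hypothetical helper pair that quotes multi-word values and joins clauses with `AND`; the field names come from the examples in this README, and the backslash escaping follows general Lucene conventions, which is an assumption about how ORCID's search treats embedded quotes.

```python
def field_clause(field: str, value: str) -> str:
    """Return `field:value`, quoting the value when it contains whitespace."""
    value = value.strip()
    if any(ch.isspace() for ch in value):
        # Escape backslashes first, then embedded double quotes.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'{field}:"{escaped}"'
    return f"{field}:{value}"


def all_of(*clauses: str) -> str:
    """Combine clauses so every one must match."""
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(field_clause("affiliation-org-name", "Stanford University"))
    print(all_of(field_clause("family-name", "Smith"),
                 field_clause("given-names", "Jane")))
```

This sketch does not escape other Lucene special characters in unquoted values, so treat single-word input containing characters such as `:` or `(` with care.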

### Integrations

Export results to Google Sheets for research-office review.

Send dataset rows to a CRM enrichment workflow.

Join ORCID iDs with Crossref, PubMed, OpenAlex, or internal publication records.

Schedule recurring Apify runs to monitor public profile changes.

Use webhooks to trigger downstream compliance or data-quality checks.

### API usage with Node.js

```js
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });
const run = await client.actor('automation-lab/orcid-researcher-profile-scraper').call({
  searchQuery: 'affiliation-org-name:"Stanford University"',
  maxItems: 25,
  detailDepth: 'activities'
});
console.log(run.defaultDatasetId);
```

### API usage with Python

```python
from apify_client import ApifyClient

client = ApifyClient('MY-APIFY-TOKEN')
run = client.actor('automation-lab/orcid-researcher-profile-scraper').call(run_input={
    'searchQuery': 'machine learning',
    'maxItems': 50,
    'detailDepth': 'works',
})
print(run['defaultDatasetId'])
```

### API usage with cURL

```bash
curl -X POST 'https://api.apify.com/v2/acts/automation-lab~orcid-researcher-profile-scraper/runs?token=MY-APIFY-TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{"searchQuery":"machine learning","maxItems":25,"detailDepth":"works"}'
```

### MCP usage

Use this Actor from Claude Desktop, Claude Code, or another MCP-capable client through Apify MCP.

MCP URL:

```text
https://mcp.apify.com/?tools=automation-lab/orcid-researcher-profile-scraper
```

Claude Code setup:

```bash
claude mcp add apify-orcid --transport http "https://mcp.apify.com/?tools=automation-lab/orcid-researcher-profile-scraper"
```

Claude Desktop, Cursor, and VS Code JSON setup:

Add this server entry to your MCP configuration file. For Claude Desktop, use the app's `claude_desktop_config.json`. For Cursor and VS Code, add the same `mcpServers` block to the editor MCP settings JSON.

```json
{
  "mcpServers": {
    "apify-orcid": {
      "transport": "http",
      "url": "https://mcp.apify.com/?tools=automation-lab/orcid-researcher-profile-scraper"
    }
  }
}
```

Example prompts:

- "Find public ORCID profiles for researchers affiliated with Stanford University."
- "Fetch these ORCID IDs and summarize their public affiliations."
- "Export ORCID profiles related to machine learning with publication summary counts."

### Legality and ethical use

This Actor reads public ORCID API data.

Only collect and process data for lawful purposes.

Respect ORCID public visibility settings and applicable privacy obligations.

Do not use public researcher data for spam, harassment, or discriminatory profiling.

### FAQ

#### Why are some ORCID fields empty?

ORCID users control visibility. Empty fields usually mean the researcher did not make that profile section public.

#### Can this actor access private ORCID data?

No. It only uses public ORCID API responses and does not use OAuth or private credentials.

#### Which detail depth should I choose?

Use `profileOnly` for enrichment, `activities` for affiliation/funding mapping, and `works` for publication intelligence.

### Troubleshooting

If a profile field is empty, the researcher may not have made that field public.

If a search returns few results, try a broader ORCID Lucene query.

If you receive no results, verify the query syntax in ORCID's public API documentation.

If a run is slow, choose a shallower `detailDepth` (for example `profileOnly`) or reduce `maxItems`.

### Related scrapers

Explore related Automation Lab Actors for research and publication workflows.

- https://apify.com/automation-lab/pubmed-search-scraper
- https://apify.com/automation-lab/crossref-paper-search
- https://apify.com/automation-lab/openalex-works-scraper
- https://apify.com/automation-lab/nih-reporter-grant-search-scraper

### Dataset columns

The main table view includes ORCID iD, display name, name fields, URL, keywords, countries, counts, last modified date, source, query, detail depth, and fetched timestamp.

Full JSON exports include all nested arrays.

### Performance

The Actor runs as a lightweight API Actor with 256 MB of memory.

It uses conservative sequential requests and backoff for HTTP 429 responses.

No proxy is expected for normal ORCID public API usage.
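
For readers unfamiliar with the pattern, backoff on HTTP 429 generally looks like the sketch below. This is not the Actor's source code: the retry count, the doubling delays, and the `fetch` callable (a hypothetical function returning a `(status_code, body)` tuple) are all assumptions chosen for illustration.

```python
import time


def fetch_with_backoff(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a non-429 status or attempts run out.

    `fetch` is a hypothetical callable returning (status_code, body).
    `sleep` is injectable so the delays can be observed without waiting.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        status, body = fetch()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    return status, body  # still rate limited after the final attempt
```

Issuing requests one at a time and waiting longer after each 429 keeps the client within the rate limits of a public API at the cost of some throughput.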

### Limits

`maxItems` is capped at 1000 per run.

The Actor saves only records that can be fetched from the public API.

Private ORCID fields are not available.

### Changelog

Initial version supports public ORCID search, exact ORCID iD fetches, profile fields, affiliations, funding summaries, and work summaries.

# Actor input Schema

## `searchQuery` (type: `string`):

ORCID public API Lucene query. Examples: machine learning, family-name:Smith AND given-names:Jane, affiliation-org-name:"Stanford University".

## `orcidIds` (type: `array`):

Optional explicit ORCID iDs or ORCID URLs to fetch. These are combined with search results and deduplicated.

## `maxItems` (type: `integer`):

Maximum number of ORCID profiles to save. Keep this low for tests; increase for enrichment jobs.

## `detailDepth` (type: `string`):

Choose how much public profile detail to normalize. Works mode includes publication summaries and is best for research intelligence exports.

## Actor input object example

```json
{
  "searchQuery": "affiliation-org-name:\"Stanford University\"",
  "orcidIds": [
    "0000-0002-1825-0097"
  ],
  "maxItems": 10,
  "detailDepth": "activities"
}
```

# Actor output Schema

## `overview` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "searchQuery": "affiliation-org-name:\"Stanford University\"",
    "orcidIds": [
        "0000-0002-1825-0097"
    ],
    "maxItems": 10,
    "detailDepth": "activities"
};

// Run the Actor and wait for it to finish
const run = await client.actor("automation-lab/orcid-researcher-profile-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "searchQuery": "affiliation-org-name:\"Stanford University\"",
    "orcidIds": ["0000-0002-1825-0097"],
    "maxItems": 10,
    "detailDepth": "activities",
}

# Run the Actor and wait for it to finish
run = client.actor("automation-lab/orcid-researcher-profile-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "searchQuery": "affiliation-org-name:\\"Stanford University\\"",
  "orcidIds": [
    "0000-0002-1825-0097"
  ],
  "maxItems": 10,
  "detailDepth": "activities"
}' |
apify call automation-lab/orcid-researcher-profile-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=automation-lab/orcid-researcher-profile-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "ORCID Researcher Profile Scraper",
        "description": "🔎 Extract public ORCID researcher profiles, affiliations, funding, works, identifiers, keywords, and contact links from the official API.",
        "version": "0.1",
        "x-build-id": "mG14pO29kU02SApM1"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/automation-lab~orcid-researcher-profile-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-automation-lab-orcid-researcher-profile-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/automation-lab~orcid-researcher-profile-scraper/runs": {
            "post": {
                "operationId": "runs-sync-automation-lab-orcid-researcher-profile-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/automation-lab~orcid-researcher-profile-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-automation-lab-orcid-researcher-profile-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "searchQuery": {
                        "title": "ORCID search query",
                        "type": "string",
                        "description": "ORCID public API Lucene query. Examples: machine learning, family-name:Smith AND given-names:Jane, affiliation-org-name:\"Stanford University\"."
                    },
                    "orcidIds": {
                        "title": "ORCID iDs",
                        "type": "array",
                        "description": "Optional explicit ORCID iDs or ORCID URLs to fetch. These are combined with search results and deduplicated.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxItems": {
                        "title": "Maximum researcher profiles",
                        "minimum": 1,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Maximum number of ORCID profiles to save. Keep this low for tests; increase for enrichment jobs.",
                        "default": 25
                    },
                    "detailDepth": {
                        "title": "Detail depth",
                        "enum": [
                            "profileOnly",
                            "activities",
                            "works"
                        ],
                        "type": "string",
                        "description": "Choose how much public profile detail to normalize. Works mode includes publication summaries and is best for research intelligence exports.",
                        "default": "activities"
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
