# DOE OSTI Scientific Reports Scraper (`compute-edge/osti-scientific-reports-scraper`) Actor

Extract metadata for U.S. Department of Energy OSTI scientific & technical reports. Filter by subject, research organization, product type, and publication year. Get clean JSON for prior-art search, RAG pipelines, and grant tracking.

- **URL**: https://apify.com/compute-edge/osti-scientific-reports-scraper.md
- **Developed by:** [Compute Edge](https://apify.com/compute-edge) (community)
- **Categories:** Lead generation
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $3.00 / 1,000 results

This Actor is paid per event. You are not charged for the Apify platform usage, but only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anywhere from a few seconds to a few hours,
and optionally produces well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation, available as a [Markdown index](https://docs.apify.com/llms.txt) and as [Markdown full text](https://docs.apify.com/llms-full.txt).


# README

## DOE OSTI Scientific & Technical Reports Scraper

Extract structured metadata for **U.S. Department of Energy (DOE) Office of Scientific and Technical Information (OSTI)** reports — the authoritative repository of DOE-funded research output covering fusion energy, nuclear science, renewable energy, climate, materials science, high-energy physics, biology, and more.

OSTI hosts over **3 million scientific records**, including technical reports, journal articles, conference papers, theses, and patents produced by DOE national laboratories and grantees. This Actor turns that catalog into clean JSON ready for competitive intelligence, prior-art search, RAG pipelines, and research-trend analysis.

### Key Features

- **Free-text search** — Query across titles, abstracts, and authors
- **Subject filtering** — Narrow to specific DOE subject categories (fusion, solar, fission, etc.)
- **Research organization filter** — Pull records from a specific national lab (Oak Ridge, Argonne, NREL, LBNL, etc.)
- **Product-type filter** — Technical Reports, Journal Articles, Theses, Conference Papers, Patents
- **Publication year filtering** — Restrict results to a range of publication years
- **No authentication required** — Public OSTI API, no keys needed
- **Rich metadata** — Authors, sponsor orgs, research orgs, DOE contract numbers, DOIs, subjects
- **Direct OSTI URL** — Link to the full record on osti.gov for each result

### Output Data Fields

| Field | Description |
|-------|-------------|
| `ostiId` | OSTI record identifier |
| `title` | Report title |
| `doi` | Digital Object Identifier |
| `publicationDate` | Publication date |
| `productType` | Report type (Technical Report, Journal Article, etc.) |
| `country` | Country of publication |
| `language` | Language |
| `authors` | List of author names |
| `sponsorOrgs` | DOE sponsoring organizations |
| `researchOrgs` | Research organizations (national labs, universities) |
| `subjects` | DOE subject categories |
| `doeContractNumber` | DOE contract / grant number |
| `description` | Abstract text |
| `fullTextUrl` | Link to the full text PDF (when available) |
| `ostiUrl` | Direct link to the OSTI record page |
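
For downstream code it can help to coerce every dataset item into one predictable shape. The sketch below is an assumption-laden illustration, not the Actor's own schema: it takes the field names from the table above, assumes the list fields (`authors`, `sponsorOrgs`, `researchOrgs`, `subjects`) arrive as lists of strings, and assumes any field may be missing or `null` on a given record. `OstiRecord` and `normalize_record` are hypothetical names.

```python
from typing import Any, Optional, TypedDict, cast


class OstiRecord(TypedDict):
    # Scalar fields: assumed to be strings when present, None when missing.
    ostiId: Optional[str]
    title: Optional[str]
    doi: Optional[str]
    publicationDate: Optional[str]
    productType: Optional[str]
    country: Optional[str]
    language: Optional[str]
    doeContractNumber: Optional[str]
    description: Optional[str]
    fullTextUrl: Optional[str]
    ostiUrl: Optional[str]
    # List fields: assumed to be lists of strings; normalized to [] when missing.
    authors: list[str]
    sponsorOrgs: list[str]
    researchOrgs: list[str]
    subjects: list[str]


SCALAR_FIELDS = (
    "ostiId", "title", "doi", "publicationDate", "productType", "country",
    "language", "doeContractNumber", "description", "fullTextUrl", "ostiUrl",
)
LIST_FIELDS = ("authors", "sponsorOrgs", "researchOrgs", "subjects")


def normalize_record(item: dict[str, Any]) -> OstiRecord:
    """Coerce one raw dataset item into the OstiRecord shape."""
    record: dict[str, Any] = {}
    for name in SCALAR_FIELDS:
        value = item.get(name)
        # Keep None for missing values; stringify anything else (e.g. numeric IDs).
        record[name] = None if value is None else str(value)
    for name in LIST_FIELDS:
        value = item.get(name)
        if value is None:
            record[name] = []
        elif isinstance(value, list):
            record[name] = [str(v) for v in value]
        else:
            # A lone scalar is wrapped so callers can always iterate.
            record[name] = [str(value)]
    return cast(OstiRecord, record)
```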

### How to Scrape OSTI Scientific Reports

1. Open the **DOE OSTI Scientific Reports Scraper** in the Apify Store
2. (Optional) Enter a search query (e.g., "small modular reactor")
3. (Optional) Filter by subject, research organization, product type, or year range
4. Set **Max Results** (default 200, maximum 10,000; 0 means no limit beyond the OSTI page cap)
5. Click **Start** — clean JSON is written to the default dataset
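
The same steps can be expressed as an Actor input object. A minimal sketch, assuming that leaving a filter out behaves the same as its empty-string default in the input schema; `build_input` is a hypothetical helper, not part of the Actor or the Apify client.

```python
import json
from typing import Optional


def build_input(
    query: str = "",
    subject: str = "",
    research_org: str = "",
    product_type: str = "",
    year_from: Optional[int] = None,
    year_to: Optional[int] = None,
    max_results: int = 200,
) -> dict:
    """Build an Actor input dict, omitting filters that were left empty."""
    candidate = {
        "query": query,
        "subject": subject,
        "researchOrg": research_org,
        "productType": product_type,
        # The input schema types the year fields as strings ("YYYY").
        "publicationYearFrom": "" if year_from is None else str(year_from),
        "publicationYearTo": "" if year_to is None else str(year_to),
    }
    # Drop blank filters (assumed equivalent to the schema's "" defaults).
    run_input = {key: value for key, value in candidate.items() if value.strip()}
    run_input["maxResults"] = max_results
    return run_input


if __name__ == "__main__":
    print(json.dumps(build_input(query="small modular reactor", year_from=2015, max_results=100)))
```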

### Pricing

This Actor uses pay-per-event pricing, charged per result (from $3.00 per 1,000 results); Apify platform usage is not billed separately. The OSTI API is fast and bandwidth-light, so a typical 1,000-record run finishes in well under a minute.
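
As a rough budgeting aid, the result charge scales linearly with the number of results. This sketch assumes the listed starting rate of $3.00 per 1,000 results applies to every result; the Store lists the price as "from", so the actual event price may differ. `estimate_cost_usd` is a hypothetical helper.

```python
from decimal import ROUND_HALF_UP, Decimal

# Assumed rate: the Store's "from $3.00 / 1,000 results" starting price.
PRICE_PER_1000_RESULTS = Decimal("3.00")


def estimate_cost_usd(result_count: int, price_per_1000: Decimal = PRICE_PER_1000_RESULTS) -> Decimal:
    """Estimate the per-result charge for a run, rounded to whole cents."""
    if result_count < 0:
        raise ValueError("result_count must be non-negative")
    raw = price_per_1000 * result_count / 1000
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

At the assumed rate, the default `maxResults` of 200 comes to about $0.60, and the schema maximum of 10,000 to about $30.00.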

### Use Cases

- **Competitive intelligence** — Track which national labs publish in your domain
- **Prior-art search** — Find DOE-funded prior art for patent prosecution
- **Grant analysis** — See which DOE contracts produce specific research outputs
- **RAG pipelines** — Build domain-specific knowledge bases on energy / physics / materials
- **Researcher discovery** — Identify domain experts and collaborators at national labs
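
For the RAG and competitive-intelligence use cases, a common first step is to flatten each record into a text passage and to tally records per research organization. A minimal sketch, assuming items carry the output fields listed above with list fields as lists of strings; `to_rag_passage` and `count_by_research_org` are illustrative names, not part of the Actor.

```python
from collections import Counter
from typing import Any, Iterable


def to_rag_passage(item: dict[str, Any]) -> str:
    """Flatten one OSTI record into a plain-text passage for embedding."""
    lines = [f"Title: {item.get('title') or 'Untitled'}"]
    # Optional list fields are joined with "; " and skipped when empty.
    for label, field in (("Authors", "authors"), ("Research orgs", "researchOrgs"), ("Subjects", "subjects")):
        values = item.get(field) or []
        if values:
            lines.append(f"{label}: " + "; ".join(values))
    if item.get("publicationDate"):
        lines.append(f"Published: {item['publicationDate']}")
    if item.get("description"):
        lines.append(f"Abstract: {item['description']}")
    # Keep the record link so retrieved passages can be cited back to OSTI.
    if item.get("ostiUrl"):
        lines.append(f"Source: {item['ostiUrl']}")
    return "\n".join(lines)


def count_by_research_org(items: Iterable[dict[str, Any]]) -> Counter:
    """Count records per research organization (a record may list several)."""
    counts: Counter = Counter()
    for item in items:
        # set() avoids double-counting an org repeated within one record.
        counts.update(set(item.get("researchOrgs") or []))
    return counts
```

`count_by_research_org(items).most_common(5)` then lists the five organizations that appear on the most records.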

### Legal & Disclaimer

This Actor only accesses publicly available metadata via the official OSTI public API. It does not bypass authentication or access controls. OSTI content is U.S. Government work and generally in the public domain in the United States. Users are responsible for compliance with OSTI terms of service and any third-party rights in linked full-text documents. This tool is provided "as is" without warranty of any kind.

# Actor input schema

## `query` (type: `string`):

Free-text search across title, abstract, and authors (maps to the OSTI `q` parameter).

## `subject` (type: `string`):

Subject category filter (e.g., 'fusion energy', 'solar', 'nuclear').

## `researchOrg` (type: `string`):

Filter by research organization (e.g., 'Oak Ridge National Laboratory').

## `productType` (type: `string`):

Filter by product type (e.g., 'Technical Report', 'Journal Article', 'Thesis/Dissertation').

## `publicationYearFrom` (type: `string`):

Earliest publication year (YYYY).

## `publicationYearTo` (type: `string`):

Latest publication year (YYYY).

## `maxResults` (type: `integer`):

Maximum number of records to return, from 0 to 10,000 (default 200). Use 0 for no limit beyond the OSTI page cap.
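
Because the year fields are free-form strings, a client-side check can catch typos before a run starts. The sketch applies the constraints stated in the schema (four-digit `YYYY` years, `maxResults` between 0 and 10,000) plus one assumption of its own: that `publicationYearFrom` should not be later than `publicationYearTo`. `validate_input` is a hypothetical helper; the Actor itself may validate differently.

```python
import re
from typing import Any

YEAR_PATTERN = re.compile(r"\d{4}")


def validate_input(run_input: dict[str, Any]) -> list[str]:
    """Return a list of problems found in an Actor input dict (empty if none)."""
    problems: list[str] = []
    for field in ("publicationYearFrom", "publicationYearTo"):
        value = run_input.get(field, "")
        # Empty strings are the schema default and mean "no bound".
        if value and not YEAR_PATTERN.fullmatch(str(value)):
            problems.append(f"{field} must be a four-digit year (YYYY), got {value!r}")
    year_from = str(run_input.get("publicationYearFrom", ""))
    year_to = str(run_input.get("publicationYearTo", ""))
    # Assumption: an inverted range is a mistake rather than an intentional query.
    if YEAR_PATTERN.fullmatch(year_from) and YEAR_PATTERN.fullmatch(year_to):
        if int(year_from) > int(year_to):
            problems.append("publicationYearFrom is later than publicationYearTo")
    max_results = run_input.get("maxResults", 200)
    # bool is a subclass of int, so reject it explicitly.
    if not isinstance(max_results, int) or isinstance(max_results, bool):
        problems.append("maxResults must be an integer")
    elif not 0 <= max_results <= 10000:
        problems.append("maxResults must be between 0 and 10000")
    return problems
```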

## Actor input object example

```json
{
  "query": "",
  "subject": "",
  "researchOrg": "",
  "productType": "",
  "publicationYearFrom": "",
  "publicationYearTo": "",
  "maxResults": 200
}
````

# API

You can run this Actor programmatically using our API. Below are code examples for JavaScript, Python, and the CLI, along with the MCP server setup and the OpenAPI specification.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input (fields left out use their schema defaults)
const input = {
    query: 'small modular reactor',
    maxResults: 100,
};

// Run the Actor and wait for it to finish
const run = await client.actor("compute-edge/osti-scientific-reports-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input (fields left out use their schema defaults)
run_input = {
    "query": "small modular reactor",
    "maxResults": 100,
}

# Run the Actor and wait for it to finish
run = client.actor("compute-edge/osti-scientific-reports-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
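
To keep results on disk, for example as input to a RAG indexing job, JSON Lines is a convenient format. A minimal standard-library sketch, assuming `ostiId` uniquely identifies a record, so later items with a repeated `ostiId` are dropped; `save_jsonl` is a hypothetical helper. You could pass it the iterator from `client.dataset(run["defaultDatasetId"]).iterate_items()` in the example above.

```python
import json
from pathlib import Path
from typing import Any, Iterable, Union


def save_jsonl(items: Iterable[dict[str, Any]], path: Union[str, Path]) -> int:
    """Write items to a JSON Lines file, skipping repeated ostiId values.

    Returns the number of lines written.
    """
    seen: set[str] = set()
    written = 0
    with open(path, "w", encoding="utf-8") as handle:
        for item in items:
            osti_id = item.get("ostiId")
            if osti_id is not None:
                key = str(osti_id)
                if key in seen:
                    continue  # duplicate record; keep the first occurrence
                seen.add(key)
            # ensure_ascii=False keeps non-ASCII author names readable.
            handle.write(json.dumps(item, ensure_ascii=False) + "\n")
            written += 1
    return written
```

Items without an `ostiId` are always written, since there is nothing to de-duplicate them on.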

## CLI example

```bash
echo '{}' |
apify call compute-edge/osti-scientific-reports-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=compute-edge/osti-scientific-reports-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "DOE OSTI Scientific Reports Scraper",
        "description": "Extract metadata for U.S. Department of Energy OSTI scientific & technical reports. Filter by subject, research organization, product type, and publication year. Clean JSON for prior-art search, RAG pipelines, grant tracking.",
        "version": "0.1",
        "x-build-id": "GvyAgnexHsgaQYcgr"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/compute-edge~osti-scientific-reports-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-compute-edge-osti-scientific-reports-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/compute-edge~osti-scientific-reports-scraper/runs": {
            "post": {
                "operationId": "runs-sync-compute-edge-osti-scientific-reports-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/compute-edge~osti-scientific-reports-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-compute-edge-osti-scientific-reports-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "query": {
                        "title": "Search Query",
                        "type": "string",
                        "description": "Free-text search across title, abstract, authors (OSTI 'q' param).",
                        "default": ""
                    },
                    "subject": {
                        "title": "Subject",
                        "type": "string",
                        "description": "Subject category filter (e.g., 'fusion energy', 'solar', 'nuclear').",
                        "default": ""
                    },
                    "researchOrg": {
                        "title": "Research Organization",
                        "type": "string",
                        "description": "Filter by research organization (e.g., 'Oak Ridge National Laboratory').",
                        "default": ""
                    },
                    "productType": {
                        "title": "Product Type",
                        "type": "string",
                        "description": "Filter by product type (e.g., 'Technical Report', 'Journal Article', 'Thesis/Dissertation').",
                        "default": ""
                    },
                    "publicationYearFrom": {
                        "title": "Publication Year From",
                        "type": "string",
                        "description": "Earliest publication year (YYYY).",
                        "default": ""
                    },
                    "publicationYearTo": {
                        "title": "Publication Year To",
                        "type": "string",
                        "description": "Latest publication year (YYYY).",
                        "default": ""
                    },
                    "maxResults": {
                        "title": "Max Results",
                        "minimum": 0,
                        "maximum": 10000,
                        "type": "integer",
                        "description": "Maximum records to return (0 for unlimited up to OSTI page cap).",
                        "default": 200
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
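
In environments without an Apify client library, the `run-sync-get-dataset-items` endpoint from the specification above can be called over plain HTTP. A minimal standard-library sketch: it passes the token as the `token` query parameter, as the spec describes, and assumes the response body is a JSON array of dataset items. `build_run_request`, `run_and_get_items`, and the `APIFY_TOKEN` environment variable are choices made for this illustration. Synchronous endpoints wait for the run to finish, so for long runs the `/runs` endpoint is the better fit.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "compute-edge~osti-scientific-reports-scraper"


def build_run_request(token: str, run_input: dict[str, Any]) -> urllib.request.Request:
    """Prepare a POST to run-sync-get-dataset-items (nothing is sent yet)."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    body = json.dumps(run_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_get_items(token: str, run_input: dict[str, Any], timeout: float = 360.0) -> list[dict[str, Any]]:
    """Run the Actor synchronously and return its dataset items."""
    request = build_run_request(token, run_input)
    # urlopen raises urllib.error.HTTPError on non-2xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    import os

    items = run_and_get_items(os.environ["APIFY_TOKEN"], {"query": "small modular reactor", "maxResults": 50})
    print(f"Fetched {len(items)} records")
```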
