# ETF Holdings Overlap Intelligence Scraper (`leafy-dev-jr/etf-holdings-overlap-intelligence-scraper`) Actor

Reveal hidden ETF overlap by scraping holdings data, normalizing fund exposures, and showing which stocks your ETFs secretly share.

- **URL**: https://apify.com/leafy-dev-jr/etf-holdings-overlap-intelligence-scraper.md
- **Developed by:** [Leafy](https://apify.com/leafy-dev-jr) (community)
- **Categories:** Other, Automation
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

from $1.00 / 1,000 results

This Actor is paid per event and usage. You are charged a fixed price for specific events plus the cost of Apify platform usage.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## ETF Holdings & Overlap Intelligence Scraper

Collect ETF holdings from **public issuer sources**, normalize the many different
issuer formats into **one clean schema**, and calculate **ETF overlap, shared
holdings, and hidden concentration exposure**.

You give it a list of ETF tickers. It returns:

- Every holding inside each ETF (one dataset row per holding).
- Which stocks are shared across your ETFs.
- How much overlap exists between each pair of ETFs.
- Your true **combined weighted exposure** to each underlying company across your whole ETF portfolio.
- Basic ETF metadata when the source exposes it.

> ⚠️ **Disclaimer** — This actor provides ETF holdings data processing and overlap
> calculations **for research purposes only**. It does **not** provide financial
> advice, investment recommendations, or predictions. Overlap and exposure figures
> are **estimates** based on each issuer's publicly reported holdings.

---

### Who it's for

- Beginner and intermediate investors who want to see if their ETFs secretly own the same companies.
- ETF investors checking for hidden concentration (e.g. how much NVIDIA you *really* own across VOO + QQQ + SPY).
- Financial bloggers and personal-finance creators.
- Research analysts and developers building ETF comparison tools.

---

### What it does (the flow)

1. You provide ETF tickers.
2. For each ticker it picks the best public source (issuer-first).
3. It downloads the issuer's public holdings file (CSV / XLSX) or public data endpoint.
4. It normalizes every holding into one consistent schema.
5. It writes one dataset row per holding.
6. It calculates **pairwise overlap** between every pair of ETFs.
7. It calculates **combined weighted exposure** across all your ETFs, using your portfolio allocation.
8. It saves summary objects to the key-value store.
9. It records what worked, what failed, and which source was used per ticker.

A single failing ticker never crashes the run — it is reported in `FAILED_TICKERS` and the rest continue.
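
The per-ticker isolation described above can be sketched as follows. This is a minimal illustration, not the Actor's own code: `SourceError`, `fetch_all`, the `fetch_holdings` callback, and the `unknown_error` fallback value are hypothetical names. Only the four `FAILED_TICKERS` fields come from this README.

```python
class SourceError(Exception):
    """Hypothetical error type carrying what a FAILED_TICKERS entry needs."""

    def __init__(self, message, error_type, source_tried):
        super().__init__(message)
        self.error_type = error_type
        self.source_tried = source_tried


def fetch_all(tickers, fetch_holdings):
    """Fetch every ticker independently and collect failures instead of raising."""
    holdings_by_etf, failed_tickers = {}, []
    for ticker in tickers:
        try:
            holdings_by_etf[ticker] = fetch_holdings(ticker)
        except Exception as exc:  # a failing ticker is recorded and the loop continues
            failed_tickers.append({
                "ticker": ticker,
                # "unknown_error" is an assumed fallback label, not a documented value
                "errorType": getattr(exc, "error_type", "unknown_error"),
                "errorMessage": str(exc),
                "sourceTried": getattr(exc, "source_tried", None),
            })
    return holdings_by_etf, failed_tickers
```

Catching the exception inside the loop, rather than around it, is what keeps one blocked or unsupported ticker from discarding the results already collected for the others.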

---

### Supported sources in this MVP

| Issuer | Method | Status |
| --- | --- | --- |
| **Vanguard** | Public portfolio-holdings JSON endpoint | ✅ Working (VOO, VTI, VUG, …) — top 500 holdings |
| **State Street / SPDR** | Public daily-holdings XLSX download | ✅ Working (SPY, XLF, XLK, DIA, …) |
| **Invesco** (bonus) | Public holdings JSON API (`dng-api.invesco.com`) | ✅ Working (QQQ, QQQM, RSP) |
| **Schwab** (bonus) | Public "all holdings" page (server-rendered HTML table) | ✅ Working (SCHD, SCHG, SCHB, …) |
| **iShares / BlackRock** | Public holdings CSV (product page resolved via the public iShares product screener) | ⚠️ Works with a US residential proxy; the CSV endpoint is geo/consent-gated on datacenter IPs |
| **SEC N-PORT** | Official monthly filings | 🚧 Planned — returns `not_implemented` |

**Source strategy:** issuer-first. The actor first tries the issuer it recognizes
for a ticker (see the built-in ticker map), then probes the other adapters as a
fallback. It deliberately does **not** use ETF.com, ETFdb, Yahoo, or Nasdaq as a
holdings source — the issuer is the authoritative source.
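
As a rough sketch of the issuer-first strategy, the function below returns the order in which adapters would be tried for a ticker. The adapter names, their fallback order, and the four-entry ticker map are assumptions made for illustration; the real built-in map lives in the Actor's source and is not reproduced here.

```python
# Adapter names mirror the issuers in the table above; the fallback order is assumed.
ADAPTERS = ["vanguard", "spdr", "invesco", "schwab", "ishares"]

# Tiny hypothetical extract of a ticker map, using pairings named in this README.
TICKER_TO_ISSUER = {"VOO": "vanguard", "SPY": "spdr", "QQQ": "invesco", "SCHD": "schwab"}


def adapter_order(ticker, ticker_map=TICKER_TO_ISSUER, adapters=ADAPTERS):
    """Return adapters to try: the recognized issuer first, then the rest as fallback."""
    known = ticker_map.get(ticker.strip().upper())
    if known is None:
        return list(adapters)  # unknown ticker: probe every adapter in order
    return [known] + [name for name in adapters if name != known]
```

For example, `adapter_order("qqq")` puts `invesco` first and keeps the remaining adapters as fallbacks, while an unrecognized ticker simply gets the full probing order.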

#### About the iShares geo/consent gating

iShares serves a location/consent **HTML interstitial** instead of the holdings
CSV when the request comes from a non-US or datacenter IP. The actor **detects**
this and reports `source_blocked` rather than saving garbage — it does **not**
attempt to bypass the consent gate, CAPTCHAs, logins, or bot protection. Running
on Apify with a **US residential proxy** (`RESIDENTIAL`, country `US`) is the
recommended way to reach iShares. Vanguard, SPDR, Invesco and Schwab all work on
the default proxy.
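
One simple way to detect an interstitial is to check whether a response that should be CSV is actually HTML. The sketch below is an assumption about how such a check could look, not the Actor's implementation; the function names and the `ok` status are hypothetical, and only `source_blocked` is taken from this README.

```python
def looks_like_html(body, content_type=""):
    """Heuristic: an HTML content type or an HTML document prefix."""
    if "text/html" in content_type.lower():
        return True
    # Ignore a byte-order mark and leading whitespace before inspecting the prefix.
    head = body.lstrip("\ufeff \t\r\n")[:200].lower()
    return head.startswith("<!doctype html") or head.startswith("<html")


def classify_holdings_response(body, content_type=""):
    """Report a gated response instead of passing HTML on to the CSV parser."""
    return "source_blocked" if looks_like_html(body, content_type) else "ok"
```

Classifying the response before parsing means an interstitial page never reaches the normalizer, so no malformed rows are written to the dataset.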

---

### Calculations performed

#### Pairwise overlap (`PAIRWISE_OVERLAP`)

For every pair of ETFs (A, B):

- `sharedHoldingsCount` — holdings appearing in both.
- `overlapByCountPercent` — `sharedHoldingsCount / etfAHoldingsCount * 100`.
- `overlapWeightInEtfA` — sum of A's weights for holdings also in B.
- `overlapWeightInEtfB` — sum of B's weights for holdings also in A.
- `estimatedOverlapWeight` — the average of the two weight overlaps. **This is an estimate**, not a precise financial overlap score.
- `topSharedHoldings` — the most heavily shared names, sorted by combined weight.
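
The fields above can be computed with a few set operations. The following is a minimal sketch, assuming each ETF is represented as a mapping from `normalizedHoldingKey` to weight in percent; the function name, the `top_n` parameter, and the sample weights are illustrative.

```python
def pairwise_overlap(etf_a, etf_b, top_n=5):
    """Overlap metrics for two ETFs given as {normalizedHoldingKey: weight %}."""
    shared = set(etf_a) & set(etf_b)
    weight_in_a = sum(etf_a[key] for key in shared)
    weight_in_b = sum(etf_b[key] for key in shared)
    # Rank shared names by their combined weight across both funds.
    top = sorted(shared, key=lambda key: etf_a[key] + etf_b[key], reverse=True)[:top_n]
    return {
        "sharedHoldingsCount": len(shared),
        # Relative to ETF A's holdings count, as defined above.
        "overlapByCountPercent": len(shared) / len(etf_a) * 100 if etf_a else 0.0,
        "overlapWeightInEtfA": weight_in_a,
        "overlapWeightInEtfB": weight_in_b,
        "estimatedOverlapWeight": (weight_in_a + weight_in_b) / 2,
        "topSharedHoldings": top,
    }


# Made-up weights, for illustration only.
fund_a = {"NVDA": 8.0, "AAPL": 6.0, "JPM": 1.0, "XOM": 1.0}
fund_b = {"NVDA": 10.0, "AAPL": 8.0, "COST": 2.0}
result = pairwise_overlap(fund_a, fund_b)
```

Note that `overlapByCountPercent` is asymmetric: swapping A and B changes the denominator, whereas `estimatedOverlapWeight` is the same in both directions.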

#### Combined weighted exposure (`COMBINED_EXPOSURE`)

Your true exposure to each underlying company across your whole ETF portfolio:

````

combined exposure(holding) = Σ  holdingWeightInEtf% × portfolioAllocation% / 100
(over every ETF you hold)

````

Example: a holding at 7.25% of an ETF to which you allocate 40% of your portfolio
contributes `7.25 × 0.40 = 2.90%` of combined exposure.
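
The formula can be sketched in a few lines. This is an illustrative version, assuming holdings are given as `{etf: {normalizedHoldingKey: weight %}}` and allocations as percentages that already sum to 100; the function name and the QQQ and AAPL figures are made up.

```python
from collections import defaultdict


def combined_exposure(holdings_by_etf, allocations):
    """Sum weight% x allocation% / 100 per holding, sorted by exposure descending."""
    exposure = defaultdict(float)
    for etf, holdings in holdings_by_etf.items():
        allocation = allocations.get(etf, 0.0)
        for key, weight in holdings.items():
            exposure[key] += weight * allocation / 100
    return sorted(exposure.items(), key=lambda item: item[1], reverse=True)


# The 7.25% / 40% figures repeat the example above; the other numbers are invented.
holdings = {"VOO": {"NVDA": 7.25, "AAPL": 6.0}, "QQQ": {"NVDA": 9.0}}
ranked = combined_exposure(holdings, {"VOO": 40, "QQQ": 60})
```

Here NVDA receives 2.90 from VOO plus 5.40 from QQQ, so it ranks first at roughly 8.30% combined exposure even though no single fund holds it at that weight.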

Portfolio weights come from the weight column of the `portfolio` input. If you
don't provide any, all successful ETFs are weighted **equally**. Weights that
don't sum to 100 are **normalized**, and **failed tickers are ignored** (the
remaining weights are re-normalized to 100).
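
A minimal sketch of that normalization follows. The function name is hypothetical, and the handling of a mix of blank and filled weights (blank treated as zero) is an assumption, since this README only specifies the all-blank case.

```python
def normalize_allocations(weights, successful):
    """Scale the weights of successfully fetched ETFs so they sum to 100."""
    # Failed tickers are dropped by only keeping the successful ones.
    kept = {ticker: float(weights.get(ticker) or 0) for ticker in successful}
    if not kept:
        return {}
    total = sum(kept.values())
    if total <= 0:
        # No usable weights provided: weight every successful ETF equally.
        return {ticker: 100 / len(kept) for ticker in kept}
    return {ticker: weight * 100 / total for ticker, weight in kept.items()}
```

With inputs of 40 / 30 / 30 and one failed ticker, the two survivors are rescaled to about 57.1 and 42.9, so combined exposure is still expressed against a full 100% portfolio.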

#### Holding matching

Holdings are matched across ETFs by a `normalizedHoldingKey`, in priority order:
**ticker → CUSIP → ISIN → cleaned company name**. The cleaner strips share-class
tokens and legal suffixes so `NVIDIA CORP` and `NVIDIA CORPORATION` match.
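
The priority order can be illustrated with a short sketch. The suffix list and the share-class pattern below are assumptions chosen for the example, not the Actor's actual cleaning rules, and both function names are hypothetical.

```python
import re

# Assumed, non-exhaustive list of legal suffixes to drop from company names.
LEGAL_SUFFIXES = {"CORP", "CORPORATION", "INC", "INCORPORATED", "CO", "COMPANY",
                  "LTD", "LIMITED", "PLC"}


def clean_company_name(name):
    """Uppercase, drop punctuation, share-class tokens, and legal suffixes."""
    text = re.sub(r"[^A-Z0-9 ]+", " ", name.upper())
    text = re.sub(r"\b(?:CLASS|CL)\s+[A-Z]\b", " ", text)  # e.g. "CLASS A", "CL C"
    return " ".join(token for token in text.split() if token not in LEGAL_SUFFIXES)


def normalized_holding_key(ticker=None, cusip=None, isin=None, name=None):
    """First available identifier wins: ticker, then CUSIP, then ISIN, then name."""
    for value in (ticker, cusip, isin):
        if value and value.strip():
            return value.strip().upper()
    if name and name.strip():
        return clean_company_name(name) or None
    return None
```

Falling back to the cleaned name only when no identifier exists keeps the match as strict as the source data allows, at the cost of occasional misses when two issuers spell a name very differently.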

---

### Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `portfolio` | array | *(required)* | Your ETFs. In the UI this is a two-column list: **ticker** on the left, **weight** (optional) on the right. Stored as `[{ "key": "VOO", "value": "40" }, …]`. |
| `includeHoldings` | boolean | `true` | Fetch and save each ETF's holdings. |
| `includeMetadata` | boolean | `true` | Save basic ETF metadata when available. |
| `calculateOverlap` | boolean | `true` | Calculate pairwise overlap + combined exposure. |
| `maxHoldingsPerEtf` | integer | `0` | `0` = all holdings; `>0` = top N by weight per ETF. |
| `sourcePreference` | enum | `issuer` | `issuer` \| `auto` (both issuer-first) \| `sec_nport_later` (planned; falls back to issuer). |
| `proxyConfiguration` | object | `{ useApifyProxy: true }` | Proxy settings. A US residential proxy is recommended for iShares. |
| `debugMode` | boolean | `false` | Log source URLs, detected file types, and detected header columns. |

The ETF list and portfolio weights are a **single input** — a two-column list in
the UI (ticker | weight). Leave the weight column blank to weight ETFs equally,
or fill it to set allocations (they need not sum to 100 — they're normalized).

The parser also accepts a plain string list (`["VOO", "QQQ:30"]`) and the legacy
separate `tickers` (array) + `portfolioWeights` (object) inputs, for backward
compatibility and easy API use. `portfolio` is the recommended input.
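
To make the two accepted shapes concrete, here is a small sketch of a parser that handles both the key/value rows and the `"TICKER:WEIGHT"` strings. It is an illustration under assumptions: the function names are hypothetical, and treating an unparseable weight as blank is a choice made for this example.

```python
def _to_weight(text):
    """Return a float weight, or None when the weight is blank or unparseable."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def parse_portfolio(raw):
    """Return [(ticker, weight or None)] from key/value rows or 'TICKER:WEIGHT' strings."""
    parsed = []
    for entry in raw:
        if isinstance(entry, dict):
            ticker, weight = str(entry.get("key") or ""), entry.get("value")
        else:
            ticker, _, weight = str(entry).partition(":")
        ticker = ticker.strip().upper()
        if ticker:  # skip rows without a ticker
            parsed.append((ticker, _to_weight(weight)))
    return parsed
```

Both `[{"key": "VOO", "value": "40"}]` and `["VOO:40"]` yield `[("VOO", 40.0)]`, and a blank weight comes back as `None` so that equal weighting can be applied later.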

#### Example input

```json
{
  "portfolio": [
    { "key": "VOO", "value": "40" },
    { "key": "QQQ", "value": "30" },
    { "key": "SCHD", "value": "30" }
  ],
  "includeHoldings": true,
  "includeMetadata": true,
  "calculateOverlap": true,
  "maxHoldingsPerEtf": 0,
  "sourcePreference": "issuer",
  "proxyConfiguration": { "useApifyProxy": true },
  "debugMode": false
}
````

***

### Output

#### Dataset — one row per holding

```json
{
  "etfTicker": "VOO",
  "etfName": "Vanguard S&P 500 ETF",
  "issuer": "Vanguard",
  "holdingTicker": "NVDA",
  "holdingName": "NVIDIA Corp.",
  "holdingIdentifier": null,
  "cusip": "67066G104",
  "isin": "US67066G1040",
  "sector": null,
  "assetClass": null,
  "country": null,
  "weight": 7.89,
  "shares": 636185341,
  "marketValue": 134324172898.74,
  "sourceType": "issuer",
  "sourceName": "Vanguard",
  "sourceUrl": "https://investor.vanguard.com/investment-products/etfs/profile/api/VOO/portfolio-holding/stock",
  "asOfDate": "2026-05-31",
  "rankInEtf": 1,
  "rawHoldingName": "NVIDIA Corp.",
  "rawHoldingTicker": "NVDA",
  "normalizedHoldingKey": "NVDA",
  "scrapedAt": "2026-07-03T00:26:38.807Z"
}
```

Unavailable fields are `null` — the actor never fabricates data.

#### Key-value store outputs

| Key | Contents |
| --- | --- |
| `SUMMARY` | High-level run summary: successes, failures, portfolio weights used, highest-overlap pair, top combined exposures, source notes. |
| `ETF_METADATA` | Array of per-ETF metadata (fund name, issuer, holdings count, as-of date, source URL). |
| `OVERLAP_MATRIX` | Compact matrix of `estimatedOverlapWeight` and `sharedHoldingsCount` keyed by ticker. |
| `PAIRWISE_OVERLAP` | Array of per-pair overlap objects (see above). |
| `COMBINED_EXPOSURE` | Array of holdings sorted by combined weighted exposure descending. |
| `FAILED_TICKERS` | Array of `{ ticker, errorType, errorMessage, sourceTried }` for anything that couldn't be fetched. |
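
After a run, these records can be read back with the Apify Python client. The helper below is a sketch: `load_run_outputs` is a hypothetical name, and it assumes the summaries are written to the run's default key-value store and that `get_record` returns a dictionary with a `value` entry (or `None` for a missing key), as in the dictionary-style client used in the API examples.

```python
OUTPUT_KEYS = ("SUMMARY", "ETF_METADATA", "OVERLAP_MATRIX",
               "PAIRWISE_OVERLAP", "COMBINED_EXPOSURE", "FAILED_TICKERS")


def load_run_outputs(client, run, keys=OUTPUT_KEYS):
    """Read the summary records of a finished run into one dictionary."""
    store = client.key_value_store(run["defaultKeyValueStoreId"])
    outputs = {}
    for key in keys:
        record = store.get_record(key)  # None when the key was not written
        outputs[key] = record["value"] if record else None
    return outputs


# Usage, assuming `client` and `run` come from ApifyClient as in the API section:
#   outputs = load_run_outputs(client, run)
#   print(outputs["SUMMARY"])
```

Keys that a run did not write, for example `PAIRWISE_OVERLAP` when `calculateOverlap` is `false`, would simply map to `None` under this sketch.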

***

### How to run locally

Requires Node.js 18+.

```bash
npm install

# Provide input (local Apify storage):
mkdir -p storage/key_value_stores/default
cat > storage/key_value_stores/default/INPUT.json <<'JSON'
{
  "portfolio": ["VOO:60", "SPY:40"],
  "proxyConfiguration": { "useApifyProxy": false },
  "debugMode": true
}
JSON

npm start
```

Holdings appear in `storage/datasets/default/`, and the summary files in
`storage/key_value_stores/default/`.

Run the analysis unit tests (offline, deterministic):

```bash
npm run test:analysis
```

### How to run on Apify

1. Push the actor: `apify push` (or import this repo on the Apify console).
2. Open the actor, fill in the input form (tickers, portfolio weights).
3. For iShares coverage, set **Proxy** to Apify Proxy with the
   `RESIDENTIAL` group and country `US`. Vanguard, SPDR, Invesco and Schwab work on the default proxy.
4. Run. Read the **Dataset** tab for holdings and the **Storage / Key-value store**
   tab for `SUMMARY`, `PAIRWISE_OVERLAP`, `COMBINED_EXPOSURE`, etc.

***

### Project structure

```
src/
  main.js                     # Orchestrator: input → fetch → normalize → analyze → save
  sources/
    issuerRouter.js           # Ticker → issuer routing + probing
    ishares.js                # iShares CSV (product resolved via public screener)
    spdr.js                   # SPDR daily-holdings XLSX
    vanguard.js               # Vanguard public holdings JSON
    invesco.js                # Invesco holdings JSON API (bonus issuer, e.g. QQQ)
    schwab.js                 # Schwab server-rendered holdings table (bonus, e.g. SCHD)
    secNport.js               # SEC N-PORT placeholder (not_implemented)
  normalize/
    normalizeHolding.js       # Raw holding → canonical schema + matching key
    normalizeTicker.js        # Ticker cleanup + de-dup
    normalizePrice.js         # Number/weight parsing helpers
  analysis/
    calculateOverlap.js       # Pairwise overlap + overlap matrix
    calculateCombinedExposure.js  # Portfolio-weighted exposure
    calculateSummary.js       # SUMMARY builder
  utils/
    http.js                   # got-scraping client + error classification + HTML detection
    csv.js                    # CSV → rows
    excel.js                  # XLSX → rows
    html.js                   # HTML table → rows (for server-rendered holdings)
    table.js                  # Dynamic header detection + column mapping
    dates.js                  # Loose date → ISO
    logging.js                # Logger wrapper
INPUT_SCHEMA.json
Dockerfile
test/analysis.test.js         # Deterministic overlap/exposure/normalization tests
```

The source-adapter architecture means adding a new issuer is just a new file in
`src/sources/` plus a routing entry — the normalizer and analysis layers are shared.

***

### Known limitations

- Holdings availability depends on each issuer's public data. iShares gates its CSV behind a location/consent interstitial (see above); a US residential proxy is recommended for it.
- Data can be **delayed** depending on the source — Vanguard's public feed is typically month-end; SPDR is daily; Invesco is monthly; Schwab is a few days lagged.
- **Vanguard's public endpoint returns only the top ~500 holdings.** For funds of about that size (VOO ≈ 500 holdings) this is effectively complete, but for broad funds (VTI ≈ 3,600 holdings) overlap is measured on the top 500 only (~85–90% of fund weight).
- Some ETFs are **not supported** in this MVP. Covered issuers: Vanguard, SPDR, Invesco, Schwab, and iShares (with proxy). First Trust, VanEck, Global X, WisdomTree, ProShares, etc. are future work.
- Schwab publishes market value abbreviated (e.g. "$4.2B"); those are parsed to approximate numbers, while weight and share counts are exact.
- Some holdings lack a ticker, sector, or weight in the source file; those fields are `null` rather than guessed.
- Different issuers publish different fields (e.g. SPY's file omits sector and market value; Vanguard's feed omits sector) — normalized output reflects only what the source provides.
- **Overlap and combined exposure are estimates** based on reported holdings and weights, not a precise financial overlap score.
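
To illustrate the abbreviated market values mentioned above, the sketch below expands a string such as "$4.2B" into an approximate number. Only the "$4.2B" form is documented here; the `K`, `M`, and `T` suffixes, the comma handling, and the function name are assumptions.

```python
import re

# Assumed suffix multipliers; this README only shows the "B" form.
_MULTIPLIERS = {"K": 1e3, "M": 1e6, "B": 1e9, "T": 1e12}
_PATTERN = re.compile(r"\s*\$?\s*([0-9][0-9,]*(?:\.[0-9]+)?)\s*([KMBT])?\s*", re.IGNORECASE)


def parse_abbreviated_value(text):
    """Convert '$4.2B' to about 4.2e9; return None when the text is not a number."""
    if text is None:
        return None
    match = _PATTERN.fullmatch(text)
    if not match:
        return None
    number = float(match.group(1).replace(",", ""))
    suffix = (match.group(2) or "").upper()
    return number * _MULTIPLIERS.get(suffix, 1.0)
```

Because the source rounds to one decimal place, a parsed value is only accurate to that precision, which is why `marketValue` for such rows should be read as approximate.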

***

### Compliance notes

- Uses only **public** issuer data and public downloadable holdings files.
- Does **not** scrape private data, bypass paywalls, logins, CAPTCHAs, or bot protection.
- Does **not** use paid APIs. (Morningstar and similar licensed sources are intentionally avoided.)
- Data sources and as-of dates are labeled on every row (`sourceName`, `sourceUrl`, `asOfDate`).
- Output is **data and calculations only — not financial advice.**

***

### Future improvements (not in this MVP)

- Full SEC N-PORT historical holdings parser.
- Sector / country / asset-class overlap breakdowns.
- Expense-ratio and dividend-yield comparison.
- More issuers (First Trust, VanEck, Global X, WisdomTree, ProShares).
- Enriching iShares metadata (expense ratio, AUM, inception date) from the product screener even when the holdings CSV is gated.
- Recurring monitoring and change alerts.

# Actor input Schema

## `portfolio` (type: `array`):

Add one ETF per row: ticker on the left, portfolio weight on the right. Enter 2+ ETFs so overlap can be calculated. The weight is optional — leave it blank to weight all ETFs equally. Weights don't need to add up to 100 (they're normalized automatically). Vanguard, SPDR, Invesco and Schwab funds work on the default proxy; iShares (IVV, IWM, AGG…) needs a US residential proxy.

## `includeHoldings` (type: `boolean`):

When true, fetch and save the individual holdings of each ETF.

## `includeMetadata` (type: `boolean`):

When true, collect basic ETF metadata (fund name, issuer, as-of date, holdings count) when it is available in the public source.

## `calculateOverlap` (type: `boolean`):

When true, calculate pairwise ETF overlap and combined weighted exposure across all input ETFs.

## `maxHoldingsPerEtf` (type: `integer`):

If 0, return all available holdings. If greater than 0, keep only the top N holdings per ETF after sorting by weight descending.
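
A minimal sketch of this truncation rule, assuming each holding is a dictionary with a `weight` field as in the dataset rows. The function name is hypothetical, and placing holdings with a `null` weight last is an assumption.

```python
def limit_holdings(holdings, max_holdings_per_etf=0):
    """Return all holdings when the limit is 0, otherwise the top N by weight."""
    if max_holdings_per_etf <= 0:
        return list(holdings)
    ranked = sorted(
        holdings,
        # Holdings without a weight sort after every weighted holding.
        key=lambda h: h["weight"] if h.get("weight") is not None else float("-inf"),
        reverse=True,
    )
    return ranked[:max_holdings_per_etf]
```

Keep in mind that overlap computed on truncated lists only reflects the retained holdings, so a small N can understate how much two funds share.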

## `sourcePreference` (type: `string`):

Which data strategy to prefer. `issuer` and `auto` both use public issuer sources first. `sec_nport_later` is reserved for a future SEC N-PORT parser and currently behaves like `issuer`.

## `proxyConfiguration` (type: `object`):

Proxy settings. Using Apify Proxy is recommended for reliable access to issuer download endpoints.

## `debugMode` (type: `boolean`):

When true, log source URLs attempted, detected file types, and detected header columns.

## Actor input object example

```json
{
  "portfolio": [
    {
      "key": "VOO",
      "value": "40"
    },
    {
      "key": "QQQ",
      "value": "30"
    },
    {
      "key": "SCHD",
      "value": "30"
    }
  ],
  "includeHoldings": true,
  "includeMetadata": true,
  "calculateOverlap": true,
  "maxHoldingsPerEtf": 0,
  "sourcePreference": "issuer",
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "debugMode": false
}
```

# Actor output Schema

## `holdings` (type: `string`):

One row per ETF holding.

## `summary` (type: `string`):

High-level summary of the run.

## `pairwiseOverlap` (type: `string`):

Overlap between every pair of ETFs.

## `combinedExposure` (type: `string`):

Portfolio-weighted exposure to each underlying holding.

## `etfMetadata` (type: `string`):

Per-ETF metadata.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "portfolio": [
        {
            "key": "VOO",
            "value": "40"
        },
        {
            "key": "QQQ",
            "value": "30"
        },
        {
            "key": "SCHD",
            "value": "30"
        }
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("leafy-dev-jr/etf-holdings-overlap-intelligence-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "portfolio": [
        {"key": "VOO", "value": "40"},
        {"key": "QQQ", "value": "30"},
        {"key": "SCHD", "value": "30"},
    ],
}

# Run the Actor and wait for it to finish
run = client.actor("leafy-dev-jr/etf-holdings-overlap-intelligence-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "portfolio": [
    {
      "key": "VOO",
      "value": "40"
    },
    {
      "key": "QQQ",
      "value": "30"
    },
    {
      "key": "SCHD",
      "value": "30"
    }
  ]
}' |
apify call leafy-dev-jr/etf-holdings-overlap-intelligence-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=leafy-dev-jr/etf-holdings-overlap-intelligence-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "ETF Holdings Overlap Intelligence Scraper",
        "description": "Reveal hidden ETF overlap by scraping holdings data, normalizing fund exposures, and showing which stocks your ETFs secretly share.",
        "version": "0.1",
        "x-build-id": "ZUArFArUZ0YiAnobe"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/leafy-dev-jr~etf-holdings-overlap-intelligence-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-leafy-dev-jr-etf-holdings-overlap-intelligence-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/leafy-dev-jr~etf-holdings-overlap-intelligence-scraper/runs": {
            "post": {
                "operationId": "runs-sync-leafy-dev-jr-etf-holdings-overlap-intelligence-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/leafy-dev-jr~etf-holdings-overlap-intelligence-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-leafy-dev-jr-etf-holdings-overlap-intelligence-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "portfolio"
                ],
                "properties": {
                    "portfolio": {
                        "title": "ETF portfolio",
                        "type": "array",
                        "description": "Add one ETF per row: ticker on the left, portfolio weight on the right. Enter 2+ ETFs so overlap can be calculated. The weight is optional — leave it blank to weight all ETFs equally. Weights don't need to add up to 100 (they're normalized automatically). Vanguard, SPDR, Invesco and Schwab funds work on the default proxy; iShares (IVV, IWM, AGG…) needs a US residential proxy.",
                        "items": {
                            "type": "object",
                            "required": [
                                "key",
                                "value"
                            ],
                            "properties": {
                                "key": {
                                    "type": "string",
                                    "title": "Key"
                                },
                                "value": {
                                    "type": "string",
                                    "title": "Value"
                                }
                            }
                        }
                    },
                    "includeHoldings": {
                        "title": "Include holdings",
                        "type": "boolean",
                        "description": "When true, fetch and save the individual holdings of each ETF.",
                        "default": true
                    },
                    "includeMetadata": {
                        "title": "Include metadata",
                        "type": "boolean",
                        "description": "When true, collect basic ETF metadata (fund name, issuer, as-of date, holdings count) when it is available in the public source.",
                        "default": true
                    },
                    "calculateOverlap": {
                        "title": "Calculate overlap",
                        "type": "boolean",
                        "description": "When true, calculate pairwise ETF overlap and combined weighted exposure across all input ETFs.",
                        "default": true
                    },
                    "maxHoldingsPerEtf": {
                        "title": "Max holdings per ETF",
                        "minimum": 0,
                        "type": "integer",
                        "description": "If 0, return all available holdings. If greater than 0, keep only the top N holdings per ETF after sorting by weight descending.",
                        "default": 0
                    },
                    "sourcePreference": {
                        "title": "Source preference",
                        "enum": [
                            "issuer",
                            "auto",
                            "sec_nport_later"
                        ],
                        "type": "string",
                        "description": "Which data strategy to prefer. 'issuer' and 'auto' both use public issuer sources first. 'sec_nport_later' is reserved for a future SEC N-PORT parser and currently behaves like 'issuer'.",
                        "default": "issuer"
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Proxy settings. Using Apify Proxy is recommended for reliable access to issuer download endpoints.",
                        "default": {
                            "useApifyProxy": true
                        }
                    },
                    "debugMode": {
                        "title": "Debug mode",
                        "type": "boolean",
                        "description": "When true, log source URLs attempted, detected file types, and detected header columns.",
                        "default": false
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
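
The `runsResponseSchema` above describes the object returned when a run is started. As a minimal sketch of how a caller might read it, the Python snippet below pulls out the fields most integrations need (`id`, `status`, `defaultDatasetId`, `usageTotalUsd`, `usageUsd`). The sample payload, its IDs, and the `summarize_run` helper are hypothetical and exist only for illustration; a real response would come from the Apify API or client.

```python
import json
from typing import Any

# Hypothetical payload shaped like runsResponseSchema; the IDs are made up.
SAMPLE_RUN_RESPONSE = """
{
  "data": {
    "id": "run-123",
    "actId": "act-456",
    "status": "READY",
    "defaultDatasetId": "dataset-789",
    "usageTotalUsd": 0.00005,
    "usageUsd": {"KEY_VALUE_STORE_WRITES": 0.00005, "DATASET_READS": 0}
  }
}
"""


def summarize_run(response: dict[str, Any]) -> dict[str, Any]:
    """Extract the commonly used fields from a run response (illustrative helper)."""
    data = response.get("data", {})
    usage_usd = data.get("usageUsd", {})
    # Keep only the usage items that actually cost something.
    billed = {name: cost for name, cost in usage_usd.items() if cost}
    return {
        "run_id": data.get("id"),
        "status": data.get("status"),
        "dataset_id": data.get("defaultDatasetId"),
        "total_usd": data.get("usageTotalUsd", 0.0),
        "billed_items": billed,
    }


summary = summarize_run(json.loads(SAMPLE_RUN_RESPONSE))
print(summary)
```

The helper uses `.get()` throughout because the schema marks none of these properties as required, so a defensive reader should tolerate missing keys. The `dataset_id` value is what you would pass on to fetch the Actor's results from its default dataset.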
