Similarity Graph From Embeddings
Pricing
Pay per usage
Under maintenance
Builds a similarity graph from vector embeddings. Fetches vectors from URLs, computes pairwise cosine similarities using optimized linear algebra, and connects each point to its K nearest neighbors, revealing hidden clusters and relationships in your high-dimensional data.
Developer
Matej Hamas
12 days ago
Last modified
Apify Actor that builds a similarity graph from vector embeddings using cosine similarity. It fetches vectors from provided URLs, computes pairwise cosine similarities, filters edges by configurable outgoing and incoming limits per node, and outputs a graph JSON.
How it works
- Fetch vectors: Downloads JSON data from each provided URL. Each URL must return a JSON object mapping IDs to float arrays: { "id1": [0.1, 0.2, ...], "id2": [0.3, 0.4, ...] }.
- Validate: All vectors must have the same dimensionality. Duplicate IDs across URLs are not allowed.
- Compute similarities: Builds a full cosine similarity matrix using vectorized numpy operations (L2-normalize + single matrix multiply via BLAS).
- Filter edges: Applies global top-percentage threshold, then limits outgoing edges per node, then limits incoming edges per node. Each filter keeps only the highest-similarity edges using argpartition for O(n) performance.
- Build graph: Each vector becomes a node. Surviving edges become directed edges with cosine similarity as the weight.
- Store output: The graph is saved as graph.json in the default key-value store. A link to the file is pushed to the default dataset and displayed on the output tab.
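The "Compute similarities" step can be sketched as follows. This is a minimal illustration, not the Actor's actual code: the function name `cosine_similarity_matrix` and the handling of zero-length vectors are assumptions, and the sample vectors are taken from the response-format example on this page.

```python
import numpy as np


def cosine_similarity_matrix(vectors: np.ndarray) -> np.ndarray:
    """Return the full pairwise cosine similarity matrix for row vectors."""
    # L2-normalize each row. Zero vectors are left as zeros here; this is an
    # assumption, the source does not say how the Actor treats them.
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms == 0.0, 1.0, norms)
    # One matrix multiply yields every pairwise dot product of unit vectors,
    # which is exactly the cosine similarity.
    return unit @ unit.T


ids = ["apple", "banana", "car"]
vectors = np.array([
    [0.12, 0.85, 0.33, 0.67],
    [0.11, 0.82, 0.30, 0.71],
    [0.90, 0.05, 0.88, 0.12],
])
sim = cosine_similarity_matrix(vectors)
```

Here `sim[i, j]` holds the similarity between `ids[i]` and `ids[j]`; the matrix is symmetric with ones on the diagonal, so the memory cost grows quadratically with the number of vectors.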
Input
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| urls | string[] | Yes | none | List of URLs, each returning JSON of form { id: [float, float, ...] }. All vectors must have the same dimensionality. |
| topPercentage | number | No | 100 | Keep only the top X% of all pairwise similarities (globally). Lower values produce sparser graphs. Applied before the per-node edge limits. |
| maxOutgoingEdgesPerNode | integer | No | none | For each node, keep only the top K most similar neighbors as outgoing edges. If not set, all edges surviving the top percentage filter are kept. Applied before the incoming edges limit. |
| maxIncomingEdgesPerNode | integer | No | none | For each node, keep only the top K highest-similarity incoming edges. If not set, incoming edges are not limited. Applied after the outgoing edges limit. |
| keepAtLeastOneEdge | boolean | No | false | When enabled, each node always keeps its most similar neighbor regardless of other filtering. Prevents isolated nodes in the graph. |
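The order in which the filters apply matters, so a rough sketch may help. The code below is an illustration under assumptions, not the Actor's implementation: the function names, the exclusion of self-loops, the rounding of the percentage, and the tie handling at the global cutoff are all hypothetical, and `keepAtLeastOneEdge` is not modeled.

```python
import numpy as np


def top_k_mask_per_row(scores: np.ndarray, k: int) -> np.ndarray:
    """Boolean mask keeping the k largest finite entries in each row."""
    n_rows, n_cols = scores.shape
    mask = np.zeros(scores.shape, dtype=bool)
    if k <= 0:
        return mask
    k = min(k, n_cols)
    # argpartition moves the k largest entries of each row into the last k
    # positions without fully sorting the row.
    idx = np.argpartition(scores, n_cols - k, axis=1)[:, n_cols - k:]
    mask[np.arange(n_rows)[:, None], idx] = True
    # Entries already removed (marked -inf) must not come back.
    return mask & np.isfinite(scores)


def filter_edges(sim, top_percentage=100.0, max_out=None, max_in=None):
    """Return surviving edges as (source_index, target_index, weight)."""
    scores = np.array(sim, dtype=float)
    np.fill_diagonal(scores, -np.inf)  # assumption: no self-loops

    # 1. Global threshold: keep the top X% of all candidate edges.
    #    Ties at the cutoff are all kept in this sketch.
    if top_percentage < 100.0:
        candidates = scores[np.isfinite(scores)]
        keep = int(round(candidates.size * top_percentage / 100.0))
        if keep == 0:
            scores[:] = -np.inf
        else:
            kth = candidates.size - keep
            cutoff = np.partition(candidates, kth)[kth]
            scores[scores < cutoff] = -np.inf

    # 2. Outgoing limit: each row is a source node.
    if max_out is not None:
        scores[~top_k_mask_per_row(scores, max_out)] = -np.inf

    # 3. Incoming limit: each column is a target node, so use the transpose.
    if max_in is not None:
        scores[~top_k_mask_per_row(scores.T, max_in).T] = -np.inf

    src, dst = np.nonzero(np.isfinite(scores))
    return [(int(s), int(t), float(scores[s, t])) for s, t in zip(src, dst)]


demo = np.array([
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.5],
    [0.2, 0.5, 1.0],
])
edges = filter_edges(demo, max_out=1, max_in=1)
```

With `max_out=1`, node 2 initially points at node 1, but node 1 already has a stronger incoming edge from node 0, so the incoming limit of 1 drops it and node 2 ends up isolated. That is the situation `keepAtLeastOneEdge` is described as preventing.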
Example input
{
  "urls": [
    "https://example.com/embeddings-part1.json",
    "https://example.com/embeddings-part2.json"
  ],
  "maxOutgoingEdgesPerNode": 10,
  "maxIncomingEdgesPerNode": 20
}
Expected URL response format
Each URL must return a JSON object where keys are string IDs and values are arrays of floats (all the same length):
{
  "apple": [0.12, 0.85, 0.33, 0.67],
  "banana": [0.11, 0.82, 0.30, 0.71],
  "car": [0.90, 0.05, 0.88, 0.12]
}
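Before uploading embedding files, it can be useful to check them locally against the rules stated above (same dimensionality, no duplicate IDs across files). The helper below is a hypothetical pre-flight check written for this page, using only the standard library; its name and error messages are not part of the Actor.

```python
import math


def merge_embeddings(payloads):
    """Merge parsed JSON payloads (one per URL) into one {id: vector} dict.

    Raises ValueError on a malformed payload, a duplicate ID, or a
    dimensionality mismatch.
    """
    merged = {}
    dim = None
    for index, payload in enumerate(payloads):
        if not isinstance(payload, dict):
            raise ValueError(f"payload {index}: expected a JSON object")
        for vec_id, vector in payload.items():
            if vec_id in merged:
                raise ValueError(f"duplicate ID {vec_id!r} in payload {index}")
            if not isinstance(vector, list) or not vector:
                raise ValueError(f"{vec_id!r}: expected a non-empty array")
            for x in vector:
                # bool is a subclass of int in Python, so reject it explicitly.
                if (isinstance(x, bool) or not isinstance(x, (int, float))
                        or not math.isfinite(x)):
                    raise ValueError(f"{vec_id!r}: invalid value {x!r}")
            if dim is None:
                dim = len(vector)
            elif len(vector) != dim:
                raise ValueError(
                    f"{vec_id!r}: length {len(vector)}, expected {dim}")
            merged[vec_id] = [float(x) for x in vector]
    return merged


part1 = {"apple": [0.12, 0.85, 0.33, 0.67], "banana": [0.11, 0.82, 0.30, 0.71]}
part2 = {"car": [0.90, 0.05, 0.88, 0.12]}
all_vectors = merge_embeddings([part1, part2])
```

Each element of `payloads` stands for the parsed body of one URL, so the same function covers both the single-file and the multi-file case.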
Output
Key-value store
The Actor stores a single file graph.json in the default key-value store. Example:
{
  "version": "1",
  "nodes": [
    { "id": "apple" },
    { "id": "banana" },
    { "id": "car" }
  ],
  "edges": [
    { "source": "apple", "target": "banana", "weight": 0.987 },
    { "source": "banana", "target": "apple", "weight": 0.987 }
  ]
}
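- Nodes: One per vector ID from the input data.
- Edges: Directed. Outgoing edges per node are limited by maxOutgoingEdgesPerNode, incoming edges by maxIncomingEdgesPerNode. Edge weight is the cosine similarity between the two vectors (typically between 0 and 1 for embedding data; cosine similarity in general ranges from -1 to 1).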
Graph JSON schema
The output graph.json conforms to the following JSON schema:
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Similarity Graph",
  "type": "object",
  "required": ["version", "nodes", "edges"],
  "properties": {
    "version": { "type": "string", "const": "1" },
    "nodes": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["id"],
        "properties": {
          "id": { "type": "string", "description": "Vector ID from the input data." }
        }
      }
    },
    "edges": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["source", "target", "weight"],
        "properties": {
          "source": { "type": "string", "description": "ID of the source node." },
          "target": { "type": "string", "description": "ID of the target node." },
          "weight": { "type": "number", "description": "Cosine similarity between source and target vectors." }
        }
      }
    }
  }
}
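A consumer of graph.json might want to check the structure and turn it into an adjacency map. The sketch below hand-rolls the checks the schema describes instead of using a JSON Schema library; the function names are hypothetical, and the check that every edge endpoint is a known node goes beyond what the schema itself requires.

```python
def check_graph(graph: dict) -> None:
    """Raise ValueError if the parsed graph does not match the schema."""
    for key in ("version", "nodes", "edges"):
        if key not in graph:
            raise ValueError(f"missing required key {key!r}")
    if graph["version"] != "1":
        raise ValueError(f"unsupported version {graph['version']!r}")

    node_ids = set()
    for node in graph["nodes"]:
        if not isinstance(node, dict) or not isinstance(node.get("id"), str):
            raise ValueError(f"node without a string id: {node!r}")
        node_ids.add(node["id"])

    for edge in graph["edges"]:
        if not isinstance(edge, dict):
            raise ValueError(f"edge is not an object: {edge!r}")
        for key in ("source", "target"):
            if not isinstance(edge.get(key), str):
                raise ValueError(f"edge without a string {key}: {edge!r}")
        weight = edge.get("weight")
        if isinstance(weight, bool) or not isinstance(weight, (int, float)):
            raise ValueError(f"edge without a numeric weight: {edge!r}")
        # Extra check, not part of the schema: endpoints must be known nodes.
        if edge["source"] not in node_ids or edge["target"] not in node_ids:
            raise ValueError(f"edge references an unknown node: {edge!r}")


def adjacency(graph: dict) -> dict:
    """Map each node ID to its outgoing (target, weight) pairs, strongest first."""
    out = {node["id"]: [] for node in graph["nodes"]}
    for edge in graph["edges"]:
        out[edge["source"]].append((edge["target"], edge["weight"]))
    for neighbors in out.values():
        neighbors.sort(key=lambda pair: pair[1], reverse=True)
    return out


example = {
    "version": "1",
    "nodes": [{"id": "apple"}, {"id": "banana"}, {"id": "car"}],
    "edges": [
        {"source": "apple", "target": "banana", "weight": 0.987},
        {"source": "banana", "target": "apple", "weight": 0.987},
    ],
}
check_graph(example)
neighbors = adjacency(example)
```

Nodes with no surviving edges, such as "car" in the example, still appear in the adjacency map with an empty list.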
Dataset
The default dataset contains a single record with the public URL of the graph JSON file:
{"graphUrl": "https://api.apify.com/v2/key-value-stores/<store-id>/records/graph.json"}
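Given a dataset record like the one above, the graph can be downloaded with the standard library alone. This is a minimal sketch that assumes the URL is readable without authentication, as the "public URL" wording suggests; the function name `fetch_graph` and the timeout value are illustrative.

```python
import json
import urllib.request


def fetch_graph(dataset_record: dict, timeout: float = 30.0) -> dict:
    """Download and parse the graph JSON referenced by a dataset record."""
    url = dataset_record["graphUrl"]
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

The returned dictionary has the version, nodes and edges keys described in the Graph JSON schema section.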

