Under maintenance

Pricing

Pay per usage

Similarity Graph From Embeddings

Builds a similarity graph from vector embeddings. Fetches vectors from URLs, computes pairwise cosine similarities with vectorized linear algebra, and connects each point to its K nearest neighbors, revealing clusters and relationships in your high-dimensional data.

Rating

0.0

(0)

Developer

Matej Hamas

Last modified: 12 days ago

How it works

1. Fetch vectors: Downloads JSON data from each provided URL. Each URL must return a JSON object mapping IDs to float arrays: { "id1": [0.1, 0.2, ...], "id2": [0.3, 0.4, ...] }.
2. Validate: Checks that all vectors have the same dimensionality and that no ID appears more than once across URLs.
3. Compute similarities: Builds the full cosine similarity matrix with vectorized numpy operations. Each vector is L2-normalized, then a single matrix multiplication (executed by BLAS) yields all pairwise similarities.
4. Filter edges: Applies the global top-percentage threshold first, then the per-node limit on outgoing edges, then the per-node limit on incoming edges. Each filter keeps only the highest-similarity edges and uses argpartition, which selects the top K in linear time without a full sort.
5. Build graph: Each vector becomes a node. Surviving edges become directed edges with cosine similarity as the weight.
6. Store output: The graph is saved as graph.json in the default key-value store. A link to the file is pushed to the default dataset and displayed on the output tab.
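The similarity step can be illustrated with a minimal numpy sketch. This is not the Actor's source code; the function names `cosine_similarity_matrix` and `top_k_neighbors` are hypothetical, and the handling of zero-length vectors is an assumption.

```python
import numpy as np


def cosine_similarity_matrix(vectors) -> np.ndarray:
    """Return all pairwise cosine similarities for a (n, d) array of row vectors."""
    vectors = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    # Assumption: a zero vector keeps norm 1 so it gets similarity 0 to everything.
    norms[norms == 0] = 1.0
    unit = vectors / norms
    # One matrix multiply of the normalized rows gives every pairwise cosine.
    return unit @ unit.T


def top_k_neighbors(sim: np.ndarray, k: int) -> np.ndarray:
    """Return, per row, the indices of the k most similar other rows (unordered)."""
    n = sim.shape[0]
    k = min(k, n - 1)
    if k <= 0:
        return np.empty((n, 0), dtype=int)
    masked = sim.astype(float)  # astype returns a copy, so sim is not modified
    np.fill_diagonal(masked, -np.inf)  # a node is never its own neighbor
    # argpartition moves the k largest entries of each row into the last k slots.
    return np.argpartition(masked, -k, axis=1)[:, -k:]
```

With the three example vectors used elsewhere on this page (apple, banana, car), `top_k_neighbors(sim, 1)` pairs apple and banana with each other, while car is assigned apple as its closest neighbor even though their similarity is much lower.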

Input

Field	Type	Required	Default	Description
`urls`	`string[]`	Yes	—	List of URLs, each returning JSON of form `{ id: [float, float, ...] }`. All vectors must have the same dimensionality.
`topPercentage`	`number`	No	`100`	Keep only the top X% of all pairwise similarities (globally). Lower values produce sparser graphs. Applied before the per-node edge limits.
`maxOutgoingEdgesPerNode`	`integer`	No	—	For each node, keep only the top K most similar neighbors as outgoing edges. If not set, all edges surviving the top percentage filter are kept. Applied before the incoming edges limit.
`maxIncomingEdgesPerNode`	`integer`	No	—	For each node, keep only the top K highest-similarity incoming edges. If not set, incoming edges are not limited. Applied after the outgoing edges limit.
`keepAtLeastOneEdge`	`boolean`	No	`false`	When enabled, each node always keeps its most similar neighbor regardless of other filtering. Prevents isolated nodes in the graph.
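The order in which the filters apply is easier to follow in code. The sketch below is one possible reading of the table above, not the Actor's implementation: `filter_edges` and `_limit_per_row` are hypothetical names, the rounding of the top-percentage count and the treatment of ties (all tied values are kept) are assumptions, and `keepAtLeastOneEdge` is modeled as re-adding each node's best outgoing edge after the other filters.

```python
import numpy as np


def _limit_per_row(keep: np.ndarray, scores: np.ndarray, k: int) -> np.ndarray:
    """Keep at most k surviving entries per row, preferring the highest scores."""
    n_cols = keep.shape[1]
    if k >= n_cols:
        return keep.copy()
    if k <= 0:
        return np.zeros_like(keep)
    masked = np.where(keep, scores, -np.inf)
    top = np.argpartition(masked, -k, axis=1)[:, -k:]
    limited = np.zeros_like(keep)
    np.put_along_axis(limited, top, True, axis=1)
    # Drop picks that were not surviving edges (rows with fewer than k survivors).
    return limited & keep


def filter_edges(sim, top_percentage=100.0, max_out=None, max_in=None,
                 keep_at_least_one=False) -> np.ndarray:
    """Return a boolean mask where mask[i, j] means edge i -> j is kept."""
    n = sim.shape[0]
    scores = np.asarray(sim).astype(float)  # copy, so the input is untouched
    np.fill_diagonal(scores, -np.inf)       # no self-loops
    keep = ~np.eye(n, dtype=bool)

    # 1. Global threshold over all directed off-diagonal pairs (assumption).
    if top_percentage < 100 and n > 1:
        candidates = scores[keep]
        count = max(1, int(round(candidates.size * top_percentage / 100)))
        threshold = np.partition(candidates, -count)[-count]
        keep &= scores >= threshold

    # 2. Outgoing limit: rows are sources.
    if max_out is not None:
        keep = _limit_per_row(keep, scores, max_out)

    # 3. Incoming limit: columns are targets, so work on the transpose.
    if max_in is not None:
        keep = _limit_per_row(keep.T, scores.T, max_in).T

    # 4. Optionally restore each node's single most similar neighbor.
    if keep_at_least_one and n > 1:
        keep[np.arange(n), np.argmax(scores, axis=1)] = True
    return keep
```

In this sketch, a node whose only outgoing edge loses the incoming-edge competition at its target ends up isolated unless `keep_at_least_one` is set, which matches the stated purpose of `keepAtLeastOneEdge`.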

Example input

{
    "urls": [
        "https://example.com/embeddings-part1.json",
        "https://example.com/embeddings-part2.json"
    ],
    "maxOutgoingEdgesPerNode": 10,
    "maxIncomingEdgesPerNode": 20
}

Expected URL response format

Each URL must return a JSON object where keys are string IDs and values are arrays of floats (all the same length):

{
    "apple": [0.12, 0.85, 0.33, 0.67],
    "banana": [0.11, 0.82, 0.30, 0.71],
    "car": [0.90, 0.05, 0.88, 0.12]
}
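Before pointing the Actor at your URLs, it can help to check the payloads locally against the two stated rules (equal dimensionality, no duplicate IDs across URLs). The following is a minimal sketch with a hypothetical helper `merge_embeddings`; how the Actor itself reports violations is not documented here, so raising `ValueError` is an assumption.

```python
def merge_embeddings(payloads):
    """Merge already-parsed JSON objects of the form {id: [float, ...]} into one dict.

    Raises ValueError on duplicate IDs, non-numeric values, or mixed dimensions.
    """
    merged = {}
    dim = None
    for index, payload in enumerate(payloads):
        if not isinstance(payload, dict):
            raise ValueError(f"payload {index} is not a JSON object")
        for vec_id, vec in payload.items():
            if vec_id in merged:
                raise ValueError(f"duplicate ID across payloads: {vec_id!r}")
            is_number_list = isinstance(vec, list) and len(vec) > 0 and all(
                isinstance(x, (int, float)) and not isinstance(x, bool) for x in vec
            )
            if not is_number_list:
                raise ValueError(f"vector for {vec_id!r} must be a non-empty list of numbers")
            if dim is None:
                dim = len(vec)  # the first vector fixes the expected dimensionality
            elif len(vec) != dim:
                raise ValueError(
                    f"vector {vec_id!r} has {len(vec)} dimensions, expected {dim}"
                )
            merged[vec_id] = [float(x) for x in vec]
    return merged
```

Each element of `payloads` would be the result of `json.loads` on one URL's response body.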

Output

Key-value store

The Actor stores a single file graph.json in the default key-value store. Example:

{
    "version": "1",
    "nodes": [
        { "id": "apple" },
        { "id": "banana" },
        { "id": "car" }
    ],
    "edges": [
        { "source": "apple", "target": "banana", "weight": 0.987 },
        { "source": "banana", "target": "apple", "weight": 0.987 }
    ]
}

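Nodes: One per vector ID from the input data.
Edges: Directed. Outgoing edges per node are limited by maxOutgoingEdgesPerNode, incoming edges by maxIncomingEdgesPerNode. Edge weight is the cosine similarity (documented as 0 to 1; cosine similarity in general ranges from -1 to 1, so vectors pointing in opposing directions can yield negative values).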
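Because edges are directed, a consumer usually wants a per-node view of outgoing neighbors. Below is a minimal sketch that turns a parsed graph.json into an adjacency mapping; `build_adjacency` is a hypothetical helper, and sorting neighbors by descending weight is a choice made here, not something the Actor guarantees.

```python
def build_adjacency(graph):
    """Map each node ID to its outgoing (target, weight) pairs, strongest first."""
    adjacency = {node["id"]: [] for node in graph["nodes"]}
    for edge in graph["edges"]:
        # setdefault tolerates an edge whose source is missing from the node list.
        adjacency.setdefault(edge["source"], []).append((edge["target"], edge["weight"]))
    for neighbors in adjacency.values():
        neighbors.sort(key=lambda pair: pair[1], reverse=True)
    return adjacency
```

For the example graph above, `build_adjacency(graph)["apple"]` is `[("banana", 0.987)]` and `"car"` maps to an empty list, since no edge starts at that node.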

Graph JSON schema

The output graph.json conforms to the following JSON schema:

{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Similarity Graph",
    "type": "object",
    "required": ["version", "nodes", "edges"],
    "properties": {
        "version": {
            "type": "string",
            "const": "1"
        },
        "nodes": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["id"],
                "properties": {
                    "id": {
                        "type": "string",
                        "description": "Vector ID from the input data."
                    }
                }
            }
        },
        "edges": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["source", "target", "weight"],
                "properties": {
                    "source": {
                        "type": "string",
                        "description": "ID of the source node."
                    },
                    "target": {
                        "type": "string",
                        "description": "ID of the target node."
                    },
                    "weight": {
                        "type": "number",
                        "description": "Cosine similarity between source and target vectors."
                    }
                }
            }
        }
    }
}
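If you prefer not to pull in a JSON Schema library, the constraints above are small enough to check by hand. The sketch below mirrors the schema's required keys and types using only the standard library; `check_graph` is a hypothetical helper and is not a general JSON Schema validator.

```python
def check_graph(graph):
    """Return a list of human-readable problems; an empty list means the graph passes."""
    if not isinstance(graph, dict):
        return ["graph must be a JSON object"]
    problems = [f"missing required key: {key}"
                for key in ("version", "nodes", "edges") if key not in graph]
    if "version" in graph and graph["version"] != "1":
        problems.append('version must be the string "1"')

    nodes = graph.get("nodes", [])
    if not isinstance(nodes, list):
        problems.append("nodes must be an array")
        nodes = []
    for i, node in enumerate(nodes):
        if not isinstance(node, dict) or not isinstance(node.get("id"), str):
            problems.append(f"nodes[{i}] must be an object with a string id")

    edges = graph.get("edges", [])
    if not isinstance(edges, list):
        problems.append("edges must be an array")
        edges = []
    for i, edge in enumerate(edges):
        if not isinstance(edge, dict):
            problems.append(f"edges[{i}] must be an object")
            continue
        for field in ("source", "target"):
            if not isinstance(edge.get(field), str):
                problems.append(f"edges[{i}].{field} must be a string")
        weight = edge.get("weight")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(weight, bool) or not isinstance(weight, (int, float)):
            problems.append(f"edges[{i}].weight must be a number")
    return problems
```

A typical use is to call `check_graph(json.load(f))` right after downloading graph.json and to stop processing if the returned list is not empty.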

Dataset

The default dataset contains a single record with the public URL of the graph JSON file:

{
    "graphUrl": "https://api.apify.com/v2/key-value-stores/<store-id>/records/graph.json"
}
