# CogniGraph Weaver

A powerful Apify Actor that converts web content into interactive knowledge graphs using artificial intelligence. This Python-based web crawler and AI system extracts content from websites, analyzes it with LLMs, and generates comprehensive knowledge graphs with learning paths.
## Features

- Web Crawling: Extracts content from multiple web pages using `requests` and `BeautifulSoup`
- AI-Powered Knowledge Graph Generation: Uses OpenAI or OpenRouter APIs to create structured knowledge graphs
- Graph Analysis: Analyzes graph topology, centrality, and connectivity
- Learning Path Generation: Automatically generates optimized learning paths through the knowledge graph
- Multilingual Support: Supports English, Spanish, French, German, and more
- Markdown Documentation: Generates comprehensive documentation in Markdown format
- Validation & Error Handling: Robust input validation and comprehensive error reporting
## Architecture

The actor runs as a pipeline:

1. Input Validation
2. API Keys Config
3. Crawling
4. AI Graph Generation
5. Graph Analysis
6. Docs
7. Save Results
### Components

- CrawlerService (`services/crawler_service.py`)
  - Web content extraction
  - HTML parsing with BeautifulSoup
  - Metadata collection
  - Content aggregation
- AIService (`services/ai_service.py`)
  - OpenAI API integration
  - OpenRouter API integration
  - Knowledge graph generation
  - JSON response parsing
- GraphService (`services/graph_service.py`)
  - Graph analysis (centrality, density, components)
  - Learning path generation using Dijkstra's algorithm
  - Graph traversal algorithms
  - Statistical analysis
- Utilities
  - Markdown Generator (`utils/markdown_generator.py`): Documentation generation
  - Validators (`utils/validators.py`): Input, graph, and path validation
  - Translations (`utils/translations.py`): Multilingual support
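GraphService's learning-path generation is described as Dijkstra-based. A self-contained sketch of that idea (illustrative only, not the actor's actual code) treats stronger relationships as cheaper to traverse, so the path follows the most closely related chain of concepts:

```python
import heapq

def learning_path(nodes, edges, start, goal):
    """Dijkstra's algorithm over the knowledge graph: stronger
    relationships (higher weight) cost less, so the returned path
    follows the most closely related chain of concepts."""
    adj = {n["id"]: [] for n in nodes}
    for e in edges:
        adj[e["source"]].append((e["target"], 1.0 - e.get("weight", 0.5)))
    dist, prev = {start: 0.0}, {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist[u]:
            continue  # stale queue entry
        for v, cost in adj[u]:
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    # Walk predecessors back from the goal
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

nodes = [{"id": "ai"}, {"id": "ml"}, {"id": "sl"}, {"id": "nn"}]
edges = [{"source": "ai", "target": "ml", "weight": 0.95},
         {"source": "ml", "target": "sl", "weight": 0.88},
         {"source": "sl", "target": "nn", "weight": 0.80},
         {"source": "ai", "target": "nn", "weight": 0.10}]
path = learning_path(nodes, edges, "ai", "nn")
print(path)  # ['ai', 'ml', 'sl', 'nn']
```

Note that the weak direct edge (weight 0.10) is skipped in favor of the chain of strong relationships, which is the behavior you want in a learning sequence.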
## Installation

### Prerequisites
- Python 3.11+
- pip
- Apify account and API token
### Local Development

```bash
# Clone the repository
cd cogni

# Install dependencies
pip install -r requirements.txt

# Create a .env file with your API keys
cp .env.example .env
# Edit .env and add your OpenAI/OpenRouter API keys

# Run locally
python3 main.py
```
### Deployment to Apify

```bash
# Login to Apify
apify login

# Deploy the actor
apify push

# Run the actor
apify call my-actor-2 --input='{"startUrls": [{"url": "https://example.com"}], "maxPages": 1}'
```
## API Keys

The actor requires at least one AI provider API key:

- OpenAI API Key (for the OpenAI provider)
  - Get it from: https://platform.openai.com/api-keys
  - Environment variable: `OPENAI_API_KEY`
- OpenRouter API Key (for the OpenRouter provider)
  - Get it from: https://openrouter.ai/keys
  - Environment variable: `OPENROUTER_API_KEY`
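How the actor resolves these keys internally is not shown here, but a minimal sketch of the kind of check involved might look like this (`resolve_provider` is a hypothetical helper, not part of the actor):

```python
import os

ENV_VARS = {"openai": "OPENAI_API_KEY", "openrouter": "OPENROUTER_API_KEY"}

def resolve_provider(requested: str) -> str:
    """Return the API key for the requested provider, failing fast
    when the matching environment variable is unset."""
    var = ENV_VARS.get(requested)
    if var is None:
        raise ValueError(f"Unknown AI provider: {requested!r}")
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} must be set to use the {requested} provider")
    return key

os.environ["OPENROUTER_API_KEY"] = "sk-demo"  # placeholder value for illustration
print(resolve_provider("openrouter"))  # sk-demo
```

Failing fast on a missing key keeps the error close to its cause instead of surfacing mid-run as an authentication failure.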
## Input Schema

```json
{
  "startUrls": [{ "url": "https://example.com" }],
  "maxPages": 5,
  "outputLanguage": "en",
  "aiProvider": "openrouter",
  "openaiModel": "gpt-oss-20b",
  "useBrowser": false,
  "generatePng": true,
  "generateHtml": true,
  "generateSvg": false,
  "generatePlotly": false,
  "pngWidth": 1200,
  "pngHeight": 800,
  "pngDpi": 100
}
```
### Parameters

- startUrls (array, required): List of URLs to crawl
  - url (string): The URL to fetch
- maxPages (integer, default: 5): Maximum number of pages to crawl
- outputLanguage (string, default: "en"): Language for output
  - Supported: "en", "es", "fr", "de", "pt", "it", "ja", "zh", "ko", "ar"
- aiProvider (string, default: "openrouter"): AI provider
  - Options: "openai", "openrouter"
- openaiModel (string, default: "gpt-oss-20b"): Model to use
  - Examples: "gpt-4", "gpt-3.5-turbo", "gpt-oss-20b"
- useBrowser (boolean, default: false): Whether to use browser rendering
  - Note: Currently not implemented; the actor uses simple HTTP requests
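Taken together, the rules above amount to a small validation step. A hypothetical sketch (the actor's real validator in `utils/validators.py` may behave differently):

```python
DEFAULTS = {
    "maxPages": 5,
    "outputLanguage": "en",
    "aiProvider": "openrouter",
    "openaiModel": "gpt-oss-20b",
    "useBrowser": False,
}

def validate_input(actor_input: dict) -> dict:
    """Require startUrls, check aiProvider, then apply documented defaults."""
    urls = actor_input.get("startUrls")
    if not urls or not all(isinstance(u, dict) and u.get("url") for u in urls):
        raise ValueError('startUrls is required: a non-empty list of {"url": ...} objects')
    provider = actor_input.get("aiProvider", DEFAULTS["aiProvider"])
    if provider not in ("openai", "openrouter"):
        raise ValueError('aiProvider must be "openai" or "openrouter"')
    return {**DEFAULTS, **actor_input}

cfg = validate_input({"startUrls": [{"url": "https://example.com"}]})
print(cfg["maxPages"], cfg["aiProvider"])  # 5 openrouter
```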
### Visualization Parameters

- generatePng (boolean, default: true): Generate a PNG image of the knowledge graph
  - Creates a static visualization with node colors and relationships
- generateHtml (boolean, default: true): Generate an interactive HTML visualization
  - Creates a fully interactive graph using PyVis with zoom, pan, and hover features
- generateSvg (boolean, default: false): Generate an SVG vector image
  - Creates a scalable vector graphic suitable for high-resolution displays
- generatePlotly (boolean, default: false): Generate a Plotly interactive chart
  - Creates an advanced interactive visualization with Plotly
- pngWidth (integer, default: 1200): PNG image width in pixels
- pngHeight (integer, default: 800): PNG image height in pixels
- pngDpi (integer, default: 100): PNG image DPI (dots per inch)
## Understanding the Outputs
CogniGraph Weaver generates 6 different datasets to provide you with comprehensive insights. Each output serves a specific purpose and is designed for different use cases.
### Output Datasets Overview
| Dataset | Purpose | Best For |
|---|---|---|
| Knowledge Graph (JSON) | Structured data | Developers, Data analysis, Visualization |
| Knowledge Graph (Markdown) | Human-readable report | End users, Documentation, Sharing |
| Visualizations | Graph images & interactive charts | Presentations, Reports, Exploration |
| Learning Paths (JSON) | Curated learning sequences | Educators, Students, Training |
| Analysis (JSON) | Graph statistics & insights | Researchers, Advanced analysis |
| Metadata (JSON) | Processing summary | Monitoring, Debugging, Overview |
### Detailed Output Breakdown

#### 1. Knowledge Graph (JSON) - For Developers & Data Scientists
Why this matters: This is the core structured data that represents the extracted knowledge in a machine-readable format.
Use cases:
- Import into graph databases (Neo4j, ArangoDB)
- Create interactive visualizations (D3.js, vis.js, Cytoscape)
- Build recommendation systems
- Perform advanced graph analysis
- Train ML models on knowledge structures
Structure:
```json
{
  "type": "knowledge_graph",
  "format": "json",
  "data": {
    "nodes": [
      {
        "id": "unique_identifier",
        "label": "Concept Name",
        "description": "Detailed explanation of the concept",
        "type": "concept|term|topic",
        "metadata": {
          "source": "which page it came from",
          "importance": 0.85
        }
      }
    ],
    "edges": [
      {
        "source": "node_id_1",
        "target": "node_id_2",
        "relationship": "is_a|part_of|relates_to|depends_on|causes",
        "weight": 0.9,
        "description": "How these concepts relate"
      }
    ]
  }
}
```
Example use with Python:

```python
from apify_client import ApifyClient

# Load the knowledge graph from the run's dataset
client = ApifyClient('YOUR_API_TOKEN')
dataset = client.dataset('DATASET_ID')
knowledge_graph = next(
    item for item in dataset.iterate_items()
    if item['type'] == 'knowledge_graph' and item['format'] == 'json'
)

# Analyze the graph
nodes = knowledge_graph['data']['nodes']
edges = knowledge_graph['data']['edges']
print(f"Found {len(nodes)} concepts with {len(edges)} relationships")
```
#### 2. Knowledge Graph (Markdown) - For End Users & Documentation
Why this matters: A human-readable, beautifully formatted document that tells the complete story of what was extracted and analyzed.
Use cases:
- Share findings with non-technical stakeholders
- Create documentation for projects
- Reference materials for learning
- Report generation
- Print or export to PDF
What's included:
```markdown
# CogniGraph Weaver Analysis Report

Generated: 2025-11-06

## Executive Summary

- Graph contains 15 concepts with 23 relationships
- Density: 0.21 (moderately connected)
- Main topic: Machine Learning fundamentals

## Knowledge Graph Structure

### Nodes (Concepts)

- **Machine Learning** (Central concept)
  Description: A field of computer science...
  Type: concept
  Connected to: 7 other concepts
- **Supervised Learning**
  Description: Learning from labeled data...
  Type: concept
  Connected to: 5 other concepts

### Relationships

- Machine Learning → **is_a** → Artificial Intelligence (weight: 0.95)
- Supervised Learning → **part_of** → Machine Learning (weight: 0.88)
...

## Learning Paths

### Path 1: Beginner Introduction (45 minutes)

1. Start with: Artificial Intelligence
2. Then: Machine Learning
3. Next: Supervised Learning
4. Finally: Neural Networks

### Path 2: Core Concepts (30 minutes)

1. Start with: Data
2. Then: Algorithms
3. Next: Training
4. Finally: Models

## Graph Analysis

- Most connected concepts: Machine Learning (8 connections)
- Root concepts (prerequisites): Artificial Intelligence
- Leaf concepts (end results): Deep Learning, Reinforcement Learning
- Average path length: 3.2 concepts
```
How to access:
- Download directly from Apify console
- Parse from dataset and save to file
- Use in CI/CD to generate automatic reports
#### 3. Visualizations - For Presentations & Exploration
Why this matters: Visual representations make complex knowledge graphs easy to understand, share, and explore. Different visualization formats serve different needs - from static reports to interactive exploration.
Use cases:
- Create presentations with visual knowledge maps
- Generate reports with graph images
- Interactive exploration of knowledge structures
- Share findings with non-technical stakeholders
- Embed in websites or documentation
- Print high-resolution diagrams
Available Formats:
PNG Image (Base64-encoded)
- Static visualization of the entire knowledge graph
- Color-coded nodes by type (concepts, terms, topics)
- Edge weights shown by line thickness
- Perfect for: Reports, presentations, printing
- Example access:
```python
import base64

visualization = next(
    item for item in dataset.iterate_items()
    if item['type'] == 'visualization' and item['format'] == 'png'
)
png_data = base64.b64decode(visualization['data'])
with open('knowledge_graph.png', 'wb') as f:
    f.write(png_data)
```
Interactive HTML (PyVis)
- Fully interactive graph visualization
- Zoom, pan, and drag to explore
- Hover over nodes to see descriptions
- Click and drag nodes to reorganize
- Perfect for: Web embedding, interactive reports, exploration
- Example access:
```python
visualization = next(
    item for item in dataset.iterate_items()
    if item['type'] == 'visualization' and item['format'] == 'html'
)
with open('knowledge_graph.html', 'w') as f:
    f.write(visualization['data'])
```
SVG Vector Graphic
- Scalable vector format for high-resolution displays
- Infinitely scalable without quality loss
- Perfect for: Print materials, high-DPI screens, professional diagrams
- Example access:
```python
visualization = next(
    item for item in dataset.iterate_items()
    if item['type'] == 'visualization' and item['format'] == 'svg'
)
with open('knowledge_graph.svg', 'w') as f:
    f.write(visualization['data'])
```
Plotly Interactive Chart
- Advanced interactive visualization
- Enhanced hover information and controls
- Statistical overlays and metrics
- Perfect for: Data analysis, research, technical exploration
- Example access:
```python
visualization = next(
    item for item in dataset.iterate_items()
    if item['type'] == 'visualization' and item['format'] == 'plotly'
)
with open('knowledge_graph_plotly.html', 'w') as f:
    f.write(visualization['data'])
```
Visualization Metadata: Each visualization includes metadata with:
- `width` and `height`: Image dimensions in pixels
- `dpi`: Dots per inch (for PNG)
- `nodes`: Number of nodes in the graph
- `edges`: Number of edges in the graph
Performance Tips:
- PNG generation is fastest and most reliable
- HTML visualizations are best for interactive exploration
- SVG is best for print or high-resolution needs
- Plotly offers the most advanced features but may be slower for large graphs
#### 4. Learning Paths (JSON) - For Educators & Students
Why this matters: This is the most valuable output for learning. AI automatically curates the optimal sequence to learn concepts based on their relationships in the knowledge graph.
Use cases:
- Create curricula for courses
- Design training programs
- Build adaptive learning systems
- Personalize education paths
- Content recommendation engines
Example:
```json
{
  "type": "learning_paths",
  "format": "json",
  "data": [
    {
      "path_id": "path_beginner_1",
      "title": "Introduction to Artificial Intelligence",
      "description": "Start with the fundamentals and build up to complex topics",
      "nodes": ["artificial_intelligence", "machine_learning", "supervised_learning", "neural_networks"],
      "difficulty": "beginner",
      "estimated_time": "1 hour 15 minutes",
      "step_count": 4,
      "prerequisites": [],
      "learning_objectives": [
        "Understand what AI is",
        "Learn the basics of ML",
        "Grasp supervised learning concepts",
        "Introduction to neural networks"
      ]
    },
    {
      "path_id": "path_intermediate_1",
      "title": "Deep Learning Specialization",
      "description": "Dive deeper into advanced neural network architectures",
      "nodes": ["neural_networks", "deep_learning", "cnns", "rnns"],
      "difficulty": "intermediate",
      "estimated_time": "2 hours 30 minutes",
      "step_count": 4,
      "prerequisites": ["machine_learning"],
      "learning_objectives": [
        "Master deep learning concepts",
        "Understand CNNs for vision",
        "Learn RNNs for sequences",
        "Build real projects"
      ]
    }
  ]
}
```
How to use:
```python
# For educators: print each curated path as a study sequence
learning_paths = next(
    item for item in dataset.iterate_items()
    if item['type'] == 'learning_paths'
)
for path in learning_paths['data']:
    print(f"\n=== {path['title']} ===")
    print(f"Difficulty: {path['difficulty']}")
    print(f"Duration: {path['estimated_time']}")
    print("\nLearning sequence:")
    for i, node_id in enumerate(path['nodes'], 1):
        node_label = get_node_label(node_id)  # you'd look this up in the graph JSON
        print(f"  {i}. {node_label}")
```
#### 5. Analysis (JSON) - For Researchers & Advanced Analysis
Why this matters: Provides statistical insights about the knowledge graph structure, helping you understand the domain complexity and learning landscape.
Use cases:
- Research on knowledge representation
- Curriculum design optimization
- Identifying knowledge gaps
- Graph quality assessment
- Comparative analysis across domains
Example:
```json
{
  "type": "analysis",
  "format": "json",
  "data": {
    "stats": {
      "num_nodes": 15,
      "num_edges": 23,
      "density": 0.21,
      "is_connected": true,
      "num_components": 1,
      "avg_degree": 3.07
    },
    "root_nodes": [
      {"id": "artificial_intelligence", "label": "Artificial Intelligence", "degree": 7}
    ],
    "leaf_nodes": [
      {"id": "deep_learning", "label": "Deep Learning", "degree": 4},
      {"id": "reinforcement_learning", "label": "Reinforcement Learning", "degree": 3}
    ],
    "most_connected": [
      {"id": "machine_learning", "label": "Machine Learning", "degree": 8},
      {"id": "neural_networks", "label": "Neural Networks", "degree": 6}
    ],
    "components": [
      {"id": 0, "nodes": ["ai", "ml", "dl", "nn", "supervised", ...], "size": 15}
    ]
  }
}
```
Interpretation guide:
- Density: How interconnected the domain is (0.0 = sparse, 1.0 = fully connected)
- Root nodes: Foundational concepts (no incoming edges) - these should be learned first
- Leaf nodes: End results (no outgoing edges) - these are advanced topics
- Most connected: Core concepts that link many ideas together
- Components: Separate knowledge clusters (ideally just 1 for cohesive learning)
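As a sanity check on these numbers, the undirected density and degree statistics can be recomputed from the graph JSON with a few lines of plain Python (a sketch; field names follow the schema shown earlier):

```python
def graph_stats(nodes, edges):
    """Recompute the basics: undirected density = 2E / (N(N-1)) and
    per-node degree (incoming plus outgoing edge count)."""
    n = len(nodes)
    degree = {node["id"]: 0 for node in nodes}
    for edge in edges:
        degree[edge["source"]] += 1
        degree[edge["target"]] += 1
    density = 2 * len(edges) / (n * (n - 1)) if n > 1 else 0.0
    return {"density": density,
            "avg_degree": sum(degree.values()) / n,
            "degree": degree}

nodes = [{"id": "ai"}, {"id": "ml"}, {"id": "nn"}]
edges = [{"source": "ai", "target": "ml"}, {"source": "ml", "target": "nn"}]
stats = graph_stats(nodes, edges)
print(round(stats["density"], 3))  # 2*2/(3*2) -> 0.667
# At the example report's scale, 2*23/(15*14) ≈ 0.219 (shown as 0.21) and
# avg_degree = 46/15 ≈ 3.07, matching the stats block above.
```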
#### 6. Metadata (JSON) - For Monitoring & Debugging
Why this matters: Provides a summary of the entire process, useful for monitoring, debugging, and understanding what happened during execution.
Use cases:
- Monitor actor health
- Debug failed runs
- Track processing statistics
- Generate performance reports
- Quality assurance
Example:
```json
{
  "type": "metadata",
  "format": "json",
  "data": {
    "status": "success",
    "message": "Knowledge graph generated successfully",
    "timestamp": 1731000000.0,
    "stats": {
      "pages_processed": 3,
      "nodes": 15,
      "edges": 23,
      "learning_paths": 3,
      "content_length": 45230,
      "execution_time_seconds": 18.5,
      "ai_provider": "openrouter",
      "model": "gpt-oss-20b"
    },
    "source_urls": [
      {"url": "https://example.com/article1"},
      {"url": "https://example.com/article2"},
      {"url": "https://example.com/article3"}
    ],
    "warnings": [],
    "errors": []
  }
}
```
## How to Use the Outputs

### For End Users (Non-Technical)
- Start with the Markdown report - It's the most readable
- Review the learning paths - Choose one that matches your level
- Follow the recommended sequence - Use it as a study guide
- Check the analysis section - Understand the difficulty level
### For Developers
- Parse the JSON outputs for integration into your systems
- Use the Graph JSON with visualization libraries (D3.js, vis.js, etc.)
- Implement learning path recommendations in your app
- Store in databases for further querying and analysis
### For Educators & Trainers
- Use Learning Paths JSON to design curricula
- Extract concepts to create lesson plans
- Use difficulty levels to segment audiences
- Track prerequisites for proper sequencing
### For Researchers
- Analyze the Graph structure for domain insights
- Compare graphs across different sources
- Study the relationship types for knowledge representation
- Research learning path algorithms
## Practical Example: Building a Learning Platform
Here's how you might use all outputs together:
```python
# 1. Get all dataset items
dataset_items = list(dataset.iterate_items())

# 2. Extract each component
graph_json = find_by_type(dataset_items, 'knowledge_graph', 'json')
learning_paths = find_by_type(dataset_items, 'learning_paths', 'json')
analysis = find_by_type(dataset_items, 'analysis', 'json')
markdown_doc = find_by_type(dataset_items, 'knowledge_graph', 'markdown')

# 3. Build a learning platform
for path in learning_paths['data']:
    # Create a course from the learning path
    course = {
        'title': path['title'],
        'difficulty': path['difficulty'],
        'estimated_hours': parse_time(path['estimated_time']),
        'modules': []
    }
    # Get detailed info for each node from the graph
    for node_id in path['nodes']:
        node = next(n for n in graph_json['data']['nodes'] if n['id'] == node_id)
        course['modules'].append({
            'title': node['label'],
            'description': node['description'],
            'type': node['type']
        })
    # Save the course to your platform
    save_course(course)
```
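The example above assumes a few helpers (`find_by_type`, `parse_time`) that are not part of the actor's output; they could be sketched like this:

```python
import re

def find_by_type(items, item_type, item_format=None):
    """Return the first dataset item matching type (and optionally format)."""
    return next(
        item for item in items
        if item['type'] == item_type
        and (item_format is None or item.get('format') == item_format)
    )

def parse_time(text):
    """Parse '2 hours 30 minutes' (or '45 minutes') into fractional hours."""
    hours = re.search(r'(\d+)\s*hour', text)
    minutes = re.search(r'(\d+)\s*minute', text)
    return (int(hours.group(1)) if hours else 0) + \
           (int(minutes.group(1)) if minutes else 0) / 60

print(parse_time("2 hours 30 minutes"))  # 2.5
```

`save_course` is left to your platform; it would persist the assembled course dict however you store courses.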
## Accessing Outputs
All outputs are available in the Apify dataset created when the run completes. You can:
- Download directly from the Apify Console
- Access via API using the Apify Python/Node.js client
- Export to various formats (JSON, CSV, Excel)
- Integrate into workflows using webhooks or API calls
Example - Getting specific output:
```python
from apify_client import ApifyClient

client = ApifyClient('YOUR_TOKEN')
dataset = client.dataset('DATASET_ID')

# Get only the learning paths
for item in dataset.iterate_items():
    if item['type'] == 'learning_paths':
        learning_paths = item['data']
        break
```
## Testing

### Run Locally

```bash
# With the test input file
python3 main.py

# Or pretty-print a test input file to check it
python3 -c "import json; print(json.dumps(json.load(open('test-input.json')), indent=2))"
```
### Example Test Input

```json
{
  "startUrls": [
    {"url": "https://en.wikipedia.org/wiki/Machine_learning"},
    {"url": "https://en.wikipedia.org/wiki/Artificial_intelligence"}
  ],
  "maxPages": 2,
  "outputLanguage": "en",
  "aiProvider": "openrouter",
  "openaiModel": "gpt-oss-20b",
  "useBrowser": false
}
```
## Configuration

### Environment Variables

- `OPENAI_API_KEY`: OpenAI API key (for the OpenAI provider)
- `OPENROUTER_API_KEY`: OpenRouter API key (for the OpenRouter provider)
- `APIFY_TOKEN`: Apify API token (for deployment)
### Custom Models
For OpenAI provider:
- "gpt-4"
- "gpt-4-turbo"
- "gpt-3.5-turbo"
For OpenRouter provider:
- "gpt-oss-20b"
- "claude-3-sonnet"
- "llama-3-70b"
## How It Works

- Input Validation: Validates all required fields and parameters
- API Key Configuration: Sets up OpenAI/OpenRouter API keys
- Web Crawling: Fetches content from the specified URLs
  - Extracts text using BeautifulSoup
  - Removes scripts and styles
  - Aggregates content with metadata
- AI Graph Generation: Sends content to the AI API
  - Prompts the model to extract concepts and relationships
  - Parses the JSON response
  - Cleans and validates graph data
- Graph Analysis: Analyzes graph structure
  - Calculates centrality (degree centrality)
  - Identifies root and leaf nodes
  - Finds connected components
  - Computes density and statistics
- Learning Path Generation: Creates optimal learning paths
  - Uses Dijkstra's algorithm to find shortest paths between important nodes
  - Generates beginner and intermediate paths
  - Estimates learning time
- Documentation: Creates comprehensive Markdown output
- Save Results: Stores all data in Apify datasets
## Project Structure

```
cogni/
├── main.py                    # Main actor entry point
├── Dockerfile                 # Docker container configuration
├── requirements.txt           # Python dependencies
├── test-input-simple.json     # Sample test input
├── README.md                  # This file
├── services/                  # Core services
│   ├── crawler_service.py     # Web crawling
│   ├── ai_service.py          # AI integration
│   └── graph_service.py       # Graph analysis
├── utils/                     # Utilities
│   ├── markdown_generator.py  # Documentation generation
│   ├── validators.py          # Input/graph validation
│   └── translations.py        # Multilingual support
└── models/                    # Data models
    └── __init__.py            # Type definitions
```
## Error Handling
The actor includes comprehensive error handling:
- Input Validation: Validates required fields and types
- API Errors: Handles rate limits, authentication, and API errors
- Network Errors: Handles connection timeouts and failures
- JSON Parsing: Recovers from malformed AI responses
- Graph Validation: Ensures graph data integrity
All errors are logged and saved to Apify datasets for debugging.
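Recovering from malformed AI responses is a common pattern when parsing LLM output. One possible sketch (an illustration, not the actor's exact code) strips Markdown code fences and falls back to the outermost JSON object before giving up:

```python
import json
import re

def parse_ai_json(raw: str):
    """Best-effort JSON recovery from an LLM reply: drop ```json fences,
    then fall back to the outermost {...} span if a direct parse fails."""
    text = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        start, end = text.find("{"), text.rfind("}")
        if start != -1 and end > start:
            return json.loads(text[start:end + 1])
        raise

reply = 'Here is the graph:\n```json\n{"nodes": [], "edges": []}\n```'
print(parse_ai_json(reply))  # {'nodes': [], 'edges': []}
```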
## Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Test locally
- Deploy to Apify
- Submit a pull request
## License
This project is licensed under the MIT License.
## Support
For issues and questions:
- Check the Apify actor logs
- Review error messages in the output datasets
- Verify API keys are correctly configured
- Test with simpler inputs first
## Changelog

### Version 1.0 (2025-11-06)
- Initial Python implementation
- Complete rewrite from TypeScript
- Improved error handling
- Enhanced validation
- Multilingual support
- Learning path generation
- Comprehensive documentation
Built with ❤️ using Python, Apify, OpenAI, and OpenRouter