100% private · fully offline
Hierarchical orchestration with dynamic task planning, persistent memory with semantic retrieval, and a closed-loop agent behavioral correction system.
Advanced agents on top of local models
Each agent is an autonomous computational module with encapsulated state, a specialized system prompt and an isolated tool space. The Archangel is an HTN planner: it builds a graph of sub-tasks with explicit dependencies, allocates agents by specialization and coordinates them through a shared state bus.
Higher efficiency from local models
Three orthogonal mechanisms: (i) the context window is managed via semantic retrieval, so at inference time the model receives only a selection of stored fragments weighted by relevance score; (ii) parallel execution of DAG nodes with synchronization at dependency barriers; (iii) adaptive task routing by latency/quality/VRAM profile.
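As an illustration of mechanism (ii), here is a minimal sketch of parallel DAG execution. It simplifies the dependency barriers into whole "waves": every node whose prerequisites are finished runs concurrently, and the next wave starts only after the current one completes. The names `run_dag` and `run_node` are hypothetical and not part of Soul Studio's API.

```python
import asyncio


async def run_dag(deps, run_node):
    """Run nodes wave by wave.

    deps maps each node to the set of nodes it depends on.
    run_node is an async callable that executes one node (assumed interface).
    Returns the list of waves in execution order.
    """
    remaining = {node: set(prereqs) for node, prereqs in deps.items()}
    waves = []
    while remaining:
        # Every node with no unfinished prerequisite is ready to run.
        ready = sorted(n for n, prereqs in remaining.items() if not prereqs)
        if not ready:
            raise ValueError("dependency cycle or unknown prerequisite")
        # Barrier: gather returns only when the whole wave has finished.
        await asyncio.gather(*(run_node(n) for n in ready))
        waves.append(ready)
        for n in ready:
            del remaining[n]
        for prereqs in remaining.values():
            prereqs.difference_update(ready)
    return waves
```

A scheduler that releases each node as soon as its own prerequisites finish would achieve more overlap; the wave form is used here only because it is shorter.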
Unlimited context — the agent always remembers
Four memory layers: working (inference window), episodic (extractive summarization), semantic store (vector DB, ANN, cosine similarity with temporal decay) and procedural (behavioral patterns). Retrieval is a ranking task: k relevant fragments within a token budget.
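The ranking step can be sketched as follows, assuming a brute-force scan instead of an ANN index, an exponential half-life as the temporal decay, and a greedy fill of the token budget. The `Fragment` fields, the 72-hour half-life and the function names are illustrative assumptions, not values taken from Soul Studio.

```python
import math
from dataclasses import dataclass


@dataclass
class Fragment:
    text: str
    embedding: list[float]
    age_hours: float
    tokens: int


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score(query, fragment, half_life_hours=72.0):
    # Assumed decay: relevance halves every half_life_hours.
    decay = 0.5 ** (fragment.age_hours / half_life_hours)
    return cosine(query, fragment.embedding) * decay


def retrieve(query, fragments, k, token_budget):
    """Return up to k fragments, best score first, that fit the token budget."""
    ranked = sorted(fragments, key=lambda f: score(query, f), reverse=True)
    picked, used = [], 0
    for fragment in ranked:
        if len(picked) == k:
            break
        # Skip fragments that would overflow the budget; smaller ones may still fit.
        if used + fragment.tokens <= token_budget:
            picked.append(fragment)
            used += fragment.tokens
    return picked
```

The greedy fill keeps the highest-ranked fragments that fit; it does not try to find the combination with the best total score.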
Productivity monitoring for local models
Real-time observability stack: task completion rate, confidence calibration score, tool invocation success ratio, context utilization efficiency, VRAM·time. Hallucination detection via cross-validation between agents and consistency checking on factual claims.
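Two of these metrics can be sketched in a few lines. The text does not say how the confidence calibration score is defined, so this sketch assumes expected calibration error (ECE) over equal-width confidence bins; both function names are hypothetical.

```python
def expected_calibration_error(confidences, outcomes, bins=10):
    """Weighted mean gap between stated confidence and observed accuracy."""
    if not confidences:
        return 0.0
    # Per bin: [count, sum of confidences, number of correct outcomes]
    stats = [[0, 0.0, 0] for _ in range(bins)]
    for conf, correct in zip(confidences, outcomes):
        index = min(int(conf * bins), bins - 1)  # conf == 1.0 falls in the last bin
        stats[index][0] += 1
        stats[index][1] += conf
        stats[index][2] += int(correct)
    total = len(confidences)
    ece = 0.0
    for count, conf_sum, hits in stats:
        if count:
            ece += (count / total) * abs(conf_sum / count - hits / count)
    return ece


def tool_success_ratio(succeeded, attempted):
    """Share of tool invocations that completed without error."""
    return succeeded / attempted if attempted else 0.0
```

A lower ECE means the agent's stated confidence tracks its actual hit rate more closely.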
Live behavioral correction algorithm
Behavioral monitor as a closed control loop: it computes a topic drift score, an uncertainty expression rate and a task alignment index. On a threshold breach it modifies the system prompt at runtime, rebuilds the retrieval strategy and changes the tool access scope. When it detects a reasoning loop, it forces a context reset.
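One step of such a control loop might look like the sketch below. The threshold values, the action names and the one-metric-to-one-action mapping are all assumptions made for illustration; the text above only states which signals are measured and which corrections exist.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    topic_drift: float        # 0..1, higher means further off topic
    uncertainty_rate: float   # share of responses that express uncertainty
    task_alignment: float     # 0..1, higher means closer to the task
    reasoning_loop: bool = False


# Hypothetical thresholds; real values would be tuned per agent.
MAX_TOPIC_DRIFT = 0.6
MAX_UNCERTAINTY_RATE = 0.4
MIN_TASK_ALIGNMENT = 0.5


def corrections(signals):
    """Map one set of measurements to a list of corrective actions."""
    if signals.reasoning_loop:
        # A detected loop overrides everything else.
        return ["reset_context"]
    actions = []
    if signals.topic_drift > MAX_TOPIC_DRIFT:
        actions.append("rewrite_system_prompt")
    if signals.task_alignment < MIN_TASK_ALIGNMENT:
        actions.append("rebuild_retrieval_strategy")
    if signals.uncertainty_rate > MAX_UNCERTAINTY_RATE:
        actions.append("change_tool_scope")
    return actions
```

Calling `corrections` after every agent turn and applying the returned actions before the next turn closes the loop.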
100% offline — full data privacy
The entire inference stack, retrieval, tool calls and inter-agent coordination are confined to localhost with no outbound connections. Telemetry is architecturally absent. The isolated runtime keeps all data on the machine, which simplifies meeting GDPR, HIPAA and SOC 2 requirements.
Agent orchestrator for complex multi-level projects
The scheduler builds a DAG with dependencies, priorities and resource constraints. Agents execute nodes in parallel; artifacts are passed through typed interfaces with schema verification. Online replanning: only the affected sub-tree is rebuilt, not the entire pipeline.
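To illustrate partial replanning, the sketch below computes which nodes need to be rebuilt when one node changes: the node itself plus everything that depends on it, directly or transitively. The function name and the `deps` layout (node mapped to its prerequisites) are assumptions for this example.

```python
from collections import deque


def affected_nodes(deps, changed):
    """Return the changed node and all of its transitive dependents."""
    # Invert the graph: prerequisite -> nodes that depend on it.
    dependents = {}
    for node, prereqs in deps.items():
        for prereq in prereqs:
            dependents.setdefault(prereq, set()).add(node)

    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen
```

Nodes outside the returned set keep their existing artifacts, which is what makes the replan cheaper than rebuilding the whole pipeline.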
Integration with 40+ services and systems
A unified tool-calling interface on top of REST, GraphQL and WebSocket adapters. Each connector encapsulates authorization, rate-limit handling and retry with exponential backoff. New integrations are added without modifying the orchestrator core.
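Retry with exponential backoff can be sketched as below. The parameter names, the default delays and the optional jitter are assumptions; a real connector would also retry only on errors it knows to be transient rather than on every exception.

```python
import random
import time


def call_with_backoff(fn, retries=5, base_delay=0.5, max_delay=30.0,
                      jitter=0.1, sleep=time.sleep):
    """Call fn, retrying on failure with a delay that doubles each attempt."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries from many clients hitting one service.
            sleep(delay * (1 + random.uniform(0, jitter)))
```

The `sleep` parameter exists so the wait can be replaced in tests or in an event loop.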
Built-in unlimited image generator
Local inference on Stable Diffusion / FLUX. Integrated as a first-class tool callable by an agent programmatically. VRAM management during parallel LLM inference and generation via dynamic offloading of weights between GPU and CPU.
Full voice communication — agents hear and speak
STT uses Whisper-compatible architectures with local inference; TTS uses neural vocoder models with low latency. Voice input is transcribed and parsed into a structured intent. Response synthesis is asynchronous: audio is streamed as it is generated.
Vision for any model — even small ones
A cascaded multi-model pipeline: small models analyze image structure and the results are aggregated into a structured text description for a text-only LLM agent. This provides vision capabilities without a multimodal LLM in the main stack, which substantially lowers the VRAM threshold.
Large library of local models with specialization search
A catalog with metadata: benchmarks (MMLU, HumanEval, GSM8K, MATH) per task, VRAM at different quantization levels (Q4_K_M, Q5_K_M, Q8_0, F16), throughput on reference hardware. Search matches the task profile to model characteristics within resource constraints.
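A search of this kind could be sketched as a filter plus a sort. The `ModelEntry` fields are hypothetical, and the rule "pick the largest quantization that still fits" is a simplifying assumption standing in for "highest precision within the VRAM limit".

```python
from dataclasses import dataclass


@dataclass
class ModelEntry:
    name: str
    task_scores: dict   # task name -> benchmark score in 0..1
    vram_gb: dict       # quantization level -> VRAM needed in GB


def find_models(catalog, task, vram_limit_gb):
    """Return (model name, quantization) pairs that fit, best task score first."""
    matches = []
    for model in catalog:
        if task not in model.task_scores:
            continue
        fitting = {q: gb for q, gb in model.vram_gb.items() if gb <= vram_limit_gb}
        if not fitting:
            continue
        # Assumption: the largest footprint that fits is the highest precision.
        quant = max(fitting, key=fitting.get)
        matches.append((model.task_scores[task], model.name, quant))
    return [(name, quant) for _, name, quant in sorted(matches, reverse=True)]
```

Throughput on reference hardware could be added as a second sort key or as another filter in the same loop.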
Full cloud-model support — Gemini, Anthropic, OpenAI, Grok, DeepSeek
A unified LLM abstraction layer: a single interface for local and cloud providers. The orchestrator routes tasks by a multi-criteria function: cost, latency, capability profiles and privacy constraints. Routing is configured declaratively.
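A declarative routing rule might be expressed as plain data and evaluated by a small function, as in this sketch. The config keys, the weights and the provider fields are assumptions; scores are taken to be normalized to 0..1 with higher meaning better (so a cheaper provider has a higher `cost` score). Privacy is treated as a hard constraint, the other criteria as a weighted sum.

```python
# Hypothetical declarative config; in practice this could live in a YAML or JSON file.
ROUTING_CONFIG = {
    "weights": {"cost": 0.3, "latency": 0.3, "capability": 0.4},
    "private_tasks_local_only": True,
}


def route(task, providers, config=ROUTING_CONFIG):
    """Return the name of the best provider for the task."""
    local_only = task.get("private", False) and config["private_tasks_local_only"]
    # Hard constraint first: private tasks never leave the machine.
    candidates = [p for p in providers if p["local"] or not local_only]
    if not candidates:
        raise LookupError("no provider satisfies the privacy constraint")

    weights = config["weights"]

    def weighted_score(provider):
        return sum(w * provider["scores"][name] for name, w in weights.items())

    return max(candidates, key=weighted_score)["name"]
```

Changing routing behavior then means editing the config, not the function.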
Advanced websites and landing pages
The Code agent operates at the architecture level: component graph, API contracts, dependency schema, ADRs in long-term memory. It iterates autonomously on the results of static analysis, test coverage and performance metrics.
Prototyping — an ideal fit for startups
Full stack: DB schema, migrations, API layer with validation, auth, UI and infrastructure configs. Generated code includes exception handling, structured logging and a baseline security model.
Game development
The Game agent works with ECS patterns, behavioral state machines and event-driven systems. It generates game logic, NPC behavior trees and procedural generation. Pairs with the image generator and TTS.
Workflow management and automation
The Workflow agent builds event-driven pipelines with branching, exception handling and rollback. Behavior adapts based on execution history in procedural memory — parameterized logic that learns from precedents.
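The rollback part can be illustrated with a compensation pattern: each step carries an undo action, and when a step fails the completed steps are undone in reverse order. The function name and the `(action, undo)` pair layout are assumptions for this sketch.

```python
def run_with_rollback(steps):
    """Run (action, undo) pairs in order; on failure undo completed steps."""
    completed_undos = []
    try:
        for action, undo in steps:
            action()
            # Register the undo only after the action succeeded.
            completed_undos.append(undo)
    except Exception:
        # Compensate in reverse order, then surface the original error.
        for undo in reversed(completed_undos):
            undo()
        raise
```

The failing step's own undo is not called, because that step never completed.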
Data monitoring and processing
The agent acts as a continuous consumer: parses sources on a schedule or event trigger, normalizes to a target schema, detects anomalies (z-score, IQR, isolation forest) and generates fully contextual alerts.
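Two of the named detectors, z-score and IQR, fit in a short standard-library sketch. The thresholds (3 standard deviations, 1.5 times the IQR) are common defaults assumed here, not values stated by Soul Studio; isolation forest is left out because it needs a third-party library.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Values further than threshold standard deviations from the mean."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []
    return [v for v in values if abs(v - mean) / spread > threshold]


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]
```

On small samples a single extreme value inflates the standard deviation enough to hide itself from the z-score test, which is one reason to run the IQR check alongside it.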
Integrations: Gmail, Google Calendar, Monday, AirTable, Notion...
Semantically consistent transactions across multiple services: inbound document → extraction → cross-reference with CRM → status update → task → notification. Atomic from a business-logic standpoint.
Writing books and research papers
The Research agent builds a knowledge graph from a source corpus, tracks claim consistency and stores a stylistic profile in long-term memory. It generates bibliographies in any citation format.
Website scanning — data extraction and structuring
Playwright backend: JS rendering, pagination traversal, dynamic content. Deduplication by content hash, normalization to a target schema. Horizontal scaling via parallel browser sessions.
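Deduplication by content hash can be sketched as below. The normalization step (collapsing whitespace and lowercasing before hashing) and the `content` field name are assumptions; without some normalization, pages that differ only in spacing would hash differently.

```python
import hashlib


def content_hash(text):
    """SHA-256 of the text after collapsing whitespace and lowercasing."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(records, field="content"):
    """Keep the first record for each distinct content hash, preserving order."""
    seen = set()
    unique = []
    for record in records:
        digest = content_hash(record[field])
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique
```

This catches exact duplicates after normalization only; near-duplicates with small wording changes would need a similarity-based method.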
Building and launching ad campaigns
The agent generates content variants by audience parameters, an A/B hypothesis matrix and creative packages tailored to placement formats. It analyzes performance metrics and proposes iterations based on statistical significance.
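One common way to check an A/B result for statistical significance is a two-sided two-proportion z-test, sketched below. The text does not say which test Soul Studio uses, so this choice and the function name are assumptions.

```python
import math
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a difference in conversion rate between A and B."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0  # no variation at all: nothing to distinguish
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))
```

A variant would typically be promoted only when the p-value falls below a significance level chosen in advance, such as 0.05.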
Music composition
Generation of MIDI sequences, harmonic progressions, melodic lines and arrangements through specialized generative models. The agent keeps musical context (key, meter, thematic material) and iterates as a co-author.
Soul Studio
Local
Full stack on your hardware
All agents, the planner, integrations and image generation — inference is confined to the machine. Automatic model sharding and layer offloading tailored to the hardware configuration.
- GPU 8 GB VRAM min
- 12 GB+ recommended
- Fully offline
- Air-gapped architecture
Soul Studio
Cloud
Same power — through the browser
Identical orchestration architecture with inference on managed infrastructure. API-compatible with Local: agent configurations port over without modification.
- Any device
- Browser
- Team collaboration
- Multi-user workspaces
Soul Studio
Mini
Same architecture — for low-end hardware
Orchestration architecture optimized for CPU inference. 1B–4B models with aggressive Q4 quantization. Parity on memory architecture and behavioral correction.
- CPU inference
- 8 GB RAM
- Q4 quantization
- Fully offline
The modern landscape of LLM-based automation systems is defined by a fundamental structural contradiction: although the quality of local open-source models is high enough to solve a significant share of production tasks, the infrastructure layer that would orchestrate them effectively is missing.
A specialized 7B-parameter model can outperform general-purpose commercial systems on its target tasks, and quantized 34B-class models are competitive with the best commercial offerings on a large portion of standard benchmarks.
Soul Studio is that infrastructure layer.
Orchestration as the determinant of system intelligence
A multi-agent system with a correctly implemented planner outperforms a single monolithic model many times larger — through execution parallelism and per-task agent specialization.
Persistent hierarchical memory as a precondition for agentic behavior
A system without long-term memory with semantic retrieval is a stateless function, not an autonomous agent. Accumulating institutional knowledge through episodic and procedural memory is the key condition for an agent to grow more effective over time.
Local inference as an architectural advantage, not a compromise
A deterministic execution environment under your full control, no network round-trips on tool calls and no rate limits have value in their own right, independent of any privacy considerations.
Each phase is a complete functional layer, not an interim state. We build bottom-up: first a production-ready core, then vertical specializations and an open ecosystem. No phase starts before the previous one stabilizes.
Foundation
Production-ready orchestrator, 40+ adapters, Local + Cloud
Vertical Agents
Vertical agent packages: Legal, Healthcare, FinTech, eCommerce
Ecosystem
SDK for custom agents, marketplace of specializations
Federation
Distributed agent networks, node federation, P2P coordination
Soul Studio is built by a team of practicing engineers who have accumulated systematic experience with existing solutions in the agentic systems space — LangChain, AutoGPT, LM Studio, Open WebUI — and concluded that they are fundamentally limited for production use.
None of the existing solutions delivers all at once: production-ready orchestration of many specialized agents, hierarchical persistent memory with semantic retrieval, a closed-loop behavioral correction layer and full execution-environment isolation. Soul Studio is built as the answer to this combined set of requirements — not as an incremental improvement.
Why Multi-Agent Systems Will Replace Single LLM Pipelines
Orchestration is the new intelligence. When specialized agents work in parallel with shared memory, they consistently outperform monolithic models — even much larger ones.
Read on Medium
Persistent Memory Is What Separates Agents from Chatbots
A system without long-term semantic memory is a stateless function. True autonomous agents accumulate institutional knowledge through episodic and procedural memory layers.
Read on Medium
Local Inference Is an Architectural Advantage, Not a Compromise
No network latency on tool calls, deterministic execution environments, no rate limits. Running AI locally isn't only about privacy; it's about performance and control.
Read on Medium
How We Built a Production-Ready AI Orchestrator in 6 Months
From a single script to a full multi-agent system with 40+ tool adapters. The engineering decisions that made Soul Studio production-ready — and the ones we regret.
Read on Medium
Upgrade
Upgrade to the current version for existing users
- Update to the latest version
- All new agents and tools
- Configurations preserved
- Memory data migration
Lifetime License
Lifetime personal license for Soul Studio Local
- Soul Studio Local — full version
- All agents and the orchestrator
- 40+ integration adapters
- Image generator (unlimited)
- Voice pipeline (STT + TTS)
- Updates within the current major version
- Priority support
Subscription
Continuous updates and full access to all Soul products
- Soul Studio Local + Cloud + Mini
- All updates across all versions
- Early access to new features
- Test utilities and beta releases
- Priority support channel
Refunds — 14 days, no questions asked