Soul Studio is a production-ready multi-agent orchestrator for local large language models. It combines an HTN planner, four-layer persistent memory with semantic retrieval, closed-loop behavioral control and 40+ tool adapters into a single platform that runs entirely on local hardware.

Does Soul Studio run fully offline?

Yes. The entire inference stack, retrieval, tool calls and inter-agent coordination are confined to localhost with no outbound connections. There is no telemetry by architecture. GDPR, HIPAA and SOC 2 compliance are structural consequences of the air-gapped runtime.

What hardware does Soul Studio Local require?

A GPU with at least 8 GB of VRAM is the minimum; 12 GB or more is recommended. The system automatically handles model sharding and layer offloading based on available hardware.

Which local LLMs can I use?

Soul Studio runs Ollama-compatible models across quantization levels (Q4_K_M, Q5_K_M, Q8_0, F16). The built-in model catalog matches task profiles to model characteristics using benchmarks such as MMLU, HumanEval, GSM8K and MATH.

Can Soul Studio also use cloud LLMs?

Yes. A unified LLM abstraction layer routes tasks across local and cloud providers (Gemini, Anthropic, OpenAI, Grok, DeepSeek) using a multi-criteria function over cost, latency, capability profile and privacy constraints. Routing is configured declaratively.

What versions of Soul Studio are available?

Soul Studio Local (alpha, available now) runs the full stack on local hardware. Soul Studio Cloud delivers the same orchestration architecture from the browser. Soul Studio Mini is optimized for CPU inference with aggressive Q4 quantization for low-end machines.

How much does Soul Studio cost?

Soul Studio offers a one-time Lifetime License at $129 for Local, a $19 monthly Subscription with full access to Local + Cloud + Mini, and a $39 Upgrade for existing users. All purchases are covered by a 14-day no-questions refund policy.

What can I build with Soul Studio?

Typical workloads include web and product development, rapid prototyping, game development, workflow automation, data processing, integrations with services like Gmail, Google Calendar, Monday, AirTable and Notion, research and writing, web scraping with Playwright, marketing campaigns and music generation.

Is Soul Studio open source?

The orchestration core is proprietary; selected open components are released under the BUSL-1.1 license.

Alpha · Early Access

Orchestrator for
self-hosted
AI agents

100% private · fully offline

Hierarchical orchestration with dynamic task planning, persistent memory with semantic retrieval, and a closed-loop agent behavioral correction system.

Soul Studio in action — demo coming soon

Core Capabilities

Architecture, not a wrapper

01 HTN Planning

Advanced agents on top of local models

Each agent is an autonomous computational module with encapsulated state, a specialized system prompt and an isolated tool space. The Archangel is an HTN (hierarchical task network) planner: it builds a graph of sub-tasks with explicit dependencies, allocates agents by specialization and coordinates them through a shared state bus.
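As a rough illustration of HTN-style decomposition, the sketch below expands a compound task into primitive steps with explicit dependencies and an assigned agent. It is not Soul Studio's planner: the method table, task names, agent labels and the `decompose` function are all hypothetical, and sub-tasks are simply ordered one after another.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    name: str
    agent: str                                   # hypothetical specialization label
    depends_on: list = field(default_factory=list)

# Hypothetical method table: each compound task expands into ordered sub-tasks.
METHODS = {
    "build_landing_page": ["write_copy", "build_ui"],
    "build_ui": ["design_layout", "implement_html"],
}
# Hypothetical mapping of primitive tasks to agent specializations.
AGENTS = {"write_copy": "writer", "design_layout": "designer", "implement_html": "coder"}

def decompose(task, after=()):
    """Expand a task recursively; each sub-task waits for the last step of the previous sibling."""
    if task not in METHODS:                      # primitive task: emit a single step
        return [Step(task, AGENTS[task], list(after))]
    steps = []
    for sub in METHODS[task]:
        expanded = decompose(sub, after)
        steps.extend(expanded)
        after = (expanded[-1].name,)             # the next sibling depends on this one
    return steps

plan = decompose("build_landing_page")
for step in plan:
    print(step.name, step.agent, step.depends_on)
```

A real planner would also choose between alternative methods per task and allow independent sub-tasks to run side by side; this sketch only shows the decomposition step.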

02 Parallel Execution

Higher efficiency from local models

Three orthogonal mechanisms: (i) semantic retrieval manages the context window, so at inference time the model receives only a relevance-weighted sample of stored context; (ii) DAG nodes execute in parallel and synchronize at dependency barriers; (iii) tasks are routed adaptively by latency, quality and VRAM profile.
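Mechanism (ii) can be pictured with a small asyncio sketch in which every node starts as soon as all of its dependencies have finished. The graph, the `run_dag` helper and the `fake_work` stand-in are assumptions made for illustration, not Soul Studio code.

```python
import asyncio

# Hypothetical task graph: node -> nodes it depends on.
DAG = {"fetch": [], "parse": ["fetch"], "summarize": ["fetch"], "report": ["parse", "summarize"]}

async def run_dag(dag, work):
    """Run every node as soon as all of its dependencies have finished."""
    done = {node: asyncio.Event() for node in dag}
    order = []

    async def run(node):
        # Dependency barrier: wait until every upstream node has signalled completion.
        for dep in dag[node]:
            await done[dep].wait()
        await work(node)
        order.append(node)
        done[node].set()

    await asyncio.gather(*(run(node) for node in dag))
    return order

async def fake_work(node):
    await asyncio.sleep(0.01)                    # stand-in for an agent call

order = asyncio.run(run_dag(DAG, fake_work))
print(order)
```

Here `parse` and `summarize` run concurrently once `fetch` completes, and `report` waits for both.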

03 ANN Retrieval

Unlimited context — the agent always remembers

Four memory layers: working (inference window), episodic (extractive summarization), semantic store (vector DB, ANN, cosine similarity with temporal decay) and procedural (behavioral patterns). Retrieval is treated as a ranking task: select the k most relevant fragments that fit within a token budget.
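The ranking idea can be sketched as follows. For simplicity this uses an exact brute-force scan rather than an ANN index, and it assumes an exponential half-life decay and a greedy fill of the token budget; the `Fragment` type, the parameter names and the sample data are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Fragment:
    text: str
    embedding: list
    age_hours: float
    tokens: int

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query, fragments, k=3, token_budget=50, half_life_hours=24.0):
    """Rank by cosine similarity times exponential decay, then fill the budget greedily."""
    def score(f):
        decay = 0.5 ** (f.age_hours / half_life_hours)   # assumed form of temporal decay
        return cosine(query, f.embedding) * decay

    picked, used = [], 0
    for f in sorted(fragments, key=score, reverse=True):
        if len(picked) < k and used + f.tokens <= token_budget:
            picked.append(f)
            used += f.tokens
    return picked

store = [
    Fragment("fresh and relevant", [1.0, 0.0], 1.0, 20),
    Fragment("relevant but old", [1.0, 0.0], 240.0, 20),
    Fragment("fresh but off-topic", [0.0, 1.0], 1.0, 20),
    Fragment("too large", [0.9, 0.1], 2.0, 200),
]
top = retrieve([1.0, 0.0], store, k=2, token_budget=50)
print([f.text for f in top])
```

The oversized fragment scores well but is skipped because it would exceed the budget, so the older relevant fragment takes the second slot.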

04 Real-time Observability

Productivity monitoring for local models

Real-time observability stack: task completion rate, confidence calibration score, tool invocation success ratio, context utilization efficiency, VRAM·time. Hallucination detection via cross-validation between agents and consistency checking on factual claims.

05 Closed-loop Control

Live behavioral correction algorithm

The behavioral monitor runs as a closed control loop: it computes a topic drift score, an uncertainty expression rate and a task alignment index. On a threshold breach it modifies the system prompt at runtime, rebuilds the retrieval strategy and changes the tool access scope. If the agent enters a reasoning loop, it forces a context reset.
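One way to picture a single cycle of such a loop is a function that maps measured metrics to corrective actions. The thresholds, the pairing of each metric with one action, and the use of a repeated-step counter to detect reasoning loops are all assumptions made for this sketch.

```python
from dataclasses import dataclass

@dataclass
class Metrics:
    topic_drift: float        # 0..1, higher means further off-topic
    uncertainty_rate: float   # share of responses that express uncertainty
    task_alignment: float     # 0..1, higher is better
    repeated_steps: int       # consecutive near-identical reasoning steps

# Hypothetical thresholds; real values would be tuned per deployment.
LIMITS = {"topic_drift": 0.6, "uncertainty_rate": 0.4, "task_alignment": 0.5, "repeated_steps": 3}

def corrections(m):
    """Map threshold breaches to corrective actions for the next control cycle."""
    if m.repeated_steps >= LIMITS["repeated_steps"]:
        return ["reset_context"]                 # a reasoning loop overrides everything else
    actions = []
    if m.topic_drift > LIMITS["topic_drift"]:
        actions.append("patch_system_prompt")
    if m.uncertainty_rate > LIMITS["uncertainty_rate"]:
        actions.append("rebuild_retrieval_strategy")
    if m.task_alignment < LIMITS["task_alignment"]:
        actions.append("narrow_tool_scope")
    return actions

print(corrections(Metrics(0.7, 0.1, 0.9, 0)))
print(corrections(Metrics(0.1, 0.1, 0.9, 5)))
```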

06 Air-gap Architecture

100% offline — full data privacy

The entire inference stack, retrieval, tool calls and inter-agent coordination are confined to localhost with no outbound connections. Telemetry is architecturally absent. GDPR, HIPAA and SOC 2 compliance are structural consequences of the isolated runtime.

07 DAG Scheduler

Agent orchestrator for complex multi-level projects

The scheduler builds a DAG with dependencies, priorities and resource constraints. Agents execute nodes in parallel; artifacts are passed through typed interfaces with schema verification. Online replanning: only the affected sub-tree is rebuilt, not the entire pipeline.
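Rebuilding only the affected part of the plan comes down to finding every node downstream of the one that changed. A minimal sketch, assuming the plan is stored as a mapping from each node to its dependencies (the graph and the `affected` helper are hypothetical):

```python
from collections import deque

def affected(dag, changed):
    """Return the changed node plus every node that depends on it, directly or transitively."""
    children = {node: [] for node in dag}
    for node, deps in dag.items():
        for dep in deps:
            children[dep].append(node)

    seen, queue = {changed}, deque([changed])
    while queue:
        for child in children[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

# Hypothetical plan: node -> nodes it depends on.
DAG = {"schema": [], "api": ["schema"], "ui": ["api"], "docs": [], "deploy": ["ui", "docs"]}
to_rebuild = affected(DAG, "api")
print(sorted(to_rebuild))
```

Nodes outside the returned set (`schema` and `docs` here) keep their existing results.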

08 Plugin Architecture

Integration with 40+ services and systems

A unified tool-calling interface on top of REST, GraphQL and WebSocket adapters. Each connector encapsulates authorization, rate-limit handling and retry with exponential backoff. New integrations are added without modifying the orchestrator core.
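Retry with exponential backoff is a standard pattern, and a generic version looks roughly like the sketch below. The helper name, the default delays and the jitter range are assumptions, not the connectors' actual settings.

```python
import random
import time

def call_with_backoff(fn, retries=5, base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt (capped, with jitter) and retry."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise                            # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))   # jitter spreads out concurrent retries

# Usage: a hypothetical flaky call that succeeds on the third attempt.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"

waits = []
result = call_with_backoff(flaky, sleep=waits.append)   # record waits instead of sleeping
print(result, len(waits))
```

Passing `sleep` as a parameter keeps the helper testable without real delays.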

09 Local SD/FLUX

Built-in unlimited image generator

Local inference on Stable Diffusion / FLUX, integrated as a first-class tool that an agent can call programmatically. When LLM inference and image generation run in parallel, VRAM is managed by dynamically offloading weights between GPU and CPU.

10 Whisper / Vocoder

Full voice communication — agents hear and speak

STT uses Whisper-compatible architectures with local inference; TTS uses neural vocoder models with low latency. Voice input is deserialized into a structured intent. Response synthesis is asynchronous — audio is streamed as it is generated.

11 Multi-model Cascade

Vision for any model — even small ones

A cascaded multi-model pipeline: small models analyze image structure and the results are aggregated into a structured text description for a text-only LLM agent. This provides vision capabilities without a multimodal LLM in the main stack, which substantially lowers the VRAM threshold.

12 Model Catalog

Large library of local models with specialization search

A catalog with metadata: benchmarks (MMLU, HumanEval, GSM8K, MATH) per task, VRAM at different quantization levels (Q4_K_M, Q5_K_M, Q8_0, F16), throughput on reference hardware. Search matches the task profile to model characteristics within resource constraints.
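A catalog search of this kind can be approximated as a filter on the resource constraint followed by a sort on the benchmark that matters for the task. The entries below are placeholders: the model names and every number are invented purely to make the example run.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    name: str
    quant: str
    vram_gb: float
    scores: dict              # benchmark name -> score

# Placeholder entries; names and numbers are invented for illustration only.
CATALOG = [
    Entry("model-a", "Q4_K_M", 5.0, {"HumanEval": 40.0, "GSM8K": 55.0}),
    Entry("model-a", "Q8_0", 9.0, {"HumanEval": 43.0, "GSM8K": 58.0}),
    Entry("model-b", "Q4_K_M", 7.5, {"HumanEval": 52.0, "GSM8K": 50.0}),
    Entry("model-c", "F16", 26.0, {"HumanEval": 60.0, "GSM8K": 70.0}),
]

def search(catalog, benchmark, vram_limit_gb):
    """Keep entries that fit in VRAM, best benchmark score first."""
    fits = [e for e in catalog if e.vram_gb <= vram_limit_gb and benchmark in e.scores]
    return sorted(fits, key=lambda e: e.scores[benchmark], reverse=True)

best = search(CATALOG, "HumanEval", vram_limit_gb=8.0)
print([(e.name, e.quant) for e in best])
```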

13 Unified LLM Layer

Full cloud-model support — Gemini, Anthropic, OpenAI, Grok, DeepSeek

A unified LLM abstraction layer: a single interface for local and cloud providers. The orchestrator routes tasks by a multi-criteria function: cost, latency, capability profiles and privacy constraints. Routing is configured declaratively.
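A multi-criteria router driven by a declarative policy might look like the sketch below. The policy keys, the weights, the provider entries and the linear scoring formula are assumptions; the source does not specify how the criteria are combined.

```python
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    local: bool
    cost: float               # relative cost per request, 0..1
    latency: float            # relative latency, 0..1
    capability: float         # relative capability for the task, 0..1

# Declarative routing policy (hypothetical keys and weights).
POLICY = {
    "weights": {"cost": 0.3, "latency": 0.2, "capability": 0.5},
    "private_tasks_local_only": True,
}

def route(providers, private, policy=POLICY):
    """Apply the privacy constraint as a hard filter, then pick the best weighted score."""
    local_only = private and policy["private_tasks_local_only"]
    allowed = [p for p in providers if p.local or not local_only]
    w = policy["weights"]

    def score(p):
        return w["capability"] * p.capability - w["cost"] * p.cost - w["latency"] * p.latency

    return max(allowed, key=score)               # raises ValueError if nothing is allowed

# Hypothetical providers with made-up characteristics.
PROVIDERS = [
    Provider("local-7b", True, 0.0, 0.4, 0.6),
    Provider("cloud-large", False, 0.4, 0.3, 0.95),
]
print(route(PROVIDERS, private=True).name)
print(route(PROVIDERS, private=False).name)
```

Treating privacy as a filter rather than a weighted term means a private task can never be traded off against cost or capability.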

Use Cases

What people build on Soul Studio

Web Development

Advanced websites and landing pages

The Code agent operates at the architecture level: component graph, API contracts, dependency schema, ADRs in long-term memory. It iterates autonomously on the results of static analysis, test coverage and performance metrics.

Rapid Prototyping

Prototyping — an ideal fit for startups

Full stack: DB schema, migrations, API layer with validation, auth, UI and infrastructure configs. Generated code includes exception handling, structured logging and a baseline security model.

Game Dev

Game development

The Game agent works with ECS patterns, behavioral state machines and event-driven systems. It generates game logic, NPC behavior trees and procedural generation. Pairs with the image generator and TTS.

Workflow Automation

Workflow management and automation

The Workflow agent builds event-driven pipelines with branching, exception handling and rollback. Behavior adapts based on execution history in procedural memory — parameterized logic that learns from precedents.

Data Processing

Data monitoring and processing

The agent acts as a continuous consumer: parses sources on a schedule or event trigger, normalizes to a target schema, detects anomalies (z-score, IQR, isolation forest) and generates fully contextual alerts.
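Two of the detectors named above, z-score and IQR, fit in a few lines of standard-library Python. The thresholds (3 standard deviations, 1.5 times the IQR) are the conventional defaults and the sample readings are made up; isolation forest is omitted because it needs an external library.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Flag values more than `threshold` population standard deviations from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [v for v in values if abs(v - mean) / stdev > threshold]

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if v < low or v > high]

readings = [10, 11, 9, 10, 12, 11, 10, 9, 11, 10, 95]   # made-up sample with one spike
print(zscore_outliers(readings))
print(iqr_outliers(readings))
```

The z-score is sensitive to the outlier itself, since the spike inflates the standard deviation, which is one reason to run the IQR check alongside it.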

Integrations

Integrations: Gmail, Google Calendar, Monday, AirTable, Notion...

Semantically consistent transactions across multiple services: inbound document → extraction → cross-reference with CRM → status update → task → notification. Atomic from a business-logic standpoint.

Research & Writing

Writing books and research papers

The Research agent builds a knowledge graph from a source corpus, tracks claim consistency and stores a stylistic profile in long-term memory. It generates bibliographies in any citation format.

Web Scraping

Website scanning — data extraction and structuring

Playwright backend: JS rendering, pagination traversal, dynamic content. Deduplication by content hash, normalization to a target schema. Horizontal scaling via parallel browser sessions.
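Deduplication by content hash can be sketched as below. Normalizing whitespace and case before hashing is an assumption about what counts as a duplicate, and the page dictionaries and URLs are hypothetical.

```python
import hashlib

def content_key(text):
    """Hash whitespace- and case-normalized text so trivially different copies collide."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def dedupe(pages):
    """Keep the first page seen for each distinct content hash."""
    seen, unique = set(), []
    for page in pages:
        key = content_key(page["body"])
        if key not in seen:
            seen.add(key)
            unique.append(page)
    return unique

# Hypothetical scraped pages: the second is the first with different spacing and case.
pages = [
    {"url": "https://example.com/a", "body": "Blue widget, 10 EUR"},
    {"url": "https://example.com/a?ref=1", "body": "blue  widget,\n10 EUR"},
    {"url": "https://example.com/b", "body": "Red widget, 12 EUR"},
]
unique_pages = dedupe(pages)
print([p["url"] for p in unique_pages])
```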

Marketing

Building and launching ad campaigns

Generation of variant content by audience parameters, an A/B hypothesis matrix and creative packages tailored to placement formats. Analyzes performance metrics and proposes iterations based on statistical significance.
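For the statistical-significance step, one common choice is a two-proportion z-test on conversion rates. The sketch below assumes that test and a 0.05 significance level; the source does not say which method is used, and the conversion counts are made up.

```python
import math

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (p_a - p_b) / se
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return 2 * (1 - cdf)

# Made-up campaign data: variant A converts 100 of 1000, variant B converts 150 of 1000.
p = two_proportion_p_value(100, 1000, 150, 1000)
significant = p < 0.05
print(round(p, 4), significant)
```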

Music Generation

Music composition

Generation of MIDI sequences, harmonic progressions, melodic lines and arrangements through specialized generative models. The agent keeps musical context (key, meter, thematic material) and iterates as a co-author.

Versions

Three versions. One studio.

Soul Studio
Local

Full stack on your hardware

All agents, the planner, integrations and image generation — inference is confined to the machine. Automatic model sharding and layer offloading tailored to the hardware configuration.

GPU 8 GB VRAM min
12 GB+ recommended
Fully offline
Air-gapped architecture

Alpha — available

Soul Studio
Cloud

Same power — through the browser

Identical orchestration architecture with inference on managed infrastructure. API-compatible with Local: agent configurations port over without modification.

Any device
Browser
Team collaboration
Multi-user workspaces

In development

Soul Studio
Mini

Same architecture — for low-end hardware

Orchestration architecture optimized for CPU inference. 1B–4B models with aggressive Q4 quantization. Parity on memory architecture and behavioral correction.

CPU-inference
8 GB RAM
Q4 quantization
Fully offline

In development

Vision

An operating system for AI work

The modern landscape of LLM-based automation systems is defined by a fundamental structural contradiction: although the quality of local open-source models is high enough to solve a significant share of production tasks, the infrastructure layer that would orchestrate them effectively is missing.

Specialized 7B-parameter models can outperform general-purpose commercial systems on their target tasks, and quantized 34B-class models are competitive with the best commercial offerings on a large portion of standard benchmarks.

Soul Studio is that infrastructure layer.

Orchestration as the determinant of system intelligence

A multi-agent system with a correctly implemented planner outperforms a single monolithic model many times larger — through execution parallelism and per-task agent specialization.

Persistent hierarchical memory as a precondition for agentic behavior

A system without long-term memory with semantic retrieval is a stateless function, not an autonomous agent. Accumulating institutional knowledge through episodic and procedural memory is the key condition for an agent to grow more effective over time.

Local inference as an architectural advantage, not a compromise

A deterministic execution environment, no network round-trip on tool calls, the absence of rate limits and full control over the runtime have value in their own right, independent of any privacy considerations.

Roadmap

Where Soul Studio is heading

Each phase is a complete functional layer, not an interim state. We build bottom-up: first a production-ready core, then vertical specializations and an open ecosystem. No phase starts before the previous one stabilizes.

01 Current phase

Live Alpha

Foundation

production-ready orchestrator, 40+ adapters, Local + Cloud

02 Phase 2

Q3 2025

Vertical Agents

Vertical agent packages: Legal, Healthcare, FinTech, eCommerce

03 Phase 3

Q1 2026

Ecosystem

SDK for custom agents, marketplace of specializations

04 Phase 4

2026+

Federation

Distributed agent networks, node federation, P2P coordination

Team

Practitioners, not theorists

Soul Studio is built by a team of practicing engineers who have accumulated systematic experience with existing solutions in the agentic systems space — LangChain, AutoGPT, LM Studio, Open WebUI — and concluded that they are fundamentally limited for production use.

None of the existing solutions delivers all at once: production-ready orchestration of many specialized agents, hierarchical persistent memory with semantic retrieval, a closed-loop behavioral correction layer and full execution-environment isolation. Soul Studio is built as the answer to this combined set of requirements — not as an incremental improvement.

40+ tool adapters

4 hierarchical memory layers

100% local inference

No venture funding
Decisions driven by engineering rationale
Used in production every day

Press

Soul Studio in the press

Architecture

Feb 2025 · 6 min read

Why Multi-Agent Systems Will Replace Single LLM Pipelines

Orchestration is the new intelligence. When specialized agents work in parallel with shared memory, they consistently outperform monolithic models — even much larger ones.

Read on Medium

Memory

Jan 2025 · 8 min read

Persistent Memory Is What Separates Agents from Chatbots

A system without long-term semantic memory is a stateless function. True autonomous agents accumulate institutional knowledge through episodic and procedural memory layers.

Read on Medium

Infrastructure

Jan 2025 · 5 min read

Local Inference Is an Architectural Advantage, Not a Compromise

Zero latency on tool calls, deterministic execution environments, no rate limits. Running AI locally isn't about privacy — it's about performance and control.

Read on Medium

Engineering

Dec 2024 · 10 min read

How We Built a Production-Ready AI Orchestrator in 6 Months

From a single script to a full multi-agent system with 40+ tool adapters. The engineering decisions that made Soul Studio production-ready — and the ones we regret.

Read on Medium

Plans

No hidden conditions

Upgrade

$39 one-time

Upgrade to the current version for existing users

Update to the latest version
All new agents and tools
Configurations preserved
Memory data migration

Lifetime License

$129 one-time, forever

Lifetime personal license for Soul Studio Local

Soul Studio Local — full version
All agents and the orchestrator
40+ integration adapters
Image generator (unlimited)
Voice pipeline (STT + TTS)
Updates within the current major version
Priority support

Subscription

$19 per month

Continuous updates and full access to all Soul products

Soul Studio Local + Cloud + Mini
All updates across all versions
Early access to new features
Test utilities and beta releases
Priority support channel

Refunds — 14 days, no questions asked
