Model Management

Manage your local AI models

Dome Terminal supports both local AI models (running entirely on your machine) and remote AI providers. This guide explains how to download, configure, activate, and manage local models — and how to decide which model to use for which task.

Why local models — privacy, cost, and offline capability

The choice between local and remote AI isn't just technical — it's a question of what you value and how you work. Here's an honest comparison of what each option gives you.

Local AI
Advantages:
Nothing leaves your machine: your journal, chart data, and strategies stay private
No usage cost per query after initial setup
Works offline: no internet required once the model is downloaded
Responds within seconds on capable hardware (GPU with 8GB+ VRAM)
Trade-offs:
Slower on CPU-only machines, especially with larger models
Smaller models have less reasoning depth than top remote models
Requires disk space (350MB–8GB+ per model)

Remote AI
Advantages:
Highest quality reasoning, with top-tier models available
No local hardware requirements, so it works on any machine
Fastest for complex, multi-step analysis
Trade-offs:
Requires internet connectivity, so it is unavailable when your connection is down
Prompt data is sent to the provider's servers
Per-query cost depending on your plan

A practical approach for most traders

Use local AI for quick queries, private research, and anything where your journal or strategy details are part of the context. Use remote AI for complex, deep analysis tasks where the quality difference justifies sending data externally. You can switch between the two in the AI provider selector at any time — it's not a permanent choice.

Supported models

Dome Terminal supports any GGUF-format model compatible with Ollama. The models listed below are tested and recommended. Larger models produce better results but need more memory and run best on a GPU.

Local Model Browser (Settings → AI → Local Models)

Model | Size | Context | Best for | Status
Qwen 2.5 0.5B (Ultra-fast · CPU-friendly) | 350 MB | 8K tokens | Quick summaries, short queries, active trading sessions | ● Active
Qwen 2.5 1.5B (Balanced · Good for most tasks) | 1.0 GB | 32K tokens | Market analysis, journal review, ORACLE conversations | ↓ Download
7B+ Models (Deep reasoning · GPU recommended) | 4–8 GB | 128K tokens | Complex multi-step analysis, AI Committee, deep journal analysis | ↓ Download

Hardware | Recommended model tier | Why
CPU only, 8GB RAM | 0.5B or 1.5B | Larger models will be slow (>30s per response) and may cause system instability under load.
CPU only, 16GB+ RAM | 1.5B comfortably, 7B with patience | 7B on CPU is usable for non-time-sensitive research. Expect 30–90 second response times.
GPU with 6–8GB VRAM | 7B models | 7B models fit in 6–8GB VRAM and produce responses in 5–15 seconds. A good balance of quality and speed.
GPU with 12GB+ VRAM | 7B–13B models | Larger models produce meaningfully better reasoning. Response times are still fast (under 15 seconds).

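The hardware table can be read as a small decision function. The sketch below is only an illustration of that reading, not part of Dome Terminal: the function name is hypothetical, and the handling of cases the table does not list (GPUs under 6GB, GPUs between 8GB and 12GB, RAM between 8GB and 16GB) is an assumption that falls back to the more conservative row.

```python
from typing import Optional


def recommend_model_tier(ram_gb: float, vram_gb: Optional[float] = None) -> str:
    """Map hardware to the model tier suggested in the hardware table.

    vram_gb is None for CPU-only machines. Cases the table does not list
    fall back to the more conservative row (an assumption of this sketch).
    """
    if vram_gb is not None and vram_gb >= 12:
        return "7B-13B"
    if vram_gb is not None and vram_gb >= 6:
        return "7B"
    # CPU-only, or a GPU assumed too small to hold a 7B model.
    if ram_gb >= 16:
        return "1.5B (7B with patience)"
    return "0.5B or 1.5B"
```

For example, recommend_model_tier(16, vram_gb=8) returns "7B", while recommend_model_tier(8) returns "0.5B or 1.5B".
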
Downloading a model

Models are downloaded directly from within Dome Terminal. No external tools or command-line steps are required for supported models.

01 Open Settings → AI → Local Models

Navigate to Settings (gear icon in the sidebar, or Ctrl+,) → AI tab → Local Models section. You'll see the model browser with all available models and their download status.

02 Find the model and click Download

Browse the available models, check the size and context length against your hardware specs (table above), and click ↓ Download. Make sure you have enough free disk space before starting. The download size is shown in the model browser.

03 Monitor progress in the status bar

A download progress indicator appears in the terminal status bar at the bottom of the screen. The download runs in the background, so you can continue using the terminal normally.

Example status bar display:
Qwen 2.5 1.5B · qwen2.5-1.5b-instruct.Q4_K_M.gguf · 72%
734 MB / 1.02 GB · ~2m 14s remaining

04 Download complete: model appears in Available Models

When the download finishes, the model status changes from ↓ Download to Set Active. You can now activate it for use. The model file stays on your machine and does not need to be re-downloaded.
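
If you want to confirm free disk space yourself before starting a large download, a check along these lines is one option. This is a minimal sketch using Python's standard library. The function names, the 10% safety margin, and the target directory are assumptions for illustration, since this guide does not say where Dome Terminal stores model files.

```python
import shutil

MB = 1024 ** 2
GB = 1024 ** 3


def has_enough_space(model_bytes: int, free_bytes: int, headroom: float = 0.10) -> bool:
    """Return True if free_bytes covers the model plus a safety margin."""
    return free_bytes >= model_bytes * (1 + headroom)


def can_download(model_bytes: int, target_dir: str = ".") -> bool:
    """Check free space on the drive that holds target_dir (hypothetical location)."""
    free = shutil.disk_usage(target_dir).free
    return has_enough_space(model_bytes, free)
```

Calling can_download(int(1.02 * GB)) checks whether the current drive has room for a file the size of the 1.5B model shown in the example, plus the margin.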

Activating a model

Only one local model is active at a time. The active model is loaded into memory and used for all local AI features — ORACLE, AI Committee, and any other module that routes to local AI.

How to set the active model

In Settings → AI → Local Models, click Set Active next to any downloaded model. The terminal loads the model into memory — this takes between 5 and 30 seconds depending on model size and your hardware. A loading indicator appears in the AI status bar.

Once loaded, the AI provider indicator in the toolbar changes to show the active local model name. You can switch models at any time — the current model unloads and the new one loads. Switching mid-session does not lose conversation history.
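
The one-active-model rule can be pictured as a small state machine: activating a model unloads the current one first, and the conversation history is stored separately so it survives the switch. The class below is a hypothetical model of that behavior for illustration only. It is not Dome Terminal's code, and the model identifiers are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LocalModelManager:
    """Hypothetical model of the 'one active model at a time' rule."""

    downloaded: List[str]
    active: Optional[str] = None
    history: List[str] = field(default_factory=list)  # kept across model switches
    events: List[str] = field(default_factory=list)   # load/unload log, for illustration

    def set_active(self, name: str) -> None:
        if name not in self.downloaded:
            raise ValueError(f"{name} is not downloaded")
        if name == self.active:
            return  # already loaded, nothing to do
        if self.active is not None:
            self.events.append(f"unload {self.active}")
        self.events.append(f"load {name}")
        self.active = name
```

Because history lives outside the loaded model, switching from one model to another changes only which model answers the next query.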

Switch to a smaller model during active trading

During a live trading session, response speed often matters more than reasoning depth. Switch to the 0.5B or 1.5B model for quick queries while trading, and switch back to a larger model for deeper research when the market is closed. You can create a keyboard shortcut for model switching in Settings → Shortcuts.

Privacy controls — what the AI can see

Dome Terminal lets you control exactly what context is included when an AI model processes your queries. These settings are separate for local and remote models — you can give local AI access to sensitive data while keeping it off-limits for remote AI.

Local AI context
Active chart symbol and price
Quant Brain readings
Journal entries (all fields)
Strategy scripts
Account balance and PnL
Open positions

All data stays on your machine. Nothing is transmitted externally.

Remote AI context (defaults)
Active chart symbol and price
Quant Brain readings (anonymized)
Journal entries (mood, tags only)
Strategy scripts — blocked by default
Account balance — blocked by default
Live PnL — blocked by default

Sent fields go to the remote provider's servers. Adjust in Settings → AI → Privacy.

To adjust which fields are included in remote AI context, go to Settings → AI → Privacy and toggle each category. Changes apply immediately to all future queries — they do not affect queries already sent.
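
Conceptually, the remote defaults act as an allowlist applied to the context before a query is sent. The sketch below illustrates that idea under stated assumptions: the category keys and function name are hypothetical, categories missing from the policy are dropped, and the anonymization of Quant Brain readings is not shown because this guide does not describe how it works.

```python
from typing import Any, Dict

# Default remote policy, transcribed from the remote defaults listed in this section.
# True = sent, False = blocked. Category names are hypothetical.
REMOTE_DEFAULTS: Dict[str, bool] = {
    "chart": True,
    "quant_brain": True,        # sent anonymized; that step is not shown here
    "journal": True,            # reduced to mood and tags below
    "strategy_scripts": False,
    "account_balance": False,
    "live_pnl": False,
}

JOURNAL_REMOTE_FIELDS = ("mood", "tags")


def build_remote_context(
    context: Dict[str, Any], policy: Dict[str, bool] = REMOTE_DEFAULTS
) -> Dict[str, Any]:
    """Return only the categories the policy allows; unknown categories are dropped."""
    allowed = {k: v for k, v in context.items() if policy.get(k, False)}
    if "journal" in allowed:
        # Keep only mood and tags from each journal entry.
        allowed["journal"] = [
            {f: entry[f] for f in JOURNAL_REMOTE_FIELDS if f in entry}
            for entry in allowed["journal"]
        ]
    return allowed
```

Toggling a category in Settings → AI → Privacy corresponds to flipping one value in a policy like this. The original context is left untouched, so the same data can still be passed in full to a local model.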

Using Ollama to manage models

If you already use Ollama to manage GGUF models on your machine, Dome Terminal can connect directly to Ollama and use its model library — you don't need to download the same models twice.

Configure Ollama in Settings → AI → Ollama

Navigate to Settings → AI → Ollama and enter the Ollama host address. If Ollama is running on the same machine, the default http://localhost:11434 will work without changes. Click Test Connection to verify.

Once connected, all models available in your Ollama library appear automatically in the Dome Terminal model browser alongside any natively downloaded models. You can activate, switch, and use them exactly the same way.
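
If Test Connection fails, you can check the Ollama server independently. The sketch below queries Ollama's /api/tags endpoint, which lists the models in the local library. It is a manual check under the assumption that Ollama is listening on the default address. It does not claim to reproduce what Dome Terminal does internally, and the function names are illustrative.

```python
import json
import urllib.request
from typing import Any, Dict, List


def parse_model_names(payload: Dict[str, Any]) -> List[str]:
    """Extract model names from an Ollama /api/tags response body."""
    return [m["name"] for m in payload.get("models", [])]


def list_ollama_models(host: str = "http://localhost:11434", timeout: float = 5.0) -> List[str]:
    """Query a running Ollama server; raises URLError if it is unreachable."""
    with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
        return parse_model_names(json.load(resp))
```

If list_ollama_models() returns a list of names, the server is reachable and those models should be the ones that appear in the model browser once the connection is configured.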

Ollama connection is optional

If you don't use Ollama, you don't need to configure it. Dome Terminal has its own built-in model downloader and manager. Ollama integration is for users who prefer to manage their model library with Ollama directly, or who want to use models that aren't in the Dome Terminal model browser.

Performance tips

Local AI performance depends heavily on your hardware and what else is running on your machine. These three tips make the biggest practical difference.

1. Close memory-heavy applications before loading large models
7B+ models require 4–8GB of RAM (or VRAM). If a browser with many tabs, a video editor, or another heavy application is using that memory, the model will load slowly, run slowly, or fail to load at all. Close what you don't need before activating a large model.

2. Restart the Local LLM service if responses become slow
After extended use, the local AI service can degrade in performance, especially on systems with limited RAM. Go to Settings → AI → Local Models and click Restart LLM Service. This unloads and reloads the active model cleanly without restarting the entire terminal.

3. Use a smaller model during active trading sessions
When you're actively watching the market, a fast 0.5B or 1.5B response in 2–3 seconds is more useful than a thorough 7B response that takes 40 seconds. Reserve larger models for pre-session research and post-session journal review when response speed is less critical.