<%- include("../../../src/web/views/partials/layout-top", { title }) %>
<%
const renderPresetOptions = (options, current) => {
  const value = Number(current);
  const hasValue = options.some((option) => option.value === value);
  let html = "";
  if (!hasValue && Number.isFinite(value)) {
    html += ``;
  }
  html += options.map((option) => ``).join("");
  return html;
};
%>

Lumi AI

Managed local inference, assistant access, and guarded plugin tools.

<%= runtimeStatus.healthy ? "Runtime ready" : "Runtime offline" %>
Improvement Center

Overview

Current installation and host capacity.

Provider: llama.cpp
Selected model: <%= models.find((model) => model.id === config.selected_model_id)?.label || config.selected_model_id %>
Gate model: <%= models.find((model) => model.id === config.gate.model_id)?.label || config.gate.model_id %>
Gate status: <%= gateStatus.healthy ? "Ready" : gateStatus.state %>
RAM: <%= Math.round(hardware.total_ram_mb / 1024) %> GB
Free disk: <%= formatBytes(hardware.free_disk_mb * 1048576) %>
CPU threads: <%= hardware.cpu_threads %>
GPU: <%= hardware.gpu.present ? hardware.gpu.model : "Not detected" %>
VRAM: <%= hardware.gpu.vram_mb ? `${Math.round(hardware.gpu.vram_mb / 1024)} GB` : "Unavailable" %>
Compute API: <%= hardware.gpu.compute_api?.length ? hardware.gpu.compute_api.map((api) => api.toUpperCase()).join(", ") : "CPU only" %>
GPU driver: <%= hardware.gpu.driver || "Unavailable" %>
Installed backend: <%= String(runtimeStatus.runtime_backend || "cpu").toUpperCase() %>
Recommended backend: <%= String(hardware.runtime_selection.backend || "cpu").toUpperCase() %>
Total AI RAM estimate: <%= formatBytes(resourceEstimate.total_cpu_memory_mb * 1048576) %>
Total AI VRAM estimate: <%= formatBytes(resourceEstimate.total_gpu_memory_mb * 1048576) %>
<% resourceEstimate.warnings.forEach((warning) => { %>
<%= warning %>
<% }) %> <% sizeDiagnostics.forEach((diagnostic) => { %>
<%= diagnostic.message %>
<% }) %>

Models

Pinned GGUF files downloaded directly from Hugging Face and verified by SHA-256.

<% models.forEach((model) => { %>
<%= model.label %> <%= formatBytes(model.size) %> · <%= model.ram_gb %> GB recommended RAM · <%= model.repo %>
<%= model.downloaded ? "Installed" : model.compatible ? "Available" : "Exceeds host" %>
<% if (model.downloaded) { %>
<%- include("../../../src/web/views/partials/state-button", { type: "submit", classes: "subtle", attrs: `data-ai-download-button data-download-id="model:${model.id}"`, states: [ { id: "idle", text: "Redownload" }, { id: "loading", text: "Downloading", spinner: true }, { id: "success", text: "Downloaded" }, { id: "error", text: "Retry" } ] }) %>
<% } else { %>
<% if (!model.compatible) { %> <% } %> <%- include("../../../src/web/views/partials/state-button", { type: "submit", classes: "subtle", attrs: `data-ai-download-button data-download-id="model:${model.id}"`, states: [ { id: "idle", text: "Download" }, { id: "loading", text: "Downloading", spinner: true }, { id: "success", text: "Downloaded" }, { id: "error", text: "Retry" } ] }) %>
<% } %>
<% }) %>

Runtime

Official llama.cpp release, bound to localhost and stored inside this plugin.

<%- include("../../../src/web/views/partials/state-button", { type: "button", attrs: "data-runtime-primary", loadingState: "starting", successState: "running", errorState: "error", defaultState: runtimeStatus.state === "running" ? "running" : "idle", states: [ { id: "idle", text: "Start" }, { id: "starting", text: "Starting", spinner: true }, { id: "running", text: "Restart" }, { id: "restarting", text: "Restarting", spinner: true }, { id: "error", text: "Retry" } ] }) %>
Installed: <%= runtimeStatus.runtime_installed ? "Yes" : "No" %>
Process: <%= runtimeStatus.state %>
Health: <%= runtimeStatus.healthy ? "Healthy" : "Unavailable" %>
PID: <%= runtimeStatus.pid || "None" %>
Last stop: <%= runtimeState.last_stop_reason %>
Platform: <%= hardware.platform %>-<%= hardware.architecture %>
Self-test: <%= runtimeStatus.last_self_test?.success ? "Passed" : runtimeStatus.last_self_test ? "Failed" : "Not run" %>
Runtime folder: <%= formatBytes(runtimeFolderSize) %>
Runtime archive: <%= runtimeTarget ? formatBytes(runtimeTarget.size) : "Unavailable" %>
Model installed: <%= formatBytes(modelFileSize) %>
Model download: <%= formatBytes(models.find((model) => model.id === config.selected_model_id)?.size || 0) %>
Backend: <%= String(runtimeStatus.runtime_backend || "cpu").toUpperCase() %>
GPU intent: <%= runtimeStatus.gpu_allocation_intent_percent || 0 %>%
GPU actual: <%= runtimeStatus.gpu_allocation_actual_percent || 0 %>%
GPU safe maximum: <%= runtimeStatus.gpu_allocation_max_safe_percent || 0 %>%
GPU layers: <%= runtimeStatus.gpu_layers || 0 %>
Total VRAM: <%= formatBytes((runtimeStatus.total_vram_mb || 0) * 1048576) %>
Free VRAM: <%= formatBytes((runtimeStatus.free_vram_mb || 0) * 1048576) %>
Managed model VRAM: <%= formatBytes((runtimeStatus.managed_model_vram_mb || 0) * 1048576) %>
External VRAM estimate: <%= formatBytes((runtimeStatus.external_vram_estimate_mb || 0) * 1048576) %>

Lightweight gate: <%= gateStatus.healthy ? "Ready" : gateStatus.state %>

<%= gateStatus.model_id || config.gate.model_id %> · CPU <%= formatBytes((gateStatus.estimated_cpu_memory_mb || 0) * 1048576) %> · VRAM <%= formatBytes((gateStatus.estimated_gpu_memory_mb || 0) * 1048576) %>

<% if (gateStatus.last_error) { %>
<%= gateStatus.last_error %>
<% } %> <% if (runtimeTarget) { %>

Managed <%= String(runtimeTarget.backend || "cpu").toUpperCase() %> release <%= runtimeManifest?.version || "b9592" %>

<%= runtimeTarget.filename %> · <%= formatBytes(runtimeTarget.size) %>

<%- include("../../../src/web/views/partials/state-button", { type: "submit", classes: "subtle", attrs: "data-ai-download-button data-download-id=\"runtime\"", states: [ { id: "idle", text: runtimeStatus.runtime_installed ? "Reinstall runtime" : "Download runtime" }, { id: "loading", text: "Downloading", spinner: true }, { id: "success", text: "Downloaded" }, { id: "error", text: "Retry" } ] }) %>
<% } else { %>
No managed runtime build is available for this OS and architecture.
<% } %> <% if (runtimeStatus.last_error) { %>
<%= runtimeStatus.last_error %>
<% } %> <% if (runtimeStatus.acceleration_warning) { %>
<%= runtimeStatus.acceleration_warning %>
<% } %> <% tuningHints.forEach((hint) => { %>
<%= hint %>
<% }) %> <% if (hardware.runtime_selection.fallback_to_cpu) { %>
<%= hardware.runtime_selection.reason %>
<% } %>

Runtime diagnostics

Latest plugin-local runtime failure and remediation details.

Download diagnostics
<% if (latestDiagnostic) { %>
<%= latestDiagnostic.code %>: <%= latestDiagnostic.message %>

<%= latestDiagnostic.category %> / <%= latestDiagnostic.severity %>

<% if (latestDiagnostic.remediation_steps?.length) { %>
<% latestDiagnostic.remediation_steps.forEach((step, index) => { %>
  <%= index + 1 %>. <%= step %>
<% }) %>
<% } %>
Raw diagnostic details
<%= JSON.stringify(latestDiagnostic, null, 2) %>
<% } else { %>

No runtime diagnostic has been recorded.

<% } %> <% if (hardware.network_path_warning) { %>
The plugin path may be a mapped or network-like location. A local disk path is more reliable for native runtime DLL loading.
<% } %> <% if (hardware.long_path_warning) { %>
The plugin path is unusually long for Windows native loading. Consider a shorter local installation path.
<% } %>

Storage cleanup

Plugin-local files only. Selected models and active runtimes are protected.

<%= formatBytes(storageUsage.total) %> total
<% Object.entries(storageUsage.categories).forEach(([category, bytes]) => { %>
<%= category.replaceAll("_", " ") %>: <%= formatBytes(bytes) %>
<% }) %>

Assistant

Configuration remains admin-only. Visibility controls only the sidebar assistant.

Model, runtime, and GPU
<% if (!selectedModelInstalled) { %>
The currently selected model is not installed. Choose an installed model before saving.
<% } %>
<%= gpuAllocation.gpu_allocation_intent_percent %>% intent
CPU only · Maximum safe: <%= gpuAllocation.gpu_allocation_max_safe_percent %>% · Maximum GPU
Backend <%= String(gpuAllocation.backend).toUpperCase() %> · Intended <%= gpuAllocation.gpu_allocation_intent_percent %>% · Actual <%= gpuAllocation.gpu_allocation_actual_percent %>% · Managed model VRAM <%= formatBytes(gpuAllocation.managed_model_vram_mb * 1048576) %> · Total VRAM <%= formatBytes(gpuAllocation.total_vram_mb * 1048576) %> · Free VRAM <%= formatBytes(gpuAllocation.free_vram_mb * 1048576) %> · External VRAM <%= formatBytes(gpuAllocation.external_vram_estimate_mb * 1048576) %>

<%= gpuAllocation.warning || "" %>

Shows "Continue waiting" controls without stopping the job.
Normal assistant requests use the class budgets below.
Lightweight request gate
Use the smallest downloaded model that can reliably return JSON classifications.
Timeouts and errors escalate immediately to the main model.
The gate cannot execute tools. Permission-sensitive, ambiguous, complex, low-confidence, and repeated requests are sent to the main model.
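
The escalation rules described above can be sketched as a small routing function. This is a minimal illustration only, not the plugin's implementation: `routeRequest`, the shape of `gateResult`, the flag names, and the `CONFIDENCE_THRESHOLD` value are all hypothetical, and the source does not say what the non-escalated path does, so it is simply labelled `"light"` here.

```javascript
// Hypothetical sketch: every name and threshold below is an assumption,
// not the plugin's actual API.
const CONFIDENCE_THRESHOLD = 0.8; // assumed cutoff for "low confidence"

/**
 * Decide whether a gate classification can stay on the lightweight path
 * or must be handed to the main model.
 */
function routeRequest(gateResult) {
  // Timeouts and errors escalate immediately.
  if (!gateResult || gateResult.timedOut || gateResult.error) {
    return { route: "main", reason: "gate_unavailable" };
  }

  // A classification that is missing or malformed is treated as a failure.
  const c = gateResult.classification;
  if (!c || typeof c.confidence !== "number") {
    return { route: "main", reason: "invalid_classification" };
  }

  // The gate never executes tools, so tool requests always escalate,
  // as do the sensitive or uncertain categories listed in the text.
  const escalations = [
    [c.needsTools, "tools_required"],
    [c.permissionSensitive, "permission_sensitive"],
    [c.ambiguous, "ambiguous"],
    [c.complex, "complex"],
    [c.repeated, "repeated"],
    [c.confidence < CONFIDENCE_THRESHOLD, "low_confidence"],
  ];
  for (const [shouldEscalate, reason] of escalations) {
    if (shouldEscalate) return { route: "main", reason };
  }

  return { route: "light", reason: "simple_request" };
}
```

Ordering the checks from "gate failed" to "gate succeeded but is unsure" keeps the default conservative: any doubt sends the request to the main model.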
Assistant visibility and diagnostics
Sidebar visibility
Assistant pill: <%= visibilityDiagnostics.available ? "Backend ready" : "Hidden" %> <%= assistantReason %>
<% visibilityDiagnostics.conditions.forEach((condition) => { %>
<%= condition.key.replaceAll("_", " ") %> <%= condition.passed ? "Pass" : "Fail" %>
<% }) %>
User ID: <%= visibilityDiagnostics.permission.debug_details.resolved_user_id || "None" %>
Role: <%= visibilityDiagnostics.permission.normalized_role %>
Role source: <%= visibilityDiagnostics.permission.debug_details.role_source %>
Allowed roles: <%= visibilityDiagnostics.permission.debug_details.allowed_roles.join(", ") || "None" %>
Origin: <%= visibilityDiagnostics.permission.debug_details.origin %>
Endpoint status: <%= panelDiagnostics.panel_endpoint_status || "Not requested" %>
HTML length: <%= panelDiagnostics.panel_html_length || 0 %>
Template: <%= panelDiagnostics.panel_template_path %>
Missing locals: <%= panelDiagnostics.missing_locals.length ? panelDiagnostics.missing_locals.join(", ") : "None" %>
HTML error: <%= panelDiagnostics.panel_html_error || "None" %>
Mount error: <%= panelDiagnostics.mount_error || "None" %>
Improvement Center
Trusted moderators can verify reviews. Only administrators can approve, edit, delete, promote, export, or run evals.
Assistant identity and scope
Lumi Assistant: Built-in AI assistant for Lumi. This identity is fixed and cannot be replaced by model branding.
Style guidance for the final answer. It does not limit prompt context or reasoning.
Controls model output capacity, not prompt or context length.
Hard scope, role, tool, and confirmation rules cannot be overridden.
Hard scope
<% hardRules.forEach((rule) => { %>
  • <%= rule %>
<% }) %>
Platform commands
Platforms <% Object.entries(config.commands.platforms).forEach(([platform, enabled]) => { %> <% }) %>
Roles
Rate limits
<% [ ["limit_role_admin","Administrator role",config.rate_limits.roles.admin], ["limit_role_mod","Moderator role",config.rate_limits.roles.mod], ["limit_role_user","User role",config.rate_limits.roles.user], ["limit_platform_webui","WebUI",config.rate_limits.platforms.webui], ["limit_platform_discord","Discord",config.rate_limits.platforms.discord], ["limit_platform_twitch","Twitch",config.rate_limits.platforms.twitch], ["limit_platform_youtube","YouTube",config.rate_limits.platforms.youtube], ["limit_platform_kick","Kick",config.rate_limits.platforms.kick], ["limit_platform_other","Other platform",config.rate_limits.platforms.other], ["limit_user","Per user",config.rate_limits.per_user], ["limit_channel","Per channel/server",config.rate_limits.per_channel] ].forEach(([key,label,limit]) => { %>
<%= label %>
<% }) %>
Support diagnostics and logging
Support diagnostics
Logging <% [["log_prompts","Prompts"],["log_responses","Responses"],["log_tool_calls","Tool calls"],["log_metrics","Metrics"],["log_internal_audit","Internal audit"]].forEach(([key,label]) => { %> <% }) %>

User AI access

Bans and timeouts apply to WebUI and platform commands.

User | Restriction | Until | Reason
<% activeAiRestrictions.forEach((entry) => { %>
<%= entry.user_id %> | <%= entry.banned ? "Banned" : "Timed out" %> | <%= entry.timeout_until ? formatDate(entry.timeout_until) : "-" %> | <%= entry.reason || "-" %>
<% }) %>
<% if (!activeAiRestrictions.length) { %>
No active AI restrictions.
<% } %>
Previous Page <%= accessPage.page %> of <%= accessPage.pages %> (<%= accessPage.total %> entries) Next
Recent rate-limit denials
Time | User | Platform | Bucket | Retry
<% recentRateLimitDenials.forEach((entry) => { %>
<%= formatDate(entry.at) %> | <%= entry.user_id %> | <%= entry.platform %> | <%= entry.bucket %> | <%= entry.retry_after_seconds %>s
<% }) %>
<% if (!recentRateLimitDenials.length) { %>
No recent rate-limit denials.
<% } %>

Repository support index

Local Lumi routes, settings pages, plugin manifests, commands, and documentation.

Status: <%= repoIndexStatus.present ? repoIndexStatus.stale ? "Stale" : "Ready" : "Missing" %>
Last indexed: <%= repoIndexStatus.indexed_at ? formatDate(repoIndexStatus.indexed_at) : "Never" %>
Commit: <%= repoIndexStatus.commit ? repoIndexStatus.commit.slice(0, 12) : "Unavailable" %>
Routes: <%= repoIndexStatus.route_count %>
Plugins: <%= repoIndexStatus.plugin_count %>
Commands: <%= repoIndexStatus.command_count %>

Test console

Run a request as a simulated role without changing the logged-in actor.

Tools are disabled for this test unless explicitly enabled above.

Metrics

Plugin-local operational counters and recent requests.

Requests: <%= metrics.total_requests %>
Successful: <%= metrics.successful %>
Failed: <%= metrics.failed %>
Refused: <%= metrics.refusals %>
Gate decisions: <%= metrics.gate_decisions || 0 %>
Average: <%= formatDuration(metrics.average_response_ms) %>
Median: <%= formatDuration(metrics.median_response_ms) %>
Avg gate: <%= formatDuration(metrics.average_stage_ms?.gate_ms || 0) %>
Avg main generation: <%= formatDuration(metrics.average_stage_ms?.main_generate_ms || 0) %>
Current and recent assistant jobs
Created | State / stage | Class / budget | Elapsed | Gate | Queue | Prompt eval | Generation | Tokens | Speed | Runtime | UI timeout
<% jobDiagnostics.forEach((job) => { %>
<%= formatDate(job.created_at) %> | <%= job.state %> / <%= job.stage %> | <%= job.details.route_class || "-" %> / <%= job.details.max_output_tokens_used || job.details.max_output_tokens || "-" %> | <%= formatDuration(job.elapsed_ms) %> | <%= formatDuration(job.details.gate_ms) %> | <%= formatDuration(job.details.queue_ms) %> | <%= formatDuration(job.details.prompt_eval_ms) %> | <%= formatDuration(job.details.generation_ms) %> | <%= job.details.prompt_tokens || 0 %> / <%= job.details.generated_tokens || 0 %> | <%= job.details.prompt_tps || 0 %> / <%= job.details.generation_tps || 0 %> tok/s | <%= job.details.backend || "-" %>, <%= job.details.gpu_layers || 0 %> layers, ctx <%= job.details.context_size || "-" %> | <%= job.frontend_soft_timeout_at ? (job.still_running ? "Still running" : "Recorded") : "No" %>
<% }) %>
<% if (!jobDiagnostics.length) { %>
No assistant jobs recorded since this plugin process started.
<% } %>
Recent slow and 504-risk requests
Time | Route / class | Reason / budget | Gate | Queue | Prompt eval | Generation | Tokens | Speed | Total | Risk
<% slowRequestsPage.entries.forEach((entry) => { %>
<%= entry.timestamp %> | <%= entry.route_used || "-" %> / <%= entry.route_class || "-" %> | <%= entry.reason_code || "-" %> / max <%= entry.max_output_tokens_used || "-" %> | <%= formatDuration(entry.gate_ms) %> | <%= formatDuration(entry.queue_ms) %> | <%= formatDuration(entry.prompt_eval_ms) %> | <%= formatDuration(entry.generation_ms) %> | <%= entry.prompt_tokens || 0 %> / <%= entry.generated_tokens || 0 %> | <%= entry.prompt_tps || 0 %> / <%= entry.generation_tps || 0 %> tok/s | <%= formatDuration(entry.total_ms) %> | <%= entry.frontend_soft_timeout ? "UI waited" : entry.risk_504 ? "504 risk" : "Slow" %>
<% }) %>
<% if (!slowRequestsPage.entries.length) { %>
No requests over 30 seconds.
<% } %>
Previous slow requests Page <%= slowRequestsPage.page %> of <%= slowRequestsPage.pages %> (<%= slowRequestsPage.total %> slow requests) Next slow requests
Time | Kind | Status | Route | Confidence / reason | Role | Generated / final / delivered | Duration
<% history.forEach((entry) => { %>
<%= entry.timestamp %> | <%= entry.kind %> | <%= entry.status %> | <%= entry.route_used || "-" %> | <%= entry.confidence ?? entry.gate_confidence ?? "-" %> / <%= entry.reason_code || entry.gate_reason_code || "-" %> | <%= entry.role || "-" %> | <%= entry.internal_generated_length ?? "-" %> / <%= entry.final_reply_length ?? entry.original_final_length ?? "-" %> / <%= entry.delivered_length ?? "-" %> | <%= formatDuration(entry.duration_ms) %>
<% }) %>
<% if (!history.length) { %>
No requests recorded.
<% } %>
Previous Page <%= metricsPage.page %> of <%= metricsPage.pages %> (<%= metricsPage.total %> entries) Next

Runtime logs

Open a tail view without loading entire large files.

Filename | Size | Modified | Actions
<% logFiles.forEach((file) => { %>
<%= file.name %> | <%= formatBytes(file.size) %> | <%= formatDate(file.modified_at) %> | View · Download
<% }) %>
<% if (!logFiles.length) { %>
No runtime logs found.
<% } %>
Previous Page <%= logPage.page %> of <%= logPage.pages %> (<%= logPage.total %> logs) Next
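
A tail view that avoids loading an entire large log can be approximated by reading only the last few kilobytes of the file. The sketch below is an assumption about how such a view might work, not the plugin's code: `readTail` and its 64 KiB default are hypothetical.

```javascript
const fs = require("node:fs");

/**
 * Hypothetical helper: return at most the last `maxBytes` of a file as text.
 * Only the requested window is read, so file size does not affect memory use.
 */
function readTail(filePath, maxBytes = 64 * 1024) {
  const fd = fs.openSync(filePath, "r");
  try {
    const { size } = fs.fstatSync(fd);
    const length = Math.min(size, maxBytes);
    if (length === 0) return "";

    // Read `length` bytes starting at the offset where the window begins.
    const buffer = Buffer.alloc(length);
    const bytesRead = fs.readSync(fd, buffer, 0, length, size - length);
    let text = buffer.toString("utf8", 0, bytesRead);

    // When the window starts mid-file, the first line is possibly partial,
    // so drop everything up to and including the first newline.
    if (length < size) {
      const firstNewline = text.indexOf("\n");
      if (firstNewline !== -1) text = text.slice(firstNewline + 1);
    }
    return text;
  } finally {
    fs.closeSync(fd);
  }
}
```

Dropping the first line of a truncated window trades a little content for output that never begins in the middle of a log entry.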

Privacy and troubleshooting

Local inference remains on this host.

Models are downloaded from pinned Hugging Face revisions. The managed runtime is downloaded from the official llama.cpp release and verified by SHA-256. No cloud inference is used. Prompt and response logging are off by default.
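
SHA-256 verification of a downloaded file can be sketched with Node's built-in `crypto` module. This is an illustration under stated assumptions, not the plugin's downloader: `sha256OfFile`, `hashesMatch`, and `verifyDownload` are hypothetical names, and the expected digest is assumed to be a hex string taken from a pinned manifest.

```javascript
const { createHash } = require("node:crypto");
const { createReadStream } = require("node:fs");

/**
 * Hypothetical helper: stream a file through SHA-256 so that multi-gigabyte
 * model files are never held in memory at once.
 */
function sha256OfFile(filePath) {
  return new Promise((resolve, reject) => {
    const hash = createHash("sha256");
    createReadStream(filePath)
      .on("error", reject)
      .on("data", (chunk) => hash.update(chunk))
      .on("end", () => resolve(hash.digest("hex")));
  });
}

/** Compare two hex digests, ignoring case and surrounding whitespace. */
function hashesMatch(actualHex, expectedHex) {
  return actualHex.trim().toLowerCase() === expectedHex.trim().toLowerCase();
}

/** Resolve to true only when the file's digest equals the pinned digest. */
async function verifyDownload(filePath, expectedSha256) {
  return hashesMatch(await sha256OfFile(filePath), expectedSha256);
}
```

A caller would typically delete the file and report an error when `verifyDownload` resolves to `false`, rather than keep an unverified model or runtime archive.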

If startup fails, confirm that the runtime and selected model show as installed, the plugin directory is writable, and enough RAM and disk are available. Runtime logs are stored under plugins/lumi_ai/data/logs/.

<%- include("tool-modal") %> <%- include("../../../src/web/views/partials/layout-bottom") %>