Lumi/plugins/lumi_ai/README.md

# Lumi AI

`lumi_ai` is a standalone Lumi plugin that manages a local `llama.cpp` inference process and adds a scoped AI Assistant to the WebUI.

## Install and configure

1. Place this directory at `plugins/lumi_ai/`.
2. Restart Lumi.
3. Open **Plugins -> Lumi AI** in the sidebar.
4. Download the managed runtime and a compatible model.
5. Select the model, configure visibility and instructions, then save.
6. Start the runtime and enable AI.

The settings page is always registered as an admin-only item in the `Plugins` sidebar section. The assistant pill is injected separately above the profile footer and follows the configured admin, moderator, and user visibility controls.

## Storage

Every writable path is confined to `plugins/lumi_ai/data/`:

- `config/`: settings and runtime state
- `models/`: verified GGUF models
- `runtime/`: extracted `llama.cpp` runtime
- `logs/`: runtime logs
- `metrics/`: usage and audit records
- `rag/`, `cache/`, `tmp/`: plugin-local working data

Downloads are written to `data/tmp/`, verified against a pinned SHA-256 digest, and only then moved or extracted into their final plugin-local directory.

## Runtime and downloads

Models use pinned Hugging Face repository commits. The runtime uses a pinned official `ggml-org/llama.cpp` GitHub release because the llama.cpp project does not publish authoritative multi-platform runtime archives on Hugging Face. This is the only download-source exception; the archive URL, version, size, and SHA-256 are pinned in `runtime_manifest.json`.

The runtime binds only to `127.0.0.1` on an ephemeral port. It is never exposed on `0.0.0.0`.

Before loading a model, Lumi AI runs `llama-server --help` as a smoke test. Failed launches and exits are decoded into plugin-local diagnostics, including Windows NTSTATUS values such as `0xC0000005 / STATUS_ACCESS_VIOLATION`. The admin page provides remediation steps, raw stdout/stderr tails, model verification, and a redacted diagnostics bundle.

Model tiers are capability-based: Tiny, Small, Medium, Large, General, Power, and Extreme. Lumi AI detects compatible GPUs and selects a pinned Vulkan or Metal runtime when available, with CPU as the fallback. The GPU Acceleration setting maps a percentage to the model's supported offload layers and automatically limits the selectable range using model size, context size, and available VRAM.

GPU allocation stores administrator intent separately from the live allocation. Managed runtime VRAM is treated as Lumi-owned usage, while external VRAM pressure can clamp the actual allocation without overwriting the saved intent. The admin page also provides safe model deletion, category-based storage cleanup, paged metrics, structured runtime logs, and assistant-pill visibility diagnostics.

Lumi Assistant uses an immutable identity and safety policy, with administrator-configurable support topics, domains, style, links, clarification behavior, answer length, and role-specific overrides. A plugin-local repository index under `data/repo_index/` provides verified Lumi WebUI routes and support context. The admin visibility debugger reports backend eligibility and frontend slot, loader, response, and mount conditions.

Assistant replies normalize verified Lumi routes into safe links for the active WebUI host. Repository paths and implementation details are restricted to administrators; moderator code help is separately opt-in. Reply length limits are applied after inference to delivered text rather than prompt or retrieved context.

The WebUI assistant keeps a bounded per-user conversation and panel preference in browser storage so navigation does not discard the active chat. The panel opens at one-sixth of the viewport by default, supports vertical resizing, and restores its previous height and open state. Assistant Markdown is rendered as safe DOM content, including fenced code blocks with exact-copy controls.

The `!assistant` command and its default `!lumi` alias use the same provider, identity, scope, access restrictions, and rate limits as the WebUI assistant. Replies are normalized for the originating platform. AI bans, timeouts, and recent rate-limit denials are managed from the plugin settings page and stored under `data/config/`.

The access-control picker searches known Lumi profiles and linked platform identities. Runtime log listings and metrics use 25-row pages, while individual log views read only the latest bounded chunk.

Assistant panel rendering is validated before the core reports the panel as available. Template, locals, endpoint, HTML, and mount diagnostics are available to administrators, with failures recorded in `data/logs/assistant-panel.log`.

The test console no longer exposes a user-editable scope label. Clearly unrelated requests are rejected deterministically, while ambiguous requests are passed to the scoped Lumi system prompt instead of being rejected by a fixed keyword list.

## Plugin API

Other Lumi plugins can use:

```js
const ai = global.lumiFrameworks?.ai;
const health = await ai.health();
const result = await ai.generate({
  message: "Summarize this Lumi event.",
  user: requestingUser,
  sessionId: requestSessionId,
  scope: "my_plugin"
});
```

Available functions:

- `generate`
- `classify`
- `summarize`
- `route_tool`
- `health`
- `capabilities`
- `metrics_summary`
- `registerContext`
- `unregisterContext`
- `registerTool`

AI tools must provide an owning plugin, a synchronous permission check, a fixed argument schema, and an established workflow handler. Model output cannot execute SQL, shell commands, file operations, or arbitrary URLs.

## Tool registration

```js
ai.registerTool({
  tool_id: "example.action",
  display_name: "Example action",
  description: "Runs an existing plugin workflow.",
  owning_plugin: "example",
  required_role: "user",
  required_permission: "example.action.self",
  permission_check: ({ user, arguments: args }) => canRunWorkflow(user, args),
  schema: { target: "string", amount: "integer" },
  confirmation_required: true,
  risk_level: "sensitive",
  audit_category: "example",
  workflow_handler: ({ arguments: args, user, initiated_via_ai, ai_request_id }) =>
    existingWorkflow({ ...args, actor: user, initiated_via_ai, ai_request_id })
});
```

## Verification

Run:

```powershell
node plugins/lumi_ai/tests/verify.js
```

The verification covers path confinement, size formatting, GPU intent and actual allocation, pagination, model and log deletion safety, assistant role access, tool schema and permission checks, queue limits, refusal behavior, and runtime resume persistence.