94 lines
4.9 KiB
Markdown
94 lines
4.9 KiB
Markdown
# Lumi AI Web Search
|
|
|
|
`lumi_ai_web_search` is an AI tool plugin for controlled current-information lookup. It is loaded only by Lumi AI's tool manager and is not an ordinary core plugin.
|
|
|
|
## Installation and enablement
|
|
|
|
1. Install this directory as `plugins/lumi_ai_web_search/`.
|
|
2. Install Lumi AI `0.8.0` or newer.
|
|
3. Open **Plugins -> Lumi AI -> Tools**.
|
|
4. Select **Settings** for Lumi AI Web Search.
|
|
5. Configure the provider and URL policy, turn on **Web search enabled**, and save.
|
|
6. Select **Enable** in the Tools list.
|
|
|
|
The tool is not registered with the assistant while disabled. If its internal enabled setting or provider endpoint is missing, Lumi AI marks it unavailable without preventing Lumi from starting.
|
|
|
|
## Provider
|
|
|
|
The initial adapters accept JSON from a configured public endpoint:
|
|
|
|
- `searxng_json` reads the SearxNG `results` array.
|
|
- `generic_json` reads `results`, `items`, or `web.results.value`.
|
|
|
|
The configured query parameter defaults to `q`. The adapter adds `format=json`, safe-search level, and result count. API keys are stored in `data/settings.json` with restricted file permissions where supported. The settings API never returns the key and a blank save keeps the existing secret.
|
|
|
|
Provider requests use a strict timeout, a 2 MiB response limit, and at most three redirects. No page JavaScript is executed.
|
|
|
|
## URL policy
|
|
|
|
The default is an empty whitelist, so no result URL is usable until an administrator adds explicit rules. Rules support:
|
|
|
|
- Domain: `docs.example.com`
|
|
- Domain and subdomains: `example.com`
|
|
- Subdomain wildcard: `*.example.com`
|
|
- Path prefix: `example.com/docs`
|
|
- Full wildcard pattern: `https://*.example.com/resources/*`
|
|
|
|
Whitelist mode permits only matching result, page, and redirect URLs. Blacklist mode permits public URLs except matching rules.
|
|
|
|
Independent hard network rules always block:
|
|
|
|
- `localhost`, `.localhost`, `.local`, and known metadata hostnames
|
|
- Private, loopback, carrier-grade NAT, link-local, multicast, and reserved IP ranges
|
|
- DNS names resolving to private or otherwise unsafe addresses
|
|
- URL credentials
|
|
- Non-HTTP/HTTPS protocols
|
|
|
|
The same checks run before each page fetch and after every redirect. Administrator rules cannot override these blocks.
|
|
|
|
## Tool behavior
|
|
|
|
The registered tool ID is `lumi_ai_web_search.search`. It accepts:
|
|
|
|
- `query`
|
|
- `reason`: `fact_lookup`, `resource_lookup`, `troubleshooting`, `documentation_lookup`, `news_or_recent`, or `general_lookup`
|
|
- Optional `requested_depth`: `search` or `full_page`
|
|
- Optional `freshness`
|
|
|
|
The assistant should use this tool only for current or external information that is not available in verified local Lumi context.
|
|
|
|
The tool registers as a read-only lookup. This allows permitted Discord, Twitch, YouTube, Kick, and other contexts to use it even though those contexts cannot run WebUI action tools. Lumi AI evaluates the configured origin allowlist before including the tool in the model prompt and again before execution.
|
|
|
|
Results are sanitized and returned as structured data rather than raw provider JSON. Each result contains a title, permitted URL or no URL when links are disabled, domain, condensed snippet, source type, date, and relevance score. Documentation and troubleshooting searches prioritize authoritative sources; recent searches prioritize dated sources.
|
|
|
|
Optional full-page mode extracts bounded visible text only when the administrator enables it. It does not automate a browser, submit forms, execute scripts, or follow unrestricted links.
|
|
|
|
## Origin limits and rate limits
|
|
|
|
Allowed origins and output budgets are independently configurable for WebUI, Discord, Twitch, YouTube, Kick, and other sources. Trusted runtime context determines the origin; a model-provided origin cannot elevate access.
|
|
|
|
Twitch is limited to compact output and at most two source references. Discord permits moderate detail. WebUI permits richer summaries and more results. The tool also applies a per-actor, per-origin, per-server/channel rolling request limit.
|
|
|
|
## Auditing and storage
|
|
|
|
All writable data remains under this plugin:
|
|
|
|
- `data/settings.json`: normalized settings and provider secret
|
|
- `data/audit.jsonl`: query, reason, actor, origin, server/channel, policy outcome, result count, cache status, and timing
|
|
|
|
Provider credentials are not written to audit records or returned in tool results. Updates preserve `data/` by default.
|
|
|
|
## Security boundary
|
|
|
|
The plugin has no shell, SQL, arbitrary filesystem, browser automation, or code-execution feature. Network access is limited to the configured public search provider and policy-approved public result pages. Lumi AI's backend role and permission checks remain authoritative.
|
|
|
|
## Verification
|
|
|
|
Run:
|
|
|
|
```powershell
|
|
node plugins/lumi_ai_web_search/tests/verify.js
|
|
```
|
|
|
|
The suite covers whitelist/blacklist matching, hard private-network blocks, redirect checks, reason-aware formatting, origin budgets, provider failures, settings effects, registration availability, and audits.
|