92 lines
4.6 KiB
Markdown
92 lines
4.6 KiB
Markdown
# Lumi AI Web Search
|
|
|
|
`lumi_ai_web_search` is an AI tool plugin for controlled current-information lookup. It is loaded only by Lumi AI's tool manager and is not an ordinary core plugin.
|
|
|
|
## Installation and enablement
|
|
|
|
1. Install this directory as `plugins/lumi_ai_web_search/`.
|
|
2. Install Lumi AI `0.7.1` or newer.
|
|
3. Open **Plugins -> Lumi AI -> Tools**.
|
|
4. Select **Settings** for Lumi AI Web Search.
|
|
5. Configure the provider and URL policy, turn on **Web search enabled**, and save.
|
|
6. Select **Enable** in the Tools list.
|
|
|
|
The tool is not registered with the assistant while disabled. If its internal enabled setting or provider endpoint is missing, Lumi AI marks it unavailable without preventing Lumi from starting.
|
|
|
|
## Provider
|
|
|
|
The initial adapters accept JSON from a configured public endpoint:
|
|
|
|
- `searxng_json` reads the SearxNG `results` array.
|
|
- `generic_json` reads `results`, `items`, or `web.results.value`.
|
|
|
|
The configured query parameter defaults to `q`. The adapter adds `format=json`, safe-search level, and result count. API keys are stored in `data/settings.json` with restricted file permissions where supported. The settings API never returns the key and a blank save keeps the existing secret.
|
|
|
|
Provider requests use a strict timeout, a 2 MiB response limit, and at most three redirects. No page JavaScript is executed.
|
|
|
|
## URL policy
|
|
|
|
The default is an empty whitelist, so no result URL is usable until an administrator adds explicit rules. Rules support:
|
|
|
|
- Domain: `docs.example.com`
|
|
- Domain and subdomains: `example.com`
|
|
- Subdomain wildcard: `*.example.com`
|
|
- Path prefix: `example.com/docs`
|
|
- Full wildcard pattern: `https://*.example.com/resources/*`
|
|
|
|
Whitelist mode permits only matching result, page, and redirect URLs. Blacklist mode permits public URLs except matching rules.
|
|
|
|
Independent hard network rules always block:
|
|
|
|
- `localhost`, `.localhost`, `.local`, and known metadata hostnames
|
|
- Private, loopback, carrier-grade NAT, link-local, multicast, and reserved IP ranges
|
|
- DNS names resolving to private or otherwise unsafe addresses
|
|
- URL credentials
|
|
- Non-HTTP/HTTPS protocols
|
|
|
|
The same checks run before each page fetch and after every redirect. Administrator rules cannot override these blocks.
|
|
|
|
## Tool behavior
|
|
|
|
The registered tool ID is `lumi_ai_web_search.search`. It accepts:
|
|
|
|
- `query`
|
|
- `reason`: `fact_lookup`, `resource_lookup`, `troubleshooting`, `documentation_lookup`, `news_or_recent`, or `general_lookup`
|
|
- Optional `requested_depth`: `search` or `full_page`
|
|
- Optional `freshness`
|
|
|
|
The assistant should use this tool only for current or external information that is not available in verified local Lumi context.
|
|
|
|
Results are sanitized and returned as structured data rather than raw provider JSON. Each result contains a title, permitted URL or no URL when links are disabled, domain, condensed snippet, source type, date, and relevance score. Documentation and troubleshooting searches prioritize authoritative sources; recent searches prioritize dated sources.
|
|
|
|
Optional full-page mode extracts bounded visible text only when the administrator enables it. It does not automate a browser, submit forms, execute scripts, or follow unrestricted links.
|
|
|
|
## Origin limits and rate limits
|
|
|
|
Allowed origins and output budgets are independently configurable for WebUI, Discord, Twitch, YouTube, Kick, and other sources. Trusted runtime context determines the origin; a model-provided origin cannot elevate access.
|
|
|
|
Twitch is limited to compact output and at most two source references. Discord permits moderate detail. WebUI permits richer summaries and more results. The tool also applies a per-actor, per-origin, per-server/channel rolling request limit.
|
|
|
|
## Auditing and storage
|
|
|
|
All writable data remains under this plugin:
|
|
|
|
- `data/settings.json`: normalized settings and provider secret
|
|
- `data/audit.jsonl`: query, reason, actor, origin, server/channel, policy outcome, result count, cache status, and timing
|
|
|
|
Provider credentials are not written to audit records or returned in tool results. Updates preserve `data/` by default.
|
|
|
|
## Security boundary
|
|
|
|
The plugin has no shell, SQL, arbitrary filesystem, browser automation, or code-execution feature. Network access is limited to the configured public search provider and policy-approved public result pages. Lumi AI's backend role and permission checks remain authoritative.
|
|
|
|
## Verification
|
|
|
|
Run:
|
|
|
|
```powershell
|
|
node plugins/lumi_ai_web_search/tests/verify.js
|
|
```
|
|
|
|
The suite covers whitelist/blacklist matching, hard private-network blocks, redirect checks, reason-aware formatting, origin budgets, provider failures, settings effects, registration availability, and audits.
|