Franz Rolfsvaag c8208b78b7 Add self-contained Lumi web search		2026-06-14 05:01:13 +02:00

Lumi AI Web Search

lumi_ai_web_search gives Lumi Assistant controlled public web search and safe URL reading without requiring an API key, a manually installed search service, or a separate provider setup.

Default behavior

The default provider is lumi_search_broker. It lives entirely inside this plugin and uses lightweight public search endpoints through replaceable adapters:

DuckDuckGo HTML search
Bing RSS fallback

Adapters return a stable internal result shape. If one source times out, rate-limits requests, blocks automated access, or changes markup, the broker records the adapter error and tries the next source. Optional external JSON providers remain available as an advanced mode.
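The fallback behavior above can be sketched as a loop over adapters. This is a minimal illustration, not the plugin's code: the adapter objects (a name plus an async search(query) method) and the result field names are assumptions.

```javascript
// Sketch of a fallback search broker. Adapter objects with `name` and an async
// `search(query)` method are an assumed interface, not the plugin's real one.

// Map an adapter's raw item onto one stable internal result shape.
function normalizeResult(raw, source, rank) {
  return {
    title: String(raw.title ?? "").trim(),
    url: String(raw.url ?? ""),
    snippet: String(raw.snippet ?? "").trim(),
    source,
    rank,
  };
}

// Try adapters in order; record each failure and fall through to the next.
async function brokerSearch(query, adapters) {
  const adapterErrors = [];
  for (const adapter of adapters) {
    try {
      const raw = await adapter.search(query);
      // A non-array response makes raw.map throw, which also counts as a failure.
      return {
        status: "ok",
        provider: adapter.name,
        results: raw.map((item, i) => normalizeResult(item, adapter.name, i + 1)),
        adapterErrors,
      };
    } catch (err) {
      // Timeouts, HTTP 429s, bot blocks, and parser failures after a markup
      // change are all expected to surface as thrown errors here.
      adapterErrors.push({ adapter: adapter.name, error: String(err.message ?? err) });
    }
  }
  return { status: "unavailable", provider: null, results: [], adapterErrors };
}
```

If a hypothetical duckduckgo_html adapter throws on an HTTP 429, the call falls through to the next adapter, and adapterErrors still records the first failure so provider health can report it.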

Fresh installations default to:

Search, explicit URL fetch, and URL summarization enabled
Blacklist policy with no custom blocks
WebUI, Discord, and Twitch origins allowed
No headless browser
No external endpoint or API key

The parent Lumi AI tool's enabled state is preserved, so existing installations that were explicitly disabled remain disabled.
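A minimal sketch of how fresh-install defaults and the preserved parent state could combine. Every setting name here is hypothetical, and treating an unset parent state as enabled is an assumption about fresh installs, not documented behavior.

```javascript
// Setting names are hypothetical stand-ins for the defaults listed above.
const DEFAULTS = Object.freeze({
  provider: "lumi_search_broker",
  capabilities: { search: true, fetchUrl: true, summarizeUrl: true },
  policy: { mode: "blacklist", rules: [] },
  allowedOrigins: ["webui", "discord", "twitch"],
  headlessBrowser: false,
  externalEndpoint: null,
  apiKey: null,
});

// Deep-copy so callers can never mutate the shared defaults.
function cloneDefaults() {
  return JSON.parse(JSON.stringify(DEFAULTS));
}

// storedSettings: this tool's saved settings (undefined on a fresh install).
// parentEnabled: the parent Lumi AI tool toggle (undefined if never set).
function resolveInstall(storedSettings, parentEnabled) {
  const settings = { ...cloneDefaults(), ...(storedSettings ?? {}) };
  // An explicit true/false from the parent always wins; only an unset state
  // falls back to enabled (an assumption about fresh installs).
  const enabled = typeof parentEnabled === "boolean" ? parentEnabled : true;
  return { enabled, settings };
}
```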

Registered tools

The plugin registers enabled capabilities independently:

web_search.search: discover current or external public information
web_search.fetch_url: safely read an explicit public URL
web_search.summarize_url: safely extract compact content from an explicit public URL for summarization

Disabling search does not disable explicit URL fetch or summarization. Lumi AI prompt diagnostics list each capability as registered and exposed, or as hidden along with the reason.
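Independent registration with a per-capability reason could look roughly like this. The tool names come from this README; the settings keys, the register callback, and the reason strings are hypothetical.

```javascript
// Tool names come from the list above; the settings keys, the `register`
// callback, and the reason strings are hypothetical.
const CAPABILITIES = [
  { tool: "web_search.search", key: "search" },
  { tool: "web_search.fetch_url", key: "fetchUrl" },
  { tool: "web_search.summarize_url", key: "summarizeUrl" },
];

// Decide each capability on its own, so hiding one never hides the others.
function hiddenReason(key, settings) {
  if (!settings.capabilities[key]) return "disabled in settings";
  if (key === "search" && settings.provider === "external_json" && !settings.externalEndpoint) {
    return "external provider has no endpoint";
  }
  return null;
}

// Register every visible capability and return one diagnostic row per tool.
function registerCapabilities(settings, register) {
  return CAPABILITIES.map(({ tool, key }) => {
    const reason = hiddenReason(key, settings);
    if (reason) return { tool, state: "hidden", reason };
    register(tool);
    return { tool, state: "registered", reason: null };
  });
}
```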

Trusted ctx, actor, role, origin, channel, and server details are supplied by Lumi at execution time. The model cannot provide or override them.
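One way to guarantee the model cannot override trusted details is to allowlist model arguments and copy context only from the host. This is a sketch under assumed key names, not the plugin's actual tool-call builder.

```javascript
// Key names are hypothetical. Model arguments pass through an allowlist, and
// the trusted context is copied only from what the host supplies.
const MODEL_ARG_KEYS = new Set(["query", "url", "reason"]);
const TRUSTED_KEYS = ["actor", "role", "origin", "channel", "server"];

function buildToolCall(modelArgs, trustedCtx) {
  const args = {};
  for (const [key, value] of Object.entries(modelArgs ?? {})) {
    // Anything outside the allowlist (e.g. a model-supplied "role") is dropped.
    if (MODEL_ARG_KEYS.has(key)) args[key] = value;
  }
  const ctx = {};
  for (const key of TRUSTED_KEYS) ctx[key] = trustedCtx[key];
  return { args, ctx: Object.freeze(ctx) };
}
```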

When Lumi should search

Lumi Assistant is instructed to search for current, recent, niche, externally verifiable, or likely outdated facts. Search is also appropriate when a user asks to verify, confirm, look up, cite, find the latest, compare current options, or inspect public third-party information.

Lumi-local routes, plugin data, corrections, and help answers continue to use verified Lumi context first. Casual chat, creative writing, rewriting, translation, and formatting do not trigger search unless current factual support is needed. An explicit request not to search is respected.

Safe URL fetching

Every search result, explicit URL, selected page, and redirect is checked before use. The fetcher:

Allows only HTTP and HTTPS
Rejects URL credentials
Resolves DNS before connecting and pins the request to a verified public address
Blocks localhost, loopback, private, carrier-grade NAT, link-local, multicast, reserved, and metadata targets
Rechecks policy after every redirect
Limits redirects, time, compressed bytes, decompressed bytes, and extracted characters
Accepts readable HTML, plain text, XML, RSS, and Atom content only
Never executes JavaScript

Security blocks override administrator whitelist rules.
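The pre-connect checks above can be sketched with Node's built-in URL, net, and dns modules. This is an illustration under stated limits: only IPv4 ranges are classified, every IPv6 answer is rejected as a simplification, and the function names and reason strings are hypothetical.

```javascript
const net = require("node:net");
const dns = require("node:dns").promises;

// IPv4 ranges that must never be contacted. IPv6 classification (::1,
// fc00::/7, fe80::/10, IPv4-mapped addresses, ...) is omitted from this sketch.
const BLOCKED_V4 = [
  ["0.0.0.0", 8], // "this network"
  ["10.0.0.0", 8], // private
  ["100.64.0.0", 10], // carrier-grade NAT
  ["127.0.0.0", 8], // loopback
  ["169.254.0.0", 16], // link-local, includes the 169.254.169.254 metadata address
  ["172.16.0.0", 12], // private
  ["192.168.0.0", 16], // private
  ["224.0.0.0", 4], // multicast
  ["240.0.0.0", 4], // reserved, includes 255.255.255.255
];

function ipv4ToInt(ip) {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

function isBlockedIPv4(ip) {
  const addr = ipv4ToInt(ip);
  return BLOCKED_V4.some(([base, bits]) => {
    const start = ipv4ToInt(base);
    return addr >= start && addr < start + 2 ** (32 - bits);
  });
}

// Cheap checks that need no network access.
function precheckUrl(raw) {
  let url;
  try {
    url = new URL(raw);
  } catch {
    return { ok: false, reason: "invalid_url" };
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return { ok: false, reason: "scheme_not_allowed" };
  if (url.username || url.password) return { ok: false, reason: "credentials_in_url" };
  const host = url.hostname;
  if (host === "localhost" || host.endsWith(".localhost")) return { ok: false, reason: "localhost" };
  if (net.isIPv4(host) && isBlockedIPv4(host)) return { ok: false, reason: "blocked_address" };
  return { ok: true, url };
}

// Resolve once and return a vetted address to pin the connection to, for
// example through the `lookup` option of http.request. Rejecting every IPv6
// answer is a simplification of this sketch.
async function resolvePublicAddress(hostname) {
  const answers = await dns.lookup(hostname, { all: true });
  for (const { address, family } of answers) {
    if (family !== 4 || isBlockedIPv4(address)) throw new Error(`blocked_address ${address}`);
  }
  return answers[0].address;
}
```

Pinning matters because resolving twice (once to check, once to connect) leaves room for a DNS rebinding answer to swap in a private address; the same checks would run again on every redirect target.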

URL policy

Blacklist mode allows safe public URLs except those matching a rule. Whitelist mode allows only URLs that match a rule.

Rules may be:

Domain: docs.example.com
Domain including subdomains: example.com
Wildcard subdomain: *.example.com
Path prefix: example.com/docs
Full pattern: https://*.example.com/resources/*

Tracking parameters such as utm_*, fbclid, and gclid are removed from normalized search results where safe.
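Tracking-parameter removal can be sketched with the WHATWG URL API. The parameter list below is only the examples named above; the "where safe" behavior is modeled here as leaving unparseable URLs, and URLs with nothing to strip, exactly as they were.

```javascript
// Parameter list is illustrative: utm_* plus the two IDs named above.
const TRACKING_EXACT = new Set(["fbclid", "gclid"]);

function isTrackingParam(name) {
  const key = name.toLowerCase();
  return key.startsWith("utm_") || TRACKING_EXACT.has(key);
}

function stripTracking(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return rawUrl; // leave anything unparseable untouched
  }
  const tracking = [...url.searchParams.keys()].filter(isTrackingParam);
  // Re-serializing query strings can change encoding, so skip clean URLs.
  if (tracking.length === 0) return rawUrl;
  for (const key of tracking) url.searchParams.delete(key);
  return url.toString();
}
```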

Extraction and result processing

The HTML extractor removes scripts, styles, navigation, footers, forms, hidden content, and other non-readable elements. It prefers <main> or <article>, then falls back to headings, metadata, and body text.
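The region preference described above is illustrated below with regular expressions. This is a deliberately crude approximation: a real extractor should use an HTML parser, and this sketch omits hidden-element removal and the headings/metadata fallback. The function names and the character limit are hypothetical.

```javascript
// Regex approximation for illustration only: a production extractor should use
// a real HTML parser. Hidden-element removal and metadata fallback are omitted.
const DROP_TAGS = ["script", "style", "noscript", "template", "svg", "nav", "footer", "form"];

function decodeBasicEntities(s) {
  return s
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&"); // last, so "&amp;lt;" decodes to "&lt;" only once
}

// Inner HTML of the first <tag>...</tag>, or null when absent.
function innerOf(tag, html) {
  const m = new RegExp(`<${tag}\\b[^>]*>([\\s\\S]*?)</${tag}>`, "i").exec(html);
  return m ? m[1] : null;
}

function extractReadable(html, maxChars = 4000) {
  const title = innerOf("title", html);
  // Remove non-readable blocks first, so markup inside scripts cannot confuse
  // the region search below.
  let cleaned = html;
  for (const tag of DROP_TAGS) {
    cleaned = cleaned.replace(new RegExp(`<${tag}\\b[\\s\\S]*?</${tag}>`, "gi"), " ");
  }
  // Prefer <main>, then <article>, then <body>, then the whole document.
  const region = innerOf("main", cleaned) ?? innerOf("article", cleaned) ?? innerOf("body", cleaned) ?? cleaned;
  const text = decodeBasicEntities(region.replace(/<[^>]+>/g, " ")).replace(/\s+/g, " ").trim();
  return {
    title: title ? decodeBasicEntities(title).trim() : null,
    text: text.slice(0, maxChars),
    truncated: text.length > maxChars,
  };
}
```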

Normal tool results never include raw HTML. Structured results contain:

Status, query, reason, provider, policy mode, cache state, timing, counts, warnings, and errors
Normalized title, permitted URL, domain, snippet, date, rank, source, source ID, relevance score, and policy state
Fetched page URL, final URL, title, description, bounded readable text, content type, fetch time, and extraction state

Lumi AI passes this structured result back to the model to produce the final natural answer. Normal users do not see raw tool JSON.
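A fetched-page result in the spirit of the lists above might be assembled like this. The field names are hypothetical camelCase renderings of the listed items; the point is that only extracted text, never raw HTML, enters the object, and truncation is reported as a warning.

```javascript
// Hypothetical field names mirroring the lists above; raw HTML is never copied in.
function buildFetchResult({ requestedUrl, finalUrl, title, description, text, contentType, fetchedAt, maxChars }) {
  const truncated = text.length > maxChars;
  return {
    status: "ok",
    page: {
      url: requestedUrl,
      finalUrl,
      domain: new URL(finalUrl).hostname,
      title: title || null,
      description: description || null,
      text: truncated ? text.slice(0, maxChars) : text, // bounded readable text
      contentType,
      fetchedAt: fetchedAt.toISOString(),
      extraction: truncated ? "truncated" : "complete",
    },
    warnings: truncated ? [`text truncated to ${maxChars} characters`] : [],
    errors: [],
  };
}
```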

Settings

Open Plugins -> Lumi AI -> Tools -> Lumi AI Web Search -> Settings.

The tool-owned settings panel shows:

Provider and provider health
Search/fetch/summarize capability states
Policy mode and allowed origins
Last successful request and last error
Cache count and size
Recent redacted calls
A test field that uses Lumi Assistant's normal tool pipeline

Settings include capability toggles, URL policy, timeouts, byte/text limits, redirects, cache TTL, safe-search level, origins, source-link controls, per-origin output budgets, three rate-limit scopes, and optional external provider fields. Reset to defaults restores the no-key broker configuration.

Output limits

Results are condensed before returning to Lumi AI:

WebUI may use richer context and multiple sources
Discord receives compact context
Twitch receives a very short result with at most one source reference, when source links are enabled

The final Lumi response formatter still applies the platform's authoritative message limit.
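Per-origin condensing could work roughly as follows. The budget numbers are invented for illustration (the real values are per-origin settings), and falling back to the most compact budget for an unknown origin is an assumption.

```javascript
// Budget numbers are hypothetical; in the plugin they are per-origin settings.
const BUDGETS = {
  webui: { maxSources: 5, maxChars: 4000 },
  discord: { maxSources: 3, maxChars: 1200 },
  twitch: { maxSources: 1, maxChars: 300 },
};

function condenseForOrigin(results, origin, includeLinks = true) {
  // Unknown origins get the most compact budget (a conservative assumption).
  const budget = BUDGETS[origin] ?? BUDGETS.twitch;
  let text = "";
  const used = [];
  for (const r of results.slice(0, budget.maxSources)) {
    const line = `${r.title}: ${r.snippet}`;
    const next = text ? `${text}\n${line}` : line;
    if (next.length > budget.maxChars) {
      // Keep a clipped first line rather than returning nothing.
      if (!text) {
        text = line.slice(0, budget.maxChars);
        used.push(r.url);
      }
      break;
    }
    text = next;
    used.push(r.url);
  }
  // Only sources whose text was actually included are referenced.
  return { text, sources: includeLinks ? used : [] };
}
```

The platform formatter still enforces the hard message limit afterwards; a budget like this only keeps the model's input proportionate to where the answer will be shown.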

Caching, limits, and diagnostics

Cache entries are stored in data/cache/ and expire according to the configured TTL. Rate limits apply independently per actor, origin, and server/channel. Rate-limited results include a retry-after value.
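Independent per-actor, per-origin, and per-server/channel limits with a retry-after value can be sketched as a fixed-window limiter. The window algorithm, scope names, and limits are assumptions; the injectable clock exists only to make the sketch testable.

```javascript
// Fixed-window limiter; the algorithm, scope names, and limits are hypothetical.
class ScopedRateLimiter {
  constructor(limits, now = () => Date.now()) {
    this.limits = limits; // e.g. { actor: { max, windowMs }, origin: {...}, channel: {...} }
    this.now = now;
    this.windows = new Map(); // "scope:key" -> { start, count }
  }

  // keys: one identifier per scope, e.g. { actor: "u1", origin: "discord", channel: "s1/c1" }
  check(keys) {
    const t = this.now();
    let retryAfterMs = 0;
    const pending = [];
    for (const [scope, key] of Object.entries(keys)) {
      const { max, windowMs } = this.limits[scope];
      const id = `${scope}:${key}`;
      let w = this.windows.get(id);
      if (!w || t - w.start >= windowMs) {
        w = { start: t, count: 0 }; // start a fresh window
        this.windows.set(id, w);
      }
      if (w.count >= max) retryAfterMs = Math.max(retryAfterMs, w.start + windowMs - t);
      pending.push(w);
    }
    if (retryAfterMs > 0) return { allowed: false, retryAfterSeconds: Math.ceil(retryAfterMs / 1000) };
    for (const w of pending) w.count += 1; // consume quota only when every scope allows
    return { allowed: true };
  }
}
```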

data/status.json stores provider health, aggregate counters, cache status, and recent redacted calls. data/audit.jsonl records actor, role, origin, capability, safe query summary/hash, reason, provider, policy decision, result count, cache state, timing, blocked reason, and status. Full page content and secrets are not logged.
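A redacted JSONL audit line could be produced like this. The field names and the 40-character summary are hypothetical; the allowlist is what keeps page content and secrets out of the log, and the truncated SHA-256 lets repeated queries be correlated.

```javascript
const crypto = require("node:crypto");

// Allowlisted fields only, so page text, headers, or keys on the event object
// never reach the log. Field names are hypothetical.
const AUDIT_FIELDS = ["actor", "role", "origin", "capability", "reason", "provider",
  "policyDecision", "resultCount", "cacheState", "durationMs", "blockedReason", "status"];

function summarizeQuery(query, max = 40) {
  const flat = query.replace(/\s+/g, " ").trim();
  return flat.length > max ? flat.slice(0, max) + "..." : flat;
}

function auditLine(event, now = new Date()) {
  const query = String(event.query ?? "");
  const record = { ts: now.toISOString() };
  for (const field of AUDIT_FIELDS) {
    if (event[field] !== undefined) record[field] = event[field];
  }
  record.querySummary = summarizeQuery(query);
  // An unsalted hash lets repeated queries be correlated; it does not make
  // short queries unrecoverable.
  record.queryHash = crypto.createHash("sha256").update(query).digest("hex").slice(0, 16);
  return JSON.stringify(record) + "\n"; // one JSON object per line (JSONL)
}
```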

Optional external provider

Select external_json only when an administrator explicitly wants a compatible SearxNG or generic JSON endpoint. External provider settings are advanced and are not required for default operation. Selecting external mode without an endpoint blocks only search discovery; explicit URL tools remain available.
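For a SearxNG endpoint, a request URL and result mapping might look like the sketch below. It assumes the instance has the JSON output format enabled and uses SearxNG's q, format, and safesearch query parameters and its url, title, content, and engine result fields; the mapped shape is hypothetical, and every mapped URL would still pass through the URL policy.

```javascript
// Build a SearxNG JSON search URL; endpoint may include a base path.
function buildSearxngUrl(endpoint, query, safesearch = 1) {
  const url = new URL("search", endpoint.endsWith("/") ? endpoint : endpoint + "/");
  url.searchParams.set("q", query);
  url.searchParams.set("format", "json");
  url.searchParams.set("safesearch", String(safesearch));
  return url.toString();
}

// Map SearxNG results onto an assumed internal shape; skip entries without a URL.
function mapSearxngResults(json) {
  if (!json || !Array.isArray(json.results)) throw new Error("unexpected response shape");
  return json.results
    .filter((r) => typeof r.url === "string")
    .map((r, i) => ({
      title: r.title ?? "",
      url: r.url,
      snippet: r.content ?? "",
      source: r.engine ?? "external_json",
      rank: i + 1,
    }));
}
```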

Browser and sidecar modes

Headless browser fallback is disabled and not implemented as a default path. The setting is reserved for a future restricted Lumi-managed runtime. A future local sidecar can be added behind the provider abstraction without changing tool contracts; default operation will continue to work without it.

Troubleshooting

Search unavailable but URL fetch works: a public search adapter may be blocked or rate-limited. Check provider health and recent adapter errors.
URL blocked: inspect blacklist/whitelist rules. Hard private-network blocks cannot be overridden.
External provider not configured: select lumi_search_broker or configure the optional endpoint.
Rate limited: wait for the returned retry-after interval.
No readable content: the page may be JavaScript-only or use an unsupported content type. Browser fallback is intentionally off.

Verification

Run:

node plugins/lumi_ai_web_search/tests/verify.js

The suite covers default no-key availability, three-capability registration, prompt exposure compatibility, broker adapters, URL policy, redirects, private-network blocking, readable extraction, output budgets, caching, rate limits, audits, and clean failure behavior.