Lumi/plugins/lumi_ai_web_search
2026-06-14 05:01:13 +02:00

Lumi AI Web Search

lumi_ai_web_search gives Lumi Assistant controlled public web search and safe URL reading without requiring an API key, manually installed search service, or separate provider setup.

Default behavior

The default provider is lumi_search_broker. It lives entirely inside this plugin and uses lightweight public search endpoints through replaceable adapters:

  • DuckDuckGo HTML search
  • Bing RSS fallback

Adapters return a stable internal result shape. If one source times out, rate-limits requests, blocks automated access, or changes markup, the broker records the adapter error and tries the next source. Optional external JSON providers remain available as an advanced mode.
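The fallback behavior described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: `brokerSearch`, the `adapter.name` and `adapter.search` members, and the returned field names are all hypothetical, and treating an empty result list as a failure is an assumption.

```javascript
// Hypothetical sketch: try each adapter in order, record failures, and
// return the first non-empty result set. All names here are illustrative.
async function brokerSearch(query, adapters) {
  const adapterErrors = [];
  for (const adapter of adapters) {
    try {
      const results = await adapter.search(query);
      if (Array.isArray(results) && results.length > 0) {
        return { status: 'ok', source: adapter.name, results, adapterErrors };
      }
      // Assumption: an empty list (for example after a markup change) counts as a failure.
      adapterErrors.push({ adapter: adapter.name, error: 'no results' });
    } catch (err) {
      adapterErrors.push({ adapter: adapter.name, error: String((err && err.message) || err) });
    }
  }
  return { status: 'unavailable', source: null, results: [], adapterErrors };
}
```

Because every adapter returns the same result shape, the caller does not need to know which source answered; the recorded errors remain available for diagnostics.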

Fresh installations default to:

  • Search, explicit URL fetch, and URL summarization enabled
  • Blacklist policy with no custom blocks
  • WebUI, Discord, and Twitch origins allowed
  • No headless browser
  • No external endpoint or API key

The parent Lumi AI tool enable state is preserved. Existing installations that were explicitly disabled remain disabled.

Registered tools

The plugin registers enabled capabilities independently:

  • web_search.search: discover current or external public information
  • web_search.fetch_url: safely read an explicit public URL
  • web_search.summarize_url: safely extract compact content from an explicit public URL for summarization

Disabling search does not disable explicit URL fetch or summarization. Lumi AI prompt diagnostics list each capability either as registered and exposed, or as hidden together with the reason.
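As a rough illustration of independent registration, the sketch below derives a per-capability diagnostic entry from a settings object. The `settings.capabilities` layout, the `state` and `reason` fields, and the reason text are assumptions made for the example; only the three tool names come from this README.

```javascript
// Hypothetical sketch: each capability is evaluated on its own, so turning
// one off never hides the others.
const CAPABILITIES = ['search', 'fetch_url', 'summarize_url'];

function capabilityDiagnostics(settings) {
  return CAPABILITIES.map((name) => {
    const enabled = settings.capabilities[name] !== false;
    return {
      tool: `web_search.${name}`,
      state: enabled ? 'registered' : 'hidden',
      reason: enabled ? null : 'disabled in settings', // illustrative reason text
    };
  });
}
```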

Trusted ctx, actor, role, origin, channel, and server details are supplied by Lumi at execution time. The model cannot provide or override them.
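One way to picture this separation is shown below. It is a minimal sketch under assumptions: `buildExecution` and the exact field list are hypothetical, and the real plugin may enforce the boundary differently.

```javascript
// Hypothetical sketch: model-supplied arguments are stripped of any trusted
// field names, and the trusted context is attached separately and frozen.
const TRUSTED_FIELDS = ['ctx', 'actor', 'role', 'origin', 'channel', 'server'];

function buildExecution(modelArgs, trustedCtx) {
  const args = {};
  for (const [key, value] of Object.entries(modelArgs)) {
    if (!TRUSTED_FIELDS.includes(key)) args[key] = value;
  }
  return { args, ctx: Object.freeze({ ...trustedCtx }) };
}
```

With this shape, a model that emits `role: "admin"` in its arguments has no effect on the role the tool actually sees.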

Lumi Assistant is instructed to search for current, recent, niche, externally verifiable, or likely outdated facts. Search is also appropriate when a user asks to verify, confirm, look up, cite, find the latest, compare current options, or inspect public third-party information.

Lumi-local routes, plugin data, corrections, and help answers continue to use verified Lumi context first. Casual chat, creative writing, rewriting, translation, and formatting do not trigger search unless current factual support is needed. An explicit request not to search is respected.

Safe URL fetching

Every search result, explicit URL, selected page, and redirect is checked before use. The fetcher:

  • Allows only HTTP and HTTPS
  • Rejects URL credentials
  • Resolves DNS before connecting and pins the request to a verified public address
  • Blocks localhost, loopback, private, carrier-grade NAT, link-local, multicast, reserved, and metadata targets
  • Rechecks policy after every redirect
  • Limits redirects, time, compressed bytes, decompressed bytes, and extracted characters
  • Accepts readable HTML, plain text, XML, RSS, and Atom content only
  • Never executes JavaScript

Security blocks override administrator whitelist rules: a whitelisted URL that resolves to a blocked target is still rejected.
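Two of the checks listed above, scheme and credential validation and IPv4 range blocking, can be sketched as follows. This is an illustration only: the function names and reason strings are hypothetical, the range list is a non-exhaustive subset of special-purpose IPv4 blocks, and IPv6 handling, DNS resolution, and address pinning are omitted.

```javascript
// Hypothetical sketch of pre-connection checks. Not exhaustive; IPv6 omitted.
function ipv4ToInt(ip) {
  const parts = ip.split('.');
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part) || Number(part) > 255) return null;
    value = value * 256 + Number(part);
  }
  return value;
}

const BLOCKED_V4 = [
  ['0.0.0.0', 8],      // "this network"
  ['10.0.0.0', 8],     // private
  ['100.64.0.0', 10],  // carrier-grade NAT
  ['127.0.0.0', 8],    // loopback
  ['169.254.0.0', 16], // link-local, includes common metadata addresses
  ['172.16.0.0', 12],  // private
  ['192.168.0.0', 16], // private
  ['224.0.0.0', 4],    // multicast
  ['240.0.0.0', 4],    // reserved
];

function isBlockedIPv4(ip) {
  const addr = ipv4ToInt(ip);
  if (addr === null) return true; // fail closed on anything unparseable
  return BLOCKED_V4.some(([base, bits]) => {
    const start = ipv4ToInt(base);
    return addr >= start && addr < start + 2 ** (32 - bits);
  });
}

function precheckUrl(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return { ok: false, reason: 'invalid_url' };
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    return { ok: false, reason: 'scheme_not_allowed' };
  }
  if (url.username || url.password) {
    return { ok: false, reason: 'credentials_not_allowed' };
  }
  return { ok: true, url };
}
```

In a full fetcher, `isBlockedIPv4` would be applied to every address returned by DNS resolution, and the connection would then be made to the verified address rather than re-resolving the hostname.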

URL policy

Blacklist mode allows any safe public URL except those matching a rule. Whitelist mode allows only URLs that match a rule.

Rules may be:

  • Domain: docs.example.com
  • Domain including subdomains: example.com
  • Wildcard subdomain: *.example.com
  • Path prefix: example.com/docs
  • Full pattern: https://*.example.com/resources/*

Tracking parameters such as utm_*, fbclid, and gclid are removed from normalized search results where safe.
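A minimal sketch of this normalization, using the standard `URL` and `URLSearchParams` APIs, might look like the following. The function name and the exact parameter list are assumptions; the plugin's notion of "where safe" is not modeled here.

```javascript
// Hypothetical sketch: drop utm_* parameters plus a small set of known
// click identifiers, leaving all other query parameters untouched.
const TRACKING_EXACT = new Set(['fbclid', 'gclid']);

function stripTrackingParams(rawUrl) {
  const url = new URL(rawUrl);
  // Copy the keys first so deleting does not disturb iteration.
  for (const key of [...url.searchParams.keys()]) {
    const name = key.toLowerCase();
    if (name.startsWith('utm_') || TRACKING_EXACT.has(name)) {
      url.searchParams.delete(key);
    }
  }
  return url.toString();
}
```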

Extraction and result processing

The HTML extractor removes scripts, styles, navigation, footers, forms, hidden content, and other non-readable elements. It prefers <main> or <article>, then falls back to headings, metadata, and body text.

Normal tool results never include raw HTML. Structured results contain:

  • Status, query, reason, provider, policy mode, cache state, timing, counts, warnings, and errors
  • Normalized title, permitted URL, domain, snippet, date, rank, source, source ID, relevance score, and policy state
  • Fetched page URL, final URL, title, description, bounded readable text, content type, fetch time, and extraction state

Lumi AI passes this structured result back to the model to produce the final natural answer. Normal users do not see raw tool JSON.

Settings

Open Plugins -> Lumi AI -> Tools -> Lumi AI Web Search -> Settings.

The tool-owned settings panel shows:

  • Provider and provider health
  • Search/fetch/summarize capability states
  • Policy mode and allowed origins
  • Last successful request and last error
  • Cache count and size
  • Recent redacted calls
  • A test field that uses Lumi Assistant's normal tool pipeline

Settings include capability toggles, URL policy, timeouts, byte/text limits, redirects, cache TTL, safe-search level, origins, source-link controls, per-origin output budgets, three rate-limit scopes, and optional external provider fields. Reset to defaults restores the no-key broker configuration.

Output limits

Results are condensed before returning to Lumi AI:

  • WebUI may use richer context and multiple sources
  • Discord receives compact context
  • Twitch receives a very short result with at most one source reference when enabled

The final Lumi response formatter still applies the platform's authoritative message limit.
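The per-origin condensing step can be pictured with the sketch below. The budget numbers, the `condenseResults` name, and the fallback to the smallest budget for unknown origins are all assumptions for illustration; real values come from the per-origin output budgets in settings.

```javascript
// Hypothetical budgets; the actual limits are configurable.
const BUDGETS = {
  webui: { maxSources: 5, maxSnippetChars: 600 },
  discord: { maxSources: 3, maxSnippetChars: 240 },
  twitch: { maxSources: 1, maxSnippetChars: 120 },
};

function truncate(text, maxChars) {
  if (text.length <= maxChars) return text;
  return text.slice(0, Math.max(0, maxChars - 1)).trimEnd() + '…';
}

function condenseResults(origin, results, { sourceLinks = true } = {}) {
  // Assumption: unknown origins receive the smallest budget.
  const budget = BUDGETS[origin] || BUDGETS.twitch;
  return results.slice(0, budget.maxSources).map((r) => ({
    title: r.title,
    snippet: truncate(r.snippet || '', budget.maxSnippetChars),
    ...(sourceLinks ? { url: r.url } : {}),
  }));
}
```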

Caching, limits, and diagnostics

Cache entries are stored in data/cache/ and expire according to the configured TTL. Rate limits apply independently per actor, origin, and server/channel. Rate-limited results include a retry-after value.
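The three independent rate-limit scopes and the retry-after value could work roughly as sketched here. A fixed-window counter is assumed purely for illustration; the README does not say which algorithm the plugin uses, and the class, function, and field names are hypothetical.

```javascript
// Hypothetical fixed-window limiter, one instance per scope.
class FixedWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.windows = new Map();
  }

  // Milliseconds until the key may be used again; 0 means allowed now.
  retryAfter(key, now) {
    const w = this.windows.get(key);
    if (!w || now - w.start >= this.windowMs || w.count < this.limit) return 0;
    return w.start + this.windowMs - now;
  }

  record(key, now) {
    const w = this.windows.get(key);
    if (!w || now - w.start >= this.windowMs) {
      this.windows.set(key, { start: now, count: 1 });
    } else {
      w.count += 1;
    }
  }
}

function checkRateLimits(limiters, ctx, now = Date.now()) {
  const keys = {
    actor: ctx.actor,
    origin: ctx.origin,
    channel: `${ctx.server}/${ctx.channel}`,
  };
  const scopes = Object.keys(limiters);
  // Check every scope before recording, so a denied call consumes no quota.
  const waits = scopes.map((scope) => limiters[scope].retryAfter(keys[scope], now));
  const retryAfterMs = Math.max(0, ...waits);
  if (retryAfterMs > 0) return { status: 'rate_limited', retryAfterMs };
  for (const scope of scopes) limiters[scope].record(keys[scope], now);
  return { status: 'ok', retryAfterMs: 0 };
}
```

Reporting the longest wait across scopes gives the caller a single retry-after value that satisfies all three limits at once.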

data/status.json stores provider health, aggregate counters, cache status, and recent redacted calls. data/audit.jsonl records actor, role, origin, capability, safe query summary/hash, reason, provider, policy decision, result count, cache state, timing, blocked reason, and status. Full page content and secrets are not logged.
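To illustrate how an audit line can identify a query without storing it, the sketch below writes a truncated SHA-256 digest in place of the query text. The record fields, the hash choice, and the 16-character truncation are assumptions; the actual `data/audit.jsonl` schema may differ.

```javascript
const { createHash } = require('crypto');

// Hypothetical sketch: one JSON object per line, with the query replaced by
// a short digest and its length.
function buildAuditLine(entry, now = new Date()) {
  const record = {
    ts: now.toISOString(),
    actor: entry.actor,
    origin: entry.origin,
    capability: entry.capability,
    queryHash: createHash('sha256').update(entry.query, 'utf8').digest('hex').slice(0, 16),
    queryLength: entry.query.length,
    status: entry.status,
  };
  return JSON.stringify(record) + '\n';
}
```

Identical queries produce identical hashes, so repeated requests can still be correlated in the log.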

Optional external provider

Select external_json only when an administrator explicitly wants a compatible SearxNG or generic JSON endpoint. External provider settings are advanced and are not required for default operation. Selecting external mode without an endpoint blocks only search discovery; explicit URL tools remain available.

Browser and sidecar modes

Headless browser fallback is disabled and not implemented as a default path. The setting is reserved for a future restricted Lumi-managed runtime. A future local sidecar can be added behind the provider abstraction without changing tool contracts; default operation will continue to work without it.

Troubleshooting

  • Search unavailable but URL fetch works: a public search adapter may be blocked or rate-limited. Check provider health and recent adapter errors.
  • URL blocked: inspect blacklist/whitelist rules. Hard private-network blocks cannot be overridden.
  • External provider not configured: select lumi_search_broker or configure the optional endpoint.
  • Rate limited: wait for the returned retry-after interval.
  • No readable content: the page may be JavaScript-only or use an unsupported content type. Browser fallback is intentionally off.

Verification

Run:

node plugins/lumi_ai_web_search/tests/verify.js

The suite covers default no-key availability, three-capability registration, prompt exposure compatibility, broker adapters, URL policy, redirects, private-network blocking, readable extraction, output budgets, caching, rate limits, audits, and clean failure behavior.