# Ask AI — Conversational Analytics Interface

**Last updated**: 2026-03-23
**Author**: Joe Lanzone
**Status**: Living document

---

## Overview

Ask AI (`/query`) is the conversational interface to EXOS Analytics. It translates natural language questions into ClickHouse SQL, executes them against live operational data, auto-visualizes results, and generates AI-powered summaries with follow-up suggestions. Multi-turn conversations turn individual queries into directed analytical investigations.

**Core thesis**: Conversations are a better analytical medium than dashboards. Dashboards answer pre-formulated questions. Conversations follow the user's curiosity.

---

## Architecture

```
User Question (NL or SQL)
    |
    v
+---[ /query page (query.html) ]---+
|                                  |
|  Mode: Ask AI    Mode: SQL       |
|  (NL → LLM)      (direct exec)   |
+----------------------------------+
    |                    |
    v                    v
/api/query            /api/sql
    |                    |
    +---- Validate ------+
    |     (read-only)    |
    v                    |
Azure Anthropic         |
(Claude Opus 4.6)       |
    |                    |
    v                    v
ClickHouse Cloud --------+
(Azure East US 2)
    |
    v
+---[ Results Pipeline ]---+
|                           |
|  Auto-chart detection     |
|  Performance badge        |
|  AI summary + follow-ups  |
|  Drill-down on click      |
|  EXPLAIN (optional)       |
|  Report export (3+ turns) |
+---------------------------+
```

### Data Flow

1. **NL Path**: User types question → `/api/query` sends it to Claude Opus with the 121-line domain system prompt → generates SQL → validates it (read-only statements only) → executes on ClickHouse → a second LLM call generates a summary + 3 follow-up suggestions → auto-detects chart type → returns everything
2. **SQL Path**: User writes SQL directly → `/api/sql` validates → executes on ClickHouse → returns results
3. **Multi-turn**: Last 5 Q&A pairs sent as conversation context for subsequent queries
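
The read-only validation step is the only gate between LLM-generated SQL and the database, so it should fail closed. A minimal sketch of such a validator (the actual rules in `/api/query` may differ; keyword lists here are illustrative):

```javascript
// Allow-list the statement keyword rather than deny-listing mutations,
// so unrecognized verbs fail closed.
const READ_ONLY_START = /^\s*(SELECT|WITH|SHOW|DESCRIBE|DESC|EXPLAIN)\b/i;
const FORBIDDEN = /\b(INSERT|ALTER|DROP|TRUNCATE|CREATE|RENAME|DETACH|ATTACH|OPTIMIZE|GRANT|REVOKE|KILL)\b/i;

function isReadOnly(sql) {
  const stripped = sql
    .replace(/--[^\n]*/g, '')          // strip line comments
    .replace(/\/\*[\s\S]*?\*\//g, '')  // strip block comments
    .trim();
  // Multiple statements could smuggle a write after a read.
  if (stripped.replace(/;\s*$/, '').includes(';')) return false;
  return READ_ONLY_START.test(stripped) && !FORBIDDEN.test(stripped);
}
```

A regex gate is defense in depth, not the whole story: ClickHouse's own `readonly` setting on the connecting user is the stronger guarantee and is worth enabling regardless.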

---

## Features

### Core (Existing)

| Feature | Description |
|---------|-------------|
| **NL-to-SQL** | Claude Opus 4.6 generates ClickHouse SQL from English questions |
| **Domain Injection** | 121-line system prompt with ServiceLink business model, 6 operator roles, 3 LOBs, 40+ span names, 25+ ClickHouse functions |
| **Multi-turn Conversation** | 5-turn context window. AI remembers previous questions for follow-up depth |
| **Auto-Visualization** | `detectChartType()` picks bar/line/pie/number from column types + row count |
| **AI Summaries** | Second LLM call generates 1-2 sentence insight + 3 follow-up suggestions |
| **Schema Browser** | Live sidebar showing all ClickHouse tables with expandable columns. Click to insert column reference |
| **SQL Autocomplete** | Tab-completion for table names, column names, and 25+ ClickHouse functions |
| **Fix with AI** | When SQL errors, click "Fix with AI" — sends broken SQL + error to LLM for correction |
| **Ask AI / SQL Toggle** | Mode switch. Ask AI for NL, SQL for direct queries. Both render identically |
| **History & Saved** | localStorage-backed query history + saved queries sidebar |
| **Share via URL** | `#sql=<base64>` hash for shareable query links |
| **CSV / JSON Export** | Download results in either format |
| **Global Filters** | LOB, Operator, Task filters inject context into NL queries |
| **Cross-Database** | Queries can span `default` (operations) and `fb_marketplace_watcher` (marketplace intelligence) |
| **Keyboard Shortcuts** | `Ctrl+Enter` run, `?` shortcuts help, `Esc` close panels |
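
The `detectChartType()` heuristic referenced above lives in query.html; the shape of the decision is roughly the following (thresholds and type-matching rules here are illustrative, not the production logic):

```javascript
// Pick a visualization from the result shape: column types plus row count.
function detectChartType(columns, rows) {
  const numeric = columns.filter(c => /Int|Float|Decimal/.test(c.type));
  const temporal = columns.filter(c => /Date/.test(c.type)); // Date, DateTime, DateTime64

  // A single scalar renders as a big number.
  if (rows.length === 1 && columns.length === 1 && numeric.length === 1) return 'number';
  // Time on one axis plus a measure: line chart.
  if (temporal.length >= 1 && numeric.length >= 1) return 'line';
  // Few categories with a measure read well as a pie; more as a bar.
  if (numeric.length >= 1 && rows.length <= 6) return 'pie';
  if (numeric.length >= 1) return 'bar';
  return 'table';
}
```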

### Tier 1: Demo-Ready (2026-03-23)

#### Performance Badge

Every query result displays a prominent performance badge:

```
38ms | 14.2M rows scanned | 374M rows/sec
```

- Dark slate gradient background with cyan value highlights
- Computed from `stats.rows_read` and `stats.elapsed` returned by ClickHouse
- Makes ClickHouse's speed visible — every query becomes a performance advertisement
- **Implementation**: `renderPerfBadge()` in query.html, ~30 lines of JS. Zero backend changes.
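
The badge text is a pure formatting step over the two stats fields ClickHouse already returns. A sketch consistent with the example above (function names hypothetical; the production code is `renderPerfBadge()`):

```javascript
// Human-readable count: 14200000 -> "14.2M".
function formatCount(n) {
  if (n >= 1e9) return (n / 1e9).toFixed(1) + 'B';
  if (n >= 1e6) return (n / 1e6).toFixed(1) + 'M';
  if (n >= 1e3) return (n / 1e3).toFixed(1) + 'K';
  return String(n);
}

// Build the badge string from ClickHouse's stats block.
function perfBadgeText(stats) {
  const ms = Math.round(stats.elapsed * 1000);
  const rowsPerSec = stats.rows_read / stats.elapsed;
  return `${ms}ms | ${formatCount(stats.rows_read)} rows scanned | ${Math.round(rowsPerSec / 1e6)}M rows/sec`;
}
```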

#### Drill-Down on Click

Click any cell value in the results table to auto-generate a follow-up NL query:

| Cell Type | Drill-Down Action |
|-----------|------------------|
| Operator name | "Show me all activity for [name] in the last 7 days" |
| LOB value | "Break down [LOB] performance: operators, bottlenecks, error rates" |
| SpanName/task | "Analyze the [task]: duration, top operators, week-over-week comparison" |
| Date/DateTime | "Show hourly breakdown for [date]" |
| Numeric value | "What contributes to this [column] value? Break down by operator, LOB, time" |

- Creates an "infinite depth" demo experience — ask one question, click to drill deeper
- **Implementation**: `drillDown()` in query.html + `drillable` CSS class on `<td>` elements. Zero backend changes.
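
The mapping in the table is essentially a template lookup keyed on column name and type. A sketch (helper name hypothetical, matching rules illustrative; the production code is `drillDown()`):

```javascript
// Turn a clicked cell into a follow-up natural-language question.
function drillQuestion(column, value, type) {
  if (/operator/i.test(column)) {
    return `Show me all activity for ${value} in the last 7 days`;
  }
  if (/lob/i.test(column)) {
    return `Break down ${value} performance: operators, bottlenecks, error rates`;
  }
  if (/span|task/i.test(column)) {
    return `Analyze the ${value}: duration, top operators, week-over-week comparison`;
  }
  if (/^Date/.test(type)) {
    return `Show hourly breakdown for ${value}`;
  }
  if (/^(U?Int|Float|Decimal)/.test(type)) {
    return `What contributes to this ${column} value? Break down by operator, LOB, time`;
  }
  return `Tell me more about ${value}`;
}
```

The returned string is fed straight back into the NL path, which is what makes the drill-down feel conversational rather than canned.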

#### Conversational EXPLAIN

A "Why was this fast?" button appears next to the performance badge after every query.

Clicking it:
1. Sends `lastExecutedSQL` to new `/api/explain` endpoint
2. Runs `EXPLAIN PLAN actions=1, header=1, indexes=1, description=1` on ClickHouse
3. Sends EXPLAIN output to Claude Haiku with a ClickHouse-internals system prompt
4. Renders a 2-3 sentence plain-English explanation:

> "ClickHouse used the primary key (ServiceName, SpanName, toDate(Timestamp)) to skip 99.7% of the data. Only 42,000 of 18.2M rows were read from disk. The Map column SpanAttributes was accessed lazily — only the 'operator.id' key was loaded, not the entire map."

- Collapsible "Show raw EXPLAIN output" section below the summary
- **Implementation**: `functions/api/explain.js` (90 lines, zero deps) + `runExplain()` in query.html
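
Wrapping the last executed statement in the EXPLAIN variant is a one-liner inside `/api/explain`. A sketch (settings copied from step 2 above; the semicolon handling is an assumption, since EXPLAIN must wrap a single clean statement):

```javascript
// Wrap a validated read-only statement in the EXPLAIN PLAN form
// used by /api/explain. Trailing semicolons are stripped first so
// the result stays a single statement.
function buildExplain(sql) {
  const single = sql.trim().replace(/;+\s*$/, '');
  return 'EXPLAIN PLAN actions = 1, header = 1, indexes = 1, description = 1 ' + single;
}
```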

#### Conversation-to-Report

After 3+ conversation turns, an "Export Report" button appears in the conversation header.

Clicking it:
1. Captures chart image (if chart view is active) via `canvas.toDataURL()`
2. Builds conversation summary from all turns
3. Calls `/api/query` with a synthesis prompt to generate title, findings, methodology
4. Renders a styled HTML page in a new tab:
   - Branded header (Oswald + Roboto typography, blue accent)
   - Key findings (3-5 bullet points with left blue border)
   - Embedded chart visualization
   - Complete investigation trail (all turns with row counts)
   - Methodology note
   - Print-ready layout

- **Implementation**: `exportReport()` in query.html (~80 lines). Entirely client-side rendering.
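
Since rendering is entirely client-side, the report is a templated HTML string opened in a new tab. A reduced sketch of the assembly step (structure from the bullet list above; function and field names hypothetical, branding CSS omitted):

```javascript
// Build the report document from synthesized findings plus the raw turns.
function buildReportHtml({ title, findings, turns, chartDataUrl }) {
  // Escape user-visible text before interpolating into HTML.
  const esc = s => String(s).replace(/[&<>"]/g,
    c => ({ '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;' }[c]));
  return [
    '<!DOCTYPE html><html><head><meta charset="utf-8">',
    `<title>${esc(title)}</title></head><body>`,
    `<h1>${esc(title)}</h1>`,
    '<h2>Key Findings</h2><ul>',
    ...findings.map(f => `<li>${esc(f)}</li>`),
    '</ul>',
    chartDataUrl ? `<img src="${chartDataUrl}" alt="chart">` : '',
    '<h2>Investigation Trail</h2><ol>',
    ...turns.map(t => `<li>${esc(t.question)} (${t.rowCount} rows)</li>`),
    '</ol></body></html>',
  ].join('\n');
}
```

One way to open the result is `window.open()` on an object URL for a `Blob` of this string, which also keeps the report printable via the browser's native print dialog.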

### Discovery Cards (2026-03-23)

Replaced flat grey query chips with categorized discovery cards:

| Category | Color | Icon Theme | Queries |
|----------|-------|-----------|---------|
| Process Mining | Indigo (#6366f1) | Arrows, clocks, charts | 4 cards |
| Operators | Green (#059669) | Trophy, email, people | 3 cards |
| Anomalies | Amber (#d97706) | Flags, alerts, trends | 3 cards |
| Impact | Blue (#2563eb) | Dollar, laptop, list | 3 cards |
| Insights | Purple (#7c3aed) | Sparkle, lightning | 2 cards |

Each card has: category-colored icon, title, and 1-line description of what the query returns. Staggered fade-in animation on load.

### Live Stats Bar

Below the query input:
- Green pulse dot with "Connected" label
- Table count (from schema API)
- Column count (from schema API)

---

## API Endpoints

### POST /api/query

Natural language → SQL → execution → summary pipeline.

**Request:**
```json
{
  "question": "Which operators are fastest at wire confirmation?",
  "history": [
    { "question": "previous question", "sql": "SELECT ...", "rowCount": 42 }
  ],
  "workspace": "exos-orders",
  "database": "default"
}
```

**Response:**
```json
{
  "sql": "SELECT SpanAttributes['operator.name'] as operator, ...",
  "columns": [{ "name": "operator", "type": "String" }, ...],
  "rows": [{ "operator": "R. Martinez", ... }, ...],
  "stats": { "rows_read": 14200000, "elapsed": 0.038 },
  "summary": "Wire confirmation averages 2.3 days...",
  "suggestions": ["Which team is fastest?", "Show trend over time", "Compare to SLA"],
  "chartType": "bar"
}
```
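
Clients are responsible for trimming `history` to the 5-turn window described under Data Flow before POSTing. A sketch of building the request body (helper name hypothetical; field names from the example above):

```javascript
// Assemble the /api/query request body, keeping only the last 5 turns.
function buildQueryRequest(question, history, workspace, database) {
  return {
    question,
    history: history.slice(-5).map(t => ({
      question: t.question,
      sql: t.sql,
      rowCount: t.rowCount,
    })),
    workspace,
    database,
  };
}
```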

### POST /api/explain

EXPLAIN PLAN with AI summary.

**Request:**
```json
{
  "sql": "SELECT ... FROM otel_traces WHERE ...",
  "summarize": true
}
```

**Response:**
```json
{
  "explain": "Expression\n  ReadFromMergeTree ...",
  "summary": "ClickHouse used the primary key to skip 99.7% of data..."
}
```

### POST /api/sql

Direct SQL execution (no LLM).

**Request:**
```json
{ "sql": "SELECT count() FROM otel_traces" }
```

### GET /api/schema

Returns all ClickHouse tables + columns for the schema browser.

---

## The 90-Second Demo

Choreographed walkthrough designed to make a ClickHouse engineer forward the URL:

| Time | Action | What Happens |
|------|--------|-------------|
| 0:05 | Type: "What's our biggest closing workflow bottleneck?" | — |
| 0:08 | AI generates windowFunnel SQL | Badge: **38ms / 14.2M rows / 374M rows/sec** |
| 0:12 | Bar chart auto-renders | AI summary: "Wire confirmation is the primary bottleneck" |
| 0:17 | Click follow-up: "Which operators are fastest?" | New results, new badge |
| 0:25 | **Click** operator name "R. Martinez" in table | Auto-drill: all activity for Martinez |
| 0:35 | Click **"Why was this fast?"** | EXPLAIN: "Primary key pruned 99.8% of data" |
| 0:45 | Click **"Export Report"** | New tab: styled executive briefing |
| 1:00 | Copy URL, send to VP | Done. Zero SQL. One shareable report. |

**What the ClickHouse engineer sees:**
1. `windowFunnel` used correctly in production (their advanced function)
2. Performance badge: sub-100ms on 14M rows (their speed story)
3. EXPLAIN in plain English (their architecture, made accessible)
4. $2K/yr replacing $500K Celonis (cost displacement)
5. Non-SQL user did complex process mining analysis (accessibility)

---

## System Prompt (Domain Knowledge)

The system prompt in `/api/query` includes 121 lines of ServiceLink domain context:

- **Tables**: otel_traces (primary), otel_logs, hyperdx_sessions, otel_metrics_*, fb_marketplace_watcher.listings
- **Business model**: 3 LOBs (valuations, title, closing), 6 operator roles
- **Span names**: 40+ operational events (portal.login → wire.confirmed)
- **SpanAttributes**: operator.id, operator.name, operator.role, lob, core.task, automation.candidate, automation.est_minutes_saved
- **ClickHouse functions**: windowFunnel, sequenceMatch, sequenceCount, countIf, quantile, topK, groupArray, etc.
- **Query patterns**: Operator email time, automation candidates, daily workflow, mailbox volume, LOB filtering

---

## Configuration

### Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| `AZURE_ANTHROPIC_ENDPOINT` | Yes | Azure AI Foundry endpoint for Claude |
| `AZURE_AI_KEY` | Yes | API key for Azure AI |
| `CLICKHOUSE_HOST` | Yes | ClickHouse Cloud hostname |
| `CLICKHOUSE_USER` | No | ClickHouse user (default: `default`) |
| `CLICKHOUSE_PASSWORD` | Yes | ClickHouse password |

### Models Used

| Call | Model | Purpose |
|------|-------|---------|
| SQL generation | Claude Opus 4.6 | Accuracy-critical: NL → valid ClickHouse SQL |
| Summary + follow-ups | Claude Opus 4.6 | Insight quality matters for demo |
| EXPLAIN summary | Claude Haiku 4.5 | Cost-effective: structured input → structured output |

### Cost per Query

| Interaction | LLM Calls | Est. Cost |
|-------------|-----------|-----------|
| Basic query | 2 (SQL gen + summary) | ~$0.08 |
| With EXPLAIN | 3 (+ explain summary) | ~$0.10 |
| With report | 4 (+ synthesis) | ~$0.14 |

At 100 queries/day: $8-14/day, $240-420/month.

---

## File Map

| File | Lines | Responsibility |
|------|-------|---------------|
| `query.html` | ~2400 | Entire frontend: HTML + CSS + JS |
| `functions/api/query.js` | 370 | NL→SQL pipeline, zero deps |
| `functions/api/explain.js` | 90 | EXPLAIN PLAN + Haiku summary |
| `functions/api/sql.js` | ~60 | Direct SQL execution |
| `functions/api/schema.js` | ~40 | Schema introspection |
| `functions/api/_shared.js` | ~30 | CORS helpers, response formatters |

---

## Roadmap

See `docs/plans/query-product-vision.md` for the full product vision.

### Tier 2 (~19 hours)
- Slash commands (`/funnel`, `/compare`, `/trend`, `/anomaly`)
- Conversation persistence (ClickHouse ReplacingMergeTree)
- Proactive anomaly cards (canary queries on page load)
- Dynamic system prompt (auto-reflect schema changes)
- Query validation pipeline (EXPLAIN ESTIMATE pre-check)

### Tier 3 (~58 hours)
- Streaming responses (50% perceived latency reduction)
- Adaptive discovery cards (Gamma-Poisson from Taste Machine paper)
- Scheduled queries + alerting
- Dashboard generation from conversations
- MV pipeline visualization
- Team analytics feed
