# EXOS Analytics — Storage Architecture & Data Retention

## ClickHouse Internals (How Data Actually Lives)

ClickHouse is a **columnar, append-only, merge-tree database**. Not row-oriented. Not page-based.

### How MergeTree Works

```
INSERT (batch of rows)
    │
    ▼
  Part (immutable, column files on disk)
    │
    ▼ (background merge)
  Larger Part (sorted, compressed; deduped only with ReplacingMergeTree)
    │
    ▼ (over time)
  Very Large Part (potentially hundreds of MB per column file)
```

- Data arrives in **parts** (batches). Each INSERT creates a new part.
- Parts are **immutable** — once written, never modified.
- ClickHouse merges small parts into larger ones in the background.
- Data is stored **column by column** — reading `ServiceName` from 1M rows only touches the `ServiceName` column file.
- **Sparse primary index**: one entry per 8,192 rows (a "granule"). The primary key = physical sort order.
- **When is data queryable?** Immediately after INSERT completes (milliseconds). No "eventual consistency" delay.
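You can watch this lifecycle directly: `system.parts` lists every part, its row count, its granule count (`marks`), and its merge `level` (which rises as small parts are merged into larger ones). A quick inspection sketch, assuming the `otel_traces` table used later in this doc:

```sql
-- List active parts for a table, newest first.
-- marks ≈ rows / 8192 (one index entry per granule);
-- level increases each time parts are merged together.
SELECT name, rows, marks, level,
       formatReadableSize(bytes_on_disk) AS size
FROM system.parts
WHERE table = 'otel_traces' AND active
ORDER BY modification_time DESC;
```

Right after a burst of small INSERTs you'll see many level-0 parts; a few minutes later the background merges collapse them into fewer, larger parts.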

### Compression

ClickHouse compresses each column independently using LZ4 (default) or ZSTD:

| Data type | Typical compression ratio |
|-----------|--------------------------|
| `LowCardinality(String)` (enum-like) | 50-100x |
| `DateTime64` (timestamps) | 10-20x (delta encoding) |
| `UInt64` (counters, durations) | 5-10x |
| `String` (URLs, JSON blobs) | 3-5x |
| `Map(String, String)` (key-value attributes) | 3-5x |

Cloudflare measured **36 bytes/record average** at 6M req/sec on ClickHouse (raw data is 200-500 bytes/record).
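To see these ratios on your own tables rather than taking the table above on faith, `system.columns` exposes compressed vs. uncompressed bytes per column. A sketch, assuming the `otel_traces` table:

```sql
-- Per-column compression ratio for one table
SELECT name,
       formatReadableSize(data_compressed_bytes)   AS compressed,
       formatReadableSize(data_uncompressed_bytes) AS uncompressed,
       round(data_uncompressed_bytes / data_compressed_bytes, 1) AS ratio
FROM system.columns
WHERE table = 'otel_traces' AND database = currentDatabase()
ORDER BY data_compressed_bytes DESC;
```

Expect the `LowCardinality` and timestamp columns at the top of the ratio column and raw `String` blobs at the bottom, mirroring the table above.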

---

## Tiered Storage Architecture

### Self-Hosted ClickHouse (AKS / VM)

If you run ClickHouse yourself, you control the storage layers:

```
┌──────────────────────────────────────────────────┐
│ HOT TIER: NVMe SSD (Azure Premium SSD v2)        │
│ Last 7 days. Sub-millisecond read latency.        │
│ ~$0.08/GB/mo. 500 IOPS/GB.                       │
│ All recent queries hit this tier.                 │
├──────────────────────────────────────────────────┤
│ WARM TIER: Standard SSD (Azure Standard SSD)      │
│ 7-90 days. Single-digit ms latency.              │
│ ~$0.04/GB/mo. Lower IOPS but fine for scans.     │
│ Dashboards, historical comparisons.              │
├──────────────────────────────────────────────────┤
│ COLD TIER: Azure Blob Storage (native disk type)  │
│ 90+ days. 10-50ms first-byte latency.            │
│ ~$0.02/GB/mo (Hot) or $0.01/GB (Cool).           │
│ Regulatory retention, audit trail, rare queries.  │
├──────────────────────────────────────────────────┤
│ ARCHIVE: Azure Blob Archive                       │
│ 1+ years. Minutes to hours for retrieval.         │
│ $0.001/GB/mo. Write-once, read-never (until you  │
│ need it for compliance).                          │
└──────────────────────────────────────────────────┘
```

**ClickHouse TTL policies** automate movement between tiers:

```sql
ALTER TABLE otel_traces
  MODIFY TTL
    -- Data lands on 'hot' (the first volume in the policy) by default
    -- Warm: move to standard SSD after 7 days
    Timestamp + INTERVAL 7 DAY TO VOLUME 'warm',
    -- Cold: move to blob storage after 90 days
    Timestamp + INTERVAL 90 DAY TO VOLUME 'cold',
    -- Delete after 12 months
    Timestamp + INTERVAL 12 MONTH DELETE;
```
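To confirm that parts actually moved between tiers, `system.parts` records which disk each part currently sits on (disk names match the storage policy's `<disks>` entries):

```sql
-- Which disk holds how much of the table right now?
SELECT disk_name,
       count() AS parts,
       formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE table = 'otel_traces' AND active
GROUP BY disk_name;
```

If everything still reports `hot` long after the TTL window, check `system.part_log` for failed moves before debugging the policy itself.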

ClickHouse storage policies define volumes:

```xml
<storage_configuration>
  <disks>
    <hot>  <type>local</type><path>/mnt/nvme/</path> </hot>
    <warm> <type>local</type><path>/mnt/ssd/</path>  </warm>
    <cold>
      <type>azure_blob_storage</type>
      <storage_account_url>https://clickstackstorage.blob.core.windows.net</storage_account_url>
      <container_name>cold</container_name>
      <account_name>...</account_name>
      <account_key>...</account_key>
    </cold>
  </disks>
  <policies>
    <tiered>
      <volumes>
        <hot>  <disk>hot</disk>  </hot>
        <warm> <disk>warm</disk> </warm>
        <cold> <disk>cold</disk> </cold>
      </volumes>
    </tiered>
  </policies>
</storage_configuration>
```
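A table opts into these volumes through the `storage_policy` setting at creation time. A minimal sketch (the columns shown are illustrative, not the full OTel schema):

```sql
-- New tables pick up the 'tiered' policy defined above;
-- data is written to the first volume ('hot') and moved by TTL.
CREATE TABLE otel_traces
(
    Timestamp   DateTime64(9),
    ServiceName LowCardinality(String),
    Duration    UInt64
)
ENGINE = MergeTree
ORDER BY (ServiceName, Timestamp)
SETTINGS storage_policy = 'tiered';
```

For an existing table, `ALTER TABLE ... MODIFY SETTING storage_policy = 'tiered'` works as long as the new policy contains all the disks the table already uses.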

### ClickHouse Cloud (What You Get for Free)

ClickHouse Cloud handles this automatically — **you don't configure tiers**:

```
┌──────────────────────────────────────────────────┐
│ All data lives in Azure Blob Storage              │
│ $25.30/TiB/mo (flat, no tiers)                   │
│                                                   │
│ + Automatic SSD cache on compute nodes            │
│   (frequently queried data cached locally)        │
│                                                   │
│ + Compute scales to zero when idle                │
│   (no compute cost when nobody's querying)        │
│                                                   │
│ + TTL policies still work (auto-delete old data)  │
│   but there's no tier movement — just deletion    │
└──────────────────────────────────────────────────┘
```

**Why this is actually better for most teams:**

1. **No tier management** — ClickHouse Cloud's caching layer handles hot/cold automatically. Recent data is SSD-cached; old data sits in blob storage but remains queryable through the same interface, just slower (a few hundred ms for cold scans vs. sub-ms for cached data).

2. **Simpler pricing** — $25.30/TiB flat. No decisions about which tier to put data in. The "premium" over raw Blob Storage ($20/TiB) is only $5/TiB.

3. **No ops** — no disk resizing, no IOPS monitoring, no storage policy debugging.

4. **TTL still works** — you set `TTL Timestamp + INTERVAL 12 MONTH DELETE` and old data disappears automatically.

The tradeoff: you pay $25.30/TiB even for "cold" data that you'd store at $1/TiB on Archive tier. If you have 50+ TiB of historical data that's rarely queried, a self-hosted hybrid (ClickHouse + Blob archive) is cheaper.

---

## Data Retention Recommendations for EXOS

| Data Type | Hot (ClickHouse) | Archive (ADLS Gen2) | Why |
|-----------|-----------------|---------------------|-----|
| Session replay events | 90 days | 12 months | Replays are large (~2MB/session) but rarely replayed after 90 days |
| API traces (otel_traces) | 12 months | 3 years | Needed for trend analysis, funnel comparisons YoY |
| Logs (otel_logs) | 30 days | 12 months | Debugging context, short-lived value |
| Metrics (otel_metrics_*) | 12 months | 3 years | Dashboard KPIs, rolled-up aggregates last longer |
| Metabase query audit | Forever | — | Every query run in Metabase, lightweight metadata |

### TTL Configuration

```sql
-- Session replays: 90 days hot, then delete
ALTER TABLE hyperdx_sessions
  MODIFY TTL Timestamp + INTERVAL 90 DAY DELETE;

-- Traces: 12 months hot, then delete (archive separately if needed)
ALTER TABLE otel_traces
  MODIFY TTL Timestamp + INTERVAL 12 MONTH DELETE;

-- Logs: 30 days
ALTER TABLE otel_logs
  MODIFY TTL Timestamp + INTERVAL 30 DAY DELETE;

-- Metrics: 12 months
ALTER TABLE otel_metrics_gauge
  MODIFY TTL TimeUnix + INTERVAL 12 MONTH DELETE;
ALTER TABLE otel_metrics_histogram
  MODIFY TTL TimeUnix + INTERVAL 12 MONTH DELETE;
ALTER TABLE otel_metrics_sum
  MODIFY TTL TimeUnix + INTERVAL 12 MONTH DELETE;
```

### Archive Pipeline (if needed at scale)

For data you need to keep beyond the hot retention window:

```
ClickHouse TTL approaching
    │
    ▼
Scheduled export of the oldest month → ADLS Gen2 (Parquet format)
    │
    ▼
Azure Data Factory or scheduled ClickHouse query:
  INSERT INTO FUNCTION azureBlobStorage(
    'https://clickstackstorage.blob.core.windows.net',
    'archive', 'otel_traces/{_partition_id}.parquet',
    '<account_name>', '<account_key>', 'Parquet'
  )
  PARTITION BY toYYYYMM(Timestamp)
  SELECT * FROM otel_traces
  WHERE Timestamp < now() - INTERVAL 11 MONTH
    AND Timestamp >= now() - INTERVAL 12 MONTH
```

This gives you cheap Parquet files in ADLS that Synapse Serverless, Databricks, or ClickHouse's own `azureBlobStorage()` table function can query on demand.
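Reading the archive back from ClickHouse looks like this (account credentials are placeholders; the container/path match the export sketch above):

```sql
-- Ad-hoc query over archived Parquet files in blob storage
SELECT ServiceName, count() AS spans
FROM azureBlobStorage(
    'https://clickstackstorage.blob.core.windows.net',
    'archive', 'otel_traces/*.parquet',
    '<account_name>', '<account_key>',
    'Parquet')
GROUP BY ServiceName
ORDER BY spans DESC;
```

Because Parquet carries column statistics, scans that filter on the archived columns stay reasonably cheap even without ClickHouse's own indexes.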

---

## Cost Projections by Retention Policy

Assumes 100K daily visitors with full session replay, ~200GB/day compressed (200GB × 30 days ≈ 6 TiB; × 365 days ≈ 72 TiB):

### ClickHouse Cloud Only (current architecture)

| Retention | Storage (compressed) | Cost/mo |
|-----------|---------------------|---------|
| 30 days hot | ~6 TiB | ~$152 |
| 90 days hot | ~18 TiB | ~$455 |
| 12 months hot | ~72 TiB | ~$1,822 |

### Hybrid: ClickHouse Cloud + ADLS Archive

| Tier | Retention | Storage | Cost/mo |
|------|-----------|---------|---------|
| ClickHouse (hot) | 90 days | ~18 TiB | ~$455 |
| ADLS Cool | 12 months | ~72 TiB | ~$720 |
| **Total** | | | **~$1,175** |

vs. pure ClickHouse Cloud for 12 months: ~$1,822. Hybrid saves ~$650/mo at this scale.

### Current Scale (demo, ~20GB total)

| Component | Cost |
|-----------|------|
| ClickHouse Cloud | ~$150/mo (dev tier minimum) |
| VM (B2ms) | ~$50/mo |
| Total | ~$200/mo |

At demo scale, the tiered architecture is irrelevant. It matters at 10+ TiB.

---

## Making the Frontend More Accessible

### Metabase Customization

Metabase supports extensive white-labeling (Pro/Enterprise tier) but even the free tier allows:

- **Custom home page**: Pin saved questions and dashboards
- **Collections**: Organize queries by team (PM, Design, SRE)
- **Embedding**: Iframe any dashboard or question into the portal
- **Custom colors**: Match ServiceLink brand (#1C7C35)

### Recommended Starter Dashboards for Metabase

Create these in Metabase using the visual query builder (no SQL required):

1. **Executive Summary**: Total sessions, unique users, error rate, top pages (last 30 days)
2. **Service Health**: Traces by service, p95 latency, error count (last 7 days)
3. **User Journey**: Top 10 page sequences, avg session duration, bounce rate
4. **Error Drill-down**: Errors by service, by page, by time of day
5. **EXOS Operations**: Orders processed, assignments completed, inspections scheduled

### Query Studio Affordances (for non-technical users)

The portal's Query Studio should include:

**Reference cards** (not just pills at the bottom):
```
┌─────────────────────────────────────────────────┐
│ BASICS                                          │
│                                                 │
│ Count all traces:                               │
│   SELECT count() FROM otel_traces               │
│                                                 │
│ Filter by service:                              │
│   WHERE ServiceName = 'exos-orders'             │
│                                                 │
│ Filter by time:                                 │
│   WHERE Timestamp >= now() - INTERVAL 7 DAY     │
│                                                 │
│ Group and count:                                │
│   SELECT ServiceName, count()                   │
│   FROM otel_traces                              │
│   GROUP BY ServiceName                          │
│   ORDER BY 2 DESC                               │
├─────────────────────────────────────────────────┤
│ COMMON QUESTIONS                                │
│                                                 │
│ "How many orders this week?"                    │
│   → Click to load in editor                     │
│                                                 │
│ "Which pages have the most errors?"             │
│   → Click to load in editor                     │
│                                                 │
│ "Show me the login → order → upload funnel"     │
│   → Click to load in editor                     │
└─────────────────────────────────────────────────┘
```

The AI assistant sidebar handles everything else — "I want to see a Sankey chart of user flows" → Opus generates the SQL.
