FireFly Analytics

Embedding Databricks Apps Without SSO

A Go-based reverse proxy that enables embedding Databricks Lakehouse Apps (like VS Code and Marimo notebooks) without requiring users to authenticate directly with Databricks. Uses a session-cookie architecture — no tokens ever appear in URLs, browser storage, or logs.

Overview

Normally, embedding a Databricks app (such as the hosted VS Code editor or Marimo notebooks) in an iframe requires users to authenticate directly with Databricks through an SSO login flow. This exposes the Databricks login interface to end users and breaks the seamless experience of a custom application.

This Go reverse proxy eliminates the need for Databricks SSO by handling authentication transparently. Users authenticate with your application (via Better Auth + Okta SSO), and the proxy securely manages Databricks OAuth tokens entirely on the server side — exchanging a short-lived JWT for an opaque session cookie that the browser uses for all subsequent proxied requests.

Key Features

  • HttpOnly session cookies — no tokens in URLs, logs, or browser storage
  • Server-side session management with PostgreSQL
  • JWT-validated session creation via /start-session
  • Automatic service principal (SPN) token refresh (5-minute pre-expiry window)
  • Strict CORS origin validation (exact match, no wildcards)
  • Bidirectional WebSocket proxying for real-time features
  • SSRF protection: app URLs validated against ALLOWED_APEX_DOMAIN

Embedded Applications

This proxy architecture powers several embedded applications.

High-Level Architecture

The following diagram shows the interaction between your Next.js application, the Go proxy server, and Databricks Lakehouse Apps. The React ProxyIframe component bootstraps a server-side session via JWT exchange, receives an opaque proxy_sid cookie, then loads the iframe. Every subsequent proxied request is authenticated by the session cookie alone.

Why a Proxy is Needed

Databricks Lakehouse Apps require OAuth Bearer token authentication for every request. When embedding these apps in iframes, we face several security challenges:

Token Exposure

If we embedded the Databricks app directly and passed the token in the URL, the OAuth token would be visible in the browser's address bar, network inspector, and history.

CORS Restrictions

Databricks apps have strict CORS policies that prevent direct cross-origin requests from custom web applications.

WebSocket Authentication

The browser WebSocket API does not let scripts attach custom headers such as Authorization to the opening handshake, so browser-initiated WebSocket connections (used for terminals and real-time features) cannot carry a bearer token directly.

How the Proxy Solves These

Session Cookie Architecture

The React ProxyIframe component calls /start-session with a short-lived JWT. The proxy validates the JWT, fetches a Databricks bearer token via SPN client credentials, and returns an opaque HttpOnly session cookie. Databricks tokens never appear in URLs, logs, or browser storage.

CORS Proxy

The proxy adds appropriate CORS headers to responses, enabling cross-origin iframe embedding while maintaining security. The /start-session endpoint enforces a strict exact-match origin check against FRONTEND_URL.

WebSocket Proxying

The proxy detects WebSocket upgrade requests, looks up the session cookie to retrieve the Databricks bearer token, establishes an authenticated connection to the app, and bidirectionally forwards messages.

Session Creation Flow

Session initialization is a 7-step flow driven by the ProxyIframe React component. It runs once when the component mounts (with a React StrictMode guard to prevent double invocation).

Step 1: Fetch JWT

ProxyIframe calls authClient.token() to obtain a short-lived, signed JWT from Better Auth. The JWT has the current user's session as its subject and FRONTEND_URL as both issuer and audience.

Step 2: POST /start-session

The component posts { jwt, toolId, orgId } to {proxyBaseUrl}/start-session with credentials: "include" so the browser sends and receives cookies cross-origin.

Step 3: Origin validation

The Go proxy checks Origin == FRONTEND_URL (exact match, no wildcards). Requests from any other origin receive a 403 Forbidden before any JWT processing begins.

Step 4: JWT verification

The proxy fetches the JWKS from {frontendURL}/api/auth/jwks, verifies the JWT's signature, and checks iss, aud, and exp claims. The sub and email claims are extracted.

Step 5: Access validation

The proxy runs a 4-step DB check: user not banned → user belongs to orgId → tool exists and is not deleted → SPN credentials exist for this tool. On any failure the request is rejected with 403 Forbidden.
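
The ordering of the four checks can be illustrated as follows. This is a sketch under stated assumptions: the accessChecks type and its function fields are hypothetical stand-ins for the database queries, which the text above does not show, and the caller is assumed to map errForbidden to a 403 response.

```go
package main

import "errors"

// errForbidden is a hypothetical sentinel the caller would map to 403.
var errForbidden = errors.New("forbidden")

// accessChecks groups the four lookups as plain functions so that the
// ordering logic can be shown without a real database (hypothetical type).
type accessChecks struct {
	userBanned    func(userID string) (bool, error)
	userInOrg     func(userID, orgID string) (bool, error)
	toolActive    func(toolID string) (bool, error) // exists and not deleted
	spnConfigured func(toolID string) (bool, error)
}

// validate runs the checks in order and stops at the first failure.
func (c accessChecks) validate(userID, orgID, toolID string) error {
	banned, err := c.userBanned(userID)
	if err != nil {
		return err
	}
	if banned {
		return errForbidden
	}
	steps := []func() (bool, error){
		func() (bool, error) { return c.userInOrg(userID, orgID) },
		func() (bool, error) { return c.toolActive(toolID) },
		func() (bool, error) { return c.spnConfigured(toolID) },
	}
	for _, step := range steps {
		ok, err := step()
		if err != nil {
			return err
		}
		if !ok {
			return errForbidden
		}
	}
	return nil
}
```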

Step 6: Databricks token fetch

The proxy calls POST {workspaceURL}/oidc/v1/token using SPN client-credentials OAuth flow (grant_type=client_credentials, scope=all-apis). The resulting bearer token is stored in the session record — the browser never sees it.
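
As a rough illustration, the token request could be built with the standard library as shown here. The function name newTokenRequest is hypothetical, and sending the SPN credentials via HTTP Basic authentication is an assumption about how the proxy authenticates to the token endpoint. The endpoint path, grant type, and scope are taken from the step above.

```go
package main

import (
	"net/http"
	"net/url"
	"strings"
)

// newTokenRequest builds the client-credentials request described above.
// (Hypothetical helper; the real proxy may construct the request differently.)
func newTokenRequest(workspaceURL, clientID, clientSecret string) (*http.Request, error) {
	form := url.Values{}
	form.Set("grant_type", "client_credentials")
	form.Set("scope", "all-apis")

	endpoint := strings.TrimRight(workspaceURL, "/") + "/oidc/v1/token"
	req, err := http.NewRequest(http.MethodPost, endpoint, strings.NewReader(form.Encode()))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	// Assumption: the SPN client ID and secret are sent as HTTP Basic credentials.
	req.SetBasicAuth(clientID, clientSecret)
	return req, nil
}
```

The request would then be sent with an http.Client that has a timeout configured, and the access token and expiry from the JSON response stored on the session record.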

Step 7: Session created, cookie set

A 32-byte cryptographically random session ID is generated. Its SHA-256 hash (not the raw ID) is stored in the proxy_sessions table. The raw session ID is returned as an HttpOnly proxy_sid cookie (1-hour TTL).
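
A minimal sketch of the ID generation and hashing, using only the standard library. The function names are hypothetical, and hex-encoding the 32 random bytes to form the cookie value is an assumption. The text specifies only the byte length and that the SHA-256 hash of the cookie value is what gets stored.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
)

// hashSessionID returns hex(SHA-256(raw)), the form used as the primary key
// of proxy_sessions. (Hypothetical helper name.)
func hashSessionID(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

// newSessionID returns the raw ID (sent to the browser as the proxy_sid
// cookie value) and its hash (stored in the database).
func newSessionID() (raw, hash string, err error) {
	buf := make([]byte, 32) // 32 bytes from the OS CSPRNG
	if _, err = rand.Read(buf); err != nil {
		return "", "", err
	}
	raw = hex.EncodeToString(buf) // assumption: cookie value is hex-encoded
	return raw, hashSessionID(raw), nil
}
```

On each later request the proxy would call hashSessionID on the incoming cookie value and look up the row by that hash, so the raw ID never needs to be persisted.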

ProxyIframe Component

The session initialization logic lives in src/components/proxy-iframe.tsx. A useRef guard prevents React StrictMode's double-invocation from creating duplicate sessions.

src/components/proxy-iframe.tsx — session init
// 1. Fetch a short-lived JWT from better-auth (requires session cookie).
const tokenResult = await authClient.token();
if (tokenResult.error || !tokenResult.data?.token) {
  setStatus("error");
  return;
}

// 2. POST { jwt, toolId, orgId } to the Go proxy /start-session endpoint.
//    The proxy validates the JWT via JWKS, looks up the tool + SPN in
//    the database, fetches a Databricks bearer token, and sets the session cookie.
const res = await fetch(`${proxyBaseUrl}/start-session`, {
  method: "POST",
  credentials: "include",         // send/receive cookies cross-origin
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ jwt: tokenResult.data.token, toolId, orgId }),
});

// 3. Once the cookie is set, render the iframe; otherwise surface the failure.
//    All subsequent requests to /app-proxy/{toolId}/ carry the cookie automatically.
setStatus(res.ok ? "ready" : "error");

Complete Session Flow Diagram

Proxy URL Pattern

The proxy uses a simple tool-scoped URL pattern. Unlike token-in-URL approaches, there is no sensitive data in the URL — routing is resolved entirely from the server-side session record.

Pattern:
  /app-proxy/{toolId}/          ← initial iframe load
  /app-proxy/{toolId}/{path}    ← all subsequent requests (assets, API calls, WS)

Example:
  /app-proxy/code-editor-3771219485779100/
  /app-proxy/code-editor-3771219485779100/terminal
  /app-proxy/code-editor-3771219485779100/api/files

How routing works:
  1. Browser includes proxy_sid cookie automatically (no token in URL)
  2. Proxy looks up session by SHA-256(proxy_sid) in proxy_sessions table
  3. Session record contains appURL (e.g. https://{app}.aws.databricksapps.com)
  4. Proxy forwards request to appURL/{path} with Authorization: Bearer {accessToken}
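
Splitting a request path into the tool ID and the remaining path might look like the sketch below. The helper name parseProxyPath is hypothetical. The /app-proxy/ prefix and the example paths come from the pattern above.

```go
package main

import "strings"

const proxyPrefix = "/app-proxy/"

// parseProxyPath splits "/app-proxy/{toolId}/{path}" into the tool ID and
// the path to forward upstream. ok is false when the prefix or the tool ID
// is missing. (Hypothetical helper name.)
func parseProxyPath(p string) (toolID, rest string, ok bool) {
	if !strings.HasPrefix(p, proxyPrefix) {
		return "", "", false
	}
	trimmed := strings.TrimPrefix(p, proxyPrefix)
	toolID, rest, _ = strings.Cut(trimmed, "/")
	if toolID == "" {
		return "", "", false
	}
	// The forwarded path always starts with a slash, including for the
	// initial iframe load where nothing follows the tool ID.
	return toolID, "/" + rest, true
}
```

For the examples above, /app-proxy/code-editor-3771219485779100/api/files yields the tool ID code-editor-3771219485779100 and the upstream path /api/files.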

Session Validation & Token Refresh

Every request to /app-proxy/{toolId}/... goes through session validation before being proxied. The session ID itself is never stored — only its SHA-256 hash — so a leaked database row cannot be used to forge a valid cookie.

1. Extract & hash cookie

The proxy_sid cookie value is read and its SHA-256 hash is computed. The hash is used to query proxy_sessions.

2. Validate session

The session must not be expired (expiresAt > now) and the toolID in the record must match the toolId in the request path. This prevents cross-app session reuse if a cookie is sent to the wrong proxy path.

3. Automatic token refresh

If the stored Databricks access token expires within 5 minutes, the proxy transparently fetches a new one via SPN client credentials and updates the session record before proxying the request.

4. Proxy the request

The request is forwarded to appURL with Authorization: Bearer {accessToken}. All X-Forwarded-* headers from the client are stripped to prevent injection. Security headers (CSP, X-Frame-Options) are injected on the response.

Session Database Schema

go/migrations/001_proxy_sessions.sql
CREATE TABLE proxy_sessions (
    id               TEXT        PRIMARY KEY,  -- hex(SHA-256(cookie_value))
    user_id          TEXT        NOT NULL,
    user_email       TEXT        NOT NULL,
    tool_id          TEXT        NOT NULL,
    org_id           TEXT        NOT NULL,
    app_url          TEXT        NOT NULL,     -- validated against ALLOWED_APEX_DOMAIN
    workspace_url    TEXT        NOT NULL,
    spn_client_id    TEXT        NOT NULL,
    spn_client_secret TEXT       NOT NULL,
    access_token     TEXT        NOT NULL,     -- Databricks bearer token (never in browser)
    token_expires_at TIMESTAMPTZ NOT NULL,
    expires_at       TIMESTAMPTZ NOT NULL,     -- session TTL (1 hour)
    created_at       TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Fast lookup by hash, cleanup by expiry, session list by tool+user
CREATE INDEX proxy_sessions_expires_at_idx  ON proxy_sessions (expires_at);
CREATE INDEX proxy_sessions_tool_user_idx   ON proxy_sessions (tool_id, user_id);

Production Deployment: Domain-Based Wildcard Routing

⚠️ Important: The Firefly Reference Implementation Uses Path-Based Cookies — This Is Not the Production Recommendation

The Firefly reference implementation ships with a single shared proxy domain using path-scoped cookies (/app-proxy/{toolId}/). This works for development, staging demos, and getting started quickly, but must not be used in production due to the following cross-app security risks:

  • Shared cookie namespace: All apps run under the same domain. An XSS vulnerability in one embedded app could potentially read or interfere with sessions belonging to other apps on the same domain.
  • Path scoping is not a security boundary: Path: /app-proxy/{toolId}/ only limits which requests the browser attaches the cookie to; it is not enforced by the Same-Origin Policy. Because proxy_sid is HttpOnly, scripts cannot read its value, but JavaScript running on the same origin can send requests to any other app's path, and the browser will attach that app's cookie.
  • Cross-app session enumeration: A compromised or malicious embedded app could attempt to probe other session paths on the shared domain.

For production, use wildcard subdomain routing with strict CORS (detailed below).

Recommended: Wildcard Subdomain Pattern

Assign each tool a dedicated subdomain. This gives every app a separate origin, enforcing the browser's Same-Origin Policy as a hard isolation boundary — no JavaScript on app-tool-a.firefly-analytics.com can access cookies or storage from app-tool-b.firefly-analytics.com.

Subdomain routing pattern
Pattern:
  app-{toolId}.firefly-analytics.com  →  Go proxy for that tool

Examples:
  app-code-editor.firefly-analytics.com
  app-notebook-1234.firefly-analytics.com
  app-sql-dashboard.firefly-analytics.com

DNS:
  *.firefly-analytics.com  →  A record → reverse proxy (nginx / Cloudflare / ALB)
  Reverse proxy extracts toolId from hostname, routes to Go proxy.
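
Extracting the tool ID from the hostname might look like the following sketch. The helper name toolIDFromHost is hypothetical, and the sketch assumes the app- prefix and the firefly-analytics.com base domain from the examples above. It also assumes tool IDs contain no dots.

```go
package main

import (
	"net"
	"strings"
)

const (
	hostPrefix = "app-"
	baseDomain = ".firefly-analytics.com"
)

// toolIDFromHost extracts {toolId} from "app-{toolId}.firefly-analytics.com".
// (Hypothetical helper name.)
func toolIDFromHost(host string) (string, bool) {
	// Drop an optional ":port" suffix; hosts without a port are left as is.
	if h, _, err := net.SplitHostPort(host); err == nil {
		host = h
	}
	host = strings.ToLower(host)
	if !strings.HasPrefix(host, hostPrefix) || !strings.HasSuffix(host, baseDomain) {
		return "", false
	}
	id := strings.TrimSuffix(strings.TrimPrefix(host, hostPrefix), baseDomain)
	// Reject empty IDs and nested subdomains such as "app-a.b.firefly-analytics.com".
	if id == "" || strings.Contains(id, ".") {
		return "", false
	}
	return id, true
}
```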

Cookie configuration per subdomain:

Name:     proxy_sid
Domain:   app-{toolId}.firefly-analytics.com   ← exact subdomain, not wildcard
SameSite: Strict                                ← or Lax if frontend is same registrable domain
Secure:   true
HttpOnly: true
MaxAge:   3600

Security Benefits

  • Browser SOP enforces full isolation — app-foo.firefly-analytics.com cannot read cookies or storage of app-bar.firefly-analytics.com
  • XSS in one embedded app is contained to that app's subdomain only
  • SameSite=Strict (or Lax) can be used instead of None, reducing CSRF surface further
  • FRONTEND_URL CORS check on /start-session ensures only your application can initiate sessions

Path-Based vs Domain-Based Architecture

Comparison: Path-Based vs Domain-Based

Feature | Path-Based (Firefly Reference) | Domain-Based (Recommended Production)
Cookie isolation | Partial (path hint only) | Full (SOP-enforced boundary)
XSS blast radius | All apps on same domain | Single app subdomain only
SameSite setting | None (cross-site iframe) | Strict or Lax
CORS protection | Origin check on /start-session | Origin check + subdomain isolation
JS access to cookies | All same-domain cookies accessible | Only subdomain cookies accessible
Setup complexity | Simple (single domain) | Requires wildcard DNS + routing
Recommended for | Dev / demos / getting started | Production deployments

Iframe Embedding

Once the session cookie is set, the ProxyIframe component renders an <iframe> pointing to {proxyBaseUrl}/app-proxy/{toolId}/. The browser automatically includes the proxy_sid cookie on all requests within that iframe.

Iframe Architecture

Sandbox Attributes

allow-scripts

Allows JavaScript execution (required for editor functionality)

allow-same-origin

Allows access to localStorage and cookies within iframe context

allow-forms

Enables form submission for file uploads and settings

allow-popups

Allows opening new windows for help docs and external links

allow-downloads

Permits file downloads for notebooks and data exports

WebSocket Support

Real-time features like terminal sessions and language server protocol require WebSocket connections. The Go proxy provides full bidirectional WebSocket proxying, using the same session cookie for authentication.

WebSocket Proxy Flow

WebSocket Detection & Auth

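// WebSocket requests are detected by the Connection and Upgrade headers.
// Connection can list several tokens (e.g. "keep-alive, Upgrade"), so match
// the token rather than comparing the whole header value.
func isWebSocketRequest(r *http.Request) bool {
  return strings.Contains(strings.ToLower(r.Header.Get("Connection")), "upgrade") &&
         strings.EqualFold(r.Header.Get("Upgrade"), "websocket")
}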

// In the main proxy handler — session cookie provides the auth token.
if isWebSocketRequest(r) {
  // Session already validated; accessToken retrieved from proxy_sessions.
  wsURL := strings.Replace(targetURL, "https://", "wss://", 1) + remainingPath
  handleWebSocketProxy(w, r, wsURL, accessToken)
} else {
  handleHTTPProxy(w, r, targetURL, accessToken, remainingPath)
}

Deployment

The Go proxy can be deployed in several ways. All deployment options require a PostgreSQL database for session storage.

Docker Container

Build a Docker image and deploy to any container platform (ECS, Kubernetes, Cloud Run).

FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY . .
RUN go build -o proxy .

FROM alpine:latest
COPY --from=builder /app/proxy /proxy

# Required
# FRONTEND_URL: e.g. https://firefly-analytics.com
ENV FRONTEND_URL=""
# ALLOWED_APEX_DOMAIN: e.g. aws.databricksapps.com
ENV ALLOWED_APEX_DOMAIN=""
# DATABASE_URL: PostgreSQL connection string
ENV DATABASE_URL=""

# Optional
# DEV_MODE: set to "true" for http://localhost testing only
ENV DEV_MODE="false"
ENV PORT="8090"

EXPOSE 8090
CMD ["/proxy"]

Serverless Function

Deploy as AWS Lambda or Google Cloud Functions for auto-scaling. Note: WebSocket support requires a long-lived connection — ensure your serverless platform supports it (e.g., API Gateway WebSocket APIs).

VM or Bare Metal

Run directly on VMs for maximum performance and control. Recommended for high-concurrency WebSocket workloads.

Configuration Reference

Variable | Description | Required
FRONTEND_URL | Origin of the Next.js app (e.g. https://firefly-analytics.com). Used for JWT iss/aud validation and strict CORS origin check. | Yes
ALLOWED_APEX_DOMAIN | Databricks apps apex domain (e.g. aws.databricksapps.com). App URLs from the DB are validated against this to prevent SSRF. | Yes
DATABASE_URL | PostgreSQL connection string for the proxy_sessions table. | Yes
DEV_MODE | Set to "true" to use path-scoped cookies without Secure flag. For http://localhost development only. Never enable in production. | No
PORT | Server port (default: 8090) | No
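
The ALLOWED_APEX_DOMAIN check on app URLs can be illustrated with a short sketch. The function name appURLAllowed is hypothetical. Requiring https, rejecting embedded credentials, and accepting the apex itself are assumptions about what the validation covers beyond the domain match described above.

```go
package main

import (
	"net/url"
	"strings"
)

// appURLAllowed reports whether rawURL points at the apex domain or one of
// its subdomains. (Hypothetical helper name.)
func appURLAllowed(rawURL, apexDomain string) bool {
	u, err := url.Parse(rawURL)
	// Assumption: only https URLs without userinfo are acceptable.
	if err != nil || u.Scheme != "https" || u.User != nil {
		return false
	}
	host := strings.ToLower(u.Hostname())
	apex := strings.ToLower(strings.TrimPrefix(apexDomain, "."))
	if apex == "" {
		return false
	}
	// Match on a label boundary so "evilaws.databricksapps.com" does not
	// pass for the apex "aws.databricksapps.com".
	return host == apex || strings.HasSuffix(host, "."+apex)
}
```

Matching on the dot-prefixed suffix rather than a plain substring is what stops hosts such as aws.databricksapps.com.evil.com from passing.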