Reference-Implementation-T3-Chat-Scale-And-Workflows
Sources: 1 • Confidence: Medium • Updated: 2026-04-12 10:36
Key takeaways
- The speaker attributes the main T3 Chat logging/OTEL implementation work to teammates Julius and Mark (mostly Julius).
- Traditional line-oriented logs do not reliably support reconstructing what happened during an incident in modern distributed systems, where a single request traverses many services and components.
- A proposed logging pattern is to accumulate context throughout a request lifecycle and emit a single wide, canonical event per request per service hop at the end of the request.
- Adopting OpenTelemetry standardizes collection and export, but it does not decide what to instrument or which business context to attach; observability is only as useful as those engineering instrumentation decisions.
- High-cardinality logging data is described as expensive and slow primarily on legacy logging systems optimized for searching low-cardinality strings, not as inherently expensive and slow.
Sections
Reference-Implementation-T3-Chat-Scale-And-Workflows
- The speaker attributes the main T3 Chat logging/OTEL implementation work to teammates Julius and Mark (mostly Julius).
- A codebase referenced as T3 Chat required cleanup of over a thousand debug log statements that had accumulated from habitual excessive logging.
- In T3 Chat, spans are annotated with contextual attributes across the request flow (e.g., validation errors, thread IDs, message IDs, attachment metadata) rather than passing a large context object through function calls.
- The T3 Chat production logging volume is reported as about 5.8 billion records and 7.3 TB of raw text logs, excluding user messages and model responses.
- T3 Chat correlates client and server activity by passing span identifiers from the frontend to the backend, and client errors can surface a span ID to support direct trace retrieval for a specific request.
- The speakers report using ClickHouse for log analytics workloads.
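
The span-annotation approach described above can be sketched as follows. This is a minimal stand-in for a tracing span, not the OpenTelemetry API or T3 Chat's actual code; the attribute names (`thread.id`, `message.id`, `attachment.count`) are illustrative assumptions. The point is that handlers annotate the active span as they learn things, instead of threading a growing context object through every function signature.

```typescript
// Minimal stand-in for a tracing span (the real code would use the
// OpenTelemetry API's active span). Attribute names are illustrative.
type AttrValue = string | number | boolean;

class Span {
  readonly attributes: Record<string, AttrValue> = {};
  setAttribute(key: string, value: AttrValue): this {
    this.attributes[key] = value;
    return this;
  }
}

// Handlers annotate the active span rather than passing a context object.
// For client/server correlation, the frontend would send its span ID in a
// request header so the backend can attach its work to the same trace.
function handleMessage(span: Span, threadId: string, messageId: string): void {
  span.setAttribute("thread.id", threadId);
  span.setAttribute("message.id", messageId);
  // ...deeper code adds validation errors, attachment metadata, etc.
  span.setAttribute("attachment.count", 2);
}

const span = new Span();
handleMessage(span, "thr_123", "msg_456");
console.log(span.attributes);
```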
Failure-Modes-Of-Traditional-Logging-In-Distributed-Systems
- Traditional line-oriented logs do not reliably support reconstructing what happened during an incident in modern distributed systems, where a single request traverses many services and components.
- Concurrent request handling causes log-line interleaving that makes it difficult to reconstruct the sequence of events for a single user request without strong correlation context.
- Grepping/string-searching logs is unreliable during investigations because identifiers and required context fields are inconsistently emitted across events and across services.
- Common logging practices tend to optimize for easy log emission by developers rather than for efficient querying and investigation during incidents.
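
The grep-unreliability point above can be made concrete with a toy example (log lines and field names are invented for illustration): if one service emits the correlating identifier and another does not, a string search for that identifier silently misses the line that matters.

```typescript
// Illustrative only: two services logging the same user request.
// Service A includes the request ID; service B omits it.
const serviceALog = `{"level":"info","request_id":"req_42","msg":"checkout started"}`;
const serviceBLog = `{"level":"error","msg":"payment declined"}`; // no request_id

// Grepping for the request ID finds service A's line but not the failure
// that actually matters, because the correlating field was never emitted.
const matches = [serviceALog, serviceBLog].filter((line) => line.includes("req_42"));
console.log(matches); // only service A's line matches
```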
Wide-Canonical-Events-And-Query-Based-Debugging
- A proposed logging pattern is to accumulate context throughout a request lifecycle and emit a single wide, canonical event per request per service hop at the end of the request.
- Wide events shift debugging from text search toward structured querying and aggregation (including SQL-style workflows) over production traffic.
- With wide events, the described target workflow is a single query over production traffic: filter checkout failures by user segment and feature flag, group by error code, and get sub-second results that identify the root cause.
- A wide canonical event should include request, user, business, infrastructure, error, and performance context fields to support debugging and product questions without multiple log searches.
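
The accumulate-then-emit-once pattern above can be sketched as a small sketch in TypeScript. Everything here is an illustrative assumption, not T3 Chat's schema or implementation: field names, the `CanonicalEvent` class, and the `console.log` destination stand in for whatever the real pipeline ships to its backend.

```typescript
// Sketch of the wide-canonical-event pattern: accumulate context across
// the request lifecycle, emit one wide event per request per service hop.
type WideEvent = Record<string, string | number | boolean | null>;

class CanonicalEvent {
  private event: WideEvent;
  private readonly start = Date.now();

  constructor(requestId: string, route: string) {
    this.event = { "request.id": requestId, "request.route": route };
  }

  // Any code on the request path can add context as it learns it.
  set(key: string, value: string | number | boolean | null): void {
    this.event[key] = value;
  }

  // Called exactly once, in a finally block, so failures still emit.
  emit(): WideEvent {
    this.event["duration_ms"] = Date.now() - this.start;
    console.log(JSON.stringify(this.event)); // one wide line per request
    return this.event;
  }
}

// Usage: request, user, business, flag, and error context on one event.
const ev = new CanonicalEvent("req_42", "/api/checkout");
let emitted: WideEvent = {};
try {
  ev.set("user.id", "u_9");
  ev.set("user.segment", "pro");
  ev.set("flag.new_checkout", true);
  ev.set("error.code", "card_declined"); // set only on failure paths
} finally {
  emitted = ev.emit();
}
```

With events shaped like this, the checkout investigation in the bullet above becomes a single aggregation: filter on `user.segment` and `flag.new_checkout`, group by `error.code`, instead of several rounds of log searches.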
Instrumentation-And-Context-Design-Over-Plumbing
- Adopting OpenTelemetry standardizes collection and export, but it does not decide what to instrument or which business context to attach; observability is only as useful as those engineering instrumentation decisions.
- Structured logging (e.g., key-value or JSON) is necessary but does not, by itself, produce useful debugging data without deliberate inclusion of high-value context fields.
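
The "structured but not useful" distinction above can be illustrated with two invented log payloads. Both are valid JSON, but only the second carries the high-value context (error code, user, business identifiers) that auto-instrumentation cannot know to add; the field names are assumptions for illustration.

```typescript
// Both payloads are "structured"; only one answers debugging questions.
const lowValue = { level: "error", msg: "request failed" };

const highValue = {
  level: "error",
  msg: "request failed",
  "error.code": "upstream_timeout",
  "user.id": "u_9",
  "model.provider": "openai", // business context an auto-instrumenter
  "thread.id": "thr_123",     // cannot know to add
};

// The first can only be counted; the second can be segmented and grouped.
console.log(Object.keys(lowValue).length, Object.keys(highValue).length); // prints: 2 6
```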
Economics-And-Platform-Constraints-Cardinality-And-Volume
- High-cardinality logging data is described as expensive and slow primarily on legacy logging systems optimized for searching low-cardinality strings, not as inherently expensive and slow.
- Some logging systems are economically and technically misaligned with debugging because they charge by volume and struggle with high-cardinality fields that are important for debugging (e.g., user/session/trace IDs).
Unknowns
- What measured changes (if any) occurred in MTTR, incident frequency, or on-call time after adopting span annotation, wide-event thinking, and query-based debugging?
- What are the actual observability costs (ingest, storage, query) for the reported T3 Chat logging scale, and how do those costs change under tail sampling?
- How are wide canonical events implemented in practice across services (e.g., how context is accumulated, schema governance, and how the final 'emit once at end' interacts with early failures and partial execution paths)?
- What query latency and completeness is achieved under real incident loads for the described segmentation/group-by workflows (e.g., checkout failures by feature flag and error code)?
- What are the constraints and trade-offs of high field-count schemas (e.g., field explosion, naming consistency, PII handling, schema evolution, and downstream compatibility)?