Observability is for troubleshooting

28 Feb 2019 · Two minute read · on Gianluca's blog

Monitoring notifies you when something does not work. You get an alert, a slap in the face based on the priority of the issue. Observability is about troubleshooting, debugging, “looking around.” You don’t use observability techniques only when something doesn’t work.

Mainly because you don’t know where it happens, it can be anytime. You observe during development, locally or in production, anytime.

The ability to use the same observability tools and techniques such as tracing, log analysis and metrics is a tremendous value. You get used to them day by day and not only under pressure, during an outage. I practical trip that I can give you when you are instrumenting an application is about interconnection. You need a way to connect logs, with traces and with metrics.

There is nothing too complicated to understand. Every HTTP request has its own generated ID.

This ID will become the trace ID, and it will be attached to all the logs generated by that request. One of my application I instrumented uses opentracing/opentracing-to and uber-go/zap as the logger. I use a middleware similar to the one provided by the opentracing-contrib/go-stdlib.

Inside an HTTP handler, I configure the logger to add the trace_id for every log:

logger := GetLogger().With(zap.String("api.handler", "ping"))
if intTraceId := req.Context().Value("internal_trace_id"); intTraceId != nil {
    logger = logger.With(zap.String("trace_id", intTraceId.(string)))
}

In this way from this point in time the logger will add the trace_id to every line of log.

With this code req.Context().Value("internal_trace_id") I am retrieving the “trace_id” from the context. In Go every HTTP request has a context attached and this work because inside the middleware I set the trace_id in the context of the request and also as HTTP header:

// This is a temporary fix until this issue will be addressed
// https://github.com/opentracing/opentracing-go/issues/188
// This works only with Zipkin.
zipkinSpan, ok := sp.Context().(zipkin.SpanContext)
if ok == true && zipkinSpan.TraceID.Empty() == false {
  w.Header().Add("X-Trace-ID", zipkinSpan.TraceID.ToHex())
  r = r.WithContext(context.WithValue(r.Context(), "internal_trace_id", zipkinSpan.TraceID.ToHex()))
}

Having the trace_id exposed as header is nice because I can ask and traing everyone to just grab that parameter when they have issues. Or we can code the API consumer in a way that takes care about this value when something doesn’t go as expected.

All these connections are useful to build a context from different sources. This is the secret for happiness and Welcome to my Wonderland!

Something weird with this website? Let me know.