OpenTelemetry: A Revolution in System Monitoring

What is OpenTelemetry?

OpenTelemetry (OTEL for short) is an observability framework for creating and managing telemetry data: traces, metrics, and logs. OpenTelemetry is vendor-agnostic, which means it isn’t tied to a specific platform; it can send telemetry data to any observability backend that supports it, for instance Azure Monitor (Application Insights), AWS, Grafana, Dynatrace, and New Relic.

What telemetry can we get?

Traces

Traces allow us to “trace” the path of a request (or, more generally, any input) once it enters our application, whether that application is a single monolith or a complicated mesh of services. OTEL makes it easy to follow how the request is processed and to detect any problems along the way.

Logs

You can send the logs your applications generate to a backend through the OTEL framework and have them automatically tied to the corresponding traces, so you can see what’s going on and find any issues with ease.

Metrics

Metrics are measurements taken at runtime, such as request counts or response times, that capture information about availability and performance.

How can we get it?

To obtain telemetry, we need to instrument our systems. Instrumenting a codebase means adding code to generate and submit telemetry, like marking the beginning and end of an operation, emitting metrics when receiving a new request, etc.

This can be done manually, by using the OTEL SDK for your language and adding a few lines of code, or by using instrumentation libraries: special libraries that inject instrumentation into commonly used libraries and frameworks. In Python, for instance, there are instrumentation libraries for requests, Django, Flask, MySQL, and more (we’ll see this in the example).

We also need to set up exporters to send the telemetry to the backend. The OpenTelemetry SDK comes with some generic ones, but backends may ship their own exporters as separate plugin packages (you can also write your own plugin packages!).

There are three types of exporters, one for each signal: trace (span) exporters, metric exporters, and log exporters. Most exporters need some configuration, such as the backend’s endpoint, which can usually be provided through environment variables.

The example app

Let’s see how to use OpenTelemetry in a Python project with a simple Flask example.

I created three simple Flask apps: server, api1, and api2.

We need to use a Flask version lower than 3.0, as Flask 3 was not yet supported by OTEL at the time of writing, and due to an issue with the library we also need to pin Werkzeug to a version lower than 3.0.0.
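If you want a script to fail fast when the wrong versions are installed, a minimal standard-library sketch could look like this. The `check_pins` and `major` helpers are hypothetical, and the version parsing is deliberately simplistic (the `packaging` library handles the general case).

```python
from importlib.metadata import PackageNotFoundError, version


def major(version_string):
    """Leading number of a version string, e.g. '2.3.3' -> 2 (deliberately simplistic)."""
    return int(version_string.split(".")[0])


def check_pins(limits=None):
    """Hypothetical helper: list packages that are missing or not below the given major version."""
    limits = limits if limits is not None else {"flask": 3, "werkzeug": 3}
    problems = []
    for package, upper_major in limits.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            problems.append(f"{package} is not installed")
            continue
        if major(installed) >= upper_major:
            problems.append(f"{package} {installed} must be below {upper_major}.0")
    return problems


for problem in check_pins():
    print(problem)
```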

pip3 install 'werkzeug<3.0.0' 'flask<3'

server.py

import requests
from time import sleep

from flask import Flask

app = Flask(__name__)

@app.route("/callserviceok")
def callOK():
    # let's waste some time
    sleep(2)
    # then call our first service
    response = requests.get("http://localhost:8081/get-data")
    # imagine we are processing something here
    sleep(1)
    response = requests.get("http://localhost:8082/get-data")
    return str(response.content)

@app.route("/callserviceerr")
def callErr():
    # let's waste some time
    sleep(2)
    # imagine we are processing something here
    sleep(1)
    # this endpoint always fails in api2
    response = requests.get("http://localhost:8082/err")
    return str(response.content)

api1.py

from time import sleep

from flask import Flask
from opentelemetry.trace import SpanKind, get_tracer

tracer = get_tracer(__name__)
app = Flask(__name__)

@app.route("/get-data")
def get_data():
    sleep(3)
    data = get_data_from_db()
    return data

def get_data_from_db():
    # a manual CLIENT span around the (simulated) database call
    with tracer.start_as_current_span("my_database", kind=SpanKind.CLIENT):
        sleep(2)
        return "my data"

api2.py

from flask import Flask

app = Flask(__name__)

@app.route("/get-data")
def get_data():
    return "my data"

@app.route("/err")
def err():
    # simulate a failure so it shows up in the trace
    raise Exception("An error occurred")

Picking a backend

As mentioned before, you can use OTEL to send telemetry to any backend that supports it.

In general, instrumenting the project is the laborious part; switching to or adding a backend is just a matter of installing some packages and setting the right environment variables.

There are several free observability backends we can use to collect and visualize telemetry, such as Jaeger and Zipkin. In this example, I’ll be using a Zipkin instance running in Docker.

OpenTelemetry Setup

First, we need to install the required packages. The OpenTelemetry Python implementation provides command-line tools to bootstrap the instrumentation work (check its documentation for the minimum supported Python version, as recent releases no longer support Python 3.6).

pip install opentelemetry-distro

The opentelemetry-bootstrap command will detect installed packages in your local environment and list or install the required instrumentation libraries (you might still need to manually install some extra instrumentation library packages).

opentelemetry-bootstrap -a install

We can also use the command to get a requirements.txt-ready list of packages to install instead:

opentelemetry-bootstrap -a requirements
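To see which instrumentation libraries are already installed in the current environment, a quick standard-library sketch can filter the installed distributions by name. It assumes instrumentation packages follow the `opentelemetry-instrumentation-*` naming convention; it does not replicate the bootstrap command’s own detection logic.

```python
import re
from importlib.metadata import distributions

PREFIX = "opentelemetry-instrumentation-"


def normalize(name):
    """Normalize a distribution name as PEP 503 does (lowercase, runs of -_. become -)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def instrumentation_packages(names):
    """Keep only names that look like OpenTelemetry instrumentation libraries."""
    return sorted({normalize(n) for n in names if n and normalize(n).startswith(PREFIX)})


installed = [dist.metadata["Name"] for dist in distributions()]
for name in instrumentation_packages(installed):
    print(name)
```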

You will also need to install the exporter package for your backend of choice; in this case, Zipkin:

pip install opentelemetry-exporter-zipkin

Instrumenting our applications

For this example, we’ll use auto-instrumentation. This approach might be enough if your project is based on a popular framework like Django or Flask and makes use of common databases and Python libraries, as the telemetry you need is likely already provided by available instrumentation libraries.

The opentelemetry-instrument wrapper command injects all the instrumentation into the target Python application; we can configure it using environment variables or command-line arguments.

Setup environment variables

To keep this short, the only setting I’ll configure through an environment variable is the Zipkin endpoint; the other parameters will be passed as command-line arguments.

export OTEL_EXPORTER_ZIPKIN_ENDPOINT=http://localhost:9411/api/v2/spans

Running the apps with instrumentation

Let’s run the three apps at the same time, each in a different console:

opentelemetry-instrument --traces_exporter zipkin --metrics_exporter none --service_name server flask --app server.py run -p 8080

opentelemetry-instrument --traces_exporter zipkin --metrics_exporter none --service_name service-api-1 flask --app api1.py run -p 8081

opentelemetry-instrument --traces_exporter zipkin --metrics_exporter none --service_name service-api-2 flask --app api2.py run -p 8082

These commands run each app with auto-instrumentation, enable the Zipkin trace exporter, and disable the metrics exporter (we won’t be looking at metrics here). Each application gets its own service name, which is how we’ll identify it in Zipkin.
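If juggling three consoles gets tedious, a small Python launcher can start all three processes with the same arguments. This is a hypothetical convenience script, not part of the original setup; it assumes the commands above work from the current directory and that `OTEL_EXPORTER_ZIPKIN_ENDPOINT` is already exported, since child processes inherit the environment.

```python
import subprocess

# (app file, service name, port) for each of the three apps
SERVICES = [
    ("server.py", "server", 8080),
    ("api1.py", "service-api-1", 8081),
    ("api2.py", "service-api-2", 8082),
]


def build_command(app_file, service_name, port):
    """Same arguments as the opentelemetry-instrument commands above."""
    return [
        "opentelemetry-instrument",
        "--traces_exporter", "zipkin",
        "--metrics_exporter", "none",
        "--service_name", service_name,
        "flask", "--app", app_file, "run", "-p", str(port),
    ]


def run_all():
    """Start every app as a child process (inheriting this environment) and wait for them."""
    processes = [subprocess.Popen(build_command(*service)) for service in SERVICES]
    try:
        for process in processes:
            process.wait()
    except KeyboardInterrupt:
        for process in processes:
            process.terminate()
```

Calling `run_all()` from a script starts the three apps and waits for them; pressing Ctrl+C stops them.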

Let’s see what it can do

Now, let’s call the server application’s /callserviceok endpoint from the browser. All our traces will show up in Zipkin.

Here is what we see if we click “show” on the main trace:

Zipkin is pretty basic, but the same information is available in most other tracing backends. Here we can see the trace’s timeline: what was called, when, and where, the response status, and so on, plus a lot of collected data such as my browser’s user agent.

The endpoint calls the two API services: service-api-1, which simulates some work and a call to a database server, and service-api-2, which returns immediately.
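Besides the UI, Zipkin exposes an HTTP API, so you can also check which services have reported spans. A minimal standard-library sketch, assuming Zipkin is listening on its default port 9411; `reported_services` is a hypothetical helper, and the `opener` parameter exists only so it can be exercised without a running Zipkin.

```python
import json
from urllib.request import urlopen


def reported_services(base_url="http://localhost:9411", opener=urlopen):
    """Hypothetical helper: service names Zipkin has received spans for (GET /api/v2/services)."""
    with opener(f"{base_url}/api/v2/services") as response:
        return sorted(json.loads(response.read().decode("utf-8")))
```

With the three apps running and one request made, `print(reported_services())` should list server, service-api-1, and service-api-2.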

Let’s simulate an error that occurred in one of the APIs by calling the /callserviceerr endpoint instead.

It will throw an exception in api2, and here we can see how easy the failed request is to spot in the request list.

Here are the details for the request:

As you can see, the error information is collected automatically, and we can clearly see which of the API requests failed (api2).

Zipkin also shows a dependency map generated from the trace data.
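Conceptually, a dependency map comes from parent/child relationships between spans that belong to different services. The hypothetical `dependency_links` function below sketches that idea over span dictionaries shaped like Zipkin’s v2 JSON (`id`, `parentId`, `localEndpoint.serviceName`). Zipkin’s own dependency linker is more involved (it also considers span kinds and remote endpoints), so this is an illustration, not its algorithm.

```python
def dependency_links(spans):
    """Hypothetical helper: count caller -> callee service pairs in Zipkin v2-style span dicts."""
    service_of = {
        span["id"]: span.get("localEndpoint", {}).get("serviceName") for span in spans
    }
    links = {}
    for span in spans:
        caller = service_of.get(span.get("parentId"))  # None for root spans
        callee = service_of[span["id"]]
        # Only parent/child pairs that cross a service boundary become map edges.
        if caller and callee and caller != callee:
            links[(caller, callee)] = links.get((caller, callee), 0) + 1
    return links
```

For a /callserviceok trace, the server’s two outgoing client spans are the parents of the API services’ server spans, which yields one edge from server to each API.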

More complete monitoring platforms like Azure Application Insights, Dynatrace, or Grafana also collect logs, giving a clearer picture of what’s going on.

OpenTelemetry is a really powerful tool, and it’s implemented in many programming languages. As this example shows, OpenTelemetry lets Zipkin correlate the information from different services into one flow, and this works out of the box most of the time. If we also had a complex JavaScript application in the frontend, we could instrument it too with a few steps, and then inspect the whole application’s behavior, from the frontend to the backend, at a glance.
