What is OpenTelemetry?
OpenTelemetry (OTEL for short) is an observability framework designed to create and manage telemetry data (traces, metrics, and logs). OpenTelemetry is vendor agnostic, which means it’s not tied to a specific platform and can be used to send telemetry data to any observability backend that supports it, for instance, Azure Monitor (AppInsights), AWS, Grafana, Dynatrace, and New Relic.
What telemetry can we get?
Traces
Traces let us follow the path of a request (or, more generally, any input) once it enters our application. It doesn’t matter whether we are talking about a single monolith or a complicated mesh of services: OTEL makes it easy to track how the request is processed and to detect any problems along the way.
Logs
You can send the logs generated by your applications to a backend using the OTEL framework and have them automatically tied to the traces, so you can see what’s going on and find any issues with ease.
Metrics
Metrics are measurements taken at runtime that capture information about availability and performance, such as request counts or response times.
How can we get it?
To obtain telemetry, we need to instrument our systems. Instrumenting a codebase means adding code to generate and submit telemetry, like marking the beginning and end of an operation, emitting metrics when receiving a new request, etc.
This can be done manually, by using the OTEL SDK for the language in question and adding a few lines of code, or by using instrumentation libraries. These are special libraries that inject instrumentation into commonly used libraries and frameworks. In Python, for instance, we have instrumentation libraries for requests, Django, Flask, MySQL, etc. (we’ll see that in the example).
We also need to set up telemetry exporters to send the telemetry to the backend. The OpenTelemetry SDK comes with some generic ones, but backends might ship their own exporters as separate plugin packages (you can also write your own!).
There are three types of exporters, one per signal: trace (span), metric, and log exporters. Most exporters are configured through arguments that can also be supplied as environment variables.
The example app
Let’s see how to use OpenTelemetry on a Python project with a simple Flask example.
I created three simple Flask apps: server, api1, and api2.
At the time of writing, we need to use a Flask version lower than 3.0, as Flask 3 is not supported yet by OTEL. Due to an issue with the library, we also need to pin werkzeug to a version lower than 3.0.0.
pip3 install 'werkzeug<3.0.0' 'flask<3'
server.py
import requests
from flask import Flask
from time import sleep

app = Flask(__name__)


@app.route("/callserviceok")
def callOK():
    # let's waste some time
    sleep(2)
    # then call our first service
    response = requests.get("http://localhost:8081/get-data")
    # imagine we are processing something here
    sleep(1)
    # then call the second service and return its response
    response = requests.get("http://localhost:8082/get-data")
    return str(response.content)


@app.route("/callserviceerr")
def callErr():
    # let's waste some time
    sleep(2)
    # imagine we are processing something here
    sleep(1)
    # this endpoint raises an exception in api2
    response = requests.get("http://localhost:8082/err")
    return str(response.content)
api1.py
from flask import Flask
from opentelemetry.trace import get_tracer, SpanKind
from time import sleep

tracer = get_tracer(__name__)
app = Flask(__name__)


@app.route("/get-data")
def get_data():
    sleep(3)
    data = get_data_from_db()
    return data


def get_data_from_db():
    with tracer.start_as_current_span("my_database", kind=SpanKind.CLIENT):
        sleep(2)
        return "my data"
api2.py
from flask import Flask
from opentelemetry.trace import get_tracer, SpanKind

tracer = get_tracer(__name__)
app = Flask(__name__)


@app.route("/get-data")
def get_data():
    return "my data"


@app.route("/err")
def err():
    raise Exception("An error occurred")
Picking a backend
As mentioned before, you can use OTEL to send telemetry to any backend that supports it.
In general, instrumenting the project is the laborious part, and switching to or adding a backend is just a matter of installing some packages and using the right environment variables.
There are several free observability backend projects that we can use to collect and visualize telemetry like Jaeger and Zipkin. In this example, I’ll be using a Zipkin instance on Docker.
OpenTelemetry Setup
First, we need to install the required packages. The OpenTelemetry Python implementation (requires Python 3.6 or higher) provides command-line tools to bootstrap the instrumentation work.
pip install opentelemetry-distro
The opentelemetry-bootstrap command will detect installed packages in your local environment and list or install the required instrumentation libraries (you might still need to manually install some extra instrumentation library packages).
opentelemetry-bootstrap -a install
We can also use the command to get a requirements.txt-ready list of packages to install instead.
opentelemetry-bootstrap -a requirements
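Conceptually, the bootstrap step boils down to looking at what is installed and matching it against a list of known instrumentation packages. The sketch below illustrates that idea with the standard library only; it is not how the real tool is implemented, and the INSTRUMENTATION_FOR mapping and both function names are hypothetical (the real tool ships its own, much longer list).

```python
from importlib import metadata

# Hypothetical, hand-written mapping used only for this illustration.
INSTRUMENTATION_FOR = {
    "flask": "opentelemetry-instrumentation-flask",
    "requests": "opentelemetry-instrumentation-requests",
    "django": "opentelemetry-instrumentation-django",
}


def installed_packages():
    """Return the lowercase names of the distributions in this environment."""
    names = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            names.add(name.lower())
    return names


def suggest_instrumentation(installed):
    """List the instrumentation packages matching the installed libraries."""
    return sorted(
        package
        for library, package in INSTRUMENTATION_FOR.items()
        if library in installed
    )


print(suggest_instrumentation(installed_packages()))
```

For example, an environment containing only flask and requests would yield the two matching instrumentation packages and nothing for Django.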
You will also need to install additional packages (the exporters) depending on your backend choice. In this case, we need the one for Zipkin.
pip install opentelemetry-exporter-zipkin
Instrumenting our applications
For this example, we’ll use auto-instrumentation. This approach might be enough if your project is based on a popular framework like Django or Flask and makes use of common databases and Python libraries, as the telemetry you need is likely already provided by available instrumentation libraries.
The opentelemetry-instrument wrapper command injects all the instrumentation into the target Python application. We can configure it using environment variables or command-line arguments.
Set up environment variables
To keep this short, the only setting I will configure through an environment variable is the Zipkin endpoint; the other parameters will be passed as command-line arguments.
export OTEL_EXPORTER_ZIPKIN_ENDPOINT=http://localhost:9411/api/v2/spans
Running the apps with instrumentation
Let’s run the three apps at the same time, each in its own console.
opentelemetry-instrument --traces_exporter zipkin --metrics_exporter none --service_name server flask --app server.py run -p 8080
opentelemetry-instrument --traces_exporter zipkin --metrics_exporter none --service_name service-api-1 flask --app api1.py run -p 8081
opentelemetry-instrument --traces_exporter zipkin --metrics_exporter none --service_name service-api-2 flask --app api2.py run -p 8082
This will run the instrumentation and enable the Zipkin traces exporter (and disable the metrics one, as we won’t be showing metrics here). For each application, we define a service name, which is how we’ll identify each service in the backend.
Let’s see what it can do
Now, let’s call the /callserviceok endpoint of the server application from the browser. All our traces will show up in Zipkin.
Here is what we see if we click “show” on the main trace:
Zipkin is pretty basic, but what we see here will be present in all other monitoring software. Here, we can see the trace’s timeline and what was called, when, to where, the response status, and so on. Plus, a bunch of collected data like my user agent.
The endpoint calls the two API services: service-api-1, which simulates some work and a call to a database server, and service-api-2, which returns immediately.
Let’s simulate an error that occurred in one of the APIs by calling the /callserviceerr endpoint instead.
It will raise an exception in api2, and here we see how easy it is to spot in the requests list.
Here are the details for the request:
As you can see, the error information is collected automatically, and we can clearly see which of the API requests failed (api2).
Zipkin also shows a dependency map generated from the trace data.
More complex monitoring software like Azure Application Insights, Dynatrace, or Grafana also allows for the collection of logs, which gives a clearer picture of what’s going on. OpenTelemetry is a really powerful tool, and it’s implemented in many programming languages. As you can see from this example, OpenTelemetry allows Zipkin to correlate the information from different services into one flow, and this works out of the box most of the time. If we had a complex JavaScript application in the frontend, we could instrument it too with a few steps and inspect, at a glance, the behavior of the whole application from the frontend to the backend.