Summary
Once you release an application into production, it can be difficult to understand all of the ways that it interacts with the systems it integrates with. The OpenTelemetry project and its accompanying ecosystem of technologies aim to make observability of your systems more accessible. In this episode Austin Parker and Alex Boten explain how the correlation of tracing and metrics collection improves visibility into how your software is behaving, how you can use the Python SDK to automatically instrument your applications, and their vision for the future of observability as the OpenTelemetry standard gains broader adoption.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Your host as usual is Tobias Macey and today I’m interviewing Austin Parker and Alex Boten about the OpenTelemetry project and its efforts to standardize the collection and analysis of observability data for your applications
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by describing what OpenTelemetry is and some of the story behind it?
- How do you define observability and in what ways is it separate from the "traditional" approach to monitoring?
- What are the goals of the OpenTelemetry project?
- For someone who wants to begin using OpenTelemetry clients in their Python application, what is the process of integrating it into their application?
- How does the definition and adoption of a cross-language standard for telemetry data benefit the broader software community?
- How do you avoid the trap of limiting the whole ecosystem to the lowest common denominator?
- What types of information are you focused on collecting and analyzing to gain insights into the behavior of applications and systems?
- What are some of the challenges that are commonly faced in interpreting the collected data?
- With so many implementations of the specification, how are you addressing issues of feature parity?
- For the Python SDK, how is it implemented?
- What are some of the initial designs or assumptions that have had to be revised or reconsidered as it gains adoption?
- What is your approach to integration with the broader ecosystem of tools and frameworks in the Python community?
- What are some of the interesting or unexpected challenges that you have faced or lessons that you have learned while working on instrumentation of Python projects?
- Once an application is instrumented, what are the options for delivering and storing the collected data?
- What are some of the most interesting, unexpected, or challenging lessons that you have learned while working on and with the OpenTelemetry ecosystem?
- What are some of the most interesting, innovative, or unexpected ways that you have seen components in the OpenTelemetry ecosystem used?
- When is OpenTelemetry the wrong choice?
- What is in store for the future of the OpenTelemetry project?
Keep In Touch
- Austin
- @austinlparker on Twitter
- austinlparker on GitHub
- Alex
- @codeboten on Twitter
- codeboten on GitHub
Picks
- Tobias
- Austin
- Alex
- Algorithms To Live By: The Computer Science of Human Decisions by Brian Christian and Tom Griffiths
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
Links
- OpenTelemetry
- Lightstep
- OpenTracing
- OpenCensus
- Distributed Tracing
- Jaeger
- Zipkin
- Observability
- Kubernetes
- Spring
- Flask
- gRPC
- Structlog
- Filebeat
- W3C Trace Context
- OpenTelemetry Python SDK
- OpenTelemetry Django
- OpenTelemetry Flask
- OpenTelemetry Collector
- OTLP == OpenTelemetry Protocol
The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling powered by the battle-tested Linode platform, including simple pricing, node balancers, 40-gigabit networking, dedicated CPU and GPU instances, S3-compatible object storage, and worldwide data centers.
Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today and get a $60 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. Your host as usual is Tobias Macey. And today, I'm interviewing Austin Parker and Alex Boten about the OpenTelemetry project and its efforts to standardize the collection and analysis of observability data for your applications.
[00:01:11] Unknown:
So, Austin, can you start by introducing yourself? Sure thing. Hi. I'm Austin Parker. I'm a principal developer advocate at Lightstep, and I'm a maintainer on the OpenTelemetry project.
[00:01:21] Unknown:
And, Alex, how about you? Yeah. Sure. Hi. I'm Alex Boten. I'm an open source software engineer at Lightstep, and I'm a contributor to the OpenTelemetry project, as well as one of the maintainers on the OpenTelemetry Python project.
[00:01:34] Unknown:
And so going back to you, Austin, do you remember how you first got introduced to Python?
[00:01:38] Unknown:
Oh, actually, I think the first time I ever used Python was way back in college. I feel like that was, I want to say, maybe my intro-to-programming type of thing. Before that, I'd done a lot with just general scripting, things like AppleScript on a Mac and other forms of bash scripting and shell scripting. But Python was kind of new to me, and I really enjoyed it. Actually, I still love to work in Python. And, Alex, how about you? Yeah. So I guess I remember the first time I heard about Python was people complaining from the Java world about the whitespace
[00:02:14] Unknown:
that Python required. But, you know, at the time, I hadn't really used it. I was really introduced to it about 6 years ago, when I joined a team that was building a platform as a service, and the API and the CLI were all written in Python on top of Docker. And so, basically, I jumped in and learned what I needed to keep the team moving forward, using Python 2.7 at the time, I think. And in terms of the OpenTelemetry
[00:02:39] Unknown:
project, can you each give a bit of background as to how you each got involved with it and maybe describe a bit about what the project is and some of the story behind it? Sure. I guess I'll start off. So I actually came into OpenTelemetry
[00:02:51] Unknown:
from one of its predecessors, OpenTracing. The short version of the story is that in, I wanna say, 2016 or 2017, OpenTracing came out as an open source project. The goal was really to provide a standard, unified API for distributed tracing across multiple different languages, so everything from Java to Python, C#, and so forth. The underlying idea here was that distributed tracing was very useful, but often people would apply it in a polyglot environment where you had some services running in Node and some in Python and some in Java. And for distributed tracing specifically to be useful, you really want it to be able to follow the entire request, nose to tail as it were. So you need some standard idioms, you know, think of them as adjectives and verbs. Right? You need these standard things between your different languages. So OpenTracing was designed to fill that hole in the ecosystem.
And it was an API, you know, that had to be reimplemented by different vendors. So you'd have a Jaeger implementation or a Zipkin implementation, but your actual tracing code was independent. So I could use the same tracing code with any number of vendors, and it was great. Now, in reality, maybe it didn't work out quite as well as we had hoped. 2017 or 2018 rolls around, I wanna say, and you saw OpenCensus arrive as a competitor to OpenTracing in a lot of ways. OpenCensus was a Google project, and Microsoft also joined in. It had similar concepts: hey, we want tracing, we want it to be polyglot, but we want to provide both the API and the SDK, and then let you have a pluggable exporter model. These two projects coexisted for a while, but it was causing confusion in the broader open source community. People weren't sure which to use. I had open source library authors come and say, well, I have customers or users that want to use tracing, but which one of these two things should I pick? And, ultimately, I think what happened is everyone looked at the situation and said, hey, let's join forces. Let's do a best-of-both-worlds approach. And out of that was born OpenTelemetry.
[00:05:31] Unknown:
And, Alex, how did you get involved in the OpenTelemetry project? Yeah. So I actually came at it from the other side. I was the user of both
[00:05:39] Unknown:
OpenCensus and OpenTracing. So I was right in the middle of all that confusion about, I don't know, maybe a year and a half or two years ago now. And I was introduced to the OpenTelemetry project much like everybody else in the user community, around the announcements that were made about a year and a few months ago now. And I really joined the project as a contributor when I joined Lightstep about 7 or 8 months ago now, and that was my first
[00:06:07] Unknown:
real introduction to the project. And so the tagline of OpenTelemetry is that it's trying to help make observability data easier to collect and access. And before we get too much further into the specifics of how it does that, I'm wondering if you can give a bit of a definition as to how you think about observability, and in what ways it's separate from the quote unquote traditional approach to monitoring where you're just collecting different metrics and shipping it off to some host for being able to aggregate them there? Yeah. That that's such a great question. I think if you kind of look around, there's a few different definitions of observability,
[00:06:43] Unknown:
but I like to talk about it and think about it in a pretty simple way, which is that observability is about understanding. It's about having the ability to understand your system and your system's dependencies. It's about being able to understand not only what is wrong with this one request, but also the aggregate behavior of the system in production, to be able to pull back to that 30,000-foot view and understand the entire thing, how all the pieces fit together. And having both that very coarse and fine visibility into what's going on means that you can do a lot of interesting things that you can't do with traditional monitoring.
There's the common unknown-unknowns idea. Right? When you're building a system, when you're running a system, it's not the things that you know about, and it's not the metrics that you're collecting and that you think you care about at the beginning, and it's not the things that you sort of discover along the way. The things that are gonna bite you are the things that you don't even know that you should know about. Right? So the whole principle of observability kind of gets down to this instrumentation level, which is where OpenTelemetry really plays: you need to have an SDK that can help you effortlessly collect a lot of different data points and has a pretty deep integration into your underlying frameworks and tools, and then ship that off somewhere to a system that is capable of asking these sorts of arbitrary questions.
[00:08:22] Unknown:
I think, Austin, you hit the nail on the head there. You know, observability is really about understanding. When I think of traditional monitoring, it's always been about putting graphs on a dashboard and watching for something to change on those graphs. And although those graphs and dashboards definitely play a role in observability, they tend to only give you part of that picture. And too often, they're kind of a result or an afterthought: oh, we had an outage and we don't have that metric on our dashboard, well, we'd better go ahead and add it right there. But when I think of observability, I think of being able to really dig to the bottom of the behavior of a system while the code is actually running that system, and being able to answer questions like: is the code doing what we expect it to? If a user is hitting an issue, would I be able to detect it? If an anomaly occurs, is it possible for me to dig into the cause of that anomaly right then and there, or do I just have to wait for that anomaly to occur again? And I think observability is changing how software is being built, by thinking about observing a system up front rather than treating it as an afterthought.
[00:09:25] Unknown:
And as you mentioned, the need to retroactively go in and add new metrics collection points to try and understand what happened in an anomalous situation, rather than being able to automatically capture the necessary context and information, is the real game changer there. And I'm wondering what you see as some of the biggest challenges in enabling people to capture that necessary information in a cross-cutting way, where it's beyond just the specifics of a single log line, or a particular counter that you're incrementing when a particular function happens, or certain timers that are being collected, and just some of the overall goals of the OpenTelemetry project in how it's going to help people achieve that sort of holy grail of observability.
[00:10:13] Unknown:
Sure. You know, I think you raise an interesting point there, and I would kind of turn it around: why do people not do this already? Right? There's a pretty pervasive view, I think, in the developer community that things like distributed tracing, things like observability, are only for big companies, your Googles and your Facebooks and your Microsofts, where you have just an eye-popping amount of independent services and different lines of business and, you know, unmanageable complexity. Right? And I think that's an opinion that is kind of brought forth by an unwillingness to grapple with the goals of our existing tools. Because far too often, no matter how many fancy dashboards you make, no matter how many cool metric data points you put together, I think Alex had the right of it: a lot of times it's just, we're making this dashboard to make this dashboard, because this dashboard is sort of proof of life. This dashboard is the thing we can point to when someone says, well, is it up? So I think one of the goals of OpenTelemetry is to really help change this narrative, and I think we do that by using the fact that OpenTelemetry is so widely supported.
It has extremely broad support from many, many organizations, people you've heard of like Microsoft and Google and Amazon, a huge variety of monitoring and observability tool vendors, and also open source maintainers, people that are creating things like Jaeger or Prometheus. Right? So because it has this broad support, that means that we can push the point of integration away from the individual dev and maybe go down a step or two. You know, I think it's a pretty common thought that when you're trying to monitor an abstraction, when you're trying to observe an abstraction, you really wanna go one step below it. So if I'm trying to understand the behavior of my application, a good way to monitor that is to go to the application container, right, to go to the runtime and look at certain metrics and measurements there. With OpenTelemetry, you can have a similar sort of behavior, where I would say a goal is to integrate it into things like Kubernetes, things like Spring on the Java side or Flask on the Python side. So you're actually getting a more holistic view of what's going on very, very easily, without having to spend a bunch of time writing manual instrumentation code. And in terms of the actual process of instrumentation,
[00:12:45] Unknown:
I know that one of the use cases for the SDKs is to automate some of that setup and be able to start collecting useful information out of the box without requiring a lot of development effort upfront, while still providing the option of adding additional collection points for different metrics and traces. And I'm wondering what the developer workflow looks like for actually getting set up with collecting those metrics and then being able to perform some useful analysis on them. Yeah. So I guess there are kind of two ways of thinking about instrumenting an application,
[00:13:22] Unknown:
at least on the Python side, and I know it's true in a lot of the other SIGs as well. There's what we call the auto-instrumentation portion of instrumenting, which, in the Python world, means that we're basically looking at what libraries are going to be utilized by a certain application. And so, if you go ahead and you install the auto-instrumentation package as well as some of the auto-instrumentation libraries that we already support, what ends up happening is you run a separate script, which is just called opentelemetry-instrument, to wrap your Python executable. It will basically go ahead and instrument all the libraries that you're using that have support for OpenTelemetry, generating spans and metrics for those libraries for you. So that's the auto-instrumentation piece. The manual instrumentation piece would still require application developers to go ahead and set up their SDKs and start spans or start collecting metrics in the different portions of their applications that they're interested in. But a big piece of what we're trying to do here is to make sure that the correlation between the manual and the auto-instrumentation all works, and all of the context flows through your application, and it also flows between the services along the wire.
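For reference, here is a minimal sketch of what the two workflows Alex describes look like in Python. The exact package names, module paths, and the opentelemetry-instrument wrapper have shifted between releases, so treat the specifics below as illustrative rather than authoritative.

```python
# Auto-instrumentation: install the packages and wrap your process with the CLI,
# roughly (package names vary by release):
#   pip install opentelemetry-distro opentelemetry-instrumentation-flask
#   opentelemetry-instrument python app.py

# Manual instrumentation: wire up the SDK yourself and create spans explicitly.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# The application owns the SDK setup; instrumented libraries only touch the API.
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    SimpleSpanProcessor(ConsoleSpanExporter())  # print spans to stdout for demo purposes
)

tracer = trace.get_tracer(__name__)

def handle_request(user_id: str) -> None:
    # start_as_current_span makes this the active span for everything called inside,
    # so spans from auto-instrumented libraries become its children.
    with tracer.start_as_current_span("handle-request") as span:
        span.set_attribute("app.user_id", user_id)
        ...  # application work, HTTP calls, database queries, and so on

if __name__ == "__main__":
    handle_request("42")
```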
[00:14:38] Unknown:
And I think that context aspect is the real differentiating factor between just raw metrics, where here is an integer that represents something but you need to have enough internal context or awareness of the code base to understand what it means, versus being able to have that context propagate along with the metric so that it's a little bit easier for somebody who doesn't have that context.
[00:15:03] Unknown:
Yeah. Absolutely. I think context is absolutely critical here. Otherwise, all of that data is really just data that's floating in the air. And stepping back up to the level of the broader OpenTelemetry
[00:15:17] Unknown:
project and its mission, because of the fact that it is focused on support across a number of different language communities and runtimes and the specifics of libraries and frameworks, I'm wondering how that benefits the overall broader software community, and just some of the challenges that you face in avoiding the trap of limiting the entire capability of the ecosystem to the lowest common denominator that exists across those different runtimes?
[00:15:44] Unknown:
I can speak to this a little bit. You know, I think one of the things that, as an OpenTracing maintainer, certainly, we heard from the community was, for lack of a better word, people were, let's say, displeased with the quote unquote Java-ness of the OpenTracing API, right? And from the jump with OpenTelemetry, there's been a pretty concentrated push to make the look and feel of OpenTelemetry feel very native in each language. So that really influenced everything from spec writing, where there's been a pretty lengthy project now, a kind of cross-language compatibility SIG, that is mostly looking at, you know, what are the words we're using in the specification? Are we making sure that we have all the words we need in the spec to allow for individual languages to implement the spec in a way that feels native for that language? There's also a bit of hewing to whatever the dominant patterns are. Right? So if you look outside of the real raw primitives, things like starting and ending a span, adding attributes or events to a span, a lot of the stuff around metrics, the up-down counters and things like that, outside of these very primitive sorts of actions, there's a lot more flexibility in how that gets integrated into the native workflow. A good example I would probably point to is what's going on in the C# SIG. And I know we've used that word a lot; for people that don't know, SIG is special interest group. That's basically the primary way we're organizing the project: these languages or topics have SIGs.
So in the C# SIG, Microsoft is very heavily involved in this project, and there's actually a push to integrate OpenTelemetry into the .NET runtime itself and bridge it with the existing diagnostic information that is already available in .NET. And so you're seeing kind of this two-forked approach where you can use existing primitives on the .NET side of the shop that you might already be using or already familiar with as a .NET developer, and then, with a simple bit of configuration change, those will hook into OpenTelemetry and start emitting OpenTelemetry spans and OpenTelemetry metrics that can then be forwarded to some other part of the OpenTelemetry ecosystem, like the Collector component, in order to be exported elsewhere. I think with Python you see something similar, and the same with Go.
A lot of it is making sure that each SIG has the ability to change things as they need to, along with a pretty strong top-level community that can make sure that, hey, we're all on the same page; no one's drifted too far from the spec or is reinventing the wheel over here in a way that they shouldn't.
[00:18:42] Unknown:
Yeah. And if I can just add a little bit to that. I think the specification is written in a way that's specific enough around the intent of a particular definition while leaving enough room for languages to interpret it in a way that makes sense for that language. And another thing we've seen that has been pretty successful is around what we call OTEPs, which are OpenTelemetry Enhancement Proposals. That's basically just a process that we go through before making changes to the spec, where you're able to propose a change that you want. And one thing that we've seen work well is to actually have different SIGs implement an OTEP as a prototype, just to get an idea of whether or not a particular concept works in each of the languages that we care about. And so that's implemented at the SIG level, by folks that are working in the language itself. Yeah. I think the OTEP process has been
[00:19:38] Unknown:
very helpful for the project. And just the requirement to show your work is something that maybe a lot of open source projects could look at. You know, I'm not going to say we came up with it ourselves. Isabelle Redelmeier was actually a very strong proponent of the OTEP process and helped codify it originally. But I know she talked to people that were very involved with creating the KEPs, right, the Kubernetes Enhancement Proposals, and that's where a lot of the inspiration was drawn from,
[00:20:14] Unknown:
I wanna say. Yeah. There are a number of different communities that have gone down that path: Python has its PEPs, I know Django has their own process for that, and there are a number of other open source communities that are following on that. So it's definitely a good way to bring everybody into the conversation and ensure that you have as diverse a set of inputs as possible, to make sure that you're not just getting tunnel vision on the way one thing should be implemented, and to bring in the voices that are necessary to make sure that it works for the broader community. Mhmm. And then as far as the actual data that you're collecting for being able to gain some visibility into the systems: we've talked about metrics, we've talked about spans, and I know that there's initial support for log collection and some of the ways that those should be formatted. So I'm curious if we can dig a bit more into the specifics of the information that's collected, and then maybe, in terms of the Python SDK, some of the available hooks into the runtime to be able to pull that information out and propagate it appropriately.
[00:21:17] Unknown:
So, basically, the types of information we're looking at collecting with the initial release of the project are really around the implementation and collection of distributed traces and metrics. That's kind of the initial goal. And, basically, metrics could be anything from your memory consumption, CPU, or request timing, or whatever it is that you care about in your application. And then on the distributed traces side, basically, what we're worried about there is collecting traces and spans that are distributed across your different services.
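As a rough illustration of the kinds of metrics Alex mentions, here is what counting requests and recording latency looks like against the Python metrics API. Note that the metrics API was still in flux at the time of this conversation, so the names below follow the shape it later stabilized on rather than what the beta shipped.

```python
from opentelemetry import metrics

# Without an SDK MeterProvider configured, these calls are no-ops by design.
meter = metrics.get_meter(__name__)

request_counter = meter.create_counter(
    "app.requests", unit="1", description="Number of requests handled"
)
request_duration = meter.create_histogram(
    "app.request.duration", unit="ms", description="Request duration"
)

def record_request(route: str, duration_ms: float) -> None:
    # Attributes let a back end slice the aggregate by route, status code, and so on.
    request_counter.add(1, {"http.route": route})
    request_duration.record(duration_ms, {"http.route": route})
```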
[00:21:55] Unknown:
One of the things that I think is really helpful when you think about distributed traces is that, in a lot of ways, they're really just semantic, structured logs, right? They're structured logs of context. And one of the things that we've done in OpenTelemetry is we've tried to extend the span model a little bit and add a lot more semantic meaning to different fields, which is not going to revolutionize your life tomorrow, but might in a couple of years. Here's a small example.
The status field on a span in OpenTelemetry supports the gRPC status codes. Right? So there are a lot of ways to actually classify the work that happened under a span in a semantically meaningful way: the difference between, for example, this failed due to a timeout versus this failed with a 400 or a 404 or a 405, right, but also things like, the context for this request was canceled. So as analysis systems catch up to OpenTelemetry and start to implement the ability to use this data to derive interesting statistical information about your system, I think you'll see a lot of interesting innovation happen in terms of building tools that can really understand what's going on. Because the sort of last-mile problem exists in OpenTelemetry as it does with most monitoring and observability things: you have all this data, great, but you still have to have someone that actually understands what the data represents in order to make heads or tails of it.
But by focusing a lot on having these semantically accurate spans, I think we'll be able to build better analysis tools in the future.
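As a small, concrete illustration of what Austin means by semantically meaningful spans, here is roughly how a span's attributes and status get set in Python. The status API has changed since this conversation was recorded: early versions mirrored the gRPC canonical codes he describes, while current releases use a simpler UNSET/OK/ERROR status plus semantic attributes such as http.status_code, which is what this sketch uses.

```python
import requests
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = trace.get_tracer(__name__)

def fetch(url: str) -> str:
    with tracer.start_as_current_span("fetch") as span:
        span.set_attribute("http.method", "GET")
        span.set_attribute("http.url", url)
        try:
            response = requests.get(url, timeout=5)
            # Record the semantic outcome, not just "it failed".
            span.set_attribute("http.status_code", response.status_code)
            if response.status_code >= 400:
                span.set_status(Status(StatusCode.ERROR, f"HTTP {response.status_code}"))
            return response.text
        except requests.Timeout as exc:
            # A timeout is a semantically different failure than a 404 or a 405.
            span.record_exception(exc)
            span.set_status(Status(StatusCode.ERROR, "timeout"))
            raise
```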
[00:23:48] Unknown:
To get back to the question specifically around Python: for the OpenTelemetry Python project, what we're really focusing on is wrapping up the work of implementing the spec around metrics. Today, the implementation for tracing is already in beta, and we're, I think, pretty close to being done. Yeah. So between the tracing
[00:24:09] Unknown:
implementation and the metrics implementation, those are kind of the interfaces that you'll wanna use from Python. And I know, too, that there is some initial work on being able to standardize the formatting of log data so that it can be correlated with the spans and metrics that are being collected. I'm wondering how you are approaching that in the Python ecosystem, whether you're just working on using the built-in logging capabilities or leaning on something like the structlog project to provide that structure out of the box and just define the specifics of it for the tracing capabilities, or what your thoughts are in that regard? Yeah. So as
[00:24:51] Unknown:
the logging capability is fairly new in the conversation, we haven't, as a project, started looking at how we're going to implement it in the OpenTelemetry Python project yet. But I would suspect that we would want to lean on as much of the standard library as we can, wherever we can. To add some more color on the logging question, I think
[00:25:17] Unknown:
you know, logging is incubating. It's still in a very fluid process right now, but I think it's accurate to say that there's really no appetite in the project to create a new logging API. Right? There is a plethora of battle-hardened, tested, and very good logging libraries out there, and I don't really think we're interested in competing with them. At a really high level, I think what you said earlier was maybe the best way to describe it: OpenTelemetry has the semantic concept of linking different forms of telemetry together.
So you could link a measurement from a metric to the span that occurred while that measurement was being collected, for example. And I feel like some sort of logging adapter that allows those logs to be context-sensitive would be one potential implementation here. I think another might be, you could even imagine something where, if you have logs that are already in a file format and they're getting correlation information through instrumentation, so that's adding in the correlation identifiers,
then a Filebeat-style processor could be scraping those, sending them off somewhere, and adding in the link through, say, the OpenTelemetry Collector. But this is where the thinking is more so than where the reality is. I don't know if the logging support is gonna be in beta by the time the rest of the project is in GA; I would maybe expect it to be more of an alpha.
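Since the speakers are describing possibilities here rather than shipped features, the following is purely a sketch of one way to get correlation identifiers into standard-library log records today: a hypothetical logging filter that copies the active span's trace and span IDs onto each record so that a downstream processor can link logs back to traces. Method names on the span API have shifted across releases, so adjust as needed.

```python
import logging
from opentelemetry import trace

class TraceContextFilter(logging.Filter):
    """Hypothetical filter stamping each log record with the active trace/span IDs."""

    def filter(self, record: logging.LogRecord) -> bool:
        ctx = trace.get_current_span().get_span_context()
        if ctx.is_valid:
            record.trace_id = format(ctx.trace_id, "032x")
            record.span_id = format(ctx.span_id, "016x")
        else:
            record.trace_id = record.span_id = "0"
        return True  # never drop the record, just annotate it

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(
    "%(asctime)s %(levelname)s trace_id=%(trace_id)s span_id=%(span_id)s %(message)s"
))

logger = logging.getLogger("app")
logger.addHandler(handler)
logger.addFilter(TraceContextFilter())
logger.setLevel(logging.INFO)
```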
[00:27:17] Unknown:
And then, for being able to ensure that you can use OpenTelemetry across different languages and across different service boundaries, how are you addressing the question of feature parity within the different SDKs, and how do you signal the feature completeness of a given implementation so that people can effectively evaluate them as they're building out and designing their systems and trying to gain useful observability metrics? Yeah. That's a great question.
[00:27:50] Unknown:
I can talk about it at a high level. You can think of there being three big moving parts: there's the API, there's the SDK, and then there's OTLP, which is the wire format for actually representing the trace and metric data. Once we have the API at, like, a 1.0, and I would even suggest that right now the API is pretty close to 1.0, then that API will most likely change very slowly. The SDK, you would expect, can move a little more quickly under that. But I would also suggest that OTLP, the wire protocol, will also probably change slowly.
So one of the goals of OpenTelemetry was to make sure the API and SDK were decoupled. As long as you're using the same API level on both sides, you could swap out individual components if you need to, assuming that you aren't breaking the API. And since you can independently upgrade the SDK components, OTLP is sort of the fallback, and I don't want to say fallback, but the default way to export data. You can do whatever you want in the middle, and the middle can move very quickly, but it needs to start out in a pretty slow way and it needs to end in a pretty slow way. So you could definitely see extensions or little side projects popping up that add some valuable feature, which would need to be marshaled into working through the API and then being exported through the existing OTLP format.
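One way to picture that decoupling: a library depends only on the opentelemetry-api package, and its calls are no-ops until the application installs an SDK behind them. A rough sketch, with hypothetical module and span names:

```python
# yourlibrary/client.py -- depends only on opentelemetry-api.
from opentelemetry import trace

_tracer = trace.get_tracer("yourlibrary")

def do_work() -> None:
    # With no SDK installed this yields a no-op span and costs almost nothing.
    with _tracer.start_as_current_span("yourlibrary.do_work"):
        ...

# app.py -- the application chooses the SDK, processors, and exporters.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(ConsoleSpanExporter())  # swap for an OTLP exporter in production
)
```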
[00:29:38] Unknown:
To get back to the issue around feature parity, I think the process we've taken to address feature parity across different languages is to basically create tracking issues for any spec changes and address them as they come in on a SIG-by-SIG basis, so for each language, basically. For example, before the beta was released, there was a list put together by the technical committee for OpenTelemetry of all the features that we knew we needed to implement for the beta to be complete. And so, basically, it was up to each SIG to then go off and ensure that the list was implemented within their own language. And anything that wasn't implemented, we would just go ahead and create an issue for and track it that way.
[00:30:17] Unknown:
For the most part, it's actually maybe less of a huge concern, because for the real basic primitives, things like context propagation, we're defaulting to the W3C Trace Context specification. And I know there's a bunch of other stuff upcoming from Trace Context, but the primitives are really well defined at this point. And as long as you're using the same sort of context propagation everywhere, the individual hops don't mean as much. Like, let's say I had five front-end services. As long as they're all using the same context propagation, then I'll get an unbroken trace. Right? I'll get an unbroken set of tags on my metrics, because that's all being sent around in a way everyone understands.
And there should be reasonable fallbacks. So if something new comes in, it goes into an additional field, and service B has been updated and understands it and the rest of them haven't, then you would obviously see that maybe something new is happening in service B that isn't happening everywhere else. But the basic functionality of tracing hasn't broken for you because you updated.
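For a sense of what that shared context looks like on the wire, the W3C Trace Context spec defines a traceparent header, and the Python propagation API injects and extracts it for you. A minimal sketch; the propagators module has been reorganized across releases, so the import path shown here matches current packages.

```python
from opentelemetry import trace
from opentelemetry.propagate import inject, extract

tracer = trace.get_tracer(__name__)

# Client side: inject the active span's context into outgoing HTTP headers.
with tracer.start_as_current_span("call-service-b"):
    headers = {}
    inject(headers)
    # headers now contains something like:
    #   traceparent: 00-<32 hex trace id>-<16 hex parent span id>-01
    # Send the request with these headers using your HTTP client of choice.

# Server side: extract the remote context and continue the same trace.
def handle(incoming_headers: dict) -> None:
    ctx = extract(incoming_headers)
    with tracer.start_as_current_span("handle-in-service-b", context=ctx):
        ...  # spans created here are children of the caller's span
```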
[00:31:30] Unknown:
And digging more into the specifics of Python and the SDK implementation of it, can you discuss a bit about how that is actually implemented, and some of the initial design decisions or assumptions that were made early on that had to be revised or reconsidered as you got further along in the implementation and the overall adoption of its use? Yeah. I guess I can't think of too many
[00:31:58] Unknown:
assumptions that have changed since we started seeing adoption. I think we might still be a little bit too early in the project, or any assumptions that were made and changed might predate my time on the project. But I guess one of the more recent changes that occurred was around the propagation of context, where, initially, we didn't have what we call the context API in OpenTelemetry. And so we did have to go back and find a way to separate out that context API from the implementation that we were using, and that was a fair amount of work. But I can't think of other ways that we really had to rework a bunch of the assumptions that we made originally. I actually can't either. Admittedly, I feel like
[00:32:51] Unknown:
most of my work is at the community level. I think some of that is actually just a credit to the people in the SIG, obviously, but also to Python itself: at a pretty basic level, it has all the features you really need. Here's a good example, I guess: in-process context propagation. If I have a process, and then I have a span, and then I have a function that I wanna trace independently, or create a span for independently, and I'm using multithreading of some sort, then I need something to keep track of, okay, what's the active span at any given point? And Python really just has all the tools you need to code that out of the box. You don't have to deal with manually passing stuff around like in Go, and you don't have the sort of wonky support story you have around that in JavaScript, where, since there are no threads, there are different ways to do this. So for the most part, Python has just chugged along very merrily while other SIGs were trying to glom everything together. And I think it's also partially just that Python's pretty popular, you know? OpenTracing and OpenCensus both had very good Python support.
And so there hasn't really been a lot of, oh, we need to invent this thing. It's more like, well, we've done this in the past, so we'll do it this way, but we'll make it more performant or whatever.
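To make Austin's point concrete, here is a small sketch of carrying the active span into a worker thread by hand using the context API. Python's contextvars machinery is what makes this straightforward, but a new thread does not inherit the caller's context automatically, so it gets attached explicitly.

```python
import threading
from opentelemetry import context, trace

tracer = trace.get_tracer(__name__)

def worker(parent_ctx: context.Context) -> None:
    # Attach the caller's context so spans started here parent correctly.
    token = context.attach(parent_ctx)
    try:
        with tracer.start_as_current_span("worker-task"):
            ...  # threaded work shows up as a child of parent-operation
    finally:
        context.detach(token)

with tracer.start_as_current_span("parent-operation"):
    # Snapshot the current context (which carries the active span)...
    ctx = context.get_current()
    # ...and hand it to the thread explicitly.
    t = threading.Thread(target=worker, args=(ctx,))
    t.start()
    t.join()
```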
[00:34:26] Unknown:
And then, for being able to gain compatibility and visibility into the broader ecosystem, I know that there are specific libraries for things like Django or Flask to auto-instrument the peculiarities of those frameworks. But what is the overall strategy or approach for being able to gain broader adoption
[00:34:48] Unknown:
within the overall ecosystem of web or data or, you know, just network systems, things like that? That's a really good question. So we started out by providing instrumentation for libraries that we know are fairly popular frameworks, so, like you mentioned, Django, Flask, and Requests. And what we started seeing is that there's interest from contributors and maintainers of other frameworks to build instrumentation libraries that are compatible with OpenTelemetry for their own projects and add to the list, which is actually really exciting, because those folks know the tools and the libraries inside and out. And as far as a scaling strategy, it's a lot more scalable to have folks that are involved in those projects directly contributing instrumentation than to have someone from the OpenTelemetry project go out and learn everything there is to know about any particular library. So I think, hopefully, that's kind of where we're going to be driving as OpenTelemetry gains popularity. And one thing that we've seen, for example in the .NET world, which I think Austin alluded to earlier, is that there are plans to adopt OpenTelemetry in the language runtime itself, which is kind of the best possible solution for adoption of any particular standard.
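As an example of what one of these per-framework integrations looks like from the user's side, here is roughly how the Flask instrumentation is applied. The package has been renamed over time (opentelemetry-ext-flask, later opentelemetry-instrumentation-flask), so the import below matches current releases and may differ from what existed at the time of this conversation.

```python
from flask import Flask
from opentelemetry.instrumentation.flask import FlaskInstrumentor

app = Flask(__name__)

# One call wires up the framework hooks: every incoming request gets a server
# span with route, method, and status attributes attached automatically.
FlaskInstrumentor().instrument_app(app)

@app.route("/hello")
def hello():
    return "hello"

if __name__ == "__main__":
    app.run()
```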
[00:36:09] Unknown:
Yeah. I think, as a strategy, really, our goal as a project is to provide stable interfaces, good-quality SDK design, and a support plan. Right? We need to be good shepherds and stewards of what we've built and give people the confidence that they can go and do these integrations, and that we will be able to support them through that by not boiling the sea every two months. I think beyond that, into this broader ecosystem, like you said, it's approaching and thinking about how we integrate this into more services, more managed resources, things like cloud providers and their SDKs, or make this available through APIs.
Let's say you used a database API or a GraphQL library to query something; maybe you could get the person that serves that API to also have OpenTelemetry traces and then be able to independently return those to you, so that you could actually inspect the performance of your request as it goes across the WAN into some other system. And then the same thing with, say, maybe you're running your own database. Right? Is there a way that we can get OpenTelemetry into MySQL or into Postgres or into Mongo? But I think that goes back to: the way that will happen is by running the project well, having it be stable, having good release management, and making it something other people can build on.
[00:37:58] Unknown:
And just one thing to add there: I think it's also important to make the libraries and the APIs as easy to use as possible for anyone who wants to provide that instrumentation, so that they don't have to spend a tremendous amount of time just learning how to use a particular tool in order to provide the benefits of that tool to their users. And so one of the things that we're spending some time on in the Python SIG is providing an interface for anyone who wants to provide auto-instrumentation for their library, through the base instrumentor class, and also providing examples of how some of the other libraries have actually been instrumented.
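For library authors, the interface Alex mentions looks roughly like the sketch below: subclass the base instrumentor and fill in the hook methods. The class and method names here match recent opentelemetry-instrumentation releases and may differ slightly from what the beta exposed; 'mylibrary' is a hypothetical target package.

```python
from opentelemetry import trace
from opentelemetry.instrumentation.instrumentor import BaseInstrumentor


class MyLibraryInstrumentor(BaseInstrumentor):
    """Sketch of an instrumentor for a hypothetical 'mylibrary' package."""

    def instrumentation_dependencies(self):
        # Which versions of the target library this instrumentation supports.
        return ["mylibrary >= 1.0"]

    def _instrument(self, **kwargs):
        tracer = trace.get_tracer(__name__, tracer_provider=kwargs.get("tracer_provider"))
        # Here you would patch or hook mylibrary's entry points so that each
        # call is wrapped in a span created from `tracer`.

    def _uninstrument(self, **kwargs):
        # Undo whatever patches _instrument applied.
        pass
```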
[00:38:42] Unknown:
And then once you have instrumented your application, you're generating all of these data objects and context for being able to understand what's happening in your system. You still need to have somewhere to send it to and perform analysis on it. So what are some of the existing options for that, and what are some of the ones that are upcoming that you are keeping an eye on? Yeah. It's a great question. So
[00:39:07] Unknown:
One of the things that I think OpenTelemetry really dramatically simplifies is the deployment of telemetry, through tools like the OpenTelemetry Collector and through a native format like OTLP. So what we're seeing is more organizations starting to adopt the OpenTelemetry protocol as kind of a native ingest format. I know at Lightstep, I don't know if we've publicly announced it yet, but it's there, so I'll talk about it: we'll be accepting OTLP-formatted data to our SaaS back end very soon. The work is done, and I think it's actually public right now, but we haven't updated the docs yet. In the bigger picture, I think having the OpenTelemetry Collector be this vendor-neutral way to aggregate trace and metric data from multiple different sources and then export it to a variety of back-end systems is really valuable, and some of those back ends will be open source. Jaeger has already come out and said, hey, we're gonna switch over to using the OpenTelemetry Collector instead of our own, and I would expect other open source analysis tools to start following. You also have the ability to write your own exporter, and we've seen a lot of adoption there, from companies like Google, Microsoft, Splunk, and Honeycomb.
Who else? There are probably a few I'm forgetting; Datadog has an exporter as well, I believe. But either way, the Collector makes it very easy to really have a separation of concerns between the people that are integrating OpenTelemetry into their code base and the people that are responsible for collecting all that data and sending it somewhere. Instead of having to redeploy your application for a config change, you can redeploy your Collector and say, okay, I want to send this to some other place now. That's very cool. And because the Collector is also completely open source, you can transform that data however you like. So one of the things that I'm keeping an eye on, that I've been talking about, is that you could convert your traces to analytic events and then send them to an analytics provider, or you can turn them into JSON and do whatever you want with them. You can put them in a big data system and do all sorts of fun queries. I think it's going to enable a lot of interesting tools that we haven't seen yet, things like using traces as part of automated testing to validate application behavior or logic flows. You can imagine a situation where maybe in test I have a ton of instrumentation about what's going on in my application, and maybe I turn some of that off in production or whatever.
But being able to take all of that out as a dump and then just do a diff against the last time we ran it to see what's changed: am I calling someone else? Is there a new external API? Did I pick up a new service dependency or whatnot? That sort of stuff, I think, is really cool.
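To make that hand-off concrete, here is a minimal sketch of an application exporting spans over OTLP to a Collector running locally. The exporter package and module path have moved around between releases, and the endpoint shown is just the conventional default gRPC port, so adjust both to whatever your versions document.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Point the SDK at a local OpenTelemetry Collector; the Collector's own config
# then decides where the data ultimately goes (Jaeger, a vendor back end, a file, ...).
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("demo"):
    pass  # switching back ends later means changing Collector config, not this code
```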
[00:42:31] Unknown:
And then, in terms of your experience of working with the OpenTelemetry community and working on the specification and some of the SDK implementations, what are some of the
[00:42:38] Unknown:
most interesting or unexpected or challenging lessons that you've learned in the process? I think for me, on the challenging side, it's just how hard it is to get collecting telemetry right, in a way that's both easy to use and useful to people. That's been really challenging. And I think the most pleasant surprise by far has been just watching so many different people from so many different organizations working together and trying to solve these really hard problems. It's been a really, really great experience. So if anybody's out there looking for an open source project to join, I think this is a great place to start. Yeah. I'll echo most of what Alex said there.
[00:43:18] Unknown:
I'll echo all of it, but I think there's maybe a perception issue in the open source community more broadly that vendor projects aren't good projects, or that they're not reflective of actual community values. And I think, in the case of OpenTelemetry, I would suggest this is maybe the counterexample where, yes, this is a very vendor-y project. Most people working on it are passionate about the subject, but they're also working for companies that are invested in it. Right? That said, for a project that is so impactful to the bottom lines of these companies, it's been fascinating to see how much people are just coming together for the good of the project. There hasn't really been a lot of bad blood.
There's not a lot of side-taking, like, oh, well, we think this way because our employer thinks this way. People are genuinely engaging in really good faith, and I think it's brought what is normally a pretty interesting industry together. And it's cool to see a bunch of people saying, hey, we work for different places, and our companies are competitors, but in here, we're all cool. We're all friends. We're all trying to solve this very tricky problem
[00:44:38] Unknown:
in a really good way. Yeah. And I really like the general industry trend that I've been seeing in a lot of different areas of technology of trying to standardize on the interfaces that are used for interoperability, so that the different implementations of open source or vendor technologies can innovate on the specifics of how they operate and the capabilities that they provide, rather than trying to lock people into their solution because they're not compatible with the import or export options of the other systems that you might want to compose together with them. Yeah. 100%.
[00:45:16] Unknown:
Yep. Yeah. There's nothing worse than having to learn a whole new set of tooling and languages just to, like, make something work with a particular vendor.
[00:45:24] Unknown:
And then in terms of the OpenTelemetry project itself, what are the cases where it's the wrong choice and somebody might be better suited just using a standard approach to metrics and logging for being able to gain understanding into their system?
[00:45:40] Unknown:
That's a tough 1. Is it never an option?
[00:45:43] Unknown:
It's definitely an option.
[00:45:44] Unknown:
I mean, it's an interesting question, right, because there are obviously times that it's the wrong choice. But that's less of a technical consideration than it is about whether you're in an organization that can handle and adapt to it. I think there are wide classes of applications that maybe don't benefit from it. For example, tracing has overhead. Not a ton of overhead, but it does. Tracing on the front end, especially, right now, will increase the amount of JavaScript your browser has to download. That can be a problem for people, and if it's a disqualifying problem for you, then, okay, use something else. Right? We'll try to fix it, we'll try to make it better, but I'm not out here saying it's perfect and if you don't use it you're an idiot. With a lot of, you know, maybe embedded devices or things like that, you're probably not gonna get the mileage.
Even if it's not the right answer, it's at least a good answer. And the one thing I would say, if you're listening to this, is that the time it's always the right answer is if you're thinking about trying to do it yourself. Right? If you've looked at your system and you said, I need distributed tracing, I need the sort of things OpenTelemetry can do, but I want to build it myself, then it is 100% the right answer to use OpenTelemetry. Because I can guarantee you that, collectively, we have put more thought and time into building OpenTelemetry than just one person can.
Right? Now, you don't have to use OpenTelemetry with Lightstep. You don't have to use it with Datadog. You don't have to use it with any specific vendor. Right? I'm not saying, oh, you have to pay us money. I am saying there's a cost for everything, and going your own way is going to hurt more in the long run
[00:47:51] Unknown:
than anything it saves upfront in terms of either time or money. Yeah. So I'll echo what Austin said there, but also, I think it's important, if you're looking at OpenTelemetry and you think, oh, it's not doing the thing that I want for my special case or whatever it is, to get involved with the community and actually bring that case up, because it's likely that other people have also run into the same problems or limitations or whatever it is that's preventing adoption of OpenTelemetry as it is. And so I think it's good to have that conversation and at least try to understand all those use cases. Even if it's not supported today, it doesn't mean that it won't be supported tomorrow. And I think that's something that we're always looking forward to. And so, as you look to the next steps in the near-to-medium-term future of OpenTelemetry,
[00:48:37] Unknown:
what do you have in store for it as far as plans and future capabilities? And what are the areas of contribution that are most vital as you continue down the path of completing the general availability road map and moving beyond that?
[00:48:53] Unknown:
I can speak to my side of things. At a big-picture level, I think: more integration, more auto-instrumentation, really trying to make it easy and fast for people to onboard and start using it and getting value from it. That's what I see this year. Right? Like, we've got the basics done, we're implementing metrics everywhere; now let's make that useful. In terms of what I would love to see people coming in and doing, if you're listening to this and you wanna get involved, the two things we really need are, one, user feedback, so people to actually try it out and use it and tell us what doesn't work and also tell us what works. But also, if you wanna go teach people about this, if you wanna educate people about this, we would love to have more people involved in the community, helping out, writing documentation, doing examples. It doesn't have to be on our website, on the OpenTelemetry site; it can be anywhere. Do it on dev.to or put it on YouTube or whatever. But if you are interested in that, we actually have resources at our website, opentelemetry.io. Under the documentation, we actually have a workshop there that you can adapt and use. Maybe you wanna do a lunch and learn at your company; you can pick up our slides and use those to teach other people OpenTelemetry, which I think is a great way to get involved even if you don't wanna really get down in the weeds on GitHub. And I think, specifically around Python,
[00:50:20] Unknown:
as you mentioned, Austin, reading the docs and trying out the examples. We have the OpenTelemetry Python Read the Docs site that is up there, and we would love to get some feedback on that. And from a contribution standpoint, we would love to see as many folks that are interested in implementing instrumentation for OpenTelemetry as possible come in the door and attend our SIG meetings or join us on Gitter. We're pretty accessible.
[00:50:47] Unknown:
Well, for anybody who wants to get in touch with either of you or follow along with the work that you're doing or contribute to OpenTelemetry, I'll have you add your preferred contact information to the show notes. And so, with that, I'll move us into the picks. This week, I'm going to choose the Pulumi project. I've started adopting it for managing my cloud resources, and even if you're running existing infrastructure and you wanna start converting over, it has great options for adopting existing resources. So definitely worth a look. And with that, I'll pass it to you, Austin. Do you have any picks this week? I've been
[00:51:28] Unknown:
using a lot of Helm 3 recently. I've been getting back into Kubernetes, and, you know, I tried Helm years ago, I guess, when it first came out, and the headaches of configuration and setup and Tiller and all that were extremely frustrating and made it a not-great experience. But with Helm 3, they really came around, and it's exactly what you want it to do. It templates things and it just works, and it's one binary and you don't have to install or configure stuff. It's great. I love it. If you've seen Helm before and been burned by Tiller or by RBAC or anything else,
[00:52:08] Unknown:
definitely recommend giving Helm 3 a shot. And, Alex, how about you? Yeah. So I guess my pick of the week is Algorithms to Live By: The Computer Science of Human Decisions by Brian Christian and Tom Griffiths. It's a great book talking about how algorithms can be used in everyday life, applying computer algorithms to how we make decisions
[00:52:33] Unknown:
as humans. It's actually a really great and entertaining read. Yeah, I enjoyed reading that one myself, so I'll second that pick. So I'd like to thank the both of you for taking the time today to join me and discuss the work you're doing with OpenTelemetry. It's definitely a very interesting project and one that I plan to start adopting for my own systems. So I appreciate all of the effort that you and everyone else involved have put into it, and I hope you enjoy the rest of your day. Great. Thanks for having us. Yeah. Thanks for having us. I'm looking forward to your feedback. Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast, at dataengineeringpodcast.com for the latest on modern data management.
And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction to the Podcast and Guests
Introducing Austin Parker and Alex Boten
Background and Involvement in OpenTelemetry
Defining Observability vs. Traditional Monitoring
Challenges in Capturing Observability Data
Cross-Language Support and Benefits
Data Collection and Instrumentation in Python
Ensuring Feature Parity Across SDKs
Implementation Details of Python SDK
Adoption and Integration in Broader Ecosystem
Options for Data Analysis and Export
Lessons Learned and Community Collaboration
When OpenTelemetry Might Not Be the Right Choice
Future Plans and Areas for Contribution
Closing Remarks and Picks