Summary
Once you release an application into production, it can be difficult to understand all of the ways that it interacts with the systems it integrates with. The OpenTelemetry project and its accompanying ecosystem of technologies aim to make observability of your systems more accessible. In this episode Austin Parker and Alex Boten explain how the correlation of tracing and metrics collection improves visibility into how your software is behaving, how you can use the Python SDK to automatically instrument your applications, and their vision for the future of observability as the OpenTelemetry standard gains broader adoption.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Your host as usual is Tobias Macey and today I’m interviewing Austin Parker and Alex Boten about the OpenTelemetry project and its efforts to standardize the collection and analysis of observability data for your applications
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by describing what OpenTelemetry is and some of the story behind it?
- How do you define observability and in what ways is it separate from the "traditional" approach to monitoring?
- What are the goals of the OpenTelemetry project?
- For someone who wants to begin using OpenTelemetry clients in their Python application, what is the process of integrating it into their application?
- How does the definition and adoption of a cross-language standard for telemetry data benefit the broader software community?
- How do you avoid the trap of limiting the whole ecosystem to the lowest common denominator?
- What types of information are you focused on collecting and analyzing to gain insights into the behavior of applications and systems?
- What are some of the challenges that are commonly faced in interpreting the collected data?
- With so many implementations of the specification, how are you addressing issues of feature parity?
- For the Python SDK, how is it implemented?
- What are some of the initial designs or assumptions that have had to be revised or reconsidered as it gains adoption?
- What is your approach to integration with the broader ecosystem of tools and frameworks in the Python community?
- What are some of the interesting or unexpected challenges that you have faced or lessons that you have learned while working on instrumentation of Python projects?
- Once an application is instrumented, what are the options for delivering and storing the collected data?
- What are some of the most interesting, unexpected, or challenging lessons that you have learned while working on and with the OpenTelemetry ecosystem?
- What are some of the most interesting, innovative, or unexpected ways that you have seen components in the OpenTelemetry ecosystem used?
- When is OpenTelemetry the wrong choice?
- What is in store for the future of the OpenTelemetry project?
Keep In Touch
- Austin
- @austinlparker on Twitter
- austinlparker on GitHub
- Alex
- @codeboten on Twitter
- codeboten on GitHub
Picks
- Tobias
- Austin
- Alex
- Algorithms To Live By: The Computer Science of Human Decisions by Brian Christian and Tom Griffiths
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
Links
- OpenTelemetry
- Lightstep
- OpenTracing
- OpenCensus
- Distributed Tracing
- Jaeger
- Zipkin
- Observability
- Kubernetes
- Spring
- Flask
- gRPC
- Structlog
- Filebeat
- W3C Trace Context
- OpenTelemetry Python SDK
- OpenTelemetry Django
- OpenTelemetry Flask
- OpenTelemetry Collector
- OTLP == OpenTelemetry Protocol
The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling powered by the battle-tested Linode platform, including simple pricing, node balancers, 40-gigabit networking, dedicated CPU and GPU instances, S3-compatible object storage, and worldwide data centers.
Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today and get a $60 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. Your host as usual is Tobias Macey. And today, I'm interviewing Austin Parker and Alex Boten about the OpenTelemetry project and its efforts to standardize the collection and analysis of observability data for your applications.
[00:01:11] Unknown:
So, Austin, can you start by introducing yourself? Sure thing. Hi. I'm Austin Parker. I'm a principal developer advocate at Lightstep, and I'm a maintainer on the OpenTelemetry project.
[00:01:21] Unknown:
And, Alex, how about you? Yeah. Sure. Hi. I'm Alex Boten. I'm an open source software engineer at Lightstep, and I'm a contributor to the OpenTelemetry project, as well as one of the maintainers on the OpenTelemetry Python project.
[00:01:34] Unknown:
And so going back to you, Austin, do you remember how you first got introduced to Python?
[00:01:38] Unknown:
Oh, actually, I think the first time I ever used Python was way back in college. I feel like that was, I want to say, maybe my intro-to-programming type of thing. Before that, I'd done a lot with just general scripting, things like AppleScript on a Mac and other forms of bash scripting and shell scripting. But Python was kind of new to me, and I really enjoyed it. Actually, I still love to work in Python. And, Alex, how about you? Yeah. So I guess I remember the first time I heard about Python was people complaining from the Java world about the whitespace
[00:02:14] Unknown:
that Python required. But, you know, at the time, I hadn't really used it. I was really introduced to it about 6 years ago, when I joined a team that was building a platform as a service, and the API and the CLI were all written in Python on top of Docker. And so, basically, I jumped in and learned what I needed to keep the team moving forward, using Python 2.7 at the time, I think. And in terms of the OpenTelemetry
[00:02:39] Unknown:
project, can you each give a bit of background as to how you each got involved with it and maybe describe a bit about what the project is and some of the story behind it? Sure. I guess I'll start off. So I actually came into OpenTelemetry
[00:02:51] Unknown:
from one of its predecessors, OpenTracing. The short version of the story is that in, I wanna say, 2016 or 2017, OpenTracing came out as an open source project. The goal was really to provide a standard, unified API for distributed tracing across multiple different languages, so everything from Java to Python, C#, and so forth. The underlying idea here was that distributed tracing was very useful, but often people would apply it in a polyglot environment where you had some services running in Node and some in Python and some in Java. And for distributed tracing specifically to be useful, you really want it to be able to follow the entire request, nose to tail as it were. So you need some standard idioms, you know, think of them as adjectives and verbs. Right? You need these standard things between your different languages. So OpenTracing was designed to fill that hole in the ecosystem.
And it was an API, you know, that had to be reimplemented by different vendors. So you'd have a Jaeger implementation or a Zipkin implementation, but your actual tracing code was independent. So I could use the same tracing code with any number of vendors, and it was great. Now, in reality, maybe it didn't work out quite as well as we had hoped. 2017 or 2018 rolls around, I wanna say, and you saw OpenCensus arrive as a competitor to OpenTracing in a lot of ways. OpenCensus was a Google project, and Microsoft also joined in. It had similar concepts: hey, we want tracing, we want it to be polyglot, but we want to provide both the API and the SDK, and then let you have a pluggable exporter model. These two projects coexisted for a while, but it was causing confusion in the broader open source community. People weren't sure which to use. I had open source library authors come and say, well, I have customers or users that want to use tracing, but which one of these two things should I pick? And, ultimately, I think what happened is everyone looked at the situation and said, hey, let's join forces. Let's do a best-of-both-worlds approach. And out of that was born OpenTelemetry.
[00:05:31] Unknown:
And, Alex, how did you get involved in the OpenTelemetry project? Yeah. So I actually came at it from the other side. I was the user of both
[00:05:39] Unknown:
OpenCensus and OpenTracing. So I was right in the middle of all that confusion about, I don't know, maybe a year and a half or two years ago now. And I was introduced to the OpenTelemetry project much like everybody else in the user community, around the announcements that were made about a year and a few months ago now. And I really joined the project as a contributor when I joined Lightstep about 7 or 8 months ago now, and that was my first
[00:06:07] Unknown:
real introduction to the project. And so the tagline of OpenTelemetry is that it's trying to help make observability data easier to collect and access. And before we get too much further into the specifics of how it does that, I'm wondering if you can give a bit of a definition as to how you think about observability, and in what ways it's separate from the quote unquote traditional approach to monitoring where you're just collecting different metrics and shipping it off to some host for being able to aggregate them there? Yeah. That that's such a great question. I think if you kind of look around, there's a few different definitions of observability,
[00:06:43] Unknown:
but I like to talk about it and think about it in a pretty simple way, which is that observability is about understanding. It's about having the ability to understand your system and your system's dependencies. It's about being able to understand not only what is wrong with this one request, but also the aggregate behavior of the system in production, to be able to pull back to that 30,000-foot view and understand the entire thing, how all the pieces fit together. And having both that very coarse and fine visibility into what's going on means that you can do a lot of interesting things that you can't do with traditional monitoring.
There's the common unknown-unknowns idea. Right? When you're building a system, when you're running a system, it's not the things that you know about, and it's not the metrics that you're collecting and that you think you care about at the beginning, and it's not the things that you sort of discover along the way. The things that are gonna bite you are the things that you don't even know that you should know about. Right? So the whole principle of observability kind of gets down to this instrumentation level, which is where OpenTelemetry really plays: you need to have an SDK that can help you effortlessly collect a lot of different data points and has a pretty deep integration into your underlying frameworks and tools, and then ship that off somewhere to a system that is capable of asking these sorts of arbitrary questions.
[00:08:22] Unknown:
I think, Austin, you hit the nail on the head there. You know, observability is really about understanding. When I think of traditional monitoring, it's always been about putting graphs on a dashboard and watching for something to change on those graphs. And although those graphs and dashboards definitely play a role in observability, they tend to only give you part of that picture. And too often, they're kind of a result or an afterthought: oh, we had an outage and we don't have that metric on our dashboard, well, we'd better go ahead and add it right there. But when I think of observability, I think of being able to really dig to the bottom of the behavior of a system while the code is actually running that system, and being able to answer questions like: is the code doing what we expect it to? If a user is hitting an issue, would I be able to detect it? If an anomaly occurs, is it possible for me to dig into the cause of that anomaly right then and there, or do I just have to wait for that anomaly to occur again? And I think observability is changing how software is being built, by thinking about observing a system up front rather than treating it as an afterthought.
[00:09:25] Unknown:
And as you mentioned, the need to retroactively go in and add new metrics collection points to try and understand what happened in an anomalous situation, rather than being able to automatically capture the necessary context and information, is the real game changer there. And I'm wondering what you see as some of the biggest challenges in enabling people to capture that necessary information in a cross-cutting way, where it's beyond just the specifics of a single log line, or a particular counter that you're incrementing when a particular function happens, or certain timers that are being collected, and just some of the overall goals of the OpenTelemetry project in how it's going to help people achieve that sort of holy grail of observability.
[00:10:13] Unknown:
Sure. You know, I think you raise an interesting point there, and I would kind of turn it around: why do people not do this already? Right? There's a pretty pervasive view, I think, in the developer community that things like distributed tracing, things like observability, are only for big companies, your Googles and your Facebooks and your Microsofts, where you have just an eye-popping amount of independent services and different lines of business and, you know, unmanageable complexity. Right? And I think that's an opinion that is kind of brought forth by an unwillingness to grapple with the goals of our existing tools. Because far too often, no matter how many fancy dashboards you make, no matter how many cool metric data points you put together, I think Alex had the right of it: a lot of times it's just, we're making this dashboard to make this dashboard, because this dashboard is sort of proof of life. This dashboard is the thing we can point to when someone says, well, is it up? So I think one of the goals of OpenTelemetry is to really help change this narrative, and I think we do that by using the fact that OpenTelemetry is so widely supported.
It has extremely broad support from many, many organizations, people you've heard of like Microsoft and Google and Amazon, a huge variety of monitoring and observability tool vendors, and also open source maintainers, people that are creating things like Jaeger or Prometheus. Right? So because it has this broad support, that means that we can push the point of integration away from the individual dev and maybe go down a step or two. You know, I think it's a pretty common thought that when you're trying to monitor an abstraction, when you're trying to observe an abstraction, you really wanna go one step below it. So if I'm trying to understand the behavior of my application, a good way to monitor that is to go to the application container, right, to go to the runtime and look at certain metrics and measurements there. With OpenTelemetry, you can have a similar sort of behavior, where I would say a goal is to integrate it into things like Kubernetes, things like Spring on the Java side or Flask on the Python side. So you're actually getting a more holistic view of what's going on very, very easily, without having to spend a bunch of time writing manual instrumentation code. And in terms of the actual process of instrumentation,
[00:12:45] Unknown:
I know that one of the use cases for the SDKs is to automate some of that setup and be able to start collecting useful information out of the box without requiring a lot of development effort upfront, while still providing the option of adding additional collection points for different metrics and traces. And I'm wondering what the developer workflow looks like for actually getting set up with collecting those metrics and then being able to perform some useful analysis on them. Yeah. So I guess there are kind of two ways of thinking about instrumenting an application,
[00:13:22] Unknown:
at least on the Python side, and I know it's true in a lot of the other SIGs as well. There's what we call the auto-instrumentation portion of instrumenting, which, in the Python world, means that we're basically looking at what libraries are going to be utilized by a certain application. And so, if you go ahead and you install the auto-instrumentation package as well as some of the auto-instrumentation libraries that we already support, what ends up happening is you run a separate script, which is just called opentelemetry-instrument, to wrap your Python executable. It will basically go ahead and instrument all the libraries that you're using that have support for OpenTelemetry, generating spans and metrics for those libraries for you. So that's the auto-instrumentation piece. The manual instrumentation piece would still require application developers to go ahead and set up their SDKs and start spans or start collecting metrics in the different portions of their applications that they're interested in. But a big piece of what we're trying to do here is to make sure that the correlation between the manual and the auto-instrumentation all works, and all of the context flows through your application, and it also flows between the services along the wire.
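For reference, here is a minimal sketch of what the two workflows Alex describes look like in Python. The exact package names, module paths, and the opentelemetry-instrument wrapper have shifted between releases, so treat the specifics below as illustrative rather than authoritative.

```python
# Auto-instrumentation: install the packages and wrap your process with the CLI,
# roughly (package names vary by release):
#   pip install opentelemetry-distro opentelemetry-instrumentation-flask
#   opentelemetry-instrument python app.py

# Manual instrumentation: wire up the SDK yourself and create spans explicitly.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# The application owns the SDK setup; instrumented libraries only touch the API.
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    SimpleSpanProcessor(ConsoleSpanExporter())  # print spans to stdout for demo purposes
)

tracer = trace.get_tracer(__name__)

def handle_request(user_id: str) -> None:
    # start_as_current_span makes this the active span for everything called inside,
    # so spans from auto-instrumented libraries become its children.
    with tracer.start_as_current_span("handle-request") as span:
        span.set_attribute("app.user_id", user_id)
        ...  # application work, HTTP calls, database queries, and so on

if __name__ == "__main__":
    handle_request("42")
```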
[00:14:38] Unknown:
And I think that context aspect is the real differentiating factor between just raw metrics, where here is an integer that represents something but you need to have enough internal context or awareness of the code base to understand what it means, versus being able to have that context propagate along with the metric so that it's a little bit easier for somebody who doesn't have that context.
[00:15:03] Unknown:
Yeah. Absolutely. I think context is absolutely critical here. Otherwise, all of that data is really just data that's floating in the air. And stepping back up to the level of the broader OpenTelemetry
[00:15:17] Unknown:
project and its mission, because of the fact that it is focused on support across a number of different language communities and runtimes and the specifics of libraries and frameworks, I'm wondering how that benefits the overall broader software community, and just some of the challenges that you face in avoiding the trap of limiting the entire capability of the ecosystem to the lowest common denominator that exists across those different runtimes?
[00:15:44] Unknown:
I can speak to this a little bit. You know, I think one of the things that, as an OpenTracing maintainer, certainly, we heard from the community was, for lack of a better word, people were, let's say, displeased with the quote unquote Java-ness of the OpenTracing API, right? And from the jump with OpenTelemetry, there's been a pretty concentrated push to make the look and feel of OpenTelemetry feel very native in each language. So that really influenced everything from spec writing, where there's been a pretty lengthy project now, a kind of cross-language compatibility SIG, that is mostly looking at, you know, what are the words we're using in the specification? Are we making sure that we have all the words we need in the spec to allow for individual languages to implement the spec in a way that feels native for that language? There's also a bit of hewing to whatever the dominant patterns are. Right? So if you look outside of the real raw primitives, things like starting and ending a span, adding attributes or events to a span, a lot of the stuff around metrics, the up-down counters and things like that, outside of these very primitive sorts of actions, there's a lot more flexibility in how that gets integrated into the native workflow. A good example I would probably point to is what's going on in the C# SIG. And I know we've used that word a lot; for people that don't know, SIG is special interest group. That's basically the primary way we're organizing the project: these languages or topics have SIGs.
So in the C# SIG, Microsoft is very heavily involved in this project, and there's actually a push to integrate OpenTelemetry into the .NET runtime itself and bridge it with the existing diagnostic information that is already available in .NET. And so you're seeing kind of this two-forked approach where you can use existing primitives on the .NET side of the shop that you might already be using or already familiar with as a .NET developer, and then, with a simple bit of configuration change, those will hook into OpenTelemetry and start emitting OpenTelemetry spans and OpenTelemetry metrics that can then be forwarded to some other part of the OpenTelemetry ecosystem, like the Collector component, in order to be exported elsewhere. I think with Python you see something similar, and the same with Go.
A lot of it is making sure that each SIG has the ability to change things as they need to, along with a pretty strong top-level community that can make sure that, hey, we're all on the same page; no one's drifted too far from the spec or is reinventing the wheel over here in a way that they shouldn't.
[00:18:42] Unknown:
Yeah. And if I can just add a little bit to that. I think the specification is written in a way that's specific enough around the intent of a particular definition while leaving enough room for languages to interpret it in a way that makes sense for that language. And another thing we've seen that has been pretty successful is around what we call OTEPs, which are OpenTelemetry Enhancement Proposals. That's basically just a process that we go through before making changes to the spec, where you're able to propose a change that you want. And one thing that we've seen work well is to actually have different SIGs implement an OTEP as a prototype, just to get an idea of whether or not a particular concept works in each of the languages that we care about. And so that's implemented at the SIG level, by folks that are working in the language itself. Yeah. I think the OTEP process has been
[00:19:38] Unknown:
very helpful for the project. And just the requirement to show your work is something that maybe a lot of open source projects could look at. You know, I'm not going to say we came up with it ourselves. Isabelle Redelmeier was actually a very strong proponent of the OTEP process and helped codify it originally. But I know she talked to people that were very involved with creating the KEPs, right, the Kubernetes Enhancement Proposals, and that's where a lot of the inspiration was drawn from,
[00:20:14] Unknown:
I wanna say. Yeah. There are a number of different communities that have gone down that path: Python has its PEPs, I know Django has their own process for that, and there are a number of other open source communities that are following on that. So it's definitely a good way to bring everybody into the conversation and ensure that you have as diverse a set of inputs as possible, to make sure that you're not just getting tunnel vision on the way one thing should be implemented, and to bring in the voices that are necessary to make sure that it works for the broader community. Mhmm. And then as far as the actual data that you're collecting for being able to gain some visibility into the systems: we've talked about metrics, we've talked about spans, and I know that there's initial support for log collection and some of the ways that those should be formatted. So I'm curious if we can dig a bit more into the specifics of the information that's collected, and then maybe, in terms of the Python SDK, some of the available hooks into the runtime to be able to pull that information out and propagate it appropriately.
[00:21:17] Unknown:
So, basically, the types of information we're looking at collecting with the initial release of the project are really around the implementation and collection of distributed traces and metrics. That's kind of the initial goal. And, basically, metrics could be anything from your memory consumption, CPU, or request timing, or whatever it is that you care about in your application. And then on the distributed traces side, basically, what we're worried about there is collecting traces and spans that are distributed across your different services.
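As a rough illustration of the kinds of metrics Alex mentions, here is what counting requests and recording latency looks like against the Python metrics API. Note that the metrics API was still in flux at the time of this conversation, so the names below follow the shape it later stabilized on rather than what the beta shipped.

```python
from opentelemetry import metrics

# Without an SDK MeterProvider configured, these calls are no-ops by design.
meter = metrics.get_meter(__name__)

request_counter = meter.create_counter(
    "app.requests", unit="1", description="Number of requests handled"
)
request_duration = meter.create_histogram(
    "app.request.duration", unit="ms", description="Request duration"
)

def record_request(route: str, duration_ms: float) -> None:
    # Attributes let a back end slice the aggregate by route, status code, and so on.
    request_counter.add(1, {"http.route": route})
    request_duration.record(duration_ms, {"http.route": route})
```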
[00:21:55] Unknown:
One of the things that I think is really helpful when you think about distributed traces is that, in a lot of ways, they're really just semantic, structured logs, right? They're structured logs of context. And one of the things that we've done in OpenTelemetry is we've tried to extend the span model a little bit and add a lot more semantic meaning to different fields, which is not going to revolutionize your life tomorrow, but might in a couple of years. Here's a small example.
The status field on a span in OpenTelemetry supports the gRPC status codes. Right? So there are a lot of ways to actually classify the work that happened under a span in a semantically meaningful way: the difference between, for example, this failed due to a timeout versus this failed with a 400 or a 404 or a 405, right, but also things like, the context for this request was canceled. So as analysis systems catch up to OpenTelemetry and start to implement the ability to use this data to derive interesting statistical information about your system, I think you'll see a lot of interesting innovation happen in terms of building tools that can really understand what's going on. Because the sort of last-mile problem exists in OpenTelemetry as it does with most monitoring and observability things: you have all this data, great, but you still have to have someone that actually understands what the data represents in order to make heads or tails of it.
But by focusing a lot on having these semantically accurate spans, I think we'll be able to build better analysis tools in the future.
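As a small, concrete illustration of what Austin means by semantically meaningful spans, here is roughly how a span's attributes and status get set in Python. The status API has changed since this conversation was recorded: early versions mirrored the gRPC canonical codes he describes, while current releases use a simpler UNSET/OK/ERROR status plus semantic attributes such as http.status_code, which is what this sketch uses.

```python
import requests
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = trace.get_tracer(__name__)

def fetch(url: str) -> str:
    with tracer.start_as_current_span("fetch") as span:
        span.set_attribute("http.method", "GET")
        span.set_attribute("http.url", url)
        try:
            response = requests.get(url, timeout=5)
            # Record the semantic outcome, not just "it failed".
            span.set_attribute("http.status_code", response.status_code)
            if response.status_code >= 400:
                span.set_status(Status(StatusCode.ERROR, f"HTTP {response.status_code}"))
            return response.text
        except requests.Timeout as exc:
            # A timeout is a semantically different failure than a 404 or a 405.
            span.record_exception(exc)
            span.set_status(Status(StatusCode.ERROR, "timeout"))
            raise
```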
[00:23:48] Unknown:
To get back to the question specifically around Python: for the OpenTelemetry Python project, what we're really focusing on is wrapping up the work of implementing the spec around metrics. Today, the implementation for tracing is already in beta, and we're, I think, pretty close to being done. Yeah. So between the tracing
[00:24:09] Unknown:
implementation and the metrics implementation, those are kind of the interfaces that you'll wanna use from Python. And I know, too, that there is some initial work on being able to standardize the formatting of log data so that it can be correlated with the spans and metrics that are being collected. I'm wondering how you are approaching that in the Python ecosystem, whether you're just working on using the built-in logging capabilities or leaning on something like the structlog project to provide that structure out of the box and just define the specifics of it for the tracing capabilities, or what your thoughts are in that regard? Yeah. So as
[00:24:51] Unknown:
the logging capability is fairly new in the conversation, we haven't, as a project, started looking at how we're going to implement it in the OpenTelemetry Python project yet. But I would suspect that we would want to lean on as much of the standard library as we can, wherever we can. To add some more color on the logging question, I think
[00:25:17] Unknown:
you know, logging is incubating. It's still in a very fluid process right now, but I think it's accurate to say that there's really no appetite in the project to create a new logging API. Right? There is a plethora of battle-hardened, tested, and very good logging libraries out there, and I don't really think we're interested in competing with them. At a really high level, I think what you said earlier was maybe the best way to describe it: OpenTelemetry has the semantic concept of linking different forms of telemetry together.
So you could link a measurement from a metric to the span that occurred while that measurement was being collected, for example. And I feel like some sort of logging adapter that allows those logs to be context-sensitive would be one potential implementation here. I think another might be, you could even imagine something where, if you have logs that are already in a file format and they're getting correlation information through instrumentation, so that's adding in the correlation identifiers,
then a Filebeat-style processor could be scraping those, sending them off somewhere, and adding in the link through, say, the OpenTelemetry Collector. But this is where the thinking is more so than where the reality is. I don't know if the logging support is gonna be in beta by the time the rest of the project is in GA; I would maybe expect it to be more of an alpha.
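Since the speakers are describing possibilities here rather than shipped features, the following is purely a sketch of one way to get correlation identifiers into standard-library log records today: a hypothetical logging filter that copies the active span's trace and span IDs onto each record so that a downstream processor can link logs back to traces. Method names on the span API have shifted across releases, so adjust as needed.

```python
import logging
from opentelemetry import trace

class TraceContextFilter(logging.Filter):
    """Hypothetical filter stamping each log record with the active trace/span IDs."""

    def filter(self, record: logging.LogRecord) -> bool:
        ctx = trace.get_current_span().get_span_context()
        if ctx.is_valid:
            record.trace_id = format(ctx.trace_id, "032x")
            record.span_id = format(ctx.span_id, "016x")
        else:
            record.trace_id = record.span_id = "0"
        return True  # never drop the record, just annotate it

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(
    "%(asctime)s %(levelname)s trace_id=%(trace_id)s span_id=%(span_id)s %(message)s"
))

logger = logging.getLogger("app")
logger.addHandler(handler)
logger.addFilter(TraceContextFilter())
logger.setLevel(logging.INFO)
```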
[00:27:17] Unknown:
And then, for being able to ensure that you can use OpenTelemetry across different languages and across different service boundaries, how are you addressing the question of feature parity within the different SDKs, and how do you signal the feature completeness of a given implementation so that people can effectively evaluate them as they're building out and designing their systems and trying to gain useful observability metrics? Yeah. That's a great question.
[00:27:50] Unknown:
I can talk about it at a high level. You can think of there being three big moving parts: there's the API, there's the SDK, and then there's OTLP, which is the wire format for actually representing the trace and metric data. Once we have the API at, like, a 1.0, and I would even suggest that right now the API is pretty close to 1.0, then that API will most likely change very slowly. The SDK, you would expect, can move a little more quickly under that. But I would also suggest that OTLP, the wire protocol, will also probably change slowly.
So one of the goals of OpenTelemetry was to make sure the API and SDK were decoupled. As long as you're using the same API level on both sides, you could swap out individual components if you need to, assuming that you aren't breaking the API. And since you can independently upgrade the SDK components, OTLP is sort of the fallback, and I don't want to say fallback, but the default way to export data. You can do whatever you want in the middle, and the middle can move very quickly, but it needs to start out in a pretty slow way and it needs to end in a pretty slow way. So you could definitely see extensions or little side projects popping up that add some valuable feature, which would need to be marshaled into working through the API and then being exported through the existing OTLP format.
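One way to picture that decoupling: a library depends only on the opentelemetry-api package, and its calls are no-ops until the application installs an SDK behind them. A rough sketch, with hypothetical module and span names:

```python
# yourlibrary/client.py -- depends only on opentelemetry-api.
from opentelemetry import trace

_tracer = trace.get_tracer("yourlibrary")

def do_work() -> None:
    # With no SDK installed this yields a no-op span and costs almost nothing.
    with _tracer.start_as_current_span("yourlibrary.do_work"):
        ...

# app.py -- the application chooses the SDK, processors, and exporters.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(ConsoleSpanExporter())  # swap for an OTLP exporter in production
)
```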
[00:29:38] Unknown:
To get back to the issue around feature parity, I think the process we've taken to address feature parity across different languages is to basically create tracking issues for any spec changes and address them as they come in on a SIG-by-SIG basis, so for each language, basically. For example, before the beta was released, there was a list put together by the technical committee for OpenTelemetry of all the features that we knew we needed to implement for the beta to be complete. And so, basically, it was up to each SIG to then go off and ensure that the list was implemented within their own language. And anything that wasn't implemented, we would just go ahead and create an issue for and track it that way.
[00:30:17] Unknown:
For the most part, it's actually maybe less of a huge concern, because for the real basic primitives, things like context propagation, we're defaulting to the W3C Trace Context specification. And I know there's a bunch of other stuff upcoming from Trace Context, but the primitives are really well defined at this point. And as long as you're using the same sort of context propagation everywhere, the individual hops don't mean as much. Like, let's say I had five front-end services. As long as they're all using the same context propagation, then I'll get an unbroken trace. Right? I'll get an unbroken set of tags on my metrics, because that's all being sent around in a way everyone understands.
And there should be reasonable fallbacks. So if something new comes in, it goes into an additional field, and service B has been updated and understands it and the rest of them haven't, then you would obviously see that maybe something new is happening in service B that isn't happening everywhere else. But the basic functionality of tracing hasn't broken for you because you updated.
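For a sense of what that shared context looks like on the wire, the W3C Trace Context spec defines a traceparent header, and the Python propagation API injects and extracts it for you. A minimal sketch; the propagators module has been reorganized across releases, so the import path shown here matches current packages.

```python
from opentelemetry import trace
from opentelemetry.propagate import inject, extract

tracer = trace.get_tracer(__name__)

# Client side: inject the active span's context into outgoing HTTP headers.
with tracer.start_as_current_span("call-service-b"):
    headers = {}
    inject(headers)
    # headers now contains something like:
    #   traceparent: 00-<32 hex trace id>-<16 hex parent span id>-01
    # Send the request with these headers using your HTTP client of choice.

# Server side: extract the remote context and continue the same trace.
def handle(incoming_headers: dict) -> None:
    ctx = extract(incoming_headers)
    with tracer.start_as_current_span("handle-in-service-b", context=ctx):
        ...  # spans created here are children of the caller's span
```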
[00:31:30] Unknown:
And digging more into the specifics of Python and the SDK implementation of it, can you discuss a bit about how that is actually implemented, and some of the initial design decisions or assumptions that were made early on that had to be revised or reconsidered as you got further along in the implementation and the overall adoption of its use? Yeah. I guess I can't think of too many
[00:31:58] Unknown:
assumptions that have changed since we started seeing adoption. I think we might still be a little bit too early in the project, or any assumptions that were made and changed might predate my time on the project. But I guess one of the more recent changes that occurred was around the propagation of context, where, initially, we didn't have what we call the context API in OpenTelemetry. And so we did have to go back and find a way to separate out that context API from the implementation that we were using, and that was a fair amount of work. But I can't think of other ways that we really had to rework a bunch of the assumptions that we made originally. I actually can't either. Admittedly, I feel like
[00:32:51] Unknown:
most of my work is at the community level. I think some of that is actually just a credit to the people in the SIG, obviously, but also to Python itself: at a pretty basic level, it has all the features you really need. Here's a good example, I guess: in-process context propagation. If I have a process, and then I have a span, and then I have a function that I wanna trace independently, or create a span for independently, and I'm using multithreading of some sort, then I need something to keep track of, okay, what's the active span at any given point? And Python really just has all the tools you need to code that out of the box. You don't have to deal with manually passing stuff around like in Go, and you don't have the sort of wonky support story you have around that in JavaScript, where, since there are no threads, there are different ways to do this. So for the most part, Python has just chugged along very merrily while other SIGs were trying to glom everything together. And I think it's also partially just that Python's pretty popular, you know? OpenTracing and OpenCensus both had very good Python support.
And so there hasn't really been a lot of, oh, we need to invent this thing. It's more like, well, we've done this in the past, so we'll do it this way, but we'll make it more performant or whatever.
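To make Austin's point concrete, here is a small sketch of carrying the active span into a worker thread by hand using the context API. Python's contextvars machinery is what makes this straightforward, but a new thread does not inherit the caller's context automatically, so it gets attached explicitly.

```python
import threading
from opentelemetry import context, trace

tracer = trace.get_tracer(__name__)

def worker(parent_ctx: context.Context) -> None:
    # Attach the caller's context so spans started here parent correctly.
    token = context.attach(parent_ctx)
    try:
        with tracer.start_as_current_span("worker-task"):
            ...  # threaded work shows up as a child of parent-operation
    finally:
        context.detach(token)

with tracer.start_as_current_span("parent-operation"):
    # Snapshot the current context (which carries the active span)...
    ctx = context.get_current()
    # ...and hand it to the thread explicitly.
    t = threading.Thread(target=worker, args=(ctx,))
    t.start()
    t.join()
```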
[00:34:26] Unknown:
And then, for being able to gain compatibility and visibility into the broader ecosystem, I know that there are specific libraries for things like Django or Flask to auto-instrument the peculiarities of those frameworks. But what is the overall strategy or approach for being able to gain broader adoption
[00:34:48] Unknown:
within the overall ecosystem of web or data or, you know, just network systems, things like that? That's a really good question. So we started out by providing instrumentation for libraries that we know are fairly popular frameworks, so, like you mentioned, Django, Flask, and Requests. And what we started seeing is that there's interest from contributors and maintainers of other frameworks to build instrumentation libraries that are compatible with OpenTelemetry for their own projects and add to the list, which is actually really exciting, because those folks know the tools and the libraries inside and out. And as far as a scaling strategy, it's a lot more scalable to have folks that are involved in those projects directly contributing instrumentation than to have someone from the OpenTelemetry project go out and learn everything there is to know about any particular library. So I think, hopefully, that's kind of where we're going to be driving as OpenTelemetry gains popularity. And one thing that we've seen, for example in the .NET world, which I think Austin alluded to earlier, is that there are plans to adopt OpenTelemetry in the language runtime itself, which is kind of the best possible solution for adoption of any particular standard.
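As an example of what one of these per-framework integrations looks like from the user's side, here is roughly how the Flask instrumentation is applied. The package has been renamed over time (opentelemetry-ext-flask, later opentelemetry-instrumentation-flask), so the import below matches current releases and may differ from what existed at the time of this conversation.

```python
from flask import Flask
from opentelemetry.instrumentation.flask import FlaskInstrumentor

app = Flask(__name__)

# One call wires up the framework hooks: every incoming request gets a server
# span with route, method, and status attributes attached automatically.
FlaskInstrumentor().instrument_app(app)

@app.route("/hello")
def hello():
    return "hello"

if __name__ == "__main__":
    app.run()
```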
[00:36:09] Unknown:
Yeah. I think, as a strategy, really, our goal as a project is to provide stable interfaces, good-quality SDK design, and a support plan. Right? We need to be good shepherds and stewards of what we've built and give people the confidence that they can go and do these integrations, and that we will be able to support them through that by not boiling the sea every two months. I think beyond that, into this broader ecosystem, like you said, it's approaching and thinking about how we integrate this into more services, more managed resources, things like cloud providers and their SDKs, or make this available through APIs.
Let's say you used a database API or a GraphQL library to query something; maybe you could get the person that serves that API to also have OpenTelemetry traces and then be able to independently return those to you, so that you could actually inspect the performance of your request as it goes across the WAN into some other system. And then the same thing with, say, maybe you're running your own database. Right? Is there a way that we can get OpenTelemetry into MySQL or into Postgres or into Mongo? But I think that goes back to: the way that will happen is by running the project well, having it be stable, having good release management, and making it something other people can build on.
[00:37:58] Unknown:
And just one thing to add there: I think it's also important to make the libraries and the APIs as easy to use as possible for anyone who wants to provide that instrumentation, so that they don't have to spend a tremendous amount of time just learning how to use a particular tool in order to provide the benefits of that tool to their users. And so one of the things that we're spending some time on in the Python SIG is providing an interface for anyone who wants to provide auto-instrumentation for their library, through the base instrumentor class, and also providing examples of how some of the other libraries have actually been instrumented.
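For library authors, the interface Alex mentions looks roughly like the sketch below: subclass the base instrumentor and fill in the hook methods. The class and method names here match recent opentelemetry-instrumentation releases and may differ slightly from what the beta exposed; 'mylibrary' is a hypothetical target package.

```python
from opentelemetry import trace
from opentelemetry.instrumentation.instrumentor import BaseInstrumentor


class MyLibraryInstrumentor(BaseInstrumentor):
    """Sketch of an instrumentor for a hypothetical 'mylibrary' package."""

    def instrumentation_dependencies(self):
        # Which versions of the target library this instrumentation supports.
        return ["mylibrary >= 1.0"]

    def _instrument(self, **kwargs):
        tracer = trace.get_tracer(__name__, tracer_provider=kwargs.get("tracer_provider"))
        # Here you would patch or hook mylibrary's entry points so that each
        # call is wrapped in a span created from `tracer`.

    def _uninstrument(self, **kwargs):
        # Undo whatever patches _instrument applied.
        pass
```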
[00:38:42] Unknown:
And then once you have instrumented your application, you're generating all of these data objects and context for being able to understand what's happening in your system. You still need to have somewhere to send it to and perform analysis on it. So what are some of the existing options for that, and what are some of the ones that are upcoming that you are keeping an eye on? Yeah. It's a great question. So
[00:39:07] Unknown:
One of the things that I think OpenTelemetry really dramatically simplifies is the deployment of telemetry, through tools like the OpenTelemetry Collector and through a native format like OTLP. So what we're seeing is more organizations starting to adopt the OpenTelemetry protocol as kind of a native ingest format. I know at Lightstep, I don't know if we've publicly announced it yet, but it's there, so I'll talk about it: we'll be accepting OTLP-formatted data to our SaaS back end very soon. The work is done, and I think it's actually public right now, but we haven't updated the docs yet. In the bigger picture, I think having the OpenTelemetry Collector be this vendor-neutral way to aggregate trace and metric data from multiple different sources and then export it to a variety of back-end systems is really valuable, and some of those back ends will be open source. Jaeger has already come out and said, hey, we're gonna switch over to using the OpenTelemetry Collector instead of our own, and I would expect other open source analysis tools to start following. You also have the ability to write your own exporter, and we've seen a lot of adoption there, from companies like Google, Microsoft, Splunk, and Honeycomb.
Who else? There are probably a few I'm forgetting; Datadog has an exporter as well, I believe. But either way, the Collector makes it very easy to really have a separation of concerns between the people that are integrating OpenTelemetry into their code base and the people that are responsible for collecting all that data and sending it somewhere. Instead of having to redeploy your application for a config change, you can redeploy your Collector and say, okay, I want to send this to some other place now. That's very cool. And because the Collector is also completely open source, you can transform that data however you like. So one of the things that I'm keeping an eye on, that I've been talking about, is that you could convert your traces to analytic events and then send them to an analytics provider, or you can turn them into JSON and do whatever you want with them. You can put them in a big data system and do all sorts of fun queries. I think it's going to enable a lot of interesting tools that we haven't seen yet, things like using traces as part of automated testing to validate application behavior or logic flows. You can imagine a situation where maybe in test I have a ton of instrumentation about what's going on in my application, and maybe I turn some of that off in production or whatever.
But being able to take all of that out as a dump and then just do a diff against the last time we ran it to see what's changed: am I calling someone else? Is there a new external API? Did I pick up a new service dependency or whatnot? That sort of stuff, I think, is really cool.
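To make that hand-off concrete, here is a minimal sketch of an application exporting spans over OTLP to a Collector running locally. The exporter package and module path have moved around between releases, and the endpoint shown is just the conventional default gRPC port, so adjust both to whatever your versions document.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Point the SDK at a local OpenTelemetry Collector; the Collector's own config
# then decides where the data ultimately goes (Jaeger, a vendor back end, a file, ...).
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("demo"):
    pass  # switching back ends later means changing Collector config, not this code
```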
[00:42:31] Unknown:
And then, in terms of your experience of working with the OpenTelemetry community and working on the specification and some of the SDK implementations, what are some of the
[00:42:38] Unknown:
most interesting or unexpected or challenging lessons that you've learned in the process? I think for me, on the challenging side, it's just how hard it is to get collecting telemetry right, in a way that's both easy to use and useful to people. That's been really challenging. And I think the most pleasant surprise by far has been just watching so many different people from so many different organizations working together and trying to solve these really hard problems. It's been a really, really great experience. So if anybody's out there looking for an open source project to join, I think this is a great place to start. Yeah. I'll echo most of what Alex said there.
[00:43:18] Unknown:
I'll echo all of it, but I think there's maybe a perception issue in the open source community more broadly that vendor projects aren't good projects, or that they're not reflective of actual community values. And I think, in the case of OpenTelemetry, I would suggest this is maybe the counterexample where, yes, this is a very vendor-y project. Most people working on it are passionate about the subject, but they're also working for companies that are invested in it. Right? That said, for a project that is so impactful to the bottom lines of these companies, it's been fascinating to see how much people are just coming together for the good of the project. There hasn't really been a lot of bad blood.
There's not a lot of side-taking, like, oh, well, we think this way because our employer thinks this way. People are genuinely engaging in really good faith, and I think it's brought what is normally a pretty interesting industry together. And it's cool to see a bunch of people saying, hey, we work for different places, and our companies are competitors, but in here, we're all cool. We're all friends. We're all trying to solve this very tricky problem
[00:44:38] Unknown:
in a really good way. Yeah. And I really like the general industry trend that I've been seeing in a lot of different areas of technology of trying to standardize on the interfaces that are used for interoperability, so that the different implementations of open source or vendor technologies can innovate on the specifics of how they operate and the capabilities that they provide, rather than trying to lock people into their solution because they're not compatible with the import or export options of the other systems that you might want to compose together with them. Yeah. 100%.
[00:45:16] Unknown:
Yep. Yeah. There's nothing worse than having to learn a whole new set of tooling and languages just to, like, make something work with a particular vendor.
[00:45:24] Unknown:
And then in terms of the OpenTelemetry project itself, what are the cases where it's the wrong choice and somebody might be better suited just using a standard approach to metrics and logging for being able to gain understanding into their system?
[00:45:40] Unknown:
That's a tough 1. Is it never an option?
[00:45:43] Unknown:
It's definitely an option.
[00:45:44] Unknown:
I mean, it's an interesting question, right, because there are obviously times that it's the wrong choice. But that's less of a technical consideration than it is about whether you're in an organization that can handle and adapt to it. I think there are wide classes of applications that maybe don't benefit from it. For example, tracing has overhead. Not a ton of overhead, but it does. Tracing on the front end, especially, right now, will increase the amount of JavaScript your browser has to download. That can be a problem for people, and if it's a disqualifying problem for you, then, okay, use something else. Right? We'll try to fix it, we'll try to make it better, but I'm not out here saying it's perfect and if you don't use it you're an idiot. With a lot of, you know, maybe embedded devices or things like that, you're probably not gonna get the mileage.
Even if it's not the right answer, it's at least a good answer. And the one thing I would say, if you're listening to this, is that the time it's always the right answer is if you're thinking about trying to do it yourself. Right? If you've looked at your system and you said, I need distributed tracing, I need the sort of things OpenTelemetry can do, but I want to build it myself, then it is 100% the right answer to use OpenTelemetry. Because I can guarantee you that, collectively, we have put more thought and time into building OpenTelemetry than just one person can.
Right? Now, you don't have to use OpenTelemetry with Lightstep. You don't have to use it with Datadog. You don't have to use it with any specific vendor. Right? I'm not saying, oh, you have to pay us money. I am saying there's a cost for everything, and going your own way is going to hurt more in the long run
[00:47:51] Unknown:
than anything it saves upfront in terms of either time or money. Yeah. So I'll echo what Austin said there, but also, I think it's important, if you're looking at OpenTelemetry and you think, oh, it's not doing the thing that I want for my special case or whatever it is, to get involved with the community and actually bring that case up, because it's likely that other people have also run into the same problems or limitations or whatever it is that's preventing adoption of OpenTelemetry as it is. And so I think it's good to have that conversation and at least try to understand all those use cases. Even if it's not supported today, it doesn't mean that it won't be supported tomorrow. And I think that's something that we're always looking forward to. And so, as you look to the next steps in the near-to-medium-term future of OpenTelemetry,
[00:48:37] Unknown:
what do you have in store for it as far as plans and future capabilities? And what are the areas of contribution that are most vital as you continue down the path of completing the general availability road map and moving beyond that?
[00:48:53] Unknown:
I can speak to my side of things. At a big-picture level, I think: more integration, more auto-instrumentation, really trying to make it easy and fast for people to onboard and start using it and getting value from it. That's what I see this year. Right? Like, we've got the basics done, we're implementing metrics everywhere; now let's make that useful. In terms of what I would love to see people coming in and doing, if you're listening to this and you wanna get involved, the two things we really need are, one, user feedback, so people to actually try it out and use it and tell us what doesn't work and also tell us what works. But also, if you wanna go teach people about this, if you wanna educate people about this, we would love to have more people involved in the community, helping out, writing documentation, doing examples. It doesn't have to be on our website, on the OpenTelemetry site; it can be anywhere. Do it on dev.to or put it on YouTube or whatever. But if you are interested in that, we actually have resources at our website, opentelemetry.io. Under the documentation, we actually have a workshop there that you can adapt and use. Maybe you wanna do a lunch and learn at your company; you can pick up our slides and use those to teach other people OpenTelemetry, which I think is a great way to get involved even if you don't wanna really get down in the weeds on GitHub. And I think, specifically around Python,
[00:50:20] Unknown:
as you mentioned, Austin, reading the docs and trying out the examples. We have the OpenTelemetry Python Read the Docs site that is up there, and we would love to get some feedback on that. And from a contribution standpoint, we would love to see as many folks that are interested in implementing instrumentation for OpenTelemetry as possible come in the door and attend our SIG meetings or join us on Gitter. We're pretty accessible.
[00:50:47] Unknown:
Well, for anybody who wants to get in touch with either of you or follow along with the work that you're doing or contribute to OpenTelemetry, I'll have you add your preferred contact information to the show notes. And so, with that, I'll move us into the picks. This week, I'm going to choose the Pulumi project. I've started adopting it for managing my cloud resources, and even if you're running existing infrastructure and you wanna start converting over, it has great options for adopting existing resources. So definitely worth a look. And with that, I'll pass it to you, Austin. Do you have any picks this week? I've been
[00:51:28] Unknown:
using a lot of Helm 3 recently. I've been getting back into Kubernetes, and, you know, I tried Helm years ago, I guess, when it first came out, and the headaches of configuration and setup and Tiller and all that were extremely frustrating and made it a not-great experience. But with Helm 3, they really came around, and it's exactly what you want it to do. It templates things and it just works, and it's one binary and you don't have to install or configure stuff. It's great. I love it. If you've seen Helm before and been burned by Tiller or by RBAC or anything else,
[00:52:08] Unknown:
definitely recommend giving Helm 3 a shot. And, Alex, how about you? Yeah. So I guess my pick of the week is Algorithms to Live By: The Computer Science of Human Decisions by Brian Christian and Tom Griffiths. It's a great book talking about how algorithms can be used in everyday life, applying computer algorithms to how we make decisions
[00:52:33] Unknown:
as humans. It's actually a really great and entertaining read. Yeah, I enjoyed reading that one myself, so I'll second that pick. So I'd like to thank the both of you for taking the time today to join me and discuss the work you're doing with OpenTelemetry. It's definitely a very interesting project and one that I plan to start adopting for my own systems. So I appreciate all of the effort that you and everyone else involved have put into it, and I hope you enjoy the rest of your day. Great. Thanks for having us. Yeah. Thanks for having us. I'm looking forward to your feedback. Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast, at dataengineeringpodcast.com for the latest on modern data management.
And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction to the Podcast and Guests
Introducing Austin Parker and Alex Boten
Background and Involvement in OpenTelemetry
Defining Observability vs. Traditional Monitoring
Challenges in Capturing Observability Data
Cross-Language Support and Benefits
Data Collection and Instrumentation in Python
Ensuring Feature Parity Across SDKs
Implementation Details of Python SDK
Adoption and Integration in Broader Ecosystem
Options for Data Analysis and Export
Lessons Learned and Community Collaboration
When OpenTelemetry Might Not Be the Right Choice
Future Plans and Areas for Contribution
Closing Remarks and Picks