Summary
As your code scales beyond a trivial level of complexity and sophistication, it becomes difficult or impossible to know everything that it is doing. The flow of logic and data through your software, and which parts are taking the most time, are impossible to understand without help from your tools. VizTracer is the tool that you will turn to when you need to know all of the execution paths that are being exercised and which of those paths are the most expensive. In this episode Tian Gao explains why he created VizTracer and how you can use it to gain a deeper familiarity with the code that you are responsible for maintaining.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Need to automate your Python code in the cloud? Want to avoid the hassle of setting up and maintaining infrastructure? Shipyard is the premier orchestration platform built to help you quickly launch, monitor, and share Python workflows in a matter of minutes with 0 changes to your code. Shipyard provides powerful features like webhooks, error-handling, monitoring, automatic containerization, syncing with GitHub, and more. Plus, it comes with over 70 open-source, low-code templates to help you quickly build solutions with the tools you already use. Go to pythonpodcast.com/shipyard to get started automating with a free developer plan today!
- Your host as usual is Tobias Macey and today I’m interviewing Tian Gao about VizTracer, a low-overhead logging/debugging/profiling tool that can trace and visualize your Python code execution
Interview
- Introductions
- How did you get introduced to Python?
- Can you describe what VizTracer is and the story behind it?
- What are the main goals that you are focused on with VizTracer?
- What are some examples of the types of bugs that profiling can help diagnose?
- How does profiling work together with other debugging approaches? (e.g. logging, breakpoint debugging, etc.)
- There are a number of profiling utilities for Python. What feature or combination of features were missing that motivated you to create VizTracer?
- Can you describe how VizTracer is implemented?
- How have the design and goals changed since you started working on it?
- There are a number of styles of profiling, what was your process for deciding which approach to use?
- What are the most complex engineering tasks involved in building a profiling utility?
- Can you describe the process of using VizTracer to identify and debug errors and performance issues in a project?
- What are the options for using VizTracer in a production environment?
- What are the interfaces and extension points that you have built in to allow developers to customize VizTracer?
- What are some of the ways that you have used VizTracer while working on VizTracer?
- What are the most interesting, innovative, or unexpected ways that you have seen VizTracer used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on VizTracer?
- When is VizTracer the wrong choice?
- What do you have planned for the future of VizTracer?
Keep In Touch
- gaogaotiantian on GitHub
Picks
- Tobias
- Travelers show on Netflix
- Tian
- objprint
- Lincoln Lawyer
- bilibili – Tian’s coding sessions in Chinese
Closing Announcements
- Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. The Machine Learning Podcast helps you go from idea to production with machine learning.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
Links
- VizTracer
- Python cProfile
- Sampling Profiler
- Perfetto
- Coverage.py
- Python setprofile hook
- Circular Buffer
- Catapult Trace Viewer
- py-spy
- psutil
- gdb
- Flame graph
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So check out our friends over at Linode. With their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling powered by the battle tested Linode platform, including simple pricing, node balancers, 40 gigabit networking, and dedicated CPU and GPU instances. And now you can launch a managed MySQL, Postgres, or Mongo database cluster in minutes to keep your critical data safe with automated backups and failover. Go to pythonpodcast.com/linode today to get a $100 credit to try out their new database service, and don't forget to thank them for their continued support of this show.
Need to automate your Python code in the cloud? Want to avoid the hassle of setting up and maintaining infrastructure? Shipyard is the premier orchestration platform built to help you quickly launch, monitor, and share Python workflows in a matter of minutes with 0 changes to your code. Shipyard provides powerful features like webhooks, error handling, monitoring, automatic containerization, syncing with GitHub, and more. Plus, it comes with over 70 open source low code templates to help you quickly build solutions with the tools you already use. Go to pythonpodcast.com/shipyard today to get started automating with a free developer plan. Your host as usual is Tobias Macey. And today, I'm interviewing Tian Gao about VizTracer, a low overhead logging, debugging, and profiling tool that can trace and visualize your Python code execution. So, Tian, can you start by introducing yourself?
[00:01:44] Unknown:
Hi, everyone. My name is Tian, and I'm the author of VizTracer. I'm currently working for Microsoft. Very happy to be here. Thank you for inviting me. And do you remember how you first got started working with Python? I actually didn't start working with Python until after I started working, so I never used it in school. After I started working, our company's test system was written in Python, so I had to start writing some Python code. Then I got attached to it and started some small projects with Python: some bad projects first, then some better ones, and then some more open source projects, mostly tools in Python.
[00:02:27] Unknown:
And in terms of the VizTracer project, can you talk a bit more about what it is that you've built and some of the story behind why you wanted to spend your time creating it and the problem that you're trying to solve?
[00:02:38] Unknown:
A lot of people use VizTracer as a profiling tool, and I think it's more than that. It can be used for profiling, but I think of it more as a program execution visualization tool. So basically it visualizes how your program is executed. The idea behind it actually comes from a product at my previous company: we had a really, really cool debugging tool for C and C++ which is much, much more powerful than VizTracer. And the good thing is that you can visualize your program, different processes and different threads, on a timeline, which is a tool that was missing in Python. I mean, basically, when I was writing Python I was always like, oh, I wish I had this tool to see what my program is doing.
So I was thinking, okay, I'll just create one. And that's basically the background story for VizTracer.
[00:03:34] Unknown:
That's the story behind it. Now that you've started the project, you've been using it, and other people are using it to solve their own problems, what are some of the core areas of focus or the main goals that you have with the continued development and maintenance of VizTracer?
[00:03:49] Unknown:
Well, I'm currently maintaining VizTracer by myself, and, of course, there are other contributors who contribute code. Not a lot, but there are some. At least for now, I'm planning to keep maintaining it. And I do hope that more people start using it, because it's actually a new kind of tool. I mean, people are very familiar with the traditional profiler, but not with a tracer. So I realized that a lot of people are not familiar with the idea of a tracer and what a tracer can provide beyond what profiling tools can. That will probably help more people understand their code and use better tools to build their programs.
[00:04:34] Unknown:
And when somebody reaches for something like VizTracer or another profiling tool, what are some of the types of problems that they're generally trying to solve in terms of understanding how their program is operating and what it's doing?
[00:04:47] Unknown:
Normally, people only reach for profiling tools when there is a performance issue. So, basically, their code is running slow, and they don't know what happened. That's the most common case, and that's basically the only thing that a profiler can solve. VizTracer can obviously handle a lot of those scenarios, but it also provides a deeper look into your program, which will help you with debugging even logic issues. That, I think, is one thing that VizTracer can do much better than the traditional profilers. Like I said, people often find their code not fast enough. They want to know how to improve their code to be faster, and that is the most common reason people look for profilers.
[00:05:32] Unknown:
In terms of the overall space of profiling, there are a number of different utilities available. There are some that are built into Python. They all have slightly different areas of focus and different methods of actually performing the profiling. And I'm wondering if you can just talk to how you thought about the approach that you wanted to take with VizTracer in terms of how to actually execute the profiling and some of the, maybe, limitations that are imposed as a result of that particular approach. So, for instance, being able to profile the code execution as it moves across the boundary between Python and a C module, or something like that.
[00:06:04] Unknown:
So let's talk about the difference first. You mentioned that there are a lot of profiling utilities out there. For example, cProfile, which is probably used by a lot of people. And there's, of course, py-spy; there are a lot of sampling profilers. And VizTracer itself is actually its own kind of thing. Like I said, it's more like a tracer, not a profiler. I'll try to explain it like this. Say you are watching a movie, the Avengers. The traditional profiler basically tells you that, okay, Captain America was there for 28 minutes, Iron Man was there for 35 minutes, and Hawkeye was there for 20 minutes.
And you'll know, okay, Iron Man is the lead, so let's focus on him, right? That's what you do with normal profilers. Well, VizTracer basically shows you, okay, Iron Man appears on the timeline at 1 minute 20 seconds, and 10 seconds later Hawkeye joins, and then after a minute they all leave and then Captain America is there. So, basically, VizTracer has a timeline for everything, and that's how it is different from all the other profilers. As for the user experience, VizTracer is actually a lot like many of the profilers out there. What I prefer is to use the command line tool.
So instead of doing python your_script.py, you do viztracer your_script.py, and that's it. It will generate a JSON report, and you can open that report with another utility called VizViewer. You do vizviewer report.json and it will bring up a very fancy user interface, which is built on Perfetto, a trace viewer developed by Google. So you can see all the program execution on a web page. That's how users can use it. They can also use inline tracing, which basically means writing some code into your program; some people just prefer that because they don't need to use another command line tool, and that works with VizTracer as well. And the limitation of VizTracer basically ties to how it is different from the other profilers, because it logs so much information.
When you try to, say, profile a program that runs for a really long time, and the reason that it's slow is highly dependent on the user input, and you have to let the program run for an hour or two and look at the averaged profiling result, then VizTracer cannot do very well, because it has to log a lot of information and there's just not enough memory and disk space for that. That is a very significant limitation of VizTracer, and why you should probably go for some other profilers in that scenario.
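To make the command line and inline workflows described above concrete, here is a minimal sketch. It assumes a recent VizTracer release; the default report name (result.json) and the inline VizTracer context manager are written from memory of the project's documentation, so treat the details as illustrative rather than canonical.

```python
# Command line usage (run in a shell):
#   pip install viztracer
#   viztracer my_script.py      # runs the script and writes a trace report (result.json by default)
#   vizviewer result.json       # opens the Perfetto-based viewer in a browser
#
# Inline usage: trace only the region you care about.
from viztracer import VizTracer

def busy_work():
    return sum(i * i for i in range(100_000))

with VizTracer(output_file="inline_trace.json"):  # tracing starts and stops with the block
    busy_work()
# Afterwards, open the report with: vizviewer inline_trace.json
```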
[00:09:11] Unknown:
And in terms of the overall debugging flow, you mentioned that things like VizTracer and other profilers are generally focused on trying to understand what the performance impacts are on the overall runtime of my program, and how I figure out what to tune or what to optimize. I'm wondering if you can talk to some of the ways that people will integrate profiling into their overall debugging process and how they approach understanding what their program is doing and figuring out where they should spend their effort on maintenance or improvements.
[00:09:41] Unknown:
I think that is actually the strength of VizTracer, because there is a timeline in VizTracer. You can see all the call stacks, not the averaged or summarized ones. You can basically see that function A called function B at this time, then function B called function C, the function returned, and then function D was called. So you know your execution flow: how your program is executed, which function is executed, and when. But with the normal profilers, basically, you can only know that this function in this whole process has run for a minute or two, and that is normally not super helpful for logic debugging.
It's helpful for performance debugging, where you will know that this function costs a lot of time, but not for the real bugs. For integration, it depends. VizTracer has an overhead, and whether it is significant depends on how you look at it. The worst case is about 2x, which is probably not acceptable for production, but is acceptable for CI and development. So you can definitely turn VizTracer on by default in your CI or in your development, and that is what I sometimes do. And you can use VizTracer as a development tool just to take a look at your program execution when you feel there is a problem.
And some of the other profiling utilities you can probably use in your CI. So you have a specifically designed process for profiling with your benchmark. Every time you check in your code, you run it with some profiling tools and generate a result that is easily understandable by people, like which part has changed after I submit my code, just like coverage.py, something like that. That's my understanding of how profiling utilities fit into continuous integration.
[00:11:51] Unknown:
And digging a bit more into VizTracer itself, can you talk through some of the implementation, some of the types of data that it's actually generating, some of the different ways that you're able to use that to understand the overall execution flow of your program, and some of the different types of analysis that you're able to do as a result?
[00:12:14] Unknown:
The mechanics are actually quite simple. There is a hook in Python called setprofile, which is used by cProfile. That basically adds a function hook every time Python tries to call a function or every time a Python function returns. And, of course, to minimize overhead, I use the C version instead of the Python version. The Python version would introduce probably a 20x overhead, which is not acceptable. So the skeleton, the backbone of VizTracer, is what we call FEE: function entry and exit. That is normally the most important thing for all tracers. It logs all the function entries and exits with a timestamp, and then you can visualize those later. That is the most important thing, and that is what helps you understand what your code is doing, by seeing which function is executed and when.
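As a rough illustration of the hook Tian describes, here is a toy function entry/exit logger built on Python's sys.setprofile. VizTracer registers the equivalent C-level hook (PyEval_SetProfile) to keep the overhead low; this pure-Python version only sketches the FEE idea.

```python
import sys
import time

events = []  # (timestamp_ns, event, function_name)

def fee_hook(frame, event, arg):
    # setprofile reports 'call'/'return' for Python functions
    # and 'c_call'/'c_return'/'c_exception' for C functions.
    if event in ("call", "return"):
        events.append((time.perf_counter_ns(), event, frame.f_code.co_name))

def work():
    return sorted(range(1000), reverse=True)

sys.setprofile(fee_hook)   # install the hook
work()
sys.setprofile(None)       # remove it

for ts, ev, name in events:
    print(ts, ev, name)
```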
And, of course, there are other features of VizTracer that you can use besides function entry and exit. For example, you can log an arbitrary number. If you are doing machine learning, you probably want to log the cost function, logging the cost over time on the timeline, and you can see your call stacks and you can see how your cost has changed over time. That can be done in VizTracer, and you can log an arbitrary object in VizTracer just for debugging. Those are some of the things that you can log, and, of course, you can log an arbitrary event. For example, you think the granularity of a function is not what you want; you want to log a specific event here. It is not a function.
It is a couple of lines of code, but you still want to log it as if it were a function and track it; you can do that as well, basically creating your own event at some place in your code. So there are a lot of things you can, let's say, customize in VizTracer, and some of them do not even require code changes. You can use the VizTracer command line tool to log variables, like local variables and things like that, using regular expressions. VizTracer has some internal ways to handle the AST to make that happen.
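The number and object logging mentioned here is exposed through helper classes. The sketch below uses VizCounter and VizObject the way I remember them from the VizTracer documentation (constructed with the tracer instance and a name, with attribute assignment emitting an event on the timeline); double-check the current docs before relying on the exact signatures.

```python
from viztracer import VizTracer, VizCounter, VizObject

tracer = VizTracer(output_file="training_trace.json")
tracer.start()

# A counter track for an arbitrary number, e.g. a training loss over time.
loss_counter = VizCounter(tracer, "training loss")
# An object track for arbitrary debugging snapshots.
state = VizObject(tracer, "model state")

for epoch in range(5):
    loss = 1.0 / (epoch + 1)      # stand-in for a real cost function
    loss_counter.loss = loss      # attribute assignment records the value on the timeline
    state.epoch = epoch           # same idea for object attributes

tracer.stop()
tracer.save()
```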
[00:14:34] Unknown:
As far as the overall design and approach that you've taken with VizTracer, I'm wondering what are some of the ways that it has changed or evolved as you started working on it, digging deeper into the problem, and using it for more of your own work? In the beginning, when I was prototyping, I was definitely using, like I said, the setprofile function in Python. I was definitely using that. And after I finished the proof of concept,
[00:14:59] Unknown:
I realized that the overhead was not acceptable for any serious development. So I moved to C, and that was one major change for VizTracer. And, of course, how the data that I log is stored. Initially, I thought, okay, I'll just keep logging data indefinitely, and then I ran out of memory really fast. Then I moved to a circular buffer so that you store some of the latest data and just throw out the old data, so you do not crash your computer or run out of memory while you are using VizTracer. Those are the things I can think of that are related to a major technical change, I think.
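The circular-buffer change is a general technique rather than anything VizTracer-specific; a minimal Python sketch of "keep only the newest N entries, drop the oldest" looks like this.

```python
from collections import deque

MAX_ENTRIES = 5                      # buffer capacity
ring = deque(maxlen=MAX_ENTRIES)     # a deque silently discards the oldest item when full

for i in range(12):
    ring.append(("call", f"func_{i}"))

# Only the 5 most recent events survive, so memory stays bounded
# no matter how long the traced program runs.
print(list(ring))                    # func_7 .. func_11
```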
[00:15:45] Unknown:
As you have been building this project, what are some of the most challenging or complex engineering tasks that you've had to do to be able to actually get VizTracer working the way that you wanted it to, or maybe some of the ways that you've used VizTracer to help yourself debug what was going on with VizTracer?
[00:16:02] Unknown:
The technical part mostly lies in the parts that are not Python. First of all, the C extension is much more difficult to develop than Python code. You have to do the object reference counting by yourself, and you will get a lot of segfaults. You have to constantly check the documentation for which Python C API returns a borrowed reference and which one returns a new reference. So that is definitely more of a pain than developing Python code directly. And the other part would be the overall overhead. Because this is a hook for the function calls and function returns, if you want to minimize overhead, there are a lot of things to do, you know, to delay any unnecessary calculation as late as possible.
That's another piece. Besides C, there is also some JavaScript development. Like I said, I use Perfetto as the front end of VizTracer, but I also had to customize Perfetto to achieve some of the things that I wanted to do. For example, showing the source code of each function when you click on that function. That part I had to do in JavaScript, which is not a language that I'm very proficient in. That's another technical issue that I had to face. I did use VizTracer to help develop VizTracer, but not the tracer itself, because the Python mechanics only allow one function to be hooked on setprofile. I use VizTracer to debug VizViewer, which is the utility that brings up an HTTP server, hosts Perfetto, and shows the user whatever they want. If there are bugs in VizViewer, or if there are performance issues with VizViewer, I often use VizTracer to trace it and see what happens there. And that's why there is a trace self argument in VizTracer that is specifically designed for VizViewer.
[00:18:11] Unknown:
As far as the visualization element, you mentioned that you're using Perfetto as the kind of building block for that. And I'm curious how you thought about the way that you wanted to visualize the information that you're collecting with VizTracer and just some of the types of information that you wanted to be able to convey when somebody just looks at that graph that's generated.
[00:18:33] Unknown:
Like I said, the original idea came from my previous company's product. That's how our product did it. I mean, it did a lot of other stuff, of course, but one of the important things is the function entry and exit. And I think that is critical, probably the most important thing to show in a visualization. I did not land on Perfetto in the beginning. I tried to write my own user interface, which I soon realized was too difficult. Even for a small trace log, there are a lot of things to consider, and it's not easy to write a good UI, especially since I'm not a good front end developer. So I moved to what's called the Catapult Trace Viewer, which is what Chrome is using.
And it's also developed by Google. But I believe it is deprecated, because Perfetto came later. So then I moved to Perfetto, which was also a pain to migrate to, but I think that is the future. The good thing about Perfetto is that it has so many optimizations inside that it can visualize gigabyte-level trace logs, and that is a lot of information, which cannot be achieved by an individual developer. I think for trace visualization, Perfetto is the best tool on the market, if not the only tool.
[00:20:04] Unknown:
In terms of the actual usage of VizTracer, you mentioned that the overhead is significant enough that you wouldn't really want to use it in a production environment, but the quandary is that a lot of times production is the place where the interesting bugs really happen, and it's hard to necessarily replicate that in a nonproduction environment. So I'm curious what are some of the ways that you've tried to cut down on some of that overhead, or some of the tuning parameters that are available to reduce the overall overhead it introduces, for being able to run it in a production environment or at least, you know, adjacent to a production environment?
[00:20:41] Unknown:
The first thing I want to say is that at least 95% of bugs are found in the development phase, not in the production phase. Of course, some hard bugs only happen in production, and that's a rare case where, of course, you need good tools. But a good tool for the development phase is very, very helpful, even if it's not usable in the production phase. The other thing is that you can actually fine tune VizTracer to make the overhead much, much smaller. There are a lot of parameters that you can tune. For example, there is the maximum stack depth that you want VizTracer to log. You can set that smaller so that you do not log very deep function calls and exits. And there is a minimum duration filter so that you do not log functions that are too short; you only log functions that run long enough.
And by fine tuning the parameters of VizTracer, you can actually reduce the overhead to an extent that does not affect your production code that much. The reason that I held back about using VizTracer in real production is not only the performance. It's because VizTracer is an intrusive profiler, which means it will run extra code in your program, unlike some of the other profilers, say py-spy, which have minimal impact on your program. cProfile is the same kind of thing: it is intrusive, which means it needs instrumentation, and it needs to run extra code in your production environment. It's not about overhead. It's more about safety and reliability. So that is why VizTracer, at this point, probably should not be used in production code. I mean, even for some of the very popular packages or libraries, say coverage.py, which is probably used in a lot of places, I don't think anyone should use coverage.py in their production code, even though it's helpful for their debugging. You should use it in your CI. You should use it in your development phase, but not in your production code. Yes, debugging your production code is difficult, of course, and you need logging tools for that. But most of the bugs, most of the time, happen in the development phase, not production. That's my opinion.
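To make the tuning knobs concrete, here is a sketch that caps the logged stack depth. The parameter and flag names (max_stack_depth, --min_duration) are written from memory of VizTracer's options and should be verified against viztracer --help for your version.

```python
# Assumed command line form (check `viztracer --help` for exact spellings and units):
#   viztracer --max_stack_depth 10 --min_duration 0.5ms my_script.py
from viztracer import VizTracer

def deep(n):
    return n if n == 0 else deep(n - 1)

# Ignore frames deeper than 10 levels so very deep call chains
# do not blow up the size of the report.
with VizTracer(output_file="tuned_trace.json", max_stack_depth=10):
    deep(50)
```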
[00:23:16] Unknown:
No. Definitely all valid points. And so for people who are using VizTracer in their development cycle on their local machines or integrating it into their CI environments, can you just talk to some of the typical approaches that you and others use for actually making that part of their day to day work?
[00:23:35] Unknown:
Like I said, using VizTracer is extremely easy. That is by design. Instead of python your_code.py, you just do viztracer your_code.py, and it will log to a file which can be opened on any computer. So, basically, you can copy that file to other environments and it works fine; you just use VizViewer to open it. That is good because it can be saved as an artifact. So in your CI, you can probably do viztracer some_code.py. It will generate a log, and you can download that and see what happened on your local computer. That is helpful. And you can also easily script VizTracer just like you script your other libraries. For example, in my CI, I use unittest, so I need to do python -m unittest to run the tests.
And if you want to log some things using VizTracer, you just run the same thing through viztracer instead, and it will log the trace and you can open it later. I think that piece is extremely easy, and people should try it. It's not hard, and it's super fast to set up.
[00:24:45] Unknown:
As far as extending VizTracer, I know that it has a plug-in interface, and I'm wondering if you can talk to just some of the types of extension and integration that people are able to do through that interface and some of the maybe interesting examples of plug-ins that you and others have built. Yes. VizTracer has a plugin interface. It's called vizplugin.
[00:25:06] Unknown:
I developed some of the plugins myself, but I have not been developing them for a while. One of the reasons is, of course, that not a lot of people are asking for extra features that are not supposed to be in the core VizTracer. The reason that I built the plugin interface is that I do not want any dependencies for VizTracer, any dependencies that I cannot control. I mean, currently, VizTracer has one dependency, which is also my project, so I can control it. But for some plugins, for example, the official plugin that now supports CPU usage sampling, that is done with psutil.
So if you want to do that, you have to have psutil as a dependency, and that is not something that I want in the core VizTracer. That's why I developed the plugin interface. Basically, there are a couple of hooks for any plugins to hook on to. I think before VizTracer starts, it can send out a message, and after it stops collection, before it saves the report, and after it saves the report. I believe there are some entry points in VizTracer that you can hook into, and you basically get the full VizTracer instance and can do whatever you want, just like developing VizTracer itself, if you want to. An interesting story, though: even though, personally, I do not use this plugin, and for a while I didn't think anyone was using it, there was a paper published that is using this plugin, and only this plugin, for their project. They used VizTracer's plugin to visualize the CPU and memory usage through time, where they need a timeline, and VizTracer provides that. They do not need any FEEs or call stacks.
So I checked their original code. They actually turned off the call stacks. They only needed the feature from the plugin. I think that was pretty funny, because that is not what VizTracer is designed for. But, somehow, it was used that way to publish a paper.
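For readers curious what such a plugin looks like, below is a heavily hedged sketch. The base class VizPluginBase and the support_version/message hooks are reproduced from memory of the vizplugin documentation and should be treated as assumptions; consult the current plugin docs for the real contract.

```python
# Sketch only: class and method names are assumptions based on recollection
# of viztracer's plugin interface, not a verified API.
from viztracer.vizplugin import VizPluginBase


class HeartbeatPlugin(VizPluginBase):
    """Reacts to lifecycle messages that the tracer sends to its plugins."""

    def support_version(self):
        # Declare which plugin-protocol version this plugin targets (assumed hook).
        return "0.15.0"

    def message(self, m_type, payload):
        # VizTracer notifies plugins around events such as start, stop, and save,
        # so a plugin can sample extra data (CPU, memory, ...) at those points.
        if m_type == "event" and payload.get("when") == "pre-start":
            print("tracer is about to start")
        return {}
```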
[00:27:30] Unknown:
Yeah. It's always funny what kinds of things people will use a given tool for just because it's the thing that they have near to hand, and they don't want to have to go figure out what's actually the right tool for the job. Exactly. And to your point of being able to pull in things like the CPU information, I can imagine the plug-in interface being used, for example, in debugging a Django application where you want to also bring in information about the IO metrics or CPU time metrics for your Postgres database during the function execution of a given call stack in a Django application.
I'm just wondering if that's something that you or anyone else has tried working with or just some of the other types of information that you're able to bring in to enrich the call stack information.
[00:28:16] Unknown:
Well, actually, you probably do not need the plugin to do that. You can just tune VizTracer. Like I said, VizTracer has the ability to log certain stuff with the proper setup. It's not super intuitive, but you can learn it if you want, and you can instrument your code to log a specific IO call to, say, your database, or some other IO functions. You can do that as well. And there is a log_sparse feature, which is very helpful for long-running programs. That basically means: do not log any function calls and returns unless explicitly told to do so. That can be used for, say, Django or Flask, for all the IO stuff, especially if you want to use some decorators.
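Here is a sketch of the log_sparse mode just described, using the log_sparse decorator and the --log_sparse flag as I recall them from the VizTracer docs; verify the exact invocation for your version.

```python
from viztracer import log_sparse

@log_sparse                 # only explicitly marked functions are logged in sparse mode
def handle_request(payload):
    return {"ok": True, "size": len(payload)}

def serve_many_requests():
    for i in range(1000):
        handle_request("x" * i)

if __name__ == "__main__":
    serve_many_requests()

# Assumed invocation:
#   viztracer --log_sparse this_script.py
# Nothing else is traced, which keeps long-running web apps (Django, Flask, ...)
# from flooding the report with entries.
```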
And there is another cool feature that I developed recently, which I have not seen anyone use, because, I mean, people will always have second thoughts if they have to change their code to use some feature. That's a thing people are afraid of. They do not want to change their code, especially if they have to revert it, and they are afraid of side effects in their production code. So VizTracer has the ability that you can simply add Python comments to your code, which do nothing in your production code unless you use VizTracer.
VizTracer will turn that comment into real Python code when it compiles your file. But if you run that program without VizTracer, it's just a normal Python comment. And for the people who are extremely careful about adding code to their code base, that is probably the thing to go for.
[00:30:09] Unknown:
It's interesting that you bring that up, because that was one of the things I had actually meant to ask you about earlier but forgot to. So being able to actually add that instrumentation directly into your code, but then being able to disable it at, you know, deploy time, for instance, that's definitely great that you have that capability in there. Yeah. So there are different ways to disable VizTracer. For example, the log_sparse feature that I just mentioned will disable itself if VizTracer is not attached. But the thing that I mentioned is on another level, because,
[00:30:45] Unknown:
say, if you write some code that claims to be able to disable itself when VizTracer is not used, or in your production mode, you probably have second thoughts, like, will it do what it claims? I can probably take a look at the source code, and if it's simple enough, then I'll trust it. But a comment is another thing. You are sure that the Python interpreter will not run any of your comments. So in your production code, as long as you do not use VizTracer to execute it, you can be 100% sure that a comment will not affect your code on any level. But with VizTracer, it can magically turn that comment into some meaningful code and start logging the stuff that you want to log.
That is a thing that I'm kind of proud of, but no one has used it yet.
[00:31:37] Unknown:
And you mentioned the example of somebody who used the plug in interface for being able to help with their work on writing a particular paper. I'm wondering what are some of the other interesting or innovative or unexpected ways that you've seen VizTracer used?
[00:31:52] Unknown:
I actually developed a lot of features for VizTracer that no one is using. I often just think, oh, it would be nice if VizTracer had this feature, and I develop it and I test it, and no one ever uses it. People normally just want the basic features, and if your user base is not large enough, then no one is using the advanced features. But I developed the attach feature for VizTracer. Basically, you can run your program with VizTracer installed, and VizTracer will not trace anything until your process is attached by another VizTracer command. When I developed it, I was like, okay, this is a super cool feature that you can attach to a process and do all the collection, but no one used it.
Until, a couple of months later, I realized that Microsoft is using it in their Python App Service. So it's kind of an official debugging tool from Microsoft that uses VizTracer to attach to a Python App Service process and then download logs. That is the first instance I found of someone using the attach feature. That's definitely very cool. Yeah. I was inspired by that, actually. So after that, I improved the attach part. Originally, you had to install VizTracer in your original process to enable VizTracer logging. After that, I tried another approach of injecting code using GDB, so that you do not need to install VizTracer in your original process at all. You can run your original process as it is, attach VizTracer to it, and dump logs to see what happened after you attached. And you can detach just like nothing ever happened.
[00:33:51] Unknown:
That's definitely an interesting capability. And in terms of your own experience of building VizTracer, maintaining it, using it for your own work, and supporting people who are using the open source project, what are some of the most interesting or unexpected or challenging lessons you've learned in the process?
[00:34:07] Unknown:
Developing an open source project is much more than programming. That is my experience from working on an open source project. I wrote a lot of code before. I have a lot of repositories. I have developed projects, but not like this. An open source project is not only about the project itself. It's not only about technology, especially as an individual developer. Like, no one knows who you are, and no one knows what you've been doing. So you have to write a good readme file. You have to do your documentation, and you have to build a super easy to use interface, because with every step a user has to go through to run your project, you probably lose 10% of users.
And the most difficult thing is to tell people, hey, I've got a really cool project here, want to try it? That is the part that I've never experienced before as a programmer, as a software developer. And I had to do this as a complete newbie: you know, go to forums, all the forums on the Internet. I had to send out emails, build up a Twitter account, submit some articles to some big websites, and even answer questions on Stack Overflow. I think that is the most interesting and important lesson I had while developing an open source project, that it is more than the project itself.
[00:35:55] Unknown:
Absolutely. And so for people who are looking to be able to debug their program or understand the execution flow and learn more about it, what are the cases where VizTracer is the wrong choice?
[00:36:08] Unknown:
If you are debugging or profiling a program that has to run for a really long time, say hours, then VizTracer is definitely a bad choice. That is for sure. If you only care about which function took the most time during the run, VizTracer is probably not the best choice. It is an acceptable choice, but you probably have better choices for that. VizTracer is probably a fantastic choice if you are trying to debug or profile a multiprocess or multithreaded program. That is one of the strengths of VizTracer. And another thing that I want to mention is that sometimes people think they need to run a program for a really long time to understand what is slow.
Because, say, if they are using a sampling profiler and the sample rate is at the millisecond level, and their code probably just runs for one millisecond, then they have to repeat it, do a for loop or a while loop and repeat it a thousand times, and use a profiler to see what is slow. But with VizTracer, that's not the case. You can just run your program once, because it has nanosecond-level granularity, and it can basically show you what happens, and you can see, okay, which function is slow. So that is probably a good case to use VizTracer: your program actually runs really fast,
but for your use case, you still want to figure out which function takes most of the time.
[00:37:46] Unknown:
Your note about the multithread, multiprocess visibility is something that we forgot to dig into. So I'm wondering if you can talk a bit more about some of the ways that you built in that capability to be able to actually manage the subprocesses and the threads, and how you think about presenting that information in the visualization: you know, is it important to actually segment out which processes and which threads are calling which functions, and how that factors into the overall execution flow and timing information?
[00:38:16] Unknown:
I always thought that multiprocessing and multithread support is the killer feature of VizTracer. When people start developing multiprocess or multithreaded programs, it's very difficult for them to understand the synchronization. They can probably use other profilers to see, okay, which function takes the longest, but why? Sometimes processes are waiting for their subprocesses to give them useful information, or the time is spent on IO, and you just don't know which piece is waiting, how much time it's waiting, and which process is waiting. A fundamental feature of VizTracer is the timeline. With a timeline, you can synchronize processes. You can know, at a specific timestamp, which threads and which processes are doing what.
That's the killer feature. And with that, you can not only profile your program, but understand your multiprocess program and solve problems like deadlock. Why is it deadlocked? What is it waiting for? Why is the process waiting that long? Which process is it waiting for? So if you are working on multiprocess problems, I highly recommend at least trying VizTracer and seeing how much information it can generate for you. It could be beyond your imagination. As for how, because I believe this is the killer feature, I actually put a lot of effort into making it happen. The quick version is to hook the subprocess and multiprocessing modules.
Basically, hook every module that could possibly be used to spawn a new process. Threads are actually simple. Python provides a threading.setprofile function so that you can hook the threads. Threads are very simple. But for processes, basically, you need to start the new process with VizTracer. So there are different strategies for the subprocess module, the multiprocessing module, and plain os.fork. You need different strategies, but the concept is always how to start VizTracer before your code starts running. That requires an understanding of how the subprocess module, Popen, and multiprocessing.Process work. As long as you understand the built-in libraries, you can know which part you should hook into to make this happen, and how to collect all the information from the subprocesses into your main process and then dump it out as a full log.
But thanks to Perfetto, the only thing I need to do is to label which function is from which process and which thread. In the trace event format, you always need a process ID and a thread ID with your FEE (function entry and exit) events. Perfetto will take care of the rest and visualize them correctly, let's say on a row that represents a thread in a process.
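The thread hook Tian calls simple is the standard library's threading.setprofile; a toy per-thread function logger looks like the sketch below. VizTracer's real implementation does this at the C level and layers the subprocess and multiprocessing handling on top.

```python
import sys
import threading

def make_hook(label):
    def hook(frame, event, arg):
        # Report only Python-level function entries and exits.
        if event in ("call", "return"):
            print(f"[{label}] {event}: {frame.f_code.co_name}")
    return hook

def worker():
    sum(range(10))

# Hook functions for threads started *after* this call...
threading.setprofile(make_hook("worker-thread"))
# ...and for the main thread via sys.setprofile.
sys.setprofile(make_hook("main-thread"))

t = threading.Thread(target=worker)
t.start()
t.join()

sys.setprofile(None)
threading.setprofile(None)
```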
[00:41:36] Unknown:
Yeah. I can definitely imagine having that visibility in a case where you're dealing with deadlock and fighting with trying to figure out what's actually happening at any given point in time. It's probably saved you and many other people days of tearing their hair out. Yeah. Definitely. I mean, multiprocessing,
[00:41:59] Unknown:
it's much, much nastier than single process code, and there are just a lot of things that people will not easily imagine, and it's less reproducible than single process code. With VizTracer, you only need to reproduce it once, and you have the log, and you can examine the log instead of, like, setting up breakpoints in different processes or adding logging in different processes
[00:42:22] Unknown:
and, you know, things could get ugly. It's like the joke: why did the multithreaded chicken cross the road? "To get to other side the." Right. And so as you continue working on VizTracer and using it for your own purposes, what are some of the things you have planned for the near to medium term, or any particular features or bugs that you're excited to dig into?
[00:42:44] Unknown:
Yeah. I'm actually developing a new feature that I think will be helpful for CI or, like, for continuous development usage. I am writing a compression module for VizTracer logs. Currently, VizTracer logs are huge. If not multiple hundreds of megabytes, it's probably a gigabyte or so for large logs. And a lot of the information is redundant. That redundancy is required by the trace event format used by Google: if you want to use Perfetto, you have to follow their specification. But I'm planning to write a compression module specifically designed for VizTracer so that you can compress the logs to a reasonable size.
I'm thinking about a 90 to 95% compression rate. With that, you can easily save a compressed log on a remote machine and download it, or send that log to your colleagues, your coworkers, or your friends without worrying about the bandwidth.
[00:43:59] Unknown:
Are there any other aspects of the VizTracer project, or tracing and profiling in general, that we didn't discuss yet that you'd like to cover before we close out the show? I think we covered most of the topics that I wanted to talk about, but I do want to
[00:44:13] Unknown:
emphasize one thing that I mentioned before: if you can, try to see what the difference is between a flame graph and a trace graph. They look very similar, but they present quite different amounts of information. That is one thing that I really wanted to emphasize; people sometimes are not aware that VizTracer does not produce a flame graph. It produces a trace graph.
[00:44:43] Unknown:
Yeah. It's definitely a good thing to note and something that I hadn't really noticed while I was preparing for the show, so thanks for calling that out. For anybody who wants to get in touch with you and follow along with the work that you're doing or contribute to VizTracer, I'll have you add your preferred contact information to the show notes. And so with that, I'll move us into the picks. And this week, I'm going to choose a show I started watching recently called Travelers on Netflix. It's an interesting sci-fi show where people's consciousnesses are being projected backwards in time into the bodies of hosts who would otherwise have died at different points, and they're, you know, going on different missions to try and prevent various catastrophes. It's an interesting concept and pretty good execution, so it's a fun show to watch. And so with that, I'll pass it to you, Tian. Do you have any picks this week? I have a couple of things that I prepared. I'm not sure if they're suitable, so it's up to you to decide. First of all, I want to introduce another of my projects that I actually use more than VizTracer. It's called objprint. It's also on
[00:45:46] Unknown:
my GitHub, and it basically prints objects in a human-readable form, which is very helpful if you want to do some lightweight debugging. As you know, the way that Python prints objects is horrible and not informative. It just lets you know, okay, this is an object of some class at this specific location, which is not helpful at all. objprint solves that problem. That's one thing. And on the entertainment side, I actually really love the show The Lincoln Lawyer on Netflix. I've watched a lot of shows on Netflix and actually prefer their miniseries, where they can wrap a story in a single season, or at least most of the story in a single season. I think most of them are really good, and The Lincoln Lawyer is one of the recent ones that I liked a lot.
And the third thing is that you probably do not know, but I am also a video uploader, if that's the right term. I make videos about Python and I sometimes do streaming, but it's on a Chinese website, and it is purely in Chinese. I actually talked to some of my friends and some people who are also developing in Python, and they said they've been listening to your show. So if you can understand Chinese somehow, you can follow me on Bilibili; my username is Manong Gaotian, and I'll send the details to Tobias because I don't think he can type the Chinese. But, yeah, that's a couple of things that I wanted to mention.
[00:47:26] Unknown:
Definitely. So thank you very much for taking the time today to join me and for sharing the work that you've been doing on VizTracer. It's definitely a very interesting project and one that I will likely be turning to more often to help debug some of the issues I run into in my own programs. So I definitely appreciate all the time and energy you've put into making that utility available, and I hope you enjoy the rest of your day. Thank you. Have a good time. Thank you for inviting me.
[00:47:51] Unknown:
Thank you for listening. Don't forget to check out our other shows, the Data Engineering Podcast, which covers the latest on modern data management, and the Machine Learning Podcast, which helps you go from idea to production with machine learning. Visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you learned something or tried out a project from the show, then tell us about it. Email hosts@pythonpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Introduction and Sponsor Messages
Interview with Tian Gao: Introduction and Background
The Story Behind VizTracer
Core Areas of Focus for VizTracer
Common Problems Solved by Profiling Tools
Approach and Limitations of VizTracer
Integrating Profiling into Debugging Process
Implementation and Data Analysis in VizTracer
Evolution and Technical Challenges of VizTracer
Using VizTracer in Production Environments
Integrating VizTracer into Development and CI
Extending VizTracer with Plugins
Advanced Features and Instrumentation
Unexpected Uses and Attach Feature
Lessons Learned from Developing VizTracer
When VizTracer is the Wrong Choice
Profiling Multiprocess and Multithread Programs
Future Plans for VizTracer
Closing Remarks and Picks