Summary
We write tests to make sure that our code is correct, but how do you make sure the tests are correct? This week Ned Batchelder explains how coverage.py fills that need, how he became the maintainer, and how it works under the hood.
Preface
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- I would like to thank everyone who supports us on Patreon. Your contributions help to make the show sustainable.
- When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at www.podcastinit.com/linode?utm_source=rss&utm_medium=rss and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app.
- Visit the site to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch.
- To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
- Your host as usual is Tobias Macey and today I’m interviewing Ned Batchelder about coverage.py, the ubiquitous tool for measuring your test coverage.
Interview
- Introductions
- How did you get introduced to Python?
- What is coverage.py and how did you get involved with the project?
- The coverage project has become the de facto standard for measuring test coverage in Python. Why do you think that is?
- What is the utility of measuring test coverage?
- What are the downsides to measuring test coverage?
- One of the notable capabilities that was introduced recently was the plugin for measuring coverage of Django templates. Why is that an important capability and how did you manage to make that work?
- How does coverage conduct its measurements and how has that algorithm evolved since you first started work on it?
- What are the most challenging aspects of building and maintaining coverage.py?
- While I was looking at the bug tracker I was struck by the vast array of contexts in which coverage is used. Do you find it overwhelming trying to support so many operating systems and Python implementations?
- What might be added to coverage in the future?
Keep In Touch
Picks
- Tobias
- Ned
Links
- edX
- Lotus Notes
- Zope
- Coverage.py
- Gareth Rees
- Trace in stdlib
- Fig Leaf
- State Machines
- CodeCov
- Coveralls
- Cobertura
- Turing Completeness
- Django Templates
- Jinja2
- Mako
- Hy-lang
- GCov
- Jython
- Code Triage Service
- Who Tests What
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. I would like to thank everyone who supports us on Patreon. Your contributions help to make the show sustainable. When you're ready to launch your next project, you'll need somewhere to deploy it, so you should check out Linode at www.podcastinit.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your app or experimenting with something that you hear about on the show. You can visit the site at www.podcastinit.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. To help other people find the show, please leave a review on iTunes or Google Play Music, tell your friends and coworkers, and share it on social media. Your host as usual is Tobias Macey. And today, I'm interviewing Ned Batchelder about his work on coverage.py, the ubiquitous tool for measuring your test coverage. So, Ned, for anybody who doesn't know who you are, could you please introduce yourself? Hi. So I'm Ned, as you said. I've been active in the Python community
[00:01:08] Unknown:
for a long time, almost 20 years maybe. And I've been the maintainer of coverage.py for the last 11 or 12 years now, a long time. I work at edX on online education from MIT, Harvard, and lots of other great universities, and it's all an open source project. I'm really happy to be able to work on open source for my day job. And it's all in Python, so I'm really happy about that too, because it's basically the only skill I've got.
[00:01:39] Unknown:
And do you remember how you first got introduced to Python?
[00:01:43] Unknown:
Yeah. Years ago, I was working at Lotus on Lotus Notes, and someone told me that there was this project called Zope that had some interesting similarities to Lotus Notes. And so I looked into Zope, and I thought, well, Zope, it's kinda interesting. It's a little unusual. It's not really for me, but this language it's written in, Python, looks really interesting. And from that point on, I was using it for whatever tooling and side projects I could, and, eventually, I got to use it in my day job. And I've been doing all Python for the last 15 years or so.
[00:02:17] Unknown:
So for coverage dot py, I'm wondering if you can just give a high level overview of what it is and how you first got involved with the project and ended up taking over the maintainership.
[00:02:27] Unknown:
Sure. So first, good on you for knowing that I didn't write it to begin with. That's something that most people don't realize. Coverage.py, to begin with, is a tool which can observe your program being run, usually a test runner, and then it can tell you what parts of your code got executed and what parts of your code didn't get executed. That's it in a nutshell. In some ways, it's a very, very simple proposition, although fulfilling that promise is a lot more complicated. So coverage.py is a way that you can understand what parts of your code are running and what parts aren't. And when you're running your tests, that means that you can tell what parts of your code the tests are running, which is a very valuable thing to be able to understand.
I first got involved with coverage.py, I think, in 2004. I was working on some Python project. I forget now what it was. It might have been an ad hoc SQL query interface where you could be a little bit flexible with SQL syntax, like having two WHERE clauses and it wouldn't mind. And I had some tests for it, and I wanted to know how my tests were doing, and I looked around to see how I could measure coverage on a Python program. And I found this simple little one-file project called coverage.py. And it worked pretty well, but I wanted to improve it a bit, and I wrote to the maintainer, to the author, Gareth Rees, and I didn't hear back from him. It seemed like he was offline or had abandoned it.
So I went ahead and made the change that I wanted to make. I tried to submit it back to him, but I didn't get a response. So I wrote to him again, and I said, I'm not hearing from you, so I'm just gonna publish my change the way it is. I'll put it out there with my change. And from that point on, I was the maintainer of coverage.py, and I've been doing it ever since.
[00:04:22] Unknown:
Have you ever heard back from Gareth to sort of officially bequeath you the project, or has he just been radio silent since that first time? You know, it's funny. I should know more about our exact interactions.
[00:04:33] Unknown:
He may have written at some point, but in any case, it's pretty clear that he doesn't mind that I'm doing this, since it's been very visible for a long time. I continue to say not that I am the author of coverage.py, but that I am the current maintainer of coverage.py, even though the project has changed radically since I started on it. And at this point, essentially, every line of the code is something that I have written. I'm sure there are still lines in there that are as Gareth originally wrote them, but they've at least been refactored and moved and put into classes or whatever since then. But I don't want people to think that I have created this thing from whole cloth. In fact, I think it's a very interesting point about open source work that although I am well known as the guy behind coverage.py, and I'm clearly associated with it, it doesn't mean that I had to come up with the idea and start it all by myself. People can approach an existing project and start working on it and eventually become the owner of that project through,
[00:05:36] Unknown:
you know, whatever attrition or change of interest in the original authors, and projects can move from hand to hand. Yeah. It's always interesting seeing the succession path for open source projects, because quite often they arise from a particular need that somebody has, and they'll fulfill that need. And then eventually, they might move on to other projects or other companies, and then they no longer have any direct interest or stake in that particular problem domain. And so then the project will either wither on the vine, or there may be enough of a community around to take over ownership of that project and continue with its progression. So it's always interesting to see how that dynamic plays out. Right. Exactly. And it's not a failure of any sort. It's just an evolution of the project and an adaptation over time as interests move and shift.
[00:06:22] Unknown:
It'd be great if open source were more professionally supported, maybe financially. That's a whole other topic. But in the meantime, if it's a personal interest, then it's bound to shift as people's interests shift.
[00:06:39] Unknown:
And the coverage project has become the de facto standard for measuring test coverage in the Python community. I'm wondering, one, if there are any other competing projects that have slightly different takes or different approaches, and also why you think it is that coverage has grown to become that standard.
[00:06:57] Unknown:
Yeah. That's a very interesting question to me, because I don't quite understand why that is. Lots of people don't know this, but there is a coverage measurement tool in the standard library. It's called trace. And if you run python -m trace, it will show you how it works, and you can actually measure coverage without installing a separate tool at all. I guess the reason coverage.py has become the de facto standard is partly that it's got the name: if you Google for Python coverage measurement, it's gonna come up because it's just called coverage. It's not a very interesting name, but I guess that helps it. But the other thing is that it's actively maintained, and it's got more features than trace does. Trace does a few things that coverage doesn't do, but coverage has better support for the core need of measuring coverage.
So people have found it, people like it, and they keep using it. And I keep extending it and maintaining it, and I guess that's a good dynamic. There are actually third party coverage tools other than coverage.py. There was one called Fig Leaf, by Titus Brown, which had some interesting ideas, some of which have still not made it into coverage.py. And there was a tool called Canopy, which was a completely different take on it, which never really got off the ground. But it's a great name, because it means a covering, and it ends with "py" without being really overt about it. I'm really, really jealous of the name Canopy because it's super cool.
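For reference, the standard-library trace module mentioned above can be driven from Python code as well as from the command line. Here is a minimal sketch; the trace APIs are real stdlib calls, but the target module and its main() function are hypothetical placeholders:

```python
# Measuring coverage with the stdlib trace module (no third-party install needed).
# Equivalent CLI: python -m trace --count --missing my_script.py
import trace

tracer = trace.Trace(count=True, trace=False)   # count executions, don't print each line
tracer.run("import my_module; my_module.main()")  # "my_module" is a placeholder

# Write per-module ".cover" reports into the current directory,
# marking lines that never executed.
tracer.results().write_results(show_missing=True, coverdir=".")
```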
[00:08:19] Unknown:
That is a very well-thought-out name. For people who are using coverage, what's the utility of having that information, and why might they want to incorporate that in their overall test suite? Sure. So the idea behind measuring your coverage is
[00:08:32] Unknown:
you have some product code. I'll call it product even though lots of people feel like that implies some kind of commercialism, but the code that you're actually writing and actually care about, I'll call your product code. And then you write tests for your product code. And you have this belief, since you wrote those tests, that the tests are testing your code. Well, how can you figure out if those tests are testing your code? Essentially, how can you test the tests? That's a very difficult proposition. But one thing you can do is measure the coverage of those tests, that is, which parts of your product code are actually being executed by your tests. That's a good indication that your tests are at least exercising the product code. If you look at the coverage measurement of your product code and you find that there are parts of your product that are not being executed at all by your tests, that's a clear indication that your tests are not testing that part of your product code. So looking at the coverage measurement is a really quick, good indication of how broadly your tests are covering your product code. And that's why it's called coverage: because the tests are meant to exercise all of your product code. But how do you know if they are? You run coverage on it, and it tells you. And oftentimes,
[00:09:44] Unknown:
the pursuit of complete test coverage can be an end unto itself, which can result in poor choices of how you actually design and execute your tests. So I'm wondering if you can dig a bit into some of the downsides of having that coverage metric and some of the ways that it might be abused.
[00:10:02] Unknown:
Yeah. So that is the problem, basically. I mean, it's human nature that if you quantify something, then people will start behaving in ways that improve that measurement, even if it's not actually improving the thing the measurement was meant to measure. So you can game the system, so to speak. So there are a few things that are downsides to coverage measurement. The first is you can have a test that runs part of your product code but doesn't actually test it very well. For instance, you might write a test that calls your function but never actually even looks at the return result. And that test will pass because it has no failing assertions in it, and it will have 100% coverage because it ran all of the lines in your function. But in fact, your function can return any value at all, and your test won't detect it, and you're not testing anything.
So it's very easy to write tests that pass all of these simple measurements without doing anything useful at all. So you still have to put some thought into how you're writing your tests and what they're actually checking. There's also the possibility that you are checking the return value and you are executing all of the lines in your code, but you're not considering all of the actual cases that your tests need to cover. Coverage can only tell you about portions of code executed, not the domains of data that have run through those portions of the code, which is a fascinating idea that I have no idea how to implement. But it would be amazing to be able to say, not only are all the lines in this function called, but, yes, you tested it over the complete domain of its inputs, whatever complete means. And for real functions, complete is infinite. So it's roughly impossible, but it's a fascinating idea.
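To make that first downside concrete, here is a small illustration invented for this example: the test executes every line of the function, so line coverage reports 100%, but it never asserts anything about the return value, so any implementation at all would pass.

```python
# Product code: every line below gets executed by the test.
def discount(price, percent):
    reduced = price * (1 - percent / 100)
    return round(reduced, 2)

# Test code: runs all the lines but asserts nothing about the result,
# so it shows 100% line coverage while testing essentially nothing.
def test_discount():
    discount(100, 25)   # no assertion on the return value
```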
[00:11:43] Unknown:
Yeah. That's where it really becomes interesting, figuring out how you can constrain the problem space and the overall possible states that may manifest in your code. And one of the ways that that's often done is by introducing types or by using things like state machines to ensure that there are only certain transitions that can possibly be made within your code. Right. And testing is a fascinating subject to me,
[00:12:08] Unknown:
and how to write them better and make them more useful and tell you more about your code is fascinating to me, which is part of the reason why I got into coverage in the first place, and not only tweaked it to do the one thing I wanted it to do, but also sort of kept with it, because it's a fascinating domain in which to work. How can we understand code? How can tools help us understand code? Another way that coverage measurement can be bad is that you can choose a particular number. Like, oh, we need to be at 75% coverage. And if we're above 75%, then it's good. And if we're below 75%, it's bad. And the number itself, the literal number 75%, is meaningless, because you may have covered the three quarters of your product code that's really easy and doesn't have any bugs, and the hard part to test, where all the bugs are gonna be, is in the 25% that you didn't cover. So really the only value in the number is that a bigger number is better than a smaller number. So if you were at 75% and you got to 76%, that's better. People will focus on the actual number and choose something they think, well, I can't get to 100%, so 90 seems good. But 90 doesn't really have any information in it beyond 85, other than 90 is more than 85.
[00:13:22] Unknown:
Right. The number is only valuable in seeing how it changes as you modify your code base. It doesn't have any inherent meaning in an isolated context.
[00:13:32] Unknown:
Exactly. Another problem with coverage measurement is getting to 100%. So first, getting to 100% can be an exercise in frustration, because the last 1% is often very strange conditions that are very difficult to reproduce. You can spend a lot of time mocking lots of code in order to try to get that 1%. You might feel like it's not worth the effort: those cases are so unlikely, I can examine the code and see that it's gonna work fine, so why am I pursuing that 100%? Which is kind of the tool being in charge of the programmer rather than the other way around. So that can be frustrating. The next thing you can do when you get to that point is you can start marking lines of your code as this doesn't have to be covered, which in itself is dangerous. Anytime you have a system that's meant to warn you about things and you choose to disable the warning, you are taking on some risk that either you are wrong about it not being an interesting warning, or you're right now, but in the future it'll become an interesting warning and you'll never go back and unsilence it. So there are lots of ways in which that coverage measurement number can push you in directions that aren't really great for your code. And the last thing about coverage measurement is, suppose you do get to 100%. Now you're kind of at the end of the road. Coverage can't tell you anything more about what your code is doing.
But there's still all of that wide open field of ways in which your tests aren't great, and coverage can no longer tell you anything about your tests because you've gotten to 100%. And now you have to rely on other ways to understand your tests, which is good, because you should have been thinking about those beforehand. But now you don't even have coverage guiding you to write more tests to cover more of your code. And 100% doesn't mean that your tests are perfect.
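For the "marking lines as not needing to be covered" point above, coverage.py's exclusion mechanism looks roughly like this. The `# pragma: no cover` comment is the tool's real default exclusion marker; the surrounding function is invented for illustration, and the `[report] exclude_lines` setting in a .coveragerc file lets you define your own markers as well.

```python
def process(request):
    return "ok: " + request          # stand-in for real request handling

def handle_request(request):
    try:
        return process(request)
    except KeyboardInterrupt:  # pragma: no cover
        # This branch is excluded from the coverage report with the default
        # "pragma: no cover" marker. Every exclusion like this is exactly the
        # kind of silenced warning discussed above.
        raise
```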
[00:15:18] Unknown:
So you're on your own now. And one of the other complicating factors, particularly with modern code bases, is that, while coverage can measure the Python execution, a lot of times there are mixed languages, notably in web frameworks where there's also JavaScript or templates. And I know that one of the things that you recently added as a plugin capability was to measure coverage of Django templates. Yep. So I'm wondering if there are any tools that you're aware of that can sort of wrap coverage and combine the information that it produces about overall coverage of the Python side with the overall coverage of the JavaScript side, or whatever other languages might be integrated into this overall application.
[00:15:58] Unknown:
Right. So I don't know of one. So coverage is purely a local tool. You run it in your test suite, and it gives you results that you can look at on your own laptop. The interesting thing that's come up in the last few years are these online services that will aggregate and store coverage information so that you can see it over time or share it with your coworkers. And those services are all multi-language. So the two big ones are Codecov and Coveralls. And those services integrate well with coverage.py, but they also read data files from other languages' coverage tools. So you can look at multi-language scenarios on those services. Coverage.py doesn't really do anything to make that easier other than have an XML file format that it inherited from the Java world, from Cobertura.
So it's not exactly a standard, but there are a few different tools that will read that XML file format. So those online services are really where people tend to be using coverage these days, I think. When coverage got started in 2000 or 2001, and when I took it over in 2004, online services like that didn't exist yet. So it was all about running the tool locally and then looking at results locally. Now online tools, CI services, GitHub repos, code checkers, and online linters are everywhere. So it's very natural for people to use coverage.py locally or on their CI server and then upload the data files to Codecov and see the results there. And that's where you can get the multi-language integration, because those services aren't Python specific.
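As a concrete footnote on that XML format: coverage.py can emit the Cobertura-style XML either from the command line with `coverage xml` or from its Python API. A minimal sketch, with the test-suite invocation left as a placeholder:

```python
import coverage

def run_test_suite():
    pass   # stand-in for invoking your real test runner

cov = coverage.Coverage()
cov.start()
run_test_suite()
cov.stop()
cov.save()

# Cobertura-style XML that the Codecov/Coveralls uploaders understand.
cov.xml_report(outfile="coverage.xml")
```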
[00:17:36] Unknown:
And in particular, templates are an interesting area that often gets overlooked, because people think, oh, it's just a string generator. But in reality, most template languages these days are their own almost Turing-complete languages. And so it is important to be able to understand that the logic that is contained within them is properly being exercised. So I'm wondering what are some of the most challenging aspects that you encountered while trying to add that coverage of the Django templates, and why you thought it was important as an exercise to actually add that capability.
[00:18:07] Unknown:
So the reason it's important is just what you said. I've been working in Django projects for a decade, and there's plenty of logic inside Django templates. And not just Django templates; there are Jinja templates and Mako templates, and all these templating languages have logic in them. The goal is to keep as much logic as possible out of the templates, but even Django, which keeps its template language simple on principle, to insist that logic stay in the Python code rather than in the templates, has loops and conditionals. And you need to know whether you're actually using all of that template. Right? The template isn't data. The template is code. There's stuff happening there that can be wrong, and you want to know that your tests have tried all the possibilities in those templates. So coverage measurement of templates was a really interesting idea.
They're not Python code. So version 3 of coverage.py (we're in version 4 of coverage.py now) didn't understand anything about templates at all. And when I looked around at what was missing from the coverage measurement at work, you know, at edx.org, where we were using coverage pretty intensively, the templates were clearly outside of that. So I was very interested in how we can get coverage measurement of templates. Rather than build Django understanding and knowledge into coverage.py directly, I wanted to build it as a plugin, because in addition to templates, there are other scenarios where other languages are sort of translated down into Python, and you'd like to be able to coverage measure those. An example is Hy, which is a Lisp implementation that runs in Python by compiling down to Python, sort of, and it would be cool to be able to get coverage measurement of Hy code or any other programming language that's sort of built on top of Python. And one small reason why it's not just Django built into coverage.py, but is a plugin system, is that at work at edX, we don't use Django templates. We use Mako templates. So I could spend a lot of time building a Django implementation and not solve my work problem at all. To jump to the end, you know, plot twist: we don't have a Mako plugin yet. So I haven't solved the work problem, but a plugin architecture is still the best way to go. So what I did was I tried to think about what hooks you would need in order to coverage measure Django templates and other templating languages.
And I built those into coverage.py as a plugin system, a very simple plugin system, and then built a Django plugin on top of it, both as a way to jump start the development of that plugin, but also to make sure that the plugin support in coverage.py was actually sufficient to solve the problem. As it happens, templating languages have their own differences that are fundamental and go very deep, and they affected the implementation of the plugin system. To take two examples, there's Django and there's Jinja. Django templates are interpreted, meaning that at runtime, the template is read and executed essentially directly from the text of the template.
Jinja templates, on the other hand, the text of the template is read and turned into actual Python code, and then the Python code is run, so it's more like a compiled template. And coverage measuring both of those styles of template required different hooks in the plugin system. And so the coverage.py plugin system has hooks for both of those styles of template languages. It's just that, by virtue of Django being much more popular, and also that I started the Django plugin, the Django plugin is much further ahead than any other coverage.py plugin. One of the big challenges of writing the Django plugin, in addition, was that Django itself changes over time.
So Django is very disciplined about letting you know what has changed, but because of that, it is also very willing to make fundamental changes in its implementation and will break its API, but in well defined ways. And so supporting Django 1.4 through 1.11 is very difficult, and the code got very messy. And I'm very grateful that someone eventually came in and took over the Django plugin. I was hoping that the Django project itself would own the Django plugin, but at least someone other than me has stepped in and said, I want to maintain it. It's another example of someone taking over a project that someone else started, by the way: Pam McAnulty, who's another Boston person, stepped in and said, I wanna own this. And so she is now maintaining it and making it work on the versions of Django that it needs to work on.
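For readers curious what those hooks look like, here is a rough, heavily simplified sketch of the shape of a coverage.py file-tracer plugin. The `CoveragePlugin` and `FileTracer` base classes and the `coverage_init` registration entry point are the real extension points; everything template-specific here (the `.tmpl` extension, the class names) is invented for illustration, and a real plugin needs more methods than this.

```python
from coverage.plugin import CoveragePlugin, FileTracer

class MyTemplateTracer(FileTracer):
    """Maps execution events back to lines of a hypothetical template file."""
    def __init__(self, filename):
        self._filename = filename

    def source_filename(self):
        # Tell coverage.py which file the executed code "really" belongs to.
        return self._filename

class MyTemplatePlugin(CoveragePlugin):
    def file_tracer(self, filename):
        # Claim files this plugin knows how to trace; return None for the rest.
        if filename.endswith(".tmpl"):          # invented extension
            return MyTemplateTracer(filename)
        return None

    def file_reporter(self, filename):
        # Would return a FileReporter that knows which template lines are executable.
        ...

def coverage_init(reg, options):
    # Called by coverage.py when the plugin is listed in the configuration.
    reg.add_file_tracer(MyTemplatePlugin())
```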
[00:22:54] Unknown:
You seem to be focusing primarily on having plugins capable of computing the coverage of templates. Is there also the ability to extend that plugin capability to include coverage from other languages, or even, in particular, things like C extensions, or what's becoming more popular now with Rust extensions?
[00:23:14] Unknown:
So first, I hadn't heard of Rust extensions. I have to look into that. But no. These plugins are still about measuring what happens during Python code execution. So the fact that Django templates, when they are actually producing output, are at some level executing Python code is why the plugin can work inside coverage.py. And for Jinja templates, they are compiled into Python code, and then when they are producing output, they are running that compiled Python code, which is why coverage.py can be involved. For other languages like Cython or C extensions or Rust or JavaScript or anything else, coverage.py has no tools to bring to that problem, and so that will have to be some other kind of tool.
C extensions could be measured by gcov, which I've never actually tried to use, but that is a tool that can do C-language coverage measurement.
[00:24:10] Unknown:
And how does the actual coverage algorithm work, and how has that evolved from the time that you first started working on the project?
[00:24:19] Unknown:
Okay. So that's a big topic. So let's start very simple. Coverage.py, and in fact any coverage measurement tool, at its most basic 10,000-foot level is a very simple proposition, which is: there's a phase during which it will observe your program running and note what got executed. And then there is a phase during which it will try to understand your program to figure out what could have happened. And to put it in terms of lines, that first phase will note which lines got executed, and that second analysis phase will figure out what lines even exist in the program that could possibly have been executed. And then the reporting phase is to take those two bits of data and essentially subtract them. You know, what could have happened minus what did happen is the stuff you're missing.
And then produce a report that shows these are the lines that did happen, and these are the lines that could have happened but didn't happen. So that's coverage.py in a nutshell. Now, how do those phases actually work? For the first phase, the run phase, there is a fundamental bit of the Python interpreter called the trace function, which is why the Python standard library tool is called trace. And the trace function is an integral part of CPython and every other Python implementation: you give it a function, and it calls your function for every line of code that gets executed. Without that, coverage.py would be impossible, let's say. Like, we could debate whether there's a way to do it without that, but that is clearly the way to build a coverage tool in Python.
So Python will call my function for every line that gets executed. So in some sense, for that first phase, all I have to do, in quotes, is write a trace function that gets invoked for every line of Python. The challenge in writing that trace function is: how do you do it so that it doesn't slow down your program dramatically? You can write a trace function in three lines of Python and register it as the trace function to call on every line executed. And then you run your program, and it will run 40 times slower than it would have, which isn't a problem if you're just writing a toy. But if you're trying to write a testing tool that will actually be part of every test execution, you don't want it to slow down your program too much. It's inevitable that it'll slow it down some, but it's important that it slow it down as little as possible. Otherwise, people won't use it. So coverage.py originally was just Python code. One of the things I added to it was an implementation in C of a trace function, so that the execution of the trace function would go as quickly as possible. And even that C code has a lot of thought given to how I can respond as quickly as possible and get back to Python execution so that I'm not slowing down the execution time. Some of the things that it does are noticing that a particular function isn't really interesting to trace at all, so don't bother tracing the lines of that function at all. And there are ways that you can report that back to the Python interpreter, but that gets kind of complicated. There's bookkeeping to make sure you know which frames in the stack are being traced and which are not, and how to get all the data recorded. So that's the first phase.
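As a rough illustration of that run phase (not coverage.py's actual implementation, which is far more careful and partly written in C), a toy trace function registered with sys.settrace might look like this:

```python
import sys

executed = set()   # (filename, line number) pairs that actually ran

def toy_tracer(frame, event, arg):
    # Python calls this for 'call', 'line', 'return', and 'exception' events.
    if event == "line":
        executed.add((frame.f_code.co_filename, frame.f_lineno))
    return toy_tracer   # returning the function keeps tracing this frame's lines

def program_under_test(x):      # stand-in for the real program being measured
    if x > 0:
        return "positive"
    return "non-positive"

sys.settrace(toy_tracer)
program_under_test(5)
sys.settrace(None)

print(sorted(executed))   # the lines that ran while tracing was active
```

Even this toy version shows where the overhead comes from: the tracer runs once per executed line, which is exactly why coverage.py's real tracer is written in C and tries to bail out of uninteresting frames as early as possible.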
The second phase, the analysis phase, is where Python is a little bit tricky, because it's a dynamic language. It's not always obvious what might be executed and what might not. So for the second phase, the analysis phase, the challenge is: you've got a pile of Python files. How can you understand what might happen when you execute them? For simple line coverage, it's not that hard, because as it happens, the Python byte compiler will record a little table of what line numbers correspond to what bytecodes in the .pyc files. So you can just read that table, and it gives you a set of all the line numbers that are executable in the Python file. Historical note: the very first thing that I needed to change in coverage.py was that it believed that docstrings were executable. And so when it reported on what code could have run but didn't get run, it reported all of your docstrings as missing from your test coverage, which is incorrect. Docstrings don't get executed. That was the first thing I added to it to make the results more accurate. So that static analysis phase is where it can get kind of interesting. One of the things that I added to coverage.py to go beyond line coverage is branch coverage.
So if you have an if statement, you'd like to know not just that the if line was executed, but that the two different places the if could jump to, that both of those conditions were taken. And figuring out where the branches are in the code is complicated. And in fact, recently, in the last year or so, I forget how long it's been, I completely reimplemented how the static analysis of branches works. The first time I implemented it, I was examining the bytecode under the false belief that bytecode was simple enough that it would be clear where the branches were. Not only was that not true when I first got started, but the async features that are now part of Python 3.5 and Python 3.6 completely blew that code out of the water. I had no idea how to interpret the bytecode, so I threw away all of that code. And now I'm analyzing the code for branches by examining the abstract syntax tree of the code, which is much easier for me to reason about, because I understand what lines of Python do much better than I understand what bytecodes do. So that was an interesting exercise, to completely reimplement a significant feature of coverage.py.
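A very rough sketch of the analysis and reporting side, assuming nothing about coverage.py's real internals: the line-number table of compiled code objects gives the set of executable lines, and the set difference against what the trace function recorded gives the missing lines. (Real branch analysis, as described above, additionally walks the AST; this sketch only handles simple line coverage.)

```python
import dis

def executable_lines(source, filename="<string>"):
    """Collect line numbers that have bytecode, recursing into nested code objects."""
    lines = set()
    todo = [compile(source, filename, "exec")]
    while todo:
        code = todo.pop()
        lines.update(lineno for _, lineno in dis.findlinestarts(code)
                     if lineno is not None)
        # Nested functions/classes live in co_consts as further code objects.
        todo.extend(c for c in code.co_consts if hasattr(c, "co_code"))
    return lines

source = "def f(x):\n    if x:\n        return 1\n    return 2\n"
could_run = executable_lines(source)
did_run = {1, 2, 3}                 # pretend these came from the trace function
print(sorted(could_run - did_run))  # the "missing" lines for the report: [4]
```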
And to get back to your question of what's the big challenge: analyzing Python code is always difficult. There's lots of code. There are lots of modules in the standard library to help with that. And what I found is that most of those modules get you about 95% of the way, and then you spend a lot of time on that last 5%, which ironically is exactly what people do when trying to get to 100% test coverage. But I was trying to get to sort of 100% Python coverage in the static analysis of Python code. So there's a lot of reverse engineering what Python's going to do, and trying to make coverage.py understand it well enough that I'm not gonna get bug reports from everyone.
One of the downsides of having a widely used package is that small bugs will still create many bug reports, because there's just a leverage factor of the number of people using the tool. So that's the analysis phase. The reporting phase is actually fairly simple. You just have basically a set of line numbers that did get executed and a set of line numbers that didn't get executed. And now the challenge is: how can I best present that information to the user? We have an HTML report, for instance, that tries to nicely syntax color the code and put in the red lines and the green lines, and it has JavaScript so you can click on things, and that's fun. That's not particularly tricky. It's just trying to come up with a UX that looks nice.
So those are the three phases of coverage, and that's how they work. There are a lot of details in there, which I find fascinating and frustrating at the same time. Hopefully, it works well enough that people like the tool. To go back to your original question of why people use coverage.py, I think it's those last 5%. When the edge case is something that you care about, then that last 5% really pays off, because it's gonna be the difference between whether coverage.py works for you or doesn't work for you. And I take pride in being able to cover whatever Python can execute, which is sometimes tricky.
I just got a bug report from someone who said that coverage.py doesn't work if there's a null byte in a docstring, which is probably true. And I guess Python doesn't care, but coverage.py does. At some point, maybe I'll go and fix that. That's a wildly unlikely edge case.
[00:32:16] Unknown:
We'll see. And given the fact that you're using the abstract syntax tree for determining the bits that are potentially executed, does that increase your overall maintenance burden given the fact that the AST can change between versions of Python?
[00:32:30] Unknown:
Well, so the AST hasn't changed much since I put it in place, but you're right. That could be a problem. The bigger problem is that the thing I was using before was the bytecode, and the bytecode not only can but definitely does change. And like I said, the thing that completely sank the bytecode analyzer was the new async bytecodes that came in 3.5,
[00:32:53] Unknown:
which I still don't understand what they're doing, and now I don't have to understand what they're doing. So while it does have its own maintenance overhead, in the end it actually reduced the amount that you need to worry about it, it sounds like. Yeah. And the AST, for someone who knows
[00:33:10] Unknown:
how Python code works, like, if I show you a program, you'll know that the for line does this and the if line does this. You look at an AST, you understand, oh, that is an exact parallel of the Python code. So, you know, there are some weird quirks in there, but, you know, you understand what it means. I show you the bytecode and you're like, what's this? What does that mean? SETUP_LOOP? What's SETUP_LOOP?
[00:33:36] Unknown:
So I'm happy not to be in the bytecode anymore. And one of the things that you alluded to as well is the fact that coverage is used in such a broad array of contexts, both across operating systems and Python language implementations and versions of Python. So I'm wondering whether you find it overwhelming trying to support so many different ways in which coverage is used, and some of the ways that you manage that maintenance overhead.
[00:34:04] Unknown:
Yeah. So it's true. One of the things that I've tried to do with coverage.py is I want it to be a tool that any Python developer can count on being available in the environment where that Python developer is working on their code. One of the things I do in order to make that true is coverage.py has no dependencies. So there are no third party packages that you need to install in order to run coverage.py. And I did that because I would like coverage.py to be available to any library developer that's porting their library to a new environment. It started with Python 3. Coverage.py has run on Python 3 for 8 years now.
I ported it when 3.0 was out. And I wanted other library developers who were moving their code to Python 3 to not have to wait for coverage.py's dependencies to move to Python 3. So coverage.py has no dependencies. But, yes, it runs on many operating systems. Windows and Mac and Unix are the ones that we care about. You'd think that doesn't have much of an effect, but it turns out there are lots of weird little Windows quirks that have to get in the code. Unfortunately, just keeping the CI running tends to be the biggest job for Windows these days. If anyone out there is really into Windows CI, get in touch. I'd love to have your help. Different implementations of Python. Oh, before we get to that, different versions of Python. It used to be that coverage.py ran on Python 2.3 through Python 3.4, I think it was, which was a very wide range.
And I was bending over backwards to try to make that happen with one code base. Luckily, I'm not trying to do that anymore, but I'm still on Python 2.6 through Python 3.6, which means there are some things I still can't use, and I still forget that I have to put {0} into my format strings rather than just {}, because Python 2.6 didn't have implicit numbering of format placeholders. So you can probably go and find four or five different commits where I add back in a 0 because I forgot it, for Python 2.6. The bigger challenge is Python implementations.
So coverage.py runs on CPython, of course, on PyPy, and on Jython. Jython is a problem because it doesn't have the same introspection capabilities as the other implementations, which are needed for the analysis phase. So my strategy for those more exotic implementations is, if it seems like it's just kind of impossible, then I'll just skip that. For instance, what I say is you can do the run phase on Jython, but you need to do the analysis phase with CPython. So if you're a Jython developer, you would run your tests under Jython. You'd get the coverage data about what got executed. And then when it came time to analyze your code, you run that coverage phase under CPython. And so far, I haven't heard anyone tell me, well, that doesn't work because of the differences, blah blah blah, you've got to fix it. So what I've tried to do is give it my best effort to keep things working on other Python implementations.
And if it seems like it's gonna be a pain, I skip that test, and I tell people they have to run it under CPython.
[00:37:25] Unknown:
And so far, it seems to be working out. And do you have any plans for the future of coverage or new capabilities that you'd like to add?
[00:37:33] Unknown:
Yeah. So what I've tried to do is be customer driven about what gets into coverage. There are interesting ideas that would be kind of fun to hack on, but that tends to be a little disappointing at the end, because no one uses them, and then they're a maintenance headache because they're hanging around. And, you know, as a developer, you're probably familiar with this. You get an idea in your head and you think, wow, this is gonna be great. And then you hack on it, and then no one cares at the end. So why did I put in all that work? So what I've tried to do is be driven by what people are actually asking for.
The big feature that would be amazing, that I've been thinking about for years because people have been asking about it for years, is what I call who tests what. And the idea there is, instead of just telling me that this line in my product code was executed, I want you to tell me which tests executed that line of code. So if I have a test suite that has 100 tests, what I wanna know is, for this line in my product, which of those 100 tests actually executed that line of code. And that's interesting because, one, we kind of have that information. Like, when we note that the line is executed, we are down there in the Python execution, and there's the whole call stack above us, so we can look up that call stack and see what test was being executed.
And it's clearly very useful, because lots of people have asked for it over the years. I've been interested to have that information. But there are some significant challenges. One, it drastically bulks up the data that has to be collected. Right now, coverage needs to collect essentially one bit for each line of your code: did it execute or not? If you have a thousand tests in your test suite, now we need to sort of collect a thousand bits per line of code. So we are now multiplying the data by the number of tests out there. And, of course, I mean, that's a huge hand wave. There are lots of clever things you can do. But, essentially, there's a lot more data to collect. And then at the end, in the reporting phase, instead of just coloring lines in an HTML report red or green, whether they were executed or not, how do I present that information to you? Now each line has essentially up to a thousand test names that may have executed that line of code. How do you wanna see it? What's the right way to display it?
And then there are also interesting challenges like, how do I know what test was running? I can look at the call stack, but how do I know which is the test? That sounds like a whole other plugin that the test runners might get involved with. So there are all sorts of interesting challenges and difficulties, but that would be really useful. And I wrote a blog post about this, I think, back in November, where I laid out some of the challenges in the hopes that someone would step forward and say, this sounds like the kind of enormous challenge I'm interested in. Let me implement this for you, with you, something. It's open source, dudes. Where are the patches?
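A back-of-the-envelope sketch of the "look up the call stack to find the test" idea described above; this is not anything coverage.py actually does today, and the `test_` prefix heuristic is exactly the kind of guess a proper test-runner plugin would replace:

```python
def current_test_name(frame):
    """Walk up the call stack from a traced frame and guess which test is running."""
    while frame is not None:
        name = frame.f_code.co_name
        if name.startswith("test_"):   # naive heuristic; a test runner plugin
            return name                # would know this for certain
        frame = frame.f_back
    return None

# In a trace function, instead of recording just (filename, lineno), you would
# record (filename, lineno, current_test_name(frame)), which is where the
# data-volume problem discussed above comes from.
```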
So that's one idea. There are other things that people keep asking for. One that lots of people seem to be interested in more recently is: how do I collect coverage on a long-running process? The way coverage.py works, it collects all its data in memory, and at the end of the process, it writes it to a file. Well, what if the process essentially never ends? And the idea here is, I have a running web server, and I'm going to run it for a while. And without stopping it, I want to see what code has been executed. And that's a fairly small change to coverage.py, but it seems like it's in the future.
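A hedged sketch of what that long-running-process idea could look like with coverage.py's existing Python API. The Coverage, start, and save calls are real; the periodic-save loop is my own invention, not a supported feature, and whether save() behaves well while measurement is still running is exactly the kind of detail that would need work.

```python
import threading
import time
import coverage

cov = coverage.Coverage(data_file=".coverage.webserver")
cov.start()

def flush_coverage_periodically(interval=300):
    # Write collected data to disk every few minutes without stopping measurement,
    # so you could run "coverage report" against the data file while the server runs.
    while True:
        time.sleep(interval)
        cov.save()

threading.Thread(target=flush_coverage_periodically, daemon=True).start()
# ... start the long-running web server here ...
```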
Again, if someone wants to hack on that, it would probably be a good, manageable project for someone to take on. And that brings up an interesting point, which is that coverage.py is useful not just for measuring your tests. The idea of this web server is, let me put this on my web server in production and find code that is actually never executed by my real users. Maybe I can just delete that code. So there are other reasons to measure coverage than tests, although tests are 99.999% of the reasons anyone uses coverage. In fact, every once in a while, I get questions from people asking, how do I get coverage to run my tests? And the answer is, you don't use coverage to run the tests. You use coverage to run the test runner, and it runs your tests. So that's what's coming up in the future. If you look at the bug reports, there are other ideas for features, but I'm trying more and more to insist that coverage stick to what it's good at and not get complicated by other things. For instance, one thing that people ask for is, can coverage please understand how the path in my operating system works, so that when I type coverage run foobar, it will find foobar the same way my operating system did? And my answer is always, I'm not gonna get into that. That seems completely outside of what coverage should be good at, and I am sure there are ways you can use your shell to solve that problem, so that I don't have to try to implement that and maintain that and deal with all the weird edge cases.
[00:42:33] Unknown:
Yeah. That's one of the things that's always a challenge to manage in any project, but I imagine particularly in open source: feature bloat. Because like you said, it's easy to add a feature thinking, oh, okay, this will be quick, but then you have to own that feature for the entire lifetime of the project itself.
[00:42:52] Unknown:
Exactly. And it feels really good as an open source maintainer when you hear from someone who's excited about your project, and they have a problem that they think you can solve, and it feels good to be able to solve people's problems. But it's a gift that eats, as they say. You know, if I solve that problem now, it's my problem, and it's my problem forever. And I'd probably be better off personally saying no, but I'm probably actually doing a better job for that person too by saying no, in that, given the way they need it solved, they'll probably be happier if they own that solution or get it done by someone who's closer to the shell than I am, you know, etcetera, etcetera. Trying to get coverage to do something wacky off to the side, just because it happens to be in the mix when you had the problem, isn't a way to get a good solution to the problem.
[00:43:42] Unknown:
Yeah. Oftentimes, you know, as with anything in life, what doesn't happen can sometimes be more important than what does.
[00:43:51] Unknown:
Right. Like I said, the good thing about having a widely used library is that the things that actually apply to many people and will be valuable for coverage to solve, I'll hear about more than once. If someone makes a request and I only get one request for it, then it's probably not that important to solve. As an example, I mentioned earlier that the trace module in the standard library does some things that coverage doesn't do. One thing it does that's very cool is it will actually print out every line of your program as it runs, which isn't about coverage measurement at all. It's about being a trace function. If you're debugging a problem, you should look into that, because that's a cool way to sort of get at what might be going wrong. The thing it does that is like coverage measurement is it can tell you not only that a line was executed, but how many times it was executed.
So it can count the number of executions of a line. And I have found that useful in the past in other environments, and coverage.py still doesn't do it. And the reason it doesn't do it is that no one's asked me to make it do it. And it's been in the back of my mind for a decade that someday I might implement that, and how will I take that into account with this thing I'm thinking about right now? So I'm sort of being prepared for implementing line counting as well as line marking. For instance, the internal data structure that coverage.py uses to note which lines were executed is a dictionary where the keys are the lines and the value is None.
But eventually, it could be an integer. But no one's asked for line counting, so I haven't bothered to do it, because I don't wanna build a thing that just seems cool to me but no one cares about. Yeah. The old principle of you ain't gonna need it, or YAGNI for short. Right. Well, they aren't gonna need it in this case. Right? I mean Right. I don't even know who these people are. They don't seem to need it because they're not asking for it. So whatever. Someday they will, and I'll be like, great. Now I've been thinking about this. I'm prepped for it. It'll be easy, but we'll wait until then. Are there any other topics that you think we should cover before we start to close out the show? I don't have an idea of another topic to cover. I think people should think about open source as a thing they can participate in, either by asking for features or digging into bug triaging.
For instance, I don't get many code contributions for coverage, perhaps because, for historical reasons, it's a Mercurial repo on Bitbucket, which I'd love to switch, but I'd lose all the history of all the bug tickets that are there. Or perhaps I don't get code contributions because it's kind of an intricate project with lots of complexities. People should keep in mind that they can contribute to a project in other ways. I had a guy step up last year, I forget exactly why he wanted to do this, but he threw a lot of energy into just triaging bugs and commenting on all the bugs and asking people for details and trying to reproduce bugs, and that was enormously helpful.
So even without any pull requests or code changes, he made a huge contribution to coverage.py. There are lots of ways that people can be involved. You don't have to take over the whole project and own it for a decade like I did. You don't have to make a single code change. You can contribute in other ways. Even just going on Stack Overflow and answering questions that people are asking about coverage.py is a contribution to coverage.py. So open source needs to get away from the idea that contribution means changing code. There are lots of other ways to make contributions.
[00:47:29] Unknown:
And on the point of code triage being an important part as well, there's actually a service that I came across a while ago that's run by Richard Schneeman from Heroku, and it's called CodeTriage. And it lets you sign up at a website, pick some different projects that you would like to be notified about, and then it will periodically email you with a list of issues that need to be triaged, so that you can just go in, verify whether it is in fact an issue, or just add some comments on the issue itself. And that in and of itself can be a huge boon to the project maintainers, because it's one less thing that they have to do before they can actually potentially start fixing the problem. Right. Interesting. And
[00:48:09] Unknown:
services like that are gonna be the thing that finally pulls me off of Bitbucket, because I bet that service doesn't integrate with Bitbucket. I'm not sure. I would have to go look. I would have to go look too. I'm kind of opposed to the GitHub monoculture on principle. I think there's value in diversity, but it's also enormously useful to have all those services that integrate with GitHub. Yeah. It has attained a large amount of gravity, and it's difficult to escape from its orbit. Exactly. Let's hope there aren't demons on the planet or something.
[00:48:39] Unknown:
Alright. So for anybody who wants to follow what you're up to and get in touch, I will add your preferred contact information to the show notes. Great. Yeah. I'm nedbat in lots of places like IRC and Twitter
[00:48:51] Unknown:
and nedbatchelder.com
[00:48:53] Unknown:
as a website. And with that, I'll move us to the picks. And this week, I'm going to follow on a previous pick of Org Mode for Emacs with one called Org Journal, which is a really easy way to manage a daily journal in your text editor. So I have it bound. I just type control-c, control-j. It will open a new buffer with a prompt with the date and a timestamp. And that's actually what I've been using during my day to day work to keep track of the different things that I've gotten done that day, with just a headline and then a little note to myself. And then I've actually just been using that to post into the standup channel on Slack. So it's just an easy way to jump into a text buffer real quick, take some notes so that you remember things that you're up to or things that you want to note down, and it's just been very useful and valuable in my day to day work. And with that, I'll pass it to you. Do you have any picks for us this week, Ned? Sure. So my first pick is
[00:49:45] Unknown:
a Python testing library that you've actually discussed on the podcast about a year ago, called Hypothesis. I've been hearing about Hypothesis for at least a few years, and I never quite wrapped my head around how to use it. I have a side project, a new side project as of June, which is two-dimensional geometry, completely different from coverage.py and my day job. And I finally managed to figure out how to use Hypothesis on that code. And so it's helped me think about how I might use it on other code. Hypothesis is great for functional programs where your functions take a clearly defined domain of input data and then produce results that you can characterize in certain ways, and it will generate input data and try to find failures, which is a very different way of approaching tests than the old way of, let me specify specific inputs and specific correct outputs.
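For listeners who haven't seen it, here is a minimal Hypothesis example; the property being tested is invented for illustration, but the given decorator and integer strategy are the library's real entry points.

```python
from hypothesis import given, strategies as st

def clamp(value, low, high):
    """Toy function under test: keep value within [low, high]."""
    return max(low, min(value, high))

@given(st.integers(), st.integers(), st.integers())
def test_clamp_stays_in_range(value, low, high):
    # Hypothesis generates the inputs and hunts for a counterexample,
    # rather than us hand-picking specific inputs and expected outputs.
    if low <= high:
        assert low <= clamp(value, low, high) <= high
```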
It's not perfect. It's kind of a magic genie that can be hard to control at times, but it's definitely an interesting way to test your code. And for my second pick, I'll do a non software thing, which is a podcast called The Infinite Monkey Cage, which is produced by the BBC, and it is a professor of physics and a comedian with panelists who are either scientists or comedians talking about interesting scientific topics, and they get very deep into real science,
[00:51:05] Unknown:
and I find it hugely entertaining. So you should take a listen. Yeah. I'll definitely second that one. I've been listening to that one for a couple of years now. And, yes, it is absolutely hilarious at times. Well, I appreciate you taking the time out of your weekend to share your history with coverage and the different ways that it's been used, and I hope you enjoy the rest of your day. Thanks. You too. It's been great to talk.
Introduction and Guest Introduction
Ned's Journey with Python
Overview of Coverage.py
The Evolution of Open Source Projects
Importance of Test Coverage
Challenges and Downsides of Coverage Metrics
Coverage in Multi-Language Projects
Adding Coverage for Django Templates
Extending Coverage to Other Languages
How Coverage Algorithm Works
Supporting Multiple Python Versions and Implementations
Future Plans for Coverage.py
Contributing to Open Source Projects
Picks and Recommendations