Summary
Unit tests are an important tool to ensure the proper functioning of your application, but writing them can be a chore. Stephan Lukasczyk wants to reduce the monotony of the process for Python developers. As part of his PhD research he created the Pynguin project to automate the creation of unit tests. In this episode he explains the complexity involved in generating useful tests for a dynamic language, how he has designed Pynguin to address the challenges, and how you can start using it today for your own work.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- We’ve all been asked to help with an ad-hoc request for data by the sales and marketing team. Then it becomes a critical report that they need updated every week or every day. Then what do you do? Send a CSV via email? Write some Python scripts to automate it? But what about incremental sync, API quotas, error handling, and all of the other details that eat up your time? Today, there is a better way. With Census, just write SQL or plug in your dbt models and start syncing your cloud warehouse to SaaS applications like Salesforce, Marketo, Hubspot, and many more. Go to pythonpodcast.com/census today to get a free 14-day trial.
- Are you bored with writing scripts to move data into SaaS tools like Salesforce, Marketo, or Facebook Ads? Hightouch is the easiest way to sync data into the platforms that your business teams rely on. The data you’re looking for is already in your data warehouse and BI tools. Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL. Supercharge your business teams with customer data using Hightouch for Reverse ETL today. Get started for free at pythonpodcast.com/hightouch.
- Your host as usual is Tobias Macey and today I’m interviewing Stephan Lukasczyk about Pynguin, the PYthoN General UnIt test geNerator.
Interview
- Introductions
- How did you get introduced to Python?
- Can you describe what Pynguin is and the story behind it?
- What are the problems that Pynguin is designed to solve?
- What other projects are you drawing inspiration from?
- What are some of the use cases for automatic test generation?
- How is Pynguin implemented?
- What are the challenges that the dynamic nature of Python introduces?
- What are some of the packages and libraries that have been most helpful while building Pynguin?
- Can you talk through the workflow of using Pynguin to generate tests for a project?
- What are some of the limitations on what kinds of projects Pynguin can be used for?
- What are some design or implementation strategies in the code that you are generating tests for that will help make Pynguin’s job easier?
- Once a test suite has been created, what are the next steps?
- What are some of the initial assumptions or goals of the project that have been revised or challenged once you began implementing it?
- What are the most interesting, innovative, or unexpected ways that you have seen Pynguin used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Pynguin?
- When is Pynguin the wrong choice?
- What do you have planned for the future of Pynguin?
Keep In Touch
Picks
- Tobias
- Stephan
- Cycling
- Take care of your health
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
Links
- Pynguin
- University of Passau
- Passau, Germany
- Evosuite
- Hypothesis
- Astor
- Walrus Operator
- MyPy
- Pytest
- unittest
- Bytecode library
- Pytype
- Monkeytype
- Atheris from Google – coverage-guided fuzzing
- Blog series about “Python behind the scenes”: Ten thousand meters by Victor Skvortsov
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling powered by the battle-tested Linode platform, including simple pricing, node balancers, 40-gigabit networking, dedicated CPU and GPU instances, and worldwide data centers.
Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. Are you bored with writing scripts to move data into SaaS tools like Salesforce, Marketo, or Facebook Ads? Hightouch is the easiest way to sync data into the platforms that your business teams rely on. The data you're looking for is already in your data warehouse and BI tools. Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL.
Supercharge your business teams with customer data using Hightouch for Reverse ETL today. Get started for free at pythonpodcast.com/hightouch. Your host as usual is Tobias Macey, and today I'm interviewing Stephan Lukasczyk about Pynguin, the Python general unit test generator. So, Stephan, can you start by introducing yourself?
[00:01:40] Unknown:
Yeah. First of all, thanks for having me, Tobias. I'm Stephan. I'm a PhD student at the University of Passau. Passau is a small town about 200 kilometers east of Munich in Germany, near the border between Germany, Austria, and the Czech Republic. I did my studies there in computer science, a bachelor's and a master's degree, and I'm now pursuing a PhD. And the project we're talking about, Pynguin, is my main research project here.
[00:02:06] Unknown:
Do you remember how you first got introduced to Python?
[00:02:11] Unknown:
I thought about that, and I actually cannot really narrow it down. I played around with it before I started my studies, which was maybe around 2010 or something. But I really got introduced to it during my bachelor studies and used it quite a lot, although our programming experience was mainly a Java background, from what is taught in the study curriculum. But I used Python there a lot. And when I started my PhD, I focused a lot on Python and shifted my main focus into that area, into that universe.
[00:02:50] Unknown:
Now in terms of the Pynguin project, you mentioned that it's the focus of your current research in your PhD program. I'm wondering if you can give a bit of a description of what it is that you're building, some of the story behind how you came to this particular problem space, and some of the interesting areas of research that it's uncovering as you go through your program?
[00:03:08] Unknown:
One of the things to note here is that my supervisor, Professor Gordon Fraser, is an expert in testing software, and he developed a tool for Java unit test generation called EvoSuite. Some of you might have heard that name somewhere if you're interested in testing. So we were discussing a lot of opportunities, and almost immediately my focus and interest shifted to Python, because I thought this would be an interesting world: to look at a dynamically typed language and ask how one can apply the research that's actually there, be it static or dynamic analysis or testing or whatever, a lot of which is done for statically typed languages, mainly Java.
Which of those things can be applied to Python, or to a dynamically typed language in general? And what other ideas can one come up with? After spending quite some time investigating what's actually there, what people have done, and what could be done, we came up with the idea: well, let's see if we can automatically generate tests for Python. And that's how it actually started. And yeah, now I'm here, getting interviewed in a series of podcasts where so many prominent projects have appeared, which is a great honor for me. The project started about one and a half years ago, and it's still in development. I hope I can shed some light on the project and share some insights, and maybe some of you will want to try it out and give me some feedback.
[00:04:48] Unknown:
In terms of the project itself, you mentioned that the intent is to automatically generate unit tests for Python programs. I'm wondering if you can dig a bit more into the types of tests that it's able to create, some of the limitations on the kinds of code that it can generate tests for, the overall problem statement, and how you're framing the development approach: how you determine the areas of focus as you continue to iterate and, as you mentioned, being in a heavy development cycle, how you prioritize that work?
[00:05:20] Unknown:
First of all, we want it to be as general as possible, so in theory it should be applicable to all kinds of projects that are written in Python. In practice, we cannot hold that promise. One of the problems is that as soon as you don't have type information, which is an optional feature, one I highly encourage people to use, it becomes more and more difficult. The problem here is that you have basically two things in mind. One is that you have to create inputs for your functions and methods and your code, whatever is there, which is manageable when you know what types you have to put in. So if you know that you have to put in an int or a string, you can generate a particular object of that type. But if you don't have that information present, you can just randomly guess, or maybe make an educated guess, but it's basically a guess, and the space of choices is huge. The second thing is then to bring this together into a test case with the target of reaching high code coverage, because we all know that when we do testing, coverage is one of the metrics that is at least very interesting. I mean, you cannot reason about code you don't have any coverage of; you can only reason about the covered parts of the code. So I think this is one of the biggest problems: to combine calling the right methods with the right parameter values, and to develop and evolve the test cases in a way that achieves high coverage. Another aspect that's totally out of focus up to now is that you don't only want high coverage, but you also want to reason about the correctness of the values, because that's what testing is about. And for that, we need assertions that check the results or values that we get back from functions and so on. This is the next level of complexity, a second research area, basically: how should an oracle, how should such an assertion look, so that we can reason about the code and maybe find some bugs, for example?
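To make that difference concrete, here is a minimal sketch, not Pynguin's actual code, of why the type annotation matters: with one, a generator can pick a matching value directly; without one, it can only guess among every type it supports.

```python
import random
import string

def random_int() -> int:
    return random.randint(-1000, 1000)

def random_str() -> str:
    return "".join(random.choices(string.ascii_letters, k=random.randint(0, 10)))

# Hypothetical registry of the types a generator knows how to build.
GENERATORS = {int: random_int, str: random_str}

def generate_input(annotation=None):
    """Pick an input value for one parameter.

    With an annotation we can make an informed choice; without one we can
    only guess, and the guess gets worse with every supported type.
    """
    if annotation in GENERATORS:
        return GENERATORS[annotation]()
    return random.choice(list(GENERATORS.values()))()
```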
[00:07:52] Unknown:
In terms of the research that you're doing and the approach that you're taking with Pynguin, I'm curious to get your perspective on how it compares to projects such as Hypothesis, which does property-based testing, or systems that do contract-based validation in Python, the relative trade-offs of those different approaches, and where you're going with Pynguin as it relates to those other types of testing and validation exercises?
[00:08:18] Unknown:
So Hypothesis is actually a framework with a lot of possibilities, and it's quite close to what we achieve with Pynguin. With Hypothesis, the focus, as you said, is on property-based testing, where you usually have some property in mind, or specified by requirements or whatever, and you know what you want to put into a function and what you want to get out. For example, if you have a function that adds two values, and you put in the same value twice, then you have the property that the result should be twice the value. There, you know the underlying property, and that's what you specify explicitly. What Hypothesis then does in the background is some kind of test generation in a random way, because it generates input values to explore the space of possible values and to figure out whether there is some violation of the property.
Pynguin does something similar, but you don't have to specify anything. You basically just give it a module, and maybe the dependencies of the module, and it runs on that and tries to generate inputs for all the functions you have there, in order to achieve a maximum coverage level. So we don't focus on proving or disproving properties like Hypothesis does; we just want to cover all the code in general, which is something that, as far as I know, Hypothesis does not really do. Coverage is not what it explores in the first place. I mean, if it can cover a lot of code, that's good, but it's not the main target, I guess.
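For reference, the doubling example above, written as a Hypothesis property-based test, looks roughly like this (assuming the hypothesis package is installed):

```python
from hypothesis import given
from hypothesis import strategies as st

def add(a: int, b: int) -> int:
    return a + b

# The doubling property from the example: Hypothesis generates the inputs,
# but the property itself is written by hand.
@given(st.integers())
def test_adding_a_value_to_itself_doubles_it(x: int) -> None:
    assert add(x, x) == 2 * x
```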
[00:10:01] Unknown:
Another interesting area of discussion in terms of automatically generating tests is the different styles of testing that people aim for. Some people will go for the given/when/then style, or different setups for putting the code into the expected state that you're trying to validate. I'm curious what your thoughts are on where Pynguin falls as far as the style of tests that it generates, and on being able to parameterize the setup and teardown approaches in the tests that you're creating.
[00:10:40] Unknown:
So currently, we stop quite early in this process. This was not yet our focus, but it's definitely an interesting one. What we currently do is basically the setup and the execution; the assertion part, the checking, is still under heavy construction. We are currently working on being able to generate somehow reasonable assertions and checks as well. So basically, what we have now is the setup and the calling of the functions, but not the checks. And parametrization, as you mentioned, is a very interesting topic, as is setup and teardown code, which is definitely worth future research and definitely on my agenda. I hope I can get there sometime, after solving all the other problems on the way. One of the problems that we are facing, and I also noted this before, is the type information that you have or that you don't have.
This is by far the largest obstacle to actually achieving coverage, because you don't know what to put in. But, yeah, parametrization, or setup and teardown code, all the things that you do when you write your tests manually, those are definitely things that can be explored and that at some point should be incorporated into the framework, of course.
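As an illustration of the shape being described, a hand-written pytest test with the setup/execute/check cycle, plus parametrization and a float-safe assertion, might look like this sketch:

```python
import pytest

def average(values):
    """Toy function under test."""
    return sum(values) / len(values)

# One parametrized test runs several setup/execute/check cycles.
@pytest.mark.parametrize("values, expected", [
    ([1.0, 2.0], 1.5),
    ([0.1, 0.2, 0.3], 0.2),
])
def test_average(values, expected):
    result = average(values)                   # execute
    assert result == pytest.approx(expected)   # check, float-safe
```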
[00:12:06] Unknown:
Another interesting aspect is that you seem to have settled on pytest as the framework that you're targeting for the generated tests. I'm curious what drove that decision, particularly given the fact that unittest itself is modeled after JUnit, and your adviser built EvoSuite, which targets the Java runtime. Were there any shortcuts that you would have been able to take if you had gone with unittest?
[00:12:41] Unknown:
We actually had an exporter for unittest. Inside the framework, there's no assumption about how tests are exported; it builds its internal data structures, and in the end we create an abstract syntax tree out of them, and from the AST we write the code to files. We just recently deleted the unittest exporter, for maintenance reasons, actually. When we started the project, we had some data; it was, I think, from the Python developers survey by Stack Overflow, or was it JetBrains? I'm not sure.
Nevertheless, it said that most people are using the pytest framework, are using that style, and pytest is also able to execute standard unittest-style tests. In the end, I can imagine that we will re-add the unittest exporter. The reason we deleted it was that we had issues with assertions and floats, where you cannot necessarily check for equality but have to check whether the values are close enough. That, together with other issues with the assertions, was the main cause for deleting the code recently. But we might add it back if there's any request for it.
[00:14:07] Unknown:
Digging more into Pynguin itself, can you talk through the approach that you've taken for building the project, some of the design and architectural aspects of it, and some of the challenges that you've faced because Python is so dynamic: being able to build Pynguin to generate tests for code that might have monkey patching or unknown types, and this overall problem space that you're trying to cover?
[00:14:29] Unknown:
Yeah. One thing you will notice when you look at the code is that we have a strong training background in Java, and many design decisions are heavily inspired by EvoSuite. That worked at some points, but it also caused some struggle at others. The whole internal representation was based on EvoSuite's, and we saw that this was maybe not the best design decision. We had files containing single classes, which, if you're coming from Java, is the natural way, because you have one class per file, but it's maybe not the best way to do it in Python. What I learned the hard way there is that you get into import hell, where you have circular imports if you are not careful enough, and you have to deal with all the strange details of how the import mechanism actually works when you write Python the way you would write Java.
So that's something that I learned the hard way, and you will see it in a lot of places in the code. Only recently I started to clean up whenever I touch something: turning classes that just contain static methods into top-level functions, and all the things that are basically standard if you come from a strong Python background, but that you don't necessarily do in the first place if you come from the Java world, and that you have to learn over time and adapt to. But we are improving on that, and this is something that's currently under heavy change. So every version that we release might break something if you rely on internals.
So if there's anybody out there who is developing against some of the APIs that are there, please drop me a note. I mean, I don't know of too many people who have played around with the framework, so I'm in the nice situation that I can just break things without asking people. But if there are people relying on the APIs, please tell me, because then we need to carefully discuss what to do, and not just say: look, let's refactor that, let's break everything, let's do it in a nicer way, and not care what potential users might want to have.
[00:17:01] Unknown:
As far as the actual development of Pynguin, I'm curious what you have found to be particularly useful from the broader ecosystem, whether third-party libraries or built-in modules from the standard library, and how you've gone about selecting which tools to use to build out this test generation framework.
[00:17:23] Unknown:
From the library side, I have to mention basically two libraries. One might become obsolete once we move to Python 3.9, and that's the astor library for converting an AST back to source code, which is the feature we rely on when we write out the tests in the end: we transfer our internal representation into Python AST statements and then write them out as source code. The second library, and maybe the one that saved us the most time, is the bytecode library. I don't know whether you know it; it's basically a wrapper library for dealing with the bytecode that's produced and interpreted by the Python interpreter.
It provides a nice API to, for example, add statements. The bytecode itself is immutable, but the library lets you convert the bytecode to different representations, add some statements, and bring it back into a form that you can afterwards execute. We need this feature a lot. When we started the project, we were thinking about how to measure coverage, but not only coverage, because we actually need more information. Our current focus is branch coverage, covering all the branches in the program, and we want to know not only which branches were covered, but also how close we came to the ones we missed. That's basically a concept called branch distance.
For example, if you have a statement like `if x > 42` and your current x was 22, then you know your branch distance is 20: you are basically an interval of 20 away from covering the then branch. To get this information, we rely heavily on bytecode instrumentation, after spending a lot of time on the question of whether we could instead use the tracing mechanism from the standard library, which has some serious drawbacks in my opinion; I can come to that later. The bytecode library saved us a lot of time because we could instrument the bytecode in a similar way to how one would do it in Java, where such instrumentation libraries exist as well. And maybe a note on the tracing I just mentioned: you might know that tracing only supports one tracing function at a time. When you want more of them for different aspects, you basically need to stack them together somehow, calling the inner function from the outer one, which needs some wrapping, or the functions need to behave in certain ways or be added and removed in certain ways, which is quite complicated to deal with, especially if you don't know what people are doing. And when you, for example, use coverage.py to measure coverage, it registers its own tracer.
That could just remove the tracing that you are using and disable your measurements entirely, just because you, for example, run your own unit tests to measure your coverage, which is a bit of a pity, actually. It would be great to be able to register multiple tracing functions that are called in parallel, but that's currently not supported. So we decided to do it with bytecode instrumentation, and the bytecode library saved me a lot of time there and enabled us to do this without going back to manipulating single bytes in a byte stream or something.
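A minimal sketch of the branch-distance idea (Pynguin derives this from instrumented bytecode rather than from a helper function like this one):

```python
def branch_distance_true(x: int, threshold: int = 42) -> int:
    """Distance of x from making `x > threshold` true.

    0 means the then branch was taken; otherwise the value tells the search
    how far it missed, a gradient instead of a plain hit/miss signal.
    (Real definitions usually add a small constant k so that
    x == threshold still counts as a miss.)
    """
    return 0 if x > threshold else threshold - x

assert branch_distance_true(50) == 0    # then branch covered
assert branch_distance_true(22) == 20   # the "20 away" case from above
```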
[00:21:03] Unknown:
I wonder if there have been any PEPs or efforts in the past to modify the tracing interface to add support for multiple tracers, or if that's something that you've looked into. But that definitely sounds like a shortcoming.
[00:21:14] Unknown:
I actually have not investigated whether there are any proposals on that, but I would strongly support something like it, even beyond Pynguin. Other projects might use it as well, especially since almost everybody is using coverage.py to measure coverage. They might be happy to be able to add further tracers, for example for debugging or some other introspection where you want to rely on the tracing functionality.
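In the absence of such support, stacking tracers by hand looks roughly like the following sketch (the real sys.settrace protocol has more corner cases, such as local trace functions replacing themselves, which this ignores):

```python
import sys

def make_call_logger(tag):
    """A tiny tracer following the sys.settrace protocol: it logs call
    events and asks for no per-line tracing."""
    def tracer(frame, event, arg):
        if event == "call":
            print(tag, frame.f_code.co_name)
        return None
    return tracer

def combine_tracers(*tracers):
    """Fan the single global trace hook out to several trace functions."""
    def global_trace(frame, event, arg):
        # Ask every tracer whether it wants a local tracer for this frame.
        local = [t(frame, event, arg) for t in tracers]
        local = [lt for lt in local if lt is not None]
        if not local:
            return None
        def local_trace(frame, event, arg):
            for lt in local:
                lt(frame, event, arg)
            return local_trace
        return local_trace
    return global_trace

def work():
    return sum(range(3))

sys.settrace(combine_tracers(make_call_logger("A:"), make_call_logger("B:")))
work()  # both tracers see the call event
sys.settrace(None)
```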
[00:21:50] Unknown:
The fact that you're relying so closely on the bytecode implementation, I'm assuming that that means that currently you're only able to operate when running within CPython. And I'm wondering if you've explored the possibility of working with the alternative runtimes that are out there.
[00:22:05] Unknown:
I thought about that, yes. So currently, as you said, we are relying on CPython. And since the bytecode is basically implementation specific, we cannot even tell for sure whether everything works in Python 3.9 or 3.10. I don't know what the current alpha version is; I think they might have just released the first RC. 3.10 I haven't tried; it already breaks when we install dependencies, because some of our dependencies don't yet support 3.10, so we have not even tried it. We have 3.9 running in our CI pipeline, and from our tests it looks good. But our unit tests cannot prove correctness, of course, so we don't really know. Whenever I find some time, I will do some benchmarking and comparison, maybe of 3.8 versus 3.9, to have more confidence about whether this works or whether we need to change some things. And speaking of other runtimes, that would be nice. I would particularly be interested in PyPy, the interpreter written in Python, doing just-in-time compilation and all these fancy things that are very interesting.
But this has two drawbacks for me right now. One is that we're relying on this bytecode instrumentation, which is something we would need to solve for supporting PyPy. The other is that our code these days relies on features that were introduced in 3.8, mainly the walrus operator, to name maybe the most prominent one. And as far as I know, PyPy supports only up to 3.7. I have seen that they might have some development version that supports 3.8 and that they are working on it now, but I have not tried whether this would even work at this early stage of their support for Python 3.8. But it's definitely interesting, and maybe in the future we can support it, when we can come up with a different way of instrumenting the code under execution.
[00:24:19] Unknown:
And as far as the actual use cases for something like Pynguin: most developers are probably familiar with writing their own unit tests, figuring out how to set up a function to be in the state they want to validate, and then building in the validation functions. I'm wondering what sorts of environments, use cases, or code bases are out there where something like Pynguin would be beneficial. Is it something where you would bootstrap a set of tests and then manually tune them? Or is the intent to reach a point where you can potentially do away with manual testing and just use something like Pynguin to provide the most important assertions and get at least a baseline of coverage?
[00:25:03] Unknown:
So I guess one could do both. The first is maybe the easier, or also the more reasonable, way to start: having some basic tests. I'm thinking of tests for data classes, or classes that don't rely on web APIs or databases or anything like that. For those, Pynguin might be a good start, providing you with at least a couple of tests that you don't have to write entirely by hand. When you generate tests for more mature projects, more mature code bases where you also already have tests, you could do that too; in the end, especially the assertions are something you need to inspect manually, at least in a code review or something.
One possible scenario that comes to my mind is that one could run Pynguin in a nightly build and maybe let it create merge requests automatically with new tests. Then you inspect those manually, and either you accept them and add them, or you discard them if they don't fit your needs. But, yeah, it depends largely on the project that you're working on and what you want to achieve in terms of testing. And there are a lot of possibilities for further testing. We were discussing Hypothesis before, which is something I'm currently digging deeper into, because I think their approach to dealing with input generation is very interesting as well.
So combining a lot of tools here might be a good choice.
[00:26:38] Unknown:
In terms of Pynguin itself, are you able to take advantage of an existing test suite to help inform where you might want to focus on generating new tests? Since you mentioned that you're also focusing on coverage, can you use the coverage information from your existing tests to identify code paths to focus on when generating new ones?
[00:27:06] Unknown:
Yeah, that's a very recently added feature, basically, and a feature that could also be improved, of course. What you need to know here is how Pynguin works internally. There are basically two styles of using it. One utilizes random test generation, where you randomly select the next method that you want to call and try to fulfill its parameter values. The second approach uses evolutionary algorithms, where, like in Darwin's theory of evolution, you evolve the test cases using crossover and mutation: mutating the test cases by adding statements, removing statements, replacing or modifying them, and crossing over two test cases, meaning that you take the beginning part of the one and the end of the second test and basically flip those.
Using that, especially the evolutionary style, we now have the possibility to load existing test cases, as long as they are supported by our internal representation. We don't support all the possible syntax constructs that you could use in a test case, of course. I mean, some people use loops in their test cases, which is maybe considered bad practice in general, but you can do it; such tests will not be loaded, and those constructs will not be used by Pynguin. But in theory, you can load existing test cases and let the evolution start from them, using them as a basis and letting the evolutionary algorithm try to achieve more coverage by mutating, by crossing over, by changing or adding statements, whatever else it wants to do in order to yield higher coverage results.
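A toy sketch of the crossover and mutation operators described above, treating a test case as a plain list of statement strings (Pynguin's real representation is of course much richer):

```python
import random

def crossover(parent_a, parent_b):
    """Single-point crossover: head of one test case, tail of the other."""
    cut_a = random.randint(0, len(parent_a))
    cut_b = random.randint(0, len(parent_b))
    return parent_a[:cut_a] + parent_b[cut_b:], parent_b[:cut_b] + parent_a[cut_a:]

def mutate(test_case, statement_pool):
    """Add, remove, or replace one statement at random."""
    test_case = list(test_case)
    roll = random.random()
    if roll < 1 / 3 or not test_case:
        test_case.insert(random.randint(0, len(test_case)),
                         random.choice(statement_pool))
    elif roll < 2 / 3:
        test_case.pop(random.randrange(len(test_case)))
    else:
        test_case[random.randrange(len(test_case))] = random.choice(statement_pool)
    return test_case

# Hypothetical statements, just to show the mechanics.
parent_a = ["x = 0", "obj = Stack(x)", "obj.push(x)"]
parent_b = ["y = 'a'", "obj = Stack(0)", "obj.pop()"]
print(crossover(parent_a, parent_b))
```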
[00:29:01] Unknown:
And as far as actually getting started with using Pynguin on an existing project, I'm wondering if you can talk through the process of setting it up, determining useful initial target areas to start generating tests for, and whether there are tasks you'd want to do ahead of time, such as adding more type information, or whether you need a particular architectural style, say class inheritance versus composition, or a certain functional decomposition of the program, to make it easier for Pynguin to do its job?
[00:29:35] Unknown:
So type information is definitely a critical point. As far as you can, add types, add all the type information that you can, and use type-checking tools like mypy; just use them on your project and add type annotations. It not only gives you a lot more safety, because the type checker rules out certain bugs, but your editor also gives you something back, because you get better code completion, for example. And having this information is somehow crucial. I mean, Pynguin works without it, but less well than when it has the type information, because it then needs to guess all the input types. If you have them, it only has to deal with the difficulty of generating objects of certain types. So to get back to your question: when you want to get started, I would start with a small module that has only a couple of functions, or methods in a class.
What you basically need is a virtual environment where you install Pynguin, along with all the dependencies that your project has. So one approach is adding Pynguin to the virtualenv that you have for your project development, or you do it vice versa: create a new environment for Pynguin and add your project's dependencies and whatever else you want to have there. And then you can basically just invoke the tool. One problem that can occur very quickly, depending on what your code does, is unexpected side effects. For example, and this is a real-world example, there was an issue about this on the GitHub repository where somebody asked about it.
Pynguin actually executes your code, so whatever your code does can happen. The particular example we had was that after running Pynguin on their project, the person had a lot of empty folders and files in their file system with almost arbitrary names. In the end, this was because the code they ran Pynguin on used the file API, which created those folders, and from parameters and random input strings it produced those random names. So you want to be careful, because this can of course cause serious harm. I mean, if your module deletes your whole file tree and you run it as the root user, you'd better have a backup, because it will basically delete everything from your hard disk.
So what you should do is isolate it as well as possible. What I do these days is run Pynguin in a Docker container. We already provide a Dockerfile that one can start building on; you can also just install the version from the package index inside your own Docker container, together with all the other dependencies you want. Then the worst thing that can happen is that the execution breaks the container, and when you restart it, you have a clean image again. There can't be too much harm to your hard disk, unless you hit some bug in Docker, maybe, but then they might be curious to hear about it.
So isolate it whenever you can, because it can cause serious harm whenever your code can cause serious harm.
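As an illustration, the kind of small, fully annotated, side-effect-free module that makes a good first target might look like this (the command in the trailing comment is indicative only; flag spellings vary between Pynguin releases, so check `pynguin --help` for your version):

```python
# stack_ops.py -- small, fully annotated, no I/O: a friendly first target.
from typing import List, Tuple

def push(stack: List[int], value: int) -> List[int]:
    """Return a new stack with value on top."""
    return stack + [value]

def pop(stack: List[int]) -> Tuple[int, List[int]]:
    """Return the top element and the remaining stack."""
    if not stack:
        raise IndexError("pop from an empty stack")
    return stack[-1], stack[:-1]

# Inside an isolated container or virtualenv, an invocation looks roughly like:
#   pynguin --project-path . --module-name stack_ops --output-path /tmp/tests
```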
[00:33:18] Unknown:
And so as far as the execution of Pynguin: you mentioned starting with a small module that has a limited scope, but for somebody who wants general coverage of their project, is it possible to, say, start at the root module and have Pynguin walk through the project hierarchy, or do you have to execute it one module at a time? What are the targeting capabilities for running Pynguin, generating the tests, and determining where the tests will be written out to?
[00:33:50] Unknown:
So currently, it has to be run on each and every module that you want to generate tests for, and it will then generate tests, or try to, for the functions and whatever else is inside that module. That can of course involve dependencies, and the tests generated for one module can also cover parts of other modules if they are called. But you currently have to invoke it on each and every module that you want tests for. By doing that, you also get single test files for single modules.
[00:34:29] Unknown:
We've all been asked to help with an ad hoc request for data by the sales and marketing team. Then it becomes a critical report that they need updated every week or every day. Then what do you do? Send a CSV file via email? Write some Python scripts to automate it? But what about incremental sync, API quotas, error handling, and all of the other details that eat up your time? Today, there is a better way. With Census, just write SQL or plug in your dbt models and start syncing your cloud data warehouse to SaaS applications like Salesforce, Marketo, HubSpot, and many more. Go to pythonpodcast.com/census today to get a free 14-day trial and make your life a lot easier.
As far as the environment setup: if the code under test has dependencies on, say, a database or some mocked API endpoints, is Pynguin able to shell out to a setup suite, or call out to your pytest setup, to bring up those systems? Or would you need to bring the system into a state where it's ready to be tested and then call Pynguin on the module with the expectation that everything is already there?
[00:35:39] Unknown:
Currently, definitely the latter. Things like databases are something we've currently excluded, and we've also excluded several other things from our focus, like APIs that you call remotely, such as web services, because you need a lot of domain knowledge to actually get your program into a reasonable state. Imagine, for example, you need JSON blobs. Generating a valid JSON blob is one problem, and not a trivial one. But generating a JSON blob that serves a call to a particular API, or that has a particular meaning to the code under test, is basically almost impossible without domain knowledge.
So we are not currently focusing on those things. I mean, I'm happy to accept contributions if somebody wants to work on that, because I think it's a great topic, and for using Pynguin in practice it's definitely worth investing in, but I might not be able to work on it during my research or my PhD. The second thing that we're currently excluding, not out of general necessity, but because it's easier for us, is everything that relies on native code modules. If you have, most prominently maybe, NumPy as a dependency, where large parts of the library are implemented in C for performance reasons, we are currently not focusing on those libraries. In theory it should work, but we don't focus on it, just because we don't want to deal with compiling the source code, or with prebuilt binaries that might or might not work depending on your system.
This is also definitely an interesting area. Basically, the whole data analysis world, coming up with inputs for a data analysis pipeline, would be very interesting. And as far as I know, Hypothesis, for example, has strategies to generate at least things like data frames in pandas. That would definitely be interesting, but it's not my main focus these days. But, yeah, if you have ideas on that, I'm happy to talk to you and to take those contributions, because I think it would only be good for Pynguin.
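For reference, the Hypothesis pandas strategies mentioned here look roughly like the following (requires Hypothesis's pandas extra, `pip install hypothesis[pandas]`; the column names are made up for the example):

```python
from hypothesis import given
from hypothesis.extra.pandas import column, data_frames

# Hypothesis generates whole pandas DataFrames as test inputs.
@given(data_frames([
    column("price", dtype=float),
    column("qty", dtype=int),
]))
def test_revenue_has_one_value_per_row(df):
    revenue = df["price"] * df["qty"]
    assert len(revenue) == len(df)
```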
[00:38:09] Unknown:
As far as the overall goals of the project and the assumptions that you had going into it, I'm wondering if you can talk through the general destination that you're aiming for, and how your goals and ideas of how Pynguin would work, or what it would be able to do, have changed or evolved as you've been building it out and exploring new areas?
[00:38:33] Unknown:
When I started, I was this guy coming from the Java world, where everything is nice and typed, and I thought: well, type annotations are not that new a feature, so maybe a lot of people have already adopted them in their projects. And the first lesson I learned was that actually finding reasonably sized projects that have type information available is a nontrivial task. There are quite a bunch of toy projects out there, I call them toy projects, projects with only a couple of functions and maybe a few hundred lines of code, where people made this effort. But for larger projects that are also widely used, it's very difficult to find type information. So this is one of the assumptions I had when we started that failed, and it also shifted what I'm working on and focusing on. The idea was basically to build a framework that can generate unit tests for almost arbitrary Python code. We figured out that there are some problems, like the missing type information, that we need to solve, or at least find partial solutions to, to allow Pynguin to reasonably generate tests. So my focus shifted more toward the type inference direction, for example: what can actually be done, and what is actually done? These days, a lot of people, especially in the research community, build tools that use some kind of machine learning or deep learning approach to predict type information, which is something we might be able to utilize if we can query such a tool as a black box and say: give me a suggestion for an input type.
And other things from the type inference world as well. So this is one of the main problems we are currently facing. And then there's still this Java-inspired design, where we had some pitfalls and learned that you cannot do things the same way in Python. An example that comes to mind is fields in classes, which are added dynamically. You can do best-effort static analysis to detect them, like searching through the constructors and whatever else one can come up with, but you will never have a full and complete solution that detects all of them. And other dynamic features, like removing fields dynamically at run time, are something that, at least as far as I know, there is no easy way to deal with. These are all things that are important, I guess, and that are used in real-world software that we have seen out there, and for which we don't have a real solution yet. So this is definitely something where we could improve, and that we hopefully will improve over time.
But at this point, Pynguin may be at too early a stage to deal with all of it, and it needs a lot of work to support more of the basic features of the language. For example, it was only recently that it gained support for generating inputs for simple collection types like lists and sets and dictionaries, which are heavily used in practice, of course, but which are maybe not that trivial to deal with when you automatically generate inputs.
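The dynamic-fields problem described above is easy to reproduce; a purely static scan of the class body cannot reliably see any of the attributes in this small example:

```python
class Config:
    def __init__(self, debug: bool) -> None:
        self.debug = debug
        if debug:
            # This field exists only on some instances.
            self.log_level = "DEBUG"

cfg = Config(debug=True)
cfg.retries = 3     # field attached from outside the class
del cfg.log_level   # field removed again at run time
```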
[00:42:08] Unknown:
Yeah. Especially given the heterogeneity of the potential values within those collections, because if you have a dict that is of type Dict[str, Any], you know, anything's game.
[00:42:24] Unknown:
That's a nice one. Any is the best type that you can have, of course; irony mode off. If you have to write Any as an annotation, you usually don't need the annotation at all, because it's basically the same. Any is a nice placeholder if you don't want to figure out what the actual type is, or if you can't for some reason, but it's something you want to avoid. As a general strategy, if you're adding type information to your project, try to avoid Any whenever you can, because it won't help you, and it will just bring you further problems down the path. You will easily reach the point where you have to add Any as a type annotation everywhere, because you did it on some method down in a corner of your project, and then everything else can just be Any, because otherwise you cannot satisfy the type checker anymore.
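A small illustration of the point: the first annotation tells the type checker (and a test generator) nothing, while the second pins down exactly what to generate and check.

```python
from typing import Any, Dict

def tally_any(counts: Dict[str, Any], key: Any) -> Any:
    # Any-typed: the checker accepts almost anything here, and a test
    # generator has to guess what key and the values should be.
    counts[key] = counts.get(key, 0) + 1
    return counts

def tally(counts: Dict[str, int], key: str) -> Dict[str, int]:
    # Concrete types: tools know to generate a str key and int counts.
    counts[key] = counts.get(key, 0) + 1
    return counts
```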
[00:43:20] Unknown:
And in terms of the type inference and type generation approach, I'm wondering if you've looked at tools such as pytype, because I know that it has an execution mode where it can statically analyze a code base and generate at least a best-guess set of type stubs for the project.
[00:43:38] Unknown:
Yeah, I'm also investigating that, and we also played around with things like MonkeyType, which is a framework that basically uses the tracing mechanism: it traces your code while it's executed, meaning it collects all the type information that it sees, like parameter and return types of functions. It can also generate stubs, and I guess it can also apply the annotations directly to the code, so it basically extracts this information into files. Around ideas like that, there are a lot of interesting approaches and a lot of tools that implement them.
The thing for me is to figure out what is promising, what might also work in our context and within our tool, and then adapt it in a way that we can use without too much overhead. The experiments I did by just calling MonkeyType as a black box revealed that this is way too heavy: there's too much overhead in the MonkeyType framework around it, and it slows down test generation too much. But the idea itself is very nice, and it maybe needs to be adapted to also extract information during the execution of a test, for further methods, for example.
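A decorator-based sketch of the run-time type collection idea (MonkeyType itself hooks the tracing mechanism instead of requiring decoration, so this shows the concept, not its API):

```python
import functools
from collections import defaultdict

observed_types = defaultdict(set)

def record_types(func):
    """Record the concrete argument and return types seen at run time."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        observed_types[func.__qualname__].add(
            (tuple(type(a).__name__ for a in args), type(result).__name__)
        )
        return result
    return wrapper

@record_types
def scale(x, factor):
    return x * factor

scale(2, 3.0)
print(dict(observed_types))  # {'scale': {(('int', 'float'), 'float')}}
```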
[00:45:04] Unknown:
And in terms of the development of Pynguin, I'm curious how much you're able to use Pynguin to generate the tests for Pynguin itself, and what that cycle looks like.
[00:45:17] Unknown:
That cycle looks like failing hard with exceptions. The problem you immediately face is that the tool has the same names and imports in the same namespace as the code you want to generate tests for, which is prone to failure, I guess. An idea could be to rename all the modules dynamically when you want to do this. But, yeah, I tried it once, and the easy approach just failed, and I did not investigate any further. It would of course be nice to have, like a compiler being able to compile its own code, a test generation tool that can generate tests for itself. But we are far away from that, and there are a lot of challenges to be solved besides renaming modules.
[00:46:06] Unknown:
And in terms of the actual usage of Pynguin, either in your own research or in the community, I'm wondering what you have seen as some of the most interesting, innovative, or unexpected ways that it's been applied.
[00:46:19] Unknown:
I have not seen too much usage in practice by other projects or other people yet. I've had some reports; usually they come as bug reports or questions about things that are unclear. So I have seen it being used by some people who are creating libraries. One person was dealing with a database API wrapper, which is like what I said before: creating tests for those kinds of APIs is especially difficult, since you need certain protocols of calling methods in a particular order and so on, which needs to be added as information somehow.
But I would love to see more people trying it out on their projects and seeing what happens, and especially telling me what happens and what they experienced.
[00:47:13] Unknown:
As far as your own work on Pynguin, I'm wondering what have been some of the most interesting, unexpected, or challenging lessons that you've learned in the process?
[00:47:21] Unknown:
I learn new things almost daily. I learned the hard way about the tracing mechanism, that it would be great to have more than one tracing function, as we discussed before. The other thing that comes to mind is the fields in classes, that they are really only added dynamically and that you cannot tell for sure with static analysis or some static checks. Basically, each and every thing you can think of when it comes to the dynamic nature of the language causes a headache when you want to generate tests for it, because it's something you maybe don't have in mind in the first place when you use the language in a very basic way, where you don't rely on these features, or don't use them because you might not even know about them. There are actually too many things to name them all; I can just name these two directly off the top of my head.
[00:48:25] Unknown:
As far as people who are interested in exploring Pynguin, or who want some means of building tests for their application: what are the cases where Pynguin is the wrong choice and they'd be better off building the tests manually?
[00:48:39] Unknown:
Well, the thing is, if you have complex APIs that follow certain protocols, like database communication, or that need very particular inputs, like communicating with a web service or whatever you have, then Pynguin might be the wrong choice. It might also be the wrong choice if you are just hunting for bugs. In that case, you might be better served with a fuzzing tool, where you basically throw random input at your program and see what happens. I only recently played around a bit with the Atheris tool from Google, which they released to the public, and which is a nice fuzzing approach; there are others as well. For bug hunting, using something like that is maybe easier, and also faster, because Pynguin runs this heavyweight test generation process that's quite complex. If you just throw random inputs at your application and see whether it crashes or not, that can be achieved in a much cheaper way than what we do.
[00:49:44] Unknown:
And as you continue to work on Pynguin and continue through your PhD program, and also once you've completed it, what do you have planned for the future of the project, either in terms of new features and capabilities or research directions?
[00:49:59] Unknown:
Two things we already mentioned. One is coming up with proper assertions, because that's an essential thing if you want reasonable and usable generated tests, and you don't want to bother the user with coming up with assertions. The second is working more on the type inference side, because this information is basically crucial: having more information makes the test generation work better, and it also helps in supporting more features of the language. Those are definitely things I want to explore and work on in the next months, maybe years. And of course, there are always things that might be domain specific but interesting to a broad community. Like I mentioned before, communicating with web services using JSON blobs, where you are able to specify, for example, a grammar for how your JSON blobs have to look, or whatever else can describe this in an automatically explorable way, so that we can more easily generate valid inputs for such APIs.
These are definitely things worth investigating. And I also only recently stumbled again over a bug that I had built myself during data analysis. Being able to actually generate tests for a data analysis pipeline is something that, in my opinion, is definitely worth investigating. It's definitely not easy, but many people might profit from it, because so many people are doing data analysis in some sense these days. They all build their scripts and pipelines, maybe use some Jupyter notebooks or whatever, but mostly it's just: we write it, we look at the results, and as long as the results look good, we hope it's correct. Nobody actually tests those things, or almost nobody does.
So this is maybe also a domain that would be interesting for future research.
[00:52:10] Unknown:
And are there any other aspects of the Pynguin tool, or the overall space of unit test generation, or testing in general, that we didn't discuss yet that you'd like to cover before we close out the show?
[00:52:27] Unknown:
There are many things in the context of test generation, like mocking, which is definitely a thing, and definitely a nontrivial one. I always struggle with Python's mocking library in the unittest API, getting the names of the mocked functions and classes right. So this is something that, in my opinion, could at some point be automated, to not bother the programmer anymore, and it's definitely worth future work. And if you want to do testing for your project, a good thing is to go out and read a lot of blog posts, Twitter feeds, and podcasts like yours that explore tools and methodology and give you hints on what you could do. And one thing I would suggest to everybody is to use as much tooling as possible to support you, especially since the language is so dynamic and allows you to do so many dirty tricks that are prone to introducing bugs.
A lot of them you can prevent by using type checking and linting and whatever else.
[00:53:37] Unknown:
Alright. Well, for anybody who wants to get in touch with you, follow along with the work that you're doing, or contribute to Pynguin, I'll have you add your preferred contact information to the show notes. And with that, I'll move us into the picks. This week, I'm going to choose the Concourse framework for CI and CD. I just recently started using it at work and have been pretty impressed with the overall design of the system and the capabilities that it comes with. If you're looking for a CI pipeline, or a framework for building general, arbitrary pipelines where dependency inputs and outputs trigger different downstream tasks, it's definitely worth taking a look at. And with that, I'll pass it to you, Stephan. Do you have any picks this week?
[00:54:21] Unknown:
Yeah. First of all, because you asked how to keep in touch: there's the GitHub repository, and that's maybe the easiest way of sending feedback or bug reports or whatever you have. You can also find me on Twitter; feel free to drop me a note or a message there. And if you search for me, you will of course also find my email at the university, and I'm happy to take emails there too. But I guess the easiest, if it's related to Pynguin, how you can use it, or a bug you've found, is to use the issue tracker of the GitHub repository, because then not only me but also others may be able to help you or figure out what's going on, and you might have a shorter and faster feedback loop than contacting me directly.
Some picks from me. Besides all of this, with the COVID pandemic going on in many countries, in some it's getting better, but in others it's getting worse, do something for your own health. I recently experienced this: I'm going cycling regularly again, my body is in better shape, and I'm feeling better right now. Eat healthily, get enough sleep, and, while you have to respect the pandemic rules to not meet people or to only meet a few, stay in contact with your peers, with your family, with your friends, because otherwise you will become lonely very fast, and being alone is especially difficult in this situation.
So talk to your peers, meet them if possible, or at least do some phone calls or video calls; use whatever technology is available if you can't meet in person. And all of you, stay healthy and safe, and let's hope that we can overcome this pandemic in the near future, to be able to meet in person again, hopefully also at events, and talk personally, which is something I'm missing. Yeah. Definitely.
[00:56:23] Unknown:
Well, thank you very much for taking the time today to join me and share the work that you're doing on Pynguin. It's definitely a very interesting project and an interesting area of research, and I look forward to seeing where you take it, and to being able to use it to simplify the work of writing tests for my own code. So thank you for all the time and effort you've put into that, and I hope you enjoy the rest of your day. Yeah. Thank you for the nice invitation, and I hope you also enjoy the rest of your day and the weekend that's coming up.
[00:56:51] Unknown:
And I hope we can meet in person somewhere down the road in the near future. Absolutely.
[00:56:59] Unknown:
Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast at dataengineeringpodcast.com, for the latest on modern data management. Visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it: email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Sponsor Message
Interview with Stephan Lukasczyk
Overview of the Pynguin Project
Challenges in Test Generation for Python
Comparison with Hypothesis
Styles of Testing and Pynguin's Approach
Building and Designing Pynguin
Useful Libraries and Tools
Compatibility with Alternative Runtimes
Use Cases for Pynguin
Getting Started with Pynguin
Handling Dependencies and Isolating Tests
Goals and Evolution of Pynguin
Type Inference and Generation
Using Pynguin to Test Itself
Community and Practical Applications
Future Plans for Pynguin
Additional Aspects of Unit Test Generation
Contact Information and Closing Remarks