Summary
As we build software projects, complexity and technical debt are bound to creep into our code. To counteract these tendencies it is necessary to calculate and track metrics that highlight areas of improvement so that they can be acted on. To aid in identifying the areas of your application that are breeding grounds for incidental complexity, Anthony Shaw created Wily. In this episode he explains how Wily traverses the history of your repository and computes code complexity metrics over time, and how you can use that information to guide your refactoring efforts.
Preface
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API you’ve got everything you need to scale up. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
- And to keep track of how your team is progressing on building new features and squashing bugs, you need a project management system designed by software engineers, for software engineers. Clubhouse lets you craft a workflow that fits your style, including per-team tasks, cross-project epics, a large suite of pre-built integrations, and a simple API for crafting your own. Podcast.__init__ listeners get 2 months free on any plan by going to pythonpodcast.com/clubhouse today and signing up for a trial.
- Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email hosts@podcastinit.com
- To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
- Your host as usual is Tobias Macey and today I’m interviewing Anthony Shaw about Wily, a command-line application for tracking and reporting on complexity of Python tests and applications
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by describing what Wily is and what motivated you to create it?
- What is software complexity and why should developers care about it?
- What are some methods for measuring complexity?
- I know that Python has the McCabe tool, but what other methods are there for determining complexity, both in Python and for other languages?
- What kinds of useful signals can you derive from evaluating historical trends of complexity in a codebase?
- What are some other useful metrics for tracking and maintaining the health of a software project?
- Once you have established the points of complexity in your software, what are some strategies for remediating it?
- What are your favorite tools for refactoring?
- What are some of the aspects of developer-oriented tools that you have found to be most important in your own projects?
- What are your plans for the future of Wily, or any other tools that you have in mind to aid in producing healthy software?
Keep In Touch
- tonybaloney on GitHub
- @anthonypjshaw on Twitter
- Website
- Medium
Picks
- Tobias
- Anthony
Links
- Wily
- Dimension Data
- Pluralsight
- Real Python
- Seattle
- C#
- Cyclomatic Complexity
- McCabe
- Git
- C
- Assembly
- Halstead
- Radon
- The Zen Of Python
- Vocabulary Metric
- Java
- Anti Patterns
- God Object
- Pre-Commit
- Codeclimate
- Glom
- ASQ
- PyCharm
- PyDocStyle
- PyLint
- Black
- Sunburst Chart
- Visual Studio Code
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So say hi to our friends over at Linode. With 200-gigabit private networking, scalable shared block storage, node balancers, and a 40-gigabit public network, all controlled by a brand new API, you've got everything you need to scale up. Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, to get a $20 credit and launch a new server in under a minute. And don't forget to thank them for their continued support of this show. And if you're like me, then you need a simple and easy-to-use tool to keep track of all of your projects.
Some project management platforms are too flexible, leading to confusing workflows and days' worth of setup, and others are so minimal that they aren't worth the cost of admission. After using Clubhouse for a few days, I was impressed by the intuitive flow. Going from adding the various projects that I work on, to defining the high-level epics that I need to stay on top of, to creating the various tasks that need to happen, only took a few minutes. I was also pleased by the presence of subtasks, seamless navigation, and the ability to create issue and bug templates to ensure that you never miss capturing essential details. Listeners of this show will get a full two months for free on any plan when you sign up at pythonpodcast.com/clubhouse.
So help support the show and help yourself get organized today. And don't forget to visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And don't forget to keep the conversation going at pythonpodcast.com/chat. Your host as usual is Tobias Macey. And today, I'm happy to welcome back Anthony Shaw to talk about his project Wily, which is a command-line application for tracking and reporting on complexity of Python tests and applications. So, Anthony, could you introduce yourself for anyone who hasn't listened back to your previous episode?
[00:02:05] Unknown:
Yeah. Hi, Tobias. It's really great to be back on the show. It's been, what, almost two and a half years now, three years nearly. I'm sure a lot's changed since then. So it's great to be back on. So I'm Anthony Shaw. By day, I work for a company called Dimension Data, looking at technology skills for the organization. And by night, I dabble with Python, make pet projects, and also blog quite a lot on different forums. And I make Python courses on places like Pluralsight, and I also write for realpython.com
[00:02:38] Unknown:
as well. And do you remember how you first got introduced to Python?
[00:02:42] Unknown:
Yeah. It was quite a few years ago now. Basically, I was in a hotel in Seattle. I was traveling for work and got really, really ill, and I was supposed to fly to New York City the day after and basically had to stay in Seattle, which isn't the worst thing. It's a lovely place. And I had nothing to do over the weekend because I had to stay in bed and rest. So, basically, I just got out my laptop and watched a whole bunch of Pluralsight courses teaching me Python. So that's basically how I got into the language. I think it was about four years ago now, so really not very long ago.
Python's not the first language that I've learned. I can code in quite a number of other languages; C# is probably the one I have the most experience in.
[00:03:25] Unknown:
And so you mentioned that recently you've been spending some time working on different pet projects, because I know that since the last time you spoke, you've moved into more of a management position, helping to train the technical organization at Dimension Data. So can you discuss a bit about what the Wily project is and what your overall motivation was to start working on it? Yeah. So the Wily project is a tool that helps you understand
[00:03:53] Unknown:
the complexity of your Python code in an application. And it does that by looking at complexity from a number of different angles using different metrics, which we can have a chat about. And I think the most important thing is that there are a number of tools to calculate code complexity, but they only really make sense with context. You know, if a tool just spits out a number saying your cyclomatic complexity is 5, for example, it's just like, okay, what does that mean? Is that good? Is that bad? And when I looked around at the tools that were available, that's really how they all worked.
So basically what Wily does is it gives you a way of archiving that data. It integrates with Git, so it can go through your Git history and run all the complexity metrics against your code, and it will give you a breakdown per module, per function, or even per class to show how complexity has changed over time.
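As a rough sketch of the workflow described above, the commands below follow Wily's documented CLI (wily build to index the Git history, wily report to show the stored metrics for a file); exact flags and output vary between versions, so treat this as illustrative.

```bash
# Install Wily and index the repository's Git history,
# computing metrics for each revision.
pip install wily
wily build src/

# Report how the archived metrics for one module have
# changed across those revisions.
wily report src/mymodule.py
```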
[00:04:50] Unknown:
And before we get too much farther in the conversation, I want to dig a bit into what software complexity is, some of the different ways that it can be measured, and why developers should care about it at all. Mhmm. Yeah. Great. So I guess software complexity has been researched
[00:05:08] Unknown:
for a long time. Some of the metrics that we actually use in Wily, and some of the ones that are used in the broader community, were first written up in papers in the 1970s. So this is before Python, I guess around the advent of languages like C, but also when a lot of people were coding in assembler, for example. So basically, the research was done to figure out how much cost there is in maintaining code of a certain size or length or complexity. A whole bunch of computer scientists, such as McCabe and Halstead, for example, basically wrote a series of papers looking at complexity and trying to estimate how complexity could be calculated by looking at the code. So in terms of the metrics, I guess the crudest is lines of code. I don't know if you're familiar with that, Tobias.
[00:06:01] Unknown:
Yeah. Lines of code can definitely contribute to the overall maintenance burden of a piece of software, because of just trying to keep everything in your head, and that's usually where you would start to break something apart into modules so that you can compartmentalize some of the logic. But it's definitely not necessarily a determining factor as to how complex or how difficult to understand the code is in and of itself. Yeah. Definitely. So I guess that's the crudest one.
[00:06:29] Unknown:
Lines of code is actually used as one of the base metrics in a lot of the other calculations. But, obviously, trying to reduce lines of code doesn't always give you the right behaviors and encourage the right style in Python. For example, you're not really supposed to compound statements together in Python, even though you can. Spreading things out under, I guess, the recommended line length is really to do with making sure that the code is nicely spread out and readable. So lines of code is kind of useful just to say, okay, is this project 1,000 lines, or is it 10,000, or a million lines of code? At least that gives you some sort of vague idea about how big it is, but not necessarily how complicated.
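To make that concrete, here is a small illustration (not from the episode): both functions below do the same work and contain the same branches, but the compound-statement version scores half the line count, which says nothing about how hard it is to read.

```python
# Three "lines of code", using compound statements that PEP 8 discourages.
def clamp_short(value, low, high):
    if value < low: return low
    return high if value > high else value

# The same logic and the same branch count, spread out for readability.
def clamp_long(value, low, high):
    if value < low:
        return low
    if value > high:
        return high
    return value
```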
[00:07:04] Unknown:
And I know that one of the tools that is popular in the Python community is named after McCabe, and that's for measuring something called cyclomatic complexity. And in my experience with it, it's generally based on the level of nesting within the logical flow of the program. So if you have an if block and then a for loop and then a while loop and then another if block, that's going to be more complex than if you flatten that out and either break it into separate function calls or just figure out a way to make the code more procedural and reduce the amount of nested iteration. Yes.
[00:07:43] Unknown:
Like the McCabe metrics. So the McCabe tool is really useful as a library. There's another project called Radon, which basically provides a lot of these base metrics. It does lines of code, cyclomatic complexity, and another one called the maintainability index, which we can talk about in a second, which is a bit more complicated. But, yeah, cyclomatic complexity is really looking at the number of potential branches there are in a code base. So, for example, you start at 1. If you add an if block or an if statement, then that adds 1 to the number. If you add a while loop or a for loop, that adds 1. The for loop counts, for example, because in Python you can actually do a for and then an else.
So even if you don't have a separate branch, it does add 1 to the complexity. So the idea is that flat is better than nested, which hopefully some of the listeners have heard before. Encouraging flat code is, I guess, better for simplicity and maintainability,
[00:08:37] Unknown:
but it also makes it a lot easier to read. And so in Wiley, what are the main factors that you're using for being able to compute a complexity score? So
[00:08:49] Unknown:
in in Wiley, what you'll get is, you basically run the tools, the command line app. You give it the target folder where your source code is, and then you I guess, which metrics you want to collect. So by default, it will look at raw metrics, like lines of code, source lines of code, how many lines of comments you have, and how many modules you have, things like that. Just some basic metrics. And then it will also look at cyclomatic complexity and basically calculate the cyclomatic complexity of each module, sub module, function or class across across your code base. And and then another 1 is, the Halstead metrics. So the Halsted metrics basically build on some of the concepts that are in cyclomatic complexity, but they look at more to do with sort of the volume of the code. So lines of code, I guess, is is the most crude measure, but there are some measures that you have in house, Ted. For example, 1 of those is the number of, operators that you you've used in your code. Things like if, for example, or a less than sign, is an operator, and then also the, sorry, as an operand. And then it would look at the number of variables and other things you've used in the in the code. So you can calculate things like the what's called the vocabulary metric, which is really to do with the number of, names or variables you've used across the code. If that number is higher than it needs to be, it means you've used, you know, tons of temporary variables and that can, you know, obviously consume a lot more memory and there's a whole bunch of other side effects.
And another one is to do with the volume, which looks at the number of operators and operands in the code base and then uses that as a way to calculate the size of the code; if you look at it, it's almost like calculating the size of the AST.
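For anyone who wants to poke at these numbers directly, here is a minimal sketch using Radon's Python API, which Wily builds on; cc_visit, h_visit, and analyze are Radon's documented helpers, though the exact report fields can differ between Radon versions.

```python
from radon.complexity import cc_visit  # cyclomatic complexity
from radon.metrics import h_visit      # Halstead metrics
from radon.raw import analyze          # raw metrics (loc, sloc, comments)

code = '''
def categorize(items):
    results = []
    for item in items:      # loop: +1 to cyclomatic complexity
        if item < 0:        # branch: +1
            results.append("negative")
        elif item == 0:     # branch: +1
            results.append("zero")
        else:
            results.append("positive")
    return results
'''

print(analyze(code))                  # loc, sloc, comment and blank lines
for block in cc_visit(code):          # complexity per function/class block
    print(block.name, block.complexity)
print(h_visit(code))                  # operators, operands, vocabulary, volume
```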
[00:10:27] Unknown:
And with all of these tools in Python, it's definitely useful to be able to quickly get some feedback in your development process about how complex the code is. And because of the interactive nature of Python development, you can even get this information as you're writing the code, particularly if you have these tools integrated with whatever editing environment you're using. But I'm also curious what experience you have in terms of using these complexity measurements in other language environments, and some of the other tooling that's available, particularly for projects that might span multiple languages?
[00:11:05] Unknown:
Yeah. So my main experience has been in C# and .NET. So it's a very different language to Python, more similar to Java if anyone hasn't worked with it before. And cyclomatic complexity is really important in C#, as are some of the other metrics. And where I've used it before is working on really large code bases and basically trying to identify, I guess, some of the hot spots in the code base, like where are the really complicated bits sitting, and how is that trending over time. It becomes useful if you want to sit down and do a refactoring. An anti-pattern such as a god class, for example, you can see that through the complexity measures. So that's an indicator that you might want to go and refactor some of the code.
So, really, that's what those types of tools are used for. And as you mentioned, having the historical
[00:12:01] Unknown:
view of how the complexity of your application evolves over time is a valuable way to determine some of the potential risk factors within the code base. And I know that you mentioned that Wily has the capacity to traverse the Git history of the repository for being able to compute those historical metrics. I'm wondering if you can talk about some of the ways that that can signal useful areas of focus and identify potential
[00:12:31] Unknown:
problem areas within your code base. Yeah. So, basically, what happens when you run Wily for the first time is, first of all, it checks that you've committed all your files in Git; otherwise, you could lose them. So it basically runs off what I'm assuming is probably the head in your Git repository, or if you've checked out a particular branch, then it can iterate through the branch. And it will basically go and look at the history of commits, go and check out each one in sequence, working from the newest backwards, and then go and run all the metrics across the code base, and then it stores them in a small data folder. So you've basically got a cache locally that has all the metrics across the code, archived against the different commits. But alongside each commit it will also record when that commit was, who made it, and what the Git message was when they committed it. That's useful because Wily gives you a number of commands to go and graph any of those metrics. So you could pick a particular folder, for example, or a particular module and say, show me how lines of code compares against cyclomatic complexity for this particular module, and it will show you that as a graph, with a line representing those two metrics. Each data point will basically be a Git commit, and the bottom axis, the x-axis, will be the date that commit was made. So you can kind of look at it as a historical perspective, and you can hover over each data point, and it'll show you the Git message and the author. So you can see, if there was a sudden spike in complexity, why that was. So that's, I guess, the historical perspective.
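As a sketch of that graphing flow (syntax modeled on Wily's documented graph command; metric names and flags may differ by version):

```bash
# Plot two archived metrics for one module; each data point is a
# Git commit, with the commit date along the x-axis.
wily graph src/mymodule.py loc complexity
```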
The other way of using it is to install it as a pre-commit hook. So if anyone hasn't seen pre-commit hooks before, there's a Python project called pre-commit which you can install into a particular Git repository, and Wily can be used as a pre-commit hook. So if you make some changes to your code and you run git commit, it will actually run Wily before it completes the commit process. And on the screen, it will show you basically a diff in any of the metrics. So if anything has changed across the code base in terms of the size or the complexity, it will give you a breakdown. So, for example, if you made a change, added a new feature, and you did that in a particular class or function, then when you run git commit, it's going to pop up with a message saying, okay, the complexity of this module has changed by 5, the size of that module has reduced by this number, or this function has changed by
[00:15:00] Unknown:
this metric. So it'll basically give you a warning about how your changes are impacting the complexity and, I guess, the maintainability of the project. And is that a blocking pre-commit hook, or do you just use that as an informative measure so that you know, before you push it to the source repository, whether you want to take another pass at it? Yeah. It's not a blocking one.
[00:15:28] Unknown:
I have thought about introducing a configuration where you could have it block the commit if it increases the complexity by a certain amount. But I could just think of so many edge cases where that's just gonna annoy people. So I left that feature out.
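For anyone who wants to wire this up, the snippet below is one plausible .pre-commit-config.yaml using the local-hook pattern from the pre-commit project's docs; the wily diff entry mirrors what is described above, but check Wily's README for the exact hook it ships.

```yaml
# .pre-commit-config.yaml -- run Wily as a local hook on every commit.
repos:
  - repo: local
    hooks:
      - id: wily
        name: wily diff
        entry: wily diff
        language: python
        additional_dependencies: [wily]
        verbose: true
```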
[00:15:36] Unknown:
Yeah. It's definitely a very subjective measure, particularly if you're willingly making the trade-off of increasing the level of complexity in order to implement a feature, either because it is inherently complex or just because it's the easiest way to get it working right now, when you have the intent of going back afterwards to refactor it, maybe once other features are incorporated. And so you were saying that when you're traversing the history and generating this historical perspective of the complexity over time, it's creating these data files in a subfolder of the project.
And one thing that I was curious about there is how you implement any sort of collaborative view of that historical complexity. Is that something that you would commit as an artifact with the repository, or is there a way of integrating it into the overall continuous integration life cycle of the project? Just ways of sharing that view of the code and making sure that it's a team effort to maintain the complexity, keep it down, and maintain the overall health of the project. Yeah. So this is a bit of a tricky one. So
[00:16:51] Unknown:
at the moment, the way that it works is, by default, if you don't specify, it will put all the data into your home folder in a .wily directory. Or you can configure it to go in another directory. So, for example, you could put it in a shared folder, or you could put it somewhere else, such as a separate Git project. The challenge with having it in the same Git repository is that, basically, it's checking out each revision, and it's changing the folder at the same time. So Git is not gonna be happy, because you're gonna have unstaged changes within your working copy. So it won't let you actually go and check out different revisions without losing the files. So, basically, the wily cache has to be in either a separate Git project or a separate directory.
It can't be in the same one at the moment, unless you just ignore it completely. So if you add the folder to .gitignore, then it will work fine. But like I said, by default, it'll put it in your home directory.
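If you do keep the cache inside the repository, ignoring it is a one-line change (assuming the default .wily directory name mentioned above):

```
# .gitignore -- keep Wily's metric cache out of version control.
.wily/
```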
[00:17:47] Unknown:
And do you have any thoughts about maybe adding some pluggability to the data storage mechanism for those metrics, so that perhaps you could have it interact with a shared remote database backend, so that you can update the complexity metrics in a shared environment that everybody can view in a sort of dashboard?
[00:18:09] Unknown:
Yeah. I mean, there are services that do this already. Code Climate would be one. So I know Code Climate uses some similar metrics, and it has a GUI. I think it's free for open source projects. Really, where I wanted Wily to be useful was for projects which are not open source, or which for whatever reason are proprietary, because that does make up the bulk of Python projects; they're not all on GitHub. I mean, I know there are some big projects on there, but lots of people work for corporates, and they can't really share this information publicly, especially when it comes to sensitive stuff like bugs and complexity, which could potentially get into the hands of some of their competitors. So it was really designed as an offline tool originally.
But, yeah, I can definitely think about ways that it could be extended. Shared storage is one thing, but, really, another thing I'm interested in is how the complexity also impacts the performance. And when I've been looking at tools for profiling Python applications, there are a few really good command-line tools that can create the data. And there's a web service that the Python performance website uses, which is quite cool. You can give it a whole bunch of benchmarks to run, but really there's not a tool that can go and give you a good breakdown of, I guess, what is consuming all the CPU usage in your particular app, and do that historically. So something I'm thinking about next is how Wily could potentially do that. And another thing that I was just thinking about is that, because Wily is able to do static analysis of the code and doesn't actually require execution of it, it simplifies some of the potential issues that could arise from traversing the historical record of the repository
[00:19:58] Unknown:
in terms of needing to maintain the dependencies as they shift over time, whether by adding or removing them or changing the version numbers. Because you're just purely looking at the code as it sits on disk, and maybe parsing the abstract syntax tree, without actually having to pull in all of the dependencies, I imagine that simplifies the overall effort required. Yeah. Definitely. That would make it extremely complicated. So all the benchmarking tools
[00:20:26] Unknown:
today rely on having the profile hooks in the CPython runtime, or whichever runtime you're using. And they basically look at the time to execute certain statements and functions across the AST. So what I was thinking about is how you could do that statically. So without actually running the code, how could you come up with some sort of performance profile? Something I was thinking about is associating a cost with each node type in the AST. So an if statement would have one certain cost, and a list comprehension, for example, would have another type of cost. But within that, you'd then associate costs with comparing strings, for example, or looking at floating point numbers.
Calling functions has a cost, as we know, in Python. So it's basically giving you a metric that looks at some of the known costs, in terms of the CPU cycles potentially, for each of the operations, without actually running the code. Obviously, it's not gonna be hugely accurate, but it's better than nothing. And if you look at a project historically, going and running that project a hundred times is probably fine on a little project if it's just a CLI. But if it's a big project that needs databases and APIs and everything set up, I think it'd just be impossible to go and do that historically.
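As a rough sketch of the idea described here (this is not Wily code; the node types and weights are invented purely for illustration), you can walk the tree with Python's ast module and sum a per-node-type cost:

```python
import ast

# Hypothetical relative costs per AST node type; real numbers would
# have to come from benchmarking each operation, as discussed above.
NODE_COSTS = {
    ast.Call: 5,      # function calls are comparatively expensive
    ast.ListComp: 3,  # list comprehensions allocate and loop
    ast.For: 2,
    ast.While: 2,
    ast.If: 1,
    ast.Compare: 1,   # e.g. string or float comparisons
    ast.BinOp: 1,
}

def static_cost(source: str) -> int:
    """Estimate a relative CPU cost without executing the code."""
    tree = ast.parse(source)
    return sum(NODE_COSTS.get(type(node), 0) for node in ast.walk(tree))

print(static_cost("total = sum(x * x for x in range(10) if x > 2)"))
```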
[00:21:44] Unknown:
Yeah. Absolutely. And I think that what you were pointing out, of having these statically defined cost metrics that are in some ways abstract and arbitrary, but important relative to each other, is that they would still give you some useful signals as to areas that might be worth exploring, though definitely worth additional validation. It's just a way of being able to focus the effort, rather than having to treat everything as a potential cost-saving measure. Yeah. And other than the complexity and some of the potential benchmarking or profiling
[00:22:15] Unknown:
tools, what are some of the other useful metrics that you have found for being able to track and maintain the overall health of a software project? Yeah. I mean, if it is a collaborative project, then something else that's useful is looking at the number of contributors and the number of maintainers. Is there a single person who seems to have to solve all the hard problems? So, you know, is there basically not quite a single point of failure, but a lot of knowledge wrapped up in one individual person on your project? There's a whole bunch of tools that do that for open source projects, some online ones, which I'll share with you, which also look at the number of maintainers you've got and how many external people come in and contribute, which is really helpful. Another, I guess, is looking at the metrics in terms of the number of bugs reported against different components in the code base. So this is something that I know a lot of large projects do.
Internally, in terms of their bug reporting tools, they try and drill down and associate those with components or modules in the code. So you could basically look back and say, you know, here's a heat map of where we get bugs reported across our code base. Now, I'll always remember there's a graph that showed the number of bugs reported against the MySQL project. And if you looked at it historically, it basically had a direct correlation with the number of users. So those metrics you need to be really cautious of, because, basically, it's to do with footfall. If you've got a lot of people using the project, then it's a lot more likely that they're gonna find issues. So having a large number of users typically means they're gonna explore every possible scenario, every different state that could happen with your application. So having a low number of bugs doesn't always mean that your code is really stable.
Sometimes it just means that very few people are using it. And once you've identified
[00:24:03] Unknown:
some potential problem areas in your code base, whether from finding locations that have a high amount of complexity or issues with potential performance or I/O operations, what are some of the possible approaches for being able to improve the overall health of the code base, reduce the complexity, and improve performance, and some of the ways to ensure that you are doing it safely?
[00:24:29] Unknown:
Yeah. So it depends on the metric you're looking at. If something has a really high cyclomatic complexity, it typically means that there is a lot of nesting in that piece of code. So you would look at flattening the function, and there's a whole bunch of techniques you can use to do that. You can return early, for example, and you can use different styles in Python. Or if you're working with nested dictionaries, for example, that tends to introduce a lot of nesting in the code, because you're checking whether these attributes exist and you're looking at indexes and you're flipping through things. But there are actually a lot of other tools you can use to work with nested datasets.
So I've talked about a few of those. glom is one that's pretty cool. Another one is called ASQ. They're basically sort of like querying tools for dictionaries. So you can put everything in a single statement, and it'll give you back some of the results. So, yeah, those are other ways you can basically reduce nesting and complexity, by querying data a lot better in the code base. And then if you've got another metric which says that the volume of code is too high (you know, this particular module is just enormous, it's 7,000 lines of code, you've got 25 classes defined in it), then basically you've just lumped everything into this mega module in your project, which makes it really difficult for people to work on. That's, I guess, where the refactoring tools come in. You can't really just break up a module and shift stuff all over the place by copying and pasting. I think you're gonna introduce a lot of bugs, in terms of the way you can do imports in Python, and you don't tend to compile up front in Python either. So you're only gonna discover that there's some module expecting to import a class from something you've moved when it's too late.
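As a small illustration of the nested-data point (the data structure here is made up; glom's top-level glom function and its default keyword are part of its documented API):

```python
from glom import glom

order = {"customer": {"address": {"city": "Sydney"}}}

# Deeply nested access in a single statement, no ladder of if-blocks.
city = glom(order, "customer.address.city", default=None)

# The hand-rolled equivalent needs a guard at every level, which is
# exactly the nesting that drives cyclomatic complexity up.
address = order.get("customer", {}).get("address", {})
city_manual = address.get("city")

print(city, city_manual)
```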
[00:26:12] Unknown:
So that's where some of the refactoring tools come into play. And in terms of the refactoring tools that you have dealt with, which of them have you found to be most useful and most user-friendly?
[00:26:22] Unknown:
The most powerful one I've worked with is built into PyCharm. So when you load a software project into PyCharm, it basically understands everything within the project: what modules there are, the names, classes, functions. It's not just a dumb code editor where you're just changing the files on the file system. It basically almost interprets the entire project, and it also loads up the virtual environments for you and understands the libraries and the dependencies. And that's really important when you do refactoring, because if you do something as simple as renaming a function, then knowing everything that calls that function is really hard. You could just do a quick search in the project to look for that function name and see everywhere it's called. But, for example, people could do renames on imports, and there's a whole bunch of other ways people could call that function, as an object reference or as a wrapper, for example. So it's actually quite tricky to find references to things with just a dumb code editor. So PyCharm has some really advanced tools for refactoring that are great. If you wanna change signatures, move things around, rename stuff, it'll actually go and look through the whole project and figure out all of the changes that would need to be made, and it will stage those for you and show them, and then you can approve each one in sequence. And another thing to keep in mind when you're doing that too is if there are any external
[00:27:42] Unknown:
users of the code that you're modifying. So if you're working on a library project, or working on a service that exposes an API that other projects might be consuming, then you need to be cognizant of the fact that you might potentially be breaking the contract that they're expecting when they're interacting with your software. So make sure that you have a deprecation process or a means of advertising the fact that the API or the function name or the function signature is changing, so that anyone else can make the necessary modifications in their code as well. Yeah. Definitely. It's tricky to manage with Python because, you know, nothing is really private in Python either, so people could have used your code in all sorts of weird and wonderful ways. But I guess if you've got a documented
[00:28:26] Unknown:
API contract on the library, then it should be easier to change in a smooth way.
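One common way to advertise that kind of change in Python (a generic sketch, not from the episode; the function names are placeholders) is to keep the old name around for a release and emit a DeprecationWarning:

```python
import warnings

def compute_metrics(path):
    """New, preferred entry point."""
    ...

def calc_metrics(path):
    """Deprecated alias kept for backwards compatibility."""
    warnings.warn(
        "calc_metrics() is deprecated; use compute_metrics() instead",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller's code
    )
    return compute_metrics(path)
```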
[00:28:29] Unknown:
And what are some of the aspects of developer-oriented tooling that you have found to be most important as you're building various projects such as Wily, or any other tooling that you have had a hand in building and maintaining?
[00:28:46] Unknown:
Yeah. I guess some of the complexity tools are useful there. I also care a lot about the layout and the documentation. So pydocstyle is by the Python Code Quality Authority team; this is where other projects like Pylint live, for example. Both pydocstyle and Pylint I find really useful. Pylint I find useful because it will warn you about typical bad practices. For example, if you create a class with only one method other than the init method, then it probably should just be a function, and Pylint will go and warn you about those sorts of things. Or if you've got a function that has, like, 50 parameters, 50 arguments in it, then it will warn you about stuff like that as well. So I find Pylint really useful in giving you warnings, and pydocstyle really useful in terms of looking at the docstrings in your functions and methods, and making sure that they're laid out properly and follow a good style. And then on top of that, I've been using Black recently as a code formatter, which I've found really useful. And, again, I'm using it as a pre-commit hook. So I mentioned the pre-commit project earlier and using Wily as a pre-commit hook, but I'm basically using Black in the same way. So I don't even need to think about running it. It just does it when I do a commit, and then if there are any changes, it'll tell me that they've been updated, and then I'll run the commit again and they're all staged.
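To make those two warnings concrete, here is a toy module that trips both of the checks mentioned above (message names are from Pylint's documented checkers; the thresholds are configurable):

```python
# Trips too-few-public-methods (R0903): a class with one public method
# besides __init__ could usually just be a function.
class Greeter:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return f"Hello, {self.name}"

# Trips too-many-arguments (R0913): fires once a signature exceeds the
# configured limit (five arguments by default).
def connect(host, port, user, password, timeout, retries):
    return (host, port, user, password, timeout, retries)
```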
[00:30:08] Unknown:
And in terms of the way that you approach the design of your own developer tooling, what are some of the best practices that you have found in terms of making sure that it is easy to use and intuitive and makes sense for developers rather than increasing the burden of
[00:30:27] Unknown:
their work? Oh, yeah. Probably one of the biggest things is consistency in terms of how you name things, the names of the parameters, the sequences in which they go. So if you're building a command-line application, for example, making sure things are in the same order across different commands, and the output looks similar. I just see so many tools, even in other languages, where this is inconsistent. For example, I don't know if you've ever worked with PHP, and I think you have in the past, but whether it's needle then haystack or haystack then needle depends on the function when you're doing string searches.
Some functions have underscores, some of them don't. It can just get really confusing, and it's inconsistent. I know that's changed in newer versions, but I think consistency is probably the main thing when you're designing a developer-oriented tool, because people need to memorize these things. They can't look them up in the documentation every time. But, yeah, I can definitely agree with having consistency in terms of the overall
[00:31:22] Unknown:
implementation, and being consistent with the broader community standards and idioms, so that it's not introducing a lot of cognitive overhead. For instance, looking at Python functions where they're using camel case, so that you're confused as to whether you're in your JavaScript code or your Python code, or working in Java, which uses, you know, maybe Pascal casing. So maintaining community standards and expectations means that there is less effort necessary for people to get comfortable and familiar with your tools. Definitely. And so what do you have planned for the future of Wily, or any other tools that you might have in mind to help your team, and help yourself, in the projects that you're working on, for building and producing healthy and maintainable software?
[00:32:12] Unknown:
Yeah. So I think it really comes back to the performance point: whether Wily can give you some kind of CPU cost or operation cost of a project just by looking at the AST. So being able to do that statically is something that I'd be really, really interested in doing. And then the other thing is looking at the reporting functionality. So at the moment, it gives you, you know, a basic line chart, and you control the axes and things like that. But in future, I'd like the ability to have a more interactive style, where you can drill into particular modules, and to see it in something often called a sunburst chart, which is basically almost like a series of pie charts stacked on top of each other.
But it's a really good way of visualising a software project, because you see in the innermost ring, you know, the three or four modules you have, and then on the outer rings, as you go through the project, you can see all the different levels. So, yeah, I recommend looking at those. But that's really what I'd like to do: improve the reporting functionality and come up with some better metrics around performance.
[00:33:12] Unknown:
And are there any other aspects of your work on Wily or code complexity
[00:33:18] Unknown:
or overall software maintainability that you'd like to discuss further before we close out the show? Yeah. I guess the other thing I've been looking at is Visual Studio Code, which has got a couple of refactoring tools in it. I've just been discussing with the developers of that at the moment, potentially
[00:33:34] Unknown:
improving those, adding some more features and functions, which I think would be really cool. So that's the other thing I'm kind of looking at at the moment. And for anybody who wants to follow along with you and keep up to date with the work you're doing and get in touch, I'll have you add your preferred contact information to the show notes. And so with that, I'll move on to the picks. And your mention of the sunburst chart reminded me of a tool that I've used called Baobab, which is a Linux tool for scanning and finding the relative sizes of directories on disk, so you can figure out which folders are taking up the most space on your hard drive, for being able to clean things out and keep your disk space available. And then the other pick that I have is a TV show that I started watching recently with my wife called Impractical Jokers, which is about a group of friends who put each other up to various embarrassing or awkward situations, and it's a competition to see who can follow through on the various scenarios, and it's pretty funny.
It's painfully awkward at times, but it's been pretty entertaining, so it's worth a good laugh. And so with that, I'll pass it to you, Anthony. Do you have any picks this week? Yeah. I've got a couple.
[00:34:52] Unknown:
So on the TV side, I've been watching Line of Duty recently, which is a UK crime drama, but it looks at corruption in the police force. It's really gripping, thrilling viewing, so I'd definitely recommend that. And then the other one is a podcast. So if anyone out there has kids, particularly girls, then there's a podcast called Fierce Girls, and each one is a story about a famous woman. Actually, they're all Australian, but they're basically stories of our first female prime minister, one of the first female surfboarders, for example. So they're really good stories for the younger ones,
[00:35:31] Unknown:
and they talk about all these amazing women through history. Alright. Well, thank you very much for those picks and for taking the time today to discuss the work you've been doing on Wily and building developer tooling and maintaining healthy projects. I appreciate that, and I hope you enjoy the rest of your day. Thanks. Great to see you again.
Introduction to Anthony Shaw and His Work
Anthony's Journey into Python
Overview of the Wily Project
Understanding Software Complexity
Cyclomatic Complexity and Its Importance
Metrics Used in Wily
Complexity Measurements in Other Languages
Historical Complexity Metrics
Collaborative Views and CI Integration
Potential Extensions and Performance Profiling
Useful Metrics for Software Health
Improving Code Health and Reducing Complexity
Refactoring Tools and Techniques
Maintaining API Contracts
Developer Tooling Best Practices
Future Plans for Wily
Closing Remarks and Picks