Summary
Every piece of software that has been around long enough ends up with some part of it that needs to be redesigned and refactored. Often the code that needs to be updated lies on the critical path through the system, increasing the risks associated with any change. One way around this problem is to compare the results of the new code against the existing logic to ensure that you aren’t introducing regressions. This week Joe Alcorn shares his work on Laboratory, how the engineers at GitHub inspired him to create it as an analog to the Scientist gem, and how he is using it for his day job.
Preface
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- I would like to thank everyone who supports us on Patreon. Your contributions help to make the show sustainable.
- When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at podcastinit.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app. And now you can deliver your work to your users even faster with the newly upgraded 200 Gbit network in all of their datacenters.
- If you’re tired of cobbling together your deployment pipeline then it’s time to try out GoCD, the open source continuous delivery platform built by the people at ThoughtWorks who wrote the book about it. With GoCD you get complete visibility into the life-cycle of your software from one location. To download it now go to podcastinit.com/gocd. Professional support and enterprise plugins are available for added peace of mind.
- Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email hosts@podcastinit.com
- To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
- A brief announcement before we start the show:
- If you work with data or want to learn more about how the projects you have heard about on the show get used in the real world then join me at the Open Data Science Conference in Boston from May 1st through the 4th. It has become one of the largest events for data scientists, data engineers, and data driven businesses to get together and learn how to be more effective. To save 60% off your tickets go to podcastinit.com/odsc-east-2018 and register.
- Your host as usual is Tobias Macey and today I’m interviewing Joe Alcorn about using Laboratory as a safety net for your refactoring.
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by explaining what Laboratory is and what motivated you to start the project?
- How much of the design and implementation were directly inspired by the Scientist project from GitHub and how much of it did you have to figure out from scratch due to differences in the target languages?
- What have been some of the most challenging aspects of building and maintaining Laboratory, and have you had any opportunities to use it on itself?
- For someone who would like to use Laboratory in their project, what does the workflow look like and what potential pitfalls should they watch out for?
- In the documentation you mention that portions of code that perform I/O and create side effects should be avoided. Have you found any strategies to allow for stubbing out the external interactions while still executing the rest of the logic?
- How do you keep track of the results for active experiments and what sort of reporting is available?
- What are some examples of the types of routines that would be good candidates for conducting an experiment?
- What are some of the most complicated or difficult pieces of code that you have refactored with the help of Laboratory?
- Given the fact that Laboratory is intended to be run in production and provide a certain measure of safety, what methods do you use to ensure that users of the library will not suffer from a drastic increase in overhead or unintended aberrations in the functionality of their software?
- Are there any new features or improvements that you have planned for future releases of Laboratory?
Keep In Touch
Picks
- Tobias
  - The Chronicles of Narnia by C.S. Lewis
- Joe
  - Why We Sleep: Unlocking The Power of Sleep and Dreams by Matthew Walker, PhD
Links
- Marvel App
- GitHub: Move Fast and Fix Things
- GitHub Scientist: Measure Twice, Cut Over Once
- Scientist
- Laboratory
- Sure Footed Refactoring
- Graphite
- StatsD
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. I would like to thank everyone who supports the show on Patreon. Your contributions help to make the show sustainable. When you're ready to launch your next project, you'll need somewhere to deploy it, so you should check out Linode at podcastinit.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your app. And now you can deliver your work to your users even faster with the newly upgraded 200 gigabit network in all of their data centers. If you're tired of cobbling together your deployment pipeline, then it's time to try out GoCD, the open source continuous delivery platform built by the people at ThoughtWorks who wrote the book about it. With GoCD, you get complete visibility into the life cycle of your software from one location.
To download it now, go to podcastinit.com/gocd. Professional support and enterprise plug-ins are available for added peace of mind. You can visit the site at podcastinit.com to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions, I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email me at hosts@podcastinit.com. To help other people find the show, please leave a review on iTunes or Google Play Music, tell your friends and coworkers, and share it on social media. If you work with data or want to learn more about how the projects you heard about on the show get used in the real world, then join me at the Open Data Science Conference happening in Boston from May 1st through the 4th. It has become one of the largest events for data scientists, data engineers, and data-driven businesses to get together and learn how to be more effective.
To save 60% off your tickets, go to podcastinit.com/odsc-east-2018 and register. Your host as usual is Tobias Macey, and today I'm interviewing Joe Alcorn about using Laboratory as a safety net for your refactoring. So Joe, could you start by introducing yourself?
[00:01:56] Unknown:
Yeah. Hi. I'm Joe, and I work as a software engineer at a company called Marvel. We're based in London. And I've been writing Python in some form for about 5 or 6 years.
[00:02:07] Unknown:
And do you remember how you first got introduced to Python?
[00:02:13] Unknown:
Yeah. So about 5 or 6 years ago, I got involved with a bunch of people who were running a Minecraft server, basically. They would do lots of development here and there for the game or for the server, and a lot of that was based in Java, but they also did a lot of Python as well. So from hanging around with them, I heard about Python and got involved in a little bit of Python, and a little bit of programming in general, actually. That was how I first got into programming, and Python was my first language. So that was all down to that Minecraft server.
[00:02:56] Unknown:
And as I mentioned in the introduction, one of the projects that you've been working on is something called Laboratory. So I'm wondering if you can just describe what the project is and what it does, and what motivated you to start it in the first place.
[00:03:10] Unknown:
So, essentially, Laboratory is a library that tries to help you gain confidence when you're refactoring. The way it does this is basically by taking the old code. So you've got some existing code, right? Let's call this your control block. You want to refactor it in some way, but you want to make sure it works exactly as it did before: it returns the same thing, it doesn't run too slowly, and it doesn't break in subtle ways. So you take that block and then you write some new code. And you really want to make sure that everything works in production, where, you know, you've had bugs, and those bugs have left the data in your database a bit broken. You don't know whether the new code is really going to account for all those corner cases, all those edge cases, and the subtle little bugs that can pop up. So what Laboratory does is essentially run the old code, run the new code, compare them, and make sure the new code can't blow up the main execution. You know, you don't want to let any exceptions slip out that way.
And then you can basically be certain that your new code works as it should. What motivated me to start the project was that I read a blog post about GitHub and how they used this technique to refactor some of their complex permissions code. At the time, we were doing something similar at work, and this was something that could obviously be very useful. The only problem was that the library they built was in Ruby, and we use Python. So that was the motivation.
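To make that concrete, here is a minimal sketch of the kind of experiment Joe is describing, written in the style of Laboratory's documented API; the slugify functions are invented for illustration, and the exact signatures may vary between releases of the library.

```python
import laboratory  # pip install laboratory

def slugify_old(title):
    # the existing, trusted implementation: the control
    return title.strip().lower().replace(" ", "-")

def slugify_new(title):
    # the refactored implementation under test: the candidate
    return "-".join(title.lower().split())

experiment = laboratory.Experiment()
experiment.control(slugify_old, args=["Hello World"])
experiment.candidate(slugify_new, args=["Hello World"])

# conduct() runs both blocks, records whether their return values
# match, and always returns the control's value. An exception raised
# by the candidate is caught and recorded rather than propagated, so
# the experiment cannot break the real code path.
slug = experiment.conduct()
```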
[00:05:02] Unknown:
And given that you had this existing project to use as inspiration, I'm wondering how much of it you looked at to inspire the implementation of Laboratory, and how much of it you had to figure out yourself due to the differences in the target languages and their idiomatic paradigms?
[00:05:23] Unknown:
So, yeah, originally I tried to keep the implementation quite close, and that was mostly because I wanted to get up and running quickly. But it led to an API that was not super nice or super idiomatic, I think. So around the start of this year, I decided to rework it a little bit. It's just less clever now, which is normally a good thing for APIs.
[00:05:49] Unknown:
And when you were doing that refactoring of the Laboratory code, did you happen to use Laboratory itself to ensure that the operation was staying consistent?
[00:06:00] Unknown:
No, sadly not, because I was making breaking changes anyway, right? Actually, now that I think about it, using Laboratory on a library would probably not work super well. But if you're, say, running a service, where you have control over the environment and you can gather all the results and whatnot, that's when it'd really be handy.
[00:06:21] Unknown:
And what have been some of the most challenging aspects of building and maintaining the project?
[00:06:29] Unknown:
So one thing that I realized recently is that the docs just weren't very good. There were some undocumented features, there was no proper quick start guide, and there was no easy way to get up and running super quickly. I think that's always an issue, because a lot of the time I just want to build something and throw it out there. But if you actually want people to use it, or if you want to be able to use it yourself in 6 months when you've forgotten how to use it, you need documentation. So that's been one of the things. In terms of maintenance and building, though, I wouldn't say it's been too much trouble, because I originally pushed it out over a weekend with the original API that I was talking about, and maintenance has not been very much since then, because it hasn't changed a lot until this more recent API change.
[00:07:21] Unknown:
Yeah. It definitely seems like the kind of project that has the potential for being essentially done after a certain point, because it has a fairly limited scope of operation. Once you get it to be usable and consistent, you don't necessarily need to keep iterating on it and adding features, because that might add to the overhead of the runtime and create adverse impacts on the production system that you're using it for. You really want to make sure that it's just doing one thing and doing it well, rather than adding too many bells and whistles.
[00:07:58] Unknown:
Yeah, precisely. I think that's definitely a good philosophy to follow.
[00:08:03] Unknown:
And for somebody who's interested in using Laboratory in one of their existing projects to do a substantial refactor, I'm wondering what the workflow looks like and what potential pitfalls users should watch out for while they're doing that implementation?
[00:08:25] Unknown:
So, essentially, to get up and running, you would obviously need to install the library, which you can do from pip. Then all you need to do is make an instance of a class, and you tell it: this is my control block, the good code that I know works, and this is my new code, which might work, but we'll see. Then you can call a method to start the experiment, and that's really the simplest way. The hard part is obviously getting your new code to work right. I mean, Laboratory really tries to stay out of the way, and like you said, it's very limited in scope. There aren't so many bells and whistles. But there are a few pitfalls or caveats.
Because you're going to do more work, running both the old code and the new code, it's obviously going to take longer. If that really matters to your application, then you should find another way, or you should only run your experiment 10% of the time, for example, or 1% of the time, whichever makes sense. So there's that. But also, in a web application, for example, if you're running an experiment and your control block is writing data, obviously your candidate is going to have to do that too. That means there are going to be multiple writes, or changes in the cache, or something like that. If your candidate affects any state like that or has any side effects, then that could impact your control block, which means your real users are seeing real failures. So you do have to be careful. You have to be mindful when you're designing an experiment. Generally, if you're just fetching stuff from a database, for example, or just reading state, it'll be fine, if you can take the performance hit, of course.
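Sampling like that doesn't need any special support from the library; one hypothetical way to do it is to decide up front whether a given call conducts the experiment or just runs the trusted code. The fetch_report functions here are stand-ins.

```python
import random

import laboratory

def fetch_report_old(report_id):
    return {"id": report_id, "rows": []}  # stand-in for the trusted path

def fetch_report_new(report_id):
    return {"id": report_id, "rows": []}  # stand-in for the refactor

SAMPLE_RATE = 0.01  # conduct the experiment on roughly 1% of calls

def fetch_report(report_id):
    if random.random() > SAMPLE_RATE:
        # most calls skip the experiment entirely, so there is no
        # double execution and no extra latency for them
        return fetch_report_old(report_id)
    experiment = laboratory.Experiment()
    experiment.control(fetch_report_old, args=[report_id])
    experiment.candidate(fetch_report_new, args=[report_id])
    return experiment.conduct()
```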
[00:10:20] Unknown:
And so a couple of questions come from there. The first is about the overhead you mentioned of running both code blocks at the same time. I'm wondering if you have looked at doing anything to parallelize the different experiments, so that you can potentially spread them out across multiple threads or multiple cores to reduce the impact of that runtime overhead?
[00:10:49] Unknown:
Yeah. So it's not something I've actually looked into, but part of the reasoning behind changing the API at the start of the year was so that this would be possible. Essentially, in the old API, you would have to use the context manager and then call your function yourself. But in the new one, Laboratory actually does that for you: you tell it your function, you tell it your arguments, and then Laboratory takes care of running them. That means we should be able to tell the experiment to run them in a multi-threaded way, or a multi-process way, or however you would like to do it. I was thinking of looking into it and adding a contrib module or something to use asyncio on Python 3, but it's not something I've actually looked into much yet. And I've not done much parallel Python programming before, either.
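Laboratory runs the two blocks sequentially today, but since the reworked API owns the callables, a parallel run could look something like the standard-library sketch below. This is an illustration of the idea, not part of the released API, and for CPU-bound blocks the GIL would limit the benefit of threads, which is part of why a process pool or asyncio variant might be worth exploring instead.

```python
from concurrent.futures import ThreadPoolExecutor

def conduct_in_parallel(control, candidate, *args, **kwargs):
    """Run the control and candidate concurrently in two threads,
    returning the control's result and swallowing candidate failures."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        control_future = pool.submit(control, *args, **kwargs)
        candidate_future = pool.submit(candidate, *args, **kwargs)
        try:
            candidate_future.result()  # observed for comparison only
        except Exception:
            pass  # the candidate must never break the main path
        # exceptions from the control, by contrast, propagate as usual
        return control_future.result()
```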
[00:11:46] Unknown:
And the other thing you mentioned is that performing I/O operations or creating side effects in the code block that you're trying to refactor can complicate Laboratory's ability to run these experiments, because the different pieces of code might stomp on each other and dirty the overall state of the system. So I'm wondering if you have found any useful strategies, whether stubbing out the external interactions while still executing the rest of the logic flow, so that the experiment can run through most of the process and rely on the existing data, or having multiple output systems, such as running two databases in parallel with different configuration options for the experiments, or anything like that?
[00:12:34] Unknown:
Yeah, it's not something I've put much thought into, because I've just tried to avoid it, really. I think it would depend a lot on the specific application, so it's hard to give a good answer in general.
[00:12:50] Unknown:
Something like that is definitely very difficult to speak to in any sort of broad sense, because everybody's system is very context dependent, and trying to provide a universal approach is ultimately going to fall flat for somebody. So it's best to leave that up to each person's own implementation in their own project. And one of the necessary bits for ensuring that you're getting value out of Laboratory is being able to retrieve the results of the different experiments so that you can compare the control against the candidates. So I'm wondering if you can talk a bit about the way Laboratory allows for retrieving that information, and some workflows that you have used for performing those comparisons to ensure that the candidate code is doing what you want it to?
[00:13:43] Unknown:
Yeah. So, essentially, when you've run both of your bits of code and you've seen whether they match or not, you can publish this data. By publishing it, you can try to narrow in on what's causing the mismatches. Laboratory doesn't do this for you, because, again, this is one of those things that is going to be so different on each individual system or in each deployment. But what I normally do is send a count of how many things have matched and how many have not matched to Graphite, and then make some graphs out of that. Then you can see exactly how many times it's going wrong and how many times it's working. You can also take the actual return values of the functions you defined, your control and your candidate, and maybe pickle them, put them into Redis, or write them to disk, wherever you like. Then you can come in later, inspect them, and see why or how they don't match, along with any context you've set around them. That way you can use the data you've got to go and make changes to your code, and when you push those changes out and deploy them, you'll hopefully see the effect in your graphs.
Then it's just a case of iterating, because what you've basically done is establish a feedback loop, and that will ideally help you figure out how to completely fix it. Or maybe you'll figure out that there are too many edge cases and you need to rethink the approach entirely, or something like that.
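In Laboratory the reporting hook is a publish method that you override on an Experiment subclass. Below is a rough sketch of the Graphite-via-StatsD pattern Joe describes; the metric names and the store_mismatch sink are invented, and the result attributes follow the library's documentation as I understand it, so verify them against the version you install.

```python
import pickle

import laboratory
import statsd  # a StatsD client; its counters can be graphed in Graphite

stats = statsd.StatsClient("localhost", 8125)

def store_mismatch(blob):
    # placeholder sink; in practice this might be an LPUSH into Redis
    with open("mismatches.log", "ab") as f:
        f.write(blob + b"\n")

class PermissionsExperiment(laboratory.Experiment):
    def publish(self, result):
        if result.match:
            stats.incr("experiment.permissions.match")
            return
        stats.incr("experiment.permissions.mismatch")
        # pickle the raw observations so the difference can be
        # inspected and diagnosed later
        for observation in result.candidates:
            store_mismatch(pickle.dumps((result.control.value, observation.value)))
```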
[00:15:28] Unknown:
And you mentioned at the beginning that library code might not necessarily be a good candidate for the type of project that you would use Laboratory on. So I'm wondering if you can provide some examples of the types of code that you have used Laboratory on for doing these complicated refactorings, and whether there have been any cases where you wanted to use Laboratory but, because of the structure of the code or the nature of the system, it wasn't really a good fit.
[00:16:04] Unknown:
Yeah. So one example of a time I've used it: we were upgrading, or updating rather, how team members authenticate against some projects that belonged to their team. We had the old method, which we obviously trusted to work, but there was quite a lot of logic, there were lots of different branches, and it was kind of hard to follow. So it was hard to be sure that our new code actually matched up. That's kind of a perfect fit, because you don't have much confidence in your new code, because you can't. And it's also not something that you want to get wrong, because if you do, other people might be able to access data they shouldn't. So that was really a perfect fit for the library.
[00:16:46] Unknown:
And given the fact that Laboratory is intended to be run in production and provide a certain measure of safety for conducting these experiments, I'm wondering if there are any particular methods that you've used to ensure that users of the library aren't going to suffer from a drastic increase in operational overhead, unintended aberrations in the functionality of their software, or unexpected exceptions that might cause the termination of an existing routine?
[00:17:18] Unknown:
Yeah. So the main way I've tried to do that is by keeping it as small as possible. It doesn't do a whole lot, but it is quite powerful in what it does. Simply not doing much means there's less surface area for bugs. And then there are the development processes: I've used lots of tests and things like that. But in terms of the actual code at runtime, it really just catches any exceptions that your candidate blocks raise, so that they can't interrupt your good code. It doesn't stop the control block from raising exceptions, because that is the expected behavior, or rather the correct behavior, whether it's actually a bug or not.
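That asymmetry, with candidate exceptions swallowed but control exceptions re-raised, boils down to something like the following simplified sketch; this is a paraphrase of the behavior described here, not Laboratory's actual source.

```python
def run_experiment(control, candidate):
    # the candidate runs inside a blanket except: its failure is
    # recorded as an observation, never raised to the caller
    try:
        candidate_value, candidate_error = candidate(), None
    except Exception as exc:
        candidate_value, candidate_error = None, exc

    # the control is deliberately not wrapped: if the trusted code
    # raises, that is the application's existing behavior, correct
    # or not, and it must propagate unchanged
    control_value = control()
    return control_value, (candidate_value, candidate_error)
```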
[00:18:12] Unknown:
And are there any new features or improvements that you have planned for future releases of Laboratory?
[00:18:17] Unknown:
The main thing would be to fix up the documentation a little bit, because at the minute it's kind of split between the readme and Read the Docs. I would like to move that over and make sure everything is documented. Other than that, I would like to look into what we were talking about earlier, running the old and the new code in parallel. I think that would be fun to look into. But beyond that, I really think it's best to keep it simple. Oh, actually, at one point I was working on a little helper library to help you store the results in Redis or send them to your stats or metrics collection, but I kind of paused on that.
But I think it'd be interesting to pick that up again if there's some free time.
[00:19:13] Unknown:
Yeah. I think having some sort of plug-in interface, like you mentioned earlier, for creating modules that send to different metrics backends or log out to different systems, might be a useful addition to the Laboratory project. You wouldn't necessarily need to ship it in the core runtime that you currently have, but it would still provide a broader set of functionality that people might be able to leverage or contribute to.
[00:19:42] Unknown:
Yeah. Because right now, if you don't already have somewhere that you can send these metrics, then it's a lot of work to get started. But I was thinking of something simple: maybe something that would collect them for you in a very basic way, as well as ones that would send them out to your established metrics collection service, just to make it really easy to get up and running.
[00:20:10] Unknown:
For anybody who wants to follow the work that you're up to or get in touch, I'll have you add your preferred contact information to the show notes. And with that, I'll move us into the picks. For this week, I'm going to choose The Chronicles of Narnia, the book series I've just started reading with my kids, and it's a lot of fun to revisit. We also watched the movies that were made a little while ago of three of those books. For anybody who enjoys a short, interesting fantasy read, it's definitely a lot of fun. So with that, I'll pass it to you, Joe. Do you have any picks this week?
[00:20:46] Unknown:
Yeah. Actually, I'm reading a really interesting book at the minute called Why We Sleep, and it's all about, well, it's pretty terrifying if you don't sleep enough. It's all about how important sleep is and the effect that sleep deprivation can have on your mind and your body. It's quite interesting.
[00:21:08] Unknown:
Alright. Well, I really appreciate you taking the time to join me today. Laboratory is definitely an interesting project, one that I'm sure I'll be able to take advantage of in the future, and one that I occasionally wish I had known about in the past. So thank you again, and I hope you enjoy the rest of your day. Yeah, you too. Thank you. Thanks for having me.
Introduction and Announcements
Interview with Joe Alcorn
Introduction to Laboratory Project
Implementation and Challenges of Laboratory
Using Laboratory in Real-World Projects
Ensuring Safety and Performance
Future Improvements and Features
Contact Information and Picks