Summary
Continuous integration systems are important for ensuring that you don’t release broken software. Some projects can benefit from simple, standardized platforms, but as you grow or factor in additional projects, the complexity of checking your deployments grows. Zuul is a deployment automation and gating system that was built to power the complexities of OpenStack, so it will grow and scale with you. In this episode Monty Taylor explains how he helped start Zuul, how it is designed for scale, and how you can start using it for your continuous delivery systems. He also discusses how Zuul has evolved and the directions it will take in the future.
Preface
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 200Gbit network, all controlled by a brand new API you’ve got everything you need to scale up. Go to podcastinit.com/linode to get a $20 credit and launch a new server in under a minute.
- Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email hosts@podcastinit.com
- To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
- Join the community in the new Zulip chat workspace at podcastinit.com/chat
- Your host as usual is Tobias Macey and today I’m interviewing Monty Taylor about Zuul, a platform that drives continuous integration, delivery, and deployment systems with a focus on project gating and interrelated projects.
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by explaining what Zuul is and how the project got started?
- How do you view Zuul in the broader landscape of CI/CD systems (e.g. GoCD, Jenkins, Travis, etc.)?
- What is the workflow for someone who is defining a pipeline in Zuul?
- How are the pipelines tested and promoted?
- One of the problems that are often encountered in CI/CD systems is the difficulty of testing changes locally. What kind of support is available in Zuul for that?
- Can you describe the project architecture?
- What aspects of the architecture enable it to scale to large projects and teams?
- How difficult would it be to swap the Ansible integration for another orchestration tool?
- What would be involved in adding support for additional version control systems?
- What are your plans for the future of the project?
Keep In Touch
Picks
- Tobias
- Monty
Links
- Red Hat
- Zuul
- OpenStack
- Jim Blair
- Perl
- SNPP
- Rackspace
- NASA
- Drizzle
- Sun Microsystems
- MySQL
- Continuous Integration
- Continuous Delivery
- Launchpad
- Bzr
- Jenkins
- Jess Frazelle
- Graphite
- StatsD
- graphite.openstack.org
- grafana.openstack.org
- subunit
- Ansible
- Helm
- Software Factory
- Gerrit
- Git
- Perforce
- Subversion
- Zookeeper
- Gearman
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app, you'll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 200 gigabit network, all controlled by a brand new API, you've got everything you need to scale. Go to podcastinit.com/linode to get a $20 credit and launch a new server in under a minute. Go to podcastinit.com/gocd to learn more about their professional support services and enterprise add-ons, and visit the site at podcastinit.com to subscribe to the show, sign up for the newsletter, and read the show notes.
Your host as usual is Tobias Macey. And today, I'm interviewing Monty Taylor about Zuul, a platform that drives continuous integration, delivery, and deployment systems with a focus on project gating and interrelated projects. So, Monty, could you start by introducing yourself? Yeah. Hi. My name is Monty Taylor. I've been doing,
[00:01:06] Unknown:
actually, in thinking about this beforehand, I did the quick mental math, and I've been developing in Python for 18 years now, which makes me seem really, really old and crotchety, which comes out at times, which is kinda crazy. Yeah. I work at Red Hat, in the office of the CTO at the moment, which has been a really great thing. And in the general Python world, I work on this project that you may or may not have heard of called OpenStack. We do something about cloud software; "something something cloud" is basically how my wife describes what I work on. And I've been doing that for 8 years now, which is a terrifyingly long amount of time to work on a single effort. I can't think of anything else I've done that consistently for that long, in any other context. So I'm probably useless for the rest of the world. And the Zuul project we're gonna talk about today grew out of that work. I've been working on the CI and developer infrastructure systems for OpenStack since we put the project together. So if you have the time, I can always tell you old grandpa stories about back in the day, what we did, why we did it, and bore you senseless. It's all sorts of fun. And do you remember how you first got introduced to Python?
I do, actually. And it's strangely topically relevant to Zuul as well. Back in 1999, and it's always great to start any story with "back in 1999," I was in Raleigh, North Carolina. I grew up in the Raleigh-Durham area of North Carolina, had moved away, and was back in town. I have a background in theater, lighting and sound production, and so I was going with a friend of mine from there to a party of other theater people. And he said, hey, there's this guy that's gonna be at the party, it's actually at his apartment, named Jim Blair, and he also does computer stuff. That is, of course, as we all know, terrifying when you're in a non-tech, non-work environment and a friend from another context is like, hey, I've got this friend who does computers, you should talk to them. Because it never works out. It's always, like, you know, your mom's friend's son who does something terrible or whatever. And so I was like, great, I'm so excited to talk to your computer friend, theater friend. And I got there, and it turned out that he was actually exceptionally smart and really good at computer things. And we hit it off immediately. I was a Perl programmer at the time, and he mocked me for that life choice.
And he told me that I should really look into Python as a development platform, as he found it better. And I was like, great, I will go look at that. And I immediately started shifting things that I was doing into working in Python. The first thing I did for real was a contract gig to write a Python library for the SNPP protocol, which is the Simple Network Paging Protocol, that somebody was gonna use for a thing. Anyway, it's relevant to Zuul because it turns out Jim and I still work together, and he's basically the founder of Zuul as it is. So not only did that party introduce me to Python, it also introduced me to Jim, and we've had a long-running working relationship since then. So that's the one time when the "oh my gosh, you've gotta meet this computer friend of mine" thing has actually worked out and been anything other than awful and scary. Also, that was a long time ago.
The Python thing that he showed me, he's like, hey, you know, I wrote this thing in Python. And again, I have to preface this with 1999: he had written a system for listening to MP3s in his car, which wasn't really a thing you had then. And it was exceptionally strange because it synced with the database of music in his house over this new networking protocol called 802.11b; he had gotten his hands on some wireless router. So he'd drive the car up to the spot in the parking lot that was close enough to the apartment, and it would, you know, register, find the server, and sync music down. And that was all written in Python. Right?
There was, like, just this giant computer in the trunk of the car, you know, with some relays connecting it to the battery. But that was all totally done in Python. I was like, great. Wow. That's cool.
[00:06:02] Unknown:
The days when you have too much free time on your hands.
[00:06:06] Unknown:
I know. Right? And then, several years later, you just go to Best Buy and, like, nobody even buys MP3 players anymore. Those are antiquated.
[00:06:18] Unknown:
And, so you've spent a fair amount of time working on Zuul, and you said that it got started fairly early on in your involvement with OpenStack. So can you discuss a bit about what the project is and how it got started? Yeah. Absolutely.
[00:06:33] Unknown:
So I actually get to really put on my grandpa hat here. If you haven't heard of OpenStack itself, it's a Python-based cloud operating system that was originally started as a joint venture between Rackspace and NASA, which is the closest to space travel I'm probably ever going to get. The NASA folks were always like, well, we've got these satellites going to moons of Jupiter, and in 10 years they're gonna spit back a bunch of data. We need to be able to collect all of that data, but we don't wanna just have servers sitting there running the whole time. So they had a really great use case for elastic cloud resources. Anyway, that's a sort of rambling tangent. But I was at Rackspace at the time because I had been working on a project called Drizzle, which was a fork of MySQL that we started when we were working at Sun Microsystems after they bought us. I was at MySQL, then Sun bought us. And Brian Aker decided to make a fork of MySQL that ripped out a bunch of the crazy features that had gone in over the years and got back to basics. And then Sun, well, there's a reason that Sun got bought by Oracle a little bit after that, because Sun's response was, yeah, we totally wanna fund that. That sounds like a great idea. Fork the thing we just bought for a billion dollars. So, yep, great life and business decisions. Anyway, it worked out well for me. I'm mentioning all that because in Drizzle we had decided, as a sort of development approach, to have CI jobs for things. This was, what, 2008, I guess, when it started.
We had CI jobs, and then we had a policy that your submission, your merge request, because we were using Bzr and Launchpad at the time, had to pass the tests, had to pass the CI tests, before we landed it. And a human on the team, we would take turns being the merge captain, and basically the job was: take the merge request, submit its URL to the Jenkins job that would kick off things based on it, and it would test them. And if they came back green, we would click the merge button. And that worked really well. We really liked it as a system because it allowed the jobs to always be green and made sure that the code always worked. So if you came and grabbed the source code straight out of our trunk branch and ran make, it should always work for you. Right? So you could always start development from the place of a working code base. The downside is that it was really boring work to be the merge captain. There's no way you can do it better; you're basically just clicking a button. And so when Oracle bought Sun, the Drizzle team didn't really feel like working at Oracle, and Rackspace offered us a spot, which is why we were at Rackspace when the OpenStack project got started. And so the Rackspace folks that were getting it kicked off pulled me in to sort of help make sure that the open source side of it was done well and to get the project infrastructure set up. And so we decided to duplicate the process from the Drizzle side of things, but automate it more, so that we could automate the merge captain side of it away and just have all of those pieces be completely automatic. So that if somebody did a review, they would approve it, it would go through testing, and if it passed the tests, it would merge. And that's how we started off. From day one, the very first commit into any of the repos once they became OpenStack went through this; it's a process we refer to as gating, and it went through the gating system.
The first version of this was very cobbled together. It was not an attractive system; I would not recommend the architecture that we used then to anybody. But there was a tool called Tarmac that the Canonical folks had written for Launchpad for a similar idea, and it would go and grab approved merge requests from the queue on Launchpad, pull them down, run tests on them that you defined, and then either merge the merge request or reject it. And we had a Jenkins job that ran, just a timed job that ran over and over again, so basically in a loop. It was a Jenkins job not because Jenkins was doing anything, but because we needed some web dashboard for people to be able to look at their build logs.
And it basically was just a job that ran the tarmac command to grab whatever was out there. That worked for the first year of OpenStack, though. It was actually great. It didn't really break very often. It was a pretty good system, but then OpenStack kept growing. It did not stay Rackspace and NASA for more than about 5 seconds, and we kinda had some meteoric growth, which, as the person in charge of the project infrastructure, is terrifying. It's the type of success that everybody wants except for the infrastructure team. You're like, yay. We're being successful.
We have harder problems. So we wound up hiring Jim Blair, who I mentioned got me into Python in the first place, and we started actually doing some real engineering on the tooling at hand. And about a year later, I got a phone call from Jim, and he was like, hey, I've got this crazy idea.
We've got these existing tools and we're using them and they're great, but they don't really understand the thing we're trying to do, which is that we wanna make sure we test all the patches correctly in between approval and merging them. Right? This sort of gating thing wasn't really what the other tools were designed to do. So, what if we wrote a tool that wasn't just trying to configure something else designed with a different purpose in mind, but actually understood that this was our goal? And more to the point, what if we apply the idea of pipelining and optimistic branch prediction from microprocessor design to the topic? So that we can create a virtual queue of changes that have been approved and want to land, and we can test them in parallel to handle the sort of explosive growth of OpenStack. We don't have to keep waiting, and we don't have to skip tests or do anything crazy like that. Keep running all of the testing every time, but run it in parallel, and run it in parallel correctly on what the state of the repository will be. Right? So we're not gonna test your patch; we're gonna test what the repo would be if we merged your patch into the repo, which is where making a sort of virtualized serial queue of changes comes in. Because if you have, say, 20 changes approved in an hour, or in the chunk of time that it takes you to run one set of tests, then you have a real, inherent race condition problem: what you're testing, if you don't do that, isn't the resulting state of the world.
And so I was like, wow, that sounds like a great idea, Jim. We should definitely do that. Or, more to the point, you should definitely do that. Please go write that; that sounds like it'll be a great thing. And so that's basically the genesis of where this all came from: a sort of initial use case of, we want to, as correctly as possible, test the future states of these repositories, but not fall into the pathological case, which of course is that it goes down to serial: you can only land one patch at a time, and you can't start testing the next patch until the first one has landed.
To make things more complicated, because that's not complicated enough, OpenStack from day one started off as two repositories. By the time we were working on Zuul, I think it was probably at, like, 10, or something of that nature. But we'd had this idea, and, you know, the microservices world has made this pattern much more prevalent; at the time, we thought it was just our craziness. But the idea is having the functionality split across multiple repositories that are each implementing a specific service of the overall product. The thing we ultimately want to test is the overall product. Right? If Nova works, but it doesn't work together with Swift, then that isn't as useful. So just testing Nova or just testing Swift, just testing any of the individual components in isolation, isn't as great of a final outcome as testing the overall, integrated whole. And I get into discussions with people from time to time, like, well, you know, if you have proper API contracts and you test the API contract boundary, it should all work and you should be able to do it all in isolation. I'm like, well, yeah, you should be able to do that. But the world is a dirty place. You know? That's a great computer science theory.
And in practice, stuff gets weird. And also, when you're talking about an IaaS platform for cloud, it's not just about the APIs. There are interactions, there are behaviors with the underlying infrastructure that should be in the API contract, but, like, how do you define them? It gets weird. And so rather than stand on ceremony and say we're gonna spend 5 years hyper-defining API contract boundaries in a specification no one's ever gonna actually fully read, we apply this. So that gives us, in a very long-winded and sort of rambling way, the basis, the fundamentals, of what Zuul is. It's a gating system. At its core, it's there to allow you to validate proposed changes to a system.
And to do that across multiple repositories, with the idea that you might need to test these repositories together. And that virtual serial queue I was talking about, where we're gonna do that and then do it in parallel, can span projects, because the interrelationship of those projects is actually the tricky part. So we wanna make sure, as we're testing the state of the system if the patch were to land, that it's the state of the integrated system, not just the state of an individual repository. So that multi-repository gating is the bread and butter of Zuul. It's the thing that it does. It does lots of things, but that's its fundamental identity. That's what it is and why it's the gatekeeper.
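To make the virtual serial queue idea concrete, here is a toy sketch in Python. It is not Zuul's actual implementation, just an illustration of the behavior described above: every approved change is tested against the state the repository would be in if everything ahead of it in the queue had already merged, and a failure ejects that change and invalidates the speculative states behind it.

```python
from typing import Callable, List


def gate(queue: List[str], passes: Callable[[str, List[str]], bool]) -> List[str]:
    """Simulate a gate pipeline over a queue of approved changes.

    `passes(change, assumed_merged)` stands in for running the real test
    jobs against the speculatively merged state. Returns the changes that
    ultimately merged, in order.
    """
    merged: List[str] = []
    while queue:
        # In the real system these runs happen in parallel; each change is
        # tested as if everything ahead of it in the queue had merged.
        results = {c: passes(c, merged + queue[:i]) for i, c in enumerate(queue)}
        head = queue.pop(0)
        if results[head]:
            # The head's assumed state was the real state, so it merges.
            merged.append(head)
        # If the head failed, the changes behind it were tested against a
        # future that will never exist, so the next loop iteration retests
        # them against the new, shorter speculative queue. (A real system
        # would keep still-valid results instead of recomputing everything.)
    return merged


if __name__ == "__main__":
    # Change "B" passes on its own but breaks once "A" is assumed merged,
    # which is exactly the kind of conflict the gate catches.
    def passes(change: str, assumed: List[str]) -> bool:
        return not (change == "B" and "A" in assumed)

    print(gate(["A", "B", "C"], passes))  # prints ['A', 'C']
```

In the example, change B passes on its own but fails once A is assumed to have merged, which is the kind of conflict a plain per-branch CI run would miss and the gate catches.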
Of course, we're not the only people to have had the thought of Zuul being the gatekeeper. The fine folks at Netflix have an edge gateway routing infrastructure called Zuul, and there's also a JavaScript testing framework called Zuul, for unit tests and functional tests and things of that nature. So there are a few different people who've had the great idea to use the name Zuul for something in relationship to the word gatekeeper. But I can say categorically that we were the first of the three, and, I don't wanna point any fingers, but Jim and I did actually give a talk on OpenStack's Zuul at Netflix several months before they open sourced their Zuul. So, you know, I claim dibs.
[00:18:22] Unknown:
And one of the things that I really liked as I was looking through the documentation is the feature you were discussing as far as having the optimistic merges and then testing the resulting state. Because fundamentally, that's really the core and one of the most useful principles of continuous integration: that you're actually testing the integrated state of the system. Whereas a lot of the other services and platforms that purport to do CI will execute the tests against the state of a given branch, not necessarily what it looks like after everything's been merged. So you can get, you know, a full clean set of tests, and then you go to merge, and then all of a sudden everything's broken because you didn't actually test what the final state will be. Particularly given what you were saying about having multiple people working on parallel branches and then deciding to merge them back together at roughly the same time. And then, you know, okay, what's the order of resolution for these? So having that built into the fundamental
[00:19:26] Unknown:
mechanics of the system makes it a lot more robust. Exactly. You're exactly right about that, and that's the thing. And we found it to be exceptionally powerful. So in OpenStack, we're very fortunate: we have a lot of cloud resources at our disposal. It may have something to do with the fact that we're building a cloud operating system. So we have more available testing bandwidth than a lot of people do. Although I would say that the amount we have is actually the right amount, and I wish that more companies and projects would fund their testing infrastructure. They would realize the power of funding that testing infrastructure well, rather than thinking of it just as a cost. It really can be a powerful enabler.
But because we have that, we actually test all of the patches that are proposed to OpenStack twice. This is a thing you can do with Zuul; it's not required. Zuul is very configurable, so you don't have to do any of the things I'm talking about. But we test your patch when you submit it, when you upload it into the review system, and we test it largely as it is. And that's mostly as a way to help the other code reviewers. For the people reviewing your code, it's kind of a waste of their time to review your code if it doesn't pass tests in the first place, and human time is valuable. We don't wanna waste reviewer time on reviewing fundamentally broken patches. But then, once the appropriate number of reviewers have reviewed it and approved the patch for merging, we do the gating and we run it again. And because we're kind of running these tests twice, in theory you should expect that all of the tests on the gate side would work. Right? We've already tested the patch at our check layer. And that is actually not true. The amount of breakage that is caught at that second level is significant.
And part of that is because we're a high-volume project, but we've been trying to think about how to quantify this so we can write, like, a more academic paper on the topic. So we might have to go do a bunch of data mining at some point. But we have at least anecdotally experienced that the system is providing a high amount of value and allows people to go fast. Actually, several weeks ago, Jess Frazelle, who works on Kubernetes things as well as lots of other container things, tweeted out something like: what I really want is just a thing where I can say, yes, merge this, and then have the bots merge it when it's ready to merge. Right? And I'm like, well, yes, that's exactly what we've done. It's that desire. You wanna just do review, click the button, and not really have to worry about it at that point. Have the computers do the right thing for you, even if it means retesting that patch 7 times through automation because something ahead of it failed, so you kick it out of the queue and retry things. That's really boring for a human to do correctly, and a computer doesn't get bored, so it's fine doing all that as correctly as you can. So, yeah, you're exactly right about all that.
[00:22:48] Unknown:
And your point about data mining, I'm curious about what sorts of information and metadata are collected automatically from Zuul, and the types of information that you can append and create in the process of running the various tests, so that you can do some interesting data mining and collect some sort of development statistics to feed back into your overall cycles?
[00:23:13] Unknown:
Yeah. So that's a great question. We actually collect a bunch of stuff, and I'm gonna give a quick shout-out to the OpenStack developer infrastructure; I'm gonna mention a couple of URLs here in a second. We run all of the developer infrastructure for OpenStack also as open source projects and as sort of open ops services. So all of our operations is open and public. We don't have private secret ops things other than a file of passwords that gets fed into Ansible and Puppet. But all of our Puppet and Ansible is open, as are our monitoring systems and data collection systems. So it's all 100% open.
All of the data that we collect is available for anybody on the Internet who wants to go mine it and look at it, because we're an open source project. It's all open. Anyway, amongst the things we do, we collect a bunch of stats-type metrics and stick them into Graphite. We have a Graphite system, so we use StatsD and Graphite to collect things about our node pooling system: what are the build nodes, how are they doing, how many of them are we using, things about success and failure rates. Our systems are instrumented to stick stuff into graphite.openstack.org, which is an open Graphite; anybody can go dig into that if they want to. We have a set of dashboards for that, using Grafana.
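As a rough illustration of the kind of instrumentation being described, here is a minimal sketch of emitting job metrics to a StatsD/Graphite pipeline from Python, using the third-party statsd package. The host, prefix, and metric names are invented for the example; they are not OpenStack's actual instrumentation.

```python
import time

import statsd

# Assumed client pointed at a StatsD daemon that feeds Graphite;
# the host and prefix here are placeholders.
stats = statsd.StatsClient("graphite.example.org", 8125, prefix="ci")


def report_build(job_name: str, node_label: str, run_job) -> None:
    """Run a job callable and report its outcome, duration, and node usage."""
    stats.gauge(f"nodepool.{node_label}.in_use", 1, delta=True)  # node picked up
    start = time.monotonic()
    try:
        run_job()
        stats.incr(f"jobs.{job_name}.SUCCESS")
    except Exception:
        stats.incr(f"jobs.{job_name}.FAILURE")
        raise
    finally:
        stats.timing(f"jobs.{job_name}.runtime", (time.monotonic() - start) * 1000)
        stats.gauge(f"nodepool.{node_label}.in_use", -1, delta=True)  # node released
```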
You can go to grafana.openstack.org and see the set of dashboards that we have put together with that data. But that's just stats and metrics. There are a couple of other chunks that we collect. The biggest one is logging. So in a sort of traditional CI system, clearly you have the build log. Right? Here's the log of the script that you just ran, and here's your test output: these tests passed, these tests failed, etcetera. So we obviously collect that. But what we're testing is a cloud. So integration testing for us means actually installing an entire OpenStack installation and then running tests against that running cloud. And so we actually also collect all of the operational logs for all of the services.
So in any given job, we can collect as many logs as were produced by the running of that job. Because if you wanna debug why your patch didn't work, the relevant information may not be in the test case itself; it might just be in an operational service log of libvirt. Right? And so as a developer, you need to be able to do that, which is the same thing you're gonna have to do as an operator of that service. So we have a Logstash and Elasticsearch cluster, as well as the base log storage itself. So if you go to logstash.openstack.org, you can actually search through all of the stored logging of all of the clouds that were involved in all of the test runs that we did inside of the system. And we have some other systems as well, because as much as gating is awesome, in a system like OpenStack there are a bunch of asynchronous processes, and race condition bugs can sneak in even with all of this perfect gating infrastructure.
And so sometimes people's patches will still fail, not because of their patch, but because of a bug that only shows up one in every 1,000 runs. Right? And so we actually have some data mining systems that run after a patch fails in the gate that will say, hey, we've run some Logstash queries, and we've noticed that you've hit this known race condition bug, or we think you've hit it based on analyzing the logs. We've got some people right now who are starting to look at applying some AI and machine learning to those datasets, to see if we can start doing some more advanced error detection and analysis and things of that nature. And that's been really nice, that our dataset is all public, because when people have come up and said, hey, I'm working on machine learning and I'd like to talk about how it can interface with CI, I'm like, we have 12, 14 terabytes of log data available at any given point in time that you can run your stuff on without needing to ask us for permission to do so.
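Here is a toy version of that post-failure classification idea. The real system runs queries against the Elasticsearch cluster; this sketch just scans a downloaded job log for the fingerprints of known intermittent bugs, and the bug IDs and patterns are invented placeholders.

```python
import re

# Invented placeholder signatures; real fingerprints would be maintained
# alongside the bug tracker entries they correspond to.
KNOWN_BUGS = {
    "bug/1000001": re.compile(r"Timed out waiting for a reply to message ID"),
    "bug/1000002": re.compile(r"Connection to libvirt lost"),
}


def classify_failure(log_path: str) -> list:
    """Return the known intermittent bugs whose signature appears in the log."""
    hits = set()
    with open(log_path, errors="replace") as fh:
        for line in fh:
            for bug, pattern in KNOWN_BUGS.items():
                if pattern.search(line):
                    hits.add(bug)
    return sorted(hits)

# If this returns anything, the system can leave a comment on the change
# ("you appear to have hit known bug X") instead of making the developer
# dig through the logs by hand.
```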
So people can experiment with that data on their own, and if we come up with anything, we can incorporate it into the base systems as best we can. We also have a thing that was written that's also a general tool. We use a test runner called testrepository, which was written by the folks at Canonical back in the day, that uses a streaming test output format called subunit, which is written to allow for streaming test results that are interleaved. And we had one of the people on the QA team write a tool called subunit2sql that takes all of those test results after a run and shoves them into a giant SQL database, so that people can start doing trend graphing on the results and see what's going on at the test case level.
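To show why pushing per-test results into SQL is useful, here is a simplified sketch using SQLite. This is not the subunit2sql schema, just an illustration of the kind of per-test trend query that becomes possible once every test execution is a row in a database.

```python
import sqlite3

db = sqlite3.connect("results.db")
db.execute("""
    CREATE TABLE IF NOT EXISTS test_runs (
        run_id    TEXT,   -- which CI run this result came from
        test_id   TEXT,   -- individual test case name
        status    TEXT,   -- 'success' or 'fail'
        runtime_s REAL
    )
""")


def flakiest_tests(limit=10):
    """Tests ordered by failure rate: candidates for race-condition bugs."""
    return db.execute("""
        SELECT test_id,
               AVG(status = 'fail') AS failure_rate,
               COUNT(*)             AS runs
        FROM test_runs
        GROUP BY test_id
        HAVING COUNT(*) > 20
        ORDER BY failure_rate DESC
        LIMIT ?
    """, (limit,)).fetchall()
```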
So there are a bunch of tools. You'll potentially notice that none of those tools are really specific to Zuul; it's not Zuul that's doing those things. And that's actually a really important design choice that we've made: we don't want to add graphing to Zuul. Right? That doesn't make any sense to us, because there are already things like Graphite and Kibana and Grafana that do that job better. So we want to be a facilitator to use those actual tools that are out there, rather than grow a whole bunch of bolt-ons onto Zuul itself. Oh, and we do also collect build history in Zuul, and we have some sort of real-time status dashboard. So there is a Zuul dashboard that will allow you to go look at, you know, historical job performance and things of that nature.
But, and this is the thing that sort of trips people up from time to time when they're coming from, like, a Jenkins background, with the sort of "just run tests on a given repo or a given branch" mindset, the concept of green or red: is the build passing, what's your job failure percentage over time, how is it doing? Those are sort of meaningless concepts for us, because the entire idea is that it's part of the normal developer process for tests to fail, but everything passes once it goes through the gate. So a given branch is always green. Like, there is no trend graph.
But you might wanna find things out over time, like, is there a test that's causing people to fail in the gate more frequently, or things like that; the data mining is sort of on a different axis. We try and collect as much of that data as we can and let people,
[00:30:43] Unknown:
sort of analyze it and figure out what's going on. And one of the things that happened recently is that Zuul got spun out of the broader OpenStack umbrella to be its own project. And that sort of brings it more into the view of people who are looking to set up their own CI/CD systems. And that raises the question of how Zuul fits into that overall ecosystem, because there are so many other options available, such as GoCD, Jenkins that we've mentioned, Travis, CircleCI, any of the numerous hosted services or open source projects or commercial projects. And so I'm curious, for somebody who is in that market, what is it about Zuul that stands out? How does it compare in terms of the use cases that it enables?
And lastly, I guess, it obviously scales to very large systems, but what are the limitations for somebody who's considering a very small workflow, in terms of team size, number of projects, or project size?
[00:31:53] Unknown:
So first of all, I just wanna say, yeah, we're really super excited about making Zuul a top-level project. Just in the last year, we released version 3 of Zuul, and one of the most important parts of that is that we took a bunch of feedback we'd gotten from people. Zuul originally had basically just been written as a tool for the OpenStack project, and we didn't take on other people's use cases as primary use cases. It was there to serve the OpenStack project; if it could serve somebody else at the same time, cool, but that wasn't what our driving focus was. Our driving focus was: make sure OpenStack can deliver its stuff. But we got enough interest from people, enough people trying to use it for their own use cases and having trouble because we hadn't really designed it to be reused in that way, that we went and made it more general. We changed some of the design facets, but we also generalized it. So this is now a project that is not just focused on the OpenStack project; that just happens to be our largest user, but it's just one of the users, and we wanna be able to elevate those other users to the same level of importance to the development team. So that's been really exciting for us. We have our own website, zuul-ci.org, and we've got a bunch of things moving to sort of help solidify that story and to tell the story. There are also people who, you know, assumed that Zuul would only let them use the workflow that the OpenStack project had decided on, or that you had to have an OpenStack cloud, or things of that nature. And we wanna make sure it's clear to people that, although we think using it with an OpenStack cloud is the best choice because we like open source infrastructure, that's not a fundamental requirement for using Zuul. And so we wanna tell that story to people. There is a great ecosystem out there of CI/CD systems, and I think another important thing for us, or at least for me, I don't wanna speak for other people: the industry has sort of taken on the proprietary software mindset of everything being a competition, and I prefer vibrant ecosystems myself. I think it's great that Travis is there, that Jenkins is there, GoCD, and all of the other choices. I think it's good for everybody that there are these choices and that there are different systems that might meet people's needs in different ways.
And so that's wonderful. And I want to, as best as possible, be complementary to systems where it makes sense. You know, the thing that we covered earlier, the sort of multi-repository gating, is a feature that we have that just is not available in other systems. And I can say that without picking on anybody, because the other systems don't have it because that's not the thing they're trying to give you. Nobody's failing at offering that; it's just not an offering. If that's the thing you want or are interested in, we're the thing you should be looking at. That's our primary purpose in life. There are other use cases, other ways to think about CI, that we're not gonna be as good at.
We're not super well suited to just being a thing that runs jobs. You can run periodic jobs on a branch, or you can just run post-commit jobs; all that works. But if that's your primary CI approach, we're probably not as pleasant a system. We sort of wanna nudge you towards that gating model as much as possible. There are some other things about us I haven't even mentioned yet that fit into why you'd think about us as opposed to one of the other ones. Zuul v2 actually was essentially an external scheduler that triggered jobs on a fleet of back end Jenkins servers, so all of our jobs were defined as Jenkins jobs. Before Zuul v3, we actually ran one of the world's largest Jenkins infrastructures.
So we actually know a lot about Jenkins. We wrote Jenkins plugins; I still have core committer status in Jenkins core. And it's a great project, and I really appreciate those folks and everything they've done for the world. But it wound up being that, once we had Zuul doing its thing, we were essentially using Jenkins as a remote code execution engine, just to run remote shell scripts, and it's a little bit heavyweight for that. We weren't really using its plugins; we were minimizing the Jenkins plugins we were using, for various reasons.
And so in writing Zuul v3, we replaced Jenkins in the overall architecture with Ansible. We had written a tool called Jenkins Job Builder, which allowed you to write Jenkins jobs in YAML, expand them with some templating, and have them exploded out into the actual Jenkins XML that would then get pushed into a Jenkins by the Jenkins API. And that Jenkins Job Builder YAML looks remarkably like Ansible in the first place. We didn't really wanna write another remote code execution system, so we just adopted Ansible as the language for writing the tasks.
Another one of the main features we added in v3 was sort of native multi-node jobs, and this is the other place where Ansible came in. Again, our main first user is OpenStack. Testing that you can install a cloud on a single machine is a great way to do some functional testing, but most people aren't deploying an entire cloud operating system on a single computer. And so people really wanna do testing jobs that span multiple nodes, and that's always tricky in a model where you have a build node attached to a sort of central server and you run code on it. So choosing Ansible allowed us to still have a simple way for people to express shell snippets they wanted to run on machines, but also for them to be able to associate those with more than one machine. So run this command on machine one, run this other command on machine two. And then, as part of the job definition in Zuul, you basically say: for this job, when a patch is uploaded, I want you to run this job on this project, and this job wants four machines that are gonna be named these things, and it wants to run these Ansible playbooks against those. That's basically what Zuul jobs are now for users.
And so that sort of brings us to another point. Depending on who you are and how you're thinking about things, I've been saying the phrase "test it like you deploy it." We're trying to avoid having a lot of test infrastructure or explicit CI automation, and rather provide a facility to use your actual production automation in the testing context. Right? Because if I'm having to write complicated logic to install or deploy something at the testing layer, or to describe what that workflow is, then again, I may not actually be testing what production is gonna do. What I ultimately want Zuul to be doing is testing proposed new states of production.
So if you do all that right at the gating level, and you set that up and you're testing using your actual production automation, you should be able to arrange things so that you can do continuous deployment of your code without worrying about it. In fact, we run a lot of our systems that way. Well, not most; it turns out some systems aren't written to handle a continuous delivery, continuous deployment model, and we run other people's code, and sometimes our own, which just hasn't been architected to roll out every second. But we run as much of our code for the OpenStack infrastructure systems as we can continuously deployed. So, like, we land a patch to one of our repos and it is getting rolled out the instant it lands.
But that's because we're gating it. It's because we're doing that validation ahead of time, and we wanna be able to expand the validity of that by letting you use your actual production automation as part of the testing workflow. So I think that would be one of the other focus points: we don't want domain-specific automation for testing. We instead want you to write your actual production stuff and then use it. You know, there are ways and there are times in which that doesn't make sense. You probably don't have production automation to run unit tests; it's probably not a thing you have. So there's some amount of test-specific logic that's always gonna be inevitable. But also, by using an existing system such as Ansible to describe that, we're hopefully increasing the shared language between, you know, developers, testers, and operators. If they're all using the same tools and language, then it's easier for them to collaborate and to figure out what's going on.
I'm not really a big fan of the whole "well, here's the automation we used in testing, but they're using this other thing in production, and everybody hates each other." So we wanna try and facilitate that as much as possible. I think finally, and I told you I was gonna try and do this in less than 10 hours and I was gonna fail: we are both designed to run as a public service on the Internet, so, like, the OpenStack Zuul runs completely on the open Internet and has been designed for that use case from day one. Right? It's not a back office system; it's an Internet service. And it's also designed to be deployed locally. And it's fully open source. A lot of the systems out there have one of those features.
Or two of them, or many of them, but most of them don't have all of them. Either they're designed for full, public, at-scale Internet size and you can deploy them internally, but they may not be fully open source; or they may just be designed for internal use and they're really not appropriate for public use at scale. Like, the scaling idea may be that you install five of them. The scaling story for Zuul is that Zuul is scalable, and we have yet to find the size of project that it can't handle. If you can push Zuul past its scaling limits, we'd love to talk to you.
But we haven't found them yet. We have a really large first customer, so we're pretty confident in being able to say that it can handle the scale. We were fortunate enough to have to deal with the scale problem ourselves, so we wanna be able to give that to you. But you also mentioned a question which is very important to that. Scaling up, we absolutely can handle. We're a multi-tenant system, so with a single Zuul installation you could run it at your company and have every department have their own tenant, rather than having each of the departments have to have their own Zuul.
So we've designed it to be able to share resources for efficiency's sake. Scaling down is a really important one, and we've been working on that a lot. We have some quick start guides. You can install Zuul all on one machine, but it is designed for scale, so it's not quite as simple to get started. That is a primary focus for us; we want to continue to make that easier. I'm actually working on building and publishing container images, like, literally right this week. So we wanna be able to hand people things like Helm charts, because it's a multi-service system.
I don't know if there's an official definition of microservices, and I'm not really crazy about the redefinition of distributed architecture as microservices. So it's a distributed network application, which means there are multiple processes that talk to each other over the network. If it makes people feel better to call that microservices, cool. But it is that, and so it's not gonna be just "apt-get install this one piece of software and start it" and it's all running. So we're working on making that scale-down story as best we can.
We've added a bunch of sort of pluggable drivers in the v3 rework. So we now have support for predefined static nodes, which we did not use to have; everything used to come from dynamic cloud resources. We've got support for that now. So we're trying to get that getting-started, get-yourself-going experience down. I personally have a goal that I would like to be able to sensibly run Zuul on my laptop for testing my personal projects, and have that not be crazy. Right now, that's a little bit much. But I want that goal to be easily met within the next year or so. You could do it; it'd just be more work. Not from a resource perspective; there's just a little bit more to manage than I think is where we want to be ultimately.
But we definitely have some of our non-OpenStack users out there running it in some pretty small environments. So not all of our users are sort of mega scale at this point; we do have some people with some smaller bits. And there's also a team at Red Hat that has a thing called Software Factory, which is not just a distribution of Zuul, but it includes Zuul in it, and also includes Gerrit and a bunch of other things. But they've got a nice installer. So if you just wanna get going, you can grab Software Factory and install it, and it's got that sort of easy-to-get-going story down.
And that's wonderful, and I'm thrilled that that team has made that stuff. We would like for it to be just as easy straight from the upstream; we don't necessarily want you to have to rely on a vendor to get an easy version. We do want people to be able to provide it both as a service and as a sort of vendor software product, because we think there is a value-add story there, both from a support and a services perspective. But we don't want your reason to go to a vendor giving you a Zuul to be that it's too hard to install. That seems really lame to us. So we want to fix that over time and make that story better. And so in the
[00:47:15] Unknown:
architecture for the project, you've made some design choices that you mentioned, including using Ansible for the remote execution engine, and it looks like it currently only supports Git as the source code system for being able to pull down the patch sets. And I know that you use ZooKeeper as the discovery and coordination mechanism for the node pool. And some of those are systems that people might want to swap out in their own instances. And so I'm curious how difficult that would be and what would be involved in replacing some of those different components for being able to customize an installation or a distribution of Zuul? Yeah. That's a great question.
[00:48:00] Unknown:
So I'll start with Git, because it's the easiest answer. It would be exceptionally difficult to replace Git with something else. When we first migrated to Git for OpenStack 7 years ago, we were pretty excited, because everybody was like, hey, nobody knows Bzr, everybody's using Git, move to Git because it's where the industry is. And we're like, great, this will be fun, we'll get to learn everything there is to know about Git from all of the developers who know how it works already. And then we discovered that, in fact, nobody knows anything about Git, and so we had to go learn a whole bunch. Zuul is very deeply embedded with Git in the way that it does things. Now, that said, we had a lovely chat with some folks about a year ago, and I think it's probably about time to catch up with them, who were interested in Zuul but were a Perforce shop.
And the thing that we were going to investigate with them is whether the Perforce Git gateway integration would be enough to let Zuul do what it needed to do, without needing to learn as much about Perforce as it knows about Git, and go that way. So that was gonna be our first approach. And so I imagine for most of the other distributed version control systems, we would probably first try and see if a translation layer would be sufficient. At the end of the day, it is all code, so there's certainly nothing to say that it couldn't be done, but it would be a giant rethinking of some things. So for now, I would say that if you are on something that isn't Git, we're probably a stretch for you. If you're on a DVCS that can translate back and forth to Git, so that Zuul could clone you via a translation layer and do all of its merging that way and have those results be okay.
That's actually probably not too unreasonable to do. But if it's something like Subversion, no chance. I don't think it would be worth the effort on either side of the coin to try and get it into a system like that. The other ones: there's Ansible, and then we've got ZooKeeper. There's also some Gearman in there for job scheduling, although we've got a plan to make the Gearman go away long term. With Ansible, architecturally, and we've talked about this a couple of times, there's nothing stopping us from replacing the Ansible execution layer with something else.
On the other hand, one of the things to note: all of our bulk config management for the OpenStack infrastructure systems is currently in Puppet, and we've got some changes coming for that. So for our own Zuul jobs, the Ansible playbook involved in running our tests is really simple; it's a thing that says "run Puppet." So we try to approach it in such a way that if you're not really into doing deep Ansible stuff, that's okay. That's one of the things I like about Ansible: you can just write shell scripts, you can do whatever you want to, and just see it as a very simple YAML way to describe running these tasks on these remote machines. And those tasks can be run Puppet, or run Chef, or run kubectl, or whatever it is that your thing wants to do. I think there are some ideas people have had about ways in which making that pluggable might be interesting.
But in general, and you'll find this refrain from us: Zuul has a complex job at its core, so we'd like to keep elements of it as simple and straightforward as we can until we can't. So we'll try, sort of like with the Git thing. The first stab would be: so you're on Mercurial; let's see if we can do Mercurial-to-Git translation at the CI layer and how acceptable that would be. So we'd probably wanna work with somebody who wanted to use Zuul with something that isn't Ansible, to see if Zuul's use of Ansible is hidden enough that it doesn't really matter. And if not, then it's just code, and we could have that conversation.
And I think there are some ways in which that could be done that wouldn't be completely insane, but there are also some Ansible-isms; there are some pieces, so it would have to be thought about, for sure. The ZooKeeper one is an interesting one, because especially these days, with the popularity of Kubernetes, etcd comes up in questions from people a decent amount. And I have to tell a grandpa story here for a brief second. When we were looking at the options for what we're using ZooKeeper for, part of that involves distributed lock management, so that we can make sure that we have only one executor handling a task at a time, etcetera.
At that point in time, etcd 3 hadn't been released, and the word from the etcd folks was that using etcd 2 in the locking way we wanted was inappropriate; one should not use etcd for those tasks. So we didn't, and we went and did ZooKeeper, which is okay. Since then, etcd has actually grown that functionality, and it would be completely technically feasible to use etcd instead of ZooKeeper for the ways in which we're using it. But we've so far made the choice that we don't wanna make that back end system pluggable, because we're a bit worried about getting the least common denominator between the two systems and having a system that doesn't work well on either ZooKeeper or etcd, because sometimes we wanna be able to make use of those systems' specific features. But it comes up from time to time, for sure, and I'm sure it's not gonna stop coming up. So that's one that's definitely in conversations, but so far it hasn't been enough of a problem or an issue or a blocker for people. There was actually somebody who was investigating, there's a ZooKeeper proxy gateway for etcd that allows you to talk the ZooKeeper protocol to etcd, and I think somebody was exploring whether or not they could use that from within Zuul. So that one's probably a bit harder. That said, there is inside of the code base an abstraction.
It's not an abstraction layer in the sense of abstracting the differences between MySQL and Oracle, but there is a class layer inside the code, so there is a place where one could obviously do that work. The thing we're not sure about, especially when it comes to multi-node locking, is that it's one of those areas where everything works right until it doesn't, and the devil is in the failure cases. We're pretty confident we've gotten those worked out appropriately with ZooKeeper, and I would be reluctant to have somebody just go off and replace it with etcd without the benefit of having it run through a really high-load environment like ours, which is kind of hard to simulate elsewhere.
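To picture the distributed-lock pattern being described: a minimal sketch using the kazoo ZooKeeper client for Python might look like the following. The connection string, lock path, holder identifier, and work function are illustrative placeholders, not Zuul's actual internals.

```python
import time
from kazoo.client import KazooClient

def do_the_task():
    # Placeholder for whatever work must not run concurrently.
    time.sleep(1)

# Connect to a ZooKeeper ensemble (placeholder address).
zk = KazooClient(hosts="zk.example.com:2181")
zk.start()

# A named lock node: contenders create ephemeral sequential children under
# this path, and only the lowest-numbered one holds the lock at a time.
lock = zk.Lock("/example/locks/task-42", identifier="executor-01")

with lock:  # blocks until this process holds the lock
    # Only one lock holder at a time reaches this block, which is the
    # guarantee described above: a given task is handled by exactly one
    # executor.
    do_the_task()

zk.stop()
```

The hard part, as he notes, isn't the happy path but what happens when a connection drops mid-task, which is exactly the class of failure case that's difficult to exercise outside a high-load environment.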
I would hate for people not to be able to reap the benefit of the sort of large-scale, production-scale testing that our upstream installation goes through. So it goes into the bucket of: it would be possible, but I certainly wouldn't suggest somebody go off on their own and do it; if it's that much of a blocker, then chat with us upstream. Although there are people who have said, "Hey, we don't really want to use ZooKeeper; we don't like that," and we went, "Well, we're not going to replace it anytime in the near future." And that hard blocker against ZooKeeper goes away pretty quickly when they realize they don't really have to do a lot with it. So, anyway, that's a very long-winded way of saying most of these are "it depends" things. It is all Python.
So you can always do whatever you want with it, because Python is awesome like that. But I would recommend, at least at first, treating ZooKeeper as one of the services of Zuul; the ZooKeeper needs are pretty straightforward, and it's pretty simple to run one. For example, we had somebody who wanted etcd because they wanted to run Zuul in Kubernetes and reuse the Kubernetes etcd for their application, and, well, you actually shouldn't do that. Right? That's highly frowned upon. I said, "That's a bad idea; don't do that. Run a second one inside of Kubernetes if you want." But if you're going to run a second etcd inside of Kubernetes, there's no difference between doing that and running a ZooKeeper for Zuul inside of Kubernetes as well. So, from a "here's your Helm chart to run Zuul" perspective, it's not really that big of a win.
Anyway, that's me rambling on this particular point. But yeah, there are a bunch of places where we have put in explicit extension points. We have both GitHub and Gerrit support inside of Zuul natively at this point; that's a pretty fleshed-out interface. Same thing in our Nodepool system: we have support for OpenStack and static nodes, but we have patches written and in review for Amazon, Google, and Azure, and there's a person who's going to be writing a MacStadium one. So there are drivers in all of the places where it really is important to support multiple systems at the same time for Zuul to be useful to people. It's not out of the question in every case for us to have a plug-in structure for something, but we tend to be pretty conservative about which things we make pluggable, so that the system has coherence across installations.
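As an illustration of the kind of driver extension point being described, and not Zuul's or Nodepool's actual API, a pluggable node-provider interface in Python could be sketched like this; all class and method names here are hypothetical.

```python
from abc import ABC, abstractmethod

class NodeProvider(ABC):
    """Hypothetical driver interface: one subclass per backend
    (a cloud provider, a pool of static machines, and so on)."""

    @abstractmethod
    def request_node(self, label: str) -> str:
        """Return the address of a ready node matching the label."""

    @abstractmethod
    def release_node(self, address: str) -> None:
        """Return the node to the pool (or delete it)."""

class StaticNodeProvider(NodeProvider):
    """Hands out machines from a fixed inventory instead of booting them."""

    def __init__(self, inventory):
        self._free = list(inventory)

    def request_node(self, label: str) -> str:
        return self._free.pop()

    def release_node(self, address: str) -> None:
        self._free.append(address)

# The rest of the system only ever talks to the abstract interface, so adding
# a new backend (AWS, Azure, MacStadium, ...) means adding one more subclass.
provider = StaticNodeProvider(["10.0.0.5", "10.0.0.6"])
node = provider.request_node("ubuntu-xenial")
provider.release_node(node)
```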
Oh, actually, that reminds me of a thing we have that is so unique we're still learning how to use it ourselves, which is that job configuration for Zuul lives in YAML files in Git repositories. And because Zuul's job is managing state in Git repositories, it loads all of that dynamically, so jobs are speculatively configured just like your patches are, which is all cool. But there's a corollary to that, which is that you can have a repository that just has some Zuul jobs in it, and those repositories of jobs can be directly shared between Zuul installations. So we have a standard library in a repository called zuul-jobs.
The expectation is that if you were to set up another Zuul installation, you could just point your Zuul directly at our zuul-jobs, our sort of central repository that doesn't contain OpenStack-specific jobs, just base things like "here's how you run tox." So we're trying to make it so that a collection of different Zuul installations isn't just a construction kit where each operator builds a service that is whatever he or she wants it to be, but instead a developer who works on projects across a couple of different Zuuls would have at least a similar experience in them. We want there to be some coherency. That's one of the reasons we're conservative about which things we make pluggable, and we want to make sure that if something is pluggable, there's a consistent interface at the job layer, the thing the developer interacts with. I don't care so much if the experience is different for different operators; we certainly want to make sure that for different users, if they're using a gate pipeline in Zuul, it works as they expect, they don't hit unexpected behavior, and jobs can depend on that behavior, things of that nature. The idea of directly sharing job definitions across installations in a sort of live, inherently continuous-delivery manner, that's new. That's not a normal thing, yet.
So the semantics of that, and what it means to live with it, are things we're discovering every day. "Oh, wow, we put in some information about configuring mirrors that is really a weird interface and doesn't make sense for other people; maybe that shouldn't be in here," or something like that. So it's a fun learning challenge.
[01:01:09] Unknown:
And briefly, to your point that it's all just Python: I'm curious, if you were to start the entire project over today, would you still make that language choice?
[01:01:22] Unknown:
Wow. Here we are on Podcast.__init__, so of course there's the Python question. Probably, yeah. Honestly, it's been really great. Since we're a distributed system that talks over the network, what one might call microservices, we're architected such that we could replace some of the pieces of the system with a different language, and there have been a couple of times we've talked about replacing a few of them with C++, just for scalability reasons. But we haven't actually hit any Python-related scaling issues, and we're pretty pleased with it.
I was a long-time Python programmer, so obviously I'm pretty comfortable with it. Also, just as a quick shout-out to the Python community: Zuul v3 is fully Python 3; it does not support Python 2 at all. So we're trying to ride the modern Python wave, which is very exciting. We currently work as far back as 3.5, but some of our users are already deploying in production on 3.6 and don't have any problems with that, which is exciting. I want us to bump the minimum to 3.6, because dictionaries being ordered by default in 3.6 is a great feature that would let us remove the use of OrderedDict and make the code cleaner. And I really want those f-strings.
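For anyone who hasn't used them, the two 3.6 features mentioned here look like this in practice; the job names below are made up for the example.

```python
# Since CPython 3.6 (a language guarantee from 3.7 on), plain dicts preserve
# insertion order, so collections.OrderedDict is often no longer needed.
jobs = {}
jobs["lint"] = "passed"
jobs["unit"] = "running"
jobs["docs"] = "queued"
print(list(jobs))  # ['lint', 'unit', 'docs'] -- the order they were added

# f-strings interpolate expressions directly inside string literals.
name, state = "unit", "running"
print(f"job {name} is currently {state}")
```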
That said, the typing support in Python 3 has made some things nicer, but there are definitely times when you get into really complicated things where compiler-enforced type checking can be helpful. In those moments, when we've hit places where we thought, "wow, this is kind of hard to mentally deal with without some better compiler help," we've talked about bringing some C++ into the mix, but we haven't really done any of that, and I'm not sure we ever will. Python has definitely allowed us to be very flexible and to get going very quickly, and we've been very pleased with it. Incidentally, I'm sure this next part is going to cause somebody who's listening to have a panic attack and run out of the room screaming.
We are a multi-threaded Python application. So I want to tell anybody who says that you cannot run multi-threaded Python at scale that that is not true; it is a fundamentally false assertion. As with many things that have to do with scaling and optimization, everyone should always be wary of premature optimization. You do have to be aware of what the GIL is going to do, but depending on what your application is, it may not really matter to you, or you can structure things in a way that makes sense. That is a feature of how we are designed, and it confuses people sometimes, because they've been taught over the years that you can't do threads in Python. But I'm here to tell you, they work just fine. And the few times we've had scale panics, when we've hit something, none of them have been GIL related.
At no time has the GIL actually been the cause of any problems for us, and we are running at quite a large scale.
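A tiny illustration of that point: when the work is dominated by waiting on the network or on subprocesses, as it is in a CI system, the GIL is released during the blocking calls and plain threads interleave just fine. The sleep below stands in for that kind of blocking I/O.

```python
import threading
import time

def wait_for_job(name: str, seconds: float) -> None:
    # time.sleep releases the GIL, just as blocking socket reads do,
    # so these simulated "jobs" overlap instead of running serially.
    time.sleep(seconds)
    print(f"{name} finished")

start = time.monotonic()
threads = [threading.Thread(target=wait_for_job, args=(f"job-{i}", 1.0))
           for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Five one-second waits complete in roughly one second of wall-clock time.
print(f"elapsed: {time.monotonic() - start:.1f}s")
```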
[01:04:44] Unknown:
So I'm pretty pleased with how Python's done for us as far as that goes. Well, for anybody who wants to follow what you're up to, get in touch, or ask any follow-on questions, I'll have you add your preferred contact information to the show notes. And with that, I'll move us into the picks. This week I'm going to choose the Hitchhiker's Guide to the Galaxy series by Douglas Adams, because they're fabulous books, I've enjoyed them multiple times, and I've been revisiting them recently. I definitely recommend that anybody who enjoys a good story check them out. And with that, I'll pass it to you, Monty. Do you have any picks this week? Yeah. First, I'll say your pick is great, and I need to go revisit them; it's been a while since I've read them.
[01:05:25] Unknown:
There's a scene I remember very distinctly where one of the characters, I can't remember which one, is chasing a Chesterfield sofa across the plains of prehistoric Africa or whatever, and it's always stuck with me as an image. I'm probably getting the details wrong. Picks from me: and this shouldn't surprise anybody who's been in the same room with me over the last year, but if you have not seen the Netflix show BoJack Horseman, I highly recommend it. I'm going to put a sort of star on that, which is that when you first start watching it, it may not seem like a particularly interesting or clever show. It may seem like just another Archer, another "wow, an anti-hero who's drunk, and I don't really care."
I promise you, it gets really, really
[01:06:19] Unknown:
good. Alright. Well, I want to thank you very much for taking the time today to join me and discuss the work you're doing with Zuul. It's definitely a very interesting project and one that I'm considering for some of my own uses. So thank you for that, and I hope you enjoy the rest of your day.
Introduction to Monty Taylor and Zuul
Monty's Background and Python Journey
Overview of Zuul and Its Origins
Continuous Integration and Optimistic Merges
Data Collection and Analysis in Zuul
Zuul's Place in the CI/CD Ecosystem
Zuul's Architecture and Customization
Job Configuration and Sharing in Zuul
Python as the Language Choice for Zuul
Closing Remarks and Picks