Summary
A common piece of advice when starting anything new is to "begin with the end in mind". In order to help the engineers at Wayfair manage the complete lifecycle of their applications Joshua Woodward runs a team that provides tooling and assistance along every step of the journey. In this episode he shares some of the lessons and tactics that they have developed while assisting other engineering teams with starting, deploying, and sunsetting projects. This is an interesting look at the inner workings of large organizations and how they invest in the scaffolding that supports their myriad efforts.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- So now your modern data stack is set up. How is everyone going to find the data they need, and understand it? Select Star is a data discovery platform that automatically analyzes & documents your data. For every table in Select Star, you can find out where the data originated, which dashboards are built on top of it, who’s using it in the company, and how they’re using it, all the way down to the SQL queries. Best of all, it’s simple to set up, and easy for both engineering and operations teams to use. With Select Star’s data catalog, a single source of truth for your data is built in minutes, even across thousands of datasets. Try it out for free and double the length of your free trial today at pythonpodcast.com/selectstar. You’ll also get a swag package when you continue on a paid plan.
- Your host as usual is Tobias Macey and today I’m interviewing Joshua Woodward about how the application lifecycle team at Wayfair uses Python to accelerate application delivery and improve developer experience
Interview
- Introductions
- Josh Woodward; for the past year I have been managing the application lifecycle team at Wayfair. Prior to that, IC on the Python platforms team. Embedded with teams looking to decouple from the monolith. Saw pain points firsthand.
- How did you get introduced to Python?
- High school physics class, TI84 Calculator, friend wrote a program to solve vector problems, I thought it was amazing.
- Used TI-Basic to solve specific physics problems for me. (Give fixed inputs, run through equation, get outputs)
- Approaching college, thinking about student loans.
- Heard about python and decided to give it a shot.
- Wrote program to simulate various payback / interest scenarios.
- Went to college for ME, switched to SE when I found out my dorm neighbors were using python + turtle to draw cool images
- Can you describe what the role of the application lifecycle team is and the story behind it?
- Story behind it:
- Around 2018, in a state where we had deploy congestion; challenging to iterate and ship changes. Tech org invested in containerization and decoupling to directly combat this problem. Teams incentivized to decouple.
- While on python platforms, the team had already been experimenting with code templating.
- Standard cookiecutter template for flask apps.
- Wayfair experimenting with Kubernetes late 2017.
- Spent 1 year embedding with 4 different teams to help knowledge transfer re: k8s, containers, application setup, python best practices, testing, linting, etc – through that we got a lot of great feedback on our tooling.
- Took senior engineers weeks to get something set up.
- Know who to contact, click the right buttons, file the right ticket
- Approach: Counted manual steps. Something like 60 distinct / atomic activities that had to be performed to get a "hello world" response from a basic flask app in production.
- Focus on reducing manual steps
- Released product (Mamba, on theme of snakes)
- Initially, supporting one main user story.
- User story: "As an engineer, I would like to create a production ready application in 10 minutes so that I can have a reliable and standardized application setup that follows best practices."
- grew out of python platforms, created own team with own scope, that was about 1.5 years ago.
- What is your team’s scope now?
- Team Scope is to facilitate the creation, maintenance, and decommissioning of decoupled applications at Wayfair.
- What are the interfaces that your team has to the rest of the organization?
- People Interfaces:
- We value getting feedback on our work to build strong products.
- Make assumptions, Willing to be wrong. Validate assumptions with customers.
- Software Interfaces:
- for mamba, CLI at first
- Backstage (open sourced from spotify)
- Lots of Github
- What is your method of determining what projects to work on?
- (See above.) Known pain points. Intuition. Free day Fridays. Being comfortable taking risks (using Friday time). Vet solutions with customers.
- How do you measure the impact of your work on the rest of the organization?
- We don’t force use of our products. Adoption of tooling.
- Number of microservices being spun up.
- Number of automated pull requests being created, merged.
- DORA metrics throughput (deployment frequency, lead time for changes) and stability (change failure rate, mean time to recovery)
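The DORA throughput metrics called out above come down to simple arithmetic over commit and deploy timestamps. As a minimal sketch (the deploy records and function names here are made up for illustration, not Wayfair's actual instrumentation):

```python
from datetime import datetime, timedelta

# Hypothetical deploy records: (commit_time, deploy_time) pairs.
deploys = [
    (datetime(2022, 3, 1, 9, 0), datetime(2022, 3, 1, 11, 0)),
    (datetime(2022, 3, 2, 14, 0), datetime(2022, 3, 3, 10, 0)),
    (datetime(2022, 3, 4, 8, 0), datetime(2022, 3, 4, 9, 30)),
]

def deployment_frequency(deploys, window_days):
    """Deploys per day over the observation window."""
    return len(deploys) / window_days

def median_lead_time(deploys):
    """Median commit-to-deploy time (DORA 'lead time for changes')."""
    lead_times = sorted(deploy - commit for commit, deploy in deploys)
    mid = len(lead_times) // 2
    if len(lead_times) % 2:
        return lead_times[mid]
    return (lead_times[mid - 1] + lead_times[mid]) / 2

print(deployment_frequency(deploys, window_days=7))  # deploys per day
print(median_lead_time(deploys))                     # a timedelta
```

The stability metrics (change failure rate, mean time to recovery) are analogous ratios and averages over incident records.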
- What is the role of Python in your work?
- we use it and love it!
- existing skillset from incubation phase within python platforms
- right tool for the job
- lightweight automation
- hitting lots of APIs
- define lots of user facing specifications (json, yaml)
- pydantic has been great for creating descriptive, human and machine specifications.
- open source (we rely on it, we also have some presence)
- cookiecutter -> columbo
- gitpython -> pygitops
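The notes mention pydantic for descriptive, human- and machine-readable specifications. The team's actual models aren't shown in the episode; as a rough stdlib-only analogue of the idea (pydantic adds parsing, error aggregation, and schema export on top of this), a versioned spec with validation might look like:

```python
from dataclasses import dataclass, field

# Hypothetical spec for a user-facing YAML document, parsed to a dict
# upstream. Field names are illustrative, not Wayfair's real schema.
@dataclass
class ChangeSpec:
    version: int
    name: str
    paths: list = field(default_factory=list)

    def __post_init__(self):
        # Enforce the kinds of validations pydantic handles declaratively.
        if self.version not in (1, 2):
            raise ValueError(f"unsupported spec version: {self.version}")
        if not self.name:
            raise ValueError("name must be non-empty")

raw = {"version": 1, "name": "enforce-line-length", "paths": ["pyproject.toml"]}
spec = ChangeSpec(**raw)
print(spec.name)  # enforce-line-length
```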
- Can you tell me more about your application creation solution? Who can use it, and what does it actually do?
- Written in python, though it templates out code for any language.
- Runs automation to onboard an application to production
- git repo, build pipeline, calling out to various APIs to signal a new app is present
- Wayfair has a variety of applications (python, java, .net, php, javascript, some go)
- Team interested in integrating with our solution will create a github repository containing 1..* cookiecutter template(s)
- Provide a specification for what questions to ask users.
- Limitation with cookiecutter: the approach to asking questions isn’t dynamic, and there’s a lack of validation.
- Pat Lannigan -> Columbo (open sourced). Python DSL to describe the set of questions to ask users.
- python fastapi application will have a completely different set of questions than a java library for example.
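The core idea behind Columbo's DSL is that each question can decide whether it should be asked based on earlier answers. A tiny sketch of that mechanism (the real Columbo library offers richer interaction types like Echo, Confirm, Choice, and Text plus validation; everything below is illustrative):

```python
# Each question is a dict; "should_ask" is an optional predicate over
# the answers collected so far, which is what makes the flow dynamic.
def ask(questions, answer_func):
    answers = {}
    for q in questions:
        if q.get("should_ask", lambda a: True)(answers):
            answers[q["name"]] = answer_func(q["name"])
    return answers

questions = [
    {"name": "language"},
    # Follow-up only relevant when templating a Python web app.
    {"name": "wsgi_server", "should_ask": lambda a: a.get("language") == "python"},
]

# Simulated user input in place of interactive prompts.
canned = {"language": "java", "wsgi_server": "gunicorn"}
print(ask(questions, canned.get))  # {'language': 'java'} — follow-up skipped
```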
- You had mentioned that another part of your team scope is to facilitate the maintenance of applications. Can you tell me more about that?
- Reduce engineering toil around keeping applications up to date.
- Average engineer owns several to dozens of repos
- Create automated pull requests:
- Versioned dependencies (Renovate)
- Propagating platform changes (Gator)
- Ex1: python apps use "black" to format code and our python platform team would like to prescribe a line length. Our tooling can be used to declare desired changes. yaml specification -> pr automation at scale.
- Ex2: shared library, new version released, breaking interface change. Code instructions for performing AST manipulation and resolving breaking change for people.
- Shift from "We need you to do this" to "I am proactively letting you know that something needs to change, and I also made the change for you!"
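The second example, resolving a breaking interface change via AST manipulation, can be sketched with Python's stdlib `ast` module (the function names are hypothetical; in practice a tool like libcst is often preferred because it preserves comments and formatting, which `ast.unparse` does not):

```python
import ast  # requires Python 3.9+ for ast.unparse

# Suppose a shared library renamed `old_fetch` to `fetch_v2` (made-up
# names). A NodeTransformer rewrites call sites so an automated PR can
# resolve the breaking change for downstream repos.
class RenameCall(ast.NodeTransformer):
    def visit_Call(self, node):
        self.generic_visit(node)  # rewrite nested calls first
        if isinstance(node.func, ast.Name) and node.func.id == "old_fetch":
            node.func.id = "fetch_v2"
        return node

source = "result = old_fetch(url, timeout=5)\n"
tree = RenameCall().visit(ast.parse(source))
print(ast.unparse(tree))  # result = fetch_v2(url, timeout=5)
```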
- How do you actually go about creating automated pull requests?
- manual steps would involve cloning, checking out feature branch, applying code changes, staging / committing, pushing up branch, creating the PR
- gitpython is an existing and extremely powerful tool, but its API is fairly involved and (by design) doesn’t provide the type of high-level abstractions that we need.
- created pygitops (open sourced), built completely on top of gitpython
- high level abstractions for the workflow I described.
- coolest / most pythonic part about it is the "feature branch" context manager.
- code changes are made in the context of a feature branch
- when you intentionally or accidentally leave the context of a feature branch, we want certain things to be true (default / main branch, clean workdir, no unstaged changes)
- when writing PR automation, don’t have to worry about this!
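The invariant behind that feature-branch context manager can be modeled in a few lines. This sketch uses a fake repo object instead of a real GitPython repo, and the names are illustrative rather than pygitops' actual API; the point is that the cleanup runs whether you leave the block normally or via an exception:

```python
from contextlib import contextmanager

class FakeRepo:
    """Stand-in for a GitPython repo, tracking branch and dirtiness."""
    def __init__(self):
        self.branch = "main"
        self.dirty = False

    def checkout(self, branch):
        self.branch = branch

    def clean(self):
        self.dirty = False

@contextmanager
def feature_branch(repo, name):
    repo.checkout(name)
    try:
        yield repo
    finally:
        # Runs even if PR automation raises mid-change: clean workdir,
        # back on the default branch.
        repo.clean()
        repo.checkout("main")

repo = FakeRepo()
with feature_branch(repo, "automated-change"):
    repo.dirty = True  # simulate edits/commits on the feature branch
print(repo.branch, repo.dirty)  # main False
```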
- Can you describe some of the more technical details about how your change propagation system (Gator) works?
- heavily inspired by kubernetes resource model (resources are defined via a declarative specification)
- Kubernetes itself ships with resources that implement behaviors of common resources (pods, services, etc)
- Gator’s execution model is broken up into two parts:
- what repos to act on (Source)
- what are the changes that need to be applied. (Output)
- Ex: Source to proxy github search. Write a github search query to get back a list of repos
- Output to scan a repo for regex pattern at specified paths and replace with some fixed term. Very popular, engineers love find and replace.
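The Source/Output split can be illustrated with a declarative spec applied to in-memory file contents. The spec keys and the github search string below are hypothetical stand-ins for Gator's real schema; only the popular regex find-and-replace Output is modeled:

```python
import re

# Hypothetical declarative change spec in the spirit of Gator.
spec = {
    "source": {"github_search": "org:acme topic:python"},  # selects repos
    "output": {
        "type": "regex_replace",
        "paths": ["Dockerfile"],
        "pattern": r"python:3\.8",
        "replacement": "python:3.10",
    },
}

def apply_output(files, output):
    """Apply a regex_replace Output to a repo's files (path -> text)."""
    pattern = re.compile(output["pattern"])
    return {
        path: (pattern.sub(output["replacement"], text)
               if path in output["paths"] else text)
        for path, text in files.items()
    }

repo_files = {"Dockerfile": "FROM python:3.8-slim\n", "README.md": "python:3.8\n"}
print(apply_output(repo_files, spec["output"])["Dockerfile"])
# FROM python:3.10-slim
```

A real run would iterate this over every repo returned by the Source, then open automated pull requests with the diffs.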
- What are the most interesting, innovative, or unexpected ways that you have seen mamba / gator used?
- resource model of gator supports the idea that we don’t know what we don’t know
- reference k8s, CRDs, resource model.
- container execution
- log4j identification and remediation
- automate some of the work for identifying vulnerabilities
- java platform team was able to use java native tooling in the environment of their choosing to identify vulnerable apps.
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on application lifecycle concerns?
- What do you have planned for the future of application lifecycle management/developer experience improvements at Wayfair?
- Hope to start open sourcing interesting aspects of our change propagation tool (Gator)
- Whether maintaining many open source projects or operating at the enterprise level, we think that some of our patterns and approaches can be shared! yaml -> code changes
Keep In Touch
Picks
- Tobias
- Nocciolata hazelnut spread
- Joshua
Links
- pygitops
- columbo
- backstage
- renovate
- DORA metrics
- TI-84 Calculator
- TI BASIC
- Wayfair Python Platforms Team Podcast Episode
- Pydantic
- Helm
- PyUp
- GitPython
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling powered by the battle tested Linode platform, including simple pricing, node balancers, 40 gigabit networking, dedicated CPU and GPU instances, and worldwide data centers.
Go to pythonpodcast.com/linode, that's l i n o d e, today and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. So now your modern data stack is set up. How is everyone going to find the data they need and understand it? Select Star is a data discovery platform that automatically analyzes and documents your data. For every table in Select Star, you can find out where the data originated, which dashboards are built on top of it, who's using it in the company and how they're using it, all the way down to the SQL queries. Best of all, it's simple to set up and easy for both engineering and operations teams to use.
With Select Star's data catalog, a single source of truth for your data is built in minutes, even across thousands of datasets. Try it out for free and double the length of your free trial today at pythonpodcast.com/selectstar. You'll also get a swag package when you continue on a paid plan. Your host as usual is Tobias Macey. And today, I'm interviewing Joshua Woodward about how the application life cycle team at Wayfair uses Python to accelerate application delivery and improve developer experience. So, Josh, can you start by introducing yourself? Sure, Tobias. My name is Josh Woodward. And for the past year, I've been managing the application life cycle team at Wayfair.
[00:02:04] Unknown:
Prior to that, I was an IC on the Python platform team. 1 of the really cool things that I got to do was embed with teams looking to decouple from our PHP monolith. 1 thing I really enjoyed about that is I got to see a lot of pain points from that experience firsthand, and that kind of folds directly into a lot of what I wanna talk about with you today. And do you remember how you first got introduced to Python? Yeah. It was pretty interesting. In my high school physics class, I, at the time, didn't know much about programming, but I had a TI-84 calculator, which is capable of writing TI-BASIC programs. So I was amazed when a friend of mine wrote a program to solve vector problems. You know, input any amount of vectors with their angle and magnitude, and you can figure out the resultant vector. I thought that was really neat. So I then started using TI-BASIC to solve other pretty specific physics problems for me. It was pretty limited and rudimentary, but, you know, you'd give it a fixed set of inputs. It would run through a very, very specific equation. So I'd write, like, a program for each type of problem I wanted to solve. So as I approached college, I was thinking about student loans. I heard about Python and decided to give it a shot. I wrote a really, really basic script to simulate various, you know, payback and interest scenarios.
It wasn't anything super complex. I think I just multiplied some numbers together and printed them out. But that was kind of the primer for when I went to college initially as a mechanical engineer. Within 3 weeks, I think I switched my major to software engineering. Some friends of mine down the hall were using Python to draw cool images for homework. They were using turtle in Python, so I thought that was really cool. And I was jealous they got to draw pictures for homework. So that was kind of all I needed to switch over. Yeah. The turtle module is definitely always fun to play around with, and it's a good way to get people interested.
[00:03:52] Unknown:
For sure. And so you mentioned that for the past year, you've been managing the application life cycle team at Wayfair. I'm wondering if you can describe what the role and context of that team is and some of the story behind its formation and responsibilities.
[00:04:09] Unknown:
Sure. I'll start by kind of giving the context in the background about how the team came to be. Around 2018 at Wayfair, we were in a state where we had a lot of deploy congestion. It was challenging for engineers to iterate and ship changes to our PHP monolith. Our tech org and our leadership had kind of invested in containerization and decoupling as a way to directly combat this problem. So we had already decided that we wanted to double down on decoupling from the monolith. So at that time, you know, teams were incentivized to do that. When I was on Python platforms, the team had already been experimenting with code templating. It was fairly basic. You know, there was a repository that housed a cookie cutter template, and folks interested in creating their own decoupled application could clone that repo and run the basic cookie cutter command to template out some code. And that was all that it did. So it kind of gave you the code itself, but left it kind of at that point.
When I spent about a year embedding with 4 different teams to kind of help knowledge transfer around setting up, you know, containers, the application, Python best practices, testing, linting, we got a lot of great feedback on our tooling. But 1 thing that stood out is it took even senior engineers with a lot of domain expertise weeks to get something set up. You know, you had to know exactly who to contact, how to click the right buttons, what ticket to file, and exactly how to fill it out. And we saw that as, like, a pretty big problem. So kind of the approach we took is we started by counting the manual steps that it actually took to get a very, very basic hello world rest API in our production environment. And to our surprise, it took about, like, 60 distinct steps, which was, like, pretty awful. Again, like, really experienced senior engineers would take weeks to get this set up. So without really having a focus on the product at first, we just had a focus on reducing the amount of manual steps. We knew that no matter what we did, we wanted to kind of drive that figure down. So I think I remember years ago, 1 of my kind of quarterly objectives was, can we take this number of 60 and reduce it to 40? So we kind of got started thinking about that. We ultimately released a product, which we call Mamba. It's on the theme of snakes, which is really cool.
And, initially, the application life cycle team, which at the time was incubating within Python platforms, supported 1 main user story, which was, you know, as an engineer, I would like to create a production ready application in 10 minutes so that I can have a reliable and standardized application setup that follows best practices. That team ultimately grew out of Python platforms. We created our own team with our own scope. At this point, that was about 1 and a half years ago. So
[00:06:48] Unknown:
summer 2020 at that time. Given the time since the initial development of the application life cycle team to where we are now, how has the scope and responsibilities changed or evolved and some of the ways that the team is integrated into the rest of the organization started to formulate and become more standardized?
[00:07:13] Unknown:
So when the team was created, we had the 1 solution to manage the application creation process, but we decided to carve out somewhat of an ambitious scope. And we decided that instead of just focusing on application creation, it made sense to deal with the entire life cycle of kind of abstractly managing a decoupled application. So not only did we want to support the creation of these apps, but we also recognized that even maintaining these apps and keeping them up to date and keeping them running in our production environment involved a lot of toil for our developers, so we wanted to support that as well. Also, we care about application decommissioning.
1 thing we wanted to avoid was having a bunch of things running in our production environment that didn't really have anything referencing it and weren't actually providing production value. So we felt that it was necessary to also spin down applications at the end of their life.
[00:08:03] Unknown:
As far as the sort of formalization of that life cycle where a lot of times you say, at the outset, I have the need to be able to build the service or product, but there's usually not a lot of thought that goes into the sunsetting aspect of a project. It may just stay alive indefinitely and start to become kind of Frankensteined with unrelated functionality just because it's already there. And I'm wondering what your responsibility is as far as being able to help maybe combat that kind of organic growth and helping to create that overall life cycle plan at the outset and just some of the ways that that is becoming standardized in the Wayfair organizational culture?
[00:08:50] Unknown:
I think of the 3 parts of our scope, decommissioning is where we have invested the least amount of effort. So right now, we have a self-service workflow where you can opt in to have your application decommissioned. We'll do some very, very basic stuff, you know, renaming your GitHub repo, spinning down your build pipelines, etcetera. The idea is you should be able to generate and decommission apps indefinitely using the same name and the same terminology. Though, we aren't doing anything super intelligent on the side of, you know, proactively identifying applications that would be candidates for decommissioning.
Though 1 project that we're working on now is kind of getting that insight into the various state of the applications that we have at Wayfair. So we think that insight will help us, you know, focus attention to things that require it. Given the fact that your
[00:09:40] Unknown:
mission is to be kind of an enabling team to the rest of the engineering group, I'm curious how you think about identifying what are the areas of greatest leverage that you can focus your efforts in identifying and scoping the types of projects that you want to work on, whether that's from external inputs from other engineering teams or visibility of the work that's in flight and maybe internal insights that you're building from working with the various stakeholders across the organization?
[00:10:11] Unknown:
Cool. Yeah. So I think 1 of the fundamental approaches we take is we're willing to make assumptions, but we're also willing to be wrong. So anytime we want to build a new solution or kind of experiment, we like to validate our assumptions with our customers. So we'll kind of take baby steps initially and make sure that what we set out to build is indeed being used and provides value. Part of this is knowing customer pain points. So the approach I described earlier where I went and embedded with different teams, that's something that I think has really helped us historically and currently for sure. Another thing that is interesting about the way we work is we have a concept called free day Fridays, which is just about what it sounds like, where on Fridays, engineers are encouraged to spend time solving problems that they really wouldn't get to solve in their, you know, otherwise day to day work, the work that's, you know, fulfilling objectives and key results. That's where our innovation happens. You know, we're comfortable taking risks and using that Friday time to do some things that are a little bit out of the box even if they're thrown away. These solutions that I wanna talk to you about today all came from that free day Friday time where we got to experiment.
People ended up liking kind of the stuff that we were building, and we just took it from there. So we never once said, we're gonna build and release this new product, and it's gonna be great. We were always, you know, solving a problem or kind of, like, building off of innovation
[00:11:35] Unknown:
that we had kind of landed on. Before we dig into the specific projects, I'm also interested in understanding how you approach the kind of marketing of the capabilities that you're providing to the rest of the organization because Wayfair is a fairly large company. My understanding is that there are a number of different physical locations, and I'm sure that over the past couple of years, a lot of your work has been done remotely. And so just the overall challenge of raising awareness of the work that you're doing so that other teams can take advantage of it and some of the ways that you measure the overall impact that you're having on the kind of developer velocity of the engineering group at Wayfair?
[00:12:16] Unknown:
Definitely. Yeah. Wayfair is a pretty big place. There's about 3,000-plus engineers. So we do some level of broad marketing. You know, we'll send out what we call release notes when something is ready to be announced publicly, but release notes certainly aren't the main contributor to adoption. 1 of the things we focus on is selling our products and our solutions to our early adopters. So we really love working with people that love working with us. So the people that are giving us positive feedback and helping make our product better, we really enjoy working with them. And those people end up being our promoters. So we try to find, you know, the interesting problems that need to be solved and kinda do our homework to vet that these are indeed problems that need to be solved. But we'll just try to build a strong product in close collaboration with these early adopters. And, you know, by the time we have a mature product, it's already being marketed for us. So 1 thing about our work is that we don't force people to use our products. 1 of the ways we measure success is by adoption of our tooling. So like I mentioned, we'll market things to the early adopter category and from there, see kind of where things land. So if we see that traffic to a product or net usage is increasing organically, that's usually a good sign. Early on, we cared about looking at, you know, just the net number of microservices being spun up because we are heavily involved in the application creation space. We kind of viewed more is better, which is kind of narrow minded. But, like, I think when we're spinning up that new product and decoupling, that was the way to go. A lot of what we do also has to do with automated pull requests. Like, a lot of our work has to do with that. So we will just measure the raw number of automated pull requests being created and merged. Again, you know, more automated pull requests being merged isn't necessarily a good thing.
So all of our work, we do try to tie to what are called the DORA metrics.
Those are 4 distinct measurements that Wayfair instruments and other, you know, enterprises do as well, and they have to do with throughput. So deployment frequency and lead time for changes as well as stability, which is change failure rate and mean time to recovery. So the idea is, like, if our tooling can help deployment frequency increase and lead time for changes go down, I think we're doing our job well. Given the fact that the
[00:14:34] Unknown:
initial core software capability of Wayfair is built around this PHP monolith. I'm curious if you can talk to the role that Python has in the work that you're doing and some of the ways that you think about the selection of language and ecosystem and tooling for being able to provide to other members of the engineering org?
[00:14:56] Unknown:
So within our team, we prefer to use Python for a couple of reasons. 1 of the main reasons was just based on history. We do have that existing skill set from our incubation phase within Python platforms. But, also, we do think it's the right tool for the job that we're doing. We run a lot of lightweight automation. We're hitting a lot of APIs. We define a lot of user facing specifications in order to integrate with our tools. So we just find Python to be really, really easy to use for what we're trying to do. An example is something like Pydantic has been fantastic for creating descriptive human and machine readable specifications.
So a lot of the way we interface from a technical standpoint, with our customers is through YAML specifications, and I can talk more about that later. But Pydantic has made it really, really easy to version these specifications and enforce, you know, different validations. That's stuff that, sure, would be possible in other programming languages, but we just find it really, really easy to do with Python.
[00:15:55] Unknown:
In terms of the actual specific tools and work that you're doing, 1 of the things that you mentioned is simplifying the process of being able to spin up a new application and get it into production. I'm wondering if you can talk to the way that you have built that system and some of the approach that you've taken to be able to reduce those manual steps and automate as much as possible while still being understandable and maintainable for the consumers?
[00:16:23] Unknown:
So 1 thing that's really neat about this is although the system itself is written in Python, it can template out code for any language. So Wayfair supports Python, PHP, Java, dot net, JavaScript, some Go. So there's just a lot of variety out there, and we don't see ourselves as being opinionated at all as far as, like, what the right tooling choices, and we want to enable all sorts of solutions to be built. So our system generally is 2 phases. It will both template out code according to templates and run automation to onboard an application to production. So that's things like creating a GitHub repository, creating a build pipeline, calling out to various APIs to signal that a new app is present, stuff like that. There's a lot of, like, some proprietary stuff, but just things that developers would have to go through in order to, you know, stand their thing up. So any team interested in integrating with our solution will create a GitHub repository containing 1 to many cookie cutter templates.
So it is using cookie cutter under the hood. 1 of the cool things is on top of that, they will provide a specification for what questions to ask users. So with cookie cutter, generally, you'll provide a JSON payload describing what questions you want asked and what their default values are. For us, there was a little bit of a limitation where we want our question asking approach to be dynamic. An example is a Python FastAPI application will have a completely different set of questions that we wanna ask our users than a Java library, for example. Additionally, if you are building a new application and you want to use a database or have an integration with a messaging application such as Kafka or Google Pub Sub, and you say, yes. I want that thing. We may have an entirely different set of questions we wanna ask in addition. So 1 of my teammates, Pat Lannigan, wrote an open source library called Columbo.
And what that offers our template designers is basically a Python DSL to describe the set of questions to ask users. It allows them to determine whether or not a question should be asked in the 1st place based on previous answers. So there's the dynamism. It allows for a rich validation, default values, and a lot of different types of questions that can be asked as well. So you might just wanna echo some text. You might want a Boolean value. You might want a multi choice, those types of things.
[00:18:45] Unknown:
And as far as the templating aspect and being able to multipath the questions that are being asked, I'm wondering if you can talk to some of the ways that you and your team have had to educate other members of the org about how to think about either building out their own templates or some of the edge cases that they need to be aware of or, you know, maybe integrating language ecosystem best practices into these templates to be able to make sure that they're as useful as possible and that you don't, you know, generate a bunch of boilerplate, and then the end user then has to go and change, you know, 20% of what's there? Yeah. I think that's a great question. It puts our team in a very, very interesting position where we aren't super opinionated about the content of the templates themselves
[00:19:34] Unknown:
and want to enable teams that are interested in creating project templates to do so. We don't have too much insight or standards in terms of what the actual content of the generated application is, though we do have ways of verifying that templates are doing what they're supposed to do. So the template repositories themselves, we know where these template designers will create their logic for their templates. We have enabled different build steps to basically verify, hey. For all of these templates that you have and all of these questions that you're going to ask, we wanna make sure that at a minimum, we are capable of successfully running the cookie cutter command to actually generate out a new application.
And there are various things that we might wanna check and assert on your application. If this base set of expectations isn't met, we will fail your build. A really high-level example: Wayfair is undergoing a migration from on-premise data centers to Google Cloud Platform. And in one of our manifest files where we describe the data center to deploy containers to, you can imagine, because we're at such a high-leverage point in the application creation process, we probably want to assert that all templates now are, at a minimum, deploying new applications to GCP and not our old on-premise data centers. So those are the types of opinions we will hold, and we will hold them strongly. But aside from that, it's very much on the implementers to maintain standards.
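As a hedged illustration of that kind of build-step assertion (the datacenter names, file layout, and check below are all hypothetical; the real pipeline presumably generates the project first, e.g. via cookiecutter with default answers, and then inspects the output):

```python
from pathlib import Path
import tempfile

# Hypothetical check mirroring the GCP migration example above: after a
# template is rendered, fail the build if any manifest still targets the
# old on-premise datacenters. The datacenter names are made up.
ON_PREM_DATACENTERS = {"dc1", "dc2"}

def assert_deploys_to_gcp(project_dir: Path) -> None:
    """Raise if any rendered manifest targets an on-prem datacenter."""
    for manifest in project_dir.rglob("manifest.yaml"):
        text = manifest.read_text()
        for dc in ON_PREM_DATACENTERS:
            if f"datacenter: {dc}" in text:
                raise AssertionError(f"{manifest} still targets on-prem {dc!r}")

# Quick demonstration against a generated-looking directory on disk.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "manifest.yaml").write_text("datacenter: gcp-us-central1\n")
    assert_deploys_to_gcp(root)  # passes: no on-prem target found
```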
We've found it to be, like, fairly successful so far where the teams that own these templates are incentivized internally to have these best practices in mind. So that hasn't really been a problem for us. Though if something out in the wild is found as an improvement to be made, we generally fold it right back into the templates.
[00:21:22] Unknown:
One of the always interesting and challenging aspects of templating applications from a cookiecutter is that at the time that they're created, they represent the best practice, and, you know, maybe they have the updated dependencies in the generated lock file, for instance. But as time goes on and bit rot sets in and the language ecosystem evolves, you might go back to that template and say, okay, you know, we used to use unittest and nose, now we wanna use pytest, and now we need to update the version of Django or Flask or what have you in the cookiecutter so that when you start a new project, you're starting at the latest version. And then being able to bring that forward into applications that were previously generated from that template. I know, for instance, there's the Copier project that is designed to handle some of that capability, and there are some add-ons to the cookiecutter ecosystem to handle that aspect. And I'm just wondering how you think about the ongoing maintenance of those templates for being able to bring in those upgrades to dependencies or upgrades to best practices or the organizational standards, and then being able to alert teams who previously used that template to those changes to determine when and how to incorporate those updates?
[00:22:45] Unknown:
Yeah. It's a really, really hard problem. Not just hard, but, like, borderline impossible when you're dealing with the scale that Wayfair does, because the diff that you have between a template and applications that exist out in the wild can be anything. We don't really know what the state of all the different applications is, and that presents a really big challenge. So that actually kind of folds into the second portion of our team's scope, which is application maintenance. I think our stance is we acknowledge that over time, applications are going to be out of date with the latest and greatest standards. An app generated today might look great, but 3 years from now might not be reflective of, you know, the desired tooling or standards or even dependencies. So instead of trying to solve the technical problem from a template standpoint (you know, hey, this template's evolving, let's fold these changes in directly),
We've taken a little bit of a different approach where through different pieces of tooling that create automated pull requests,
[00:23:45] Unknown:
we use that to try to keep applications up to date. I can tell you a little bit more about that. Yeah. Definitely interested in exploring some of that kind of automated maintenance aspect and some of the ways that you're able to introspect the current structure of the project and the dependencies and understand what types of modifications are useful to make, which ones are safe to make, and just being able to hook that into the overall ongoing life cycle and maintenance of the systems that you're supporting?
[00:24:14] Unknown:
Totally. I'll break up the application maintenance space into 2 broad categories: one of them is very easy to describe, and the other is less easy to describe. The first one I'll just call versioned dependencies. So if you have a Python application, you might rely on Python packages. You might rely on some base Docker image. If you're using Kubernetes and a technology such as Helm, you are potentially relying on Helm charts. And if you have a build system like we use, you know, that may have versioned plugins as well. So managing those versioned dependencies is one of the problems that we've addressed over the past year and a half.
We started off using a variety of in house solutions, but are now favoring an open source solution called Renovate, which is able to handle all of the things I described above. It's highly configurable, and it allows application engineers to really, really describe the way that they want to manage their versioned dependencies. So our team takes direct responsibility for helping application owners keep their dependencies up to date, and that's kind of 1 half of the application maintenance
[00:25:21] Unknown:
problem. And so the dependency management aspect of it, you know, I've definitely seen a few different tools that approach that. So there's the Dependabot project that was acquired by GitHub, and there are language-specific options. I have come across Renovate, but haven't used it specifically. So I'm interested in understanding some of the benefits that it provides given its stance as a multi-language-ecosystem tool, some of the ways that it's able to work across languages, and maybe some of the capabilities that would be nice to have from some of these more focused solutions such as the PyUp bot or something like that? The way we view Renovate and its advantages over other tooling that we've used before is that Renovate is very, very, very configurable.
[00:26:06] Unknown:
So we don't have to build our own configuration and kind of reinvent the wheel. We just get to tell people, hey, listen, you know, Renovate's open source docs are really great on how to set up auto-merging and how to, you know, write your own plugins for managing various nuanced types of dependencies that maybe Renovate doesn't handle out of the box. Certainly that's an advantage, where it's, like, very, very flexible. Another interesting aspect is the way that Renovate actually ships, which is through a Docker image; I believe it's a Docker image and/or, like, an NPM package. Because Renovate is open source and not a vendored solution, we have to kind of stand it up and run it in house, and from a technical standpoint, the way that we actually do that is very, very straightforward. Whereas with other tools, instead of shipping a Docker image or an NPM package, it's more of an SDK where we have to stitch everything together, kind of reinvent the wheel config-wise, and figure out how to expose different things to users. We just don't have to solve a lot of those problems with Renovate. So for those reasons, it's certainly preferred.
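For reference, Renovate is typically configured through a JSON file checked into the repository; a minimal config along the lines described (see Renovate's own documentation for the authoritative option names and defaults) might look like:

```json
{
  "extends": ["config:base"],
  "packageRules": [
    {
      "matchUpdateTypes": ["minor", "patch"],
      "automerge": true
    }
  ]
}
```

Here the shared preset handles sensible defaults, and the package rule auto-merges low-risk minor and patch bumps, which is the kind of per-team tuning being described.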
Digression: I don't wanna flame Dependabot. We use Dependabot at Wayfair. We don't use it in GitHub, but there's the open source Dependabot core, and it's basically a Ruby SDK. So we've had to, like, stitch together a bunch of, like, weird janky Ruby stuff and write our own Ruby scripts to figure out how to do this. We've also had to literally reinvent the wheel config-wise to figure out how do we take Dependabot's config that they've documented and then, like, expose that to our users in a way that is compatible with the way that we've implemented the, like, Ruby SDK. So I just didn't wanna speak ill of Dependabot, but that's, like, what we're going through. Fair enough. Okay.
[00:27:49] Unknown:
And the other aspect of the kind of automated maintenance that you mentioned is beyond just keeping the dependencies up to date. There's also the question of maybe doing vulnerability and security scanning, linting, maybe adding in some fitness functions for enforcing different architectural aspects of the application. I'm wondering if you can talk to some of the ways that you have worked with teams to automate some of that and some of the tools or internal capabilities that you've developed as a result.
[00:28:20] Unknown:
Absolutely. Going back to the question you were asking before about, you know, the diff when an app is created from a template and time passes: we realized pretty early on that that was a big problem for us, and we wanted to address it. So the way we view that is we just wanted a general solution for creating pull requests at scale, basically creating a platform that allowed various engineers at Wayfair to describe changes that they wanted to make at scale. So we have an in-house solution called Gator that we've built, which is really, really neat. An example of something you might wanna do with Gator: imagine you're the Python platform team at Wayfair, and Python apps are using the Black tool to format code. And our Python platform team, for some reason (this wouldn't actually happen), is opinionated about the line length that they would like all Python apps to use. Our tooling enables them to make declarations like: hey, I want to find all Python apps, I want to search in a specific file such as a pyproject.toml, and I might wanna do a modification, like a regex replace, to look for, you know, the line length of 88 and replace it with 120.
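Gator's YAML config is internal, but the core of that basic example is just a scoped regex replacement; a toy Python sketch of the operation (the file contents and line-length values simply mirror the example above):

```python
import re

def bump_line_length(pyproject_text: str) -> str:
    """Replace Black's line-length of 88 with 120 in pyproject.toml content."""
    return re.sub(
        r"^(line[-_]length\s*=\s*)88$",  # match either line-length or line_length
        r"\g<1>120",
        pyproject_text,
        flags=re.MULTILINE,
    )

before = "[tool.black]\nline-length = 88\n"
print(bump_line_length(before))
# [tool.black]
# line-length = 120
```

In the platform described, a few lines of YAML would declare the target repos, the file to search, and a replacement like this, and the system would fan it out as pull requests.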
That's fairly basic. But with a couple lines of YAML, we enable the Python platform team to make that type of change at scale. Another, more complex example: imagine you own a shared library that many, many applications are using. A new version of it is released, but it includes a breaking interface change. Historically, you know, that would happen, and we'd ask people: hey, please use this new version of the library, and you have a month to do it. And, obviously, people get upset about that, and when they inevitably don't do it, things break. So to support the shift from "we need you to do this" to "hey, I'm proactively letting you know that something needs to change, and I made the change for you," we, through the system, allow for very, very complex code manipulation to happen.
We basically allow you to run your own container in your own environment that describes exactly the type of code modification you wanna do. So our Python platform team has gone as far as performing AST manipulation to resolve the breaking change for people. So, you know, the YAML config will reference a Docker container where they've written these instructions up. That's how it integrates with our system. Those are the types of things that are being done with our system. Been very, very helpful on a variety of use cases, things that we wouldn't have expected are being done with it, and it's really neat.
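Wayfair's actual migrations are more involved and run inside those containers, but as a minimal sketch of AST-based code modification in Python (standard library only, requires Python 3.9+ for ast.unparse; the function rename here is a made-up breaking change, not a real one from the episode):

```python
import ast

class RenameCall(ast.NodeTransformer):
    """Rewrite calls to the old `get_user(...)` as the new `fetch_user(...)`."""

    def visit_Call(self, node):
        self.generic_visit(node)  # rewrite nested calls first
        if isinstance(node.func, ast.Name) and node.func.id == "get_user":
            node.func = ast.Name(id="fetch_user", ctx=ast.Load())
        return node

def migrate(source: str) -> str:
    """Parse source, apply the rename, and emit the updated code."""
    tree = RenameCall().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))

print(migrate("u = get_user(42)"))
# u = fetch_user(42)
```

A transformation like this, packaged in a container and referenced from the YAML config, is the shape of "resolving the breaking change for people" described above.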
[00:30:53] Unknown:
On the AST manipulation and code restructuring aspect of it. I know that in Python, there are a lot of capabilities built into the AST and pretty similarly with some of the other dynamic languages. I'm curious if there are any language environments where you have run up against complexities or limitations in the kind of introspectability of the software to be able to automate some of these changes and you've had to fall back on other approaches?
[00:31:19] Unknown:
You know, I think 1 of the benefits of being where we are in terms of running this platform is we don't need to know about those details. So we haven't really dug into those. It's kind of like, hey. We'll give you the tools to do whatever you need to do. And if you can't, that's kind of on you to figure out. So, you know, we're not necessarily privy to all of those details. I think, like, within different teams, they're trying to figure out how to do some complex stuff. But I'm kind of unaware of all of that, which I think is kind of a good thing in this case. So we've taken the approach of being as unopinionated as possible with our tooling and and letting people do what they wanna do. To the point that you're making about being able to
[00:31:57] Unknown:
automatically make pull requests and suggest changes to either update dependencies or improve code structures or introduce bug fixes. What is your approach for being able to actually automate the creation and management of those pull requests across such a large number of repositories?
[00:32:17] Unknown:
Yeah. It's an interesting problem. We use the GitPython library, which is an extremely powerful tool. It basically sits on top of Git and allows you to do anything that Git does. Its API is fairly involved and, by design, doesn't provide the type of high-level abstractions we need. Like, if you think about how a human will create a pull request: they'll likely clone a repo, check out a feature branch, apply some changes, stage, commit, push up the branch, and create the PR. So we thought it made sense to have high-level abstractions that mirror that workflow.
We have a couple of systems at Wayfair that create automated PRs, so it made sense for us to first, you know, create those within one system and then abstract it out into a library that we could reuse. We ended up creating an open source library called pygitops. Again, it's built completely on top of GitPython, which we think is an amazing library; it just adds some stuff on top of it. And, again, it's a high-level abstraction for the workflow I described. One of the coolest parts about it, that I think is, like, the most Pythonic part that I'm really in love with, is the feature branch context manager.
So, again, as a human, if you are making code changes that ultimately get pushed up, you're likely going to be making those changes in the context of a feature branch. So, you know, when you intentionally or accidentally leave the context of a feature branch, we always want certain things to be true. We always want you to be on the default or, you know, main branch of your repo. We want a clean working directory, we want no unstaged changes, etcetera. This is important because when you're creating many, many feature branches and PRs against many repos kind of all at once in batch, we wanna make sure that the Git state of that repository is always clean. We've had some interesting edge cases where changes from one PR get pulled into another PR, or we have Git just completely barf and break things because, you know, we had an unstaged change when we checked out a feature branch or something crazy like that. And we built a lot of those learnings and edge cases into pygitops.
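pygitops' real API is built on GitPython and differs from this, but the invariant described (always land back on the default branch with a clean working directory, even when something raises mid-change) can be sketched with a toy context manager; the FakeRepo below is a stand-in for illustration, not a GitPython Repo:

```python
from contextlib import contextmanager

class FakeRepo:
    """Stand-in for a Git repository, just enough to show the invariant."""

    def __init__(self, default_branch="main"):
        self.default_branch = default_branch
        self.active_branch = default_branch
        self.dirty = False  # True when there are unstaged changes

    def checkout(self, branch):
        self.active_branch = branch

    def discard_changes(self):
        self.dirty = False

@contextmanager
def feature_branch(repo, branch_name):
    """Check out a feature branch; on exit (normal or via exception),
    restore the default branch and a clean working directory."""
    repo.checkout(branch_name)
    try:
        yield repo
    finally:
        repo.discard_changes()
        repo.checkout(repo.default_branch)

repo = FakeRepo()
try:
    with feature_branch(repo, "update-deps"):
        repo.dirty = True
        raise RuntimeError("something went wrong mid-change")
except RuntimeError:
    pass

print(repo.active_branch, repo.dirty)
# main False
```

The try/finally is what makes batch PR automation safe: no matter how one repo's change fails, the next change starts from a known-clean state.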
You know, it's become more stable over time. When you write PR automation, you generally don't have to worry about those edge cases anymore as a result. pygitops has 100% unit test coverage, and it's not just a lot of, like, mock unit tests. Because of the way Git works, and because a remote repo can just be another directory, we get to use pytest's tmp_path fixture in our test cases. So we'll start our test cases by, you know, creating a remote repo on disk. We'll clone it, we'll do some stuff to it, and we'll make a lot of assertions on a really live repo. So we're not really faking anything; we're making sure that things work in a sensible way, which I think is a really, really powerful part of Git, GitPython, Python itself, pytest, all of the above.
[00:35:06] Unknown:
1 of the things that's always interesting to explore is the philosophy around when and how to open source projects when building specifically for internal consumption. And I'm curious if you can talk to how you think about which tools are worth open sourcing, whether you bias towards building the tools that you're supporting the rest of the organization with in a way that it can be released publicly. And if you generally approach the sort of open sourcing of the tools up front or if you build the tool first and then decide, okay. This can be extracted. And then just some of the process that's involved in actually sanitizing it and updating it so that it is more broadly applicable and not tied to your organizational assumptions?
Yeah. Great question. I think
[00:35:56] Unknown:
in both of the open source libraries I referenced, there's a little bit of a different story. So I will say with Columbo, which again is the library that allows for specification of the questions being asked in that kind of entire workflow, the teammate of mine that wrote it, Patrick Lanigan, I think had a more intentional approach from the beginning, where he had a good vision for the project itself and knew that it would be a good candidate for open source. And that came after, you know, doing a lot of research on solutions that were available, which didn't do exactly what we needed them to do. And so because it led to us having to create a custom solution, I think part of his idea was giving back to the community from the very beginning and having it be open sourced.
The story with pygitops is a little bit different. Initially, we had Git automation that touches just about every project that we run in a variety of ways. So it definitely started off, you know, 2 years ago as: how do we even do this in the first place? Like, how do you actually technically make this possible? So it was very scrappy, very, very iterative. We didn't have a design in mind, and so it was just a whole mess of various things that we needed to do. And it wasn't until we had system number 2 that wanted to do a very similar thing that I had even considered abstracting things out. So we then took the step of kind of, like, making a little bit of a cleaner API and creating a library that was reusable. And by the time we had the 3rd or 4th system actually using pygitops, we thought that it would be appropriate to open source it, just because, you know, we really like GitPython, and we wanted to, like, give it credit and call out the accomplishments of GitPython where possible, but also provide a nice abstraction for other folks looking to create PRs.
[00:37:44] Unknown:
And in terms of your work that you've done with the engineering group at Wayfair and some of the tools that you have built to enable their application delivery and life cycle management and improve the overall developer experience and workflow? What are some of the most interesting or innovative or unexpected ways that you have seen the different tools that you've built used and maybe some of the interesting approaches that you have developed internally to be able to speed up the time to delivery for these different applications?
[00:38:19] Unknown:
I think this answers your question in a way. One of the really, really interesting things that happened in recent history: if you're familiar with the Log4j issue that happened back in December, the change propagation system I told you about earlier, Gator, has a flexible resource model, and it kind of supports the notion of, like, we don't know what we don't know. So it allows people to very abstractly, you know, target repos and then run some operations against them. So one of the really, really neat things is that Gator was actually used in the remediation of Log4j, where, as I mentioned, we allow implementers of these Gator change sets to call out to a container. So we actually had Java platform engineers writing some bash scripts to detect whether or not a repo had a Log4j vulnerability, and that was able to be plugged into our system. The output of all that was that we were able to create GitHub issues on these repos to kind of flag them to our internal ops teams. And that was a pretty interesting way that our tooling was used.
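The actual detection scripts were bash and internal, but a rough Python analogue of the idea (naively scanning build files for a log4j-core pin below the patched 2.17.0 release; the version parsing here is deliberately simplistic and for illustration only) might look like:

```python
from pathlib import Path
import re
import tempfile

# Match a log4j-core coordinate followed by a dotted version number.
LOG4J_RE = re.compile(r"log4j-core[:\s>]*([\d.]+)")

def vulnerable_files(repo_root: Path):
    """Return build files that appear to pin log4j-core below 2.17.0."""
    flagged = []
    candidates = list(repo_root.rglob("pom.xml")) + list(repo_root.rglob("build.gradle"))
    for build_file in candidates:
        m = LOG4J_RE.search(build_file.read_text())
        if m:
            version = tuple(int(p) for p in m.group(1).split("."))
            if version < (2, 17, 0):
                flagged.append(build_file)
    return flagged

# Demonstration against a throwaway repo checkout on disk.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "build.gradle").write_text(
        "implementation 'org.apache.logging.log4j:log4j-core:2.14.1'\n"
    )
    print([f.name for f in vulnerable_files(root)])
# ['build.gradle']
```

Plugged into a change-propagation system like the one described, the output of a scan like this is what would drive the automatic creation of GitHub issues on flagged repos.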
[00:39:21] Unknown:
In your own experience of working on this team and helping to figure out how to reduce friction in the process from going from idea to delivery? What are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:39:37] Unknown:
One of the lessons that I've learned along the way, that I think is applicable to, you know, most of software engineering, especially when you're doing platform work for a large enterprise, goes back to the diffusion of innovation, or the adoption curve, that I talked about earlier, where, you know, now we know very well to target the early majority, and we know that that's a formula that works for us for all the reasons I described earlier. One hard lesson along the way was kind of this expectation that I had that I'd build a thing and then everyone would immediately wanna use it. Even in recent history, you know, maybe, like, a year and a half ago, I built a solution that I just kind of assumed people would wanna use. Some people were really excited about it, and that was great and reassuring. But it was kind of a letdown when not everyone wanted to adopt it right away. You know, when you think about things in the context of that adoption curve, it makes total sense: you're marketing to a huge set of people with vastly different concerns and sets of priorities and tolerances for risk. That was definitely a challenge, kind of, like, a tough pill to swallow. But I like us thinking about our work in this new format where we market to people that wanna work with us, and then from there, you know, things spread organically. So it's okay if, you know, it takes a year for a solution to reach a different engineer that wouldn't otherwise have adopted it immediately. Like, we're totally okay with that. As you continue to work with the various teams at Wayfair and
[00:41:05] Unknown:
manage your own team's concerns, what are some of the things you have planned for the near to medium term that you're excited to dig into?
[00:41:14] Unknown:
Yeah. I think the focus for us for the rest of the year is continuing to do what we're doing and supporting the same products we've been supporting, but with an eye for how it fits into our larger platform. In other words, we've built some useful stuff that has, you know, smoothed out developer experience and helped reduce toil. And now it's a matter of folding these into other products in a way that's very, very seamless. So if you're a Wayfair engineer, you don't have to understand, you know, product A, B, and C and tool X, Y, Z. You're just using, you know, Wayfair's platform, and the stuff that we've talked about today is just a part of it. So that's pretty vague and abstract, but a lot of it is, you know, getting our products and our tooling to a place where they fit more cohesively with surrounding tools. Are there any other aspects of the work that you're doing on the application life cycle team at Wayfair that we didn't discuss yet that you'd like to cover before we close out the show? Yeah. One thing we didn't talk about yet is the change propagation system, which we call Gator. I think it's an interesting thing to consider for open sourcing. The tool itself and the entire product are quite complex, and there's a lot of, you know, Wayfair-specific stuff involved there.
Though the core technology, which, again, is very similar to the Kubernetes resource model, allows you to specify abstractly ways to grab repos that we care about and what you actually want to, you know, run against them in terms of code changes that automate PRs. I think that central piece is an interesting part of, you know, the innovation that we've done with Gator, and it would be a good candidate for open source. We've had feedback from others in the Wayfair engineering org that, you know, maintain open source projects of their own on public GitHub and would be interested in running Gator against those repos. And so we think that if we were to open source even a very, very minimalistic way to replicate part of what we're doing here, then individuals may run it against their own repos, against their pet projects, and enterprises may even adopt it. We have to be thoughtful about, like, the approach we take there because, again, I think there are a lot of technical challenges with ripping out this central piece. But I think it could be done, and I think that would be a really, really interesting thing for us to consider. So I hope to be able to give you an update in the future, but that may be something to look forward to.
[00:43:37] Unknown:
Alright. Well, for anybody who wants to get in touch with you, I'll have you add your preferred contact information to the show notes. And so with that, we'll move us into the picks. This week, I'm going to choose something called Nocciolata, which is a hazelnut spread. It's very similar to the Nutella product that more people will be familiar with. Just a very tasty treat: spread it on a piece of toast for a snack. I just had some of that before this show. So definitely always great to add to whatever you're eating. So with that, I'll pass it to you, Josh. What do you have for picks this week? Sure. So I love simulation video games. One game that I really like playing on my PC is called Cities: Skylines, so I do want to
[00:44:16] Unknown:
recommend that. More specifically, though, I've been watching a YouTuber called City Planner Plays. It's a guy who actually is a city planner by profession, but he also loves simulation games. He has a series called Verde Beach, and it's, like, I think, 75 parts by now. He's been doing it for the past 2 years, but I've been watching him iteratively build a city. He, of course, pays a ton of attention to detail
[00:44:41] Unknown:
and has an eye for all of the things that he considers in his job as a city planner. So it's really, really interesting to watch, and I recommend that. That's a fun thing to follow along with. Yeah, it's definitely interesting to think about how accurately the simulation responds to the real-world city planning considerations that he's adding to it. For sure. Alright. Well, thank you very much for taking the time today to join me and share the work that you're doing at Wayfair to help your application teams build and deliver faster. It's definitely always interesting to think about the platform aspects of being able to help developers do their jobs. So I appreciate all of the time and energy you put into that and the open source projects that you've released out of that effort, and I hope you enjoy the rest of your day. Cool. Well, thanks for chatting with me, Tobias. Really appreciate it.
Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast at dataengineeringpodcast.com for the latest on modern data management. And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email host@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Sponsor Messages
Interview with Joshua Woodward Begins
Joshua's Introduction to Python
Role and Context of the Application Life Cycle Team at Wayfair
Evolution and Integration of the Application Life Cycle Team
Identifying Areas of Greatest Leverage
Role of Python in Wayfair's Engineering
Simplifying Application Deployment
Educating Teams on Template Creation
Ongoing Maintenance of Templates
Automated Maintenance and Dependency Management
Automated Pull Requests and Code Changes
Automating Pull Requests Across Repositories
Open Sourcing Internal Tools
Interesting Uses of Built Tools
Lessons Learned in Reducing Friction
Future Plans for the Application Life Cycle Team
Potential Open Sourcing of Gator
Closing Remarks and Picks