Summary
The way that applications are built and delivered has changed dramatically in recent years with the growing trend toward cloud native software. As part of this movement, with the infrastructure and orchestration that power your project now defined in software, a new approach to operations is gaining prominence. Commonly called GitOps, its main principle is that all of your automation code lives in version control and is executed automatically as changes are merged. In this episode Viktor Farcic shares details on how that workflow brings together developers and operations engineers, the challenges that it poses, and how it influences the architecture of your software. This was an interesting look at an emerging pattern in the development and release cycle of modern applications.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Tree Schema is a data catalog that is making metadata management accessible to everyone. With Tree Schema you can create your data catalog and have it fully populated in under five minutes when using one of the many automated adapters that can connect directly to your data stores. Tree Schema includes essential cataloging features such as first class support for both tabular and unstructured data, data lineage, rich text documentation, asset tagging and more. Built from the ground up with a focus on the intersection of people and data, your entire team will find it easier to foster collaboration around your data. With the most transparent pricing in the industry – $99/mo for your entire company – and a money-back guarantee for excellent service, you’ll love Tree Schema as much as you love your data. Go to pythonpodcast.com/treeschema today to get your first month free, and mention this podcast to get 50% off your first three months after the trial.
- You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For more opportunities to stay up to date, gain new skills, and learn from your peers there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to pythonpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today!
- Your host as usual is Tobias Macey and today I’m interviewing Viktor Farcic about using GitOps practices to manage your application and your infrastructure in the same workflow
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by giving an overview of what GitOps is?
- What are the architectural or design elements that developers need to incorporate to make their applications work well in a GitOps workflow?
- What are some of the tools that facilitate a GitOps approach to managing applications and their target environments?
- What are some useful strategies for managing local developer environments to maintain parity with how production deployments are architected?
- As developers acquire more responsibility for building the automation to provision the production environment for their applications, what are some of the operations principles that they need to understand?
- What are some of the development principles that operators and systems administrators need to acquire to be effective in contributing to an environment that is managed by GitOps?
- What are the areas for collaboration and dividing lines of responsibility between developers and platform engineers in a GitOps environment?
- Beyond the application development and deployment, what are some of the additional concerns that need to be built into an application in order for it to be manageable and maintainable once it is in production?
- What are some of the organizational principles that contribute to a successful implementation of GitOps?
- What are some of the most interesting, innovative, or unexpected ways that you have seen GitOps employed?
- What have you found to be the most challenging aspects of creating a scalable and maintainable GitOps practice?
- When is GitOps the wrong choice, and what are the alternatives?
- What resources do you recommend for anyone who wants to dig deeper into this subject?
Keep In Touch
Picks
- Tobias
- Victor
Links
- GitOps
- CodeFresh
- Kubernetes
- DevOps Paradox Podcast
- Perl
- Cloud Native
- ArgoCD
- Flux
- Observability
- Prometheus
- Helm
- KNative
- MiniKube
- Viktor’s Udemy Books and Courses
- Viktor’s YouTube channel
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Your host as usual is Tobias Macey. And today, I'm interviewing Viktor Farcic about using GitOps practices to manage your application and your infrastructure in the same workflow. So, Viktor, can you start by introducing yourself?
[00:01:58] Unknown:
Yes. So as you already heard, my name is Viktor. I've worked for Codefresh for maybe 3 weeks now. I recently changed companies. My focus is kind of the same no matter where I am, and that's usually around making myself obsolete. A lot of automation, a lot of CICD for a while now, containers, Kubernetes, and GitOps. GitOps is the thing that keeps my attention right now. I tend to change what I do very often. So today, it's GitOps, and we'll see about next month.
[00:02:32] Unknown:
And, also, when I was getting ready for this interview, I noticed that you have a podcast of your own.
[00:02:38] Unknown:
Yes. Yes. My colleague, or ex colleague, Darren and I have the DevOps Paradox podcast, where we usually just speak about random things. Occasionally we have guests, but more often than not, it's about, hey, what were we doing this week? This or that. Right? And it's completely unscripted. Without preparation, we just talk about stuff that we like.
[00:03:02] Unknown:
And do you remember when you first got introduced to Python?
[00:03:05] Unknown:
Yes. That was, oh, that was a long time ago. Maybe 10 years, 15, maybe 20. I don't know. I'm old. I was introduced to Python when I started reaching, you know, 100 lines in my bash scripts. So my first contact was not really creating a real application, but, hey, I cannot really live with hundreds of lines of bash scripts. I started with Perl for those specific use cases, and that was a nightmare. And then I discovered Python and, hey, this is a good fit. After that, I spent some time with Python, mostly, I think, for test cases, writing test code. Right?
To be honest, I don't remember the frameworks I was using back then, but, yeah, it was mostly for scripting and testing.
[00:03:55] Unknown:
As you mentioned, now a lot of your attention is on the practices and tooling around GitOps. So I'm wondering if you can just give a bit of an overview about what that term means to you and some of the ways that it manifests in the application life cycle.
[00:04:10] Unknown:
Maybe I should start by saying that I don't see GitOps as being a new practice. Right? I think it existed for a long time. It's just that now we have a name for it, and that name is GitOps. The basic premise is that everything we do is code. Right? Some of us moved away, and some are still moving away, from the concept of clicking buttons, going to some UIs to change the state of something, to run builds or whatnot. Right? Everything is code, no matter what kind of code that is. And the natural location for code is Git. Right? I mean, it could just as well have been called SVN ops if the term had been invented earlier. So it's about storing everything in code and, especially in the context of GitOps, the desired state of something. Right? So we define what we want.
Those definitions are in Git, and that's where our work stops. From that moment, the moment we push something to the main line, we are finished and the machines take over. Right? And the machines should do whatever they need to do depending on what we pushed. It could be building a binary. It could be creating a release, or it could be converging the actual state of the production environment into the desired state, which is what is defined in Git. So I see it as a wall, a border that defines where human work stops and where machine work starts. Right?
We end our work by pushing something to Git, and machines take over from there.
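Viktor's description of machines converging the actual state toward the desired state in Git can be sketched as a toy convergence check. Everything here is illustrative and invented for the example; it is not taken from any real GitOps tool:

```python
# Hypothetical sketch of the GitOps convergence idea: the desired state
# lives in version control, and an automated process reconciles the
# actual state toward it. All names and structures are illustrative.

def reconcile(desired: dict, actual: dict) -> dict:
    """Return the actions needed to converge the actual state to the desired state."""
    actions = {}
    for name, spec in desired.items():
        if name not in actual:
            actions[name] = "create"      # defined in Git, missing from the platform
        elif actual[name] != spec:
            actions[name] = "update"      # drift between Git and the platform
    for name in actual:
        if name not in desired:
            actions[name] = "delete"      # running, but no longer defined in Git
    return actions

# Desired state, as it would be read from Git; actual, as reported by the platform.
desired = {"web": {"image": "web:2.0", "replicas": 3}, "db": {"image": "db:1.1", "replicas": 1}}
actual = {"web": {"image": "web:1.9", "replicas": 3}, "cache": {"image": "redis:6", "replicas": 1}}

print(reconcile(desired, actual))  # {'web': 'update', 'db': 'create', 'cache': 'delete'}
```

The key design point Viktor makes is that the human's job ends at the push; a loop like this, run by machines, is what carries the change the rest of the way.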
[00:05:51] Unknown:
And in terms of how that guides the design and implementation of our applications and our infrastructure, what are some of the design patterns or architectural changes that we should be thinking about, and that developers need to incorporate, to make their applications work well in that type of workflow?
[00:06:11] Unknown:
It's good that you said work well because, theoretically, everything can follow GitOps principles. But what we obviously see is that GitOps, at the moment at least, is relatively new, and it's very much focused on, or is a reaction to, some other best practices. And those are, you know, being cloud native. Having smaller applications is better than having big applications simply because the cycle can be faster and the machines can have an easier time figuring out what needs to be done if the changes are smaller. So smaller applications, faster iterations, and when I say faster iterations, I mean committing or pushing or merging to the main line more often.
And also designing applications in a way that makes the job of those machines easier. Like, if machines should decide when to scale your application up and down, you might need to make certain architectural choices, like, I don't know, make it stateless, for example, moving state to something like Redis or some other data store, and so on and so forth. I think probably the best guideline, which is not specific to GitOps but applies very well to it, is to follow the twelve-factor app principles.
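The stateless-design advice above can be illustrated with a small sketch. The store interface and handler here are hypothetical, standing in for something like Redis behind a real web framework:

```python
# Illustrative sketch (not from the episode): keeping session state out of
# the application process so any replica can serve any request. The
# StateStore interface is invented; in production it might wrap Redis.

class StateStore:
    """Minimal key-value interface that a stateless app depends on."""
    def __init__(self):
        self._data = {}
    def get(self, key, default=None):
        return self._data.get(key, default)
    def set(self, key, value):
        self._data[key] = value

def handle_request(store: StateStore, session_id: str) -> int:
    """A request handler with no local state: any replica behaves the same."""
    count = store.get(session_id, 0) + 1
    store.set(session_id, count)
    return count

# Two interchangeable "replicas" sharing one external store see consistent state.
shared = StateStore()
print(handle_request(shared, "alice"))  # 1
print(handle_request(shared, "alice"))  # 2
```

Because no request depends on in-process memory, the machines Viktor mentions are free to scale replicas up and down, or replace them entirely, without losing anything.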
I think that they initiated the whole movement around new architecture, and that new architecture, which is not new anymore, fits very well into GitOps principles.
[00:07:41] Unknown:
You mentioned cloud native in there, which is a term that's been gaining a lot of popularity in recent years with the rise of Kubernetes and serverless and microservices. Because of the capabilities that those provide, they have helped to push the movement towards smaller applications, the theory being that they reduce the operational overhead that comes along with microservices, which is 1 of the arguments against that design pattern. So I'm wondering what your experience has been in terms of what tools help to facilitate a GitOps approach to managing applications and their target environments, and how that has changed in recent years to drive this rising popularity of GitOps as a recommended approach to managing applications and their life cycles?
[00:08:35] Unknown:
If I were to focus only on the deployment part of GitOps, which is usually the part that people are mostly interested in (there are definitely others), I think that containers are definitely helpful, and using containers today means using Kubernetes in 1 form or another. We might go for serverless, but that would be a separate fight or discussion. Anyway, containers, Kubernetes. And what was happening, I think, over the last couple of years is that we were trying to adopt those principles through push models. Like, okay, you push a change to Git, and then Git pushes a notification to your CICD tool, whichever you're using, which would do a lot of things and, among others, take those new definitions of what an environment means, what release or tag of this application should be running, and then apply those changes with, let's say, kubectl or helm install or whatnot.
What has been happening over the last months, years even, actually, is that we are getting new tools that are focused much more on a pull method, meaning that Git is not notifying anyone anymore when there is a change. There is no CICD tool that would deploy that change to your cluster. Instead, we are installing very small agents inside of clusters that monitor your Git repositories. And whenever they detect a drift, a change, something that is different between the actual state and the desired state, they act. Right? So examples of such tools would be Argo CD or Flux.
Both of them are based on the same principles. Hey, you don't need to invoke a CICD tool. You don't need to notify me when there is a change. I will be pulling state from Git and figuring out what to do. And there are 2 big benefits from that. First, it's almost like a hands-off approach. I don't need to create pipelines. I don't need to create almost anything. I just need to push changes to Git. And the second important implication of that pull model is that we are finally able to secure our clusters, especially production, without affecting the productivity of people. Usually, we were going in 1 direction or the other. We were secure, but then getting permissions and going through all the hoops to get something deployed to production was complicated.
Or it was not really secure, and then you could do whatever you want. But with those pull models, where agents are simply watching Git, nobody is prevented from doing anything while still nobody has access to the cluster, except maybe a few people in case something goes terribly wrong, you know, for debugging purposes. But, generally, we can finally cut off access to the cluster while still making everybody as productive or more productive than ever before.
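The pull model described above might be sketched, very roughly, like this. The function names are invented for illustration; real implementations such as Argo CD or Flux are far more involved:

```python
# A toy sketch of the pull model: an in-cluster agent periodically fetches
# the desired state from Git and applies any drift, so nothing outside the
# cluster ever needs deploy credentials. All names here are hypothetical.

def fetch_desired_state():
    # Stand-in for pulling a Git repository of manifests.
    return {"web": "web:2.0"}

def fetch_actual_state():
    # Stand-in for querying the cluster API.
    return {"web": "web:1.9"}

def detect_drift(desired, actual):
    """Return the components whose actual state differs from what Git defines."""
    return {name for name, spec in desired.items() if actual.get(name) != spec}

def agent_tick():
    """One iteration of the agent's loop: pull, compare, apply."""
    drifted = detect_drift(fetch_desired_state(), fetch_actual_state())
    for name in sorted(drifted):
        print(f"applying desired state for {name}")  # stand-in for applying manifests
    return drifted

agent_tick()  # prints: applying desired state for web
```

The security benefit Viktor describes falls out of the direction of traffic: the agent reaches out to Git, so no inbound access to the cluster has to be granted to a CICD system or a human.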
[00:11:43] Unknown:
Yeah. That access control pattern is definitely useful. As somebody who has managed infrastructure for a while and worked as a developer, it's frustrating trying to figure out the most effective way to handle automation while maintaining security: providing network access to the pieces that need it in an automated fashion so that you can build everything from the ground up in 1 go and have it all talking to each other the way it's supposed to, and then figuring out what to do at those network boundaries and handoffs. Whereas, as you're saying, that pull-based model resolves a lot of the complexities and potential security issues that arise from having to open ports in the firewall or grant credentials to some trusted user to perform certain actions, and then having to monitor the usage of those credentials, make sure they don't get leaked anywhere, or rotate them periodically.
[00:12:37] Unknown:
Yes. And it also puts everybody on the same page, in a way. Because if you think about it, we can probably start a fight over any type of tool, which 1 is better, which 1 you like, which 1 you don't. Right? Whether it's CICD or for building stuff or for creating releases. Everybody has a different opinion. But the only tool that I know of that is undisputed, as in nobody questions whether it's going to be used, is Git. Right? So I see it as a place where everybody agrees to be and everybody agrees to collaborate, in a way, which is also kind of important for the adoption of those processes.
[00:13:20] Unknown:
Another aspect of this model, where just pushing your code to the main line of a Git repository ends up leading to it being deployed into production, is the need to have good visibility into what it's doing in that production environment so that you can be aware of issues as they arise and have enough information to remediate them, particularly when you're in an infrastructure model where there's no way for you to log in and poke around at the application in its running state. So I'm wondering, what are some of the additional concerns that developers need to bake into their applications so that there is enough information to diagnose issues in production, so that you can have a tight feedback loop without having to worry about replicating the production environment into some other location where you do have that access to poke and prod at the code? I think those are maybe a couple of questions.
[00:14:17] Unknown:
Like, replicating an environment suddenly becomes extremely easy because, basically, all you need to do is fork the Git repo or, you know, branch the main line or something like that, and just execute the same process that applies those definitions somewhere else. And all of a sudden, you have the same thing as production. I mean, you never get exactly the same thing as production because you obviously don't have the traffic. You don't have those users. But it is definitely much easier to reproduce anything, including production, up to the level that is possible, given that every single aspect of the desired state of production is already stored somewhere, like in Git. And, basically, you can just say, okay, I want the actual state of this new cluster to be the same as the desired state of production, and off you go. Right?
Now the issue, I think, with GitOps, if you look at it in isolation, is that GitOps really gives you visibility into the desired state. Right? I know, by looking at Git, what it is that I want to have right now. And I can go back in time and say, yeah, but what about yesterday? Complete traceability into the desired state. I do not necessarily, through GitOps alone, know what the actual state is. Maybe the process did not manage to converge it. Maybe there is not enough space. Right? Many, many different things can happen to create that drift between the desired and the actual state, and for tools like Argo CD not to be able to reconcile that drift.
What that really means is that I see GitOps, if you draw some imaginary vertical line, as being on the left side, up until something is running in production and it is the same as what you want. And then we have the whole observability story of how we observe what is really running in a live cluster and how that affects users and whatnot. Right? Today, that observability, if I limit myself to Kubernetes, would usually be something like Prometheus. And going back to your initial question, if there is 1 important piece of advice I would give, at least from an observability perspective, it is to really make sure that you instrument your applications well.
And what I mean by that is the tools, Prometheus or Datadog or whichever you're using to gather information about the actual state, cannot give you much from outside your applications. Right? I can know that your application is, for example, slow to respond. But in my head, that is almost irrelevant information because what I really want to know is, hey, this application is slow. And to be more specific, this function within that application is slow because of this and because of that. What that really means is that if I want to dig deep, deep, deep into an application to find out what's going on, I need those applications to be instrumented. And this is where we usually enter into a conflict: I feel that many teams, at least those that I've worked with, think that it's all about writing logs.
And to me, logs are maybe at the end of the process. At this stage, I usually don't care about logs. I care about metrics that are deeply ingrained in applications so that I know what is going on at the metric level. And then maybe once I manage to figure out which part of something is misbehaving, only then might I go into logs. So the advice is, yes, instrument your applications and expose that data through instrumentation for the tools to pick up.
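The instrumentation advice can be illustrated with a hand-rolled sketch. A real application would use a Prometheus client library or similar rather than this toy registry, so treat everything below as an assumption-laden illustration of the idea, not a recommended implementation:

```python
# Illustrative sketch: record per-function metrics (call counts, latency)
# from inside the application, rather than relying on logs or on what an
# external monitor can see. The registry and decorator are invented here.

import time
from collections import defaultdict
from functools import wraps

METRICS = {"calls": defaultdict(int), "seconds": defaultdict(float)}

def instrumented(fn):
    """Decorator that counts calls and accumulates wall-clock time per function."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            METRICS["calls"][fn.__name__] += 1
            METRICS["seconds"][fn.__name__] += time.perf_counter() - start
    return wrapper

@instrumented
def lookup_user(user_id: int) -> str:
    return f"user-{user_id}"

lookup_user(1)
lookup_user(2)
print(METRICS["calls"]["lookup_user"])  # 2
```

This is exactly the granularity Viktor asks for: not "the application is slow" from the outside, but "this function is slow" from within, with logs consulted only afterwards.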
[00:18:24] Unknown:
Another element of GitOps is that it's working to bring the developers more into the process of managing the environments that their applications are going to run within. Whereas previously, that was the responsibility of the systems administrators or DevOps engineers or platform engineers to take the code that was written, figure out what are the different architectural components, and then decide, okay. This is how I'm going to deploy them. This is what the network is going to look like. This is the security model. This is how I'm going to lock down communication, managing mutual TLS between the different nodes within the environment, managing things like secrets and how you're going to distribute the configuration files.
What are some of the elements of operations and the requirements of a production environment that developers need to be more fully educated on as they do take on the responsibilities of defining the environments that their application is going to execute within through these GitOps practices?
[00:19:30] Unknown:
That's a tough 1, to be honest. On 1 hand, I do believe that we need to have, or move towards having, self sufficient teams. I'm a strong believer that there should be a team in charge of 1 application, or maybe more depending on the size. And when I say in charge, I mean literally everything. There is a team of 6 to 7, 8 people maybe. You know, more than that is not a team. It's a school reunion. Those people should be fully in charge of an application. That means, you know, writing code, writing tests, build scripts, whatnot, including, ultimately, deployments to production and even monitoring, you know, being on PagerDuty for that application in production.
I really strongly believe that that's the best way we can move forward or be really efficient. Now the problem with that is that we cannot expect a small team to know everything. Right? Understanding how networking works is complicated by itself. Learning Kubernetes can easily take you 2 years to know it in sufficient depth to say, I know what I'm doing. And in those 2 years, it would change so much that your knowledge would be obsolete again. Right? And so on and so forth. There is too much knowledge for that self sufficient team to truly be doing everything, and it's ineffective.
So in parallel with those teams that have full control of the application, and let's say that I call them vertical teams because they control the verticals, we still need horizontal teams, people that are highly skilled and possess a lot of expertise in certain areas, like networking, right, infrastructure, Kubernetes, or whatnot. But their job, and this is, I think, the important change, is not to wait for those other teams to open Jira tickets or request something. Their job is to provide the services that will simplify the autonomy of the first group of teams. Right? So I think of all those horizontal teams as serving a similar function in a company to what AWS or Azure or Google is providing. Right? We are making things easier so that you are still in control, but you don't need to spend years becoming an expert in something. It's all about having teams that create services so that those other self sufficient teams can consume them and be in full control.
And if I focus, for example, on Kubernetes right now, that means, yes, create templates. A team, if they choose to deploy to Kubernetes, will have to learn Helm, let's say. Right? But they do not necessarily need to define every single line, because there will be hundreds of lines of some YAML templating. Right? They can get help from those teams in simplifying what they really need to be self sufficient. A good example to me is Knative, a project that allows you to do in, like, 20 lines of YAML things that would normally need a couple of hundred lines.
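The idea of a horizontal team shrinking hundreds of lines of YAML down to a few inputs might be sketched like this. The helper, its fields, and the resulting structure are all hypothetical simplifications, not any real platform's schema:

```python
# Hypothetical sketch of a "horizontal team" helper: expand a few
# app-level inputs into the fuller manifest structure a platform needs,
# so application teams don't maintain hundreds of lines of YAML themselves.

def render_manifest(name: str, image: str, port: int = 8080) -> dict:
    """Expand minimal inputs into a (much simplified) deployment + service pair."""
    return {
        "deployment": {
            "name": name,
            "labels": {"app": name},
            "containers": [{"name": name, "image": image, "ports": [port]}],
        },
        "service": {"name": name, "selector": {"app": name}, "port": port},
    }

manifest = render_manifest("web", "web:2.0")
print(manifest["service"]["port"])  # 8080
```

The vertical team supplies only the name, image, and port; everything else is the horizontal team's opinionated defaults, which is the same trade Knative makes at much larger scale.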
And that simplification is what we really, really need right now. Because over the last couple of years, let's say since we got Docker, what was it, like, 6 years ago or something like that, Docker was a movement towards simplification of stuff. And then I feel that we moved in the opposite direction, where Kubernetes introduced too much complexity for most people to ever handle. And I think that that's kind of the next barrier. How can those horizontal teams make Kubernetes easy?
Right? I'm using only Kubernetes as an example because that's probably the biggest pain point right now. But that equally applies to anything else. How can we provide the service to our users, our users being other teams in the company, those teams that are actually developing applications?
[00:23:40] Unknown:
Another aspect of GitOps, particularly with things like Kubernetes or cloud resources like S3, is that 1 of the challenges that creeps into actually building the application is being able to test and execute it in an environment on your laptop that is sufficiently close to how it's going to run in production. And I'm wondering what you have found to be some useful strategies for managing local developer environments and replicating the overall architecture and system components that the code is going to need to interact with on somebody's laptop, so that they can have fast iteration cycles and not necessarily have to call out over the network to deploy the environments every time they want to see what happens when they change a line of code?
[00:24:35] Unknown:
It really depends on what that environment needs, especially in terms of memory and CPU. Right? It really depends on architecture as well. Can your application work with maybe only direct dependencies and not the whole system? Because that's the ideal architectural state, actually, from a testing perspective as well. Right? So, ideally, you should be designing your application to use a lot of mocks and need only direct dependencies, not the whole system. And with or without the whole system, the real question is, do you need more than, I don't know, like, 8 gigs of RAM, or 16? I'm not sure how much can really fit on your laptop. Actually, everybody has 32 today. Right?
If you can fit it, then it's relatively trivial because you have on your laptop a full blown Kubernetes cluster, even though it's only 1 node. So you cannot do everything; you cannot test what happens when a node fails, but that's probably not what you wanna do locally anyway. So Minikube or Docker Desktop work fairly well. And, again, this is where GitOps jumps in, because you already have the full desired state. You have absolutely everything you need to create full production, or a fraction of production, or whatever you need in a matter of seconds, because it's Kubernetes. Right? It's the same API. Almost everything you can do in AWS or on prem or whatnot, you can do on your laptop just by spinning up Docker Desktop. So that's easy as long as you don't need more resources than your laptop allows.
But on top of that, and especially lately, I'm starting to question even the need to work locally. I think that we should be moving towards the cloud, and I think that my machines are going to get dumber and dumber. I believe that probably not so long from now, I will go to some type of Chromebook for work purposes, simply because cloud is cheap. And this is the important thing. Assuming that I did everything right, and at least when I do my personal work, I hope that I did, I can create a full blown cluster, like a Kubernetes cluster, I mean, multiple zones of a region, everything I need, a small version of production, in, like, 5 minutes' time. So usually, when I wake up in the morning, I go to my computer.
I just execute some Terraform scripts, depending on where I'm working, this or that, execute some scripts, go fetch coffee. By the time I've finished, I have absolutely everything I need, much better than if I worked locally. And the trick is, and I think this is where many companies kind of go wrong, that our brains did not yet understand, at least the majority of our brains did not understand, that we have tools at our disposal to make things temporary. Right? I might be working in, let's say, AWS, using AWS as my development environment, and I might be working for 3 hours, and then I might pause and have lunch. Right? What do I do? I destroy the cluster. I destroy absolutely everything I did, for 2 reasons. First, to keep the cost down. Why would I pay for something that I'm not using if I know that I can get everything I will ever need in 5 minutes?
And on top of that, it's also good practice for me, because that way I know that I'm doing the right job, because I know that everything I did can be easily recreated. So it's not only beneficial for obvious reasons, but I think it's beneficial as a mental practice that everything is ephemeral. My work needs to be done in a way that everything might be gone at any moment, so I need to be able to recreate everything at any other moment. Right? But I understand that for some people, you know, going cloud, and when I say cloud, it can be private, right, it can be your data center, it doesn't matter, creating servers and installing everything might not be a good choice. And then Docker Desktop or Minikube work just as well if somebody really wants to go local.
[00:28:56] Unknown:
Another aspect to the GitOps workflow is that it can potentially blur the lines between the responsibilities of developers and operators or platform engineers. And I'm wondering what you see as the areas for collaboration between those responsibilities or the dividing lines that still exist or if those even should be different roles within an organization?
[00:29:24] Unknown:
In the past, I liked to think that there shouldn't be different roles, but I think that I grew up in the meantime, and I don't believe that that's possible. We do need different roles, because simply some areas of what we do are sufficiently complicated and require so much experience that we cannot just say, okay, go learn that. Right? That's too complicated. So we need to have those divisions. But I tried many different methods for removing the silos and making that collaboration work well. And what I found to work the best is actually organizing absolutely every team with a budget and having customers.
Right? And I think that that's what we are still missing very often in companies. This department, whatever department it is, could be infrastructure, could be testing, does not feel that somebody else is their customer. So they have no real incentive to make other people's lives easier, and they're so much focused on making their own job easier or better. That is really what creates those tensions between people. Security departments are usually the main villain in those stories. They're good people, by the way, but they're usually perceived as blockers, as people we don't wanna work with, because if we even say their names, we will be blocked for a week. Right? And that comes from the lack of perception of who your user is. Once you understand that those teams I was mentioning before, those teams in charge of an application, are your users, and that they could choose somebody else's service, then you have a real incentive to work towards making other people's lives better. And that's the key. That's why we want teams, that's why we want cooperation: because we are trying to make other people's lives more productive, better, nicer. So it's about establishing who your user is, who pays your salary, and in most cases, not literally, I know that, those are the teams in charge of those applications. So if you're infrastructure, if you're networking, if you're security, whatever you are, you need to make sure that people actually wanna work with you, because you need customers.
[00:31:57] Unknown:
In terms of the organizational aspects of that, as you mentioned, treating the downstream consumers of the tooling that you write or the work that you're doing as your customer helps to ensure that the way that you design it is more useful for them. And it helps to open up those conversations of what is their workflow, what are their needs versus just looking internally to what's easiest for me to build. And then the end user has to contort to whatever it is that I've created, and they don't necessarily have the appropriate context to make that useful. What are some of the other principles or strategies or team compositions that you have found to be helpful in being able to drive a workflow where everybody is collaborating around code and there isn't as much of a hierarchical need for managing the different responsibilities or being able to unify around a common tool chain?
[00:33:00] Unknown:
One of the things that I believe was successful across the cases I've seen is establishing, to begin with, what I said at the beginning: everything is code, and everything is stored in Git. And the important part of all that is that companies really, really need to treat all their source code, and everything is source code, as inner source, let's say. Right? Meaning that there is absolutely no restriction on who can view any part of the code and who can create a pull request. Maybe not push to the master branch, but create a pull request, without doubt. Absolutely no restriction.
There should not be a single person in a company who cannot clone the code, create their own fork, and create a pull request. And that means that there is actually visibility of what we are all working on. And that means that there is no excuse anymore; you cannot even say he's a bottleneck, because I can always go and change his code. Right? I can jump into your team and help you out by writing code. I am such a huge freak about communicating through code rather than in any formal way. I think that we should all stop speaking English and speak code instead. Right? So it's really about having all repositories, everything, completely open to everybody. And that increases collaboration and communication, because suddenly nobody has an excuse not to know what others are working on, and anybody can change that work in any way they see fit if they think there is a benefit in it. Right?
Otherwise, we are usually locked in those models where, hey, I don't know what you guys are doing. You just sent me this Excel sheet that I need to fill out. I don't know. Who are you? What do you do? Reading code happens to be so easy. Maybe that's the nerd in me speaking. I might be wrong, though.
[00:35:00] Unknown:
Another element of managing your infrastructure and your desired state in a code-first way is the need to test it, and testing, particularly for systems that have external dependencies, is complicated and fraught with error. I'm wondering what you have found to be some of the useful approaches to designing your infrastructure management or your desired state configs in a way that they are maintainable and scalable and easy to validate in an automated manner, and some of the cases where that falls down and you just have to do the closest approximation to something that works without being in the optimal state?
[00:35:52] Unknown:
Realistically, almost every company in the world, at least every mid-to-big-size company, will never get to the state I want them to be in, simply because the world was not created yesterday, and there is always legacy; there is no avoiding legacy. If I take a bigger company, a bank or an insurance company or something, I will never be able to move them to Kubernetes fully, 100%. It's not gonna happen. I will never be able to redesign their applications and so on. Right? But what we can do is recognize that we always have, let's say, a legacy part of the system and a non-legacy part of the system, whatever the percentage is for one or the other.
What I think helps a lot is, at least from the point of view of the non-legacy or newer parts of the system, making everything look as if everything is new. And we can do that through some clever ways of creating services. By the way, a service in the Kubernetes world is not an application; it is the way to communicate with others, with service meshes and so on. And that goes back to those horizontal teams. If I'm in charge of application X, I should not know about your mainframe. I should not know about the database.
All I need to know, really, is my application, the interface I have with other systems, and where to find it. That's all we really need. Once we understand that, then the most critical thing is, almost without a doubt, the definition of those interfaces. Whether that's a REST API or RPC or whatever it is does not really matter. As long as I have a clear definition of what my dependency is, from that moment on I really don't care about anything else, because if everything else goes bad, I can always use the definition to create a stub or a mock or something like that. That's almost trivial.
The reason why people don't do that is because we are still in a state where, in many organizations, there is no clearly defined interface between different components. Then we have no alternative but to cry aloud and try to make do with what we have.
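As a minimal illustration of that point, assume a hypothetical `InventoryAPI` contract between two teams. Once the interface is pinned down, the real service and a trivial stub are interchangeable from the application's point of view, so development and testing never need to touch whatever mainframe or database sits behind it:

```python
from typing import Protocol

class InventoryAPI(Protocol):
    """The contract: all a consumer is allowed to know about the dependency."""
    def reserve(self, sku: str, quantity: int) -> bool: ...

class InventoryStub:
    """A throwaway in-memory implementation satisfying the same contract."""
    def __init__(self, in_stock):
        self.in_stock = dict(in_stock)

    def reserve(self, sku, quantity):
        # Succeed only while enough stock remains, mirroring the real service.
        if self.in_stock.get(sku, 0) >= quantity:
            self.in_stock[sku] -= quantity
            return True
        return False
```

Any code written against `InventoryAPI` accepts either implementation; the names here are invented for the example, and the same idea applies whether the real contract is a REST spec, a gRPC definition, or a Python protocol.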
[00:38:14] Unknown:
In terms of actually employing GitOps principles, orienting all of your workflow around the code repository as the source of truth, and automating the deployment of environments to converge on some desired state, what are some of the ways that it goes wrong, or some of the horror stories of people who have attempted this and ultimately thrown their hands up in failure and gone back to doing everything manually?
[00:38:40] Unknown:
Yeah. You know that saying: destroying a cluster is human; destroying a fleet of clusters, that's DevOps and GitOps. Yeah, it could go terribly wrong. Right? Because at some moment, we become comfortable with what we do, and that's good; that's what we really, really want. We become comfortable in our automation. And from that moment on, it's very easy to slip. Right? It's very easy to introduce a change, push the change to Git, and that change could affect hundreds of servers. It could affect many clusters. It could affect different regions, simply because we have that power now. And it's not that we couldn't do it before. We could.
But doing something so terribly bad would require many, many, many hours of many people before, especially when we were doing things manually. Now, suddenly, I can be in control of a fleet of clusters and make a change with only a minute of my work. Right? And that really means that we need to be much better at how we observe things. And more importantly, since those are only declarative definitions, it's still GitOps, but we need to move towards not allowing anything but rolling updates. So for whatever we're doing, it doesn't matter whether it's infrastructure or applications: rolling updates, canary deployments, blue-green, whatever you want, so that horror stories are limited in scope. So that the chances of doing something on a large scale within a short period of time are close to impossible.
And even if that happens, if the worst thing happened and there is nothing else we could have done better, the last resort is always just rolling back to the previous commit, and that's the beauty of it. The major issue I have with rolling back to the previous commit is that it risks me never finding out what the actual cause of the issue was. I can fix things fast now. I can destroy things fast; I can fix things just as fast. But if I do that, I really do not gain the knowledge that I'm supposed to be gaining from my failures. And that's why, going back: yes, if things go wrong, try to limit the scope with rolling updates, canaries and so on, so that only 5% or 1%, depending on your scale, is really experiencing those issues.
And there are mechanisms, basically. I mean, not really physical mechanisms, but it shouldn't be that hard to come to an agreement that one of the rules is: you cannot make a big-bang change. It needs to be a rolling, progressive change.
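That rule, no big-bang changes, only progressive ones, can be sketched as a toy control loop. The `shift_traffic` and `healthy` hooks here are hypothetical stand-ins for whatever your deployment tool or service mesh actually provides, not a real API:

```python
def progressive_rollout(steps, shift_traffic, healthy):
    """Shift traffic to the new version in increments, checking health
    after every step; roll back at the first sign of trouble."""
    for pct in steps:
        shift_traffic(pct)
        if not healthy():
            shift_traffic(0)  # send all traffic back to the old version
            return ("rolled-back", pct)
    return ("promoted", steps[-1])
```

The point of the shape is exactly what Victor argues: a bad change gets caught while it affects 5% of traffic, not a whole fleet, and the rollback is automatic rather than a frantic manual intervention.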
[00:41:34] Unknown:
In your experience of working on teams that are implementing GitOps, and in your work at Codefresh where you're helping to facilitate that for other teams, what are some of the most interesting or innovative ways that you have seen these GitOps practices employed?
[00:41:51] Unknown:
I think that the interesting part is coming right now, simply because GitOps principles have existed for a long time, but I think that we are now on the verge of having the right tooling. And when I say we, I mean we as an industry, to make those things really happen. And by really happen, I mean something like what Docker did a long time ago: make it easy. I think that we didn't have that until now. So as a result, there were not really many horror stories about GitOps, except people saying I'm doing it, and then you speak with them for an hour and discover that they are not. Almost everything we're trying to do is based on 2 principles.
First, use the things that already work well. And from the GitOps perspective, the things that work well today emerged recently. As I said before, independent of the company where I work, that would be Argo CD and Flux. Now, I obviously prefer Argo CD; we wouldn't have chosen it if it weren't the right choice, but both are fine choices, and they're relatively new tools. Now that we have the mechanisms, again, we as an industry, to apply those things easily, and we could always apply them, but now we can do it easily, the next thing I think is missing, and this is going back to one or a few of your questions, is really how we bring observability into all of those things.
And by observability, I mean: yes, you have Git, which is the desired state, and you have the actual state, which you can explore one way or another. We do not yet have the tooling that will join those two together and give you very clear guidance on what went wrong, why something is happening as it shouldn't be happening, and what the causes were. And when I say observability in this context, I don't mean observability as we know it, but rather that bridge between observability of the cluster and what is really, indirectly, GitOps. So I don't know many horror stories from that segment, GitOps-related, simply because I think that we are still at the very beginning, at least from the tooling and best-practices perspective.
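The bridge being described, joining desired state from Git with actual state from the cluster, reduces in its simplest form to a three-way diff. This is a toy sketch assuming both states are flattened to name-to-spec dictionaries; it is not how Argo CD or Flux work internally:

```python
def detect_drift(desired, actual):
    """Three-way diff between desired state (Git) and actual state (cluster)."""
    return {
        # Declared in Git but absent from the cluster.
        "missing": sorted(k for k in desired if k not in actual),
        # Running in the cluster but declared nowhere.
        "extra": sorted(k for k in actual if k not in desired),
        # Present in both, but the specs disagree.
        "changed": sorted(k for k in desired
                          if k in actual and desired[k] != actual[k]),
    }
```

Everything hard about real GitOps observability, explaining why the states diverged and which commit caused it, lives beyond this diff, which is exactly the tooling gap being pointed out.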
[00:44:13] Unknown:
And in your own journey of working with GitOps practices, and working in teams that are orienting themselves around these cloud native workflows and designing their applications to be composable and observable, what are some of the most interesting or challenging or unexpected lessons that you've learned in that process?
[00:44:36] Unknown:
It's hard to say much without revealing names. The one that I really like, and again, I cannot name it, is a huge company. I always like using huge companies as examples, not because I think that they are pleasant to work with, but because they tend to be the most challenging, most backwards cases. This one created a separate company as the last resort before going bankrupt, to try to turn things around. They created basically a separate company in a separate building, saying: okay, we know that we failed so many times trying to improve what we do and change how we do it. Let's try something different. Let's give 15 people, I think it was, more or less a one-year mandate to do whatever they want.
We have nothing to lose. And that company, which, again, I cannot name, is now considered one of the biggest comebacks among companies that were basically on the verge of extinction. And to me, that demonstrates that almost the only way to change something is by removing the barriers that were preventing people from changing. And you cannot really do that within a company, because it is almost impossible to understand how the system will behave given some change. You can poke it, but that takes a lot of time. But creating basically a separate company, not necessarily legally, is what I've seen work. That's probably the happiest moment in my career: seeing a company that I was planning to sell my shares in being transformed into something amazing, only because they allowed people to do whatever they felt should be done, without a single restriction imposed by the company itself.
[00:46:36] Unknown:
For people who are considering building a workflow that is oriented around these GitOps principles, what are the cases where that might be the wrong choice? And what are some of the alternatives for being able to manage the application life cycle in a maintainable and sustainable fashion?
[00:46:54] Unknown:
I don't really think that there is an alternative. If we were starting from scratch today on some project, some system, I don't think there are alternatives. Among the tools that we have right now, of course, there are alternatives, and there will be many. But to the approach itself, everything is code, we write code, therefore code is stored in a code repository, and everything else that comes after that code is repetitive, and therefore shouldn't be done manually because we don't do repetition very well: I don't think there is an alternative to that if we were starting from scratch. Now, the reality is that for many existing systems, it is simply not a good idea.
And it is not a good idea because they are not ready for it. They're not sufficiently small. They're not sufficiently scalable. Many applications, I know, cannot even be defined as code. Many applications today cannot be operated without clicking some button somewhere. Right? Many applications are not testable; no matter how many tests we write, they were not designed to be testable, and so on and so forth. It is probably not a good choice for anything that is still living in, let's say, 2010 or an earlier date. And this is, I think, one of the most important pieces of advice I have: you cannot skip through time.
Right? You need to figure out where your system, or part of the system, is on the timeline of the giants of the industry. Look at what Google was doing, what Amazon is doing, and so on. Then you figure out where you are right now, and you need to go through that whole path; you cannot skip steps until you get to the present tense. And that means: yes, if you're not using VMs, you need to start using VMs first. You cannot jump to containers, because that will be too much for you. You can go faster than real time, but you cannot skip steps. If you're not using containers, you cannot go to Kubernetes. If your applications are not designed to be like this or that, you cannot do that. So I would say that GitOps is not a good choice for any system that is more than 10 years old: 10 years old in terms of architecture, processes, and team interactions, not necessarily in terms of whether it has been updated in the last 10 years.
[00:49:23] Unknown:
Are there any particular resources that you recommend for anybody who wants to dig deeper into this area or learn more about any of the specific principles that we discussed?
[00:49:34] Unknown:
So, yes. I mean, of course, there is Codefresh first. We do a lot of good things. I don't wanna give anybody a sales pitch, but I want to say: hey, there's a free account. Go there, you get unlimited builds and so on; try it out. And the good thing about it is that it's not just a tool. We are really trying to guide people towards doing things well instead of just dropping a tool on them. So that would be the first thing: codefresh.io. Second, of course, I have a bunch of books and courses on Udemy, and most of them are about DevOps, GitOps, automation, cool stuff. So check out my books. And also, if you have a manager, expense it. If you cannot expense it, then drop me an email or send me a tweet, and I will give it to you for free. That's absolutely not a problem.
[00:50:25] Unknown:
Yeah. We'll add links to all those things in the show notes for anybody who wants to follow up afterwards. And are there any other aspects of GitOps and its principles and practices, and some of the organizational or development strategies that go along with it, that we didn't discuss yet that you'd like to cover before we close out the show?
[00:50:43] Unknown:
Principles are easy: everything is stored in Git, and machines take over after we push changes. On the principle level, it's easy. The implementation is really, really hard. It always looks easy, but it's not, because it requires, I cannot say for everybody since it depends where you are, but it requires a huge set of changes, and the most difficult ones are really about people and processes. Tools are easy. You pick a tool; I can give you a list of tools, you just pick them and use them. But the problem is that all the tools are a reflection of the teams that built them.
So if you want to adopt some tool, you need to start behaving like the team who designed that tool. I think that that's very important. If your processes are not similar to the processes of those who designed Kubernetes, don't go to Kubernetes. If they are not similar to, I think it was Facebook that started that whole CD idea, just to give you one example: if you're a company that restricts access between teams to Git repositories, then why bother with GitOps? So become like those who designed the tool you want or the process you want.
[00:52:00] Unknown:
Well, for anybody who wants to get in touch with you or follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And so with that, I'll move us into the picks. This is a tool that I've picked before, but because of the context, I'll pick it again, and that's Pulumi. I've been using it for managing my infrastructure as code using Python, and I've been enjoying it pretty thoroughly. So I definitely recommend taking a look at that if you're on the lookout for your own tool chain of choice. And with that, I'll pass it to you, Victor. Do you have any picks this week?
[00:52:31] Unknown:
Picks this week. This is a boring one, because it happens to be the one that I already mentioned: that's Argo CD.
[00:52:38] Unknown:
Well, I appreciate you taking the time today to join me and discuss the work that you've done with GitOps, your experiences there, and some of the helpful strategies and advice for people who are looking to follow that path. It's definitely something that is worth the effort, so I appreciate all the time and energy you've put into that, and I hope you enjoy the rest of your day. Thank you. Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast at dataengineeringpodcast.com, for the latest on modern data management.
And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email host@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction to GitOps with Victor Farcic
Victor's Journey with Python and GitOps
Understanding GitOps Principles
Tools Facilitating GitOps
Design Patterns for GitOps
Observability in GitOps
Developer and Operator Collaboration
Local Development and Cloud Environments
Team Collaboration and Code Management
Testing and Validating Infrastructure as Code
Challenges and Lessons in GitOps Implementation
Final Thoughts on GitOps