Summary
The way that applications are built and delivered has changed dramatically in recent years with the growing trend toward cloud native software. As part of this movement, with the infrastructure and orchestration that power your project now defined in software, a new approach to operations is gaining prominence. Commonly called GitOps, its main principle is that all of your automation code lives in version control and is executed automatically as changes are merged. In this episode Viktor Farcic shares details on how that workflow brings together developers and operations engineers, the challenges that it poses, and how it influences the architecture of your software. This was an interesting look at an emerging pattern in the development and release cycle of modern applications.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Tree Schema is a data catalog that is making metadata management accessible to everyone. With Tree Schema you can create your data catalog and have it fully populated in under five minutes when using one of the many automated adapters that can connect directly to your data stores. Tree Schema includes essential cataloging features such as first class support for both tabular and unstructured data, data lineage, rich text documentation, asset tagging and more. Built from the ground up with a focus on the intersection of people and data, your entire team will find it easier to foster collaboration around your data. With the most transparent pricing in the industry – $99/mo for your entire company – and a money-back guarantee for excellent service, you’ll love Tree Schema as much as you love your data. Go to pythonpodcast.com/treeschema today to get your first month free, and mention this podcast to get 50% off your first three months after the trial.
- You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For more opportunities to stay up to date, gain new skills, and learn from your peers there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to pythonpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today!
- Your host as usual is Tobias Macey and today I’m interviewing Viktor Farcic about using GitOps practices to manage your application and your infrastructure in the same workflow
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by giving an overview of what GitOps is?
- What are the architectural or design elements that developers need to incorporate to make their applications work well in a GitOps workflow?
- What are some of the tools that facilitate a GitOps approach to managing applications and their target environments?
- What are some useful strategies for managing local developer environments to maintain parity with how production deployments are architected?
- As developers acquire more responsibility for building the automation to provision the production environment for their applications, what are some of the operations principles that they need to understand?
- What are some of the development principles that operators and systems administrators need to acquire to be effective in contributing to an environment that is managed by GitOps?
- What are the areas for collaboration and dividing lines of responsibility between developers and platform engineers in a GitOps environment?
- Beyond the application development and deployment, what are some of the additional concerns that need to be built into an application in order for it to be manageable and maintainable once it is in production?
- What are some of the organizational principles that contribute to a successful implementation of GitOps?
- What are some of the most interesting, innovative, or unexpected ways that you have seen GitOps employed?
- What have you found to be the most challenging aspects of creating a scalable and maintainable GitOps practice?
- When is GitOps the wrong choice, and what are the alternatives?
- What resources do you recommend for anyone who wants to dig deeper into this subject?
Keep In Touch
Picks
- Tobias
- Victor
Links
- GitOps
- CodeFresh
- Kubernetes
- DevOps Paradox Podcast
- Perl
- Cloud Native
- ArgoCD
- Flux
- Observability
- Prometheus
- Helm
- KNative
- MiniKube
- Viktor’s Udemy Books and Courses
- Viktor’s YouTube channel
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Your host as usual is Tobias Macey. And today, I'm interviewing Viktor Farcic about using GitOps practices to manage your application and your infrastructure in the same workflow. So, Viktor, can you start by introducing yourself?
[00:01:58] Unknown:
Yes. So as you already heard, my name is Viktor. I've worked for Codefresh for maybe 3 weeks now. I recently changed companies. My focus is kind of the same no matter where I am, and that's usually around making myself obsolete. A lot of automation, a lot of CICD for a while now, containers, Kubernetes, and GitOps. GitOps is the thing that keeps my attention right now. I tend to change what I do very often. So today, it's GitOps, and we'll see about next month.
[00:02:32] Unknown:
And, also, when I was getting ready for this interview, I noticed that you have a podcast of your own.
[00:02:38] Unknown:
Yes. Yes. My colleague, or ex colleague, Darren and I have the DevOps Paradox podcast, where we usually just speak about random things. Occasionally we have guests, but more often than not, it's about, hey, what were we doing this week? This or that. Right? And it's completely unscripted. Without preparation, we just talk about stuff that we like.
[00:03:02] Unknown:
And do you remember when you first got introduced to Python?
[00:03:05] Unknown:
Yes. That was, oh, that was a long time ago. Maybe 10 years, 15, maybe 20. I don't know. I'm old. I was introduced to Python when I started reaching, you know, 100 lines in my bash scripts. So my first contact was not really creating a real application, but, hey, I cannot really live with hundreds of lines of bash scripts. I started with Perl for those specific use cases, and that was a nightmare. And then I discovered Python and, hey, this is a good fit. After that, I spent some time with Python, mostly, I think, for test cases, writing test code. Right?
To be honest, I don't remember the frameworks I was using back then, but, yeah, it was mostly for scripting and testing.
[00:03:55] Unknown:
As you mentioned, now a lot of your attention is on the practices and tooling around GitOps. So I'm wondering if you can just give a bit of an overview about what that term means to you and some of the ways that it manifests in the application life cycle.
[00:04:10] Unknown:
Maybe I should start by saying that I don't see GitOps as being a new practice. Right? I think it existed for a long time. It's just that now we have a name for it, and that name is GitOps. The basic premise is that everything we do is code. Right? Some of us moved away, and some are still moving away, from the concept of clicking buttons, going to some UIs to change the state of something, to run builds or whatnot. Right? Everything is code, no matter what kind of code that is. And the natural location for code is Git. Right? I mean, it could just as well have been called SVN ops if the term had been invented earlier. So it's about storing everything in code and, especially in the context of GitOps, the desired state of something. Right? So we define what we want.
Those definitions are in Git, and that's where our work stops. From that moment, the moment we push something to the main line, we are finished and the machines take over. Right? And the machines should do whatever they need to do depending on what we pushed. It could be building a binary. It could be creating a release, or it could be converging the actual state of the production environment into the desired state, which is what is defined in Git. So I see it as a wall, a border that defines where human work stops and where machine work starts. Right?
We end our work by pushing something to Git, and machines take over from there.
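Viktor's description of machines converging the actual state toward the desired state in Git can be sketched as a toy convergence check. Everything here is illustrative and invented for the example; it is not taken from any real GitOps tool:

```python
# Hypothetical sketch of the GitOps convergence idea: the desired state
# lives in version control, and an automated process reconciles the
# actual state toward it. All names and structures are illustrative.

def reconcile(desired: dict, actual: dict) -> dict:
    """Return the actions needed to converge the actual state to the desired state."""
    actions = {}
    for name, spec in desired.items():
        if name not in actual:
            actions[name] = "create"      # defined in Git, missing from the platform
        elif actual[name] != spec:
            actions[name] = "update"      # drift between Git and the platform
    for name in actual:
        if name not in desired:
            actions[name] = "delete"      # running, but no longer defined in Git
    return actions

# Desired state, as it would be read from Git; actual, as reported by the platform.
desired = {"web": {"image": "web:2.0", "replicas": 3}, "db": {"image": "db:1.1", "replicas": 1}}
actual = {"web": {"image": "web:1.9", "replicas": 3}, "cache": {"image": "redis:6", "replicas": 1}}

print(reconcile(desired, actual))  # {'web': 'update', 'db': 'create', 'cache': 'delete'}
```

The key design point Viktor makes is that the human's job ends at the push; a loop like this, run by machines, is what carries the change the rest of the way.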
[00:05:51] Unknown:
And in terms of how that guides the design and implementation of our applications and our infrastructure, what are some of the design patterns or architectural changes that we should be thinking about, and that developers need to incorporate, to make their applications work well in that type of workflow?
[00:06:11] Unknown:
It's good that you said work well because, theoretically, everything can follow GitOps principles. But what we obviously see is that GitOps, at the moment at least, is relatively new, and it's very much focused on, or is a reaction to, some other best practices. And those are, you know, being cloud native. Having smaller applications is better than having big applications simply because the cycle can be faster and the machines can have an easier time figuring out what needs to be done if the changes are smaller. So smaller applications, faster iterations, and when I say faster iterations, I mean committing or pushing or merging to the main line more often.
And also designing applications in a way that makes the job of those machines easier. Like, if machines should decide when to scale your application up and down, you might need to make certain architectural choices, like, I don't know, make it stateless, for example, moving state to something like Redis or some other data store, and so on and so forth. I think probably the best guideline, which is not specific to GitOps but applies very well to it, is to follow the twelve-factor app principles.
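The stateless-design advice above can be illustrated with a small sketch. The store interface and handler here are hypothetical, standing in for something like Redis behind a real web framework:

```python
# Illustrative sketch (not from the episode): keeping session state out of
# the application process so any replica can serve any request. The
# StateStore interface is invented; in production it might wrap Redis.

class StateStore:
    """Minimal key-value interface that a stateless app depends on."""
    def __init__(self):
        self._data = {}
    def get(self, key, default=None):
        return self._data.get(key, default)
    def set(self, key, value):
        self._data[key] = value

def handle_request(store: StateStore, session_id: str) -> int:
    """A request handler with no local state: any replica behaves the same."""
    count = store.get(session_id, 0) + 1
    store.set(session_id, count)
    return count

# Two interchangeable "replicas" sharing one external store see consistent state.
shared = StateStore()
print(handle_request(shared, "alice"))  # 1
print(handle_request(shared, "alice"))  # 2
```

Because no request depends on in-process memory, the machines Viktor mentions are free to scale replicas up and down, or replace them entirely, without losing anything.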
I think that they initiated the whole movement around new architecture, and that new architecture, which is not new anymore, fits very well into GitOps principles.
[00:07:41] Unknown:
You mentioned cloud native in there, which is a term that's been gaining a lot of popularity in recent years with the rise of Kubernetes and serverless and microservices. Because of the capabilities that those provide, they have helped to push the movement towards smaller applications, the theory being that they reduce the operational overhead that comes along with microservices, which is 1 of the arguments against that design pattern. So I'm wondering what your experience has been in terms of what tools help to facilitate a GitOps approach to managing applications and their target environments, and how that has changed in recent years to drive this rising popularity of GitOps as a recommended approach to managing applications and their life cycles?
[00:08:35] Unknown:
If I were to focus only on the deployment part of GitOps, which is usually the part that people are mostly interested in (there are definitely others), I think that containers are definitely helpful, and using containers today means using Kubernetes in 1 form or another. We might go for serverless, but that would be a separate fight or discussion. Anyway, containers, Kubernetes. And what was happening, I think, over the last couple of years is that we were trying to adopt those principles through push models. Like, okay, you push a change to Git, and then Git pushes a notification to your CICD tool, whichever you're using, which would do a lot of things and, among others, take those new definitions of what an environment means, what release or tag of this application should be running, and then apply those changes with, let's say, kubectl or helm install or whatnot.
What has been happening over the last months, years even, actually, is that we are getting new tools that are focused much more on a pull method, meaning that Git is not notifying anyone anymore when there is a change. There is no CICD tool that would deploy that change to your cluster. Instead, we are installing very small agents inside of clusters that monitor your Git repositories. And whenever they detect a drift, a change, something that is different between the actual state and the desired state, they act. Right? So examples of such tools would be Argo CD or Flux.
Both of them are based on the same principles. Hey, you don't need to invoke a CICD tool. You don't need to notify me when there is a change. I will be pulling state from Git and figuring out what to do. And there are 2 big benefits from that. First, it's almost like a hands-off approach. I don't need to create pipelines. I don't need to create almost anything. I just need to push changes to Git. And the second important implication of that pull model is that we are finally able to secure our clusters, especially production, without affecting the productivity of people. Usually, we were going in 1 direction or the other. We were secure, but then getting permissions and going through all the hoops to get something deployed to production was complicated.
Or it was not really secure, and then you could do whatever you want. But with those pull models, where agents are simply watching Git, nobody is prevented from doing anything while still nobody has access to the cluster, except maybe a few people in case something goes terribly wrong, you know, for debugging purposes. But, generally, we can finally cut off access to the cluster while still making everybody as productive or more productive than ever before.
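The pull model described above might be sketched, very roughly, like this. The function names are invented for illustration; real implementations such as Argo CD or Flux are far more involved:

```python
# A toy sketch of the pull model: an in-cluster agent periodically fetches
# the desired state from Git and applies any drift, so nothing outside the
# cluster ever needs deploy credentials. All names here are hypothetical.

def fetch_desired_state():
    # Stand-in for pulling a Git repository of manifests.
    return {"web": "web:2.0"}

def fetch_actual_state():
    # Stand-in for querying the cluster API.
    return {"web": "web:1.9"}

def detect_drift(desired, actual):
    """Return the components whose actual state differs from what Git defines."""
    return {name for name, spec in desired.items() if actual.get(name) != spec}

def agent_tick():
    """One iteration of the agent's loop: pull, compare, apply."""
    drifted = detect_drift(fetch_desired_state(), fetch_actual_state())
    for name in sorted(drifted):
        print(f"applying desired state for {name}")  # stand-in for applying manifests
    return drifted

agent_tick()  # prints: applying desired state for web
```

The security benefit Viktor describes falls out of the direction of traffic: the agent reaches out to Git, so no inbound access to the cluster has to be granted to a CICD system or a human.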
[00:11:43] Unknown:
Yeah. That access control pattern is definitely useful. As somebody who has managed infrastructure for a while and worked as a developer, it's frustrating trying to figure out the most effective way to handle automation while maintaining security: providing network access to the pieces that need it in an automated fashion so that you can build everything from the ground up in 1 go and have it all talking to each other the way it's supposed to, and then figuring out what to do at those network boundaries and handoffs. Whereas, as you're saying, that pull-based model resolves a lot of the complexities and potential security issues that arise from having to open ports in the firewall or grant credentials to some trusted user to perform certain actions, and then having to monitor the usage of those credentials, make sure they don't get leaked anywhere, or rotate them periodically.
[00:12:37] Unknown:
Yes. And it also puts everybody on the same page, in a way. Because if you think about it, we can probably start a fight over any type of tool, which 1 is better, which 1 you like, which 1 you don't. Right? Whether it's CICD or for building stuff or for creating releases. Everybody has a different opinion. But the only tool that I know of that is undisputed, as in nobody questions whether it's going to be used, is Git. Right? So I see it as a place where everybody agrees to be and everybody agrees to collaborate, in a way, which is also kind of important for the adoption of those processes.
[00:13:20] Unknown:
Another aspect of this model, where just pushing your code to the main line of a Git repository ends up leading to it being deployed into production, is the need to have good visibility into what it's doing in that production environment so that you can be aware of issues as they arise and have enough information to remediate them, particularly when you're in an infrastructure model where there's no way for you to log in and poke around at the application in its running state. So I'm wondering, what are some of the additional concerns that developers need to bake into their applications so that there is enough information to diagnose issues in production, so that you can have a tight feedback loop without having to worry about replicating the production environment into some other location where you do have that access to poke and prod at the code? I think those are maybe a couple of questions.
[00:14:17] Unknown:
Like, replicating an environment suddenly becomes extremely easy because, basically, all you need to do is fork the Git repo or, you know, branch the main line or something like that, and just execute the same process that applies those definitions somewhere else. And all of a sudden, you have the same thing as production. I mean, you never get exactly the same thing as production because you obviously don't have the traffic. You don't have those users. But it is definitely much easier to reproduce anything, including production, up to the level that is possible, given that every single aspect of the desired state of production is already stored somewhere, like in Git. And, basically, you can just say, okay, I want the actual state of this new cluster to be the same as the desired state of production, and off you go. Right?
Now the issue, I think, with GitOps, if you look at it in isolation, is that GitOps really gives you visibility into the desired state. Right? I know, by looking at Git, what it is that I want to have right now. And I can go back in time and say, yeah, but what about yesterday? Complete traceability into the desired state. I do not necessarily, through GitOps alone, know what the actual state is. Maybe the process did not manage to converge it. Maybe there is not enough space. Right? Many, many different things can happen to create that drift between the desired and the actual state, and for tools like Argo CD not to be able to reconcile that drift.
What that really means is that I see GitOps, if you draw some imaginary vertical line, as being on the left side, up until something is running in production and it is the same as what you want. And then we have the whole observability story of how we observe what is really running in a live cluster and how that affects users and whatnot. Right? Today, that observability, if I limit myself to Kubernetes, would usually be something like Prometheus. And going back to your initial question, if there is 1 important piece of advice I would give, at least from an observability perspective, it is to really make sure that you instrument your applications well.
And what I mean by that is the tools, Prometheus or Datadog or whichever you're using to gather information about the actual state, cannot give you much from outside your applications. Right? I can know that your application is, for example, slow to respond. But in my head, that is almost irrelevant information because what I really want to know is, hey, this application is slow. And to be more specific, this function within that application is slow because of this and because of that. What that really means is that if I want to dig deep, deep, deep into an application to find out what's going on, I need those applications to be instrumented. And this is where we usually enter into a conflict: I feel that many teams, at least those that I've worked with, think that it's all about writing logs.
And to me, logs are maybe at the end of the process. At this stage, I usually don't care about logs. I care about metrics that are deeply ingrained in applications so that I know what is going on at the metric level. And then maybe once I manage to figure out which part of something is misbehaving, only then might I go into logs. So the advice is, yes, instrument your applications and expose that data through instrumentation for the tools to pick up.
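The instrumentation advice can be illustrated with a hand-rolled sketch. A real application would use a Prometheus client library or similar rather than this toy registry, so treat everything below as an assumption-laden illustration of the idea, not a recommended implementation:

```python
# Illustrative sketch: record per-function metrics (call counts, latency)
# from inside the application, rather than relying on logs or on what an
# external monitor can see. The registry and decorator are invented here.

import time
from collections import defaultdict
from functools import wraps

METRICS = {"calls": defaultdict(int), "seconds": defaultdict(float)}

def instrumented(fn):
    """Decorator that counts calls and accumulates wall-clock time per function."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            METRICS["calls"][fn.__name__] += 1
            METRICS["seconds"][fn.__name__] += time.perf_counter() - start
    return wrapper

@instrumented
def lookup_user(user_id: int) -> str:
    return f"user-{user_id}"

lookup_user(1)
lookup_user(2)
print(METRICS["calls"]["lookup_user"])  # 2
```

This is exactly the granularity Viktor asks for: not "the application is slow" from the outside, but "this function is slow" from within, with logs consulted only afterwards.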
[00:18:24] Unknown:
Another element of GitOps is that it's working to bring the developers more into the process of managing the environments that their applications are going to run within. Whereas previously, that was the responsibility of the systems administrators or DevOps engineers or platform engineers to take the code that was written, figure out what are the different architectural components, and then decide, okay. This is how I'm going to deploy them. This is what the network is going to look like. This is the security model. This is how I'm going to lock down communication, managing mutual TLS between the different nodes within the environment, managing things like secrets and how you're going to distribute the configuration files.
What are some of the elements of operations and the requirements of a production environment that developers need to be more fully educated on as they do take on the responsibilities of defining the environments that their application is going to execute within through these GitOps practices?
[00:19:30] Unknown:
That's a tough 1, to be honest. On 1 hand, I do believe that we need to have, or move towards having, self sufficient teams. I'm a strong believer that there should be a team in charge of 1 application, or maybe more depending on the size. And when I say in charge, I mean literally everything. There is a team of 6 to 7, 8 people maybe. You know, more than that is not a team. It's a school reunion. Those people should be fully in charge of an application. That means, you know, writing code, writing tests, build scripts, whatnot, including, ultimately, deployments to production and even monitoring, you know, being on PagerDuty for that application in production.
I really strongly believe that that's the best way we can move forward or be really efficient. Now the problem with that is that we cannot expect a small team to know everything. Right? Understanding how networking works is complicated by itself. Learning Kubernetes can easily take you 2 years to know it in sufficient depth to say, I know what I'm doing. And in those 2 years, it would change so much that your knowledge would be obsolete again. Right? And so on and so forth. There is too much knowledge for that self sufficient team to truly be doing everything, and it's ineffective.
So in parallel with those teams that have full control of the application, and let's say that I call them vertical teams because they control the verticals, we still need horizontal teams, people that are highly skilled and possess a lot of expertise in certain areas, like networking, right, infrastructure, Kubernetes, or whatnot. But their job, and this is, I think, the important change, is not to wait for those other teams to open Jira tickets or request something. Their job is to provide the services that will simplify the autonomy of the first group of teams. Right? So I think of all those horizontal teams as serving a similar function in a company to what AWS or Azure or Google is providing. Right? We are making things easier so that you are still in control, but you don't need to spend years becoming an expert in something. It's all about having teams that create services so that those other self sufficient teams can consume them and be in full control.
And if I focus, for example, on Kubernetes right now, that means, yes, create templates. A team, if they choose to deploy to Kubernetes, will have to learn Helm, let's say. Right? But they do not necessarily need to define every single line, because there will be hundreds of lines of some YAML templating. Right? They can get help from those teams in simplifying what they really need to be self sufficient. A good example to me is Knative, a project that allows you to do in, like, 20 lines of YAML things that would normally need a couple of hundred lines.
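The idea of a horizontal team shrinking hundreds of lines of YAML down to a few inputs might be sketched like this. The helper, its fields, and the resulting structure are all hypothetical simplifications, not any real platform's schema:

```python
# Hypothetical sketch of a "horizontal team" helper: expand a few
# app-level inputs into the fuller manifest structure a platform needs,
# so application teams don't maintain hundreds of lines of YAML themselves.

def render_manifest(name: str, image: str, port: int = 8080) -> dict:
    """Expand minimal inputs into a (much simplified) deployment + service pair."""
    return {
        "deployment": {
            "name": name,
            "labels": {"app": name},
            "containers": [{"name": name, "image": image, "ports": [port]}],
        },
        "service": {"name": name, "selector": {"app": name}, "port": port},
    }

manifest = render_manifest("web", "web:2.0")
print(manifest["service"]["port"])  # 8080
```

The vertical team supplies only the name, image, and port; everything else is the horizontal team's opinionated defaults, which is the same trade Knative makes at much larger scale.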
And that simplification is what we really, really need right now. Because over the last couple of years, let's say since we got Docker, what was it, like, 6 years ago or something like that, Docker was a movement towards simplification of stuff. And then I feel that we moved in the opposite direction, where Kubernetes introduced too much complexity for most people to ever handle. And I think that that's kind of the next barrier. How can those horizontal teams make Kubernetes easy?
Right? I'm using only Kubernetes as an example because that's probably the biggest pain point right now. But that equally applies to anything else. How can we provide the service to our users, our users being other teams in the company, those teams that are actually developing applications?
[00:23:40] Unknown:
Another aspect of GitOps, particularly with things like Kubernetes or cloud resources like S3, is that 1 of the challenges that creeps into actually building the application is being able to test and execute it in an environment on your laptop that is sufficiently close to how it's going to run in production. And I'm wondering what you have found to be some useful strategies for managing local developer environments and replicating the overall architecture and system components that the code is going to need to interact with on somebody's laptop, so that they can have fast iteration cycles and not necessarily have to call out over the network to deploy the environments every time they want to see what happens when they change a line of code?
[00:24:35] Unknown:
It really depends on what that environment needs, especially in terms of memory and CPU. Right? It really depends on architecture as well. Can your application work with maybe only direct dependencies and not the whole system? Because that's the ideal architectural state, actually, from a testing perspective as well. Right? So, ideally, you should be designing your application to use a lot of mocks and need only direct dependencies, not the whole system. And with or without the whole system, the real question is, do you need more than, I don't know, like, 8 gigs of RAM, or 16? I'm not sure how much can really fit on your laptop. Actually, everybody has 32 today. Right?
If you can fit it, then it's relatively trivial because you have on your laptop a full blown Kubernetes cluster, even though it's only 1 node. So you cannot do everything; you cannot test what happens when a node fails, but that's probably not what you wanna do locally anyway. So Minikube or Docker Desktop work fairly well. And, again, this is where GitOps jumps in, because you already have the full desired state. You have absolutely everything you need to create full production, or a fraction of production, or whatever you need in a matter of seconds, because it's Kubernetes. Right? It's the same API. Almost everything you can do in AWS or on prem or whatnot, you can do on your laptop just by spinning up Docker Desktop. So that's easy as long as you don't need more resources than your laptop allows.
But on top of that, and especially lately, I'm starting to question even the need to work locally. I think that we should be moving towards the cloud, and I think that my machines are going to get dumber and dumber. I believe that probably not so long from now, I will go to some type of Chromebook for work purposes, simply because cloud is cheap. And this is the important thing. Assuming that I did everything right, and at least when I do my personal work, I hope that I did, I can create a full blown cluster, like a Kubernetes cluster, I mean, multiple zones of a region, everything I need, a small version of production, in, like, 5 minutes' time. So usually, when I wake up in the morning, I go to my computer.
I just execute some Terraform scripts, depending on where I'm working, this or that, execute some scripts, go fetch coffee. By the time I've finished, I have absolutely everything I need, much better than if I worked locally. And the trick is, and I think this is where many companies kind of go wrong, that our brains did not yet understand, at least the majority of our brains did not understand, that we have tools at our disposal to make things temporary. Right? I might be working in, let's say, AWS, using AWS as my development environment, and I might be working for 3 hours, and then I might pause and have lunch. Right? What do I do? I destroy the cluster. I destroy absolutely everything I did, for 2 reasons. First, to keep the cost down. Why would I pay for something that I'm not using if I know that I can get everything I will ever need in 5 minutes?
And on top of that, it's also good practice for me, because that way I know that I'm doing the right job, because I know that everything I did can be easily recreated. So it's not only beneficial for obvious reasons, but I think it's beneficial as a mental practice that everything is ephemeral. My work needs to be done in a way that everything might be gone at any moment, so I need to be able to recreate everything at any other moment. Right? But I understand that for some people, you know, going cloud, and when I say cloud, it can be private, right, it can be your data center, it doesn't matter, creating servers and installing everything might not be a good choice. And then Docker Desktop or Minikube work just as well if somebody really wants to go local.
[00:28:56] Unknown:
Another aspect to the GitOps workflow is that it can potentially blur the lines between the responsibilities of developers and operators or platform engineers. And I'm wondering what you see as the areas for collaboration between those responsibilities or the dividing lines that still exist or if those even should be different roles within an organization?
[00:29:24] Unknown:
In the past, I liked to think that there shouldn't be different roles, but I think that I grew up in the meantime, and I don't believe that that's possible. We do need different roles, because simply some areas of what we do are sufficiently complicated and require so much experience that we cannot just say, okay, go learn that. Right? That's too complicated. So we need to have those divisions. But I tried many different methods for removing the silos and making that collaboration work well. And what I found to work the best is actually organizing absolutely every team with a budget and having customers.
Right? And I think that that's what we are still missing very often in companies. This department, whatever department it is, could be infrastructure, could be testing, does not feel that somebody else is their customer. So they have no real incentive to make other people's lives easier, and they're so much focused on making their own job easier or better. That is really what creates those tensions between people. Security departments are usually the main villain in those stories. They're good people, by the way, but they're usually perceived as blockers, as people we don't wanna work with, because if we even say their names, we will be blocked for a week. Right? And that comes from the lack of perception of who your user is. Once you understand that those teams I was mentioning before, those teams in charge of an application, are your users, and that they could choose somebody else's service, then you have a real incentive to work towards making other people's lives better. And that's the key. That's why we want teams, that's why we want cooperation: because we are trying to make other people's lives more productive, better, nicer. So it's about establishing who your user is, who pays your salary, and in most cases, not literally, I know that, those are the teams in charge of those applications. So if you're infrastructure, if you're networking, if you're security, whatever you are, you need to make sure that people actually wanna work with you, because you need customers.
[00:31:57] Unknown:
In terms of the organizational aspects of that, as you mentioned, treating the downstream consumers of the tooling that you write or the work that you're doing as your customer helps to ensure that the way that you design it is more useful for them. And it helps to open up those conversations of what is their workflow, what are their needs versus just looking internally to what's easiest for me to build. And then the end user has to contort to whatever it is that I've created, and they don't necessarily have the appropriate context to make that useful. What are some of the other principles or strategies or team compositions that you have found to be helpful in being able to drive a workflow where everybody is collaborating around code and there isn't as much of a hierarchical need for managing the different responsibilities or being able to unify around a common tool chain?
[00:33:00] Unknown:
One of the things that I believe was successful across the cases I've seen is establishing, to begin with, what I said at the beginning: everything is code, and everything is stored in Git. And the important part of all that is that companies really, really need to treat all their source code, and everything is source code, as inner source, let's say. Right? Meaning that there is absolutely no restriction on who can view any part of the code and who can create a pull request. Maybe not push to the master branch, but create a pull request, without doubt. Absolutely no restriction.
There should not be a single person in a company who cannot clone the code, create their own fork, and create a pull request. And that means that there is actually visibility of what we are all working on. And that means that there is no excuse anymore; you cannot even say he's a bottleneck, because I can always go and change his code. Right? I can jump into your team and help you out by writing code. I am such a huge freak about communicating through code rather than in any formal way. I think that we should all stop speaking English and speak code instead. Right? So it's really about having all repositories, everything, completely open to everybody. And that increases collaboration and communication, because suddenly nobody has an excuse not to know what others are working on, and anybody can change that work in any way they see fit if they think there is a benefit in it. Right?
Otherwise, we are usually locked in those models where, hey, I don't know what you guys are doing. You just sent me this Excel sheet that I need to fill out. I don't know. Who are you? What do you do? Reading code happens to be so easy. Maybe that's the nerd in me speaking. I might be wrong, though.
[00:35:00] Unknown:
Another element of managing your infrastructure and your desired state in a code-first way is the need to test it, and testing, particularly for systems that have external dependencies, is complicated and fraught with error. I'm wondering what you have found to be some of the useful approaches to designing your infrastructure management or your desired state configs in a way that they are maintainable and scalable and easy to validate in an automated manner, and some of the cases where that falls down and you just have to do the closest approximation to something that works without being in the optimal state?
[00:35:52] Unknown:
Realistically, almost every company in the world, at least every mid-to-big-size company, will never get to the state I want them to be in, simply because the world was not created yesterday, and there is always legacy; there is no avoiding legacy. If I take a bigger company, a bank or an insurance company or something, I will never be able to move them to Kubernetes fully, 100%. It's not gonna happen. I will never be able to redesign their applications and so on. Right? But what we can do is recognize that we always have, let's say, a legacy part of the system and a non-legacy part of the system, whatever the percentage is for one or the other.
What I think helps a lot is, at least from the point of view of the non-legacy or newer parts of the system, making everything look as if everything is new. And we can do that through some clever ways of creating services. By the way, a service in the Kubernetes world is not an application; it is the way to communicate with others, with service meshes and so on. And that goes back to those horizontal teams. If I'm in charge of application X, I should not know about your mainframe. I should not know about the database.
All I need to know, really, is my application, the interface I have with other systems, and where to find it. That's all we really need. Once we understand that, then the most critical thing is, almost without a doubt, the definition of those interfaces. Whether that's a REST API or RPC or whatever it is does not really matter. As long as I have a clear definition of what my dependency is, from that moment on I really don't care about anything else, because if everything else goes bad, I can always use the definition to create a stub or a mock or something like that. That's almost trivial.
The reason why people don't do that is because we are still in a state where, in many organizations, there is no clearly defined interface between different components. Then we have no alternative but to cry aloud and try to make do with what we have.
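As a minimal illustration of that point, assume a hypothetical `InventoryAPI` contract between two teams. Once the interface is pinned down, the real service and a trivial stub are interchangeable from the application's point of view, so development and testing never need to touch whatever mainframe or database sits behind it:

```python
from typing import Protocol

class InventoryAPI(Protocol):
    """The contract: all a consumer is allowed to know about the dependency."""
    def reserve(self, sku: str, quantity: int) -> bool: ...

class InventoryStub:
    """A throwaway in-memory implementation satisfying the same contract."""
    def __init__(self, in_stock):
        self.in_stock = dict(in_stock)

    def reserve(self, sku, quantity):
        # Succeed only while enough stock remains, mirroring the real service.
        if self.in_stock.get(sku, 0) >= quantity:
            self.in_stock[sku] -= quantity
            return True
        return False
```

Any code written against `InventoryAPI` accepts either implementation; the names here are invented for the example, and the same idea applies whether the real contract is a REST spec, a gRPC definition, or a Python protocol.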
[00:38:14] Unknown:
In terms of actually employing GitOps principles, orienting all of your workflow around the code repository as the source of truth, and automating the deployment of environments to converge on some desired state, what are some of the ways that it goes wrong, or some of the horror stories of people who have attempted this and ultimately thrown their hands up in failure and gone back to doing everything manually?
[00:38:40] Unknown:
Yeah. You know that saying: destroying a cluster is human; destroying a fleet of clusters, that's DevOps and GitOps. Yeah, it could go terribly wrong. Right? Because at some moment, we become comfortable with what we do, and that's good; that's what we really, really want. We become comfortable in our automation. And from that moment on, it's very easy to slip. Right? It's very easy to introduce a change, push the change to Git, and that change could affect hundreds of servers. It could affect many clusters. It could affect different regions, simply because we have that power now. And it's not that we couldn't do it before. We could.
But doing something so terribly bad would require many, many, many hours of many people before, especially when we were doing things manually. Now, suddenly, I can be in control of a fleet of clusters and make a change with only a minute of my work. Right? And that really means that we need to be much better at how we observe things. And more importantly, since those are only declarative definitions, it's still GitOps, but we need to move towards not allowing anything but rolling updates. So for whatever we're doing, it doesn't matter whether it's infrastructure or applications: rolling updates, canary deployments, blue-green, whatever you want, so that horror stories are limited in scope. So that the chances of doing something on a large scale within a short period of time are close to impossible.
And even if that happens, if the worst thing happened and there is nothing else we could have done better, the last resort is always just rolling back to the previous commit, and that's the beauty of it. The major issue I have with rolling back to the previous commit is that it risks me never finding out what the actual cause of the issue was. I can fix things fast now. I can destroy things fast; I can fix things just as fast. But if I do that, I really do not gain the knowledge that I'm supposed to be gaining from my failures. And that's why, going back: yes, if things go wrong, try to limit the scope with rolling updates, canaries and so on, so that only 5% or 1%, depending on your scale, is really experiencing those issues.
And there are mechanisms, basically. I mean, not really physical mechanisms, but it shouldn't be that hard to come to an agreement that one of the rules is: you cannot make a big-bang change. It needs to be a rolling, progressive change.
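That rule, no big-bang changes, only progressive ones, can be sketched as a toy control loop. The `shift_traffic` and `healthy` hooks here are hypothetical stand-ins for whatever your deployment tool or service mesh actually provides, not a real API:

```python
def progressive_rollout(steps, shift_traffic, healthy):
    """Shift traffic to the new version in increments, checking health
    after every step; roll back at the first sign of trouble."""
    for pct in steps:
        shift_traffic(pct)
        if not healthy():
            shift_traffic(0)  # send all traffic back to the old version
            return ("rolled-back", pct)
    return ("promoted", steps[-1])
```

The point of the shape is exactly what Victor argues: a bad change gets caught while it affects 5% of traffic, not a whole fleet, and the rollback is automatic rather than a frantic manual intervention.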
[00:41:34] Unknown:
In your experience of working on teams that are implementing GitOps, and in your work at Codefresh where you're helping to facilitate that for other teams, what are some of the most interesting or innovative ways that you have seen these GitOps practices employed?
[00:41:51] Unknown:
I think that the interesting part is coming right now, simply because GitOps principles have existed for a long time, but I think that we are now on the verge of having the right tooling. And when I say we, I mean we as an industry, to make those things really happen. And by really happen, I mean something like what Docker did a long time ago: make it easy. I think that we didn't have that until now. So as a result, there were not really many horror stories about GitOps, except people saying I'm doing it, and then you speak with them for an hour and discover that they are not. Almost everything we're trying to do is based on 2 principles.
First, use the things that already work well. And from the GitOps perspective, the things that work well today emerged recently. As I said before, independent of the company where I work, that would be Argo CD and Flux. Now, I obviously prefer Argo CD; we wouldn't have chosen it if it weren't the right choice, but both are fine choices, and they're relatively new tools. Now that we have the mechanisms, again, we as an industry, to apply those things easily, and we could always apply them, but now we can do it easily, the next thing I think is missing, and this is going back to one or a few of your questions, is really how we bring observability into all of those things.
And by observability, I mean: yes, you have Git, which is the desired state, and you have the actual state, which you can explore one way or another. We do not yet have the tooling that will join those two together and give you very clear guidance on what went wrong, why something is happening as it shouldn't be happening, and what the causes were. And when I say observability in this context, I don't mean observability as we know it, but rather that bridge between observability of the cluster and what is really, indirectly, GitOps. So I don't know many horror stories from that segment, GitOps-related, simply because I think that we are still at the very beginning, at least from the tooling and best-practices perspective.
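The bridge being described, joining desired state from Git with actual state from the cluster, reduces in its simplest form to a three-way diff. This is a toy sketch assuming both states are flattened to name-to-spec dictionaries; it is not how Argo CD or Flux work internally:

```python
def detect_drift(desired, actual):
    """Three-way diff between desired state (Git) and actual state (cluster)."""
    return {
        # Declared in Git but absent from the cluster.
        "missing": sorted(k for k in desired if k not in actual),
        # Running in the cluster but declared nowhere.
        "extra": sorted(k for k in actual if k not in desired),
        # Present in both, but the specs disagree.
        "changed": sorted(k for k in desired
                          if k in actual and desired[k] != actual[k]),
    }
```

Everything hard about real GitOps observability, explaining why the states diverged and which commit caused it, lives beyond this diff, which is exactly the tooling gap being pointed out.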
[00:44:13] Unknown:
And in your own journey of working with GitOps practices, and working in teams that are orienting themselves around these cloud native workflows and designing their applications to be composable and observable, what are some of the most interesting or challenging or unexpected lessons that you've learned in that process?
[00:44:36] Unknown:
It's hard to say much without revealing names. The one that I really like, and again, I cannot name it, is a huge company. I always like using huge companies as examples, not because I think that they are pleasant to work with, but because they tend to be the most challenging, most backwards cases. This one created a separate company as the last resort before going bankrupt, to try to turn things around. They created basically a separate company in a separate building, saying: okay, we know that we failed so many times trying to improve what we do and change how we do it. Let's try something different. Let's give 15 people, I think it was, more or less a one-year mandate to do whatever they want.
We have nothing to lose. And that company, which, again, I cannot name, is now considered one of the biggest comebacks among companies that were basically on the verge of extinction. And to me, that demonstrates that almost the only way to change something is by removing the barriers that were preventing people from changing. And you cannot really do that within a company, because it is almost impossible to understand how the system will behave given some change. You can poke it, but that takes a lot of time. But creating basically a separate company, not necessarily legally, is what I've seen work. That's probably the happiest moment in my career: seeing a company that I was planning to sell my shares in being transformed into something amazing, only because they allowed people to do whatever they felt should be done, without a single restriction imposed by the company itself.
[00:46:36] Unknown:
For people who are considering building a workflow that is oriented around these GitOps principles, what are the cases where that might be the wrong choice? And what are some of the alternatives for being able to manage the application life cycle in a maintainable and sustainable fashion?
[00:46:54] Unknown:
I don't really think that there is an alternative. If we were starting from scratch today on some project, some system, I don't think there are alternatives. Among the tools that we have right now, of course, there are alternatives, and there will be many. But to the approach itself, everything is code, we write code, therefore code is stored in a code repository, and everything else that comes after that code is repetitive, and therefore shouldn't be done manually because we don't do repetition very well: I don't think there is an alternative to that if we were starting from scratch. Now, the reality is that for many existing systems, it is simply not a good idea.
And it is not a good idea because they are not ready for it. They're not sufficiently small. They're not sufficiently scalable. Many applications, I know, cannot even be defined as code. Many applications today cannot be operated without clicking some button somewhere. Right? Many applications are not testable; no matter how many tests we write, they were not designed to be testable, and so on and so forth. It is probably not a good choice for anything that is still living in, let's say, 2010 or an earlier date. And this is, I think, one of the most important pieces of advice I have: you cannot skip through time.
Right? You need to figure out where your system, or part of the system, is on the timeline of the giants of the industry. Look at what Google was doing, what Amazon is doing, and so on. Then you figure out where you are right now, and you need to go through that whole path; you cannot skip steps until you get to the present tense. And that means: yes, if you're not using VMs, you need to start using VMs first. You cannot jump to containers, because that will be too much for you. You can go faster than real time, but you cannot skip steps. If you're not using containers, you cannot go to Kubernetes. If your applications are not designed to be like this or that, you cannot do that. So I would say that GitOps is not a good choice for any system that is more than 10 years old: 10 years old in terms of architecture, processes, and team interactions, not necessarily in terms of whether it has been updated in the last 10 years.
[00:49:23] Unknown:
Are there any particular resources that you recommend for anybody who wants to dig deeper into this area or learn more about any of the specific principles that we discussed?
[00:49:34] Unknown:
So, yes. I mean, of course, there is Codefresh first. We do a lot of good things. I don't wanna give anybody a sales pitch, but I want to say: hey, there's a free account. Go there, you get unlimited builds and so on; try it out. And the good thing about it is that it's not just a tool. We are really trying to guide people towards doing things well instead of just dropping a tool on them. So that would be the first thing: codefresh.io. Second, of course, I have a bunch of books and courses on Udemy, and most of them are about DevOps, GitOps, automation, cool stuff. So check out my books. And also, if you have a manager, expense it. If you cannot expense it, then drop me an email or send me a tweet, and I will give it to you for free. That's absolutely not a problem.
[00:50:25] Unknown:
Yeah. We'll add links to all those things in the show notes for anybody who wants to follow up afterwards. And are there any other aspects of GitOps and its principles and practices, and some of the organizational or development strategies that go along with it, that we didn't discuss yet that you'd like to cover before we close out the show?
[00:50:43] Unknown:
Principles are easy: everything is stored in Git, and machines take over after we push changes. On the principle level, it's easy. The implementation is really, really hard. It always looks easy, but it's not, because it requires, I cannot say for everybody since it depends where you are, but it requires a huge set of changes, and the most difficult ones are really about people and processes. Tools are easy. You pick a tool; I can give you a list of tools, you just pick them and use them. But the problem is that all the tools are a reflection of the teams that built them.
So if you want to adopt some tool, you need to start behaving like the team who designed that tool. I think that that's very important. If your processes are not similar to the processes of those who designed Kubernetes, don't go to Kubernetes. If they are not similar to, I think it was Facebook that started that whole CD idea, just to give you one example: if you're a company that restricts access between teams to Git repositories, then why bother with GitOps? So become like those who designed the tool you want or the process you want.
[00:52:00] Unknown:
Well, for anybody who wants to get in touch with you or follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And so with that, I'll move us into the picks. This is a tool that I've picked before, but because of the context, I'll pick it again, and that's Pulumi. I've been using it for managing my infrastructure as code using Python, and I've been enjoying it pretty thoroughly. So I definitely recommend taking a look at that if you're on the lookout for your own tool chain of choice. And with that, I'll pass it to you, Victor. Do you have any picks this week?
[00:52:31] Unknown:
Picks this week. This is a boring one, because it happens to be the one that I already mentioned: that's Argo CD.
[00:52:38] Unknown:
Well, I appreciate you taking the time today to join me and discuss the work that you've done with GitOps, your experiences there, and some of the helpful strategies and advice for people who are looking to follow that path. It's definitely something that is worth the effort, so I appreciate all the time and energy you've put into that, and I hope you enjoy the rest of your day. Thank you. Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast at dataengineeringpodcast.com, for the latest on modern data management.
And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email host@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction to GitOps with Victor Farcic
Victor's Journey with Python and GitOps
Understanding GitOps Principles
Tools Facilitating GitOps
Design Patterns for GitOps
Observability in GitOps
Developer and Operator Collaboration
Local Development and Cloud Environments
Team Collaboration and Code Management
Testing and Validating Infrastructure as Code
Challenges and Lessons in GitOps Implementation
Final Thoughts on GitOps