Summary
It doesn’t matter how amazing your application is if you are unable to deliver it to your users. Frustrated with the rampant complexity involved in building and deploying software, Vlad A. Ionescu created the Earthly tool to reduce the toil involved in creating repeatable software builds. In this episode he explains the complexities that are inherent to building software projects and how he designed the syntax and structure of Earthly to make it easy to adopt for developers across all language environments. By adopting Earthly you can use the same techniques for building on your laptop and in your CI/CD pipelines.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Your host as usual is Tobias Macey and today I’m interviewing Vlad A. Ionescu about Earthly, a syntax and runtime for software builds to reduce friction between development and delivery.
Interview
- Introductions
- How did you get introduced to Python?
- Can you describe what Earthly is and the story behind it?
- What are the core principles that engineers should consider when designing their build and delivery process?
- What are some of the common problems that engineers run into when they are designing their build process?
- What are some of the challenges that are unique to the Python ecosystem?
- What is the role of Earthly in the overall software lifecycle?
- What are the other tools/systems that a team is likely to use alongside Earthly?
- What are the components that Earthly might replace?
- How is Earthly implemented?
- What were the core design requirements when you first began working on it?
- How have the design and goals of Earthly changed or evolved as you have explored the problem further?
- What is the workflow for a Python developer to get started with Earthly?
- How can Earthly help with the challenge of managing Javascript and CSS assets for web application projects?
- What are some of the challenges (technical, conceptual, or organizational) that an engineer or team might encounter when adopting Earthly?
- What are some of the features or capabilities of Earthly that are overlooked or misunderstood that you think are worth exploring?
- What are the most interesting, innovative, or unexpected ways that you have seen Earthly used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Earthly?
- When is Earthly the wrong choice?
- What do you have planned for the future of Earthly?
Keep In Touch
- @VladAIonescu on Twitter
- Website
Picks
- Tobias
- Shape Up book
- Vlad
- High Output Management by Andy Grove
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
Links
- Earthly
- Bazel
- Pants
- ARM
- AWS Graviton
- Apple M1 CPU
- Qemu
- Phoenix web framework for Elixir language
The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA
Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling powered by the battle tested Linode platform, including simple pricing, node balancers, 40 gigabit networking, dedicated CPU and GPU instances, and worldwide data centers.
Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. Your host as usual is Tobias Macey. And today, I'm interviewing Vlad A. Ionescu about Earthly, a syntax and runtime for software builds to reduce friction between development and delivery. So, Vlad, can you start by introducing yourself?
[00:01:09] Unknown:
Yeah. Sure. I'm the founder and creator of Earthly. I'm also the founder of ShiftLeft, a code analysis company. Before ShiftLeft, I was at Google and at VMware as an engineer. And I also worked on RabbitMQ in its early days.
[00:01:26] Unknown:
Yeah. Rabbit's definitely a very interesting and ubiquitous project, although it is starting to be edged out by Kafka and Pulsar and the like. That's right. It's been a decade since I worked on RabbitMQ, but those were really fun times for sure. Absolutely. Yeah. Erlang projects are always interesting in terms of just their resiliency and the locations where they get deployed.
[00:01:49] Unknown:
Oh, yeah. Oh, yeah. It's big in telco. They've always, you know, prided themselves on creating systems with nine nines of reliability, which sounds like crazy talk these days, I guess. But in many telco applications, that's actually required.
[00:02:04] Unknown:
Alright. And do you remember how you first got introduced to Python?
[00:02:08] Unknown:
Yes. So Python was maybe my, let's see, third or fourth language. I was getting my first job in the UK, and, basically, I was taking this job to pay my rent because I was a student. So I was living, you know, in student accommodation and doing this part-time job while I was going to courses. As I got this job, my boss there, Matthias Radestock, the CTO of, basically, the company that created RabbitMQ, said I should learn more modern languages. I was coming from a C++ background, and you can imagine, you know, C++ is this older language that, by many standards, is not as modern anymore.
And that was when I started to look into Python and really try to understand it. I did some tutorials, but no real work just yet. A few years later, when I was trying to build my first startup ever, I picked up Python because it felt like it was just so easy to do everything. You can imagine coming from C++, with no libraries, no batteries included, nothing, and all these edge cases and bugs and difficult things that C++ makes you deal with. Then coming to Python and having, you know, Django and all these really powerful web-specific tools.
You know, I felt like I was Superman, basically. So it was a really nice experience. I've used Python on and off for, I don't know, more than a decade now. I think it serves well in so many cases.
[00:03:47] Unknown:
As you mentioned, you have a background in starting different companies, and the most recent one now is Earthly. I'm wondering if you can describe a bit about what it is that you're building there, some of the story behind how it came to be, and why this is the problem that you decided is worth spending your time and energy on.
[00:04:06] Unknown:
Yeah. Absolutely. So Earthly is sort of the result of me being a software engineer and not having the right tools for the job. Right? I experienced how builds work at Google pretty early in my career, and I've seen how powerful a well-built build system, and build automation in general, can be for engineering productivity. As I came out of Google, I was just waiting for something to come along and give me the same benefits I had seen in the internal system at Google, called Blaze. And I just couldn't believe for the longest time that Jenkins is the best the world can give us. You know? Everyone was saying open source is always the best and so on, but in the world of builds, it just wasn't the case.
Later on, Google did open source their internal system Blaze; outside of Google, it's called Bazel. But the pain with Bazel these days is that it requires you to take on this very different, sort of alien world of tooling that the open source ecosystem isn't quite used to. You have to replace, say, your Gradle scripts, your Python-specific scripts, and so on, and really buy into Bazel with everything that it has. And that is a big, big ask of the developer. So with those ideas in mind, we thought about how we could bring some of the benefits of Bazel, like remote execution, high parallelism, and heavy caching, without any of the drawbacks.
What that meant was that you had to be able to use your existing scripts with Earthly, and you would still get caching benefits similar to Bazel's. It also had to be very consistent, so that when you run the build on one computer, it behaves exactly the same as when you run it on another computer or in CI. This is a property that we call repeatability, and it's the core of our design process: the idea that no matter where you run the build, it'll behave the same. That takes down collaboration barriers between teams.
It takes down "works on my machine" issues and so on, and it also allows you to reproduce any failure that you might have in CI. If something fails in your CI and there's no way to reproduce it on your laptop, you're gonna have a really long day. You're gonna do git commit over and over again until you get that CI fixed. And your build might take 40 minutes, which is not abnormal. It shouldn't be that long, but it's very common, actually. Then you might be waiting 30 minutes or so on every attempt just to get to the line that is failing, over and over again, while you throw other things at it. That can be a super time-consuming process. This ability to reproduce failures on your laptop is the number one thing that we're going for, and today it is the number one reason people use Earthly.
[00:07:16] Unknown:
On your mention of Bazel, I know that there are other tools out there, such as Buck, and the one that comes closest to home for the Python community is Pants. I'm wondering what your thoughts are on the level of conformity you need to bring to Pants to make effective use of it, versus the types of use cases that Earthly is well suited for, and what sort of coexistence you might find between those two tools?
[00:07:44] Unknown:
I can't speak for Pants specifically because I've not used it, so I'll put that out there. But in general, Bazel-like tools have the same drawbacks. Of course, once you have adopted the system and it's working for you, you are in heaven. You know, your productivity goes through the roof and so on. But the challenge for many of these systems is just adopting them. We've also seen people use Bazel in conjunction with Earthly. The reason they might do that is that there are certain things you have to do as part of CI that Bazel doesn't quite let you do, or that are just too difficult to achieve that way. That's where you need some sort of system that lets you glue things together, and that's what Earthly is really great at. So if you want a Bazel- or Pants-based system, you can have that for specific languages.
But once you have to write more scripts around them, for example for releases, or for putting together other ecosystems that are not Bazel- or Pants-related, then maybe you need a glue layer. And Earthly, in some cases, is really great as a glue layer.
[00:09:01] Unknown:
And so digging more into the problems of software build and delivery, what are some of the core principles that you've found engineers should really be thinking about when they're designing that overall process and workflow, and what complexities come about as they figure it out? Say I've got these, you know, five files that do the thing that I want to do. I've got a basic web application, and now I need to get it into production. What are all the things I need to be considering? And particularly as the system evolves and grows more complex, how are they going to start encountering its incidental and evolutionary complexity?
[00:09:37] Unknown:
Yeah. Where to begin? Right? This is a pretty meaty topic. But I will say some of the design principles we think are important are, basically, that the team should be able to contribute to the build, the whole team, not just one or two people. Sometimes the scripts give you too much freedom, you know, with Makefiles or Bash scripts and so on, and the more freedom you get, the more dangerous the script feels. I always say that Bash is the only language that allows you to shoot yourself in both feet with a single bullet. A lot of times, people will feel uncomfortable contributing to those kinds of scripts because they're sort of dangerous.
So having a really easy-to-understand syntax, a safe syntax that people can dabble and play around with, I think is really important. The other principle we're designing for is the idea that you should be able to unblock your team very fast. Say you have a failure in your CI, and now maybe a significant part of your team is unproductive because the integration test is not working, or something they need to test is not easily accessible because of this other blocker. So optimizing for fixing CI issues is also a significant consideration to take into account.
And I think this goes back to the repeatability idea that I mentioned. If you have a good way of quickly finding the root cause, or of reproducing the failure in an environment that you can easily control, that goes a long way toward improving the MTTR of the CI.
[00:11:24] Unknown:
Yeah. CI/CD is definitely a large, complex, and constantly evolving topic. In that light, what are the components of the overall CI/CD process that Earthly aims to be part of or potentially replace?
[00:11:42] Unknown:
Well, a lot of what CI entails these days relates to the experience in a web app. Right? A lot of the CI that you consume today is through a web app. As we started Earthly, we also thought about the idea: what if builds were more accessible from your terminal? Right? That's where a lot of people work, the place where they feel most at home. What would it mean if everything was accessible through the terminal? The things we address today are basically the process of you working with your code on your computer, for the most part, and iterating on your code in development.
And then there's a part that we do which sits on top of your existing CI and runs builds consistently for you. So when a build does fail in CI, there's a single command that you run on your computer that will reproduce that failure with a certain degree of consistency. Of course, CI/CD is broader than that. There's also the push into production and so on. One of the things we're seeing from our users is sort of strange, but also natural; we felt like it was bound to happen. People are starting to use Earthly to push from their own computers, which is something you don't normally do. So far, many people have been using CI to do pushes to production, mainly because of the guarantee of consistency that environment gives you. There's nothing specific to the developer in that environment, and there's no surprising configuration that can come up in that process.
But because Earthly gives you that layer of consistency, we're seeing people become comfortable with pushing from their own computers. Again, the developer's configuration doesn't influence the pipeline, so it's actually safe to do that. So, you know, why not? The fact that it runs on your computer is just an implementation detail; you're getting the same effect, which is a consistent push. But, yeah, we have plans for the future beyond what we do today. I'm not allowed to talk about everything that we'll do, but just a hint for the audience: we're gonna build something really exciting in this space, and we think it's gonna be, you know, the biggest shift in the industry since Jenkins. That's all I'm gonna say about that topic for now.
[00:14:00] Unknown:
Well, that's definitely quite the claim to make, so I'm excited to see how well you deliver on it. Going more into the CI/CD question, there are a couple of interesting things to dig into. One of the points you mentioned is the risk of the developer's environment leaking into the build context, and how the CI framework or CI execution environment is seen as this trusted location where everything is as it should be, so that is what we use to deliver into production. The other thing you hinted at was the question of pipelining builds and delivery. So I'm interested in exploring Earthly's capabilities in both of those regards: being able to manage this isolated build environment so that you can be sure there is trusted execution, and you don't have to worry about whatever version of Python, environment variables, or specific differences in the requirements file might exist on your laptop.
And then also the capability of doing pipelined builds, saying there are multiple stages to the execution that I need to make sure are run in a particular order, and how well is Earthly able to support those different concerns?
[00:15:18] Unknown:
Yeah. Exactly. So, like you mentioned, every automation system comes with its own variance. Right? What we're trying to do with Earthly is eliminate as much variance as possible. It's not 100% deterministic, and I think that's an important distinction to make. We're not reproducible; we're just repeatable. We make that distinction because the outputs of Earthly will not be byte-for-byte identical. There might be timestamps in there, or slight differences that are typically inconsequential. We made a deliberate choice not to enforce reproducibility in its pure form upon the user. The main reason is that making a lot of traditional tooling work that way requires extra configuration, or it's simply incompatible with that world, and we did not want to limit the user. At the end of the day, the user needs to ship real apps to production.
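To make the repeatable-versus-reproducible distinction concrete, here is a minimal Python sketch (not Earthly code) of how a timestamp alone breaks byte-for-byte reproducibility. It packs the same file into an uncompressed tar archive twice; `build_archive` and `digest` are hypothetical helper names, and pinning every entry's mtime is just one common remedy, in the spirit of the `SOURCE_DATE_EPOCH` convention from the Reproducible Builds project. By the account above, Earthly deliberately does not force this kind of normalization on its users.

```python
from __future__ import annotations

import hashlib
import io
import tarfile


def build_archive(files: dict[str, bytes], mtime: int) -> bytes:
    """Pack files into an uncompressed tar archive, stamping every entry with mtime."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(files):  # fixed entry order, independent of dict order
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = mtime  # the only input that varies between the builds below
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def digest(blob: bytes) -> str:
    """Hex SHA-256 of a build output."""
    return hashlib.sha256(blob).hexdigest()


sources = {"app.py": b"print('hello')\n"}

# Same inputs, built one minute apart: the bytes differ because of the timestamp.
first = build_archive(sources, mtime=1_600_000_000)
second = build_archive(sources, mtime=1_600_000_060)
print(digest(first) == digest(second))  # False

# Pinning the timestamp makes the two outputs byte-for-byte identical.
pinned_a = build_archive(sources, mtime=0)
pinned_b = build_archive(sources, mtime=0)
print(digest(pinned_a) == digest(pinned_b))  # True
```

Both unpinned archives contain exactly the same file, which is why a "repeatable" build can still be perfectly usable even when its hashes differ.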
And sometimes in the real world, you make imperfect trade-offs in order to deliver on time and so on. But Earthly does provide you with some level of consistency, and as far as we know, it's probably what most people need anyway. So what's the variance that we eliminate? We eliminate, for example, the flavor of the operating system, the dependencies you might have installed on your computer that may or may not interact with the build, and any programming-language-specific tooling, like which version of Python you have installed, which version of pip, and so on. And then you might also have environment variables, local configuration, and so on.
We make all of that uniform across the different environments by containerizing the build. Right? The factors that typically affect your build, create differences, and prevent you from reproducing a failure from CI, or one that a colleague is getting on their laptop, are flattened by the use of containers. And that's the core of our design.
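The effect of containerizing a build can be loosely illustrated with a much weaker analogue: running a build step in a child process whose environment is spelled out explicitly instead of inherited from the developer's shell. This sketch covers only environment variables, not the OS flavor or installed tools that a container also pins, and the `BUILD_MODE` variable and `run_step` helper are hypothetical.

```python
import os
import subprocess
import sys

# Hypothetical build step: its behavior depends on an environment variable.
STEP = "import os; print(os.environ.get('BUILD_MODE', 'default'))"


def run_step(clean: bool) -> str:
    """Run STEP in a child interpreter, optionally with an explicit, minimal environment."""
    if clean:
        # Only what the build definition declares; nothing leaks in from the caller.
        env = {"PATH": os.defpath, "BUILD_MODE": "release"}
    else:
        env = dict(os.environ)  # inherits whatever happens to be set on this machine
    result = subprocess.run(
        [sys.executable, "-c", STEP],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Simulate a developer who exported a stray variable in their shell.
os.environ["BUILD_MODE"] = "debug"
print(run_step(clean=False))  # debug   (leaked from the local shell)
print(run_step(clean=True))   # release (pinned by the build definition)
```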
[00:17:32] Unknown:
In terms of the challenges that folks run into when they're designing their build process, beyond just the variance in environments between the developer's local machine and the CI execution environment, what are some of the other complexities or conceptual issues they run into, or just a lack of education about proper build and delivery, and are any particularly relevant to the Python ecosystem? And I guess the other question is whether you've found these challenges to be mostly aligned with the language, versus the specific type of application, such as a web app versus a mobile app versus embedded software or a data science project?
[00:18:24] Unknown:
Well, one category of issues is what I mentioned: the ability of the team, and not just one person, to contribute to the build process. We see this over and over again in many teams: there is this person that we call the build guru. This is someone who wasn't actually hired to do builds, but for one reason or another, in an ad hoc manner, has just taken on this responsibility, and we see this person put together the entire build. It becomes very complicated to achieve some of the same things you would get with Earthly directly, like containerizing your build or contributing across repositories. Some of these things you have to build in house, and they get pretty fiddly with traditional technologies.
And so that, again, prevents other people from contributing to the build. It's just so difficult to understand. Another class of problems, which isn't really Python-specific, is that whenever multiple ecosystems come together, say a Python team working with a Java team, or a Python team working with a JavaScript team, it's always hard to build the other team's code and put everything together in an integration test. One of the reasons is that you have to install the tooling, configure it, and so on. Another is that the individual members of each team aren't familiar with the other ecosystem.
So the simple act of setting up your environment could take a whole day. And I think the worst part about bad builds, the most common problem, isn't what you might think, that something catastrophic happens in production. Of course, it's bad to have an outage or two, and that could definitely be caused by builds. But I think for many teams, it's far worse to have a team that is unproductive all the time than to have a couple of outages here and there. Builds that are difficult to contribute to create barriers between teams, or rather, those barriers are the default mode of operation. Right? Most language-specific ecosystems encourage teams to use their language-specific tools, which other teams are not familiar with. All of this combined creates barriers between teams. Being a manager of engineering teams myself, I've always seen siloing, or cross-team collaboration, as the number one problem you have to solve to get everyone working together.
And if you have a good tool to address that, I think that's a huge win for your team, for your productivity, and for your organization.
[00:21:13] Unknown:
Digging more into the Earthly project itself, I'm wondering if you can talk about some of the ways that it is designed and implemented, and how you thought about the syntax for expressing the logic of how these different builds should be executed.
[00:21:31] Unknown:
So we typically describe Earthly as if a Makefile and a Dockerfile had a baby. The syntax looks very familiar. We're learning from the best things that Makefiles and Dockerfiles gave us, taking some of the best ideas and hopefully none of the difficulties that sometimes come with Makefiles or similar technologies. As an example, I've worked with Makefiles a lot, and I've had teammates who, for the longest time, just didn't realize you have to put a tab in front of your recipe lines in Makefiles. It's such a small gotcha, and it's just one example of how Makefiles can be pretty difficult to adopt as a newcomer. So we decided on a syntax that reads kind of like English for the most part. We don't have tricky symbols.
If we do, they're very well thought out. But I'm biased. The idea is that everyone should be able to read the build file and contribute to it with ease. If you're coming from Dockerfiles, the syntax will look really, really familiar. We're basically extending that syntax and making it more powerful. With Dockerfiles, you can only build Docker images, but with Earthly, you can build not just images; you can run unit tests, meaning execution that doesn't necessarily have an output like an image. Or you can build binaries, you can build regular files as artifacts, or you can push to production. These are the typical things you might do with Earthly, and that's what we designed for.
In terms of the implementation, we use BuildKit. This is a project created by the Docker team, and they use it to power the way Dockerfiles are built. It's the new generation of the Docker build engine; before that, I think it was called the Docker builder or something along those lines. The new generation runs in parallel and has really strong caching primitives, like remote cache and shared caching. You can basically use any Docker registry as a cache repository as well, among all kinds of really interesting features. But, basically, we're using BuildKit as the execution engine under the hood.
And, really, we're creating a graph of the build with all the dependencies between the different parts of the build, a directed acyclic graph, or DAG, if you will. BuildKit figures out which parts can run in parallel and which parts cannot. Of course, many people who are familiar with Makefiles and make's parallel option, make -j, probably know that it's so hard to get right that most people don't actually use it. The reason is that Makefiles basically reuse the environment, the current directory, for every target they run, so the targets end up stepping on each other's toes every time they execute. With Earthly, by contrast, it's kind of like every target in the Makefile is containerized.
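As a rough illustration of how a build DAG exposes parallelism, this sketch uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to group hypothetical targets into stages whose members do not depend on each other. It is not how BuildKit schedules work; the target names and the `parallel_stages` helper are invented for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical build graph: each target maps to the targets it depends on.
BUILD_GRAPH = {
    "deps": set(),
    "protos": set(),
    "lint": {"deps"},
    "unit-test": {"deps", "protos"},
    "image": {"deps", "protos"},
    "integration-test": {"image"},
}


def parallel_stages(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group targets into stages; all targets within one stage could run concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every dependency of these is already done
        stages.append(ready)
        sorter.done(*ready)
    return stages


for number, stage in enumerate(parallel_stages(BUILD_GRAPH)):
    print(number, stage)
# 0 ['deps', 'protos']
# 1 ['image', 'lint', 'unit-test']
# 2 ['integration-test']
```

Grouping by stage is a simplification: a finer-grained scheduler can start each step as soon as its own inputs are ready instead of waiting for the whole previous stage, but it relies on the same dependency information.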
Because every target is containerized, it's fully isolated from the outside world, and there's no way for it to interact with other targets unless you explicitly tell it to. Maybe there's an explicit dependency: one artifact needs to go from one target to another. That isolation frees up the system to parallelize automatically, without any strings attached. That is one of the benefits that Earthly provides: as much parallelism as the build graph allows, which gives the user a speed benefit. Underneath BuildKit, there are typical container primitives like runc and OverlayFS, the file system that allows trees of files to be derived and extended while keeping the initial tree intact. That is how Docker layer caching works, and that's what we support in Earthly as a primitive.
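The isolation argument can be sketched without containers: give each target its own scratch directory and pass artifacts between targets only through explicit inputs. In this hypothetical example, `build_a` and `build_b` both write a file named `out.txt`, which would collide in a shared working directory the way make targets can under `make -j`, yet here they run concurrently without interfering. Earthly gets its isolation from containers via BuildKit; the temporary directories below only mimic the principle, and all function names are illustrative.

```python
from __future__ import annotations

import tempfile
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_isolated(build_fn: Callable[[Path], Path], inputs: dict[str, bytes]) -> bytes:
    """Run one target in a fresh scratch directory and return its declared artifact.

    Only the files listed in `inputs` are copied in, so a target cannot see
    another target's working files unless they are passed explicitly.
    """
    with tempfile.TemporaryDirectory() as scratch:
        workdir = Path(scratch)
        for name, data in inputs.items():
            (workdir / name).write_bytes(data)
        artifact = build_fn(workdir)
        return artifact.read_bytes()  # read before the directory is deleted


def build_a(workdir: Path) -> Path:
    out = workdir / "out.txt"  # same file name that build_b uses
    out.write_text("from a")
    return out


def build_b(workdir: Path) -> Path:
    out = workdir / "out.txt"
    out.write_text("from b")
    return out


def package(workdir: Path) -> Path:
    # Consumes only the artifacts handed over explicitly as inputs.
    out = workdir / "bundle.txt"
    out.write_text((workdir / "a.txt").read_text() + "+" + (workdir / "b.txt").read_text())
    return out


# build_a and build_b do not depend on each other, so they can run concurrently.
with ThreadPoolExecutor() as pool:
    future_a = pool.submit(run_isolated, build_a, {})
    future_b = pool.submit(run_isolated, build_b, {})
    a_out, b_out = future_a.result(), future_b.result()

bundle = run_isolated(package, {"a.txt": a_out, "b.txt": b_out})
print(bundle.decode())  # from a+from b
```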
Yeah. I think that's a high-level overview of how Earthly works under the hood.
[00:25:30] Unknown:
So as you have been exploring the space of how to build a system that makes it easier to have a consistent environment for managing software builds, what are some of the early ideas or assumptions you had that have been challenged or updated, and how has your exploration of the space influenced the design of Earthly since you first began?
[00:25:53] Unknown:
Well, from the very beginning, we wanted it to be very easy to adopt, and, again, to play into some of the existing knowledge of typical users. If you knew Dockerfiles, Earthly would be a natural step forward, for example. The one thing we hadn't really thought about from the beginning was the emergence of Apple Silicon and the ARM architecture growing into a strong player. Right? Last year, the M1 laptops came out, and AWS Graviton instances have been around for a while. For anyone who doesn't know, AWS Graviton is the ARM-based line of compute that Amazon provides, and for many people, running on Graviton is just far cheaper because of the efficiency of the processor. So with the M1 laptops now, I think there's gonna be a shift over the next decade or so toward ARM processing, or at the very least a mix of Intel and ARM working together in production.
And not just in production; of course, they have to work together on development computers too. So we've added support for multi-platform builds as a result, and many people use it today. It's super useful. You can basically build images that run on either ARM or Intel. The same image is packaged under the same Docker tag, goes into your registry, and you can use it on either platform. Or you can have builds that build the same thing, but for your native environment. So maybe you're running Intel in production, but if it's Python, you can just run it natively on your ARM laptop and it'll behave the same. Or if, for any reason, you just cannot create an equivalent native execution of a certain process, you can still emulate it. For those scenarios, Earthly uses QEMU.
It's spelled Q-E-M-U. It's an emulation layer that can pretend to be a different processor and run binaries built for that processor. That way, you can have builds where, for example, some parts build on Intel and some parts build on ARM, and then you put it all together and it still makes sense. It still executes consistently across environments. That's probably the main thing we've designed into the system after the fact.
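Multi-platform images work because a single tag can point to an image index (a "manifest list") with one entry per platform, and the client picks the entry that matches its host. The sketch below mimics that lookup; the index follows the general shape of an OCI image index, but the digests are fake placeholders and `select_manifest` is a hypothetical helper, not part of any registry client.

```python
# Loosely shaped like an OCI image index ("manifest list"); the digests are fake placeholders.
IMAGE_INDEX = {
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {"digest": "sha256:1111", "platform": {"os": "linux", "architecture": "amd64"}},
        {"digest": "sha256:2222", "platform": {"os": "linux", "architecture": "arm64"}},
    ],
}

# Map names reported by platform.machine() to the architecture names used in image indexes.
ARCH_ALIASES = {"x86_64": "amd64", "amd64": "amd64", "aarch64": "arm64", "arm64": "arm64"}


def select_manifest(index: dict, machine: str, os_name: str = "linux") -> str:
    """Return the digest of the image variant matching the given host."""
    arch = ARCH_ALIASES.get(machine.lower(), machine.lower())
    for entry in index["manifests"]:
        plat = entry["platform"]
        if plat["os"] == os_name and plat["architecture"] == arch:
            return entry["digest"]
    # No native variant: this is the case where emulation (for example, QEMU) would be needed.
    raise LookupError(f"no image variant for {os_name}/{arch}")


print(select_manifest(IMAGE_INDEX, "x86_64"))   # sha256:1111 (Intel/AMD host)
print(select_manifest(IMAGE_INDEX, "aarch64"))  # sha256:2222 (for example, a Graviton host)
```

On a real machine you could pass `platform.machine()`, which reports names such as `x86_64` on Intel Linux hosts, `aarch64` on ARM Linux hosts like Graviton, and `arm64` on Apple Silicon Macs.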
[00:28:23] Unknown:
And another interesting aspect of the overall build and delivery question is what it is exactly that's being delivered, where in recent years, a lot of that has become Docker containers, but there are also, you know, binary builds of a process, or you might be building a PEX archive of a Python project, or you might need to create a Debian package or what have you. And so I'm curious how the specifics of the actual output artifact influences the ways that somebody might use Earthly or some of the ways that folks might need to think about the surrounding tooling that they're using to either scaffold Earthly or within the actual execution context that Earthly provides?
[00:29:12] Unknown:
Yeah. Exactly. Well, a lot of the modern workflows for shipping to production involve images, but that's not the whole story. Every company somewhere will have some kind of package, some kind of artifact, or it could be intermediate files, or files that are provided to AI and ML workloads, you know, data files and so on. There's just so much else out there that thinking you only ship images to production might not be complete. Or maybe it is true for your organization, but there are definitely some intermediate steps in there that you might want to really take into account. So, yeah, you're right. I think beyond just images, you will have things that are specific to one use case or another in such categories.
And one of the things that went into the design of Earthly was actually to support those use cases as well. You know? There's always some file that has to go from one repo to another, and it's really difficult if it's not supported by some language ecosystem. For example, if you have protocol buffers, you know, this format that allows you to serialize data between microservices, or when you store data in the cloud, or things like that. With protocol buffers, you have to generate language bindings for each language that you use. Right? So you might be generating language bindings for Python, but also maybe for Java if there's a Python process communicating with a Java process.
And so those bindings are artifacts. Right? And if those aren't treated like artifacts, it will become very difficult for each individual developer to get them or to refresh them. Right? You have to install protoc. You have to install the Python-specific extensions and then the Java-specific extensions. All three of these things come from different package managers. And in fact, protoc doesn't come from any package manager; you actually have to download it manually. And so these are the sorts of things that create the variance that we were talking about. You could have a different protoc version than your colleague. And now every time you run the build, maybe differences show up in your generated code that were not intended.
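To make the version-drift problem concrete, here is the kind of guard script teams often end up writing when each developer installs protoc by hand. It is a minimal Python sketch under stated assumptions: it assumes `protoc --version` prints a line like "libprotoc 3.21.12", and PINNED_PROTOC, parse_protoc_version, and check_local_protoc are hypothetical names. Building the bindings inside a container image with a fixed protoc version is what makes a check like this unnecessary.

```python
import re
import shutil
import subprocess

# Hypothetical version the whole team agrees to use.
PINNED_PROTOC = "3.21.12"


def parse_protoc_version(output: str) -> str:
    """Extract the version from `protoc --version` output such as 'libprotoc 3.21.12'."""
    match = re.search(r"libprotoc\s+(\d+(?:\.\d+)*)", output)
    if match is None:
        raise ValueError(f"unrecognized protoc output: {output!r}")
    return match.group(1)


def check_local_protoc(pinned: str = PINNED_PROTOC) -> str:
    """Fail loudly if this machine's protoc differs from the pinned version."""
    exe = shutil.which("protoc")
    if exe is None:
        raise RuntimeError("protoc is not on PATH; it has to be installed separately")
    result = subprocess.run([exe, "--version"], capture_output=True, text=True, check=True)
    found = parse_protoc_version(result.stdout)
    if found != pinned:
        raise RuntimeError(
            f"protoc {found} found but {pinned} is pinned; "
            "generated bindings may differ from your colleagues'"
        )
    return found
```

Every developer would still need to remember to run this, and it only detects drift rather than preventing it, which is why treating the generated bindings as a build artifact is the sturdier approach.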
Or it might just be very difficult for your colleagues to install all those little tools in order to contribute to the code base. So, again, these are things you could wrap in Earthly, something that consistently creates the same output, and then reuse that output, maybe across repositories. And for many traditional technologies, it's actually really hard to import things across repositories. You have to maybe create a package, or upload it to S3, or just do Git clones on your own, run the build on your own, and have more scripting around that. That is painful, cumbersome, and leads to a lot of human errors.
You can get a lot of things wrong by doing that yourself. In Earthly, we provide that out of the box. Basically, you can reference any artifact from any repository and just import it automatically. Behind the scenes, it runs the build for that artifact and provides it where you need it. And we've seen this used over and over again, for protocol buffers especially, but also for just about any in-between artifact that maybe goes to production, maybe doesn't, but that you need to have in some integration test or have to pass along between projects. And it's really cumbersome when the language-specific ecosystem doesn't give you a way to do that.
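A cross-repository reference has to carry three things: which repository, which version of it (a tag or branch, so the import can be pinned), and which target and artifact to build. The minimal Python sketch below decomposes a reference of the hypothetical form github.com/acme/protos:v1.4.0+python-bindings/gen. The repo:ref+target/artifact shape resembles the way Earthly's documentation writes remote references, but treat this grammar as an assumption; it is not Earthly's parser, and the repository, tag, and target names are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArtifactRef:
    repo: str                # e.g. "github.com/acme/protos" (hypothetical)
    ref: Optional[str]       # tag or branch; None means the default branch
    target: str              # build target that produces the artifact
    artifact: Optional[str]  # path of the artifact inside that target's output

    @property
    def pinned(self) -> bool:
        """True when an explicit tag or branch was given."""
        return self.ref is not None


def parse_artifact_ref(text: str) -> ArtifactRef:
    """Split 'repo[:ref]+target[/artifact]' into its parts (assumed grammar)."""
    project, plus, rest = text.partition("+")
    if not plus or not project:
        raise ValueError(f"expected 'repo+target' in {text!r}")
    repo, colon, ref = project.partition(":")
    target, slash, artifact = rest.partition("/")
    if not target:
        raise ValueError(f"missing target name in {text!r}")
    return ArtifactRef(
        repo=repo,
        ref=ref if colon else None,
        target=target,
        artifact=artifact if slash else None,
    )
```

Pinning to a tag rather than a branch is the design choice that keeps an import stable: a tag always resolves to the same commit, so every consumer rebuilds the same artifact.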
[00:32:50] Unknown:
In terms of the workflow of somebody who wants to adopt Earthly, particularly if they're working on a Python project for the sake of this audience, what is the process of starting to adopt Earthly and incorporating it into an existing application? And then once you've explored that, maybe what is involved in using it in a greenfield application where you're bootstrapping the whole build process with Earthly and just some of the considerations in terms of the collaboration aspect of educating your team members on its use and adoption and just some of the thoughts that go into actually making Earthly be a core part of your software delivery process?
[00:33:33] Unknown:
Well, the first thing you can do is go through our onboarding tutorial, our getting started tutorial. And we have examples for Python that cover the typical shape a Python project might have. Like, maybe you have a requirements.txt and you install dependencies, and then you maybe package an app and put it in a container. If you're coming from Dockerfiles, you might already have a Dockerfile that is executing the build for your Python image; you can reuse that in Earthly, or copy-paste and reshape it a little bit, and it'll just work. If for any reason you're not creating an image for production, again, Earthly is not just for images. You can still build Python packages that you maybe ship to another team, or libraries that you reuse in another project. Those are things we support as well. That's sort of the way you typically get started with Earthly and Python.
[00:34:22] Unknown:
For the case where maybe you're building a web application, so you have both Python dependencies and test execution, and maybe you want to package it up into a PEX archive so you have a single executable. And then you also have all of the JavaScript and CSS that go along with the web application that you need to compile through Webpack and Sass and whatever other tools you decide to use there. And then you want to either bundle that all up into a single final image so that everything's being served, you know, with the static assets and API proxying through NGINX. Or maybe you want to have two different artifacts, where one goes to a Docker container running in a Kubernetes cluster, and the other one goes up to some object storage to be delivered by a CDN.
Yes. How do you think about designing the build process for all of those concerns?
[00:35:15] Unknown:
So you might have this in multiple repos or in a larger monorepo, but what you would typically do is craft the build for each of these individual components. So maybe it's something that processes your CSS. You put that process in an Earthfile, for example. Or maybe you have something that generates HTML that is ready to go to production. Put that process in an Earthfile. Or maybe you have something that, I don't know, creates a Django app out of your Python. You put that in an Earthfile as well. And then you might want some of these things to meet in the middle in an integration test. You can actually have an Earthfile that imports all of these results from all three places, puts it all together in this integration test, and executes maybe a battery of end-to-end tests on top of that. But you might also have things like, let's say, each of these individual components could be shipped to production either together or independently.
So, again, you might use the import mechanism to package everything together as a single unit. Or you might execute pushes independently for each of these components, where you push your CSS to S3 or wherever NGINX gets its data from, or you package a full NGINX image together with those assets. So the possibilities of composing builds are endless, and Earthly really allows you to go across ecosystems. You go from the NGINX ecosystem to CSS, to front ends, to JavaScript, Webpack, and Django, and Python, and so on. And because everything runs on Linux and everything in Earthly is containerized, it will work in Earthly, and it will allow you to compose these different processes, each of which is containerized, isolated, and consistent, and then put them together for production use.
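The composition described here is essentially a dependency graph of build targets: each component is built on its own, and targets such as an integration test or a final image import the outputs they need. A minimal Python sketch of that idea follows, using the standard-library graphlib module. The target names mirror the web-app example but are hypothetical, and this illustrates the concept rather than how Earthly schedules work internally.

```python
from graphlib import TopologicalSorter

# Hypothetical targets from the web-app example; each maps to the targets it imports.
BUILD_GRAPH: dict[str, set[str]] = {
    "css": set(),
    "html": set(),
    "django-app": set(),
    "integration-test": {"css", "html", "django-app"},
    "nginx-image": {"css", "html"},
}


def build_order(graph: dict[str, set[str]], goal: str) -> list[str]:
    """Return every target `goal` depends on (transitively), dependencies first."""
    needed: set[str] = set()
    stack = [goal]
    while stack:
        target = stack.pop()
        if target in needed:
            continue
        if target not in graph:
            raise KeyError(f"unknown target: {target}")
        needed.add(target)
        stack.extend(graph[target])
    # TopologicalSorter takes {node: predecessors}; it also rejects cycles.
    subgraph = {t: graph[t] for t in needed}
    return list(TopologicalSorter(subgraph).static_order())
```

Building nginx-image only pulls in css and html, while integration-test needs all three components. Targets with no path between them, like css and django-app, are independent, which is what lets a build tool run them in parallel and cache them separately.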
[00:37:12] Unknown:
In terms of the sort of capabilities of Earthly, what are some of the things that you find people either most often overlook or misunderstand or don't execute to its fullest extent that you think are worth calling out?
[00:37:28] Unknown:
I think many people don't use this cross-repository referencing as early as they could. I've seen people use Earthly for the consistency aspect of it. And then, for one reason or another, we ended up telling them that this existed as a feature, the ability to import things across repositories. And it's actually really, really easy, and it works exactly the way you expect it to. You can pin it to tags and branches and things like that. Their mind is blown; they were already fans of Earthly, but now they like Earthly even more. So that is one feature that we've seen surprise people; it's this nice little surprise that they find out about later on. Another one is maybe the idea that, as we build for different CIs out there, every script, every YAML file or Jenkinsfile or what have you that is specific to your CI only works on that CI; it doesn't work for anything else. And a lot of people I've seen migrate to Earthfiles, and then they realize they're no longer tied to their CI, and they can migrate anywhere. You know, it's a build that actually runs anywhere.
And now that they are on Earthfiles, they've gotten new wings. They're now free to choose any other CI. And reasons outside of the build itself, like, I don't know, the way the builds or the workers are hosted, the pricing, the way secrets are kept, or any other external feature, could be reasons for which people actually switch CIs as a result. So this vendor lock-in prevention is another maybe surprising feature that Earthly gives you.
[00:39:09] Unknown:
Your point about being agnostic to the CI framework or provider also brings up the interesting question of the Docker-in-Docker problem, where Earthly uses Docker to containerize and execute these builds, but a lot of CI systems also use Docker as their execution environment. And so I'm curious what types of challenges you've run into in allowing Earthly to work well within those systems where maybe you don't have the ability to run Earthly in privileged mode to actually execute Docker, and just some of the complexities that come up with this whole explosion of containerization as the de facto mode of execution?
[00:39:55] Unknown:
We've gotten pretty good at supporting CIs, and there's some way to run Earthly in every CI. Maybe there is one that we haven't quite figured out yet, but we're working on it: Bamboo CI. It's not necessarily the most popular, but we want to cover the whole market anyway. Docker-in-Docker itself has been an evolving thing, and most CI vendors are realizing that you're not just building Docker stuff in your CI, you're also running Docker stuff, which means you have to be able to run Docker for one reason or another, you know, integration testing or whatnot. And so they've gotten much better at supporting it as a technology. In some cases where Docker-in-Docker is not supported, there's a way to use an external Docker daemon in such CIs, and that works just as well for Earthly. Yeah. It's an ongoing sort of battle with certain more obscure CIs, but there are only very few that we don't support.
[00:40:45] Unknown:
Another interesting aspect of what you're doing at Earthly is that you're building a business around it, but it's also an open source project, which always brings up the question of governance, how you manage the project road map, how you think about the division of what is open source versus what is commercial, and how to maintain that dividing line.
[00:41:15] Unknown:
Yeah. I should maybe mention, we are open, but we're not open source by the definition of the open source standard. The key difference is that we prevent a tech giant from copying us. And, you know, there are many reasons we do this, but the license we use makes the project become pure open source after three years. So that's a balance between purity and allowing us to, you know, guard our lunch, not let anyone eat our lunch. And I think that's a good way to innovate on top of such a platform. But what we've seen in terms of what is open and what is not across the different projects out there is that there's typically some kind of barrier that sits between the open version and the commercial version. And it's important for that barrier to be a real barrier, not an artificially imposed one. Right? So, for example, a bad barrier could be number of users or something. Right? That is something anyone could edit out of the code: you know, find the check for users greater than one and just remove that line or something. Right? So it has to be a real barrier that creates technical challenges of some sort.
So for example, when you look at HashiCorp Vault as a product, the free version runs on one node at a time, one machine at a time. The commercial version runs either across a cluster or across multiple regions. And from an engineering standpoint, that's a clear barrier. There's a technical difficulty to making it work at scale. And in general, most projects that are successful commercially like that have this sort of natural barrier that makes it difficult to work with them at scale. Other examples are Hadoop, which is really, really hard to deploy and manage in production, or the Confluent Kafka product, where Kafka, the streaming service, is of course sort of simplistic in terms of the primitives it provides. But, again, it can be very tricky to manage in production, and that's where Confluent is helping companies do that better.
We've seen this as a pattern. Compared to the traditional days when the only way to monetize open source was providing services, like Red Hat, you know, consultancy and things like that, those models aren't quite as appealing to investors in Silicon Valley nowadays because the margins are pretty small. But this new model, where the vendor hosts it for you and gives you extra scale, extra management, extra security, or something in that region, seems to be working really well as a recipe for commercial open source.
[00:44:02] Unknown:
In your work of building Earthly and working with the community of users and managing the business around it, what are some of the most interesting or innovative or unexpected ways that you've seen the Earthly project used?
[00:44:09] Unknown:
We've seen Earthly used in just about anything. Like, we've seen Earthly used in airport scanners, you know, the things that check your luggage for, I guess, terrorist activity or trafficking of some sort. We've seen Earthly used in a lot of open source, like Jekyll, the framework that turns Markdown into HTML, or the Elixir Phoenix framework, the main web framework of the Elixir language, or a bunch of other really interesting open source projects. We've also seen it in health care, in energy, high tech, low tech. Just across the board. There's no single type of company that stands out as the typical user of Earthly. And that is maybe one of the more surprising parts. You know, we initially designed Earthly with the goal of making it great for the back-end engineer especially, but we're seeing it used in many other areas, like embedded and front end and so on, which maybe we weren't quite planning for, but we're just happy. It's helping more people than we thought. That's probably the most surprising part.
[00:45:18] Unknown:
In your experience of building the project and the business, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:45:25] Unknown:
Well, I guess to some extent the industry already knew this, but we had to learn it, you know, as we went along: languages in general are difficult to get right. It's always very tempting to just include every single feature that the community asks for. And in the process, as a byproduct, you're creating a language that is very difficult for newcomers to understand. And so over time, we had to pace ourselves and really think deeply about every single thing that we add to the language, to avoid imposing a steeper learning curve on the initial user while still providing all the capabilities that people need in order to achieve what they typically achieve through their CI. Right?
And so in that learning process, we came up with a bunch of principles that we use when we design every new feature at Earthly. And this was actually inspired by a talk by Bryan Cantrill, a co-founder of Oxide Computer. The talk is about the platform values of Rust, or something along those lines. And the argument that he made was basically that every value you can come up with is important. There's no question about that. For example, you could imagine, what if your value was that your language was entertaining? Certainly, that's maybe a goal for some people; there are a few meme languages out there that are made for that purpose. But, really, all values are important. The important thing is which ones you prioritize for your own platform, because all of these are actually in tension.
And you have to prioritize some over the others. And so he gave the example of C and C++, where C prioritized simplicity, whereas when C++ came out, it eliminated simplicity from its set of values and instead put in expressiveness. We can see now, after the fact, where that led us with the C++ language, and how that made things more complicated and more difficult to understand. And even though it feels like C++ is newer and more powerful than C because it can do more things with the syntax, in many ways it's also a step backwards, because now the language is less readable and harder to understand. There's some magic happening behind the scenes that can be surprising.
There are a lot of bugs and so on. But this tension between values is what prompted us to think about what our values should be, and we came up with these three. Number one for us is versatility, and that is the capability of the build system to achieve anything that a build system should: every process you might have in your CI, including pushing to production, building binaries, running basically any sort of Linux executable, and fully supporting those workflows. Our second value is approachability. Approachability stands for the readability of the language and its friendliness to newcomers.
And again, it goes back to that principle that every engineer on the team should be able to easily contribute to the build, and the language should be friendly and accessible enough for that to happen. And then number three is reproducibility, this idea of consistency. And, of course, as we know, we're not fully reproducible like Bazel is, but we're striving towards that. And because of the ordering, we've put versatility in front of reproducibility, which means that if there's a need to get some kind of software built, we will allow you to build it first and foremost, rather than make Earthly difficult for you to adopt because we hold this reproducibility value dear. This is where the prioritization comes in, and it's very important for us. The fact that it's first and foremost versatile and approachable trumps, maybe, the pureness of reproducibility.
Yet we still provide you with as much as we can of what we call repeatability in exchange.
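The gap between "fully reproducible like Bazel" and "repeatable" can be made concrete: a fully reproducible build produces byte-identical outputs from identical inputs. One common way to test that stricter property is to compare content digests of two builds' output directories. The minimal Python sketch below does that; the function names are hypothetical, and it is not a claim that Earthly performs this check.

```python
import hashlib
from pathlib import Path


def digest_files(files: dict[str, bytes]) -> str:
    """SHA-256 over (relative path, contents) pairs, independent of listing order."""
    h = hashlib.sha256()
    for rel in sorted(files):
        for chunk in (rel.encode("utf-8"), files[rel]):
            # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
            h.update(len(chunk).to_bytes(8, "big"))
            h.update(chunk)
    return h.hexdigest()


def tree_digest(root: Path) -> str:
    """Digest every regular file under `root`, keyed by its POSIX-style relative path."""
    files = {
        p.relative_to(root).as_posix(): p.read_bytes()
        for p in root.rglob("*")
        if p.is_file()
    }
    return digest_files(files)


def bit_for_bit_identical(out_a: Path, out_b: Path) -> bool:
    """True if two build output directories contain byte-identical files."""
    return tree_digest(out_a) == tree_digest(out_b)
```

Timestamps embedded in archives are a classic reason two otherwise-equivalent builds fail this comparison even though they behave the same, which is roughly the space between repeatable and fully reproducible.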
[00:49:32] Unknown:
And for people who are interested in improving their ability to manage consistency between their development and build environments and who want to optimize their overall delivery workflow, what are the cases where Earthly is the wrong choice, so they might be better suited with one of these monorepo build tools such as Pants, a different type of CI tooling, or their own homegrown system?
[00:49:59] Unknown:
Today, we don't support mobile or Mac-native or Windows-native apps. Of course, we run on Mac and Windows, but we execute Docker workloads there. And, of course, if you've adopted Pants or Bazel, in many cases you're just okay with that. You know, you're already in the promised land, so to speak. You're getting the benefits of what we provide through those other tools. Of course, if you still need to adopt them, that might be very difficult for you, and that is where Earthly could be easier to implement in your system. Yeah. Those are the typical areas.
[00:50:35] Unknown:
As you continue to build and iterate on Earthly, and I know that there are certain things that you're not at liberty to say as you mentioned earlier, but what are some of the things you have planned for the near to medium term?
[00:50:46] Unknown:
Yeah. So Earthly in its current form will always be free and open. Behind the scenes, we are working on a commercial offering of Earthly in the cloud. And, again, I'm not allowed to talk too much about this, but it's something around builds, something around the cloud, and something that also involves artifacts and so on. It has a bunch of features that we've never seen before in any such product in the industry, and these features make developers so much more productive than ever before. And we're gonna launch this later this year, and we're really excited to have people on board and try it out when we do.
[00:51:17] Unknown:
Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And so with that, I'll move us into the picks. This week, I'm going to choose a book called Shape Up from some of the folks at Basecamp that talks about the ways that they think about the selection and execution of software projects. It's definitely a very interesting approach.
You know, it builds a bit on some of the agile principles without being adherent to any of the more dogmatic formulations of it, such as scrum or kanban. So definitely worth taking a look at and considering how that project management approach might fit into your own workflows. And so with that, I'll pass it to you, Vlad. What do you have for picks this week?
[00:52:09] Unknown:
Yeah. My pick is Andy Grove's book called High Output Management. If you're becoming a manager coming from an engineering background, I think this will be a great book for you. This is how I got introduced to management. It just opened my mind so much. It basically treats every management problem like an engineering problem for the most part, while educating you about the motivation behind people's behaviors and how you can motivate your team, and all that good stuff that comes with management. I've always quoted things from that book. Ever since I read it, I always underline things in there. And, yeah, it's such a reference book for me.
[00:52:41] Unknown:
Alright. Well, thank you very much for that. I'll take a look. And thank you for taking the time today to join me and share the work that you've been doing at Earthly and your thoughts on the overall space of software build and delivery. It's definitely a very interesting, complex, and constantly shifting domain. So I appreciate all of the time and energy that you've put into helping make it a bit more attractive. So thank you again for that, and I hope you enjoy the rest of your day. Awesome. Thank you so much, Tobias. Thanks for having me. Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast at dataengineeringpodcast.com for the latest on modern data management.
And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email host@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Guest Introduction
Vlad's Background and Career Journey
Introduction to Earthly and Its Origins
Comparison with Other Build Tools
Core Principles of Software Build and Delivery
Earthly's Role in CI/CD Processes
Design and Implementation of Earthly
Challenges and Evolution of Earthly
Adopting Earthly in Python Projects
Capabilities and Misunderstandings of Earthly
Open Source and Commercial Aspects of Earthly
Design Principles and Values of Earthly
Future Plans for Earthly
Contact Information and Picks