Summary
Docker is a useful technology for packaging and deploying software to production environments, but it also introduces a different set of complexities that need to be understood. In this episode Itamar Turner-Trauring shares best practices for running Python workloads in production using Docker. He also explains some of the security implications to be aware of and digs into ways that you can optimize your build process to cut down on wasted developer time. If you are using Docker, thinking about using it, or have only just heard of it, then it is worth your time to listen and learn about some of the cases you might not have considered.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
- To connect with the startups that are shaping the future and take advantage of the opportunities that they provide, check out Angel List where you can invest in innovative businesses, find a job, or post a position of your own. Sign up today at pythonpodcast.com/angel and help support this show.
- You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Coming up this fall is the combined events of Graphorum and the Data Architecture Summit. The agendas have been announced and super early bird registration for up to $300 off is available until July 26th, with early bird pricing for up to $200 off through August 30th. Use the code BNLLC to get an additional 10% off any pass when you register. Go to pythonpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.
- Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email hosts@podcastinit.com
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
- Your host as usual is Tobias Macey and today I’m interviewing Itamar Turner-Trauring about what you need to know about running Python workloads in Docker
Interview
- Introductions
- How did you get introduced to Python?
- For anyone who is unfamiliar with it, can you describe what Docker is and the benefits that it can provide?
- What was your motivation for dedicating so much time and energy to the specific area of using Docker for Python production usage?
- What are some of the common issues that developers and operations engineers run into when dealing with Docker and its build system?
- What are some of the issues that are specific to Python that you have run into when using Docker?
- How does the ecosystem for Python in containers compare to other languages that you are familiar with?
- What are some of the security issues that engineers are likely to run into when using some of the advice and pre-existing containers that are publicly available?
- One of the issues that you call out is the speed of container builds. What are some of the contributing factors that lead to such slow packaging times?
- Can you talk through some of the aspects of multi-layer packages and useful ways to take proper advantage of them?
- There have been some recent projects that attempt to work around the shortcomings of the Dockerfile itself. What are your thoughts on that overall effort and any specific tools that you have experimented with?
- When is Docker the wrong choice for a production environment?
- What are some useful alternatives to Docker, for Python specifically and for software distribution in general that you have had good luck with?
Keep In Touch
Picks
- Tobias
- Itamar
Links
- Itamar’s Best Practices Guide
- Docker
- Zope
- GitLab CI
- Heresy In The Church Of Docker
- Poetry
- Pipenv
- Dockerfile
- 40 Years of DSL Disasters (Slides)
- Ubuntu
- Debian
- Docker Layers
- Bitnami
- Alpine Linux
- PodMan
- Nix
- Heroku Buildpacks
- Itamar’s Docker Template
- Hashicorp Packer
- Rkt
- Solaris Zones
- BSD Jails
- PyInstaller
- Snap
- FlatPak
- Conda
The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With 200 gigabit private networking, scalable shared block storage, node balancers, and a 40 gigabit public network, all controlled by a brand new API, you've got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models and running your CI/CD pipelines, they just launched dedicated CPU instances. They've also got worldwide data centers, including a new one in Toronto and one opening in Mumbai at the end of the year. So go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today to get a $20 credit and launch a new server in under a minute. And don't forget to thank them for their continued support of this show. And to grow your professional network and find opportunities with the startups that are changing the world, AngelList is the place to go. Go to pythonpodcast.com/angel today to sign up. You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis.
For even more opportunities to meet, listen, and learn from your peers, you don't want to miss out on this year's conference season. We have partnered with organizations such as O'Reilly Media, Dataversity, and the Open Data Science Conference, with upcoming events including the O'Reilly AI Conference, the Strata Data Conference, and the combined events of the Data Architecture Summit and Graphorum. Go to pythonpodcast.com/conferences to learn more and to take advantage of our partner discounts when you register. And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes.
[00:01:51] Unknown:
And to help other people find the show, please leave a review on iTunes and tell your friends and coworkers. Your host as usual is Tobias Macey. And today, I'm interviewing Itamar Turner-Trauring about what you need to know about running Python workloads in Docker. So, Itamar, could you start by introducing yourself? Hi. My name is Itamar Turner-Trauring.
[00:02:12] Unknown:
I help teams using Python ship features faster, and I've been using Python for about 20 years. And most recently, I've been spending a lot of time focusing on figuring out how to really run Python applications on top of Docker in the best way possible. And do you remember how you first got introduced to Python? Yeah. It was 1999, and I was a web developer doing PHP. And I heard about this web framework called Zope, which had a lot of really neat features. It had a built in management UI. It had a built in transactional object database. So I ended up learning how to use Zope, built some websites with it, worked at a dotcom where we ran Zope. And then at some point, I decided I didn't wanna be a web developer anymore. But since Python was a general purpose language, I ended up using Python for a whole bunch of other things like distributed systems and networking and so on. And so
[00:03:11] Unknown:
now as you said, you've been working with Docker for being able to package up and deploy Python applications. And for anybody who isn't familiar with it, can you just start by describing what Docker is and some of the benefits that it can provide to this overall problem space?
[00:03:27] Unknown:
Yeah. So Docker is sort of a generic name for, and the earliest implementation of, a set of technologies for running applications in an isolated and reproducible manner. And basically, it lets you take a full file system for your application, so all the files it needs to run, basically a whole Linux distribution, plus all your code, and package it into an image. And then you can run that image, and when you run it, it will typically run in isolation in terms of the networking stack and in terms of the other processes it can see. And so, basically, it's a little bit like having a virtual machine, except unlike virtual machines, you can run a whole bunch of containers within a single Linux kernel. And so you have less overhead in terms of running lots of Linux kernels. And if you do need to access a shared file system, it's easier to get to it. But you get the benefits of the virtual machine, which is isolation
[00:04:28] Unknown:
And the images let you sort of ship everything you need to run your application's code. And what was your motivation for dedicating so much of your time and energy to the specific area of using Docker for Python usage in production?
[00:04:42] Unknown:
So I started using Docker a number of years ago, 5 years maybe? I worked on 1 of the early storage back end systems for Docker, so I had a bunch of experience with packaging applications for Docker. And at a previous job, I was packaging our code up for Docker, and I was using some of the latest features to try to get smaller images and faster builds, like multistage builds. And I noticed that even though in theory we were using some of Docker's features to make faster builds, in practice the builds were actually taking a long time. We weren't using any of the caching. Every time we built an image, it would basically build everything from scratch and ignore caching.
And so I spent a bunch of time trying to figure out why, and I spent a bunch of time then trying to get our tests to run in GitLab CI, which is where we were running them using Docker. And, eventually, I realized that, basically, there's a whole bunch of details you need to get right, and many of these details are not written down anywhere. I then spent a lot of time just doing research to see what else I'd missed. And it's just a huge amount of effort to get all this information. You have to wade through blog posts that are 4 years old and far out of date. You have to slog through a whole bunch of tutorials that are using things that are very much not best practices, things that are insecure, and that's fine for a first tutorial.
But they don't tell you that they're doing the wrong thing. So lots of people just copy them and end up shipping insecure images. It's just 1 of those areas of ops work where you have to get a lot of details right, and it's not well documented. And so I figured there was value in spending the time to do the research and then write it up, so people can learn all the details they actually need to know to make things fast and secure and maintainable.
[00:06:37] Unknown:
For the overall use cases of Docker, there's a bit of a divide between people who are using it for their development environments just to isolate different projects and not pollute their laptop or desktop with all of the different dependencies that go into being able to get an application running, and people who are using it for production environments where they need to package up all the code that developers are creating and then be able to put it on their servers for end users to be able to take advantage of. And I'm wondering if you can just talk through some of the differences in requirements as far as the level of effort and detail necessary for both of those different use cases.
And since production is your primary focus, some of the common edge cases and issues that engineers run into when trying to take advantage of the production capabilities for Docker. Yeah. And so
[00:07:28] Unknown:
if you're using Docker for a development environment, it's kinda nice for that because, for example, whether you're on Windows or Mac or Linux, you can run the exact same versions of the code. On Mac and Windows, it'll run a Linux VM behind the scenes and transparently proxy Docker commands to it. So it's good for that, but you tend to emphasize things like a fast feedback cycle for the developer. So you want, for example, code reload: if you're building a web application, every time you save a file it restarts the web application or reloads the new code. And so your Docker image will basically be hooked up to the code checkout the developer's working on and reload from that.
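As a hedged sketch of the development-oriented setup described here (the framework, module name, port, and image name are illustrative placeholders, not from the episode, and requirements.txt is assumed to include the web framework): the interpreter and dependencies live in the image, while the source tree is bind-mounted in at run time so edits show up without rebuilding.

```dockerfile
# Development-oriented image: dependencies live in the image, the source
# tree is bind-mounted over /app at run time so code changes reload.
FROM python:3.7-slim-buster
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
# Flask is just a stand-in web framework here; app.py is a hypothetical module.
ENV FLASK_APP=app.py
CMD ["flask", "run", "--host=0.0.0.0", "--reload"]
# Run with the developer's checkout mounted over /app, e.g.:
#   docker run -v "$PWD":/app -p 5000:5000 my-dev-image
```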
Whereas in production, you would actually have a particular version of the code. Another thing, which I actually wrote an article about today, is database schema upgrades. If you're developing your code, the sort of natural way to do a schema upgrade is as part of your application startup. So when you start your container, when you start your application, the first thing you do is the schema upgrade, and the next thing you do is start the application. That way you're always running matched versions of the latest code and the latest schema. And for development, that's fine. For production, that can cause problems, because if you migrate your database schema on application startup, unlike a developer environment where there's only 1 process running, in production you might have 2 or 3 or 4 or 200 processes running your application just to scale across multiple machines.
And so now you have 200 processes trying to migrate the database schema at the same time. And unless you're using some sort of database migration tool that can deal with this, this can corrupt your database, because unless you have some sort of lock, you can't really migrate your database schema concurrently. And so they do have different requirements. And there's a whole bunch of other details involved in production, like security. Security matters, but when you're doing development locally, you don't really care about security. You care about ease of use.
Having a really large image is not really a big deal necessarily because, usually, you're just mounting your code in. You're not repackaging each time, and you're not copying the image around; it's just local on your hard drive. So a few hundred megabytes more don't matter. So the ways in which you use Docker, if you want to do it right, are probably gonna be
[00:10:10] Unknown:
somewhat different between packaging for quick developer feedback and packaging for what you're gonna run in production. Yeah. There's actually a great presentation I saw a while ago called Heresy in the Church of Docker that talks about some of those differences in use case, where a lot of people will see Docker and say, oh, it's just this panacea where it solves all of my problems. I just put it into a Docker container and then ship it off to production, and I'm done. And then he talks through some of the different layers that need to go into it once you're actually in a production environment, as far as security, ensuring proper networking, ensuring, you know, distribution across multiple hosts. And so I'll add a link to the show notes for anybody who wants to watch that. But, yeah, the overall set of issues that come up when you're actually running it in production are a lot more than people might initially suspect. And I'm also curious if there's anything specific to Python itself as far as being able to build containers that are production ready that would differ from just the generic case of Docker on its own. A lot of it comes down to, for Python,
[00:11:14] Unknown:
just the specific tools that are different. If you're shipping a Go binary in a container, Go basically compiles everything down to 1 binary, so packaging up Go is very easy. Just copy 1 file, and then you're mostly done. Maybe you need to install a few libraries. For Python, you need the Python interpreter, you need a bunch of C libraries you depend on, you need all the Python libraries you depend on, and you need your code. You have different choices for a base image. And, again, these issues apply to other languages, but there are different options for Python. There are different ways that people specify their dependencies.
So there's poetry and there's pipenv and there's just plain old requirements.txt. And for those, you want to do things in different ways. And then often you want to install all your Python code into 1 thing that's easy to copy over if you're doing a multistage Docker build. You basically compile things like C extensions in 1 image, which means you have to install your compiler and so on in that image, and it's pretty big. And then you have a second image, which is your production image, and you just copy over the compiled code. So your final image that you run in production doesn't have a compiler in it. That means you're copying a bunch of files over, and Python will by default install files all over the place, so now you need to either use a virtualenv or do a user install. And so basically it's not that it's fundamentally different from many other languages. It's just another small detail and another small detail. And over time, it's a lot to keep track of and it adds up.
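Here is a hedged sketch of that multi-stage pattern (the package names, paths, and entry point are illustrative assumptions, not taken from the episode): the first stage has the compiler and builds everything into a virtualenv, and the final stage just copies that virtualenv over so no build toolchain ends up in production.

```dockerfile
# Build stage: has a compiler and headers for building C extensions.
FROM python:3.7-slim-buster AS build
RUN apt-get update && apt-get install -y --no-install-recommends build-essential
RUN python -m venv /opt/venv
# Install dependencies into the virtualenv so they live in one copyable tree.
COPY requirements.txt .
RUN /opt/venv/bin/pip install -r requirements.txt

# Runtime stage: no compiler, just the interpreter, the virtualenv, and the code.
FROM python:3.7-slim-buster
COPY --from=build /opt/venv /opt/venv
COPY . /app
ENV PATH="/opt/venv/bin:$PATH"
CMD ["python", "/app/main.py"]
```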
[00:13:00] Unknown:
And as far as the biggest pain points that you see for people who are getting started with Docker and trying to shift it into a production environment, what are some of the ones that you have found to be at least initially the most painful, and what advice do you have for overcoming those issues? Well, I guess there's
[00:13:22] Unknown:
the invisible but potentially extremely problematic pain points of security, and then there's the more obvious ones people encounter. Security is sort of a whole issue on its own, so maybe we can talk a bit about it afterwards. But putting security aside, some of the other pain points then feed into security problems. So I'll start with the ones that hit people first. Basically, when you're building a Docker image, you are installing, in some sense, an operating system from scratch and then compiling all your code and then installing your libraries. You have to install a whole bunch of stuff and build a whole bunch of stuff. And depending on how much compilation you have and how many libraries you depend on, this can end up being a fairly slow process. And then building a Docker image becomes a bottleneck in your development process. Say you have some integration tests that use your Docker image, and now before you can merge a pull request you need to build a Docker image. If it takes 10 minutes to build your Docker image, then you've added 10 minutes to your feedback loop for catching bugs, and that can end up being very expensive, especially as the size of your team grows. And then, maybe less of an immediate concern, but it just feels wrong: it's very easy to end up with really giant images. And if you have, like, a gigabyte image, it just takes a while to download, so upgrades are slower, it uses bandwidth, it's not ideal. And there are ways you can deal with the slow builds, and basically Docker's build system has caching, but that then starts introducing security problems.
So basically, the way the caching works is you have a series of steps you're doing to build your Docker image. You might say, I'm starting with the Ubuntu 18.04 image, and that's step 1. And then step 2, I'm gonna install these packages. In step 3, I'm going to copy my code in. Step 4, I'm gonna build and install it. And since typically only your code is changing, you cache the earlier steps so you don't reinstall the Ubuntu packages you need every single time. And the problem there is you can be 2 months in, and now you haven't updated any of the Ubuntu packages that you're running in production, and so you haven't gotten any security updates. So you've gotten fast builds, but at the cost of not getting security updates. And so you need a process in place to rebuild your image from scratch without caching, let's say, once a week or whenever there's a security update. And often you need build secrets. Another problem people often have, sort of a combination of a security and non-security issue, is that often you need access to some private, secret information to build your code. So let's say there's a private git repository you can only access if you have the right SSH key. And the Docker build is sort of an isolated file system on its own, so you have to get your SSH key into the Docker build.
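One hedged way to get a key into the build without it ending up in a layer, assuming a BuildKit-enabled Docker and a purely hypothetical private repository URL, is an SSH mount that exists only for the duration of a single RUN step:

```dockerfile
# syntax=docker/dockerfile:experimental
FROM python:3.7-slim-buster
RUN apt-get update && apt-get install -y --no-install-recommends git openssh-client \
    && rm -rf /var/lib/apt/lists/*
# Host-key setup for the git server is omitted for brevity.
# The SSH agent socket is mounted only for this step and is never written
# into a layer of the final image.
RUN --mount=type=ssh \
    pip install git+ssh://git@example.com/yourorg/private-package.git
# Built with something like:
#   DOCKER_BUILDKIT=1 docker build --ssh default -t myapp .
```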
But if you're not careful, it's very easy to end up leaking these secrets into the final image. So there's a pretty decent chance that if you go and look at a lot of Docker images that are publicly available and were built from private code, you'll find an SSH key or some other secret embedded in the image, and the people who built it didn't even know about it. And there's other annoyances. If you're not careful, it can be hard to shut down your Docker container. Like, you hit control-C, and then it'll just sit there, and then after 10 seconds it'll time out and kill it, because signal handling can be a little bit tricky to get right. And if you don't do it right, then you never get a clean shutdown. You always get, like, kill -9 after a 10-second timeout. And there's also things like, if you're running in a Docker environment, it might be slightly different than running on the host machine. So, for example, that can result in some cases in the gunicorn web server just freezing for a few seconds, because it expects certain locations in the file system to actually be a RAM file system, but within the Docker container, they're on disk.
And so any time the disk is slow, suddenly gunicorn's heartbeat system is blocked, and gunicorn just freezes. So there's just all these details that you have to get right, and some of them are somewhat unexpected. And the, I guess, last security issue that people don't often seem aware of is that you really don't want to run your Docker container as root. The default for, unfortunately, many of the official Docker images is to run as root, and they don't discourage it enough. And the problem with running as root is it makes it much easier for attackers to escape the confines of the Docker container and take over the machine that's hosting them. So, for example, in February 2019, there was an attack that allowed someone to basically escape Docker and get root on the host machine.
And running your container as a non-root user, as an unprivileged user, would have prevented that attack from happening. So for security, the sort of absolute minimum you should be doing when you're packaging stuff for Docker is making sure you don't run as root.
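Pulling those last few points together, here is a hedged sketch of what that hardening can look like (the user name, port, and app module are placeholders, not from the episode): an unprivileged user, the exec form of CMD so signals reach the server directly, and gunicorn's heartbeat directory pointed at a RAM-backed filesystem.

```dockerfile
FROM python:3.7-slim-buster
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt gunicorn
# Create and switch to an unprivileged user instead of running as root.
RUN useradd --create-home appuser
USER appuser
# Exec form (JSON array, no shell wrapper) so SIGTERM reaches gunicorn directly
# and shutdown is clean instead of hitting the 10-second kill timeout.
# --worker-tmp-dir /dev/shm keeps gunicorn's heartbeat file on a RAM-backed
# filesystem so a slow disk can't make the workers appear frozen.
CMD ["gunicorn", "--worker-tmp-dir", "/dev/shm", "--bind", "0.0.0.0:8000", "myapp.wsgi:application"]
```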
[00:18:58] Unknown:
One of the issues that you pointed out in that list is the problem of the overall speed of the Docker builds. And you've pointed out the fact that there are multilayer aspects of the container runtime, and some of that can factor into the way that the caching takes place. And I also know that some of the ways that the Dockerfile syntax works can contribute to the overall size of different layers. So I'm wondering if you have any particular advice or general resources or guidelines as to how to make best use of that layering system, and some of the potential pitfalls for people who are writing their own Dockerfiles?
[00:19:31] Unknown:
Right. So I guess we can start with a quick review of the Docker image format and then how that reflects in the Dockerfile instructions for building Docker images. A Docker image is basically a set of layers. For our purposes, you can think of each layer as a tarball: you untar the first layer, then you untar the second layer on top of it, and then a third layer on top of that. And you can't ever really delete files from earlier layers; you can only hide them. So when you delete a file in the 3rd layer, it hides it, but it's still there in the 2nd layer if that's where it was. So you have basically these stacked layers. And these layers allow you to have caching, because you have a Dockerfile, which is the instructions for creating a Docker image. Dockerfile is 1 word. And each line of instructions, to a first approximation, matches up to a layer. So first you create the 1st layer, then the 2nd layer, and the 3rd layer.
And so your instructions for building might be: start with the Ubuntu 18.04 base image, which basically says use these previous layers, which are marked as being the Ubuntu 18.04 image, as the base layers. And then run apt-get install for these packages, so you've added a new second layer. And then copy in some code from the host file system; that's another layer. And then install that Python code; that's a 4th layer. And then as part of the build process, there's caching, where it basically does a hash. If it's just a command like apt-get install, it hashes the text of the command, so if you haven't changed that line, it's considered not to have changed. And if you're copying files in, then it'll hash the files you're copying in. So if the files have changed, it'll invalidate the cache; if the files haven't changed, it'll be able to look it up in the cache.
And so as long as you have a layer in the cache that has the same hash as whatever you told it to do, it'll say, oh, I already have this layer pre-created, I'm not gonna rerun that step. And this is useful because, if you think about what it takes to install some code, you have to install a bunch of packages like a compiler and libraries and dev headers, and that takes a while. You may have to download them, you have to untar them; installing packages takes a while. And then you copy in your code, and then you install the Python libraries. That might be more compilation. And then you install your code.
And so basically all this copying and compiling and downloading and whatnot just takes time. And the caching lets you say, oh, I don't have to redo this step: the last time you built this image you already installed the same Ubuntu packages, so you don't need to reinstall them; there's already a layer that has all those files in it. Ideally you only rebuild the layers where things actually changed, so if only your code changed, you shouldn't have to reinstall any of the dependencies, because they're the same. But in order to ensure that happens, you need to make sure you copy things in and run the commands in the right order, because once you can't use a layer from the cache, all subsequent layers in your build are also invalidated, so you have to rebuild them. So say the very first thing you do is copy in your source code, and then you install the Ubuntu packages.
If the files you're copying in have changed, that invalidates the Ubuntu package install even though it's completely unrelated to your files. And so when you're writing your build instructions in your Dockerfile, you want to do things so that the minimal amount of invalidation happens. So first, you want to install the system-level packages, because they don't depend on anything else, and there's no reason to redo that every time your code changes. And then you copy in just enough of your files to install the libraries you depend on. And this is where some of the Python-specific stuff comes in. You might be copying in requirements.txt, or you might be copying in a Pipfile and Pipfile.lock, depending on how you are managing your dependencies.
And so you copy in your requirements.txt, then you install those dependencies. And now if your requirements.txt doesn't change, you won't have to reinstall those dependencies, because you'll be able to use the layer from the cache. And only then do you copy in your code, and by having that be almost the last thing you copy in, if your code changes, it only invalidates the last couple of layers.
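Put together, a hedged sketch of that ordering (the paths and package names are illustrative): system packages first, then the dependency file and dependency install, and the application code last, so that a code-only change reuses every earlier cached layer.

```dockerfile
FROM ubuntu:18.04
# System-level packages first: these change rarely, so the layer stays cached.
RUN apt-get update && apt-get install -y --no-install-recommends python3 python3-pip
WORKDIR /app
# Copy only the dependency list, then install. This layer is invalidated
# only when requirements.txt itself changes.
COPY requirements.txt .
RUN pip3 install -r requirements.txt
# Application code last: editing it only rebuilds from this point on.
COPY . .
CMD ["python3", "main.py"]
```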
[00:24:03] Unknown:
And you can reuse all the cached layers that have all the dependencies and all the system packages and so on. Another element that plays into this whole layering piece is the idea of the base image that you're working from. And I know that there can be issues as far as consistency in terms of the way people tag those base images, where the latest tag might actually not point to the same container, or even the specific versions might not point to the same container, depending on how the original creator decides to construct their workflow. And there are also issues with security problems contained in those base images that don't necessarily get addressed or that aren't necessarily obvious. And I'm wondering what your opinions and advice are as far as how to address that overall problem of where to start in terms of building your containers: is there a particular distribution that works well, or just general best practices around what to do for the base image that you're then constructing on top of? As a starting point,
[00:25:10] Unknown:
and there's always specialized cases where the situation is different. But for most people, you want as a starting point to have an image that's based on an operating system that has some level of stability, so the libraries it ships aren't going to change radically from 1 week to the next, because that just increases your maintenance burden. So Ubuntu long term support releases are a good base image in that sense, whereas Ubuntu's non long term support releases are less so, because those get updated every 6 months and suddenly all your libraries change, whereas with the long term support ones, you'll have stability for a couple of years. So you wanna start with something like that. Stable Debian releases also get supported for 5 years, I think, for the latest 1. So you wanna start with an image that's based on an operating system that's very stable.
And then you also usually want a particular version of Python. Say you're using the current long term support Ubuntu, which is Ubuntu 18.04; that ships with Python 3.6. If you want Python 3.7, it's not installed and you have to get it from somewhere else. And so to solve that problem, there's an official, quote unquote, Docker image for Python, which is basically Debian Stable, and specifically they now have Debian Stable Buster, which was released in the middle of July 2019. So it actually has very up to date packages, and it's gonna have security updates for the next 5 years.
So right now it's quite up to date, but you also can trust it to be stable. And then they install particular Python versions on top of that. So you can get, like, Python 3.5 or 3.6 or 3.7, and the way you select that is that each Docker image has the name of the image and then tags, and tags can point at different underlying images. And so when you build your image, you can refer to it in a bunch of different ways. And 1 way is you can say, I want Python 3.7 off of Buster, which is the latest Debian release. So 3.7-buster. Or if you want a slightly smaller image, 3.7-slim-buster.
We can link to this particular image in the show notes so you can see the different options. And if you link to 3.7, that means at some point there will be a new point release of 3.7, so it will go from where it is now, 3.7.4, to 3.7.5, and you'll automatically switch from 3.7.4 to 3.7.5. If you're not too worried about breakage from that minor version release, then you can do that. Or you can say, I'm gonna specifically build on 3.7.4, and then I know exactly what Python version I'm getting, with the cost that you have to notice when 3.7.5 comes out and manually change your Docker image to refer to that.
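As a quick illustration of that trade-off (the point release shown is just the one current at the time of the recording):

```dockerfile
# Floating tag: automatically picks up new 3.7.x point releases when you rebuild.
FROM python:3.7-slim-buster

# Pinned tag: stays on one specific point release until you change it yourself.
# FROM python:3.7.4-slim-buster
```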
And this gives you a decent amount of stability. If you really, really wanna be sure that the particular image you're building off of doesn't change, Bitnami also packages Python Docker images in this style. But 1 of the things they have that the official images don't is permanent tags, where they'll say, we guarantee that if you refer to this tag, we will never ever change what it points to. Whereas the 3.7.4-buster tag for the official Docker images could in theory change; in fact, it does get rebuilt every once in a while, whenever there's a new release of pip, for example. So if you're worried about a new version of pip somehow breaking your build, you can use the Bitnami images and link to 1 of their permanent tags as your base image, and then you'll know that it'll never change.
The more stability you have, though, the downside is you're not getting security updates. And so it's always best practice, when you're building the image yourself, that 1 of the commands you run is updating all the system packages. So, if you're using a Debian- or Ubuntu-based system, in addition to apt-get install, you also run an upgrade to the latest packages, just to make sure that you have the latest system packages. And, again, if you're using caching, that will only happen the first time. So it's also best practice to, once a week, or if you're paying attention every time there's a security announcement, or if you're really paranoid once a day, rebuild your images from scratch with no caching. And that way you make sure you always have the latest system packages.
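A hedged sketch of that practice on a Debian-based image (the image tag is illustrative): apply pending security updates as part of the build, and periodically rebuild with the cache disabled so the update step actually reruns.

```dockerfile
FROM python:3.7-slim-buster
# Pull in any security updates published since the base image was last built.
RUN apt-get update && apt-get -y upgrade && rm -rf /var/lib/apt/lists/*
# ...rest of the build...
# On a schedule (weekly, or whenever a security advisory lands), rebuild with
# the layer cache disabled so this step is not served from a stale layer:
#   docker build --no-cache -t myapp .
```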
And just 1 final note about choosing a base image: a lot of people recommend Alpine Linux, because Alpine Linux gives you slightly smaller images. However, I personally think it's a bad idea to use Alpine Linux. Most Linux distributions use glibc as the standard library for C programs, including Python, and Alpine Linux uses musl, which is a different libc. And it's just different. And so there's all kinds of edge cases in there that haven't been fixed, or that are just different and cause different behavior. And so you'll see reports of, like, Python crashes because it has a different, smaller stack size, or string formatting for timestamps is different, or weird edge case DNS differences. And musl has been fixed over the years, and all the known bugs that I've heard people talk about have been fixed. But someone had to notice each problem and report it, and they had to fix it. And so if you're using Alpine Linux, you're more likely to have weird, obscure production problems.
And the benefit is you save, say, 80 or 100 megabytes on your image size. And, personally, I feel that production-impacting problems, especially obscure, hard to debug ones, are a thing I don't want. I'm willing to have another 100 megabytes on my image to not worry about that. 1 of the other things that has been introduced to the overall Docker ecosystem
[00:31:56] Unknown:
in the past few years are alternative build tools that eschew the Dockerfile syntax in favor of some other way of actually constructing the containers. So I'm wondering what your opinion is of that overall trend, and if there are any particular tools that you have experience with and how they compare to the Dockerfile itself.
[00:32:16] Unknown:
It's on my to-do list to do deeper research into these, so consider this more of a preliminary take. But the problem with the way Docker works is that it's a daemon that's running on your computer, and the command line tool you use sends commands to that daemon. And that daemon is running as root. And so that results in a bunch of operational issues. In CI systems, they often restrict the way it can run, or the CI system is already running Docker as its runtime environment, so they have to run Docker-in-Docker, which adds some weirdness, because you have this daemon, which is in a container off to the side, and it can be pretty confusing. In particular, if you're using GitLab CI, you should read the Docker-in-Docker instructions they have very carefully. And CircleCI won't let you run Docker containers the same way. So 1 alternative route people have taken is, I believe, the Podman project. The Podman project basically emulates the Docker command line fairly thoroughly and completely. So it can build Dockerfiles, you can do docker run, you can do all the different commands that you can do in Docker, except it's implemented not as a daemon, but as a command line tool that just runs without a daemon. And so that makes it simpler and easier to run, and you often don't need to run as root either. And it's on my to-do list to see if that makes some of the CI build issues easier.
I believe the downside is that, whereas Docker has solutions for running on Windows and Mac, I don't think you can run Podman on Mac. Again, I have to do the research; don't take my word for it. And then there are different approaches which abandon the concept of a Dockerfile altogether. So 1 option is Nix. Nix is an attempt to make a packaging system where all builds are completely reproducible. So instead of saying, I'm gonna use this Debian package and hopefully it's exactly the same as the Debian package I referred to last time, or this Python package and hopefully it's the Python package I ran with last time, with Nix, you are guaranteed that you're getting identical inputs because of the way it works. And 1 of the things Nix can generate is Docker images. And so if you really care about reproducible builds, Nix seems like the sort of tool that is designed for that from the start and does it all the way from the level of system packages. Basically, you're not using Debian or Ubuntu packages, you're using Nix packages. And so it gives you reproducible builds end to end, and it can generate a Docker image. And I know there are also some efforts, again getting rid of the Dockerfile altogether, which emulate the buildpack concept that Heroku popularized. And so they're trying to build similar buildpack systems for Docker where you don't have to go and do everything yourself.
It sort of knows how to package things for you. And that means it takes away some control, but it also mostly just works. And then there's a 4th approach, something that I've actually tried to build. This is a product I've created, which is a template for Docker packaging for Python applications in production. And the template approach is sort of halfway between what the Dockerfile gives you, which is a huge amount of flexibility but also a huge number of places where things can go wrong and details you have to get right, and a higher level tool that restricts what you can do in some sense, because it's giving you a higher level of abstraction, and there's a reason you have to deal with all these details when you're building things. And so the idea with the template is that, for the majority of people, the majority of the time, you don't have to edit a Dockerfile. You just edit a simplified config file, and that does everything: it does all the necessary security checks and security settings, and it makes sure your images are small, and so on.
But when you hit the edge case it doesn't support, you can fall back to the lower level, more flexible thing and just edit the Dockerfile, edit the code to customize it. And there's a few other tools I'm forgetting about that people are likely to encounter. Basically, what started out as Docker is now a bunch of related semi competing technologies that approach things in slightly different ways. What are the cases where
[00:36:54] Unknown:
Docker itself is the wrong choice for a production environment? And are there any useful alternatives that you have seen that address some of the same problems that Docker is intended to cover? So for
[00:37:09] Unknown:
some applications, you're only running 1 application, so you don't care about the isolation aspects of Docker, and you already have a system with VMs. So you can just build a VM image with a tool like Packer and then run it on a machine in the cloud, like Google's cloud or Amazon's cloud, and get the same sort of benefits of isolation and reproducible builds. And potentially you can then use something like Vagrant to run the image locally. So you can get some of the same benefits without using Docker, and Docker adds some level of performance overhead. It's not that big, it's a few percent, but you basically don't have that, and so it solves that problem. If you consider Docker as a specific instantiation of a container technology, all the things we've been talking about are basically generic; Docker is a specific implementation of the idea of containers. And so there are other implementations of container runtimes, for example. There's rkt. There's 1 other; I don't remember its name.
And, typically, they can run Docker images. And so there might be operational reasons or security reasons you don't wanna run the Docker daemon, but you can run an effectively equivalent technology. And similarly, there are runtime systems that ingest Docker images, like Heroku or, I believe, Elastic Beanstalk. Their runtime is proprietary software, so we don't know what it's running. The runtime is plausibly not Docker; it's just reading the Docker image format. Or you can just build a single binary executable with PyInstaller. If you're distributing command line tools or software people run on the desktop, technically you can use Docker for that, and the idea of containers makes a lot of sense for that, but the user experience in Docker is not really designed well for command line tools.
So you probably don't want to use Docker for that. Using PyInstaller might be a better choice if you just wanna ship an isolated thing people can run, or things like Snap or Flatpak if you're distributing stuff for Linux distributions. And I guess another approach is Conda, where Conda packages its own compiler and comes with pre-compiled C libraries. And so you can basically have a runtime environment that, in terms of available libraries and binaries, is different than your host operating system, but it's not a container. It's not isolated from the host operating system. It just kinda provides a different set of compilers and libraries for you. And so you get some of the benefits of having more control over what specific libraries and packages and so on you're running, without having this isolation that might make things more difficult if you need easier access to your host file
[00:40:38] Unknown:
system. The overall space of Docker and running Python in production is pretty vast, and we've only touched some of the preliminary aspects of it here. But are there any other aspects of this topic
[00:40:52] Unknown:
that we didn't discuss yet that you think we should cover before we close out the show? Yeah. I guess the thing that I want people to pay attention to is that much of the documentation and examples you'll find for this are often not best practices. And to some extent, that's less of an issue if it means, like, your build's a little slower or your image is a little larger. But for security, that can actually be an extremely expensive problem. And so at least for the security aspects of packaging stuff for Docker, you should just make really sure that you're doing it right. Even the official Docker recommendations sort of half-heartedly say you should not run as root, and then they make other recommendations that would force you to run as root.
And you really, really don't want to run as root. It just makes the ability of attackers to escalate their access much higher. And in general, just don't assume that if you found a random Docker tutorial and went through it, you know how to build images that are good enough for production, for the security issues at least. It doesn't take a lot of work to get it right, but you do have to do that work. And the vast majority of tutorials will just
[00:42:16] Unknown:
completely gloss over all the problems you're going to have, like making sure your caching doesn't lead to insecure packages, and not running as root, and so on. Alright. Well, for anybody who wants to get in touch with you or follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And with that, I'll move us into the picks. And this week, I'm going to choose the movie Shazam. I just watched that, and it was very entertaining and amusing, and I had a good time watching it with my kids. So with that, I'll pass it to you, Itamar. Do you have any picks this week? I've been watching
[00:42:49] Unknown:
Veronica Mars, the original 3 seasons, and it's sort of interesting seeing a TV show talking about class conflict and income inequality, especially a TV show that at this point I think is 15 years old. And it's also just a fun mystery slash teenage drama combination.
[00:43:12] Unknown:
Well, thank you very much for taking the time today to join me and share your experiences and knowledge of using Docker for running production workloads with Python. It's definitely a very relevant subject and 1 that I think a lot of people will be able to benefit from. So thank you for your efforts and your time, and I hope you enjoy the rest of your day. Thank you. Thanks for having me.
Introduction to Itamar Turner Trauring and Python in Docker
What is Docker and Its Benefits
Docker for Development vs. Production
Common Pain Points and Solutions in Docker
Optimizing Docker Builds and Layers
Choosing the Right Base Image
Alternative Build Tools for Docker
When Docker is Not the Right Choice
Final Thoughts and Security Best Practices