Cryptography with Paul Kehrer

Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable.

When you're ready to launch your next project, you'll need somewhere to deploy it, so you should check out Linode at linode.com/podcastinit and get a $20 credit to try out their fast and reliable Linux virtual servers for running your app or experimenting with something you hear about in the show.

You can also visit our site at www.podcastinit.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. To help other people find the show, you can leave a review on iTunes or Google Play Music, share it on social media, and tell your friends and coworkers.

A little bit of news before we start the show. I'm also launching a new podcast called the Data Engineering Podcast, which is about modern data management and data infrastructure. So if that's something that you find interesting, then you should go check it out and let me know what you think.

Your host as usual is Tobias Macey. And today, I'm interviewing Paul Kehrer about cryptography and encryption in Python. So, Paul, could you please introduce yourself?

Sure. So I'm Paul Kehrer. I'm reaperhulk on Twitter and other services like Freenode, and I work as a principal engineer at Rackspace Hosting. For my day job, I'm involved heavily in encryption and security, which means that I spend a lot of time with the Python Cryptographic Authority writing Python software.

So how did you first get introduced to Python?

So I actually came from a Ruby background 4 years ago, and I was recruited over to work at my current employer, and they were a Python shop. And that was specifically to work on a Python project in the OpenStack community. And so I was basically crash-coursed into Python and told, oh, you need to start doing a bunch of Python work and you need to spend all your time working on security projects for that, which, of course, led into my current status, where I was looking at the ecosystem of Python cryptographic libraries and did not like what I found.

So can you share a bit of the background behind the Python Cryptographic Authority and how you got involved with it?

I guess that would have been 2013 at this point. So around August of 2013, I was looking around and not enjoying the state of cryptographic libraries in Python.

And I reached out to Alex Gaynor and David Reid, who at the time were both Rackspace employees as well, although they have both moved on since then. And they had an idea of taking a CFFI-based project, which at the time was called OpenTLS, and basically rewriting it from scratch and seeing what we could do. So from there, we said, okay, we'll create a Python Cryptographic Authority, which was just kind of an amusing name at the time to play on the fact that Donald Stufft had recently created the Python Packaging Authority, and we were going to just keep working to try and build sane cryptographic defaults and well-tested software.

Yeah. I noticed the similarity to the Python Packaging Authority, and I spoke with Donald Stufft on a past episode. So I was wondering if there was any sort of standardized naming around that where if there was a particularly significant attribute of the Python ecosystem, then somebody would create an authority around it. But it sounds like it was largely just sort of tongue in cheek.

Yeah. It was initially intended as tongue in cheek. I would say that at this point, actually, the Python something authority has become almost exactly what you described, which is effectively that someone sees a gap and they go ahead and say, well, if I'm gonna take this on seriously, maybe I am gonna name it something like that. So, most recently, I know that sigmavirus24, who's Ian... I'm sorry, Ian, I forgot your last name suddenly. But he recently started the Python Code Quality Authority to contain Flake8, pep8, and some of the other tooling that we use for doing code linting in Python.

Yeah. By having these Python blank authorities, it serves a good purpose in that a lot of times when you're looking around at packages that are available, particularly in an ecosystem as large as Python, it can be difficult to understand what the canonical or sort of most well-supported offering is for a given use case. And so by having the Python Cryptographic Authority or the Python Packaging Authority, you can look at the projects that are held under that umbrella and say, okay, well, this is the best of breed for this particular thing. And as long as the packages that are held underneath those various authorities do continue to be best of breed, then it doesn't dilute the brand, and it can serve its purpose of making it easier to understand where particularly beginners or even intermediate people should start in the process of selecting what to include in their projects.

Yeah. And I think that actually is a great thing. Although, as you noted, we have to stay best in breed. And since we're not sanctioned by any actual authority, the authority we have is derived from our name plus our track record. So, ideally, we continue to operate with a stellar track record and make people continue to believe that authority should be trusted.

So there's an adage that I've come across a number of times that you should never roll your own crypto because of the fact that there are a lot of potentially sensitive applications and serious side effects if there are bugs or exploits that are introduced into the implementation. So I'm wondering, what was the problem that the cryptography library was trying to solve that was important enough to disregard that particular warning?

Sure. So the adage to not roll your own crypto is generally a very good one, obviously. You can fail with crypto at many levels. At the fanciest level, the level you read about most often, crypto fails when people have subtle implementation bugs, like if their AES implementation has a timing vulnerability related to the fact that it uses, like, a jump table to compute what's internally called an S-box, and that can allow key leakage. Those things can get super advanced, all the way from an example like that to doing power analysis to see if, when you're doing an RSA modular exponentiation, certain operations will take more power, and that allows you to determine which code paths are being taken, which can disclose the private key. So that's, like, the most subtle level of where you can fail at implementing your own crypto.
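The AES S-box case is hard to show in a few lines, so as a loosely related illustration of the same class of bug, here is a sketch of a timing leak in a much simpler place: comparing a secret value byte by byte. The function names and tag values are made up for this example; `hmac.compare_digest` is the standard library's comparison that is designed to resist timing analysis.

```python
import hmac


def naive_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: run time depends on where the first mismatch is."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False  # returns sooner the earlier the mismatch occurs
    return True


def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Comparison whose run time does not reveal the position of a mismatch."""
    return hmac.compare_digest(a, b)


# Made-up values: a "correct" tag and a forgery that differs in the last byte.
expected_tag = bytes.fromhex("00112233445566778899aabbccddeeff")
forged_tag = bytes.fromhex("00112233445566778899aabbccddee00")

print(naive_equal(expected_tag, forged_tag))
print(constant_time_equal(expected_tag, forged_tag))
```

Both functions give the same answer; the difference is only in how long they take, which is exactly the kind of thing an attacker can measure.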

However, there's a much more common level and a much more concerning level for people in the Python ecosystem and indeed most programmers, and that's that cryptography gives you, like, building blocks. A lot of the time people like to talk about them as Legos, but I think it's more accurate to describe them as Legos with razor blades taped to them. So you can compose things and you can build beautiful, amazing things, but there's a very good chance you're going to cut yourself repeatedly while doing it.

So the reason ultimately that we created the cryptography library was to solve actually both levels of this. At the base level, PyCrypto and M2Crypto both had major issues, and those were the dominant ones at the time. PyOpenSSL as well, although we now actually possess PyOpenSSL under the umbrella of the Python Cryptographic Authority. But PyCrypto, well, it's convenient to compile because it contains all the C code that you would actually need to be able to use the cryptographic primitives. It is all bespoke C. And what I mean by that is that it was all written just for PyCrypto, which means it's had limited attention. There have been quite a few CVEs even in the recent past, all around the fact that this is not looked at that extensively. So this code has not been reviewed for those timing side channels. It hasn't been reviewed to see that it doesn't have certain types of buffer overflows. It hasn't been reviewed for all the sort of things that you would expect a crypto library that is in wide use to see review on.

And then at the other level, we care deeply about the API; we want it to be misuse resistant. And API misuse resistance, to be clear, is a very active area of research within crypto right now. No one really quite knows what that genuinely means. But as programmers, we certainly have opinions about how people do misuse things and how we can prevent them from misusing them in those particular ways.

So cryptography's structure is largely geared around the idea that what we would like to provide is recipes that solve use cases for people. And those recipes limit your choices. But by limiting your choices, they also limit your possibilities of screwing up. Of course, we also recognize that the reality of crypto is that a lot of times you're attempting to interoperate with legacy systems that did things in ways that are just super horrible. And so we also expose the layer that is dangerous. However, to make sure that it's very clear and easy to grep through your code base to see where you're doing dangerous things, we use the term hazmat.

So whenever you import those things, you're importing from something deep in the hazmat module. And so just grepping through your code base for hazmat will show you where those dangerous areas may be.

Yeah. When I was looking through the documentation, I saw that it had all of the warnings and explicit notifications that if you do this, you are taking your own life into your hands, and we are not responsible.
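The "grep for hazmat" idea can also be done from Python. The following is a minimal sketch using only the standard library's `ast` module; the helper name `find_hazmat_imports` and the sample source are assumptions made for illustration. The sample source is only parsed, never imported, so the library does not need to be installed to run it.

```python
import ast


def find_hazmat_imports(source: str) -> list[tuple[int, str]]:
    """Return (line number, module name) for every import that goes through 'hazmat'."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        else:
            continue
        for module in modules:
            if "hazmat" in module.split("."):
                hits.append((node.lineno, module))
    return sorted(hits)


# Sample code to scan; it is treated as text and never executed.
example_source = """\
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
import cryptography.hazmat.primitives.ciphers
"""

for lineno, module in find_hazmat_imports(example_source):
    print(f"line {lineno}: {module}")
```

A plain `grep -rn hazmat` does the same job in practice; the point is only that the namespace makes the risky spots mechanically findable.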

Yeah. Yeah. Namespacing it under hazmat definitely, as you said, gives a good amount of visibility and makes it so that if you are trying to refactor those pieces out or if they are intended to have a limited lifetime, then it makes it easy to target those refactorings so that you can move it over to the, quote, unquote, safer side of the library.

Yep. And that's absolutely the goal. Of course, those things become aspirations. They're not always things you achieve immediately. In the case of cryptography, one of the things we have not succeeded at yet is writing enough recipes. So we have a primary recipe which allows you to do authenticated symmetric crypto in a very safe fashion, but the nature of it, because crypto is not perfect, is that it's a one-shot encryption model. What that means is that you have to encrypt all the bytes at once and decrypt them all at once, and we won't show you any of the bytes until after we've authenticated that that data is still good.
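To make the "authenticate everything before releasing anything" behavior concrete, here is a small sketch. It is not the library's recipe and it performs no encryption at all; it only models the verify-then-release step with an HMAC from the standard library. The names `seal` and `unseal` are invented for this example.

```python
import hashlib
import hmac
import os

TAG_LEN = hashlib.sha256().digest_size  # length of an HMAC-SHA256 tag in bytes


def seal(key: bytes, data: bytes) -> bytes:
    """Prefix the whole message with an HMAC-SHA256 tag (no encryption here)."""
    return hmac.new(key, data, hashlib.sha256).digest() + data


def unseal(key: bytes, token: bytes) -> bytes:
    """Check the tag over the entire message before returning a single byte."""
    tag, data = token[:TAG_LEN], token[TAG_LEN:]
    expected = hmac.new(key, data, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed; no data released")
    return data


key = os.urandom(32)
token = seal(key, b"small payload")
print(unseal(key, token))

# Flip one bit in the last byte: the whole message is rejected.
tampered = token[:-1] + bytes([token[-1] ^ 0x01])
try:
    unseal(key, tampered)
except ValueError as exc:
    print(exc)
```

Because the tag covers the whole message, `unseal` needs all of it in hand before it can return anything, which is why a one-shot model is awkward for very large inputs.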

Now that's obviously the way you want it to work, but it means that you can't do things like encrypt 30 gigabyte files unless you have 60 gigs of RAM free.

Yeah. And stepping back a little bit, I just realized that we haven't really given a high-level overview of what exactly the cryptography library is. So if you can do a bit of a brief overview of that and also maybe a little bit of how you came by your hard-earned knowledge of cryptography and how you first got started in that field.

Sure. So, ultimately, cryptography is a library for providing encryption primitives and high-level recipes to developers in Python. It came about, and its purpose was, to provide support in a world where Python 3 and PyPy were not commonly supported. We wanted to make sure that we maintained it well. We wanted to make sure that we had top-quality implementations of algorithms, as in ones that don't have known side channel attacks and things of that nature. We wanted to have the high-level APIs, as we previously discussed, and we wanted to have modern constructs like AES-GCM, hash-based key derivation, things of that nature. Finally, we wanted to make sure that our defaults were as good as we could make them and have good testability.

On your other topic, which was, I guess, where I earned my knowledge, so to speak. So out of college, I started actually working with a certificate authority. So I spent the first 6 or 7 years of my career working on a globally rooted certificate authority and managing a public key infrastructure. So in the course of that, I learned a lot about the operational side of that, but I also continuously was required to learn more and more cryptographic defaults and where we could run afoul of various problems. And that led to kind of just a general interest. In general, most of my knowledge is self-taught, which is sometimes good, sometimes bad. Certainly, it's helpful from the cryptographic engineering side, but from the pure cryptography side, occasionally, I do have cryptographers I reach out to to explain to me papers that are beyond my understanding.

Yeah. The nature of cryptography and all of the theory involved can make it difficult to understand at the fundamental levels what's happening, because there are a lot of complicated mathematics involved and sometimes unproven theories that are put into practical use, such as the, you know, sort of prime theories that are still in active research.

Yeah. Absolutely. I mean, the cornerstones of modern cryptography right now, for asymmetric especially, are basically that the prime factoring problem is assumed to be hard, and that's the basis of why RSA is functional. And then for elliptic curves, it's the assumption that the discrete log problem is difficult. Both of those are unproven. However, we have a high degree of confidence that they are difficult, but there have been breakthroughs in the past that have made certain things easier. Right? So the current state of the art in the world of prime factoring is the general number field sieve, and the general number field sieve has seen significant advances in the last decade. So prime factoring is faster than it was before. It's getting faster faster than computers are getting faster.
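As a toy illustration of why RSA rests on factoring being hard, the sketch below uses a deliberately tiny textbook modulus and plain trial division (not the general number field sieve). All numbers are chosen for the example only, and the code requires Python 3.8 or later for the modular inverse via `pow`.

```python
def trial_factor(n: int) -> tuple[int, int]:
    """Factor n by trial division. Only feasible because n is tiny."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f, n // f
        f += 1
    raise ValueError("n is prime")


# Toy public key: the modulus is the product of two small primes.
n, e = 3233, 17

message = 65
ciphertext = pow(message, e, n)  # textbook RSA encryption, no padding

# Attacker's view: only n, e and the ciphertext are known.
p, q = trial_factor(n)
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)  # private exponent rebuilt from the factors
recovered = pow(ciphertext, d, n)

print(p, q, d, recovered)
```

With a real key the same attack works in principle; the only thing protecting the private exponent is that factoring a modulus of realistic size is believed to be infeasible.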

Yeah. And then, when quantum computing becomes a mainstream, viable option for a computing platform, then we're all in trouble. And our cryptography will most likely be rapidly broken. And we'll have to develop all new techniques for trying to survive in a world of, you know, ubiquitous quantum computation.

Yep. So, fortunately, academia is very interested in the concepts of post-quantum cryptography. PQ is the standard way they refer to that. So there are things like lattice-based crypto. Actually, Google Chrome recently ran an experiment where they did a TLS handshake doing the initial key exchange using a post-quantum algorithm called New Hope. New Hope is a variant on a ring learning with errors based lattice. So there are lots of things currently in research for what we can do for post-quantum. And the reason that people are interested in it right now, actually, is that if you assume that, say, 20 to 30 years from now, we're gonna have functional quantum computers, then there are secrets being encrypted right now that we don't want to be decryptable 30 years from now. So we need to have PQ crypto available for at least some subset of use cases as soon as we can because we want to be able to protect those secrets when those machines come online. Not to scare everybody right now, but, yeah, be aware.

So given the sensitive nature of the libraries under the Python Cryptographic Authority, what are some of the development practices that you rely on to prevent the introduction of vulnerabilities?

[...] development practices section. But one of the biggest items is, you know, this ends up sounding very, very common. Right? Like, we say, oh, we care deeply about code review. Well, we certainly do. So as a part of that, anyone who submits a PR, someone else must review it. The author is never allowed to merge their own PR. If multiple people contribute to a PR, none of those people are allowed to merge. It must be a third party on that front. On top of that, we do coverage with branches using coverage.py, obviously, and we merge that across a very large fleet. So we have over 30 different builders on Travis CI, and then we have an additional 25 or so on a Jenkins instance. So if we don't have 100% coverage, and that includes branches, and we have "no cover" pragmas for marking lines that we don't check, then it's not mergeable. And then finally, because of that giant CI system, we have a lot of tests that run a lot of permutations across a lot of different platforms.

Like, right now, we run against OS X 10.7, 10.8, 10.9, 10.10, 10.11, 10.12. We also run against 7 Linux distros, and then we run on multiple Windows versions, including obviously both the 32-bit and 64-bit variants. Anything we say is supported is a thing we actually run in our CI. So that means we also run Python 2.6, 2.7, 3.3, 3.4, 3.5, 3.6, and PyPy versions, all in our CI and all against every platform. Now that's obviously a very large combinatorial explosion when it comes to our test suite. In fact, last time I checked, I think we were somewhere north of 8,000,000 tests every single time a commit is pushed. That can sometimes take a while.

I can imagine.

Yeah. But the core of it is obviously that, yes, we're very concerned about making sure that this stuff is correct, and we don't trust it. Like, the cryptographic primitives in large part are provided by an underlying C library, but we don't trust that C library to be correct, which means that we run insane amounts of test vectors. So anytime we land any new feature, say you added a new mode to AES, a component of that would be us saying, okay, we must find a set of official test vectors that provide sufficient numbers of permutations, and that's usually doing things that try and exercise unusual corner cases of the way the underlying crypto library or crypto primitive works. So that sometimes can be a few thousand tests. In the case of GCM, which is actually surprisingly tricky to implement properly, we run over 50,000 permutations of tests against the underlying library.

Yeah. I noticed when I was going through the documentation that the primary linking for the cryptography library is against the OpenSSL library, which has had a number of high-profile vulnerabilities in recent years, including Heartbleed as being sort of the best-known one.

So I'm wondering if it's possible to swap out that back end for alternative implementations such as LibreSSL from the OpenBSD folks or s2n from Amazon.

Yes. Cryptography links against OpenSSL, as you noted. You can swap it out for LibreSSL, no problem. We do support that. It's an officially supported configuration. s2n actually leverages OpenSSL, which is always an amusing one for people. So OpenSSL is actually properly 2 separate libraries. When you build it, you get a libssl and you get a libcrypto. So the one that tends to cause us all the trouble and the heartburn is libssl. libcrypto is just an implementation of all the underlying cryptographic constructs. So that's like Diffie-Hellman key exchange, elliptic curves, RSA, AES, those sorts of things. And, let me be clear, while the API of OpenSSL is atrocious, those are highly respected implementations and generally are considered best of class, both from a security and a performance perspective. libssl has obviously had its share of challenges. So s2n is actually a reimplementation of libssl, effectively. It uses libcrypto for the underlying cryptography, but it writes its own key exchange. It does its own TLS state machine, things of that nature. Whereas LibreSSL was a hard fork of OpenSSL that allowed them to perform the sort of major surgery they wanted while maintaining almost API compatibility. Their original goal was to have API compatibility, but they have diverged somewhat at this point.

Right now, you can swap out to LibreSSL for sure. Cryptography is designed with a back end agnostic model, which is to say that we've built it where we have interfaces that you conform to, and you can actually write arbitrary back ends that use whatever crypto libraries you want.
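The general shape of a back end agnostic design can be sketched with an abstract base class. Everything named here (`HashBackend`, `HashlibBackend`, `fingerprint`) is hypothetical and is not the library's actual interface; it only illustrates the idea of calling code that depends on an interface rather than on one concrete crypto library.

```python
import abc
import hashlib


class HashBackend(abc.ABC):
    """Hypothetical interface that any back end must conform to."""

    @abc.abstractmethod
    def sha256(self, data: bytes) -> bytes:
        """Return the SHA-256 digest of data."""


class HashlibBackend(HashBackend):
    """One possible back end, implemented on top of the standard library."""

    def sha256(self, data: bytes) -> bytes:
        return hashlib.sha256(data).digest()


def fingerprint(data: bytes, backend: HashBackend) -> str:
    """Calling code depends only on the interface, never on a concrete back end."""
    return backend.sha256(data).hex()


print(fingerprint(b"abc", HashlibBackend()))
```

A second back end wrapping a different library would only need to implement the same method, and the same test suite could then be pointed at it unchanged.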

Actually, when you pip install cryptography, if you do so on a Mac, we have 2 back ends, one of which is OpenSSL and the other of which is CommonCrypto, which is Apple's underlying symmetric crypto primitives. So on that platform, you can use either.

And when there are discovered vulnerabilities in the underlying back ends, are there any mitigating factors that you put into play to try to help people avoid exposing themselves to the effects of those vulnerabilities?

It depends upon the vulnerability. In general, it's very difficult to do that for a variety of reasons. One of the first thoughts you might have is that you would just say, okay, if we detect that we're running on top of an OpenSSL that's known vulnerable, we will, you know, at least raise a user warning or something. But in practice, it turns out that distributions like Red Hat, Ubuntu, etc., they'll patch the vulnerability, but they won't rev the version because they're trying to maintain API and ABI compatibility with their existing install base. So that means that just looking at the version of OpenSSL is actually insufficient to know whether or not it actually is vulnerable to a problem. In fact, you need access to, essentially, the distro's package version, and that's not something you can access from the OpenSSL APIs.
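The limitation described above is easy to see from Python's own `ssl` module, which reports the upstream version of the OpenSSL-compatible library the interpreter is linked against. This sketch uses the standard library rather than the cryptography package, and the "fixed in" version is a made-up value used only to show why a version comparison is not conclusive.

```python
import ssl

# Upstream version of the TLS library that this interpreter's ssl module uses.
print(ssl.OPENSSL_VERSION)       # human-readable string
print(ssl.OPENSSL_VERSION_INFO)  # tuple of integers

# Assumption for illustration only: a made-up "first fixed" upstream version.
HYPOTHETICAL_FIXED_IN = (1, 0, 2)

looks_old = ssl.OPENSSL_VERSION_INFO[:3] < HYPOTHETICAL_FIXED_IN

# A distro may have backported the fix without changing these numbers,
# so looks_old being True would not prove that the library is vulnerable.
print("older than the hypothetical fix version:", looks_old)
```

Nothing in these values carries the distribution's package revision, which is the piece of information that would actually settle the question.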

The unfortunate reality of the world right now is that, in general, we have to assume that you are conscientious enough to upgrade when you need to. Now, in the case of Windows and Mac, we actually ship a statically linked OpenSSL as part of pip install cryptography, the idea being that you don't need to have a C compiler on those platforms as long as you have a sufficiently new pip. Now that means that we are the ones who are responsible for updating OpenSSL when it's out of date. So we are pretty good about that. We take at most 1 day, typically less than 1 day, to ship a new cryptography that links against the updated versions of OpenSSL.

However, there is an outstanding problem, and this is one that we've been talking about with distros and, obviously, with the PyPA as well, to say, like, when we have this sort of issue, how do we tell people that they need to upgrade their version of cryptography? Right? Like, normally, people pip install their stuff, and they go along about their day. And they don't worry about updating that software perhaps ever, but certainly not until they need to do, like, a large-scale upgrade of their environments. Right? So that's an outstanding problem, to try and educate people that you need to stay on top of your cryptographic library.

Yeah. And to that end, I'm assuming that you work fairly hard to make sure that you don't make breaking changes in the API so that it is simple for people to update and maintain version parity with what's latest?

Yeah. So in general, we have a strong backwards compatibility policy. I wouldn't say we're quite as impressive as Twisted in that regard, but we definitely have long deprecation cycles where we explicitly call out things that are deprecated, including raising warnings if you use those properties in the versions after they've been deprecated. Deprecations are somewhat rare these days. We've gotten to the point where the API is reasonably mature, and we don't really wanna make those sorts of changes at all because we're mostly happy with the way the API is constructed at the moment. So, yeah, I would definitely say that it's almost always safe to upgrade. But, certainly, if you stay on top of upgrading consistently, then you'll never be caught by surprise because you'll always see those deprecation warnings first.
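Deprecation warnings only help if you actually see them, and Python hides `DeprecationWarning` in many contexts by default. The sketch below shows one way to surface them with the standard `warnings` module. The function `old_helper` is a hypothetical stand-in for a deprecated library call, not something the library provides.

```python
import warnings


def old_helper() -> int:
    """Hypothetical deprecated function standing in for a deprecated library API."""
    warnings.warn(
        "old_helper is deprecated; use new_helper",
        DeprecationWarning,
        stacklevel=2,
    )
    return 42


# Record every warning raised inside the block, including DeprecationWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_helper()

for w in caught:
    print(w.category.__name__, "-", w.message)
```

In a test suite, running the interpreter with `-W error::DeprecationWarning` turns these warnings into exceptions, so a deprecated call fails the build well before the API is removed.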

So what are some of the testing techniques that you use to ensure the accuracy and safety of the algorithms that you're using? And you mentioned test vectors, so I don't know if you can explain in a bit more detail what that means for somebody who's not familiar with some of the basics of cryptography.

Sure. Absolutely. In cryptography, there's a few things you care about, and, obviously, one of them is, given an input of data with a key into a symmetric algorithm, do I get out the data I expect? That's effectively saying, is this algorithm written correctly? So it turns out there's ways you can write those algorithms where they look correct in most cases but aren't always correct. And so the way that most of the authors of those algorithms deal with that these days is they come up with what they call a set of test vectors. That's just a fancy name for saying, given these variables as input, here's what I expect to see as output. And so those tend to be created as text files, usually with hexadecimal encoding for each one. So you write a little state machine that can read the files in, decode the data, and then feed it through your algorithm, and then compare the output at the end, and that will tell you whether or not the algorithm itself has been written properly.
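The "little state machine" for test vectors might look like the sketch below. It uses two well-known SHA-256 vectors and checks them against `hashlib`. The `Name = hex` file layout and the helper name `load_vectors` are assumptions for this example, loosely modelled on the style of published vector files rather than copied from any specific one.

```python
import hashlib

# Two well-known SHA-256 vectors: the empty message and the ASCII bytes "abc".
VECTOR_TEXT = """\
# SHA-256 sample vectors
Msg =
MD = e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

Msg = 616263
MD = ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
"""


def load_vectors(text: str) -> list[dict[str, bytes]]:
    """Tiny state machine: collect 'Name = hex' pairs; a blank line ends a vector."""
    vectors, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            continue  # comment line
        if not line:
            if current:
                vectors.append(current)
                current = {}
            continue
        name, _, value = line.partition("=")
        current[name.strip()] = bytes.fromhex(value.strip())
    if current:
        vectors.append(current)
    return vectors


for vector in load_vectors(VECTOR_TEXT):
    actual = hashlib.sha256(vector["Msg"]).digest()
    assert actual == vector["MD"], "implementation disagrees with the test vector"

print("all vectors passed")
```

The same loader can be pointed at a different implementation of the algorithm, which is what makes a vector library reusable across back ends.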

So that's our first and biggest line of defense for everything we do inside of cryptography, which is, basically, if you claim that this is an implementation of this algorithm, then prove it. And a big part of that is also the idea that as we plug in new back ends, we can have confidence that those back ends are correct because we already have a library of many, many tests. So the moment you plug in a new back end and then wire it to our testing harness, you can have a high degree of confidence that that code is working properly.

There's 2 other things that I would say matter a lot for our purposes. One is the way that we handle errors. Right? We need to make sure that we know how the underlying library errors when bad data is passed. And so that means that we have lots of tests that test negative cases. Right? We're looking for ways to make the underlying library choke, and then we handle those cases properly. In the case of OpenSSL, it has some interesting quirks where it has, like, a thread-local error stack. And if you don't manage that error stack effectively, you can have problems. So we have a lot of testing around dealing with that sort of thing.

And then lastly, something we're kind of trying to expand going forward, but have not had time to spend a lot of effort on yet, is generative testing. If you use a tool like Hypothesis, it becomes very interesting to say, well, given this shape of data, we expect this other thing to come out the other end. And while we may not care precisely what those bytes are, because we've already verified whether or not they work via test vectors, we do care that any permutation of data we send in via Hypothesis should result in a specific length output or a specific lack of error message.
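The kind of property being described (we don't check the exact bytes, only the shape of the output and the absence of errors) can be sketched without Hypothesis, using the standard library's `random` as a simplified stand-in. Hypothesis itself generates inputs far more cleverly and shrinks failing cases; the helper name `check_digest_shape` is invented for this example.

```python
import hashlib
import random


def check_digest_shape(trials: int = 200, seed: int = 0) -> int:
    """Feed random byte strings to SHA-256 and check only the shape of the result."""
    rng = random.Random(seed)  # fixed seed so a failure can be reproduced
    for _ in range(trials):
        length = rng.randrange(0, 512)
        data = bytes(rng.getrandbits(8) for _ in range(length))
        digest = hashlib.sha256(data).digest()  # must not raise for any input
        assert len(digest) == 32, f"unexpected digest length for input {data!r}"
    return trials


print(check_digest_shape(), "random inputs checked")
```

The exact digests are already covered by test vectors; this style of test is aimed at inputs nobody thought to write a vector for.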

So Hypothesis has been very good in that regard because it allows us effectively to fuzz our underlying library.

Yeah. We interviewed David MacIver, the creator of the Hypothesis library, a few episodes ago, and it was definitely very interesting to hear about some of the different methods that he employs for being able to generate that randomized input and validate its accuracy. And since then, I've come across a number of people who have either used or want to use Hypothesis, so it's good to hear that people are actually putting it into practice.

Yeah. I mean, Hypothesis, especially in X.509, the certificate layer for cryptography, becomes very interesting because there's ASN.1 parsing all up and down inside that stack. And ASN.1 parsers are notoriously hard to get correct, and fuzzers are perfect for finding major bugs in those things.

So for somebody who's looking to select a cryptographic library to use in their project, what are some of the factors that they should be considering during that selection process?

The first thing I would say is, well, what are your requirements? And the requirements could be things like, oh, management says I have to use NIST-approved algorithms, like the National Institute of Standards and Technology. And if you need to use NIST-approved algorithms, then that constrains your choice set. Right? Also, it matters whether or not you're working in a world where you need to interoperate with other people. So if you need to interoperate with some other group or some other company, then you need to be talking to them to determine what makes sense for your purposes. Because what you think might work well might be something that's impossible for them to do in their language or so difficult that they'll have all sorts of problems.

But outside of those problems, there's kind of a few high-level questions to ask, which is, like, well, what is it exactly you're trying to accomplish? Like, are you trying to encrypt data? That's great, but encryption of data is too broad. Right? Like, do you need to encrypt and keep files safe on a file system? Do you need that data to be authenticated? In general, you should assume that encryption without authentication is almost useless, but a lot of people don't necessarily realize that. So you have to kinda go down through these pages. Say you wanna encrypt a giant disk image on a file system; do you need to be able to seek to an offset and read the data? If so, that actually makes your problem much, much more difficult. So it's very specific to the problem set of what you may need to do.

Underlying quality of the library matters a lot, and performance may matter to them as well. But the underlying quality speaks to something like, so we have a project under PyCA called PyNaCl, which is inappropriately named because, in reality, we bind against libsodium. That project uses Daniel J. Bernstein's primitives. Now djb's stuff is widely respected and actually is increasingly becoming heavily standardized throughout the IETF. But it's the sort of newer crypto that sometimes organizations aren't comfortable deploying. So while that's fantastic crypto that provides a lot of reasonably safe constructs, you may have political reasons why you can't use it.

For cryptography itself, in general, we designed that to be the Kleenex of crypto libraries. The idea being that it's the generic crypto library. It's the one you would reach for by default in most cases. It has almost all the constructs you would probably need to use, but it doesn't have some of that sexy new crypto, and it will always gain that stuff eventually, but only after it's no longer sexy. Because we want those things to be reasonably well-vetted primitives that have had a lot of cryptanalysis so that we have a large degree of confidence that people can use them safely.

Yeah. Cryptography is one of those things that everybody should want to be boring. Because when things get exciting in cryptography, it usually means that somebody's having a bad time.

Yeah. Exactly. Like, sexy new crypto is synonymous with vulnerabilities in a lot of cases.

That's not always true, but, certainly, to a first approximation, it's not unreasonable to think that way. And for somebody who wants to incorporate the cryptography library into their project, what are some of the potential pitfalls that they should be aware of? And also, how much knowledge of encryption and cryptography should they possess? Well, so as always, the more the better, but it's not required to know something about cryptography to use it. If you are a

newbie to the to the entire discipline,

then FERNET, which is our high level

symmetric encryption construct,

is

almost entirely bulletproof

within its sphere of what it can handle. Right? Again, if you try and encrypt a 30 gig file with it, you're going to have a bad time. But for small tokens and and small file encryption, then FERNET is very difficult to misuse.

So from that perspective,

even the rankest amateur would have no problems with it.
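As a rough illustration of the Fernet usage described above (this snippet is not from the interview), the sketch below encrypts and decrypts a small token with the cryptography package and shows that a modified token is rejected. It assumes the package is installed; the message contents and the position of the altered character are arbitrary choices made for the example.

```python
from cryptography.fernet import Fernet, InvalidToken

# Generate a fresh key. In a real application this key must be stored safely.
key = Fernet.generate_key()
fernet = Fernet(key)

# Encrypt a small payload. Fernet tokens are both encrypted and authenticated.
message = b"session-id=12345"  # hypothetical payload for illustration
token = fernet.encrypt(message)
recovered = fernet.decrypt(token)

# Alter one character of the token to simulate tampering in transit.
replacement = b"A" if token[20:21] != b"A" else b"B"
tampered = token[:20] + replacement + token[21:]

try:
    fernet.decrypt(tampered)
    tamper_detected = False
except InvalidToken:
    # Fernet refuses to return any plaintext for a token that fails its check.
    tamper_detected = True
```

Because Fernet holds the whole message in memory, this pattern suits small payloads, which matches the caveat about very large files.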

Once you start getting into needing to compose your own primitives to interoperate with something else, or trying to build your own asymmetric plus symmetric key model, then, yeah, you really wanna be talking to other people, asking questions, and generally being very skeptical of your own architecture. Right? You should assume that anything you're trying to build yourself is utterly broken, and you should invite others to show you how it's broken. Hubris with regard to cryptographic design is a common problem, and I would urge anyone who wants to go down this path to just be humble about what you've built and be excited when people break your stuff. Right? It teaches you something new about what you shouldn't have done.

In terms of incorporating the library: to do crypto safely, you have to do it outside of Python, which is something we haven't talked about yet, but for the moment, let's take that as read. That means, in general, you're doing it in C. And that means that if you want to distribute something that uses cryptographic libraries, you may have to force your users to have a C compiler. Now, the Python ecosystem is getting better about that. Right? So we have wheels now. And on the Linux side, we even have manylinux1, which allows us to do binary wheels for a reasonably wide variety of Linux distributions. But that's not universal yet. In the case of cryptography, we ship wheels for Mac and Windows, so you can install cryptography very simply there. But on the Linux side, you need to have libffi, the libffi development headers, OpenSSL, the OpenSSL development headers, and a C compiler. Without those, you can't do it. So that's a much larger burden than just pure Python. That's always something to keep in mind: when you pull in a cryptographic dependency, you're significantly increasing the complexity of deploying your software.

So you touched on it briefly, but,

what are some of the ways that the security landscape in Python differs from that of other languages that you're familiar with? And, also, what are some of the unique challenges that we face as users of Python?

So I can take that last one first. The most unique challenge we face as users of Python is maybe not all that unique, but instead is shared by other dynamic languages like Ruby, which is effectively that we don't control our memory allocation at the Python layer. So Python has a slab allocator, and then it gives you memory from that. And when an item goes out of scope, that gets deallocated, in PyPy via garbage collection, or in CPython via reference counting. Right? But that memory is not zeroed out, and there is no way to tell it that it must zero that memory out. So that means that you do have scenarios where you may have sensitive data in RAM that is long lived even though you want it not to be. So there's basically nothing you can do about that in Python land or Ruby land or JavaScript land unless you reach explicitly down through the FFI layer and talk to C only, which can be very challenging.
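To make the zeroing problem concrete, here is a best-effort sketch (not from the interview) that keeps a secret in a mutable bytearray and overwrites it in place through ctypes. It only helps under the assumption that the secret never existed as an immutable bytes or str object, since any such copies cannot be wiped from Python. The secret value is a made-up example.

```python
import ctypes

# Hold the secret in a mutable buffer. Immutable bytes/str objects cannot be
# overwritten, so any copy made in those types stays in memory until reused.
secret = bytearray(b"correct horse battery staple")  # hypothetical secret
length = len(secret)

# Expose the bytearray's underlying memory to ctypes without copying it.
view = (ctypes.c_char * length).from_buffer(secret)

# Overwrite the buffer in place with zero bytes.
ctypes.memset(ctypes.addressof(view), 0, length)

# Release the ctypes view so the bytearray is no longer pinned.
del view
```

This does nothing about copies made elsewhere, for example by a file read or a string conversion, which is why it is only a partial mitigation.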

So those are things that we have to deal with as Python users. We have to recognize that there are things Python's very good at, and there are things that Python's not so good at, and this is one of them. Now, in general, while this is a thing it's not very good at, it turns out that for the threat models for compromise in most applications, it doesn't really matter. If somebody is capable of reading the memory of your process, they probably have a persistent presence on the machine, and that means they could read the C memory of your process before it was zeroed anyway, and therefore it wouldn't be that hard for them to extract keys.

So this is one of those where it is a problem, but how much of a problem it is very much depends upon your domain.

As for the ways the security landscape in Python differs from that of some of the other languages: Python actually has it pretty good in terms of general security. Right? While the ssl module is not the greatest in the world, thanks to the efforts of Alex Gaynor and David Reid back when Python 2.7.9 came out, we validate host names by default.
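For readers who want to see what validating by default looks like, the sketch below (an illustration, not something discussed in the interview) inspects the context returned by the standard library's `ssl.create_default_context()`. It makes no network connection; it only reads the settings on the context object.

```python
import ssl

# Build a client-side TLS context with the standard library's defaults.
context = ssl.create_default_context()

# With these defaults the server certificate must be presented and verified,
# and its name must match the host being connected to.
hostname_checked = context.check_hostname
certs_required = context.verify_mode == ssl.CERT_REQUIRED
```

Code that builds a bare `ssl.SSLContext` by hand, or that turns `check_hostname` off, gives up these protections, so the default context is usually the safer starting point.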

We have people like Cory Benfield working hard on requests and making sure that requests has sane defaults that result in good validation of things. We have the PyCA projects providing reasonable cryptographic primitives on that front. And in general, we have a pretty vibrant ecosystem of people who care about this stuff working. Sometimes that's not true in other languages, or in other languages you end up with things living in the standard library that kind of suffer over the long term.

One thing I do very strongly believe is that cryptographic libraries do not belong in the standard library, because they need to be more agile. They need to be capable of changing. And while that is unfortunate from a stability perspective, it's the reality of the world. Right? It turns out that in the past, when we've been complacent about allowing software to be upgraded, we ended up with the debacles around TLS, where it became very difficult to deploy TLS 1.2. It took the concerted effort of multiple multi-hundred-billion-dollar companies

to actually push the state of the art forward.

And going back to what you were saying about zeroing out the RAM and not having control of the memory management in terms of ensuring that the secrets that are held in memory are properly disposed of, would something like Cython be a viable option to control that level without necessarily having to drop all the way to full C code?

That's an extremely good question, and I don't actually know the answer to that. If Cython allows you to read strings from the file system without ever actually making them into a Python object, or while making them a Python object whose memory you explicitly control, then yes. If it doesn't allow that behavior, then no, it won't help.

Well, if anybody knows the developers behind Cython, then feel free to ask them that question, and I may do so myself.

So what are some of the fundamental aspects of encryption and cryptography that you think that every developer should at least be aware of, if not intimately familiar with?

So the single biggest thing I think developers should be aware of is what good randomness is. Right? It's very common for people, especially when they're picking up a language like Python, to think that the way that they should get their randomness is by typing import random. Unfortunately, random.randrange and things of that nature are actually backed by a Mersenne Twister instance that's a global singleton. Now, that's perfectly fine and, in fact, desirable in certain scenarios. Right? randrange is great if you wanna seed your Mersenne Twister instance so that you can get, like, a randomly generated world. So if you're making a game, it's great to be able to seed your random generator explicitly so that you can have reproducible, quote, randomness. However, it's really, really bad for cryptographic purposes. So if there's one single thing I would tell developers to know, it's that when you're generating a key to be used for cryptography, that data should come from os.urandom. It should not come from anywhere else.
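The difference can be sketched in a few lines (an illustration, not code from the interview). Two seeded Mersenne Twister generators produce identical output, which is useful for a game world and disastrous for a key, while `os.urandom` draws from the operating system's cryptographic random source. The seed value and the 32-byte key length are arbitrary choices for the example.

```python
import os
import random

# Two generators seeded with the same value yield the same "random" sequence.
# That reproducibility is the point for games and simulations.
world_a = random.Random(1234)
world_b = random.Random(1234)
rolls_a = [world_a.randrange(100) for _ in range(5)]
rolls_b = [world_b.randrange(100) for _ in range(5)]
sequences_match = rolls_a == rolls_b

# Key material should come from the operating system instead.
key = os.urandom(32)  # 32 bytes, i.e. 256 bits of key material
```

Anyone who learns or guesses the seed can regenerate every value the seeded generator will ever produce, which is why it must not be used for keys.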

Other than that, on the encryption side, key management is obviously one that I would say matters a lot. Understand how to keep your keys in safe places and not reuse keys unless you need to reuse keys.

And then, kind of finally, when you're doing encryption, encryption alone is not sufficient. You need some form of authentication. If you don't have authenticated data, essentially, if you don't have some guarantee via HMAC or some sort of tag that says this ciphertext is what I expect it to be, then that data can be tampered with even when it's encrypted.
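A minimal sketch of the tag idea, using only the standard library (this is an illustration, not code from the interview): an HMAC is computed over the ciphertext with a separate key and checked before anything is decrypted. The random bytes standing in for a ciphertext and the helper names `make_tag` and `verify_tag` are assumptions made for the example.

```python
import hashlib
import hmac
import os


def make_tag(mac_key: bytes, ciphertext: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the ciphertext."""
    return hmac.new(mac_key, ciphertext, hashlib.sha256).digest()


def verify_tag(mac_key: bytes, ciphertext: bytes, tag: bytes) -> bool:
    """Check the tag using a constant-time comparison."""
    return hmac.compare_digest(make_tag(mac_key, ciphertext), tag)


mac_key = os.urandom(32)
ciphertext = os.urandom(48)  # stand-in for the output of a real cipher
tag = make_tag(mac_key, ciphertext)

# Flip a single bit in the first byte to simulate tampering.
tampered = bytes([ciphertext[0] ^ 0x01]) + ciphertext[1:]

original_ok = verify_tag(mac_key, ciphertext, tag)
tampered_ok = verify_tag(mac_key, tampered, tag)
```

In practice a vetted construct such as Fernet or an AEAD mode handles this step for you; the sketch only shows what the tag check amounts to.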

There's a very interesting and very simple attack that can be performed. If you imagine you have an encrypted payload that's a GET request, it is trivial to say, I'm gonna change that G into something else. You can change the letters G-E-T into any other three letters that you want without ever being able to decrypt it, because of the nature of the way cryptography works. You just have to XOR some values, and you're done. So that's an attack that will always work 100% of the time unless you have authenticated encryption.
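The attack can be sketched with a toy XOR stream cipher (an illustration, not code from the interview). It assumes the ciphertext is the plaintext XORed with a secret keystream, as with a stream cipher or a counter-style mode, and that the attacker knows or guesses that the message begins with GET. The request text and the choice of PUT are made up for the example.

```python
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings together, position by position."""
    return bytes(x ^ y for x, y in zip(a, b))


# The sender encrypts with a secret keystream the attacker never sees.
plaintext = b"GET /account HTTP/1.1"
keystream = os.urandom(len(plaintext))
ciphertext = xor_bytes(plaintext, keystream)

# The attacker XORs in the difference between the guessed text and the
# text they want, touching only the first three ciphertext bytes.
delta = xor_bytes(b"GET", b"PUT")
forged = xor_bytes(ciphertext[:3], delta) + ciphertext[3:]

# The receiver decrypts the forged ciphertext with the real keystream.
received = xor_bytes(forged, keystream)
```

Here `received` is `b"PUT /account HTTP/1.1"`, even though the attacker never learned the keystream. An authentication tag over the ciphertext would have caused the forged message to be rejected before decryption.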

So never encrypt without authenticating.

And if anybody wants to learn more about security and encryption, what are some of the resources that you would recommend they take a look at?

A friend of mine, lvh, wrote a free book called Crypto 101, and I believe there's a link that you're gonna be providing for that. That would be my primary suggestion for starting out. Crypto 101 is a fantastic resource. From there, you know, that'll give you a good grounding in various areas of cryptography, and you can kind of spread out and do what you're interested in. If you're interested in breaking crypto, then Matasano wrote the Cryptopals challenges, which is a set of, I think it's up to seven sets now, of cryptographic problems where you are tasked with breaking various crypto things, where there are bugs in the implementation or mistakes made, and then your job is to recover the keys or recover the message without having access to what you would hypothetically need. So that's a really, really good way to learn additional crypto, including practical cryptographic engineering concerns, because ultimately, if you're going to be a developer working in cryptography, you care more about how people might screw up than about making new cryptographic primitives.

Are there any topics that you think we should cover before we close out the show?

The only other thing I haven't covered would

be that the PyCA has kind of three major projects, which are cryptography, PyNaCl, and pyOpenSSL. And through a quirk of history, one of the main reasons that cryptography is actually based on OpenSSL is because pyOpenSSL uses the underlying bindings to do its work. So pyOpenSSL is actually a pure Python project these days. It just depends on cryptography to do all the FFI.

However, those three projects can always use additional contributors. While I say that Python in general is a healthy and vibrant community and we have lots of people who care about crypto, we do have a shortfall of people who are willing to go through the amount of rigor and effort it takes to seriously contribute to these libraries. So I would urge anyone who's interested to please reach out, file issues, ask questions, hop on IRC. We're happy to help people get up to speed. It's definitely difficult to become a major contributor, but for the sake of Python in the long term, we definitely need more people.

Alright.

Well, for anybody who wants to get in touch, I'll ask you to send me your preferred contact methods, and I'll add those to the show notes.

And with that, I'll move us to the picks. My first pick today is an email service called Migadu, which is a company out of Switzerland that provides a hosted email platform where you can host your own email domain, similar to what you can do with something like Google Apps but at a much lower cost, and it's one low fee for being able to host as many different domains as you want. So I've started using that for one of my domains, and I'm working on moving some of the other ones over. I've got a link in the show notes that'll actually get you a 10% discount on any of their plans. So if you click that and check it out, then you'll get a few bucks off from them.

And my other pick today is a board game called Castle Panic that I picked up recently to play with my son, and it's just a really fun tower defense style board game where you are working together with the other players to try and defend the castle tower against all of the monsters who are coming in from the forest. In order to win, you have to outlast all of the monster tokens, and the monsters are all trying to destroy the castle. So it's a lot of fun, has some fun strategy, and it's just an enjoyable all-around game to play.

So, with that, I'll pass it to you. Do you have any picks for us today, Paul?

Sure. So since I'm so self-absorbed, I have to hype two additional projects I have. One of which is Frinkiac, which is frinkiac.com, which allows you to search for Simpsons quotes and has over 3,000,000 screenshots that you can choose from. And then the other one would be Morbotron, which is the same thing, only for Futurama. So those are the labors of love I do on the side when I'm not working on cryptographic things.

Alright. Well, it's good to hear that you're making the world a better place as well as a safer one.

Alright. Well, I appreciate your time today. It's definitely a very interesting set of projects, one that I'm glad that I know a little bit more about, and one that I hope that other people will be able to benefit from. So I appreciate your time, and I hope you enjoy the rest of your day.

Thank you very much. Anytime.
