Digital Identity, Privacy, and Security with Brian Warner

Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable. When you're ready to launch your next project, you'll need somewhere to deploy it, so you should check out Linode at linode.com/podcastinit and get a $20 credit to try out their fast and reliable Linux virtual servers for running your app or trying out something you hear about on the show.

You can visit our site at www.podcastinit.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch.

To help other people find the show, you can leave a review on iTunes or Google Play Music, tell your friends and coworkers, and share it on social media. Your host as usual is Tobias Macey, and today I'm interviewing Brian Warner. So Brian, could you please introduce yourself?

Hi there. Yeah, I'm a software engineer based out of San Francisco. I've been doing privacy and security work for maybe 10, 15 years, pretty much entirely in Python. I started my career doing router firmware, like boot PROMs and CLI commands and forwarding tables and stuff, but switched over to doing Python relatively quickly and have been doing that ever since.

Great. And do you remember how you first got introduced to Python?

Yeah. You know, I was at this router company, and I think I must have read the release notes for Python 2.2, like, in 2001. And I remember there was a section in there about generators, and it was some example about binary tree traversal that was a lot easier to express if you had, you know, yield, and you could kind of walk everything inside a single function instead of having to accumulate everything with a callback of some sort. And I thought, this is really cool. These people are on a really good path. In 5 years from now, they're gonna be doing something really, really neat, and I wanna be ready for that. So I wanna start learning this thing now so that I can be up to speed by the time they start doing that really cool stuff.
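To make the generator point concrete, here is a minimal sketch of an in-order binary tree walk written both ways. It is an illustration only, not the example from the Python 2.2 release notes: the `Node` class and the function names are made up here, and `yield from` is a later Python 3 spelling (in Python 2.2 the recursive calls would need explicit `for` loops).

```python
class Node:
    """A hypothetical binary tree node, used only for this illustration."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def inorder_with_callback(node, visit):
    """Callback style: the caller passes a function that receives each value."""
    if node is not None:
        inorder_with_callback(node.left, visit)
        visit(node.value)
        inorder_with_callback(node.right, visit)


def inorder(node):
    """Generator style: the whole walk lives in one function and yields values."""
    if node is not None:
        yield from inorder(node.left)
        yield node.value
        yield from inorder(node.right)


tree = Node(2, Node(1), Node(3))

# Callback style needs somewhere to accumulate the results.
collected = []
inorder_with_callback(tree, collected.append)

# Generator style can be consumed lazily by any loop or by list().
values = list(inorder(tree))
```

With the generator, the caller decides how to consume the values (a `for` loop, `list()`, stopping early), which is the convenience being described.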

And they're still at it?

Yeah. It was fun. You know, I started doing this kind of pre-Buildbot thing using asyncore back in, like, 2002, and then that kind of got me into the Twisted community, and then those folks got me to start coming to Python conferences and presenting some of that stuff. And it's been such a great community to be connected to ever since then. I kinda got caught up with that momentum really early and just got swept away with it.

When I was doing the research for the show, I noticed that you were the original creator of Buildbot, which I thought was pretty interesting since I did a show about that a little while back.

Yeah. It's fun. It's been really gratifying just how many people have benefited from that. I mean, there's stuff these days like Travis that's kind of better supported, more polished in some ways. But early on, there wasn't a lot. I remember seeing Mozilla had a thing called Tinderbox, and I kinda wanted something like that but available in a different pattern where you could attach a worker box to the build master instead of the other way around. Anyway, I think that's probably ended up being one of my most influential projects. Like, more people I run into are like, "Oh, wow, you did that?" than for anything else I've worked on. So that's been a lot of fun.

Yeah. It's definitely great when you actually hear back from some of the people who are consuming what you've built. I've been able to experience some of that with the podcast, but it's definitely great too when you actually have software that people are using as well.

Nice.

So you mentioned that a lot of the work that you do is in the area of cryptography and digital privacy. So I'm wondering how you first got interested and involved in that area of computing.

Yeah. I think I was into PGP kind of back in the nineties, kind of the first cypherpunks era, and that all sounded really cool. But I didn't really have much of an opportunity to work on that stuff while I was doing things in the router field. It wasn't until I switched over to Python that I started looking at encrypted messaging systems. And I think it was in 2004 that I presented a thing called PetMail at this little conference in San Francisco called CodeCon. That was kind of my attempt to do an anti-spam slash encrypted email thing. And that got me connected to a bunch of people there that were presenting. I think Nick Mathewson was presenting Tor. There were a couple of people doing some distributed hash table, kind of peer-to-peer file sharing things. That was the same conference where Zooko and some folks presented Mojo Nation earlier than that. But, I mean, I got to meet all of those people there, and that kind of got me connected in with a bunch of folks that were doing really interesting encrypted communication stuff. That meeting then got me working at a company called Allmydata, where we started making Tahoe, this encrypted file system thing that we've been working on since then. So it was really this interesting connection there: getting to present this project that I'd been doing just for fun got me a seat at the table to start talking to a bunch of other people that had similar ideas, and then things kind of grew from there.

Yeah. And you mentioned the Tahoe Least-Authority File System. I know that you've got a few other projects, notably, recently anyway, Magic Wormhole. You mentioned PetMail. I'm wondering if you can just give a brief overview of some of the different projects that you've worked on, what their use cases are, and, you know, any other projects that I missed and didn't come across while I was doing the research for the show as well.

Yeah. So let's see. The first version of PetMail back in 2004 was mostly about anti-spam, but I figured, well, you might as well encrypt everything anyway because that's just a smart thing to do. And at that time, it had to be all a front end to PGP. There weren't a lot of other options. We have much, much better options now.

And then, yeah, so we started Tahoe. Tahoe is an encrypted distributed file storage system. You can kind of think of it like an S3 or Google Cloud files, except that all of your data gets encrypted on your machine before it goes off to the server. And the data gets spread out among multiple servers so that you can survive the loss of one or more of those servers. And you can configure how much redundancy, or how many copies of the data, get pushed out there. It uses a technique called erasure coding, so it kind of spreads the data out a little bit like RAID for disks, but this is spreading it out for files. There are a bunch of interesting aspects to the way we do the encryption in Tahoe that give you very fine-grained control over who can see what. And our theme is sort of that any given point in the system should have as little authority as possible. The LAFS in Tahoe-LAFS actually stands for Least-Authority File Store. And so in that system, the server doesn't get to see any of your files. The server has just enough power to be able to hold a copy of the ciphertext, but it doesn't have enough power to modify it without you being able to detect it, so we have integrity checking. And it doesn't have the ability to see the contents of the files, other than their length. So that was Tahoe.
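As a rough illustration of the erasure coding idea described for Tahoe, here is a toy sketch in the RAID-like spirit of that comparison: the data is split into k shares plus one XOR parity share, and any single lost share can be rebuilt from the others. This is an assumption-laden stand-in, not Tahoe's actual encoder, which is more general and lets you configure how many shares are needed and how many are produced. The function names are invented for this example.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size shares and append one XOR parity share."""
    share_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(share_len * k, b"\0")
    shares = [padded[i * share_len:(i + 1) * share_len] for i in range(k)]
    parity = reduce(xor_bytes, shares)
    return shares + [parity]


def decode(shares: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the original data when at most one share (None) is missing."""
    missing = [i for i, share in enumerate(shares) if share is None]
    if len(missing) > 1:
        raise ValueError("this toy code survives only one lost share")
    repaired = list(shares)
    if missing:
        present = [share for share in shares if share is not None]
        # XOR of all surviving shares equals the missing one.
        repaired[missing[0]] = reduce(xor_bytes, present)
    # Drop the parity share and the padding added by encode().
    return b"".join(repaired[:-1])[:length]


data = b"hello tahoe"
shares = encode(data, k=3)   # 3 data shares + 1 parity share, one per server
shares[1] = None             # pretend the server holding share 1 went away
recovered = decode(shares, len(data))
```

In a real deployment the shares would hold ciphertext, so the servers storing them learn nothing beyond sizes.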

After that company basically ran out of money, I started working at Mozilla, and there I did a couple of different projects. There was some stuff on the Add-on SDK, something called Persona. But the one that's kinda relevant to this is called Firefox Sync, which is the mechanism by which Firefox lets you synchronize your bookmarks, your saved passwords, all your browser state between different browsers. And we had the same kind of philosophy there, that the servers that Mozilla runs should not be able to see any of your data. So when I started there, that was like 2009, 2010, we set up Firefox Sync so that it encrypts all of your data on one browser. And then to add a second browser, you do this kind of pairing thing. There was an algorithm that I ran across called J-PAKE that lets you use a fairly short, low-entropy secret in order to derive a stronger shared encryption key between two different machines. So we used that in the original version of Firefox Sync to get that encryption key from one browser over to the other. So when you wanted to add a second browser to your account, you would go to the first browser. You'd say, "I wanna add a new browser." It would say, "Here's this code. Type this into your other browser." And then from that, they can get the same encryption key. That was a really neat bit of technology.
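To give a feel for how a short code can turn into a strong shared key, here is a toy password-authenticated key exchange. It is a sketch under heavy assumptions: it follows the general shape of a SPAKE2-style exchange rather than J-PAKE itself, it is not the code Firefox Sync used, and the group (a small Mersenne prime), the constants `M` and `N`, and the class name are chosen only so the arithmetic is easy to follow. It is far too weak for real use and omits steps a real protocol needs, such as hashing the full transcript into the key.

```python
import hashlib
import secrets

# Toy group: integers modulo a Mersenne prime. Far too small for real use.
P = 2**127 - 1
G = 3


def _hash_to_int(label: bytes) -> int:
    return int.from_bytes(hashlib.sha256(label).digest(), "big")


# Two fixed public group elements used to blind each side's message.
M = _hash_to_int(b"toy-pake M") % (P - 1) + 1
N = _hash_to_int(b"toy-pake N") % (P - 1) + 1


class ToyPakeSide:
    """One participant; blind/unblind are (M, N) for one side, (N, M) for the other."""

    def __init__(self, code: str, blind: int, unblind: int):
        self.w = _hash_to_int(code.encode()) % (P - 1)   # scalar from the short code
        self.secret = secrets.randbelow(P - 2) + 1       # fresh random exponent
        self.blind = blind
        self.unblind = unblind

    def message(self) -> int:
        """g^secret, blinded by blind^w, sent to the peer over any channel."""
        return pow(G, self.secret, P) * pow(self.blind, self.w, P) % P

    def key(self, peer_message: int) -> bytes:
        """Strip the peer's blinding using our own code, then finish Diffie-Hellman."""
        inverse_blinding = pow(self.unblind, P - 1 - self.w, P)  # unblind^(-w) mod P
        shared = pow(peer_message * inverse_blinding % P, self.secret, P)
        return hashlib.sha256(shared.to_bytes(16, "big")).digest()


first = ToyPakeSide("4-purple-sausages", M, N)
second = ToyPakeSide("4-purple-sausages", N, M)
first_key = first.key(second.message())
second_key = second.key(first.message())
```

Both sides end up with the same 32-byte key only if they typed the same code. Someone who guesses the wrong code derives an unrelated key, and in a real protocol gets only that single guess per exchange.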

What we found over the following couple of years was that the usability was not very good. And I think we made some design mistakes in building it and in changing the systems that existed before that to add this pairing thing. The result was that there were a lot of cases people got stuck in, where they lost their laptop or they lost their phone and they wanted to get back to their saved data, and they couldn't, because everything was designed around this idea of pairing with an existing browser. And we didn't really handle those cases very well. So I was on the team that removed that pairing stuff and replaced it with a more traditional email address and password scheme. We still maintain the property that everything is encrypted on the client before it goes to the server, but we no longer use the pairing thing. We based everything on that password.

And so after I left Mozilla and started working on some other projects, I decided I really liked the pairing aspect of Firefox Sync, but that it could be more usable in a different context. And so I started Magic Wormhole maybe about 2 years ago with the idea of using pairing for file transfer. Magic Wormhole is a command line tool so far. You type the command: you say wormhole send and give it a file name, and it tells you what the magic wormhole code is. It's a single-use short code like 4-purple-sausages. And then somebody else in the world types in wormhole receive 4-purple-sausages. And that gives enough information to the two sides to find each other, to negotiate a strong session key, to figure out how to make a direct connection for doing bulk data, and then encrypt your file and send it over that encrypted connection. And that's turned out to be incredibly convenient.

Even if it didn't have any encryption at all, I'm finding it to be a really useful tool for just getting a file from one machine to another when you don't already have some sort of pre-established connection. Like, you can use SCP, you know, do an SSH connection from one or the other, but you have to have a shell account on the other side. You can try and PGP-encrypt something before you send it, but then you have to get your public key to the other person beforehand. So this kind of cuts away all of that setup phase. It just says you want to bootstrap from a connection that you have with the person and use that in order to allow two computers to talk to each other securely. And that's been going pretty well. I've been getting a lot of good feedback on that one, and I'm trying to, you know, build that out into something more usable in other domains and easier to install.

Oh, and then, yeah, one other project I thought of is called Versioneer, and it's not a security thing at all. It's a Python setup.py tool to get version numbers out of something like Git, out of Git tags, and put them into your project. So when you're making a new release of your software, instead of modifying some file called version.py and updating a bunch of numbers, you just make a Git tag and then make a new release, make a new tarball. The nice thing about it is that if you're running an intermediate version, you know, if you have somebody downstream who's following along with your project and wants to submit a bug, then you can just tell them, "Okay, tell me what version this application is reporting." And no matter what Git revision they've checked out, the version number will include the full Git hash. It'll say this is version 1.2 plus some number of changes. If the tree has unrecorded changes, it'll mark that in the version string as well. So it kind of removes that question of, well, are you running 1.2, or are you running 1.2 plus some stuff, but the system isn't reminding you that it's not the actual upstream tagged release.
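The version-string idea can be sketched in a few lines. This is a simplified illustration, not Versioneer's own code: it assumes the input looks like the output of `git describe --tags --long --dirty` (for example `1.2-3-gabc1234-dirty`), it uses the abbreviated hash that command prints, and the function name and the handling of a leading `v` are choices made for this example.

```python
import re

# Matches strings such as "1.2-3-gabc1234" or "v1.2-0-gabc1234-dirty".
DESCRIBE_RE = re.compile(
    r"^(?P<tag>.+)-(?P<distance>\d+)-g(?P<sha>[0-9a-f]+)(?P<dirty>-dirty)?$"
)


def version_from_describe(describe: str) -> str:
    """Turn git-describe style output into 'TAG[+DISTANCE.gHASH[.dirty]]'."""
    match = DESCRIBE_RE.match(describe)
    if match is None:
        raise ValueError(f"unrecognized describe output: {describe!r}")
    version = match["tag"].lstrip("v")       # "v1.2" and "1.2" both become "1.2"
    distance = int(match["distance"])        # commits since the tag
    if distance or match["dirty"]:
        version += f"+{distance}.g{match['sha']}"
        if match["dirty"]:
            version += ".dirty"              # working tree has local changes
    return version


tagged = version_from_describe("1.2-0-gabc1234")
ahead = version_from_describe("1.2-3-gabc1234")
modified = version_from_describe("1.2-3-gabc1234-dirty")
```

A checkout exactly on the tag reports the bare tag, while anything past it or with local modifications says so in the string, which is what makes downstream bug reports unambiguous.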

So it just takes a lot of the effort out of making a release, out of getting the version strings managed. And I've gotten some good feedback on that too. Some of the projects in the kind of Twisted space have been picking that tool up.

Yeah. Managing version numbers within your project is always one of those chores that is more of a pain than you really realize when you first get started, until you've actually had to deal with it for a while.

It's unfortunate. And it's nice when you have some automation there. You know, Twisted has a very, very well-oiled release process, so they can get out releases every couple of months. All their release notes kind of get automated in that, and it's a good flow. And stuff like this just makes that easier, where you don't have to think very much about managing the version number anymore.

Yeah. Absolutely. Going back to what you were talking about with Magic Wormhole and trying to solve the problem of sending one file between computers: there's actually a great xkcd comic about this. But despite the number of various technologies that there are for transferring files from one location to another, it never ceases to be a problem when you actually need to get one file from your computer to the person next to you or to somebody who you know quite well. And the only other project that I've come across that really does a decent job of that while maintaining any real measure of security and privacy is the Keybase file system. I don't know if you've looked at that.

I have. Yeah. It's fun. They do a really interesting job of sort of binding identities and keys from one domain to another. So the fact that most of the people I'm trading files with are developers means that most of them have a GitHub account, so the fact that I can send something to a Keybase user through their GitHub user ID is pretty handy. And they've done a good job of kind of removing themselves from the trust path, like we do in Tahoe, where everything gets encrypted on the client side before it goes. But it's still sort of user-identity-centric, so that it's definitely, you know, warner at GitHub sending the files to, you know, tobias at GitHub or something. And the interesting thing about Magic Wormhole is that it's much more one-shot. It's much more like there's a specific instance of us talking right now, and I want to get a file to you, and that doesn't imply a longer-term relationship, and we don't have to go and set that stuff up beforehand. And as such, what I'm pushing Magic Wormhole for as a protocol is to kind of be the introduction or the invitation mechanism that you use to establish that longer-term relationship.

Yeah. And the other thing too is that because there is no intermediary storage, you don't have to worry about then cleaning up after yourself, because it's just, as you said, a direct connection between the two parties.

Yeah. The problem gets a lot simpler when you can reduce the scope of it. You know, with Magic Wormhole, that wormhole code stops being used by the time the second person types it in, plus, like, two round trips after that point. There's a central coordination server that it uses to get the encrypted messages from one side to the other. But if you can get a direct connection between the two machines, because you're either on the same network or one of you has a public IP address (and eventually, once I get the kind of NAT hole punching stuff working, this should work almost all of the time), then you don't even need that. So the lack of storage, like you said, scoping the problem down, makes things so much simpler.

That reminds me, you know, Tahoe is sort of a great-grandchild of Mojo Nation. Mojo Nation was this very ambitious, censorship-resistant, encrypted, decentralized publishing system from, like, 1997. And it was super clever and had a lot of really interesting goals and approaches. But in some ways, I think it was just too big. There were too many pieces going on there. And so it kind of fell apart under its own weight, which is really sad because there was a lot of really neat stuff there that we're just now starting to be able to implement properly. But what we found interesting was that a lot of people that were working on Mojo Nation at the time took smaller pieces of it and ran off in a different direction and got very successful with it. So Bram Cohen was one of the original people at Mojo Nation, and he focused on just the download aspect, the kind of swarming download aspect, and made BitTorrent out of it, which has obviously been super, super successful. And then for Tahoe, we kind of focused just on the reliable storage aspect of it and not on any of the micropayment stuff, none of the, like, being able to buy blocks of data from a broker because you discovered that other people are asking for this block of data, and so your agent thinks, "Oh, hey, maybe I could make a profit by doing this." So for Tahoe, we just focused on a very, very narrow portion of that original project and have had, I think, a lot more success in some ways than that original project.

So in any of these things, if you can find a way of narrowing the problem domain down as much as possible, then you're much more likely to have something that'll be workable and usable.

And what's the current status of the Tahoe-LAFS project? Is that something that's still active and being maintained that people can pick up and use today? Or has it been sort of on the wane?

It's still plugging along. We got another release out back in December, I think, or early January, trying hard to get in some new features in time for the Debian freeze that happened then. I'd say, you know, progress is slower than it was when we had an entire company just dedicated to doing it. There is a current company that does a lot of work on Tahoe called Least Authority Enterprises. They provide a service called S4, which provides commercial storage for Tahoe clients. So if you don't wanna run your own Tahoe server, you can pay them to run one for you. And that gets spread out among multiple cloud service providers, so you can survive, you know, S3 being down or EC2 being out for a little while, that kind of stuff. But, you know, the pace is different now. A lot of us have other projects we're focusing on, other jobs, and it's a little bit hard to maintain that momentum. But we have a weekly dev chat going on, which we just had this morning. And, you know, we have a bunch of interesting projects that we've been kind of sketching out and we're trying to figure out how to build.

We are slowly working on adding accounting to Tahoe. So at the moment, if you have any permission to store something on a storage server, then you can store as much data as you want. And the storage server operator doesn't really get a lot of visibility into how much each person is using, you know, that Alice is using 2 gigabytes and Bob is using 3 gigabytes. And maybe Alice is a friend and, you know, Bob has this arrangement and says, "Well, you know, I can give you like 5 gig on my machine, but I don't have that much more." And then if Alice starts using too much, it'd be nice if Bob could say, "Hey, you know, can you cut back? Can you start deleting some stuff?" We don't have any good tools for that right now. So we're trying to add that and do it in a way that doesn't involve user accounts in the traditional way. You know, you don't sign in with your Facebook account to your Tahoe storage server. It's gonna be something much more like: there's a private key, and you have some sort of invitation process so that Bob says, "Hey, Alice, please join this." And then Bob gets visibility into how much Alice is using. But, of course, he can't see the contents of the files, can't see any more detail other than how many bytes she is currently using. So we have that.

We're also trying to change some of the internal protocols for Tahoe around so that it is easier to port to other languages. Tahoe is using a networking library called Foolscap that I put together a couple of years before we started doing Tahoe. It's a remote object invocation mechanism where everything is encrypted and everything is very fine-grained, object-capability-based access to the different objects and different methods you want to invoke. But what we've discovered is that it's really overkill for Tahoe, and it does a lot of encryption that is redundant with the kind of encryption that the Tahoe file system layer is doing. So the complexity of the protocol (I really went overboard when I designed this thing) is a barrier to getting people to port Tahoe to Node.js or to Go or into a browser context. So, you know, step one is kind of getting rid of that and replacing the networking layer with plain HTTP.

And then we can make Tahoe into something a bit more library-shaped so that people can write some other projects, some other applications sitting on top of it. And maybe those can be web-based things, or maybe they can be something on a mobile phone, you know, something Swift or iOS or Android based. So we're kind of trying to make some architectural changes to open it up and make it possible to have a bigger ecosystem.

Yeah. One of the recurring themes whenever cryptography or digital privacy comes up is the sort of user experience barrier, where a lot of people who aren't necessarily as technically well versed are not as likely to go down the rabbit hole of figuring out how to actually jump the hurdles that are necessary to make sure that they're properly protected when they're using the Internet or other computers. So I'm wondering, what are some of the biggest barriers to adoption that you've found for some of the projects that you've been involved in, and how do you see that evolving?

So lack of a GUI has been a big barrier for a bunch of these. I am too much of an engineer, and I have a very strong tendency to build things for other engineers. You know, you can do pip install magic-wormhole, and that's great for everybody who's ever been to PyCon or, you know, the population that kind of knows how to develop stuff within Python. But that's not something that a non-developer is gonna be able to use, and clearly that needs to be something like in an app store or something that's on a web page that you can visit. So trying to build these things, trying to figure out the packaging, trying to figure out the UI process to make these things usable is a big step. Tahoe, likewise, you can do pip install tahoe-lafs. And the folks over at Least Authority are working on some better deployment mechanisms for this, trying to make more traditional application installers like the .dmg kind of thing, or whatever the Windows installer wizard thing is. So there's that.

Another aspect is, how do you present this as a product? You know, Magic Wormhole is a CLI product. It provides you with a specific bit of functionality. You can, you know, watch the little screencast and figure out what exactly it's going to do for you, and you can install it, and then it'll go and do that. Tahoe needs similar product work in my mind. Something I've been struggling with has been how do we get from a tool that allows you to store stuff on servers in a secure way, but doesn't actually offer you any servers to store it on right out of the box, to a product that behaves a little bit more like Dropbox or, you know, one of these backup tools like the SpiderOak stuff? There are kind of three pieces to it. There's the party that writes the software, there's the party that is running a server, and there's you, installing this thing. And for almost everything getting delivered over the web, the party that's giving you the software and the party that's giving you the service that that tool needs are one and the same. So you don't, you know, install the Dropbox client and then separately try and find somebody to go and buy storage from. That's currently how Tahoe works, and that's kind of a barrier.

There are times when I think that we should build something on top of Tahoe that maybe has a different name, that is "powered by Tahoe technology" or something, but is run by a company that can take your money and give you service. And you don't have to think about the difference between the client you're running, and the protocol that it speaks, and the set of potential servers you could use if only you knew how to configure it that way, and the actual service you get. So that kind of productization step is a thing that companies do all the time, and larger companies have a lot of experience with defining what the product ought to be. But as an open source project, we don't have a product manager. So we're still sort of struggling to define how we should deliver this bit of functionality in a way that's gonna be immediately useful to different sets of people.

Yeah. And one of the challenges there too is that while there are a number of larger organizations who are perfectly willing to sell you storage and make it as reliable as you could possibly like, there's no real guarantee about the security and privacy of what you're actually storing, because in a number of cases they do have a vested interest in being able to monetize the metadata that they can gather about you. There are some providers where privacy is their foremost concern, like SpiderOak, as you mentioned. But, you know, companies like Amazon and Google do wanna be able to do some machine learning over the types of data that people are storing so that they can then build additional products, not necessarily using your data directly, but using the information that they've gleaned from it as a form of market research.

Yeah. It's interesting. Google is an advertising company, and anything that they can learn about you is something they can make money from. Facebook is, you know, even more extreme in that direction. And then I think that a lot of startups these days have this sort of default of "record everything we possibly can, because we don't know how we're going to make money yet. And if we have that data around, we might discover a way of making money a couple of years down the road. But if we, you know, voluntarily throw that information away, then we might paint ourselves into a financial corner." And it's unfortunate, because I'd be much happier if the default was, you know, this information about people is toxic and we should have as little of it as possible. And you're starting to see some of that, some companies that are recognizing that minimizing what they know is better all around.

But, yeah, your point about market research is a really good one. We don't know how many people are using Tahoe right now. You know, we decided very deliberately to not include any kind of usage metrics or, you know, phoning home to let us know that this particular network has this many servers, this many clients, it's running this particular version. And that's kind of a problem for us, because when I talk to folks at conferences, I talk to lots of people that know about Tahoe and a lot of people that have used it in some fashion, but we don't really know any numbers on that. We don't have any metrics on how many people are using it. And we obviously don't know what features people are using or what things people want, other than when they show up on the mailing list or the IRC channel or, you know, the bug tracker and they say, "Hey, please go and fix this thing because I can't use it." And so we've kind of handicapped ourselves relative to regular commercial products, because we don't want to collect a lot of metrics, and that means we don't know what we ought to be building or we don't know what isn't working for people. And, you know, having that information so you can make a better product is a totally legitimate, good thing to be doing. We don't really know how to do that and also not collect the information about usage that we don't want to have. And so we've been kicking around the idea of, like, having a survey button on the main web interface that says, you know, "If you'd like, please follow this link to this page and tell us about how you're using it," so at least we can get a general idea. And, you know, having it in that form would be much more opt-in, and much more visible as to what information you're providing, than having something baked into the code itself.

Yeah. And I think that by having that as an option to provide information, it would make people much more comfortable and willing to actually give that data than if it were something that, as you said, was baked into the product, which would make people leery of even using it in the first place. Particularly given the nature of the project itself, anybody who is actually making use of it is fairly well aware of digital privacy and security, so if it did have that automatic metrics collection, it would drive away exactly the people you're trying to get to use it in the first place.

Yeah. Yeah. Exactly.

On the front of digital privacy and cryptography, there is still a lot of public education necessary to get people to have a fairly secure stance as they are using the Internet, and digital technologies are becoming increasingly ubiquitous. One of the continuous weak points is the password, and the fact that it's so ubiquitous and we haven't really come up with a reliable way of replacing it yet. In your presentation from PyCon about Magic Wormhole, you were positioning that as a possible way of introducing an idea of identity to an anonymous connection. So I'm wondering, what are some of the technologies that you foresee replacing the password as we continue to rely so heavily on digital infrastructure?

Yeah. So, I would love passwords to just die. And in my mind, my kind of vision for the future, some kind of science fiction version here, is you have a password manager. Really, you have your phone or you have some sort of agent that is working on your behalf. And the one thing that you need a human-managed secret for is to establish a connection with that agent. So in current technology, that's in the form of memorizing the password that you use for your password manager, or you're unlocking your phone with your fingerprint and passcode. That's this, like, secret conversation that you have with your agent about how the agent is gonna recognize you in the future and how you ought to recognize the agent and make sure you're both talking to the right party. And then once you've done that, this agent stores everything else that is needed to go and operate your accounts, and holds all those tokens.

So I'm thinking that you use something like Magic Wormhole, some protocol like that, to provision this device or provision these agents with credentials for these other services. So say your company has an online service and it also makes a mobile application that's supposed to talk to it. The current traditional approach is that when you sign up online for the service, you provide them with an email address and a password. Or you delegate everything to Facebook or to Google and you sign in with one of those accounts, but that's kind of unsatisfying as far as reducing the authority you're giving to these other services. So the traditional way is that you give an email address and password to the server, and then you go to this mobile app and you type in the same email address and the same password, and it shows up at your back-end server

kind of identically to your web browser. The big problem with that is that you have to remember this password in order to get into it. And it's a password you kind of only need once. You know, you set it up when you create the initial account, but then you are really just using it to transfer the authority over that account from the web browser that already has it to the mobile device that doesn't have it yet. And that's a place where you could use something like Magic Wormhole as a provisioning step to transfer a credential, transfer an access token, rather than starting from scratch and typing a password in.

The other feature that passwords, I think, serve a value for right now is account recovery, or kind of a backup plan. So the idea is that if you threw away all of your devices and, you know, show up with nothing but the clothes on your back, can you still get back to all of that information? And if that's a hard requirement, then things are really tough. That really does limit you in a lot of ways. But I really feel that we can do better than that. An approach that I was looking into when we were doing Firefox Accounts at Mozilla, and particularly Firefox OS, the mobile phone operating system that they were doing, which didn't really go anywhere, was trying to do a social account recovery thing that now Facebook and Google have some mechanisms for. The idea would be that your phone encrypts all of its data. It has its local key that it's using for that. You add entries to your address book by doing wormhole

type exchanges with them that let your agent know the public key for that other person. So now your phone, your agent, has a secure connection to Alice's agent and to Bob's agent and can exchange secrets through that connection. And what you do with that connection is take your key and split it up into a bunch of pieces. You're doing this mechanism called secret sharing, and it's a lot like the erasure coding we do in Tahoe, but it's more cryptographically absolute. You can arrange it so that you need 3 out of 5 pieces, and any 2 pieces are completely insufficient to get any information about what that key is. And so you basically, you know, take this key to all of your data and split it up into pieces and give one piece each to a bunch of people that don't know each other. And if you then lose all of your devices and have nothing but the clothes on your back, you go and buy a new phone and you go back to those friends and you say, hey, can you push the button that gives me back my share of that key? And once you collect enough of those fragments, you can reconstruct that original key and get back to the encrypted backup of all of that original data. And at no point did you ever need to memorize a password.

And I think there are a lot of parallels here to how identity works in the real world. You know, if you were to lose your birth certificate and lose your passport and driver's license and needed to kind of start again, then, as I understand it, there are laws and procedures in place to allow you to get a new set of identity documents based upon members of your community vouching for you. So, you know, in some cases, it's a well-respected, like, priest at your church or some member of your local government or a set of employers. And there's kind of a mechanism by which, if enough of those people are willing to vouch for you and make this claim that, yep, this really is Brian Warner, then you can get a new driver's license. You get some new kind of root identity document even though you've lost everything else. So I think that in many ways, our identity is defined by what other people think of us. And so it actually makes a certain amount of sense to allow the digital identity to be driven by that and use that as kind of a backup tool.
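
The "3 out of 5 pieces" key split described above can be sketched with Shamir's secret sharing. This is a minimal illustration, not code from Tahoe or Magic Wormhole: the function names `split_secret` and `recover_secret` and the choice of prime field are assumptions made for the example, and a real deployment should use a reviewed library.

```python
import secrets

# Assumption for this sketch: work in the field of integers modulo the
# Mersenne prime 2**521 - 1, which is large enough to hold a 256-bit key.
PRIME = 2**521 - 1


def split_secret(secret: int, threshold: int, num_shares: int) -> list[tuple[int, int]]:
    """Split `secret` into `num_shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must fit in the field")
    if not 1 <= threshold <= num_shares:
        raise ValueError("need 1 <= threshold <= num_shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares: list[tuple[int, int]]) -> int:
    """Lagrange-interpolate the polynomial at x = 0 from the given shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Hypothetical usage: one share per friend, any three friends can restore the key.
key = secrets.randbits(256)
shares = split_secret(key, threshold=3, num_shares=5)
recovered = recover_secret([shares[0], shares[2], shares[4]])
```

With a threshold of 3, interpolating from only two shares yields an unrelated value, which matches the property that two pieces give no information about the key.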

Yeah. That's definitely a really interesting approach, using the idea of social proof in the digital realm as a break-glass mechanism for recovering access to your accounts. And then also, as we continue to live more of our lives online, there's the continued dialogue of what happens when somebody passes: how do you transfer that person's digital assets to their next of kin or their heirs? And so by moving more towards that idea of social proof, that would potentially simplify that transfer mechanism. Whereas now a lot of, for instance, password managers have a mechanism where if you haven't signed in in a certain amount of time, it will send the credentials to somebody who you've prearranged. I know Google has a similar capability for your accounts. So, yeah, that's definitely a very intriguing idea, not one that I've really come across before. It'd be interesting to see how that pans out going forward. But then, also, there's the question of how a fraudulent actor would take advantage of that situation as well. So that's one of those things where, you know, the bad actors are always the ones that spoil the party.

Yeah. Yeah. What we've tried to do in the Tahoe context is, if we can't provide you with exactly the kind of access control you want, at least make sure that what we are providing you is easy to understand and easy to reason about. So in Tahoe, when you upload a file, you get back this access string that we call a file cap, a file capability string. And it's maybe, like, 60 or 70 characters long. It's got an encryption key in there. It's got some hashes. The rule is that if you know the string, you can get the file back. And if you don't know the string, then you can't get the file back. And the only way to find out the string is either to have uploaded the file in the first place or to get the file cap from somebody who already had the file cap. So it's just very much like there's this magic secret name. And if you know the name, then you can get the data. And if you don't, you can't. And there are no user accounts. There's no access control lists. There's no system administrator.

There's just this very hard, fast, absolute access thing. And you can build stuff on top of that. You can make something where you sign into a given account and once you're signed in, then you get access to these things. You can build a thing where there's an administrator that gets access to everything and can hand it out according to whatever policies they're supposed to implement. And in that context, we've kind of sketched out some ideas for a tool sitting on top of Tahoe that would take a directory cap. So when you upload a file, you get a file cap. When you create a directory, you get a directory cap. And in Tahoe, we talk about the root directory. Like, you have to remember at least one thing, and we call that the root cap. And it's a directory from which you can get to all the other stuff you've uploaded. And how do you manage your root cap? Well, you have to, like, store it somewhere or write it down or keep it on your local file system somewhere.

What we've sketched out is a way of splitting that up into a couple of pieces using this Shamir secret splitting mechanism and making a little app where you kind of draw out a treasure map. Like you say, at the end of the day, what I want is this root cap. But I'm going to break it up in such a way where I need one of these pieces plus 2 out of these 3 pieces. And my idealized GUI for this is that you can drag out these little nodes of 1 of these 3 or 3 out of these 5 and sketch out how you want this access control to work. And then when you're finished, you push this button to say render it, and it creates the directory cap and does the secret splitting and then gives you back these pieces that you then have to hand to different people or store in different places. And when you want to recover this, it's like you get a treasure map and you have to go to this person and get this piece and then go to this place and get this other piece.

And one of the pieces that came up there when we were kind of brainstorming about this, you mentioned the estate management, estate planning kind of stuff. You could rig this in such a way where if somebody has passed away, their executor pushes the button on their client that says, alright, you know, I know this person has died. It's time to start working on their estate. Push this button. And that starts a process by which the agent will eventually give them the decryption key. But if somebody is doing that fraudulently, or if you were just on vacation and somebody is jumping the gun and thinks that you're gone when you really aren't, then you get a chance to interrupt that. So, you know, one of the pieces of this tool would be: start opening the box, but don't actually reveal the key until 30 days have passed and the order to open the box has not been countermanded by the original person. So you could build stuff like that. And some of the frameworks that we use in the real world, the way that lawyers work and the way that you can set up wills to be executed after a certain amount of time and have certain people in control of it, you can make digital analogs of these things. And I'm looking forward to seeing how personal agents empower that mechanism.
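
The "start opening the box, but wait 30 days unless countermanded" idea can be sketched as a small state machine. This is a hypothetical illustration, not part of Tahoe: the `DelayedRelease` class and its method names are invented for the example, time is passed in explicitly so the logic is easy to follow, and a real agent would also have to authenticate who is allowed to request or countermand.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class DelayedRelease:
    """Holds a secret that is revealed only after an uncontested waiting period."""

    secret: bytes
    delay: timedelta = timedelta(days=30)
    requested_at: Optional[datetime] = None

    def request_open(self, now: datetime) -> None:
        # The executor pushes the button; the clock starts on the first request.
        if self.requested_at is None:
            self.requested_at = now

    def countermand(self) -> None:
        # The original owner is still around and cancels the pending request.
        self.requested_at = None

    def try_reveal(self, now: datetime) -> Optional[bytes]:
        # Reveal only if a request is pending and the full delay has elapsed.
        if self.requested_at is None:
            return None
        if now - self.requested_at < self.delay:
            return None
        return self.secret


# Hypothetical usage with fixed dates.
box = DelayedRelease(secret=b"root-cap-piece")
start = datetime(2017, 1, 1)

box.request_open(start)
early = box.try_reveal(start + timedelta(days=10))         # still waiting

box.countermand()                                          # owner was on vacation
after_cancel = box.try_reveal(start + timedelta(days=40))  # request was cancelled

box.request_open(start + timedelta(days=40))
released = box.try_reveal(start + timedelta(days=70))      # 30 uncontested days
```

The secret released this way could itself be one share in a threshold split, so the waiting period becomes just another node on the treasure map.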

Yeah. That's a really interesting approach. So as technologists and developers, we're all fairly well aware of the weaknesses that are present in the systems that we all use day to day, you know, and the bugs that can be introduced, because we've all written our fair share of them. So I'm wondering, what are some of the ways that we can make digital privacy and security more accessible for everybody and not just people who are building the systems?

Yeah. I think a lot of it starts with usability.

It's tempting to take an existing system and then try and figure out how to bolt security onto it, but that almost never works. In some ways, it's something you have to bake in from the very beginning. It has to be part of the design. And this is getting better just recently, but traditionally, we've not put a lot of effort into the usability of these tools. We've focused on the engineering aspects. We enjoy nitpicking about the crypto algorithms that we're using when really that doesn't matter in the slightest. And can somebody use this or not is the biggest determination of whether it's gonna be secure.

You know, email is a good example. I think that we need to be willing to build something new instead of trying to fix something that's old and kind of hopeless. I know a lot of folks that are doing valiant work in doing automatic email encryption. There's a good project called Autocrypt that some folks have been working on just recently to do kind of automatic key discovery and trust on first use for PGP over email. But in some ways, you know, Signal or WhatsApp with the encryption in there is gonna be much more usable because they aren't obligated to try and be compatible with clients that don't know what this protocol is or with a protocol that's 30 years old.

So, you know, from the usability side, we need to involve more UX people, much more user testing, much more watching how people use a tool, and then try and figure out something that does the same thing but in a secure fashion. A lot of it is about this conversation between the user and the computer agent that's working on their behalf: what is the user doing to indicate their intent, and how can we honor that intent correctly and accurately, and correctly display to the user what they can and what they cannot rely upon? So in an email client, a message appears, and it says from Alice

or you type out a message and it says to Bob. And as security people, we know that the from field is entirely under the control of anybody in the network that wants to send something to us. You can put anything you want in there at all. And so in some sense, an honest email client shouldn't display the from field at all, or it should display it with something like, this is from a person who claims to be known as x y z. Or, you know, they want you to reply to this address, but that doesn't actually imply that they have control over that address. And if you go through this kind of design process, where you had to be brutally honest about what the UI was displaying and change the names and the text of all of these fields to explain what was going on, then that would reveal all of the insecurity that we have in the current systems. If you wanna have it just say from colon and you want that to be accurate, then you have to have some sort of signature. And then that leads into this question of, well, if I want to say from colon Alice, then what does the word Alice mean to this user? What I think it means is that this user has previously indicated to their agent that future messages coming from this phone, or from this person that I've just exchanged a Magic Wormhole code with, I want those messages to be labeled as being from Alice. You know, it's introducing my agent to your agent through these 2 humans. And afterwards, the agent can say, oh, hey, here's a message from Alice's computer. I'm not sure it's really from Alice, but I know for sure that it's coming from the computer that has this particular key, and you told me to call that Alice's computer. And so from that, we can say that this message is probably from Alice because she's the only one that's supposed to be using her computer.

So we can build stuff up like that and try to make all of those UI elements more accurate. Break it down to: when somebody tells their computer to do something and they refer to some other party, how exactly are they referring to that party? What in their mind associates the name Alice with this particular person? And then how do we accurately capture that stuff, either when we first learn about Alice or for later, so that it's really this agent that's doing work on your behalf that's being told the right stuff to do.

Yeah. There are so many different ways that we can take this conversation. I'm sure we could probably spend hours and hours discussing

Yeah. For sure.

You know, one of the things that I was just thinking about as we were talking here was the idea of two-factor authentication and how much additional protection that provides you, and, you know, the challenges of password managers, particularly when there's any sort of cloud component to them, and what sorts of security they actually provide. But I'm wondering if there's anything in particular that you think that we should discuss that we haven't covered yet.

I think that we're starting to make some good strides in more usable security. There's a group called Simply Secure that helps with usability testing. They have, I think, done a lot of grants on trying to improve the usability of common secure software. And they have this focus of saying, well, you know, usability is key. It should be as simple as possible. You should have fewer options. We're starting to see product developer groups being interested in security. You can never really sell security as a feature. If you present somebody with 2 different products and the only difference between them is that one is more secure, then that's not a selling point. But I think we're starting to see an environment where developers are interested

I has a tremendous amount of security engineering going into it. And until recently, they didn't market that at all. It's just the developers felt this is the right thing to do. And they they found an environment in which, you know, the business folks or the folks with the the money said, yeah, it's okay if you spend the time to go and do this right because we realize it's going to be good in the long run. So we're starting to see that. And there are still

sure, a lot of startups that are trying to get something out super fast and feel they don't have time to make the stuff secure. But we're also seeing a lot of signals, and, you know, the fact that WhatsApp took on that stuff is just kind of amazing. So I'm hopeful. I think that we're really starting to see security being recognized as a necessary part of the process and people trying to put the time into figuring that stuff out ahead of time instead of just, you know, putting it off until they get caught or there's a breach of some sort, and then trying to find a way of bolting it on there.

Yeah. It's definitely an important shift in the industry as we realize that, you know, largely because of the number of breaches and the economic fallout that results for the companies that suffer those breaches. But the fact that we are becoming more security conscious as an industry, and just as a global community of people who are using the Internet and digital services in our day to day, yeah, I definitely think that it is going to continue to improve. There's definitely ground to be made up, but I think we're on a good path. So for anybody who wants to get in touch with you or follow any of the work that you're up to, I'll have you send me the contact info that you'd prefer, and I'll put that in the show notes.

Yeah. Probably Twitter or GitHub. Find me on GitHub.

Sure. And with that, I'll move us to the picks. Brian, what do you have for us?

Yeah. I have a book that I finished reading recently. It's kind of old, I think it's maybe 5 or 6 years old. It's called Ra. It's on a website that goes by the name of Things of Interest, at qntm.org. And it is a story about how magic is real and a bunch of physicists in the seventies discovered the particles of magic and how everything fits together. And then it gets really kinda weird and exciting from there. This author, I love everything on this blog, has such an interesting style. So I highly recommend Ra on Things of Interest.

Yeah. That sounds really cool. I'll have to check that out. Well, I really appreciate you taking the time out of your day to join me and discuss the work that you've done and pontificate a bit on digital security and privacy. It's been a fun conversation, and there are definitely some things to look forward to and some things to research. So I appreciate that, and I'm sure the listeners appreciate it as well.

Yeah. Thanks. It's been my pleasure.
