Summary
Working with network protocols is a common need for software projects, particularly in the current age of the internet. As a result, there are a multitude of libraries that provide interfaces to the various protocols. The problem is that implementing a network protocol properly and handling all of the edge cases is hard, and most of the available libraries are bound to a particular I/O paradigm which prevents them from being widely reused. To address this shortcoming there has been a movement towards "sans I/O" implementations that provide the business logic for a given protocol while remaining agnostic to whether you are using async I/O, Twisted, threads, etc. In this episode Aymeric Augustin shares his experience of refactoring his popular websockets library to be I/O agnostic, including the challenges involved in how to design the interfaces, the benefits it provides in simplifying the tests, and the work needed to add back support for async I/O and other runtimes. This is a great conversation about what is involved in making an ideal a reality.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Your host as usual is Tobias Macey and today I’m interviewing Aymeric Augustin about his work on the websockets library and the work involved in making it sans I/O
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by giving an overview of your work on the websockets library and how the project got started?
- What does "sans I/O" mean and what are the goals associated with it?
- Can you share the history of your work on the websockets project?
- What was your motivation for starting down the path of rearchitecting a project that is already production ready?
- Can you talk through how the websockets library is architected currently?
- How has the design of the project evolved since you first began working on it?
- At a high level, what were the changes required to make it functionally sans i/o?
- What do you see as the primary challenges associated with making network related libraries sans i/o?
- In your experience of porting websockets to be purely protocol oriented, what are the technical and design challenges that you faced?
- One of the goals of the Sans I/O approach is to support reusability and composability of network protocol implementations. What has your experience been as to the viability of those goals in practice?
- What is your current perspective on the cost/benefit of the sans i/o conversion?
- Who are the primary consumers of the websockets library?
- How do you foresee the target audience changing once you have completed extracting the protocol logic?
- What are some of the most interesting, innovative, or unexpected ways that you have seen the websockets project used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on the websockets project and sans i/o conversion?
- What do you have planned for the future of the project?
Keep In Touch
- @aymericaugustin on Twitter
- Website
Picks
- Tobias
- Aymeric
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
Links
- Sans I/O: When The Rubber Meets The Road
- Websockets library
- Websockets Protocol
- Qonto
- Tulip
- Asyncio
- CERN Particle Accelerator
- Sans I/O
- Cory Benfield
- HTTP/2
- Twisted
- Curio
- Trio
- Inversion of Control
- ohneio helper library for implementing sans I/O network protocols
- SOCKS Proxy
- Sanic
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40 gigabit networking, dedicated CPU and GPU instances, and worldwide data centers.
Go to pythonpodcast.com/linode,
[00:00:46] Unknown:
that's L-I-N-O-D-E, today and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. Your host as usual is Tobias Macey. And today, I'm interviewing Aymeric Augustin about his work on the websockets library and the work involved in making it sans-I/O. So, Aymeric, can you start by introducing yourself?
[00:01:08] Unknown:
Hello. I'm a software engineer and a manager. I'm the CTO of Qonto, a startup building a bank for SMEs in Europe. We're based in Paris. I've been a Python developer since around 2009. Today, I'm doing a lot more management and a lot less coding, but I still enjoy doing a bit of Python in my free time.
[00:01:28] Unknown:
And do you remember how you first got introduced to Python?
[00:01:31] Unknown:
The first contact, I think, was when I was a graduate student and a friend told me about this great and nice and beautiful language called Python. And at the time, pretty much everything I knew was PHP. And so I dismissed it as, well, I can also open a file in PHP, it's not all that different, just in other words. And so eventually, I came to realize that I could do a bit more with Python than with PHP.
[00:01:55] Unknown:
And so the reason that we're talking today is because of your work on the WebSockets library and some of the work that you've been doing to remove the explicit IO dependencies. But before we get into that part of it, I'm wondering if you can just give an overview about the WebSockets library itself and how the project got started.
[00:02:11] Unknown:
Yeah. So this project has a fun history. It was back in 2013. I was working at a company where we did a lot of things around electricity storage and usage for transportation. We had been building the Autolib' car sharing service in Paris, and then we were looking at deploying stationary batteries in Africa to give power where the grid wasn't reliable. And so we'd have these big power packs in various locations with poor Internet connectivity, and we were looking for a bidirectional protocol to communicate with them. And we wanted something that would be reasonably easy to use on our end, the server side. And, well, we were mostly doing web stuff with Django at that point, so Python was a natural choice. And Guido van Rossum was just starting to work on Tulip, which eventually became asyncio.
And so I decided to write a library to implement WebSockets on top of Tulip, because that was interesting, with the plan of using it for this project. And so I left the company shortly thereafter, but they did end up using websockets and I think they still use it. And I continued maintaining it in my free time, even though I never did anything serious with it. Yeah, it's funny how you get an idea for something and then you just can't let it go, even though it's not actually part of your day to day anymore. It's amazing that I still spend so much time on it. To be honest, I had some low points, especially when I realized that one of the most popular uses of websockets was for connecting to cryptocurrency exchange tickers.
Many of these provide WebSocket endpoints. And I take issue with the environmental footprint of cryptocurrency at this point in time, and I've been lectured abundantly about models that could be more energy efficient. That said, there are still many, many other interesting uses of WebSockets, and a particularly interesting one is that I know of a user who relies on websockets to do something around the particle accelerator at CERN, which is buried somewhere in the Alps. And so it's kind of amazing that websockets indirectly helps particle physics.
[00:04:18] Unknown:
Yeah. It's definitely very cool. Anytime you can use the term particle physics in a conversation, it's worth it.
[00:04:24] Unknown:
Definitely.
[00:04:25] Unknown:
Before we get too much into the story of your recent work on the project, I want to take a sidestep and discuss the whole idea of sans-I/O, what it means, and the overall goals associated with that strategy.
[00:04:39] Unknown:
I think sans-I/O was popularized around 2016, and it started from noticing some waste. People repeatedly rewrite the same things, typically HTTP, just because they want to connect to the network in different ways. And so rewriting an entire HTTP parser, or an HTTP/2 implementation, which is even more complicated, just because you want to talk to the network with asyncio rather than, say, basic threads and sockets, like requests does, for example. Well, that's wasteful. And so the proposed solution to avoid this was to decouple the protocol implementation from two things.
The first one is how you talk to the network, and the second one is the concurrency model. And these things are pretty tied together, and typically, something like asyncio or Twisted or Curio or Trio involves both concurrency primitives, how you run coroutines or maybe threads, and how you talk to the network. So this gives a fairly restrictive baseline, because you're essentially only allowed bytes and function calls. But if you write something and your public interfaces are just bytes and function calls, then you can hook it to anything. And so this is the theory behind sans-I/O: it allows writing reusable libraries, and this is how HTTP/1.1, HTTP/2, and websockets now work.
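To make the "bytes and function calls" idea concrete, here is a minimal sketch of that shape, using a made-up toy line protocol rather than the real websockets API; the class name and methods are purely illustrative.

```python
# Toy sans-I/O style protocol: it only consumes bytes and returns parsed
# events, leaving all socket and concurrency decisions to the caller.
class EchoLineProtocol:
    """Hypothetical line-based protocol: bytes in, complete lines out."""

    def __init__(self):
        self._buffer = bytearray()

    def receive_data(self, data: bytes) -> list[bytes]:
        """Feed bytes read from *any* transport; get back completed lines."""
        self._buffer.extend(data)
        events = []
        while b"\n" in self._buffer:
            line, _, rest = self._buffer.partition(b"\n")
            events.append(bytes(line))
            self._buffer = bytearray(rest)
        return events

    def send_line(self, line: bytes) -> bytes:
        """Return the bytes the caller should write to its transport."""
        return line + b"\n"
```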
[00:06:18] Unknown:
So you've been working on the WebSockets library for a while. You had a fully functioning project. It was being used by end users. I'm wondering what your motivation was for starting down the path of doing such a major rearchitecting of the code base in order to remove the IO dependencies?
[00:06:35] Unknown:
The good news about a hobby project is that I don't have to have a specific reason to pursue different goals. And in fact, it's a complete switch from the original goal. The original goal of websockets was to explore asyncio. And over time, I put a lot of energy into making sure the APIs would behave like similar APIs in plain asyncio. At this point, I think there have been about 1,000 issues filed in the tracker. I answered probably several hundred of these with: hey, how would you do this in plain asyncio? If you pass the same arguments, it should work in websockets.
The original strategy was: let's make the best possible use of asyncio. So now the design is: hey, let's make the best possible use of sans-I/O and let's see how it's going to fly. So in a sense, it's a natural continuation, because despite being a good, production ready library, websockets is also an experimental ground for me to see how some concepts can be applied and how they turn out in real life. And so I'm just taking the experience further. Well, it's a pretty big step, because at the beginning I had nothing, so I just started building the implementation. And now, well, I know all the bits and the features that are useful to make a good, complete WebSocket library. So, of course, I have a bit of a second-system effect where I want to do everything right.
But that makes the experience all the more interesting, because I really get to compare the asyncio version with the sans-I/O version and, eventually, its declinations with asyncio, with sockets and threads, and perhaps, if I want to, with Curio or Trio.
[00:08:15] Unknown:
Digging a bit more into the actual technical implementation of the WebSockets library, can you talk through a bit about how it was structured prior to this refactor and some of the design and use case evolution that has happened from when you first started the project to where you are now with this major rewrite?
[00:08:35] Unknown:
There's an interesting story there. It took me quite a long time to converge on a design that works well, mostly because I had very little experience with concurrent programming when I started this. So I have to give a quick primer about the WebSocket protocol. Essentially, it's a frame oriented protocol built on top of TCP. You send data frames of various types, text and binary, essentially. You can fragment them, so you will send several frames, which will be reassembled into a single message. And you have control frames, which are ping, pong, and close. Pings and pongs are essentially used as keepalives, and close provides a more robust closing handshake than what you can get with TCP over today's networks, with lots of hardware that interferes.
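For reference, the frame header he describes is specified in RFC 6455; this is a rough sketch of how the first two bytes decode, with an illustrative function name that is not part of the websockets API.

```python
import struct

def parse_frame_header(first_two: bytes):
    """Decode the first two bytes of a WebSocket frame (RFC 6455 layout)."""
    head1, head2 = struct.unpack("!BB", first_two)
    fin = bool(head1 & 0b10000000)     # final fragment of the message?
    opcode = head1 & 0b00001111        # 0x1 text, 0x2 binary, 0x8 close, 0x9 ping, 0xA pong
    masked = bool(head2 & 0b10000000)  # client-to-server frames are masked
    length = head2 & 0b01111111        # 126 and 127 mean an extended length follows
    return fin, opcode, masked, length
```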
So the typical challenge, if you want to do something that works well, is you are receiving a big message in many fragments and you have a ping in the middle. You want to answer the ping with a pong before you get to the rest of the message. And so if you have an API that's like, okay, let's read the next message, which is what you want, you need to have something somewhere that will intercept the ping frame and answer it. And so, eventually, when you investigate the problem, there are surely several ways to handle it. But if you're doing it the natural way with asyncio, it's just: I'm going to run a coroutine that reads frames from the network and processes them.
You end up having a background coroutine that reads all incoming frames, data frames and control frames, reassembles messages if they're fragmented, and puts them in a queue somewhere. It also deals with the control frames: ping, pong, close. And then user code running in another coroutine can fetch messages from the queue. So right now, we've gotten into complicated territory where we have two coroutines running all the time, one managed by the library, and at least one other, and perhaps many others, managed by the user code, which will call the receive function of websockets to receive the next message and will await it in asyncio.
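A loose sketch of that two-coroutine pattern, assuming hypothetical frame objects and helper callables; this is illustrative only, not the actual websockets internals.

```python
import asyncio

CONTINUATION, TEXT, BINARY, PING = 0x0, 0x1, 0x2, 0x9  # RFC 6455 opcodes

class ConnectionSketch:
    """Background reader queues complete messages; user code awaits recv()."""

    def __init__(self):
        self._messages: asyncio.Queue = asyncio.Queue()

    async def _reader(self, read_frame, send_pong):
        # read_frame and send_pong are hypothetical callables supplied by the
        # I/O layer; frames are assumed to have .opcode, .data, and .fin.
        fragments = []
        while True:
            frame = await read_frame()
            if frame.opcode == PING:
                await send_pong(frame.data)        # answer even mid-message
            elif frame.opcode in (TEXT, BINARY, CONTINUATION):
                fragments.append(frame.data)
                if frame.fin:                      # message is complete
                    await self._messages.put(b"".join(fragments))
                    fragments = []

    async def recv(self) -> bytes:
        return await self._messages.get()
```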
Sending is less of a problem. There's a bit of complication if you want to handle backpressure right, and asyncio took a few rounds to stabilize its implementation of backpressure on writes. So websockets had to follow suit and to support multiple Python versions, so it was a bit of a pain. But, eventually, around version 3 or 4 of websockets, this stabilized to a situation where both the receive and the send functions were async, where a coroutine was running in the background to process incoming frames, reassemble messages, and put them in a queue, where you could call receive to get the next message or to await it, and you could await send to send any kind of frame. And it turns out that, mostly for implementation reasons, I had a third coroutine that was responsible for eventually closing down the TCP connection properly.
It was not strictly necessary from a theoretical standpoint, and I had long debates with Chris Jerdonek in the ticket tracker about this. He's correct in saying that we don't need a third coroutine, but to me, it was more practical to ensure the robustness of the implementation, which is always a hard point when you have multiple coroutines running. People kept reporting bugs where clearly state got out of sync because of a concurrency issue I wasn't expecting, and asyncio doesn't really help there because it's very high level. And very often, when there's this kind of concurrency issue, where state ends up in an unexpected place, I end up digging into asyncio internals to understand what's going on. And so what happened over time is that, gradually, I started removing the asyncio abstractions and getting closer and closer to the low level APIs. And so asyncio was super useful to get started quickly.
But as a library implementing a protocol, eventually I wanted to get rid of the stream readers and the stream writers, which are useful abstractions: well, give me this many bytes and wait until you have them. This is a super useful abstraction; you need it to implement a protocol like WebSocket. But what asyncio did forced me to go read the implementation, to understand all the fine details of handling cancellation, for example. Cancellation is a very hard point in async programming. And so when I saw the opportunity to get rid of all this and to rebuild it myself without depending on all these abstractions, it felt interesting.
[00:13:24] Unknown:
Yeah. It's definitely an interesting challenge to figure out what is the actual core concern of the protocol, because particularly with networking, what has gotten us into this situation is that it just feels natural to handle the I/O as part of the protocol implementation, and that's why it's had to be redone so many times. And I'm wondering if you can just give a bit of an overview about the core elements that are necessary for a sans-I/O library to be functional, and how that changes the overall design and interface for the library, where the protocol is the core of the logic and you're actually adding the I/O implementations as additional integration layers rather than as the primary concern?
[00:14:12] Unknown:
So from a regular user's perspective, I think it can change nothing. You can provide exactly the same high level API, and my plan at this point is to swap the legacy asyncio based implementation with the new sans-I/O based implementation. So every user, in some version update, will get the new one by default, and I think it could work. So now, here's where things get complicated. The initial implementation, which is based on asyncio, looks like a read_message coroutine that calls a read_frame coroutine, with just some logic for triaging control frames, data frames, etcetera, which eventually calls the readexactly method of the asyncio stream reader to get the first two bytes. Because to read a WebSocket frame, you first need to read the first two bytes, and then maybe more header, and then the body. It's a type-length-value protocol, so you always need to know how many bytes you need to read. And so it's just a chain of coroutines calling other coroutines.
So it's very simple. There's a diagram in the documentation that shows, well, how every method gets down to the network. So now, once you're doing sans-I/O, you cannot do this anymore, because the basic data input is: you receive a bunch of bytes from the network and you deal with these bytes. So it's very much the same as the data_received function of an asyncio protocol, just all the niceties are gone. You have some form of inversion of control in there. You provide a function that can be called to push some bytes or to push end of file, and then you deal with this. And then the caller can check to see if you produced events, so frames, typically, in the case of websockets. There's a handshake that I'm not mentioning because it just makes things more complicated, and it's not strictly necessary for the purposes of the discussion.
So in the sans-I/O core, the first thing you need is a stream reader abstraction, which is just a buffer where you can push the incoming bytes, and then you can read a fixed number of bytes from this buffer. And then you need to handle the cases where there's enough or not enough data. And so, interestingly, an elegant way to do this in Python is generator based coroutines, which you implement with yield from. And so websockets was originally written with yield from, because this is all we had prior to Python 3.5, then switched to await and async. Now it has moved back to yield from and to generator based coroutines, as opposed to native coroutines, because I'm not allowed to use asyncio as a concurrency model, since it's sans-I/O. So I'm only allowed to use functions.
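A small sketch of this generator-based style, under the assumption of a hypothetical read_exact helper (nothing here is the websockets API): the parser yields whenever it needs more data, and whichever I/O layer drives it feeds the buffer and resumes it.

```python
def read_exact(buffer: bytearray, n: int):
    """Generator that completes once n bytes are available in the buffer."""
    while len(buffer) < n:
        yield                    # "I need more bytes" -- the caller feeds the buffer
    data = bytes(buffer[:n])
    del buffer[:n]
    return data

def parse_frame(buffer: bytearray):
    header = yield from read_exact(buffer, 2)   # type and length come first
    # ... decode the header, then read the extended length and payload the same way
    return header

# Driving the parser from any I/O layer:
buffer = bytearray()
parser = parse_frame(buffer)
next(parser)                     # run until the parser needs more bytes
buffer.extend(b"\x81\x05")       # bytes received from some transport
try:
    parser.send(None)
except StopIteration as done:
    frame_header = done.value    # parsing finished with these two bytes
```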
So here, I could certainly take an off-the-shelf stream reader. I think the ohneio project, which is written by a German person, does this. And then there's the question of: do I want to write something that takes 100 lines and keep websockets without dependencies, or understand someone else's stream reader implementation and take a dependency? And so, since this is an experimental project for fun, of course, I'm writing my own. I get to figure out, for example, how to handle the end of file. It's not very hard, there are several ways to do it, but it needs to be done. And I get to figure out something harder, which is how to propagate exceptions.
This is something that you get for free in asyncio and with Python generators. And when you start manipulating all the bits by yourself, suddenly you realize that coroutines did a lot for you for free. So in this sense, the implementation suddenly becomes one order of magnitude more complicated to conceptualize and to manipulate than when I had all the niceties of asyncio. So what I did in websockets was build a sans-I/O layer and then try to use it to build a synchronous version of websockets, which is something people have asked for, mostly for writing clients. When you write a WebSocket client, very often you have just one client. You don't need the fancy concurrency that allows writing a server with 10,000 connected clients. And having to run an asyncio event loop, etcetera, adds overhead. Well, you just have these asyncs and awaits everywhere, which you don't really need.
So there was value in providing a synchronous client. And so I have a work in progress version of this, except right now I'm looking precisely at how to handle the end of file. So when you get to the end of the TCP stream in the middle of a frame because the connection dropped, and since WebSocket uses long running connections, this happens all the time, you cannot skip this edge case. And also at how to handle exceptions properly. So I'm at the step of investigating, and I don't have the answer yet.
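For illustration, a synchronous integration layer over a sans-I/O core could be as plain as a blocking socket loop. This sketch reuses the toy EchoLineProtocol from earlier and is not how websockets implements its client; the EOF handling shown is the minimal case being described.

```python
import socket

def run_client(host: str, port: int) -> None:
    proto = EchoLineProtocol()                  # toy protocol sketched earlier
    with socket.create_connection((host, port)) as sock:
        sock.sendall(proto.send_line(b"hello"))
        while True:
            data = sock.recv(4096)
            if not data:                        # EOF: the peer closed the connection
                break                           # a real client must check for a partial frame here
            for line in proto.receive_data(data):
                print("received:", line)
```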
[00:19:17] Unknown:
One of the other goals of the whole sans-I/O approach is the ability to, one, use the protocol independent of the I/O requirements, but also, two, to be able to compose together multiple different protocols. So for instance, if you wanted to be able to use WebSockets with an HTTP proxy, or you wanted to be able to compose together, you know, HTTP/1 versus HTTP/2 compatibility. And in terms of your experience of working with websockets and going down this road of making it sans-I/O, I'm curious if you have experimented with the actual real world utility of composing together these different protocol implementations and then layering on your specific I/O requirements, and just some of the complexities or challenges or shortcomings that you may have experienced in the process?
[00:20:12] Unknown:
Yes, the pain is real. Websockets currently can connect through an HTTP proxy, but not all combinations of HTTP, well, scratch that, let me restart the entire answer. While the pain is real, people have been asking for support for HTTP proxies in websockets. I have a pull request that does this, but it doesn't necessarily handle well all the HTTP and HTTPS combinations, or at least it's not tested. Why? Because asyncio is too high level for doing a TLS handshake inside a TLS connection; there's no good public API for doing this. And another example is that people have been asking for SOCKS proxy support. I'm willing to implement HTTP proxy support in websockets, it's just a CONNECT request. I'm not willing to implement SOCKS.
An interesting fact at this point is that the transport and protocol abstractions of asyncio were supposed to provide this kind of composability, and were inspired by Twisted, which, as far as I understand (but I never used it in real life), is designed this way for this reason. But to be honest, I looked at the existing asyncio based libraries for SOCKS, and I tried to figure out how I could perform the surgery in websockets to connect everything together. And it felt complicated, so I left it there. It feels possibly simpler with a sans-I/O approach, but I'm still seeing a possible issue.
Let's say, so I mentioned that in WebSocket, you have a handshake, which is an HTTP request and response, and then you can send frames. So let's assume you are a WebSocket client. You send a handshake request, and you get, in one TCP packet, the handshake response and a WebSocket frame. Most implementations will not do this, but some might. And so suddenly, in your software, and so in the case of websockets, in the stream reader, you have the response header that you read and possibly other things. And so if you have to hand this off to another library or, like I said, do some surgery in the middle to reconnect things, then you start having to pull things out of your buffers and put them back into someone else's buffers. And this looks, at the same time, entirely doable and possibly fraught with risks.
The key question for me here is: how do you handle handovers from one protocol to the other? And I think it's hard to find a good software design for this. I'm already struggling with a very basic version of this in websockets, which is how to do the handover from the HTTP handshake to the WebSocket frames. And the code around this, I never found a way to make it elegant.
[00:23:10] Unknown:
And so in terms of your experience of going through this conversion, from a functioning library that's tied to a specific I/O implementation, to extracting the core protocol capabilities into a sans-I/O implementation, then layering the I/O back on, and working on trying to compose together multiple sans-I/O protocol implementations, I'm curious what your current perspective is on the overall cost/benefit analysis of the conversion and just the overall approach of sans-I/O, and if you think that the goals as stated still align with the reality.
[00:23:48] Unknown:
So if the eventual goal was only to have an asyncio library, designing it around the sans-I/O core and the asyncio integration layer wouldn't make any sense, in my opinion, because there's a huge overhead in terms of the design, and you lose so many niceties, and you have to rethink everything from scratch. Essentially, you're naked in the jungle with a knife instead of flying a jet. One clear improvement, though, is on tests. There's an absolutely horrific file in websockets, which is called test_client_server.py, which essentially performs integration tests between the client and the server and tries tons of permutations of parameters in the protocol.
And this is so slow, because every time you have to create an event loop to isolate things properly, and start a server, and start a client, and make a TCP connection, and exchange a frame, and close everything. And this is so much slower than necessary. Once the protocol is extracted out, you can do all the testing of parameters and their permutations purely by manipulating bytes in memory, and it's incredibly faster. So the promise of better testability is clearly delivered. I will have to complete the implementation of one I/O integration before I can give the final word on this.
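As an illustration of that kind of in-memory testing, here is a sketch that exercises the toy line protocol from earlier purely with bytes, with no event loop, server, or TCP connection involved; it is hypothetical code, not the websockets test suite.

```python
import unittest

class LineReassemblyTest(unittest.TestCase):
    def test_chunked_delivery(self):
        proto = EchoLineProtocol()   # the toy sans-I/O protocol sketched earlier
        # Feed the "network" input in awkward chunks, as TCP is allowed to deliver it.
        self.assertEqual(proto.receive_data(b"hel"), [])
        self.assertEqual(proto.receive_data(b"lo\nwor"), [b"hello"])
        self.assertEqual(proto.receive_data(b"ld\n"), [b"world"])

if __name__ == "__main__":
    unittest.main()
```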
But I think the testability will likely be better on this end too, modulo the fact that having at least two coroutines or two threads, for the reason I explained in the beginning, is still inevitable. So there are still all the concurrency pitfalls, and it's hard to say if there will be large gains on this front in terms of having fewer concurrency bugs. I'm afraid it won't be that different, and that I will just reopen a big can of bugs by writing a second system. So these are my observations at this point. I'm likely to be more proud of the final result, so with v2, asyncio around a sans-I/O core, than with v1.
It's likely to be a bit more maintainable. The only thing that will really make a big difference, in my opinion, is if another major project or server product integrates it, and it works well. So in the past, the folks making Sanic have been relying on websockets for their implementation. But since they didn't have a good API for integrating it, they did what they could. So the result is a bit of a Frankenstein, and they tend not to get the benefits of APIs or improvements added to websockets unless they do specific work to implement it. So typically, when websockets added compression, which seemed like a useful feature when you are shuffling a lot of JSON over the wire, Sanic didn't get it. So one of the motivating drivers for doing this project was making it possible for people doing this kind of thing to rely on websockets. Well, it's not under my control whether a product eventually takes advantage of this.
[00:26:56] Unknown:
So you mentioned that the potential use cases and end users of the library are going to change a little bit, where the top level API is going to remain the same, where it's going to be asyncio oriented, but it also exposes this core protocol layer that other people can integrate more directly or compose together. I'm wondering if you can just give some perspective on how you foresee the overall target audience changing, and the interaction patterns for you as a maintainer evolving, as more people become aware of and are able to take advantage of this pure protocol implementation?
[00:27:35] Unknown:
So one of the first things that's going to happen is that we will have multiple versions of websockets. I assume people writing servers will stick to asyncio for the most part. I think a few folks who like to tinker or to experiment may try to write servers with Curio or Trio, if I eventually implement this. It's pretty low priority, because I think these projects are still mostly experimental and have nowhere near as much traction as asyncio. So anyone who uses this is quickly going to run into ecosystem issues: I can't connect to my database, this sort of thing. And the sockets and threads implementation will be very useful for clients, as, well, I have pushed back on feature requests saying, hey, can I have a callback based API?
Because I think coroutines are objectively better than callbacks for this sort of thing, and the same goes for people asking for callback based APIs for their projects. Now, the feature request of I want a synchronous API is completely legitimate when you just want to write a client to wait for messages sent by a server, do something with them, and move on to the next message. You don't really need asyncio to do this, and so this will open a new field. And so now, the last part you had in mind was other projects integrating websockets. In my experience, maintainers of such products are very autonomous in doing this. Perhaps this could result in some interesting bug reports.
Historically, I took a very strong stance on quality in websockets, so it has a very high level of test coverage. But since it could expose patterns that my own implementation is not exposed to, some bugs may lurk in there. So we will see, but at this point, I trust the test harness of the sans-I/O implementation, and there are no concurrency bugs in there, and concurrency bugs are the only hard ones.
[00:29:31] Unknown:
And in terms of your experience of building the project initially and then going through this conversion of extracting the core protocol layers, I'm wondering what are the most interesting or unexpected or challenging lessons that you've learned?
[00:29:44] Unknown:
So one challenge that I haven't cracked yet is how to swap the current implementation, which I'm now calling the legacy implementation, based on asyncio, with a new integration layer built with asyncio on top of the sans-I/O core. So I spent quite a bit of time exploring how I could do some clean surgery to rewire the current API and just swap the calls so my entire test suite would still pass, and I could have high confidence that the result contains no regressions. At this point, I have come to believe that it will not happen, and so I just moved the entire implementation into a legacy submodule.
I added some compatibility imports, and I think I'm going to build a new implementation that, well, provides the same API. Perhaps I'll release a version where a user who wants to try it can opt in just by importing serve and connect, the primary entry points, from a different location. I'm slightly skeptical about this because I don't think anyone would actually try it, in fact. And at some point, I think I will make the jump and swap. But the key lesson here is that the sans-I/O folks got one thing really right: once you buy into something like asyncio, and the entire code base is predicated on the concurrency model and the I/O model and the utilities provided by the library, essentially, you can't do anything with it. You cannot refactor it. The only thing you can do is rebuild it, and this is what I ended up doing.
[00:31:21] Unknown:
Are there any notable features or improvements that you have planned for the future of the project or anything that you're particularly interested in digging into?
[00:31:30] Unknown:
One thing that could come afterwards would be supporting doing the WebSocket handshake over HTTP/2 versus HTTP/1. Right now, websockets has its own implementation of HTTP/1, because writing an HTTP implementation is fun, so I did it like everyone else, and because I didn't want to take on a dependency if I could avoid it, and also because it wasn't sans-I/O, so any dependency would be hard to integrate. For HTTP/2, it's sufficiently complex that I'm really not considering doing it by myself. And so I could try to do this, most likely with the h2 library from the hyper project.
That said, it looks like a lot of work for very minor benefits because the whole point of WebSockets is to have a long lasting connection, a permanent connection. And the first 200 bytes really don't matter.
[00:32:21] Unknown:
Are there any other aspects of the WebSocket protocol itself, your implementation of it, or your work on making the library sans-I/O that we didn't discuss yet that you'd like to cover before we close out the show?
[00:32:34] Unknown:
One last thing that's interesting is authentication. So right now, if you connect with websockets to a server and there are credentials in the URL, username and password, websockets automatically performs basic auth, which is, well, just adding a header to the request. There's something similar, but with a slightly more complicated API, on the server side, because due to inversion of control, it's not just a URL you're passing to a function call; it has to be something you configure when you build the server. And, of course, people have been asking for digest auth, which requires two HTTP requests.
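For reference, the basic auth header mentioned here is plain HTTP (RFC 7617), roughly the following; the helper name and credentials are illustrative, not the websockets API.

```python
import base64

def basic_auth_header(username: str, password: str) -> tuple[str, str]:
    """Build the single Authorization header that basic auth adds to the handshake."""
    credentials = f"{username}:{password}".encode()
    return "Authorization", "Basic " + base64.b64encode(credentials).decode()

# e.g. ("Authorization", "Basic YWxpY2U6c2VjcmV0")
print(basic_auth_header("alice", "secret"))
```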
So, for digest, the client makes an HTTP request, the server sends a challenge, the client sends the authentication info, and then we can complete the sequence. Again, this seems simple, but when I actually try to build this with all the possible permutations of also following redirects, etcetera, it gets complicated. And I suspect this could be easier to build inside the sans-I/O model, because it would just be a handshake sequence lasting a bit longer. So we're exchanging more bytes, but it's pretty much transparent at the network level. It's not very high priority, and, as I mentioned earlier, the integration is not very elegant in this area.
Essentially, client and server in websockets share the entire implementation, except they do something special with the first message, because it's an HTTP request or response depending on which side you are on, and it requires some specific handling. And so now, if it has to be the first two messages in some cases, it requires a bit of rethinking how this part works. Once I do this, perhaps I can have digest auth, which seems like a very valid feature request. By the way, I realized I didn't talk about something, which is handling redirects. If you're making a client connection and the server sends a 3xx HTTP code, you want to follow the redirect.
And so here, very interestingly, the protocol provides a piece of information that needs to be used for doing something at the network level, possibly opening a connection to a different host. So this is typically where the model creaks a bit around the edges. Essentially, all this logic is pushed to the integration layer; it cannot live in the core. And so the integration layer, in the case of a very simple protocol like WebSocket, stays fairly thick. The authentication logic I just mentioned, or the redirect logic, is very likely to live, at least in part, in every integration layer: asyncio, sockets and threads, Curio, etcetera. So in that sense, even though there's a good don't-repeat-yourself effect thanks to the sans-I/O core, it's not as complete as you might expect if you just read, let's say, the marketing pitch for sans-I/O.
[00:35:31] Unknown:
Yeah. It's definitely interesting how some of those concerns can kind of creep outside of the core elements of the protocol, and there's no way around implementing them specifically for each I/O implementation. So it's definitely an interesting complexity to that problem, and I'm curious how many other network protocols have that kind of creeping of concerns into the I/O layers.
[00:35:53] Unknown:
The way I look at it is just that being user friendly is hard. If you want to do things the simple way, you just push the work onto users. But if you want to be nice and, given anything that looks reasonably like a WebSocket URI
[00:36:07] Unknown:
or IRI, by the way, eventually manage to connect to it and provide the send and receive calls, well, then you have some work to do. And I think this is really what we are seeing here. For anybody who wants to follow along with the work that you're doing or get in touch, I'll have you add your preferred contact information to the show notes. And so with that, I'll move us into the picks. And this week, I'm going to choose taking the time to sit down and do a puzzle every now and then, just because they're fun and challenging, and it's a good way to get everybody in the family to work together on something. So with that, I'll pass it to you, Aymeric. Do you have any picks this week?
[00:36:38] Unknown:
I think I'll just link to something completely different, you know, which is my vision of how we can make engineers proud. In too many companies, engineers end up suffering under the pressure of deadlines and feeling pushed to compromise on quality, and I think it's an important problem to solve. So I'll link to something about this.
[00:36:55] Unknown:
All right. I look forward to reading that. And thank you very much for taking the time today to join me and share your experience of going down the road of implementing a sans-I/O network protocol, particularly one as involved as WebSocket. It's definitely a very interesting project, and I appreciate all the time and effort you've put into that. And I look forward to seeing where you take the project, and I hope you have a good rest of your day. Thank you for having me on the podcast. Good afternoon.
[00:37:22] Unknown:
Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast at dataengineeringpodcast.com, for the latest on modern data management. And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Guest Introduction
First Encounter with Python
Overview of the WebSockets Library
Understanding Sans IO and Its Goals
Technical Implementation of WebSockets
Challenges and Real-World Applications
Future of WebSockets and Sans IO
Lessons Learned and Final Thoughts