Visit our site to listen to past episodes, support the show, and sign up for our newsletter!
Summary
Dag Brattli is an engineer with Microsoft, and in his spare time he ported the Reactive Extensions framework to Python in the form of the RxPy library. In this episode we had the opportunity to speak with Dag and learn more about what ReactiveX is, why it is useful, and how you can use it in your Python programs. It is a very powerful programming pattern for manipulating data streams, which is becoming increasingly common in modern software architectures.
Brief Introduction
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- Subscribe on iTunes, Stitcher, TuneIn or RSS
- Follow us on Twitter or Google+
- Give us feedback! Leave a review on iTunes, Tweet to us, send us an email or leave us a message on Google+
- I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable. For details on how to support the show you can visit our site at
- I would also like to thank Hired, a job marketplace for developers, for sponsoring this episode of Podcast.__init__. Use the link hired.com/podcastinit to double your signing bonus.
- We are recording today on October 2nd, 2015 and your hosts as usual are Tobias Macey and Chris Patti
- Today we are interviewing Dag Brattli about the RxPy project
Interview with Dag Brattli
- Introductions
- How did you get introduced to Python?
- For our listeners who haven’t heard of it before, can you describe what RxPy is and why someone might want to use it?
- What problem domains are best suited for using the ReactiveX approach?
- What is involved in integrating RxPy into an existing code base?
- When should we use RxPy over asyncio or asynchronous workers like Celery?
- What resources or tutorials do you recommend people use when trying to understand how and when to use the ReactiveX tools?
- What in particular about Python lends itself to the ReactiveX pattern, and what features of the language does RxPy leverage in particular in its implementation?
- In what ways does the Python implementation of the ReactiveX framework differ from those of other languages?
- The project description references the use of LINQ for querying the various data streams that RxPy enables consumption of. I had always heard of LINQ in the context of traditional database queries. What makes LINQ a good choice for stream processing?
- I mostly hear about ReactiveX in terms of UI design, but the project description seemed to indicate it was much more generally useful. What are some of the less common and more interesting problems that RxPy lends itself to solving?
Picks
- Tobias
- Chris
- Dag
- ASTor
- How To Bake Pi – A book about the mathematics of mathematics
Keep In Touch
- GitHub
- Links
- Main ReactiveX Site
- rxjava site for documentation
- rxmarbles
- MSDN Channel 9
- Function Overloading in Python 3
The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. You can subscribe to our show on iTunes, Stitcher, or TuneIn Radio, or you can add our RSS feed to your podcatcher of choice. You can also follow us on Twitter or Google+, and please give us feedback. You can leave a review on iTunes, send us a tweet, send us an email, or leave a message on Google+. You can also leave a comment in our show notes. I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable. For details on how to support the show, you can visit our site at pythonpodcast.com.
I would also like to thank Hired, a job marketplace for developers, for sponsoring this episode of Podcast.__init__. Use the link hired.com/podcastinit to double your signing bonus. We're recording today on October 2, 2015, and your hosts, as usual, are Tobias Macey and Chris Patti. Today, we are interviewing Dag Brattli about the RxPy project. Dag, could you please introduce yourself?
[00:01:13] Unknown:
Yeah. Hello. My name is Dag Brattli. I work for Microsoft in Norway, at 69 degrees north, so it's probably the most northern Microsoft office in the world. I work for the Outlook and Office 365 team. FAST, it's called. It's a company that was actually bought by Microsoft some years ago. I work within the search and analytics team.
[00:01:42] Unknown:
That's great. So how did you get introduced to Python?
[00:01:47] Unknown:
Python, it's sort of hard to remember. I'm not that young anymore. I remember it from the university, and the first version of Python I can remember is 1.5, so we are going back to '97, I think. I've been using Python ever since, but I'm not using it at the office anymore. There I use C# and other things. So Python is only something I use on the weekends as a spare-time hobby.
[00:02:26] Unknown:
So for our listeners who haven't heard of it before, can you describe what RxPy is and why someone might want to use it?
[00:02:34] Unknown:
Yeah. RxPy is a library for event processing, so it's something you can use to process event streams. If you have some programs that handle different kinds of event streams and you need to react to what's happening in those event streams, maybe you want to do some filtering or combine the events in some way, then RxPy should be a good choice.
[00:03:09] Unknown:
And what problem domains are best suited for using the ReactiveX approach?
[00:03:15] Unknown:
I think that if you have some back-end services that need to deal with multiple async data streams, then that's what RxPy is best suited for. Say you have streams of flight information, weather information, location data, or stock information, and you need to do something with that data. If you want to filter it or transform the values in any way; if you want to do some aggregation, some sums, running totals, or averages; if you want to combine the data streams in any way; or if you need to do some time shifting, as we call it, where you throttle or delay the data streams in some way, then RxPy is a good choice.
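To make the filter, transform, and running-total ideas above concrete, here is a minimal sketch in plain Python. The `Stream` class and its `filter`, `map`, and `scan` methods are hypothetical stand-ins written for this illustration; they are not the RxPy API, and the sample values are made up.

```python
# Hypothetical minimal push-based stream; not the RxPy API.
class Stream:
    def __init__(self, subscribe):
        # `subscribe` is a function that takes an on_next callback.
        self._subscribe = subscribe

    def subscribe(self, on_next):
        self._subscribe(on_next)

    @classmethod
    def from_iterable(cls, items):
        def subscribe(on_next):
            for item in items:
                on_next(item)
        return cls(subscribe)

    def filter(self, predicate):
        # Forward only the values for which predicate(value) is true.
        def subscribe(on_next):
            self.subscribe(lambda x: on_next(x) if predicate(x) else None)
        return Stream(subscribe)

    def map(self, func):
        # Transform each value before forwarding it.
        def subscribe(on_next):
            self.subscribe(lambda x: on_next(func(x)))
        return Stream(subscribe)

    def scan(self, accumulator, seed):
        # Emit a running aggregate (for example a running total).
        def subscribe(on_next):
            total = seed

            def step(x):
                nonlocal total
                total = accumulator(total, x)
                on_next(total)

            self.subscribe(step)
        return Stream(subscribe)


# Made-up sample data standing in for an event stream.
prices = Stream.from_iterable([10, -3, 25, 7, -1])
totals = []
(prices
    .filter(lambda p: p > 0)           # drop negative readings
    .map(lambda p: p * 2)              # transform each value
    .scan(lambda acc, p: acc + p, 0)   # running total
    .subscribe(totals.append))
```

Nothing runs until `subscribe` is called at the end of the chain, which is the property that lets the same pipeline be attached to a live source instead of a list.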
[00:04:15] Unknown:
And what is involved in integrating RxPy into an existing code base?
[00:04:21] Unknown:
Integrating should be quite easy. It's a bit different depending on whether your application is threaded or is using some kind of event loop. If it's threaded, then you can just sort of go ahead and use it. But if you have some event loop, you need to integrate with what we call a scheduler. RxPy has schedulers for all the common event loops used in Python, like asyncio, gevent, Twisted, Tk, Qt, Pygame, and the IOLoop. We have some examples on the GitHub site where you can see how to do this.
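The reason an event loop needs a scheduler can be sketched with asyncio alone: timed work has to be handed to the loop's own timer rather than to a thread or `time.sleep`. The `interval` helper below is a hypothetical function written for this note, not an RxPy scheduler; it only assumes the standard `loop.call_later` call.

```python
import asyncio


def interval(loop, period, count, on_next, on_completed):
    """Push 0..count-1 to on_next, one value per `period` seconds,
    using the event loop's timer instead of blocking a thread.
    (Hypothetical helper for illustration only.)"""
    def tick(i):
        if i == count:
            on_completed()
            return
        on_next(i)
        loop.call_later(period, tick, i + 1)

    loop.call_later(period, tick, 0)


async def main():
    loop = asyncio.get_running_loop()
    done = loop.create_future()
    received = []
    # The completion callback resolves the future so main() can finish.
    interval(loop, 0.01, 3, received.append, lambda: done.set_result(None))
    await done
    return received


received = asyncio.run(main())
```

A scheduler for another event loop would follow the same shape, with that loop's timer call in place of `call_later`.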
[00:05:14] Unknown:
And talking about documentation, what are some resources or tutorials that you recommend to people when they're trying to understand how and when to use the ReactiveX tools, and how to start understanding the different patterns that are involved in the RxPy and ReactiveX frameworks?
[00:05:33] Unknown:
Yeah. The Reactive Extensions, or ReactiveX, can have a bit of, what do you call it, a steep learning curve for many people. But there is a lot of good documentation out there. Unfortunately, RxPy is not that well documented. And since it's a hobby project, I should admit that I should be better at writing the documentation. But the good thing with Rx is that it's sort of a universal language. So if you look at the documentation for one programming language, it will work for all the others. So I would recommend people look at the ReactiveX homepage, which is at reactivex.io.
And also there's much good documentation at the RxJava site. There is a site called RxMarbles, which people should look at, where you can sort of play interactively with the operators. Channel 9, at msdn.com, has tons of videos you can look at.
[00:06:49] Unknown:
That RxMarbles site sounds very interesting. I can definitely see the value of having a visual representation of how the reactive extensions actually operate on the data, because I know that in my experience of trying to use RxJava in an Android application, it took some doing to really understand what the different operators were and how they all related to each other. But once I was actually able to get some data flows cobbled together, it greatly simplified the overall program, and I was very happy with the end result. So I definitely look forward to being able to integrate RxPy into some of the projects that I work on.
When would you want to use RxPy over an asyncio event loop or a background worker like Celery? Or is RxPy particularly well suited for background tasks?
[00:07:46] Unknown:
If you look at Celery, which is a distributed task queue, it is quite different from Rx, which is an event processing library. So I don't think that you should use one or the other, but it could be good to use them together, so that you could use Celery as your sort of platform and then use Rx within the task functions to do the processing. So you could use Rx to transform the messages through some computation and send messages to other tasks. A much more advanced use of Rx would actually be to generate an Rx query and then transform it into an expression tree, or an AST, as we call it in Python.
And then we could sort of delegate parts of the expression tree out to different tasks in a Celery application. It's a bit more advanced, so I'm not sure if it's clear or not what I tried to tell.
[00:09:03] Unknown:
No. That definitely sounds like a very powerful combination: being able to, as it were, compile the data stream and then execute different pieces of that flow on different workers, so that the overall process can be done maybe partly in parallel. That sounds like it would be a very interesting use of RxPy and particularly the Python AST. And as we were talking, one thing that I was curious about is whether there are any particular facilities in RxPy or associated libraries that facilitate testing of programs that use the RxPy library, and being able to maybe hook into the event streams to provide assertions as the data flows through?
[00:09:50] Unknown:
Yeah. We have some tools in the testing module that can be used to look into the data stream. And there's also what is called a virtual scheduler that you could use to process events in virtual time. So you can replay a huge log file or some event table that you have. And if you have timestamps on that, Rx will process it as it would have happened in real life, but the events would replay in, like, milliseconds.
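The idea of virtual time can be sketched without any library: keep timestamped actions in a priority queue and advance a fake clock instead of sleeping. The `VirtualTimeScheduler` class, its `schedule_absolute` and `run` methods, and the sample log below are assumptions made for this sketch and are not RxPy's testing API.

```python
import heapq


class VirtualTimeScheduler:
    """Hypothetical sketch: runs timestamped actions in order without sleeping."""

    def __init__(self):
        self.clock = 0.0
        self._queue = []
        self._counter = 0  # tie-breaker so equal timestamps keep insertion order

    def schedule_absolute(self, due_time, action):
        heapq.heappush(self._queue, (due_time, self._counter, action))
        self._counter += 1

    def run(self):
        while self._queue:
            due_time, _, action = heapq.heappop(self._queue)
            self.clock = due_time  # jump the clock forward instead of waiting
            action(self)


# A made-up event log covering a full day of timestamps (in seconds).
log = [(0.0, "login"), (3600.0, "click"), (86400.0, "logout")]

scheduler = VirtualTimeScheduler()
seen = []
for timestamp, event in log:
    scheduler.schedule_absolute(
        timestamp, lambda s, e=event: seen.append((s.clock, e)))
scheduler.run()  # replays the whole day immediately
```

Each action observes the clock value it was scheduled for, so time-based logic sees realistic timestamps even though the replay finishes almost instantly.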
[00:10:26] Unknown:
So what in particular about Python lends itself to the ReactiveX pattern, and what features of the language does RxPy leverage in particular in its implementation?
[00:10:36] Unknown:
When I started writing RxPy, I basically ported most of the RxJS code. And that was quite easy; you could do it line by line. The thing that you use in Python is what's called higher-order functions. So you need to be able to have functions that you can pass around as parameters to other functions, and you also need to have functions that return functions. Without that, it would be much harder to implement RxPy. And you need to be able to define functions within functions, and that is supported by Python. The only problem is that in Python 3, you have to declare this.
If you want to change a variable from an outer scope, you have to declare it as nonlocal. And that keyword is not available in Python 2. So, since RxPy runs on both Python 2 and 3, we couldn't use that. So we had to do a trick by wrapping the variables in an array, because you can read variables from an outer scope, but you cannot change them. But with a list, I mean, not an array, you can read the list and you can change the contents. So you sort of cheat the system a little bit. The other problem is that Rx has, like, 100, or RxPy has 140, methods on a single class.
Wow. And this is quite challenging in Python because you can only have a class in a single file. And if you wanted to have 140 methods in that class, it would look quite ugly. So you can say that the design is not that good. But in C#, you have extension methods, so it becomes much cleaner because you can have one file for every extension method. So I had to implement a way to sort of create extension methods in Python, so you could patch a method into the class.
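The list-wrapping trick mentioned above for closures can be shown side by side with the Python 3 `nonlocal` form. The two counter factories below are hypothetical examples written for this comparison, not code from RxPy.

```python
def make_counter_list_trick():
    # Works on Python 2 and 3: the inner function never rebinds `count`,
    # it only mutates the list that `count` refers to.
    count = [0]

    def increment():
        count[0] += 1
        return count[0]

    return increment


def make_counter_nonlocal():
    # Python 3 only: `nonlocal` allows rebinding the outer variable directly.
    count = 0

    def increment():
        nonlocal count
        count += 1
        return count

    return increment


old_style = make_counter_list_trick()
new_style = make_counter_nonlocal()
results = [old_style(), old_style(), new_style(), new_style()]
```

Without either device, `count += 1` inside `increment` would make `count` a local variable of the inner function and fail with `UnboundLocalError`.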
[00:13:01] Unknown:
So at runtime, you sort of monkey patch a method definition and attach it to the class definition before you know, as it's loading?
[00:13:11] Unknown:
Yes. So now that we have so many methods, I'm starting to think that I should probably load them on demand or something, because it can take some time to stitch in all the methods.
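Patching methods into a class at import time, as discussed in this exchange, can be sketched with a small decorator. The `Observable` class here is a list-backed stand-in and `extensionmethod` is a name assumed for this illustration; the sketch shows the general technique rather than RxPy's actual implementation.

```python
class Observable:
    """List-backed stand-in class; the real RxPy class is not reproduced here."""

    def __init__(self, items):
        self.items = list(items)


def extensionmethod(cls):
    """Hypothetical decorator that attaches a function to `cls` as a method."""
    def decorator(func):
        setattr(cls, func.__name__, func)
        return func
    return decorator


# In a larger project, each of these could live in its own module.
@extensionmethod(Observable)
def where(self, predicate):
    return Observable(x for x in self.items if predicate(x))


@extensionmethod(Observable)
def select(self, selector):
    return Observable(selector(x) for x in self.items)


result = (Observable(range(6))
          .where(lambda x: x % 2 == 0)
          .select(lambda x: x * 10))
```

Because every decorated function runs `setattr` when its module is imported, attaching well over a hundred operators has a startup cost, which is the motivation for loading them on demand.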
[00:13:28] Unknown:
So are there any ways in which the Python implementation of the reactive extensions framework differs from those of other languages?
[00:13:35] Unknown:
Yeah. There are not that many differences, but RxPy tries to follow PEP 8, so all the method names are lowercase with words separated by underscores. One more thing is that when calling methods, you should name your parameters, or arguments as we call them in Python. And this is because in Rx there are several methods that have overloads, so the same method can take different kinds of parameters. And in Python, we don't support overloaded methods. So I need to sort of define them with all the possible arguments and then detect what the user wanted to do.
Some methods will be like method with x, or method with x and y, et cetera. I don't like that pattern, but I still have it for an operator like window: I have window with time, window with count, window with time or count. So I'm hoping to clean that up, but I'm not sure how to do it in a good way in Python.
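One way to picture the "detect what the user wanted" approach is a single function with optional keyword arguments. The `window` function below is a hypothetical sketch: it only reports which variant it would dispatch to, and its parameter names `timespan` and `count` are assumptions, not RxPy's signature.

```python
def window(source, timespan=None, count=None):
    """Hypothetical single entry point standing in for the separate
    window-with-time, window-with-count and window-with-time-or-count
    operators. It returns a tuple naming the chosen strategy."""
    if timespan is not None and count is not None:
        return ("time_or_count", timespan, count)
    if timespan is not None:
        return ("time", timespan)
    if count is not None:
        return ("count", count)
    raise TypeError("window() needs timespan, count, or both")


events = [1, 2, 3, 4, 5]
by_count = window(events, count=2)
by_time = window(events, timespan=0.5)
by_either = window(events, timespan=0.5, count=2)
```

Naming the arguments at the call site is what makes the dispatch unambiguous, since a bare `window(events, 2)` could not tell a time span from a count.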
[00:14:57] Unknown:
I think I came across recently a Python library that actually adds the ability to do method overloading. So that might be something to look into.
[00:15:07] Unknown:
Yeah. Okay.
[00:15:08] Unknown:
I'll see if I can find the link to it again. And I'll add it to the show notes and send it to you. Thank you.
[00:15:14] Unknown:
So the project description references the use of LINQ for querying the various data streams that RxPy enables consumption of. I had always heard of LINQ in the context of traditional database queries. What makes LINQ a good choice for stream processing?
[00:15:29] Unknown:
Yeah. LINQ, which is Language Integrated Query, comes from .NET, and is used in C# and perhaps in Visual Basic; I'm not sure. It can be written in what is called a query comprehension syntax and also a fluent syntax. So you can have, like, from x in entries where x.value equals 10, then you can select x.name. And this is quite similar, actually, to what we do in Python with the generator syntax. In generator syntax, we write x.name for x in entries. So this is about querying collections. And observables can be seen as a future collection. So in a normal list, you have the values already there and you can query over them.
In observables, you don't have any values yet. You can still make a query, and things will happen when the events eventually arrive. So it sounds a little bit scary at first. But if you think about the iterator pattern in Python, you don't have the values there either. You have to go and call next to get the value, and then you can do your filtering or whatever you do. So in Rx, we don't have the next method, but we have the opposite, which is the on next method. So things are processed the same way as you do with a normal collection.
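The pull versus push symmetry described here can be sketched with the same query written both ways. The `Entry` tuple, the sample data, and the `make_query` helper are hypothetical names chosen for this illustration.

```python
from collections import namedtuple

Entry = namedtuple("Entry", ["name", "value"])
entries = [Entry("a", 10), Entry("b", 7), Entry("c", 10)]

# Pull: the query is lazy, and the consumer asks for each value with next().
query = (x.name for x in entries if x.value == 10)
pulled = [next(query), next(query)]


# Push: the query is set up first; values are delivered later via on_next.
def make_query(on_next):
    def receive(x):
        if x.value == 10:      # where x.value == 10
            on_next(x.name)    # select x.name
    return receive


pushed = []
receive = make_query(pushed.append)  # nothing has happened yet
for entry in entries:                # events "arrive" one at a time
    receive(entry)
```

Both forms describe the same filter and projection; the only difference is whether the consumer calls `next` or the producer calls `on_next`.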
[00:17:08] Unknown:
That's really interesting. So I presume, though, that, like you said, when you're processing a regular list of finite size, there are assumptions in play around the end. Like, you can say length and things like that, or you can say dot count or something. But for a stream, you can't make those assumptions. Does that affect the way you tend to solve problems and write your code?
[00:17:38] Unknown:
Not really, actually. Even with a generator, you're never sure if it will terminate or not. It may terminate or it may not terminate. And it's the same with a stream. It might terminate or it might go on forever. And then, in Rx, we have this callback called on completed that will signal the end of the stream, which is sort of the same thing as you would have with a StopIteration exception in an iterator.
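How the end of a sequence is signalled in each model can be sketched as follows. The `pull_all` and `push_all` helpers are hypothetical names for this note, and the string `"done"` is just a marker for where termination was observed.

```python
def pull_all(iterator):
    """Consumer-driven: the end is discovered when next() raises StopIteration."""
    seen = []
    while True:
        try:
            seen.append(next(iterator))
        except StopIteration:
            seen.append("done")
            return seen


def push_all(items, on_next, on_completed):
    """Producer-driven: the end is announced by calling on_completed()."""
    for item in items:
        on_next(item)
    on_completed()


pulled = pull_all(iter([1, 2]))

pushed = []
push_all([1, 2], pushed.append, lambda: pushed.append("done"))
```

In neither model does the consumer know the length in advance, which is why code written against streams looks much like code written against generators.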
[00:18:14] Unknown:
So I mostly hear about ReactiveX in terms of UI design, but the project description seemed to indicate that it was much more generally useful. What are some of the less common and more interesting problems that RxPy lends itself to solving?
[00:18:28] Unknown:
Yeah. You can actually use RxPy for UI design. I came across a project where a person made an MVC application for PyQt using RxPy. But this is not the most common use. And if you look at RxJS, which is sort of the hottest Rx implementation, it's all about the UI and the user events. But Rx can also be used on the server side. Companies like Netflix use RxJava client side, and they sort of expose their API using observables, so you can observe server-side streams. And I think Couchbase also exposes its API using observables.
And Microsoft actually uses Rx for submitting queries to Cortana. So whenever you ask Cortana for something, if you say to Cortana, "Cortana, remind me if the stock price for MSFT increases by $1," then the client will compose an Rx query. It will generate an expression tree, or an AST, of that. It will be serialized to JSON, and it will be submitted to the Cortana data center. And the data center will compile that AST again and activate that query in the data center. And then you will get a notification whenever an event triggers.
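The round trip from query to data and back to code can be sketched in a few lines. Everything below is an assumption made for illustration: the tree format, the `compile_query` helper, and the sample events are invented here and say nothing about how Cortana or Rx actually encode queries.

```python
import json
import operator

# Comparison operators the toy query language understands.
OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}


def compile_query(tree):
    """Turn a {"op", "field", "value"} tree into a predicate function."""
    compare = OPS[tree["op"]]
    field, value = tree["field"], tree["value"]
    return lambda event: compare(event[field], value)


# Client side: describe the query as data and serialize it.
tree = {"op": ">", "field": "price_change", "value": 1.0}
wire = json.dumps(tree)

# Server side: rebuild the tree and turn it back into executable code.
predicate = compile_query(json.loads(wire))

# Made-up events flowing through the server.
events = [
    {"symbol": "MSFT", "price_change": 0.4},
    {"symbol": "MSFT", "price_change": 1.3},
]
notifications = [e for e in events if predicate(e)]
```

The important property is that only data crosses the wire; the receiving side decides how to turn that data into something it can execute.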
[00:20:10] Unknown:
That's really interesting. It's a great real-world example too, using Cortana. That's definitely something that anybody can understand.
[00:20:19] Unknown:
So in the overall reactive extensions framework, what elements have you found to be the most useful, and also which of them have you found to be the hardest to understand in terms of observables or, you know, some, like, some of the map or flat map functions?
[00:20:38] Unknown:
Yeah. As I said earlier, there can be quite a steep learning curve. The easiest operators, like select and where, which we call map and filter, are quite easy to understand. But if you go to operators like select many, which takes a function that takes an item and returns an observable of an item, and then returns an observable of an item, then it becomes quite complicated. And especially if you're working with observable of observable streams.
So if you're using the window operator, the window operator will produce observables, which you can then flatten with a select many operator. Then it can be quite tricky to understand. It takes some time and you have to work with it. I didn't understand it myself in the beginning, and that was the reason why I started implementing this library in Python. I was actually using it in .NET and I couldn't understand things. So the best way is to implement it yourself.
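In the spirit of implementing it yourself, the difference between select and select many can be sketched with observables modelled as plain functions. The `from_list`, `select`, and `select_many` helpers are hypothetical and far simpler than the real operators (no errors, completion, or unsubscription).

```python
def from_list(items):
    """A tiny 'observable': a function that pushes items to a callback."""
    def subscribe(on_next):
        for item in items:
            on_next(item)
    return subscribe


def select(source, selector):
    """Map each item; if selector returns observables, they stay nested."""
    def subscribe(on_next):
        source(lambda x: on_next(selector(x)))
    return subscribe


def select_many(source, selector):
    """Map each item to an inner observable and flatten the results."""
    def subscribe(on_next):
        source(lambda x: selector(x)(on_next))
    return subscribe


words = from_list(["ab", "cd"])

nested = []
select(words, lambda w: from_list(w))(nested.append)    # observable of observables

flat = []
select_many(words, lambda w: from_list(w))(flat.append)  # one flat stream
```

With `select`, the subscriber receives two inner observables; with `select_many`, it receives the four letters directly, which is the flattening step that makes window-style operators usable.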
[00:22:09] Unknown:
That's a very good and very interesting motivation, particularly for something as conceptually complex as the reactive extensions framework. So we all appreciate your effort on that front. Thanks.
[00:22:21] Unknown:
So are there any questions that we haven't asked you that you think we should have? Like, things that we didn't cover that you think our listeners would find helpful, or areas that you would like help with on RxPy where people can pitch in?
[00:22:36] Unknown:
Yeah. There's always the need to stay current with what's happening out in the Rx world. RxJava is moving pretty fast, and I'm sort of having problems keeping up with all the new operators. There's sort of a new operator every week, I think. It would be nice if people wanted to submit pull requests. I have already received many pull requests for new operators, for schedulers, and for example code, which is really good. And I think I've actually approved every pull request ever submitted. So if anybody wants to help out, I can promise that the code will be included.
[00:23:30] Unknown:
That's great. At this point, why don't we move on to the picks? So, like we mentioned previously, this is where we mention things that we find interesting and think our listeners might find interesting. Tobias, why don't you take it away?
[00:23:46] Unknown:
Alright. So my first pick this week is a project called icdiff, which is a diffing library written in Python that is used on the command line and just gives you a really nice output. So unlike a lot of diff programs that show all of the diffs inline vertically, this one actually splits them out horizontally, somewhat similar to what you can get on GitHub. So it's very useful in conjunction with Git on the command line when you're trying to diff the changes in your source files, or even just as a regular diffing tool between two files outside of Git.
So that's one that I've been using for a little while. My next pick is a new game that I picked up recently called Timeline. It's a card game where each card has an image and a short description of something, which could be an invention or an event in history. And on the other side, it has that same information plus the year that that thing happened. So one card might be the theory of relativity, and another card might be the invention of stone tools. You can play with between 2 and 8 players, and everybody gets a stack of cards in front of them. And you have to try and place your cards at the appropriate spot in the timeline. So as you add more cards, it gets harder and harder to get it correct, because you have to make sure you place each one between the right years and events. And so it's just a really interesting way of learning a little bit more about when things happened, and about some things that you may not have known about before.
And each game has a different theme. So there's one for inventions, one for world history, one for American history, and one for movies and cinema. And so you can actually mix the games together to make it even more interesting and complicated. So I've been enjoying playing that recently. And my last pick is going to be the digital art created by Griatch, who was our guest on the Evennia episode. I started looking at the artwork that he has posted on DeviantArt, and it's all just incredible. So I definitely recommend that everybody take a look at that and just marvel at how talented he is. And with that, I will pass it to you, Chris.
[00:26:13] Unknown:
My first pick is an Emacs mode called elpy. I had previously been using and liking anaconda-mode, on Tobias' recommendation, for writing Python, but elpy just makes it a much smoother process. It has a really great installation and configuration tool, which is a big selling point, because getting your Python configuration going for Emacs can be tricky with the autocompletion tools and things like that. They're all very twiddly, and elpy just really makes the whole thing flow very smoothly. Great UI. It has a lot of great features: auto PEP 8, auto imports, really great sort of real-time syntax checking.
It's just a really, really great tool. It has improved my quality of Python coding life significantly.
[00:27:06] Unknown:
I recently switched to it on your recommendation, Chris, and I have definitely been enjoying it. Cool beans. Glad to hear.
[00:27:13] Unknown:
So my next pick is a tool called sshuttle. So this thing is great. I mean, yes, I can hear the hardcore geeks in the audience saying, "But you can do this anyway with SSH and tun devices." And, yes, you can. But sshuttle just makes it really easy and convenient. What it does is, basically, it will take all the networking that you're doing on a given machine and forward it through an SSH tunnel to another box, which is really handy if you're in a very restrictive environment and you need to, you know, get out to some port that's not supported.
You can just SSH tunnel your way into the clear, if you have a VPS or something, and away you go. It's really very easy to use and works really well. The performance is surprisingly good. I thought that there would be a significant lag from, you know, encrypting everything over an SSH tunnel. But really, it's barely even noticeable that you're using this tool. So I've been very impressed and would highly recommend it. And my last pick is a beer. It's Chimay Grande Réserve. And I'm sure I'm not pronouncing that right because there's an accent aigu there. But in any case, it's really good. I'd had Chimay any number of times throughout the years and it's tasty, but this is really something special.
Even the nose from the foam of this stuff smells great, and it tastes delicious. It's just a really, really great Belgian beer, on the lighter side. And that's all I have. Dag, what kind of picks do you have for us?
[00:28:58] Unknown:
Yeah. I recently came across this Python library called ASTor, which is for traversing and modifying abstract syntax trees. And that is something I've actually started using in RxPy recently, in a sort of private branch, because I'm trying to implement what's called a Qbservable, which is sort of the data version of an observable. So it's an observable of an expression tree. And then you have code as data, and you need to dump this and see if things make sense. So this library made things a lot easier for me when doing this.
I also came across a really exciting book when I visited Redmond, in Bellevue, recently. And that was a book by a woman called Eugenia Cheng, which is called How to Bake Pi. It's sort of related to food, but it's actually a book about the mathematics of mathematics. It sounds scary, but she tries to describe the mathematics behind lambda calculus and functional programming. And this is what's called category theory, which is quite a scary topic. I've tried to read several books about this, and I sort of had to give up on page 5 of all of them. But this book looks really nice and easy to read. So I hope I can finally start to understand this topic, because there seems to be a strange connection between what's called pull and push collections, sync versus async, and code versus data.
And it's all sort of tied up with this strange mathematics. And I sort of hope to start learning about this.
[00:31:05] Unknown:
Yeah. I was just about to say, I'm looking at the Amazon page for the book. And first of all, it's rated 4 and a half stars, which is something, and it looks really great. Thank you. I'm definitely gonna get a copy of that.
[00:31:17] Unknown:
Okay. That's all I had.
[00:31:20] Unknown:
Excellent. So for our listeners who want to keep in touch with you and follow what you're up to, both in your professional life and in your work on RxPy, what would be the best way for them to do that?
[00:31:34] Unknown:
Yeah. For RxPy, go to the GitHub page, and you can submit pull requests. I'm not sure if my email address is out there, but you can find me on Twitter, dbrattli. I should probably try to spell that: D-B-R-A-T-T-L-I. Or if you hashtag something with ReactiveX, I will see it.
[00:32:10] Unknown:
Excellent. Well, we really appreciate you taking the time out of your day to join us and talk about RxPy, and I'm sure our listeners will enjoy learning more about it. So, again, we appreciate that, and I hope you enjoy the rest of your day.
[00:32:23] Unknown:
Yeah. Thank you. RxPy has just been a hobby project for me, and it's really exciting that more people are starting to use it. So, I hope so. Thank you.
Introduction and Host Information
Interview with Dag Brattli: Introduction and Background
Overview of RxPy
Use Cases and Problem Domains for RxPy
Integrating RxPy into Existing Codebases
Documentation and Learning Resources for RxPy
Comparing RxPy with Other Tools
Testing and Debugging with RxPy
Python Features Leveraged by RxPy
Differences in Python Implementation of Reactive Extensions
Using LINQ for Stream Processing
Real-World Applications of RxPy
Challenges and Learning Curve of RxPy
Community Contributions and Project Needs
Picks and Recommendations