Visit our site to listen to past episodes, support the show, and sign up for our mailing list.
Summary
Because of its easy learning curve and broad extensibility, Python has found its way into the realm of algorithmic trading at Quantopian. In this episode we spoke with Scott Sanderson about what algorithmic trading is, how it differs from high-frequency trading, and how Quantopian leverages Python to empower everyone to try their hand at it.
Brief Introduction
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- Subscribe on iTunes, Stitcher, TuneIn or RSS
- Follow us on Twitter or Google+
- Give us feedback! Leave a review on iTunes, Tweet to us, send us an email or leave us a message on Google+
- I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable. For details on how to support the show you can visit our site at pythonpodcast.com
- We are recording today on December 16th, 2015 and your hosts as usual are Tobias Macey and Chris Patti
- Today we are interviewing Scott Sanderson on Algorithmic Trading
Interview with Scott Sanderson
- Introductions
- How did you get introduced to Python? – Chris
- Can you explain what algorithmic trading is and how it differs from high frequency trading? – Tobias
- What kinds of algorithms and libraries are commonly leveraged for algorithmic trading? – Tobias
- Quantopian aims to make algorithmic trading accessible to everyone. What do people need to know in order to get started? Is it necessary to have a background in mathematics or data analysis? – Tobias
- Does the Quantopian platform build in any safeguards to prevent users' algorithms from spiraling out of control and creating or contributing to a market crash? – Chris
- How is Python used within Quantopian and when do you leverage other languages? – Tobias
- What PyPI packages does Quantopian leverage in its platform? – Chris
- How do the financial returns compare between algorithmic vs human trading on the stock market? – Tobias
- Can you speak about any trends you see in the trading algorithms people are creating for the Quantopian platform? – Chris
Picks
- Tobias
- Chris
- Scott
Keep In Touch
Links
- QGrid
- SlickGrid
- Jupyter Hub
- Light Table
- CodeMirror
- Cython
- PyData NYC Talk by Scott
- Blaze
- Dask
- Theano
- TensorFlow
- Zipline
- Pyfolio
- PGContents
- SQLAlchemy
- Gevent
- quantopian.com/lectures
The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. You can subscribe to our show on iTunes, Stitcher, TuneIn Radio, or add our RSS feed to your podcatcher of choice. You can also follow us on Twitter or Google Plus, and please give us feedback. You can leave a review on iTunes so that other people can find the show, send us a tweet, send us an email, leave a message on Google Plus, or you can leave a comment on our show notes. I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable. For details on how to support the show, you can visit our site at pythonpodcast.com.
We are recording today on December 16, 2015, and your hosts, as usual, are Tobias Macey and Chris Patti. Today, we are interviewing Scott Sanderson on algorithmic trading. Scott, could you please introduce yourself?
[00:01:01] Unknown:
Hey, everybody. I'm Scott Sanderson. I'm an engineer at a Boston-based startup called Quantopian. What we do is build tools that allow anyone to do algorithmic investing in the browser with Python, and most of that is built on a stack of open source software. So much of what I do is built around the SciPy stack and building tools that enable people to do open, reproducible data science, especially in the financial domain.
[00:01:31] Unknown:
Very cool. So how did you get introduced to Python?
[00:01:36] Unknown:
So the very first introduction to Python I had was my senior year of high school, I think, so that would be 2009, and I did the basic intro to computer science class, which is, you know, here is a program, here is hello world, here's loops and conditionals, and all these other kinds of things. And then I went off to college at Williams College, which is a small liberal arts college in the northwesternmost corner of Massachusetts, and I didn't do much programming for the first two years that I was there. I was a math and philosophy major, actually, so a lot of ideas that are sort of related to programming, and a lot of the stuff that I was interested in there was, like, foundations of math and set theory and that sort of thing. So it was a lot of stuff that has intellectual roots that are very tied to programming, but I didn't actually come back to programming until the end of my sophomore year, and they have all these courses that are sort of like gateway drug courses to trick people into doing programming. So I took a course that was actually on game design, and it was half, like, art class and, like, some very basic statistics, and then they teach you some intro to Java programming.
And so that sort of got me hooked back on programming stuff. And then the next summer, I ran into John Fawcett, who's the CEO here at Quantopian, and he had just built the very first prototype of Quantopian. It was just him, like, working out of his garage, basically, and I got in touch with him because his brother was coaching wrestling with my brother at a school in Newton, and he was looking for people to, like, beta test the first prototype of Quantopian, and he wanted people who had mathematics or, like, backgrounds that he thought would be interesting for quant finance. And so we got talking about that, and I was really interested in it, and we got along very well. So I ended up interning at Quantopian that summer, and that was sort of my first real experience with Python. I really enjoyed that experience, I learned a lot from it, and that sort of convinced me that I wanted to pursue software engineering as a career, so I ended up going back to school and doing a whole bunch more programming, almost ending with a computer science major in addition to the math and philosophy stuff. And then after I graduated from college, I was actually looking at a couple different things. Because I got into programming so late, I originally was debating between going back to Quantopian and doing a couple other things, and I decided to try something new to see how I felt about it. So I actually worked at a small game studio called Demiurge Studios, based out of Cambridge, for about nine months. There I was doing some Python, so I was working on a game that was on a whole bunch of platforms. One of the things that it was on was mobile, but it had a back end server that was in Django.
And so, originally, I was there working mostly on C++ engine-level code, but I had this background in Python and especially Python web stuff from my original work at Quantopian, and so I ended up gravitating toward doing that work at Demiurge. And then after about nine or ten months, actually, I loved that experience and have nothing but good things to say about the people at Demiurge, but one of the things that I really missed was the startup environment, and getting to work on open source software is a big part of why I ended up coming back to Quantopian after that period of time. I kept in touch with Fawcett, who's our CEO here, and he sort of convinced me to come back and work on Quantopian full time. So I've been back for almost two years now, since March of the prior year.
[00:05:15] Unknown:
And do you have any insight into why he decided to choose Python as the lingua franca for doing the algorithmic trading and for exposing it to end users?
[00:05:27] Unknown:
Yeah. So, one of the things that I think is really unique about Python: the other obvious choices are either a very low-level systems programming language or a language focused purely on data analysis. So if you're at a giant institutional quant shop somewhere, the workflow is often that you have people who are data analysts or quant researchers, and they'll write or prototype their algorithms in R, or maybe in Python or MATLAB, but in a language that's really focused around data science, and then they'll hand it off to the engineering team, who will then rewrite their code, usually in C++. And so that introduces this whole translation layer where someone has to write the code in one language and then translate it into another language. And you can imagine, if you're doing sophisticated quant financial algorithms, there are a lot of opportunities for things to get lost in that translation. One of the unique things about Python is that it can bridge the gap between the low-level systems programming stuff that you need to be efficient enough, while still providing a lot of nice high-level APIs that people who are doing the more exploratory data science work want to use. Things like NumPy and pandas and the whole scientific Python ecosystem make it a really compelling platform for actually implementing the statistical pieces of the quant models, but then there's also the fact that Python's a really great language for interfacing with more low-level code. And Python's very unique in that it has both a really robust numerical computing community and also a really robust web programming community, and I don't know of any other languages that really hit the intersection of those two things.
So, obviously, Quantopian, you know, the main thing is it's a website, and so we have large portions of our stack that are built around some of the Python web framework stuff.
And it's great that we're able to use Python for both ends of our stack.
[00:07:32] Unknown:
And for anybody who doesn't know, can you explain what algorithmic trading is and how it differs from high frequency trading?
[00:07:39] Unknown:
Yeah. So algorithmic trading, very broadly, is any investing or trading where the placement of your orders is not made by a human being pushing a button somewhere, but is automated by a running computer algorithm. Generally, what that looks like is you have some program that's running on a server somewhere, and it's getting fed a constant stream of pricing data and other data as well, so things like company fundamentals or sentiment analysis or all these other interesting data sources that people are working on. So that's algorithmic trading very broadly: any form of investing where it's a computer that's placing the trades and executing the business logic.
A narrower subset of algorithmic trading, which is what gets a lot of press, is this notion of high frequency trading. You hear about these giant firms on Wall Street that are, like, colocating with the exchanges and building these fiber optic networks and all this stuff. The main distinction between HFT and just algorithmic trading is that they're trading much, much faster. Their principal advantage is that they're trading hundreds or thousands of times a second, and they're making decisions on micro- or nanosecond frequencies. So, to answer your specific question, HFT is a specific kind of algorithmic trading which focuses primarily on execution speed and on market making as the way that it generates returns.
[00:09:10] Unknown:
And can you give some insight into the kinds of algorithms and libraries that are commonly leveraged for algorithmic trading, and some of the mathematical principles that are necessary to be able to write effective algorithms for those purposes?
[00:09:23] Unknown:
At the core of pretty much any project in Python that's doing serious numerical work is NumPy, the low-level n-dimensional array library that lots of things are built on top of. That's one of the reasons that you can do this work and be efficient enough: NumPy allows you to write what looks like nice high-level Python code, but then the heavy number crunching happens in C and highly optimized code inside NumPy. It also provides really good interfaces to a bunch of linear algebra routines that are at the core of a lot of the business logic of these algorithms.
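To make that concrete, here's a minimal sketch of the kind of vectorized computation NumPy enables: computing period-over-period returns from a price series without a Python-level loop. The prices are made-up values for illustration.

```python
import numpy as np

# Synthetic daily closing prices (hypothetical values for illustration).
prices = np.array([100.0, 101.5, 99.8, 102.2, 103.0])

# Vectorized: the arithmetic runs in optimized C inside NumPy,
# not in a Python-level loop.
returns = np.diff(prices) / prices[:-1]

# Equivalent pure-Python loop, shown only for comparison.
loop_returns = [(prices[i + 1] - prices[i]) / prices[i]
                for i in range(len(prices) - 1)]

assert np.allclose(returns, loop_returns)
```

The same one-line vectorized style scales to millions of prices, which is what makes high-level Python viable for this kind of number crunching.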
NumPy sits at the base of the SciPy stack, and then at higher levels you have libraries like pandas, which is a mostly two-dimensional (though it supports higher dimensions) tabular and labeled data library. It also has really robust support for time series data, which is an important part of any sort of financial analysis. So those are the building blocks in terms of which people are writing their algorithms. And then the specific challenge that you have in writing an algorithmic trading strategy is basically, at any given time, what you're asking is: given all the information that I know about the world, do I want to make any trades? Generally speaking, the way that you bifurcate that problem is you say, well, given my view of the world and all the data that I have, what are my desired positions? Like, what are all the assets that I wish I could hold if I could just wave a magic wand and say these are the positions that I now hold? And then once you have a desired target portfolio, you say, well, given that that's where I'd like to be and given that this is where I am now, how do I move from point A to point B? And especially if you're managing a large portfolio, there are a lot of interesting subtleties in terms of, like, how do you go from point A to point B without having a large impact on the market, or while being reasonably sure that things will execute at the prices that you expect them to.
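The "point A to point B" step described here can be sketched as simple dictionary arithmetic: diff a target portfolio against current holdings to get the trades to place. The tickers, dollar amounts, and weights below are all hypothetical, and a real rebalancer would also worry about market impact and execution, as noted above.

```python
# Sketch: given current holdings and a desired target portfolio,
# compute the trades needed to move from point A to point B.
# All symbols and numbers are hypothetical.

portfolio_value = 100_000.0
current = {"AAPL": 30_000.0, "MSFT": 20_000.0}            # dollars currently held
target_weights = {"AAPL": 0.25, "MSFT": 0.25, "XOM": 0.10}

# Convert target weights into target dollar positions.
target = {sym: w * portfolio_value for sym, w in target_weights.items()}

# Positive delta means buy, negative means sell.
trades = {sym: target.get(sym, 0.0) - current.get(sym, 0.0)
          for sym in set(current) | set(target)}
```

Here `trades` works out to selling $5,000 of AAPL, buying $5,000 of MSFT, and buying $10,000 of XOM; the hard, interesting part in practice is executing those deltas without moving the market.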
And then with that first piece, of how do I decide what positions I want to hold, or how do I decide what portfolio allocations I have, a lot of that turns into a big convex optimization problem. You can think about it as this giant vector space where the basis elements are each asset in the possible set of equities, or in our case, US equities and soon US futures. In general, if you're trading all n assets, you can think about it as this very high dimensional space where the basis elements are all the different assets you can trade, and what you're trying to do is filter that down into a usually much smaller dimensional space, which is the particular portfolio elements that you're going to have at any given time. And there are a bunch of canonical examples of simple kinds of trading strategies. Very broadly, you have a class of strategies which are called statistical arbitrage, and basically the idea there is you're identifying some statistical property of the market that you think has held in the past and that you have reason to believe will hold in the future. A common example of this is something like a pairs trade, where you say, I believe that stock A and stock B are correlated. So it's something like McDonald's and Burger King, where given the nature of the businesses that they're in, you have reason to believe that if McDonald's goes up, then Burger King might go up, or if McDonald's goes down, Burger King might go down. That by itself isn't a trading strategy, but if you have some thesis that says, I think McDonald's is going to outperform Burger King over some time period, then something that you might do is say, well, I'm going to go long.
I'm gonna buy, say, $1,000 of McDonald's, and I'm gonna short, which essentially amounts to buying negative shares of, Burger King. And what that means is that you're not exposed to the movement of the market as a whole. Right? If you just believe that a given stock is gonna go up, then you would just buy that stock, and if it goes up and you predicted it correctly, you make money. A more sophisticated strategy is something like: if you believe that you have some statistical relationship about the relative motion of two stocks, then what you can do is a pairs trade, where you go long one of them and short the other one. Then if the whole market moves, right, if everything just goes up or down by 1%, the value of your position doesn't change, because you have, say, one share in stock A and negative one share in stock B, and the gains from one cancel out with the losses from the other. But if you're long one and short the other, and just the one that's long goes up, or if they both move up but the one that's long goes up more, then the net effect ends up being that you make money. So the idea there is that you can identify some sort of meaningful statistical relationship between two stocks, and you can generalize this to larger baskets.
In general, if you can identify some sort of statistical property of the relationships between various different assets, then you can construct portfolios that will make money if that relationship continues to hold. So that's one very broad class of strategies that gets called statistical arbitrage. And then you have another, different class, which is doing things like identifying criteria for stocks that you think are systematically under- or overvalued, and that can be based on their earnings estimates or their balance sheet; these tend to be based more on the fundamental attributes of the company.
So you look at those and try to understand, well, why might some company be undervalued or overvalued, especially relative to similar companies. And often when you're doing that, you're doing it on very large, broad universes of stocks, running these large scale screens and trying to identify things that you think are systematically over- or undervalued and then placing bets based on those assumptions or those beliefs.
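The dollar-neutral pairs-trade arithmetic described above is easy to verify in a few lines. This is just the cancellation argument from the interview made concrete, not a real trading strategy; the dollar amounts and returns are hypothetical.

```python
# Sketch of the dollar-neutral pairs-trade arithmetic described above.
# Dollar amounts and returns are hypothetical.

long_dollars = 1_000.0    # long $1,000 of stock A (e.g. McDonald's)
short_dollars = -1_000.0  # short $1,000 of stock B (e.g. Burger King)

def pnl(long_return, short_return):
    # Profit is the long leg's move plus the short leg's move
    # (the short position gains when stock B falls).
    return long_dollars * long_return + short_dollars * short_return

# Whole market moves up 1%: the two legs cancel exactly.
assert pnl(0.01, 0.01) == 0.0

# Long leg outperforms (A +3%, B +1%): the spread earns $20
# regardless of the overall market direction.
assert abs(pnl(0.03, 0.01) - 20.0) < 1e-9
```

The same bookkeeping generalizes to larger baskets: any position whose dollar exposures sum to zero is insulated from uniform market moves and pays off only on the relative motion you bet on.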
[00:15:34] Unknown:
One of the other things that you mentioned as factoring into trading decisions is doing sentiment analysis. And so does your platform also provide various news feeds for your end users to consume, to do that analysis in conjunction with the actual raw stock data?
[00:15:53] Unknown:
So we don't directly provide any; we're not out, like, collecting any data like that. One of the more recent things that we've started doing is we've built a data store through which various vendors can come and provide their data, generally at a fairly substantial markdown from what they normally sell at. One of the things that I like to talk to people about with Quantopian is that there are three historical barriers to entry for doing quant finance if you're someone who knows how to do it or is interested in doing it. There's access to infrastructure, so that's things like servers and being actually connected to the market and all the code pieces that you have to have.
There's access to capital, so if you have some algorithm and you can connect it up to the market, none of that matters if you don't have money to trade with. And there's access to data, and that's all these different data sources. So one of the things that we've been working on for the last nine months to a year is building out this library of different datasets that people can bring in and import into their algorithms and use. Historically, that's data that's been very expensive, because the vendors' customer base has been institutional quant shops and, you know, Wall Street hedge funds and people who have deep pockets and who are buying enterprise contracts. And so if you're a single individual, most financial data is priced out of what's a reasonable range for you, especially if you're doing it as a hobby or if you're trying to break into the field. It tends to be pretty expensive, and so one of the interesting things that we've been able to do with the data store is, because we're a single platform where people can show up and use this data, it provides a retail market that hasn't existed up until now. And you mentioned the sentiment analysis stuff. So there are actually a bunch of vendors on there; some of them provide the actual articles, but that's pretty hard to work with, because you have to be, like, training your own machine learning algorithms to make some sense of those. So what many of those vendors do is they aggregate all of this data, and their core business is writing machine learning models or sentiment analysis models, and they'll produce a score or, like, an impact score.
So they'll say, you know, on this day, the net aggregate news sentiment for Apple was plus one, with an accuracy rating of some value. So basically what they'll say is something like, Apple's net sentiment was positive or negative, and we're this confident about it. Data that's already been digested a little bit like that tends to be what the APIs we're providing right now expose. I think in the inevitable fullness of time, we may try to give people more access to the raw data, but we've started with data that's a little bit easier to consume and a little bit easier to work with and doesn't require as much deep expertise. Right? If you're just taking, like, the Twitter firehose and trying to turn it into trading information, you need to have an awful lot of different kinds of expertise to pull that off.
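A rough sketch of what such pre-digested sentiment data might look like as a pandas DataFrame: one signed score plus a confidence per symbol per day. The column names, dates, and values are assumptions for illustration, not the actual schema of any Quantopian data vendor.

```python
import pandas as pd

# Hypothetical shape of the pre-digested sentiment data described above:
# one row per (date, symbol) with a signed score and a confidence level.
sentiment = pd.DataFrame({
    "date": pd.to_datetime(["2015-12-14", "2015-12-14", "2015-12-15"]),
    "symbol": ["AAPL", "MSFT", "AAPL"],
    "score": [1.0, -0.5, 0.8],        # positive = bullish news, negative = bearish
    "confidence": [0.9, 0.4, 0.7],    # vendor's stated accuracy for the reading
})

# An algorithm might only act on high-confidence readings.
actionable = sentiment[sentiment["confidence"] >= 0.6]
```

Working with a small, regular table like this requires far less machinery than classifying raw news articles yourself, which is the point being made above.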
[00:18:55] Unknown:
Absolutely. Yeah. And, also, providing the more raw data would necessitate increasing your infrastructure capacity as well, for being able to actually perform the training on a subset of data and then running the generated models against the remaining data. So I can definitely see how that would prove, even just from an engineering standpoint, rather challenging. And also, too, you'd have to have support for those additional libraries, whether it's NLTK, which I know is quite popular, or any of the other natural language toolkits that are out there. Yeah. It's also like
[00:19:32] Unknown:
even ignoring sort of the engineering challenges of how do I make the data available to you, I think in some ways harder, actually, is how do I make an accessible API for that? My principal role here at Quantopian is API design, so it's sort of my role to think about, well, what are the functions that you can call in your algorithm? There are the nitty-gritty details of, like, what are they named, what are the arguments, how do you spell things? But in some ways more important is: what are the underlying concepts that we're presenting to the user?
Something that I talk about a lot with people here is this idea that any API that you design is normative, in the sense that it is making an argument about how you ought to think about a problem. So, all that stuff I talked about where I said you can think about a trading algorithm in terms of this high dimensional vector space getting mapped down into a smaller vector space of weights: if I present an API where that's the core abstraction, then I'm encouraging you to think about the problem in a certain way. And I think, at least for certain strategies, that's kind of the right way to think about the problem, but there's definitely a conscious choice being made to encourage the user to think about a problem both in a way that's tractable and effective, and in a way that encourages them to decompose the problem into pieces they can tackle effectively. So I think even more than the technical challenge of getting the data available and letting people run the models, it's also a conceptual problem: if we have all this radically unstructured data, how do I give you some uniform API for working with and thinking about that data?
[00:21:13] Unknown:
And for people who are using the Quantopian platform, have you found that there tends to be a sort of common background among them? Or do you have laypeople coming in and making effective algorithms as well?
[00:21:28] Unknown:
I would say there's a pretty broad mix of people. Most people who come to Quantopian and are successful have some sort of technical background, but the particulars of that technical background vary quite a lot. So we have a lot of students, with math and CS or econ backgrounds or finance backgrounds. We get a lot of people who are scientists by trade, people who do physics or chemistry or biology or math or that sort of thing. We get people who come from finance backgrounds who may or may not have a lot of programming experience, so they're people who are traditionally used to working in Excel or in a little bit less of a robust programming environment, and they'll come in with a lot of financial domain knowledge, but then their hurdle is learning to program in Python. And we do get a fair amount of people who have just learned Python, or just learned to program at all, coming into Quantopian. So there's this spectrum of people with financial knowledge and people with engineering knowledge, and most of our users fall somewhere in between, where they're strong in one place but maybe not in the other. And then we do get some people who show up with a whole bunch of knowledge in both of those places, and we have some people who show up and don't have any knowledge, but are just interested and want to learn and work at it and get better at it. But there's definitely a broad range of people who come and work on stuff here.
[00:22:53] Unknown:
Arguably, I would think that for some of the finance people that you mentioned with high-end Excel knowledge, if you've seen what some people do with Excel between the macros and the VBA and things like that, I mean, some of it's pretty crazy. I'll grant that it is not a formalized programming language in the sense that Python is, but I feel like it at least rivals the complexity. So if they've learned that, they should be able to come up to speed with Python, if not fairly easily, then it should at least be a fairly tractable
[00:23:24] Unknown:
task. Yeah. Right? Yeah. Yeah. I think people who come from an environment like that don't struggle with, like, how do I do the math, but they struggle with things like not having immediate visual feedback, or with understanding how to structure a Python program in a way that will be efficient. Right? So you get, in some ways, basic things. Like, if you append to the front of a list over and over in Python, that, for kind of low-level implementation reasons, turns out to be slower than appending to the end of the list. So, things that are more sort of idiosyncratic domain knowledge.
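The front-of-list pitfall mentioned here is easy to demonstrate: `list.insert(0, x)` has to shift every existing element (O(n) per call), while `collections.deque.appendleft` is O(1). Both produce the same sequence:

```python
from collections import deque

xs_list = []
xs_deque = deque()

for i in range(5):
    xs_list.insert(0, i)    # O(n) each call: shifts every existing element
    xs_deque.appendleft(i)  # O(1): deque is built for both-end insertion

# Identical contents, very different scaling behavior as n grows.
assert xs_list == list(xs_deque) == [4, 3, 2, 1, 0]
```

With a few elements the difference is invisible, which is exactly why it trips people up: on a million-element list the `insert(0, ...)` loop becomes quadratic while the deque version stays linear.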
And one of the things that I think helps a lot with that, and one of the things that we focused on a lot early, was building the community around Quantopian. So, you know, you can come to Quantopian and write algorithms and create your strategies, but there's also a big forum where people can share knowledge and make posts and share IPython Notebooks, which is a relatively new thing, and share backtests, which is especially unique in finance, because finance is historically this really secretive, "I've got my secret special sauce, and that's how I'm gonna make money, and I can't tell anyone because then they'll steal my idea" thing. So one of the things that I think is really interesting and unique about the platform that we have is that we've built this place where people can interact and share and learn from one another. You still definitely get a little bit of caginess from some people, especially people who are newer, I would say, where they won't talk about exactly what they're doing, but they'll still ask programming questions, and they'll say, like, I'm trying to do this NumPy thing, and I'm getting a syntax error, or I'm getting an index error, and people will come and help them and talk to them about the implementation stuff. I think that's a really interesting thing. And also, one of the things that often happens there is you get people with programming knowledge and people with finance knowledge, and they meet up, and they team up, and they work on things together, and so you get that sort of collaborative aspect of the community that I think is really interesting.
[00:25:23] Unknown:
Yeah. And you mentioned the IPython Notebooks. When you were commenting on the people with the financial background who are more experienced with Excel not having the immediate visual feedback, that was immediately what I thought of: being able to do things in the notebook and actually inline either matplotlib or Seaborn graphs, to be able to see what's actually happening as they're transforming the data. And also the pandas tabular displays being nicely formatted in an IPython notebook.
[00:25:52] Unknown:
Yep. Yeah. So, actually, when I first got back to Quantopian, I did a couple small projects, and then for about six to nine months, my main project was building a hosted IPython notebook platform. And so I worked a lot with the IPython guys directly, and in particular, they have a newer project that has come out called JupyterHub, which is essentially a multi-user IPython notebook server, and it was designed for a sort of small scientific computing lab. So they were imagining on the order of 50 to 100 people talking to one of these JupyterHubs at a given time. And the idea there was, you know, I have a lab at some university somewhere, or I'm teaching a class. One of the really awesome things about the notebook is it's this great teaching environment.
And so they built this project that lets you basically have a whole bunch of notebook servers, and then it adds login stuff and some basic tools on top of it. We've taken that, and I'm actually running a cluster of these multi-user JupyterHubs, and I've worked with a lot of the IPython folks to bring in features that support that sort of large scale use case. So we now actually have, I think, probably the largest JupyterHub installation in the world, running on a whole bunch of EC2 instances, and we're dynamically creating Docker containers for people and connecting them to notebooks. And that's been a really, really exciting thing, to see people take to that, because before that, you know, you could show up and you could write trading algorithms, but people who do quant finance professionally, like, as their full time job, will tell you that they spend maybe 10 to 15% of their time actually writing trading algorithms, where they say, this is my idea, I'm going to write it, and I'm going to test it. They spend probably 80 to 90% of their time just exploring data and trying to do research and trying to figure out what ideas are even worth pursuing in the first place, and the notebook is such a great environment for that. So I've been super excited about that. And there's also the fact that you can have the result of a computation be interactive, right? One of the really cool things about the notebook is that you can write an object whose repr is a piece of JavaScript that then displays dynamically, and you can have, like, an Excel-style table that you can scroll through and move and sort and filter and do all these things.
One of the cool open source projects that we have is this library called QGrid, which is our sort of fancy representation for data frames. You call show_grid on a DataFrame and it renders it using a JavaScript library called SlickGrid, which gives you these beautiful, smooth-scrolling grids where you can filter things and edit them in place and have that feed back into your DataFrame. So it gives you this really interesting, free-flowing interaction between your code, your data, and the representation of your data. So I'm really excited about the next 2 to 3 years of that project and what it's gonna do for interactive computing.
[00:29:02] Unknown:
And one thing that I'm curious about is, in terms of your platform, how interactive it is for developing the algorithms. I know you mentioned the Jupyter Notebooks on the JupyterHub clusters. But for people who are trying to test against live data, do you provide any sort of in-browser execution? Or do people have to lay out the total form of their code and then submit it en masse to run against the actual live data for testing purposes?
[00:29:35] Unknown:
Yeah. So the way it works, basically, is that a trading algorithm on Quantopian consists primarily of 2 functions. You write an initialize function, which sets up any basic state that you want to have in your simulation, and then you write a function called handle_data that gets called every minute with the updates for that data. And so the first thing that we ever built was backtesting. Backtesting is basically: you write these 2 functions, and then you say, well, if I had deployed this algorithm in 2002 and run it up until now, how would it have done?
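The two-function structure he describes can be sketched in plain Python. This is a toy stand-in that mimics the shape of a Zipline-style algorithm; the simulator, the data format, and the buy rule here are all invented for illustration:

```python
# A minimal sketch of the two-function structure described above.
# This mimics the shape of a Zipline-style algorithm without importing
# the real library; the driver below is a toy backtester.

def initialize(context):
    """Called once at the start of the simulation to set up state."""
    context["asset"] = "AAPL"      # hypothetical ticker
    context["shares_held"] = 0

def handle_data(context, data):
    """Called once per minute with the latest bar of data."""
    price = data[context["asset"]]["price"]
    # Toy logic: buy one share whenever the price dips below $100.
    if price < 100 and context["shares_held"] < 10:
        context["shares_held"] += 1

def run_backtest(minute_bars):
    """Toy driver showing what a backtester does with the two functions."""
    context = {}
    initialize(context)
    for bar in minute_bars:
        handle_data(context, bar)
    return context

bars = [{"AAPL": {"price": p}} for p in [101, 99, 98, 102, 97]]
result = run_backtest(bars)
print(result["shares_held"])  # bought on the three sub-$100 bars: 3
```

A real backtester also handles order fills, slippage, and commissions; the point here is just the initialize/handle_data call pattern.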
And so if you go to the IDE in Quantopian, you write out your algorithm, and then you can press a button that says run this from time A to time B, and it'll give you a little streaming graph that shows you what your returns were over time. Then you can tab over to another view and investigate those returns more closely, and there's a whole bunch of standard financial metrics that you can use to evaluate the performance of your algorithm. So as you're developing your algorithm, generally what you're doing is making changes, running backtests, and seeing how it performs.
We also have an in-browser debugger, so it has a little Visual Studio kind of interface where you can click on a line number and it'll set a breakpoint. You can run your backtest with breakpoints set, and it'll pop up a UI kind of like Chrome's JavaScript debugger, where you actually have a REPL that lets you dynamically investigate the state of your process at any given time. And then once you've developed your algorithm and you're satisfied that it's interesting to you, generally what people do is move to paper trading, where you deploy it and our infrastructure takes control of it. We'll actually start feeding it live updates of data every day, and it'll tell you your simulated returns over time.
And then, obviously, if you're happy with your paper trading results, or you're convinced that this is actually an algorithm that you wanna deploy, then you can connect it to an Interactive Brokers account and actually deploy it against capital. And from the algorithm's perspective, one of the really interesting things about the way the system is designed is that from backtesting to paper trading to live trading, your code is always the same. That's not true of a lot of institutional systems, where you have this rewrite gap: someone writes the strategy in R, and then somebody else ports it to C++ to trade against actual real things. And that's this terrifying prospect, right? We had this code that worked, and we believe it was tested, but it was written in a totally different language, and to actually deploy it live, we have to rewrite it in C++. So having a single code base that goes through all these stages of testing, I think, is one of the interesting things that we're able to do.
[00:32:25] Unknown:
Yeah. For that kind of scenario, I imagine that having some sort of test harness that allows for executing against the different language platforms would be very useful to verify that the translation was done appropriately. But also, I'm sure that people who are focusing more on the data manipulation aspect, and don't necessarily have as thorough a traditional software engineering background, might not be as rigorous with the testing that they do perform, if any. And so for the in-browser execution, are you using something like PythonAnywhere, or have you built your own homegrown solution to provide that capability?
[00:33:06] Unknown:
So the execution isn't happening in the user's browser; I should probably make that clear. The IDE is actually a Ruby on Rails app, so the page where you're typing is Rails, and the editor front end is CodeMirror, which is the same thing that Light Table is built on. It's a well-known project. And then when you press build algorithm, that takes your code and makes a POST to a Flask server, which forks off a simulation process and connects it to a live data feed, and then that thing streams results back to the browser over a WebSocket. So the actual code is still running in AWS on one of our servers.
[00:33:43] Unknown:
I'm assuming you provide some sort of sandboxing. Are you just using Docker containers for that, or do you have some other mechanism in place? So there's a couple of different layers of sandboxing. That's one of the very interesting and
[00:33:55] Unknown:
challenging parts of Quantopian, right? For many websites, accepting other people's arbitrary code and executing it on your servers is a pretty good definition of getting hacked, and for us, that's our core business. So we do sandboxing at a couple of different levels of abstraction. There's operating-system-level sandboxing: things like making sure that the user's process doesn't have access to the file system and doesn't have the ability to make HTTP connections, that sort of thing. And there's also Python-level sandboxing. We actually do a lot of static analysis on people's code and, in some cases, even hopefully-transparent dynamic rewriting of their code. That's things like verifying that you can't import the os module, for example: if your code contains the statement import os, then we'll statically fail that and say you're not allowed to use the os module, because it does various unsafe things. And then you can imagine immediately, well, what about eval and exec and all these dynamic language features? A lot of those dynamic features end up getting disallowed on Quantopian.
And then in addition to that, there's also runtime dynamic sandboxing for things that are not easily catchable at compile time, or as much of a compile time as Python has. So there are various layers: static Python sandboxing, dynamic Python sandboxing, and operating-system-level isolation.
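As an illustration of the static-analysis idea (not Quantopian's actual implementation), the standard library's ast module is enough to reject code that imports a forbidden module or calls eval before the code ever runs:

```python
import ast

# Modules a sandbox might refuse to let user code import (illustrative list).
FORBIDDEN_MODULES = {"os", "sys", "subprocess", "socket"}
# Dynamic features that defeat static analysis, so they get disallowed too.
FORBIDDEN_CALLS = {"eval", "exec", "__import__"}

def check_user_code(source):
    """Return a list of violations found by walking the parsed AST."""
    violations = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            names = ([alias.name for alias in node.names]
                     if isinstance(node, ast.Import)
                     else [node.module or ""])
            for name in names:
                if name.split(".")[0] in FORBIDDEN_MODULES:
                    violations.append(f"import of '{name}' is not allowed")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in FORBIDDEN_CALLS:
                violations.append(f"call to '{node.func.id}' is not allowed")
    return violations

print(check_user_code("import os\nos.system('ls')"))   # flags the import
print(check_user_code("x = eval('1 + 1')"))            # flags the eval call
print(check_user_code("import math\nprint(math.pi)"))  # []
```

Static checks like this are necessarily incomplete, which is why the interview also describes dynamic rewriting and OS-level isolation as additional layers.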
[00:35:23] Unknown:
Does the Quantopian platform build in any safeguards to prevent users' algorithms from spiraling out of control and creating or contributing to a market crash?
[00:35:32] Unknown:
Yeah. So I mentioned I had worked on that JupyterHub project, and I had done a small project before that. The first thing that I actually did when I came back to Quantopian was building a trading guards module in Zipline. Zipline is the open source project that's at the core of Quantopian. What that provides is an API where you can say things like set max order size, or set max leverage, or set max position size, so that in your algorithm you can basically say: I can never place an order for more than $10,000, or I can never place an order for more than 100,000 shares (100,000 shares is a lot, but more than some fixed number of shares), or I can never hold a short position, things like that. Those are things at the user's discretion that are just good housekeeping for preventing logic bugs. You can imagine, you know, if you accidentally write "while True: order one share of Apple", very quickly that ends poorly for you.
So you as a user would want to add guards like that to prevent your algorithm from spiraling out of control. And as far as what you mentioned about preventing users' algorithms from contributing to a market crash: at the levels of investment that people tend to be running on Quantopian, it would be pretty hard to have a truly large global impact on the market. Maybe if you're trading some very thinly traded stock you could have a material impact on its price, but most people are trading on the order of tens or maybe hundreds of thousands of dollars, where the things you hear about in the news are people trading millions or billions of dollars, and they're also trading at a much higher frequency. We talked a little earlier about that distinction between high frequency trading and algorithmic investing, and everything on Quantopian is happening at most at minute resolution. Basically, your algorithm gets its handle_data called once per minute, and then it gets to make decisions there. The things you hear about, like flash crashes, where all these algorithmic trading bots are running amok, tend to involve systems operating at much, much higher frequencies, and that are often trying to follow one another and mimic what each other are doing.
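The guard idea, catching a runaway "while True: order one share" loop before it does damage, can be sketched like this. The class and parameter names are invented for illustration; the real Zipline API differs in detail:

```python
class TradingGuardError(Exception):
    """Raised when an order violates a user-configured guard."""

class GuardedTrader:
    """Toy sketch of per-order trading guards like those described above."""

    def __init__(self, max_shares=None, max_notional=None, long_only=False):
        self.max_shares = max_shares      # cap on shares per order
        self.max_notional = max_notional  # cap on dollar value per order
        self.long_only = long_only        # forbid net short positions
        self.position = 0

    def order(self, shares, price):
        if self.max_shares is not None and abs(shares) > self.max_shares:
            raise TradingGuardError(
                f"order of {shares} shares exceeds max of {self.max_shares}")
        if self.max_notional is not None and abs(shares * price) > self.max_notional:
            raise TradingGuardError(
                f"order notional ${abs(shares * price):,.0f} exceeds max")
        if self.long_only and self.position + shares < 0:
            raise TradingGuardError("short positions are not allowed")
        self.position += shares

trader = GuardedTrader(max_shares=100, max_notional=10_000, long_only=True)
trader.order(10, 50.0)           # fine: 10 shares, $500 notional
try:
    trader.order(500, 50.0)      # the runaway-loop case: order too large
except TradingGuardError as e:
    print("blocked:", e)
```

The guards don't make the strategy correct; they just bound the damage a logic bug can do, which is the "good housekeeping" role described in the interview.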
So the kinds of algorithms that are really feasible to write on Quantopian tend not to lend themselves to that kind of extreme failure mode.
[00:37:56] Unknown:
That's great. It really makes me wonder, then: when you look at the really sizable crashes, it wasn't just one trading algorithm, right? It was all the trading algorithms kind of joining hands and doing this horrid death-spiral leap of doom, and we saw the big graph for the stock market take this giant plunge. Mhmm. So that's good to know that it would be really tough for one user, at least, to trigger that kind of thing. Yeah. I mean, the reality is, you know, if you're trading
[00:38:29] Unknown:
even a couple million dollars, that's such a tiny fraction of the market. You basically have to be either a huge institutional fund or, like you said, have a whole bunch of algorithms that are implicitly correlated, or that are all secretly doing the same thing, to have the kind of major market impact that you're talking about.
[00:38:51] Unknown:
And so earlier, you mentioned that part of the web interface that people develop their code in is actually a Ruby on Rails app. I'm wondering in which other places you leverage languages other than Python, and for what reasons you came to those decisions?
[00:39:09] Unknown:
Yeah. So I would say probably 60 to 70% of our stack is Python, depending on how you measure. Python is most of the core services. The front end is Ruby on Rails and JavaScript. For a lot of the performance-intensive pieces of our Python stack, we drop into Cython, which, I don't know if you guys have worked with Cython at all, but it's basically a Python-like language that gets compiled directly into Python C extension modules. It's a really nice language for taking pieces of Python code that have become performance bottlenecks and transforming them into much more efficient code. So some of the really performance-intensive pieces of Zipline have been ported to Cython, and there are a couple of pieces that are, I think, just in straight C.
We have a couple of services that are in Go, and I think we have one script in Haskell, which the person who wrote it is very proud of. It's a script for parsing a data vendor's documentation, written as a Haskell Alex lexer.
[00:40:18] Unknown:
It's really funny. It seems like everybody says, well, Haskell's not useful for writing code in production. But what I've seen now is a number of different companies finding these really interesting niche cases where, yes, in fact, not only is it useful for writing code in production, it's much better at it than some of the stock procedural languages that we're all used to, because the particular problem domain really lends itself to Haskell's gifts.
[00:40:49] Unknown:
Totally. Yeah. So finance is actually one of the places where functional programming has taken a big hold. I would say the big player there is OCaml; there are a lot of big institutional quant shops, or even just financial institutions, that are running a lot of OCaml code. But Haskell is also a big one there. And there are a couple of pieces to that. One is that pure functions are really easy to parallelize, and your compiler can do a lot of work for you, both from an optimization perspective and from a correctness perspective. And, you know, when you're writing code in the financial domain, correctness is a thing that really, really matters, because the failure mode for writing a bug is that someone's money goes to the wrong stock, or someone orders 10 times as much as they wanted to. So you really wanna make sure that your logic is ironclad, and things like Haskell, which can provide really strong static guarantees about the behavior of a program, or that can encapsulate at compile time things like correctly handling errors of various sorts, are potentially really powerful. I actually gave a talk at PyData NYC maybe a month and a half ago about a new wing of Zipline that I just added, which is basically a little DSL for describing computations on trailing windows of financial data.
And one of the subsections of that talk was called "symbolic computation is eating the world." Basically, the argument there was that increasingly what we're seeing in Python is that the high-level interfaces to performant numerical code are doing symbolic computation or deferred computation, which brings back a lot of the traditional benefits of having a compiler, because you're separating the representation of a computation from the execution of a computation. Right? If you're working with NumPy, or you're working with Python dictionaries or Python lists, you have the data right at your fingertips and you can do whatever you want with it, so there's this nice immediacy to it, where you can do exactly what you want and, even in Python, you get to know exactly what's happening to your data.
The flip side of that is you have to know exactly what's happening to your data, and you have to tell the computer exactly what's happening to your data. What we're seeing now in libraries like Blaze and Dask, and Google just released TensorFlow as their big new machine learning library, and things like Theano, there's this whole litany of libraries that have sprung up or become much more popular in the last 2 to 3 years. The thing that they all share in common is this idea that they're separating out the representation of a computation. By that I mean you build things up in memory: you're not working directly with your data, but with expression objects, where you have something that represents an abstract array and you write myarray + myarray + myarray. If you actually had a NumPy array, Python would have to go execute that eagerly, and one of the consequences is that you're constantly creating and throwing away temporary arrays. If I have a big gigabyte-sized array and I add it to itself and then add it to something else, I didn't actually need to allocate a whole other gigabyte-sized array and then immediately throw it away.
And so one of the things that you can do when you have the symbolic computation layer is you get to see the whole expression at once, which is a thing that you never get to do in standard Python, where everything is just getting interpreted on the fly. If you have an abstraction layer where you can say, here's the entire expression that I want to compute, and then you feed that into some execution engine, you get a bunch of really nice benefits. One of those, like I said, is that you can do optimizations that are only possible if you see the whole expression. The other thing is that once you've separated the representation of a computation from its execution, you can mix and match the executions, right? One of the libraries that a lot of people are excited about recently is Dask, which is a library for doing parallel or out-of-core computations on NumPy or Pandas, and there's a new arm of it called Distributed that Matt Rocklin, who's a guy at Continuum Analytics, has been building out as the newest wing of that. One of the cool things about it is you build up these abstract computation objects, which internally are basically represented as S-expressions, sort of Lisp style. You build up your expression with these abstract array objects, and then you feed it to some executor to get run. And the cool thing is there's a single-threaded executor, but there's also a multi-threaded executor, and a multi-process executor, and there's even this distributed executor where you say: take my expression, farm it out to my giant cluster of machines, compute all the pieces of it in parallel, then cobble them back together and send them back to me.
And so one of the things that I think we're going to see more and more of in Python, especially for scientific and high-performance computing, is this idea that you're not going to work directly with the data, but with more abstract symbolic representations of the data.
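The representation-versus-execution split being described can be boiled down to a tiny sketch. The classes here are invented and nothing like Dask's real internals; the point is only that building an expression and executing it are separate steps, and the executor gets to see the whole tree at once:

```python
# Minimal sketch of "separating the representation of a computation from
# its execution": expressions are built as objects first, and only an
# executor that sees the whole tree actually computes anything.

class Expr:
    def __add__(self, other):
        return Add(self, other)   # building, not computing

class Array(Expr):
    """A leaf node wrapping concrete values."""
    def __init__(self, values):
        self.values = values

class Add(Expr):
    """An interior node representing a deferred addition."""
    def __init__(self, left, right):
        self.left, self.right = left, right

def execute(expr):
    """A simple executor that walks the whole expression tree at once.
    A smarter executor could fuse the additions into one pass instead of
    materializing a temporary array for every intermediate '+'."""
    if isinstance(expr, Array):
        return list(expr.values)
    if isinstance(expr, Add):
        left, right = execute(expr.left), execute(expr.right)
        return [a + b for a, b in zip(left, right)]
    raise TypeError(f"unknown node: {expr!r}")

a = Array([1, 2, 3])
expr = a + a + a          # builds Add(Add(a, a), a); nothing computed yet
print(execute(expr))      # [3, 6, 9]
```

Swapping `execute` for a threaded or distributed implementation, without touching the expression-building code, is exactly the mix-and-match property described above.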
[00:46:01] Unknown:
That's interesting. And it also occurs to me that, in addition to that level of abstraction and symbolic representation that you're talking about, which has such huge wins when you're manipulating large datasets, with things like Haskell (and I realize some of these kinds of libraries exist for Python as well), some of the testing capabilities, where you can kind of prove the correctness of your program, could be really handy where actual money is involved.
[00:46:30] Unknown:
One of the big benefits is that if it fails, you learn earlier. Right? Say you're gonna run some giant data-crunching algorithm that's gonna run for 5 hours on your cluster and do all this numerical stuff, and then the last step it does is add an integer to a string, and it crashes. That's really frustrating in Python, because you go through all that work, and the way Python works, you don't learn about the failure until it actually happens at run time. But if you've built up this abstract expression object, then before you execute it you can have some sort of type checking layer, even in Python, that says: hey, you added an expression of type integer to an expression of type string. You're not allowed to do that; that's gonna crash when I try to execute it. One of the traditional benefits of a compiler is that you catch certain kinds of errors before you ever even run your program, and that's the same kind of thing you're talking about with Haskell, where you get these strong guarantees: if we ever start to execute your program at all, then certain kinds of correctness guarantees come with it. Obviously, you still have to verify that the actual core logic is correct, that you didn't subtract where you meant to add or something like that. But preventing yourself from making certain kinds of category errors is a really big win in terms of your productivity, because you learn about errors faster.
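The early-failure benefit can be sketched the same way: if expression objects carry types, the integer-plus-string mistake is caught when the expression is built, not 5 hours into the run. This is a toy illustration, not any real library's API:

```python
# Sketch of catching a type error when an expression is *built*, hours
# before any executor would have hit it at run time.

class TypedExpr:
    """A deferred expression that knows its result type up front."""

    def __init__(self, dtype):
        self.dtype = dtype

    def __add__(self, other):
        if self.dtype != other.dtype:
            # Rejected at expression-build time, like a compiler would.
            raise TypeError(
                f"cannot add expression of type {self.dtype.__name__} "
                f"to expression of type {other.dtype.__name__}")
        return TypedExpr(self.dtype)

ints = TypedExpr(int)
strings = TypedExpr(str)
result = ints + ints           # fine: builds a new int-typed expression
try:
    bad = ints + strings       # caught now, not after 5 hours of compute
except TypeError as e:
    print("rejected at build time:", e)
```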
[00:47:49] Unknown:
Yeah. And I think that's also why there's been such a recent upsurge in both static languages and in optionally and gradually typed languages and layers on top of dynamic languages, including the recent typing module for Python 3.5, which has certainly caused some contention in certain areas. But because it is optional, I think it still provides a lot of potential value, particularly in terms of enabling more effective static analysis of your source code, as long as you provide accurate type annotations.
[00:48:26] Unknown:
Yeah. I'll be very interested to see how that plays out, because the type annotation stuff is really only truly valuable if it gets adopted. I think the big challenge for that in a lot of ways is gonna be the Python 2/3 compatibility stuff, because the annotation syntax, I don't think, is legal in Python 2. And so there's this large community of libraries that are trying to do single-source Python 2/3 compat stuff, and if you want to leverage the type annotation system... although, I guess, you have the interface files, right? You have those stub files, so you don't actually have to put the annotations in the source. Exactly. And I think that was a big reason for Guido adding that capability as well, so that you can still have 2-and-3-compatible libraries and, Mhmm, not have to directly modify the source while getting some of the benefits of the optional typing. Yeah. Guys, is this the type hinting? Is
[00:49:32] Unknown:
this what we're talking about? I just wanna make sure I understood what you were referring to. Yes. I was just gonna say, in Python 3.5 there's a new typing module where you can provide type annotations in function definitions. The intended purpose is for IDEs and static analysis libraries to be able to provide ahead-of-time checking to make sure that the types are appropriate, both for the arguments to the function itself and for the return values.
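A minimal example of the annotation syntax being discussed (the function itself is made up for illustration):

```python
from typing import List

def moving_average(prices: List[float], window: int) -> List[float]:
    """Annotations are checked by tools like mypy; they are not
    enforced by the interpreter at run time."""
    return [sum(prices[i - window:i]) / window
            for i in range(window, len(prices) + 1)]

print(moving_average([1.0, 2.0, 3.0, 4.0], 2))   # [1.5, 2.5, 3.5]
# The annotations are plain metadata, inspectable at run time:
print(moving_average.__annotations__["window"])   # <class 'int'>
```

Passing the "wrong" types still runs; only a static checker (or an IDE) reports the mismatch, which is exactly the optional, tooling-oriented design being described.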
[00:50:00] Unknown:
I was just curious if this was the same thing that Guido gave the talk on at PyCon 2015. Yes. Okay, good. You know, there's kinda no point in raised hackles, because he made it very clear that this is a completely optional concept. It's mainly intended for companies with ginormous code bases, like Google and the like, which, as you said, need to perform better static analysis, and maybe for IDEs, if you choose to use it and the IDEs support it. But he made it very clear that in no way, shape, or form are we gonna make Python a typed language all of a sudden. Right. And it's also an experimental module as well, so it's subject to future modification.
[00:50:44] Unknown:
So the current API is subject to future evolution.
[00:50:48] Unknown:
So what PyPI packages does Quantopian leverage in its platform? I know you've mentioned a few already, but let's just round out the list if we can. Yeah. So we maintain a couple. Zipline is our big flagship one, which is the backtesting engine that's at the core of Quantopian, and that's what provides all the APIs that I talked a little bit about. It has classes and such that let you import your own data. If you wanted to run an algorithm in the Quantopian style but run it on your own machine, Zipline is the place where you do that, and the caveat is that you have to actually have your own data as well.
A more recent library that we released maybe 5 or 6 months ago is called Pyfolio. That's a library for visualizing and assessing the performance of trading algorithms. If you take the return stream and the positions and the transactions, all the exhaust of a backtest or a live trading algorithm, you can plug it into Pyfolio and it does a bunch of cool visualizations that let you get a sense of what your algorithm is doing: which positions it's holding, which positions are making money and which aren't, how it did at various points of turbulence in the past, that sort of thing. And then we have a couple of smaller libraries. I have a library called pgcontents that does IPython notebook storage in Postgres instead of on your local file system.
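The kind of statistics a tool like Pyfolio derives from a backtest's return stream can be sketched by hand. This is not Pyfolio's actual API, just an illustration of working with a stream of per-period returns:

```python
# Hand-rolled sketch of two standard performance statistics computed
# from a return stream (the kind of input Pyfolio consumes).

def cumulative_return(returns):
    """Total compounded return over the whole stream."""
    total = 1.0
    for r in returns:
        total *= (1.0 + r)
    return total - 1.0

def max_drawdown(returns):
    """Worst peak-to-trough decline of the compounded equity curve."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= (1.0 + r)
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst

daily = [0.01, -0.02, 0.015, -0.005, 0.02]   # made-up daily returns
print(round(cumulative_return(daily), 4))     # 0.0196
print(round(max_drawdown(daily), 4))          # -0.02
```

Pyfolio computes many more metrics (Sharpe ratio, exposure, turnover) and renders them as tear-sheet plots; the shape of the input, a time series of returns, is the same.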
We have that QGrid library I talked about for DataFrame representations. So those are the ones that we maintain. And then, obviously, I talked a lot about being built on top of NumPy and Pandas and SciPy and Matplotlib and Seaborn, all the standard scientific computing libraries for Python. And obviously IPython and Jupyter; that whole computing framework is what the whole research platform is built on top of. Some other ones that we use, off the top of my head: SQLAlchemy for talking to SQL databases.
We use Flask for most of our Python web stuff, and gevent, and some stuff using Gunicorn, which I don't know if it's pronounced "g-unicorn," but I tend to read it as "gunicorn" because I find that word hilarious. That's the web server stuff. There's also an interesting one that ties back to the symbolic computation theme I talked about: one of the libraries from Continuum Analytics, the people who make the Anaconda Python distribution, is this library called Blaze, which is a sort of abstract symbolic interface to lots of different data sources. The idea is you build up these Blaze expressions, which are just abstract objects, and you say: execute this Blaze expression against these NumPy arrays or against this DataFrame. But it can also do more exotic stuff, so you can execute Blaze expressions against a SQL database, or against CSVs on disk, or against kdb+, or all these other crazy things.
We use that for some of our internal data loading infrastructure, and some of our newer APIs, especially in that research environment, are built on top of Blaze. One of our engineers here is one of the core contributors and maintainers of that library as well.
[00:53:57] Unknown:
And so sort of bringing it back to the beginning, how do the financial returns compare between algorithmic versus human trading on the stock market?
[00:54:06] Unknown:
So that's an interesting question to answer. One thing is, there's no obvious way to say these returns came from a computer: if you're just looking at all the trades that happen on the stock market, nothing publicly tells you these were placed by a computer and these were placed by a person. I think, increasingly, the distinction between algorithmic trading and human trading is also getting blurred, because no one who's doing trading isn't using a computer for some part of it. Right? Whether you're building your models by hand in Excel and it spits out a thing that says you should buy the stock and then you buy it, or you're running a computer but sending it manual messages to say buy this stock or that one, the distinction between algorithmic trading and human trading is getting increasingly blurry, I would say.
And so part of me wants to say, well, that question increasingly doesn't quite make sense. But in addition to that, I think it's also actually quite hard to answer, because, like I said, these financial trading firms are historically super secretive about what they do and how they execute, and so it's actually quite hard to get good data about the returns of human-driven investing versus algorithm-driven investing, even insofar as you can make that conceptual distinction. In some ways, I just answered your question by not answering it. No, that's totally fair and very valid. And as you said, it's
[00:55:40] Unknown:
as in anything, technology is increasingly blurring the line between what an individual human is capable of and what is facilitated by the technology. We are gradually turning into the Borg.
[00:55:56] Unknown:
Yeah. So can you speak about any trends that you see in the trading algorithms people are creating for the Quantopian platform?
[00:56:03] Unknown:
Sure. Yeah. One of the things I talked about a bit earlier is this idea that we are always making these normative decisions about how people should think about writing trading algorithms and what kinds of algorithms people should be writing. A lot of what drives the kinds of algorithms that people are writing is the kinds of tools that we give them. If you look back 2 or 3 years ago at what was possible in Quantopian or in Zipline, basically, the way it worked is you had to explicitly say ahead of time: these are the equities, these 10 stocks or whatever, that I'm trading. And then you could only get pricing history for those 10 stocks, and you could only place orders against those 10 stocks. Very quickly, what we learned was that more sophisticated trading algorithms tend not to say, these are the stocks I know about ahead of time and these are what I wanna trade. Instead, they use more interesting criteria to decide, out of all the stocks I could ever see or know about, which ones do I actually wanna trade today, and which ones do I actually wanna care about.
And one of the things that we started doing as we've evolved as a company is trying to build the Quantopian hedge fund. The idea there is that one of the historical barriers to entry I talked about is access to capital. We have all these users, we have this great community, we have all these people building interesting trading algorithms, and for many of them, one of the barriers to actually trading is not having the capital to deploy to their algorithms. So what we're working towards, and hoping to become our core business model, is building a fund where we can raise institutional capital and then offer it to users on Quantopian. We basically say: you've come to Quantopian, you've written this great algorithm, and we want to offer you capital to run it, and we'll pay you a percentage of your performance fees. Then whoever's money it is that's actually trading will get a portion of the returns, and we'll take a portion of the returns for building the infrastructure and connecting the capital to the algorithm.
And so 1 of the things that we've done as a result of that becoming a focus is we've really tried to encourage people, or make it possible, to build tools that can handle these large-universe-style algorithms. For a bunch of reasons, it makes sense for us to encourage people to write algorithms that hold many different positions at different times. The big 1 is basically that it gives you more reason to believe the algorithm is actually correct and not just lucky. Right? So imagine we're looking at this whole universe of possible algorithms and someone has an algorithm that had really, really good returns, and it turns out that their algorithm just happened to buy Apple in 2003. Well, they're going to make a lot of money because Apple's value went up a whole bunch in that period of time, but that doesn't necessarily give us any reason to believe that we should invest in that algorithm if all it did was happen to buy Apple at the correct time once. Whereas if you have an algorithm that, based on some criterion, is choosing a different 100 names every day, allocating its capital toward those 100 names and then holding that position, maybe for 2 weeks or a month, more or less, if you're making lots of independent bets and consistently outperforming the market, then maybe that gives us some reason to believe that there's actually something worth investing in in your algorithm.
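To make that "lucky once versus consistently right" intuition concrete, here is a toy simulation in plain Python. The numbers are purely illustrative and this is not how Quantopian actually evaluates algorithms: two strategies share the same small per-bet edge, but one makes a single concentrated bet while the other averages 100 independent bets, and the diversified track records cluster far more tightly around the true edge.

```python
import random

random.seed(0)

def strategy_return(n_bets, edge=0.01, vol=0.05):
    """Average return of n independent bets, each with the same small edge."""
    return sum(random.gauss(edge, vol) for _ in range(n_bets)) / n_bets

# Simulate 1000 hypothetical track records for each style of strategy.
concentrated = [strategy_return(1) for _ in range(1000)]    # one big bet
diversified = [strategy_return(100) for _ in range(1000)]   # 100 small bets

def spread(returns):
    """Dispersion of outcomes: how much luck can move the result."""
    return max(returns) - min(returns)

# Same edge, but diversification shrinks the dispersion dramatically,
# so consistent outperformance is better evidence of real skill.
print(spread(concentrated) > spread(diversified))
```

Because both strategies have the same expected return, the only thing diversification changes is how spread out the outcomes are, which is exactly why a many-position algorithm's good backtest is more persuasive.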
And so we have very consciously chosen to develop APIs that let people trade with larger universes and look at larger, broader amounts of data. Part of that is just the natural evolution of the platform technically. When we first launched, we only had the technical capacity to stream your algorithm 10 stocks' worth of data, both because servers were slower then and because it was a simpler problem to start with. As we've built out the platform over time, we've increasingly given people the ability to look at more and more data, and as a result, they're able to trade strategies that incorporate more data and look at more stocks.
And so that's the biggest trend I've seen over the last year and a half: looking at more data, looking at more kinds of data, looking at larger universe sizes, and incorporating those into a universe in a more dynamic way. So not saying ahead of time, a priori, these are the things I care about, but rather defining algorithmic criteria to decide what things your algorithm actually wants to trade.
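As a sketch of what dynamic universe selection means in practice, here is a minimal plain-Python example with made-up liquidity numbers (this is not the actual Quantopian or Zipline API): instead of hard-coding 10 tickers, the algorithm ranks everything it can see by some criterion each day and trades only the top names.

```python
def select_universe(dollar_volumes, top_n=3):
    """Rank all visible assets by dollar volume and keep the top_n."""
    ranked = sorted(dollar_volumes, key=dollar_volumes.get, reverse=True)
    return ranked[:top_n]

# Hypothetical liquidity snapshot for one trading day.
today = {"AAPL": 9e9, "MSFT": 7e9, "GOOG": 6e9, "XYZ": 2e6, "ABC": 5e5}

# The tradeable universe is recomputed from criteria every day, not fixed.
print(select_universe(today))  # ['AAPL', 'MSFT', 'GOOG']
```

Rerunning the selection on each day's data is what lets the same algorithm adapt as names enter and leave the market, rather than being tied to a static list chosen a priori.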
[01:00:47] Unknown:
So this is just idle curiosity on my part. I think we've gotten a pretty good idea of what kind of technical knowledge is required to get started writing trading algorithms on the Quantopian platform. What kinds of resources are required? How much money do you have to put into the till in order to be able to start writing algorithms and seeing how they work?
[01:01:10] Unknown:
The actual amount of money you have to put in is $0 right now. All the stuff I just talked about is free. I think you meant that a bit more metaphorically, though. So, you know, it depends what your background is. If you come in as 1 of those people with a ton of finance knowledge, then your challenge is gonna be picking up Python. If you come in as 1 of those people who knows a ton of Python but doesn't know the finance, then you have to pick up the other side of it. 1 of the things we've started doing in the last maybe 9 months is we actually have a full-time engineer whose job is building tutorials. So you can go to quantopian.com/lectures, I think.
There's a whole lecture series that goes from, like, what is a trading algorithm all the way through pretty sophisticated portfolio optimization techniques, and they're all done as IPython notebooks, with references and diagrams and pictures and code samples. The thing that I love about the notebook is it lets you interleave code with narrative, so it's this really effective teaching tool. And on top of that, there are videos that go with them. Delaney, the engineer who's been working on all this, has a background mostly in statistics and computer science, and he's done a really good job of building these tutorials for teaching people the basic tools for this.
[01:02:31] Unknown:
That is very cool. Because I think, in a sense, Quantopian, although it leverages all these other technologies, really is kind of a technological platform unto itself. And the challenges in getting people to adopt Quantopian seem to me to be very similar, if not identical, to the challenges involved in easing adoption of any technology or API or framework you might be offering. So I think it's really important, and this is a drum that I beat at every opportunity, and very cool that you guys recognize that and have invested the effort in providing good tutorials. And like you said, the IPython Notebook is my kind of education. Right? Here's this thing, I'm going to explain it to you. Now go try it. Now go play with it. Now break it. Fix it. Right. Yeah. I was gonna say that piece is really cool.
[01:03:25] Unknown:
You can open up 1 of these lectures, which are done in the notebook, clone it into your own account, and then change it. 1 of the things for me personally that really helps me learn is to take something that I think I understand and then think about, well, if I change it in this way, how do I expect it to change? And then being able to see if my thesis about the world actually holds true, and getting that sort of reciprocal feedback from the thing that I'm learning, I think, is a really powerful way to come at a lot of this stuff.
[01:03:56] Unknown:
Absolutely. And maybe building on what you just said, it's about being able to edge yourself out onto the ledge, right? Because when you think about it, learning something new is, to an extent, about conquering the unknown. And dealing with the psychological aspects of that is much, much easier if you have a platform where you have a known working example and you can change small things and then change larger things. And before you know it, your training wheels are falling off and you're writing your own algorithmic trading formulas.
[01:04:38] Unknown:
So that's very cool. Yeah. The other thing, too, that comes with that, that terror-of-the-unknown thing, that I think the lecture series helps with is that it has an order to it. Right? You show up, and it says lesson 1, linear regression; lesson 2, linear correlation analysis. There's this linear sequence of things that says, you should learn this, then this, then this. And if you want to skip through them, you can, but giving you a concrete path through the topics, I think, is 1 of the subtle benefits of it. It doesn't just give you a blank slate and say, here you go, learn this entire domain all at once.
[01:05:16] Unknown:
Right. So are there any other questions that you think we should have asked, or anything you feel we missed? It's 1 of those things where we try really hard to come up with a good set of questions that really explore the topic, but you are the expert. Right? So maybe we're missing something, or maybe there's just something that occurred to you that we didn't cover.
[01:05:38] Unknown:
So I think you guys asked this a little bit, and I sort of glossed over it, but you asked at the very beginning why we picked Python for Quantopian to begin with, and I talked a lot about all the libraries and all these sort of things that exist in Python. 1 of the things that I wanted to come back to, as we were talking about IPython as a teaching tool, is that 1 of the other really powerful things that comes with Python is the interactivity of it. Having the ability to get immediate feedback on what you're working on, I think, has also been a really important part of Python's success in the data science community. And I think we're seeing Python take a larger and larger share of data science because it has this great interactive story and this great data analysis story, but it's also this very versatile general-purpose programming language. You can do pandas and NumPy and all this stuff, but then you can also attach all of that to a web server, or to some other programming domain. Compare that with, say, R. And I've used R some, I'm not a huge R person, so I will apologize to any R lovers who are also Podcast.__init__ listeners.
R is great for statistical stuff. Right? If you just want to crunch numbers and do some sort of fancy statistical model, there's probably an R package for that. But if you then wanna hook that up to a web server, or run it remotely on another machine, or do some other, more general programming task, the ecosystem dries up really fast. And so I think 1 of the things that's been great for us with Python is that it has these really great focuses in numerical computing and in web stuff, but also this very broad, expansive base of other skills and other libraries and other use cases.
And a lot of that comes from its rich, long history of being around. But I think the nature of the language itself also encourages these very general-purpose solutions.
[01:07:58] Unknown:
Absolutely. And then just quickly, I really feel like the power of the REPL should never be underestimated. Totally. You know, I feel very strongly about this. For a while, it seemed like the REPL kind of fell out of favor. Right? You had languages like Java and C and C++, not that they were new or anything, but it just seemed like the emphasis was moving away from interpreted languages. And I'll admit, I am lost at sea when learning a new language if I don't have a REPL. I mean, I can do it, but it takes me an awful lot longer. For me, that ability to interactively play with expressions and noodle on things is just key to learning any new language or technology. Yep. I think especially too with Python,
[01:08:40] Unknown:
even compared to other languages that have REPLs. Like, we talked a little bit about Haskell earlier, and Haskell has a really good REPL. 1 of the things it doesn't do super well, and I don't think this is a knock on Haskell, it's pretty deeply part of the design of the language, is that it doesn't have the rich introspection capabilities that Python has. 1 of the things that I love about Python is, if I'm trying to remember what the method signature of some pandas DataFrame method is, or the DataFrame constructor, which takes a billion arguments, the fact that I can pull it up in IPython, type DataFrame question mark and have it show me the docstring, or do help of DataFrame and have it list all the methods for me, and be able to introspect and interact with the code itself, is really, really powerful. It's 1 of the things that makes Python a really productive language for me. I often joke that I mostly use IPython as this interactive documentation engine, where I have Emacs open in 1 terminal and a REPL open in another, and the REPL is just there so that I can look up docs and quickly see what the signatures of the code that I'm writing are. And 2, when you're working in the REPL, being able
[01:09:49] Unknown:
to view the current state of an object as well, by just typing the dunder dict attribute and seeing all the different keys and values that are currently associated with that particular object. Totally. Yeah. REPL-driven development has definitely made my life a lot easier in a lot of ways when I'm working in Python and other languages that support it. To echo Chris's question, is there anything else that we didn't ask that you think we should have, or anything that you wanna bring up before we move to the picks?
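The introspection described above is available outside IPython too. Here is a small stdlib-only illustration; the function and class are hypothetical stand-ins, not real Quantopian or pandas APIs:

```python
import inspect

def order_target_percent(asset, target, limit_price=None):
    """Hypothetical stand-in for an API function you'd look up in a REPL."""
    return (asset, target, limit_price)

# Roughly what IPython's `order_target_percent?` shows, via the stdlib:
print(inspect.signature(order_target_percent))  # (asset, target, limit_price=None)
print(inspect.getdoc(order_target_percent))

class Position:
    def __init__(self, symbol, shares):
        self.symbol = symbol
        self.shares = shares

# And the dunder-dict trick for viewing an object's current state:
print(Position("AAPL", 100).__dict__)  # {'symbol': 'AAPL', 'shares': 100}
```

In IPython itself, `obj?` wraps this same machinery (plus source lookup via `obj??`), which is why the REPL works so well as an interactive documentation engine.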
[01:10:16] Unknown:
I don't think so. That's I feel like we got a pretty broad swath of different stuff.
[01:10:23] Unknown:
Alright. So with that, I will get us started. My first pick today is something called Kinetic Sand, and it's a lot of fun to play with. I have kids who love to play with it. We actually have a sand table where they can mold it and mess around with it and use toys to create shapes. What it is, is actually just regular sand, but it has food-grade silicone mixed into it so that it will stick together as if it were wet. But when you take your hands out of it, they stay completely clean, and it sticks to itself and nothing else, except other silicone items. So if you have cooking tools that have silicone on them, don't let your kids put them in the Kinetic Sand.
So I definitely recommend getting some and playing around with it. It's very interesting to mess around and see how it flows and sort of the weird state that it exists in. It's sort of a mix between a solid and a fluid in some ways. My next pick is a band called Trivium. And they're a very talented, very technical metal band that has had a lot of really great evolution in their sound over the years that they've been around. So each of their albums sort of has a different tenor to it where, you know, versus some bands where every album sounds kind of the same. With Trivium, each album sort of takes their sound in a new direction and they do a lot of experimentation to evolve their skills. So definitely worth checking out.
And my last pick is a website called Thriftbooks. And it's an online marketplace for used books. They have really reasonable prices. The site is very well designed. It works great on mobile. It has a lot of really great just little elements that you don't necessarily notice at first but really add to the experience. And with that, I will pass it to you, Chris.
[01:12:20] Unknown:
Great. My first pick today is a completely unintellectual game, but it's really, really well done. It's 1 of those things where it's not a really deep game, but it is beautifully constructed and built. It's called Threes, and it's a puzzle game. I know you're gonna say, oh, it's just like 2048, but 2048 was essentially an homage to this. Like, 2048, to its credit, said, by the way, this is heavily inspired by Threes. And Threes is way better than 2048 in a number of ways.
The audio-visual experience is really cool. Each number has kind of a personality unto itself. There's a lot more strategy, in my opinion, in Threes than in 2048, and it's just really well put together. You can get it on mobile platforms, and there's a clone for the web called threes.js. It's great fun. I keep it on my phone, and when I'm stuck on the T and my brain is leaking out my ears and I don't have the wherewithal to read, it's a great way to pass some time. My next pick is a new series on Netflix, from the same production company, I've been told, that made Daredevil for them, and I think it's really good. It's called Jessica Jones. It's a Marvel series, and it's this really great sort of dark, moody show. It is a superhero story, but in a very unconventional way, and the villain is just exquisite. He is quite a character.
It's really good stuff. My last pick is a podcast series that, unless you were living in a cave, you may have heard of last year in its first season. It's back. The podcast is called Serial, and I think it's phenomenal. I think it's back with a flourish. This new season has a really compelling story about this gentleman who was captured by the Taliban. But the thing is, this is not your standard, you know, war hero, oh my goodness, we should all be swooning over him. He did some really questionable things and had some really kinda delusional ideas about what he was going to do and what he was going to accomplish. And as usual, these folks really make it into a compelling story. So I highly recommend Serial, both the first season, which was phenomenal, and the second season, at least what has come out so far, which is just 1 episode.
[01:15:04] Unknown:
But, great stuff. Scott, what do you have for us for picks? Cool. I have also played Threes, and Threes is excellent, so I will second your recommendation for that. I have 2 very bookish ones and 1 not, so I'll start with my not-bookish 1, which is probably the game that I have spent the most time playing of any game that I've ever played, which is DOTA 2. DOTA 2 is a game made by Valve, who also made the Half-Life games and Team Fortress, oh, and Portal and Left 4 Dead and a whole bunch of other great games. They basically never made a game that wasn't phenomenal. And it's the sequel to a Warcraft 3 mod called DotA, which stood for Defense of the Ancients.
DOTA 2 technically does not stand for anything, so the game is just called DOTA because of conflicts with Blizzard. But it's sort of the original MOBA. So if you're familiar with League of Legends or Heroes of Newerth or any of these other 5v5, team-based, hero-leveling battle games, DOTA 2 is basically the originator of that genre and, in my opinion, still the best version of it that's ever been made. And they just released a big new balance patch today, which I will probably go play after this, because I'm really excited about it. My second 1, changing gears entirely, is a philosophy book that was written in, like, the 1940s and published maybe in the 1950s, because it was not published until after its author died, which is Ludwig Wittgenstein's Philosophical Investigations.
So this is wearing my philosophy-major heart on my sleeve a little bit. Wittgenstein was part of this big movement in the thirties and forties in philosophy of language that was trying to connect language with logic. In his early career, he had this idea that the way to understand language was by reducing it to logic, and that every sentence expressed some sort of abstract logical proposition. And what's really interesting about his career as a philosopher is what happened later in his life. So he builds up this program, he writes what he considers his magnum opus, called the Tractatus Logico-Philosophicus, and then he declares that he has solved all the problems of philosophy and goes off to become a school teacher for several years. And then, through conversations with other philosophers, he basically decides that all of the ideas in the Tractatus were wrong, that he had totally misunderstood the point of language, and that you can't understand language without taking into account how it's actually used by human beings. Philosophy of language needs to focus on the social context in which language is embedded, and all the ambiguities that he had been trying to stamp out and avoid and theorize around, he realized, are central to the project of understanding language. Language is essentially, inherently ambiguous, and that's part of how we interact with 1 another as human beings. It's sort of odd to say this, but it's a piece of intellectual work that I find incredibly inspiring and incredibly interesting. And so if you're at all interested in philosophy of language, I encourage you to take a look at that.
[01:18:35] Unknown:
It sounds great. It also sounds like he basically pulled the philosopher-academic's equivalent
[01:18:43] Unknown:
of saying, I win, and then he dropped the mic and went home. Yep. But then the great thing about it is he did that and then came back and said, no, actually, I was totally wrong. Yeah. And, like, had the intellectual integrity to basically renounce all of the things that he had written. Well, not renounce, but, I don't know. The books are very interestingly written, too, because the first book, the Tractatus, reads like a logic textbook. It's like proposition 1, proposition 2, theorem, proposition 3, this very technical, rigorous piece of work. And then Philosophical Investigations is these little snippets, all these little paragraphs that are not necessarily directly related, in this almost stream-of-consciousness style that's very interesting. I'm actually gonna ad-lib 1 in here now: if you're interested in the intellectual history of philosophy of language and philosophy of mathematics in that time, there's a great, great book called Logicomix, which is a graphic novel about Bertrand Russell, who was another philosopher of that time.
It's Russell and David Hilbert and Wittgenstein and Kurt Gödel and all these people wrestling with these questions of logic and meaning and mathematics and truth and language. And it doesn't sound like that would be an incredibly engaging read, and it's not quite fiction, but it's, like, a dramatized account of Russell's life. That was 1 of the things that originally got me interested in the topic, so I'll throw that 1 in before going on to the last 1. And the last 1 that I was gonna talk about is Infinite Jest, which is my favorite novel.
It's by an author named David Foster Wallace, who wrote mostly in the nineties and early 2000s. It's this giant brick of a novel that's really hard to explain. It's, like, half set in a tennis academy and half in a halfway house, and there's all these different characters. And it's about how to think about art in contemporary America when we're saturated with things that are trying to be easy and trying to be seductive. So a lot of it is about the relationship between, like, novels and visual art, things that ask the reader, the consumer, to do work and be difficult, and how those relate to things like television and the Internet. It was actually written before the Internet was as big as it is now, which I think is interesting because it's very prescient in a lot of ways, foreshadowing this idea that so much of our most precious resource now is attention. Right? We have so many possible opportunities to be engaged and entertained and stimulated that it's hard for certain kinds of art or certain kinds of intellectual work to compete, or not compete exactly, but it's hard for us to bring ourselves to engage with those kinds of works in the way that you need to in order to get things out of them. And so it's asking this question: if we believe that those kinds of works are valuable, that they provide certain kinds of experiences or stimulation that you can't get from something a little more superficial and a little easier, what's the role of that in our lives? And it ties that in interesting ways to addiction, and to religion and faith, and to being a part of something larger than yourself.
I can't recommend it enough. I think David Foster Wallace is a brilliant, brilliant author.
[01:22:30] Unknown:
It kind of reminds me a lot of a book that I have not read but keep seeing referenced over and over again, called, I believe, Amusing Ourselves to Death, about how our society is so fixated on immediate gratification and sort of twitch-based entertainment that we are losing our ability to focus on anything meaningful, kind of similar to what you were saying there. And as far as Logicomix, I have not read that 1 in particular, but I have read a very similar series, like the cartoon guide to various philosophers. Like, I read the cartoon guide to Heidegger, and there are a few others.
And they're actually really, really good. I think they're actually by the same guy or the same publisher, who also did, like, the cartoon guide to history, philosophy, chemistry. And in fact, there are a number of prominent modern bloggers, I forget which 1 now, maybe it was Saranya Bark, but I bet I'm getting that wrong, who have kind of made their name explaining complex concepts in the form of cartoons.
And I actually think there's a tremendous amount of power in that medium to kind of sneak in ideas that could otherwise be really dry. When you're consuming it in that format, it's like a spoonful of sugar helps the medicine go down. You actually do wind up walking away having thought about something that was kinda profound, that kinda changed your worldview a little bit, when in reality, you know, you were reading this book full of cartoons.
[01:24:21] Unknown:
Mhmm. Totally. Yeah. That also reminds me of Randall Munroe, the guy who writes xkcd. He just came out with a new book, Thing Explainer, which explains these complex engineering things using only the 1,000 most commonly used English words. So it's full of great things like the Saturn V rocket being the "Up Goer Five." You know, it says, if this end is not pointing toward the ground, you will not go to space today. So it's this very simple language, but it still manages to capture a lot of the core ideas of these complex things. I actually have not yet gotten that 1, so I can't recommend it, but I look forward to reading it when I get the chance.
[01:25:05] Unknown:
Alright. So for anybody who wants to follow what you're up to and keep in touch with you, what would be the best way for them to do that?
[01:25:13] Unknown:
Sure. So if you're interested in some of the technical things that I'm working on, Zipline and IPython and the PyData stuff, you can find me on GitHub as ssanderson. I am, I think, scottbsanderson on Twitter. I should probably know that offhand, but I only fairly recently joined Twitter. And my email address at Quantopian is scott@quantopian.com.
[01:25:46] Unknown:
Great. Well, thank you very much for taking the time out of your day to join us and tell us about algorithmic trading and your work at Quantopian. It's been very interesting, and I'm sure our listeners will appreciate it as well.
[01:25:57] Unknown:
Cool. Great. Thank you guys for having me on. It really has. And I have to say, it's kind of funny, I have a couple of weeks of vacation coming up here at the end of the year. And not that I'm gonna spend the entire time on this, because I wanna spend some time with my wife and go enjoy the world, but knowing that this stuff is free to even just start to poke around with, and that you guys have some really great tutorials out there, I may actually invest some time in at least exploring it, if only to understand what it's about and have a sense of it, because it sounds really neat. Even just as a technologist, it sounds like it would be fun to explore what you guys have built.
[01:26:35] Unknown:
Totally. Yeah. I mean, like I said, lots of people on Quantopian do it as a hobby. It's not their full-time job or anything. The irony, right, is that we've built this platform encouraging people to come and do what would, in another context, be a paid job, a pretty serious amount of work, writing code that does this very sophisticated thing. And people come, and we're, I guess, hopefully not too surprised, but always sort of amazed, that people do this really hard work because they enjoy it and find it engaging and interesting.
[01:27:10] Unknown:
Yeah. Definitely a very interesting platform and a very interesting problem domain. So thank you again for taking the time to explain it to us, And I hope you enjoy your time playing the new DOTA 2 expansion.
[01:27:21] Unknown:
I will do that. Alright. Good night. Have a good day. See you guys.
Introduction and Host Information
Interview with Scott Sanderson
Scott's Introduction to Python
Choosing Python for Algorithmic Trading
Understanding Algorithmic Trading
Common Algorithms and Libraries
Sentiment Analysis and Data Sources
API Design and User Interaction
User Backgrounds and Community
Hosted IPython Notebooks
Backtesting and Paper Trading
Sandboxing and Security
Quantopian's Tech Stack
Symbolic Computation in Python
Algorithmic vs. Human Trading
Trends in Trading Algorithms
Getting Started with Quantopian
The Power of Python's Interactivity
Final Thoughts and Picks