Summary
Speech is the most natural interface for communication, and yet we force ourselves to conform to the limitations of our tools in our daily tasks. As computation becomes cheaper and more ubiquitous and artificial intelligence becomes more capable, voice becomes a more practical means of controlling our environments. This week Steve Penrod shares the work that is being done on the Mycroft project and the company of the same name. He explains how he met the other members of the team, how the project got started, what it can do right now, and where they are headed in the future.
Brief Introduction
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable.
- When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at linode.com/podcastinit and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app.
- You’ll want to make sure that your users don’t have to put up with bugs, so you should use Rollbar for tracking and aggregating your application errors to find and fix the bugs in your application before your users notice they exist. Use the link rollbar.com/podcastinit to get 90 days and 300,000 errors for free on their bootstrap plan.
- Visit our site to subscribe to our show, sign up for our newsletter, read the show notes, and get in touch.
- To help other people find the show you can leave a review on iTunes or Google Play Music, and tell your friends and co-workers.
- Join our community! Visit discourse.pythonpodcast.com to talk to previous guests and other listeners of the show.
- Your host as usual is Tobias Macey and today I’m interviewing Steve Penrod about the company and project Mycroft, a voice controlled, AI powered personal assistant written in Python.
Interview with Steve Penrod
- Introductions
- How did you get introduced to Python?
- Can you start by describing what Mycroft is and how the project and business got started?
- How is Mycroft architected and what are the biggest challenges that you have encountered while building this project?
- What are some of the possible applications of Mycroft?
- Why would someone choose to use Mycroft in place of other platforms such as Amazon’s Alexa or Google’s personal assistant?
- What kinds of machine learning approaches are being used in Mycroft and do they require a remote system for execution or can they be run locally?
- What kind of hardware is needed for someone who wants to build their own Mycroft and what does the install process look like?
- It can be difficult to run a business based on open source. What benefits and challenges are introduced by making the software that powers Mycroft freely available?
- What are the mechanisms for extending Mycroft to add new capabilities?
- What are some of the most surprising and innovative uses of Mycroft that you have seen?
- What are the long term goals for the Mycroft project and the business that you have formed around it?
Keep In Touch
Picks
- Tobias
- Steve
- Ethiopian Cuisine
- Kansas City Barbecue
Links
- Google Home
- Tom Waits – Heart Attack & Vine
- mycroft.ai
- FLITE
- Vocalid
- Vocalid TED Talk
- PocketSphinx
- GE FirstBuild
- Sonar GNU Linux
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable. When you're ready to launch your next project, you'll need somewhere to deploy it, so you should check out Linode at linode.com/podcastinit and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app. You'll want to make sure that your users don't have to put up with bugs, so you should use Rollbar for tracking and aggregating your application errors to find and fix the bugs in your application before your users notice they exist. Use the link rollbar.com/podcastinit to get 90 days and 300,000 errors tracked for free on their bootstrap plan. You can visit our site at pythonpodcast.com to subscribe to our show, sign up for our newsletter, read the show notes, and get in touch.
To help other people find the show, you can leave a review on iTunes or Google Play Music and tell your friends and coworkers. You can also join our community at discourse.pythonpodcast.com to talk to previous guests and other listeners of the show. Your host as usual is Tobias Macey, and today I'm interviewing Steve Penrod about the company and project Mycroft, a voice controlled, AI powered personal assistant written in Python. So, Steve, could you please introduce yourself?
[00:01:12] Unknown:
Sure. I am Steve Penrod, as you said, CTO of Mycroft. And Mycroft, as you mentioned, is the open source world's answer to Siri and Echo and any of the voice assistant platforms. That's what we're looking to build: something we can use in the open source world.
[00:01:36] Unknown:
How did you first get introduced to Python?
[00:01:39] Unknown:
I think, you know, back in college I had a lot of interest in programming languages in general, everything from Lisp to Ruby; those sorts of things had caught my eye. I think the first time I was using Python actively was a couple of years ago, when I started using it mainly as a scripting language for building on top of operating systems, sort of a shell extension approach.
[00:02:12] Unknown:
So you mentioned briefly what Mycroft is. Can you describe a little bit more about how the project got started and how you got involved in it?
[00:02:21] Unknown:
Sure. Mycroft was originally begun when the location where we're talking right now was starting up its makerspace, and there was some interest in adding some sort of cool, geeky voice interaction. You know, everybody wants the Star Trek computer, so they were looking around trying to find something they could use to do that, and there really just wasn't a great solution. This was a couple of years ago, so it predated the release of even the Amazon Echo. That kind of voice interaction technology was cutting edge; people were really just starting to think about how it could be done. There were a few things like Siri where you could activate them, but something that was listening to you all the time, that you could activate with just your voice and nothing else and just talk to, that was still a thing of science fiction. So after looking around and finding nothing, it was decided: alright, we're gonna try it. And this is where there's a little bit of strange history.
I was actually doing the exact same thing in Kansas City, about 30 miles away, not knowing these guys were doing it. We didn't join our efforts together until earlier this year, but we both recognized the need for this sort of technology, and we each started building it a couple of years ago to allow, like I said, the Star Trek computer.
[00:03:56] Unknown:
So you mentioned that this actually started as two different projects that have been sort of merged together. I'm wondering what that process looked like, because quite often differing code bases bear little resemblance to each other, and trying to figure out how to piece them together can be difficult unless they're designed in such a manner as to be very modular. So I'm curious what that looked like.
[00:04:17] Unknown:
Yeah, we were amazingly similar and completely different. A lot of what I was using was built on top of JavaScript, and even PHP and mechanisms like that, with Apache servers. None of those literal pieces of code are going to exist in Mycroft, but conceptually we were doing very, very similar things. So it's been pretty easy for me to bring over the things that I had built that were architecturally different and that I wanted to incorporate into the Mycroft project.
[00:04:56] Unknown:
And how is it that you actually found out about the work that was happening in the Makerspace or that they found out about the work that you were doing so that you were able to sort of join forces?
[00:05:07] Unknown:
Interestingly enough, we were both talking to the same guy at Google, looking at the potential for collaboration, and this was before Google had made any announcements about their interest in stepping into voice interaction with the Google Home project. I had some old friends, and friends of old friends, who were working there and introduced me to this guy. I'm talking with him, and almost the same week the Mycroft guys were speaking to him, and he asked them: do you know this guy? Because you're, like, 30 miles apart and you're building the same thing. So it was pure serendipity. Since then we've discovered we were both at the Kansas City Maker Faire a year and a half ago, a couple hundred feet apart. We'd been crossing paths, just missing each other, the whole time. So that's pretty funny. It's like somebody wanted us to get together, and finally we paid enough attention to do it.
[00:06:08] Unknown:
And how is it that you decided to grow a business out of this project? Because you said it started off as just sort of a hobby interest that you wanted to build up. What is it that made you realize that there was actually a business opportunity behind it?
[00:06:23] Unknown:
Well, for me personally, my interest in this goes back quite a ways; I'm gonna date myself a little bit. Back in the nineties I worked for Autodesk for about a decade, and while I was there one of the pieces I helped develop was for the original Tablet PC from Microsoft. For the younger listeners: laptops didn't have touch screens back in the day, and they didn't do things like voice interaction or drawing on your screen with a pen. Those were all new concepts at that point. So I was building some demonstration software on top of Autodesk products so that when they were introducing the Tablet PC they could show off its capabilities on stage. At that time, Microsoft had the Speech API available, I think version 4, maybe 5. And out of personal interest, what it really boiled down to is that I wanted to be able to voice control my music collection, because it was kind of big and it was cumbersome to interact with in any way. I was looking to see if I could build a voice interface to let me say, you know, play Heart Attack and Vine by Tom Waits. The short answer is that I couldn't in that time frame; the pieces just weren't ready for it. Technology wise, the Speech API was okay. It did sort of command and control, where you preloaded it with a fixed vocabulary, which maybe was doable, but it wasn't really well suited to loading the thousands or tens of thousands of words I was looking at. On top of that, there was the hardware of the time: microphones were hundreds of dollars, and this predated pervasive WiFi, so you were really talking about running cables all over your house if you wanted to do this. It just boiled down to the fact that you couldn't do it. So I abandoned that idea in the late nineties. But when I finished at the last startup I was involved with and was trying to decide what to tackle next, I started looking around at the world as it had developed and the things that had changed.
And it struck me that today's technologies, with single board computers and the power that's possible there, are now easily as powerful as the desktops back in the 90s, and the speech to text technology on desktop machines was better. But also the cloud, especially a couple of years ago, was just ramping up. These cloud approaches, where you were able to throw even heavier duty processing at the problem and where they were able to collect massive amounts of voice data to train their systems, were making leaps and bounds forward. And then finally, microphone technology.
One of the joys of cell phones is that they put microphones in huge demand. The mics of today are easily as good as the mics that used to cost hundreds of dollars; honestly, the quality of mic we're talking about in a cell phone today, which costs maybe $2, probably would have been $500. So all of those pieces came together, and it struck me after I did a little prototyping that what I had thought about building back then was suddenly in reach; it was possible to build this. And similarly, not the exact same story, but within the Mycroft makerspace everyone was going through the same kind of process, thinking through the technologies that are available today and realizing that they could build something like that. I think this happens with a lot of technologies: there's a time when it is ready to emerge.
You see that going back to the light bulb or the automobile or planes: people around the world who had no knowledge of each other simultaneously developed some sort of technology, because the world had come to a place where it was ready to support it. And I think for voice interaction like this, it's just time.
[00:10:51] Unknown:
So can you dig a bit into how Mycroft is architected internally and some of the biggest challenges that you've encountered while working on building out this product?
[00:11:01] Unknown:
Sure. There's a little piece of the story that I think I left out that's of interest when it comes to the architecture. Originally, when they started building this for the makerspace, they realized: if we want this, probably other people are gonna want it too. So that's when they looked at the idea of doing Indiegogo and Kickstarter campaigns with it, and that's really what funded it and got it going from a couple of guys kicking around an idea to a business. A strong demand emerged out of that, and that started the Mycroft device, the project that they were building. If you go look at mycroft.ai, you can see this cute little enclosure: it's got big round eyes and a matrix display in between, and you can see its mouth moving and smiling and things like that. So it is a very interactive, very engaging piece of hardware. But as this was being developed, we recognized that there were a lot of things we were building that could be used outside of this specific piece of hardware. So, somewhat midstream (I won't say it hadn't been thought of from the beginning) there was a shift to make sure that this core we were building, which we ended up calling Mycroft Core, could exist independently of the hardware that's enclosing or wrapping the software. Then you could wrap that software core with your own enclosure, which might be a simple smiley little robot box, or a refrigerator, or a desktop running a Linux distro. As far as we're concerned it doesn't really matter: the enclosure is just piping in audio data that we process and do some stuff with, and we return to it audio data that the enclosure, whatever it may be, can play out of a speaker, presumably. Then the other architectural piece that is critical for this to be a successful and really useful technology, and not just a toy, is that we don't want it to only do the handful of things that we make it do. We really know that in order for this to be what everyone imagines it to be, it needs to be extendable. So on one side you've got people making different enclosures, and on the other side we need different kinds of skills; I think that term has become associated with voice systems. You can create a skill that can, for example, control your Philips Hue lights, or a skill that can reach out to your favorite website and scrape some data from it, or access a web API, bring back a result, and then speak the result to you. The intention is that we would build a handful of these skills, but honestly those are just examples for the rest of the world to reference as they build the things they imagine and support the services they have already built.
They just want to allow somebody to interact with those services using their voice, or to control the Internet of Things devices that they've built. I don't know if you've played with any of the IoT lighting systems, but the reality of having to pull out your cell phone, pull up an app, flip to the right screen, and tap a button to turn a light on and off is just not that great. You want it to just happen. You wanna say turn on the lights, or dim the lights, or turn off the lights. Or, with some of the things that we're gonna be coming out with, you just walk into the room, it knows you're there, and the lights come on. That's what you're really imagining the Internet of Things to be: the environment is immediately available and controllable by you without any effort.
[00:15:33] Unknown:
And I know that when you go to the project page on mycroft.ai, there are a few different libraries there. There's Mycroft Core, which ties it all together, but there are also a couple for the speech to text and the text to speech, as well as the AI routines. So I'm wondering if you can speak a bit to those different projects and what the current state of each of them is.
[00:15:56] Unknown:
So there are a couple of different pieces that are necessary to create a voice interaction system, and the one you would start with is the conversion of your voice, the speech to text. Speech to text is an encapsulated technology that we're very dependent on and interested in, and at the moment we've got a tool that we've started to build called Open Speech to Text. We're not completely sure we're going down this path; there are a couple of other options, but we've been looking at Kaldi, which is another open source project that's been out there, and we're looking at building on top of that. A lot of what we've done is really standing on the shoulders of giants: other people built things, and we're continuing that work. Another critical piece, once you've done that speech to text, is figuring out what the text means. There's a phrase we use called intent parsing, or intent processing, where you're trying to figure out the intention of the words that the person just spoke. For that, another piece was built, Adapt, largely for the Mycroft project, by Sean Fitzgerald, who was one of the early guys involved in Mycroft.
It allows you to define keywords to identify, within those phrases, the skill that the phrase is intended for. So we use that, when I say turn on the lights, to recognize: oh, the intent is to deal with the light system, and on is the action that we want to take. Then we can pass that off to the skill to handle it. Once you get a response, you need to take that response, which comes back in some form, and speak it to the user. So the next piece of technology is text to speech. For that we're using a system we built on top of an open source project called FLITE, which was itself built on top of Festival.
So there are generations of text to speech technologies built on top of each other; Mimic is the name of the package that we've done. Then the final piece, Mycroft Core, is what pulls all of those bits together: the speech to text, the intent parsing, and the text to speech, into the complete system that is Mycroft. But each of those technologies is actually self contained and usable in and of itself.
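To make the intent parsing step a little more concrete, here is a minimal sketch of keyword-based intent parsing with the open source Adapt library described above. The entity names and vocabulary are illustrative assumptions rather than actual Mycroft skill code; the calls follow Adapt's published examples.

```python
# Minimal sketch of keyword-based intent parsing with the Adapt library.
# Entity names and vocabulary are illustrative, not Mycroft's actual skill code.
from adapt.engine import IntentDeterminationEngine
from adapt.intent import IntentBuilder

engine = IntentDeterminationEngine()

# Register the vocabulary a skill author cares about.
for word in ("light", "lights", "lamp"):
    engine.register_entity(word, "LightKeyword")
for word in ("on", "off"):
    engine.register_entity(word, "ActionKeyword")

# The intent requires both a light keyword and an action keyword.
engine.register_intent_parser(
    IntentBuilder("LightControlIntent")
    .require("LightKeyword")
    .require("ActionKeyword")
    .build()
)

# "turn on the lights" should yield an intent carrying the matched keywords.
for intent in engine.determine_intent("turn on the lights"):
    if intent.get("confidence", 0) > 0:
        print(intent["intent_type"], intent["LightKeyword"], intent["ActionKeyword"])
```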
[00:18:48] Unknown:
And when I was looking at the list of projects there, I noticed one of the things about the Mimic project is the intent for it to have a higher quality voice than the traditional robotic voices we've dealt with in typical text to speech programs, going back to our original cell phones before they gained any sort of intelligence. So I'm wondering what are some of the approaches that you've taken there and some of the problems that you're needing to overcome? Yeah, as an aside, this is one of the things I really like about the problem that we're dealing with.
[00:19:36] Unknown:
One of our taglines is "AI for everyone," and what I like about that is the overloaded way we can use it. When we say everyone, part of it is making things available to the elderly, and to people who previously, maybe for physical reasons, couldn't use a keyboard and so weren't able to use a computer. Now we're able to enable that. It's also useful where we can take pieces of our technology and make them available to other groups as building blocks they can do interesting things with. This text to speech in particular is an area where there's a company called Vocalid out of Boston, and their effort is built around the idea that there is a limited number of text to speech voices. It's getting better recently, but especially when they started up there were really only a handful of voices available to someone who had to use an assistive technology as their surrogate voice. If you're an 11 year old girl who can't speak and has to use some sort of assistance to speak, you really don't wanna sound like Stephen Hawking's voice; you wanna sound like an 11 year old girl. So their charge was to build a bank of voices, to enable people to build lots of different voices and personalize them. Their goal was to assist handicapped people, or people who for some reason weren't able to use their own voice. We've been able to work with them in partnership with the Mycroft project, so they have built the default voice that we're using right now, which came from Alan Pope, a community manager in the Ubuntu world. And we're going to be helping them acquire some of the recording data to build more voices. So it's a virtuous interaction we have going there, where we can help each other make the world better, honestly.
[00:21:54] Unknown:
Yeah, that's very cool. I'm actually aware of the work that they've been doing because one of the founders of that project did a very interesting TED talk about the work she was doing and the business they started around it, so I'll add a link to that in the show notes. And as an aside, for anybody who has a few moments to spare, they have a space where you can go and donate your voice to help them build new voices for people who do require assistive technology to speak, so that they can have more of an individualized voice than what is generally available to them.
[00:22:30] Unknown:
It's the sort of thing that just makes you feel good about yourself.
[00:22:33] Unknown:
And I know that the speech to text aspect of it is actually one of the more difficult problems in the overall voice interaction space, so I'm curious what sorts of progress you've been able to make there. One of the things that has made it a more surmountable effort lately is, as you mentioned, having access to cloud computing resources and being able to train those models on multiple different inputs. But an associated concern that comes out of that is privacy, because your voice does get shipped up to a remote server that isn't under your specific control. So I'm wondering what you're doing to address any privacy concerns around that, or if there's the possibility of doing offline training.
[00:23:19] Unknown:
So the shortest answer is that the offline bit is really not practical today. There's just too much for the small devices that we're talking about; you're never gonna be able to do that sort of processing on a Raspberry Pi class of device. There are a few technologies being used in situations where there's no Internet connectivity, or for other security reasons; I know a few military type applications where they didn't want to have that going on, but those handle more limited vocabularies. If you're really talking about taking conversational speech, like what you and I have going on right now, and converting it into text as a free flowing thing, in the near term that's going to have to involve some form of cloud interaction. And the choice that we've made, while we're building the open speech to text technology, is that for a voice interaction system to be valuable, you have to have really high quality speech to text.
You know, if it has a 20% error rate, well, 80% sounds okay; maybe if you're taking an exam it's an okay score. But if you're trying to use that to control things around your house and one time out of five it gets it wrong, that's just irritating and you're gonna stop using the technology, whatever it is. So for today we've chosen to go with, by default, the strongest speech to text engine available to us, and that's been the Google speech to text engine. But privacy, as you mentioned, is one of the things that really jumps out when you're talking about voice interaction technologies, because by their nature they're listening to every word you're saying. Right? So the first stage of privacy is that the actual listening to every word you say occurs locally on the device that is running the system. On the Raspberry Pi, we've got a piece running that's built on top of PocketSphinx that does what we call wake word processing. What that basically means is that your microphone is constantly listening to every sound coming at it, and it's trying to do some analysis to see: did you activate the system? In our system, it's "Mycroft" or "Hey Mycroft"; that's what gets its attention.
You see wake word technologies used in the other systems too: if you call up Alexa, that's the wake word for it, or the "OK Google" phrase for the Google products. So the wake word technology is the first protection of personal privacy. Now that leads to some of the value of open source technologies versus closed source technologies. If you wanna know whether a Mycroft device or Mycroft system is listening to you all the time, you can pull up the code and look at what it does with that audio stream coming through, and you can see: okay, yeah, they're looking for the wake word, and that's when they trigger the connection to the cloud to perform the speech to text. If you're looking at an Amazon system or a Google system, it's more opaque what's going on. You're asked to trust that someone is doing the thing I just described, and I completely believe they are doing that, but there's no way to verify it. So that's one of the first levels of privacy protection that an open source system gives you. The next level of privacy protection that we're implementing: when you're talking to one of these systems from a traditional company, they have commercial interests. They have lots of other things going on, wanting to tie you into their systems and support their systems. So, for example, every time you're doing a voice interaction with a Google device, it knows that it's you, Tobias, talking to it, or it knows that it's me, Steve, based on either the login session in my browser, or my cell phone association, or whatever. Again, I'm not saying they're doing anything with it, but they do have a very tight coupling between the information that they gather and the voice that is transmitted to them. They know the identity of the individual speaking to them. That's one of the areas where, even while using some of these proprietary systems, we're able to put in an isolation layer. We break it apart: they know that a Mycroft device is speaking to them, that this voice data is coming from a Mycroft device, but they don't know for sure who created that voice data. There's no connection between the user identity and the voice data that's going out. So today, the level of protection that we're able to offer is that anonymity to the voice that we've created.
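As an illustration of the local wake word spotting Steve describes, the snippet below uses the pocketsphinx Python package's keyword spotting mode to block until a phrase is heard, before any audio would ever leave the device. The keyphrase and threshold values are assumptions for illustration, not Mycroft's actual listener configuration.

```python
# Illustrative wake word spotting with pocketsphinx keyword search.
# The keyphrase and threshold are assumptions, not Mycroft's listener settings.
from pocketsphinx import LiveSpeech

speech = LiveSpeech(
    lm=False,                  # disable the full language model; keyword search only
    keyphrase="hey mycroft",   # the phrase to spot in the local audio stream
    kws_threshold=1e-20,       # detection sensitivity; tune to balance false triggers
)

for phrase in speech:          # yields only when the keyphrase is detected
    print("Wake word heard:", phrase.segments(detailed=True))
    # Only at this point would recorded audio be handed to a cloud STT service.
```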
[00:28:43] Unknown:
So as far as the artificial intelligence aspect of it and the Adapt intent parser, I'm wondering what kinds of machine learning approaches are being used, and whether they also require a remote system for execution or if those run locally on the device.
[00:28:59] Unknown:
This is gonna be a little bit of a long answer. The way that we're approaching it right now, today in 2016, is that the Adapt system that powers Mycroft is much more procedural, I'll put it that way, in the way it does the intent parsing. But what this does is lay a framework for us to start gathering the information we can use to do some classification and have a more free form, flexible intent parsing. Part of that requires a lot of users and a lot of interactions, so that we can start building the data that allows a learning system to really learn well. But we have done some prototyping.
Jonathan D'Orleans, one of the developers on the Mycroft project, has an educational background and a lot of personal interest in artificial intelligence techniques, and he has built a thing called Neo Evolution. That's a framework he created for evolutionary artificial neural networks. It pulls together a couple of different concepts out of the AI community: neural networks, which everyone has heard of, deep learning, and genetic algorithms. He's pulling all of these things together, and what it allows is for the neural networks to evolve better than the way they've traditionally been implemented in the past. And so we're then able to use it.
That framework is capable of doing a lot of different things, classification problems, even playing games. It's capable of learning lots of different things. So he's built this framework, and we're prototyping using it to take some of those control phrases, toss them at it, and let it learn. For example, today, going back to the lighting control example, a programmer has to say: these are the words that I'm interested in, lights, and on, and off, to stick with the simple ones. And the Adapt system is able to identify that you're interested in lighting and that the action is on or off. We've got some flexibility in the way you can throw those words together, but it has to be things the programmer has thought about; they have to think pretty explicitly about the activation phrases a user is going to use. What Neo Evolution will allow us to do is take a whole bunch of different phrases (either ones we've come up with, ones the skill developer comes up with when developing their new skill, or, through opt in mechanisms, unrecognized phrases that someone tried to speak to a device and that get reported back to us) and start throwing those into the system. So instead of saying turn on the lights, you could say something like: hey Mycroft, it's too dark in here. And it can figure out from that that the intention is to turn on the lights, even though you didn't use the word light or on; those phrases would get lumped into that intention.
There are a couple of different ways of doing that: supervised learning and unsupervised learning. We're gonna be doing more of the supervised learning, where we tell it: this is the batch of stuff that your phrase corresponds to. We're looking at a few other approaches for how we can enhance the experience down the road, but that's our first approach.
[00:33:07] Unknown:
You referred a number of times to running this on a Raspberry Pi, but you also mentioned the ability to run it on essentially anything with a microphone and speakers. So I'm wondering, I guess, what's the minimal amount of resources necessary for being able to get this up and running? And what does that install process look like for somebody who wants to start tinkering with it for themselves?
[00:33:25] Unknown:
Part of the architecture is that we want it to have a lot of flexibility, in all sorts of different ways; that's been one of the challenges of it. One of the ways we wanted to make it flexible was in the computational resources it demands. So with the default implementation of Mycroft right now, if you pull it off of our GitHub and use it the way it's configured by default, the system that gathers the audio data is, of course, running locally, and that doesn't take up much computational resource. Then it reaches out to the cloud to perform the speech to text, so the computational resource there is away from the device. Then the speech to text result comes back down to the device, and that's where we're doing the intent processing; that occurs under the Adapt system locally. Then, depending on the particular skill you trigger, that may or may not occur on the device that you're running.
If you're saying what's the weather like, that's obviously gonna reach out to an API somewhere in the world to pull that data down. If you say something along the lines of set a timer, that might all be running on the Raspberry Pi itself, or whatever your device is. So there's a little bit of variability in the computational requirements. The last piece, by default, is the text to speech through Mimic, which is run on the Raspberry Pi. Probably the single highest computational consumer on the Raspberry Pi is that process of going through the text, generating the wave representation of it, and speaking it. Part of our architecture, though, is allowing us to switch things. So for the speech to text engine, as I said, the default is that we're using the Google speech to text right now, but we can also switch between different providers.
We have worked with the IBM Watson technologies, and we've worked with our own Open Speech to Text. Any speech to text mechanism can be connected up, and it could even involve a local one if you wanted to have a complete offline system. We haven't attempted it, but it would certainly be possible to hook into Dragon and some of those systems that have pretty decent local speech to text, and that could be a cloudless implementation. On the other side, the text to speech we do by default, like I said, is Mimic, but we also have hooked up to other systems. We could connect up to the Google text to speech engine, or iSpeech's text to speech engine, or one of a dozen different services that let you send text off to the cloud and pull down a wave representation.
In that case, there's very little processing used to perform the text to speech on the local device. Part of our intention is that with the Raspberry Pi implementation we're getting pretty much the most out of it right now, and using Mimic would be a challenge on anything with much lower processing power and memory than a Raspberry Pi has. But if you started using a cloud text to speech, I think there's a good chance we could get it down to the level where you're able to run on, probably not a microcontroller, but some of the very, very simple systems on chips, things running on your watch and those sorts of things.
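The swappable back end idea can be pictured as a small plugin interface: the rest of the pipeline talks to an abstract speech to text class, and a configuration value picks the concrete engine. The classes and config keys below are hypothetical, meant only to illustrate the pattern, not Mycroft's actual module API.

```python
# Hypothetical sketch of a pluggable speech-to-text interface, illustrating the
# "pick the backend via configuration" idea; not Mycroft's actual module API.
from abc import ABC, abstractmethod


class STTBackend(ABC):
    @abstractmethod
    def transcribe(self, audio_bytes: bytes, language: str = "en-US") -> str:
        """Return the text heard in the given audio."""


class CloudSTT(STTBackend):
    """Placeholder for a remote engine (Google, Watson, ...)."""
    def transcribe(self, audio_bytes, language="en-US"):
        raise NotImplementedError("send audio_bytes to the remote API here")


class LocalSTT(STTBackend):
    """Placeholder for an offline engine with a more limited vocabulary."""
    def transcribe(self, audio_bytes, language="en-US"):
        raise NotImplementedError("run a local recognizer here")


BACKENDS = {"cloud": CloudSTT, "local": LocalSTT}


def load_stt(config: dict) -> STTBackend:
    """Pick the engine named in config, e.g. {"stt": {"module": "cloud"}}."""
    return BACKENDS[config.get("stt", {}).get("module", "cloud")]()
```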
[00:36:59] Unknown:
And for the actual install process, is there any sort of easy to follow installation guide, or is it still sort of a work in progress?
[00:37:10] Unknown:
So currently, the easiest way to get started if you're running on Linux is to pull down our system from GitHub, under github.com/mycroftai. All of our code is sitting out there, available for you. If you follow the getting started guide we've created at docs.mycroft.ai, under the development section, that will walk you through the process of pulling it down. There is one stage to be aware of: the vast majority of what we do is in Python, but Mimic is C++ code, so you do have to go through a compilation stage for that one. And I will warn you, on a Raspberry Pi it takes a while; you can start it and then go have dinner. But you only have to do that once. Beyond that, it's all Python based, so it's pretty straightforward to get going with it if you're familiar at all with Python.
[00:38:08] Unknown:
And are there any distributions of Linux that are better supported than others?
[00:38:14] Unknown:
We've been working with Ubuntu for a while, and there are a few names I don't think I can really drop yet, but we're talking to some other people about things. I think there's gonna be a lot of tight collaboration. The open source world, the Linux world, is recognizing that they need their version of Cortana and Siri, and I'm happy to say that they seem to be recognizing us as the technology that is likely to fill that gap for them, and they want to work with us on integrating this into lots of different systems.
[00:38:52] Unknown:
So essentially, going back to the Raspberry Pi example, if somebody just installs the Raspbian distro, then they shouldn't have any issue with getting this set up?
[00:39:04] Unknown:
Yeah, it should be pretty straightforward. And we're getting to where they can pull it down using the standard distribution methods. We've got a few packages being built, so in the very near future it's gonna get even easier for people to just pull down a package and be up and running.
[00:39:22] Unknown:
So there are a number of success stories, but also a number of failures, in terms of trying to run a business based on an open source project. I'm wondering what are some of the benefits and challenges introduced by making the software that powers Mycroft so freely available?
[00:39:39] Unknown:
It is our greatest strength and our Achilles' heel, I guess. I think what really makes this into a compelling story is that people recognize it's kinda creepy to be listened to all the time by your computer. What is it doing? That is a huge privacy concern. At least I can turn off my laptop, or leave my cell phone in the other room and walk away from it. But the world we're moving into, in the very near future, is one where there's going to be a device in every part of your life listening to your words and to the ambient voices floating around, and as an individual I've got to wonder what is happening to that. I think the open source approach is really the only way to satisfy people that their voices and their privacy are being respected. So that's the biggest strength of Mycroft, and that's what distinguishes it, I think, from everything else: the fact that we are making all of this available for the world to see what we're doing. We have no secrets about it, we're not hiding anything, and we are inviting people to come and look at what it is we're doing with the sensitive data that we've got. So that's the advantage. The other advantage, beyond just the privacy, is the fact that there are so many things you want to do with your voice.
Just stop and think about what you could do if you could talk to your environment. That's actually how I got started on the project years ago, before I even built a single thing. I was walking around my house, and people probably thought I was insane, saying commands out loud, the things I would want to do, and walking through that process in my head. There are lots of those things, and I can't build all of them. Any team of a dozen people, or even a hundred people, can't build all the things that the world wants to do and control. So one of the cool things about the fact that this is open source is that we can put it out there and enlist the help of others, or make it available for others to join in on the effort, and people can reference other people's designs and clever ideas. As we're doing voice interaction, this is a brand new thing. I liken it to back in the nineties, or even the late eighties, when we were first creating GUIs. The idea of a checkbox and a radio button and push buttons: those are so simple to us now and seem so obvious, but they weren't back then. I can point to some really early Windows implementations where they hadn't really figured out the difference between a radio button and a checkbox.
People mixed and matched the way those things interacted because the patterns of usage hadn't been established yet. That's where we are today when it comes to voice. How do you use your voice? How do you categorize? How do you direct when it's really boundless? The context of what you're trying to do is inside your head, and the computer doesn't have many indicators of what that is. You don't have an active app on screen; it could be anything you come up with in your head. So how do you figure out and categorize contexts and interactions and phrases?
Some of the things that I've come up with are phrases like "show me" to indicate I want something to appear on a screen and be shown to me. I think those kinds of interactions, and the techniques to respond to them, are going to be developed. And as an open source mechanism, people get to build on top of each other's efforts, and I think that's going to accelerate how we make all these different things that we want to interact with in the world available to us. So that's one of the great powers of the open source effort: coming together to do this. Now, you ask, of course, what's the flip side of it? What's the cost? For us as a fledgling business, when you're talking to an investor, one of the first questions they ask is: how do you make money? That's what drives the business world, and that's what drives investment: return on the dollar. And the idea of handing off your intellectual property, after years of patent wars where everyone was protecting their stuff, just makes no sense to these investors at first blush. It scares them away. So for us as a business, it has definitely been a challenge finding investors and investment willing to jump into something that really is a big, long term project, where the payoffs come as we build this for the world and make it available; that's how it becomes a viable thing. In the short term, we're not going to generate millions of dollars in returns.
So for some investors, the fact that they have to wait to get a result from this challenging technology means they'd rather go invest in somebody who's making fancy shoelaces that they can sell in two months and see a profitable return on. It's a tricky sell. Fortunately, we do have a few examples that we can point to of projects that have succeeded in handing off their source code. Linux, obviously, in and of itself is a huge success, from its humble beginnings as a little project that grew and grew until it's now the underlying technology for so many devices that people don't even recognize: the WiFi routers they depend on without knowing what's running under the hood, the gaming systems they're using, or their phones, the underpinnings of Android. It all comes back to this little open source project that this guy started years ago. So we have at least a few of those success stories to point to. Android itself is another open source approach.
Some people might debate some bits of that, but it was an open project that was made available to all these phone manufacturers to pull into their equipment. So this approach makes sense to some investors now that they can look at those examples. But it is still a limited, very small subset of the traditional investment community that understands the value of this and why it is important to make this technology open and available for everyone.
[00:46:18] Unknown:
Yeah. And going back to your point about the fact that you wouldn't have the time to build all the different possible applications for this kind of technology if you just had a single team: not only does it increase the amount of actual work that can be done, it also increases the diversity of the applications. Because if you do just have a single team focusing on the project, and the only input into what should be built is the people on the team or the people those individuals are directly interacting with, then there are huge swaths of the global community who would be left out because of their particular cultures or their particular environments or varied levels of capability in various aspects.
So that's another aspect of the open source movement that adds a lot of power and weight to making something like this open source. And then on the potential downsides, there's also an at least perceived conflict of interest when you have a business entity running an open source project, because you have to worry about what community engagement and community management look like. On the one hand, the community has its set of wishes and wants that it would like to see present in the open source project; the business may have a completely conflicting set of ideals. So you need to try and figure out early on what those governance models look like, so that you don't end up with a situation similar to what's going on with Docker, for instance, where there's the potential of a fork because people aren't quite happy with the direction they're taking the project.
[00:47:45] Unknown:
Right, I completely agree with you on both points, and that is something I left out. One of the big pieces of our technology that I'm hoping we can take advantage of is the international aspect of it. We've got people working with us in our Slack channel from literally around the world. If you look at the current voice assistants that exist, they're very focused on English. The Echo device was only for sale in the United States, and they finally made it international by selling it in England. So it's still a very small portion of the world that has access to these technologies. Even in portions of the world where people are able to speak English, accents actually make it difficult for speech to text to operate. If you're, say, someone in India who speaks English very well but has a definite accent, there's a good chance the current generation of technologies doesn't work for you. So I'm really hoping that we're able to work together with a lot of different entities as an open source project, including university efforts that have been underway doing speech to text in lots of different languages. I'm hoping we can establish and solidify some of those partnerships to bring in Chinese and Japanese and even the smaller, more obscure languages that have traditionally been ignored by technology, because there just wasn't a large enough population to justify a commercial entity investing a lot of effort in supporting them. So, like you said, I really am hopeful that we're able to make the most of that as open technology. But as you said, the flip side of it is that it is a community, and we, Mycroft Incorporated, the hunk that I'm involved with, are just a part of that community. I don't get to dictate things; it is open to everyone. We recognize that and we don't begrudge it. It's actually one of the great things about it: it helps keep you honest, and it helps you make decisions based on a larger point of view than just the side of the problem you're looking at from where you're sitting today. And it does have the dangers you mentioned: a forked project terrifies so many people.
But I'm willing to live with that risk. I think the technology we're building is too important for the world to not be available for everyone to make use of and to pull into things in whatever ways they see fit.
[00:50:33] Unknown:
Bringing it back a little towards the beginning, how did the decision to write this project in Python get made, and what was some of the reasoning behind it?
[00:50:41] Unknown:
So that was an early strategic decision, actually. There are some advantages, for the kind of thing we're doing, to a scripting style language rather than a compiled language: being able to plug in, making it easy for people to develop things and bring pieces in; that works well with a scripting language. And Python in particular stood out, both in its capabilities as a scripting language (the object oriented pieces, which we take a lot of advantage of) and in the wealth of libraries available for people to pull in and take advantage of. As I said, we expect lots of different things to be built, and access to all of those libraries just makes it easier for people to build things. So it felt like the right tool for those reasons. Then there's the fact that we want to be able to run this technology on lots of different platforms and capability levels, high end and low end computational devices, so we needed something with broad based support.
That's probably an over-generalization, but Python is one of those languages that shows up early on lots of platforms. So we liked that from a strategic standpoint: for any technology that we build, we want to be able to take advantage of that widespread distribution possibility.
[00:52:12] Unknown:
And what do the mechanisms for extending Mycroft look like? Is it just a matter of dropping a set of scripts into a particular directory structure and having them picked up automatically, or is there any sort of particular interface exposed from the Mycroft core for people to hook into?
[00:52:30] Unknown:
So there is an interface, the skill API, and it's a combination of what you just said. If you're doing development today, the approach is that there's a folder on your device that holds your custom skills. You drop into that folder and create your own folder for your skill, and you can have multiple skills, each in their own folder. In the init that goes in there, you derive from the skill API that we've exposed, and there's a handful of methods that you then extend and override, plus the capabilities it provides for you to write back and say: I want to speak these words, or I want these things to show up on a display.
The skill API that you derive from allows you to send those things out to the Mycroft system. Beyond that, though, you can do pretty much anything your imagination comes up with. We provide a starting point: we come to you and say, somebody wanted you to do something, and here's the intent we got from them of what they wanted to do. Then it's up to you to do it; react in any way you want. And then we also provide the mechanism to feed back to the user: say something, or show something on the display.
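To make that structure a bit more concrete, here is a minimal sketch of a skill in the style of the 2016-era Mycroft skill API: a class deriving from MycroftSkill, an Adapt intent registered in initialize(), and a handler that speaks back. The keyword and class names are illustrative; the example skills shipped with mycroft-core show the exact conventions.

```python
# Minimal sketch of a Mycroft skill in the style of the 2016-era skill API.
# Keyword and class names are illustrative; see the example skills in
# mycroft-core for the exact conventions (vocab files, dialog files, etc.).
from adapt.intent import IntentBuilder
from mycroft.skills.core import MycroftSkill


class LightControlSkill(MycroftSkill):
    def __init__(self):
        super(LightControlSkill, self).__init__(name="LightControlSkill")

    def initialize(self):
        # Register an Adapt intent: the vocabulary files in the skill folder
        # define which spoken words map to these keywords.
        intent = (IntentBuilder("LightOnIntent")
                  .require("LightKeyword")
                  .require("OnKeyword")
                  .build())
        self.register_intent(intent, self.handle_light_on)

    def handle_light_on(self, message):
        # A real skill would call out to the lighting system here.
        self.speak("Turning on the lights.")

    def stop(self):
        pass


def create_skill():
    # The skill loader calls this factory when loading skills from the folder.
    return LightControlSkill()
```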
[00:53:55] Unknown:
And what are some of the most interesting or surprising applications of Mycroft that you've seen people put it to?
[00:54:00] Unknown:
So there are a couple of applications that have been developed for Mycroft itself that have been fun. One guy had built a robot using OpenCV to do some image processing, and he combined that with Mycroft voice control, so the robot would look at someone, track them, and you could say things like "take a picture now", those sorts of things, controlling the robot with your voice. We've done, obviously, lots of things with controlling houses and lights and things like that. We've already had community members working on using it in different environments: running it on your Android system, running it on Android Wear devices, controlling desktops. Those are kind of obvious, but it's also good to see that the ideas we had when we were building this, that we'd be able to run it in different places, are feasible and are happening now. One of the things that I got to participate in, which was a completely unexpected usage of this: we went a few weeks back to a facility that GE runs called FirstBuild in Louisville, Kentucky. GE had invited us to go there; we'd been speaking with them about some potential things, possibly working together, or showing them the sorts of things we could do. They invited us down to this facility, which is kind of a factory and a makerspace that GE makes available to the community, and they hold different events there periodically. Once a year they have this weekend long event where they pick a theme, people break up into teams, and they typically build some sort of device around that theme. The theme this year was the future of cooking. Joshua Montgomery, our CEO, and I went down there; we honestly didn't know what we were going to do, we were probably just going to watch. But when we got there, they encouraged us to participate. So we ended up joining a team with some college students from an art and design school, and we chose cooking for the visually impaired and the blind. What we ended up building was an induction surface stove that had Mycroft incorporated into it, along with some sensor systems. As you walked up to the stove, a proximity sensor from TI would trigger, the Mycroft unit would recognize it, and it would warn you if the stove was active.
It would speak up and say, be careful, there are hot pans on the stove. You could inquire about which burners were occupied, and you could do things like set the temperature. Those touch surfaces are great if you can see the lights indicating what's going on, but if you're blind that doesn't do you much good, so being able to interact with it using your voice was pretty cool. We also prototyped some other concepts. Imagine trying to pour milk into a measuring cup for a recipe if you're blind; getting that right is really difficult. So what we were looking at was, while the pot is on the stove, you'd say to the device, Mycroft, I need to add a half cup of milk to this recipe. It would look up the density of milk, zero out a scale built into the burner by measuring the weight of the pot and its contents at that moment, and as you poured milk in, it would say stop when it reached the weight that meant you'd added a half cup. Some of those things, to be honest, I would love to have in my own stove, not just for the blind; I think it would be pretty cool to just pour things in until I've got the right amount. So that was a unique case. I talked about enclosures earlier, and that was an unusual enclosure. It's not a watch or anything like that; it's built into an appliance, and it really showed how far we could take the idea of smart appliances. And the last one that really surprised me in a good way was built on one of the pieces of technology that we developed, the Mimic text to speech. As I said, it's a fully encapsulated text-to-speech engine that anyone can make use of. We received an email from a gentleman, Kendall Clark, who runs a distro of Linux called Sonar GNU/Linux. It's specifically designed for the visually impaired, enhancing the Linux desktop experience for people who are blind or who have other vision issues such as color blindness. He had taken Mimic and was incorporating it into the tools they were using there.
He wrote us so excited that it was a much friendlier voice than anything he had been able to find, and he was thrilled to know he could use it and incorporate it into his project. Those are the things that really get me going about what we're building here: seeing where other people are taking it and, honestly, extending technology to people who have been left out of it up until this point.
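As a rough illustration of the pour-by-weight idea described above (not the code from the actual FirstBuild prototype, which was not shared), the logic reduces to converting the requested volume into a target weight using the ingredient's density, taring the built-in scale, and announcing "stop" once the added weight reaches the target. The cup-to-milliliter and density figures below are approximations, and read_scale and speak are hypothetical stand-ins for the scale hardware and Mycroft's speech output.

```python
# Sketch of the pour-by-weight concept from the FirstBuild demo.
# Illustrative only; constants are approximate and the hardware
# hooks (read_scale, speak) are hypothetical stand-ins.

ML_PER_CUP = 236.6                                  # US cup in milliliters
DENSITY_G_PER_ML = {"milk": 1.03, "water": 1.00}    # approximate densities


def target_grams(ingredient, cups):
    """Convert a requested volume into the weight the scale should gain."""
    return cups * ML_PER_CUP * DENSITY_G_PER_ML[ingredient]


def pour_until(ingredient, cups, read_scale, speak):
    """Tare the burner scale, then watch the weight until the target is hit."""
    tare = read_scale()                  # weight of pot plus contents right now
    goal = target_grams(ingredient, cups)
    while read_scale() - tare < goal:
        pass                             # keep sampling the scale
    speak("Stop")


# Half a cup of milk works out to roughly 122 grams:
# target_grams("milk", 0.5) -> ~121.8
```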
[00:59:00] Unknown:
And what are some of the long term goals, both for the Mycroft open source project and the business that you're using as a vehicle to power it?
[00:59:09] Unknown:
As a business, natural language voice processing is really our only interest. That's what we're here for; I don't expect we're going to diverge into self-driving cars or anything like that. So long term, what we're really focused on is how we can expand this technology from where we stand today to where it's available throughout your life. If you think about the way you interact with voice technologies today, it's a very fragmented system. Today you can say things to your Android phone and make things happen with it. Then you walk over to your desktop machine at work and it has Cortana, and you can say things to that too, but they're different things. Maybe they do the same things, but with different syntax or different phrasing, or you have to set them up and configure them independently, and that's kind of a pain. Then you go to your car, and maybe your car has Ford SYNC, which has yet another subset of things you can do, probably not all the things you can do on those other devices, and the results are probably a little different there too.
And the same with your Apple devices. Again, there's this fragmentation that's occurred in the voice industry, and I think it's counter to the entire goal of voice interaction, which is supposed to be natural and easy and to follow you around doing what you want to do. When I say, what's on my calendar, I want to know what's on my calendar. I don't want to have to explain that it's the Google Calendar for steve.penrod or anything like that; that's no longer a useful voice interaction. I want to be able to talk to it like I talk to a person: quick and easy, and it gets the context that's associated with me and with the location where I am. So that's where we're looking to carry this technology, to where the system is able to assist you throughout your life, across all the environments you interact with. When I put it that way, at first that's where the creep factor comes in, and again, that's the value of doing it through an open source system.
We can build this in such a way that people have confidence that what we're doing is not being used in ways they don't intend. So that is our goal: to carry this technology as far as it can go, making it available to people, and to do it in a way that retains trust throughout the process.
[01:01:48] Unknown:
So are there any other topics that you think we should cover before we close out?
[01:01:53] Unknown:
No. I think that was pretty good.
[01:01:56] Unknown:
So for anybody who wants to keep in touch with you or follow what you're up to and what's going on with Mycroft, what's the best way for them to do that?
[01:02:04] Unknown:
The best way to track what's going on is to go to the website mycroft.ai, which has links to blogs and other connections. We have a Slack channel that's mainly developers, but anyone is invited to join. If you don't have a development background, you can assist in other ways: concepts, translations, there are lots of ways people can pitch in. You can get to all of those things via our website, mycroft.ai.
[01:02:33] Unknown:
So with that, I'll move us into the picks. My picks today are, first off, the yip project, which is a front end to pip that adds some of the aesthetic of yaourt from Arch, one of the front ends to the Arch User Repository. It just makes it a little easier to search for Python libraries and get some information about them before you actually install them. My next pick is the Myths and Legends podcast, in which the host explores various myths and legends from folklore around the world and tries to present either versions you may not be familiar with or stories you haven't heard before. He does a good job of telling them in an entertaining and sometimes amusing manner, so I definitely recommend checking it out if you're at all interested in mythology and the different cultures that gave rise to it. And with that, I'll pass it to you, Steve. Do you have any picks today?
[01:03:34] Unknown:
Sure. So my life at the moment has boiled down to code and food, so I'm going to focus on the food side of things. A couple of picks: the first one is just Ethiopian food. If you haven't had Ethiopian food, you ought to give it a shot. It's getting bigger; you can find an Ethiopian restaurant in most cities. Blue Isle in Kansas City is pretty darn good. The other one, as a Kansas City boy who has been traveling around the country, I appreciate the barbecue we have in Kansas City. I like trying it elsewhere, but I still like Kansas City's best, and my favorite by far right now is Joe's KC.
I'm not the only one; Anthony Bourdain put it on his list of places you have to eat before you die. And yeah, I've been to that one. That's the only one I've been to so far, and I highly recommend it.
[01:04:26] Unknown:
Excellent. Well, I really appreciate you taking the time out of your day to tell us all about Mycroft. It's definitely a very interesting project and one that I'll be keeping a close eye on as it continues to develop new capabilities. So thank you for that. It's been my pleasure, and,
[01:04:41] Unknown:
we definitely welcome more people joining our project; the more the merrier.
Introduction and Guest Introduction
Steve Penrod's Journey with Python
Origins of Mycroft
Merging Two Projects
Voice Interaction Technology Evolution
Mycroft's Architecture and Skills
Text-to-Speech and Mimic
Privacy Concerns and Solutions
Artificial Intelligence and Intent Parsing
Running Mycroft on Various Devices
Open Source Business Challenges
Extending Mycroft and Community Contributions
Interesting Applications of Mycroft
Long-term Goals for Mycroft
How to Get Involved with Mycroft
Picks and Recommendations