Summary
Being able to control a computer with your voice has rapidly moved from science fiction to science fact. Unfortunately, the majority of platforms that have been made available to consumers are controlled by large organizations with little incentive to respect users’ privacy. The team at Snips is building a platform that runs entirely offline and on-device so that your information is always under your control. In this episode Adrien Ball explains how the Snips architecture works, the challenges of building a speech recognition and natural language understanding toolchain that works on limited resources, and how they are tackling issues around usability for casual consumers. If you have been interested in taking advantage of personal voice assistants, but wary of using commercially available options, this is definitely worth a listen.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
- You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Go to pythonpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.
- Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email hosts@podcastinit.com
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
- Your host as usual is Tobias Macey and today I’m interviewing Adrien Ball about Snips, a set of technologies to make voice controlled systems that respect users’ privacy
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by explaining what Snips is and how it got started?
- For someone who wants to use Snips can you talk through the onboarding process?
- One of the interesting features of your platform is the option for automated training data generation. Can you explain how that works?
- Can you describe the overall architecture of the Snips platform and how it has evolved since you first began working on it?
- Two of the main components that can be used independently are the ASR (Automated Speech Recognition) and NLU (Natural Language Understanding) engines. Each of those have a number of competitors in the market, both open source and commercial. How would you describe your overall position in the market for each of those projects?
- I know that one of the biggest challenges in conversational interfaces is maintaining context for multi-step interactions. How is that handled in Snips?
- For the NLU engine, you recently ported it from Python to Rust. What was your motivation for doing so and how would you characterize your experience between the two languages?
- Are you continuing to maintain both implementations and if so how are you maintaining feature parity?
- How do you approach the overall usability and user experience, particularly for non-technical end users?
- How is discoverability handled (e.g. finding out what capabilities/skills are available)?
- One of the compelling aspects of Snips is the ability to deploy to a wide variety of devices, including offline support. Can you talk through that deployment process, both from a user perspective and how it is implemented under the covers?
- What is involved in updating deployed models and keeping track of which versions are deployed to which devices?
- What is involved in adding new capabilities or integrations to the Snips platform?
- What are the limitations of running everything offline and on-device?
- When is Snips the wrong choice?
- In the process of building and maintaining the various components of Snips, what have been some of the most useful/interesting/unexpected lessons that you have learned?
- What have been the most challenging aspects?
- What are some of the most interesting/innovative/unexpected ways that you have seen the Snips technologies used?
- What is in store for the future of Snips?
Keep In Touch
- adrienball on GitHub
- @adrien_ball on Medium
- @adrien_ball on Twitter
Picks
- Tobias
- Adrien
Links
- Snips
- 2048 Game
- Smart Cities
- Raspberry Pi
- WikiData
- MQTT
- Google Assistant
- Amazon Alexa
- Microsoft Cortana
- Mozilla Common Voice
- Rust Language
- Snips Hermes messaging library
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With 200 gigabit private networking, scalable shared block storage, node balancers, and a 40 gigabit public network, all controlled by a brand new API, you get everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models or building your CI pipeline, they just launched dedicated CPU instances. Go to pythonpodcast.com/linode.
That's L-I-N-O-D-E, today, to get a $20 credit and launch a new server in under a minute. And don't forget to thank them for their continued support of this show. Bots and automation are taking over whole categories of online interaction. Discover.bot is an online community designed to serve as a platform agnostic digital space for bot developers and enthusiasts of all skill levels to learn from one another, share their stories, and move the conversation forward together. They regularly publish guides and resources to help you learn about topics such as bot development, using them for business, and the latest in chatbot news. For newcomers to the space, they have a beginner's guide that will teach you the basics of how bots work, what they can do, and where they are developed and published.
And to help you choose the right framework and avoid the confusion about which NLU features and platform APIs you will need, they have compiled a list of the major options and how they compare. Go to pythonpodcast.com/discoverbot today to get started and thank them for their support of the show. And you listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers, you don't want to miss out on this year's conference season. We have partnered with organizations such as O'Reilly Media, Dataversity, and the Open Data Science Conference.
Go to pythonpodcast.com/conferences to learn more and to take advantage of our partner discounts when you register. And visit the site at pythonpodcast.com to subscribe to the show, sign up for the newsletter, and read the show notes. And to help other people find the show, please leave a review on iTunes and tell your friends and coworkers. Your host as usual is Tobias Macey. And today, I'm interviewing Adrien Ball about Snips, a set of technologies to make voice controlled systems that respect users' privacy. So, Adrien, could you start by introducing yourself? Yes. Of course. So
[00:02:38] Unknown:
I'm Adrien Ball. I'm a French data scientist and machine learning expert. I'm 29 years old, and I've been working at Snips for the past 4 years, where I've been building voice assistant technology, focusing on the natural language understanding part of it. My academic background is in applied mathematics and computer science, a mixture which back then was not yet called data science. So I love playing with data and trying to extract information from it. And for this very reason, I've been a Python fan for a while now. More recently, I jumped into Rust, and I've been enjoying it a lot. And do you remember how you first got introduced to Python?
So it's hard to remember exactly when it started, but I remember what really convinced me that it was a very nice language. I think that was about 5 or 6 years ago, when a mobile game named 2048 went viral. This game was a 4 by 4 puzzle where you had to merge squares containing identical numbers, which were powers of 2, to form a new square whose value was equal to the sum of the two squares. The game was very popular, and I realized that it was fun because the strategies were completely non-obvious. So I decided that I would try to build my own algorithm to see if it could beat me at this game. And I seized the opportunity to discover a new language, and figured out that Python was apparently very well suited for that.
And so the algorithm I developed turned out to be actually much better than me at the game, as it managed to reach the goal of 2048 on almost every run. So, yeah, I think that was what sealed my adventure with Python.
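To make the merge rule Adrien describes concrete, here is a minimal sketch of how a single row collapses in 2048; the function name and representation are purely illustrative and not taken from his original solver.

```python
def merge_row_left(row):
    """Collapse one 2048 row to the left: equal neighbors merge into their sum."""
    tiles = [v for v in row if v != 0]          # slide tiles left, dropping gaps
    merged = []
    i = 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)         # two equal tiles become their sum
            i += 2
        else:
            merged.append(tiles[i])
            i += 1
    return merged + [0] * (len(row) - len(merged))  # pad back to original width

print(merge_row_left([2, 2, 4, 0]))  # [4, 4, 0, 0]
```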
[00:04:25] Unknown:
And so you've been working at Snips for a little while now. So can you start by giving a bit of an overview about what the Snips project is, how it got started, and how you got involved with it? Yes. So Snips was created about 6 years ago,
[00:04:40] Unknown:
but back then it was more of a machine learning consultancy, doing missions and building products for big companies around smart cities. At some point, the founders realized that they wanted to build their own products and start implementing the long term vision that they had in mind, which was to make technology disappear while ensuring privacy by design. The idea behind this mission statement is that by leveraging artificial intelligence and contextual information in a privacy preserving manner, we would eventually make technology so intuitive that it would actually disappear into the background. So as a result, the beginning of Snips as a product company was about building context aware technology that would remove frictions in our day to day use of technology while protecting privacy at the same time. And quickly, we realized that one of the biggest frictions was the interactions between humans and machines. It became clear that understanding natural language, both text and voice, was the first step towards making technology disappear. So since then, which was a bit more than 2 years ago, the goal has been to build a voice interface technology, which consists of three main components. You have a wake word engine, which is the Snips equivalent of the "Okay Google" or "Alexa" that you probably know.
So this component is responsible for detecting when a user wants to start interacting with a device. Then you have what we call the automatic speech recognition engine. This component is responsible for processing the audio from the user and converting the speech into text. And finally, the last component, the one I've been working on, is the natural language understanding component, which extracts the meaning from the text given by the ASR. These three components are the basics of what makes the Snips platform. In parallel, we've been shipping a web console that you can use to build your assistant before downloading it. I mean, you can train it on the web console and then download it and embed it in the several devices that are supported.
[00:07:06] Unknown:
And so for somebody who's interested in getting started with the Snips platform, can you just talk through the overall onboarding process, going from setting it up and training it to getting it deployed onto a device? Yeah. So
[00:07:22] Unknown:
the first step is always the same. You start by going to the Snips web console. Here, you will have to create your assistant. And based on your use case, you will define a list of intents, which correspond to the actions or capabilities that you want to cover. You also have the choice of using already built applications. So we have what we call an app store, which contains applications made by other developers that can be used off the shelf. Once you have created your assistant and defined all the intents that are in it, you will have to train it on the console.
Then you have the ability to test it, in order to see if it works correctly and if it covers what you want it to cover. In this process, you will usually make some iterations and adapt your ontology a bit, to make it better and better at understanding what you want. And once you are happy with the results, you actually move to the deployment stage. There you have several different options. You can do it manually: you have the option to download your assistant, and then, from the file you get, you just copy it onto the device that you are targeting and follow the instructions.
So for instance, if we take the Raspberry Pi, which is the device that we generally use for our showcases, you have a companion app that you can use from your laptop, which will help you directly move the assistant that you trained on the console onto your Raspberry Pi. So all you need is just a Raspberry Pi with a microphone, and optionally a speaker if you want to have audio feedback. And then that's it. If you have already written your action code on the console, it will work out of the box. Otherwise, you have to write it on your side.
And the action code is really the part that makes the connection between the output of what Snips has processed and what you want to give back to the user. So for instance, if you are building a weather assistant, Snips will provide you with the query of the user, containing, for instance, the time range or dates for which the user wants the weather, and also the location. What is left on your side is to plug in the weather API that you want to use and to give audio feedback to the user. But fortunately, developers have already built such use cases for you on the app store, so you can use them as is, and it will work directly.
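As a rough illustration of the action code Adrien describes, here is a minimal sketch of a handler that reacts to a parsed weather intent arriving over MQTT. The intent name, the slot names, the payload layout, and the get_forecast helper are assumptions made for the example, not part of the Snips platform itself.

```python
# Hypothetical action code for a weather assistant: the Snips platform publishes
# parsed intents on the MQTT bus, and the action code reacts to them.
import json
import paho.mqtt.client as mqtt

def get_forecast(location, date):
    # Placeholder for whatever weather API you decide to plug in.
    return f"Sunny in {location} on {date}"

def on_message(client, userdata, msg):
    payload = json.loads(msg.payload)
    # Slot names depend on how you defined the intent in the console (assumed here).
    slots = {s["slotName"]: s["rawValue"] for s in payload.get("slots", [])}
    answer = get_forecast(slots.get("location", "Paris"), slots.get("date", "today"))
    print(answer)  # or publish it back so the platform can speak it to the user

client = mqtt.Client()
client.on_message = on_message
client.connect("localhost", 1883)                       # the platform's MQTT broker
client.subscribe("hermes/intent/searchWeatherForecast")  # assumed intent topic name
client.loop_forever()
```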
[00:10:27] Unknown:
And when I was looking through the documentation, I thought it was interesting that you have this option for automatically generating training data in order to fit a number of different potential phrasings for a given action that you want to achieve. So I'm curious if you can just talk through a bit about how that training data generation functions.
[00:10:46] Unknown:
Yes. So a bit of context before that. As we do not collect any user data, as opposed to our cloud based competitors, we indeed offer a way to improve the recognition accuracy of your assistant in the form of data generation. This actually consists of two different things. The first one, which you may not have seen, is what we call entity extension. Essentially, this helps you enrich the vocabulary of an entity by automatically finding similar words, and it leverages external databases such as Wikidata. You can think of the example where you want to build an application for smart lights, and you don't know in advance what the colors of the supported lights will be. What you want to do is cover as many colors as possible. Using the entity extension tool, you'll be able to just define a few colors, and it will automatically suggest many other colors. The second thing, which you mentioned before, is indeed what we call automatic query generation, or, put more simply, data generation. The idea behind it is that the more utterances you provide, and the more diverse they are, the better the recognition performance of your assistant will be. So the data generation tools that we've built help you generate new formulations automatically with a meta crowdsourcing platform, which starts by generating tasks automatically based on the utterances that you have already provided.
It will deploy them to the right crowdsourcing platform, gather the results, and then post process these results with semi supervised machine learning algorithms in order to clean everything up. So, basically, you can collect in a few hours or days enough data to reach the NLU performance plateau. The key here is to leverage crowdsourcing platforms. But it's not as easy as it may sound, because you get a lot of noise in this process. So, as I mentioned, we have a cleaning step, which is done automatically, and in which we essentially use the natural language understanding component to run some partial trainings on the data, find potential errors, and do a kind of automatic cleaning that way. And at the end, you get a list of utterances that should actually be good enough to improve your assistant.
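For a sense of what the generated utterances and extended entities feed into, here is a minimal sketch using the open source snips-nlu Python library with a tiny hand-written dataset. The dataset keys shown, including the automatically_extensible flag, are my recollection of the library's JSON format, so treat them as an approximation and check the snips-nlu documentation.

```python
# Toy dataset in (roughly) the snips-nlu JSON format. Real assistants would use
# many more utterances, e.g. the ones produced by the data generation tool.
from snips_nlu import SnipsNLUEngine  # pip install snips-nlu; then: snips-nlu download en

dataset = {
    "language": "en",
    "intents": {
        "turnLightOn": {
            "utterances": [
                {"data": [
                    {"text": "turn the lights "},
                    {"text": "green", "entity": "color", "slot_name": "lightColor"},
                ]},
                {"data": [
                    {"text": "make the bedroom light "},
                    {"text": "red", "entity": "color", "slot_name": "lightColor"},
                ]},
            ]
        }
    },
    "entities": {
        "color": {
            "data": [{"value": "green", "synonyms": []}, {"value": "red", "synonyms": []}],
            "use_synonyms": True,
            "automatically_extensible": True,  # allow colors outside the seed list
            "matching_strictness": 1.0,
        }
    },
}

engine = SnipsNLUEngine()
engine.fit(dataset)
print(engine.parse("turn the lights blue"))  # "blue" can still match thanks to extensibility
```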
[00:13:37] Unknown:
And so as far as the overall architecture of the platform, you mentioned that there's the automated speech recognition and the natural language understanding and then the action capabilities. So I'm wondering if you can just talk through a bit of the overall architecture of how that fits together and what the end to end interaction looks like as it flows through the system, and particularly how you manage to fit everything, both size wise and functionality wise, onto a device for enabling this offline capability?
[00:14:09] Unknown:
Yes. So the Snips platform relies on multiple components, as you mentioned, which communicate with each other through a messaging protocol called MQTT. Essentially, each component can react to messages and then send back messages based on the processing that it does. One of the components is the dialogue manager, and this component is responsible for orchestrating the processing flow. Typically, this is the component which will toggle the other components on and off at the right time. So the typical flow that a user will trigger when interacting with the Snips platform is this: initially, you have the wake word component, which is always listening. That's the purpose of this component. The other components are kind of idle. Then you start by saying something like "Hey Snips". This will trigger the wake word component, which will recognize that you have said this specific word. The dialogue manager will then activate the automatic speech recognition engine, which starts listening for what you have to say. The ASR engine will detect when you stop talking, which is called endpointing. So at some point, you will finish your sentence.
In parallel, it will already have started to decode what you said. At the end of your sentence, it will send your decoded sentence back, using MQTT, to the dialogue manager, which will hand it over to the last component, the natural language understanding component. This one will try to extract the intent corresponding to your query, which could be turning on the light or asking for the weather, and also the parameters of your query. And this is the final result that will be sent, through MQTT, to the action code. The action code is actually the responsibility of the developer. So you will have, as a developer, to decide how you want to handle this final information and move forward with it.
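To make the message flow Adrien walks through more tangible, here is a minimal sketch that simply eavesdrops on the relevant Hermes topics over MQTT. The topic names are given from memory of the Snips/Hermes documentation and should be treated as assumptions to verify against the docs.

```python
# Watch the wake word -> ASR -> NLU flow go by on the MQTT bus.
import paho.mqtt.client as mqtt

# Topic names as recalled from the Hermes protocol docs (assumptions).
TOPICS = [
    "hermes/hotword/+/detected",   # wake word fired
    "hermes/asr/textCaptured",     # ASR finished decoding the utterance
    "hermes/intent/#",             # NLU extracted an intent and its slots
]

def on_connect(client, userdata, flags, rc):
    for topic in TOPICS:
        client.subscribe(topic)

def on_message(client, userdata, msg):
    print(f"{msg.topic}: {msg.payload.decode('utf-8', errors='replace')}")

client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect("localhost", 1883)  # the platform's MQTT broker
client.loop_forever()
```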
[00:16:32] Unknown:
And so, as far as the automated speech recognition and natural language understanding components in particular, I know that there are a number of other competitors in the market, both commercial and open source. And I'm wondering if you can characterize your overall position in the market as far as how you compare to some of the leading competitors.
[00:16:48] Unknown:
Yes. So I think that the biggest difference between us and what is available in the market is that our focus is on having lightweight solutions that can run anywhere. There are two aspects to take into consideration when comparing what we do with the competition. The first one, linked to what I just said, is the offline capabilities. The landscape is divided between cloud based competitors and non cloud based competitors, and the first category is actually much bigger than the second one. Today, by and large, large vocabulary ASR engines run in the cloud. That's typically Google, Nuance, Houndify, Cortana, and so on. On our side, we believe that there is huge value in running locally, because by providing such a technology, we bring privacy by design as well as support for offline use cases.
And these are things that cannot be claimed by a cloud based player. The second aspect is, of course, the recognition capabilities. For us, the metric which makes sense is the end to end accuracy, because it directly captures the proportion of voice interactions that work perfectly. And in the end, as a user, that is what really matters. For this reason, we are not focusing on the word error rate when assessing the quality of our speech to text engine. And so, a few months ago, we benchmarked our speech to meaning pipeline, that is, the whole pipeline from ASR to NLU.
We benchmarked it against other competitors such as Google, on datasets that we made available publicly. These datasets corresponded to very complex use cases, for instance music assistants, because you have a very open vocabulary. And the outcome of these benchmarks was that the performance of the Snips voice platform was actually higher than or on par with Google's cloud based services. The key to reaching such performance is really in the way our ASR engine works. The thing is, instead of building a generic speech to text engine capable of understanding anything you would say, provided it's proper language, which is what Google and others are doing, we made an ASR engine which can be trained to understand the domain and vocabulary that you are interested in. So it is customized for your use case. And the idea behind that is that if you reduce the scope and the mission of the ASR engine to what is strictly necessary, you can reach very good performance.
This is called language model adaptation. On the NLU side, we also benchmarked our solution against the main alternatives, which were both cloud based and non cloud based, open source alternatives. And we found that we were on par with the rest of the competition. If you look only at the NLU, the performances are quite close to each other, and quite close to the state of the art, actually. There is also a last aspect to take into consideration, which is the footprint, something more related to hardware.
This is only relevant for non cloud based solutions like us. So I'm talking about memory footprint and size, as well as speed. I don't have all the numbers in mind, but as we are targeting embedded devices with limited resources, we have optimized the components as much as we can so that they can run with a very, very small footprint. And in terms of speed, we have a big advantage compared to cloud based solutions, as we save at least a round trip of potentially several hundred milliseconds, simply by running locally.
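To illustrate the difference between word error rate and the end-to-end metric Adrien prefers, here is a small sketch that scores a speech-to-meaning pipeline on whether the final intent and every slot came out right; the data structures are invented for the example.

```python
# End-to-end accuracy: the fraction of voice interactions where the final
# intent AND all of its slots are exactly right, regardless of minor ASR slips.
def end_to_end_accuracy(results):
    """results: list of (predicted, expected) pairs, each a dict with
    'intent' and 'slots' keys (structure invented for this example)."""
    correct = sum(
        1 for predicted, expected in results
        if predicted["intent"] == expected["intent"]
        and predicted["slots"] == expected["slots"]
    )
    return correct / len(results) if results else 0.0

results = [
    ({"intent": "getWeather", "slots": {"city": "Paris", "date": "tomorrow"}},
     {"intent": "getWeather", "slots": {"city": "Paris", "date": "tomorrow"}}),
    ({"intent": "getWeather", "slots": {"city": "Paris"}},
     {"intent": "getWeather", "slots": {"city": "Berlin"}}),
]
print(end_to_end_accuracy(results))  # 0.5: one of the two interactions worked perfectly
```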
[00:20:57] Unknown:
And as far as cloud based solutions in particular, I know that they leverage the fact that they're getting a huge amount of data from multiple users in order to be able to extend their voice training model to try and improve their overall accuracy. And as you mentioned, you're focusing more on just being able to train each individual agent to be able to handle the specific set of vocabulary terms and the specific set of languages that they're actually going to be used for. But I'm wondering how you manage to gather feedback and maybe additional training data to improve the capabilities of your engine as you go forward, where you don't have this capability of being able to leverage multiple users and massive amounts of data? So
[00:21:42] Unknown:
regarding the ASR engine, the way we work is that we train for each language. We train each ASR engine on about a thousand hours of transcribed audio. So I think that's not a real issue on the ASR part, because you already have a lot of data available publicly. You can just think of audio subtitles, which are like transcripts of audio that you get almost for free. But, of course, you also have data which is much cleaner than that. So for the acoustic modeling part, we don't really need to collect users' data to improve our models, as we can use publicly available data. For the NLU part, essentially, the way it works is that we provide this data generation tool that you can leverage to increase the amount of data that you will use for training. But, again, the idea is that we don't plan to cover general use cases, like very generic queries. We want to tackle specific assistant use cases. So it's a bit different, because when you create your assistant with Snips, you already know what you want to cover. It's much simpler to describe the data, and you can still iterate on it back and forth on the console, adding new utterances.
But, again, we do not prevent you, as a user of Snips, from using the data as you want. It's just that what we want to ensure is that our solution can run completely offline and be private by design. You are free to use the output of our system however you want. But in practice, on the benchmarks that we've run, we've seen that we reach performance as good as or even better than competitors without collecting users' data. So that makes us think that it's not necessary for what we are targeting. And have you been able to take advantage of things like the Mozilla Common Voice project as an additional source of data for training your agents on? I'm not sure about that. I mean, I've heard about this project, and I think that the ASR team probably looked into it.
I cannot answer very precisely on that, actually. And
[00:24:08] Unknown:
from the conversational UI perspective, I know that one of the biggest challenges overall is the ability to handle multi turn conversations, or being able to maintain context as you're interacting with the agent. So being able to do things like ask "what's the weather in Paris tomorrow," get a response, and then be able to say "well, what about the next day" and have the agent understand what it is that you're talking about. So I'm curious
[00:24:36] Unknown:
how you have approached that problem and how well you're handling it. So for now, we focus on giving full control to the developer, with a lot of flexibility on the dialogue. There is a mechanism of sessions, which allows you to keep track of the different steps of an interaction. But again, the goal is really to provide as much as we can to the developer, so that he's not limited in terms of the dialogue flow that he wants to support. In terms of conversation and context, for now we have stopped a bit earlier. We just give you the tools so that you can implement the dialogue flow as you want. The reason for that is mostly that it's quite complex, and it's quite easy to make wrong choices there. So we prefer to focus for now on really making the building blocks
[00:25:33] Unknown:
very robust, so that you can build on top of them. But yeah, reaching those conversational capabilities is definitely on our roadmap for the future. And another aspect of the NLU engine that I came across in one of your posts recently is the fact that, as you mentioned, you recently ported it from Python to Rust. So I'm curious what your overall experience has been in the process of doing that, both from the technical perspective of being able to maintain feature parity, and also your overall experience of Rust as it compares to Python.
[00:26:09] Unknown:
So yeah. The main motivation for that was portability. The objective of the Snips platform is to run on as many devices as possible, and Python was not a good choice in this regard. I'm not sure you can run Python code easily on iOS or Android, for instance. So we decided to choose Rust, as it is a modern language which offers high performance and low memory overhead, and you also have memory safety and cross compilation as first class citizens. So there is no overhead in terms of performance compared to C. I would say that coding in Rust has been an extremely rewarding journey, which doesn't mean it was easy. It's quite the opposite, actually. In many aspects, Rust is the opposite of Python, and yet many people in the Rust community are also Python programmers. I think that learning Rust is much more than learning yet another language, as it forces you to understand a bit more what you do and how you manipulate objects. It can be a bit frustrating at the beginning, but in the end you understand why it's been designed the way it is. But we did not port the whole Python codebase to Rust, as we actually did not need to.
The way it works is that we use our Python library for training, and we use the Rust library for the inference part, which is the one running on device. The reason for that is that Python is still extremely valuable for what we do, as its machine learning ecosystem is amazing and it allows fast prototyping. Those are two things that Rust cannot offer at the moment. So that means we had to maintain two codebases, one in Python and one in Rust, and we had to build a serialization interface between these two libraries. Initially, the inference part of the Python pipeline was missing some bits compared to the Rust equivalent.
The reason was that it was the Rust library which was used in the platform, so we did not really need to have the full implementation on the Python side. But, following our choice to open source the Python library, we decided to implement the missing pieces in the Python lib, as we anticipated that users of the Python lib may not want to use the Rust library in conjunction with it. So now, as a result, we are maintaining two inference pipelines. And I have to admit that it has been a bit painful, but we built some automated tools that help us ensure feature parity between the two implementations.
On the other hand, as our technology is becoming more and more mature, maintaining this parity has also become less and less complicated, as everything is getting more stable and changes a bit less than it used to. We have had discussions regarding the possibility of porting the training part to Rust so that we could keep only the Rust implementation. The advantages are quite clear: essentially, you have a unique pipeline to maintain, and you also have the ability to perform training of the NLU on more devices. But the difficulties are quite clear as well.
Many Python dependencies that come for free in the Python pipeline would have to be reimplemented. So for this reason, we are not there yet. It's in our heads, but it's something that needs to be thought through quite carefully before we start working on it.
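As a rough idea of the kind of automated parity tooling Adrien mentions, here is a minimal sketch that trains with the Python library, persists the engine, reloads it from disk, and checks that both produce identical parses. In their setup the reloading side would be the Rust inference library; here the Python loader simply stands in for it, and the persist/from_path calls are recalled from the snips-nlu API.

```python
# Sketch of a feature-parity check between the training-time engine and the
# serialized artifact that the on-device runtime loads.
from snips_nlu import SnipsNLUEngine

def check_parity(dataset, test_sentences, model_dir="trained_engine"):
    trained = SnipsNLUEngine()
    trained.fit(dataset)
    trained.persist(model_dir)                      # serialized artifact shared with the runtime

    reloaded = SnipsNLUEngine.from_path(model_dir)  # stand-in for the Rust-side loader
    for sentence in test_sentences:
        # Both pipelines should agree on every test sentence.
        assert trained.parse(sentence) == reloaded.parse(sentence), sentence
```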
[00:30:00] Unknown:
And as you mentioned, the Python implementation of the NLU engine, and correspondingly the Rust one, have been open sourced. So I'm wondering if you can just talk a bit about your overall open source strategy at Snips, and which portions of the pipeline have been made publicly available and which ones you are continuing to maintain as proprietary, particularly given that I know that you have a commercial offering as well? The first component that was open sourced
[00:30:29] Unknown:
was the NLU. So both the training in Python and the Rust implementation for the inference. We also recently open sourced something called Hermes. The Hermes library is actually responsible for exposing all the messages that the Snips platform can send and receive, so that it helps you write your action code properly. This part was quite important for us to open source, as it gives transparency on these objects, and it also makes the life of developers much easier when it comes to writing action code. For the rest, I think the plan is to open source the part of the Snips platform which is not related to training. So, essentially, everything which runs locally. But I'm not sure what our strategy is on, for instance, the training of the ASR and those kinds of things. But I think what runs locally on your device, the Snips platform software that you install, is something that we want to open source, in order to allow people to know what they are actually running on their device, which is quite important in terms of transparency. The strategy behind open sourcing the NLU has been that, essentially, you already have competitors that do NLU in the cloud, and others that are already open source.
So we knew that it was not a real problem to disclose our technology. And also, we had committed to transparency and to building a privacy preserving technology. As part of that, it made sense to expose and make our code base public, so that people could actually trust us instead of just believing what we were saying. I think the open sourcing of our natural language understanding library has been a success. We now have many users using it, and it's also a way to improve your library: you get feedback from users, and you have people creating issues and reporting bugs.
In the end, what you gain in that case is probably totally worth it compared to disclosing some proprietary code base. It's not necessarily the case for everything that we do, but at least for the NLU it was the case, and it's probably also true for the core code base of the Snips platform.
[00:33:42] Unknown:
And as far as the overall usability and user experience aspect of the system, I'm wondering how you manage to simplify that particularly for nontechnical end users and how you approach the discoverability problem of letting somebody understand what capabilities are enabled on a given Snips agent?
[00:34:05] Unknown:
So yeah. So far, we have around 26,000 developers that have been experimenting with Snips. And I think that the console makes it quite accessible to define and train an assistant, even for someone non technical. Then you have the deployment process, which can require a bit more skill. But the thing is, thanks to the Snips app store, if you want to create an assistant for a kind of standard use case, like a smart lights assistant or a weather assistant, all of these apps already exist on the Snips app store. So all you need to do is pick these apps on the app store and install them on your assistant.
And they already come with the action code, which means you won't have to write a single line of code. All you need to do in that case is just follow the documentation, which will guide you through deploying your assistant on something like a Raspberry Pi, for instance. So I would say that if you want to play with voice technology without any technical background, it's definitely possible, and I'm not sure it's actually so complicated if you stay on standard use cases which are already covered by what the community has done. If you want to do specific things, then you have to put your hands a bit more into the code.
But, yeah, I think that Snips stays quite user friendly from that perspective.
[00:35:58] Unknown:
And for somebody who wants to add new capabilities on the action side, what's the overall process of implementing new capabilities and deploying it as an option in the Snips marketplace? And also, given the fact that the communication protocol is based on MQTT, I'm assuming that the implementation language is up to whoever
[00:36:20] Unknown:
is actually creating it. Yeah. So as a developer, your interface with Snips is the web console. This is where you define the capabilities of your assistant. So either you are working on your own assistant, or you may want to just create an app and publish it on the store. If you're working on a personal assistant and you want to update and change its capabilities, what you want to do is just add new apps to it through the console, and then deploy that to your device. It can be done either manually, by just replacing a resource directory on your device with the new one that you download from the console, or automatically on certain devices such as the Raspberry Pi by using the companion app that we built.
On the other hand, if you want to publish an app for a new capability which doesn't exist yet, what you can do is, of course, define your intents and your ontology for the app, but you can also define the action code directly in the console. That allows users of your app to use it directly without having to go the extra mile of coding the action code. I think right now what we support natively is action code written in either Python or JavaScript. So if you plan to use these two languages, it will definitely be quite easy to do that directly in the console. For the other languages, I think it's possible, but you will probably have to do it directly on your device. I mean, you won't be able to use the console for that. But it's definitely doable, because of the fact that we use MQTT, and it's not language specific.
You just have to catch the messages that are sent using this messaging protocol and react to them. So, yeah, that's actually a choice that we made very early in the development of the platform, and it has proven to be a very good choice, even though some people want to use their own brokers and so on for the messaging protocol. It's been quite stable, and I think people are happy with that.
[00:39:17] Unknown:
And focusing on an offline first deployment capability is beneficial, particularly from the privacy preserving aspect, and allowing you to reach markets that would otherwise be unable to use a voice platform because of requirements for Internet connectivity, which, despite people liking to think so, isn't yet globally available. So on the other side, I'm wondering what you have found to be the limitations of running everything offline and on device, and any cases where you think that Snips is the wrong choice for a voice platform.
[00:39:52] Unknown:
So compared to running in the cloud, yeah, indeed, running on device comes with several challenges. First, the models that are used in the core components must be small enough to fit on the device, and then the algorithms that are used must run on these devices. So this restricts the languages that you can use, for instance. That's what I mentioned earlier when I said that Python was not suitable for the inference part, as it would prevent us from running on several devices, such as smartphones with iOS or Android.
So those are the first restrictions. But I think the biggest limitation is from an engineering point of view. Your public API is more complicated than a simple REST API, as you need to ensure compatibility between, on one side, the Snips platform software that is installed on your device, and on the other side, the models trained on the web console, which are then downloaded and are supposed to be loaded by the software. So this interface is actually quite tricky to maintain and to version. But yeah, as you said, on the other hand, running on device comes with many advantages.
And on top of the privacy preserving feature, it also opens the way to very interesting opportunities to completely transform interactions. One thing that we are very excited about is the possibility to run an always on speech to text engine. If you think about it, this is completely impossible if you are a cloud based solution. You cannot imagine telling your users that you have something which is constantly listening to them, and not only for the wake word, but to everything that they say, and that this would be streamed automatically to your servers and processed and so on. But if you run locally and everything is local, that's not a problem. So you can start having a speech to text engine which runs locally and runs constantly.
And this could potentially remove the need for the wake word, meaning that you would be able to express queries in a more natural manner, without having to explicitly trigger the wake word to start the interaction. And I think this is only made possible by the fact that we run offline and on device. So, regarding the second part of your question about when Snips is the wrong choice, I think the answer is dictation, and, more generally, all the situations where there is no context and the user could be talking about strictly anything. This is due to how we approach ASR, by customizing it to your use case. If you cannot define the use case clearly, then the Snips ASR won't be good at it. But, at the same time, all voice solutions today are disappointing in that respect, which is why we chose to focus on verticalized assistants, in which there are one or several defined domains of interest, even if they have a large vocabulary.
[00:44:00] Unknown:
And as far as your experience of working with Snips, I'm wondering what you have found to be some of the most interesting or useful or unexpected lessons that you've learned in the process and some of the biggest challenges that you've been faced with?
[00:44:16] Unknown:
So I think one of the most exciting parts of what I've been working on was actually when we open sourced the NLU library powering the Snips platform. That was my first experience in open sourcing a Python library. And when you open source your code, at the beginning it feels like a lot of work, because you have to polish everything, write a bit of documentation, make sure your APIs are clear and easy to use, all that stuff. But the thing is, once you've done that, you realize you should have done it right from the beginning, as it makes your life so much easier. So what appeared to me as a constraint in the beginning ended up being a way to improve our productivity and our code base.
And when you help other people jump into your code base and make contributions easy, you are also helping yourself in the process. This is something which is quite obvious when you jump back into code, issues, or pull requests that were created a while back. At that point, you really enjoy and benefit from the fact that back then, you took the time to do things correctly. So I learned many good practices in this process, and I'm now applying them to every new project I start, either closed source or open source. I would sum up the lesson I learned as: you should write code as if it were open source.
And regarding the most challenging aspects, I think that building a robust, stable, open source NLU pipeline that interoperates between Python and Rust is definitely the most challenging thing that we've had to tackle. Having to maintain a serialization API between two languages was something completely new for me. Figuring out how to version this API was not clear at the beginning. And then, as Snips started to become more and more mature, we were asked to make as few breaking changes as possible. This is much easier when your API consists of public functions or classes, or when you're maintaining a REST API.
But when you need to ensure that models that were trained a few months ago, using an old version of the library, can still be loaded and run with the current version, then it's a different story. But the good part is that, again, you learn many things tackling this challenge.
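As an illustration of the versioning problem Adrien describes, and not Snips' actual mechanism, here is a toy sketch of an on-device runtime refusing to load a serialized model whose recorded format version it no longer supports. The file name, metadata key, and version numbers are all invented for the example.

```python
# Toy illustration of version-gating serialized models: the runtime records
# which model format versions it can still load, and rejects anything else
# with an actionable error instead of failing mid-inference.
import json
from pathlib import Path

SUPPORTED_MODEL_VERSIONS = {"0.19", "0.20"}  # hypothetical supported formats

def load_model(model_dir):
    metadata = json.loads((Path(model_dir) / "metadata.json").read_text())
    version = metadata.get("model_version")
    if version not in SUPPORTED_MODEL_VERSIONS:
        raise ValueError(
            f"Model was serialized with format {version!r}; "
            f"this runtime supports {sorted(SUPPORTED_MODEL_VERSIONS)}. "
            "Re-train or re-download the assistant from the console."
        )
    return metadata  # ...followed by the actual deserialization
```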
[00:46:44] Unknown:
And what have been some of the most interesting or innovative or unexpected ways that you've seen the various Snips technologies used by end users?
[00:46:54] Unknown:
So recently, we launched a competition on Hackster, where we asked our community to come up with projects using Snips technology. This competition is now over, and some of the winning projects have been really fun and unexpected, I would say. One of them was a project called Sleep Smile Flower, and it consisted of building a voice assistant that would help take care of plants by performing tasks such as watering the plants, emptying the tank, and checking, for instance, the moisture level.
There was also another project that was quite interesting and could be very useful, which consisted of a voice assistant that provides an easy way to request help in emergency situations for older people or impaired people living alone. It consists of several satellites placed in your home, and in case of emergency, you can ask your assistant to call someone. And, yeah, personally, I'm also using Snips at home for a very simple task, which is switching the lights in my bedroom on and off. I'm doing that using one feature that we did not mention before, which is the ability to customize the wake word.
So what I've done is simply create two different wake words, one for turning the light on and the other one for turning the light off. That allows me to simply say a word to turn the light on and say another word to turn the light off. And I don't have to first trigger the wake word and then perform a query, as the wake word is already the trigger.
[00:49:04] Unknown:
And so what is in store for the future of Snips, and what are some things that people can keep an eye out for?
[00:49:11] Unknown:
So first, we want to scale what we have. We want to go from 6 geographies to 42 over the next 2 years, I think. So essentially covering all these new languages. It's the kind of task that requires a lot of effort, but that we know is required by many businesses. On the more machine learning and technical side, there's something that we will probably ship during the summer, which is what we call speaker identification. The idea is that we are building new components that will be able to identify the person who is speaking and give him or her an ID.
You have a quick onboarding where each user has to say a few utterances. And then, once you're done, each time someone interacts with the assistant, the system will recognize who it is and will provide this information to the action code, making the interaction even more personalized, because you can then react differently to the same query depending on who is asking. Then, as I mentioned a bit before, we want to pioneer building the next generation of voice interfaces by removing the wake word, which is possible as long as you run locally.
This is a bit more long term, as it raises many technical challenges. Also, having an ASR running all the time is a bit resource consuming, so we'll have to do some optimization on what we have at the moment. But in the end, we believe this is the way to go in order to make the interactions even more natural. That's it, I think. There are also many optimizations coming for supporting new platforms and new devices, and lowering the size of all our models so they can fit on even smaller chips, and so on.
[00:51:49] Unknown:
And are there any other aspects of the Snips platform and technologies or use cases that we didn't discuss yet that you'd like to cover before we close out the show?
[00:51:59] Unknown:
Yes. There is one thing I didn't mention which is quite useful and that we've developed quite recently, which is the ability to have what we call satellites. The idea is that you have a main device powering the Snips platform, which has a bit of processing power. It can be a Raspberry Pi 3. And then, for instance, if you are trying to build a smart home assistant, you will have many different satellites in your home. These satellites will typically be much smaller devices, such as the Raspberry Pi Zero, for instance.
They will be there only to capture the audio and stream it to the base device. This allows you to cover many rooms without having to duplicate your setup. And it also allows for multi room dialogues, so you can have several people in different rooms interacting with the same assistant. Because they are interacting with different satellites, the base device will know which satellite is activated, and it will know how to address all these queries in parallel.
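To show how action code could tell satellites apart, here is a small sketch of an intent handler that reads the site identifier from the message payload, in the style of the earlier MQTT sketches. The siteId field name is recalled from the Hermes message format and should be treated as an assumption.

```python
# Handler meant to be wired into an MQTT client (as in the earlier sketches),
# routing a light command to whichever room's satellite captured the audio.
import json

def on_intent(client, userdata, msg):
    payload = json.loads(msg.payload)
    site_id = payload.get("siteId", "default")    # e.g. "bedroom", "kitchen" (assumed field)
    slots = {s["slotName"]: s["rawValue"] for s in payload.get("slots", [])}
    print(f"[{site_id}] turn lights {slots.get('state', 'on')}")
    # ...call the smart-light API for that specific room here
```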
[00:53:25] Unknown:
Alright. Well, for anybody who wants to get in touch with you or follow along with the work that you're up to and what's been happening at Snips, I'll have you add your preferred contact information to the show notes. And so with that, I'll move us on into the picks. And this week, I'm going to choose the Chrome OS operating system. I've been trying it out as a computing platform for when I'm out and about, and it's been coming a long way lately. I particularly like the recent addition of Linux support. So, if you're looking for something that you can get some work done on with some fairly inexpensive hardware, it's definitely worth taking a look at some of the different Chromebooks that have come out. And so with that, I'll pass it to you, Adrien. Do you have any picks this week?
[00:54:11] Unknown:
Yes. My personal pick is about privacy and what has been happening recently during the Google I/O conference and the Facebook F8 conference. Facebook said the future is private, while Google said the present is private. But in the end, both conferences show that big players are finally realizing that privacy matters. I'm actually proud that at Snips, we have been engaged in protecting and advocating for privacy as a human right since the very beginning of the company's existence. And it is a bit ironic, because before this privacy awakening statement from the GAFAs, they had been implicitly delivering a message, which was that AI products were made possible only by sharing and accessing users' data.
What we have been saying on our side for a while now is that privacy does not have to be traded away for the best AI technologies. And now Google is starting to ship products that run offline, proving that privacy can be respected. And this is becoming more and more important as the kind of data which is collected is becoming more and more personal. Think about voice, for example. So I think that, as users of these technologies, and by the choices we make, we have the power to influence and drive these technologies in a direction where our digital life and our privacy are protected.
[00:55:43] Unknown:
Alright. Well, thank you very much for taking the time today to join me and discuss the work that you've been doing with Snips. It's definitely a very interesting platform and one that I plan to experiment with on my own. So thank you for all of your efforts there, and I hope you enjoy the rest of your day. Thank you for inviting me.
Introduction to Adrien Ball and Snips
Adrien's Journey with Python and Rust
Overview of Snips and Its Mission
Getting Started with Snips
Training Data Generation and Entity Extension
Architecture and Offline Capabilities of Snips
Comparison with Competitors
Gathering Feedback and Training Data
Conversational UI and Context Management
Porting NLU Engine from Python to Rust
Open Source Strategy and Components
Usability and User Experience
Adding New Capabilities and Action Code
Limitations of Offline and On-Device Processing
Lessons Learned and Challenges
Innovative Uses of Snips
Future of Snips
Satellite Devices and Multi-Room Dialogues