Summary
Being able to control a computer with your voice has rapidly moved from science fiction to science fact. Unfortunately, the majority of platforms that have been made available to consumers are controlled by large organizations with little incentive to respect users’ privacy. The team at Snips is building a platform that runs entirely offline and on-device so that your information is always under your control. In this episode Adrien Ball explains how the Snips architecture works, the challenges of building a speech recognition and natural language understanding toolchain that works on limited resources, and how they are tackling issues around usability for casual consumers. If you have been interested in taking advantage of personal voice assistants, but wary of using commercially available options, this is definitely worth a listen.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
- You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Go to pythonpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.
- Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email hosts@podcastinit.com
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
- Your host as usual is Tobias Macey and today I’m interviewing Adrien Ball about Snips, a set of technologies to make voice controlled systems that respect users’ privacy
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by explaining what Snips is and how it got started?
- For someone who wants to use Snips can you talk through the onboarding process?
- One of the interesting features of your platform is the option for automated training data generation. Can you explain how that works?
- Can you describe the overall architecture of the Snips platform and how it has evolved since you first began working on it?
- Two of the main components that can be used independently are the ASR (Automated Speech Recognition) and NLU (Natural Language Understanding) engines. Each of those have a number of competitors in the market, both open source and commercial. How would you describe your overall position in the market for each of those projects?
- I know that one of the biggest challenges in conversational interfaces is maintaining context for multi-step interactions. How is that handled in Snips?
- For the NLU engine, you recently ported it from Python to Rust. What was your motivation for doing so and how would you characterize your experience between the two languages?
- Are you continuing to maintain both implementations and if so how are you maintaining feature parity?
- How do you approach the overall usability and user experience, particularly for non-technical end users?
- How is discoverability handled (e.g. finding out what capabilities/skills are available)?
- One of the compelling aspects of Snips is the ability to deploy to a wide variety of devices, including offline support. Can you talk through that deployment process, both from a user perspective and how it is implemented under the covers?
- What is involved in updating deployed models and keeping track of which versions are deployed to which devices?
- What is involved in adding new capabilities or integrations to the Snips platform?
- What are the limitations of running everything offline and on-device?
- When is Snips the wrong choice?
- In the process of building and maintaining the various components of Snips, what have been some of the most useful/interesting/unexpected lessons that you have learned?
- What have been the most challenging aspects?
- What are some of the most interesting/innovative/unexpected ways that you have seen the Snips technologies used?
- What is in store for the future of Snips?
Keep In Touch
- adrienball on GitHub
- @adrien_ball on Medium
- @adrien_ball on Twitter
Picks
- Tobias
- Adrien
Links
- Snips
- 2048 Game
- Smart Cities
- Raspberry Pi
- WikiData
- MQTT
- Google Assistant
- Amazon Alexa
- Microsoft Cortana
- Mozilla Common Voice
- Rust Language
- Snips Hermes messaging library
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With 200 gigabit private networking, scalable shared block storage, node balancers, and a 40 gigabit public network, all controlled by a brand new API, you get everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models or building your CI pipeline, they just launched dedicated CPU instances. Go to pythonpodcast.com/linode.
That's L-I-N-O-D-E, today, to get a $20 credit and launch a new server in under a minute. And don't forget to thank them for their continued support of this show. Bots and automation are taking over whole categories of online interaction. Discover.bot is an online community designed to serve as a platform agnostic digital space for bot developers and enthusiasts of all skill levels to learn from one another, share their stories, and move the conversation forward together. They regularly publish guides and resources to help you learn about topics such as bot development, using them for business, and the latest in chatbot news. For newcomers to the space, they have a beginner's guide that will teach you the basics of how bots work, what they can do, and where they are developed and published.
And to help you choose the right framework and avoid the confusion about which NLU features and platform APIs you will need, they have compiled a list of the major options and how they compare. Go to pythonpodcast.com/discoverbot today to get started and thank them for their support of the show. And you listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers, you don't want to miss out on this year's conference season. We have partnered with organizations such as O'Reilly Media, Dataversity, and the Open Data Science Conference.
Go to pythonpodcast.com/conferences to learn more and to take advantage of our partner discounts when you register. And visit the site at pythonpodcast.com to subscribe to the show, sign up for the newsletter, and read the show notes. And to help other people find the show, please leave a review on iTunes and tell your friends and coworkers. Your host as usual is Tobias Macey. And today, I'm interviewing Adrien Ball about Snips, a set of technologies to make voice controlled systems that respect users' privacy. So, Adrien, could you start by introducing yourself? Yes. Of course. So
[00:02:38] Unknown:
I'm Adrien Ball. I'm a French data scientist and machine learning expert. I'm 29 years old, and I've been working at Snips for the past 4 years, where I've been building voice assistant technology, focusing on the natural language understanding part of it. My academic background is in applied mathematics and computer science, a mixture which back then was not yet called data science. So I love playing with data and trying to extract information from it. And for this very reason, I've been a Python fan for a while now. More recently, I jumped into Rust, and I've been enjoying it a lot. And do you remember how you first got introduced to Python?
So it's hard to remember exactly when it started, but I remember what really convinced me that it was a very nice language. I think that was about 5 or 6 years ago, when a mobile game named 2048 went viral. This game was a 4 by 4 puzzle where you had to merge squares containing identical numbers, which were powers of 2, to form a new square whose value was equal to the sum of the two squares. The game was very popular, and I realized that it was fun because the strategies were completely non-obvious. So I decided that I would try to build my own algorithm to see if it could beat me at this game. And I seized the opportunity to discover a new language, and figured out that Python was apparently very well suited for that.
And so the algorithm I developed turned out to be actually much better than me at the game, as it managed to reach the goal of 2048 on almost every run. So, yeah, I think that was what sealed my adventure with Python.
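To make the merge rule Adrien describes concrete, here is a minimal sketch of how a single row collapses in 2048; the function name and representation are purely illustrative and not taken from his original solver.

```python
def merge_row_left(row):
    """Collapse one 2048 row to the left: equal neighbors merge into their sum."""
    tiles = [v for v in row if v != 0]          # slide tiles left, dropping gaps
    merged = []
    i = 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)         # two equal tiles become their sum
            i += 2
        else:
            merged.append(tiles[i])
            i += 1
    return merged + [0] * (len(row) - len(merged))  # pad back to original width

print(merge_row_left([2, 2, 4, 0]))  # [4, 4, 0, 0]
```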
[00:04:25] Unknown:
And so you've been working at Snips for a little while now. So can you start by giving a bit of an overview about what the Snips project is, how it got started, and how you got involved with it? Yes. So Snips was created about 6 years ago,
[00:04:40] Unknown:
but back then it was more of a machine learning consultancy, doing missions and building products for big companies around smart cities. At some point, the founders realized that they wanted to build their own products and start implementing the long term vision that they had in mind, which was to make technology disappear while ensuring privacy by design. The idea behind this mission statement is that by leveraging artificial intelligence and contextual information in a privacy preserving manner, we would eventually make technology so intuitive that it would actually disappear into the background. So as a result, the beginning of Snips as a product company was about building context aware technology that would remove frictions in our day to day use of technology while protecting privacy at the same time. And quickly, we realized that one of the biggest frictions was the interactions between humans and machines. It became clear that understanding natural language, both text and voice, was the first step towards making technology disappear. So since then, which was a bit more than 2 years ago, the goal has been to build a voice interface technology, which consists of three main components. You have a wake word engine, which is the Snips equivalent of the "Okay Google" or "Alexa" that you probably know.
So this component is responsible for detecting when a user wants to start interacting with a device. Then you have what we call the automatic speech recognition engine. This component is responsible for processing the audio from the user and converting the speech into text. And finally, the last component, the one I've been working on, is the natural language understanding component, which extracts the meaning from the text given by the ASR. These three components are the basics of what makes the Snips platform. In parallel, we've been shipping a web console that you can use to build your assistant before downloading it. I mean, you can train it on the web console and then download it and embed it in the several devices that are supported.
[00:07:06] Unknown:
And so for somebody who's interested in getting started with the Snips platform, can you just talk through the overall onboarding process, going from setting it up and training it to getting it deployed onto a device? Yeah. So
[00:07:22] Unknown:
the first step is always the same. You start by going to the Snips web console. Here, you will have to create your assistant. And based on your use case, you will define a list of intents, which correspond to the actions or capabilities that you want to cover. You also have the choice of using already built applications. So we have what we call an app store, which contains applications made by other developers that can be used off the shelf. Once you have created your assistant and defined all the intents that are in it, you will have to train it on the console.
Then you have the ability to test it, in order to see if it works correctly and if it covers what you want it to cover. In this process, you will usually make some iterations and adapt your ontology a bit, to make it better and better at understanding what you want. And once you are happy with the results, you actually move to the deployment stage. There you have several different options. You can do it manually: you have the option to download your assistant, and then, from the file you get, you just copy it onto the device that you are targeting and follow the instructions.
So for instance, if we take the Raspberry Pi, which is the device that we generally use for our showcases, you have a companion app that you can use from your laptop, which will help you directly move the assistant that you trained on the console onto your Raspberry Pi. So all you need is just a Raspberry Pi with a microphone, and optionally a speaker if you want to have audio feedback. And then that's it. If you have already written your action code on the console, it will work out of the box. Otherwise, you have to write it on your side.
And the action code is really the part that makes the connection between the output of what Snips has processed and what you want to give back to the user. So for instance, if you are building a weather assistant, Snips will provide you with the query of the user, containing, for instance, the time range or dates for which the user wants the weather, and also the location. What is left on your side is to plug in the weather API that you want to use and to give audio feedback to the user. But fortunately, developers have already built such use cases for you on the app store, so you can use them as is, and it will work directly.
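As a rough illustration of the action code Adrien describes, here is a minimal sketch of a handler that reacts to a parsed weather intent arriving over MQTT. The intent name, the slot names, the payload layout, and the get_forecast helper are assumptions made for the example, not part of the Snips platform itself.

```python
# Hypothetical action code for a weather assistant: the Snips platform publishes
# parsed intents on the MQTT bus, and the action code reacts to them.
import json
import paho.mqtt.client as mqtt

def get_forecast(location, date):
    # Placeholder for whatever weather API you decide to plug in.
    return f"Sunny in {location} on {date}"

def on_message(client, userdata, msg):
    payload = json.loads(msg.payload)
    # Slot names depend on how you defined the intent in the console (assumed here).
    slots = {s["slotName"]: s["rawValue"] for s in payload.get("slots", [])}
    answer = get_forecast(slots.get("location", "Paris"), slots.get("date", "today"))
    print(answer)  # or publish it back so the platform can speak it to the user

client = mqtt.Client()
client.on_message = on_message
client.connect("localhost", 1883)                       # the platform's MQTT broker
client.subscribe("hermes/intent/searchWeatherForecast")  # assumed intent topic name
client.loop_forever()
```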
[00:10:27] Unknown:
And when I was looking through the documentation, I thought it was interesting that you have this option for automatically generating training data in order to fit a number of different potential phrasings for a given action that you want to achieve. So I'm curious if you can just talk through a bit about how that training data generation functions.
[00:10:46] Unknown:
Yes. So a bit of context before that. As we do not collect any user data, as opposed to our cloud based competitors, we indeed offer a way to improve the recognition accuracy of your assistant in the form of data generation. This actually consists of two different things. The first one, which you may not have seen, is what we call entity extension. Essentially, this helps you enrich the vocabulary of an entity by automatically finding similar words, and it leverages external databases such as Wikidata. You can think of the example where you want to build an application for smart lights, and you don't know in advance what the colors of the supported lights will be. What you want to do is cover as many colors as possible. Using the entity extension tool, you'll be able to just define a few colors, and it will automatically suggest many other colors. The second thing, which you mentioned before, is indeed what we call automatic query generation, or, put more simply, data generation. The idea behind it is that the more utterances you provide, and the more diverse they are, the better the recognition performance of your assistant will be. So the data generation tools that we've built help you generate new formulations automatically with a meta crowdsourcing platform, which starts by generating tasks automatically based on the utterances that you have already provided.
It will deploy them to the right crowdsourcing platform, gather the results, and then post process these results with semi supervised machine learning algorithms in order to clean everything up. So, basically, you can collect in a few hours or days enough data to reach the NLU performance plateau. The key here is to leverage crowdsourcing platforms. But it's not as easy as it may sound, because you get a lot of noise in this process. So, as I mentioned, we have a cleaning step, which is done automatically, and in which we essentially use the natural language understanding component to run some partial trainings on the data, find potential errors, and do a kind of automatic cleaning that way. And at the end, you get a list of utterances that should actually be good enough to improve your assistant.
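For a sense of what the generated utterances and extended entities feed into, here is a minimal sketch using the open source snips-nlu Python library with a tiny hand-written dataset. The dataset keys shown, including the automatically_extensible flag, are my recollection of the library's JSON format, so treat them as an approximation and check the snips-nlu documentation.

```python
# Toy dataset in (roughly) the snips-nlu JSON format. Real assistants would use
# many more utterances, e.g. the ones produced by the data generation tool.
from snips_nlu import SnipsNLUEngine  # pip install snips-nlu; then: snips-nlu download en

dataset = {
    "language": "en",
    "intents": {
        "turnLightOn": {
            "utterances": [
                {"data": [
                    {"text": "turn the lights "},
                    {"text": "green", "entity": "color", "slot_name": "lightColor"},
                ]},
                {"data": [
                    {"text": "make the bedroom light "},
                    {"text": "red", "entity": "color", "slot_name": "lightColor"},
                ]},
            ]
        }
    },
    "entities": {
        "color": {
            "data": [{"value": "green", "synonyms": []}, {"value": "red", "synonyms": []}],
            "use_synonyms": True,
            "automatically_extensible": True,  # allow colors outside the seed list
            "matching_strictness": 1.0,
        }
    },
}

engine = SnipsNLUEngine()
engine.fit(dataset)
print(engine.parse("turn the lights blue"))  # "blue" can still match thanks to extensibility
```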
[00:13:37] Unknown:
And so as far as the overall architecture of the platform, you mentioned that there's the automated speech recognition and the natural language understanding and then the action capabilities. So I'm wondering if you can just talk through a bit of the overall architecture of how that fits together and what the end to end interaction looks like as it flows through the system, and particularly how you manage to fit everything, both size wise and functionality wise, onto a device for enabling this offline capability?
[00:14:09] Unknown:
Yes. So the Snips platform relies on multiple components, as you mentioned, which communicate with each other through a messaging protocol called MQTT. Essentially, each component can react to messages and then send back messages based on the processing that it does. One of the components is the dialogue manager, and this component is responsible for orchestrating the processing flow. Typically, this is the component which will toggle the other components on and off at the right time. So the typical flow that a user will trigger when interacting with the Snips platform is this: initially, you have the wake word component, which is always listening. That's the purpose of this component. The other components are kind of idle. Then you start by saying something like "Hey Snips". This will trigger the wake word component, which will recognize that you have said this specific word. The dialogue manager will then activate the automatic speech recognition engine, which starts listening for what you have to say. The ASR engine will detect when you stop talking, which is called endpointing. So at some point, you will finish your sentence.
In parallel, it will already have started to decode what you said. At the end of your sentence, it will send your decoded sentence back, using MQTT, to the dialogue manager, which will hand it over to the last component, the natural language understanding component. This one will try to extract the intent corresponding to your query, which could be turning on the light or asking for the weather, and also the parameters of your query. And this is the final result that will be sent, through MQTT, to the action code. The action code is actually the responsibility of the developer. So you will have, as a developer, to decide how you want to handle this final information and move forward with it.
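To make the message flow Adrien walks through more tangible, here is a minimal sketch that simply eavesdrops on the relevant Hermes topics over MQTT. The topic names are given from memory of the Snips/Hermes documentation and should be treated as assumptions to verify against the docs.

```python
# Watch the wake word -> ASR -> NLU flow go by on the MQTT bus.
import paho.mqtt.client as mqtt

# Topic names as recalled from the Hermes protocol docs (assumptions).
TOPICS = [
    "hermes/hotword/+/detected",   # wake word fired
    "hermes/asr/textCaptured",     # ASR finished decoding the utterance
    "hermes/intent/#",             # NLU extracted an intent and its slots
]

def on_connect(client, userdata, flags, rc):
    for topic in TOPICS:
        client.subscribe(topic)

def on_message(client, userdata, msg):
    print(f"{msg.topic}: {msg.payload.decode('utf-8', errors='replace')}")

client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect("localhost", 1883)  # the platform's MQTT broker
client.loop_forever()
```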
[00:16:32] Unknown:
And so, as far as the automated speech recognition and natural language understanding components in particular, I know that there are a number of other competitors in the market, both commercial and open source. And I'm wondering if you can characterize your overall position in the market as far as how you compare to some of the leading competitors.
[00:16:48] Unknown:
Yes. So I think that the biggest difference between us and what is available in the market is that our focus is on having lightweight solutions that can run anywhere. There are two aspects to take into consideration when comparing what we do with the competition. The first one, linked to what I just said, is the offline capabilities. The landscape is divided between cloud based competitors and non cloud based competitors, and the first category is actually much bigger than the second one. Today, by and large, large vocabulary ASR engines run in the cloud. That's typically Google, Nuance, Houndify, Cortana, and so on. On our side, we believe that there is huge value in running locally, because by providing such a technology, we bring privacy by design as well as support for offline use cases.
And these are things that cannot be claimed by a cloud based player. The second aspect is, of course, the recognition capabilities. For us, the metric which makes sense is the end to end accuracy, because it directly captures the proportion of voice interactions that work perfectly. And in the end, as a user, that is what really matters. For this reason, we are not focusing on the word error rate when assessing the quality of our speech to text engine. And so, a few months ago, we benchmarked our speech to meaning pipeline, that is, the whole pipeline from ASR to NLU.
We benchmarked it against other competitors such as Google, on datasets that we made available publicly. These datasets corresponded to very complex use cases, for instance music assistants, because you have a very open vocabulary. And the outcome of these benchmarks was that the performance of the Snips voice platform was actually higher than or on par with Google's cloud based services. The key to reaching such performance is really in the way our ASR engine works. The thing is, instead of building a generic speech to text engine capable of understanding anything you would say, provided it's proper language, which is what Google and others are doing, we made an ASR engine which can be trained to understand the domain and vocabulary that you are interested in. So it is customized for your use case. And the idea behind that is that if you reduce the scope and the mission of the ASR engine to what is strictly necessary, you can reach very good performance.
This is called language model adaptation. On the NLU side, we also benchmarked our solution against the main alternatives, which were both cloud based and non cloud based, open source alternatives. And we found that we were on par with the rest of the competition. If you look only at the NLU, the performances are quite close to each other, and quite close to the state of the art, actually. There is also a last aspect to take into consideration, which is the footprint, something more related to hardware.
This is only relevant for non cloud based solutions like us. So I'm talking about memory footprint and size, as well as speed. I don't have all the numbers in mind, but as we are targeting embedded devices with limited resources, we have optimized the components as much as we can so that they can run with a very, very small footprint. And in terms of speed, we have a big advantage compared to cloud based solutions, as we save at least a round trip of potentially several hundred milliseconds, simply by running locally.
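To illustrate the difference between word error rate and the end-to-end metric Adrien prefers, here is a small sketch that scores a speech-to-meaning pipeline on whether the final intent and every slot came out right; the data structures are invented for the example.

```python
# End-to-end accuracy: the fraction of voice interactions where the final
# intent AND all of its slots are exactly right, regardless of minor ASR slips.
def end_to_end_accuracy(results):
    """results: list of (predicted, expected) pairs, each a dict with
    'intent' and 'slots' keys (structure invented for this example)."""
    correct = sum(
        1 for predicted, expected in results
        if predicted["intent"] == expected["intent"]
        and predicted["slots"] == expected["slots"]
    )
    return correct / len(results) if results else 0.0

results = [
    ({"intent": "getWeather", "slots": {"city": "Paris", "date": "tomorrow"}},
     {"intent": "getWeather", "slots": {"city": "Paris", "date": "tomorrow"}}),
    ({"intent": "getWeather", "slots": {"city": "Paris"}},
     {"intent": "getWeather", "slots": {"city": "Berlin"}}),
]
print(end_to_end_accuracy(results))  # 0.5: one of the two interactions worked perfectly
```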
[00:20:57] Unknown:
And as far as cloud based solutions in particular, I know that they leverage the fact that they're getting a huge amount of data from multiple users in order to be able to extend their voice training model to try and improve their overall accuracy. And as you mentioned, you're focusing more on just being able to train each individual agent to be able to handle the specific set of vocabulary terms and the specific set of languages that they're actually going to be used for. But I'm wondering how you manage to gather feedback and maybe additional training data to improve the capabilities of your engine as you go forward, where you don't have this capability of being able to leverage multiple users and massive amounts of data? So
[00:21:42] Unknown:
regarding the ASR engine, the way we work is that we train for each language. We train each ASR engine on about a thousand hours of transcribed audio. So I think that's not a real issue on the ASR part, because you already have a lot of data available publicly. You can just think of audio subtitles, which are like transcripts of audio that you get almost for free. But, of course, you also have data which is much cleaner than that. So for the acoustic modeling part, we don't really need to collect users' data to improve our models, as we can use publicly available data. For the NLU part, essentially, the way it works is that we provide this data generation tool that you can leverage to increase the amount of data that you will use for training. But, again, the idea is that we don't plan to cover general use cases, like very generic queries. We want to tackle specific assistant use cases. So it's a bit different, because when you create your assistant with Snips, you already know what you want to cover. It's much simpler to describe the data, and you can still iterate on it back and forth on the console, adding new utterances.
But, again, we do not prevent you, as a user of Snips, from using the data as you want. It's just that what we want to ensure is that our solution can run completely offline and be private by design. You are free to use the output of our system however you want. But in practice, on the benchmarks that we've run, we've seen that we reach performance as good as or even better than competitors without collecting users' data. So that makes us think that it's not necessary for what we are targeting. And have you been able to take advantage of things like the Mozilla Common Voice project as an additional source of data for training your agents on? I'm not sure about that. I mean, I've heard about this project, and I think that the ASR team probably looked into it.
I cannot answer very precisely on that, actually. And
[00:24:08] Unknown:
from the conversational UI perspective, I know that one of the biggest challenges overall is the ability to handle multi turn conversations, or being able to maintain context as you're interacting with the agent. So being able to do things like ask "what's the weather in Paris tomorrow," get a response, and then be able to say "well, what about the next day" and have the agent understand what it is that you're talking about. So I'm curious
[00:24:36] Unknown:
how you have approached that problem and how well you're handling it. So for now, we focus on giving full control to the developer, with a lot of flexibility on the dialogue. There is a mechanism of sessions, which allows you to keep track of the different steps of an interaction. But again, the goal is really to provide as much as we can to the developer, so that he's not limited in terms of the dialogue flow that he wants to support. In terms of conversation and context, for now we have stopped a bit earlier. We just give you the tools so that you can implement the dialogue flow as you want. The reason for that is mostly that it's quite complex, and it's quite easy to make wrong choices there. So we prefer to focus for now on really making the building blocks
[00:25:33] Unknown:
very robust, so that you can build on top of them. But yeah, reaching those conversational capabilities is definitely on our roadmap for the future. And another aspect of the NLU engine that I came across in one of your posts recently is the fact that, as you mentioned, you recently ported it from Python to Rust. So I'm curious what your overall experience has been in the process of doing that, both from the technical perspective of being able to maintain feature parity, and also your overall experience of Rust as it compares to Python.
[00:26:09] Unknown:
So yeah. The main motivation for that was portability. The objective of the Snips platform is to run on as many devices as possible, and Python was not a good choice in this regard. I'm not sure you can run Python code easily on iOS or Android, for instance. So we decided to choose Rust, as it is a modern language which offers high performance and low memory overhead, and you also have memory safety and cross compilation as first class citizens. So there is no overhead in terms of performance compared to C. I would say that coding in Rust has been an extremely rewarding journey, which doesn't mean it was easy. It's quite the opposite, actually. In many aspects, Rust is the opposite of Python, and yet many people in the Rust community are also Python programmers. I think that learning Rust is much more than learning yet another language, as it forces you to understand a bit more what you do and how you manipulate objects. It can be a bit frustrating at the beginning, but in the end you understand why it's been designed the way it is. But we did not port the whole Python codebase to Rust, as we actually did not need to.
The way it works is that we use our Python library for training, and we use the Rust library for the inference part, which is the one running on device. The reason for that is that Python is still extremely valuable for what we do, as its machine learning ecosystem is amazing and it allows fast prototyping. Those are two things that Rust cannot offer at the moment. So that means we had to maintain two codebases, one in Python and one in Rust, and we had to build a serialization interface between these two libraries. Initially, the inference part of the Python pipeline was missing some bits compared to the Rust equivalent.
The reason was that it was the Rust library which was used in the platform, so we did not really need to have the full implementation on the Python side. But, following our choice to open source the Python library, we decided to implement the missing pieces in the Python lib, as we anticipated that users of the Python lib may not want to use the Rust library in conjunction with it. So now, as a result, we are maintaining two inference pipelines. And I have to admit that it has been a bit painful, but we built some automated tools that help us ensure feature parity between the two implementations.
On the other hand, as our technology is becoming more and more mature, maintaining this parity has also become less and less complicated, as everything is getting more stable and changes a bit less than it used to. We have had discussions regarding the possibility of porting the training part to Rust so that we could keep only the Rust implementation. The advantages are quite clear: essentially, you have a unique pipeline to maintain, and you also have the ability to perform training of the NLU on more devices. But the difficulties are quite clear as well.
Many Python dependencies that come for free in the Python pipeline would have to be reimplemented. So for this reason, we are not there yet. It's in our heads, but it's something that needs to be thought through quite carefully before we start working on it.
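As a rough idea of the kind of automated parity tooling Adrien mentions, here is a minimal sketch that trains with the Python library, persists the engine, reloads it from disk, and checks that both produce identical parses. In their setup the reloading side would be the Rust inference library; here the Python loader simply stands in for it, and the persist/from_path calls are recalled from the snips-nlu API.

```python
# Sketch of a feature-parity check between the training-time engine and the
# serialized artifact that the on-device runtime loads.
from snips_nlu import SnipsNLUEngine

def check_parity(dataset, test_sentences, model_dir="trained_engine"):
    trained = SnipsNLUEngine()
    trained.fit(dataset)
    trained.persist(model_dir)                      # serialized artifact shared with the runtime

    reloaded = SnipsNLUEngine.from_path(model_dir)  # stand-in for the Rust-side loader
    for sentence in test_sentences:
        # Both pipelines should agree on every test sentence.
        assert trained.parse(sentence) == reloaded.parse(sentence), sentence
```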
[00:30:00] Unknown:
And as you mentioned, the Python implementation of the NLU engine, and correspondingly the Rust one, have been open sourced. So I'm wondering if you can just talk a bit about your overall open source strategy at Snips, and which portions of the pipeline have been made publicly available and which ones you are continuing to maintain as proprietary, particularly given that I know that you have a commercial offering as well? The first component that was open sourced
[00:30:29] Unknown:
was the NLU. So both the training in Python and the Rust implementation for the inference. We also recently open sourced something called Hermes. The Hermes library is actually responsible for exposing all the messages that the Snips platform can send and receive, so that it helps you write your action code properly. This part was quite important for us to open source, as it gives transparency on these objects, and it also makes the life of developers much easier when it comes to writing action code. For the rest, I think the plan is to open source the part of the Snips platform which is not related to training. So, essentially, everything which runs locally. But I'm not sure what our strategy is on, for instance, the training of the ASR and those kinds of things. But I think what runs locally on your device, the Snips platform software that you install, is something that we want to open source, in order to allow people to know what they are actually running on their device, which is quite important in terms of transparency. The strategy behind open sourcing the NLU has been that, essentially, you already have competitors that do NLU in the cloud, and others that are already open source.
So we knew that it was not a real problem to disclose our technology. And also, we had committed to transparency and to building a privacy preserving technology. As part of that, it made sense to expose and make our code base public, so that people could actually trust us instead of just believing what we were saying. I think the open sourcing of our natural language understanding library has been a success. We now have many users using it, and it's also a way to improve your library: you get feedback from users, and you have people creating issues and reporting bugs.
In the end, what you gain in that case is probably totally worth it compared to disclosing some proprietary code base. It's not necessarily the case for everything that we do, but at least for the NLU it was the case, and it's probably also true for the core code base of the Snips platform.
[00:33:42] Unknown:
And as far as the overall usability and user experience aspect of the system, I'm wondering how you manage to simplify that particularly for nontechnical end users and how you approach the discoverability problem of letting somebody understand what capabilities are enabled on a given Snips agent?
[00:34:05] Unknown:
So yeah. So far, we have around 26,000 developers that have been experimenting with Snips. And I think that the console makes it quite accessible to define and train an assistant, even for someone non technical. Then you have the deployment process, which can require a bit more skill. But the thing is, thanks to the Snips app store, if you want to create an assistant for a kind of standard use case, like a smart lights assistant or a weather assistant, all of these apps already exist on the Snips app store. So all you need to do is pick these apps on the app store and install them on your assistant.
And they already come with the action code, which means you won't have to write a single line of code. All you need to do in that case is just follow the documentation, which will guide you through deploying your assistant on something like a Raspberry Pi, for instance. So I would say that if you want to play with voice technology without any technical background, it's definitely possible, and I'm not sure it's actually so complicated if you stay on standard use cases which are already covered by what the community has done. If you want to do specific things, then you have to put your hands a bit more into the code.
But, yeah, I think that Snips stays quite user friendly from that perspective.
[00:35:58] Unknown:
And for somebody who wants to add new capabilities on the action side, what's the overall process of implementing new capabilities and deploying it as an option in the Snips marketplace? And also, given the fact that the communication protocol is based on MQTT, I'm assuming that the implementation language is up to whoever
[00:36:20] Unknown:
is actually creating it. Yeah. So as a developer, your interface with Snips is the web console. This is where you define the capabilities of your assistant. So either you are working on your own assistant, or you may want to just create an app and publish it on the store. If you're working on a personal assistant and you want to update and change its capabilities, what you want to do is just add new apps to it through the console, and then deploy that to your device. It can be done either manually, by just replacing a resource directory on your device with the new one that you download from the console, or automatically on certain devices such as the Raspberry Pi by using the companion app that we built.
On the other hand, if you want to publish an app for a new capability which doesn't exist yet, what you can do is, of course, define your intents and your ontology for the app, but you can also define the action code directly in the console. That allows users of your app to use it directly without having to go the extra mile of coding the action code. I think right now what we support natively is action code written in either Python or JavaScript. So if you plan to use these two languages, it will definitely be quite easy to do that directly in the console. For the other languages, I think it's possible, but you will probably have to do it directly on your device. I mean, you won't be able to use the console for that. But it's definitely doable, because of the fact that we use MQTT, and it's not language specific.
You just have to catch the messages that are sent using this messaging protocol and react to them. So, yeah, that's actually a choice that we made very early in the development of the platform, and it has proven to be a very good choice, even though some people want to use their own brokers and so on for the messaging protocol. It's been quite stable, and I think people are happy with that.
[00:39:17] Unknown:
And focusing on an offline first deployment capability is beneficial, particularly from the privacy preserving aspect, and allowing you to reach markets that would otherwise be unable to use a voice platform because of requirements for Internet connectivity, which, despite people liking to think so, isn't yet globally available. So on the other side, I'm wondering what you have found to be the limitations of running everything offline and on device, and any cases where you think that Snips is the wrong choice for a voice platform.
[00:39:52] Unknown:
So compared to running in the cloud, yeah, indeed, running on device comes with several challenges. First, the models that are used in the core components must be small enough to fit on the device, and then the algorithms that are used must run on these devices. So this restricts the languages that you can use, for instance. That's what I mentioned earlier when I said that Python was not suitable for the inference part, as it would prevent us from running on several devices, such as smartphones with iOS or Android.
So those are the first restrictions. But I think the biggest limitation is from an engineering point of view. Your public API is more complicated than a simple REST API, as you need to ensure compatibility between, on one side, the Snips platform software that is installed on your device, and on the other side, the models trained on the web console, which are then downloaded and are supposed to be loaded by the software. So this interface is actually quite tricky to maintain and to version. But yeah, as you said, on the other hand, running on device comes with many advantages.
And on top of the privacy preserving feature, it also opens the way to very interesting opportunities to completely transform interactions. One thing that we are very excited about is the possibility to run an always on speech to text engine. If you think about it, this is completely impossible if you are a cloud based solution. You cannot imagine telling your users that you have something which is constantly listening to them, and not only for the wake word, but to everything that they say, and that this would be streamed automatically to your servers and processed and so on. But if you run locally and everything is local, that's not a problem. So you can start having a speech to text engine which runs locally and runs constantly.
And this could potentially remove the need for the wake word, meaning that you would be able to express queries in a more natural manner, without having to explicitly trigger the wake word to start the interaction. And I think this is only made possible by the fact that we run offline and on device. So, regarding the second part of your question about when Snips is the wrong choice, I think the answer is dictation, and, more generally, all the situations where there is no context and the user could be talking about strictly anything. This is due to how we approach ASR, by customizing it to your use case. If you cannot define the use case clearly, then the Snips ASR won't be good at it. But, at the same time, all voice solutions today are disappointing in that respect, which is why we chose to focus on verticalized assistants, in which there are one or several defined domains of interest, even if they have a large vocabulary.
[00:44:00] Unknown:
And as far as your experience of working with Snips, I'm wondering what you have found to be some of the most interesting or useful or unexpected lessons that you've learned in the process and some of the biggest challenges that you've been faced with?
[00:44:16] Unknown:
So I think one of the most exciting parts of what I've been working on was actually when we open sourced the NLU library powering the Snips platform. That was my first experience in open sourcing a Python library. And when you open source your code, at the beginning it feels like a lot of work, because you have to polish everything, write a bit of documentation, make sure your APIs are clear and easy to use, all that stuff. But the thing is, once you've done that, you realize you should have done it right from the beginning, as it makes your life so much easier. So what appeared to me as a constraint in the beginning ended up being a way to improve our productivity and our code base.
And when you help other people jump into your code base and make contributions easy, you are also helping yourself in the process. This is something which is quite obvious when you jump back into code, issues, or pull requests that were created a while back. At that point, you really enjoy and benefit from the fact that back then, you took the time to do things correctly. So I learned many good practices in this process, and I'm now applying them to every new project I start, either closed source or open source. I would sum up the lesson I learned as: you should write code as if it were open source.
And regarding the most challenging aspects, I think that building a robust, stable, open source NLU pipeline that interoperates between Python and Rust is definitely the most challenging thing that we've had to tackle. Having to maintain a serialization API between two languages was something completely new for me. Figuring out how to version this API was not clear at the beginning. And then, as Snips started to become more and more mature, we were asked to make as few breaking changes as possible. This is much easier when your API consists of public functions or classes, or when you're maintaining a REST API.
But when you need to ensure that models that were trained a few months ago, using an old version of the library, can still be loaded and run with the current version, then it's a different story. But the good part is that, again, you learn many things tackling this challenge.
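As an illustration of the versioning problem Adrien describes, and not Snips' actual mechanism, here is a toy sketch of an on-device runtime refusing to load a serialized model whose recorded format version it no longer supports. The file name, metadata key, and version numbers are all invented for the example.

```python
# Toy illustration of version-gating serialized models: the runtime records
# which model format versions it can still load, and rejects anything else
# with an actionable error instead of failing mid-inference.
import json
from pathlib import Path

SUPPORTED_MODEL_VERSIONS = {"0.19", "0.20"}  # hypothetical supported formats

def load_model(model_dir):
    metadata = json.loads((Path(model_dir) / "metadata.json").read_text())
    version = metadata.get("model_version")
    if version not in SUPPORTED_MODEL_VERSIONS:
        raise ValueError(
            f"Model was serialized with format {version!r}; "
            f"this runtime supports {sorted(SUPPORTED_MODEL_VERSIONS)}. "
            "Re-train or re-download the assistant from the console."
        )
    return metadata  # ...followed by the actual deserialization
```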
[00:46:44] Unknown:
And what have been some of the most interesting or innovative or unexpected ways that you've seen the various Snips technologies used by end users?
[00:46:54] Unknown:
So recently, we launched a competition on Hackster, where we asked our community to come up with projects using Snips technology. This competition is now over, and some of the winning projects have been really fun and unexpected, I would say. One of them was a project called Sleep Smile Flower, and it consisted of building a voice assistant that would help take care of plants by performing tasks such as watering the plants, emptying the tank, and checking, for instance, the moisture level.
There was also another project that was quite interesting and could be very useful, which consisted of a voice assistant that provides an easy way to request help in emergency situations for older people or impaired people living alone. It consists of several satellites placed in your home, and in case of emergency, you can ask your assistant to call someone. And, yeah, personally, I'm also using Snips at home for a very simple task, which is switching the lights in my bedroom on and off. I'm doing that using one feature that we did not mention before, which is the ability to customize the wake word.
So what I've done is simply create two different wake words, one for turning the light on and the other one for turning the light off. That allows me to simply say a word to turn the light on and say another word to turn the light off. And I don't have to first trigger the wake word and then perform a query, as the wake word is already the trigger.
[00:49:04] Unknown:
And so what is in store for the future of Snips, and what are some things that people can keep an eye out for?
[00:49:11] Unknown:
So first, we want to scale what we have. We want to go from 6 geographies to 42 over the next 2 years, I think. So essentially covering all these new languages. It's the kind of task that requires a lot of effort, but that we know is required by many businesses. On the more machine learning and technical side, there's something that we will probably ship during the summer, which is what we call speaker identification. The idea is that we are building new components that will be able to identify the person who is speaking and give him or her an ID.
You have a quick onboarding where each user has to say a few utterances. And then, once you're done, each time someone interacts with the assistant, the system will recognize who it is and will provide this information to the action code, making the interaction even more personalized, because you can then react differently to the same query depending on who is asking. Then, as I mentioned a bit before, we want to pioneer building the next generation of voice interfaces by removing the wake word, which is possible as long as you run locally.
This is a bit more long term, as it raises many technical challenges. Also, having an ASR running all the time is a bit resource consuming, so we'll have to do some optimization on what we have at the moment. But in the end, we believe this is the way to go in order to make the interactions even more natural. That's it, I think. There are also many optimizations coming for supporting new platforms and new devices, and lowering the size of all our models so they can fit on even smaller chips, and so on.
[00:51:49] Unknown:
And are there any other aspects of the Snips platform and technologies or use cases that we didn't discuss yet that you'd like to cover before we close out the show?
[00:51:59] Unknown:
Yes. There is one thing I didn't mention which is quite useful and that we've developed quite recently, which is the ability to have what we call satellites. The idea is that you have a main device powering the Snips platform, which has a bit of processing power. It can be a Raspberry Pi 3. And then, for instance, if you are trying to build a smart home assistant, you will have many different satellites in your home. These satellites will typically be much smaller devices, such as the Raspberry Pi Zero, for instance.
They will be there only to capture the audio and stream it to the base device. This allows you to cover many rooms without having to duplicate your setup. And it also allows for multi room dialogues, so you can have several people in different rooms interacting with the same assistant. Because they are interacting with different satellites, the base device will know which satellite is activated, and it will know how to address all these queries in parallel.
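To show how action code could tell satellites apart, here is a small sketch of an intent handler that reads the site identifier from the message payload, in the style of the earlier MQTT sketches. The siteId field name is recalled from the Hermes message format and should be treated as an assumption.

```python
# Handler meant to be wired into an MQTT client (as in the earlier sketches),
# routing a light command to whichever room's satellite captured the audio.
import json

def on_intent(client, userdata, msg):
    payload = json.loads(msg.payload)
    site_id = payload.get("siteId", "default")    # e.g. "bedroom", "kitchen" (assumed field)
    slots = {s["slotName"]: s["rawValue"] for s in payload.get("slots", [])}
    print(f"[{site_id}] turn lights {slots.get('state', 'on')}")
    # ...call the smart-light API for that specific room here
```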
[00:53:25] Unknown:
Alright. Well, for anybody who wants to get in touch with you or follow along with the work that you're up to and what's been happening at Snips, I'll have you add your preferred contact information to the show notes. And so with that, I'll move us on into the picks. And this week, I'm going to choose the Chrome OS operating system. I've been trying it out as a computing platform for when I'm out and about, and it's been coming a long way lately. I particularly like the recent addition of Linux support. So, if you're looking for something that you can get some work done on with some fairly inexpensive hardware, it's definitely worth taking a look at some of the different Chromebooks that have come out. And so with that, I'll pass it to you, Adrien. Do you have any picks this week?
[00:54:11] Unknown:
Yes. My personal pick is about privacy and what has been happening recently during the Google I/O conference and the Facebook F8 conference. Facebook said the future is private, while Google said the present is private. But in the end, both conferences show that big players are finally realizing that privacy matters. I'm actually proud that at Snips, we have been engaged in protecting and advocating for privacy as a human right since the very beginning of the company's existence. And it is a bit ironic, because before this privacy awakening statement from the GAFAs, they had been implicitly delivering a message, which was that AI products were made possible only by sharing and accessing users' data.
What we have been saying on our side for a while now is that privacy does not have to be traded away for the best AI technologies. And now Google is starting to ship products that run offline, proving that privacy can be respected. And this is becoming more and more important as the kind of data which is collected is becoming more and more personal. Think about voice, for example. So I think that, as users of these technologies, and by the choices we make, we have the power to influence and drive these technologies in a direction where our digital life and our privacy are protected.
[00:55:43] Unknown:
Alright. Well, thank you very much for taking the time today to join me and discuss the work that you've been doing with Snips. It's definitely a very interesting platform and one that I plan to experiment with on my own. So thank you for all of your efforts there, and I hope you enjoy the rest of your day. Thank you for inviting me.
Introduction to Adrien Ball and Snips
Adrien's Journey with Python and Rust
Overview of Snips and Its Mission
Getting Started with Snips
Training Data Generation and Entity Extension
Architecture and Offline Capabilities of Snips
Comparison with Competitors
Gathering Feedback and Training Data
Conversational UI and Context Management
Porting NLU Engine from Python to Rust
Open Source Strategy and Components
Usability and User Experience
Adding New Capabilities and Action Code
Limitations of Offline and On-Device Processing
Lessons Learned and Challenges
Innovative Uses of Snips
Future of Snips
Satellite Devices and Multi-Room Dialogues