Luminoth: AI Powered Computer Vision for Python with Joaquin Alori

Hello, and welcome to podcast.init,

the podcast about Python and the people who make it great.

When you're ready to launch your next app, you'll need somewhere to deploy it, so you should check out Linode. With private networking, shared block storage, node balancers, and a 40 gigabit network, all controlled by a brand new API, you get everything you need to scale.

Go to podcast in it.com/lino

to get a $20 credit and launch a new server in under a minute.

For complete visibility into your application stack, deployment tracking, and powerful alerting, Datadog has got you covered. With their monitoring, metrics, and log collection agent, including extensive integration and distributed tracing, you'll have everything you need to find and fix bugs in no time. Go to podcast then it.com/datadog

today to start your free 14 day trial and get a sweet new t shirt.

And to get worry free releases, download Go CD, the open source continuous delivery server built by Thoughtworks.

You can use their pipeline modeling and value stream app to build, control, and monitor every step from commit to deployment in 1 place. Go to podcastthenit.com/gocd

to learn more about their professional support services and enterprise add ons.

And visit podcastthenit.com

to subscribe to the show, sign up for the newsletter, and read the show notes. Your host as usual is Tobias Macy. And today, I'm interviewing

Joaquin Aluri about Luminoff, a deep learning toolkit for computer vision in Python. So, Joaquin, could you start by introducing yourself?

Yeah. My name is Joaquin, and I'm a developer working at Trail Labs, which is a software company in Montreal, Uruguay. And I've been working on the Luminos team for about,

3 months now.

And how did you first get introduced to Python?

I was finishing my degree in what we call industrial engineering in Uruguay, but it's actually more similar to what you guys call manufacturing engineering in the US. And I have been working in the industry for about,

2 years at that point. And I was starting to get a bit bored with it. So I wanted to try something new. And then just by chance, I watched an interview of Elon Musk. And there he said that AI,

was a danger for humanity, which I found super odd because the way in which I thought about programming was that you have to write all these rules for the machine to follow.

And there was no no way for me to, like, to write every single rule that a machine has to follow to be considered smart.

So but, you know, I respect Elon a lot. So I just started reading about AI.

And then I found about machine learning, which I knew nothing about at the time. And that kind of changed my perspective a bit because in machine learning, you write models instead of rules. And these models just take data and start to learn insights about the the data. And I kind of was amazed with with that. And I just, quit my job and started reading about AI full time. And that's when I met, Python. I just started, reading about Python a lot and learning, and I love it.

And as you mentioned, you started working with Trio Labs recently and working on Luminoth. So I'm wondering if you can talk a bit about what the Luminoth project is and what was the original motivation for creating it? Yeah. Luminous is a Python,

toolkit for computer vision. So what you do is you feed an image or a video to Luminous

and you get a prediction of what each object in the image is and also a bounding box surrounding each object. And the reason we developed it was because we had a lot of projects at Treelabs where we had to do object detection. And initially, we just make a custom code for each 1. But at some point, it made sense to just write a single toolkit with all the

tools. So, yeah, we we built Luminos, and we we decided to make it open source because

there wasn't

an easy easy way,

open source to to make object

detection.

So we thought people might want want to to try it and maybe want to contribute to it. So we just open source it. And computer vision is a very large field, and it sounds like the Luminoth project is primarily concerned with the subtopic

of object detection and potentially object recognition. Is that accurate?

Yeah. It's mostly

at this moment, object detection. So at the moment, we only, like, give you each object, its label, and its location. But we plan to

add object segmentation also. Image segmentation. So each pixel in the image is labeled.

And maybe even

image label

making text from an image. So we describe the image with text just by looking at it, but that's maybe in the future.

And what is it about the utility of object detection that made it worthwhile

for that to be the main focus of Luminoth, at least to begin with as opposed to any of the other potential applications for computer vision? Well, it was just a personal thing for us because we have projects that demanded that specific task. But I don't think there's a particular thing about detection.

All the other applications sound very useful. But, yeah, it was just something custom to our our needs at our trail apps. And

the topic of computer vision

has been a pretty major focus of research in AI and machine learning for several decades now, and I'm wondering how the current approaches that use deep learning and neural networks compare to some of the previous generations

of tooling and models that were used prior to the current generation approach?

Yeah. Well, using machine learning for vision is nothing new. In fact, most of the big algorithms

are quite old. Like, for example, neural networks were created in the fifties by the US Navy, I think.

And convolutional neural networks were created in the nineties, but it were popularized in the nineties by Yann LeCun. And, back propagation, the algorithm we use to train these networks was popularized

by Jeff Henson in GHS. So, like, the big ideas were already set, but they weren't working too well at the time. And,

there needed to be some kind of small improvements to make them work well.

And these improvements came in the shape of just smaller, more recent breakthroughs.

Stuff like dropouts, radio activations, batch norm,

adding shortcuts between layers.

And these algorithms don't really change the overall design of what a ConvNet is, but they modify it a bit and they allow us to make them much deeper. So with these improvements

and just having more GPU

power and more data to train

kind of made the whole, computer vision area have a a very big leap in the 5 in the past 5 years or so. And what are some of the most difficult problems that are currently

present in visual processing that still need to be solved?

I think 1 problem with the current,

state of the art is that they use combnets, and combnets are not very equivariant.

And this means that the networks react differently to the same objects in a different scale or different pose or different thickness and so on.

So this makes learning more difficult.

And this is mostly because the convolution operation itself

itself, it's only a key variance to translation.

But, so each time the the network sees a rotator or a scaled version of an object, it has to learn to find it again.

So that's really, really bad.

And, well, you can kind of get around this problem by fitting the network images of objects in all different scales and poses, but this is not really optimal as it takes a lot of examples to to learn.

Yeah. Well, there's been actually some interesting work being done to fix this problem.

Geoff Hinton, who I mentioned

earlier,

has been working,

on something he calls capsule capsule networks.

And these capsules are inherently a key variant.

So they understand things like rotation of the objects, its position, its scale, thickness.

So this really is is promising,

but it still doesn't quite work as the state of the art comNets do.

But if you think back to how comNets took a lot of time to really start shining,

I suspect that caption nets,

will eventually find this kind of slow success

of their partners.

And 1 of the topics of conversation

that's becoming more prevalent in the field of computer vision is the idea of

adversarial networks and generative adversarial networks where you introduce a little bit of noise into the source image and the

machine learning model suddenly is unable to recognize what it has been trained to pick out from a given image. So I'm wondering

if you have done any work with that with Luminoff to see how well it handles the introduction of noise into those images or some of the,

or if you can talk a bit to some of the potential approaches for overcoming that limitation of computer vision.

Yeah. We haven't done any work on that, but, Illumina uses just standard models like faster R CNN and SSD.

So these models don't have any consideration for our certain examples.

So I suspect they're very susceptible to that kind of attack. But, yeah, I I don't really have a way to to fix this. I mean, it's a very intense, area of research at the moment, but I suspect most of the comets used today are very susceptible to that kind of attack. And what are some of the other limitations

of Luminoth for building computer vision applications,

and how do those limitations compare to the tooling that existed

and is still used from prior generations such as OpenCV or similar projects?

Yeah. Well, OpenCV is much older than Luminous and it's like a more general library.

So it's quite huge, but it doesn't really focus on machine learning.

So you can do old machine learning stuff in OpenCV like extracting histogram of gradients of or or the gradients

and then running that through an SVM.

But it doesn't really support the the new deep convnets.

So you won't get state of the art results with that in object detection or or image classification,

stuff like that.

But still, if you want to do detection or something simple like a face or something and you want to run it on a CPU,

it could make sense to use OpenCV

because it's, like, lighter than a big Cognite.

And for somebody who wants to use Luminoth in their project, what does the workflow look like for getting it set up and trained against the types of material that you might be trying to detect?

It's quite simple. You just install it by running pip install Luminoft, and then you just type, Lumi predict, and you give a path to the image or the video.

And then you get a prediction,

which comes in the way of a JSON or or maybe the the image or the video with the box drawn on. And yet I if you don't specify a particular model you want to use, Luminaut will automatically download a faster CNN network,

which is pretrained on 80s,

common categories.

Yeah. And if you want to train on your own data, you have to arrange your data according to the COCO or the Pascal VOC datasets,

which are just very popular datasets used for training in the common community.

And then just run Lumi datasets transform

and the path to to the folder of your data. And that will serialize your data into a TensorFlow record, which is what TensorFlow uses to to train. And then finally, you just run a Lumi train and the path to your config file, if you want to use a custom configuration.

And that's it. It should just start training.

Yeah, this config file allows you to change everything about the network, like the type of the network, the amount of layers,

the activation layers, the auto augmentation.

And, yeah, that's just on the command line. You also can run a a server. You just run Lumi server web and that pops up a web server on your browser.

And you can just send pictures to that and get the prediction back from it.

Yeah. And we're working at the moment in adding a a Python API

and a more robust, rest RESTful API too. And what are some of the types of applications

that you have worked on at TriLabs or that you have seen other people use Luminoth for?

The the most popular area is is drones. We have

several applications with drones like,

1 particular 1 was to detect on the floor some some markings.

And with that, the drone can kind of create a map with a a real sense of the distance of the area.

So, like, focus, like, the focus of Luminous was just detecting in pictures and videos these markings on the ground and it worked, really well. And you mentioned too that with tools such as OpenCV,

they're potentially a bit lighter weight than Luminoth. So for somebody who wants to be able to deploy Luminoth in a production environment, what are some of the resource considerations

that they should be thinking about as they're provisioning the servers that it'll be running on?

Yeah. It depends on what you want to do. If you want to do video, you have to have a powerful GPU like a GTX

1060

at least, I think. But if you want to do pictures, any kind of GPU is

it's I mean, depends on the amount of flow you want to give to Luminous,

but,

just a a decent GPU is what you need.

But, if you want to just use a CPU, that's gonna be quite hard because it takes, like, several seconds to predict 1 image on a CPU.

So I don't think that's gonna be anywhere near usable in production.

And when you're incorporating it into an application, would Luminoth generally be embedded as a library

into the overall structure of the project or is it generally something that's used as a service via the API that it exposes?

I think at the moment, we want to make it just improve our Python API and just allow you to embed it into your code.

But in the future,

maybe just improve our web clients and make it maybe more robust, like allowing 1, 000 of connections,

you know, at the same time, stuff like that. But I don't think there's a a very clear road map because our use at the moment internally is just, using the Python API.

But we haven't decided yet what maybe the the open source community wants, to to have. So that's

still left undecided at the moment.

So 1 of the advantages of using Luminoth

is that you have complete control over the data that you're using to train it with as well as the generated model. But what how does it compare to some of the other offerings that are cloud based and API oriented such as recognition from Amazon

and they send you back a response with the data. And Luminous on the other hand is just an open source project and you just download it and run it locally. And also,

through an API, you won't get the trained model from your data. So that stays on their servers, which is quite bad because, you're giving them your data, which is very valuable

and you won't get all the insights that your data produces. So I think it's actually more fair to compare Luminos with something like Facebook's Detectron or TensorFlow's

object detection API. And these are these are both open source projects which were built by the research companies,

research teams of their companies.

But they weren't really developed to be easy to use. So that's kind of where Luminoft shines. The open source, easy to use, toolkit.

And another area that would be potentially challenging is in terms of testing the models that you're generating

and ensuring that you're getting consistent results out of them. So I'm wondering what are some of the approaches that are available for doing that kind of testing and the challenges that still exist for that particular endeavor?

Yeah. Currently, our testing is quite basic. Just, have the model not explode and stuff like that. We don't really really have a deep testing which gives you like, tells you if if the comment is really working

and predicting the images correctly. I'm not aware of if there's a library to do that at the moment.

But, yeah, we we don't really have anything to do, like, very good testing on on machine learning at the moment, at least in Luminoth. And digging a bit deeper into the library itself,

I'm wondering if you can talk about how Luminath is actually structured internally and how the design of the project has evolved over the course of its life. Yeah. Well, we we use SONET, which is

a a a library which DeepMind, created I think about a year or 2 ago. And that allows us to,

like, separate the networks in in in in modules.

So you can create like

a a proposal network with proposes,

areas of the picture to to predict and then just fit that network onto

a head of a network will which will let you predict if that proposal is actually an object and which object it is. So it's kind of, very modular. You can, like, put the pieces together how you want. And that's like the initial initial design we had. It kind of never really changed too much. We just been focusing lately on adding new models and improving the the APS. And what are some of the highest priority features that you're focusing on implementing in Luminoth going forward?

I think our biggest priority at the moment is providing good demo projects and improving the documentation.

These are 2 areas that have been a bit neglected by us lately.

And maybe down the road,

we're thinking about supporting image caption, as I mentioned, on image segmentation. And particularly,

I'm interested in in getting into the new DeepLab 3 plus network, which was released 2 days ago.

But, yeah, just

focusing on documentation and and demos. And what have you found to be some of the most challenging aspects of building Luminoff

and using it for computer vision projects? I'm sorry. I didn't catch that. It cut off a bit. What have you found to be some of the most challenging aspects of building Luminoff and using it for computer vision projects, whether technical or social or in terms of just managing the overall development?

Yeah. Well, this was my my first, like, big machine learning project.

So,

my my my biggest problem was that in machine learning, usually if something fails, like you have a bug or something,

it doesn't really pop up. Like, the network still works and and it learns quite well. But you are, like, like, 5% of the market for where you want to be on precision

and you don't know why. And there's, like, no warnings or exceptions. So you have to just get into the code and look at everything.

So that was a big change for me because I was used to just run the code, it explodes, look at the error, and go on. And, yeah, in machine learning, you cannot do that at all. And when would Luminoth be the wrong choice for a computer vision application?

And at that point, what have you found to be some of the strongest alternatives?

Well, if if you don't want to, host your object detection service,

you could use, an an API

like, Google's API or Amazon's API, I suppose.

This is good if you are not worried about giving your data away and if you don't want to have much customization

over what the the model does. And also if you want to learn about, RC detection and you prefer CAFE over TensorFlow,

it could make sense to use Facebook's detection projects for that. And are there any other topics of discussion whether specifically related to Luminoth or computer vision in general, that we should talk about before we start to close out the show?

No. I think I'm I'm fine.

So for anybody who wants to follow what you're up to or get in touch about the work you're doing, I'll have you add your preferred contact information to the show notes. And so with that, I'll move us on to the picks.

And this week, I'm going to choose

PyCon US, which is happening this year in Cleveland. And just like last year, I'm going to be sharing a booth with, Michael Kennedy from Talk Python TO Me and Brian Aukin and a few other people who are creating content for the community. So if you are able to make it out to PyCon, feel free to stop by and say hi, and I look forward to talking to you. And with that, I'll pass it to you, Joaquin. Do you have any pics for us this week?

Yeah. I've been watching lately a YouTube channel, which is called, 3 blue, 3 blue, 1 brown, which is just a math channel. And they do,

like, very good visualizations

of the concepts they explain.

And to me, it's always been very, much easier to understand something if I can see it in some way. So it helped me a lot. And especially because I had forgotten a lot of my algebra from from university.

So when I came into machine learning, I had to do a big review of all that. And they have a very good playlist on linear algebra. It's like, I think, 12 videos,

and it gives you a very good ramp up on the subject. So, yeah, it's a very good channel. Alright.

Well, thank you for taking the time out of your day to join me and talk about the work you're doing with Luminoff. Computer vision is definitely a very interesting and growing field and 1 that has a lot of complexity. So it's good to see you trying to simplify it a bit to make it more accessible to more people. So I appreciate that and I hope you enjoy the rest of your day. Oh, thanks for having me.

The Python Podcast.init

Summary

Introduction

Interview

Keep In Touch

Picks

Links

The Python Podcast.__init__