Summary
Computer vision is a complex field that spans industries with varying needs and implementations. Scikit-Image is a library that provides tools and techniques for people working in the sciences to process the visual data that is critical to their research. This week Stefan Van der Walt and Juan Nunez-Iglesias, co-authors of Elegant SciPy, talk about how the project got started, how it works, and how they are using it to power their experiments.
Preface
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- I would like to thank everyone who supports us on Patreon. Your contributions help to make the show sustainable.
- When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at www.podcastinit.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app.
- Visit the site to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch.
- To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
- Your host as usual is Tobias Macey and today I am interviewing Stefan van der Walt and Juan Nunez-Iglesias, co-authors of Elegant SciPy, about scikit-image.
Interview
- Introduction
- How did you get introduced to Python?
- What is scikit-image and how did the project get started?
- How does its focus differ from projects like SimpleCV/OpenCV or Pillow?
- What are some of the common use cases for which the scikit-image package is typically employed?
- What are some of the ways in which images can exhibit higher dimensionality and what are some of the kinds of operations that scikit-image can perform in those situations?
- How is scikit designed and what are some of the biggest challenges associated with its development, whether in the past, present, or future?
- What are some of the most interesting use cases for scikit-image that you have seen?
- What do you have planned for the future of scikit-image?
Contact Information
- Stefan
- @stefanvdwalt on Twitter
- Website
- Juan
- @jnuneziglesias on Twitter
- Website
- jni on GitHub
Picks
- Tobias
- Stefan
- Juan
- Matilda the Musical
- Water Rower Rowing Machine
- Bored Elon Musk OMG: “News app that connects to a blood pressure monitor and adjusts your feed accordingly.”
Links
- scikits.appspot.com
- Sphinx Gallery
- SciPy Conference
- Minimum Cost Paths
- Image Stitching Tutorial
- Elegant SciPy
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. I would like to thank everyone who supports us on Patreon. Your contributions help to make the show sustainable. When you're ready to launch your next project, you'll need somewhere to deploy it, so you should check out Linode at www.podcastinit.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your app or experimenting with something that you hear about on the show. You can visit the site at www.podcastinit.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. To help other people find the show, please leave a review on iTunes or Google Play Music, tell your friends and coworkers, and share it on social media. Your host as usual is Tobias Macey, and today I'm interviewing Stefan van der Walt and Juan Nunez-Iglesias about scikit-image. So could you guys introduce yourselves? How about you start first, Stefan?
[00:01:03] Unknown:
Hi, everyone. Stefan here. I'm the founder of scikit-image, and I'm currently employed as a researcher at the Berkeley Institute for Data Science. And, Juan, how about you?
[00:01:13] Unknown:
Yeah. So I'm Juan Nunez-Iglesias, and I'm employed as a researcher at the University of Melbourne. Before that, I was at Janelia Research Campus in the US, which is where I got started with scikit-image, analyzing neuronal images.
[00:01:27] Unknown:
And going in the same order again, how did you first get introduced to Python?
[00:01:31] Unknown:
So, yeah, I guess I've always been dabbling in all sorts of programming languages at school, and I used Python way back for organizing my MP3 collection. I really started learning it in all seriousness at a holiday job at a tech company. That's also where I was exposed for the first time to extreme programming, for example. Scientific programming I started doing around 2001. I worked on the Octave project for a long time, and I was trying to switch our team over to an open source language for science for a long time. That didn't happen. But then my study advisor in Boulder met a young postdoc, and he brought him to South Africa to teach a Python course.
The team got very excited about Python. They wanted to switch, and I was only too delighted to help them switch to an open source project. So all my attention moved from Octave onto Python. And that postdoc, by the way, was Fernando Perez.
[00:02:27] Unknown:
I interviewed Fernando a while ago to talk about IPython, so it's funny how these things come full circle. Yeah. And Juan, how about you? Do you remember how you first got introduced to Python?
[00:02:37] Unknown:
I don't remember getting introduced to it. So I started out as a biologist, and I took a couple of electives at university to learn C and Haskell. And that was that. I was, you know, off to do biology, and then I needed to do some data analysis. I started using C++ for that, because that's what I'd learned. And at some point I switched to Python, and I don't remember that decision, but obviously it was a good one. And so I finished my PhD in Python code, with a bit of R for data analysis and so on. Then in my postdoc, I moved to a lab that was using only MATLAB. I learned that, and that was fine. But eventually I wanted to do some more general stuff that MATLAB wasn't so good at, such as working with graphs. So I just ported the whole pipeline to Python, and that was a huge effort, which again paid off. But, yeah, I've kind of discovered Python twice, and the second time it stuck.
[00:03:32] Unknown:
So for anybody who isn't already familiar with it, could you guys give a brief overview of what scikit-image is and how the project first got started?
[00:03:40] Unknown:
Yes. So scikit-image started off as a collection of image processing snippets. Early on, I wrote a lot of code for my PhD, implementing a lot of algorithms, and people would ask on the mailing list, oh, you know, does anyone have anything that can do X or Y? So I would email the code snippets, and lots of other people did that too. Eventually it just became clear that, you know, we repeatedly sent the same code to the list over and over, so we should just package all of that up. Initially, the goal was sort of to fully duplicate or reimplement the MATLAB image processing toolbox. We even had a little table of compatibility to make sure that we had all the right features.
That turned out to be a rather tedious process. The moment we moved away from that and decided we were rather just going to build a Pythonic library in the style that we really wanted, the whole project became a lot more fun. Fairly early on, we got a fairly big code injection from the Broad Institute's CellProfiler team, with whom we still collaborate. They provided many of the algorithms, along with the ones that I and others provided, to kick-start the project.
[00:04:54] Unknown:
So it seems like the primary focus is manipulating imagery that would be used in the sciences. I'm wondering if you can do a bit of compare and contrast in terms of the overall project focus between scikit-image and other projects like SimpleCV and OpenCV, or the Pillow library that people might be familiar with? Yeah. So these cover fairly
[00:05:15] Unknown:
different use cases. In the Python world, we have these marvelous NumPy arrays, with rich semantics for slicing and handling, you know, multidimensional data. So I think often when you work on a C++ library, that's sort of the first layer that needs to be built. We just assume that NumPy is there to start off with. So many of the basic image processing operations can be done with pure NumPy arrays. I think when you look at something like Pillow, which is the replacement for the Python Imaging Library, that's sort of a very generic mechanism for doing things like cropping, adjusting, rotating. On top of that, or in a sense an extension of that, is SciPy's ndimage, which vastly expands on that for scientific purposes. But it's still a fairly low-level set of tools, and scikit-image then sort of tries to build on top of that. OpenCV is more aimed at vision, but it's more similar in scope, in a sense, to scikit-image. And in fact, scikit-image used to be an OpenCV wrapper before they had good Python wrappers that could expose NumPy arrays without copying. But as their wrappers improved, we no longer needed to provide those ourselves. So we took that out and we started focusing on implementing useful basic scientific image processing building blocks. The real goal was just to get scientists up and running as quickly as possible. I think I checked recently, and my entire PhD project I could rebuild in a weekend with scikit-image. That's the sort of power you get by, you know, having access to a vast array of basic building blocks. OpenCV itself is great. I think it's phenomenally fast, so it's perfect for processing videos, for example. But I don't think its documentation or its API is very good, specifically for Python programmers.
And scikit-image aims to be very Pythonic, to have good docs and docstrings, and to be easy both to use and to contribute to.
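Stefan's point that many basic image operations are just NumPy array manipulation can be sketched in a few lines. This is an illustrative example on a random array, not code from scikit-image itself:

```python
import numpy as np

# A grayscale image is just a 2-D NumPy array of intensity values.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(100, 120), dtype=np.uint8)

# Cropping is slicing:
crop = image[10:60, 20:80]

# Flipping vertically is reversed slicing:
flipped = image[::-1, :]

# A global threshold is a vectorized comparison:
mask = image > 128

print(crop.shape, flipped.shape, mask.dtype)  # (50, 60) (100, 120) bool
```

Because scikit-image's own functions accept and return plain arrays like these, they compose naturally with SciPy's ndimage, or with OpenCV once it learned to expose NumPy arrays.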
[00:07:07] Unknown:
Yeah. When I was doing my senior project for school, I was trying to do some face detection in an image using OpenCV directly. And, yeah, their documentation is rather dense and hard to parse if you're not already familiar with a lot of the mathematical principles at play in image manipulation and image analysis. Fortunately, I happened to chance across SimpleCV for that, so it gave me the high level that I needed. But I can definitely see where scikit-image would be useful as well for the particular problem domains that it's focused on. So I'm wondering too if you can just draw a comparison between things like the facial detection or
[00:07:45] Unknown:
simple image recognition stuff that people might use OpenCV for, versus the types of use cases and problems that scikit-image is more focused on? Yeah. So there's not 100% overlap. But for this specific problem, we actually had a Google Summer of Code student, Daniel Pocomoffie. He worked on implementing cascade filters. Now, the cascade filters in OpenCV, I haven't checked over the past month, but last time I checked, they were still an implementation of a patented algorithm. That's one of the things that we pay very careful attention to. We make sure that the BSD license we use not only gives people the right to use our code, but also that people using the code under the BSD license aren't going to get themselves into trouble. So we reimplemented the cascade filters using the same principles as the OpenCV implementation, and, you know, we're still in the process of finalizing all of that work, but it's available. But I would say, you know, with OpenCV, one of the biggest challenges is getting it installed, though that problem is maybe less hard than it used to be. And I think more and more my view is just that we should combine tools as necessary, you know, to get the job done. So we don't see ourselves as direct competition. We're building something that's maybe not perfectly
[00:09:04] Unknown:
overlapping, but definitely somewhat different in scope. I would say also it's one of the greatest assets of the scientific Python community that we all seem to have converged on NumPy arrays. And I believe OpenCV these days does take NumPy arrays as input. Do you know if that's right? Yeah, that's correct. And so it's easier than ever to combine the tools, as Stefan was saying, and just use the tool that you need for the function that you need. And scikit-image provides some of those functions, and OpenCV provides perhaps different ones. And with the scikit sort of branding, I'm wondering what the relationship is between scikit-image and scikit-learn, or any of the other scikit packages that people may have used.
[00:09:46] Unknown:
The idea originally was that we would have a namespace that would contain all of these scientific toolboxes. SciPy is, you know, a wonderful set of tools. The problem is, how do you easily experiment with a new algorithm that might not be mature enough to be included in SciPy? If you feel like you want to fairly rapidly develop something, the SciPy development model is more geared towards slowly moving forward and producing high quality code. So the idea of the scikit was to give people the opportunity to more rapidly iterate on small, specific pieces of functionality. We originally had it under a single namespace; it was scikits.whatever-your-package-was-called. There were some technical issues associated with that, so we just became, like, skimage, sklearn, etcetera. And I think those two packages kind of rose out as examples. I mean, I think scikit-learn is an exemplary package of how you should develop, document, and publish scientific code. And, you know, we're trying to do something very similar with scikit-image. There's an index of scikits available.
So if you go to scikits.appspot.com, you'll get a whole list of everything that's published out there. There's no overarching control or group trying to exert, you know, any control over how these packages are developed. I think what they share is this very powerful notion of, like Juan mentioned, the NumPy array. We have a set documentation format that completely turned around the way scientific Python is being used. It made it possible for people to learn it in classrooms. The NumPy documentation format was a crucial part of making the scientific Python tools more accessible to newcomers. And that's the way we sort of package and, you know, format our code. That's what we share between these various scikits.
[00:11:40] Unknown:
And do they all share a common dependency on SciPy? And if so, does that provide an easy way of interoperating between the different scikit packages?
[00:11:51] Unknown:
Again, there is no requirement on how these packages should look. I think most of them depend on NumPy, but anything that plays well inside the scientific Python ecosystem qualifies, which usually implies
[00:12:01] Unknown:
exposing NumPy arrays. But I guess nowadays it could also mean a pandas DataFrame, or even an xarray. Anything would qualify, basically. Yeah. I mean, I think in the same vein, if you had some package that had something to do with natural language processing, then the set of dependencies might be completely different, but it would still qualify to be a scikit, I think. As Stefan said, the documentation format is one of the most important requirements, and that is something that goes above and beyond what even core Python functions provide.
[00:12:29] Unknown:
Does it go beyond just using the docstrings in Python? Do you have an additional sort of framework or documentation format that's used to generate the documentation that the scikit namespace requires?
[00:12:42] Unknown:
I also don't know if it actually is a strict requirement, but it should be. If you search for "NumPy how to document", you'll find the document specifying this. And it does go above and beyond PEP 257, which is the Python docstring standard. One of the things that it provides for is that every argument should be annotated with its type and what that argument does. If you look at some of the core Python function documentation, the docstring is just free-form text that sort of describes what the function does, but there's often no specification as to what each argument should be, the type of that argument, and the possible options. For example, if you're trying to encode a string in Python, the docstring doesn't tell you what the valid encodings are. In the NumPy docstring standard, you would have to provide all of the possibilities for that.
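As an illustration of the format Juan describes, here is a small function documented in the NumPy docstring style. The function itself is invented for the example; the section layout is what the standard specifies:

```python
import numpy as np

def threshold(image, value=128):
    """Binarize an image with a global threshold.

    Parameters
    ----------
    image : ndarray
        Input image of any numeric dtype.
    value : int or float, optional
        Pixels strictly greater than ``value`` become True.
        Default is 128.

    Returns
    -------
    mask : ndarray of bool
        Boolean array with the same shape as ``image``.
    """
    return image > value

print(threshold(np.array([[0, 200]])).tolist())  # [[False, True]]
```

Every argument gets its type, meaning, and default spelled out, which is what makes typing `threshold?` in an IPython terminal immediately useful.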
[00:13:33] Unknown:
Yeah. I think the Python core team has stated that the docstring is not the primary source of the documentation; that's what you find on the web in the Sphinx documents. But we often spend our time in the scientific world inside an IPython terminal, working interactively. You know, my workflow is typically to import NumPy, then maybe, you know, look around for the function I want to use, look at the docstring, find the input and output parameters, experiment until I find something that does what I want it to, and then use that in my script, which is perhaps very different from the workflow that other developers
[00:14:07] Unknown:
use. Yeah. I would say, actually, that's a very good point: in scientific Python, you spend probably most of your time inside an IPython terminal. And so it's very handy to just be able to type the function name and a question mark and have that function completely described for you, which might not be as true of core Python functionality.
[00:14:24] Unknown:
I also want to mention the Sphinx Gallery, which is the kind of tool that gets developed almost as a side effect of doing something like scikit-image or scikit-learn. So scikit-learn developed the Sphinx Gallery, which is now also being used in scikit-image. It's a script that basically executes code and then builds a little page with the resulting plots and a little pop-out that shows some text. When you click on that, you get a piece of Python that you can paste straight into your terminal or your notebook, execute it, and get that exact same image back. So that sort of infrastructure work is often done by one of the scikits, or in collaboration between various scikits, but then to the benefit
[00:15:05] Unknown:
of everyone around. So going back to the scikit-image package, one of the things that I was noticing in the documentation is the fact that, because it's based on the NumPy ndarray, it allows for multidimensional images. So I'm curious what are some of the ways in which that can manifest, as far as exhibiting higher dimensionality than the standard two-dimensional images most people are used to, and what are some of the kinds of operations that scikit-image can perform in those kinds of situations?
[00:15:34] Unknown:
Yes. So this might be more my domain. I think this is the contribution that landed me on the scikit-image core team: making one of the algorithms work for 3D images. I still refer back to that pull request, because it was just insane trying to get it to work in 3D and trying to get it to pass on, you know, different versions of NumPy and so on. So it's SLIC, which is a segmentation algorithm. Segmentation is just finding different objects in an image. And the reason I needed things to work in 3D is that in biology, we very often have 3D images. One example is serial section EM, where you take a cube of tissue, in our case of a brain, and you image the top of it, and then you slice it, and then you image the next thing, kind of like salami. You can then reveal the inner structure of the object in three dimensions. So to find the connections of all the neurons in the brain, this is what you have to do. And then you apply some segmentation algorithm to figure out the limits of all of the neurons and where they're all connected. Another more common form of 3D image that people might be more familiar with would be MRI. So, again, that's looking inside someone's brain using magnetic resonance.
And so you see those images in movies and so on, of people's brains in 3D. There are lots of other images, but because I'm a biologist, this has been a bugbear of mine, to try to get as much 3D functionality into scikit-image as possible. When we say n-dimensional, what that means is that the algorithm will work equally well for 2D and 3D, and that requires kind of more careful coding. But once you get used to it, it's actually very easy and elegant to write that way. I think there was one thing that I converted recently, which was Hessian filters, and it went from specifying this 2D kernel and applying the kernel to the image, to a very elegant solution where you compute the gradients in 3D, and then for every gradient, you compute a second gradient. And that was the end of that. So I guess what I'm trying to say is that n-dimensional sounds like it's going to be harder, but it often ends up with easier code to read than just two-dimensional code.
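The "gradient of gradients" idea described here can be sketched dimension-agnostically with plain NumPy. This is a simplified illustration of the principle, not the actual scikit-image implementation (which, among other things, smooths the image first):

```python
import numpy as np

def hessian(image):
    # np.gradient returns one first-derivative array per axis;
    # differentiating each of those again gives all second derivatives,
    # with no 2-D- or 3-D-specific code anywhere.
    grads = np.gradient(image.astype(float))
    n = image.ndim
    # H[i][j] approximates d^2(image) / (dx_i dx_j), same shape as image.
    return [[np.gradient(g, axis=j) for j in range(n)] for g in grads]

# The identical function handles a 2-D image and a 3-D volume:
img2d = np.random.default_rng(0).random((8, 8))
vol3d = np.random.default_rng(1).random((6, 6, 6))
H2, H3 = hessian(img2d), hessian(vol3d)
print(len(H2), len(H3))  # 2 3
```

The n-dimensional version is arguably easier to read than a hand-written 2-D kernel, which is exactly the point being made.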
[00:17:42] Unknown:
So a few years ago, Juan started pushing for better support for n-dimensionality. I think at the time, I didn't have a very good feeling for the kinds of data, or problems, that require that kind of processing. But it's been remarkable for me to see the uptake of scikit-image since we've improved in that regard. It was one of the roadmap goals that we set out in our 2014 paper, and the package has had a significant increase in three- and higher-dimensional coverage since then. Whenever I speak to people from biology, or, you know, microscopy is a good example, it's crucial to be able to do that sort of processing. Like Juan mentions, there are a few patterns or paradigms that you learn and then, you know, apply throughout to get access to that sort of data. And it's been fantastic to see people actually use this and come back and say, like, you know, this is what makes the package useful to them. Yeah. And one of the other examples of higher dimensionality that was mentioned in the documentation
[00:18:45] Unknown:
is film, or sort of time-annotated imagery, as well, using time as an additional dimension for processing images?
[00:18:54] Unknown:
Yes. So I would say that one's a little bit trickier. One of the issues that you have with higher dimensional images is that, typically, the distance between pixels in the third dimension is stretched relative to the other two. For time, that becomes even more complex: how far is one frame from another? At least in space, you know that 2 micrometers is a certain ratio different from 0.5 micrometers if you look at the x, y, and z dimensions. But with time, that conversion isn't as obvious. Certainly, there are many algorithms where you could just take time to be a third dimension, have some conversion factor, and do it. So, for example, if you do segmentation, which, again, is picking out objects in an image, and you do that in 3D on a video, then that gives you tracking for free. You can track objects as they move, simply as being segments in a three-dimensional volume. But that may or may not work, depending on your frame rate and so on.
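The "tracking for free" idea can be sketched with a toy labeling routine: stack the frames into a (time, row, column) array and treat time as just another axis, so objects that overlap from frame to frame merge into one connected component. This is an illustrative implementation; in practice you would reach for `scipy.ndimage.label` or `skimage.measure.label`, which do the same thing much faster:

```python
import numpy as np
from collections import deque

def label_components(mask):
    """Label face-connected components of a boolean (t, y, x) volume."""
    labels = np.zeros(mask.shape, dtype=int)
    steps = [(1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1)]
    current = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue
        current += 1            # start a new component at this seed
        labels[seed] = current
        queue = deque([seed])
        while queue:            # breadth-first flood fill
            t, y, x = queue.popleft()
            for dt, dy, dx in steps:
                nbr = (t + dt, y + dy, x + dx)
                if (all(0 <= c < s for c, s in zip(nbr, mask.shape))
                        and mask[nbr] and not labels[nbr]):
                    labels[nbr] = current
                    queue.append(nbr)
    return labels, current

# A tiny 3-frame "video": one 2-pixel blob drifting right (so consecutive
# frames overlap at the same (y, x) position), plus one stationary blob.
video = np.zeros((3, 5, 5), dtype=bool)
video[0, 1, 0:2] = video[1, 1, 1:3] = video[2, 1, 2:4] = True
video[:, 3, 4] = True

labels, n = label_components(video)
print(n)  # 2: each blob keeps one label across all frames, i.e. a track
```

As noted, this only works when the frame rate is high enough that an object overlaps its own previous position; otherwise the time "distance" needs an explicit conversion factor or a real tracking algorithm.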
[00:19:51] Unknown:
And for the implementation of scikit-image, what does the internal design look like, and what have been some of the biggest challenges associated with the development process, whether past, present, or future?
[00:20:05] Unknown:
I think, like with many of these packages, scikit-image is a collection of routines that scratch various people's itches, you know. I wrote this code as part of my PhD; lots of people, you know, contribute whatever they need to do their work. And it's often easier, well, that's probably not true, but in the long run it's often easier, to contribute it back and have other people maintain it than to do it yourself. Of course, there's an initial barrier to contributing. You need to clean up the code, document it, etcetera. But, you know, it's quite rewarding to go through that whole process and get it integrated into the package for others to see. You asked about the challenges. For a long time, finding time to consistently spend on the project used to be hard for me. The project used to be done mainly on evenings, weekends, etcetera. Everyone knows that story. At least for the past two years, it's been a lot easier, because since I'm working at the Berkeley Institute for Data Science, this is sort of a part of my job, so I've got a lot of freedom in that regard. That said, you know, I'm still very interested in finding a consistent funding source for the project. I would love to have at least a person, a project manager slash programmer, who can ensure continuity, sort of, who watches PRs to make sure that the project is always moving forward. Not necessarily rapidly, but at least consistently.
[00:21:22] Unknown:
Yeah. I want to add an aside to Stefan's point about the development model and having people contribute what they need. It really is rewarding, and, paradoxically, one of the most rewarding moments I had was early on in my contributions to scikit-image. I made SLIC, simple linear iterative clustering, which is a segmentation algorithm, work in 3D, and I accidentally broke some of the 2D functionality, I can't remember what. But, you know, as soon as we released that next version, someone complained on the Internet, some random stranger. And to have someone complain on the Internet about your code is actually a really good experience, because it means more than one person is using your code, which, when you're doing a PhD, it can often seem like you're the only person in the world who cares about what you're doing. Yeah. And particularly with open source, sometimes the only way to get any feedback about the work that you're doing is if it's
[00:22:12] Unknown:
broken. Yeah. That's right. That's absolutely right. I think you read a lot about people who get complaints about the code they write, and who have a tough time with users expecting them to do almost like customer support, that sort of thing. I think we've been very lucky. We have a fantastic user community. The Python user community is incredibly friendly overall. This was one of the things that stood out for me right from the beginning: I spoke to strangers, and they would be welcoming. They would take time to teach. They would take time to help you get started on the project. And that welcoming spirit that I experienced back then, I think it's still very much alive and well today in the scientific Python community.
And, of course, we try to keep it alive. Yeah. I'll say
[00:22:59] Unknown:
something else about that, which is how I got into scikit-image. I mentioned that I ported this whole MATLAB pipeline to Python during my postdoc, and then I went to SciPy 2012 to present that work. At that point, you know, not the whole conference, but many of the presentations were by people who were the authors of SciPy and IPython. Fernando Perez gave a keynote, which was amazing. And I was just kind of using all of these packages. By the way, everyone should go to SciPy, because that conference really did change my life. When I went to do that presentation, I got an email from Stefan saying, hey, I'm the author of scikit-image. Would you like to come to the sprint and do some coding, you know, tomorrow, I think it was? And so that's how I got started; that was the time that I made my very first GitHub pull request, my very first open source contribution.
That wasn't just my code put on GitHub. And, yeah, that's how I got started. And contributing to open source is something that isn't obvious at the beginning, but then very quickly becomes second nature. It's a great way to get your code into more users' hands, and it's extremely rewarding, basically. So what are some of the most interesting or unexpected uses of the scikit-image package that either of you have seen? Early on, when we first published the paper, one of the citing papers was someone doing satellite reconnaissance, which is obviously far out of either of our wheelhouses, and that was great. More recently, someone commented on the paper about their use. So the paper is on PeerJ, which allows comments, and, like many social platforms, it's relatively sparse, but we got this one comment from an ecologist who was using minimum cost paths, which is an algorithm that you use in images, for example, to find seams in images if you want to stitch two images together. Stefan has a great tutorial on image stitching that uses it. And they were using it on maps, where the cost was essentially a movement cost, and they were using minimum cost paths as a way to predict species range, using some model for how species can move. So, again, that's something that's completely different from what I think any of the previous contributors to scikit-image would have envisioned, which again is very, very gratifying: to see someone use your project in a completely different way than you expected.
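The minimum cost path idea the ecologist reused can be sketched as a short dynamic program. scikit-image exposes the general version in `skimage.graph`; the toy below is restricted to top-to-bottom paths through a 2-D cost grid, the scheme behind seam finding, and the grid values are made up for the example:

```python
import numpy as np

def min_cost_path(cost):
    """Cheapest top-to-bottom path through a 2-D cost grid.

    From each cell you may step to the three cells below it
    (down-left, down, down-right), the classic seam-finding move set.
    """
    cost = np.asarray(cost, dtype=float)
    acc = cost.copy()                      # accumulated minimum cost
    for r in range(1, cost.shape[0]):
        left = np.roll(acc[r - 1], 1)      # up-left predecessor
        left[0] = np.inf
        right = np.roll(acc[r - 1], -1)    # up-right predecessor
        right[-1] = np.inf
        acc[r] += np.minimum(np.minimum(left, acc[r - 1]), right)
    # Backtrack from the cheapest cell in the bottom row.
    path = [int(np.argmin(acc[-1]))]
    for r in range(cost.shape[0] - 1, 0, -1):
        c = path[-1]
        lo, hi = max(c - 1, 0), min(c + 2, cost.shape[1])
        path.append(lo + int(np.argmin(acc[r - 1, lo:hi])))
    return path[::-1], float(acc[-1].min())

cost = np.array([[1, 9, 9],
                 [9, 1, 9],
                 [9, 9, 1]])
path, total = min_cost_path(cost)
print(path, total)  # [0, 1, 2] 3.0
```

Replace pixel differences with movement costs over a map and the same machinery predicts cheap routes for a species, which is exactly the repurposing described above.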
[00:25:12] Unknown:
A fun project I worked on in South Africa was the identification of white sharks. The researchers there tried to count the number of individuals on the coast, and that was via visual identification. So we used scikit-image to detect the edges of the fins, which acted like a fingerprint for each individual. The other example I thought of was that Neil Yeager posted a blog post a while ago showing how, when you have a visual identification system, say for unlocking a gate or a door, people can use rubber masks to fool the system. So he added, via a Raspberry Pi and some scikit-image code, a detector that would figure out whether someone is using their actual face or a rubber mask to present a certain identity. That's pretty cool. Yeah. I also just want to mention that there are some very interesting novel algorithms being implemented nowadays in deep networks.
And it's going to be interesting for those of us implementing more traditional algorithms to see how this impacts our work. Some of these algorithms, like the fully convolutional networks, we might be able to eventually integrate into scikit-image. But this opens an entirely new frontier of exploration.
[00:26:23] Unknown:
It's going to be interesting to see how it plays out. Given the somewhat complicated nature of the algorithms in play for doing the different kinds of image analysis, do you find any difficulty in attracting contributors to the project?
[00:26:36] Unknown:
No, I don't think so. I mean, we just released version 0.13, and we had on the order of, Juan did the release, so he would know, was it 82, 83 people maybe? Yeah. I didn't count how many of those were new contributors, but that's certainly much larger than our standard team of about 10 or 15 people. I still run into researchers who tell me, oh, thanks for the package, we use it on a daily basis. We don't often see those people. They're not all on the mailing list; we're not in active conversation with all of them. In fact, it's quite challenging to figure out who is using the package and what purpose they're using it for. In a sense, that's also what makes it hard to guide the package, and why we rely on contributions
[00:27:19] Unknown:
to some extent to determine which direction we have to go. Yeah. If we could find a way to increase user engagement in that way, that would be really good. I mean, as I mentioned, I started out just being a user of scikit-image. And it was an amazing change to start contributing and to realize that contributors are just people like you and me. They're not these professional developers who are, you know, completely out of this world and who do this day in, day out. They are people who work on image processing and write the code that they need for their work. So it would be really great to have more users come forward and say, oh, it would be great if this function worked a little bit differently. And we can help them to solve that problem.
[00:28:01] Unknown:
Yeah. I think at every scientific Python conference I go to where we have a sprint, you have this amazing engagement with people who want to do image processing and maybe even want to contribute to the package, but it might just seem a bit intimidating or scary from the outside. And, I mean, that's no wonder. You need to know a fair variety of tools. You need to understand Git. You need to know how to write the documentation. You need to understand Python fairly well, know NumPy. So all of these things can be a bit overwhelming. But I think it's one of the most satisfying experiences to sit with people and help them succeed in implementing what they want to implement.
Not only getting their own results, but also contributing what they've done back, whether that is in terms of code or perhaps just a small patch to the documentation or the installation instructions. Inevitably, that's my most rewarding experience at each SciPy conference. Yeah. And a couple of
[00:28:58] Unknown:
things that Juan mentioned are recurring themes in a lot of the conversations I have, in terms of the difficulty of getting feedback from users, particularly if they don't have anything to complain about or bugs to report. And that it's maybe easier to contribute than you think. Right. Yes. And also the fact that it is easy to put the people who create different open source packages up on a pedestal and succumb to hero worship. But one of the things that I've learned while doing the podcast is that people in general, regardless of the level at which they contribute, are far more approachable than you may at first assume. And that's been a pleasant surprise in terms of inviting people onto the show and the willingness with which they will come and join and share the work that they've done. And then in terms of the feedback, one of the things that I was actually just thinking about is the possibility of adding an API into the package, something like skimage.feedback, that would open the user's browser to, say, a Typeform survey so that people could voluntarily contribute information about how they're using the package, what their thoughts are on it, things like that, and just using that as maybe a general practice for people who are interested in hearing back from their users. We've had that discussion a couple of times, as you can imagine. It's such a hard balance to strike between, you know, annoying the users
[00:30:20] Unknown:
and trying to get more feedback. I think the iOS App Store has this debate of apps bugging you for reviews, and some people have the policy that if they get bugged for a review, they'll leave a one-star review. So, yeah, I don't know what the right solution is. I think skimage.feedback might work, but there's a catch-22 if everyone has to use it before anyone knows about it. Right. Having a feedback mechanism,
[00:30:44] Unknown:
could work very well. I think the one thing we couldn't do in the past that would have helped us to discover the kinds of usages we see is to have some kind of automated, you know, phone-home system. We can't do that sort of thing in open source libraries; we would not want to. And I think that's where commercial packages have a much easier time: they just do it without caring about the user's privacy. But there might be a few other opportunities for doing it. I know some people around BIDS have been thinking about this problem, and, you know, maybe integrating a feedback mechanism into Jupyter or something similar. We just don't have anything like that
[00:31:16] Unknown:
built right now. Yeah. I can definitely concur that having projects or products that I use nagging me to leave reviews is generally not the best experience. But if it were just an API call, and particularly given the fact that a lot of the work that people do in Python is done at the terminal, sort of exploring the different APIs available, just having it present may be enough to at least get some people to, out of curiosity, call the API or explore what the API is intended to do, and then end up giving you some information about how they're using the package. Yeah. The other thing we talked about is that most scientists just use the IPython terminal or a Jupyter notebook. If we did adopt a unified feedback mechanism, like,
[00:31:59] Unknown:
skimage.feedback, numpy.feedback, etcetera, and IPython mentioned this in its header, which is already there, that might actually get quite a bit of adoption. I just thought of that; I don't know if that's a good idea or not.
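The skimage.feedback idea floated above is purely hypothetical; no such function exists in scikit-image, and the survey URL below is a placeholder. A minimal sketch of what it might look like:

```python
import webbrowser

# Hypothetical: neither this URL nor skimage.feedback exists.
SURVEY_URL = "https://example.com/scikit-image-survey"

def feedback(url=SURVEY_URL, opener=webbrowser.open):
    """Open the user's browser to a voluntary feedback survey.

    The opener is injectable so the call can be logged or
    suppressed; the URL is returned for printing in terminals
    without a browser.
    """
    opener(url)
    return url
```

The whole point is that it is opt-in: nothing phones home unless the user explicitly calls it.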
[00:32:11] Unknown:
You know, I've thought that for packages like this, there might not be enough on Stack Overflow for scikit-image, but for something like pandas, which has a vast API, I think it would be fascinating to take all the posts, or perhaps all the accepted answers, on Stack Overflow and analyze them to see which functions were used. I've had in the back of my head to do a version of the pandas docs where you set the alpha value of each function's entry based on how many times it gets mentioned on Stack Overflow.
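A rough sketch of that mining idea; the answer snippets below are invented placeholders, not real Stack Overflow data, and a real pipeline would pull answers via the Stack Exchange API.

```python
import re
from collections import Counter

# Made-up stand-ins for accepted-answer code snippets.
answers = [
    "df = pd.read_csv('data.csv'); df.merge(other, on='id')",
    "result = pd.concat([a, b]); result.merge(c, on='key')",
    "pd.read_csv(path).groupby('col').sum()",
]

# Crude heuristic: any identifier immediately followed by "(".
call_pattern = re.compile(r"\.?(\w+)\(")

usage = Counter()
for snippet in answers:
    usage.update(call_pattern.findall(snippet))

# Relative frequencies like these could then drive the alpha
# value of each function's entry in a docs listing.
```

With real data the counts would give exactly the per-function popularity signal described above.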
[00:32:47] Unknown:
So are there any plans for the future direction of scikit-image that you'd be interested in sharing here?
[00:32:54] Unknown:
So, as I mentioned, we just released version 0.13. Juan was the release manager for that, so that's out the door. In fact, he pushed it out right as I was giving a tutorial, which made me a bit nervous, but that all worked out well. I think we've spoken a bit about user feedback, so, you know, that's our most valuable input in moving the project forward. For the rest, I've been having more and more conversations with people from other tools like ImageJ, ilastik, CellProfiler, etcetera. I'd really like to expand the ways in which we interact with those packages and in which we advertise those packages. You know, there are many different workflows out there that we never see, where scikit-image is combined with perhaps ilastik to do some segmentation.
There are tools for annotation, tools for pre-processing and post-processing. And people who want to solve problems in the most efficient way possible use whatever tools they need to. But we have very little insight into that. So that's part of my mission: to figure out how we can improve the package, both to work with those tools and to provide the necessary set of functionality.
[00:34:02] Unknown:
Yeah. In that vein, this year we're participating in RGSoC, which is Rails Girls Summer of Code. And one of the projects that was proposed was to develop an applications gallery, which is different from our current gallery, which shows the functionality of scikit-image. Applications would not be limited to the set of dependencies that scikit-image has, so you could do something like have an application that combines scikit-image and scikit-learn together, which doesn't fit in the normal gallery. And having more documentation as to how scikit-image can work well with other packages would be really good in that regard. My own bugbear, of course, continues to be n-dimensional processing, and there's still a lot of the library that doesn't work well with n-dimensional images. So I've got a pull request open now converting moments to n-dimensional data. But it's about making sure that the API is consistent in supporting nD wherever it's possible, not just where it's been convenient in the past. And there was also this work from this guy called Adrian Seeber. We don't actually know who he is, but he built this scikit-image cheat sheet, and he did that using an automated script. By using an automated script, he was able to find a lot of the inconsistencies in the API, because the scripts break when they try to do something and the result is something unexpected. So he submitted an issue talking about all of the different small inconsistencies in the API. And on the road to 1.0, it would be really great to eliminate all of those and really make scikit-image as consistent a package as possible.
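To make the n-dimensional moments idea concrete, here is a plain-NumPy sketch of raw moments that works for arrays of any dimensionality. This is illustrative only, not scikit-image's actual implementation.

```python
import numpy as np
from itertools import product

def raw_moments(image, order=1):
    """Raw image moments M[p0, p1, ...] for an nD array.

    Each moment is sum(x0**p0 * x1**p1 * ... * image), with
    one power per axis, so 2D slices and 3D volumes are
    handled by the same code.
    """
    image = np.asarray(image, dtype=float)
    grids = np.indices(image.shape)
    moments = {}
    for powers in product(range(order + 1), repeat=image.ndim):
        weight = np.ones_like(image)
        for axis, p in enumerate(powers):
            weight = weight * grids[axis] ** p
        moments[powers] = float((weight * image).sum())
    return moments

# A 3D volume with all mass at voxel (1, 2, 3):
vol = np.zeros((4, 4, 4))
vol[1, 2, 3] = 2.0
m = raw_moments(vol)
# m[(0, 0, 0)] is the total mass; m[(1, 0, 0)] / m[(0, 0, 0)]
# is the centroid along axis 0, here exactly 1.0.
```

Writing the loop over `image.ndim` rather than hard-coding two axes is precisely the kind of change the nD consistency work requires.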
[00:35:35] Unknown:
Yeah. I think the other story we keep hearing is, you know, I'd love to use scikit-image, but I currently use maybe OpenCV because the speed isn't quite there and I need to process 200,000,000 images. It's funny, we've also heard the story that, you know, we have 2,000,000 images to process, but we don't care about letting it run overnight, and we prefer the API. But I think there's a huge number of people who want to use the package but can't sit around and wait for the result. So there's a question as to how you deal with that situation. We don't have GPU experts, and I'm not sure we want to go in that direction; you know, we don't want to make the code even harder to access. At the moment, we're pure Python or Cython. If you know those two, you can contribute to any part of the library, and I don't want to introduce sort of knowledge dependencies. But I do think we need to address this problem. Now, there are some very interesting players in that field, and I think the solution is going to come from optimizing Python compilers.
So, you know, the just-in-time compilers like Numba; I mean, there are so many different packages exploring that space right now. And maybe we'll get something for free. Maybe we'll have to do something with Halide, something more custom. But I hope, technologically, that becomes feasible within the next year or two. Yeah. And the Pyjion project that has recently been merged into mainline for 3.6
[00:36:59] Unknown:
may help with that as well, in terms of people developing custom JIT implementations that would be able to plug in specifically to things like image processing.
[00:37:08] Unknown:
Yeah. We're definitely watching that space with great interest. Absolutely.
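To illustrate the JIT-compiler direction discussed above, here is a hedged sketch of the usual Numba pattern: an explicit-loop filter that is slow in pure Python but that a JIT can compile without code changes. The fallback decorator keeps the sketch runnable when Numba is not installed.

```python
import numpy as np

try:
    from numba import njit  # optional JIT; see fallback below
except ImportError:
    def njit(func):  # no-op stand-in when Numba is absent
        return func

@njit
def local_mean(image, out):
    # 3x3 mean filter as explicit loops: the style that pure
    # Python executes slowly but a JIT compiles to fast code.
    rows, cols = image.shape
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    total += image[r + dr, c + dc]
            out[r, c] = total / 9.0

img = np.arange(25, dtype=np.float64).reshape(5, 5)
out = np.zeros_like(img)
local_mean(img, out)
```

Because scikit-image already writes its hot loops in this style (via Cython), compilers like Numba or Halide are plausible drop-in accelerators without a GPU rewrite.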
[00:37:12] Unknown:
So for anybody who wants to get in touch with either of you or follow what you're up to, I'll have you send me your preferred contact information to include in the show notes. And with that, I will move us to the picks. My pick for this week is a game that I got recently for myself and my family called Set. It's a deck of cards that have a series of images with four different attributes: they can be of different colors, shapes, quantities, and different sorts of filling for the images. And the idea is to find a set of three cards that are either all the same or all different across each of those attributes. So you could have three cards where there are one, two, and three of the same shape that are all different colors but all have the same filling. It's a very simple concept, but very nefarious in terms of how the gameplay actually pans out. So it's a lot of fun. It's won a lot of best-game awards, and it's definitely worth picking up. I've played that. It's a really, really awesome game. Yeah, it's a lot of fun. Stefan, do you have any picks for us this week? Well, yeah. Since we're on the topic of games, I don't play very often, but there's a genre that kind of captivates me,
[00:38:23] Unknown:
the old point-and-click adventures. When I was a kid I played Monkey Island all the time. Recently Ron Gilbert, funded by Kickstarter, published Thimbleweed Park, and it's just a beautifully done game. It's done in the style of a 1987 point-and-click adventure, with lots of humor, very much along the lines of the old LucasArts adventure games, and just gorgeous graphics. I kept up to date with the blogs as they developed the game, and there are a few very cute engagements with the community that they made during development. Like, they have a phone book with messages from a lot of the people who backed the game, for example. But the one thing from an image processing perspective that struck me as interesting is that, during the last week, they had to do the artwork for Steam and for GOG. And they mentioned there that they needed to do the artwork for all the different sizes of displays. And I thought, why don't you just resize the art? And I realized, well, when you do pixel art, that's no longer an option. You actually have to redraw the graphics to fit a specific resolution. So, just magnificent to see the kinds of technical challenges that team faced in developing the game. So, yeah, I've been wasting a lot of time on that this week.
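For the curious, the Set rule described a little earlier, each attribute all the same or all different across three cards, is easy to sketch in a few lines; the card encoding here is made up for illustration.

```python
from itertools import combinations

def is_set(cards):
    # Each of the four attributes must take either one value
    # (all same) or three values (all different) across the
    # three cards.
    return all(len({card[i] for card in cards}) in (1, 3)
               for i in range(4))

# Cards as (color, shape, count, fill) tuples.
a = ("red", "oval", 1, "solid")
b = ("green", "oval", 2, "solid")
c = ("purple", "oval", 3, "solid")
d = ("red", "oval", 2, "striped")

# a, b, c form a set: colors all differ, shapes all same,
# counts all differ, fills all same.
valid = [trio for trio in combinations((a, b, c, d), 3)
         if is_set(trio)]
```

Brute-forcing all three-card combinations like this is exactly what players race to do by eye.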
[00:39:38] Unknown:
And, Juan, do you have any picks for us? So the first one, it's not very current, but it's still touring. It's Matilda the Musical, and it's just fantastic, with a really beautiful message. It's also a bit of a tribute to nerds all around, so I think the audience will appreciate it. If you find it in your city soon, definitely go check it out. And the other one might be a bit unconventional: it's fitness. If you like rowing, I think rowing is a good workout. The most popular rowing machine is the Concept2, and I consider it to be vastly inferior to this thing called the WaterRower, which uses a paddle in water for resistance and is made of beautiful wood. And the feel is really a lot nicer, a lot closer to the feeling of being on the water. So, yeah, if you use the Concept2, have a look at the WaterRower.
[00:40:29] Unknown:
So I often have some of my best ideas in the shower. This week I got a pad of AquaNotes, and they're fantastic. You put them up against a wall and you can make notes while you're showering. No more of those great ideas forgotten. I'm sure this will bode well for the future of scikit-image.
[00:40:49] Unknown:
That's excellent. There's this Twitter account called Bored Elon Musk, which is all these kinds of random inventions. And there was a very similar one, which was, you know, a glass shower door where, if you scribbled on the steam, it would send it to your Evernote.
[00:41:04] Unknown:
Oh, nice.
[00:41:07] Unknown:
So with that, I would like to thank you both for taking the time out of your day to join me and tell me more about the work you've done with scikit-image. It's definitely an interesting package and one that I'm sure a lot of people will be happy to learn more about. So thank you both, and I hope you enjoy the rest of your day. Thanks for inviting us. Thank you very much.
Introduction and Guest Introductions
First Encounters with Python
Overview and Origins of Scikit-Image
Comparing Scikit-Image with Other Image Processing Libraries
Use Cases and Applications of Scikit-Image
Multidimensional Image Processing
Challenges in Development and Community Contributions
Interesting and Unexpected Uses of Scikit-Image
Future Directions and Improvements
Contact Information and Picks