Summary
Digital cameras and the widespread availability of smartphones have allowed us all to generate massive libraries of personal photographs. Unfortunately, now we are all left to our own devices as to how to manage them. While cloud services such as iPhotos and Google Photos are convenient, they aren’t always affordable and they put your pictures under the control of large companies with their own agendas. LibrePhotos is an open source and self-hosted alternative to these services that puts you in control of your digital memories. In this episode the maintainer of LibrePhotos, Niaz Faridani-Rad, explains how he got involved with the project, the capabilities that it offers for managing your image library, and how to get your own instance set up to take back control of your pictures.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- This episode is sponsored by Mergify. It’s an amazing tool to make you and your team way more productive with GitHub. Mergify is all about leveling up your pull requests with useful features that eliminate busy work. Automatic merges allow you to define the conditions for acceptance and Mergify will take care of merging the pull request as soon as it’s ready. Automatic updates take care of merging your pull requests serially on top of each other, so there is no way to introduce a regression. With a merge queue you can merge your urgent pull request first, organize your PRs as you wish and Mergify will merge them in that order. Mergify’s backports feature will even copy the pull request into another branch once the pull request has been merged, shipping your bug fixes on multiple branches automatically. By saving time you and your team can focus on projects that matter. Mergify is coordinated with any CI and fully integrated into GitHub. They have a Startup Program that offers a 12 months credit to leverage Mergify (up to $21,000 of value). Start saving time; visit pythonpodcast.com/mergify today to sign up for a demo and get started! Or just click the link in the show notes.
- Your host as usual is Tobias Macey and today I’m interviewing Niaz Faridani-Rad about LibrePhotos, an open source, self-hosted application for managing your personal photo collection.
Interview
- Introductions
- How did you get introduced to Python?
- Can you describe what LibrePhotos is and the story behind it?
- What are the core objectives of the project?
- What kind of users are you focused on?
- What are some of the major features of LibrePhotos?
- There are a number of open source and commercial options for different photo oriented use cases. What are the main capabilities that influence someone’s decision to use one over the other?
- Many people’s baseline expectations will be around services such as Google Photos or iPhotos. What are some of the challenges that you face in trying to provide a comparable experience?
- One of the features that users rely on with these services is backup/disaster recovery of their photo library. What is the recommended approach for users of LibrePhotos?
- Can you describe how LibrePhotos is architected?
- How have the design and goals evolved since you first started working on it?
- How have recent advances in machine learning algorithms and related tooling improved the availability and quality of advanced features in LibrePhotos?
- How much improvement of accuracy in face/object recognition do you see as users invest in cataloging and organizing their collections?
- Is there a minimum quantity of images/individual people that are necessary to start using the ML powered features?
- What kinds of storage locations are supported?
- What are the interfaces available for extending/enhancing/integrating with LibrePhotos?
- What are the most interesting, innovative, or unexpected ways that you have seen LibrePhotos used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on LibrePhotos?
- When is LibrePhotos the wrong choice?
- What do you have planned for the future of LibrePhotos?
Keep In Touch
- derneuere on GitHub
- @der_neuere on Twitter
- Website
Picks
- Tobias
- Uncharted movie
- Niaz
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
Links
- LibrePhotos
- Self-hosted Sub-Reddit
- OwnPhotos
- Google Photos
- Google Takeout
- Digikam
- x265
- HEIC Files
- RAW Image Format
- ImageMagick
- Panorama Photograph
- Lytro light field cameras
- rq asynchronous task library
- Typescript
- Redux Toolkit
- MobileNet v3
- DLib
- ARM Processor
- Docker Compose
- LibrePhotos Comparison Page
The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling powered by the battle tested Linode platform, including simple pricing, node balancers, 40 gigabit networking, dedicated CPU and GPU instances, and worldwide data centers.
Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. This episode is sponsored by Mergify. It's an amazing tool to make you and your team way more productive with GitHub. Mergify is all about leveling up your pull requests with useful features that eliminate busy work. Automatic merges allow you to define the conditions for acceptance, and Mergify will take care of merging the pull request as soon as it's ready. Automatic updates take care of merging your pull requests serially on top of each other, so there's no way to introduce a regression.
With a merge queue, you can merge your urgent pull request first, organize your PRs as you wish, and Mergify will merge them in that order. Mergify's backports feature will even copy the pull request into another branch once the pull request has been merged, shipping your bug fixes on multiple branches automatically. By saving time, you and your team can focus on projects that matter. Mergify is coordinated with any CI and fully integrated into GitHub. They have a startup program that offers a 12 months credit to leverage Mergify, up to $21,000 of value.
Start saving time and visit pythonpodcast.com/mergify today to sign up for a demo and get started, or just click the link in the show notes. Your host as usual is Tobias Macey. And today, I'm interviewing Niaz Faridani-Rad about LibrePhotos, an open source, self-hosted application for managing your personal photo collection. So, Niaz, can you start by introducing yourself?
[00:02:19] Unknown:
My name is Niaz. I'm a software developer, obviously. I'm 25 years old. I'm currently living in Berlin. One thing about me is that I'm always working on some kind of side project. LibrePhotos is one of them.
And do you remember how you first got introduced to Python?
The first time I got introduced to Python was in the 10th grade. I learned it in an IT class. I only remember that it was easy, but I didn't understand what classes actually do. So I understand that concept. Then I studied for a bit and then started to work. I always learned Java at university. So, yeah, for a couple of years, I just did the corporate Java route, used Spring and stuff like that. I basically started working with Python again because of LibrePhotos. I then picked it up again, and I found it quite easy. Like, it's easier than Java, for sure.
[00:03:06] Unknown:
And you mentioned that LibrePhotos is just one of your side projects. I'm wondering what your kind of criteria is or sort of selection process for deciding where you wanna spend your time outside of work and the types of investments that you wanna make and the side projects that you engage with?
[00:03:24] Unknown:
This is usually something really spontaneous. Like, I'm bored on the weekend, and I'm just thinking, hey, what do I want to do? And then I start a new project, basically. That's usually how it goes. It's also not always programming related. Sometimes I do something completely different. But, yeah, usually, I just decide on a whim which project sounds fun, and then I do it. And, obviously, a lot of side projects don't take off like LibrePhotos did, but some do. Yeah.
[00:03:57] Unknown:
And so in terms of the LibrePhotos project, can you describe a bit about what it is and some of the story behind it and how you decided that this was an area where you wanted to spend your time and focus and how it is that it maybe took off more than some of your other side projects?
[00:04:12] Unknown:
Yeah. At the start of the pandemic, I decided that I wanted to learn Linux again. I also wanted to learn DevOps. So I started experimenting with, like, Docker containers and stuff. I was also always browsing r/selfhosted, which is a subreddit for, like, every project, basically, that has anything to do with self-hosting. Yeah. Then in October 2020, I decided that I wanted to try out OwnPhotos, and I got it to run. I then wanted to implement a small feature. I created an issue, I think, for it. But then I realized, okay, nobody maintains this project. The original author, Hooram Nam, stopped working on it in 2019, but I really wanted that feature. Back then, if you did face recognition on photos, you had to click on each single face, and you couldn't select multiple faces. And this is something I wanted to do.
So I just decided, okay, I'm going to do that. It took me a while to get it up and running because, like, 2 years is a long time in the development world. Yeah. But I got it up and running. I shared it on the OwnPhotos Discord, and multiple people pitched the idea of a fork to me. I wanted to prevent the situation that happened to OwnPhotos. Right? So it was only, like, one person working on this project, and nobody knew how to get the code up and running. So when I started, I always wanted to, like, create an organization for that. So we decided on the name, LibrePhotos. And, yeah, now we have a couple of people working on it, on and off. And, yeah, I think it worked out pretty well. And I think it was successful because there was already, like, a community around it, but nobody knew how to program for it. So that helped.
[00:06:00] Unknown:
And in terms of the core sort of objectives and priorities around the project, I'm wondering how you think about what the guiding principles are and how you decide what features are worth inclusion and what are the capabilities that are maybe better left to other projects?
[00:06:20] Unknown:
Right. So the core objective is basically that I want to create a community of people around LibrePhotos so that it's self-sustaining. Right? So that it's always maintained, and you can always improve it. If multiple people work on it, you can always expand the scope. So, usually, if an idea sounds good to me, then somebody can, like, implement it. But one important thing for me is that everybody only works on the stuff he or she wants to work on. Right? So that way, you always get, like, basically, a self-directed road map because, obviously, some users have, like, really ambitious ideas, but nobody wants to implement them. So they never happen.
The obvious stuff obviously gets implemented pretty fast because it's easy to do and obvious.
[00:07:05] Unknown:
Management of photographs is, at face value, a fairly straightforward effort, but there are different sort of categories of people or categories of use case that might push a project in vastly different directions where, you know, LibrePhotos, it seems, is largely focused on individual users and personal photo management, whereas there are other projects that are more focused on professional photographers and being able to manage a portfolio or being able to manage access to individual customers. And I'm wondering if you can just talk to the sort of general category of use cases and users that you're focused on supporting and some of the ways that that has informed the prioritization or selection of the features that are core to LibrePhotos?
[00:07:55] Unknown:
Right. So if you go to the demo instance of LibrePhotos, you can obviously see it's very similar to Google Photos, and that was the starting point for the user experience. I think we shouldn't diverge from it too much because the people at Google and at Apple and at Amazon all implement the same user experience. So it can't be that wrong to implement that too. But outside of that, I think that we should also support large collections of photos, obviously. Also, everything around, like, managing your photos on your local hard drive. I'm not a fan of, like, uploading stuff into the cloud, and then you never get the data back.
For example, I tried to do it with my Google Photos collection, and you go to Google Takeout. And then you get, like, 20 different parts in zip files, and you don't know where your actual photos are or how they're sorted and stuff like that. I'm not a fan of that. Yeah. So every feature that revolves around, like, managing the file system is, obviously, also core to LibrePhotos. Another issue we have, obviously, if we think now, okay, we have a file system and users who already work with the file system, how can we import the whole photo structure into LibrePhotos and extract all the data from that? So that's also an important question we have to ask ourselves.
Right? Because people obviously have maybe sorted, like, by year and then maybe by holiday. And then they have, like, a category of pictures within it. And how do we extract the data from it? And we obviously are not allowed to move these folders or pictures in any way. Otherwise, people get very angry very fast. So, yeah, these are, obviously, the questions around it. Yeah. And the kind of users currently, we're looking for the kind of users who are technical. Right? Because we're currently only, like, a couple of people working on it, obviously, part time. So there are a lot of technical issues. Right? So if you're working with a file system and, like, with a new website and yeah, a lot of stuff can go wrong there. And you have to be able to look at a log file and see, okay, there's an error there, and send it to me. If you're, like, a user who doesn't know how to do that, then it's kind of hard for me to do the debugging, obviously.
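As a rough illustration of the folder-import question raised above (people sorting by year, then by holiday, then by category), the following sketch reads tags out of a path without touching the files. It is an assumption-laden example, not how LibrePhotos actually does it: the function name `tags_from_path`, the four-digit-year rule, and the example paths are all hypothetical.

```python
from pathlib import PurePosixPath


def tags_from_path(photo_path, library_root):
    """Derive album-style tags from the folders between the root and the file.

    The file is never moved or renamed; only its path is inspected.
    """
    relative = PurePosixPath(photo_path).relative_to(PurePosixPath(library_root))
    folders = relative.parts[:-1]  # drop the file name itself

    tags = {"year": None, "albums": []}
    for folder in folders:
        # Assumption: the first four-digit folder name is the year.
        if tags["year"] is None and folder.isdigit() and len(folder) == 4:
            tags["year"] = int(folder)
        else:
            tags["albums"].append(folder)
    return tags


if __name__ == "__main__":
    print(tags_from_path("/photos/2019/Italy holiday/Beach/IMG_0001.jpg", "/photos"))
```

Because the function only reads paths, it respects the constraint mentioned in the conversation that folders and pictures must stay where the user put them.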
[00:10:17] Unknown:
In terms of the decisions that people go through where they say, I wanna be able to manage my photos and, you know, they're looking at LibrePhotos, or I know that there are a few other open source web based options. And then, of course, there's the multitude of desktop oriented solutions. What do you see as the key decision points that will lead somebody to saying, LibrePhotos is the right choice for me versus I actually wanna go and use, you know, digiKam or what have you?
[00:10:45] Unknown:
It depends. Like, I think it's not that clear cut. Most of the time, it's like one feature that is missing from one solution. Right? So a lot of people are looking at LibrePhotos, for example, because they want to look at their photos but also want to manage them on their file system. And that's something digiKam, for example, can't do, right, because it's a local app. And then you could do, like, remote desktop and then click on it. That's obviously not a nice user experience. Sometimes it's like a missing file format. Right? So x265, for example, for video files is, like, kind of problematic because most browsers don't support it. So you need, like, a transcoding system of some kind. And especially if you're, like, an Apple user, you need that because every video is encoded in x265.
Same goes, for example, for .heic files. Right? So the photo file, the proprietary one from Apple. If you use an Apple device, you obviously have to understand that. Yeah. So the users have, like, very weird features that they want. The kind of users that come to LibrePhotos are not always, like, looking for an open source alternative, but sometimes only because Google Photos or Apple Photos doesn't support a specific use case for them. So you get requirements that are basically all over the place.
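To make the transcoding point above a little more concrete, here is a minimal sketch of deciding which files a browser can show directly and which need conversion first. The extension sets, the `plan_for` function, and the plan names are illustrative assumptions, not LibrePhotos code. The ffmpeg arguments (`-c:v libx264`, `-c:a aac`) are standard ffmpeg options; the command is only built here, not executed.

```python
from pathlib import PurePosixPath

# Assumption: image formats a typical browser displays without conversion.
BROWSER_FRIENDLY = {".jpg", ".jpeg", ".png", ".gif", ".webp"}
# Assumption: image formats that need converting to something viewable first.
NEEDS_IMAGE_CONVERSION = {".heic", ".heif"}


def ffmpeg_h264_command(source, target):
    """Build (but do not run) an ffmpeg command that re-encodes video to H.264."""
    return ["ffmpeg", "-i", source, "-c:v", "libx264", "-c:a", "aac", target]


def plan_for(path, video_codec=None):
    """Return a hypothetical processing plan for one media file.

    video_codec is the codec name reported by a probe tool, e.g. "hevc".
    """
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in NEEDS_IMAGE_CONVERSION:
        return "convert-image"
    if video_codec is not None:
        if video_codec.lower() in {"hevc", "h265"}:
            return "transcode-video"
        return "serve-as-is"
    return "serve-as-is" if suffix in BROWSER_FRIENDLY else "unknown"


if __name__ == "__main__":
    print(plan_for("IMG_0001.HEIC"))
    print(plan_for("clip.mov", video_codec="hevc"))
    print(ffmpeg_h264_command("clip.mov", "clip.mp4"))
```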
[00:12:09] Unknown:
So in terms of the user experience, you mentioned that Google Photos and iCloud are what a lot of people are going to be looking for in terms of the kind of base expectations of how the software is supposed to operate. And LibrePhotos is modeled a bit after the Google Photos user experience. And I'm wondering what you see as some of the challenges that are presented by the fact that people are coming in with these expectations of, I just want a replacement for Google Photos, and you know, now my expectations are set based on all the features that are available in that system and being able to manage those expectations and understand, you know, which ones you want to fulfill and which ones you say are actually, you know, completely out of scope or on the road map but not implemented yet. And just that overall aspect of maintaining a dialogue about how close to Google Photos we want to be and how much do we want to be our own system?
[00:13:06] Unknown:
So, obviously, the use cases Google, Apple, and so on focused on are very important to the user. The most important one is definitely, like, not backing up the photos, but, like, getting them from the phone somewhere in the cloud. So that's the basic use case most people are using it for. That's something we want to do. We started doing that. You can now upload via the website, basically, your photos, which obviously isn't the right user experience, but it partly implements the use case. So if somebody asks me, hey, when can I automatically back up my photos in my LibrePhotos instance? I always say, soon, but I don't know when because, like, it just takes time. It's way harder than you think it would be. Another big issue is the search engine. Right? People use Google Photos and Apple Photos as a search engine for their personal photo collection. For example, a lot of people take notes, and then they make a photo of them, put it into Google Photos, and then they're searching, basically, with Google Photos through their notes.
I didn't know at first that this was possible, but I guess it is. And it also supports a lot of, like, query language that is pretty complex. And, obviously, the ranking is pretty good. And I always tell people, yeah, I'm working on that, but I can't promise anything. Right? Because, like, search is such a big topic, and you can burn, like, years on that topic alone. Automated photo management is obviously another one. We will probably not support, like, something like the automatic video feature that Apple now has. Like, they take your pictures and put, like, cool music under it. So that's, like, way out of scope. That feature alone would probably take me a year to do. So that's not in it. But, like, other basic automated photo management things are obviously in scope. For example, clustering photos that look similar and, like, maybe if you have, like, raw pictures and, like, JPEGs from them, that they're basically one picture and stuff like that. So that's obviously in scope.
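The RAW-plus-JPEG case mentioned above can be sketched as a simple grouping by folder and file stem. This is an illustrative assumption about one way to do it, not the LibrePhotos implementation: the suffix lists are a small sample and `group_raw_with_jpeg` is a hypothetical name.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Assumption: a small sample of RAW extensions; real cameras use many more.
RAW_SUFFIXES = {".cr2", ".nef", ".arw", ".dng"}
JPEG_SUFFIXES = {".jpg", ".jpeg"}


def group_raw_with_jpeg(paths):
    """Group files sharing a folder and stem when both RAW and JPEG exist.

    Returns a list of groups; files without a counterpart form their own group.
    """
    by_stem = defaultdict(list)
    for path in paths:
        p = PurePosixPath(path)
        by_stem[(str(p.parent), p.stem.lower())].append(path)

    groups = []
    for members in by_stem.values():
        suffixes = {PurePosixPath(m).suffix.lower() for m in members}
        if suffixes & RAW_SUFFIXES and suffixes & JPEG_SUFFIXES:
            groups.append(sorted(members))  # treat the pair as one picture
        else:
            groups.extend([m] for m in sorted(members))
    return groups


if __name__ == "__main__":
    print(group_raw_with_jpeg(["/p/IMG_1.CR2", "/p/IMG_1.jpg", "/p/IMG_2.jpg"]))
```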
Sharing is another thing people are using it for, and I think that's a use case we can definitely improve on. That's not too hard to do. And we can also basically add new features that Google Photos doesn't have. Because there's a whole, like, photo publishing scene. Like, when you're a photographer, you want to share your current album with, like, the correct copyright header and stuff. That's currently just not possible with Google Photos, so they have to use, like, Flickr for that. But Flickr also has now very expensive data plans and stuff like that, so they're looking for a solution. So that's obviously a use case we can take and improve on.
[00:15:50] Unknown:
Another constantly moving target is some of the different extensions of a still image that's, you know, in JPEG or PNG format where different smartphones will have the concept of a live photo where it will actually start capturing motion of an image prior to actually taking the snapshot. And then if you hover over it, you can actually see, you know, 5 seconds of a kind of silent video, for instance, or, you know, some of them will automatically encode audio into the still image so that there's, you know, no motion, but there's audio. And then there's, like, panorama shots where you're taking a 360 degree image, and then it's stitched together into a single file. And I'm wondering what types of challenges those different extensions to the concept of just a photograph introduce when you're building something like LibrePhotos, and how much of it you're able to maybe just lean on browser support to be able to manage some of that replay?
[00:16:45] Unknown:
So the pipeline works basically like this. We have, like, a couple of libraries where we take the original, and then we put it through them, and then we have a bunch of thumbnails. We use ImageMagick, which is pretty well known for that, and another photo library, which is, like, way faster than ImageMagick. So every format that gets supported by it goes through that first. And now we basically have, like, a viewable raw image or, like, a viewable HEIC image that the user can view in the browser. But for all these new kinds of photo types like panoramas and live photos, you obviously also have to implement, like, the presentation for that and the clustering. So, yeah, the presentation is basically the same as on your phone.
And the problem with that is that there isn't really a standard for that as far as I can tell. I'm still looking into that. There may be some kind of standard, but I didn't find anything. So you basically have to implement the different conventions the smartphone makers use, basically. Yeah. And then think of how you can integrate this into LibrePhotos. I haven't started on that yet. I'm still looking into the cleanest possible way to do it, but there's basically no third party library that can do that yet. And there's also no JavaScript library for that yet. So you have to implement basically both. So that's a future topic. Panoramas are also, like, easier because, like, they have been around longer. Right? So you have, like, libraries in JavaScript for that, but still, it's not that plug and play yet. There are also, like, other formats that I would like to support, like 3D objects, for example.
That would be cool. And maybe, like, some kind of machine learning where you can take a 2D image and add some depth to it with a machine learning algorithm. That would be cool too. Yeah. You can do a lot with, like, experimental file formats. I find that really exciting.
[00:18:37] Unknown:
Yeah. Another interesting one that I came across a while ago is, I don't even know if they're still a thing anymore, but the Lytro cameras where it's actually taking a light field image where it's, you know, capturing the vectors of the light as it's entering the camera so you can actually change the focus of the image after you've already taken the picture and things like that. And then they've got their own application for being able to view and manipulate the images that are taken with their cameras.
[00:19:03] Unknown:
Right. Right. It would also be very awesome to view these images. But then you basically have to implement the whole pipeline, I guess. If you want to change the focus in the browser, that's, like, for now, out of scope, maybe in a couple of years.
[00:19:19] Unknown:
And so digging into the LibrePhotos project itself, can you talk through how it's architected and some of the overall kind of design philosophy that you have around how to structure and maintain the overall application so that it is, you know, appropriately compartmentalized so that you can focus on individual bits without having to, you know, work across the entire scope of the code base whenever you wanna change a feature?
[00:19:46] Unknown:
Currently, we have, like, a very basic setup. We have a front end, a back end, and a proxy between them, which routes between the front end and the back end. We also have, like, a separate image similarity process. Yeah. In the back end itself, we work with RQ workers. Right now, I'm only making sure that the code is written in a coherent way because, like I said, the original author worked on it alone. And if you set up a Django app for the first time, you have, like, your model, your view, and your serializer .py files. He just continued to always code in those. So you had, like, very, very large files, tens of thousands of lines. I split those into smaller chunks.
And I think we can definitely work in this paradigm for the next couple of years, like, have apps within your Django application, which we then load, and I think that will scale well in the back end. In the front end, we are trying to split up the API and, like, the whole TypeScript part from the actual front end itself because we now also want, like, a mobile app, and it makes sense that these 2 code bases share, basically, the code. Yeah.
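The back end described above hands slow work to background workers. The sketch below shows that general pattern using only the standard library; it is a stand-in for illustration and does not use the RQ library or reflect the actual LibrePhotos job code. `run_jobs` and the job tuples are hypothetical.

```python
import queue
import threading


def run_jobs(jobs, worker_count=2):
    """Run (name, function, args) jobs on background threads.

    Returns a dict mapping each job name to its result, or to the exception
    it raised. This mimics the shape of a task queue, nothing more.
    """
    pending = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = pending.get()
            if item is None:  # sentinel: no more work for this worker
                break
            name, func, args = item
            try:
                value = func(*args)
            except Exception as exc:  # keep the worker alive on failures
                value = exc
            with lock:
                results[name] = value

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for job in jobs:
        pending.put(job)
    for _ in threads:
        pending.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(run_jobs([("count", len, ("abc",)), ("total", sum, ([1, 2, 3],))]))
```

A real task queue adds persistence and separate worker processes, so the web request can return immediately while a scan or thumbnail job keeps running.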
[00:21:02] Unknown:
And as you have been working on the project and starting to expand on the feature set and investing more of your time in it, what are some of the ways that the design and goals of the project have evolved since you first started working on it?
[00:21:21] Unknown:
Yeah. Like I said, the code was in a pretty rough spot. We basically did what everybody does. When you see that code, you start refactoring it. I'm not a fan of, like, rewriting stuff because you usually introduce more bugs with it. So I usually go about it that if I work on something, I also start refactoring it, testing if it still works, and then add the functionality I want to add. We then introduced, like, linting and such steps to improve, like, the code readability in general. And yeah. So we do this process over and over and over again until the code base gets good enough. The issue we currently have is that we started too many refactoring projects, so it's kind of hard for new people to know what the right style is.
So I always have to explain that. For example, we're currently migrating to TypeScript from JavaScript, but we're also moving from Redux to Redux Toolkit. And if you read that code, you wouldn't know which one is the correct one. And in the back end, it's basically where does the code belong because some of the old classes are still, like, in the main folder, and I want to go into subfolders with them. So that's also kind of hard. But I like this approach of, like, incremental progress. Right? So that you don't introduce too much change to a code base all at once just because you want the code to look nice. But instead, just work continuously on it, and then the code will improve.
[00:22:42] Unknown:
A number of the capabilities that you have in LibrePhotos are powered by different machine learning workflows. So I know that, for instance, you have facial detection and facial recognition for being able to cluster photographs based on who's in it. You have scene recognition for being able to understand, you know, this is the general type of picture. So maybe it's a landscape photo versus a portrait and things like that. And then some geocoding capabilities to be able to say, okay, based on the EXIF metadata, I can put this on a map so that you can see, you know, visually on a map, these are the different pictures that were taken here. And I'm wondering if you can just talk to some of the ways that the recent advances in machine learning techniques and the general availability of libraries to support them have improved the availability and quality of these types of advanced features that are available and accessible to you and other people who are working on LibrePhotos.
[00:23:41] Unknown:
The biggest issue that you currently have with, like, machine learning is that mostly researchers write this code, and they always love their graphs where they explain how their algorithm is the best algorithm, but basically never explain how to actually use the thing they developed. So we have, like I said, a couple of libraries that we actually use. The face_recognition API, which is basically the standard for every Python app that uses face recognition. There are a couple of different ones, but they only perform, like, a bit better and the API is, like, not that user friendly. The recent advances, like the newest thing we currently use, is for semantic search. I think it's called OpenCLIP, and it allows us basically to search for any kind of phrase, and then we get the images back, which is really nice, right, because that's a really advanced feature.
We basically only have to run this library and save all these CLIP embeddings, basically, and then we have to search for them. And then we get quite good results back. It's not yet activated by default because we don't know yet how to mix that with, like, searching for metadata. So it's kind of exclusive right now. So you either want to search for metadata or you want to search semantically, basically. Because as soon as you have both, you need some kind of ranking, and that's pretty hard to do right. But we also use, like, a lot of different things.
I'm also very excited about MobileNet v3, which is an object detection algorithm that we want to implement. It was developed for, like, smartphones to detect objects. It's really resource friendly, and it's also good enough for most people.
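The "save the embeddings, then search them" idea described above comes down to ranking stored vectors by similarity to a query vector. The sketch below shows that step with toy three-dimensional vectors; real CLIP embeddings have hundreds of dimensions and come from a model, which is out of scope here. The function names and the sample data are hypothetical, and cosine similarity is a common choice rather than a confirmed detail of LibrePhotos.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_embedding, photo_embeddings, top_k=3):
    """Return the names of the top_k photos most similar to the query."""
    scored = [
        (cosine_similarity(query_embedding, embedding), name)
        for name, embedding in photo_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_k]]


if __name__ == "__main__":
    # Toy stand-ins for stored image embeddings.
    photos = {
        "beach.jpg": [0.9, 0.1, 0.0],
        "forest.jpg": [0.1, 0.9, 0.0],
        "city.jpg": [0.0, 0.1, 0.9],
    }
    # Toy stand-in for the embedding of a text query.
    print(search([1.0, 0.2, 0.0], photos, top_k=2))
```

Mixing this with metadata search is the hard part mentioned in the conversation, because the two kinds of score would then need a shared ranking.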
[00:25:28] Unknown:
To the point of resources, that's another interesting question. Because of the fact that Libre photos is intended to be self hosted, by adding these machine learning capabilities, you might be introducing some constraints on who can run it or how expensive it is to run it. And I'm wondering how you think about that as you're deciding what features to include, how to implement them, you know, whether to make them core or just an optional extension to the platform, and some of the other ways that that self hosted sort of target audience influences the ways that the project is designed and the ways that you think about evolving it? Right. So compatibility
[00:26:09] Unknown:
is obviously key. So I'm always looking, like, for a project where I can, like, recompile the stack if I have to. For example, Dlib doesn't work on, like, old processors, like, from 2011 or something. And, obviously, there are people who use their old computer for, like, their first server, and they are then complaining that it doesn't run. So we have to compile that from scratch because to to get basically all the AVX 2 instructions out of there. We also always have to think about, like, ARM compatibility. Right? ARM is, like, still pretty new. It gets better now that we have m 1 Processor, but you still need to be able to compile stuff, basically, from source. Otherwise, you can't put that in your pipeline.
Usually, the machine learning algorithms take up resources. The biggest complaints are basically about image size currently. The LibrePhotos images are still pretty big. So I probably have to rewrite it in the future so that we extract all the models out of there, so that you only have to download them once and you can also decide if you want to use them or not. That's obviously, like, a lot of work. I have to think about, like, infrastructure suddenly, and I currently don't. Yeah. So the constraint itself from the algorithms isn't usually that high. If you don't do training, most normal CPUs can work pretty well for inference.
There are, like, a couple of algorithms where it takes, like, forever, obviously. But most are, like, done in a couple of seconds. So if you run it in the background, it doesn't really impact the workflow.
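The compatibility problem with older processors mentioned earlier in this answer can be checked up front on Linux, where the kernel lists CPU feature flags in `/proc/cpuinfo`. The sketch below parses text in that format and reports whether `avx2` is present. It is an illustration only, not part of LibrePhotos, and the function names `cpu_flags` and `supports_avx2` are made up for this example.

```python
def cpu_flags(cpuinfo_text):
    """Collect CPU feature flags from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, separator, value = line.partition(":")
        # x86 kernels label the line "flags"; ARM kernels use "Features".
        if separator and key.strip().lower() in ("flags", "features"):
            flags.update(value.split())
    return flags


def supports_avx2(cpuinfo_text):
    """True if the avx2 flag appears in the given cpuinfo text."""
    return "avx2" in cpu_flags(cpuinfo_text)


# Hand-written samples; real files contain many more fields and flags.
old_cpu = "model name : Example CPU (2011)\nflags : fpu sse sse2 avx\n"
new_cpu = "model name : Example CPU\nflags : fpu sse sse2 avx avx2\n"
old_ok = supports_avx2(old_cpu)
new_ok = supports_avx2(new_cpu)
```

On a Linux host the same check could be run against the real file with `supports_avx2(open("/proc/cpuinfo").read())`.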
[00:27:46] Unknown:
And then the other challenge with machine learning based features is that a lot of times, the quality of the performance is going to depend on the quality and quantity of the data that you have to train it with. And I'm wondering what you see as the performance trade off of being able to let the algorithm train or retrain on the actual images that somebody is trying to manage and just the overall sort of minimum baseline quantity that's necessary to be able to actually bootstrap that model off of somebody's own collection versus just having to come with a pretrained model and just rely on whatever might be baked into that as far as parameters or optimizations?
[00:28:29] Unknown:
Right. We usually just stick to the default models. So, usually, the researchers provide, like, a model which you can download and run, and it works in most cases. If you have, like, a big company where you can retrain, basically, on the available data you get back, that's obviously great, but that's not yet really possible in a self hosted context. We use, like, I mean, you could name it a machine learning algorithm. These are basically very light clustering algorithms to decide, okay, which face belongs to which person. But these algorithms are so light that every processor can run them. So that's not the issue. And I think you really shouldn't retrain stuff on an individual system because some users will find a way to break that, and then they will complain that the machine learning doesn't work. And that's obviously not a great user experience.
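To give a sense of how light such a clustering step can be, here is a small sketch that greedily groups face encodings: each encoding joins the nearest existing group if it is closer than a threshold, and otherwise starts a new group. This is an assumed stand-in and not the algorithm LibrePhotos actually uses. The two-dimensional vectors, the threshold of 0.6, and the names `cluster_faces`, `distance`, and `centroid` are all illustrative.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cluster_faces(encodings, threshold=0.6):
    """Greedily group encodings; returns a list of index lists, one per group."""
    clusters = []  # each entry: {"members": [indices], "center": [floats]}
    for index, encoding in enumerate(encodings):
        best, best_distance = None, threshold
        for cluster in clusters:
            d = distance(encoding, cluster["center"])
            if d < best_distance:
                best, best_distance = cluster, d
        if best is None:
            # Nothing close enough: treat this face as a new person.
            clusters.append({"members": [index], "center": list(encoding)})
        else:
            best["members"].append(index)
            best["center"] = centroid([encodings[i] for i in best["members"]])
    return [cluster["members"] for cluster in clusters]


# Toy 2-D points standing in for real face encodings.
faces = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9], [0.05, 0.1]]
groups = cluster_faces(faces)
```

No training happens here. The work is a handful of distance computations per face, which is why this kind of step runs on modest hardware.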
[00:29:21] Unknown:
In terms of actually hosting this yourself, what's involved in getting it set up and managing the upgrade process and figuring out the storage systems to make sure that you actually have all your photos available to LibrePhotos to be able to manage them?
[00:29:38] Unknown:
Right. So the preferred way to install LibrePhotos is with Docker Compose. We provide, like, an example YAML file and environment file, which you just have to edit and then rename. And then you basically just do the Docker Compose up, and then it should be running. Right? But a lot of people have an issue with Docker Compose itself. The issue they have is that they never used it before, and then they see, for the first time, like, a YAML file and the .env file. The biggest mistake newbies make is basically forgetting to rename the LibrePhotos example .env file to just .env. And then they say it doesn't run. But by default, everything is saved to the file system. So the database is on the file system. The thumbnails are on the file system, and we basically also read all the images from the file system.
That's how this system works. We also provide, like, a couple of different ways to use LibrePhotos. We have a Linux install script, which currently doesn't work, so please don't use it. And in a couple of weeks, hopefully, we also provide, like, a single Docker image so that it's even easier to run.
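The renaming mistake described in this answer is easy to catch before starting the containers. The sketch below checks a directory for a Compose file and a `.env` file and, if `.env` is missing, points at any other `*.env` file that probably still needs renaming. It is a hypothetical helper, not something shipped with LibrePhotos. The function name `compose_setup_problems` and the file name `example.env` in the demo are assumptions.

```python
import tempfile
from pathlib import Path

# File names Docker Compose looks for by default.
COMPOSE_FILE_NAMES = (
    "compose.yaml",
    "compose.yml",
    "docker-compose.yaml",
    "docker-compose.yml",
)


def compose_setup_problems(directory):
    """Return a list of human-readable problems found in a Compose directory."""
    directory = Path(directory)
    problems = []
    if not any((directory / name).is_file() for name in COMPOSE_FILE_NAMES):
        problems.append("no compose file found")
    if not (directory / ".env").is_file():
        # Look for a template that was edited but never renamed.
        candidates = sorted(
            p.name for p in directory.glob("*.env") if p.name != ".env"
        )
        hint = ""
        if candidates:
            hint = f" (found {', '.join(candidates)}; copy or rename one to .env)"
        problems.append("missing .env file" + hint)
    return problems


# Demo in a throwaway directory: forget the rename, then fix it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docker-compose.yml").write_text("services: {}\n")
    (root / "example.env").write_text("EXAMPLE_SETTING=1\n")
    before = compose_setup_problems(root)
    (root / "example.env").rename(root / ".env")
    after = compose_setup_problems(root)
```

Running this kind of check before `docker compose up` turns a confusing startup failure into a one-line hint.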
[00:30:44] Unknown:
In terms of the storage solutions, I know that by default, it uses the on disk file system, but that it also has an integration with Nextcloud for people who are using that for their overall sort of system management. I'm wondering what are some of the other options for storage locations or being able to expand the capabilities to maybe include things like object storage.
[00:31:07] Unknown:
Right. So we currently work basically on Django itself. So there are a lot of solutions where you can save to different systems. We currently have not implemented that yet because I don't find it personally that interesting. Right? Because I'm currently the main developer of it. I'm not looking forward to, like, setting up AWS S3, only to test it, basically. So I'm not interested in that. If somebody wants to pick it up, he or she can do that. I have no problem with that. Yeah. The Nextcloud thing is basically an import. It's like a feature from the author.
A lot of people misunderstand that. They think it can work directly off of Nextcloud. No, it basically downloads everything from Nextcloud to your LibrePhotos instance, which is, like, useful if you have, like, a separate Nextcloud instance. But it's not useful if you work on the same system, because then you can just hook into the same location where your photos are. That's easier to do.
[00:32:03] Unknown:
In terms of the capabilities of being able to extend and modify the LibrePhotos project, what are some of the interfaces that are available for being able to add new capabilities or extend existing functionality?
[00:32:15] Unknown:
We obviously provide a REST API because that's how the front end communicates with the back end. We're also working, basically, on the upper part so that you have, like, a React TypeScript Redux library, which automatically has all the API endpoints programmed in the correct way. We want to use that basically then for the mobile app. I don't know when that's coming, but it's gonna be worked on. But it's a huge effort. And, yeah, other than that, the system's open source. So if you know Python and you want to add a function, it's not that hard. The code base isn't that huge. We have only, like, a handful of models.
So if you want to add, like, some kind of back end capability, you can probably do it. And if you have questions, you can always ask me. I'm pretty easy to reach.
[00:33:04] Unknown:
Walking through a hypothetical, if somebody wanted to be able to add support for being able to maybe import your geolocation data from your Google account based on your smartphone activity and then automatically geotag your photos that you've uploaded to LibrePhotos, what would be the sort of steps and components of the application that you would need to interact with to work with that kind of a feature?
[00:33:31] Unknown:
So you would only basically need, for now, basically, an endpoint for the back end. Right? So some kind of endpoint where you basically can say, okay. For this API key, which you probably then get from Google, I want to load the geolocation data for these images. And we already have, basically, all the data structures in place for, like, EXIF data, so you can just load it in there. Then you obviously have to expose it in the front end, basically, with a button. You can probably take the scan button as an example for that and just copy paste it and rename it. And then you have, like, your import EXIF data from Google.
Yeah. So these are the two components where you have to work on that. If you actually wanted to, like, basically, create your own geotagging solution within the LibrePhotos mobile app, yeah, you would have to work on that too. I don't know how the permissions work there yet because another guy works on the app, basically. So maybe there's more involved with that, but I think you can add it there too.
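The core of the back end piece in this hypothetical is matching each photo to the location record closest in time. Below is a minimal sketch of that matching under stated assumptions: the location history has already been fetched and reduced to `(unix_time, latitude, longitude)` tuples sorted by time, and each photo has a capture timestamp. None of these names (`nearest_location`, `geotag`, `max_gap_seconds`) or data shapes come from LibrePhotos or from any Google API.

```python
from bisect import bisect_left


def nearest_location(track, timestamp, max_gap_seconds=900):
    """Return (lat, lon) of the track point closest in time, or None.

    track: list of (unix_time, lat, lon) tuples sorted by unix_time.
    """
    times = [point[0] for point in track]
    position = bisect_left(times, timestamp)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = track[max(position - 1, 0):position + 1]
    if not candidates:
        return None
    best = min(candidates, key=lambda point: abs(point[0] - timestamp))
    if abs(best[0] - timestamp) > max_gap_seconds:
        return None  # too far apart in time to trust the match
    return (best[1], best[2])


def geotag(photos, track, max_gap_seconds=900):
    """photos: dict of name -> unix_time taken. Returns name -> (lat, lon)."""
    tagged = {}
    for name, taken_at in photos.items():
        location = nearest_location(track, taken_at, max_gap_seconds)
        if location is not None:
            tagged[name] = location
    return tagged


# Made-up sample data.
track = [(1000, 52.52, 13.40), (2000, 52.50, 13.45), (9000, 48.14, 11.58)]
photos = {"a.jpg": 1100, "b.jpg": 1900, "c.jpg": 5000}
tagged = geotag(photos, track)
```

Photos with no location point within `max_gap_seconds` are left untagged rather than guessed, which seems the safer default for a personal library.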
[00:34:36] Unknown:
And so in terms of the applications of LibrePhotos, you know, given that you have been working on this for a while now, what are some of the most interesting or innovative or unexpected ways that you've seen it used?
[00:34:49] Unknown:
I'm always, like, really amazed what kinds of data collections people have. So one guy was asking me if LibrePhotos supports half a million faces, and I'm just like, what's the use case for that? Like, are you surveilling, like, a small country or something? That was really interesting. So especially now that Google Photos basically costs money, now I have, like, a lot of people who have, like, 200,000 or 500,000 photos. So I have to make sure that everything fetches in a fast way. Yeah. That was really unexpected at first. But now I know about it, and I can even ask the people whether everything performs well, because getting a dataset set up and running with, like, half a million images is obviously a lot of work.
[00:35:36] Unknown:
As far as that scaling aspect of it, as you have been getting people who are working with larger libraries or, you know, maybe different requirements as far as being able to batch process all of their images to add new information to it or do face recognition across all of them? What are some of the scaling bottlenecks that you've run into or some of the types of performance optimizations that you've had to work in?
[00:36:01] Unknown:
Yeah. That's still an ongoing effort. So the issue we had was, like, we basically looked at every photo on its own. So we looked, okay, create a new photo instance, and then create a thumbnail, and then look for all the faces, and then look for all the scenes and stuff like that. But you're basically loading and unloading a lot of stuff which you don't have to, and you basically don't do any batch processing. So what we're now doing instead for a couple of these processes is basically saying, okay. First, add all the instances, then create all the thumbnails, and then you can display stuff faster. That doesn't always improve the actual execution, but it definitely improves, like, the responsiveness of the system.
With the displaying itself, usually, it scales pretty well unless it wasn't implemented yet. So we basically support that you can have, like, a huge timeline like in Google Photos, where you can scroll up and down however you want to. So that part works already. But, for example, for face recognition, it only goes up to 50,000 images because the query is kind of naive right now. So it just queries the first 50,000 images. So it gets slower and slower and slower because it fetches 50,000 images at once. But yeah. So we have to rewrite that part, but, usually, it scales pretty well. We use, like, Postgres as a database back end, and that performs really well.
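The change from per-photo processing to staged processing can be shown with a toy pipeline. In the sketch below the three steps are placeholders, and the only thing measured is the order in which work happens. The function names and the `photo` dictionaries are invented for the example and do not reflect how LibrePhotos structures its scan job.

```python
def register(photo):
    photo["registered"] = True


def make_thumbnail(photo):
    photo["thumbnail"] = photo["path"] + ".thumb"


def detect_faces(photo):
    photo["faces"] = []  # placeholder for an expensive model call


STEPS = [register, make_thumbnail, detect_faces]


def process_per_photo(photos):
    """Old approach: run every step for one photo before touching the next."""
    order = []
    for photo in photos:
        for step in STEPS:
            step(photo)
            order.append((step.__name__, photo["path"]))
    return order


def process_in_stages(photos):
    """Staged approach: finish one step for all photos before the next step."""
    order = []
    for step in STEPS:
        for photo in photos:
            step(photo)
            order.append((step.__name__, photo["path"]))
    return order


def calls_until_all_thumbnails(order):
    """Number of step calls made by the time the last thumbnail exists."""
    last = max(i for i, (name, _) in enumerate(order) if name == "make_thumbnail")
    return last + 1


paths = ["a.jpg", "b.jpg", "c.jpg"]
per_photo = process_per_photo([{"path": p} for p in paths])
staged = process_in_stages([{"path": p} for p in paths])
per_photo_calls = calls_until_all_thumbnails(per_photo)
staged_calls = calls_until_all_thumbnails(staged)
```

Both versions do the same total work, which matches the point that execution time does not necessarily improve. What changes is that in the staged version every thumbnail is ready before any of the slow face detection starts, so the library becomes browsable sooner.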
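One common way to avoid fetching tens of thousands of rows in a single query is to walk the table in fixed-size chunks keyed on the primary key. The sketch below shows that pattern with the standard library's `sqlite3` module so that it runs anywhere. It is not the LibrePhotos query and it does not use Postgres or Django. The `photo` table, its columns, and the function name `iter_photo_ids` are assumptions for the example.

```python
import sqlite3


def iter_photo_ids(conn, chunk_size=1000):
    """Yield lists of photo ids in ascending order, chunk_size at a time.

    Uses keyset pagination: each query starts after the last id seen,
    so no query has to skip over rows that were already returned.
    Assumes ids are positive integers.
    """
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id FROM photo WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if not rows:
            return
        ids = [row[0] for row in rows]
        yield ids
        last_id = ids[-1]


# In-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE photo (id INTEGER PRIMARY KEY, path TEXT)")
conn.executemany(
    "INSERT INTO photo (path) VALUES (?)",
    [(f"img_{n}.jpg",) for n in range(2500)],
)
chunks = list(iter_photo_ids(conn, chunk_size=1000))
chunk_sizes = [len(chunk) for chunk in chunks]
```

With this shape, memory use and per-query latency depend on `chunk_size` rather than on the size of the whole library.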
[00:37:26] Unknown:
In your own experience of working on this project and adopting it as your own, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:37:36] Unknown:
The hardest thing right now for me is still figuring out how I can deliver stuff at a regular interval. Right? So I can't go silent for, like, a month or 2. I have to, like, keep people in the loop. So I always have to, like, come up with real estimates, like, how you do basically in every job. But now you are really committed to them because you then have to write something about it and, basically, also on time. And that's challenging because, like, for example, implementing deleting images took me, like, an hour, but uploading images took me, like, 2 weeks. There's no consistency to it because you run into, like, unexpected functionalities or unexpected ways something was implemented. And sometimes it takes a long while to debug that, especially if you work with a project that you took over and did not implement everything yourself.
Yeah. It's really challenging to break the work into chunks that are small enough that you can deliver something at a regular pace.
[00:38:39] Unknown:
For people who are looking for solutions on how to manage their photos and they want to maybe have different use cases around sharing or automatic album creation or, you know, facial detection. What are some of the cases where LibrePhotos is the wrong choice and maybe they're better suited with one of these other self hosted options, or they're actually better suited with a commercial option or just keeping everything on their desktop? So what are some of the situations where you would steer people away from trying out LibrePhotos?
[00:39:09] Unknown:
There are a couple of people who are really, like, intolerant of errors. And if you're somebody like that, then LibrePhotos is not the right choice for you. Because I try to, obviously, not create any bugs, but I can't have, like, a QA pipeline where I check everything twice or 3 times. So stuff will break from time to time. If you can't handle that or you are not allowed to with however your system works, then that's not the right solution for you. I also created, like, a comparison page. So that's basically in our documentation right now, and then you can look over, okay, is my use case actually implemented yet?
And I also put there the other open source solutions. So you can quickly compare, okay, that solution can do this, but not that, and you can figure that out yourself.
[00:40:01] Unknown:
Given the fact that all software is a moving target, how have you been approaching trying to keep that as accurate as possible without sort of losing your sanity in the process?
[00:40:11] Unknown:
I just go from week to week. I'm not trying to develop a huge road map. I'm not a fan of that because, like, stuff is constantly changing, and I'm just trying to make it basically to the end of the week and have something that I can show off, like, maybe a bug fix, maybe a feature. That's, like, a better way to work on that than have grand plans to, I don't know, have the best machine learning platform of every self hosted solution. Because sometimes you can plan for that, but you don't have the resources for that. So just implementing stuff each week, and then you will get there.
[00:40:49] Unknown:
And as you continue to evolve and maintain the project, what are some of the things you have planned for the near to medium term or any areas that you're excited to explore and decide whether or not you want to take the project in a certain direction or add new major features? And in that vein as well, any particular areas of contribution or support that you're looking for help with?
[00:41:11] Unknown:
Right. So any technical help is appreciated. So if you know Python, if you know JavaScript, if you know Docker or Docker Compose, then your help is more than welcome. I'm always looking into lowering the bar of entry. So I'm not a fan of, like, getting requests each week: How does Docker Compose work? And can you send me the installer for Windows? We don't have that yet. So that's obviously something I'm going to work on. The single Docker container is the next feature in the near term. And in the long term, make it even simpler. Like, maybe even, like, a Flatpak for Linux or, like, a Windows installer, which will probably be a huge pain, but that would be great.
Obviously, like, extending the functionality we currently have, we have, like, a lot of, like, partly implemented features, and really nailing these features would be great. Especially, like, the app is, I think, like a huge thing if we get this right. If backing up stuff automatically works, then this will be a system you can really work with. I'm definitely looking forward to working on that too. And I'm also absolutely looking forward to working with more people together because most stuff I learn on this project doesn't come from me googling stuff, but other people who come into this project and say, hey. Maybe we should do this that way or this way, which helps me grow. Right? And that's obviously a huge part of open source development too.
[00:42:41] Unknown:
Are there any other aspects of the LibrePhotos project or personal photo management or building self hosted services that we didn't discuss yet that you'd like to cover before we close out the show?
[00:42:53] Unknown:
I wouldn't have thought that this project would be so successful when I just took over the ownphotos branch, basically. I think if you're a software developer looking for something for, like, a real side project that you can be proud of, just look for a solution which isn't maintained anymore, but still has a lot of users. And I think you can provide a lot of value there.
[00:43:13] Unknown:
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And with that, I'll move us into the picks. This week, I'm going to choose a movie I watched over the weekend called Uncharted. Just a fun kind of adventure movie. It's based on, you know, a set of characters who are exploring, looking for lost treasure from Magellan when he was circumnavigating the globe. So just a fun, entertaining movie. And so with that, I'll pass it to you, Niaz. Do you have any picks this week?
[00:43:47] Unknown:
Yeah. So I got my Steam Deck now, finally. I'm a Q2 guy, and it's right behind me. And, yeah, it works really well. I'm really happy that Linux really is now up to a point where it's so stable and polished. Like, if you're using the system, you don't think about Linux, which means they have done everything right. And, yeah, I'm really looking forward to seeing how the whole ecosystem is evolving.
[00:44:08] Unknown:
Alright. Well, thank you very much for taking the time today to join me and share the work that you've been doing on LibrePhotos. It's definitely a very interesting project and one that I plan on taking a look at for my own photo management purposes and being able to potentially add in some new capabilities. So thank you again for all of the time and effort that you've put into building and maintaining that project, and I hope you enjoy the rest of your day. You too. Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast at dataengineeringpodcast.com for the latest on modern data management.
And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email host at podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction to the Episode and Guest
Niaz's Journey with Python and Side Projects
The Birth and Growth of LibrePhotos
Core Objectives and Community Building
User Experience and Feature Prioritization
Challenges with Advanced Photo Formats
Architectural Design of LibrePhotos
Machine Learning in LibrePhotos
Resource Constraints and Compatibility
Storage Solutions and Extensibility
Interesting Use Cases and Scaling Challenges
When LibrePhotos Might Not Be the Right Choice
Future Plans and Contributions Needed