Summary
The companies that we entrust our personal data to are using that information to gain extensive insights into our lives and habits while not always making those findings accessible to us. Pascal van Kooten decided that he wanted to have the same capabilities to mine his personal data, so he created the Nostalgia project to integrate his various data sources and query across them. In this episode he shares his motivation for creating the project, how he is using it in his day-to-day, and how he is planning to evolve it in the future. If you’re interested in learning more about yourself and your habits using the personal data that you share with the various services you use, then listen now to learn more.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
- You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Corinium Global Intelligence, ODSC, and Data Council. Upcoming events include the Software Architecture Conference in NYC, Strata Data in San Jose, and PyCon US in Pittsburgh. Go to pythonpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.
- Your host as usual is Tobias Macey and today I’m interviewing Pascal van Kooten about his Nostalgia project, a nascent framework for taking control of your personal data
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by describing your mission with the Nostalgia project?
- How did the topic of personal data management come to be a focus for you?
- What other options exist for users to be able to collect and manage their own data?
- What capabilities were lacking in those options that made you feel the need to build Nostalgia?
- What is your target audience for this set of projects?
- How are you using Nostalgia in your own life?
- What are some of the insights that you have been able to gain as a result of integrating your data with Nostalgia?
- Can you describe the current architecture of the Nostalgia platform and how it has evolved since you began work on it?
- What are some of the assumptions that you are using to direct the focus of your development and interaction design?
- What is the minimum number of data sources needed to make this useful?
- What are some of the challenges that you are facing in collating and integrating different data sources?
- What are some of the drawbacks of using something like Nostalgia for managing your personal data?
- What are some of the most interesting/challenging/unexpected aspects of your work on Nostalgia so far?
- What do you have planned for the future of the project?
Keep In Touch
Picks
- Tobias
- Pascal
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
Links
- timeliner
- qs_ledger
- Nostalgia
- Shrynk
- Whereami
- R Language
- DuckDuckGo
- Caddy
- Perkeep
- Dark Programming Language
- Pandas
- Neo4j
- Pandas Extension Arrays
- Parquet
- ElectronJS
- Zincbase
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With 200 gigabit private networking, scalable shared block storage, node balancers, and a 40 gigabit public network, all controlled by a brand new API, you've got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. And they also have a new object storage service to make storing data for your apps even easier.
Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today to get a $20 credit and launch a new server in under a minute. And don't forget to thank them for their continued support of this show. And you listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers, you don't want to miss out on this year's conference season. We have partnered with organizations such as O'Reilly Media, Corinium Global Intelligence, ODSC, and Data Council.
Upcoming events include the Software Architecture Conference, the Strata Data Conference, and PyCon US. Go to pythonpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.
[00:01:39] Unknown:
Your host as usual is Tobias Macey. And today, I'm interviewing Pascal van Kooten about his Nostalgia project, a nascent framework for taking control of your personal data. So, Pascal, can you start by introducing yourself?
[00:01:50] Unknown:
Yeah. Hi. So my name is Pascal. I'm 30 years old, and I'm currently a freelance Python software developer. Before that, I was a lead data scientist and consultant for five years, having worked at over 25 companies. I really love innovation, and I've been an active contributor to open source projects, mostly machine learning. So, for example, I made Shrynk, a package that uses machine learning to predict which compression to use for your data. And another one is whereami, which uses WiFi signals and machine learning to predict where you are. And, yeah, I have also spoken at conferences such as EuroPython.
[00:02:24] Unknown:
And do you remember how you first got introduced to Python?
[00:02:26] Unknown:
Yeah. I started programming using the R programming language seven years ago while studying methods and statistics. And I got an opportunity to work at Cito. It's a Dutch organization that, amongst other things, calibrates high school tests, and they needed a GUI for one of their expert systems. They knew that I didn't have much programming experience, but it was fine. And, well, I knew R wouldn't be good for building a GUI, so I asked a mentor of mine what I should use. And even though he himself used PHP, he made a great suggestion for me to use Python. So, yeah, soon after, I started using it to compete in Kaggle machine learning competitions, and I started using it daily, both professionally and for side projects. So, yeah, I've done something like 100 side projects.
[00:03:15] Unknown:
But, yeah, Nostalgia is definitely my most ambitious one yet. And so in terms of the Nostalgia project itself, can you just start by describing your mission with it?
[00:03:30] Unknown:
Yeah. So it's a Python framework, really a lot of repositories. But yeah. So the main mission here is that Nostalgia is pursuing three goals: get the data back in your own hands, ensure privacy and control for you, that is, the individual, and allow the user to utilize the data to explore more about themselves, answer questions, and help them achieve personal goals. So very often, people say that not tracking is the solution to privacy. So they come up with solutions like DuckDuckGo for browsing, which avoids tracking. But I think that's suboptimal. I still want to have that data. I want to use it however I want. And at the same time, I acknowledge that it's a very difficult issue, and it requires major changes in how we handle data. But I believe we are going in that direction. And so at this point, Nostalgia is all about obtaining your existing data, keeping it only on your own devices, and allowing you to use it. So you mentioned that
[00:04:23] Unknown:
one of the things that you're trying to do with it is being able to gain some value from your own personal data that you're already sending off to some of these third-party services. And I'm wondering if you can just describe a bit more about how the overall topic of personal data management came to be a focus for you. Okay. So,
[00:04:40] Unknown:
it started with wanting to get insights into how I spent money, actually. And I realized I wanted to filter on location, but, obviously, the bank does not provide you that as a filter. So then I realized I should be able to use my GPS data. And I started scraping my Google Timeline data for personal use, which has information like: you were at home between 9:01 PM and 10:05 PM, and you were driving for the next 35 minutes, for example. So as I further progressed, I actually became obsessed with having access to all my data. Another example was when I was working on multiple projects. I made sure my text editor tracks which files I open. And so, aggregating that, I had a very nice overview of which projects I worked on at what time, without having to keep track of it manually.
So for data that I felt was missing, I started to think of ways to have it automatically tracked. And, yeah, another example would be that I created a Chrome extension, as part of Nostalgia, that
[00:05:42] Unknown:
monitors play and pause events on videos in your browser. And so I actually think that I'm the only one with a history of which videos I've been watching, at what time, and for how long. Yeah. I could definitely see how that can be interesting, to go back and get a more high-level view of your habits and some of the day-to-day interactions that you're having with your technology, and, for instance, as you're saying, with your editor, to be able to see what projects you're working on. Right. Yeah. Definitely. And so I know that there are a number of other platforms or systems that you can use for being able to track some of those types of information. And I'm wondering if you can give a bit of an overview of some of the other tools or systems that you tried out before building Nostalgia, as a way to collect and manage your data and get the same types of insights as what you're using Nostalgia for? Yeah. So, actually,
[00:06:31] Unknown:
as you noticed, I started kind of naturally with, you know, simple scripts here and there. But, indeed, at one moment, I started looking at what is around. And I noticed, for example, timeliner, which is a Go repository that helps collect data from sources. It was made by the Caddy web server creator, and it actually does a couple of nice things that Nostalgia doesn't do, like deduplication of data, and it's very memory efficient. And another one is Perkeep, which is simply about storing data and making backups, with the idea that even if data platforms like Twitter might disappear, you'll still always have your own copy. I don't know. I know there's a lot of apps that try to tackle one particular aspect, like tracking your mood or tracking a single thing. But, yeah, I did not find really one that
[00:07:26] Unknown:
allows you to track everything, let's say. And then in terms of the target audience for Nostalgia, I know that initially you started building this for your own purposes. But as you have started to build this out into more of a full-fledged project and release it as open source, who do you think would be able to most benefit from Nostalgia and has the necessary
[00:07:51] Unknown:
skills to be able to make use of it? So, ideally, I want to really target the end users, you know, so that everyone can be using it. But I think that's too early at this point. So at this moment, the beauty of the framework lies in its open source nature, and it's driven by the urge to connect everything and generate insights. And to do this, I believe that developers are actually the perfect group, because they always try to optimize stuff. So, initially, the target group would be Python developers. And, yeah, probably, maybe by using Docker, it will become available to a more generic audience. But, yeah, like I mentioned, the end goal should be to get this to as many people as possible, for it to become a natural part of their life and enable them to ask questions about their data, basically. And so beyond the initial use cases that you were mentioning, of being able to see what your travel times were at different points and the different projects that you're working on in your editor, what are some of the ways that you're actually using Nostalgia now, as you've started to add more capabilities and data sources to it? It's a tough thing; I'll get back to that. But, actually, I have a lot of examples. Let me give you three. So I had to get a visa in The Hague, and I had done that the year before. And I was wondering how much it cost the year before. I mean, it would be the same price as now, right? So I used Nostalgia to filter my payments in The Hague using my location data, and I immediately found the results. The other one would be comparing my heart rate data while I was in a traffic jam. And I noticed my heart rate was higher.
And I realized that I was mostly listening to music while driving, but it actually made me think. And I realized that listening to a podcast, for example, would provide just the right amount of distraction to calm me down. So, yeah, that's something I did. And another example would be that someone posted on LinkedIn an idea for making a startup about cloud deployments and how to make that easier and more generalized. And I remembered reading, at some time in the past, some article that described a similar idea, but I had no idea how to find it in my web history. But then it occurred to me that I do remember where I was at the time when I read it: I was at the Amsterdam train station. So, using Nostalgia, I was able to filter my browsing history to when I was at the station, using the location data. And then, again, I immediately found it. And, you know, rather than searching for "cloud" or "startup", which did not yield any results, it was actually an article about Dark, a new programming language. So for many, it's not that obvious to think in a cross-source way, and I even find it difficult myself. That's because, logically, we currently have no way of executing such queries, so there's no reason to think in a cross-source way. But I'm 100% sure that this will be a possibility in the future. And so while it might seem radical now, you know, even in the early days, people also thought Google was not necessary. Like, you could just type a URL, right? So, yeah, I think there will definitely be good use for cross-source querying.
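A rough sketch of what a cross-source query like the visa example could look like in plain pandas, joining payments to the nearest GPS fix by time; the data and column names here are hypothetical illustrations, not Nostalgia's actual API:

```python
import pandas as pd

# Hypothetical data standing in for a bank export and a Google Timeline export
payments = pd.DataFrame({
    "time": pd.to_datetime(["2019-03-01 14:05", "2019-06-12 09:40"]),
    "amount": [60.0, 12.5],
    "description": ["visa fee", "coffee"],
}).sort_values("time")

locations = pd.DataFrame({
    "time": pd.to_datetime(["2019-03-01 13:58", "2019-06-12 09:30"]),
    "city": ["The Hague", "Amsterdam"],
}).sort_values("time")

# Attach the nearest known location (within 30 minutes) to each payment,
# then filter the payments down to a single city
joined = pd.merge_asof(payments, locations, on="time",
                       direction="nearest", tolerance=pd.Timedelta("30min"))
print(joined[joined["city"] == "The Hague"])
```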
[00:11:06] Unknown:
Yeah. There's a quote. I'm probably going to misattribute it, but I believe it was either Bill Gates or one of the heads of IBM in the early days of the Internet saying that search engines weren't really useful, that somebody would just have maybe five or six sites that they would want to go to, and they would just keep track of those and bookmark them, and that beyond that, the Internet wouldn't really be that useful. And now, with the explosion of the web, they were obviously wrong in retrospect. Yeah. Definitely. Yeah. So you've mentioned a few of the different types of data that you're working with, and being able to do things like filter based on the location that you were at to be able to pull up an article that you were reading. I'm wondering if you can give a bit of an outline of the number of different data sources that you're able to pull in now, and talk a bit about how Nostalgia is built to be able to correlate across those different data sources? So,
[00:11:56] Unknown:
at the moment, I just checked, there are 29 different types of data. So that's indeed from all these different ecosystems. Think heart rate, location data, calendar, your Reddit posts, pretty much anything that I've been personally using, and some others have contributed. Now I've also lost where I wanted to go with this. So yeah. So, basically, what Nostalgia provides, when you have the connections, is this structure that I call a Nostalgia DataFrame. It's basically inheriting from a pandas DataFrame, but we've added a lot of convenience functions, so you can easily query it by, for example, natural language times and dates.
And it lets you do easy joins across data sources, by time, for example. Another one is full-text search, using a .containing method, for example. So there's a lot of convenience functionality, so that as long as you're connecting the source in that particular way, it will have the benefit of everything that we found useful as a community.
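A minimal sketch of the pattern described here, a pandas DataFrame subclass with convenience query methods; the class and method names are illustrative assumptions, not the project's actual code:

```python
import pandas as pd

class NostalgiaDF(pd.DataFrame):
    """A pandas DataFrame with a few convenience query methods."""

    @property
    def _constructor(self):
        # Make pandas operations return NostalgiaDF instead of plain DataFrame
        return NostalgiaDF

    def containing(self, column, text):
        # Naive full-text search on a single column
        return self[self[column].str.contains(text, case=False, na=False)]

    def between_times(self, start, end, column="time"):
        # Filter rows to a time window; start/end can be anything
        # pd.Timestamp understands, e.g. "2019-06-12 09:00"
        return self[self[column].between(pd.Timestamp(start), pd.Timestamp(end))]
```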
[00:12:55] Unknown:
And is there some common data format or schema that you're coalescing everything into to be able to join across them? Or are you relying on some set of common attributes among those different data sources to be able to correlate? Or is it largely just based on time, and seeing what the coincidence in time is across those different data sources? That's a very, very good question.
[00:13:24] Unknown:
And that's actually why I started with pandas, because I thought it would give me full flexibility. The other option was going immediately for a graph with Neo4j, which I did consider. But I thought, let's just start prototyping it with Python. So, I mean, integration like that is always tough, because on the one hand, you want to make data as structured as possible so it's easy to connect, but at the same time, you want to have a free schema so that it fits all the data. And, currently, Nostalgia offers both. So it is possible to use some of the data interfaces, such as chats, which require a from, a to, and a timestamp, for example. So, indeed, time is definitely a big part of this. And another one is payments, you know, with a to, a from, an amount, and a timestamp. At the same time, many other classes can inherit from the base object that is just a pandas DataFrame. And, actually, pandas can also deal with JSON. So it's really up to the source connection, you know, the application and the user, how to deal with it. So you can either use these common data interfaces, or, if you just want to get going with your new type of data, then you can just inherit from the base, basically.
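Continuing the sketch above, one way such a structured interface could be expressed, a subclass that requires the from/to/amount/timestamp columns mentioned here; again an illustrative assumption, not the actual implementation:

```python
import pandas as pd

class Payments(NostalgiaDF):  # NostalgiaDF as sketched earlier
    # Columns this interface expects; a loader for bank data would
    # rename its raw output to match these
    required_columns = ("from", "to", "amount", "timestamp")

    @classmethod
    def from_records(cls, records):
        df = cls(pd.DataFrame(records))
        missing = set(cls.required_columns) - set(df.columns)
        if missing:
            raise ValueError(f"payment source is missing columns: {missing}")
        return df
```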
[00:14:38] Unknown:
And have you looked at all into using some of the capabilities found in the pandas extension arrays for being able to build out custom data types, for doing some of this filtering and joining across the different data frames? No, that's actually new to me. So you should probably post a link so I can look that up. Yeah. So, well, actually, to add on to that,
[00:14:58] Unknown:
I basically also thought that, instead of trying to make the structure very strict, after we've actually connected something like 500 sources and a dozen applications have been integrated, I think that's when we will see really useful patterns based on actual existing data. So that's kind of the approach that I think would make sense. But, yeah, I'm still learning, and if there are better ways to do this, I'm all ears. Yeah. I'll definitely add a link in the show notes, and I actually did an interview with the person who implemented them a little while back, which I'll link to as well. But at a high level, it's a way to be able to add a plugin to pandas data frames that has a custom definition for data types, one example being geodata or IP addresses, and then having some custom logic for being able to process those different column types within the pandas DataFrame.
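For context, pandas ships a few extension dtypes of its own, and third-party packages (for example, geometry or IP-address arrays) plug into the same machinery; a small illustration with the built-in nullable integer type:

```python
import pandas as pd

# "Int64" (capital I) is a pandas extension dtype: a nullable integer
# column, unlike the NumPy-backed "int64", which cannot hold missing values
s = pd.Series([1, 2, None], dtype="Int64")
print(s.dtype)    # Int64
print(s.isna())   # False, False, True
```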
[00:15:42] Unknown:
That sounds super
[00:15:51] Unknown:
relevant, because I was also thinking about these common attributes. If you can define them in one way and then be able to reuse them, that would actually make a very good fit, I guess. And then, for the storage format for the data, what are you using to be able to actually
[00:16:10] Unknown:
keep track of all of it and maintain history across all the different data sources? Yeah. So it's kind of a custom
[00:16:16] Unknown:
ETL process in which everything is stored to disk, but using heavy compression. So it's very heavily compressed. Basically, the idea is that it would all be in the memory of one process. But, yeah, this is one of the things that could change. I think the pandas interface is very interesting, so I don't really want to jump ship that quickly and switch to a different storage solution yet. But, yeah, it's definitely something to think about. But, basically, it's stored as Parquet files, using compression.
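A minimal sketch of that kind of round trip with pandas; the file name and compression codec are just examples, and to_parquet needs pyarrow or fastparquet installed:

```python
import pandas as pd

df = pd.DataFrame({
    "time": pd.to_datetime(["2020-01-01 09:00", "2020-01-01 09:01"]),
    "heart_rate": [72, 95],
})

# Write to disk with a heavier codec than the default snappy
df.to_parquet("heart_rate.parquet", compression="brotli")

# Later, load everything back into the memory of one process
df = pd.read_parquet("heart_rate.parquet")
```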
[00:16:48] Unknown:
And what are some of the overall assumptions that you're using to direct the focus of your development, and some of the ways that you're thinking about the interaction and user experience design of the overall project?
[00:17:04] Unknown:
Yeah. That's a very, very good one. So, you know, mostly it has been driven by personal ideas. But as I'm seeing that there are more contributors, we're seeing more structure. And, basically, I think that rather than just thinking about which apps we want to add, we are also going to be focusing on making the barrier to entry as low as possible, so there can be many people adding their own source, but also getting quickly to the stage where they have their own data and they can actually start making applications for it. So I try not to have too many assumptions and, you know, think too much about what apps should be created.
But, yeah, there are ones, like, for example, creating the timeline in which you see all your data; that was a very natural one for me to want to have, to get this kind of overview of everything. And digging more into the timeline in particular,
[00:18:04] Unknown:
when I was looking through it while preparing for this interview, I noticed that it has a UI element to it. So I'm just wondering if you can talk through some of the structure, or how you approached building that one. Is it just a web UI that you load up, with a server process for populating it, or is it more of a desktop application? I'm just curious how you're approaching some of the more user-facing elements of building out the Nostalgia ecosystem. Yeah. Right. So,
[00:18:29] Unknown:
that's definitely a tough one, where I've struggled between: should we have a web client? Should we have mobile, or even use Electron, for example, for making a desktop application? So, yeah, there's no definite answer yet on that. The timeline in particular was just a very small JavaScript library I'm using that did exactly what I wanted it to, which is to have a timeline and, indeed, feed it from a server process. But, you know, this is something that was created in a few days, and I'm actually hoping someone will take it as inspiration and make the next version of it, for example. I definitely don't think its maximum potential has been reached there. But, yeah, I think having a good core is something that I should personally be focusing on, and then I hope that in the ecosystem there will be people that are just excited to create applications on top of it, whether it be web, Android, or desktop.
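As an illustration of that split, the server side could be as small as a local endpoint that hands the stored events to a JavaScript timeline widget as JSON; a hypothetical sketch using Flask, not the project's actual server:

```python
import pandas as pd
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/events")
def events():
    # Load locally stored events and serve them to the timeline widget
    df = pd.read_parquet("events.parquet")  # hypothetical file
    df["time"] = df["time"].astype(str)     # make timestamps JSON-friendly
    return jsonify(df.to_dict(orient="records"))

if __name__ == "__main__":
    # Bind to localhost only, so the data never leaves your machine
    app.run(host="127.0.0.1", port=5000)
```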
[00:19:34] Unknown:
And in terms of bringing more people into the community that you're trying to build around this, I'm wondering if you can just talk a bit about how much attention it has started to gain, and any feedback that you've gotten as a result of people starting to experiment with it and trying to use it for their own purposes.
[00:19:54] Unknown:
Yeah. So I think that the main thing is that as soon as you have it installed and have one source loaded, you'll immediately get hooked and want to add more. And, basically, our goal should be that if someone's source is not there, it should be as easy as possible for them to add that source. So I think that is really crucial there. I think I forgot part of the question, actually.
[00:20:23] Unknown:
Just wondering how much community adoption you've seen, and any feedback that you've gotten as a result of people starting to use it? Yeah. So, indeed, I have not had much
[00:20:34] Unknown:
community adoption, because I have not reached out anywhere. I've mentioned it, like, once in a comment on Reddit, for example. But I plan to; once we have the documentation even better and it's very clear for everyone how to use it, I think that's when we'll try to expose it to a larger audience. And, well, I guess then the real things start, like, how can we make it even easier, and stuff like that. And in terms of using Nostalgia
[00:21:04] Unknown:
for tracking and correlating your personal data, what are some of the drawbacks posed by using it, as compared to a managed service that has a lot of investment and time dedicated to it, such as Google or any of the other personal data services?
[00:21:24] Unknown:
Yeah. So, I mean, it does not offer any guarantees on your data. If your hard disk crashes, then everything is gone with it, in that sense. There are basically no backup facilities yet, for example. So that's definitely a drawback, and something where a managed service compares differently. But, I mean, at the same time, at least you are sure that you're the only one having access to it. So that's a good thing: it will only be on your local machine. Another drawback is that you don't have perfect integrations at the push of a button. Some sources are much more difficult, so they require some manual work for gathering the data. But the most important thing is that you do not have to actively record data; that's one of the main principles. You might have to do something to gather it, but not to record it. So in the heat of the moment, you don't have to record data. And I guess another downside is that it's really early in development, so we really don't have the answers to everything yet. But, yeah, we're working on it. And as you have started to build this out from a few scripts that you ran for your own personal purposes into more of a full-fledged project that you're trying to
[00:22:47] Unknown:
make usable for a broader array of people and make it fairly easy to add new data sources or
[00:22:54] Unknown:
new visualizations or integrations on top of it, what are some of the most interesting or challenging or unexpected lessons that you've learned in the process? Well, I think the thing is that it's a big ambition, and there are so many options. And the more people you talk to, the more ideas they have. So I think this is one of the challenging parts: so many things are possible, because it literally concerns all your data. So the possibilities are actually endless. But another thing is that I tried to come up with useful taxonomies, for example, but I basically gave up on that for now, and I think it should be driven by the ecosystem, like, let it kind of grow naturally.
And, well, another thing is that you always want to refactor; you always have this urge to improve stuff. But at the same time, I did a small refactor which made me lose part of it. You know, basically, when you want to make it available as soon as possible, you have to cut some corners. So I guess now it's about getting more requirements, getting more usage, and then we can redesign things, then we can actually do a refactor. But I don't think that should be done too early. And one of the things
[00:24:08] Unknown:
too is that, because this is focused on managing people's personal data, in some regards it's easy, as you start to think about what all the possibilities are, to end up in a situation where you're just recreating another managed SaaS platform for people to load all their data and integrations into. But at the same time, a big draw of the project as it stands right now is that it's giving control to each individual to manage their data on their own systems, without having to worry about what other servers or services are going to be accessing it. So I imagine that's something that you'll have to keep in mind as you start to build out more integrations and more capabilities, and start to think about how to make it a little bit more robust or resilient. Yeah. Definitely. Indeed. So,
[00:24:53] Unknown:
I mean, indeed, it's really about getting the data back in your own hands. And it's definitely not something we would consider, to be like a SaaS where you have all the data there. I think that would be really a big disaster. So that's definitely something that we want to avoid. But, yeah, at the same time, something we cannot avoid is the storage, and ensuring that everything will be safe. So, yeah, it's very, very interesting for us, and the last word has definitely not been said about it. And so what are some of the future directions that you have planned for the project, either in terms of new data sources or
[00:25:36] Unknown:
new applications to consume the data, or just overall updates to the system infrastructure and design? Yeah. So, postponing the redesign
[00:25:46] Unknown:
until we actually get usage; I think it will be fine, as long as we don't break any contracts with the user, it should be okay. As for the future, instead of trying to connect sources myself, I'm trying more and more to post issues on GitHub and see who will pick them up. Another very interesting topic that we want to tackle is entity resolution. So, for example, you're talking on Facebook about one person, but that person occurs elsewhere in your data as well. So it would be great if we could do something about connecting the people from one source to another; it's like cross-source entity resolution. And we always keep building on the taxonomy.
Right? So, as I mentioned, we do have some structured interfaces; I think there will be much more work on that. And as we add more duplicate sources, you know, from different providers, I think this will get cooler and cooler and more useful to everyone, basically. As for the apps, I did create a chatbot on top of it at one moment. So you could ask, for example, "how much did I spend in supermarkets last week in Amsterdam?", and it would be able to give you an answer. And we do this by just adding a decorator to your class, and then, basically, it will do all the NLP processing. So that's something that was taken out because I had to ship something, right? But I will definitely be working in the future on getting this chatbot part back. Yeah. It was relying on personal scripts, basically.
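A toy illustration of the cross-source entity resolution idea, matching names from one source against another using only the standard library; the names are made up, and real entity resolution would need far more than string similarity:

```python
from difflib import get_close_matches

facebook_contacts = ["Nikolai Petrov", "J. de Vries", "Anna Smit"]
bank_counterparties = ["N. Petrov", "Jan de Vries", "A. Smit", "Bakker BV"]

for name in facebook_contacts:
    # Fuzzy-match each contact against counterparty names from the bank
    match = get_close_matches(name, bank_counterparties, n=1, cutoff=0.6)
    print(name, "->", match[0] if match else "no match")
```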
[00:27:26] Unknown:
Yeah. And for some of the entity recognition piece too, that's starting to sound like building a personal knowledge graph, which I'm sure would also be valuable in the context of being able to build connections and build understanding about how your data works together, so it might be worth looking at. There's a project called Zincbase that I had on the show a while ago for being able to build out knowledge graphs in a fairly iterative fashion. So I'm definitely interested to see some of the directions that Nostalgia goes. And for people who are interested in it, what are some of the types of contributions that would be most valuable to you right now, or some of the types of skill sets or
[00:27:52] Unknown:
experiences that you're looking to try and bring into the project to help move it forward? Yeah. So there's a couple of skill sets we have identified; we even considered roles. So there's now one other contributor, my friend Nikolai. And, basically, he works on one part of it, mostly considering things like security, but also on how we can enable people to share their data, either with others or with, for example, universities to do studies. So it's very targeted on how you want to use it. So that's one of the roles that's taken now. But any of those topics that I mentioned before is something that we would love to see someone take the initiative on. And, other than that, you know, the classic: if you want to be an owner of, let's say, Google data, or one of those sources, that would be very valuable to us as a contributor. And I think it's actually quite fun; I personally do enjoy adding sources, but I should get my hands a bit off of that. And then, indeed, the applications themselves. So if someone would want to take up the timeline, for example, or wants to get involved with the chatbot, or maybe configuration management as an application that we could do.
So, yeah, there are definitely a lot of possibilities, and I just encourage people to come talk to us in either Discord or Slack, and we can see what's a good fit. I guess, lastly, there's a lot of need for people just taking the project and using it as users. And that means that you're trying to get insights, and you would share, like, pandas queries, basically; you could share those with the community. That would be very helpful and most likely very insightful. So these are just some of the things
[00:29:40] Unknown:
that come to mind. Alright. Well, are there any other aspects of the Nostalgia project, or the ways that you're using it, or some of your thoughts on personal data management, that we didn't discuss yet that you'd like to cover before we close out the show? I don't know. I'm just thinking. One important point, still, is that, actually,
[00:29:58] Unknown:
you know, "why now?" is a question that we get very often. And for that, it's useful to realize that because of GDPR, companies are more and more forced to provide you with your personal data. So it's a very natural moment for such a service to start appearing, something that will allow you to use your personal data. But let me think. I think most things were covered, actually. I don't think we've had a big miss.
[00:30:29] Unknown:
Well, for anybody who wants to get in touch with you or contribute to the project or follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And with that, I'll move us into the picks. This week, I'm going to choose the newest Jumanji movie, Jumanji: The Next Level. I watched it recently with my family, and it was hilarious and definitely worth a watch. I've enjoyed all of the Jumanji movies since the original with Robin Williams, and even if you don't watch any of the newer ones, that one is worth watching as well. So, if you're looking for something to keep you entertained, definitely give it a try. And with that, I'll pass it to you, Pascal. Do you have any picks this week? First of all, I'd like to second that. I mean, I also enjoyed the first one, and these two more recent ones, I very much enjoyed them as well. I think comedy is quite underrated by movie critics. It was very entertaining to me. As for my personal pick, I would like to
[00:31:24] Unknown:
suggest checking out bup. It's a kind of incremental backup tool. So, basically, you say something like, I want to have this folder backed up, and if you track it, let's say, hourly, you can jump back to any moment in time and see how a file looked at that time. So it's a very nice way to have an extra way of securing
[00:31:52] Unknown:
the code you're working on, for example, or to track changes of files, basically. Well, thank you very much for taking the time today to join me and explain the work that you're doing with Nostalgia. It's definitely an interesting project, and one that I am going to be keeping an eye on, and I wish you the best of luck with it. So thank you for all of your time and effort on that front, and I hope you enjoy the rest of your day. Thank you very much. You too.
[00:32:16] Unknown:
Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast, at dataengineeringpodcast.com, for the latest on modern data management. And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Sponsor Message
Interview with Pascal Van Cooten
Overview of the Nostalgia Project
Existing Tools and Target Audience
Use Cases and Examples
Data Sources and Integration
Development Focus and User Experience
Community Adoption and Feedback
Challenges and Lessons Learned
Future Directions and Contributions
Final Thoughts and Contact Information
Picks of the Week