Summary
Every day there are satellites collecting sensor readings and imagery of our Earth. To help make sense of that information, developers at the meteorological institutes of Sweden and Denmark worked together to build a collection of Python packages that simplify the work of downloading and processing satellite image data. In this episode one of the core developers of PyTroll explains how the project got started, how that data is being used by the scientific community, and how citizen scientists like you are getting involved.
Preface
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so check out Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API you’ve got everything you need to scale up. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute.
- And to keep track of how your team is progressing on building new features and squashing bugs, you need a project management system designed by software engineers, for software engineers. Clubhouse lets you craft a workflow that fits your style, including per-team tasks, cross-project epics, a large suite of pre-built integrations, and a simple API for crafting your own. Podcast.__init__ listeners get 2 months free on any plan by going to pythonpodcast.com/clubhouse today and signing up for a trial.
- Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email hosts@podcastinit.com
- To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
- Your host as usual is Tobias Macey and today I’m interviewing Martin Raspaud about PyTroll, a suite of projects for processing earth observing satellite data
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by explaining what PyTroll is and how the overall project got started?
- What is the story behind the name?
- What are the main use cases for PyTroll? (e.g. types of analysis, research domains, etc.)
- What are the primary types of data that would be processed and analyzed with PyTroll? (e.g. images, sensor readings, etc.)
- When retrieving the data, are you communicating directly with the satellites, or are there facilities that fetch the information periodically which you can then interface with?
- How do you locate and select which satellites you wish to retrieve data from?
- What are the main components of PyTroll and how do they fit together?
- For someone processing satellite data with PyTroll, can you describe the workflow?
- What are some of the main data formats that are used by satellites?
- What tradeoffs are made between data density/expressiveness and bandwidth optimization?
- What are some of the common issues with data cleanliness or data integration challenges?
- Once the data has been retrieved, what are some of the types of analysis that would be performed with PyTroll?
- Are there other tools that would commonly be used in conjunction with PyTroll?
- What are some of the unique challenges posed by working with satellite observation data?
- How has the design and capability of the various PyTroll packages evolved since you first began working on it?
- What are some of the most interesting or unusual ways that you have seen PyTroll used?
- What are some of the lessons that you have learned while building PyTroll that you have found to be most useful or unexpected?
- What do you have planned for the future of PyTroll?
Keep In Touch
- Martin
- mraspaud on GitHub
- @MartinRaspaud on Twitter
- Pytroll
- Website
- Slack
- Mailing List
- @PyTroll on Twitter
Picks
- Tobias
- Tool
- A Perfect Circle
- Martin
- Vulfpeck
Links
- PyTroll
- Swedish Meteorological and Hydrological Institute
- Common Lisp
- Danish Meteorological Institute
- Trolls in Scandinavian Lore
- NumPy
- KISS (Keep It Simple Stupid)
- Spectroscopy
- Radiance
- Polar Orbiting Satellite
- Geostationary Satellite
- EUMETSAT
- SatPy
- PyResample
- Cartographic Projection
- Proj4
- GOES16
- GOES17
- Dask
- Data Engineering Podcast Episode
- NetCDF
- HDF5
- PySpectral
- PyCoast
- SupervisorD
- TrollCast
- European Space Agency
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it, so check out Linode. With 200 gigabit private networking, scalable shared block storage, node balancers, and a 40 gigabit public network, all controlled by a brand new API, you've got everything you need to scale up. Go to pythonpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And to keep track of how your team is progressing on building new features and squashing bugs, you need a project management system designed by software engineers for software engineers.
Clubhouse lets you craft a workflow that fits your style, including per team tasks, cross project epics, a large suite of prebuilt integrations, and a simple API for crafting your own. Podcast.__init__ listeners get 2 months free on any plan by going to pythonpodcast.com/clubhouse today and signing up for a free trial. And visit the site at pythonpodcast.com to subscribe to the show, sign up for the newsletter, and read the show notes, and keep the conversation going at pythonpodcast.com/chat.
[00:01:25] Unknown:
Registration for PyCon US, the largest annual gathering across the community, is open now. So don't forget to get your ticket, and I'll see you there. Your host as usual is Tobias Macey. And today, I'm interviewing Martin Raspaud about PyTroll, a suite of projects for processing Earth observing satellite data. So, Martin, could you start by introducing yourself?
[00:01:44] Unknown:
Thank you for having me, Tobias. So I'm a computer scientist. I'm working at the moment at the Swedish Meteorological and Hydrological Institute as a software engineer, and I've been doing that since 2009. And as you said, I'm one of the lead contributors of PyTroll.
[00:02:06] Unknown:
And do you remember how you first got introduced to Python?
[00:02:08] Unknown:
Well, it was when I started working at the Swedish Meteorological and Hydrological Institute in 2009. I was a Common Lisp believer before that, and I got introduced to Python because, for the field of earth observation satellite processing here at SMHI, they had started with Python already at the end of the nineties. So that's how I got into it.
[00:02:37] Unknown:
And so can you start by giving an overview of what the PyTroll project is and how the overall effort got started?
[00:02:44] Unknown:
Yes. Maybe I'll start with how it got started, because I think that explains a lot of what it is actually. In the beginning, the objective was that we wanted to have a good earth observation satellite processing system at our institute, but there were no alternatives that we could use, because it was either something quite expensive that we had to buy from a company, or we had to develop our own. But since we were just 2 persons working with that at our institute, it felt like a daunting task. And luckily, my colleague and I were attending some international meetings and met other colleagues from the Danish Meteorological Institute who had the same problem as we had. There were also only 3 persons working on that, but we had basically the same needs, and we were using the same tools, that is, Python.
So in the beginning, or rather at the end, of 2009, we had a meeting between our institutes, where we sat, the 5 of us, and we decided that we'd start sharing things, because I had started working on a small prototype for reading satellite data in a unified manner. And at the Danish Meteorological Institute, they had started working on their resampling library, which is something that is very important in our field. And that's how things got started. And so from that, you can guess what PyTroll is, more or less, I suppose. It is a collection of a lot of free and open source Python modules that we use for reading, processing, and writing earth observation satellite data, and a lot of different modules that we use as helpers for 24/7 operations to get things running smoothly in our institutes.
[00:04:42] Unknown:
And is there any particular story behind the name itself?
[00:04:46] Unknown:
Yes. Because it started as a collaboration between SMHI and DMI, we wanted something that reminded people that it was from Scandinavia, and trolls are a part of the Scandinavian folklore. So that's how it started, and we decided to go for PyTroll.
[00:05:09] Unknown:
Yeah. When I saw the name and then was looking closer and saw that it was from the Swedish institute, I assumed that was most likely the case, but I thought it was kind of funny, particularly given the associations that have come up with trolls and trolling in general in the Internet era, so it was good to see it brought back to the original roots of trolls and some of the more positive and cultural aspects of it. Yes. Exactly. Trolls are not only
[00:05:39] Unknown:
negative in the Scandinavian folklore. There are nice trolls also.
[00:05:45] Unknown:
And so for the main use cases for PyTroll, you mentioned being able to consume and process and write the information. But in terms of the processing and the analyses that you can perform on the data once you've retrieved it, I'm curious, what are some of the main ways that people are leveraging this suite of tools?
[00:06:08] Unknown:
Yes. I think there are different kinds of users for PyTroll, of course. But I would say that it's mostly 2 categories. The first category is probably those that started the projects, which are national meteorological institutes, mostly around Europe, that use PyTroll for operational data processing, and mostly for generating imagery that you could send later on to users or customers. So it's operational tools that make sure that the satellite data is received and batch processed without any need for human intervention. So that's one big part of what the PyTroll use cases are. The other use case is for interactive data analysis. You get your satellite data, and your scientists who want to work with the data look at it in different ways, plot it, generate some imagery, maybe to find out some features for a given scene that you're working on. So that's the second use case, I would say.
[00:07:07] Unknown:
Since the overall project is an umbrella for these different modules, I'm wondering if there have been any particular development approaches or challenges that you faced in terms of ensuring that these different modules are able to integrate easily and produce an overall workflow that's fairly smooth without having any inconsistencies in terms of the style of the APIs or the ways that the data representations are expected between the different packages?
[00:07:38] Unknown:
Yes. So one thing that we did is decide from the beginning that we would try not to have any custom data types, and try to stick to NumPy arrays in the beginning. So that was going to be the data format that we were going to use internally in the packages, but also to exchange between different packages through the APIs. So that was one very important aspect: try to use as much as possible standard data structures so that we wouldn't have any problems later on. And then when it comes to the API itself, I mean, we really try to keep it simple and stupid. The different functions, the different methods that we use, the different objects, we try to keep them very obvious in what they are going to do, so that the user doesn't get any surprises later on, and that helps us a lot also for maintainability, of course.
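A minimal sketch of that convention (the function names here are hypothetical, not actual PyTroll APIs): every module accepts and returns a plain NumPy array plus an ordinary metadata dict, so packages compose without custom container types.

```python
import numpy as np

# Hypothetical reader: returns a plain ndarray plus a metadata dict,
# rather than a custom container type.
def read_channel():
    data = np.linspace(0.0, 100.0, 12, dtype=np.float64).reshape(3, 4)
    meta = {"units": "W m-2 sr-1 um-1", "name": "vis_06"}
    return data, meta

# Hypothetical processing step: consumes and produces the same plain types,
# so any package in the chain can interoperate with it.
def rescale(data, meta, gain=0.5, offset=1.0):
    return data * gain + offset, {**meta, "calibrated": True}

radiances, meta = read_channel()
calibrated, meta = rescale(radiances, meta)
```

Because only standard types cross the API boundary, a user can slot in their own NumPy-based step anywhere in the chain.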
[00:08:37] Unknown:
And as far as the types of data that you would be processing, I'm wondering if there are particular types that are more prevalent in the satellite data that you're working with, as far as whether it's images or sensor readings in terms of spectroscopy or anything along those lines, and the ways that the data is represented when you retrieve it from the different satellites that you're working with?
[00:09:03] Unknown:
Yes. We receive the data from different satellites. So, of course, it's different data formats, different data types. But what we mostly work with is data that is already calibrated and navigated for us from different imaging sensors. When I say calibrated, I mean that it's physical values for radiances. So you mentioned spectroscopy. It's something related to that. So we have the radiances, and then they can be calibrated further to reflectances, which show how much of the energy that comes from the sun is reflected by the earth, or brightness temperatures when we get to the infrared part of the spectrum, where you actually try to compute a temperature for that wavelength that is emitted by the Earth.
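As a rough illustration of that last step, a brightness temperature comes from inverting the Planck function at a channel's central wavenumber. The constants below are the standard radiation constants in cm-based units; the 930 cm⁻¹ wavenumber is just an example near a typical 10.8 μm infrared channel, and operational processing adds per-band correction coefficients on top of this.

```python
import math

C1 = 1.191042e-5   # first radiation constant, mW m-2 sr-1 (cm-1)-4
C2 = 1.4387752     # second radiation constant, K cm

def brightness_temperature(radiance, wavenumber=930.0):
    """Invert the Planck function: radiance in mW m-2 sr-1 (cm-1)-1 at the
    given central wavenumber (cm-1) -> brightness temperature in kelvin."""
    return C2 * wavenumber / math.log(1.0 + C1 * wavenumber ** 3 / radiance)

t = brightness_temperature(80.0)   # roughly 279 K, a plausible surface scene
```

Higher measured radiance means a warmer scene, so the function is monotonically increasing in radiance.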
[00:09:56] Unknown:
When you're retrieving the data, do you interact directly with the satellites and perform API calls against them, or is there some intermediary that you're working with that prefetches the information, so that you don't have to deal with issues with bandwidth or latency in terms of your own workflows?
[00:10:16] Unknown:
Actually, both options are things that we do. We do not interact with the satellites in the sense that we cannot control them, of course. But for the polar orbiting satellites, the ones that rotate around the earth all the time, we receive some of those directly from an antenna that we have on the roof of our institute, for example. And for that, the data that we receive is actually raw data, so to say. We do not process this with PyTroll directly. We have some processing packages that we use that are provided by the different space agencies or institutes that are responsible for the satellites.
And these processing packages allow us to transform this raw data into something that is a bit higher level, that we can then work with in an easier manner. Now that's just a small fraction of the satellite data that we process, at least here at our institute. There are also geostationary satellites, for example, and those data are received mostly by the organizations that are responsible for them, and they are then redistributed in different manners. Some of them are available on the Internet, but since it's quite large data volumes, the organization that is responsible for meteorological satellites in Europe, which is called EUMETSAT, is providing a service to retransmit satellite data through a, how do you call that, a DVB broadcast service, which is a satellite service like you would use to receive TV channels, for example.
So we have parabolic dishes pointed at the satellite to receive a stream of data, and this stream of data is then decoded, and we have the files as we expect them directly, because they are preprocessed at EUMETSAT. And so we get the ready files then directly on our servers that we can work with. And then there are some satellites that we just have to go polling and fetching on some Internet sites. I'm thinking, for example, of the Sentinel data, which is data from the European Space Agency satellites. And they're quite large, but they decided to distribute them via the Internet.
So that's the way we retrieve them.
[00:12:50] Unknown:
And in terms of selecting which satellites you wanna retrieve data from, I'm curious what the criteria are for determining which ones have the information that you're looking for and how you locate and initiate communication with them?
[00:13:06] Unknown:
Yeah. So, of course, it's mostly driven by what our customers and users want, but we try to, how can I say, we try to keep up with the knowledge so that we can propose the services to our customers. And what we do then, when we decide to retrieve data from a satellite, is either we get it, as I said, via EUMETSAT, or, if it's something that we want to get from a polar orbiting satellite that we can receive directly with our antenna, then we have a PyTroll module that is called pytroll-schedule, which allows us to determine which satellite data we should retrieve at a given moment, because sometimes several satellites are passing at the same time, so you have to choose.
And this module, pytroll-schedule, actually has different weights, so to say, for different satellites. So we know which satellite is better than the other, which one is more interesting than the other. And from there, we give it an area of interest that the satellite data has to cover. And from there, using a quite simple algorithm, pytroll-schedule tries to generate a reception schedule for which satellite data to receive at any given moment.
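The idea can be sketched as a toy greedy scheduler (the satellite names and weights below are made up, and the real pytroll-schedule algorithm is considerably more sophisticated): score each candidate pass by its priority weight times its coverage of the area of interest, then keep the best-scoring passes that don't overlap one already kept.

```python
from dataclasses import dataclass

@dataclass
class Pass:
    satellite: str
    start: float     # minutes from now
    end: float
    coverage: float  # fraction of the area of interest covered, 0..1

# Hypothetical per-satellite priorities.
WEIGHTS = {"NOAA-20": 1.0, "Metop-B": 0.8, "NOAA-19": 0.5}

def schedule(passes):
    """Greedy: take the highest-scoring passes first, skipping any pass
    that overlaps in time with one already kept."""
    scored = sorted(passes,
                    key=lambda p: WEIGHTS.get(p.satellite, 0.1) * p.coverage,
                    reverse=True)
    kept = []
    for p in scored:
        if all(p.end <= k.start or p.start >= k.end for k in kept):
            kept.append(p)
    return sorted(kept, key=lambda p: p.start)

passes = [Pass("NOAA-19", 5, 20, 0.9),
          Pass("NOAA-20", 0, 15, 0.6),
          Pass("Metop-B", 30, 45, 0.5)]
plan = schedule(passes)
```

Here the NOAA-19 pass, despite its better coverage, loses to the overlapping and higher-weighted NOAA-20 pass, while the non-conflicting Metop-B pass is kept.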
[00:14:31] Unknown:
And the different satellites are produced and managed by different agencies or organizations, and I'm sure they have different vintages in terms of how long they've been up there and the types of sensors and technology that they've got on board. I'm curious how that manifests in terms of the cleanliness and uniformity of data, and some of the issues that you face as far as being able to clean it into a fairly uniform representation, or any data integration challenges that you face when fetching data from multiple different satellites?
[00:15:06] Unknown:
Yes. I mean, of course, the data that we receive is most of the time quite clean. It's only the older satellites that have problems with that sometimes, but we don't have any data enhancement algorithms that we use directly on our side. I think we decided to deliver the data in a way that is mostly true to the way we receive it, so that if there are some imperfections in the data, we don't want to mask them, because we don't want to have to guess what kind of data is hidden by these inconsistencies or noise and end up inventing data. So we try to keep it as true as possible.
Now the newer satellites, they don't have this problem most of the time, because they come with error correction codes within the data, so that you know directly when you receive the data from the satellites if there are some errors in the data and if you can correct them. And if you can't, you just discard this small piece of the data and go on with the rest.
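That detect-correct-or-discard idea can be illustrated with a toy Hamming(7,4) code (real satellite downlinks use much stronger schemes, such as Reed–Solomon or convolutional codes): any single flipped bit in a 7-bit block can be located from the parity syndrome and corrected.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """Return (data_bits, error_position); position is 1-based, 0 if clean.
    A single flipped bit is corrected; the syndrome spells out its position."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s3 * 4 + s2 * 2 + s1
    c = list(c)
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], pos

codeword = hamming74_encode([1, 0, 1, 1])
codeword[4] ^= 1                               # simulate a transmission error
data, flipped = hamming74_decode(codeword)     # recovers [1, 0, 1, 1]
```

With two or more flipped bits in a block, the correction fails, which is when a receiver would discard that piece and move on, as described above.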
[00:16:25] Unknown:
And as far as the overall workflow of retrieving and processing the data, I'm wondering if you can walk through the overall process and some of the main packages and components within the PyTroll suite that you would use at the various stages?
[00:16:43] Unknown:
Yes. So for someone that wants to process the weather satellite data or earth observation satellite data, it really depends on your use case. If you're looking to work interactively with the data, then you will start with a component that's called SatPy. SatPy is the module that glues together most of the different processing modules from PyTroll, so that it gives one uniform interface to many features. And it's quite compact, so you just need, like, 5 or 10 lines of code to generate an image from satellite data that is already supported.
And the steps that you would do for that are: first, you would be reading the data, which is available in different formats. We have different readers for many different formats, and we have also documentation that explains how you can implement your own reader for SatPy, which is very nice, because then different users that have data that we are not used to can actually write their own readers for that. The second step would typically be that you resample the data. Resampling means that you have the raw data from the satellite that you received, which is often in satellite projection, which means that it's just presented as an array in the way that the satellite sees it.
And then for each pixel, you would have longitude and latitude coordinates corresponding to it. What you want to do after that is probably reproject this information into another cartographic projection, so that your user will not be confused by how the data looks, so that they would have their usual data domain that they work with, and provide them with the data the way they want. Once this is done, what you do generally is generate products, and then either you show it on screen if it's for interactive use, or you save it to disk. That's for the interactive case. But SatPy is also used in operations, within a bigger, let's say, context, where we have packages that allow us to do batch processing, that do event driven processing for satellite data. That is, we have a module that waits for the data to arrive, and when it arrives, it sends messages. We use messaging queues for this, and the different modules then are running in a chain to do the batch processing.
And one of those modules is using SatPy in the background to generate all these images and save them to disk. And those files then can be distributed further with all the tools that we have.
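The resampling step described above can be sketched in a few lines of plain NumPy. This is a deliberately brute-force nearest-neighbour search on a toy 2×2 swath; PyResample does the same job far more efficiently with KD-trees and proper geographic distances.

```python
import numpy as np

def nearest_neighbour_resample(lons, lats, values, grid_lons, grid_lats,
                               max_dist=1.0):
    """For each target grid point, pick the value of the closest swath pixel,
    or NaN if nothing lies within max_dist (degrees, flat-earth shortcut)."""
    src_lons, src_lats, src_vals = lons.ravel(), lats.ravel(), values.ravel()
    out = np.full(grid_lons.shape, np.nan)
    for idx in np.ndindex(grid_lons.shape):
        d2 = (src_lons - grid_lons[idx]) ** 2 + (src_lats - grid_lats[idx]) ** 2
        nearest = np.argmin(d2)
        if d2[nearest] <= max_dist ** 2:
            out[idx] = src_vals[nearest]
    return out

# Tiny swath in "satellite projection": every pixel carries its own lon/lat.
lons = np.array([[10.0, 11.0], [10.0, 11.0]])
lats = np.array([[50.0, 50.0], [51.0, 51.0]])
vals = np.array([[1.0, 2.0], [3.0, 4.0]])

# Regular target grid near those pixels.
glons, glats = np.meshgrid([10.1, 10.9], [50.1, 50.9])
resampled = nearest_neighbour_resample(lons, lats, vals, glons, glats)
```

Each target pixel simply inherits the value of the nearest source pixel; points too far from any observation stay NaN rather than being invented.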
[00:19:47] Unknown:
And in terms of particularly the batch processing, but also I'm sure in the interactive case, as you mentioned, some of the volumes of data can be quite large. So I'm curious how you approach scaling in order to be able to keep up with the data feeds if you're doing an event driven approach, as you mentioned, and processing the data as it comes in?
[00:20:12] Unknown:
Yes. When we send messages through message queues, it's just the metadata. We do not send the data itself through messages, because that would be too much stress on the network, on the local network, I mean. So we exchange the metadata. And then when the file is not on the right server, we have a module that knows how to send files via FTP or secure copy or things like that. Now SatPy, for example, has to deal with the data. And as you said, it's big data. And another issue is that the users want this data or the imagery fast. They don't want to wait 2 hours to get information.
The meteorologists, for example, that are working in our institute, if the data is older than half an hour, then it's not really interesting anymore. So it puts some constraints on how fast we can process things. And the data from the older satellites is not a problem, because they are smaller. The resolution is not as good as the newer satellites. But for the newer satellites, especially the new geostationary satellites that have been launched recently in the US, I'm thinking of the GOES 16 and GOES 17 satellites, the data is really big.
And for that, we needed to have a look at how to optimize our processing, especially in SatPy. And what we decided, after trying to fiddle with NumPy and seeing that the code was getting quite unmaintainable because the optimizations to avoid copies of arrays and stuff like that were rendering the code quite ugly, was to go over to another library, which is called Dask. And Dask allows us to perform as much as possible out-of-memory processing. The data is computed lazily. So we avoid a lot of these problems that we have with copying big arrays that would just crash the server because the server doesn't have any memory left.
And it comes with a nice advantage also that you can work natively on multiple cores at the same time.
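A minimal sketch of that switch (assuming Dask is installed): wrap the array in chunks, build the computation lazily, and only the final reduction pulls chunk-sized pieces through memory, parallelised across cores. With real satellite data the array would be opened lazily from a file rather than built in memory as it is here.

```python
import numpy as np
import dask.array as da

# Pretend this is one channel of a large scene, split into 250x250 chunks.
radiances = da.from_array(
    np.arange(1_000_000, dtype=np.float64).reshape(1000, 1000),
    chunks=(250, 250),
)

calibrated = radiances * 0.01 + 100.0      # lazy: nothing is computed yet
mean_value = calibrated.mean().compute()   # chunked, multi-core evaluation
```

The intermediate `calibrated` array is never materialized in full, which is what avoids the big-array copies mentioned above.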
[00:22:36] Unknown:
And as far as the format of the data when you first retrieve it, I know that PyTroll has interfaces for being able to read from multiple formats, but I'm wondering which are some of the more common ones, and some of the trade-offs that are present between the different formats, particularly as it pertains to maintaining useful metadata or being bandwidth efficient for being able to interface with these satellites that might have high latencies?
[00:23:06] Unknown:
Yes. So the raw data that the satellite sends is in itself indeed very much compressed. It is binary data, and, as such, it needs to be read in a special manner to be possible to process later on. But as I said, PyTroll is mostly not handling this kind of data directly. We use these preprocessing packages, or we receive the data via other data providers like EUMETSAT. And the data that we receive is usually in NetCDF formats or HDF5 formats, which have quite a lot of metadata, which is very good for us, and also are able to compress in a quite good manner.
So because it's already on ground, then I think the bandwidth limit is lifted a bit, and it doesn't need to be so efficient that the data has to be really compressed and in binary format. But we still receive some old data, mostly in binary formats, that we have to decode ourselves.
[00:24:18] Unknown:
And once the data has been retrieved and unpacked and you're proceeding through the workflow of doing different analyses, can you describe some of the common analyses or some of the advanced capabilities of the PyTroll packages that simplify some of these processing tasks, and any other tools that you might use to integrate with PyTroll to add additional capabilities? I know in the documentation, it mentions using different channel layers to reconstitute an RGB image, or some of the coastline detection for maybe analyzing different weather patterns as they come onto bodies of land, and just some of the overall types of information that you're trying to obtain as you're processing the satellite data?
[00:25:12] Unknown:
Yes. I mean, we have quite a lot of tools that are used for processing, and I can think of several modules that are doing different things. I mentioned SatPy, of course, but SatPy is using different underlying modules. I mentioned resampling. This is done with a module called PyResample, which SatPy is calling directly. We have other modules, for example, PySpectral, which is working more in the spectral domain. For example, it has the capability to perform some atmospheric corrections, so that some effects of the atmosphere can be removed to see better, for example, the different features on the land. There are also other things that we can do.
We have, as you mentioned, the coastlines. We have a small package called PyCoast, which allows you to put coastlines on your images so that you can display them in a more user friendly manner, especially if you use different wavelengths that don't show the land or the seas as we usually see them. I'm thinking, for example, of water vapor channels, which don't show the ground so much. And in this case, it could be very difficult to locate the data. So putting the coastlines on is pretty important for our users, of course.
And, yes, other tools that we use that are not necessarily part of PyTroll, I can think of Supervisor. I think it's called supervisord, actually. It's also a Python package, which is available on the Internet, and that we use very much in national meteorological institutes that use PyTroll, in order to make sure that the process is running all the time. And when it crashes, it's going to be restarted. The log is being recorded so that we can have a look later on if something went wrong. And, yes, those are some of the tools that we use for processing.
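A hypothetical supervisord program entry for such a processing chain might look like this (the program name and paths are invented for illustration):

```ini
; Keep one stage of the batch-processing chain running around the clock.
[program:pytroll_batch]
; Hypothetical entry point for the processing stage.
command=python /opt/pytroll/run_batch.py
autostart=true
; Restart automatically if the process crashes.
autorestart=true
; Keep logs for post-mortem inspection when something goes wrong.
stdout_logfile=/var/log/pytroll/batch.log
stderr_logfile=/var/log/pytroll/batch.err
```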
[00:27:22] Unknown:
And as far as the overall suite of packages and capabilities of the PyTroll suite in general, I'm wondering how the overall system has evolved since you first began working on it, and some of the biggest challenges that you faced in the process of building and maintaining the various packages?
[00:27:44] Unknown:
I mean, of course, the design is very different from how it was in the beginning. We went from reading just a couple of different file formats to reading many different file formats. Not only are the formats different, but also the different instruments that we work with have expanded a bit. There are different capabilities that we have now that we didn't have before. So when it comes to design, we had to change things quite a lot, because some things were hard coded for a given type of sensor in a way that doesn't work nowadays, so we needed to restructure that, always trying to keep a very modular and, like, plugin based approach, so that it's easy to plug in new features and new things now.
And one thing that we found out is very important is to have, if possible, things work out of the box. And if they don't, to have a very easy configuration so that it's easy to set up and start using. As I mentioned in the beginning, PyTroll is an open source collection of packages, but open source only works if people are using the package. So if you make it very difficult for them to use, it's not sure that you'll get so many users, and thus not many contributors, of course.
[00:29:08] Unknown:
And have there been any particularly interesting or unusual or unexpected uses of PyTroll that you've seen?
[00:29:16] Unknown:
One thing I always find fascinating is when people try to combine data, either from multiple different sensors together on the same image, or when they combine a lot of datasets, possibly from the same data source, into an animation or a large image. We have, for example, a world composite that takes the data from different geostationary satellites at the same time and puts this onto a large world map, which I think is very nice, or animations that show, for example, the pass of a polar orbiting satellite as it senses the data around the earth, and you can see this as an animation, the data looping around.
I find it very, very exciting. One more unusual way to use PyTroll I've seen is using satellite data in conjunction with weather radar observations. The data is quite different. But since with PyTroll most of the packages are quite agnostic when it comes to what kind of physical values they're handling, as long as there is some proper definition of where each pixel is located, then you can actually work with the data in quite interesting ways and try, for example, to superimpose satellite data with radar observations, precipitation observations.
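Once both fields share the same pixel geometry, superimposing them is essentially a masked overwrite; a toy sketch with made-up values:

```python
import numpy as np

# Hypothetical co-located grids: a satellite image and a radar precipitation
# field that have already been resampled to the same area and pixel geometry.
satellite_img = np.full((4, 4), 0.3)   # grey-ish background image
precipitation = np.zeros((4, 4))       # precipitation rate in mm/h
precipitation[1:3, 1:3] = 5.0          # a small rain cell

# Paint raining pixels at full intensity on top of the satellite image.
composite = np.where(precipitation > 0.1, 1.0, satellite_img)
```

Because only the pixel geolocation has to match, the same pattern works for any pair of physical quantities, which is the agnosticism described above.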
[00:30:44] Unknown:
In the process of building these packages, I'm sure that there are various challenges or interesting tidbits that you've come across. So I'm wondering what were some of the most interesting or useful or unexpected lessons that you've learned in the process of building and maintaining these tools.
[00:31:03] Unknown:
Yes. Of course, there are a lot of things that you learn when you work with open source software and quite a large collection of packages. I think one of the most important lessons I learned is that better documentation brings more users. And so if you have poor documentation, your package is probably not going to be used. So if you have documentation that fits, then you will have users. And the second lesson I learned is that every user, everybody, can be a contributor. Mostly people, when they talk to us, when they chat with us, they say, well, the problem is I don't know so much about Python, and I don't know if I can help with anything.
But we always find out that they have ideas and things they can improve: they find ways to improve, I don't know, this RGB composite a little bit, or they can improve the documentation in some way, or they have ideas about how to enhance it. And I think it is very important that you really listen to the users and try to empower them, make them understand that they can be part of this.
[00:32:22] Unknown:
Pytroll was originally conceived and built by these meteorological institutes, and I'm sure that's the primary venue where it's still used. But I'm wondering if you have found any particular general interest from people who want to do citizen science or perform their own analyses or exploration of these types of data?
[00:32:44] Unknown:
Yes, of course. I mean, it's very difficult to know who is actually using Pytroll because we don't have anything like Google Analytics that tells us who the people using it are. But what we found out is that people are starting to use Pytroll to process the data that they receive. We have some users that are receiving satellite data through amateur antennas that they have built themselves. They use the Pytroll packages to track the satellites with their homemade antenna, retrieve the data, and then from there they can start processing it with our tools.
And I think this is really nice. There are also people that have bought a license to receive the data from EUMETSAT. It's a one-time cost for a lifetime of data. So some people do that, and they use Pytroll to generate some imagery that they think is really nice. I'm really excited about this. I think it's really fun to see people using it not because it's their job, but because they enjoy it. And I'm really happy that Pytroll can help those people do that.
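The pass tracking mentioned above is done with Pytroll's orbital tools fed by up-to-date orbital elements; as a purely illustrative stand-in, the rough idea of predicting when a polar orbiter comes back overhead can be reduced to extrapolating from one observed pass and a nominal orbital period (all values here are assumptions for the sketch, roughly 102 minutes for a NOAA-class orbiter).

```python
from datetime import datetime, timedelta

# Nominal period of a low-Earth polar orbiter (illustrative value).
ORBITAL_PERIOD = timedelta(minutes=102)

def next_passes(last_pass, count):
    """Rough upcoming overhead times, naively assuming one pass per orbit."""
    return [last_pass + ORBITAL_PERIOD * i for i in range(1, count + 1)]

passes = next_passes(datetime(2019, 1, 1, 12, 0), 3)
print(passes[0])  # 2019-01-01 13:42:00
```

Real scheduling needs fresh two-line elements and proper orbit propagation, since periods drift and not every orbit is visible from a given antenna; this is only the arithmetic skeleton of the idea.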
[00:34:04] Unknown:
And one of the packages that you have to simplify some of the data distribution is one that, in some ways, mimics the BitTorrent protocol. So I'm wondering if you can talk a bit about some of the ways that that's used, how it simplifies some of these citizen science efforts, and also how it supports access to historical information that somebody might want to process in machine learning contexts, for example to build a training data set and generate models that can then be applied to new information?
[00:34:42] Unknown:
Okay. So, first about the BitTorrent-inspired protocol that we made. It came up from a very simple need we had at one point when we had to replace our antenna, because the antenna that we had was getting old and we had a new provider. What happened in practice is that we had to dismantle the old antenna and then put up a new one, and that takes a month or two. During this time, you lose all access to the data. We were lucky enough to have good relations with some of our neighbor institutes, and the Danish Meteorological Institute was kind enough to provide us with their data during this time, which worked fine. But as I mentioned earlier, timeliness is of the essence when it comes to our data applications.
In that regard, I was convinced that we could have some kind of streaming of the data, so that we wouldn't have to wait for the Danish institute to finish packaging the file before starting the download. If we only start downloading 5 minutes after the data has been received, and it takes 5 more minutes to transfer, we have already lost 10 minutes. So we decided to look into how it was possible to stream this data. What happened is that I chose to implement a new protocol inspired by BitTorrent, because at the time I wasn't aware of the streaming capabilities of BitTorrent itself. Even though that could maybe have worked, I think the data is sufficiently different that it had to be done in another way.
So what we ended up with, basically, is a package that is capable of streaming the satellite data in real time as it comes in through the antenna, and then further onto the Internet so that other institutes or partners can listen in and retrieve the data. One good thing with this protocol is that if you are already receiving this data yourself from your own antenna, you can actually merge the two data streams and take whichever data is of best quality, ending up with one satellite pass that is the best of the two possible reception stations.
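The merging step described here can be sketched with a toy model: suppose each station tags every received scan line with a quality estimate, and the merged stream keeps whichever copy is better per line. The data layout and names below are hypothetical, not the protocol's actual wire format.

```python
# Each stream is a hypothetical {line_id: (quality, payload)} mapping of one
# reception of the same satellite pass.

def merge_streams(stream_a, stream_b):
    """Merge two receptions of a pass, keeping the best-quality copy per line."""
    merged = dict(stream_a)
    for line_id, (quality, payload) in stream_b.items():
        if line_id not in merged or quality > merged[line_id][0]:
            merged[line_id] = (quality, payload)
    return merged

station_1 = {0: (0.9, b"scan0-good"), 1: (0.2, b"scan1-noisy")}
station_2 = {1: (0.8, b"scan1-good"), 2: (0.7, b"scan2")}

best = merge_streams(station_1, station_2)
print(sorted(best))  # [0, 1, 2]
```

Note that the merge also fills gaps: line 2 was only seen by the second station, yet it ends up in the combined pass, which is exactly the benefit described for two cooperating reception stations.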
Then, when it comes to training artificial intelligence on historical data: we have some packages that are able to read older data formats that are available for download on the Internet, and it is mainly institutions that are using Pytroll for this. Some projects of the European Space Agency, and also of EUMETSAT, are actually analyzing older satellite data. So we have the capability of reading satellite data from the eighties, and even the seventies I think, to generate long-term datasets.
And some of this data is integrated later on, as you said, into models that use artificial intelligence to generate parameters that are then used for analyzing the climate, or for looking at, I don't know, the changes in precipitation that have been happening in recent years, and such things.
[00:38:30] Unknown:
What do you have planned for the future of the Pytroll suite, or any particular help that you might be looking for from the community?
[00:38:40] Unknown:
Well, it's difficult to plan, really, when we try to be as open as we are in Pytroll, because it's largely driven by contributions, and I wouldn't say there is a clear leadership planning what we want to do for the long term. I think we will see what the future satellite missions propose and then adapt from there. But of course, all the main contributors, and I suppose also the other contributors, have ideas about how they would like to change Pytroll and which direction they want to go. So I can only speak for myself here, I suppose. What I would really like is the possibility of a one-step installation of a full operational chain for a given satellite data format, because this is something I feel we are lacking. It can be a bit complicated, if you're not really familiar with Python, to install the different packages and get them to talk to each other in order to process the data. Once we have that, or maybe in parallel, I don't know, I would really like to have subscription-based processing for the users: some kind of interface to the processing chain that users can go to and directly order some data, saying, I'm interested in this kind of data for this area every hour or every 10 minutes, and then the system automatically updates its batch processing to do that. And the last point I'd like to work on myself is optimization, in particular of resampling, because it still takes quite a lot of memory.
Because we use a very generic approach to resampling, it works in any situation, which is nice. But we should also take into account the fact that satellite data is often already organized in some way: pixels are not randomly located in space, they follow certain patterns. So I strongly believe that it is possible to optimize the resampling so that it could be more scalable on multiple cores and less greedy when it comes to memory.
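One common way to exploit the structure he describes is chunking: because pixels arrive spatially ordered rather than scattered, an output chunk only ever needs a bounded slice of the input, so chunks can be resampled independently (and in parallel) with bounded memory. The sketch below is a deliberately simplified 1-D illustration of that idea, with an invented toy kernel, not Pytroll's actual resampling code.

```python
def resample_in_chunks(values, chunk_size, kernel):
    """Apply a resampling kernel chunk by chunk instead of over the full array,
    so peak memory is bounded by chunk_size rather than len(values)."""
    out = []
    for start in range(0, len(values), chunk_size):
        out.extend(kernel(values[start:start + chunk_size]))
    return out

def average_pairs(chunk):
    """Toy kernel: average consecutive pixel pairs (a 2x downsampling)."""
    return [(chunk[i] + chunk[i + 1]) / 2 for i in range(0, len(chunk) - 1, 2)]

result = resample_in_chunks([1, 3, 5, 7, 9, 11, 13, 15], 4, average_pairs)
print(result)  # [2.0, 6.0, 10.0, 14.0]
```

The chunked result matches a whole-array pass here because the chunk size is a multiple of the kernel's footprint; real gridded resampling has the extra wrinkle of halo pixels at chunk borders, which is where the engineering effort goes.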
[00:41:15] Unknown:
And are there any other aspects of the Pytroll project, or satellite observation processing, or anything along those lines that we didn't cover yet that you think we should discuss?
[00:41:27] Unknown:
Well, I think we discussed most of it, so I cannot think of anything right now.
[00:41:38] Unknown:
Okay. Well, for anyone who wants to follow up with you and get in touch or keep up to date with the work that you're doing, I'll have you add your preferred contact information to the show notes. And with that, I'll move us into the picks. This week, I'm going to choose the band Tool. It's a group that I listened to a long time ago, and I recently picked up some of their old CDs because they don't have any of their music available on the Internet, so I found a good deal on them. I've been enjoying that recently, as well as the band that the lead singer went to after that, A Perfect Circle.
So, some good, interesting progressive rock. There's a lot of technicality to the music, so it's a lot of fun, and I still enjoy it. For anybody who's looking for a new band to check out, it's worth a look. And with that, I'll pass it to you, Martin. Do you have any picks this week?
[00:42:31] Unknown:
Well, talking about music, I suppose, I would like to mention the band called Vulfpeck, which I am a very big fan of. It is quite a different type of music than what you just mentioned, mostly funk, but it is very refreshing. It's something that is very uplifting, I would say, and I find they have a lot of humor. They try to be quite open and distribute their music videos freely on YouTube, and you can listen to them on Spotify, for example. So I really recommend that.
[00:43:10] Unknown:
Alright. Well, thank you very much for that. I'll have to check them out, and thank you for taking the time today to talk about Pytroll. It's definitely a very interesting problem domain, and it looks to be a well-designed suite of packages. So thank you for that, and I hope you enjoy the rest of your day.
[00:43:27] Unknown:
Thank you very much for having me.
Introduction to Martin Raspaud and Pytroll
Genesis of the Pytroll Project
Main Use Cases and Users of Pytroll
Development Challenges and Approaches
Data Retrieval and Processing Workflow
Scaling and Optimization Techniques
Advanced Capabilities and Tools in Pytroll
Evolution and Challenges of Pytroll
Lessons Learned and Community Contributions
Future Plans and Community Involvement
Closing Remarks and Picks