Summary
What’s the weather tomorrow? That’s the question that meteorologists are always trying to get better at answering. This week the developers of MetPy discuss how their project is used in that quest and the challenges that are inherent in atmospheric and weather research. It is a fascinating look at dealing with uncertainty and using messy, multidimensional data to model a massively complex system.
Preface
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable.
- When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at linode.com/podcastinit and get a $20 credit to try out their fast and reliable Linux virtual servers for running your awesome app.
- Visit our site to subscribe to our show, sign up for our newsletter, read the show notes, and get in touch.
- To help other people find the show you can leave a review on iTunes or Google Play Music, and tell your friends and co-workers.
- Your host as usual is Tobias Macey and today I’m interviewing Ryan May, Sean Arms, and John Leeman about MetPy, a collection of tools and notebooks for analyzing meteorological data in Python.
Interview
- Introductions
- How did you get introduced to Python?
- What is MetPy and what is the problem that prompted you to create it?
- Can you explain the problem domain for Meteorology and how it compares to other domains such as the physical sciences?
- How do you deal with the inherent uncertainty of atmospheric and weather data?
- What are some of the data sources and data formats that a meteorologist works with?
- To what degree is machine learning or artificial intelligence employed when modelling climate and local weather patterns?
- The MetPy documentation has a number of examples of how to use the library and a number of them produce some fairly complex plots and graphs. How prevalent is the need to interact with meteorological data visually to properly understand what it is trying to tell you?
- I read through your developer guide and watched your SciPy talk about development automation in MetPy. My understanding is that individuals with a pure science background tend to eschew formal code styles and software engineering practices so I’m curious what your experience has been when interacting with your user community.
- What are some of the interesting innovations in weather science that you are looking forward to?
Keep In Touch
- MetPy
- @MetPy on Twitter
- Documentation
- GitHub
- Ryan
- @dopplershift on Twitter
- dopplershift on GitHub
Picks
- Tobias
- Ryan
- Sean
- John
Links
- Unidata
- University of Oklahoma – College of Atmospheric and Geographic Sciences
- University Corporation for Atmospheric Research
- NetCDF
- GEMPAK
- XArray
- The Climate Corporation
- GOES-16
- LDM
- GOES-16 on Twitter
- Don’t Panic Geocast
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. I would like to thank everyone who has donated to the show. Your contributions help us make the show sustainable. When you're ready to launch your next project, you'll need somewhere to deploy it, so you should check out Linode at linode.com/podcastinit and get a $20 credit to try out their fast and reliable Linux virtual servers for running your app or trying out something you hear about on the show. You can visit the site at www.podcastinit.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. And to help other people find the show, you can leave a review on iTunes or Google Play Music, tell your friends and coworkers, and share it on social media. Your host as usual is Tobias Macey, and today I'm interviewing Ryan May, Sean Arms, and John Leeman about MetPy, a collection of tools and notebooks for analyzing meteorological data in Python.
So could you each introduce yourselves? Then we'll go in the same order that I mentioned. So, Ryan, if you could go first.
[00:01:01] Unknown:
Hi. My name is Ryan May, and I'm a software engineer at Unidata. Hi. My name is Sean Arms, and I too am a software engineer at Unidata.
[00:01:09] Unknown:
And I'm John Leeman. That makes three software engineers at Unidata.
[00:01:13] Unknown:
And how did you get introduced to Python, if we can go in the same order?
[00:01:17] Unknown:
So I've been using Python since around, I believe, 2006. I'd been a MATLAB user up to that point. My background's in electrical engineering and radar meteorology, and MATLAB's used extensively there. I'd started down that path and then heard, maybe not rumblings, but a lot of things about how Python had these uses for scientific stuff. So I started poking around at it, and I used it one day when I was trying to debug some of my research code. I dumped out arrays from gdb, so you can actually just dump out memory values from the GNU debugger, and then used Python to plot up the values, and managed to find many bugs, including some I wasn't expecting to find, within my code.
And that kind of sold me on how easy it was to get the plotting going, and, you know, even trying to load that data into MATLAB would have been a pain. So that's kind of where I started, and then that summer I was supposed to be studying for my PhD qualifying exam and instead I spent lots of time learning Python. And in the fall, I applied it to some class work, and I haven't looked back since.
[00:02:20] Unknown:
And Sean, how did you get introduced to Python?
[00:02:23] Unknown:
Yeah. So it would have been that fall when Ryan started using it in his coursework. We were taking a course called objective analysis, and I decided that would be a great time to learn C. And I was blowing my array bounds all over the place. It was disgusting, coming from Fortran, which is what we were taught as undergraduates. And I think the final straw down that path of let's learn C was when the instructor told us to plot up some of the values from the objective analysis routines we were doing, and it involved compiling a Fortran based image package and using Fortran to plot it up. And Ryan looked at me and said, dude, no. Just don't. And he sent me a little script doing some basic plotting.
And from there, I think pretty much every morning from, let's see, 2007 to 2011, when I moved to Boulder to work at Unidata, I was in his office for hours discussing Python
[00:03:24] Unknown:
stuff and plotting how to take over the world using Python. So I should probably clarify that Sean and I both went to the University of Oklahoma for our undergraduate and graduate degrees in meteorology.
[00:03:35] Unknown:
And, John, how about you? How did you get introduced to Python?
[00:03:38] Unknown:
Well, I actually went to the University of Oklahoma as well for my undergraduate degree in meteorology, and then I moved up to Penn State to do my graduate work in geoscience in general. But I started using Python in, I would say, about 2008, so a little bit after these guys. My background has been mostly in experimental geoscience and geophysics, so I needed a way to make plots and data visualizations of experimental data from complex new apparatus that there weren't toolkits for. So I needed to learn some way to do that, and I actually bumped into Ryan around that time, and he introduced me to Python as well. I've used it all throughout my graduate work, and now here full time at Unidata.
[00:04:18] Unknown:
And could you take a brief diversion and talk about what it is that you do at Unidata, since you all work there?
[00:04:24] Unknown:
Sure. So Unidata, we're part of the University Corporation for Atmospheric Research. That organization is responsible for managing the National Center for Atmospheric Research, which is a federally funded research and development center that does research on the atmospheric sciences. And then UCAR also has this community programs division, of which Unidata is one. And so our focus is to produce technologies for working with weather data, distribute weather data, and generally help the academic atmospheric science community be able to integrate weather data into their education and research goals.
And is MetPy part of that? So MetPy is a recent development on that side. Python, and I'll mention we've been using it for a while, has really seen an uptake in use in the meteorology and atmospheric science communities within the last 5 or 6 years. The big American Meteorological Society annual meeting has had a Python symposium; I think this year was the seventh or eighth one they've had. So that started then, but the real uptake, I'd say, has been in the last 5 years, where everyone within the community we interact with has expressed an interest in using and learning Python. And so Unidata, being responsive to our community, has started to invest much more in trying to meet the needs of people wanting to use Python. We do a lot, both in terms of developing software tools like MetPy, and there's another package we work on called Siphon that's used to download remote data, and we also do a lot of training in terms of workshops, as well as trying to produce online training materials like the notebooks you mentioned.
[00:06:06] Unknown:
And so could you dig a bit further into what MetPy is and the problem that you were faced with when you first started it?
[00:06:14] Unknown:
Sure. So MetPy is a collection of tools, in general, for working with weather data. So that goes from reading in data, and we in the atmospheric sciences have a wide variety of archaic formats of data, or different types of data. So in meteorology, we work with surface based instruments. You know, you have a temperature and humidity sensor out somewhere taking measurements, as well as wind. We also launch weather balloons to try and gather information about what's going on in the upper levels of the atmosphere. There's satellite based sensing. There's radar, and that's more about detecting precipitation and severe weather, like hail or tornadoes.
And there are many others I'm sure I'm forgetting right now, but those are the big ones. And so we have a wide array of formats for all those different sensors, and the code to read in that data can be a little tricky. So MetPy is trying to bring those different formats to the Python world. I will say at Unidata we've worked to develop kind of a general format called NetCDF that tries to make it easy to have a more self describing data format, as opposed to some of these binary formats where, if you don't know what the format is, the data is lost. So beyond data formats, MetPy is also about having a library of calculations, reusable calculations, to apply to these data files that you read in. So if you read in some temperature and humidity, you might want to calculate the dew point, or maybe wind chill from the wind data, or heat index, or a variety of other parameters that meteorologists would be interested in. So we're trying to come up with a library of those that is reusable both for observational data and for things that come out of numerical prediction models. And then the last part of MetPy is providing some utilities for generating plots. So we have our domain specific plots that we're used to looking at, like what we call soundings, the skew-T diagrams that we use for plotting balloon data, or things for looking at the surface station data, and a variety of other little odds and ends that the meteorology users are used to looking at. So we're just trying to make reasonable tools for that, and kind of in the vein of trying to plug into the rest of that scientific Python stack. So Matplotlib exists, NumPy, SciPy, all these great tools exist that provide arrays or numerical algorithms or, from Matplotlib, the publication quality graphics. And so one of the things that MetPy tries to do, as opposed to some other Python libraries that exist out there, is to just plug into the stack that's already well supported and being developed elsewhere, and fill in our domain specific part rather than reinvent the entire wheel.
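As a rough sketch of the workflow described here, reading values into NumPy arrays, attaching units, and running them through reusable calculations, something like the following is typical. This is an editorial illustration rather than code from the episode, and the function names (dewpoint_from_relative_humidity, windchill) come from a recent MetPy release, so they may differ from the version being discussed.

```python
# A hedged sketch (not from the episode): MetPy's calculation library applied
# to unit-tagged NumPy arrays.
import numpy as np
from metpy.calc import dewpoint_from_relative_humidity, windchill
from metpy.units import units

# Wrap plain arrays as unit-aware quantities so the calculations can convert and validate them.
temperature = units.Quantity(np.array([35.0, 28.0, 15.0]), units.degF)
rel_humidity = np.array([0.65, 0.80, 0.90]) * units.dimensionless   # relative humidity as a fraction
wind_speed = np.array([10.0, 5.0, 2.0]) * (units.meter / units.second)

# Derived quantities come back as unit-aware arrays.
dewpoint = dewpoint_from_relative_humidity(temperature, rel_humidity)
chill = windchill(temperature, wind_speed)

print(dewpoint.to(units.degC))
print(chill.to(units.degF))
```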
[00:08:46] Unknown:
Yeah. There's definitely quite an extensive suite of offerings for being able to manipulate data once you have it in a sort of standardized format. So it seems that the primary goal of MetPy is just to serve as an interface to the varied types of data sources that are used within meteorology and the atmospheric sciences. Does that sound like an accurate synopsis?
[00:09:10] Unknown:
For part of it. But part of it definitely comes down to the calculations. A big part is trying to go back to various literature references and, you know, look at the scientific literature for where some of these formulas came from and how these calculations evolved, and have a citable source for that, as well as doing robust testing of the calculations. And, you know, historically there have been other packages that have kind of gone along this road. One familiar to many meteorologists would be this package called GEMPAK that actually came out of the National Centers for Environmental Prediction, so one of the federal government's entities that does weather prediction. And they provided this nice tool, GEMPAK, that is used for making weather maps and various analyses.
And that worked great in the eighties and nineties, but as we get towards modern times it's showing its age, and as other software packages advance, development of it has kind of ceased. And so you have this base functionality of having this library of calculations, but those were all in C and Fortran and difficult to reuse. So one of the things MetPy is focused on is providing a reusable set so that, no matter what your problem is, if you can get your data into a NumPy array, you can apply the calculations we have within MetPy.
[00:10:22] Unknown:
Yeah. And that's really a big part of it. The NumPy array is such a backbone of scientific analysis that as long as we can get mileage out of it, for example, like Ryan mentioned, you've got your dew point, you've got your temperature, you want relative humidity, you can do a simple calculation that way. But in meteorology, part of the problem we have is we're on this rotating sphere, and when you want to do spatial derivatives and things like that, all of a sudden coordinate systems start playing a big role. And so that's one of the challenges with MetPy, being able to do what on paper would be a simple calculation, but doing it smartly and, I don't wanna say in a black box way, but making sure all the information, for example about the coordinate system, is there with the data as well so that you can do those calculations
[00:11:10] Unknown:
correctly. Yeah. Definitely. And I think another part of it too is, you know, we're working with real data a lot of the time. We're trying to do these calculations, like, calculate how much potential energy there is in the atmosphere from a balloon sounding that might have a sparse sampling of the atmosphere at some point, so there might be missing data or the balloon might pop early. So we have to make sure that our calculations are robust
[00:11:32] Unknown:
and are able to deal with these complications that we get when we're taking real measurements. I should also add that, in addition to the NumPy array, there's one other thing we readily depend upon in terms of doing these calculations, and that's a library called Pint, which implements physical quantity or unit support. So for instance, you could take a NumPy array and say, okay, this array of temperatures is in degrees Fahrenheit, or it's in Kelvin, or maybe this array of lengths is in meters and not kilometers. When we started this, one of the founding principles was to make unit support built in, because in my experience I'd spend a lot of time trying to run down errors in my code, and I don't think any of them turned out to be unit things, but you spend a lot of time just making sure, am I using all the units correctly in these various places. By using another library, you let the computer track the units rather than making the programmer do it. And, you know, there are all the horror stories of various things that have gone horribly wrong when units were incorrect, like the Mars Express. Was it the Mars Reconnaissance Orbiter? No, that's the one that worked. There was another Mars probe that never entered orbit around Mars because of unit issues, it was determined. So that kind of hammers home the importance of making sure those are correct and trying to let the computer do it rather than the programmer. And so we have NumPy arrays and just ask people to attach units to those, and then the code can work correctly. And that's really important, because you can't do things in degrees Fahrenheit when you're doing physical calculations; you need to use Kelvin. And there are various other things where, initially, we didn't have that, and a lot of documentation in MetPy was spent just saying this parameter is in this unit, this parameter is in that unit. It really simplified the documentation to allow the computer to just track the units for you and do the right thing. Yeah. Units are definitely a significantly important piece when you're dealing with, you know, physical measurements
[00:13:13] Unknown:
and data that comes from that. And in one of the other interviews that I did, with the folks from AstroPy, they mentioned that they have a unit library embedded within their project as well. And so when I was speaking with them, I was thinking to myself how it would be useful if that had been extracted to be used in a standalone fashion. And then when I was reading through your documentation and saw the reference to the Pint library, I was thinking how it's fortunate that that exists as well, so that there is that capability of incorporating unit support into other libraries, because of how ubiquitous it is when you're dealing with actual physical sciences and physical units of measure. Yeah. Absolutely. The one unfortunate part, though, is that so many attempts at these physical quantity or unit packages have been made. So off the top of my head, I can think of astropy.units.
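To make the unit-tracking idea concrete, here is a minimal, editorial sketch of Pint on its own, independent of MetPy. It assumes the standard pint API (UnitRegistry, Quantity, DimensionalityError) and is not code from the episode.

```python
# A hedged sketch of bare Pint usage: the registry tracks units, converts on
# request, and refuses operations that mix incompatible dimensions.
import pint

ureg = pint.UnitRegistry()

temp_f = ureg.Quantity(72.0, ureg.degF)   # a temperature in Fahrenheit
temp_k = temp_f.to(ureg.kelvin)           # explicit, checked conversion
print(temp_k)

distance = 10.0 * ureg.kilometer
try:
    distance + temp_k                     # adding kilometers to kelvin is rejected
except pint.DimensionalityError as err:
    print("caught:", err)
```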
[00:14:00] Unknown:
Pint, and quantities is another one. And those are the ones that I know are well supported or have used in the past. And I remember seeing a talk at SciPy years ago, say 2013, that actually benchmarked, I think, about 10 of these different unit packages. And so one unfortunate part is that it's been redone so many times because no one gets enough mind share, plus the fact that it's difficult sometimes to integrate these things together. So one of the challenges we face, as Sean alluded to with the spatial derivatives, is that one thing we'd like to make more use of within MetPy is the xarray library, because that has a notion of having these arrays with a coordinate system, or at least the coordinates on which a grid of data is available, which would simplify doing things like spatial derivatives or integration, things like that. Those do not readily play with the Pint unit package. And so there's some more work that needs to be done to try and make these two play together nicely, but that's probably one of the challenges, that the base NumPy array doesn't have a notion of how to do physical quantities. And I think that's probably one of the problems that, from my perspective, we most face in the Python scientific stack, as I keep seeing this question pop up over and over across various
[00:15:09] Unknown:
issue trackers on GitHub, the question of, oh, this doesn't play well with this unit package, how do we solve that? And as you were describing some of the problems of manipulating and analyzing some of the data that you're dealing with, I was thinking about how the atmosphere is necessarily an open system, as opposed to a lot of the more controlled experiments that are done in some of the other physical sciences such as physics or chemistry. It seems that there's a much greater difficulty in being able to run the analysis on the information that you do have, because of the inherent uncertainty. So I'm wondering if you can discuss a bit about the differences between the meteorological and atmospheric sciences and some of the other physical sciences, and then also maybe speak to how you deal with some of that uncertainty when you are analyzing the data.
[00:15:58] Unknown:
Well, certainly, when you're dealing with observational data, inherently it's noisy. And so doing something like taking derivatives of observational data is fraught with problems, because you're doing a subtraction and that ends up, you know, blowing up on the uncertainty in the data. And one of the big differences between meteorology, the atmospheric sciences, and other sciences, as you alluded to, is you don't have a laboratory. The only laboratory we really have, minus a few small scale tabletop experiments, is the world. And so all you can do, by and large, is take observations, as far as actual traditional laboratory type things go. So you end up doing a lot of case studies, you go back and look at a lot of data, but you don't have an opportunity to really have a control experiment and go run the same experiment with different conditions and see what happens. Instead, a lot of what we rely on is these numerical models, either for prediction purposes, so your daily weather forecast, where the tools the National Weather Service forecasters or your private sector forecasters look at are output from a numerical prediction model, and that gives you data on a nice even grid, or the same thing for researchers. You can take, maybe it's based on a real case, maybe it's just a synthetic set of conditions, but you can apply that same numerical modeling technique to some chosen set of initial conditions and see what evolves in the computer. And there are some ties between what goes on in the computer and what happens in the real world, but there are some differences. One of the big challenges you face, though, is that a lot of tools work really well for model output because you have that fixed grid. Things aren't as noisy. It's well arranged data, and you don't have missing values. Whereas with real data, you have data dropouts. You have things on irregularly spaced samplings.
[00:17:37] Unknown:
You've got the uncertainty you mentioned. You've got instrument failures. You have all kinds of weird things that can get injected into the real data.
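A tiny editorial illustration (not from the episode) of the point about derivatives of noisy observations: even a small amount of measurement noise gets amplified once you start differencing neighboring values.

```python
# Differentiating noisy data with simple finite differences amplifies the noise.
import numpy as np

x = np.linspace(0.0, 10.0, 200)
clean = np.sin(x)                                      # the "true" signal
noisy = clean + np.random.normal(0.0, 0.02, x.size)    # add a little measurement noise

d_clean = np.gradient(clean, x)                        # smooth derivative (~cos x)
d_noisy = np.gradient(noisy, x)                        # same differencing on noisy data

print("noise level in the data:      ", np.std(noisy - clean))
print("noise level in the derivative:", np.std(d_noisy - d_clean))
```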
[00:17:44] Unknown:
And, as you mentioned, a lot of the visibility that people have into the discipline of meteorology is via their local weather forecasts. So can you talk a bit about some of the research and experimentation that goes on that's not necessarily consumer facing?
[00:18:03] Unknown:
Sure. Well, it's always a never ending story of trying to better incorporate the real atmosphere, the actual physics that are going on, into a forecast. And so for those numerical prediction models that I talked about, meteorology was once summed up to me as a lifetime of challenges in one phrase: air acts as a fluid. So meteorology is applied fluid dynamics, and that's a large sort of problem. And then you take that and add one more problem, thermodynamics. So you have water vapor in the atmosphere, and that condenses into liquid and gives off an incredible amount of energy. That's what gives rise to storm clouds and precipitation in general. So take those two together, and there's no end of challenges in trying to simulate that within a computer. And so what you have in these numerical prediction models is you take these equations that describe how this air fluid moves across the globe, and the globe is spinning. It's not stationary.
And so you've got Coriolis effects to deal with there. You also have, as a fluid, viscosity to deal with. It's a compressible fluid, so air pressure matters and density changes. And so you take those equations, you put them on a grid, and then you run them forward, essentially. You take a set of conditions and move forward. And that alone has all sorts of challenges: how fine a grid you have, the spacing of your grid points, affects how accurate your forecast can be. It also affects what size of things you can capture within there. And so you can often get broad scale weather patterns, right? But, you know, I think nominally a lot of the models we're running right now are on a 3 kilometer spaced grid, and those are probably the higher resolution ones. If a storm is 100 kilometers across, you've got 30 points, which seems like a lot, but we're only now getting to that point, and if you're trying to predict a tornado, it's still not enough to predict a tornado well in advance. So that's part of the problem. And then the other problem is we have these noisy observations with errors and things, and trying to inject those into that numerical prediction model is a challenge. What you actually end up doing is you start with the previous output from the numerical model, and then you adjust it based on what the observations actually say. There's a lot of statistics and math involved, and it's called data assimilation. You're assimilating actual observations, or data, into the forecast.
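For readers who have not seen a numerical model before, here is a deliberately toy, editorial sketch of the "put the equations on a grid and run them forward" idea: one-dimensional advection stepped forward with a simple upwind finite difference. Real prediction models solve far richer equations in three dimensions, so treat this only as an illustration of the stepping loop.

```python
# Toy illustration: advect a blob with du/dt = -c * du/dx on a 1-D grid,
# stepping forward in time with an upwind finite difference.
import numpy as np

nx, dx = 200, 1.0        # number of grid points and grid spacing (arbitrary units)
c, dt = 1.0, 0.5         # advection speed and time step (c*dt/dx <= 1 for stability)
u = np.exp(-0.01 * (np.arange(nx) - 50.0) ** 2)   # initial "blob" of some quantity

for _ in range(100):
    # each point is updated from its upstream neighbor
    u[1:] -= c * dt / dx * (u[1:] - u[:-1])

# after 100 steps the blob has moved roughly c * 100 * dt / dx = 50 points downstream
print("blob center is now near index", int(np.argmax(u)))
```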
You actually can't start from scratch, because we don't have enough observations of all the things. You have wind, that's good, but detailed information about what precipitation is going on? That's actually one of the problems I glossed over there. We talked about these 3 kilometer scales and how the fluid works, but there are some processes you can't simulate. Like, the air rises and condenses water vapor into little droplets, and you do enough of that and then those little droplets begin colliding together and coalescing into bigger raindrops, and eventually you get rain, or at least that's one simplified explanation of it. You can imagine you can't simulate every raindrop within a numerical model, and so you've gotta make what we call microphysics parameterizations. You've gotta basically come up with a formula that makes approximately correct results come out of, you know, a 3 kilometer grid that says there are this many raindrops here and this much water vapor present, and this is kinda what's prone to happen. I mean, to properly model the atmosphere, we're talking motions on the scale of millimeters
[00:21:25] Unknown:
up to thousands and thousands of kilometers. And so you've gotta chunk up the problem into these parameterization schemes, these simple, almost cartoon like drawings where you're trying to represent the processes on a smaller scale in a bulk way that kinda fits in between the grid points of the model you're running. So there's a lot of observational work that goes into coming up with these parameterized modules that the models use. But, of course, those schemes may have been developed for a model that was running at 60 kilometers, and now we're trying to run the model at 3 kilometers, while using the same parameterization schemes that were made for a model running at a much larger grid spacing. And it's pretty clear that you shouldn't be doing that, but it's not clear where to move forward from there. So you get a lot of errors introduced by doing that. And there are a lot of complex couplings too. I mean, we haven't talked about the atmosphere, ocean,
[00:22:18] Unknown:
or the biosphere, cryosphere. There are all these other systems. Everything's tied together. Right? If a volcano erupts, it's going to inject massive amounts of CO2 into the atmosphere, and ash. There's no way you're gonna capture that in a model, much less all of the other complex interactions that occur with things like tropical storms.
[00:22:35] Unknown:
So you have to really consider the entire Earth system, and that's something that we're just not quite up to the task of doing. And that's what you were asking originally about, areas of research on this. So there's data assimilation and better ways to incorporate new data sources into a model, where with data assimilation you're kind of taking observed variables and trying to back out primitive things. So you're looking at how much precipitation fell over the top of this area and trying to back out what the conditions were, how many raindrops were distributed in a 3D volume over that area, or you're trying to take a satellite observed profile of atmospheric temperature with height and trying to say, okay, this is what the satellite saw, what might really be going on here. So that's just data assimilation, and trying to come up with better statistical techniques to fit that. You've got the parameterizations we mentioned, trying to come up with better ways to handle different scales and just more accurately do things. You know, we just talked about raindrops. There are snowflakes. There's hail. There are different tendencies for how the water vapor is going to condense depending on whether you're over the ocean or over land, through the availability of condensation
[00:23:37] Unknown:
nuclei, as we call them. And so there are problems like that, and each one of those categories has its own set of problems. So if you're looking at ice crystals, how does water vapor condense onto a real sharp and pointy looking ice crystal as opposed to something that's nice and symmetrical? And it's almost intractable
[00:23:55] Unknown:
when you look at it. There's a really good example of that. Here in Colorado, we get snow a lot, and I noticed the other day, driving in from home to work, we encountered about three different kinds of snow. You got little pellets, you've got flakes, you've got these big clumps of flakes that we call aggregates. What you're getting there depends highly on what the temperature is, how much water vapor is in the atmosphere, and not just what the temperature is, but what the temperature profile is from the ground up to the upper parts of the cloud where the snow is actually forming. And so very small changes in that temperature and humidity can affect what you're getting at the ground. That might be a minor detail until you think about how it impacts how much your snow depth is gonna be. So your forecast of, is it 5 inches? Is it 7 inches? Is it 10 inches? I mean, there are a lot of other effects that go into that, but just trying to accurately model snow depth is very dependent upon what kind of snow you're getting. You know, our rule is 10 inches of snow ends up melting to about 1 inch of liquid water. That's a rough rule of thumb, but it can actually be anywhere from 5 inches of snow to 1 inch of liquid, to 30 inches of snow to 1 inch of liquid in extremely dry, fluffy cases.
And that has profound effects for, say, flood forecasting or other, you know, hydrological applications. There's a lot of devil in the details there.
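As a quick editorial back-of-the-envelope on the snow-to-liquid rule of thumb quoted above (the 0.7 inch figure below is an invented example, not a number from the episode):

```python
# snow depth ~= liquid-equivalent precipitation * snow-to-liquid ratio
liquid_equiv_in = 0.7                 # hypothetical forecast liquid equivalent, inches

for ratio in (5, 10, 30):             # wet snow, the 10:1 rule of thumb, very dry fluffy snow
    print(f"{ratio}:1 ratio -> about {liquid_equiv_in * ratio:.1f} inches of snow")
```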
[00:25:13] Unknown:
Yeah. And each of these complications that we've talked about, be it hydrology or snowflakes, there's an entire research community that works on each of these problems. Yes. And there are an incredible number of people that go into making these numerical models do what they do.
[00:25:29] Unknown:
I don't even know what to say to that. As you mentioned, there's just so much inherent complexity in everything that is entailed with dealing with the atmosphere and the global climate, because of the vastness of the system and our current inability to accurately model every component of it simultaneously. You know, we have the computational power to be able to do some localized modeling, or localized approximations of the data that we have, but even just capturing the totality of the data is an impractical problem. So I guess there are a couple of different branches of questioning from there that I'd like to explore. One is, I'm curious to what degree machine learning or artificial intelligence systems are employed when dealing with climate and weather data, and also to what extent has the increase in our ability to create sensors and distributed hardware platforms contributed to the increased ability to actually collect data for climate and weather information?
[00:26:28] Unknown:
I'd say both those areas are active areas of research. So I think, again, within the last 5 years, roughly, is when you really started to see more papers on doing machine learning, especially in the public sector. I think there are private corporations who have been on the early side of that wave in terms of applying machine learning to weather data, companies like The Climate Corporation, that have a vested interest in trying to provide these useful analyses to try and help predict, say, crop yields or insurance losses and things like that. The hazards portion of meteorology drives a lot of that. I mean, that's the thing we try to never lose sight of: on some of these things, people's lives end up being at stake when it comes to hurricanes and tornadoes, and people's livelihoods are at stake when it comes to hail or drought and things like that. And so there's a lot of public impact there. But on machine learning, it's going on a lot, but I think we're still in the research phase of that, minus some private sector applications that
[00:27:27] Unknown:
I don't think are as prominently advertised. I think that's a secret sauce for a lot of companies out there. And what was the other question there? The other question was asking about how the increased ability to create distributed sensor platforms and sensor networks has contributed to the ability to increase the amount and quality of data that we can collect about the climate system. I'd say the impacts
[00:27:48] Unknown:
haven't been realized yet. There are a lot of projects ongoing, for example in terms of trying to collect all these pressure measurements that are available through, I think, Android phones, or at least some versions of Android phones that have pressure sensors available on them. And so there are active projects seeking to grab this data and see what can be done with it. It's noisy data; the sensors that you get on the phones are not as exact as a well calibrated weather station, but what you lose in quality there, you make up for in quantity. And so again, applying some machine learning or some other advanced statistical technique, those are open questions in terms of how much benefit you can get out of it. But we as a community are definitely looking into trying to make more use of these distributed datasets that are available. And then you also have a lot of work going on starting to look at drones as autonomous platforms for collecting weather information. A lot of storm scale research, from hurricanes down to severe thunderstorms, if you're trying to get information not at the ground but up around what's going on at the top of the thunderstorm, or the mid levels of the thunderstorm, involves flying a plane through them and around them. And that has its challenges. There's only so much you can do with a craft that has a human pilot on board. Human pilots have limitations on how many hours they can be flying, and so you're seeing a lot more now starting to look at these autonomous platforms for trying to fly instrumentation
[00:29:07] Unknown:
in and around storms and trying to enhance our understanding of what's going on. We've done a lot with remote sensing through radar and launching balloons near them, but I think the missing piece has really been trying to get what we call in situ measurements, what's going on literally inside rather than trying to look at it from the outside. And you can imagine, too, even just instrumenting the infrastructure that we already have. Say you had a meteorology package on every semi driving the interstates every day; that's an incredibly dense spatial and temporal sampling of what's going on. So if you had a winter weather event happening, that's obviously a large hazard. There can be traffic accidents, impacts to getting materials back and forth, logistics, and so on. That dense sampling could really help us understand what's going on there, or at least better mitigate or respond to it as it's happening. So I think that's gonna be a big thing eventually. But in terms of aggregating that data, visualizing it in some meaningful way, and being able to interpret it in an operational scenario, we're nowhere close yet. And that actually fits within a broader problem we're already experiencing in terms of big data. That's an overused term, but we are definitely
[00:30:08] Unknown:
starting to feel the strain of these large datasets. You know, we're talking about these distributed ones, which would be big, but they're still just single points. Whereas we're starting to talk about numerical weather simulations at, you know, a quarter degree resolution spanning the entire globe, so a quarter degree latitude, a quarter degree longitude, and then say on the order of 60 levels in the vertical direction. You've got these three-dimensional grids. That's one operational model, coming out 4 times a day, and it's doing forecasts every 3 hours out to a certain point and every 6 hours beyond that, out to 384 hours. So that's a lot of data right there. And then there's actually a new weather satellite now, called GOES-16, that should be coming operational sometime this year, and that brings a lot of nice new capabilities, impressive sensors, higher time resolution capabilities. But the data volume coming out of that every hour is reaching serious volumes. You know, Unidata historically has shipped a lot of data out to the universities.
I forget what our aggregate bandwidth is; I think it's almost 2 gigabits out to the universities. 20 gigabytes, sorry, 20 terabytes a day. Yeah. So we're aggregated at 20 terabytes a day, which is almost saturating a 2 gigabit network connection, and we're used to doing that. And now, with this new satellite, we're talking about not even trying to ship it all out, and instead exploring solutions hosting it here and up in the cloud, because we just aren't equipped to continue to blast out all of this information.
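To give a feel for why those numbers add up, here is an editorial back-of-the-envelope estimate; the variable count, output times, and byte sizes below are illustrative assumptions, not figures quoted by the guests.

```python
# Rough size of one quarter-degree global model run (illustrative assumptions).
lats, lons = 721, 1440        # 0.25-degree global grid
levels = 60                   # vertical levels mentioned above
variables = 10                # assume ~10 three-dimensional fields
bytes_per_value = 4           # 32-bit floats
output_times = 80             # assume ~80 forecast output times per run

per_run = lats * lons * levels * variables * bytes_per_value * output_times
print(f"one run  ~ {per_run / 1e9:.0f} GB before compression")
print(f"four/day ~ {4 * per_run / 1e12:.1f} TB per day for this model alone")
```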
[00:31:33] Unknown:
And not only that, one of the things we noticed is that with our community of users, anytime we say, hey, would you guys like this data, whatever it is, the answer is always yes. People just like looking at pictures. It doesn't matter what it is; the answer is always going to be yes. And really, we're past the point where the question is, would you like this? It's now, what do you wanna give up so that your institution can continue to receive data, because you're maxed out. We can no longer get data into your university because you're saturated. And our users, at least, don't really seem to like to pick and choose the data. It's at the point where we joke, you know: you want all the data? You want the whole data feed? You can't handle the feed, man. You can't handle it. And so we're looking at what we've traditionally done as push technologies, where we push data out, and we've gotta move to more of a pull technology, where users can pull the information they need as they need it, as opposed to just give me the whole thing. So for years we've been exploring server side technologies that allow you to slice and dice the data remotely and get back what it is you're actually looking to look at. You don't need to download a full 20 gigabyte model run if you just want to look at the temperature forecast. Especially if you just want the next 48 hours of temperature forecast, you don't need to download 20 gigabytes of information.
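The pull-style, server-side subsetting described here is what Unidata's Siphon package (mentioned earlier) is aimed at. The sketch below is editorial: the catalog URL, dataset choice, and variable name are placeholders, and the exact Siphon calls may differ by version, so treat it as the shape of the workflow rather than a recipe.

```python
# Hedged sketch: ask a THREDDS server for just a small subset of a model run
# instead of downloading the whole thing.
from datetime import datetime, timedelta
from siphon.catalog import TDSCatalog

# Hypothetical catalog URL and dataset; real paths and variable names will differ.
cat = TDSCatalog('https://thredds.example.edu/thredds/catalog/gfs/catalog.xml')
ds = cat.datasets[0]                      # pick one run from the catalog

ncss = ds.subset()                        # NetCDF Subset Service access point
query = ncss.query()
query.variables('Temperature_isobaric')   # only the field we care about
query.lonlat_box(north=45, south=35, east=-90, west=-105)
query.time_range(datetime.utcnow(), datetime.utcnow() + timedelta(hours=48))

data = ncss.get_data(query)               # a few megabytes instead of the full run
```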
[00:32:49] Unknown:
And have you also looked at using something like BitTorrent to ease the distribution load for the sizable data sets that you're dealing with?
[00:32:57] Unknown:
So we, Unidata, actually developed a technology called the LDM, the Local Data Manager, which started back in, we'll say, the mid nineties. I don't have a good handle on the year, but that powers what we call the Internet Data Distribution, which started as a way to distribute data: the National Weather Service distributes data to its forecast offices across the country using a satellite broadcast, and one of Unidata's projects was to install a satellite receiver for that. It's called NOAAPort, the satellite broadcast. So we have a satellite dish out there that receives that broadcast, and then we distribute the data to universities over the Internet using the LDM. That's a peer to peer transmission software designed to be very quick, so there's low latency from when the data arrives here to when it reaches our member institutions. And so we've got this network where we get the data and distribute it to a few top level nodes, and then those nodes are responsible for partnering with, or being open to, other universities wanting to feed off of them. So we have this tree structure distributing data. It's not exactly BitTorrent, but there was a project called the Next Generation LDM that used the BitTorrent protocol.
[00:34:08] Unknown:
And what we found is that getting universities to open up the firewall for our regular LDM traffic is very tough, but trying to get universities to open the firewall for something that has BitTorrent in the description is pretty much a no go. Especially since a lot of universities are using these turnkey solution firewalls that, honestly, it doesn't seem like they totally understand; they're kind of black boxes. So even getting Internet traffic in on anything other than port 80 is a nightmare. And there have been institutions where they can't get their firewall to accept traffic from the LDM, and they have no choice but to go to a remote server to get data over HTTP instead.
[00:34:53] Unknown:
As you guys have mentioned a few times, there's a fair bit of visual modeling and display of the data for being able to interact with it. I'm wondering if there's sort of an inherent need to visualize the data to be able to understand it properly, given how multidimensional and complex it is, from both a measurement standpoint and an analytical standpoint?
[00:35:16] Unknown:
Oh, visualization is the core of what we do in meteorology. I mean, from the time you start as an undergraduate, you spend a lot of time looking at weather maps. And so a lot of what we do is generally looking at these 2D slices. You start at the surface, looking at what we call station model plots. That's taking the various wind, temperature, dew point, and pressure values and plotting them up around the station location in a certain format. It's a very information dense format; once you get used to looking at it, you can really suss out some information. And then, as an undergraduate, and I think many institutions are still doing this today, you have the students actually do a contour analysis by hand on this data. So you're drawing the lines of constant value around each of the stations, and it's a very painful exercise, especially when you're first doing it. And you get very familiar with the colored pencils, because you do temperature in one color and moisture in another color. But what it forces you to do is really get a handle on what the atmosphere is doing. You can figure out where the fronts are and where the moisture is pooling up, so where you might expect some thunderstorms to fire, and things like that. But lots and lots of weather maps. And so you do that at the surface, and then, instead of thinking in terms of height, what we generally do is make maps at constant pressure levels. So you work your way up from the surface, looking at 850 millibars pressure, and 700, and 500, and 300. And so lots of these 2D slices form a picture of what's going on, from the jet stream up at 300 millibars, down to what's going on just overhead at maybe 850 or 700 millibars, and then you also have the mid levels at 500 millibars, which really drive a lot of the weather that's taking place. But you get really familiar with looking at a map, seeing the data on it, and really making quick judgments of what's going on. It's amazing what the human visual system can do in terms of pattern recognition. You get very quick at, oh, I see that, I see that, that's going on. You can also lose it very quickly, as I've learned. But at the height of your undergraduate education, you really get good at looking at this. And if you're a forecaster doing this on a daily basis, you get phenomenally good at recognizing these patterns. And you learn pretty quickly that the human element in all of this is very important. While it's pretty trivial to pass in, say, a gridded field, like a 2D NumPy array,
[00:37:24] Unknown:
of temperature data, and you pass that to Matplotlib's imshow and look at it, or you put it in a contouring routine and look at the contours, and you go, wow, that's nice and pretty. But when you do it by hand and you put your knowledge of meteorology into that analysis, you start to see that the computer tools, while the gridding routine is correct and optimal, are not always truly reflective of what's going on in the atmosphere. Especially when you consider the noisy observations and, you know, a little bit of instrumentation error, and how you often do contouring.
[00:37:59] Unknown:
You're assuming linear interpolation between
[00:38:01] Unknown:
observations, when really, atmospherically, things are kind of bunching up here. Yeah. I mean, for example, if you're looking around a front, and you're contouring lines of constant pressure or lines of constant height, you know that they're going to kink around the front. We know from physics that's what's happening. And if we had a dense enough sampling, sure, we would see that, but we don't. So that's some knowledge that you have to put in when you're doing the contours yourself. And you really end up seeing that what we're making is pretty much an atmospheric topo map. We're mapping out constant heights or constant pressures in the atmosphere and seeing how this fluid flows through these different ridges and valleys, or troughs as we call them in meteorology.
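For contrast with the hand analysis being described, here is a short editorial sketch of what the automated version does: scatter some station-like values, linearly interpolate them to a grid with SciPy, and contour the result. The data are fabricated for illustration.

```python
# Objective analysis, the quick-and-dirty way: linear interpolation between
# scattered "stations", then contours drawn from the gridded field.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import griddata

rng = np.random.default_rng(0)
stn_x = rng.uniform(0, 100, 40)       # fake station locations
stn_y = rng.uniform(0, 100, 40)
stn_temp = 20.0 + 0.1 * stn_x - 0.05 * stn_y + rng.normal(0, 0.5, 40)   # fake temperatures

grid_x, grid_y = np.meshgrid(np.linspace(0, 100, 200), np.linspace(0, 100, 200))
grid_temp = griddata((stn_x, stn_y), stn_temp, (grid_x, grid_y), method='linear')

fig, ax = plt.subplots()
contours = ax.contour(grid_x, grid_y, grid_temp, levels=10)
ax.clabel(contours)
ax.plot(stn_x, stn_y, 'k.')           # overlay the station locations
plt.show()
```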
[00:38:40] Unknown:
Yep. And so there's a lot of 2D planar visualization. You'll also look at cross sections to try and get an idea, a different way to build that three-dimensional picture in your head. There's been a lot of work doing three-dimensional visualization of weather data, and some tools do it better than others. The challenge we face is that's great if you're working interactively with the data, and a lot of people are. But when you need to take snapshots, you wanna put it on the web, you wanna have it go into a research publication, the 3D visualizations, with some exceptions,
[00:39:09] Unknown:
are hard to make stand out without having the interaction, so you can actually get a feel for the depth dimension. You know, a lot of the 3D visualization you see come out of NASA, for example, those are actually put together in full, almost like animation studios, using the same tools that they use to do movies like Toy Story. I'm sure I just dated myself a bit there. But it's a full design studio that's putting together these animations and visualizations.
[00:39:32] Unknown:
And to have a generic, general purpose tool to do that, it's just really hard to get something that looks good. And jumping back down to the code a bit, I was reading through your developer guide and watched the SciPy talk that you gave, Ryan, about the development automation that you're using for actually creating MetPy. And my understanding, from speaking with a number of people who are working in the scientific space and who started in the scientific space, is that individuals who do have that kind of background are generally not necessarily as well versed in proper software engineering practices in terms of coding style and test automation and things like that. So I'm curious what your experience has been when interacting with people from that community who are interested in contributing back to MetPy, or at least just understanding the internals of the package?
[00:40:23] Unknown:
Well, I should start by saying I really owe my interest in doing these kinds of things to going to SciPy. At some point in my graduate career, I skipped a scientific conference because it was just far away and inconvenient, and instead decided to ask my advisor about going to the SciPy conference. This was, like, 2007, and I went and learned a lot and saw a lot of scientifically minded programmers, but people trying to do things better, trying to do things with testing, and trying to make robust software, and not just do some of the things I'd already encountered in graduate school in terms of, you know, the research software that runs on one system and doesn't compile on another. And SciPy really taught me a lot over the years about how to do good software. And so it was important to me to try to avoid the pitfalls that had befallen many a scientific project and put in place things to try and keep technical debt from accruing, to not be scared to modify my own library, knowing that if something breaks, it's because my test suite wasn't good enough, and, you know, we'll fix that and we'll be better for it, but not be paralyzed to change. And so for contributors coming in, that's definitely something I've worried about, and whenever I get a new one, someone trying to submit something, I'm not cautious exactly, but I'm wondering what the reaction is gonna be like when I say, okay, we need to adjust this and this, and can you add a test for this. And so far, those who've been willing to even get to the point of putting in a pull request on GitHub have generally been very receptive to it, because people who submit a pull request on GitHub are already a certain level of programmer, either a scientist who's learned Git and is curious about programming at that level, or just a general programmer who's used to making contributions. But they've been receptive and have gone through the steps and been eager to learn. And, you know, there's usually some hand holding involved, because not everybody has gone through the two year process that it takes to get comfortable with Git and familiar with the pull request model, but generally with a bit of hand holding, so far, with a limited number of contributors admittedly, I haven't had a lot of pushback on that. A few people I just point to the Travis CI runs that we use and say, oh, you know, if you look here, there's a bunch of things being flagged by flake8. And thankfully, I haven't had to say, oh, we need to fix this and this. They just saw the errors and kept iterating until the errors went away, and so that was great. So I wonder if each of you could
[00:42:40] Unknown:
briefly share what are some of the interesting innovations in weather science, either at the research level or at the underlying technology level as far as sensing and data analysis capabilities go. What are some of the innovations that you're looking forward to that are coming down the pike?
[00:42:56] Unknown:
I guess I'll start. I'm excited about this GOES-16 satellite, or as we used to call it, GOES-R, with the different channels that are gonna be available to try and retrieve things like how much ice is up there in the clouds, or different cloud types, or things going on at the surface. I'm excited to see that combined with what really excites me, which is the temporal resolution. Historically, we're used to seeing a full disk of the globe every half hour, or getting images every 5 minutes. Now they're talking about regularly getting images every minute, and potentially, for certain special areas where severe weather or other significant weather is going on, getting even higher resolution. And so it's just really exciting to finally have broad coverage of this high temporal resolution information to really see what's going on. It's amazing.
If you've ever seen a time lapse of what's going on in the atmosphere, when you speed up some of these processes you can really see how the air is a fluid, and how you get these wave like motions, and how a thunderstorm isn't just this big cloud sitting there, but parts of it are rising and reaching the top of the thunderstorm and kind of descending afterwards. You can just get a better handle on how this fluid really flows. And so it's really cool to see those, and having one minute satellite imagery is really gonna help play a role in just seeing the processes
[00:44:12] Unknown:
going on without needing to fill it in with your mind's eye. Yeah. If you search Twitter for the GOES-16 hashtag, there have been a few pieces of data that have come out from the GOES platform that people are now starting to throw up visualizations of, and the reaction on Twitter has been quite amazing. I mean, we've got a scene from the TV show back in the nineties, Saved by the Bell, where Jessie is saying, I'm so excited. I mean, people are totally geeking out over this, and it's great. It is awesome.
[00:44:45] Unknown:
We should also say that the weather community has been waiting for this moment for a while. The GOES satellite, this newest one, has been delayed on multiple occasions, and so I think this is at least probably 15 years in the making. I remember seeing some articles that were talking about what would be on this thing, and I think it's probably been overshot by at least 5 years in terms of when it finally made it to space. So we've been waiting for this exciting new data source to come online. And literally today, we've finally seen some images coming back from the satellite. It launched in November, and they've been doing various checkouts. And so that's the observational side of things, and I think all of us sitting at the table are prone to be more observationalists
[00:45:23] Unknown:
rather than numerical modelers. And so seeing new observational capabilities is really exciting. Yeah. I would piggyback off Ryan and say GOES-16 is really what's got me excited. At Unidata, I do data infrastructure in terms of data transport across the network, and dealing with the volume of data that we're gonna have from this platform has me really excited. Because slicing and dicing the data sets is gonna be critical, but also being able to interrogate them remotely, in terms of, I want this time to this time, and I wanna see this variable. The imagery is one thing, but then on the server side where the data actually live, trying to aggregate these huge collections of data into some logical data set that can be interrogated in an efficient manner is gonna pose quite a challenge. And so that's got my
[00:46:17] Unknown:
computer programming side pretty excited about the future here. Alright. And I guess I'll go in a similar direction, but maybe a little bit closer to home, not out in space, and say that I'm excited about seeing these highly portable sensor arrays that we can take to interesting events, supercell thunderstorms, tropical events, what have you, and deploy them to get high resolution
[00:46:38] Unknown:
measurements that maybe are even moving in the atmosphere, like a weather balloon, as opposed to stations that stay stationary. So I'm excited about seeing what kinds of new data we can collect from events to better understand them, as sensors get smaller and cheaper and practically disposable. And circling back around, what's really exciting about this is the potential for trying to innovate in the tool space, and trying to figure out how we can help out our community of users and make it easier for them to take advantage of these new capabilities by taking off the rough spots in terms of the tools. So making it easier to download these satellite data and do some more advanced things, and really help facilitate them doing cool science or just better operational type applications. And so that's what really makes working on something like MetPy exciting. Yeah. Being able to do exploratory
[00:47:26] Unknown:
data analysis shouldn't be a huge stumbling block that takes weeks to months of labor for a researcher.
[00:47:32] Unknown:
So my vision is to have everyone opening up Jupyter Notebooks and downloading the data in a few lines of Python and making pictures without needing these huge cells of code. And that's really how we're looking at trying to approach these problems: prototype something, look at it and say, okay, that took way too many lines, and then see how we can take that and make it a generic tool, the tool maybe being a function or a class or something in Python. So we're trying to abstract that away and make it so our users can just do it in a few lines and don't have to understand the chain of calls they need to make. It's definitely difficult to try and distill the vastness that is meteorology
[00:48:07] Unknown:
and atmospheric sciences into a podcast that lasts about an hour. But are there any other topics or questions that you think we should cover before we start to close out the show?
[00:48:16] Unknown:
No, I think we've hit it. You know, we spent a lot of time talking about meteorology and stuff, but clearly we're all passionate about this field. And that's really the best part about what we get to do: we take our passion for meteorology and put it into tools, especially in a great language like Python. We get to do that every day and work on things to make other people's lives easier. So that's what drives us, and that's what makes it easy to get up in the
[00:48:39] Unknown:
morning. Well, I'll ask each of you to send me your preferred contact information so I can put that in the show notes. And with that, I'll bring us to the picks. My pick today is a new podcast that I just came across recently called the Drill to Detail podcast. It's a show that has episodes about data warehousing and big data analysis and some of the different tools and trends going on there. In particular, one of the episodes I listened to recently was really interesting, discussing the idea of data capital. So I'll add a link to that particular show as well, but it's been pretty interesting. I'm about a half dozen episodes into it so far, and it's been keeping my interest. So if anybody's interested in hearing about that space, I definitely recommend checking it out. And with that, I'll pass it to you. Do you have any picks for us today, Ryan?
[00:49:24] Unknown:
So the one thing I'd like to point out is that for pytest, the Python testing framework, there's a plugin called pytest-mpl. That's a plugin that allows you to do image based tests if you have some Matplotlib based plots that you'd like to make sure keep working. So it makes it very easy to generate a set of baseline images for a test, and then you use that plugin to run your tests, and it can make sure that your images haven't changed. Now, often there are a lot of small changes that can take place that aren't significant, so you can set a certain threshold of difference before the test actually fails. And so we use that extensively in MetPy to make sure our plots aren't breaking. So for anyone out there doing Matplotlib based plots who would like to actually test them, pytest-mpl makes it a lot easier.
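A minimal editorial sketch of the pytest-mpl workflow described here; the decorator name and the baseline-generation flag follow the plugin's documented usage, though details can vary between versions.

```python
# test_plots.py: an image-comparison test. Generate baselines once with
#   pytest --mpl-generate-path=baseline
# and then run the comparison with
#   pytest --mpl
import matplotlib.pyplot as plt
import numpy as np
import pytest


@pytest.mark.mpl_image_compare(tolerance=2)   # tolerate small, insignificant pixel drift
def test_simple_plot():
    fig, ax = plt.subplots()
    x = np.linspace(0, 10, 50)
    ax.plot(x, np.sin(x))
    return fig                                # the plugin compares this figure against the baseline
```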
[00:50:12] Unknown:
Sean, do you have any picks? The kids and wife and I watched Trolls, and I wouldn't necessarily give that a pick. How about that?
[00:50:20] Unknown:
I watched it myself, and I can relate. It was definitely entertaining and humorous, and there are parts of it where you're just like, I can't believe I'm actually watching this. Yeah. It was a good one time thing. I'll leave it at that. It's a good movie for sure, a good movie for some family fun. John, do you have any picks for us?
[00:50:38] Unknown:
Sure. If you like really technical, code based podcasts, I would say check out the embedded.fm podcast. They talk a lot about hardware and making embedded systems, but there are always some really great technical discussions. The episode on MISTRACY is a good one to start with if you're into that kind of thing. It's just a really enjoyable podcast to listen to every week, so check out embedded.fm.
[00:51:00] Unknown:
And I believe that at least some, if not all, of you are involved in another podcast as well. So I'll give you guys the chance to describe that briefly for our listeners who might be interested in checking it out. Oh, sure. So I actually coproduce a podcast with a faculty member at the University of Oklahoma every week called the Don't Panic Geocast,
[00:51:19] Unknown:
where we talk about geoscience. So sometimes it's meteorology, sometimes it's geology, even a little bit of geography. We do interviews. So it's just a very generalized podcast on the planet that we live on, and it's dontpanicgeocast.com.
[00:51:34] Unknown:
Well, I really appreciate all of you taking the time out of your day to share your work on MetPy and your passion for meteorology. It's definitely been a really interesting conversation, and I learned a lot about what goes on behind the scenes of the daily weather forecast. So I'm happy you were able to join me, and I hope you guys enjoy the rest of your evenings. Thanks a lot. Happy to be on here. Thank you.
Introduction and Guest Introductions
How the Guests Got Introduced to Python
Unidata and Its Role
Introduction to MetPy
MetPy's Functionality and Goals
Challenges in Meteorological Data Analysis
Research and Experimentation in Meteorology
Machine Learning and Sensor Networks in Meteorology
Data Distribution Challenges
Importance of Data Visualization
Development Practices in MetPy
Innovations in Weather Science
Closing Thoughts