TEEMLEAP - A new TEstbed for Exploring Machine LEarning in Atmospheric Prediction

Project description

Despite steady improvements, numerical weather prediction models still exhibit systematic errors. These arise from simplified representations of physical processes, assumptions of linear behaviour and the difficulty of integrating all available observational data. Weather services around the world now recognise that using artificial intelligence (AI) to address these deficiencies could revolutionise the discipline in the coming decades. This requires a fundamental shift in thinking that integrates meteorology much more closely with mathematics and computer science.

TEEMLEAP will foster this cultural change through a collaboration between scientists from the KIT Centers Climate and Environment and MathSEE, by establishing an idealised testbed for exploring machine learning in weather forecasting. Whereas weather services naturally focus on improving numerical forecast models in their full complexity, TEEMLEAP aims to evaluate where AI can be applied, and with what benefit, along the entire process chain of weather forecasting within this simplified setting. The process chain in the testbed will consist of the following elements:

Fig. 1: Process chain of weather forecasting (after Dueben et al. 2021).

  1. Observations: Because real observations are very unevenly distributed in space and time, the testbed uses pseudo-radiosonde observations generated from ERA5 reanalysis data provided by the European Centre for Medium-Range Weather Forecasts (ECMWF).
  2. Data assimilation: The pseudo-radiosonde observations will be assimilated with the 3D-Var assimilation system "BaCy" of the German Weather Service (DWD) to generate the initial conditions for the actual numerical weather prediction.
  3. Numerical integration: So that the testbed approach can later be transferred to an operational forecasting system, global simulations will be carried out with the DWD's ICON model.
  4. Post-processing: The simulations will be evaluated for user-relevant near-surface variables such as wind speed, temperature and precipitation, and any remaining systematic errors in these quantities will be corrected with statistical methods.
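Elements 1 and 2 of the chain can be illustrated with a deliberately tiny toy problem. The sketch below is not BaCy, ICON or ERA5 code, and it is not the project's method: a hypothetical analytic 1-D temperature profile stands in for the reanalysis "truth", pseudo-observations are drawn from it at a few hypothetical station points with added Gaussian noise, and a background with an assumed 2 K warm bias is corrected by minimising the standard 3D-Var cost function J(x) = 1/2 (x - x_b)^T B^-1 (x - x_b) + 1/2 (y - Hx)^T R^-1 (y - Hx). All grid sizes, error variances and the exponential correlation model for B are illustrative assumptions.

```python
import numpy as np

# Toy illustration only: none of these names or values come from BaCy, ICON or ERA5.
rng = np.random.default_rng(seed=0)

# Hypothetical 1-D model grid with n points.
n = 40
grid = np.arange(n, dtype=float)

# Hypothetical "truth" standing in for reanalysis data: a smooth temperature profile in K.
truth = 280.0 + 10.0 * np.sin(2.0 * np.pi * grid / n)

# Step 1: pseudo-observations. Sample the truth at a few assumed station locations
# and add Gaussian observation error with standard deviation sigma_o.
obs_idx = np.array([3, 11, 19, 27, 35])
sigma_o = 0.5
H = np.zeros((obs_idx.size, n))
H[np.arange(obs_idx.size), obs_idx] = 1.0  # linear operator that picks grid values
y = H @ truth + rng.normal(0.0, sigma_o, size=obs_idx.size)
R = sigma_o**2 * np.eye(obs_idx.size)      # uncorrelated observation errors

# Step 2: 3D-Var. The background (first guess) is the truth plus an assumed 2 K warm bias.
x_b = truth + 2.0
sigma_b, length_scale = 1.5, 3.0
dist = np.abs(grid[:, None] - grid[None, :])
B = sigma_b**2 * np.exp(-dist / length_scale)  # assumed exponential error correlation


def cost(x: np.ndarray) -> float:
    """3D-Var cost: J(x) = 1/2 (x-x_b)^T B^-1 (x-x_b) + 1/2 (y-Hx)^T R^-1 (y-Hx)."""
    d_b = x - x_b
    d_o = y - H @ x
    return float(0.5 * d_b @ np.linalg.solve(B, d_b) + 0.5 * d_o @ np.linalg.solve(R, d_o))


# For linear H, J is quadratic and its minimiser (the analysis) has a closed form:
# x_a = x_b + B H^T (H B H^T + R)^-1 (y - H x_b)
innovation = y - H @ x_b
x_a = x_b + B @ H.T @ np.linalg.solve(H @ B @ H.T + R, innovation)


def rmse(x: np.ndarray) -> float:
    """Root-mean-square difference from the toy truth, in K."""
    return float(np.sqrt(np.mean((x - truth) ** 2)))


print(f"J(x_b) = {cost(x_b):.2f}  J(x_a) = {cost(x_a):.2f}")
print(f"RMSE vs truth: background {rmse(x_b):.2f} K, analysis {rmse(x_a):.2f} K")
```

Because the observation operator H in this toy is linear, the minimum of J can be written in closed form as above; systems with nonlinear observation operators and much larger state vectors generally minimise J iteratively instead.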

Overall, the testbed is intended to help answer a fundamental question: should future work integrate data-driven approaches into physical models, or are learning architectures that respect physical constraints the more promising and efficient route?



Dueben, P., Modigliani, U., Geer, A., Siemen, S., Pappenberger, F., Bauer, P., Brown, A., Palkovic, M., Raoult, B., Wedi, N. & Baousis, V. (2021). Machine learning at ECMWF: A roadmap for the next 10 years. ECMWF Technical Memorandum No. 878. https://doi.org/10.21957/ge7ckgm