---
license: other
license_name: canada-ogl-lgo
license_link: https://open.canada.ca/en/open-government-licence-canada
pipeline_tag: graph-ml
metrics:
- mse
tags:
- geml
- graphcast
- weather
---
# GEML 1.0
Here, we introduce version 1.0 of the Global Environmental eMuLator (GEML), a data-driven model compatible with the ¼°, 13-level version of [GraphCast](https://github.com/google-deepmind/graphcast) (Lam et al. 2023, [1]). This model was trained by the Meteorological Research Division (MRD) and the Canadian Centre for Meteorological and Environmental Prediction (CCMEP), divisions of [Environment and Climate Change Canada](https://www.canada.ca/en/environment-climate-change.html).
This model was trained "from scratch," using [training code](https://github.com/csubich/graphcast) developed for other research projects. In inference (forecast production), this model is fully compatible with the model code in DeepMind's GraphCast repository.
## License
These model weights are available under the [Canada Open Government license](https://open.canada.ca/en/open-government-licence-canada), which permits derivative works and commercial use with attribution.
## Variables
The model predicts the following meteorological variables on a ¼° latitude/longitude grid (with poles):
* At elevation: temperature, geopotential, u (zonal) component of wind, v (meridional) component of wind, vertical velocity, specific humidity
* At surface: temperature (2m), u component of wind (10m), v component of wind (10m), mean sea level pressure, 6hr-accumulated precipitation[†]
[†] — This variable is incorrect; please see the 'Erratum' section below.
The atmospheric variables are predicted at the 50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, and 1000 hPa pressure levels. For points that lie below the surface, extrapolated values are given (i.e. they are not masked).
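Concretely, the predicted fields and levels can be summarized in a small configuration block. The variable names below follow the GraphCast/ERA5 conventions; treat them as illustrative rather than authoritative:

```python
# Pressure levels (hPa) and predicted variables; names follow the
# GraphCast/ERA5 conventions but are illustrative, not authoritative.
PRESSURE_LEVELS_HPA = [50, 100, 150, 200, 250, 300, 400, 500,
                       600, 700, 850, 925, 1000]
ATMOSPHERIC_VARS = ["temperature", "geopotential", "u_component_of_wind",
                    "v_component_of_wind", "vertical_velocity",
                    "specific_humidity"]
SURFACE_VARS = ["2m_temperature", "10m_u_component_of_wind",
                "10m_v_component_of_wind", "mean_sea_level_pressure",
                "total_precipitation_6hr"]
```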
The model timestep is 6 hours, and the model takes two time levels as input. That is, to produce a forecast valid at 12Z, the model needs input data at 6Z and 0Z.
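For example, the input times needed for a given valid time can be computed as:

```python
from datetime import datetime, timedelta

step = timedelta(hours=6)                    # model timestep
valid = datetime(2024, 1, 1, 12)             # forecast valid at 12Z
inputs = (valid - 2 * step, valid - step)    # model needs data at 0Z and 6Z
```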
### Input data
All forecast variables except accumulated precipitation are taken as input values. The model also requires the surface geopotential, land-sea mask, and top-of-atmosphere incident solar radiation (accumulated over 1 h) as inputs.
The surface geopotential and land-sea mask are static variables. The incident solar radiation must be provided at both input time levels and at the output time level. This value can be calculated, and both the DeepMind GraphCast repository and the training code repository contain an incident solar radiation model.
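If neither repository's solar radiation model is at hand, a rough zenith-angle approximation conveys the idea. This sketch ignores orbital eccentricity and the equation of time, and note that the actual model input is a 1 h accumulation (J m⁻²) rather than an instantaneous flux:

```python
import numpy as np

def toa_solar_flux(day_of_year, hour_utc, lat_deg, lon_deg, s0=1361.0):
    """Approximate instantaneous top-of-atmosphere solar flux (W/m^2)."""
    # Solar declination (Cooper's formula, a common approximation).
    decl = np.deg2rad(23.45) * np.sin(2.0 * np.pi * (284 + day_of_year) / 365.0)
    lat = np.deg2rad(lat_deg)
    # Hour angle: zero at local solar noon, 15 degrees per hour.
    hour_angle = np.deg2rad(15.0 * (hour_utc + lon_deg / 15.0 - 12.0))
    cos_zen = (np.sin(lat) * np.sin(decl)
               + np.cos(lat) * np.cos(decl) * np.cos(hour_angle))
    return s0 * np.maximum(cos_zen, 0.0)     # zero when the sun is below horizon
```

Integrating this flux over the preceding hour (e.g. with a few quadrature points) yields the accumulated quantity the model expects.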
## Model training
### Datasets
The model was pre-trained on ERA5 data (calendar years 1979–2015, inclusive), following the training configuration in Lam et al. Subsequently, it was fine-tuned on the "HRES initial conditions" dataset for calendar years 2016–2021.
Both of these datasets are available from the [WeatherBench 2 project](https://weatherbench2.readthedocs.io/en/latest/data-guide.html) (Rasp et al. 2024, [2]).
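Both datasets are distributed as Zarr stores and can be opened directly with xarray. The path below is a placeholder, not a real store name; see the WB2 data guide for the current paths:

```python
import xarray as xr

# Placeholder path; see the WeatherBench 2 data guide for the current stores.
era5 = xr.open_zarr(
    "gs://weatherbench2/datasets/era5/<store-name>.zarr",
    storage_options={"token": "anon"},
)
```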
#### Erratum
Although the HRES dataset contains an accumulated precipitation variable, [this value is always zero](https://github.com/google-research/weatherbench2/issues/159), so during the fine-tuning process the model was trained towards a prediction of zero precipitation.
Since precipitation is the only predicted variable that is not given as an input, we do not expect this error to affect the prediction of the other output variables.
### Loss function
The model was trained with the latitude- and level-weighted mean squared error loss function, equation (19) in the supplementary material of Lam et al.:
$$ \mathrm{MSE} = \underbrace{\frac{1}{N_t}\sum_{\tau=1}^{N_t}}_{\text{Lead time}}
\underbrace{\sum_{i,j} \frac{dA(i,j)}{4\pi}}_{\text{Space}}
\underbrace{\sum_{k=0}^{N_k} w(k)}_{\text{Level}} \underbrace{\sum_{\mathrm{var}} \omega_\mathrm{var} }_{\text{Variable}}
\frac{ \left(\hat{x}_\mathrm{var}(i,j,k;\tau) - x_\mathrm{var}(i,j,k;\tau)\right)^2}{\sigma^2_{\Delta\mathrm{var}}(k)}
$$
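In code, the loss can be sketched as below; the area weight is approximated as proportional to the cosine of latitude, and the argument layout is ours rather than the training code's:

```python
import numpy as np

def weighted_mse(pred, target, lat_deg, w_level, w_var, sigma_diff):
    """Latitude-, level-, and variable-weighted MSE (sketch of the equation above).

    pred, target: dict of variable -> array of shape [lead, lat, lon, level]
    lat_deg:      grid latitudes in degrees, shape [lat]
    w_level:      per-level weights w(k), shape [level]
    w_var:        dict of variable -> scalar weight omega_var
    sigma_diff:   dict of variable -> stddev of 6 h differences, shape [level]
    """
    n_lon = next(iter(pred.values())).shape[2]
    # Cell area dA/4pi is proportional to cos(latitude); normalize so the
    # weights sum to one over the whole grid.
    area = np.cos(np.deg2rad(lat_deg))
    area = area / (area.sum() * n_lon)                            # shape [lat]
    loss = 0.0
    for var, x in pred.items():
        err2 = (x - target[var]) ** 2 / sigma_diff[var] ** 2
        weighted = err2 * area[None, :, None, None] * w_level
        loss += w_var[var] * weighted.sum(axis=(1, 2, 3)).mean()  # mean over lead
    return loss
```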
### Normalizations
The GraphCast architecture takes normalized input data (z-scores) and outputs a forecast difference normalized by the standard deviation of 6-hour differences in a climatological dataset.
For these fields, we used the same normalization factors as the DeepMind GraphCast model, computed over the ERA5 dataset. Since the HRES data is very close to the ERA5 data, we re-used the ERA5 normalization factors unchanged during fine-tuning.
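The following sketch shows where the two sets of statistics enter a single autoregressive step (function and argument names are ours, not the codebase's):

```python
def model_step(model_fn, x, mean, std, std_diff):
    """One 6 h step: z-score the inputs, predict a normalized tendency,
    then rescale by the stddev of climatological 6 h differences."""
    x_norm = (x - mean) / std           # z-scored inputs
    delta_norm = model_fn(x_norm)       # network output: normalized difference
    return x + delta_norm * std_diff    # un-normalized forecast at t + 6 h
```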
### Training curriculum
The pre-training step closely followed the training curriculum of Lam et al.:
#### Pre-training
Stage | Batches | Forecast Length | Learning Rate
|:-:|:-:|:-:|:-:|
1 (Warmup) | 1000 | 1 step (6 h) | \\(0 \to 10^{-3}\\) (linear)
2 | 299000 | 1 step (6 h) | \\(10^{-3} \to 3 \cdot 10^{-7}\\) (cosine)
3 | 1000 each | 2–12 steps (12–72 h) | \\(3 \cdot 10^{-7}\\) (constant)
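In optax terms, the schedule in the table reads roughly as follows; whether the training code actually uses optax is an assumption on our part:

```python
import optax

# A sketch of the pre-training learning-rate schedule from the table above.
lr_schedule = optax.join_schedules(
    schedules=[
        optax.linear_schedule(0.0, 1e-3, transition_steps=1_000),  # stage 1
        optax.cosine_decay_schedule(1e-3, decay_steps=299_000,
                                    alpha=3e-7 / 1e-3),            # stage 2
        optax.constant_schedule(3e-7),                             # stage 3
    ],
    boundaries=[1_000, 300_000],
)
```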
#### Fine-tuning
Stage | Batches | Forecast Length | Learning Rate
|:-:|:-:|:-:|:-:|
Fine tune | 5000 | 12 steps (72 h) | \\(3 \cdot 10^{-7}\\) (constant)
In both cases, the batch size was 32 forecasts, and the training data was sampled with replacement. On average, each training forecast (initialization date) was seen about 184 times in the pre-training stage and 4.5 times in the fine-tuning stage.
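The pre-training figure follows from a back-of-envelope count, assuming stage 3 comprises eleven 1000-batch sub-stages (one per forecast length):

```python
batches = 1_000 + 299_000 + 11 * 1_000      # stages 1 + 2 + 3 (2..12 steps)
forecasts = batches * 32                    # batch size of 32 forecasts
init_dates = 37 * 365.25 * 4                # 1979-2015, four initializations/day
print(forecasts / init_dates)               # ~184 views per initialization date
```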
#### Optimizer
As in Lam et al., the training used the AdamW optimizer (Loshchilov and Hutter 2019, [3]), with momentum parameters \\(\beta_1 = 0.9\\) and \\(\beta_2 = 0.95\\) and weight decay of \\(0.1\\) on the weight matrices. Unlike Lam et al., we did not need to impose gradient clipping for stability.
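An optax rendering of this configuration; the weight-decay mask shown is our reading of "on the weight matrices", not a detail given above:

```python
import jax
import optax

optimizer = optax.adamw(
    learning_rate=lr_schedule,   # e.g. the schedule sketched earlier
    b1=0.9,
    b2=0.95,
    weight_decay=0.1,
    # Decay only >= 2-D parameters (weight matrices), not biases or scales;
    # this masking is an assumption, not a detail given in the text.
    mask=lambda params: jax.tree_util.tree_map(lambda p: p.ndim >= 2, params),
)
```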
## Validation
Validation data/plots to come
## Model weights
The fully-trained model weights are available as [geml_1.0.ckpt](geml_1.0.ckpt) in this repository.
For research purposes, we will also shortly update this repository to include intermediate checkpoints from the pre-training and fine-tuning process.
## References
[1]: R. Lam et al., “Learning skillful medium-range global weather forecasting,” Science, vol. 382, no. 6677, pp. 1416–1421, Dec. 2023, doi: 10.1126/science.adi2336.
[2]: S. Rasp et al., “WeatherBench 2: A Benchmark for the Next Generation of Data-Driven Global Weather Models,” Journal of Advances in Modeling Earth Systems, vol. 16, no. 6, p. e2023MS004019, 2024, doi: 10.1029/2023MS004019.
[3]: I. Loshchilov and F. Hutter, “Decoupled Weight Decay Regularization,” Jan. 04, 2019, arXiv: arXiv:1711.05101. doi: 10.48550/arXiv.1711.05101.