Prior uncertainties
The uncertainty settings should be grouped in a section of the configuration file, with the following structure :
optimize :
emissions :
tracer1 :
category1 :
annual_uncertainty :
spatial_correlation :
cortype :
temporal_correlation :
optimization_interval :
category2 :
...
tracer2 :
...
The annual_uncertainty settings should be in the form of a value and unit (e.g. "1.2 TgCH4", "0.5 PgCO2", ...). The optimization_interval settings should be a string understandable by the pandas.tseries.frequencies.to_offset
function. The spatial_correlation and temporal_correlation should contain a cortype
sub-key, which determines which correlation function should be used. Depending on this, they may require additional subkeys (e.g. corlen
, stretch_ratio
, etc.).
Error-covariance matrix
The error covariance matrix is calculated in three steps:
- The standard are estimated, for each state vector component
- The correlations are determned
- The resulting total uncertainty is computed (by combining the standard deviations and correlations), and the standard deviations are scaled uniformly, to match a user-specified annual uncertainty target.
where is it in the code?
The uncertainty matrix is calculated in the lumia.PriorConstraints.setup
method, which should be called by the main script (see, e.g. run/co2_inversion.py)
1. Standard deviations
The standard deviations (\(\sigma_\mathbf{x}\), i.e. the diagonal elements of the \(\mathbf{B}\) matrix) are prescribed based on the absolute value of the prior emissions.
Consider a control variable \(x\), controlling for the offsets to the prior emissions for a group of contiguous grid cells \(p_i\) to \(p_j\), and from time steps \(t_n\) to \(t_m\). Then the value of \(x\) in a given iteration is:
The prior values of \(x\) (i.e. when \(E = E^{apri}\)) is therefore 0.
The corresponding prior uncertainty, \(\sigma_x\), is defined as:
In other words, the uncertainty on \(x\) is proportional to the sum of the absolute values of the "components" of \(x\). This ensures in particular that the uncertainty doesn't drop to zero if the fluxes to which \(x\) is an offset happen to cancel out each other (e.g. when respiration and photosynthesis are of equal amplitude).
Note that the absolute values of the standard deviations computed in this step has no importance, as they are scaled to match a target annual uncertainty, in step 3 below. The aim of this step 1 is only to define how this annual uncertainty will be distributed, in time and space.
where is it in the code?
The standard deviations are directly calculated in the lumia.PriorConstraints.setup
method. It relies on the lumia.Mapping.coarsen_cat
method to aggregate the model-resolution error estimates into a control-vector-like object.
2. Correlations
Four correlation methods are implemented, and can be selected using the cortype subkey of the spatial_correlation and temporal_correlation sections. Three of them calculate correlation coefficients based on a mathematical function of the spatial or temporal distance between the grid-cells, whereas the last one reads pre-computed correlation-distance relationships from a file.
where is it in the code?
The uncertainty settings are read within the lumia.Mapping.setup_optimization
method. In addition, if you need to implement new options, you might need to edit the lumia.optimizer.categories.Categories
class.
Correlation functions
- Exponential (cortype = e): \(r = e^{- \left(d / L\right)}\)
- Gaussian (cortype = g): \(r_H = e^{- \left(d / L\right)^2}\)
- Hyperbolic (cortype = h): \(r_H = 1 / \left(1 + d / L\right)\)
where \(d\) is the geographical distance between the center of two gridcells, and \(L\) is a user-specified correlation length (optimize.emissions.{tracer}.{catname}.horizontal_correlation.cortype key)
For temporal correlations, only the exponential method is implemented, and cortype should be a string understandable by the pandas.tseries.frequencies.to_offset
function.
The last method (cortype = f) reads the correlation-distance relationship (\(r(d)\)) from a file (see below). In which case, a corrfile argument is needed (and corlen is not used).
where is it in the code?
The correlation functions are all implemented in the lumia.prior.uncertainties
module (SpatialCorrelation
and TemporalCorrelation
classes).
Correlation file
Implementation pending
It is possible to provide correlation-distance relationships through a file. This allows for more complex correlation functions to be implemented. In order to use this feature, set cortype to "f" (or to "file"), and provide the name of the correlation file in a corfile key.
The correlation file should be a netCDF file, containing one correlations group, with the following structure:
group: correlations {
dimensions:
point = 1644 ;
time = 365 ;
variables:
double lat(point) ;
lat:_FillValue = NaN ;
double lon(point) ;
lon:_FillValue = NaN ;
double horizontal_correlation(point, point) ;
horizontal_correlation:_FillValue = NaN ;
double temporal_correlations(time, time) ;
temporal_correlations:_FillValue = NaN ;
int64 time(time) ;
time:units = "days since 2018-01-01 00:00:00" ;
time:calendar = "proleptic_gregorian" ;
int64 point(point) ;
} // group correlations
The horizontal_correlations
and temporal_correlations
variables contain the spatial and temporal matrices. The time
variable is the coordinate of the time
dimension of the temporal_correlations
matrix. The lat
and lon
variables store the spatial positions corresponding to the point
coordinate of the horizontal_correlation
matrix.
where is it in the code?
The matrices are read within the SpatialCorrelation
and TemporalCorrelation
classes of the lumia.prior.uncertainties
module. Specifically, it uses the read_spatial_correlations
and read_temporal_correlations
functions of that module, which doesn't just read the matrices, but also ensure that they are compatible with the current list of coordinates, since there can be nan values in the pre-computed matrices.
3. Uncertainty scaling
The net uncertainty is calculated as:
with \(s\) the vector containing the standard deviations of \(x\), \(S\) the same vector, but reshaped as a (nt, np) matrix (with nt and np respectively the number of optimization time steps and optimization (clusters of) grid cells), and vec the operator reshaping a (nt, np) matrix as a np \(\times\) nt vector. \(\mathbf{C_t}\) and \(\mathbf{C_h}\) are the temporal and spatial correlation matrices calculated in step 2.
This relies on the properties that:
- \(\mathbf{B = C_t \otimes C_h}\)
- the sum of a covariance matrix can be inferred from \(\mathbf{s \cdot Q \cdot s^T}\) (with \(\mathbf{s}\) the vector of standard deviations, and \(\mathbf{Q}\) the correlation matrix
- The equivalence between \((\mathbf{C_t \otimes C_h)} vec(\mathbf{S})\) and \(vec(\mathbf{C_h S C_t^T})\)
Combining these three equations, we obtain \(\mathbf{\Sigma^2} = \mathbf{s \cdot C_t \otimes C_h \cdot s^T = s} \cdot vec(\mathbf{C_h S C_t^T})^T\), with \(\mathbf{\Sigma^2}\) the total variance.
Because the formula above doesn't construct the covariance matrix explicitly, but only its sum, it is (relatively) lightweight and efficient to compute. Once the total variance is computed, the standard deviations are multiplied by a scalar factor \(\gamma\), defined as
with \(\Sigma_{target}\) the target annual uncertainty, \(\Delta\) the length of the simulation (in seconds) and \(\Delta_y\) the lenght of one year.
where is it in the code?
The uncertainty scaling is done directly in the lumia.PriorConstraints.setup
method.