## Modeling {#sec-sec_2_4_Modeling}
In this section, the fourth main step of \gls{cnmc}, i.e., modeling, is elaborated.
The data and workflow are depicted in figure @fig-fig_42.
It comprises two main sub-tasks, which are modeling the \glsfirst{cpevol} and modeling the transition property tensors $\boldsymbol Q / \boldsymbol T$.
The settings are, as usual, defined in *settings.py*, and the extracted attributes are distributed to the sub-tasks.
Modeling the \gls{cpevol} and the $\boldsymbol Q / \boldsymbol T$ tensors can be executed separately from each other.
If the output of one of the two modeling sub-steps is at hand, \gls{cnmc} is not forced to recalculate both modeling sub-steps.
Since the tracked states are used as training data for the modeling, they are a prerequisite for both modeling parts.
The modeling of the centroid positions is explained in the upcoming subsection [-@sec-subsec_2_4_1_CPE], followed by the explanation of the transition properties in subsection [-@sec-subsec_2_4_2_QT].
A comparison between this \gls{cnmc} and the *first CNMc* version is provided at the end of the respective subsections.
The results of both modeling steps can be found in sections [-@sec-sec_3_2_MOD_CPE] and [-@sec-sec_3_3_SVD_NMF], respectively.
![Data and workflow of the modeling step.](){#fig-fig_42}
### Modeling the centroid position evolution {#sec-subsec_2_4_1_CPE}
In this subsection, the modeling of the \gls{cpevol} is described.
The objective is to find a surrogate model that returns all $K$ centroid positions for an unseen model parameter value $\beta_{unseen}$.
The training data for this purpose are the tracked centroids from the previous step, which is described in section [-@sec-sec_2_3_Tracking].
To explain the modeling of the *CPE*, figure @fig-fig_43 shall be inspected.
The model parameter values used for training, $\vec{\beta_{tr}}$, serve to generate a so-called candidate library matrix $\boldsymbol{\Theta}\,(\vec{\beta_{tr}})$. The candidate library matrix $\boldsymbol{\Theta}\,(\vec{\beta_{tr}})$ is obtained by making use of a function of *pySindy* [@Silva2020; @Kaptanoglu2022; @Brunton2016].
In [@Brunton2016] the term $\boldsymbol{\Theta}\,(\vec{\beta_{tr}})$ is explained well. In brief terms, it allows the construction of a matrix which comprises the output of user-defined functions.
These functions can be, e.g., linear, polynomial, trigonometric or any other non-linear functions. Made-up functions that include logical conditions can also be applied. \newline
![Data and workflow of modeling the centroid position evolution.](){#fig-fig_43}
Since the goal is not to explain how to operate *pySindy* [@Brunton2016], the curious reader is referred to the very extensive *pySindy* online documentation and [@Silva2020; @Kaptanoglu2022].
Nevertheless, to understand $\boldsymbol{\Theta}\,(\vec{\beta_{tr}})$, equation @eq-eq_20 shall be considered.
In this example, 3 different functions, denoted as $f_i$ in the first row, are employed.
The remaining rows contain the output of the chosen $f_i$.
Furthermore, $n$ is the number of samples, i.e., the size of $\vec{\beta_{tr}}$, denoted as $n_{\beta,tr}$, and $m$ is the number of features, i.e., the number of functions $f_i$. \newline
$$
\begin{equation}
\boldsymbol{\Theta}_{example\,(n \times m)}\,(\vec{\beta_{tr}}) =
\renewcommand\arraycolsep{10pt}
\begin{bmatrix}
f_1 = \beta & f_2 = \beta^2 & f_3 = \cos(\beta)^2 - \exp\left(\dfrac{\beta}{-0.856} \right) \\[1.5em]
1 & 1^2 & \cos(1)^2 - \exp\left(\dfrac{1}{-0.856} \right) \\[1.5em]
2 & 2^2 & \cos(2)^2 - \exp\left(\dfrac{2}{-0.856} \right) \\[1.5em]
\end{bmatrix}
\label{eq_20}
\end{equation}$$ {#eq-eq_20}
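To make the construction of equation @eq-eq_20 concrete, the following minimal sketch builds the same example matrix with *pySindy*'s `CustomLibrary`; the function set and sample values are taken from the example above, while the exact \gls{cnmc} implementation may differ.

```python
import numpy as np
import pysindy as ps

# The three example candidate functions f_i from equation (eq_20)
functions = [
    lambda b: b,                                    # f_1 = beta
    lambda b: b**2,                                 # f_2 = beta^2
    lambda b: np.cos(b) ** 2 - np.exp(b / -0.856),  # f_3
]
names = [
    lambda s: s,
    lambda s: f"{s}^2",
    lambda s: f"cos({s})^2 - exp({s}/-0.856)",
]

library = ps.CustomLibrary(library_functions=functions, function_names=names)

beta_tr = np.array([[1.0], [2.0]])      # n = 2 samples of the training parameters
Theta = library.fit_transform(beta_tr)  # candidate library matrix, shape (n, m) = (2, 3)
```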
The actual candidate library matrix $\boldsymbol{\Theta}\,(\vec{\beta_{tr}})$ incorporates a quadratic polynomial, the inverse $\frac{1}{\beta}$, the exponential $\exp(\beta)$ and 3 frequencies of cosine and sine, i.e., $\cos(\vec{\beta}_{freq}), \ \sin(\vec{\beta}_{freq})$, where $\vec{\beta}_{freq} = [1, \, 2,\, 3]$.
Much more complex candidate library matrices $\boldsymbol{\Theta}\,(\vec{\beta_{tr}})$ are available in \gls{cnmc} and can be selected if desired.
Nonetheless, the default $\boldsymbol{\Theta}\,(\vec{\beta_{tr}})$ is chosen as described above.
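A sketch of how this default library could be assembled is given below; the concatenation of a polynomial and a custom library is an assumption about the implementation, not a verbatim excerpt from \gls{cnmc}.

```python
import numpy as np
import pysindy as ps

# Quadratic polynomial terms: 1, beta, beta^2
poly = ps.PolynomialLibrary(degree=2)

# Inverse, exponential and 3 frequencies of cosine and sine
extra = ps.CustomLibrary(
    library_functions=[
        lambda b: 1.0 / b,
        lambda b: np.exp(b),
        lambda b: np.cos(1 * b), lambda b: np.sin(1 * b),
        lambda b: np.cos(2 * b), lambda b: np.sin(2 * b),
        lambda b: np.cos(3 * b), lambda b: np.sin(3 * b),
    ]
)

# Feature libraries can be concatenated into one candidate library
library = poly + extra
```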
Once $\boldsymbol{\Theta}\,(\vec{\beta_{tr}})$ is generated, the system of equations @eq-eq_21 is solved.
Note that this is very similar to solving the well-known system of equations $\boldsymbol A \, \vec{x} = \vec{y}$.
The difference is that in equation @eq-eq_21 the roles of $\vec{x}$ and $\vec{y}$ are in general taken by the matrices $\boldsymbol{X}$ and $\boldsymbol Y$, respectively, although they can also be vectors. The matrix $\boldsymbol{X}$ is the desired output.
It contains the coefficients which assign importance to the used functions $f_i$.
The matrix $\boldsymbol Y$ contains the targets, or the known output, for the chosen functions $f_i$.
Comparing $\boldsymbol A$ and $\boldsymbol{\Theta}\,(\vec{\beta_{tr}})$ mathematically, no difference exists.\newline
$$ | |
\begin{equation} | |
\boldsymbol{\Theta}\,(\vec{\beta_{tr}}) \: \boldsymbol X = \boldsymbol Y | |
\label{eq_21} | |
\end{equation}$$ {#eq-eq_21} | |
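For illustration only, the analogy to $\boldsymbol A \, \vec{x} = \vec{y}$ can be reproduced with a plain least-squares solve; in \gls{cnmc} this role is taken by the *SR3* optimizer described below, and all array shapes here are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
Theta = rng.normal(size=(50, 8))  # n samples x m candidate functions
Y = rng.normal(size=(50, 3))      # known targets, e.g., one column per axis

# Least-squares solution of Theta @ X = Y, analogous to A x = y
X, *_ = np.linalg.lstsq(Theta, Y, rcond=None)
print(X.shape)  # (8, 3): one coefficient per function and target column
```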
Staying within the *pySindy* environment, the system of equations @eq-eq_21 is solved by means of the *SR3* optimizer, which is implemented in *pySindy*.
Details and some advantages of the *SR3* optimizer can be found in [@SR3]. Nevertheless, two main points shall be stated.
It is highly unlikely that $\boldsymbol{\Theta}\,(\vec{\beta_{tr}}),\: \boldsymbol X,\, \boldsymbol Y$ lead to a well-posed problem, i.e., one where the number of equations equals the number of unknowns and a unique solution exists.
In most cases the configuration will be ill-posed, i.e., the number of equations is not equal to the number of unknowns.
In the latter case, two scenarios are possible: the configuration could result in an over- or under-determined system of equations.\newline
For an over-determined system, there are more equations than unknowns.
Thus, generally, no outcome that satisfies equation @eq-eq_21 exists.
In order to find a representation that comes close to a solution, an error metric is defined as the objective function for optimization.
There are many error metrics or norms; however, some commonly used ones [@Brunton2019] are given in equations @eq-eq_22 to @eq-eq_24, where $f(x_k)$ are the true values of a function and $y_k$ are their corresponding predictions.
An under-determined system has more unknown variables than equations; thus, infinitely many solutions exist.
To find one prominent solution, optimization is performed again.
Note that for practical applications, penalization or regularization parameters are exploited as additional constraints within the definition of the optimization problem.
For more about over- and under-determined systems, as well as for deploying optimization to find a satisfying result, the reader is referred to [@Brunton2019].\newline
$$ | |
\begin{equation} | |
E_{\infty} = \max_{1<k<n} |f(x_k) -y_k | \quad \text{Maximum Error} \;(l_{\infty}) | |
\label{eq_22} | |
\end{equation}$$ {#eq-eq_22} | |
$$ | |
\begin{equation} | |
E_{1} = \frac{1}{n} \sum_{k=1}^{n} |f(x_k) -y_k | \quad \text{Mean Absolute Error} \;(l_{1}) | |
\label{eq_23} | |
\end{equation}$$ {#eq-eq_23} | |
$$ | |
\begin{equation} | |
E_{2} = \sqrt{\frac{1}{n} \sum_{k=1}^{n} |f(x_k) -y_k |^2 } \quad \text{Least-squares Error} \;(l_{2}) | |
\label{eq_24} | |
\end{equation}$$ {#eq-eq_24} | |
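The three error metrics of equations @eq-eq_22 to @eq-eq_24 translate directly into a few lines of *NumPy*, as sketched below.

```python
import numpy as np

def error_metrics(f_true, y_pred):
    """Maximum, mean absolute and least-squares errors (equations eq_22 to eq_24)."""
    diff = np.abs(np.asarray(f_true) - np.asarray(y_pred))
    e_inf = diff.max()               # maximum error (l_inf)
    e_1 = diff.mean()                # mean absolute error (l_1)
    e_2 = np.sqrt((diff**2).mean())  # least-squares error (l_2)
    return e_inf, e_1, e_2
```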
The aim of modeling the *CPE* is to obtain a regression model which is sparse, i.e., described through a small number of functions $f_i$.
For this to work, the coefficient matrix $\boldsymbol X$ must be sparse, i.e., most of its entries must be zero.
Consequently, most of the used functions $f_i$ would be inactive and only a few $f_i$ would be actively applied to capture the *CPE* behavior.
The $l_1$ norm as defined in equation @eq-eq_23 and the $l_0$ norm are metrics which promote sparsity.
In simpler terms, they are leveraged to find only a few important and active functions $f_i$.
The $l_2$ norm as defined in equation @eq-eq_24 is known for its opposite effect, i.e., assigning importance to a high number of $f_i$.
The *SR3* optimizer is a sparsity-promoting optimizer, which deploys $l_0$ and $l_1$ regularization.\newline
The second point which shall be mentioned about the *SR3* optimizer is that it can cope with over- and under-determined systems and solves them without any additional input.
One important note regarding the use of *pySindy* is that, in this thesis, *pySindy* is not used in the common way. For modeling the *CPE*, only the modules for generating the candidate library matrix $\boldsymbol{\Theta}\,(\vec{\beta_{tr}})$ and the *SR3* optimizer are utilized.\newline
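A minimal sketch of this standalone usage is given below, assuming a *pySindy* version in which `SR3` accepts the `threshold` and `thresholder` arguments; the toy data merely stand in for the tracked centroids.

```python
import numpy as np
import pysindy as ps

# Toy training data: parameter values and two target columns
beta_tr = np.linspace(0.5, 2.0, 40).reshape(-1, 1)
targets = np.hstack([np.cos(beta_tr), np.sin(2 * beta_tr)])

# Only the feature library and the optimizer are used, no full SINDy model
library = ps.PolynomialLibrary(degree=2)
Theta = library.fit_transform(beta_tr)

opt = ps.SR3(threshold=0.1, thresholder="l1")  # sparsity-promoting optimizer
opt.fit(Theta, targets)
X = opt.coef_.T  # coefficient matrix of Theta @ X = Y
```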
Going back to the data and workflow in figure @fig-fig_43, it has been explained how the candidate library matrix $\boldsymbol{\Theta}\,(\vec{\beta_{tr}})$ is generated, how it is passed to *pySindy* and how *SR3* is used to find a solution. It can be observed that $\boldsymbol{\Theta}\,(\vec{\beta_{tr}})$ is also passed to a *Linear* and an *Elastic net* block. The *Linear* block is used to solve the system of equations @eq-eq_21 through linear regression.
The *Elastic net* block solves the same system of equations with the elastic net approach, in which the optimization is penalized with both an $l_1$ and an $l_2$ norm.
In other words, it combines Lasso [@Lasso; @Brunton2019] and Ridge [@Brunton2019] regression, respectively.
The linear and elastic net solvers are invoked from the *Scikit-learn* [@scikit-learn] library.\newline
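The two *Scikit-learn* solvers could be invoked as in the following sketch; the hyperparameters are placeholders, not the values used in \gls{cnmc}.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, LinearRegression

rng = np.random.default_rng(1)
Theta = rng.normal(size=(40, 6))  # candidate library matrix
Y = rng.normal(size=(40, 3))      # targets

linear = LinearRegression().fit(Theta, Y)                    # plain least squares
elastic = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(Theta, Y)  # l1 + l2 penalty
```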
The next step is not depicted in figure @fig-fig_43.
Namely, the linear regression model is built with the full data, whereas for *pySindy* and the elastic net, the models are trained with $90 \%$ of the training data and the remaining $10 \%$ are used to test or validate the model.
For *pySindy*, $20$ different models with linearly distributed thresholds, starting from $0.1$ and ending at $2.0$, are generated.
The model with the least mean absolute error (equation @eq-eq_23) is selected as the *pySindy* model.
The mean absolute errors of the linear, the elastic net and the selected *pySindy* model are then compared against each other.
The regression model with the lowest mean absolute error is selected as the final model.\newline
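The selection logic can be sketched as follows, assuming the *SR3* interface from above; the 90/10 split and the 20 linearly spaced thresholds follow the description, while data and library are toy stand-ins.

```python
import numpy as np
import pysindy as ps
from sklearn.model_selection import train_test_split

beta = np.linspace(0.5, 2.0, 60).reshape(-1, 1)
y = np.cos(beta) ** 2  # toy stand-in for one centroid coordinate

library = ps.PolynomialLibrary(degree=2)
Theta = library.fit_transform(beta)
T_tr, T_te, y_tr, y_te = train_test_split(Theta, y, test_size=0.1, random_state=0)

best_model, best_mae = None, np.inf
for thr in np.linspace(0.1, 2.0, 20):  # 20 linearly distributed thresholds
    opt = ps.SR3(threshold=thr, thresholder="l1").fit(T_tr, y_tr)
    mae = np.abs(y_te - T_te @ opt.coef_.T).mean()  # equation (eq_23)
    if mae < best_mae:
        best_model, best_mae = opt, mae
```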
The described process is executed multiple times.
In 3 dimensions, the location of a centroid is given by the coordinates along the 3 axes.
Since the *CPE* across the 3 different axes can deviate significantly, capturing the entire behavior in one model would require a complex model.
A complex model, however, is not sparse anymore.
Thus, a regression model for each of the $K$ labels and for each of the 3 axes is required.
In total, $3 \, K$ regression models are generated. \newline
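Schematically, this amounts to a double loop over labels and axes; `fit_best_model` is a hypothetical helper wrapping the linear / elastic net / *pySindy* comparison described above.

```python
K = 10  # number of centroids, illustrative value
models = {}
for label in range(K):
    for axis in ("x", "y", "z"):
        # fit_best_model is hypothetical: it trains the three candidate
        # regressors and returns the one with the lowest mean absolute error
        models[(label, axis)] = fit_best_model(label, axis)
```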
Finally, *first CNMc* and \gls{cnmc} shall be compared.
First, in *first CNMc* only *pySindy* with a different built-in optimizer is used.
Second, the modeling of the *CPE* was specifically designed for the Lorenz system @eq-eq_6_Lorenz .
Third, *first CNMc* relies entirely on *pySindy*; no linear and elastic net models are calculated and used for comparison.
Fourth, *first CNMc* performed prediction by transforming the active $f_i$ with their coefficients into equations such that *SymPy* could be applied.
The disadvantage is that if $\boldsymbol{\Theta}\,(\vec{\beta_{tr}})$ is changed, modifications for *SymPy* are necessary.
Also, $\boldsymbol{\Theta}\,(\vec{\beta_{tr}})$ can be used with arbitrarily defined functions $f_i$; *SymPy* functions, however, are restricted to some predefined functions.
In \gls{cnmc} it is also possible to obtain the active $f_i$ as equations.
However, the prediction is obtained with a regular matrix-matrix multiplication as given in equation @eq-eq_25, where $\boldsymbol{\tilde{Y}}$ denotes the predicted outcome, $\boldsymbol{\Theta_s}$ the testing data for which the prediction is desired, and $\boldsymbol X$ the coefficient matrix from equation @eq-eq_21 .
$$ | |
\begin{equation} | |
\boldsymbol{\tilde{Y}} = \boldsymbol{\Theta_s} \, \boldsymbol X | |
\label{eq_25} | |
\end{equation}$$ {#eq-eq_25} | |
By leveraging equation @eq-eq_25, the limitations imposed through *SymPy* are removed.
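In code, the prediction of equation @eq-eq_25 reduces to a single matrix product, as sketched below.

```python
import numpy as np

def predict(Theta_s: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Evaluate equation (eq_25): Y_tilde = Theta_s @ X."""
    return Theta_s @ X
```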
{{< include 7_QT.qmd >}} | |