Neural extended Kalman filters for learning and predicting dynamics of structural systems
Table of contents
What problem does this paper solve?
Data-driven modeling for Digital Twins requires capturing the temporal evolution of structural states to enable future response prediction. Standard deep learning approaches, such as Deep Markov Models (DMMs) or dynamical Variational Autoencoders (VAEs), often fail to learn accurate transition dynamics because their training objective (Evidence Lower Bound) relies heavily on a separate inference network (encoder). This inference network can "cheat" by reconstructing observations from the immediate data without enforcing a stable, long-term trajectory in the generative transition model, rendering the resulting model poor for forecasting when observation data is unavailable. Additionally, pure black-box models lack the probabilistic rigor required for uncertainty quantification in noisy engineering environments.
What they did
The authors propose the Neural Extended Kalman Filter (Neural EKF), a learnable state-space model that replaces the explicit inference network of a VAE with the differentiable equations of an Extended Kalman Filter.
The framework assumes the system dynamics (transition) and measurement processes (observation) are governed by unknown nonlinear functions, which are parameterized by neural networks. Unlike a standard VAE, which learns an approximate posterior distribution via a separate encoder network, the Neural EKF calculates the posterior analytically (under Gaussian assumptions) using the standard EKF predict-update recursion.
The training process is end-to-end: gradients are backpropagated through the EKF steps to update the weights of the transition and observation networks, as well as the process and measurement noise covariance matrices ($Q$ and $R$). To ensure the learned transition model is robust for long-term simulation (not just one-step-ahead correction), they employ replay overshooting: a regularization technique that forces the transition network to predict multiple steps into the future recursively, without using observational updates, and adds this prediction error to the loss function.
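The replay-overshooting idea can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: `f` and `g` are toy stand-ins for the learned transition and observation networks, and `replay_overshoot_loss` simply rolls the transition model forward open-loop from selected filtered states and accumulates squared prediction error.

```python
import numpy as np

# Hypothetical stand-ins for the learned networks f_theta (transition)
# and g_theta (observation); in the paper these are neural networks.
def f(x):
    return 0.9 * x + 0.1 * np.tanh(x)   # toy nonlinear transition

def g(x):
    return x[..., :1]                    # toy observation: first state component

def replay_overshoot_loss(x_filt, y, starts, horizon):
    """Roll the transition model forward `horizon` steps from the filtered
    states at the given start indices, with NO measurement updates, and
    accumulate squared error between predicted and actual observations."""
    loss, count = 0.0, 0
    T = len(y)
    for t0 in starts:
        x = x_filt[t0]
        for h in range(1, horizon + 1):
            if t0 + h >= T:
                break
            x = f(x)                                  # open-loop prediction
            loss += np.sum((g(x) - y[t0 + h]) ** 2)   # compare to data
            count += 1
    return loss / max(count, 1)

# Usage: filtered means and observations from some sequence
T, dx = 20, 2
rng = np.random.default_rng(0)
x_filt = rng.normal(size=(T, dx))
y = rng.normal(size=(T, 1))
L_overshoot = replay_overshoot_loss(x_filt, y, starts=[0, 5, 10], horizon=5)
```

In training, this term is added (with a weight) to the filtering objective, so the gradient pushes the transition network toward trajectories that stay accurate without per-step correction.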
Key equations
$$x_{t+1} = f_\theta(x_t, u_t) + w_t, \quad w_t \sim \mathcal{N}(0, Q); \qquad y_t = g_\theta(x_t) + v_t, \quad v_t \sim \mathcal{N}(0, R)$$
This defines the learnable state-space model, where the transition $f_\theta$ and observation $g_\theta$ are neural networks parameterized by $\theta$, replacing the fixed physical matrices of a standard Kalman filter.
$$\mathcal{L}_{\text{ELBO}} = \mathbb{E}_{q(x_{1:T})}\!\left[\sum_{t=1}^{T} \log p_\theta(y_t \mid x_t)\right] - \mathrm{KL}\!\left(q(x_{1:T} \mid y_{1:T}) \,\|\, p_\theta(x_{1:T})\right)$$
This is the variational objective (ELBO). Crucially, the posterior $q$ is not produced by a neural-network encoder; it is the Gaussian distribution computed directly by the EKF update step.
$$\mathcal{L} = \mathcal{L}_{\text{ELBO}} - \lambda\,\mathcal{L}_{\text{overshoot}}$$
The total objective combines the filtering ELBO with a weighted "overshooting" penalty, which punishes errors in open-loop predictions (running the transition model freely) to ensure the learned dynamics are stable for forecasting.
Algorithm Pipeline:
- Initialization: Define NN architectures for the transition network $f_\theta$ and the observation network $g_\theta$. Initialize learnable noise covariances $Q$ (diagonal) and $R$ (diagonal).
- Forward Pass (Filter): For a sequence of data, run the EKF recursion.
- Predict: Use $f_\theta$ to estimate the prior mean and covariance. Compute the Jacobian of $f_\theta$ via autograd.
- Update: Compute the Kalman gain using $g_\theta$ and its Jacobian. Update the posterior mean and covariance.
- Smoothing (Optional): Run Rauch–Tung–Striebel (RTS) smoothing backward to refine estimates (used in training).
- Overshooting: Randomly select starting points in the batch and simulate forward for a fixed horizon without correcting with observations $y_t$. Compute the loss against actual data.
- Optimization: Maximize the modified ELBO using stochastic gradient ascent (e.g., Adam).
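The predict-update recursion at the core of the pipeline can be sketched as follows. This is a minimal NumPy version under simplifying assumptions: Jacobians are taken by finite differences rather than autograd (so the step is not differentiable as written), and `f`/`g` are passed in as plain functions standing in for the learned networks.

```python
import numpy as np

def jacobian_fd(fn, x, eps=1e-6):
    """Finite-difference Jacobian of fn at x (the paper uses autograd)."""
    fx = fn(x)
    J = np.zeros((fx.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (fn(x + dx) - fx) / eps
    return J

def ekf_step(m, P, y, f, g, Q, R):
    """One EKF predict+update with nonlinear transition f and observation g."""
    # Predict: propagate mean through f, covariance through its linearization
    F = jacobian_fd(f, m)
    m_pred = f(m)
    P_pred = F @ P @ F.T + Q
    # Update: correct the prediction with the measurement y
    G = jacobian_fd(g, m_pred)
    S = G @ P_pred @ G.T + R                 # innovation covariance
    K = P_pred @ G.T @ np.linalg.inv(S)      # Kalman gain
    m_post = m_pred + K @ (y - g(m_pred))    # posterior mean
    P_post = (np.eye(len(m)) - K @ G) @ P_pred
    return m_post, P_post
```

With linear `f` and `g` this reduces to the standard Kalman filter; in the Neural EKF the same recursion is run with neural networks in place of `f` and `g`, and gradients flow through every matrix operation above.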
Experiments and results
The authors validated the method on one numerical benchmark and two real-world structural datasets, comparing performance primarily against Deep Markov Models (DMM).
- Duffing Oscillator (Simulated): A 2-DOF nonlinear system under random excitation.
- The Neural EKF correctly reconstructed the phase portraits (latent dynamics) and predicted response with high accuracy.
- The baseline DMM failed to capture the phase portrait topology, learning a distorted latent space that was poor for prediction.
- Robustness: The method maintained low RMSE even when training data was corrupted with significant noise (up to 0.1 variance).
- Seismic Building Response (Real-world): 6-story hotel in San Bernardino (CSMIP data).
- Input: Ground motion; Output: Sensor responses on floors 1, 3, and roof.
- Result: Trained on historical earthquakes, the model accurately predicted the response to the unseen 2009 San Bernardino earthquake.
- Wind Turbine Blade (Experimental): Small-scale blade tested in a climate chamber with healthy and damaged states (added mass, cracks).
- Prediction: Achieved an average RMSE of 1.08 on healthy test data (vs. 3.82 for DMM).
- Anomaly Detection: The model trained on healthy data was used to predict responses for damaged states. The resulting prediction errors (RMSE) clustered distinctly according to damage type and severity (e.g., distinguishing between mass changes and cracks), demonstrating utility for unsupervised SHM.
| Dataset | Metric | Neural EKF | DMM (Baseline) |
|---|---|---|---|
| Duffing (Noise=0.001) | RMSE | 0.049 | significantly higher (visual comparison) |
| Wind Blade (Healthy) | RMSE (avg) | 1.08 | 3.82 |
| Wind Blade (Cracked) | RMSE clustering | Distinct clusters | N/A |
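The anomaly-detection use of prediction error can be sketched as follows. This is a generic illustration, not the paper's procedure: it scores each test sequence by prediction RMSE and flags sequences that exceed a simple threshold derived from healthy-data scores (the paper instead inspects how the errors cluster by damage type).

```python
import numpy as np

def rmse(pred, actual):
    """Root-mean-square error between predicted and measured responses."""
    return float(np.sqrt(np.mean((pred - actual) ** 2)))

def anomaly_scores(predictions, measurements):
    """Per-sequence prediction RMSE; sequences from damaged states are
    expected to score higher than the healthy training distribution."""
    return [rmse(p, m) for p, m in zip(predictions, measurements)]

def flag_anomalies(scores, healthy_scores, k=3.0):
    """Flag sequences whose RMSE exceeds mean + k*std of healthy scores.
    A hypothetical threshold rule for illustration only."""
    mu, sd = np.mean(healthy_scores), np.std(healthy_scores)
    return [s > mu + k * sd for s in scores]
```

Because the model is trained only on healthy data, no damaged-state labels are needed at training time, which is what makes the scheme usable for unsupervised SHM.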
What this paper does not solve
- Explicit Physics Extraction: While the model learns a latent state space that produces correct outputs, the latent states themselves are not guaranteed to align with physical coordinates (displacement/velocity) without additional constraints or rotation. They are "black-box" states.
- Handling Strong Discontinuities: The inference relies on EKF linearization (Jacobians). Systems with non-smooth dynamics (impacts, friction) might degrade performance or require switching to Unscented or Particle Filter variants.
- Online Adaptation: The paper focuses on offline training. While the EKF runs online, the neural-network weights $\theta$ are fixed after training. The paper notes this as a trade-off: the model must be retrained offline if the system sustains damage.
- State Dimension Selection: The dimension of the latent state vector is a hyperparameter (chosen here according to the system's DOFs). No automated mechanism is proposed to determine the optimal model order for unknown systems.
Significance for PEML and digital twinning
This paper provides a robust alternative to standard recurrent neural networks for structural dynamics. By embedding the Extended Kalman Filter as a differentiable layer, it imposes a strong inductive bias that suits physical systems: the separation of process noise (system uncertainty) and measurement noise (sensor error).
- Contribution: It demonstrates that replacing the "encoder" of a VAE with an EKF significantly improves the long-term predictive capability of the model, making it viable for Digital Twin forecasting rather than just data compression.
- Open Questions: The natural next step is to incorporate partial physical knowledge (e.g., known mass or stiffness matrices) into the neural transition function, moving from "Physics-Enhanced" via architecture (EKF) to "Physics-Informed" via constraints (Hamiltonian/Lagrangian mechanics).
References
(Liu et al., 2024) Liu, W., Lai, Z., Bacsa, K., & Chatzi, E. (2024). Neural extended Kalman filters for learning and predicting dynamics of structural systems. Structural Health Monitoring, 23(2), 1037–1052.