# Methods
gsax implements two complementary approaches to variance-based global sensitivity analysis (GSA). Both methods decompose the variance of a model's output into contributions attributable to individual input parameters and their interactions, enabling practitioners to identify which parameters drive model behaviour and which are effectively unidentifiable from available measurements.
## Background: Variance-Based Sensitivity Analysis

### Why Global Sensitivity Analysis?
Unlike local sensitivity methods (e.g. partial derivatives at a nominal point), global sensitivity analysis explores the entire parameter space simultaneously. This is essential for non-linear models where parameter interactions and non-monotonic responses mean that local gradients can be misleading. GSA quantifies the contribution of each parameter to output uncertainty and reveals which parameters have the strongest influence on observable outputs. Parameters with negligible sensitivity indices are effectively unidentifiable from available measurements, while high sensitivity indices indicate parameters that are likely to be well-constrained by data.
In practice, GSA serves multiple roles:
- Parameter identifiability: Ranking parameters by their sensitivity indices before fitting reveals which parameters can realistically be estimated. Parameters with near-zero indices across all outputs are practically unidentifiable and may need to be fixed rather than estimated.
- Experimental design: For time-series outputs, the evolution of sensitivity indices over time can guide the selection of optimal measurement times, ensuring data is collected when outputs are most sensitive to the parameters of interest.
- Model simplification: If interaction indices are negligible, the model response is approximately additive, and simpler surrogate models may suffice.
### The Hoeffding–Sobol' Decomposition
The theoretical foundation of variance-based GSA is the Hoeffding (ANOVA) decomposition. Any square-integrable function $f(x_1, \dots, x_D)$ of independent inputs on the unit hypercube admits a unique expansion into terms of increasing order:

$$
f(x) = f_0 + \sum_{i=1}^{D} f_i(x_i) + \sum_{1 \le i < j \le D} f_{ij}(x_i, x_j) + \dots + f_{1 \dots D}(x_1, \dots, x_D),
$$

where $f_0 = \mathbb{E}[f]$ and every component function has zero mean and is orthogonal to all the others. Squaring and integrating then decomposes the total output variance:

$$
V = \sum_{i} V_i + \sum_{i < j} V_{ij} + \dots,
$$

where $V_i = \operatorname{Var}\big(\mathbb{E}[f \mid x_i]\big)$, $V_{ij} = \operatorname{Var}\big(\mathbb{E}[f \mid x_i, x_j]\big) - V_i - V_j$, and so on for higher orders.
### Sobol' Sensitivity Indices
Dividing each variance component by the total variance $V$ yields the Sobol' sensitivity indices:

- First-order index $S_i = V_i / V$: the fraction of output variance explained by parameter $x_i$ alone.
- Second-order index $S_{ij} = V_{ij} / V$: the additional variance from the pairwise interaction between $x_i$ and $x_j$.
- Total-order index $S_{T_i} = 1 - V_{\sim i} / V$: the total contribution of parameter $x_i$, including all interactions it participates in,

where $V_{\sim i}$ is the variance of the conditional expectation given all inputs except $x_i$. A large gap $S_{T_i} - S_i$ indicates that $x_i$ acts mainly through interactions.
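These definitions can be checked numerically on a toy model. The sketch below is a standalone illustration (not part of gsax): it approximates the conditional expectations on a midpoint grid for $f(x_1, x_2) = x_1 x_2$ with independent $U(0, 1)$ inputs, for which $S_1 = S_2 = 3/7$ and $S_{12} = 1/7$ in closed form.

```python
import numpy as np

# Midpoint grid on [0, 1]^2 approximates the integrals in the definitions.
n = 512
x = (np.arange(n) + 0.5) / n
X1, X2 = np.meshgrid(x, x, indexing="ij")
F = X1 * X2                      # toy model f(x1, x2) = x1 * x2

V = F.var()                      # total output variance
V1 = F.mean(axis=1).var()        # Var(E[f | x1]), averaging over x2
V2 = F.mean(axis=0).var()        # Var(E[f | x2]), averaging over x1

S1, S2 = V1 / V, V2 / V          # first-order indices
S12 = 1 - S1 - S2                # remaining variance: pairwise interaction
ST1 = S1 + S12                   # total-order index for x1

print(S1, S12, ST1)              # ≈ 3/7, 1/7, 4/7
```

Note that for this model the first-order indices leave a 1/7 share unexplained, which is exactly the interaction term: the kind of structure a purely local analysis would miss.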
## Sobol' Indices via Saltelli Sampling
gsax uses the Saltelli sampling scheme (Saltelli 2002, 2010), which constructs a structured set of quasi-random sample matrices so that first-order ($S_i$), second-order ($S_{ij}$), and total-order ($S_{T_i}$) indices can all be estimated from a single set of model evaluations.
### The Pick-Freeze Sampling Scheme
The method generates two independent $N \times D$ base matrices $A$ and $B$ from a Sobol' quasi-random sequence (`scipy.stats.qmc.Sobol`). For each parameter $i$, a cross-matrix $A_B^{(i)}$ is formed by copying $A$ and replacing its $i$-th column with the $i$-th column of $B$. Comparing model outputs on $A$, $B$, and the cross-matrices isolates the effect of varying one parameter while all others are resampled.

The total computational cost is $N(D + 2)$ model evaluations, where $N$ is the number of base samples and $D$ is the number of parameters.
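The matrix layout can be sketched in a few lines of NumPy. This is an illustration of the scheme, not gsax's implementation; plain pseudo-random numbers stand in for the Sobol' sequence gsax actually uses.

```python
import numpy as np

N, D = 8, 3
rng = np.random.default_rng(0)

# Two independent N x D base matrices (gsax draws these from a
# Sobol' quasi-random sequence; plain uniforms suffice to show the layout).
A = rng.random((N, D))
B = rng.random((N, D))

# Cross-matrix A_B^(i): copy A, then replace column i with B's column i.
AB = []
for i in range(D):
    M = A.copy()
    M[:, i] = B[:, i]
    AB.append(M)

# Full pick-freeze design: N * (D + 2) input rows in total.
design = np.vstack([A, B] + AB)
print(design.shape)  # (40, 3), i.e. N * (D + 2) rows for N = 8, D = 3
```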
### Estimators
gsax implements the following estimators, each normalised by a variance estimate $\hat{V}$:

First-order, Saltelli (2010):

$$
\hat{S}_i = \frac{1}{N \hat{V}} \sum_{n=1}^{N} f(B)_n \left[ f\big(A_B^{(i)}\big)_n - f(A)_n \right]
$$

Total-order, Jansen (1999):

$$
\hat{S}_{T_i} = \frac{1}{2 N \hat{V}} \sum_{n=1}^{N} \left[ f(A)_n - f\big(A_B^{(i)}\big)_n \right]^2
$$
Variance normalisation: All estimators normalise by a pooled output variance $\hat{V}$ computed over the concatenation of the $f(A)$, $f(B)$, and $f(A_B^{(i)})$ evaluations.
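The estimators can be exercised end-to-end on a model with known indices. The sketch below is standalone NumPy, not the gsax implementation: it uses $f(x) = 4x_1 + 2x_2 + x_3$ with independent $U(0,1)$ inputs, for which $S_i = S_{T_i} = a_i^2 / 21$ with $a = (4, 2, 1)$, i.e. roughly $(0.76, 0.19, 0.05)$.

```python
import numpy as np

a = np.array([4.0, 2.0, 1.0])
f = lambda X: X @ a                      # additive test model

N, D = 1 << 15, 3
rng = np.random.default_rng(1)
A, B = rng.random((N, D)), rng.random((N, D))

fA, fB = f(A), f(B)
fAB = np.empty((D, N))
for i in range(D):
    M = A.copy()
    M[:, i] = B[:, i]                    # pick-freeze cross-matrix
    fAB[i] = f(M)

# Pooled variance over all evaluations, as in the normalisation above.
V = np.concatenate([fA, fB, fAB.ravel()]).var()

# Saltelli (2010) first-order and Jansen (1999) total-order estimators.
S1 = np.array([np.mean(fB * (fAB[i] - fA)) for i in range(D)]) / V
ST = np.array([0.5 * np.mean((fA - fAB[i]) ** 2) for i in range(D)]) / V

print(np.round(S1, 2), np.round(ST, 2))
```

Because the model is purely additive, the first-order and total-order estimates should agree to within Monte Carlo error.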
### How to use it

- `gsax.sample()` generates the Sobol' quasi-random sequence and builds the Saltelli cross-matrices. Duplicate rows are removed so your model only evaluates unique input points.
- You evaluate your model on `sampling_result.samples`.
- `gsax.analyze()` reconstructs the Saltelli layout internally and computes all indices in a single `jit(vmap(...))` pass.
Optional: `gsax.analyze(..., prenormalize=True)` applies SALib-style output standardisation once per output slice over the sample axis before computing the Sobol' estimators. This changes the point-estimate path to align more closely with SALib. When bootstrapping, `ci_method="quantile"` reports percentile bootstrap lower/upper bounds, while `ci_method="gaussian"` reports symmetric Gaussian lower/upper bounds derived from the bootstrap standard deviation. gsax still returns endpoint arrays rather than SALib's symmetric confidence widths.
### Index summary

| Index | Meaning |
|---|---|
| $S_i$ | Fraction of output variance due to parameter $x_i$ alone |
| $S_{T_i}$ | Fraction of output variance due to parameter $x_i$, including all interactions involving it |
| $S_{ij}$ | Fraction of output variance due to the pairwise interaction between $x_i$ and $x_j$ |
When to use Sobol': you can afford the structured Saltelli sampling design ($N(D+2)$ model evaluations) and your inputs are independent.
## RS-HDMR (Random Sampling High-Dimensional Model Representation)
RS-HDMR takes a fundamentally different approach: instead of requiring a structured sampling design, it constructs a B-spline surrogate from any set of input–output pairs and then derives sensitivity indices analytically from the surrogate's variance decomposition.
### Theoretical Background
High-Dimensional Model Representation (HDMR) exploits the observation that, for many practical problems, only the low-order interactions among input variables significantly influence the model output. The RS-HDMR variant constructs component functions from randomly sampled input–output data, rather than requiring structured grids. The model is decomposed as:

$$
f(x) \approx f_0 + \sum_{i=1}^{D} f_i(x_i) + \sum_{1 \le i < j \le D} f_{ij}(x_i, x_j),
$$

truncated at low order, where each component function is expanded in a B-spline basis and fitted via backfitting with Tikhonov regularisation.
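The fitting idea can be shown in a stripped-down sketch: first-order components only, a simple piecewise-linear "hat" basis in place of gsax's B-splines, and hypothetical variable names throughout. It is an illustration of backfitting with Tikhonov regularisation, not gsax's internals.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((500, 2))                         # random samples, inputs in [0, 1]
y = np.sin(2 * np.pi * X[:, 0]) + X[:, 1] ** 2   # toy model to fit

def basis(x, k=10):
    """Piecewise-linear 'hat' basis on [0, 1] (stand-in for B-splines)."""
    knots = np.linspace(0.0, 1.0, k)
    h = knots[1] - knots[0]
    return np.maximum(0.0, 1.0 - np.abs(x[:, None] - knots) / h)

Bmats = [basis(X[:, i]) for i in range(2)]
coefs = [np.zeros(10), np.zeros(10)]
f0, lam = y.mean(), 1e-4                          # mean term, Tikhonov strength

# Backfitting: cyclically refit each component against the residual of all
# the others, with a Tikhonov (ridge) penalty for numerical stability.
for _ in range(25):
    for i in range(2):
        resid = y - f0 - sum(Bmats[j] @ coefs[j] for j in range(2) if j != i)
        Bi = Bmats[i]
        coefs[i] = np.linalg.solve(Bi.T @ Bi + lam * np.eye(Bi.shape[1]),
                                   Bi.T @ resid)

pred = f0 + sum(Bmats[i] @ coefs[i] for i in range(2))
r2 = 1.0 - np.mean((y - pred) ** 2) / y.var()
print(round(r2, 3))   # close to 1 for this smooth target
```

Once the components are fitted, their variances (and covariances, for correlated inputs) are all that is needed to read off sensitivity indices: that is the analytic step the next section describes.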
### ANCOVA Decomposition

Unlike the classical Sobol' decomposition, which assumes independent inputs, RS-HDMR uses an ANCOVA (analysis of covariance) decomposition that separates each component's variance into:

- Structural variance ($S_a$): the contribution that would remain if all inputs were independent, analogous to the classical Sobol' index.
- Correlative variance ($S_b$): the additional contribution arising from correlations between inputs.
This distinction is important in practice because many real-world models have correlated inputs (e.g. coupled physical parameters), and conflating structural and correlative contributions can lead to misleading sensitivity rankings.
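The split can be made concrete on a linear model with correlated inputs. For $f(x) = x_1 + x_2$ with unit-variance inputs of correlation $\rho = 0.5$, the total variance is $V = 2 + 2\rho = 3$, the component $f_1 = x_1$ has structural share $\operatorname{Var}(f_1)/V = 1/3$, and its correlative share $\operatorname{Cov}(f_1, f_2)/V = 1/6$. The check below is standalone NumPy, not the gsax API.

```python
import numpy as np

rho, N = 0.5, 200_000
rng = np.random.default_rng(2)

# Correlated standard-normal inputs via a Cholesky factor.
L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
x1, x2 = (rng.standard_normal((N, 2)) @ L.T).T

f = x1 + x2                      # model; exact components are f1 = x1, f2 = x2
V = f.var()

Sa1 = x1.var() / V               # structural: variance of the component itself
Sb1 = np.cov(x1, x2)[0, 1] / V   # correlative: covariance with the other component
print(round(Sa1, 3), round(Sb1, 3))  # ≈ 0.333 and 0.167
```

A Sobol' analysis that (incorrectly) treated these inputs as independent would fold the correlative share into the structural one, which is exactly the conflation the ANCOVA decomposition avoids.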
### How to use it

- You provide any set of $(X, Y)$ input–output pairs; no structured sampling design is required.
- `gsax.analyze_hdmr()` normalises inputs to $[0, 1]$, optionally standardises outputs once over the sample axis via `prenormalize=True`, builds B-spline basis matrices, and fits component functions via backfitting with Tikhonov regularisation.
- The ANCOVA decomposition splits each component's variance into structural ($S_a$) and correlative ($S_b$) parts. Total-order indices ($S_T$) sum contributions from all terms involving a given parameter.
When HDMR prenormalisation is enabled, the fitted surrogate is trained on the standardised outputs, but `emulate_hdmr()` maps predictions back to the original output scale before returning them.
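The mapping back is plain standardisation in reverse. A minimal sketch of the idea (hypothetical variable names, not gsax internals):

```python
import numpy as np

y = np.array([2.0, 4.0, 6.0, 8.0])

# Standardise once over the sample axis (what prenormalize=True does
# per output slice); the surrogate is then fitted on z, not y.
mu, sigma = y.mean(), y.std()
z = (y - mu) / sigma

# Any prediction on the standardised scale is mapped back before being returned.
z_pred = z.copy()                # pretend the surrogate reproduced z exactly
y_pred = mu + sigma * z_pred
print(y_pred)                    # recovers the original scale: [2. 4. 6. 8.]
```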
### Index summary

| Index | Meaning |
|---|---|
| $S_a$ | Structural (uncorrelated) variance contribution of a term |
| $S_b$ | Correlative variance contribution of a term |
| $S$ | Total contribution per term: $S = S_a + S_b$ |
| $S_T$ | Total-order per parameter: sum of $S$ over all terms involving that parameter |
When to use HDMR:
- Model evaluations are expensive and you want to reuse existing runs (no structured design needed)
- Inputs may be correlated (Sobol' assumes independent inputs)
- You need a surrogate/emulator for fast prediction at new inputs (`emulate_hdmr()`)
## Choosing Between Them

| Consideration | Sobol' | HDMR |
|---|---|---|
| Sampling requirement | Structured Saltelli design, $N(D+2)$ evaluations | Any set of $(X, Y)$ pairs |
| Input independence | Assumed | Handled via ANCOVA decomposition |
| Surrogate/emulator | No | Yes (`emulate_hdmr()`) |
| Accuracy | Unbiased Monte Carlo estimates, converging with sample size | Depends on B-spline fit quality |
| Second-order indices | Direct estimation from cross-matrices | From interaction component functions |
| Interaction detection | Via $S_{ij}$ and the gap $S_{T_i} - S_i$ | Via explicit interaction component functions |
## Output Shapes

Both methods support scalar, multi-output, and time-series outputs. The shape of `Y` determines the shape of all returned index arrays:

| `Y` shape | `S1` / `ST` shape | `S2` shape |
|---|---|---|
| `(N,)` | `(D,)` | `(D, D)` |
| `(N, K)` | `(K, D)` | `(K, D, D)` |
| `(N, T, K)` | `(T, K, D)` | `(T, K, D, D)` |

`D` is always the last axis. Confidence interval arrays (when using bootstrap) prepend a leading dimension of 2 for `[lower, upper]`.
Time-series outputs are particularly useful for dynamic models, where the evolution of sensitivity indices over time can reveal which parameters dominate at different stages of a process — for example, a parameter that is highly influential early in a batch but negligible later.
## Data Cleaning

`gsax.analyze()` automatically drops sample groups that contain non-finite values (`NaN`, `Inf`). The Saltelli layout requires groups of rows to stay together, so if any row in a group is non-finite, the entire group is removed. A message is printed when this happens. The `nan_counts` field on the result reports how many `NaN` values remain in the computed indices.
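The group-wise dropping can be sketched as follows. This is standalone NumPy assuming a group-major row ordering; gsax's internal layout handling may differ.

```python
import numpy as np

N, D = 4, 2                       # 4 sample groups, D + 2 = 4 rows per group
Y = np.arange(N * (D + 2), dtype=float)
Y[5] = np.nan                     # one bad model evaluation

# Rows belonging to one base sample must stay together, so reshape into
# groups of D + 2 rows and drop any group containing a non-finite value.
groups = Y.reshape(N, D + 2)      # assumes group-major row ordering
keep = np.isfinite(groups).all(axis=1)
clean = groups[keep]

print(clean.shape)                # (3, 4): one whole group removed
```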
## References
- Sobol', I.M. (2001). Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Mathematics and Computers in Simulation, 55(1-3), 271-280.
- Saltelli, A. (2002). Making best use of model evaluations to compute sensitivity indices. Computer Physics Communications, 145(2), 280-297.
- Saltelli, A., Annoni, P., Azzini, I., Campolongo, F., Ratto, M., & Tarantola, S. (2010). Variance based sensitivity analysis of model output. Computer Physics Communications, 181(2), 259-270.
- Jansen, M.J.W. (1999). Analysis of variance designs for model output. Computer Physics Communications, 117(1-2), 35-43.
- Li, G., Rabitz, H., Yelvington, P.E., Oluwole, O.O., Bacon, F., Kolb, C.E., & Schoendorf, J. (2010). Global sensitivity analysis for systems with independent and/or correlated inputs. Journal of Physical Chemistry A, 114(19), 6022-6032.
- Rabitz, H. & Alis, O. (1999). General foundations of high-dimensional model representations. Journal of Mathematical Chemistry, 25(2-3), 197-233.