Simulation-based inference in gravitational wave data analysis
2026-04-24
Strain data for the GW250114 event (Abac et al. 2025).
\[ d_i = n(i \Delta t) + h(i \Delta t; \theta, M) \]
where the sampling rate \(1/\Delta t\) is typically a few kHz.
The contributions are the detector noise \(n\) and the gravitational-wave signal \(h\), which depends on the source parameters \(\theta\) and the waveform model \(M\).
Typical assumptions about the noise: it is Gaussian and stationary, characterized by its power spectral density \(S_n(f)\). Real detector noise violates these assumptions, for example through glitches.
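A minimal sketch of this data model (the chirp is a toy placeholder, not a physical waveform, and white noise stands in for coloured detector noise with PSD \(S_n(f)\)):

```python
import numpy as np

f_samp = 4096.0              # Hz: 1/dt of a few kHz, as above (assumed value)
dt = 1.0 / f_samp
t = np.arange(0.0, 4.0, dt)  # 4 s of data

def h(t, f0=50.0, fdot=20.0, amp=1e-21):
    """Toy chirp standing in for the signal h(t; theta, M)."""
    return amp * np.sin(2.0 * np.pi * (f0 + fdot * t) * t)

rng = np.random.default_rng(0)
n = 1e-21 * rng.standard_normal(t.size)  # white noise; real noise is coloured

d = n + h(t)  # d_i = n(i dt) + h(i dt; theta, M)
```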
Examples of glitches in gravitational wave data (Powell 2018).
In full generality:
\[ \mathcal{L}(d | \theta, M) = p(\text{noise realization } n = d - h(\theta, M) | \theta, M) \]
With Gaussian stationary noise:
\[ \mathcal{L}(d | \theta, M) \propto \exp\left( - \frac{1}{2} \int_{f_\text{min}}^{f_\text{max}} \frac{|\tilde{d}(f) - \tilde{h}(f; \theta, M)|^2}{S_n(f)} \text{d} f \right) \]
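A sketch of evaluating this log-likelihood on sampled data, assuming `S_n` holds the noise power spectral density on the `rfft` frequency grid (factor and normalization conventions vary between analyses; this follows the formula above literally):

```python
import numpy as np

def log_likelihood(d, h, S_n, dt, f_min=20.0, f_max=1024.0):
    """Gaussian stationary log-likelihood, up to a theta-independent constant."""
    freqs = np.fft.rfftfreq(len(d), dt)
    df = freqs[1] - freqs[0]
    d_f = np.fft.rfft(d) * dt  # discrete approximation of the Fourier transform
    h_f = np.fft.rfft(h) * dt
    band = (freqs >= f_min) & (freqs <= f_max)
    return -0.5 * np.sum(np.abs(d_f - h_f)[band] ** 2 / S_n[band]) * df
```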
\[ \underbrace{\mathcal{L}(d | \theta, M) \pi (\theta | M)}_{\text{assumptions}} = \underbrace{p(\theta | d, M) \mathcal{Z}(d | M)}_{\text{result}} \]
Figure 1: Posterior distribution on the component masses of GW250114. Adapted from Abac et al. (2025).
The signal parameters \(\theta\) include the component masses, the spins, the sky position, the luminosity distance, the orientation of the binary, and the time and phase of coalescence.
Depending on the model, there can be 10–17 parameters.
In standard inference, we evaluate the likelihood \(\mathcal{L}(d | \theta, M)\).
In simulation-based inference, we simulate data distributed according to the likelihood:
\[ d \sim \mathcal{L} ( d | \theta, M) \]
More details: Liang & Wang (2025).
Take a density estimator \(q_\phi (\theta | d)\) with parameters \(\phi\). Train it with a loss function of the form (Dax et al. 2021, Papamakarios & Murray 2016)
\[ L(\phi) = \mathbb{E}_{\theta \sim \pi(\theta)} \mathbb{E}_{d \sim p(d|\theta)} [-\log q_\phi(\theta|d)]\,. \]
If training is successful, the estimator \(q_{\bar{\phi}}(\theta | d)\) at \(\bar{\phi} = \text{argmin}_\phi L(\phi)\) will approximate \(p(\theta | d)\).
Consider the KL divergence from posterior \(p\) to the estimator \(q\):
\[ D_{\text{KL}}( p || q_\phi) = \int p(\theta | d) \left[ \log p(\theta | d) - \log q_\phi (\theta | d) \right] \text{d}\theta \]
As a loss, we use the average over all realizations of the data \(d\):
\[ \begin{aligned} L(\phi) &= \mathbb{E}_{d \sim \mathcal{Z}(d)} [D_{\text{KL}}( p || q_\phi)] \\ &= \int p(\theta | d) \mathcal{Z}(d) \left[ - \log q_\phi (\theta | d) \right] \text{d}\theta \text{d}d + C \end{aligned} \]
By Bayes’ theorem, \(p(\theta | d) \mathcal{Z}(d) = \mathcal{L}(d | \theta ) \pi(\theta)\), so this is equal to
\[ \begin{aligned} L(\phi) &= \int \pi(\theta) \mathcal{L}(d|\theta) \left[ - \log q_\phi (\theta | d) \right] \text{d}d \text{d}\theta + C \\ &= \mathbb{E}_{\theta \sim \pi(\theta)} \mathbb{E}_{d \sim \mathcal{L}(d|\theta)} \left[ - \log q_\phi (\theta | d) \right] + C \end{aligned} \]
We can compute the expectation value by Monte Carlo: draw \(\theta_i \sim \pi(\theta)\), simulate \(d_i \sim \mathcal{L}(d | \theta_i)\), and average \(-\log q_\phi(\theta_i | d_i)\) over the batch, as in the sketch below.
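In PyTorch this is a few lines (assuming `prior.sample()` exists, `simulate` draws data given parameters, and `log_q` evaluates \(\log q_\phi(\theta | d)\)):

```python
import torch

def mc_loss(log_q, prior, simulate, batch_size=256):
    """Monte Carlo estimate of E_theta E_d [-log q_phi(theta | d)]."""
    theta = prior.sample((batch_size,))  # theta_i ~ pi(theta)
    d = simulate(theta)                  # d_i ~ L(d | theta_i)
    return -log_q(theta, d).mean()       # batch average of -log q_phi
```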
A normalizing flow is an invertible, differentiable map \(T_\phi\) from a simple (e.g. Gaussian) probability distribution \(g(u)\) to an arbitrary one, \(q (\theta)\), so that
\[ u \sim g(u) \iff \theta = T_\phi(u) \sim q(\theta) \]
It is often constructed by composing several simple parametric transformations:
\[ T_\phi = T_1 \circ \dots \circ T_N\,. \]
To obtain samples from \(q\), draw \(u \sim g\) and apply \(\theta = T_\phi(u)\); to evaluate the density, use the change of variables formula:
\[ q(\theta) = g (T_\phi^{-1}(\theta)) |\det J_{T_\phi^{-1}}(\theta)| \]
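As an illustration with a single affine transformation \(T_\phi(u) = \mu + \sigma u\) (real flows compose many richer, learnable transforms such as coupling layers):

```python
import torch

mu, log_sigma = torch.tensor(1.0), torch.tensor(0.5)  # phi = (mu, log sigma)
g = torch.distributions.Normal(0.0, 1.0)              # base distribution g(u)

# Sampling: draw u ~ g, push it through T_phi
u = g.sample((5,))
theta = mu + log_sigma.exp() * u

# Density: q(theta) = g(T^{-1}(theta)) |det J_{T^{-1}}(theta)|
u_back = (theta - mu) / log_sigma.exp()   # T_phi^{-1}(theta)
log_q = g.log_prob(u_back) - log_sigma    # log |det J_{T^{-1}}| = -log sigma
```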
The example is borrowed from the LAMPE documentation.
Suppose we have a model with three parameters: \(\theta \sim \pi(\theta) = \mathcal{U}([-1, 1]^3)\).
Our data is one observation of
\[ x = \begin{pmatrix} \theta_1 + \theta_2 \theta_3 \\ \theta_1 \theta_2 + \theta_3 \end{pmatrix} + \varepsilon \]
where \(\varepsilon \sim \mathcal{N}(0, 0.05^2 I_2)\), i.e. independent Gaussian noise with \(\sigma = 0.05\) on each component.
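A sketch of the prior and simulator in PyTorch (the names `prior` and `simulator` are reused in the training code below):

```python
import torch

# pi(theta) = U([-1, 1]^3)
prior = torch.distributions.Independent(
    torch.distributions.Uniform(-torch.ones(3), torch.ones(3)), 1
)

def simulator(theta):
    """x = (theta_1 + theta_2 theta_3, theta_1 theta_2 + theta_3) + eps."""
    x = torch.stack([
        theta[..., 0] + theta[..., 1] * theta[..., 2],
        theta[..., 0] * theta[..., 1] + theta[..., 2],
    ], dim=-1)
    return x + 0.05 * torch.randn_like(x)  # eps: sigma = 0.05 per component
```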
Since the likelihood of this toy model is tractable, we can also sample the posterior with DYNESTY as a reference.
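A sketch of such a reference run, assuming `x_obs` holds the observed two-dimensional data point:

```python
import numpy as np
import dynesty

def log_like(theta):
    """Gaussian log-likelihood of x_obs given theta, from the noise model above."""
    m = np.array([theta[0] + theta[1] * theta[2],
                  theta[0] * theta[1] + theta[2]])
    return -0.5 * np.sum((x_obs - m) ** 2) / 0.05**2

def prior_transform(u):
    return 2.0 * u - 1.0  # map the unit cube onto [-1, 1]^3

sampler = dynesty.NestedSampler(log_like, prior_transform, ndim=3)
sampler.run_nested()
results = sampler.results
```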

For the NPE itself, LAMPE provides the estimator, the loss, and the training utilities:

```python
from itertools import islice

import torch.optim as optim
from lampe.data import JointLoader
from lampe.inference import NPE, NPELoss
from lampe.utils import GDStep

estimator = NPE(3, 2)        # q_phi(theta | x): 3 parameters, 2 data dimensions
loss = NPELoss(estimator)    # -log q_phi(theta | x), averaged over the batch
loader = JointLoader(prior, simulator, batch_size=256)
optimizer = optim.Adam(estimator.parameters())
step = GDStep(optimizer)     # backward pass + optimizer step

for epoch in range(16):
    losses = []
    for theta, x in islice(loader, 256):  # 256 batches per epoch
        losses.append(step(loss(theta, x)))

# Condition the flow on the observation and draw posterior samples
samples = estimator.flow(x_obs).sample((2**16,))
```
If the likelihood is tractable, we can compute importance weights for the flow samples:
\[ w_i = \frac{\mathcal{L}(d |\theta_i) \pi(\theta_i)}{q(\theta_i | d)}\,. \]
```python
import numpy as np

def effective_samples(samples, log_q):
    """Importance weights and effective sample size for flow samples.

    `log_like` and `log_prior` are assumed defined for the problem;
    `log_q` holds the flow's log density at each sample.
    """
    log_target = np.array([log_like(s) + log_prior(s) for s in samples])
    log_weights = log_target - log_q
    # Shift by the maximum for numerical stability (weights are unnormalized)
    weights = np.exp(log_weights - np.max(log_weights))
    n_eff = weights.sum() ** 2 / (weights ** 2).sum()
    return weights, n_eff
```
```python
sbi_samples = samples.detach().numpy()
log_q = estimator.flow(x_obs).log_prob(samples).detach().numpy()
weights, n_eff = effective_samples(sbi_samples, log_q)
```

We have 66k samples (\(2^{16} = 65{,}536\)) but only 29k effective samples.
The effective sample size is
\[ n _{\text{eff}} = \frac{(\sum_i w_i)^2}{\sum_i w_i^2}\,. \]
If \(q(\theta | d) = p(\theta| d)\), the weights are constant and \(n_{\text{eff}} = N\), the total number of samples. The sample efficiency is
\[ \epsilon = \frac{n _{\text{eff}}}{N}\in \left[ \frac{1}{N}, 1 \right]\,. \]
Standardization of the arrival direction (Dax et al. 2023).
The training loss includes variation of the noise power spectral density \(S_n\) and Group-Equivariant NPE (GNPE; Dax et al. 2023), which standardizes the detector arrival times \(t_I\):
\[ \begin{aligned} L_{\text{GNPE}} = &\mathbb{E}_{\theta \sim \pi(\theta)} \\ &\mathbb{E}_{S_{n} \sim p(S_{n})} \\ &\mathbb{E}_{d \sim p(d|\theta)} \\ &\mathbb{E}_{ \delta t_{I} \sim \kappa (\delta t_{I})} [-\log q(\theta|d_{-t_{I}(\theta)-\delta t_{I}}, S_{n}, t_{I}(\theta) + \delta t_{I})]\,. \end{aligned} \]
Other approaches exist, such as heterodyning (Roulet et al. 2026).
Sample efficiency as a function of SNR (Santoliquido et al. 2025).
Let’s chat!