Concepts
This page explains the moving parts of IGL: what the encoder, the Green's-function kernel, and Matryoshka truncation each contribute, and how they combine into an end-to-end training procedure that reports the effective latent dimension as a byproduct of fitting the task.
The pieces work together as follows: an encoder maps your input to a low-dimensional latent; the Green's-function kernel turns that latent into a structured design matrix; and Variable Projection with Matryoshka truncation jointly fits the encoder and reads off how few dimensions of the design matrix the task actually needs. The rest of the page walks through each piece in math, then covers two extensions — the SPD/Riemannian module and the spectral formulation.
Where each piece lives
| Subpackage | Role |
|---|---|
| igl | Top-level surface: configs, building blocks, sklearn models, metrics. |
| igl.core | Geometry-agnostic primitives: encoder, kernel, solver, trainer. |
| igl.kernels | Operator zoo: Gaussian, Laplacian, Cauchy, Helmholtz, Gabor, Mexican-hat, Yukawa, multiquadric, soft-box. Registration via register_operator. |
| igl.matryoshka | Truncation samplers (uniform, power-law) and the post-fit dimension curve. |
| igl.metrics | d_eff_from_curve, compare_d_eff, elbow detectors. |
| igl.models | sklearn estimators: classifier, regressor, autoencoder. |
| igl.nn | Bare IGLModule for custom training loops. |
| igl.spd | Riemannian extension: AIRM, log-Eig, orthogonality, reconstruction. |
| igl.spectral | Spectral kernels + null-space augmentation: closed-form bases, learned LB, graph Laplacian. |
| igl.data | Synthetic generators: torus, moons, swiss roll, SPD dataset. |
| igl.viz | Optional matplotlib helpers (gated behind the [viz] extra). |
Encoder + Green-kernel design
Given ambient inputs \(x \in \mathbb{R}^D\), IGL trains:
- An encoder \(\Psi_\theta : \mathbb{R}^D \to \mathbb{R}^{d_{\max}}\) (igl.MLPEncoder or your own EncoderProtocol).
- A multi-scale Green's-function kernel (igl.GreenKernel) with \(R\) learnable anchors \(\mu_r \in \mathbb{R}^{d_{\max}}\) that turns the latent \(z = \Psi_\theta(x)\) into a structured design matrix \(\Phi \in \mathbb{R}^{N \times R}\).
- A linear readout \(\hat y = \Phi w + b\) whose weights \(w\) are not learned by gradient descent: they are refreshed in closed form by igl.direct_solve_weights (Tikhonov-regularised lstsq).
The kernel is a product over the latent dimensions and a weighted sum over \(K\) scales:
with \(G_{\text{op}}\) one of nine operators registered in
igl.kernels (Gaussian, Laplacian, Cauchy, Helmholtz,
Gabor, Mexican-hat, etc.), \(\gamma_k\) a softmax over scale mixing
weights, and \(m_j \in \{0, 1\}\) a Matryoshka truncation mask.
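One form consistent with the description above, written here as an assumption rather than the library's exact parametrisation, is \(\Phi_{nr} = \sum_{k=1}^{K} \gamma_k \prod_{j=1}^{d_{\max}} G_{\text{op}}(z_{nj} - \mu_{rj};\, \sigma_k)^{m_j}\), where the per-scale width \(\sigma_k\) is a symbol introduced for this illustration. The NumPy sketch below evaluates that form with a Gaussian operator. The function name `green_design_matrix` and its arguments are hypothetical and do not mirror the `igl.GreenKernel` signature.

```python
import numpy as np

def green_design_matrix(z, anchors, scales, scale_logits, mask):
    """Assumed form: sum over scales of a product over latent dimensions.

    z: (N, d) latents, anchors: (R, d), scales: (K,), scale_logits: (K,),
    mask: (d,) with entries in {0, 1}. Returns Phi with shape (N, R).
    """
    # Softmax over the scale-mixing logits gives the weights gamma_k.
    gamma = np.exp(scale_logits - np.max(scale_logits))
    gamma = gamma / gamma.sum()

    diff = z[:, None, :] - anchors[None, :, :]        # (N, R, d)
    keep = np.asarray(mask, dtype=bool)               # (d,)
    phi = np.zeros((z.shape[0], anchors.shape[0]))
    for g, s in zip(gamma, scales):
        factors = np.exp(-0.5 * (diff / s) ** 2)      # Gaussian operator per dimension
        factors = np.where(keep, factors, 1.0)        # masked dimensions contribute a factor of 1
        phi += g * factors.prod(axis=-1)
    return phi

anchors = np.array([[0.0, 0.0], [1.0, 1.0]])
Phi = green_design_matrix(
    z=anchors.copy(),
    anchors=anchors,
    scales=np.array([0.5, 2.0]),
    scale_logits=np.zeros(2),
    mask=np.array([1, 1]),
)
```

Under this form a latent that sits exactly on an anchor gets a response of 1 for that anchor, and a masked dimension has no influence on the design matrix.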
Variable Projection training
The readout \(w\) is closed-form given the encoder and kernel state, so gradient descent only operates on \(\theta\) and the kernel parameters. Per training step:
- Encode and (optionally) truncate: \(z = \Psi_\theta(x)\), \(z_k = z \odot m\) where \(m\) keeps the first \(k\) dimensions.
- Compute the design matrix \(\Phi_k = \text{GreenKernel}(z_k)\).
- Solve \(w_k^* = \arg\min_w \|\Phi_k w - y\|^2 + \lambda \|w\|^2\) in closed form (no autograd through the solve).
- Backpropagate the task loss \(\mathcal{L}\bigl(\Phi_k w_k^*, y\bigr)\) through \(\Phi_k\) to update \(\theta\) and the kernel.
This separation — Variable Projection — gives the optimiser a much better-conditioned landscape than joint gradient descent on \((\theta, w)\) would.
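The per-step procedure above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the library's trainer: the "encoder" is a single scalar `theta`, the kernel is a fixed Gaussian bump per anchor, and a central finite difference stands in for autograd. All function names are hypothetical.

```python
import numpy as np

def design_matrix(theta, x, anchors):
    """Toy stand-in for encoder + kernel: z = theta * x, Gaussian bumps at the anchors."""
    z = theta * x
    return np.exp(-0.5 * (z[:, None] - anchors[None, :]) ** 2)

def ridge_solve(Phi, y, lam):
    """Closed-form Tikhonov-regularised least squares via the normal equations."""
    gram = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(gram, Phi.T @ y)

def varpro_step(theta, x, y, anchors, lam=1e-3, lr=1e-2, h=1e-5):
    # Build the design matrix and solve the readout in closed form.
    w_star = ridge_solve(design_matrix(theta, x, anchors), y, lam)

    # Task loss as a function of theta only; w_star is held fixed,
    # which mirrors "no autograd through the solve".
    def task_loss(t):
        return np.mean((design_matrix(t, x, anchors) @ w_star - y) ** 2)

    # Finite-difference gradient (an illustrative substitute for backprop).
    grad = (task_loss(theta + h) - task_loss(theta - h)) / (2.0 * h)
    return theta - lr * grad, w_star, task_loss(theta)

rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=64)
y = np.sin(2.0 * x)
anchors = np.linspace(-3.0, 3.0, 8)
theta, w_star, loss = varpro_step(1.0, x, y, anchors)
```

Only `theta` receives a gradient update; the readout is recomputed from scratch at the next step.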
Matryoshka truncation and effective dimension
At every training step the trainer samples a truncation level \(k \sim \text{Uniform}\{1, \dots, d_{\max}\}\) and masks the latent beyond \(k\). Because \(k\) varies across batches, the encoder is forced to work at every truncation level — its first dimension carries the most useful information, its second the next-most, and so on. The model is Matryoshka-nested: smaller models are subsets of larger ones.
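A minimal sketch of the sampling and masking step, in plain Python. The helper names are hypothetical and the latent is a plain list for readability. It covers only the uniform sampler described above.

```python
import random

def sample_truncation_level(d_max, rng):
    """Draw k uniformly from {1, ..., d_max} (randint is inclusive at both ends)."""
    return rng.randint(1, d_max)

def truncation_mask(k, d_max):
    """Mask that keeps the first k latent dimensions and zeroes the rest."""
    return [1.0 if j < k else 0.0 for j in range(d_max)]

def truncate(z, k):
    """Apply the mask elementwise: z_k = z * m."""
    return [zj * mj for zj, mj in zip(z, truncation_mask(k, len(z)))]

rng = random.Random(0)
z = [0.3, -1.2, 0.7, 2.0]
k = sample_truncation_level(len(z), rng)
z_k = truncate(z, k)
```

Because the kept dimensions are always a prefix, the latent used at level \(k\) is contained in the latent used at level \(k + 1\), which is the nesting property.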
After training, igl.eval_dimension_curve
sweeps \(k = 1, 2, \dots, d_{\max}\) and reports the curve score (error
rate for classification, MSE for regression / reconstruction,
AIRM² for SPD reconstruction). igl.detect_elbow
locates the elbow on a log scale: the smallest \(k\) beyond which adding
more dimensions stops giving substantial improvements. That \(k\) is the
effective dimension \(d_{\text{eff}}\).
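To make the elbow rule concrete, here is one simple way to implement "the smallest \(k\) beyond which improvements are no longer substantial" on a log scale. This is an illustrative rule, not necessarily the algorithm behind `igl.detect_elbow`. The function name, the `min_log_gain` threshold, and the example curve are all assumptions. It treats lower scores as better.

```python
import math

def detect_elbow_log(scores, min_log_gain=0.2, eps=1e-12):
    """Return the smallest 1-indexed k such that every later step improves
    log(score) by less than min_log_gain. scores[i] is the score at k = i + 1."""
    logs = [math.log(s + eps) for s in scores]
    # gains[i] is the log-improvement from going k = i + 1 -> k = i + 2.
    gains = [logs[i] - logs[i + 1] for i in range(len(logs) - 1)]

    elbow = len(scores)
    # Walk backwards from the largest k until a substantial gain is found.
    for k in range(len(scores) - 1, 0, -1):
        if gains[k - 1] >= min_log_gain:
            break
        elbow = k
    return elbow

curve = [0.5, 0.1, 0.09, 0.088]   # made-up error rates for k = 1..4
d_eff = detect_elbow_log(curve)
```

On this made-up curve the only large log-improvement is between \(k = 1\) and \(k = 2\), so the rule returns 2.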
The task-conditioned hierarchy
{ #the-task-conditioned-hierarchy }
The same input data carries different amounts of "useful structure" for different tasks:
- A classifier typically needs one direction to separate the moons.
- A regressor predicting smooth manifold coordinates needs both intrinsic directions.
- A reconstruction model has to capture the full intrinsic geometry.
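Empirically, on a wide range of synthetic manifolds and real datasets, the effective dimensions are ordered by task:

\[d_{\text{eff}}^{\text{classification}} \le d_{\text{eff}}^{\text{regression}} \le d_{\text{eff}}^{\text{reconstruction}}.\]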
The library makes this measurable: igl.compare_d_eff
takes any number of dimension curves (keyed by task name) and returns a
DimensionComparison with the per-task
effective dimensions and a hierarchy_holds flag that's True iff the
values appear in non-decreasing order. The bundled
examples/synthetic/moons_xor.py demonstrates True out of the box.
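The non-decreasing check behind a flag like hierarchy_holds is easy to state in code. The sketch below is a stand-in written for illustration, not the implementation of `igl.compare_d_eff`. The function name and the effective-dimension values are made up.

```python
def hierarchy_holds(d_eff_by_task):
    """True iff the effective dimensions are non-decreasing in the given task order.

    Relies on dicts preserving insertion order (Python 3.7+).
    """
    values = list(d_eff_by_task.values())
    return all(a <= b for a, b in zip(values, values[1:]))

# Illustrative numbers only, ordered classification -> regression -> reconstruction.
example = {"classification": 1, "regression": 2, "reconstruction": 2}
holds = hierarchy_holds(example)
```

Ties count as holding, since the ordering is non-decreasing rather than strictly increasing.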
The SPD extension
For symmetric positive-definite (SPD) data — covariance matrices, EEG
epochs, clinical correlations — Euclidean MSE is not the right loss
because the SPD cone is a Riemannian manifold, not a flat vector space.
igl.spd ships:
- LogEigVectorizer maps each SPD matrix \(C\) to a flat vector \(\operatorname{vec}(\log C)\) in the log-Euclidean tangent space at the identity.
- AIRMLoss implements LossStrategy using the affine-invariant Riemannian metric

    \[\text{AIRM}(C, \hat C)^2 = \bigl\lVert \log\bigl(C^{-1/2}\, \hat C\, C^{-1/2}\bigr)\bigr\rVert_F^2.\]

    The trainer's lstsq still operates in log-Euclidean tangent space (a flat vector space where Euclidean lstsq is geometry-respecting), but the gradient signal is shaped by the manifold distance.
- OrthogonalityPenalty — an
ExtraLoss that drives the pullback metric
\(g = J J^\top\) (where \(J = \partial \Psi / \partial x\)) toward
diagonality. When \(g\) is diagonal at every \(x\), the latent
coordinates are first-order orthogonal — the Stäckel condition — and
geodesics separate cleanly per coordinate.
- IGLReconSPDClassifier — a
two-stage classifier: stage A trains the encoder to reconstruct
log-Eig vectors via AIRM, stage B fits a LogisticRegression on the
frozen Green-kernel design matrix.
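The two formulas used in this section, \(\operatorname{vec}(\log C)\) and the squared AIRM distance, can be evaluated directly with an eigendecomposition. The NumPy sketch below is a reference calculation under stated assumptions, not the `igl.spd` code. The function names are hypothetical, and flattening the full matrix (rather than, say, only its upper triangle) is an assumption about the vectorisation.

```python
import numpy as np

def spd_apply(C, fn):
    """Apply a scalar function to an SPD matrix through its eigenvalues."""
    w, V = np.linalg.eigh(C)
    return (V * fn(w)) @ V.T

def log_eig_vector(C):
    """vec(log C): matrix logarithm flattened to a vector (full flatten assumed)."""
    return spd_apply(C, np.log).reshape(-1)

def airm_sq(C, C_hat):
    """Squared affine-invariant distance || log(C^{-1/2} C_hat C^{-1/2}) ||_F^2."""
    C_inv_sqrt = spd_apply(C, lambda w: w ** -0.5)
    M = C_inv_sqrt @ C_hat @ C_inv_sqrt
    M = 0.5 * (M + M.T)                      # remove round-off asymmetry
    eigvals = np.linalg.eigvalsh(M)
    # The Frobenius norm of log(M) equals the l2 norm of the log-eigenvalues.
    return float(np.sum(np.log(eigvals) ** 2))

C = np.array([[2.0, 0.5], [0.5, 1.0]])
C_hat = np.array([[1.0, 0.2], [0.2, 3.0]])
A = np.array([[2.0, 1.0], [0.0, 1.0]])       # any invertible matrix
d_plain = airm_sq(C, C_hat)
d_transformed = airm_sq(A @ C @ A.T, A @ C_hat @ A.T)
```

The two distances agree up to round-off, which is the affine invariance that a Euclidean MSE on the matrix entries lacks.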
Loss strategies and the ExtraLoss seam
The trainer is task-agnostic: every task plugs in via a
LossStrategy (provides target, loss,
metric, curve_score, and a higher_is_better flag) and zero or more
ExtraLoss regularizers (called per batch,
multiplied by weight, added to the task loss before backprop).
Adding a new task — say, contrastive learning on a metric-space output
— is one new LossStrategy; no trainer changes. Adding a new
regularizer — say, a sparsity penalty on the latents — is one new
ExtraLoss. The reference implementation contains gate-sparsity,
Stäckel-pullback, and AIRM losses all sharing these two seams.
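As a rough picture of the ExtraLoss seam, the sketch below defines a protocol with a weight and a per-batch call, plus a latent sparsity penalty that satisfies it. The signatures are hypothetical and simplified to plain floats; the actual `ExtraLoss` interface in the library may differ. Only the combination rule (multiply by weight, add to the task loss) is taken from the description above.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence

class ExtraLossLike(Protocol):
    """Hypothetical shape of a regulariser: a weight plus a per-batch callable."""
    weight: float

    def __call__(self, latents: Sequence[float]) -> float: ...

@dataclass
class L1LatentPenalty:
    """Example regulariser: mean absolute value of the latents."""
    weight: float = 0.1

    def __call__(self, latents: Sequence[float]) -> float:
        return sum(abs(z) for z in latents) / len(latents)

def total_loss(task_loss: float, extras: Sequence[ExtraLossLike],
               latents: Sequence[float]) -> float:
    """Task loss plus each extra term multiplied by its weight."""
    return task_loss + sum(e.weight * e(latents) for e in extras)

combined = total_loss(1.0, [L1LatentPenalty(weight=0.5)], [1.0, -3.0])
```

With this shape, a new regulariser is one more small class, and the function that combines the losses does not change.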
Advanced: spectral formulation and the null space
The default GreenKernel is enough for most use cases — you can skip
this section on a first read. It's relevant when you need a basis tied
to a specific differential operator (e.g. a Laplacian with known
boundary conditions) or when your data lives on a learned/graph
manifold that's natural to express via an eigendecomposition.
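The local GreenKernel is a product of fixed-shape 1-D kernels at
learnable anchors. The spectral formulation replaces this with the
eigendecomposition of an operator \(L\), whose eigenpairs \(L \phi_n = \lambda_n \phi_n\) give the standard Green's-function expansion:

\[G(z, z') = \sum_{n:\, \lambda_n \neq 0} \frac{\phi_n(z)\, \phi_n(z')}{\lambda_n}.\]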
The library ships eight bases in
igl.spectral:
| Basis | Domain | Notes |
|---|---|---|
| FourierSineBasis | \([0, 1]\) | Dirichlet BCs, no null mode. |
| FourierCosineBasis | \([0, 1]\) | Neumann BCs, \(\phi_0 = 1\) is the null mode. |
| ChebyshevBasis | \([-1, 1]\) | Polynomial spectral-element basis. |
| LegendreBasis | \([-1, 1]\) | Orthogonality w.r.t. uniform weight. |
| HermiteBasis | \(\mathbb{R}\) | Gaussian-weighted. |
| LaguerreBasis | \([0, \infty)\) | Exponentially-weighted. |
| LearnedLaplacianBasis | learned manifold | \(k\)-NN graph + sparse eigsh + Nyström extension. |
| GraphLaplacianBasis | user-supplied graph | Symmetric / random-walk / unnormalized. |
For multi-dimensional latents,
SpectralKernel takes either one
basis (uniform across dims) or a sequence (per-dim). For mixtures on
the same dimension,
MultiSpectralBasis wraps
\(K\) bases with a softmax-mixed weighting.
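A small worked example shows what a spectral Green's function looks like for the sine basis. It assumes the operator is \(L = -d^2/dx^2\) on \([0, 1]\) with Dirichlet boundary conditions, so that \(\phi_n(x) = \sqrt{2}\sin(n\pi x)\) and \(\lambda_n = (n\pi)^2\). The truncated sum \(\sum_n \phi_n(x)\phi_n(x')/\lambda_n\) should then approach the known closed form \(\min(x, x')\,(1 - \max(x, x'))\). The function names are hypothetical and independent of `FourierSineBasis`.

```python
import math

def sine_mode(n, x):
    """Orthonormal Dirichlet eigenfunction of -d^2/dx^2 on [0, 1]."""
    return math.sqrt(2.0) * math.sin(n * math.pi * x)

def sine_eigenvalue(n):
    """Eigenvalue paired with sine_mode(n, .)."""
    return (n * math.pi) ** 2

def spectral_green(x, x_prime, n_modes=2000):
    """Truncated eigenfunction expansion of the Green's function."""
    return sum(
        sine_mode(n, x) * sine_mode(n, x_prime) / sine_eigenvalue(n)
        for n in range(1, n_modes + 1)
    )

def exact_green(x, x_prime):
    """Closed-form Dirichlet Green's function of -d^2/dx^2 on [0, 1]."""
    return min(x, x_prime) * (1.0 - max(x, x_prime))

approx = spectral_green(0.3, 0.7)
exact = exact_green(0.3, 0.7)
```

Every eigenvalue here is strictly positive, which is why the Dirichlet sine basis has no null mode to handle separately.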
Null-space augmentation
Operators with a non-trivial null space (for example the Neumann Laplacian, whose
constant mode \(\phi_0 = 1\) has \(\lambda_0 = 0\)) cannot reach those modes via the
Green's expansion, because that expansion weights each mode by \(1/\lambda_n\). The library exposes
NullSpaceBasis as a kernel-agnostic
add-on: extra design-matrix columns that the lstsq solve fits without
Tikhonov shrinkage, so the null component comes from the data.
Three concrete bases:
- ConstantNullSpace: one column of ones (the DC mode).
- PolynomialNullSpace: constant + per-dimension monomials up to a given degree.
- CustomNullSpace: wraps an arbitrary callable.
Both the local GreenKernel and the
SpectralKernel accept a
null_space= argument; the lstsq target column count and the
source_weights buffer width adjust automatically.
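The idea of fitting null-space columns without Tikhonov shrinkage can be illustrated with a ridge solve whose penalty is zero on the extra columns. This is a minimal NumPy sketch of that idea, not the library's solver. The function name and the tiny dataset are made up.

```python
import numpy as np

def solve_with_null_space(Phi, null_cols, y, lam):
    """Ridge solve on [Phi | null_cols] that penalises only the Phi weights."""
    A = np.hstack([Phi, null_cols])
    n_kernel = Phi.shape[1]

    # Diagonal penalty: lam on kernel columns, 0 on null-space columns.
    penalty = np.zeros(A.shape[1])
    penalty[:n_kernel] = lam

    coef = np.linalg.solve(A.T @ A + np.diag(penalty), A.T @ y)
    return coef[:n_kernel], coef[n_kernel:]

# One zero-mean feature plus a constant column; the targets have offset 3.
Phi = np.array([[-1.0], [0.0], [1.0]])
ones = np.ones((3, 1))
y = np.array([1.0, 3.0, 5.0])
w, c = solve_with_null_space(Phi, ones, y, lam=2.0)
```

In this example the kernel weight is shrunk from its unregularised value of 2 to 1, while the constant offset of 3 is recovered exactly because its column carries no penalty.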
Learned Laplace–Beltrami spectrum
For data on an unknown manifold, the operator to invert is the
Laplace–Beltrami operator of the learned metric \(g = J^\top J\) where
\(J\) is the encoder's Jacobian.
LearnedLaplacianBasis
estimates the spectrum numerically:
- Build a \(k\)-NN graph on the encoded latents.
- Symmetric normalised Laplacian \(L = I - D^{-1/2} W D^{-1/2}\) with Gaussian edge weights.
- Sparse eigendecomposition via scipy.sparse.linalg.eigsh.
- Nyström extension to evaluate the eigenfunctions on new points.
Because the metric drifts during training, the basis must be
refreshed periodically. The
LearnedLBRefresh hook plugs
into the trainer via the extra_losses= parameter and re-runs the
eigendecomposition every \(N\) batches with the current encoder.
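The first three steps of the estimation procedure can be sketched in dense NumPy. This illustration swaps the sparse `eigsh` call for a dense `numpy.linalg.eigh`, which is only sensible for small point sets, and omits the Nyström extension. The function name, the symmetrisation rule (keep an edge if either endpoint selects it), and the parameter defaults are assumptions, not the `LearnedLaplacianBasis` implementation.

```python
import numpy as np

def knn_laplacian_spectrum(Z, n_modes=4, k=5, sigma=1.0):
    """k-NN graph -> symmetric normalised Laplacian -> lowest eigenpairs."""
    n = Z.shape[0]
    sq = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)   # squared distances

    # k nearest neighbours of each point, excluding the point itself.
    masked = sq.copy()
    np.fill_diagonal(masked, np.inf)
    nbrs = np.argsort(masked, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()

    # Gaussian edge weights, symmetrised so that W is undirected.
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-sq[rows, cols] / (2.0 * sigma ** 2))
    W = np.maximum(W, W.T)

    # L = I - D^{-1/2} W D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    evals, evecs = np.linalg.eigh(L)          # ascending eigenvalues
    return evals[:n_modes], evecs[:, :n_modes]

rng = np.random.default_rng(0)
Z = rng.normal(size=(40, 2))                  # stand-in for encoded latents
evals, evecs = knn_laplacian_spectrum(Z)
```

The smallest eigenvalue of this Laplacian is zero up to round-off. In a training loop, this computation is what would be re-run on the current latents at each refresh.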