igl.matryoshka

Truncation samplers and post-fit dimension-curve helpers.

igl.matryoshka.sampler.UniformSampler

Uniform sampler: k ~ Uniform{1, …, d_max}.

The simplest Matryoshka sampling strategy, and the default. It gives equal weight to all truncation levels and lets the encoder discover the true intrinsic dimension without prior bias.
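As a rough illustration of the sampling rule above, the following standalone sketch draws truncation levels uniformly from {1, …, d_max}. The helper name `sample_truncation_uniform` is hypothetical and is not part of the igl API; it only mirrors the stated distribution.

```python
import random


def sample_truncation_uniform(d_max: int, rng: random.Random) -> int:
    """Draw k ~ Uniform{1, ..., d_max} (hypothetical helper, not the igl API)."""
    if d_max < 1:
        raise ValueError("d_max must be at least 1")
    # randint is inclusive on both ends, so every level 1..d_max is equally likely.
    return rng.randint(1, d_max)


rng = random.Random(0)
draws = [sample_truncation_uniform(8, rng) for _ in range(10_000)]
# Empirical counts per truncation level; each should be close to 10_000 / 8.
counts = {k: draws.count(k) for k in range(1, 9)}
```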

igl.matryoshka.sampler.PowerLawSampler

Power-law sampler: P(k) ∝ k^{-α}.

Useful when prior knowledge suggests the effective dimension is small: it biases sampling toward lower truncation levels, which forces the encoder to compress more aggressively.

Parameters:

Name Type Description Default
alpha float

Exponent; must be positive. Larger alpha puts more mass on small k.

1.0
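To make the effect of alpha concrete, here is a minimal sketch of the normalised distribution P(k) ∝ k^{-α} over {1, …, d_max}. The function names are hypothetical and the sketch is an assumption about how such a sampler could be written, not the library's implementation.

```python
import random


def power_law_probs(d_max: int, alpha: float = 1.0) -> list[float]:
    """Normalised P(k) ∝ k^(-alpha) for k = 1..d_max (hypothetical helper)."""
    if d_max < 1:
        raise ValueError("d_max must be at least 1")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    weights = [k ** (-alpha) for k in range(1, d_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_truncation_power_law(d_max: int, alpha: float, rng: random.Random) -> int:
    """Draw one truncation level k from the power-law distribution."""
    probs = power_law_probs(d_max, alpha)
    return rng.choices(range(1, d_max + 1), weights=probs, k=1)[0]


# With d_max=4 and alpha=1 the probabilities are approximately 0.48, 0.24, 0.16, 0.12.
probs = power_law_probs(4, alpha=1.0)
# A larger exponent concentrates more mass on k=1.
steeper = power_law_probs(4, alpha=2.0)
k = sample_truncation_power_law(4, 1.0, random.Random(0))
```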

igl.matryoshka.dimension_curve.eval_dimension_curve(module, x_val, y_val, *, loss, source_l2=0.001)

Evaluate the trained module at every truncation level k.

For each k, freshly solves the readout weights via lstsq using only the first k latent dimensions, then computes the validation metric. Returns a {k: metric} mapping. The mapping iterates k = 1, 2, …, d_max in insertion order.

Parameters:

Name Type Description Default
module IGLModule

Trained IGLModule.

required
x_val Tensor

Validation inputs [N, D].

required
y_val Tensor

Validation targets.

required
loss LossStrategy

Loss strategy used to compute the per-k metric.

required
source_l2 float

Tikhonov regularisation forwarded to igl.direct_solve_weights.

0.001

Returns:

Type Description
DimensionCurve

A dict mapping k → curve_score, where curve_score is the per-k validation metric computed with loss. The curve score is always lower-is-better so that detect_elbow can locate the knee.
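The per-k procedure can be sketched with NumPy as follows. This is a simplified stand-in rather than the library's code: it assumes the latent codes have already been computed as an array `z_val`, uses mean squared error in place of a LossStrategy, and both fits and scores the readout on the same arrays. The name `dimension_curve_sketch` is hypothetical.

```python
import numpy as np


def dimension_curve_sketch(z_val: np.ndarray, y_val: np.ndarray, l2: float = 1e-3) -> dict[int, float]:
    """Refit a linear readout on the first k latent dims for every k (hypothetical helper).

    z_val: latent codes of shape [N, d_max]; y_val: targets of shape [N, T].
    """
    _, d_max = z_val.shape
    curve = {}
    for k in range(1, d_max + 1):
        z_k = z_val[:, :k]  # keep only the first k latent dimensions
        # Tikhonov-regularised least squares as an augmented lstsq problem:
        # minimise ||z_k W - y||^2 + l2 * ||W||^2.
        a = np.vstack([z_k, np.sqrt(l2) * np.eye(k)])
        b = np.vstack([y_val, np.zeros((k, y_val.shape[1]))])
        w, *_ = np.linalg.lstsq(a, b, rcond=None)
        # MSE stands in for the per-k metric (lower is better).
        curve[k] = float(np.mean((z_k @ w - y_val) ** 2))
    return curve


rng = np.random.default_rng(0)
z = rng.normal(size=(200, 4))
# Targets depend only on the first two latent dimensions.
y = z[:, :2] @ np.array([[1.0], [-2.0]])
curve = dimension_curve_sketch(z, y)
```

In this toy setup the score drops sharply at k = 2 and stays near zero afterwards, which is the kind of shape an elbow detector looks for.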

igl.matryoshka.dimension_curve.detect_elbow(curve, *, ratio=2.0)

Locate the elbow of a dimension/loss curve in log-space.

Operates on log(loss) so a 5× loss reduction has the same log-delta whether it occurs at loss=0.1 or loss=0.001. Returns the largest k whose log-reduction exceeds max_log_delta / ratio.
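The scale-invariance claim can be checked with a few lines of arithmetic: a 5× reduction yields the same log-delta, log(5), regardless of the starting loss.

```python
import math

# 5x reduction starting from loss = 0.1
delta_high = math.log(0.1) - math.log(0.1 / 5)
# 5x reduction starting from loss = 0.001
delta_low = math.log(0.001) - math.log(0.001 / 5)
# Both equal log(5) up to floating-point rounding.
```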

Parameters:

Name Type Description Default
curve DimensionCurve

{k: loss} mapping (from eval_dimension_curve). Must contain at least one entry.

required
ratio float

A reduction must be at least max_log_delta / ratio to count as substantial.

2.0

Returns:

Type Description
int

The estimated intrinsic dimension d_eff.

Raises:

Type Description
IGLConfigError

If curve is empty or ratio <= 0.
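A minimal sketch of the elbow rule, under several assumptions that the text above does not spell out: the log-reduction at k is taken as log(loss[k-1]) − log(loss[k]) between consecutive levels, max_log_delta is the largest such reduction, all losses are strictly positive, and a reduction counts when it is at least the threshold. `ValueError` stands in for IGLConfigError, and `detect_elbow_sketch` is a hypothetical name.

```python
import math


def detect_elbow_sketch(curve: dict[int, float], ratio: float = 2.0) -> int:
    """Largest k whose log-reduction is at least max_log_delta / ratio (hypothetical helper)."""
    if not curve or ratio <= 0:
        raise ValueError("curve must be non-empty and ratio must be positive")
    ks = sorted(curve)
    if len(ks) == 1:
        return ks[0]  # nothing to compare against
    # Log-reduction achieved by moving from the previous level to k.
    deltas = {
        k: math.log(curve[prev]) - math.log(curve[k])
        for prev, k in zip(ks, ks[1:])
    }
    max_log_delta = max(deltas.values())
    if max_log_delta <= 0:
        return ks[0]  # assumption: no level improves on the first one
    threshold = max_log_delta / ratio
    return max(k for k, d in deltas.items() if d >= threshold)


curve = {1: 1.0, 2: 0.1, 3: 0.01, 4: 0.009, 5: 0.0085}
d_eff = detect_elbow_sketch(curve)  # 3: the last level with a large log-reduction
```

Under these assumptions, raising ratio lowers the threshold, so smaller reductions also count and the estimate can only move toward larger k.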