Regress#
Overview#
The regress module performs one-dimensional regression with total-variation (TV) regularisation on the second derivative [1]. Because TV is measured with the measure norm (denoted \(\|\cdot\|_{\mathcal{M}}\)), solutions are piecewise-linear splines that use few knots, giving very compact models.
Key features#
Guarantees piecewise-linear solutions with few knots.
Provides a fast two-step algorithm that returns the sparsest solution.
Works for both interpolation (exact fit) and regression (noisy data).
These properties are valuable in machine learning (where sparsity improves generalisation) and in signal processing (where interpretability matters). In 1D, the method is closely related to rectified linear unit (ReLU) neural networks, which also produce piecewise-linear functions, but here the sparsest possible representation is obtained directly.
Mathematical background#
Problem formulation#
The module solves the optimisation problem

\[
\min_{f}\ \sum_{m=1}^{M} E\big(f(x_m),\, y_m\big) \;+\; \lambda\, \|\mathrm{D}^2 f\|_{\mathcal{M}},
\]

where
The term \(E\) is a data-fidelity term (e.g., squared loss \((f(x_m)-y_m)^2\));
The regularization parameter \(\lambda>0\) balances fidelity and sparsity;
The operator \(\mathrm{D}^2 f\) is the second derivative;
The norm \(\|\cdot\|_{\mathcal{M}}\) is the total variation (TV) norm on measures, promoting sparse second derivatives.
This is the generalised Beurling LASSO (g-BLASSO): it extends the classical LASSO (\(L^1\) regularisation on vectors) and the Beurling LASSO (BLASSO, \(L^1\) on measures) by inserting the linear operator \(\mathrm{D}^2\).
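For orientation, the three problems can be written schematically as follows (the design matrix \(A\), forward operator \(\mathcal{A}\), vector \(\beta\), and measure \(\mu\) are generic notation introduced here for illustration, not taken from [1]):

\[
\begin{aligned}
\text{LASSO:} &\quad \min_{\beta\in\mathbb{R}^N}\ \tfrac12\|A\beta - y\|_2^2 + \lambda\,\|\beta\|_1,\\
\text{BLASSO:} &\quad \min_{\mu\in\mathcal{M}}\ \tfrac12\|\mathcal{A}\mu - y\|_2^2 + \lambda\,\|\mu\|_{\mathcal{M}},\\
\text{g-BLASSO:} &\quad \min_{f}\ \sum_{m=1}^{M} E\big(f(x_m),\,y_m\big) + \lambda\,\|\mathrm{D}^2 f\|_{\mathcal{M}}.
\end{aligned}
\]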
Representer theorem#
A solution of the g-BLASSO has the form

\[
f(x) \;=\; b_0 + b_1 x + \sum_{k=1}^{K} a_k\, (x - \tau_k)_+,
\]

where
\(b_0,b_1\in\mathbb{R}\) describe the global trend;
\((x-\tau_k)_+\) is a shifted ReLU function, with weight \(a_k\) and knot location \(\tau_k\);
the number of knots \(K\) satisfies \(K\le M-2\), so only a few knots appear (a small evaluation sketch follows below).
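As a concrete illustration of the representer form above, here is a minimal NumPy sketch; the function name `spline_eval` and the coefficients are made up for this example, and this is not the module's API.

```python
# A minimal sketch (not the module's API) evaluating the representer form
#     f(x) = b0 + b1*x + sum_k a_k * (x - tau_k)_+
# with illustrative coefficients.
import numpy as np


def spline_eval(x, b0, b1, a, tau):
    """Evaluate the sparse piecewise-linear spline at the points x."""
    x = np.asarray(x, dtype=float)
    relus = np.maximum(x[:, None] - np.asarray(tau)[None, :], 0.0)  # (x - tau_k)_+
    return b0 + b1 * x + relus @ np.asarray(a)


# Example with K = 2 knots at tau = 0.3 and 0.7
values = spline_eval(np.linspace(0.0, 1.0, 5), b0=1.0, b1=-0.5, a=[2.0, -1.5], tau=[0.3, 0.7])
```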
Uniqueness and sparsity#
The g-BLASSO may admit multiple solutions, but the algorithm implemented here leverages the theoretical analysis of [1] to always return the sparsest one (minimal \(K\)).
Algorithm#
The solver proceeds in two stages (a toy sketch of stage 1 follows the list):
Data fitting: solve a discrete \(L^1\)-regularised problem to obtain the fitted values \(y_\lambda\); each ADMM iteration costs \(\mathcal{O}(M)\), and the residual decreases as \(\mathcal{O}(1/n)\) in the iteration count \(n\).
Sparsification: extract, in \(\mathcal{O}(M)\) time, the spline with the fewest knots from the fitted values \(y_\lambda\).
In short, stage 2 runs in linear time, and stage 1 costs linear time per iteration.
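As an illustration of stage 1 under simplifying assumptions (uniform grid, squared loss), here is a toy ADMM sketch of the discrete \(L^1\)-regularised fit. It is not the module's solver: it uses a dense linear solve for clarity, whereas a banded solver would give the stated \(\mathcal{O}(M)\) per-iteration cost.

```python
# Toy ADMM sketch for the discrete stage-1 problem
#     min_z  0.5 * ||z - y||_2^2 + lam * ||D2 @ z||_1,
# where D2 applies the second finite difference. Illustrative only,
# not the module's solver.
import numpy as np


def second_difference_matrix(m):
    """(m-2) x m matrix of second finite differences [1, -2, 1]."""
    d2 = np.zeros((m - 2, m))
    for i in range(m - 2):
        d2[i, i:i + 3] = [1.0, -2.0, 1.0]
    return d2


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def admm_stage1(y, lam, rho=1.0, n_iter=500):
    """Return the stage-1 fitted values y_lambda for data y on a uniform grid."""
    m = len(y)
    d2 = second_difference_matrix(m)
    w = np.zeros(m - 2)          # auxiliary variable for D2 @ z
    u = np.zeros(m - 2)          # scaled dual variable
    lhs = np.eye(m) + rho * d2.T @ d2
    z = np.asarray(y, dtype=float)
    for _ in range(n_iter):
        z = np.linalg.solve(lhs, y + rho * d2.T @ (w - u))   # quadratic step
        w = soft_threshold(d2 @ z + u, lam / rho)            # shrinkage step
        u += d2 @ z - w                                      # dual update
    return z
```

The shrinkage step drives most second differences of \(z\) to exactly zero, which is the sparsity that stage 2 then exploits.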
Advantages and applications#
Few-knot guarantee: the returned spline is the sparsest among all feasible solutions.
Exact interpolation: with \(\lambda=0\), the method fits every data point exactly and returns the interpolant that bends the least, i.e. the one with minimal \(\|\mathrm{D}^2 f\|_{\mathcal{M}}\).
Segmented regression: ideal for interpretable fits in finance, epidemiology, etc.
ReLU connection: in 1-D the fit is exactly the kind of piecewise-linear function a shallow ReLU network produces, but it is obtained with the fewest knots, and hence fewer parameters than a naïve ReLU network.
Regularisation parameter#
Choosing \(\lambda\):
Small \(\lambda\) → exact or near-exact interpolation (risk of over-fit).
Large \(\lambda\) → smoother, eventually linear.
Practical tip: run the solver on a grid of \(\lambda\) values and plot sparsity vs. data-fidelity (e.g., root-MSE) to pick a balanced point.
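Following the tip above, a hypothetical sweep might look like the sketch below, assuming the `admm_stage1` sketch from the Algorithm section is in scope; the synthetic data and the knot-counting threshold are made up for illustration, and the module's actual API may differ.

```python
# Hypothetical lambda sweep on synthetic data, assuming `admm_stage1`
# (defined in the toy sketch above) is available in the same session.
# Knots are counted as non-negligible second differences of the fit.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 100)
y = np.abs(x - 0.4) + 0.05 * rng.normal(size=x.size)   # noisy piecewise-linear data

for lam in np.logspace(-4, 0, 9):
    z = admm_stage1(y, lam)
    rmse = np.sqrt(np.mean((z - y) ** 2))
    knots = int(np.sum(np.abs(np.diff(z, 2)) > 1e-6))   # rough knot count
    print(f"lambda={lam:.1e}  rmse={rmse:.4f}  knots={knots}")
```

Plotting the printed knot counts against the RMSE values then gives the sparsity-versus-fidelity trade-off curve from which a balanced \(\lambda\) can be chosen.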