
Recent advances in Compressed Sensing (CS) have focused a great deal of attention on \ell_1 norm minimization. This is due to the fact, proven by D.L. Donoho (see “Compressed sensing”, Donoho 2004), that minimizing an \ell_1 norm has the same sparsity-enforcing behavior as minimizing an \ell_0 norm. This made it possible to actually enforce sparsity in practice, since minimizing an \ell_0 norm is an NP-hard problem.
The classical CS setup takes the form of an ill-posed (under-determined) inverse problem (for a comprehensive paper, see “Compressive sensing: a paradigm shift in signal processing”, Holtz 2008), given as:

\text{minimize} \quad ||\mathbf{x}||_1 \quad \text{s.t.} \quad ||\mathbf{\Phi x} - \mathbf{y}||_2^2 \leq \epsilon,

with \mathbf{y} the observation vector, \mathbf{\Phi} the system’s forward matrix, and \epsilon a bound on the error. Notice that this formulation minimizes an \ell_1 norm on the signal to reconstruct and an \ell_2 norm on the observations.
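As an illustration only (not tied to any specific paper), here is a minimal sketch of this constrained formulation in Python using the cvxpy modeling package; the problem sizes, the noise level, and the value of \epsilon are arbitrary choices made up for the example.

```python
import numpy as np
import cvxpy as cp

# Toy setup: recover a sparse x from m < n noisy measurements y = Phi x + noise.
rng = np.random.default_rng(0)
n, m, k = 200, 60, 5                          # signal length, measurements, sparsity
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
y = Phi @ x_true + 0.01 * rng.standard_normal(m)
eps = m * 0.01**2                             # rough bound on the squared l2 error

# minimize ||x||_1   s.t.   ||Phi x - y||_2^2 <= eps
x = cp.Variable(n)
problem = cp.Problem(cp.Minimize(cp.norm1(x)),
                     [cp.sum_squares(Phi @ x - y) <= eps])
problem.solve()
print("reconstruction error:", np.linalg.norm(x.value - x_true))
```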

It is, however, less common to minimize an \ell_1 error on the data term. Kevin Chan (who kindly let me talk about his paper on this webpage), a former PhD student of Michael at UCSB, published a paper called “Simultaneous temporal superresolution and denoising for cardiac fluorescence microscopy”. In this paper, he solves an inverse problem by minimizing an \ell_1 cost function (both data term and regularization). The presented method works on any quasi-periodic signal. The basic idea behind the method is to sequentially acquire multiple repetitions of a periodic signal and treat each of these repetitions as a low-resolution observation of the same high-resolution signal. All observations are then fused into a high-resolution signal (for more information, check out the paper and some code provided here).
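To make the idea concrete, here is a toy sketch of my own (not the algorithm from the paper) of what fusing repetitions of a periodic signal means, assuming the period is known and each cycle is sampled at a slightly different phase: every sample is mapped to its phase within the period and the samples are re-binned onto a finer time grid.

```python
import numpy as np

def fuse_cycles(samples, sample_times, period, n_bins):
    """Naive fusion: map each sample to its phase within the period and
    average all samples that fall into the same fine phase bin."""
    phases = np.mod(sample_times, period) / period            # in [0, 1)
    bins = np.minimum((phases * n_bins).astype(int), n_bins - 1)
    fused = np.zeros(n_bins)
    counts = np.zeros(n_bins)
    np.add.at(fused, bins, samples)
    np.add.at(counts, bins, 1)
    return fused / np.maximum(counts, 1)

# Toy example: 9 low-rate acquisitions of the same periodic signal, each starting
# with a small extra phase offset, fused onto a grid finer than any single cycle.
period, n_cycles, rate = 1.0, 9, 20                           # 20 samples per cycle
t = np.concatenate([np.arange(rate) / rate + c * (period + 0.011)
                    for c in range(n_cycles)])
signal = np.sin(2 * np.pi * t) + 0.3 * np.sin(6 * np.pi * t)
high_res = fuse_cycles(signal, t, period, n_bins=120)
```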

Method presentation

For a demo we had to do, we decided to apply Kevin’s method to videos of a rotating toy plane, using the plugin provided here. The idea was to film the plane as it passed in front of the camera multiple times and use the method to perform temporal super-resolution.
The video below shows the result of the algorithm applied to a video comprising multiple cycles of the toy plane (typically 8-9 cycles).

All data shown here is ours and is not free to use, contact me if you’re interested in using it.

The method was applied inside the white square in the middle of the image, hence the two different time scales. The effect of the temporal super-resolution is obvious: the motion of the plane is much smoother, and details become sharper once the plane is inside the square area.

In this paper, the inverse problem is solved by minimizing the following cost function

(1)   \begin{equation*} \mathbf{\hat{f}_x}(\lambda) = \text{argmin}_{\mathbf{f_x}}\ || \mathbf{C f_x - g_x}||_1 + \lambda|| \mathbf{\Gamma f_x}||_1 \end{equation*}

where \mathbf{C} is the system’s forward matrix, \mathbf{f_x} is the high temporal resolution signal of interest, \mathbf{g_x} is the observation vector, and \mathbf{\Gamma} is a second-order Tikhonov matrix used for regularization.
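As a sketch (again with cvxpy, assuming a one-dimensional \mathbf{f_x} and a standard second-order finite-difference matrix as \mathbf{\Gamma}; the actual implementation in the paper and plugin may differ), Eq. (1) can be written down almost literally:

```python
import numpy as np
import cvxpy as cp

def second_order_diff(n):
    """Second-order finite-difference matrix, one common choice for Gamma."""
    Gamma = np.zeros((n - 2, n))
    for i in range(n - 2):
        Gamma[i, i:i + 3] = [1.0, -2.0, 1.0]
    return Gamma

def solve_l1_l1(C, g, lam):
    """Solve  argmin_f ||C f - g||_1 + lam * ||Gamma f||_1."""
    n = C.shape[1]
    Gamma = second_order_diff(n)
    f = cp.Variable(n)
    cost = cp.norm1(C @ f - g) + lam * cp.norm1(Gamma @ f)
    cp.Problem(cp.Minimize(cost)).solve()
    return f.value
```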

Experiment

In order to emphasize the effect of minimizing an \ell_1 norm on the data term, we included some outliers in our measurements. To do this, I simply put my hand in front of the plane as it passed in front of the camera. The video below shows all 9 cycles given as input to the method. The perturbed cycle is quite obvious, with my hand occluding the plane.

Comparison \ell_2 vs \ell_1

I compared minimizing an \ell_2 and an \ell_1 norm in the equation presented above; the video below shows both results. The left side of the video shows the \ell_2 minimization and the right side shows the \ell_1 minimization.
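In terms of the sketch given earlier, the comparison amounts to swapping the data-term norm while keeping the regularizer and \lambda fixed; a minimal way to express that (the function name and interface are mine, not from the plugin) is:

```python
import cvxpy as cp

def solve(C, g, Gamma, lam, data_norm="l1"):
    """Same regularizer, two possible data terms: l2 (least squares) or l1 (robust)."""
    f = cp.Variable(C.shape[1])
    residual = C @ f - g
    data = cp.norm1(residual) if data_norm == "l1" else cp.sum_squares(residual)
    cp.Problem(cp.Minimize(data + lam * cp.norm1(Gamma @ f))).solve()
    return f.value

# f_l2 = solve(C, g, Gamma, lam, data_norm="l2")   # outlier cycle leaks into the result
# f_l1 = solve(C, g, Gamma, lam, data_norm="l1")   # outlier cycle largely ignored
```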

The difference is stunning: the \ell_2 minimization clearly lets the hand appear and you can see it flickering, while it is almost impossible to see any perturbation in the \ell_1 minimization.

Statistical point of view

To give an element of explanation as to why the \ell_1 norm is more robust to outliers than the \ell_2 norm, let us take a statistical point of view.

Consider a linear regression in which the errors (the residuals of Eq. (1)) follow either a Gaussian distribution or a Laplace distribution. Performing this regression is equivalent to minimizing an \ell_2 norm or an \ell_1 norm, respectively. A thorough explanation of why this is the case is outside the scope of this blog article, but the key fact is that finding the maximum likelihood estimate (MLE) of the model parameters under a Gaussian error model corresponds to minimizing an \ell_2 norm, while under a Laplace error model it corresponds to minimizing an \ell_1 norm (a paper mentioning this can be found here).
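The simplest instance of this correspondence is the estimation of a location parameter: under Gaussian errors the MLE minimizes the sum of squared residuals and is the sample mean, while under Laplace errors it minimizes the sum of absolute residuals and is the sample median. A tiny numerical illustration on made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)
clean = 5.0 + rng.standard_normal(50)     # samples around the true location 5.0
data = np.append(clean, 500.0)            # one gross outlier

# Gaussian errors -> MLE minimizes sum (x_i - theta)^2 -> the sample mean.
# Laplace errors  -> MLE minimizes sum |x_i - theta|   -> the sample median.
print("mean   (l2 / Gaussian MLE):", data.mean())      # dragged far away by the outlier
print("median (l1 / Laplace MLE): ", np.median(data))  # stays close to 5.0
```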

The plot below shows a Gaussian and a Laplace distribution, both with scale parameter \sigma = 1 and zero mean (\theta = 0).
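Such a figure can be reproduced with a few lines of matplotlib/scipy (this is just an illustrative snippet, not the code behind the original plot); a logarithmic y-axis is an optional tweak that makes the tail behavior stand out even more:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm, laplace

x = np.linspace(-6, 6, 500)
plt.plot(x, norm.pdf(x, loc=0, scale=1), label="Gaussian ($\\sigma = 1$)")
plt.plot(x, laplace.pdf(x, loc=0, scale=1), label="Laplace ($\\sigma = 1$)")
# plt.yscale("log")   # uncomment to make the difference in the tails more obvious
plt.xlabel("x")
plt.ylabel("density")
plt.legend()
plt.show()
```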

The fact that the Laplace distribution exhibits heavy tails explains why outliers do not have such a huge impact on the MLE parameters: off-centered values remain reasonably probable. In the Gaussian case, it is so unlikely to observe a value outside of +/- 6 sigmas that any outlier has a tremendous influence on the MLE parameters. Hence the greater robustness of \ell_1 over \ell_2.

There is nothing new research-wise in this post; robust norms were a hot topic in the 1960s, for instance with the famous Huber loss, but I think this makes for a beautiful illustration of the robustness of the \ell_1 norm to outliers.