Title: Online Dictionary Learning for Sparse Coding
Author: Julien Mairal, Francis Bach, Jean Ponce, Guillermo Sapiro
Publication: ICML 2009
Sparse coding optimize the empirical cost function:
Sparse coding: modeling data vectors as sparse linear combination of basis elements.
This paper propose a new stochastic online algorithm for learning dictionaries adapted to sparse coding task, and proven its convergence.
n: number of samples
D: m*k dimension dictionary
Loss function l(x,D) should be small if D is “good” at representing the signal x.
Define l( x , D ) as the optimal value of the ℓ 1 -sparse coding problem:
To prevent D from being arbitrarily large, it is common to constrain its columns d to have an ℓ 2 norm less than or equal to one.
Since the computation of α dominates the cost of each iteration, a second-order optimization technique can be used in this case to accurately estimate D at each step when α is fixed.
Contrary to classical first-order stochastic gradient descent, it does not require explicit learning rate tuning and minimizes a sequentially quadratic local approximations of the expected cost.
Online Dictionary Learning
Two motivations:
Dictionary Update
Updating the dictionary uses block-coordinate descent with warm restarts, and one of its main advantages is that it is parameter-free and does not require any learning rate tuning, which can be difficult in a constrained optimization setting.
Optimizing the Algorithm
- Handling Fixed-Size Datasets.
The size of the training set is often finite. In this situation, the same data points may be examined several times, and it is very common in online algorithms to simulate an i.i.d. sampling of p( x ) by cycling over a randomly permuted training set - Mini-Batch Extension.
Improve the convergence speed of our algorithm by drawing η > 1 signals at each iteration instead of a single one, which is a classical heuristic in stochastic gradient descent algorithm - Purging the Dictionary from Unused Atoms.
Some of the dictionary atoms are never used. A common practice is to replace them during the optimization by elements of the training set.
Contribution:
- The dictionary learning problem as the optimization of a smooth nonconvex objective function over a convex set, minimizing the (desired) expected cost when the training set size goes to infinity.
- Propose an iterative online algorithm that solves this problem by efficiently minimizing at each step a quadratic surrogate function of the empirical cost over the set of constraints.
- Experimentally their algorithm is significantly faster than previous approaches to dictionary learning on both small and large datasets of natural images.
