Title: Aggregating local descriptors into a compact image representation
Author: Herve Jegou, Matthijs Douze, Cordelia Schmid, Patrick Perez
Publication: IEEE CVPR'10
Large-scale image search must balance three constraints: search accuracy, efficiency, and memory usage.
The paper addresses them by jointly optimizing:
- the representation, i.e., how to aggregate local image descriptors into a vector representation;
- the dimensionality reduction of these vectors;
- the indexing algorithm.
The paper's first contribution is a representation that gives excellent search accuracy at a reasonable vector dimensionality, chosen with the knowledge that the vector will be indexed afterwards. The authors propose a descriptor, derived from both BOF (bag-of-features) and the Fisher kernel, that aggregates SIFT descriptors into a compact representation. It is termed VLAD (vector of locally aggregated descriptors).
- VLAD:
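As with BOF, a codebook of k visual words c1, ..., ck is first learned with k-means, and each local descriptor x is assigned to its nearest visual word. The idea of the VLAD descriptor is to accumulate, for each visual word ci, the differences x−ci of the vectors x assigned to ci.
This characterizes the distribution of the vectors with respect to the center (ci: visual word i; x: a local descriptor vector).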
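As a rough illustration of the accumulation step, here is a minimal NumPy sketch. The function name `vlad` and the argument names are hypothetical, the codebook is assumed to be already trained, and the final L2 normalization is included as an assumption about how the vector is prepared for comparison.

```python
import numpy as np

def vlad(descriptors, centroids):
    """Aggregate local descriptors into one VLAD vector.

    descriptors: (n, d) array of local descriptors x (e.g. SIFT, d = 128).
    centroids:   (k, d) array of visual words c_i (assumed already learned).
    Returns a vector of length k * d.
    """
    k, d = centroids.shape
    # Squared Euclidean distance from every descriptor to every visual word.
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    # Index of the nearest visual word for each descriptor.
    assign = d2.argmin(axis=1)

    v = np.zeros((k, d))
    for i in range(k):
        members = descriptors[assign == i]
        if len(members):
            # Accumulate the residuals x - c_i for this visual word.
            v[i] = (members - centroids[i]).sum(axis=0)

    v = v.ravel()
    norm = np.linalg.norm(v)
    # L2 normalization (assumption); an all-zero vector is left untouched.
    return v / norm if norm > 0 else v
```

With k visual words and d-dimensional descriptors the output has D = k × d components, so a small codebook already yields a fairly long vector, which is why the notes go on to reduce and quantize it.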
- Coding vector:
1. a projection that reduces the dimensionality of the vector.
Method: PCA
2. a quantization used to index the resulting vectors.
Method: approximate nearest neighbor search, using the asymmetric distance computation (ADC) variant, in which the database vectors are quantized but the query is not.
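The two coding steps can be sketched as a PCA projection followed by ADC over quantized codes. This is only an illustration under assumptions: the quantizer is taken to be a product quantizer with one small codebook per sub-vector, the codebooks are assumed to be trained elsewhere (for example by k-means), and all function names are hypothetical.

```python
import numpy as np

def fit_pca(X, d_out):
    """Return the mean and the top d_out principal directions of X (n, D)."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:d_out]

def project(X, mean, components):
    """Project X (n, D) onto the PCA subspace, giving (n, d_out)."""
    return (X - mean) @ components.T

def pq_encode(X, codebooks):
    """Quantize each sub-vector of X to the index of its nearest centroid.

    codebooks: (m, ks, ds) array, m sub-quantizers with ks centroids each.
    X:         (n, m * ds) array of projected vectors.
    Returns integer codes of shape (n, m).
    """
    m, ks, ds = codebooks.shape
    sub = X.reshape(len(X), m, ds)
    d2 = ((sub[:, :, None, :] - codebooks[None]) ** 2).sum(axis=3)  # (n, m, ks)
    return d2.argmin(axis=2)

def adc_distances(query, codes, codebooks):
    """Asymmetric distances: the query stays exact, the database is quantized."""
    m, ks, ds = codebooks.shape
    # Lookup table of squared distances from each query sub-vector
    # to every centroid of the matching sub-quantizer: shape (m, ks).
    table = ((query.reshape(m, 1, ds) - codebooks) ** 2).sum(axis=2)
    # Sum the table entries selected by each database code: shape (n,).
    return table[np.arange(m), codes].sum(axis=1)
```

At query time only the small lookup table is computed from the query, and each database vector costs m table lookups, so the full vectors never need to be kept in memory.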
The second contribution shows the advantage of jointly optimizing the trade-off between the dimensionality reduction and the indexing algorithm.
This amounts to optimizing the dimension D′ to which PCA reduces the D-dimensional VLAD vector. Using the empirically measured mean squared error as the criterion, the optimization selects D′=64.
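One way such an empirical selection might look is sketched below. It assumes the total error is the sum of the PCA reconstruction error and the quantization error of a product quantizer with a fixed code size (m sub-quantizers of ks centroids each), and it uses a deliberately tiny k-means. The names `kmeans`, `total_mse` and `select_dim` are hypothetical, and the toy sizes are not meant to reproduce the D′=64 result.

```python
import numpy as np

def kmeans(X, k, iters=10, seed=0):
    """Very small Lloyd's k-means, for illustration only."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        a = ((X[:, None, :] - C[None]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(a == j):
                C[j] = X[a == j].mean(axis=0)
    return C

def total_mse(X, d_out, m=2, ks=4):
    """Projection error plus quantization error for one candidate D'."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:d_out]                      # (d_out, D) PCA basis
    Z = Xc @ P.T                        # projected vectors, (n, d_out)
    # Error from discarding the components beyond d_out.
    e_proj = ((Xc - Z @ P) ** 2).sum(axis=1).mean()
    # Quantize each block of coordinates with its own small codebook.
    Zq = np.empty_like(Z)
    for cols in np.array_split(np.arange(d_out), m):
        C = kmeans(Z[:, cols], ks)
        a = ((Z[:, cols][:, None, :] - C[None]) ** 2).sum(axis=2).argmin(axis=1)
        Zq[:, cols] = C[a]
    e_quant = ((Z - Zq) ** 2).sum(axis=1).mean()
    return e_proj + e_quant

def select_dim(X, candidates, m=2, ks=4):
    """Return the candidate D' with the lowest total error, plus all errors."""
    errors = {d: total_mse(X, d, m, ks) for d in candidates}
    return min(errors, key=errors.get), errors
```

Because the code size is fixed, a small D′ loses information in the projection while a large D′ spreads the same number of bits over more dimensions and quantizes each one more coarsely. The selected value sits between the two extremes. Candidates should be at least m so that every sub-quantizer gets some coordinates.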