aMMAI: [aMMAI] Paper Summary: Aggregating local descriptors into a compact image representation

Title: Aggregating local descriptors into a compact image representation

Authors: Hervé Jégou, Matthijs Douze, Cordelia Schmid, Patrick Pérez

Publication: IEEE CVPR'10

Large-scale image search has to satisfy three constraints at once: search accuracy, efficiency, and memory usage.

The paper addresses them by jointly optimizing:

the representation, i.e., how to aggregate local image descriptors into a vector representation;
the dimensionality reduction of these vectors;
the indexing algorithm.

The paper's first contribution is a representation that provides excellent search accuracy at a reasonable vector dimensionality, chosen with the knowledge that the vector will be indexed afterwards. The authors propose a descriptor, derived from both the bag-of-features (BOF) representation and the Fisher kernel, that aggregates SIFT descriptors into a compact representation. It is called VLAD (vector of locally aggregated descriptors).

VLAD:
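As in BOF, a codebook of k visual words c_1, ..., c_k is first learned with k-means, and each local descriptor x is assigned to its nearest visual word c_i. Unlike BOF, which only counts the assignments, VLAD accumulates, for each visual word c_i, the differences x−c_i of the vectors x assigned to c_i.
This characterizes the distribution of the vectors with respect to the center (c_i: visual word i; x: a local descriptor vector). With d-dimensional descriptors the VLAD vector has D = k×d components, and it is subsequently L2-normalized.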
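The aggregation step can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `vlad`, the random codebook, and the random descriptors are assumptions made for the example (a real codebook would come from k-means on SIFT descriptors).

```python
import numpy as np

def vlad(descriptors, centroids):
    """Aggregate local descriptors (n, d) into a VLAD vector of size k * d."""
    k, d = centroids.shape
    # Squared Euclidean distance from every descriptor to every visual word.
    dists = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)  # index i of the nearest visual word c_i
    v = np.zeros((k, d))
    for i in range(k):
        assigned = descriptors[nearest == i]
        if len(assigned):
            # Accumulate the residuals x - c_i of the descriptors assigned to c_i.
            v[i] = (assigned - centroids[i]).sum(axis=0)
    v = v.ravel()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v  # L2-normalize the final vector

rng = np.random.default_rng(0)
centroids = rng.normal(size=(16, 128))     # hypothetical codebook: k=16, d=128
descriptors = rng.normal(size=(500, 128))  # hypothetical descriptors of one image
v = vlad(descriptors, centroids)
print(v.shape, np.linalg.norm(v))
```

With k=16 visual words and 128-dimensional descriptors, the result has 16×128 = 2048 components, which is why a further reduction step is needed before indexing.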

Coding the vector:
1. a projection that reduces the dimensionality of the vector.
Method: PCA
2. a quantization used to index the resulting vectors.
Method: an approximate nearest neighbor search method, used in its asymmetric distance computation (ADC) variant, in which only the database vectors are quantized and the query is not.
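The two coding steps can be sketched as a PCA projection followed by a quantizer searched with ADC. The sketch below assumes a product quantizer (one small k-means codebook per subvector), which is the setting ADC is usually described in. The data, the sizes `D_prime`, `m`, `ks`, and the helper names `kmeans` and `adc_search` are all illustrative assumptions, not values or code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 32))  # stand-in for a database of D-dimensional vectors
D_prime, m, ks = 8, 4, 16        # illustrative: reduced dim, subquantizers, centroids each

# Step 1: PCA projection onto the D' leading principal directions.
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
P = Vt[:D_prime]                 # (D', D) projection matrix
Xp = (X - mean) @ P.T            # (n, D') reduced database vectors

def kmeans(data, k, iters=10):
    """Tiny Lloyd's k-means, enough for a demonstration."""
    centers = data[rng.choice(len(data), k, replace=False)].copy()
    for _ in range(iters):
        labels = ((data[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
    return centers

# Step 2: quantize each of the m subvectors with its own codebook.
sub = D_prime // m
codebooks = [kmeans(Xp[:, i * sub:(i + 1) * sub], ks) for i in range(m)]
codes = np.stack([
    ((Xp[:, i * sub:(i + 1) * sub][:, None] - codebooks[i][None]) ** 2).sum(-1).argmin(1)
    for i in range(m)
], axis=1)                       # (n, m) centroid indices: the stored codes

def adc_search(query, topk=5):
    """ADC: the query stays unquantized, database vectors are read from their codes."""
    q = (query - mean) @ P.T
    # table[i, j] = squared distance from query subvector i to centroid j of codebook i
    table = np.stack([((q[i * sub:(i + 1) * sub] - codebooks[i]) ** 2).sum(-1)
                      for i in range(m)])
    dists = table[np.arange(m), codes].sum(axis=1)  # one table lookup per subvector
    return np.argsort(dists)[:topk], dists

ids, dists = adc_search(X[0])
print(ids)
```

The asymmetry is in `adc_search`: distances are computed between the exact (projected) query and the quantized database vectors, so only the database side pays the quantization error.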

The second contribution is to show the advantage of jointly optimizing the trade-off between the dimensionality reduction and the indexing algorithm.

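Optimizing the dimension D′ (the dimension of the D-dimensional VLAD vector after reduction by PCA): a smaller D′ discards more information in the projection, while a larger D′ leaves more dimensions to quantize with the same code size. The mean square error is measured empirically, and the optimization selects D′=64.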
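The selection of D′ can be illustrated by measuring, for several candidate values, the projection error and the quantization error at a fixed code size, and keeping the D′ with the smallest sum. This is only a sketch under stated assumptions: the synthetic data, the candidate values, the code budget (`m`, `ks`), and the helper names are hypothetical, and the result on this toy data says nothing about the D′=64 reported in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data with decaying variance per dimension (not real VLAD vectors).
X = rng.normal(size=(2000, 64)) * np.linspace(3.0, 0.1, 64)
m, ks = 4, 16  # fixed code budget: m subquantizers with ks centroids each

def kmeans(data, k, iters=10):
    """Tiny Lloyd's k-means, enough for a demonstration."""
    centers = data[rng.choice(len(data), k, replace=False)].copy()
    for _ in range(iters):
        labels = ((data[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
    return centers

Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

def errors(d_prime):
    """Return (projection error, quantization error) as mean squared errors."""
    P = Vt[:d_prime]
    Xp = Xc @ P.T
    e_p = ((Xc - Xp @ P) ** 2).sum(axis=1).mean()  # information lost by PCA
    sub = d_prime // m
    e_q = 0.0
    for i in range(m):
        part = Xp[:, i * sub:(i + 1) * sub]
        cb = kmeans(part, ks)
        # Squared distance of each subvector to its nearest centroid.
        e_q += ((part[:, None] - cb[None]) ** 2).sum(-1).min(axis=1).mean()
    return e_p, e_q

results = {d: errors(d) for d in (4, 8, 16, 32, 64)}
best = min(results, key=lambda d: sum(results[d]))
for d, (e_p, e_q) in results.items():
    print(d, round(e_p, 3), round(e_q, 3), round(e_p + e_q, 3))
print("selected D' =", best)
```

The projection error can only shrink as D′ grows, while the quantization error tends to grow because the same number of bits has to cover more dimensions; the chosen D′ balances the two.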

Wednesday, March 9, 2011