Wednesday, March 9, 2011

[aMMAI] Paper Summary: Aggregating local descriptors into a compact image representation

Title: Aggregating local descriptors into a compact image representation
Authors: Herve Jegou, Matthijs Douze, Cordelia Schmid, Patrick Perez
Publication: IEEE CVPR'10



Large-scale image search must balance three constraints: search accuracy, efficiency, and memory usage.
The paper addresses them by jointly optimizing:
  1. the representation, i.e., how to aggregate local image descriptors into a vector representation;
  2. the dimensionality reduction of these vectors;
  3. the indexing algorithm.


The paper's first contribution is a representation that provides excellent search accuracy at a reasonable vector dimensionality, which matters because the vector is indexed afterwards. The authors propose a descriptor, derived from both BOF (bag-of-features) and the Fisher kernel, that aggregates SIFT descriptors into a compact representation. It is termed VLAD (vector of locally aggregated descriptors).

  • VLAD:
    As in BOF, each local descriptor x is assigned to its nearest visual word ci. Unlike BOF, which only counts the assignments, VLAD accumulates, for each visual word ci, the differences x - ci of the vectors x assigned to ci.
    This characterizes the distribution of the vectors with respect to the center (ci: visual word i; x: a local descriptor vector).
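
The accumulation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `vlad` and its arguments are hypothetical, the codebook is assumed to be already learned (for example by k-means), and the final L2 normalization follows the paper rather than this summary.

```python
import numpy as np

def vlad(descriptors, centroids):
    """Aggregate local descriptors (n x d) into one VLAD vector of size k*d.

    `centroids` (k x d) is an assumed, pre-trained codebook of visual words.
    """
    k, d = centroids.shape
    # Assign each descriptor x to its nearest visual word ci.
    dists = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assign = dists.argmin(axis=1)
    v = np.zeros((k, d))
    for i in range(k):
        members = descriptors[assign == i]
        if len(members):
            # Accumulate the residuals x - ci for this visual word.
            v[i] = (members - centroids[i]).sum(axis=0)
    v = v.ravel()
    # L2-normalize the concatenated vector (skipped if it is all zeros).
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```

With k visual words and d-dimensional descriptors, the result has k*d components, so a small codebook already gives a fairly long vector; this is why the dimensionality reduction step discussed next matters.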

  • Coding vector:
    1. a projection that reduces the dimensionality of the vector.
    Method: PCA
    2. a quantization used to index the resulting vectors.
    Method: approximate nearest neighbor search, using the asymmetric distance computation (ADC) variant of this approach, in which only the database vectors are quantized and the query is not.
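
The two coding steps can be sketched as follows: a PCA projection, then a product quantizer whose codes are compared against an uncompressed query (the asymmetric part of ADC). This is only an illustrative sketch. All function names, the toy `kmeans`, and the parameters `m` (number of sub-vectors) and `ks` (centroids per sub-quantizer) are assumptions made for the example, not the authors' code.

```python
import numpy as np

def fit_pca(X, d_out):
    """Return the data mean and the top d_out principal directions."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:d_out]

def project(X, mean, components):
    return (X - mean) @ components.T

def kmeans(X, k, iters=20, seed=0):
    """Tiny Lloyd's k-means, for illustration only."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def train_pq(X, m, ks):
    """Learn one codebook per sub-vector (the dimension must be divisible by m)."""
    return [kmeans(sub, ks) for sub in np.split(X, m, axis=1)]

def encode_pq(X, codebooks):
    """Replace each sub-vector by the index of its nearest centroid."""
    subs = np.split(X, len(codebooks), axis=1)
    codes = [((s[:, None, :] - cb[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
             for s, cb in zip(subs, codebooks)]
    return np.stack(codes, axis=1)

def adc_distances(query, codes, codebooks):
    """Squared distances from an uncompressed query to every coded vector."""
    q_subs = np.split(query, len(codebooks))
    # One lookup table per sub-quantizer: query sub-vector to each centroid.
    tables = [((cb - q) ** 2).sum(axis=1) for q, cb in zip(q_subs, codebooks)]
    return sum(t[codes[:, j]] for j, t in enumerate(tables))
```

Because the query itself is never quantized, the only approximation comes from the database side, and each distance costs just `m` table lookups and additions.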


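The second contribution shows the advantage of jointly optimizing the trade-off between dimensionality reduction and the indexing algorithm.
The quantity being optimized is the dimension D of the VLAD vector after PCA reduction. Keeping fewer dimensions increases the projection error but leaves a shorter vector that is quantized more accurately with the same code size, so the two errors pull in opposite directions. Based on the empirically measured mean square error, the optimization selects D=64.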
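
A rough sketch of this kind of selection is given below: for each candidate D, it adds the mean squared error of the PCA truncation to the mean squared error of a product quantizer with a fixed code size, and keeps the D with the smallest total. The names `total_error` and `select_dimension`, the toy `kmeans`, and the parameters `m` and `ks` are hypothetical; as a simplification, the errors are measured on the same vectors used for training, so this is not the paper's exact protocol.

```python
import numpy as np

def kmeans(X, k, iters=15, seed=0):
    """Tiny Lloyd's k-means, for illustration only."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def total_error(X, d_out, m=4, ks=16):
    """Projection error plus quantization error for a reduced dimension d_out."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ vt[:d_out].T
    # Projection error: average energy lost in the discarded components.
    e_proj = (Xc ** 2).sum(axis=1).mean() - (Z ** 2).sum(axis=1).mean()
    # Quantization error: distance to the nearest centroid, summed over
    # the m sub-vectors (d_out must be divisible by m).
    e_quant = 0.0
    for sub in np.split(Z, m, axis=1):
        centers = kmeans(sub, ks)
        d = ((sub[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        e_quant += d.min(axis=1).mean()
    return e_proj + e_quant

def select_dimension(X, candidates, m=4, ks=16):
    """Return the candidate dimension with the smallest total error."""
    errors = {d: total_error(X, d, m, ks) for d in candidates}
    return min(errors, key=errors.get), errors
```

A call such as `select_dimension(X, [16, 32, 64, 128])` would return the chosen dimension together with the error measured for each candidate; which value wins depends on the data and on the code size fixed by `m` and `ks`.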
