Title: Efficient visual search of videos cast as text retrieval
Author: Josef Sivic and Andrew Zisserman
Publication: IEEE TPAMI, 2009
The paper describes an approach to object retrieval that searches for and localizes all occurrences of an object in a video, given a query image of the object.
It employs methods from statistical text retrieval to make retrieval efficient.
These methods include an inverted file system and text and document frequency weighting.
Object retrieval using visual words can be divided into two parts: off-line pre-processing and run-time retrieval.
1. Pre-processing (off-line)
- Detect affine covariant regions (Shape Adapted and Maximally Stable) in each keyframe of the video. Represent each region by a SIFT descriptor.
- Track the regions through the video and reject unstable regions.
- Build a visual vocabulary by clustering stable regions from a subset of the video. Assign each region descriptor in each keyframe to the nearest cluster centre.
- Remove stop-listed visual words.
- Compute tf–idf weighted document frequency vectors.
- Build the inverted file indexing structure.
2. Run-time
- Determine the set of visual words within the query region.
- Retrieve keyframes based on visual word frequencies.
- Re-rank the top Ns (= 500) retrieved keyframes using spatial consistency.
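The tf–idf weighting, inverted file and frequency-based retrieval steps can be sketched in a few lines. This is a minimal illustration using the standard tf–idf weight (word frequency in the frame times log of inverse document frequency) and a normalized dot product for ranking, not the authors' implementation. The function names `build_index` and `query`, the frame ids and the integer word ids are all made up for this note, and the frames are assumed to be already quantized into visual words and stop-listed.

```python
import math
from collections import Counter, defaultdict


def build_index(frames):
    """frames: dict mapping frame id -> list of visual word ids.

    Returns (idf, inverted), where inverted maps each visual word to a
    list of (frame id, L2-normalized tf-idf weight) postings.
    """
    n_frames = len(frames)

    # Document frequency: in how many frames does each word occur?
    df = Counter()
    for words in frames.values():
        df.update(set(words))
    idf = {w: math.log(n_frames / df[w]) for w in df}

    inverted = defaultdict(list)
    for frame_id, words in frames.items():
        counts = Counter(words)
        total = len(words)
        vec = {w: (c / total) * idf[w] for w, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        for w, v in vec.items():
            inverted[w].append((frame_id, v / norm))
    return idf, inverted


def query(query_words, idf, inverted):
    """Rank frames by the dot product of normalized tf-idf vectors.

    Only the postings of words present in the query are visited, so
    frames that share no word with the query are never touched.
    """
    counts = Counter(w for w in query_words if w in idf)
    total = sum(counts.values())
    if total == 0:
        return []
    qvec = {w: (c / total) * idf[w] for w, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in qvec.values())) or 1.0

    scores = defaultdict(float)
    for w, qv in qvec.items():
        for frame_id, fv in inverted[w]:
            scores[frame_id] += (qv / norm) * fv
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: three keyframes described by hypothetical visual word ids.
frames = {"f1": [1, 1, 2], "f2": [2, 3], "f3": [3, 4, 4]}
idf, inverted = build_index(frames)
results = query([1, 2], idf, inverted)
print(results)
```

In this toy example frame f3 shares no visual word with the query, so it never appears in the result list. That is the point of the inverted file: the work is proportional to the postings of the query words, not to the total number of regions in the video.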
They then compare the time complexity of this retrieval architecture with that of matching without vector quantization.
With vector quantization: O(MN). Without vector quantization: O(NR^2D).
M: number of distinct visual words in each frame
R: number of regions in each frame
N: number of frames
D: dimension of the SIFT descriptor
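To get a feel for the gap between the two bounds, the sketch below plugs illustrative numbers into them. The values of N, M and R are hypothetical and chosen only for illustration (D = 128 is the usual SIFT descriptor length), the function names are made up for this note, and the constants hidden by the big-O notation are ignored.

```python
def cost_with_vq(n_frames, m_words):
    """Operation count suggested by O(MN)."""
    return n_frames * m_words


def cost_without_vq(n_frames, r_regions, d_dims):
    """Operation count suggested by O(N R^2 D): every region compared
    with every region of each frame, D dimensions per comparison."""
    return n_frames * r_regions ** 2 * d_dims


# Hypothetical sizes, for illustration only.
N, M, R, D = 10_000, 1_000, 1_000, 128

with_vq = cost_with_vq(N, M)
without_vq = cost_without_vq(N, R, D)
speedup = without_vq // with_vq

print(f"with VQ:    {with_vq:,} operations")
print(f"without VQ: {without_vq:,} operations")
print(f"ratio:      {speedup:,}x")
```

With these assumed sizes the ratio is R^2 D / M, so the saving grows quadratically with the number of regions per frame.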
The method in this paper allows retrieval of a particular visual aspect of an object. However, temporal information within a shot may be used to group visual aspects and enable object-level retrieval.
Some differences between document retrieval with bag-of-words and frame retrieval with bag-of-visual-words:
- The bag-of-words representation discards spatial information.
- Only a proportion of the visual words is expected to match between query and target, not all of them.
- Internet search exploits cues such as the link structure of the web and web-page
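Because the bag-of-visual-words representation discards spatial layout, the run-time stage re-ranks the top keyframes by spatial consistency. The sketch below is a simplified illustration of that idea, not the paper's exact procedure: a match gets a vote from every other match that falls among its k nearest spatial neighbours in both the query and the target frame. The function names, the point lists and the choice k = 2 are assumptions made for this note.

```python
import math


def k_nearest(idx, points, k):
    """Indices of the k points closest to points[idx], excluding itself."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return set(others[:k])


def spatial_votes(matches, query_pts, target_pts, k=2):
    """Total number of spatial-consistency votes for one target frame.

    matches: list of (query index, target index) pairs of regions that
    were assigned the same visual word.
    """
    votes = 0
    for qi, ti in matches:
        q_nb = k_nearest(qi, query_pts, k)
        t_nb = k_nearest(ti, target_pts, k)
        # Another match supports this one if it is a spatial neighbour
        # in the query frame AND in the target frame.
        votes += sum(
            1
            for qj, tj in matches
            if (qj, tj) != (qi, ti) and qj in q_nb and tj in t_nb
        )
    return votes


# Region centres in the query frame (hypothetical coordinates).
query_pts = [(0, 0), (1, 0), (0, 1), (10, 10)]
matches = [(0, 0), (1, 1), (2, 2)]

# Target A: the matched regions keep the same local layout.
target_a = [(5, 5), (6, 5), (5, 6), (20, 20)]

# Target B: the matched regions are far apart, each surrounded by
# unmatched regions, so no match supports another.
target_b = [(0, 0), (100, 0), (0, 100),
            (1, 0), (0, 1), (101, 0), (100, 1), (1, 100), (0, 101)]

votes_a = spatial_votes(matches, query_pts, target_a)
votes_b = spatial_votes(matches, query_pts, target_b)
print(votes_a, votes_b)
```

Both target frames contain the same three matching visual words, so a pure frequency score cannot separate them; the vote count can.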
***** Personal Notes ******
Visual Analogy:
An analogy means that things which are not alike still resemble each other in some respects.
A visual analogy can be explained in the same way.