Tuesday, March 1, 2011

[aMMAI] Paper Summary: Distinctive Image Features from Scale-Invariant Keypoints

Title: Distinctive Image Features from Scale-Invariant Keypoints
Author: David G. Lowe
Publication: IJCV, 2004


  
This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene.

The features are invariant to image scaling and rotation, and partially invariant to change in illumination and 3D camera viewpoint. They are well localized in both the spatial and frequency domains, reducing the probability of disruption by occlusion, clutter, or noise. In addition, the features are highly distinctive, which allows a single feature to be correctly matched with high probability against a large database of features, providing a basis for object and scene recognition.

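The Scale Invariant Feature Transform (SIFT) approach consists of four stages:
  1. Scale-space extrema detection: Use a difference-of-Gaussian (DoG) function to identify potential interest points that are invariant to scale and orientation.
  2. Keypoint localization: At each candidate location, fit a detailed model to determine location and scale, and keep keypoints based on measures of their stability.
  3. Orientation assignment: One or more orientations are assigned to each keypoint location based on local image gradient directions.
  4. Keypoint descriptor: Measure the local image gradients at the selected scale in the region around each keypoint, and summarize them as orientation histograms.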


The paper also presents methods for using the keypoints for object recognition: approximate nearest-neighbor lookup, a Hough transform for identifying clusters that agree on object pose, least-squares pose determination, and final verification. Other potential applications include view matching for 3D reconstruction, motion tracking and segmentation, robot localization, image panorama assembly, epipolar calibration, and any others that require identification of matching locations between images.



**** Notes written for myself ****

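The goal of SIFT is to address the problem that the Harris corner detector is not scale-invariant.
The four steps of SIFT:
1.1 Build the scale space using DoG
Figure 2 is clearer when read together with Figure 3 (because I could never make sense of Figure 2 on its own).
The top three pictures in Figure 3 correspond to the "Scale (first octave)" part at the lower left of Figure 2.
They show the image at different scales: the leftmost is the original image, and the middle and right ones are computed by applying a Gaussian filter with different sigma values.
When the sigma of the Gaussian filter has doubled relative to its starting value, we jump to the next octave and the sample rate is halved (512*512 -> 256*256), so the image gets smaller. (I still don't understand why the sampling can be halved when jumping to the next octave….)
Subtracting adjacent blurred images from each other gives the DoG values.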
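A minimal sketch of the DoG construction in step 1.1, in plain Python on a small grayscale image stored as a list of lists. The helper names (`gaussian_kernel`, `blur`, `dog_octave`, `downsample`), the starting sigma of 1.6, the three intervals per octave, and the border handling are assumptions made for illustration, not something taken from these notes. Each image is blurred directly from the input rather than incrementally, which is slower but easier to read.

```python
import math

def gaussian_kernel(sigma):
    """1D Gaussian kernel truncated at about 3 sigma and normalized to sum to 1."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(image, sigma):
    """Separable Gaussian blur; border pixels are replicated (an assumption)."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])

    def clamp(i, n):
        return min(max(i, 0), n - 1)

    # Horizontal pass over every row.
    rows = [[sum(k * row[clamp(x + d - radius, width)] for d, k in enumerate(kernel))
             for x in range(width)] for row in image]
    # Vertical pass over every column of the intermediate result.
    return [[sum(k * rows[clamp(y + d - radius, height)][x] for d, k in enumerate(kernel))
             for x in range(width)] for y in range(height)]

def dog_octave(image, sigma0=1.6, intervals=3):
    """Return (blurred images, DoG images) for one octave."""
    k = 2 ** (1.0 / intervals)  # sigma doubles after `intervals` steps
    blurred = [blur(image, sigma0 * k ** i) for i in range(intervals + 3)]
    # Each DoG image is the difference of two adjacent blurred images.
    dogs = [[[b[y][x] - a[y][x] for x in range(len(a[0]))] for y in range(len(a))]
            for a, b in zip(blurred, blurred[1:])]
    return blurred, dogs

def downsample(image):
    """Keep every second pixel per row and column to start the next octave."""
    return [row[::2] for row in image[::2]]
```

With these defaults, `blurred[3]` is the image whose sigma is twice the starting value, so `downsample(blurred[3])` would be the base image of the next octave (512*512 -> 256*256 in the example above).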

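1.2 Extrema

This part is fairly easy to understand: find the points that differ most from their neighbors (left, right, above, below, and the corresponding region in the image one scale up and one scale down), that is, points that are a local maximum or minimum among their 8 neighbors in the same image and the 9 neighbors in each adjacent scale, and take them as feature points.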
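A small sketch of the extrema search in step 1.2. It assumes `dogs` is a list of same-sized DoG images (each a list of lists of floats), and the function names `is_extremum` and `find_extrema` are made up for this note. A sample counts as an extremum only if it is strictly greater or strictly smaller than all 26 neighbors: 8 in its own image and 9 in each of the two adjacent scales.

```python
def is_extremum(dogs, s, y, x):
    """True if dogs[s][y][x] is strictly above or below all 26 neighbors."""
    value = dogs[s][y][x]
    neighbors = [dogs[s + ds][y + dy][x + dx]
                 for ds in (-1, 0, 1)
                 for dy in (-1, 0, 1)
                 for dx in (-1, 0, 1)
                 if (ds, dy, dx) != (0, 0, 0)]
    return value > max(neighbors) or value < min(neighbors)

def find_extrema(dogs):
    """Return (scale index, row, column) for every interior extremum."""
    height, width = len(dogs[0]), len(dogs[0][0])
    return [(s, y, x)
            for s in range(1, len(dogs) - 1)   # skip first and last scale
            for y in range(1, height - 1)      # skip image border
            for x in range(1, width - 1)
            if is_extremum(dogs, s, y, x)]
```

Only the interior of the stack is scanned, because the first and last DoG images have no neighbor on one side.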

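2. From the feature points extracted above, remove the ones that are not stable.
Unstable points: points with low contrast, and points that are probably on an edge.
This uses a Taylor expansion. The formulas are hard to paste here, so go back to the paper for the details....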
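For reference, here is my transcription of the formulas behind step 2, written as LaTeX. The notation follows the paper as far as I remember it, so check it against the original before relying on it: D is the DoG function, x = (x, y, sigma)^T is the offset from the sample point, H is the 2x2 spatial Hessian of D at the keypoint, and r is the allowed ratio between the two principal curvatures.

```latex
% Taylor expansion of D around the sample point (derivatives evaluated there)
D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^{T} \mathbf{x}
  + \frac{1}{2}\, \mathbf{x}^{T} \frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\, \mathbf{x}

% Sub-pixel location of the extremum (set the derivative of D(x) to zero)
\hat{\mathbf{x}} = - \left( \frac{\partial^{2} D}{\partial \mathbf{x}^{2}} \right)^{-1}
  \frac{\partial D}{\partial \mathbf{x}}

% Value at the extremum, used to reject low-contrast points
D(\hat{\mathbf{x}}) = D + \frac{1}{2}\, \frac{\partial D}{\partial \mathbf{x}}^{T} \hat{\mathbf{x}}

% Edge test: keep the point only if the principal curvatures are not too unequal
\frac{\mathrm{Tr}(\mathbf{H})^{2}}{\mathrm{Det}(\mathbf{H})} < \frac{(r + 1)^{2}}{r}
```

As far as I recall, the paper discards points with |D(x̂)| below 0.03 (for pixel values in [0, 1]) and uses r = 10 for the edge test.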

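3. Find the dominant orientation
For every feature point, compute the gradient magnitude and orientation of the pixels around it.
These are accumulated into an orientation histogram, and the highest peak gives the dominant orientation.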
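A rough sketch of step 3 in plain Python. The gradient is taken from pixel differences, and each sample votes into a histogram with its gradient magnitude. The 36 bins, the neighborhood radius, and the 0.8 peak ratio are assumptions for illustration. This simplified version skips the Gaussian weighting of the votes and returns every bin close to the maximum rather than only local peaks. The function names are made up for this note.

```python
import math

def gradient(image, y, x):
    """Gradient magnitude and orientation in degrees [0, 360) at an interior pixel."""
    dx = image[y][x + 1] - image[y][x - 1]
    dy = image[y + 1][x] - image[y - 1][x]
    return math.hypot(dx, dy), math.degrees(math.atan2(dy, dx)) % 360.0

def orientation_histogram(image, y, x, radius=3, bins=36):
    """Magnitude-weighted histogram of gradient orientations around (y, x)."""
    hist = [0.0] * bins
    for j in range(y - radius, y + radius + 1):
        for i in range(x - radius, x + radius + 1):
            # Skip samples whose gradient would need pixels outside the image.
            if 1 <= j < len(image) - 1 and 1 <= i < len(image[0]) - 1:
                magnitude, angle = gradient(image, j, i)
                hist[int(angle * bins / 360.0) % bins] += magnitude
    return hist

def dominant_orientations(hist, peak_ratio=0.8):
    """Bin-center angles (degrees) of all bins within peak_ratio of the top bin."""
    top = max(hist)
    if top == 0:
        return []
    width = 360.0 / len(hist)
    return [(b + 0.5) * width for b, v in enumerate(hist) if v >= peak_ratio * top]
```

Returning more than one angle matches the idea from the summary above that one or more orientations can be assigned to a keypoint location.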

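4. Compute the descriptor

Around the keypoint, an 8*8 window is split into 2*2 sub-windows. For each sub-window, the gradient magnitudes and orientations of its 4*4 samples are accumulated into an orientation histogram with 8 bins.
So each keypoint ends up with 4*8 = 32 dimensions. (This is the small example drawn in the paper's figure; the paper's experiments use 4*4 sub-windows over a 16*16 window, which gives 4*4*8 = 128 dimensions.)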
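A minimal sketch of the descriptor layout from step 4: an 8*8 window split into 2*2 sub-windows with 8 orientation bins each, giving 32 values. The function name `descriptor` and its parameters are hypothetical. This version leaves out several things the real method does (rotating the window to the dominant orientation, Gaussian weighting, interpolation between bins) and only normalizes the vector to unit length at the end.

```python
import math

def descriptor(image, y, x, window=8, cells=2, bins=8):
    """Concatenate per-cell orientation histograms around (y, x), unit-normalized.

    The caller must keep (y, x) at least window // 2 + 1 pixels from the border.
    """
    cell = window // cells   # samples per sub-window side (4 for the 8*8 case)
    half = window // 2
    hists = [[0.0] * bins for _ in range(cells * cells)]
    for j in range(window):
        for i in range(window):
            py, px = y - half + j, x - half + i
            dx = image[py][px + 1] - image[py][px - 1]
            dy = image[py + 1][px] - image[py - 1][px]
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            b = int(angle * bins / 360.0) % bins
            # Vote with the gradient magnitude into this sample's sub-window.
            hists[(j // cell) * cells + (i // cell)][b] += math.hypot(dx, dy)
    vector = [v for h in hists for v in h]
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else vector
```

Calling `descriptor(image, y, x)` returns 2*2*8 = 32 values; `descriptor(image, y, x, window=16, cells=4)` returns 4*4*8 = 128 values.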


References:
Mentions DoG; Figure 3 was taken from here.
Feature Matching: CYY's lecture slides, which explain this in great detail.
