MediaLab Love Chapter 2

Assistance of MediaLab Love about Javascript test and more...

Natural Feature Tracking

Augmented Reality: Camera Motion Tracking from Natural Features
Like yesterday's post (id:Koumei_S:20060301), today's item comes from Mixed Reality Lab | The Mixed Reality Lab (MXR) aims to push the boundaries of research into interactive new media technologies through the combination of technology, art, and creativity.
It recognizes a specific part of a video and then tracks that part. For example, if you attach some text to a specific part, the text stays stuck to that part even when the viewpoint moves.

For three-dimensional (3-D) Augmented Reality (AR) applications, accurate measurements of the 6 d.o.f. camera pose (i.e. position and orientation) relative to the real world are required for the proper registration of virtual objects. Currently, we are developing a robust framework and algorithm for measuring camera pose accurately by tracking natural point features in the scene alone. There are two challenges in camera pose tracking from natural features. First, we must establish which features (i.e. corner points) correspond to which between different frames from the same sequence. Second, we must estimate the change in camera pose between frames based on the change in positions of these features. Our approach is based on always calculating camera motion relative to two or more pre-captured reference image frames of the scene. This has the advantage of preventing a gradual increase in the camera position error. Camera pose relative to the reference frames is computed through the minimization of a simple cost function based on two-view epipolar and three-view constraints on feature position. Time-series information is used to provide the starting point for this minimization and to regularize the error surface when the incoming data is impoverished (see Figure 1).

http://155.69.54.110/RESEARCH/NFT/IMAGE/NFT-intro1.gif

Figure 1. The problem is to estimate the transformation matrix, Tk, between the camera and the scene for the current frame, Vk. The system matches the current frame to two stored reference frames, VA and VB, to determine the camera position. This estimation problem is regularized using data from previous frames in the time-series.
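To get a feel for the two-view epipolar part of that cost function, here is a minimal Python sketch. It is not the lab's implementation: the function names, the use of normalized image coordinates (identity camera intrinsics), and the toy scene are all assumptions for illustration, and the three-view constraints and time-series regularization are left out entirely.

```python
def skew(t):
    """Cross-product matrix [t]_x, so that [t]_x v equals t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty],
            [tz, 0.0, -tx],
            [-ty, tx, 0.0]]


def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def essential_from_pose(R, t):
    """Essential matrix E = [t]_x R for a second camera with X2 = R X1 + t."""
    return matmul3(skew(t), R)


def project(X):
    """Project a 3-D point to normalized homogeneous image coordinates."""
    x, y, z = X
    return (x / z, y / z, 1.0)


def epipolar_residual(E, x1, x2):
    """Algebraic epipolar error x2^T E x1 (zero for a perfect match)."""
    Ex1 = [sum(E[i][j] * x1[j] for j in range(3)) for i in range(3)]
    return sum(x2[i] * Ex1[i] for i in range(3))


def epipolar_cost(E, matches):
    """Sum of squared epipolar residuals over all (x1, x2) matches."""
    return sum(epipolar_residual(E, x1, x2) ** 2 for x1, x2 in matches)


# Toy scene (hypothetical): the camera moves one unit along x, no rotation.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
TRUE_T = (1.0, 0.0, 0.0)
POINTS = [(0.5, 1.0, 4.0), (-1.0, 0.5, 5.0), (2.0, -1.5, 6.0), (0.0, 2.0, 8.0)]
MATCHES = [
    (project(P), project((P[0] + TRUE_T[0], P[1] + TRUE_T[1], P[2] + TRUE_T[2])))
    for P in POINTS
]

cost_true = epipolar_cost(essential_from_pose(IDENTITY, TRUE_T), MATCHES)
cost_wrong = epipolar_cost(essential_from_pose(IDENTITY, (0.0, 1.0, 0.0)), MATCHES)
print("cost at true pose: ", cost_true)
print("cost at wrong pose:", cost_wrong)
```

The cost is lowest at the pose that actually produced the matches, which is why minimizing it over candidate poses can recover the camera motion. A real system would also have to deal with noise and false matches.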

Unfortunately, although matching to fixed reference images for each frame removes the problem of a gradual drift in the position of the virtual object, it somewhat aggravates the correspondence problem. Instead of matching to the previous frame in the image sequence, we now need to match to these reference images, which may be far from the current position. This is commonly known as the "wide-baseline" problem. We solve this problem by chaining together two mappings: (1) a global homography between the current frame and previous frame, followed by (2) local affine transforms between the previous frame and the reference frame (the previous frame is assumed to be matched correctly to each of the reference frames). For each corner point in the current image, we only consider corners in the reference image that are in the neighborhood of the predicted position (from the two mappings) as potential matches (see Figure 2). Using this procedure, we can considerably increase the initial proportion of good matches. This decreases the number of trials required in the RANSAC procedure and reduces the possibility of any remaining false matches.
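The link between the proportion of good matches and the number of RANSAC trials can be made concrete with the standard trial-count estimate N = log(1 - p) / log(1 - w^s), where w is the inlier ratio, s is the number of matches drawn per trial, and p is the desired probability of drawing at least one all-inlier sample. The sketch below is illustrative only: the sample size of 8 and the 99% confidence are assumptions, not values taken from the lab's description.

```python
import math


def ransac_trials(inlier_ratio, sample_size, confidence=0.99):
    """Trials needed so that, with probability `confidence`, at least one
    random sample of `sample_size` matches contains only inliers."""
    if not 0.0 < inlier_ratio <= 1.0:
        raise ValueError("inlier_ratio must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    p_all_inliers = inlier_ratio ** sample_size
    if p_all_inliers >= 1.0:
        return 1  # every sample is clean, one trial is enough
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_all_inliers))


# Assumed sample size of 8 matches per trial (hypothetical choice).
for ratio in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(f"inlier ratio {ratio:.1f}: {ransac_trials(ratio, 8)} trials")
```

Because the inlier ratio is raised to the power of the sample size, even a modest improvement in the initial proportion of good matches cuts the number of trials sharply.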

http://155.69.54.110/RESEARCH/NFT/IMAGE/NFT-intro2.gif

Figure 2. Given a feature in the current image Vk, we aim to find the corresponding point in the reference image VB. To help constrain the search, we construct an indirect mapping from Vk to Vk-1 and then from Vk-1 to VB. We can approximate the former mapping by a homography, since the camera translation is likely to be very small between the frames. We already have a list of point correspondences between the previous frame and the reference. We interpolate between these known correspondences to estimate the new position. We now consider corner matches only in the region of this estimate (white square).
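The chained mapping in Figure 2 can be sketched roughly as follows. This is a simplified stand-in, not the lab's method: in place of local affine transforms it interpolates the displacement of the known previous-to-reference matches with inverse-distance weighting, and the function names, the window size, and the sample numbers are all hypothetical.

```python
def apply_homography(H, p):
    """Map a 2-D point through a 3x3 homography (nested lists)."""
    x, y = p
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)


def predict_in_reference(p_cur, H_cur_to_prev, prev_to_ref_matches):
    """Predict where a current-frame point lands in the reference frame.

    Step 1: current frame -> previous frame via the global homography.
    Step 2: previous frame -> reference frame by interpolating the
            displacement of known matches (inverse-distance weighting,
            used here as a simple substitute for local affine transforms).
    """
    px, py = apply_homography(H_cur_to_prev, p_cur)
    if not prev_to_ref_matches:
        return (px, py)  # nothing to interpolate from
    wsum = dx = dy = 0.0
    for (mx, my), (rx, ry) in prev_to_ref_matches:
        d2 = (mx - px) ** 2 + (my - py) ** 2
        if d2 < 1e-12:
            return (rx, ry)  # the point coincides with a known match
        w = 1.0 / d2
        wsum += w
        dx += w * (rx - mx)
        dy += w * (ry - my)
    return (px + dx / wsum, py + dy / wsum)


def candidates_in_window(predicted, ref_corners, half_size):
    """Keep only reference corners inside a square around the prediction."""
    cx, cy = predicted
    return [c for c in ref_corners
            if abs(c[0] - cx) <= half_size and abs(c[1] - cy) <= half_size]


# Hypothetical data: the homography is a 2-pixel shift, and every known
# previous-to-reference match is displaced by (10, 5).
H = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
known = [((0.0, 0.0), (10.0, 5.0)),
         ((10.0, 0.0), (20.0, 5.0)),
         ((0.0, 10.0), (10.0, 15.0))]
corners = [(16.0, 10.0), (40.0, 40.0), (14.0, 8.0)]

predicted = predict_in_reference((3.0, 4.0), H, known)
print(predicted)
print(candidates_in_window(predicted, corners, half_size=3.0))
```

Only the corners that fall inside the window are passed on as potential matches, which is what raises the proportion of good matches before RANSAC runs.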

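The "wide-baseline" problem, local affine transforms, RANSAC... it is full of things I don't know, so I can't say I really understand it.
My own rough reading is this: you prepare reference frames (parts of the video) to recognize against, plus the camera's own past frames, and estimate the current camera position from them (presumably with something called a homography transform). Then the text is moved in the current image to follow the camera's motion. Something like that, I suppose.
When I googled "homography", I also found this.
Video image processing: computer vision with proce55ing (note: it loads a Java object)
If you drag on the photo, it tracks the corresponding part of the floor.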