vision-based endoscope tracking for 3d ultrasound image...

Vision-based endoscope tracking for 3D ultrasound image-guided surgical navigation

[Yang et al. 2014, Comp Med Imaging and Graphics]

Gustavo Sato dos Santos

IGI Journal Club 23.10.2014

Motivation

Goal: Surgical navigation for minimally-invasive fetal surgery

Disadvantages of other tracking methods •  Optical: line-of-sight between tracker and markers •  Electromagnetic (EM): prone to noise - electronic devices in OR •  EM tracker + inertia measurement unit (IMU): issues with tracking

initialisation, drift errors, accuracy •  Vision-based (Structure-from-Motion): not suitable due to

unpredictable amniotic fluid, need to minimise illumination

Approach

•  Initial camera position by ultrasound image-based localisation; •  Vision-based tracking

•  Ultrasound: Hitachi ProSound α10 w/ 3D tilt-scanning convex sector transducer

•  mounted on rigid bracket (to minimise motion artefacts)

•  Endoscope: Shinko Optical, 5.4mm diam. rigid endoscope, Xenon light source

•  Translation stage: Sigma Koki, 4 µm/pulse resolution, 1 µm precision

Workflow

U/S image-based initialisation

•  Scene geometry acquired by 3D ultrasound imaging •  Manual selection of placenta ROI; •  Thresholding with isovalue → meshed surface model (50,000 vertices)

•  Camera position acquired by localising fiducial (8 cm length, 0.3 cm diam.) •  Prior fiducial-camera calibration (f → c transformation)

•  Localisation error ≈ 1.32 mm •  Low acquisition rate, multiple sampling required for robustness

Underwater camera calibration

Optical properties of medium → intrinsic parameters of camera

•  Camera pre-calibrated in saline solution used for experiments

•  Camera calibration toolbox for Matlab (Bouguet JY, 2004)

•  Images corrected for radial and tangential lens distortions

•  Brown-Conrady model (Brown DC, Photon Eng 1971)

Inter-frame feature matching

Speed-Up-Robust-Features (SURF) algorithm [Bay et al., Comput Vis Image Und 2008]

•  Scale and rotation invariant features

•  FAST-Hessian feature detection, 64-element descriptor representing

distribution of Haar-wavelet responses of feature neighbourhood

•  Robust even in scenes with poor texture (important for tissue imaging)

•  Outlier removal: RANSAC algorithm

•  Result: 10-30 reliable feature matches (20 required for subsequent

processing)

Inter-frame feature matching

Texture conditions: (a) Desirable (b) Moderate (c) Poor

Phantom Ex-vivo monkey placenta

2D-3D point correspondence

Mapping image coordinates (ip,jp) to 3D coordinates (xp,yp,zp) •  Project 3D vertices of ultrasound image model to the camera plane to obtain

their image coordinates: •  Delaunay triangulation of points (i,j,k-1z) → dense depth map Z(i,j) •  3D camera-centric coordinates of interest points:

K: intrinsic camera parameter matrix; k-1Ru,k-1tu: rotation matrix and translation vector from camera’s viewpoint at frame k-1

(i0,j0) and (fx,fy): principal point and focal length from K

2D-3D point correspondence

3D interest points are updated every frame, according to matching features across two adjacent images

Pose estimation

Pose estimation as Perspective-n-Point (PnP) problem

•  Better accuracy and stability than Direct Linear Transformation (DLT) •  EPnP algorithm (Lepetit V et al., Int J Comput Vision 2009):

•  non-iterative •  solves coordinates of M=4 virtual control points q={q1,…,qM}

•  Control points q consist of the centroid of interest points p and another 3 points that align closely to the principal direction of p

•  Computational time: O(n) •  Performs well even with noisy non-fixed interest points

λlm: homogeneous barycentric coordinates summing to one

Pose estimation

EPnP implementation. Pink surface = placental scene geometry; textured patch = camera views projected onto constructed surface model.

Overview of workflow

14

Results: phantom study, controlled trajectory

Each processing frame was at an interval of 10 acquisition frames (approx. 7s). Total displacement = 15-25 mm.

Results: phantom study, freehand trajectory

In the trajectory of approx. 30 mm, the mean absolute error was 2.69 mm in the 30 processed frames (300 acquired frames in 10 s).

Results: ex vivo study, static estimation

•  Analysis of 100 estimations (5 positions x 20 frames) •  Validation against optical tracking, which has ~0.17 mm error •  Errors larger than phantom validation of static estimation

Results: effect of relocalisation (phantom)

•  Ultrasound image-based delocalisation at 200th frame of 400-frame video •  Rectification of cumulative errors in vision-based tracking •  Final positional error reduced from 11.35 mm to 4.61 mm over total

displacement of 45 mm

Results: computation time

•  On a workstation with Intel Core i7-2600 3.4 GHz processor

Contributions / Future work

•  Approach essentially vision-based, augmented with scene geometry information from ultrasound

•  Relocalisation corrects cumulative errors or tracking failures

•  Need to check performance under conditions closer to clinical setting (various kinematics, scene geometries, and illumination)

•  Limitations in quality of endoscopic images can be addressed by:

•  fluorescence endoscope;

•  ultra-high sensitive endoscopic camera;

•  hyperspectral imaging of placental vasculature

•  Ultrasound image artefacts lowered accuracy in the ex vivo study

21

vision-based endoscope tracking for 3d ultrasound image...

Documents