high-quality streamable free-viewpoint video: …hhoppe.com/fvv_supplemental.pdf · high-quality...

4
High-Quality Streamable Free-Viewpoint Video: Supplemental Material Alvaro Collet Ming Chuang Pat Sweeney Don Gillett Dennis Evseev David Calabrese Hugues Hoppe Adam Kirk Steve Sullivan Microsoft Corporation Abstract In this supplemental material, we describe our default, tower-based camera configuration for the capture stage. We analyze the local configuration of a single tower and visualize the combined camera frustum. We also describe our camera calibration process and show larger prints for the experiments on different camera configurations. 1 Stage Configuration The current capture stage is a greenscreen volume encircled by 106 synchronized high-speed video cameras, as shown in Fig. 2. These cameras are mounted on 8 wheeled towers (with 12 cameras each), plus 10 cameras mounted in an array overhead. The default con- figuration is 53 Sentech STC-CMC4MCL RGB cameras and 53 Sen- tech STC-CMB4MCL IR cameras covering a reconstruction volume of 2.8 m in diameter and 2.5 m in height. We record 2048x2048 im- ages at 30 Hz, though we can go up to 60 Hz for high-speed action. We add unstructured static IR laser light sources on the towers to provide strong IR texture cues. We reached the current stage configuration after multiple iterations. Our initial implementations contained 16 cameras, then 32, and we developed most of our system using 40 cameras. After establish- ing targets for quality, acquisition volume, and types of content, we built our current configuration: 106 cameras on movable tow- ers plus an overhead rig, mapped dynamically (at processing time) into pods (logical groups of 2 IR cameras + 2 RGB cameras). We can easily change camera count and reconfigure to different vol- ume sizes/shapes, and the quality improves/degrades gracefully as the configuration changes. Given our unconstrained capture con- tent, we found that a denser sampling of viewpoints was particularly critical to resolve severe occlusions (e.g., multiple actors/props in- teracting closely) and recover fine features with silhouettes comple- menting stereo reconstruction. 1.1 Tower configuration Fig. 1(left) shows an example of one of our wheeled towers. Each tower contains 12 cameras (6 RGB and 6 IR) mapped into 3 pods of 2 RGB and 2 IR cameras each. Stereo depth computations are performed only within a pod. The baseline for each stereo pair is approximately 60 cm, which corresponds to angles of triangulation between 7 and 13 degrees within the capture volume. Our configuration for RGB cameras is slightly different than for IR cameras. We opted for a horizontal arrangement of RGB cameras within each tower to provide a more even sampling of the view space, as the RGB cameras are used for texturing. The IR pairs are positioned vertically to maximize the coverage of standing subjects for reconstruction (see Fig. 3). 2 Calibration We perform a checkerboard-based calibration of the camera setup (intrinsics and extrinsics) before every set of takes. To speed up POD 1 POD 2 POD 3 IR RGB Figure 1: Configuration of a single tower, and its mappings into camera pods and stereo pairs. this typically tedious process, we built a wheeled octagonal tower which we call the Octolith, as shown in Fig. 4. With the Octolith, we only need to capture a few frames (typically five) at different po- sitions within the capture volume to robustly compute the intrinsic and extrinsic calibration parameters. 3 Experiments on camera configuration Fig. 5 shows larger prints of the evaluation of geometry and texture in different camera configurations (main paper, Fig. 13).

Upload: tranbao

Post on 25-Jul-2018

223 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: High-Quality Streamable Free-Viewpoint Video: …hhoppe.com/fvv_supplemental.pdf · High-Quality Streamable Free-Viewpoint Video: ... Alvaro Collet Ming Chuang Pat Sweeney Don Gillett

High-Quality Streamable Free-Viewpoint Video:Supplemental Material

Alvaro Collet Ming Chuang Pat Sweeney Don Gillett Dennis Evseev David CalabreseHugues Hoppe Adam Kirk Steve Sullivan

Microsoft Corporation

Abstract

In this supplemental material, we describe our default, tower-basedcamera configuration for the capture stage. We analyze the localconfiguration of a single tower and visualize the combined camerafrustum. We also describe our camera calibration process and showlarger prints for the experiments on different camera configurations.

1 Stage Configuration

The current capture stage is a greenscreen volume encircled by 106synchronized high-speed video cameras, as shown in Fig. 2. Thesecameras are mounted on 8 wheeled towers (with 12 cameras each),plus 10 cameras mounted in an array overhead. The default con-figuration is 53 Sentech STC-CMC4MCL RGB cameras and 53 Sen-tech STC-CMB4MCL IR cameras covering a reconstruction volume of2.8 m in diameter and 2.5 m in height. We record 2048x2048 im-ages at 30 Hz, though we can go up to 60 Hz for high-speed action.We add unstructured static IR laser light sources on the towers toprovide strong IR texture cues.

We reached the current stage configuration after multiple iterations.Our initial implementations contained 16 cameras, then 32, and wedeveloped most of our system using 40 cameras. After establish-ing targets for quality, acquisition volume, and types of content,we built our current configuration: 106 cameras on movable tow-ers plus an overhead rig, mapped dynamically (at processing time)into pods (logical groups of 2 IR cameras + 2 RGB cameras). Wecan easily change camera count and reconfigure to different vol-ume sizes/shapes, and the quality improves/degrades gracefully asthe configuration changes. Given our unconstrained capture con-tent, we found that a denser sampling of viewpoints was particularlycritical to resolve severe occlusions (e.g., multiple actors/props in-teracting closely) and recover fine features with silhouettes comple-menting stereo reconstruction.

1.1 Tower configuration

Fig. 1(left) shows an example of one of our wheeled towers. Eachtower contains 12 cameras (6 RGB and 6 IR) mapped into 3 podsof 2 RGB and 2 IR cameras each. Stereo depth computations areperformed only within a pod. The baseline for each stereo pair isapproximately 60 cm, which corresponds to angles of triangulationbetween 7 and 13 degrees within the capture volume.

Our configuration for RGB cameras is slightly different than for IRcameras. We opted for a horizontal arrangement of RGB cameraswithin each tower to provide a more even sampling of the viewspace, as the RGB cameras are used for texturing. The IR pairs arepositioned vertically to maximize the coverage of standing subjectsfor reconstruction (see Fig. 3).

2 Calibration

We perform a checkerboard-based calibration of the camera setup(intrinsics and extrinsics) before every set of takes. To speed up

POD 1

POD 2

POD 3

IR

RGB

Figure 1: Configuration of a single tower, and its mappings intocamera pods and stereo pairs.

this typically tedious process, we built a wheeled octagonal towerwhich we call the Octolith, as shown in Fig. 4. With the Octolith,we only need to capture a few frames (typically five) at different po-sitions within the capture volume to robustly compute the intrinsicand extrinsic calibration parameters.

3 Experiments on camera configuration

Fig. 5 shows larger prints of the evaluation of geometry and texturein different camera configurations (main paper, Fig. 13).

Page 2: High-Quality Streamable Free-Viewpoint Video: …hhoppe.com/fvv_supplemental.pdf · High-Quality Streamable Free-Viewpoint Video: ... Alvaro Collet Ming Chuang Pat Sweeney Don Gillett

2.8 m diameter

3.75 m

2.75 m

0.3 m

Figure 2: (top) View of our full stage in its default 106-camera configuration. (bottom) Base measurements of the stage in its defaultconfiguration. The reconstructed area is a cylinder of 2.8 m diameter with 2.5 m height. We also show a visualization of the IR cameras of asingle tower to provide a sense of scale.

Page 3: High-Quality Streamable Free-Viewpoint Video: …hhoppe.com/fvv_supplemental.pdf · High-Quality Streamable Free-Viewpoint Video: ... Alvaro Collet Ming Chuang Pat Sweeney Don Gillett

Figure 3: Combined frustum of the IR cameras of a single tower (analogous for the RGB cameras).

Figure 4: Calibration object (Octolith).

Page 4: High-Quality Streamable Free-Viewpoint Video: …hhoppe.com/fvv_supplemental.pdf · High-Quality Streamable Free-Viewpoint Video: ... Alvaro Collet Ming Chuang Pat Sweeney Don Gillett

LG/LT

HG/LT

LG/HT

HG/HT

Figure 5: For Kendo dataset, comparison of geometry and texture quality for different camera configurations (Low and High number ofcameras for both Geometry and Texture reconstruction).