

Seam Elimination in Free-Viewpoint Video

Master of Science Thesis in Computer Science - Algorithms, Languages, and Logic

KARL BRISTAV

Department of Computer Science & Engineering

Chalmers University of Technology

University of Gothenburg

Gothenburg, Sweden 2016


Seam Elimination in Free-Viewpoint Video
KARL BRISTAV

© KARL BRISTAV, 2016.

Chalmers University of Technology
Department of Computer Science and Engineering
SE-412 96 Gothenburg
Sweden
Telephone +46 (0)31-772 1000

Cover: Individual camera contributions to scene geometry (top) and their respective weight maps (bottom) generated using a method described in this paper. The scene geometry and images are taken from the Sintel movie project, © copyright Blender Foundation — www.sintel.org

Department of Computer Science and Engineering
Gothenburg, Sweden 2016


Abstract

Free-viewpoint video (FVV) is a term for any system that lets users freely adjust the viewpoint in a video to novel positions, synthesized from source data in the video stream. Recent advances in the area by Kampe et al. suggest encoding the geometric information separately as dynamic 3D voxel data, together with RGB video streams from a set of standard RGB cameras capturing the scene from individual viewpoints. During playback, the 3D voxel data is rendered and the color information from the RGB cameras is back-projected onto this geometry to produce the final result for each frame. In the setting of a voxelized geometry and a number of RGB camera streams, graphical artifacts may occur if the images of the cameras are projected onto the geometry with equal weights. We refer to this as the naive method. To alleviate these issues, three methods are proposed, all building on calculating weights for use in a weighted averaging when computing the final color of a piece of geometry. The most promising of the proposed methods is implemented and tested against an implementation of the naive method. The results show that our method reduces the number of graphical artifacts while keeping the processing time low.

Keywords: FVV, Free-Viewpoint Video, Computer Graphics, Voxels


Acknowledgements

First of all, I would like to thank Ulf Assarsson for suggesting this topic for my master's thesis.

I would like to extend a huge thanks to my supervisor, Erik Sintorn, for his great ideas, optimism, and support (above and beyond the call of duty).

I also thank Mika Segerstrom for always being there for me whenever progress was slow or I needed a break.

Finally, I want to thank Jakob Jarmar and Orvar Segerstrom for reading and commenting on my draft when I had stared myself blind.

This thesis uses material from the Sintel project, © copyright Blender Foundation — www.sintel.org

Karl Bristav, Gothenburg, January 2016


Contents

1 Introduction

2 Previous Work

3 Method
  3.1 Choosing colors for view-dependent materials
  3.2 Eliminating occlusion seams and other artifacts
    3.2.1 Identifying areas of visibility in voxel space
    3.2.2 Dynamically identifying areas of visibility in view-camera space
    3.2.3 Identifying areas of visibility in scene camera space

4 Results
  4.1 Test results
    4.1.1 View 1: Wall
    4.1.2 View 2: Woman
    4.1.3 View 3: Roof
    4.1.4 View 4: Bucket
    4.1.5 Max distance size comparison
  4.2 Performance

5 Discussion

6 Conclusion

7 Future Work

Bibliography


Chapter 1

Introduction

Free-Viewpoint Video (FVV) is a system for viewing video in which the viewer can navigate freely inside the 3D environment of the video and look at the filmed settings and actors from an arbitrary view position and angle [1]. This could involve filming the scene simultaneously with a number of traditional color cameras as well as cameras capturing the distance to the objects in the scene, providing 3D information from which the scene can be reconstructed.

One way of coloring the geometry of the reconstructed scene is to project the colors from the cameras onto it. This means that for every point on the geometry being colored, that point is projected onto the image plane of each visible camera in order to find the color that camera has for the projected point. When all visible cameras have been sampled, the final color of the given point is set to be the average of all the sampled colors. We refer to this as the naive method.
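As a rough illustration of this projection-and-averaging step, the sketch below projects a world-space point into the pixel grid of a simple pinhole camera and averages the sampled colors. The function and parameter names (`project_to_pixel`, `naive_color`, `focal_px`, and the dictionary-based camera description) are hypothetical and not taken from the thesis implementation, and occlusion is ignored here; it is handled in Chapter 3.

```python
import numpy as np

def project_to_pixel(point, cam_pos, cam_rot, focal_px, image_size):
    """Project a world-space point into the pixel grid of a pinhole camera.

    cam_rot is a 3x3 world-to-camera rotation matrix and focal_px is the focal
    length expressed in pixels. Returns None if the point is behind the camera
    or projects outside the image.
    """
    p_cam = cam_rot @ (point - cam_pos)           # transform into camera space
    if p_cam[2] <= 0.0:                           # behind the image plane
        return None
    u = focal_px * p_cam[0] / p_cam[2] + image_size[0] / 2.0
    v = focal_px * p_cam[1] / p_cam[2] + image_size[1] / 2.0
    if 0 <= u < image_size[0] and 0 <= v < image_size[1]:
        return int(u), int(v)
    return None

def naive_color(point, cameras, images):
    """Average the colors from all cameras whose image contains the projection."""
    samples = []
    for cam, img in zip(cameras, images):
        px = project_to_pixel(point, cam["pos"], cam["rot"],
                              cam["focal_px"], cam["image_size"])
        if px is not None:
            samples.append(img[px[1], px[0]])     # row = v, column = u
    return np.mean(samples, axis=0) if samples else None
```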

Some cameras may have different colors for the same point on the geometry, for example if their respective images have different effective resolution for that piece of geometry, or if the material of the geometry is view-dependent. A view-dependent material is a material that does not look the same from different angles. If two or more images are then projected onto a surface, visible seams may occur along the borders of camera visibility, since the cameras might not have recorded the same color for the same spot on the geometry.

Another type of visual artifact comes from the fact that both the voxelized geometry and the camera images are approximations of the original scene. Voxel geometry is likely to protrude outside the boundaries of the original geometry, causing a discrepancy between the two. When determining the color of a point on such a voxel protrusion, the projection of that point onto a camera might in some circumstances give the color of a point behind the given point, as seen from the camera. Even if the geometry were perfect and no protrusions existed, every pixel of the camera image represents a small frustum rather than a single point. A pixel on the edge of an object will contain color information from both that object and its background, so a projection of a point at the edge of an object may still acquire color from the background.

The problem to solve is then how to project colors onto the surface in such a way that these artifacts are not visible.

This thesis considers the problem in the context of a voxelized scene. The scene consists of 100 frames of voxel geometry and four cameras, each of which has an associated file specifying position, rotation, field of view, and aspect ratio, as well as a video stream containing 100 frames of video corresponding to the geometry frames. The voxel geometry is reconstructed from virtual depth cameras in Blender [2].

To limit the scope of the task, some aspects of the problem will not be considered in this thesis. Temporal consistency, i.e. how the result behaves when applied to consecutive frames and then played back, is an important characteristic of a possible solution, but too large a topic to also consider here. The speed of the solution is also not fully considered, as this thesis focuses on finding a solution rather than optimizing it. A precomputation approach will be rejected only if it is obviously extremely slow.


Chapter 2

Previous Work

An early architecture for FVV is the Multiple-Perspective Interactive Video presented by Kelly et al. [3], which lays a foundation for constructing virtual views based on a number of fixed-position camera streams and simplified geometry representations. Another early system is Virtualized Reality, presented by Rander et al. [1], in which a global mesh geometry is created from local meshes given by each camera and then textured. A system is considered specialized if it is not designed with general situations in mind, but can only be used in certain specific situations.

Examples of specialized systems include that of Horiuchi et al. [4], who present a system for rendering musicians from different viewpoints on different backgrounds. Koyama et al. [5] present a system for live FVV of a soccer stadium. Carranza et al. [6] present a model that utilizes a human body model and thus is unsuitable for subjects other than human beings. Hauswiesner et al. [7] also make use of a human body model when they present a system that allows users to try on articles of clothing in a virtual environment. All of these make assumptions regarding the subjects and their properties.

A common property of many of the specialized systems mentioned earlier is that they use methods that only utilize the images provided by the viewpoints and information regarding the position and orientation of those viewpoints; the geometry of the subject is not a part of the model. These systems are classified as image-based systems, whereas systems that utilize a computed geometry of the subject are called model-based systems [8].

General systems have the property that they are designed to work on any subject, and thus make very few assumptions regarding what kind of scene is portrayed. Examples of general systems include the unstructured lumigraph by Buehler et al. [9], which depends on a relatively high number of cameras. The systems presented by Collet et al. [10] and Salvador et al. [11] both blend colors from cameras based on surface normals and the intrinsic interpolation of surface normals in a mesh geometry; the system of Collet et al. [10] additionally relies on a strictly controlled environment and a high number of cameras to achieve a good result. Ishikawa et al. [12] propose an image-based general system where the user can walk through the scene, something that is normally not possible using image-based methods. The system, however, is limited to views in a specific plane and also requires a large number of specifically positioned cameras. Palomo and Gattass [13] propose an image-based algorithm that is capable of performing FVV using the images of conventional cameras together with images from depth cameras. However, the purpose of that system is to be an efficient way of rendering FVV in real time without the need for precomputation, and it does not address the visual artifacts that might be present.

Some systems fall somewhere between specialized and general. Wang et al. [14] consider a framework for synthesis of viewpoints between already existing viewpoints, as opposed to a system where the viewpoint can be freely moved; this fact does not make it specialized, but it might not be considered entirely general either.

FVV systems can be used in a wide variety of circumstances and on different platforms. Systems have been presented for controlling FVV systems using a touch screen for possible use on mobile devices [15], and for streaming and cloud rendering of FVV data [16, 17].

Kampe et al. [18] present an efficient method for storing time-varying voxel data in a directed acyclic graph (DAG). The voxel geometry used in this thesis is stored using this method.

An interesting technique is relighting [19], with which the scene lighting is undone so that the scene may be lit from an arbitrary source within the FVV system. One use of relighting is when FVV content is mixed with other content, either other FVV content or some virtual environment. Relighting is then used so that all content can be lit the same way. Relighting could be used to eliminate the color differences between cameras so that seams do not occur. On the other hand, this would also erase any lighting information already present in the scene. This thesis aims to facilitate the translation of already lit scenes from the real world to FVV, preserving the lighting from the original scenes.

As for the texturing of the scene, Salvador et al. [11], Collet et al. [10], and Starck and Hilton [20] all employ weighted camera blending to achieve their results. Their approaches, however, are not applicable to this thesis, because they use the normals of a generated 3D mesh to compute camera weights, and no information regarding the normals of the geometry is available here.


An interesting method for FVV rendering without losing view-dependent information (such as reflections) is view-dependent texture mapping (VDTM). VDTM was originally presented as a method for simulating geometric detail on basic geometric models [21], similar to how normal mapping [22] is used in modern computer graphics. Volino and Hilton [23] present a layered texture representation for VDTM with a focus on FVV use that retains view-dependent information, improving realism.

Lempitsky and Ivanov [24] describe a method for constructing a mosaic-like texture mapping from different cameras on triangle meshes. The method uses seam levelling in order to smooth seams between mosaic fragments. The seam levelling implementation presented makes use of the triangle mesh structure, making it unsuitable for use with a voxel grid.


Chapter 3

Method

Given a scene geometry and the color images from a number of cameras, this thesis considers how to assign colors to any point in space. The multiple frames of scene geometry consist of voxels specified by a single Directed Acyclic Graph (DAG) containing all geometry frames, as described by Kampe et al. [25], and a number of static cameras, each with an associated position, orientation, field-of-view angle, and an image stream that corresponds to the frames of the geometry.

One way of solving the problem of determining the color of a given point $p$ is to project $p$ onto the camera plane of each scene camera $C_i$. The colors $Col_{i,p}$ of the pixels at the projected positions in the scene camera images $i = 1..N$ are then averaged to calculate the final color $Col_p$:

$$Col_p = \frac{\sum_{i=1}^{N} Col_{i,p}}{N} \qquad (3.1)$$

This method, however, disregards the geometry of the scene and may produce erroneous color values for points that are occluded by some geometry, as the color from the occluder will be used in the final value. We can determine whether a point is seen by a given camera by comparing the distance from the scene camera to the point with the value at the projected position in the depth map (an image from a scene camera's point of view representing distances to the geometry) of the scene camera [11], if such a depth map is available. If the value obtained from the depth map and the distance between the point and the scene camera are sufficiently close, the point is considered visible from that scene camera and the scene camera should be sampled for the final color; if the values differ, the point is occluded and that camera should not be sampled. Another way to test if a point is seen by a specific scene camera is to shoot a ray from the point towards the scene camera; if the ray hits geometry between the point and the camera, the point is occluded. The final color is then obtained by what we call the naive projection method:

$$Col_p = \frac{\sum_{i \in C_v} Col_{i,p}}{|C_v|} \qquad (3.2)$$

where $C_v$ is the set of scene cameras visible from the point.
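A minimal sketch of this naive projection method, Equation (3.2), is shown below, using a depth-map comparison as the visibility test. It reuses the hypothetical `project_to_pixel` helper from the sketch in Chapter 1; `visible`, `eps`, and the dictionary-based camera description are likewise illustrative assumptions rather than the thesis implementation.

```python
import numpy as np

def visible(point, cam, depth_map, eps=0.05):
    """Depth-map visibility test: the point is considered visible if its
    distance to the camera matches the stored depth at the projected pixel."""
    px = project_to_pixel(point, cam["pos"], cam["rot"],
                          cam["focal_px"], cam["image_size"])
    if px is None:
        return None
    dist = np.linalg.norm(point - cam["pos"])
    return px if abs(depth_map[px[1], px[0]] - dist) < eps else None

def naive_projection_color(point, cameras, images, depth_maps):
    """Equation (3.2): average the colors from the visible scene cameras."""
    samples = []
    for cam, img, depth in zip(cameras, images, depth_maps):
        px = visible(point, cam, depth)
        if px is not None:
            samples.append(img[px[1], px[0]])
    return np.mean(samples, axis=0) if samples else None
```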

For a given point, the color obtained by projection onto the scene camera images may differ between cameras even if the point represents the same object in all of them. There are various conceivable reasons for this. Firstly, the material of the object may be view-dependent, that is, it looks different from different directions. Any material that does not reflect light uniformly is view-dependent, as more light is reflected in certain directions than in others. Secondly, there might be a mismatch between the scene camera images and the scene geometry, as the voxelized geometry does not exactly correspond to the original scene geometry. Such a mismatch could cause erroneous color sampling along the silhouettes of objects; a point on the very edge of an object, as seen from a specific scene camera, could be given the color of the object behind it instead. This sampling mismatch is illustrated in Figure 3.1a. Another kind of mismatch is shown in Figure 3.1b, where the color value is obtained from the correct surface but at the wrong position, thereby displacing the projection on that surface in the direction toward the camera.

Figure 3.1: Sampling mismatches. The dashed circle and line represent the original geometry, and the squares represent voxels in the scene geometry. In both cases, coloring the point p1 will give the color of p2 from the camera, as the position of p1 corresponds to p2 in the original geometry.

These are the causes of the visual artifacts that may be present when the naive projection method is used. The problem then becomes how to determine the color of a given point on the geometry surface correctly and without visual artifacts.

3.1 Choosing colors for view-dependent materials

As mentioned earlier, due to the nature of some materials, a point may be given different colors by different scene cameras. The problem of choosing which of these colors to use, or how to combine them, for the final color is not trivial. When choosing colors for a view-dependent material, no solution is complete. Consider a mirror surface: every single view of the mirror sees a unique reflection, so we would need an infinite number of cameras to perfectly replicate all possible views of the mirror surface. Everything described from this point on is therefore to be considered an approximation that may give plausible results for slightly view-dependent materials.

Simply choosing the color from the scene camera that might be considered to have the best view of a given position may give unsatisfactory results: if two neighboring points have different best scene cameras with different colors, a visible seam will appear between them. Similarly, blending colors from all visible scene cameras with equal weight may also give unsatisfactory results, as some scene cameras might have a very angled or otherwise compromised view of the given point, making them unsuitable for sampling.

There are a number of possible solutions to this problem. Instead of choosing the single best scene camera to sample for a color, it is possible to choose a number of suitable scene cameras and blend the color values acquired from them to produce the final color [11]. Apart from just choosing the cameras that see the given point, the normal of the surface on which the point is located can be used to select which of these cameras should be used. The surface normal can also be used to determine the blending weights of the chosen cameras: the blending weight of each selected scene camera can be defined so that scene cameras whose view vectors are closest to parallel with the surface normal of the given point are given more weight in the blending [11]. If blending the colors is not considered a viable option, levelling functions may be used. Levelling functions are commonly used in image stitching, and can be used to combine images without overlapping areas [26].
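The sketch below illustrates the normal-based weighting idea described in [11]; it is not used in this thesis, since no surface normals are available for the voxel geometry. The function names and the `sharpness` exponent are illustrative assumptions.

```python
import numpy as np

def normal_based_weights(point, normal, cam_positions, sharpness=2.0):
    """Weight each camera by how closely its viewing direction aligns with the
    surface normal at the point; head-on views get weights close to 1."""
    weights = []
    for cam_pos in cam_positions:
        to_cam = cam_pos - point
        to_cam = to_cam / np.linalg.norm(to_cam)
        cos_angle = max(0.0, float(np.dot(normal, to_cam)))
        weights.append(cos_angle ** sharpness)
    return np.array(weights)

def blend(colors, weights):
    """Weighted average of per-camera color samples."""
    w = weights / weights.sum()
    return (np.asarray(colors, dtype=float) * w[:, None]).sum(axis=0)
```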


3.2 Eliminating occlusion seams and other artifacts

When a surface is only partly visible to a camera A, either because the surface is partly occluded or because it is partly outside the view of A, there may be a visible seam between the points where A is used and the points where it is not, if the surface in question is view-dependent or if the cameras have different effective resolution for that surface. Figure 3.2 illustrates the circumstances under which seams may appear.

Figure 3.2: Point p1 is in full view of scene cameras A and B, while its neighbor p2 is only in view of scene camera B. This could cause a seam between p1 and p2.

The problem is how to, for any point in the scene, assign weights to scene cameras in order to present a plausible result without visible seams or other artifacts, such as sampling mismatches.

While there is not much previous work that deals explicitly with this kind of seam, some of the methods for handling view-dependent materials described in section 3.1 could conceivably be used for this problem. A levelling function, for instance, combined with a good selection of scene cameras to sample, could provide a possible solution. That solution, however, makes use of a triangle mesh structure, which we do not have access to.

Our solution to this problem is to assign weights to scene cameras based on how well they see a piece of geometry. To achieve this, the view of each scene camera is divided into areas of visibility. An area of visibility is a geometrically continuous area which is visible to a certain scene camera. These areas of visibility are used to establish weight maps: a mapping of weights per scene camera onto either the geometry or the scene camera image, specifying the weight of each color in the scene camera image. The weights should be assigned in such a fashion that each area of visibility has a set weight, except near the edges, where the weights are lower the closer to the edge they are. This ensures that the center of the area of visibility is associated with the full weight of that scene camera, while the scene camera's influence fades out near the edges, making for a seamless blend into other cameras' areas of visibility. Figure 3.3 illustrates two areas of visibility from two different cameras. This solution alleviates the problems with occlusion seams by making sure that if an area of visibility overlaps another one, the weights of both areas are lower near the edges than in the center, fading out their influence. It also alleviates the problem of sampling mismatches along the edges of objects, because the silhouette edge of an object is by definition also the edge of the area of visibility on that object; the colors that are likely to be erroneously sampled thus have a very low weight compared to those sampled from scene cameras for which the given geometry point is not along the object's edge.

Figure 3.3: A surface with areas visible to scene cameras A and B, respectively. In p1, B has a low weight while A has a high weight. In p2, A and B have the same weight. In p3, B outweighs A.

Three methods are presented below, each taking a slightly different approach to this solution.


3.2.1 Identifying areas of visibility in voxel space

With this method, areas of visibility are built per scene camera by identifying which voxels are on the edge of such an area, followed by an iterative filling process that assigns weights to voxels based on how many voxel steps away the edge is.

A voxel is defined to be on the edge of an area of visibility if any of its neighbor voxels is not visible from the scene camera, which means that it is either occluded or outside the scene camera's view. After the edge voxels have been identified, they are given the weight 1 and put in a list. For each voxel in this list, its unprocessed, non-occluded neighbors are given the weight 2 and put in a new list. This new list is processed in the same manner as the last, setting the weight of every unprocessed and non-occluded neighbor to 3, and putting them in a new list. This continues until a set max weight is reached, whereupon the rest of the unprocessed and non-occluded voxels are given the max weight. This ensures that the area of visibility has a uniform weight value in the center region and fading weights in the edge region. Figure 3.4 illustrates an area of visibility filled with weight values, and a sketch of the filling process follows below.
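A minimal sketch of this level-by-level filling process is given below, written as a breadth-first traversal over the set of voxels visible from one camera. The voxel representation (coordinate tuples), the `neighbors` callback, and the function name are assumptions for illustration, not the thesis implementation.

```python
from collections import deque

def voxel_area_weights(visible, neighbors, max_weight):
    """Assign per-voxel weights for one scene camera: edge voxels get weight 1,
    their visible neighbors 2, and so on, capped at max_weight."""
    weights = {}
    frontier = deque()
    # A voxel is on the edge of the area if any of its neighbors is not visible.
    for v in visible:
        if any(n not in visible for n in neighbors(v)):
            weights[v] = 1
            frontier.append(v)
    # Propagate increasing weights outward from the edge voxels.
    while frontier:
        v = frontier.popleft()
        for n in neighbors(v):
            if n in visible and n not in weights:
                weights[n] = min(weights[v] + 1, max_weight)
                frontier.append(n)
    # Any visible voxel never reached (no edge in its component) gets max weight.
    for v in visible:
        weights.setdefault(v, max_weight)
    return weights
```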

Figure 3.4: Weights of a flat voxel grid, with edge voxels marked in gray. All voxels inside the edges have been marked with their respective distances to the outside of the area.

However, there are some limitations to this method. Since the weights are assigned based purely on integer distance across the geometry to the nearest area edge, the distance from the camera to the geometry is not taken into consideration. This could be done by modifying the weights with some function of the distance between the voxel in question and the camera, for example the inverse distance, but for a very close camera that might change the weight values so much that the requirement of edge weights close to 0 is broken. If the weights are not (relatively) close to 0 at the edges of some regions, those regions might not blend seamlessly with overlapping areas, producing visual artifacts.

Another issue is that this method takes a voxel-centric view of the problem, which might cause artifacts in the finished result. For example, one voxel may correspond to more than one pixel in the camera image. This, combined with the fact that only one color per voxel is stored, risks making some parts of the scene (specifically, the parts closest to the camera in question) undersampled. Another problem comes from the fact that we determine occlusion for a camera by shooting rays from the center of the voxel towards the camera. While this might intuitively not seem to be a problem, there exist situations in which the center of a voxel is occluded but some other part is not (see Figure 3.5). Even if we were to shoot rays from all corners of the voxel as well as its center, there might exist situations in which the center and all corners of a voxel are occluded but some other part is not. The result is that some voxels will be falsely flagged as completely occluded, making for a possibly inadequate solution.

Figure 3.5: The voxel v1 is identified as occluded by v2 despite not being so, due to inadequate occlusion testing.


3.2.2 Dynamically identifying areas of visibility in view-camera space

This method approaches the problem from the direction of the view camera, which is the camera from which the scene is viewed by the user of the system. The method dynamically establishes areas of visibility only for the points currently being rendered.

The scene camera weights for each pixel of the view are calculated by shooting rays through pixels in the vicinity of that pixel. The vicinity of a pixel is defined as an area surrounding it with some margin, defined in number of pixels, as seen in Figure 3.6. For each of the rays that hit anything, the hit position is checked for occlusion against all scene cameras that can see the original pixel. The weight of each camera is then the number of hit positions visible from that camera, as sketched below.
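The following sketch illustrates the vicinity-based weight calculation for one view-camera pixel. The `cast_ray` and `visible_from` callbacks, the default camera count, and the function name are illustrative assumptions; for brevity the sketch tests the hit positions against all cameras rather than only those that see the original pixel.

```python
def vicinity_weights(pixel, vicinity_radius, cast_ray, visible_from, num_cameras=4):
    """Scene-camera weights for one view pixel: count, per camera, how many hit
    positions in the pixel's square vicinity that camera can see.

    cast_ray(x, y) returns a world-space hit position or None; visible_from(i, p)
    is an occlusion test of position p against scene camera i.
    """
    x0, y0 = pixel
    weights = [0] * num_cameras
    for dy in range(-vicinity_radius, vicinity_radius + 1):
        for dx in range(-vicinity_radius, vicinity_radius + 1):
            hit = cast_ray(x0 + dx, y0 + dy)
            if hit is None:
                continue
            for cam in range(num_cameras):
                if visible_from(cam, hit):
                    weights[cam] += 1
    return weights
```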

Figure 3.6: Examples of square vicinities of pixels p1 and p2. These vicinities are 7 and 3 pixels wide, respectively.

This method also ensures that the areas of visibility have a uniformly weighted center, as the maximum weight assigned is the area of the vicinity, which is a set size throughout the entire process. Figure 3.7 illustrates the general idea of this solution.

An advantage of this method is that since colors are not stored in the voxels, but rather retrieved on a per-pixel basis, a voxel can have any number of colors, preventing undersampling of the scene cameras.

Figure 3.7: The point p1 has full weight from B and slightly more than half weight from A. p2, on the other hand, has full weight from A and no weight from B, since p2 is not visible from B.

One big disadvantage of this method is that the weight calculation cannot be precomputed, since it depends on the position and rotation of the view. This makes the solution fairly computationally heavy in real time.

Another drawback is that if the pixel is near the edge of an object, some part of the vicinity of the pixel might correspond to a piece of background geometry. If it does, the visibility of the camera in the background may be used in the weight calculation of the pixel, as illustrated in Figure 3.8. This may cause some areas to have more weight for a certain camera than they should, which affects the final result.

This algorithm does not provide a complete solution to the given problem, as the weight of a point close to the edge of an area of visibility will not be close to zero, as is desirable. This reduces the effectiveness of the method. A slightly modified version of the method can be used to satisfy the edge weight condition: the weight of each camera visible at the given point is set to the closest pixel distance to a point that is not within one of that camera's areas of visibility. This ensures that points close to the edge of an area of visibility are given a low weight for that camera.

Even with the modified weight assignment, the method still does not completely solve the problem. The center of an area of visibility could be at an edge as seen from the view, giving it low weights where it should have high ones. The reverse is also possible, which will cause sampling mismatches to still be visible in some cases.


Figure 3.8: The point p should only have a small weight for B, but since the background of the object is also seen by B, it gets taken into consideration, giving B a larger weight than it should have.

3.2.3 Identifying areas of visibility in scene camera space

This method combines a screen-space approach with the ability to precompute weights, so that instead of calculating the weights every time they are needed, they are precomputed and stored on disk. The areas of visibility are calculated in scene camera space, and the weight of a pixel is represented by the distance to the nearest local discontinuity in the geometry. The smallest weight this method will assign a pixel is 1, as that is the smallest distance from one pixel to another.

The precomputation is done by first shooting one ray per image pixel for each scene camera and storing the resulting collision distances in a depth map. These distances are then used to calculate, for each image pixel of the scene camera, the distance to the nearest local discontinuity. A local discontinuity is defined as two neighboring pixels of the scene camera image whose recorded depths differ by more than a certain threshold value (see Figure 3.9). The choice of threshold value is important, as too low a value could make the solution interpret surfaces that face the camera as discontinuous with themselves, giving too low weights. The discontinuity is found by traversing the depth map around the hit pixel in an outward fashion until a local discontinuity is found. This distance is then stored as the weight for that pixel in that camera. It is worth noting that this algorithm is not optimized for performance in any way, and is thus not suitable for real-world usage as is.
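A straightforward (and, like the thesis algorithm, unoptimized) sketch of this precomputation is given below: for every pixel of a scene camera's depth map, the search radius grows until a pair of neighboring depth values differs by more than the threshold, and that radius becomes the pixel's weight, capped at a maximum distance. The function name and the exact search pattern (a square window grown one ring at a time, measured in Chebyshev distance) are assumptions for illustration.

```python
import numpy as np

def discontinuity_weights(depth_map, threshold, max_distance):
    """Per-pixel weights for one scene camera: distance (in pixels) to the
    nearest local depth discontinuity, capped at max_distance."""
    h, w = depth_map.shape
    weights = np.full((h, w), max_distance, dtype=np.int32)
    for y in range(h):
        for x in range(w):
            for r in range(1, max_distance):
                # Square window of radius r around the pixel, clipped to the image.
                y0, y1 = max(0, y - r), min(h, y + r + 1)
                x0, x1 = max(0, x - r), min(w, x + r + 1)
                window = depth_map[y0:y1, x0:x1]
                # A discontinuity is a pair of neighboring depths differing by
                # more than the threshold, vertically or horizontally.
                if (np.abs(np.diff(window, axis=0)) > threshold).any() or \
                   (np.abs(np.diff(window, axis=1)) > threshold).any():
                    weights[y, x] = r
                    break
    return weights
```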

Figure 3.9: A local discontinuity is detected when the difference between two neighboring distances is more than a certain threshold.

When a ray is shot from the view camera into the scene to calculate the color for the pixel corresponding to the ray, the colors and the stored weights for the hit position are looked up for each camera. The final color is then calculated as a weighted average of the colors from the cameras, as in the sketch below.
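A minimal sketch of this render-time lookup, reusing the hypothetical `visible` depth-map test from the earlier naive-method sketch; the weight maps are assumed to be the per-camera arrays produced by `discontinuity_weights` above.

```python
import numpy as np

def weighted_color(hit_point, cameras, images, weight_maps, depth_maps):
    """Weighted average of the visible cameras' colors at the hit point,
    using the precomputed per-pixel weight maps."""
    colors, weights = [], []
    for cam, img, wmap, depth in zip(cameras, images, weight_maps, depth_maps):
        px = visible(hit_point, cam, depth)      # depth-map occlusion test
        if px is None:
            continue
        colors.append(img[px[1], px[0]])
        weights.append(wmap[px[1], px[0]])
    if not colors:
        return None
    w = np.asarray(weights, dtype=float)
    return (np.asarray(colors, dtype=float) * (w / w.sum())[:, None]).sum(axis=0)
```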

An advantage of this method is that the weights for a piece of geometry far away will be lower than the weights for an identical piece of geometry that is closer (see Figure 3.10), making distance an integral part of the weight calculation. This is advantageous because a camera has a lower effective resolution on an object that is far away, so we want that camera to have lower weights on that object.

The outward search for a local discontinuity for the calculation of each weight could become very time-consuming, as the search area grows quadratically with the search distance. Therefore, a max distance value is introduced: the search stops as soon as it has reached a certain distance from the pixel of origin. If this max value is reached, the weight of the pixel is set to the max distance. This results in weight plateaus in the middle of areas of visibility that are large enough.

The fact that the weights are actually the pixel distance to the closest discontinuity might give unexpected results if the scene camera images are of different resolutions. Some sort of normalization is likely necessary in order to achieve even results in such cases.

Figure 3.10: Despite being of equal size, objects A and B have different weights in the center, as A appears much bigger in scene camera space.

If the geometry resolution is higher than the scene camera image resolution, there might be cases where discontinuities in the geometry are missed because the resolution of the depth map is too low to discover them. This could conceivably be a problem. The resolution of the depth map could of course be increased to avoid this, but the weights would then also take longer to calculate.


Chapter 4

Results

Given a scene consisting of voxel geometry and a number of camera positions with associated video streams, we want to determine the color of a given position on the geometry. The naive method does this by checking which cameras are in view of the given position and averaging the color values from those cameras at that position, as described in Equation 3.2. This can create a number of different artifacts, and our method is designed to eliminate them. Our method uses a weighted average and seeks to assign different weights to different areas of the camera projections, based on how good a view the camera has of the corresponding area of the geometry.

The method was tested by implementing both the naive method and the algorithm described in section 3.2.3, and then comparing the visual quality of a number of views rendered using both. Our implemented method assigns weights in the camera space of the scene cameras. The weight of a pixel in a scene camera image is based on the pixel distance to the nearest local discontinuity in the geometry. Max distance is, in the context of our method, the maximum distance at which local discontinuities are searched for, and thus also the maximum weight given to a pixel in a scene camera image. The implementation of our method uses a max distance value of 32 throughout most of this chapter (specifically, sections 4.1.1, 4.1.2, 4.1.3, and 4.1.4). The choice of which method to implement was based on the limited time frame of the master's thesis, which only gave adequate room to implement a single method, combined with the desirable properties of the chosen method. The implementations were made in C# using the OpenTK toolkit [27] for rendering and vector math.

A number of views (virtual camera positions and orientations) were selected for comparison by identifying parts that displayed prominent occlusion seams and/or other graphical artifacts when rendered using the naive method. The graphical artifacts produced by the naive method include occlusion seams, sampling mismatches, and projection displacement. Occlusion seams occur along the edges of the occlusion shadow of an object (see section 3.2). When cameras have different color values for the same position, the transition from an area both cameras can see to an area that only one camera can see is a transition from two different colors averaged to just one of the colors. Sampling mismatches occur because the voxelized scene geometry does not always match the original scene geometry exactly (see Figure 3.1a). If a voxel extends outside the surface of the original object, a ray that would pass by the object in the original scene might hit the voxel, creating a mismatch between the actual and believed hit positions of the ray. This mismatch may cause the color of the background of the object to be projected onto the object itself. Projection displacement also occurs when the voxelized geometry does not closely correspond to the original geometry (see Figure 3.1b). A ray shot toward an object can hit the voxelized surface somewhat earlier than it would hit the surface in the original geometry. If the ray is shot at an angle, this can cause the projection on that surface to be displaced in the direction toward the camera.

4.1 Test results

Four camera views exhibiting occlusion seams and other graphical artifacts will be presented, with figures illustrating the difference between the two rendering methods. Figure 4.1 shows, for reference, the first frame of each scene camera's video stream. These four frames comprise the entire image input for the algorithm.


Figure 4.1: The views of each camera in the scene ((a) Camera 1, (b) Camera 2, (c) Camera 3, (d) Camera 4).


4.1.1 View 1: Wall

The Wall view corresponds to a geometrically simple part of the scene, depicting a flat wall. It features a number of seams and serves as a good example of the chosen method. Figure 4.4a shows the view rendered using the naive method.

Each camera contributes equally to the result in Figure 4.4a, and the contribution of each camera can be seen in Figure 4.2. Cameras 3 and 4 see the wall at an angle, and so have effectively lower resolution than the other two, making the colors in their contributions seem smeared out.

Figure 4.2: Individual camera contributions to the Wall view rendered using the naive method (a-d: Cameras 1-4). Black corresponds to no contribution from that camera in that view.

The weights of the contributing cameras, for use in our method, are shown in Figure 4.3. For this view, Camera 1 contributes greatly to the final result; Camera 2 also contributes, but over a smaller area than Camera 1, and cameras 3 and 4 contribute almost nothing at all.

Figure 4.3: Individual camera weight maps of the Wall view in our method (a-d: Cameras 1-4). Red corresponds to a weight of 1, yellow corresponds to a weight of 32.

The complete rendering with our method can be seen in Figure 4.4b. Figure 4.4 compares the Wall view rendered with the naive method and with our method. As is evident, our method produces a result without visible seams, without adding additional graphical artifacts.

Figure 4.4: Comparison of the naive rendering (a) and the thesis rendering (b) for the Wall view.


4.1.2 View 2: Woman

The Woman view depicts the woman in the scene. Figure 4.7a shows the view rendered using the naive method. Using the naive method, this view does not display many seams, but it is fraught with other visual artifacts. These artifacts are created by sampling mismatches and by the fact that none of the cameras in the scene can see the woman in her entirety, so there are few overlapping areas of visibility, especially on her left side.

Figure 4.5 breaks down the individual contributions to the naive-method result shown in Figure 4.7a. From this figure, we can see that Camera 3 and Camera 1 are responsible for a lot of the artifacts on the woman's left side because of sampling mismatches. We can also see that none of the cameras has an adequate view of the woman's left side at all, making all parts of her facing in that direction suffer from graphical artifacts.

Figure 4.5: Individual camera contributions to the Woman view rendered using the naive method (a-d: Cameras 1-4). Black corresponds to no contribution from that camera in that view.

Using our method for weight maps (seen in Figure 4.6) gives cameras 1 and 2 good weights for the upper body and the front of the legs of the woman. All cameras have very low weights along her left side, meaning that none of them will contribute significantly more than any other camera.


Figure 4.6: Individual camera weight maps of the Woman view in our method (a-d: Cameras 1-4). Red corresponds to a weight of 1, yellow corresponds to a weight of 32.

The complete rendering with our method can be seen in Figure 4.7b. Figure 4.7 compares the Woman view rendered with the naive method and with our method. Our method alleviates some of the problems seen when using the naive method, but retains some as well. The big artifacts along the woman's left leg and torso are slightly smaller, and her torso is also clearer.

Figure 4.7: Comparison of the naive rendering (a) and the thesis rendering (b) for the Woman view.


4.1.3 View 3: Roof

The Roof view is focused on a piece of wall and roof just above the woman in the scene. When rendered using the naive method, the Roof view displays two large seams crossing each other. One of the seams arises from occlusion by another object, whereas the other is the consequence of the field of view of a camera ending. Figure 4.10a shows the view rendered using the naive method.

As can be seen in Figure 4.8, the contributions to the naive-method result (Figure 4.10a) come almost exclusively from two of the four cameras. Camera 3 has a view of the geometry from the ground, whereas Camera 4 has a closer view, from a higher position. Each camera contributes equally to the result in Figure 4.10a.

Figure 4.8: Individual camera contributions to the Roof view rendered using the naive method (a-d: Cameras 1-4). Black corresponds to no contribution from that camera in that view.

The weights of the contributing cameras, for use in our method, are shown in Figure 4.9. Here, Camera 4 is awarded higher weights than Camera 3 because of its proximity to the geometry.

Figure 4.9: Individual camera weight maps of the Roof view in our method (a-d: Cameras 1-4). Red corresponds to a weight of 1, yellow corresponds to a weight of 32.


The complete rendering with our method can be seen in Figure 4.10b. Figure 4.10 compares the Roof view rendered with the naive method and with our method. As can be seen, Camera 4 has a large influence on the final result. Our method produces a result in which the two prominent seams have been eliminated.

Figure 4.10: Comparison of the naive rendering (a) and the thesis rendering (b) for the Roof view.


4.1.4 View 4: Bucket

The Bucket view shows a bucket hanging from a piece of rope in front of a building facade. A naive rendering (see Figure 4.13a) of this view displays a large amount of sampling mismatches along the edges of the bucket, as well as some blurriness caused by projection displacement on the facade.

Each camera contributes equally to the result in Figure 4.13a, and the contribution of each camera can be seen in Figure 4.11. We can see that cameras 1 and 4 produce significant amounts of sampling mismatches; from a wall in the case of Camera 1, and from a tiled roof in the case of Camera 4. The source of the projection displacement is mainly Camera 2, which views the facade at an angle.

Figure 4.11: Individual camera contributions to the Bucket view rendered using the naive method (a-d: Cameras 1-4). Black corresponds to no contribution from that camera in that view. Sampling mismatches can be observed on the left side of the bucket in Camera 1, as well as on the right side of the bucket in Camera 4.

The weights of the contributing cameras, for use in our method, are shown in Figure 4.12. The problematic areas of sampling mismatches in cameras 1 and 4 are assigned very low weight values. Likewise, the areas subject to projection displacement on the facade are assigned low weights compared to Camera 4, which has the most straight-on view.


Figure 4.12: Individual camera weight maps of the Bucket view in our method (a-d: Cameras 1-4). Red corresponds to a weight of 1, yellow corresponds to a weight of 32.

The complete rendering with our method can be seen in Figure 4.13b. Figure 4.13 compares the Bucket view rendered with the naive method and with our method. In the rendering using our method, the sampling mismatches are almost not visible at all, and the blurriness caused by projection displacement has nearly been eliminated. The texture on the handle of the bucket is also more distinguishable than in the naive rendering; the handle itself still looks bad because the voxels of the scene geometry are larger than the handle in the original geometry, in effect undersampling it.

Figure 4.13: Comparison of the naive rendering (a) and the thesis rendering (b) for the Bucket view.


4.1.5 Max distance size comparison

The chosen max distance affects the end result of our method. A smaller max distance gives a narrower gradient between the edge of an area of visibility and the part of the area that is considered to be in full view.

Figure 4.14 shows the Wall view rendered using our method with a max distance of 1, 2, 4, 8, 12, 16, 24, 32, 64, 128, and ∞, respectively. With a max distance of 1, all areas of visibility have a uniform weight of 1, and our method thus becomes identical to the naive method. The difference from max distance 1 to 32 is noticeable, but higher values than 32 contribute little to the end result, and from 64 and upwards the difference is almost zero.

Figure 4.14: Comparison of different values of max distance (1, 2, 4, 8, 12, 16, 24, 32, 64, 128, and ∞) for the Wall view.

Another example of varying max distance is found in Figure 4.15, showing the Bucket view with the same max distance values as in Figure 4.14. The sampling mismatch artifacts on the bucket itself do not diminish further after max distance 16, and the only thing that changes at higher values is the blurriness caused by projection displacement, which sees no further improvement above max distance 64.

Figure 4.15: Comparison of different values of max distance (1, 2, 4, 8, 12, 16, 24, 32, 64, 128, and ∞) for the Bucket view.

4.2 Performance

Apart from precomputation time, of which the naive method requires none, the rendering time is slightly longer with our method. We can assign relative speed values to the methods by comparing the render times. Table 4.1 details the total rendering times of six different views (four of which comprise the views presented in Section 4.1) for the naive method and our method, respectively. Each view was rendered 50 times to provide some statistical soundness. The rendering was performed on a Windows 7 desktop computer with an Intel i7-4770K CPU, 16 GB RAM, and an NVIDIA GeForce GTX 770 GPU. The implementation itself is a CPU-only raycaster with very few optimizations. As Table 4.1 shows, our method has a relative speed value of around 0.8 at lower resolutions, indicating that it runs at 80% of the speed of the naive method, but the difference becomes smaller as the resolution increases. The fact that our method performs very slightly better than the naive method at the higher resolutions can be attributed to runtime uncertainties.

Rendering size            128²        256²        512²        768²
Rendering time (Naive)  00:07:33    00:34:07    02:54:29    06:16:14
Rendering time (Our)    00:09:24    00:39:55    02:48:20    06:09:02
Relative speed (Our)      0.803       0.854       1.037       1.02

Table 4.1: Rendering times and relative speeds for the naive method and the thesis method, respectively.

The preprocessing time depends on many variables, such as the scene, the number of cameras, and the max distance value. Table 4.2 shows how the preprocessing time correlates with varying values of max distance. As can be seen, a low max distance value gives faster preprocessing. This is because the algorithm can quit early if no discontinuities are found within the max distance. Since the preprocessing algorithm is not optimized for speed, the preprocessing times could likely be improved significantly with optimizations and GPU acceleration.

Max distance   Running time (hh:mm:ss)
1              00:02:22
2              00:02:27
4              00:02:48
8              00:04:06
12             00:06:00
16             00:08:52
24             00:14:20
32             00:20:21
64             00:44:25
128            01:18:18
∞              01:41:07

Table 4.2: Preprocessing times for a single frame using different values of max distance.


Chapter 5

Discussion

While all tested views show significant improvement with our implemented method over the naive method, some show less improvement than others. In particular, the Woman view does not display the same improvement as the other views, because the geometry of the woman is relatively thin and the positions of the cameras in the scene are unfavorable. This suggests that we might benefit from cameras dedicated to the actors in the scene, as people are more likely to notice visual errors in other humans than in walls and trees.

Renderings made using our implemented method look much better than those made using the naive method, with only minimal performance loss. One downside to our method is that the weights need to be saved alongside the video streams, increasing the disk size of the processed FVV sequence. The weights could be encoded as single-channel video streams to reduce the size, but the aesthetic consequences of that remain to be tested. If a faster algorithm is developed, it would also be possible to calculate the weights in real time, removing the need for saved weights altogether.

Our implemented method might need to be tweaked somewhat when usedwith other scenes and setups: Scene size, number of cameras, voxel resolution,and camera resolution all change the conditions under which the algorithmoperates. One variable that needs to be changed according to the situation isthe discontinuity threshold, which is the lowest value by which the distanceof two neighboring pixels must differ for them to be considered discontinuous.With changes in scene size and voxel resolution, the discontinuity thresholdwill probably have to be tweaked to obtain a good compromise between notregistering separate objects as the same, despite them being very close toeach other, and recognizing an angled wall to be a single object. Anothervariable that will need tweaking is the max distance, which determines themaximum weight a pixel can be assigned to have to the nearest discontinuity.

33

Page 38: New Seam Elimination in Free-Viewpoint Videouffe/xjobb/Karl_Bristav_Thesis... · 2016. 8. 17. · systems using a touch screen for possible use on mobile devices [15], and for streaming

The max distance has to be selected so as to balance visual quality against preprocessing time. A good lower bound on the max distance for this scene can be estimated by comparing the figures in Section 4.1.5; 32 seems to be the value at which most of the seams and other artifacts fade enough to become indistinguishable. At 64, almost no differences from max distance ∞ can be seen. A max distance between 32 and 64 therefore seems to be a good choice, maintaining high quality while still keeping the processing times relatively short, as seen in Table 4.2.
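For reference, the test the discontinuity threshold controls can be as simple as the sketch below; the function name and the exact comparison are illustrative assumptions rather than the implemented code.

#include <cmath>

// Two neighboring pixels are treated as lying on different surfaces when
// their distance (depth) values differ by more than the threshold.
bool isDiscontinuous(float depthA, float depthB, float threshold) {
    return std::fabs(depthA - depthB) > threshold;
}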

Given more time, we would have evaluated methods for finding approximate surface normals from the voxel geometry. This would have made it easier to incorporate some of the methods from previous work into this thesis, such as calculating weights based on how a camera's view direction compares to the surface normals in the scene.
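One common way to approximate normals from pure voxel data, mentioned here only as a possibility and not evaluated in this thesis, is to take the local occupancy gradient. The sketch below assumes an occupancy callback into the voxel structure that returns 1.0f for solid voxels and 0.0f for empty ones; its name and signature are assumptions for the sake of illustration.

#include <cmath>
#include <functional>

// Approximate the surface normal at a solid voxel as the negated gradient of
// the occupancy function (1.0f for solid voxels, 0.0f for empty), estimated
// with central differences. The resulting normal points toward empty space.
void approximateNormal(const std::function<float(int, int, int)>& occupancy,
                       int x, int y, int z, float n[3]) {
    n[0] = occupancy(x - 1, y, z) - occupancy(x + 1, y, z);
    n[1] = occupancy(x, y - 1, z) - occupancy(x, y + 1, z);
    n[2] = occupancy(x, y, z - 1) - occupancy(x, y, z + 1);
    float len = std::sqrt(n[0] * n[0] + n[1] * n[1] + n[2] * n[2]);
    if (len > 0.0f) {
        n[0] /= len;
        n[1] /= len;
        n[2] /= len;
    }
}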


Chapter 6

Conclusion

The main question in this thesis was how to find color values for the surfaces of a voxel geometry in such a way that seams and other graphical artifacts are minimized.

Given the constraints of a scene of pure voxel data, without any information about surface normals, and four cameras with video streams, we proposed three slightly different methods. Each method builds on the common concept of blending color information from different cameras according to per-camera weights. The weights are stored in a weight map, a broad term for weights saved in a way that corresponds to the geometry, to the current view being rendered, or to the views of the cameras in the scene. The weights are generated so that a camera's influence fades out near the edges of its areas of visibility on the geometry (geometrically continuous areas completely in view of that camera).
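A minimal sketch of this common concept is given below, assuming a linear falloff controlled by the distance to the nearest visibility edge and capped at max distance. The names, the linear shape of the falloff, and the treatment of non-seeing cameras as weight zero are illustrative assumptions, not the exact formulation used in any of the three methods.

#include <algorithm>

// Weight of one camera at one surface point: fades from 0 at a visibility
// edge to 1 at maxDistance (or more) away from the nearest such edge.
float cameraWeight(float distToNearestEdge, float maxDistance) {
    return std::min(distToNearestEdge, maxDistance) / maxDistance;
}

// Final color of a surface point as the weighted average of the colors the
// cameras project onto it; cameras that do not see the point get weight 0.
void blendColor(const float colors[][3], const float weights[], int numCameras,
                float out[3]) {
    float sum = 0.0f;
    out[0] = out[1] = out[2] = 0.0f;
    for (int i = 0; i < numCameras; ++i) {
        sum += weights[i];
        for (int c = 0; c < 3; ++c)
            out[c] += weights[i] * colors[i][c];
    }
    if (sum > 0.0f)
        for (int c = 0; c < 3; ++c)
            out[c] /= sum;
}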

The result is a method that succeeds in minimizing graphical artifacts within the possibilities of the given scene; it cannot create novel data where there was none.


Chapter 7

Future Work

An obvious candidate for future improvement is optimization. There is a lot to be done in this regard, as almost none has been done in this thesis: speeding up intersection tests, raycasting on the GPU, and using more efficient data structures, to name a few possibilities. A more efficient or approximative weight-generation algorithm could, for example, free the method described in Section 3.2.3 from its precomputation step by calculating all weights in real time.

One interesting future approach is to evaluate the possibilities of implementing view-dependent surfaces in the context of this thesis. This would aid the realism of the rendering.

In the currently suggested methods, all cameras that can contribute to a point do so, even if the contribution is minuscule. Investigating the possibility of letting only a limited number of cameras provide color information to a surface point could prove beneficial, as some areas are so small that all cameras are assigned very similar weights for them. The weighted averaging over these areas then takes poorly suited cameras into account more than it would if the surface area were larger.
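A minimal sketch of such a restriction, assuming the per-camera weights for a surface point are already available, is given below; keeping only the k highest-weighted cameras is one possible rule among several and was not evaluated in this thesis.

#include <algorithm>
#include <functional>
#include <vector>

// Zero out all but the k largest weights so that only the k best cameras
// take part in the weighted average for this surface point.
void keepBestCameras(std::vector<float>& weights, int k) {
    if (k <= 0 || k >= static_cast<int>(weights.size())) return;
    std::vector<float> sorted = weights;
    std::nth_element(sorted.begin(), sorted.begin() + (k - 1), sorted.end(),
                     std::greater<float>());
    const float cutoff = sorted[k - 1]; // the k-th largest weight
    int kept = 0;
    for (float& w : weights) {
        if (w >= cutoff && kept < k)
            ++kept;       // keep this camera's contribution
        else
            w = 0.0f;     // drop it from the weighted average
    }
}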

Testing the suggested method's temporal consistency is a necessary step toward using it in real FVV applications.


Bibliography

[1] Peter Rander et al. “Virtualized Reality: Constructing Time-varying Virtual Worlds from Real World Events”. In: Proceedings of the 8th Conference on Visualization ’97. VIS ’97. Los Alamitos, CA, USA: IEEE Computer Society Press, 1997, pp. 277 ff. isbn: 978-1-58113-011-9. url: http://dl.acm.org/citation.cfm?id=266989.267081.

[2] blender.org - Home of the Blender project - Free and Open 3D Creation Software. blender.org. url: https://www.blender.org (visited on 12/06/2015).

[3] Patrick H. Kelly et al. “An Architecture for Multiple Perspective Interactive Video”. In: Proceedings of the Third ACM International Conference on Multimedia. MULTIMEDIA ’95. New York, NY, USA: ACM, 1995, pp. 201–212. isbn: 978-0-89791-751-3. doi: 10.1145/217279.215267. url: http://doi.acm.org/10.1145/217279.215267.

[4] Toshiharu Horiuchi et al. “Interactive Music Video Application for Smartphones Based on Free-viewpoint Video and Audio Rendering”. In: Proceedings of the 20th ACM International Conference on Multimedia. MM ’12. New York, NY, USA: ACM, 2012, pp. 1293–1294. isbn: 978-1-4503-1089-5. doi: 10.1145/2393347.2396449. url: http://doi.acm.org/10.1145/2393347.2396449.

[5] Takayoshi Koyama et al. “Live 3D Video in Soccer Stadium”. In: ACM SIGGRAPH 2003 Sketches & Applications. SIGGRAPH ’03. New York, NY, USA: ACM, 2003, pp. 1–1. doi: 10.1145/965400.965463. url: http://doi.acm.org/10.1145/965400.965463.

[6] Joel Carranza et al. “Free-viewpoint Video of Human Actors”. In: ACM SIGGRAPH 2003 Papers. SIGGRAPH ’03. New York, NY, USA: ACM, 2003, pp. 569–577. isbn: 1-58113-709-5. doi: 10.1145/1201775.882309. url: http://doi.acm.org/10.1145/1201775.882309.


[7] Stefan Hauswiesner et al. “Free Viewpoint Virtual Try-on with Commodity Depth Cameras”. In: Proceedings of the 10th International Conference on Virtual Reality Continuum and Its Applications in Industry. VRCAI ’11. New York, NY, USA: ACM, 2011, pp. 23–30. isbn: 978-1-4503-1060-4. doi: 10.1145/2087756.2087759. url: http://doi.acm.org/10.1145/2087756.2087759.

[8] Saied Moezzi et al. “Virtual View Generation for 3D Digital Video”. In: IEEE MultiMedia 4.1 (Jan. 1997), pp. 18–26. issn: 1070-986X. doi: 10.1109/93.580392. url: http://dx.doi.org/10.1109/93.580392.

[9] Chris Buehler et al. “Unstructured Lumigraph Rendering”. In: Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques. SIGGRAPH ’01. New York, NY, USA: ACM, 2001, pp. 425–432. isbn: 1-58113-374-X. doi: 10.1145/383259.383309. url: http://doi.acm.org/10.1145/383259.383309.

[10] Alvaro Collet et al. “High-quality Streamable Free-viewpoint Video”. In: ACM Trans. Graph. 34.4 (July 2015), 69:1–69:13. issn: 0730-0301. doi: 10.1145/2766945. url: http://doi.acm.org/10.1145/2766945.

[11] Jordi Salvador et al. “From Silhouettes to 3D Points to Mesh: Towards Free Viewpoint Video”. In: Proceedings of the 1st International Workshop on 3D Video Processing. 3DVP ’10. New York, NY, USA: ACM, 2010, pp. 19–24. isbn: 978-1-4503-0159-6. doi: 10.1145/1877791.1877797. url: http://doi.acm.org/10.1145/1877791.1877797.

[12] Akio Ishikawa et al. “Free Viewpoint Video Generation for Walk-through Experience Using Image-based Rendering”. In: Proceedings of the 16th ACM International Conference on Multimedia. MM ’08. New York, NY, USA: ACM, 2008, pp. 1007–1008. isbn: 978-1-60558-303-7. doi: 10.1145/1459359.1459553. url: http://doi.acm.org/10.1145/1459359.1459553.

[13] Cesar Palomo and Marcelo Gattass. “An Efficient Algorithm for Depth Image Rendering”. In: Proceedings of the 9th ACM SIGGRAPH Conference on Virtual-Reality Continuum and Its Applications in Industry. VRCAI ’10. New York, NY, USA: ACM, 2010, pp. 271–276. isbn: 978-1-4503-0459-7. doi: 10.1145/1900179.1900236. url: http://doi.acm.org/10.1145/1900179.1900236.

[14] Peiming Wang et al. “View Synthesis Framework for Real-time Navigation of Free Viewpoint Videos”. In: Proceedings of International Conference on Internet Multimedia Computing and Service. ICIMCS ’14. New York, NY, USA: ACM, 2014, 427:427–427:430. isbn: 978-1-4503-2810-4. doi: 10.1145/2632856.2632943. url: http://doi.acm.org/10.1145/2632856.2632943.

[15] Junya Kashiwakuma et al. “A Virtual Camera Controlling Method Using Multi-touch Gestures for Capturing Free-viewpoint Video”. In: Proceedings of the 11th European Conference on Interactive TV and Video. EuroITV ’13. New York, NY, USA: ACM, 2013, pp. 67–74. isbn: 978-1-4503-1951-5. doi: 10.1145/2465958.2465961. url: http://doi.acm.org/10.1145/2465958.2465961.

[16] Ahmed Hamza and Mohamed Hefeeda. “A DASH-based Free Viewpoint Video Streaming System”. In: Proceedings of Network and Operating System Support on Digital Audio and Video Workshop. NOSSDAV ’14. New York, NY, USA: ACM, 2014, 55:55–55:60. isbn: 978-1-4503-2706-0. doi: 10.1145/2578260.2578276. url: http://doi.acm.org/10.1145/2578260.2578276.

[17] Dan Miao et al. “Resource Allocation for Cloud-based Free Viewpoint Video Rendering for Mobile Phones”. In: Proceedings of the 19th ACM International Conference on Multimedia. MM ’11. New York, NY, USA: ACM, 2011, pp. 1237–1240. isbn: 978-1-4503-0616-4. doi: 10.1145/2072298.2071983. url: http://doi.acm.org/10.1145/2072298.2071983.

[18] Viktor Kampe et al. “Exploiting Coherence in Time-varying Voxel Data”. In: Proceedings of the 20th ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games. I3D ’16. New York, NY, USA: ACM, 2016, pp. 15–21. isbn: 978-1-4503-4043-4. doi: 10.1145/2856400.2856413. url: http://doi.acm.org/10.1145/2856400.2856413.

[19] James Imber et al. “Free-viewpoint Video Rendering for Mobile Devices”. In: Proceedings of the 6th International Conference on Computer Vision / Computer Graphics Collaboration Techniques and Applications. MIRAGE ’13. New York, NY, USA: ACM, 2013, 11:1–11:8. isbn: 978-1-4503-2023-8. doi: 10.1145/2466715.2466726. url: http://doi.acm.org/10.1145/2466715.2466726.

[20] Jonathan Starck and Adrian Hilton. “Model-Based Multiple View Reconstruction of People”. In: Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2. ICCV ’03. Washington, DC, USA: IEEE Computer Society, 2003, pp. 915 ff. isbn: 978-0-7695-1950-0. url: http://dl.acm.org/citation.cfm?id=946247.946683.


[21] Paul E. Debevec et al. “Modeling and Rendering Architecture from Photographs: A Hybrid Geometry- and Image-based Approach”. In: Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques. SIGGRAPH ’96. New York, NY, USA: ACM, 1996, pp. 11–20. isbn: 978-0-89791-746-9. doi: 10.1145/237170.237191. url: http://doi.acm.org/10.1145/237170.237191.

[22] Jonathan Cohen et al. “Appearance-preserving Simplification”. In: Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques. SIGGRAPH ’98. New York, NY, USA: ACM, 1998, pp. 115–122. isbn: 978-0-89791-999-9. doi: 10.1145/280814.280832. url: http://doi.acm.org/10.1145/280814.280832.

[23] Marco Volino and Adrian Hilton. “Layered View-dependent Texture Maps”. In: Proceedings of the 10th European Conference on Visual Media Production. CVMP ’13. New York, NY, USA: ACM, 2013, 16:1–16:8. isbn: 978-1-4503-2589-9. doi: 10.1145/2534008.2534022. url: http://doi.acm.org/10.1145/2534008.2534022.

[24] V. Lempitsky and D. Ivanov. “Seamless Mosaicing of Image-Based Texture Maps”. In: IEEE Conference on Computer Vision and Pattern Recognition, 2007 (CVPR ’07). June 2007, pp. 1–6. doi: 10.1109/CVPR.2007.383078.

[25] Viktor Kampe et al. “High Resolution Sparse Voxel DAGs”. In: ACM Trans. Graph. 32.4 (July 2013), 101:1–101:13. issn: 0730-0301. doi: 10.1145/2461912.2462024. url: http://doi.acm.org/10.1145/2461912.2462024.

[26] Anat Levin et al. “Seamless Image Stitching in the Gradient Domain”. In: Computer Vision - ECCV 2004. Ed. by Tomas Pajdla and Jiří Matas. Lecture Notes in Computer Science 3024. Springer Berlin Heidelberg, May 11, 2004, pp. 377–389. isbn: 978-3-540-21981-1, 978-3-540-24673-2. doi: 10.1007/978-3-540-24673-2_31. url: http://link.springer.com/chapter/10.1007/978-3-540-24673-2_31.

[27] The Open Toolkit Library — OpenTK. url: http://www.opentk.com/ (visited on 12/06/2015).
