
Page 1: Live VR Streaming and Its Challenges - SKKU

1 © 2020 SKKU. All rights reserved.

Live VR Streaming and Its Challenges
Next-Generation Media Processing and Transmission Technology (차세대 미디어 처리 및 전송기술)
ICT Convergence Korea 2020

July 2020

Eun-Seok Ryu (류은석, [email protected])

Jong-Beom Jeong, Inae Kim, Soonbin Lee, Sungbin Kim

Multimedia Computing Systems Lab (MCSL)

http://mcsl.skku.edu

Department of Computer Education

Sungkyunkwan University (SKKU)

Page 2: Live VR Streaming and Its Challenges - SKKU

Immersive Media - 360 Virtual Reality

• 360° video named as one of the "10 Breakthrough Technologies" (MWC)

• Inexpensive cameras that capture spherical images are changing the way people share stories (Mobile World Congress).

• Next-generation real-world media that overcomes space and time constraints and provides viewers with a natural, 360-degree, fully immersive experience.

[Figure: VR media spectrum — Stereo Video / 3-Dimensional Video / Hologram]

* Source from Prof. Jewon Kang (Ewha Womans University)

Page 3: Live VR Streaming and Its Challenges - SKKU

Virtual Reality in Market

• The augmented and virtual reality (AR/VR) market amounted to a forecast of 29.5 billion U.S. dollars in 2020 and is expected to expand drastically in the coming years.

• Virtual reality technology is being applied not only in games but also in various industries, including medical services, school education, military training, vocational training, and broadcasting of large-scale concerts.

[Figure: Medical service / School education / Military training / Vocational training]

Page 4: Live VR Streaming and Its Challenges - SKKU

360 Video Processing (1/2)

• Pipeline: image stitching and equirectangular mapping → video encoding → video decoding → video rendering on a sphere

• Stitched video is coded as regular 2D video using H.264/AVC and H.265/HEVC

[Figure: input from seven cameras → corresponding stitched image → stitched image texture-mapped onto a sphere]
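The equirectangular mapping in the pipeline above is a pair of coordinate transforms between ERP pixels and directions on the unit sphere. A minimal illustrative sketch (function names are ours, not from any codec or reference software):

```python
import math

def erp_pixel_to_direction(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit direction on the sphere."""
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi   # longitude in [-pi, pi)
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi  # latitude in (-pi/2, pi/2]
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)

def direction_to_erp_pixel(x, y, z, width, height):
    """Inverse mapping: unit direction -> equirectangular pixel coordinates."""
    lon = math.atan2(x, z)
    lat = math.asin(y)
    u = (lon + math.pi) / (2.0 * math.pi) * width - 0.5
    v = (math.pi / 2.0 - lat) / math.pi * height - 0.5
    return (u, v)
```

The renderer effectively runs the forward mapping per output pixel to sample the sphere texture; the stitcher runs the inverse to place camera pixels into the ERP frame.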

Page 5: Live VR Streaming and Its Challenges - SKKU

Technologies of Virtual Reality

• 6 Parts of VR Technologies

• System / Acquisition / Pre&Post-processing / Coding / Streaming / Assessment

System Architecture for Audiovisual Media with 6 Degrees of Freedom. Source: Mary-Luc Champel, Rob Koenen, Gauthier Lafruit, Madhukar Budagavi, "Proposed Draft 1.0 of TR: Technical Report on Architectures for Immersive Media", document N17685, 122nd MPEG meeting of ISO/IEC JTC1/SC29/WG11, 2018.


Page 6: Live VR Streaming and Its Challenges - SKKU

Challenges (High BW, Low Latency, Sickness)

• High Bandwidth Requirement of VR

• Requires 40 pix/deg, i.e., 12K resolution, for high-quality VR

• To avoid motion sickness, 90 fps and 20 ms motion-to-photon (MTP) latency are required

• Immersive video contains texture (color) and depth (geometry) (×2)

• Also, immersive video has high-quality (nearly 4K) multiple views (×N)

> needs high BW (e.g., 5G, mmWave)

Requirements for high-quality VR:

Requirement               Details
Pixels/degree             40 pix/deg
Resolution                11520x6480 (12K)
Framerate                 90 fps
Motion-to-photon latency  20 ms

Source: Mary-Luc Champel, Thomas Stockhammer, Thierry Fautier, Emmanuel Thomas, and Rob Koenen, "Quality Requirements for VR", document MPEG116/m39532, 116th MPEG meeting of ISO/IEC JTC1/SC29/WG11.

Characteristics of immersive video:

Sequence            Resolution  No. of views  Frame count
ClassroomVideo      4096x2048   15            120
TechnicolorMuseum   2048x2048   24            300
TechnicolorHijack   4096x4096   10            300
TechnicolorPainter  2048x1088   16            300
IntelKermit         1920x1080   13            300
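To see why the requirements above imply high bandwidth, a back-of-the-envelope calculation for 12K at 90 fps. The chroma format (8-bit YUV 4:2:0, 12 bits/pixel) and the 200:1 compression ratio are illustrative assumptions, not figures from the slide:

```python
# 12K @ 90 fps, from the requirements table above
width, height, fps = 11520, 6480, 90
bits_per_pixel = 12                      # assumed 8-bit YUV 4:2:0

raw_bps = width * height * fps * bits_per_pixel
print(f"raw: {raw_bps / 1e9:.1f} Gbps")                 # ~80.6 Gbps uncompressed

compressed_bps = raw_bps / 200                          # hypothetical 200:1 codec ratio
print(f"compressed: {compressed_bps / 1e6:.0f} Mbps")   # still ~403 Mbps
```

Even with an optimistic compression ratio, the stream stays in the hundreds of Mbps, which is why 5G/mmWave-class links or viewport-dependent delivery are needed.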

Page 7: Live VR Streaming and Its Challenges - SKKU

Immersive Media Standard Roadmap

[Roadmap figure, 2018 to 2024]

• Media Coding: Versatile Video Coding, Essential Video Coding, Low Complexity Enhancement Video Coding, Video Point Cloud Compression, Geometry Point Cloud Compression, Point Cloud Compression v.2, 3DoF+ Video, Video with 6 DoF, 6 DoF Audio, Dense Representation of Light Field Video, Neural Network Compression for Multimedia

• Systems and Tools: OMAF v.2, CMAF v.2, Multi-Image Application Format, Immersive Media Scene Description Interface, PCC Systems Support, Media Orchestration, Network-Based Media Processing, Web Resource Tracks, Partial File Format, Color Support in Open Font Format

• Beyond Media: Internet of Media Things, Descriptors for Video Analysis (CDVA), Genome Compression, Genome Compression Extensions

(The "Now" marker on the roadmap sits at VR 360 among the immersive media standards.)

Page 8: Live VR Streaming and Its Challenges - SKKU

Immersive Media Standard Projects

New MPEG project: ISO/IEC 23090

Coded Representation of Immersive Media

9 parts:

1. Architectures for Immersive Media (Technical Report)

2. Omnidirectional Media Format (OMAF)

3. Versatile Video Coding

4. 6 Degrees of Freedom Audio

5. Video Point Cloud Coding (V-PCC)

6. Metrics for Immersive Services and Applications

7. Metadata for Immersive Services and Applications

8. Network-Based Media Processing

9. Geometry Point Cloud Coding (G-PCC)

Page 9: Live VR Streaming and Its Challenges - SKKU

Immersive Media Standard

• Step-by-step objectives of ISO/IEC MPEG Immersive Video

• MPEG-I is responsible for standardizing immersive media in MPEG and specifies the goals in three steps

• Goal of revitalizing VR commercial services by 2020

• Goal of 6DoF media support by 2022, after completing the 3DoF standard in 2017

[Figure: Step 1 — 3DoF (yaw/roll/pitch rotation only); Step 2 — rotation plus limited movement; Step 3 — 6DoF (yaw/roll/pitch plus forward/backward, left/right, up/down)]

• Step 1: Complete 3DoF standard by 2017

  • Rotate the head from a fixed position

  • 360 video full streaming by default; tiled streaming if possible

• Step 2: Enable VR commercial services by 2020

  • Allow head rotation and movement within a restricted area

  • User-to-user conversations and projection optimization

• Step 3: Support 6DoF by 2022

  • 6DoF video reflects the user's walking motion

  • Supports interaction with virtual environments

Three phases of virtual reality defined by MPEG-I

Page 10: Live VR Streaming and Its Challenges - SKKU

360-Degree Video with 4K Tiles

• Equirectangular Projection (ERP) format and 4K Tile

*Picture source: presentation material by Dr. Yong-Hwan Kim, KETI (KETI 김용환 박사 발표자료)

Page 11: Live VR Streaming and Its Challenges - SKKU

Viewport-Adaptive vs. Viewport-Independent

• Viewport-independent

  • Transmit the whole picture with pre-processing

  • Projection and packing

  • Downsampling / adjusting QP

• Viewport-dependent (adaptive)

  • Transmit the viewport only

  • Bitrate saving, but added delay

  • Bitrate savings over sending without pre-processing

  • Compression method focused on the region of interest

  • Considers several cases for extracting areas that show poor encoding efficiency

  • Experience greater visual quality with lower bandwidth consumption*

Field of View (FOV)

Page 12: Live VR Streaming and Its Challenges - SKKU

MPEG-I 3DoF System Architecture

• MPEG-I defined a general framework for the 3DoF system (N17685)

• Projection conversion and down-sampling are applied (e.g., ERP)

• Head/eye tracking information is required for rendering through an HMD

Block diagram for the 3DoF system:

Acquisition → Stitching, projection conversion → (Optional) Down-sampling → Encoding → File/segment encapsulation → Delivery → File/segment decapsulation → Decoding → (Optional) Up-sampling → Rendering → Display

Orientation/viewport metadata from head/eye tracking feeds delivery, decoding, and rendering.

Projection formats: ERP, EAP, ISP, OHP, SSP, CMP

Page 13: Live VR Streaming and Its Challenges - SKKU

Virtual Reality Streaming Technologies

• 360-degree Video Streaming

  • RTSP/RTP

  • MPEG-DASH

• Viewport-Adaptive Streaming

  • Motion-Constrained Tile Sets (MCTS): ensure independence between tiles across pictures (t0, t1, t2, …)

[Figure: a 3x3 grid of tiles/slices coded as motion-constrained tile sets; viewport-adaptive decoding and rendering selects only the tiles covering the viewport, with 1080p/720p/360p quality variants]
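The tile-selection step behind viewport-adaptive streaming can be approximated in a few lines: given a viewport orientation and field of view, pick the ERP tiles whose angular extent overlaps it. This is a simplified sketch under stated assumptions (uniform tile grid, viewport treated as a yaw/pitch-aligned rectangle on the ERP plane, pole distortion ignored); the function name is illustrative:

```python
def select_tiles(yaw_deg, pitch_deg, hfov_deg, vfov_deg, rows=3, cols=3):
    """Return the set of (row, col) ERP tiles overlapping the viewport."""
    tile_w = 360.0 / cols          # tile width in degrees of longitude
    tile_h = 180.0 / rows          # tile height in degrees of latitude
    selected = set()
    for r in range(rows):
        tile_lat = 90.0 - (r + 0.5) * tile_h            # tile-center latitude
        # two intervals overlap iff their centers are closer than
        # half the sum of their widths
        if abs(tile_lat - pitch_deg) > (vfov_deg + tile_h) / 2:
            continue
        for c in range(cols):
            tile_lon = -180.0 + (c + 0.5) * tile_w      # tile-center longitude
            dlon = (tile_lon - yaw_deg + 180.0) % 360.0 - 180.0  # wrapped diff
            if abs(dlon) <= (hfov_deg + tile_w) / 2:
                selected.add((r, c))
    return selected
```

Only the selected tiles need to be fetched (or extracted from an MCTS bitstream) at high quality; the rest can be skipped or sent at a lower quality level.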

Page 14: Live VR Streaming and Its Challenges - SKKU

(Viewport-Adaptive) Tiled Streaming (Demo)

Original Bitstream (DrivingInCity / 3840×1920 / Uniform 3×3 9Tiles)

Extracted Bitstream

(DrivingInCity / 1280×640 (3840×1920) / 1Tile (Texture))

• Extract Encoded Bitstream

Extracted Bitstream Rendering

(DrivingInCity / 4Tile)

• Four Tile Rendering

Page 15: Live VR Streaming and Its Challenges - SKKU

3DoF+ System Architecture

• MPEG-I defined a general framework for 3DoF+ S/W

• Applies a codec-independent pre- and post-processing structure

  • Removes the correlation between input videos

  • Small number of decoders

  • Basic view / additional view / pruning / packing

Block diagram for the 3DoF+ S/W platform:

Video server (pre-processing): 360 camera array (texture + depth) → View Optimizer → basic views (BV) / additional views (AV) → Pruner → Patch Packer → HEVC encoders (BV atlases, AV atlases) + metadata

Network: BV atlases, AV atlases, and metadata are delivered to the client.

Video client (post-processing): HEVC decoders (BV atlases, AV atlases) → Metadata Parser → Occupancy Map Generator → Renderer → viewport rendering
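The Pruner above removes additional-view pixels that the basic views already cover, so only novel pixels are packed into atlases. A toy sketch of the idea (an assumption-laden simplification, not the TMIV algorithm: reprojection is abstracted into a precomputed pixel correspondence map, and consistency is tested on depth alone):

```python
def prune_additional_view(av_depth, bv_depth, correspondence, depth_thresh=0.05):
    """Return per-pixel keep flags for the additional view (True = pack into atlas).

    av_depth / bv_depth: dicts mapping pixel coordinates to depth values.
    correspondence: additional-view pixel -> the basic-view pixel it reprojects to.
    """
    keep = {}
    for av_px, bv_px in correspondence.items():
        same_surface = abs(av_depth[av_px] - bv_depth[bv_px]) < depth_thresh
        keep[av_px] = not same_surface   # prune what the basic view already sees
    return keep

# Example: pixel (0, 0) matches the basic view's depth -> pruned;
# pixel (0, 1) is occluded in the basic view -> kept and packed.
av_depth = {(0, 0): 1.00, (0, 1): 2.00}
bv_depth = {(5, 7): 1.01, (5, 8): 9.00}
correspondence = {(0, 0): (5, 7), (0, 1): (5, 8)}
keep = prune_additional_view(av_depth, bv_depth, correspondence)
```

Pruning is what lets the client reconstruct many input views from a small number of decoded atlases.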

Page 16: Live VR Streaming and Its Challenges - SKKU

Results of Stride Packing for 3DoF+ by MCSL

ClassroomVideo sequence (7680x2560)

Page 17: Live VR Streaming and Its Challenges - SKKU

6DoF Standard: Point Cloud Coding

• Point cloud coding (PCC) compresses 3D points

• PCC consists of two parts: video-based PCC (V-PCC) and geometry-based PCC (G-PCC)

• V-PCC compresses point clouds using a 2D video encoder

• Patches are generated from 3D point clouds and packed into 2D space

Source: Vladyslav Zakharchenko, "Algorithm description of mpeg-pcc-tmc2", document MPEG2018/n17526, 122nd MPEG meeting of ISO/IEC JTC1/SC29/WG11, 2018.

Patch generation, packing → video encoding

Example of patch generation and packing of point cloud contents
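The packing step above arranges 2D patch bounding boxes into a video-sized atlas. A minimal shelf-packing sketch of the idea: the real V-PCC packer is more elaborate (occupancy maps, rotation, per-block alignment), and the function name is ours:

```python
def shelf_pack(patches, atlas_width):
    """Place patch bounding boxes (w, h) on horizontal shelves of a
    fixed-width atlas. Returns (positions, atlas_height), where
    positions[i] is the top-left (x, y) of patch i."""
    # taller patches first keeps shelves tight
    patches_sorted = sorted(enumerate(patches), key=lambda p: -p[1][1])
    positions = [None] * len(patches)
    x, y, shelf_h = 0, 0, 0
    for idx, (w, h) in patches_sorted:
        if x + w > atlas_width:          # current shelf full: open a new one
            x, y = 0, y + shelf_h
            shelf_h = 0
        positions[idx] = (x, y)
        x += w
        shelf_h = max(shelf_h, h)
    return positions, y + shelf_h
```

The packed atlas (plus an occupancy map saying which pixels are valid) is then handed to an ordinary 2D video encoder.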

Page 18: Live VR Streaming and Its Challenges - SKKU

6DoF Standard: Point Cloud Coding (Cont’d)

• G-PCC uses a new codec designed for 3D point clouds

• Currently at the committee draft (CD) stage

• In lossless compression, G-PCC shows about a 20% gain compared to V-PCC

Overview of the G-PCC encoder (left) and decoder (right)

Source: Khaled Mammou, Philip A. Chou, David Flynn, Maja Krivokuća, Ohji Nakagami and Toshiyasu Sugio, "G-PCC codec description v1", document n18015, 124th MPEG meeting of ISO/IEC JTC1/SC29/WG11, 2018.

Page 19: Live VR Streaming and Its Challenges - SKKU

6DoF Standard: Plenoptic / Multiview coding

• Dense light field (DLF) was included in MPEG-I

• MPEG-I Visual defined common test conditions (CTC) for DLF

• Exploration experiments are in progress

• Conversion from lenslet to multiview is possible

• Alignment with the MPEG-I Visual reference SW will be done (performance evaluation will be conducted)

Example of lenslet and multiview dense light field video

Characteristics of lenslet and multiview dense light field video:

                  Lenslet video data  Multiview video data
Resolution        4088×3068           1147×830 (each view)
Color             24 bits, BMP        24 bits, BMP
Views             -                   5×5
Frame rate        30 fps              30 fps
Number of frames  300                 300
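The lenslet-to-multiview conversion mentioned above amounts to regrouping pixels: sub-aperture view (i, j) collects pixel (i, j) from under every microlens. A toy sketch assuming an idealized lenslet image with axis-aligned n×n microlens blocks (real plenoptic data needs devignetting, hexagonal-grid resampling, and calibration; the function name is illustrative):

```python
def extract_subaperture_views(lenslet, n):
    """lenslet: 2D list (H x W), with H and W divisible by n.
    Returns an n x n grid of sub-aperture views; views[i][j] is an
    (H//n) x (W//n) image built from pixel (i, j) of each lens block."""
    h, w = len(lenslet), len(lenslet[0])
    views = [[[[lenslet[by + i][bx + j]
                for bx in range(0, w, n)]   # one output row per lens column
               for by in range(0, h, n)]    # one output row per lens row
              for j in range(n)]
             for i in range(n)]
    return views
```

For the 5×5 multiview dataset in the table, n would be 5, turning one lenslet frame into 25 perspective views.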

Page 20: Live VR Streaming and Its Challenges - SKKU

6DoF Standard: MPEG-I Visual

• MPEG-I issued a call for proposals (CfP) on 3DoF+

• Philips, Intel & Technicolor, Nokia, PUT & ETRI, and ZJU submitted responses

  > test model for immersive video (TMIV)

• TMIV is being enhanced for compressing 6DoF videos

• TMIV will be aligned with V-PCC

Block diagram of the TMIV architecture

Viewport generated by TMIV

Page 21: Live VR Streaming and Its Challenges - SKKU

Immersive Video Test Sequences

MPEG-I immersive video test sequences:

Class (M/O)  Sequence name       Resolution  Type         Views      Depth range   Frames  Rate   Bit depth
CG1-A (M)    ClassroomVideo      4096x2048   ERP          15         [0.8m, inf]   120     30fps  Texture: 10, Geometry: 16
CG1-B (M)    TechnicolorMuseum   2048x2048   ERP          24         [0.5m, 25m]   300     30fps  Texture: 10, Geometry: 16
CG1-C (M)    InterdigitalHijack  4096x4096   ERP          10         [0.5m, 25m]   300     30fps  Texture: 10, Geometry: 16
CG1-N (O)    NokiaChess          2048x2048   ERP          10         [0.1m, 500m]  300     30fps  Texture: 10, Geometry: 16
CG2-J (M)    OrangeKitchen       1920x1080   Perspective  25 (5x5)   [2.2, 7.2]    97      30fps  Texture: 10, Geometry: 10
NC1-D (M)    TechnicolorPainter  2048x1088   Perspective  16 (4x4)   -             300     30fps  Texture: 10, Geometry: 16
NC1-E (M)    IntelFrog           1920x1080   Perspective  13 (13x1)  [0.3, 1.62]   300     30fps  Texture: 10, Geometry: 16
NC2-L (M)    PoznanFencing       1920x1080   Perspective  10         [3.5, 7.0]    250     25fps  Texture: 10, Geometry: 16

Page 22: Live VR Streaming and Its Challenges - SKKU

Tiled Streaming with 6DoF 360 Videos by MCSL

• Developed a viewport tile selector (VTS) for 6DoF

• Compatible with TMIV and HEVC (or any other codec, e.g., VVC)

• User viewport tiles (HQ) + entire video (LQ) simulcast streaming

  > low-delay and bandwidth-adaptive streaming

• On the ClassroomVideo test sequence, a 19.40% gain was observed

Viewport tile selector based tiled streaming

Page 23: Live VR Streaming and Its Challenges - SKKU

Conclusion

• 360 video streaming for VR is emerging

• Requires high BW and low latency to reduce motion sickness

  >> tile-based viewport-dependent solution!

• MCTS (Motion-Constrained Tile Sets)

• EIS (Extraction Information Sets)

  • Updates VPS, SPS, and PPS considering the selected viewport tiles

• The implemented solution saves BW significantly

• Contributed to JCT-VC HM with Fraunhofer HHI

• Collaborating with UCSB on intercontinental VR streaming at a VR theater (AlloSphere)

• A 2D texture- and depth-based 3DoF+/6DoF solution is implemented for demo

Thank You! Questions > [email protected]