[IEEE Communication Technologies, Research, Innovation, and Vision for the Future (RIVF) - Ho Chi Minh City, Vietnam (2012.02.27-2012.03.1)] 2012 IEEE RIVF International Conference

Applying Virtual Reality for In-Door Jogging

Trung-Nghia LE1, Quoc-Minh BUI1, Vinh-Tiep NGUYEN1,2, Minh-Triet TRAN1,2, Anh-Duc DUONG1 1University of Science, VNU-HCM

2John von Neumann Institute, VNU-HCM Ho Chi Minh city, Vietnam

Email: {0812332,0812303}@student.hcmus.edu.vn, {nvtiep,tmtriet,daduc}@fit.hcmus.edu.vn

Abstract— We propose an “In-place Virtual Tour” system that generates a smart virtual environment that adapts to a user’s activities while jogging. The system automatically detects the main region of frames captured by a regular camera and analyzes the difference in foreground within this region between consecutive frames to estimate the user’s intensity level. It then renders the virtual scene at an appropriate speed. Unlike existing interactive fitness games, the system does not require any special devices. Experimental results demonstrate the effectiveness and real-time performance of the proposed system. The proposed method can also be used to develop interactive games that make in-door fitness exercises more exciting. It can likewise create virtual museum environments that visitors explore while staying in place.

I. INTRODUCTION

The idea of Virtual Reality (VR) was first proposed by Myron Krueger in the 1970s, and its concept was further defined by Jaron Lanier in 2002 [1]. Over the last three decades, virtual reality has been used in many applications across various fields, especially entertainment [2].

Regular exercise is essential for improving health and endurance. However, in modern society, people have few chances to exercise regularly in a natural environment. This motivates the development of multimedia and interactive systems that make indoor exercise more exciting, whether at home, in a fitness center, or at the workplace. With various types of media, including sound, images, and video, users can enjoy immersion in a simulated environment while exercising. Special devices can provide extra useful data, such as depth or haptic data, for analyzing a user’s activities. However, even with only visual data captured from a regular webcam or camera, meaningful information can be extracted and analyzed to create useful and effective sports applications.

In this paper, we propose an “In-place Virtual Tour” system that analyzes a user’s motion while jogging. It generates, in real time, visual and audio media that adapt to the user’s activities. The user enjoys a virtual tour as a character exploring a virtual scene. The virtual character’s speed is updated in real time according to the user’s current workout intensity level. The proposed system requires no special equipment, only ordinary devices (a low-cost webcam and a laptop or personal computer). These devices capture the user’s motion, display virtual reality images, and generate sound effects that match the user’s activity.

To achieve real-time performance, we first exploit the observation that not all body parts are equally active during exercise. We therefore process only a region of interest, i.e., a fixed area of the frame that contains the most useful information. In jogging and running, the user’s activity is concentrated in the lower body. We therefore focus on the lower body (including the knees, legs, ankles, and feet) to extract the user’s workout intensity level. Second, we propose a simple but effective measure, RDoF, that expresses the level of difference in foreground between consecutive frames: the higher the RDoF, the more activity in the region. We study the relationship between RDoF and the number of steps in a time slot (e.g., a second or a minute) and find it to be exponential.

This paper is structured as follows. Section II briefly presents and analyzes related work. Section III presents the proposed system. It covers two analyses: how the region of interest in frames is identified automatically, and how the difference in foreground between consecutive frames relates to the user’s instantaneous speed when jogging or running. Experimental results are presented in Section IV, and Section V presents conclusions and future work.

II. RELATED WORK

The Reactive Virtual Trainer [3] is one of the virtual reality systems for sports. Its goal is to create a virtual trainer that helps a user reach his/her workout goals, as a real sports trainer would. Babu et al. [4] present the Virtual Human Physiotherapist Framework. It focuses on exercise-specific monitoring based on 3D color markers placed on the trainee’s body.

Several computer vision techniques are useful in sports, such as object tracking [8] and recognition [9]. In team sports like football or volleyball, tracking techniques [11] are often used to follow and analyze the trajectory of each player or even the ball. The results can then suggest strategies against opponents. Background subtraction [10] separates moving foreground objects from a model of the background and supports object tracking. For inexperienced or young athletes in particular, pose estimation can analyze human body motion and act as a coach that guides them visually through video.

Auxiliary devices such as the Wii Remote or Kinect can be used to develop interesting sports applications. Thanks to its motion-sensing capability, players can use the Wii Remote to mimic the actions of specific sports such as tennis or boxing. Kinect, in turn, uses depth sensors to capture a depth map of the scene and then analyzes the skeletal structure of the player’s body. With skeletal tracking and action recognition algorithms, Kinect can be used in sports games, one of the most famous being Dance Central. One thing that makes Dance Central unique is that it allows players to share their games with others.

978-1-4673-0309-5/12/$31.00 ©2012 IEEE

We are inspired by the idea of using multimedia to engage users while they exercise. However, rather than following the trend of special devices, we use only regular equipment (a personal computer and a low-cost camera) to capture images of the user while exercising. We then analyze the visual data and generate a virtual environment including video clips and sound effects. Our approach also uses background subtraction, but it processes only a specific region. From that region, it estimates the user’s speed and displays adapted multimedia content.

III. PROPOSED METHOD

A. Overview of Virtual Tour for Jogging

The main objective of our proposed system is to create a virtual environment that makes fitness exercises more exciting for the user. So that the user can exercise anywhere and at any time, the system does not depend on dedicated sports devices. The user can control the virtual character’s speed by adjusting the relative relationship between his/her real speed and that of the virtual character. Therefore, our main goal is not to estimate the exact value of the user’s velocity. Instead, we detect how fast the user walks or runs so that the virtual scene can be generated and rendered appropriately.

To support flexible setups, the system does not require a fixed layout. The user can freely place the camera, as long as it is perpendicular to the direction of his/her movement. Hence the method should depend as little as possible on the camera’s resolution and on the distance between the user and the camera. Figure 1 illustrates the layout of the proposed Virtual Tour system: a regular low-cost camera (C) and a screen (S). The proposed system has two main phases: Calibration and Activity Analysis. The first phase automatically determines the system’s parameters for each fitness session. In the second phase, the system subtracts the background, estimates the user’s speed, and renders the virtual scene.

Figure 1. System’s layout

B. Calibration Phase

Because the user can set up the system freely, its parameters may change from one exercise session to the next. Therefore, calibration is executed automatically at the beginning of each jogging session to determine the parameters for that session.

The first step of the Calibration phase is to estimate the background of the current fitness session. After the system learns the background, the user enters the camera’s view and stands still for a few seconds. This lets the system extract the user’s whole body (cf. Section III.C). Our experiments show that, when jogging or running, the lower body exhibits the most activity. We therefore focus only on that part to obtain the information needed to estimate a reasonable speed for the virtual scene (cf. Section III.D).

Definition 1: The region of interest (ROI) in the Virtual Tour system is the region that captures the runner’s whole lower body, including the knees, legs, ankles, and feet. The ROI can be set to a fixed area of the camera’s view for all frames captured during the whole fitness session.

C. Background Subtraction

Background subtraction is required in both the Calibration phase and the Activity Analysis phase. After the background is subtracted, the remaining foreground is assumed to be the runner’s body. Because the background can change between sessions, it must be re-estimated for each fitness session.

In our experiments, we compare the performance and efficiency of two background subtraction methods: the Codebook model [5] and the Gaussian Mixture Model (GMM) [6]. Figure 2 shows background subtraction results for both methods. Because the background in our system changes little within a session, the Codebook approach suits our system better than GMM. Furthermore, the Codebook model runs in real time and is memory-efficient. After background subtraction, noise filtering removes small foreground regions from the result.
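To make the Codebook idea concrete, the following sketch keeps, for every pixel, a small set of codewords describing the intensities that pixel took while only the background was visible. It is a heavily simplified, grayscale-only variant written for illustration. The model in [5] also uses color distortion, brightness bounds, and codeword aging. The names `train_codebooks`, `foreground_mask`, and `remove_small_regions`, and the tolerances `eps` and `min_area`, are hypothetical. The last function shows one possible form of the noise filtering step: it drops 4-connected foreground blobs smaller than `min_area` pixels.

```python
from collections import deque

Frame = list[list[int]]  # grayscale frame: rows of intensities 0..255
Mask = list[list[bool]]  # True = foreground pixel


def train_codebooks(frames: list[Frame], eps: int = 10):
    """Build one codebook per pixel from background-only frames.

    Simplified assumption: each codeword is [low, high], an intensity range.
    """
    h, w = len(frames[0]), len(frames[0][0])
    books = [[[] for _ in range(w)] for _ in range(h)]
    for frame in frames:
        for y in range(h):
            for x in range(w):
                v = frame[y][x]
                for cw in books[y][x]:
                    if cw[0] - eps <= v <= cw[1] + eps:
                        # Value matches this codeword: widen its range.
                        cw[0], cw[1] = min(cw[0], v), max(cw[1], v)
                        break
                else:
                    books[y][x].append([v, v])  # no match: add a new codeword
    return books


def foreground_mask(frame: Frame, books, eps: int = 10) -> Mask:
    """A pixel is foreground if it matches none of its codewords."""
    return [[not any(lo - eps <= v <= hi + eps for lo, hi in books[y][x])
             for x, v in enumerate(row)]
            for y, row in enumerate(frame)]


def remove_small_regions(mask: Mask, min_area: int = 4) -> Mask:
    """Noise filtering: keep only 4-connected blobs of at least min_area pixels."""
    h, w = len(mask), len(mask[0])
    out = [[False] * w for _ in range(h)]
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            # Breadth-first search collects one connected component.
            comp, queue = [], deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(comp) >= min_area:
                for y, x in comp:
                    out[y][x] = True
    return out
```

A practical implementation would vectorize these loops with NumPy or OpenCV. For the GMM alternative, OpenCV exposes Zivkovic’s adaptive mixture model [6] as `cv2.createBackgroundSubtractorMOG2`.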

Figure 2. Background subtraction using Codebook Model and GMM

D. Automatic Lower Body Detection

Figure 3. Automatic lower body detection.

Our system uses a deliberately simple method so that it runs in real time. We compute the difference of the foreground between consecutive frames to estimate the runner’s speed (slow or fast movement). The regions of the body that change the most are shown as blue pixels in Figure 3a. As the figure shows, the part that contributes most to human motion runs from the knee to the foot. The condition for this step is therefore that the lower body region must contain everything from the knee to the foot. In our experiments, the result of lower body detection is independent of the camera’s resolution and only slightly affected by the distance from the runner to the camera. The result of this step is shown in Figure 3b.
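An activity map like the one in Figure 3a can be produced by counting, for each pixel, how often its foreground label flips between consecutive frames. Summing those counts per row then shows which vertical band of the silhouette moves most. The sketch below illustrates that idea under our own assumptions and is not the detector used in the paper. The functions `activity_map`, `row_profile`, and `most_active_band` are hypothetical, and the masks are boolean grids such as those produced by background subtraction.

```python
Mask = list[list[bool]]  # True = foreground pixel


def activity_map(masks: list[Mask]) -> list[list[int]]:
    """Count, per pixel, how many times its foreground label changes."""
    h, w = len(masks[0]), len(masks[0][0])
    counts = [[0] * w for _ in range(h)]
    for prev, cur in zip(masks, masks[1:]):
        for y in range(h):
            for x in range(w):
                if prev[y][x] != cur[y][x]:
                    counts[y][x] += 1
    return counts


def row_profile(counts: list[list[int]]) -> list[int]:
    """Total activity per image row (row 0 = top of the frame)."""
    return [sum(row) for row in counts]


def most_active_band(profile: list[int], band_height: int) -> int:
    """Top row of the band_height-row window with the largest total activity."""
    best_top, best_sum = 0, -1
    for top in range(len(profile) - band_height + 1):
        s = sum(profile[top:top + band_height])
        if s > best_sum:
            best_top, best_sum = top, s
    return best_top
```

On a jogging sequence, one would expect the most active band to lie between the knees and the feet, which is the region the ROI must cover.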

E. The Height of the Lower Body Region

The goal of this step is to automatically estimate the height of the lower body region. To minimize processing cost, this region is fixed for each session during the calibration phase. Let H be the mean height of the runner estimated during calibration. For a typical body, the distance between the knee and the foot is about H/3. However, when running, the user tends to lift the knees higher than normal, especially in high-knee running. Because the ROI is fixed for the whole jogging session, it must contain the runner’s whole lower body in every frame of the session. The region must therefore satisfy two conditions. It should be as small as possible, and it should still contain the runner’s whole lower body even in high-knee running.

We conduct experiments to study the relationship between H and the ROI’s height in three scenarios: (1) normal walking, (2) jogging, and (3) high-knee running. Let θh be the ratio of the ROI’s height to H. For each scenario, we test values of θh from 0 to 1 in steps of 0.05, and each value of θh defines a fixed ROI. The success rate for a value of θh is the number of frames in which the lower body lies entirely inside the ROI, divided by the total number of frames. The experimental results are shown in Figure 4. For θh ≥ 0.4, the ROI contains the lower body in all frames of normal walking and jogging, because the knees rise little relative to the feet. For high-knee running, the lower body is captured entirely by the ROI when θh ≥ 0.5. Therefore, to ensure accuracy in all cases while keeping processing fast, we choose θh = 0.5.
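The success-rate sweep for θh can be written compactly. The sketch assumes, hypothetically, that each frame comes with the vertical extent of the runner’s lower body in pixels: the top of the highest knee and the bottom of the feet, with y growing downward. It also assumes that the ROI is anchored at the bottom of the silhouette and extends θh·H pixels upward. `success_rate` and `smallest_theta` are illustrative names, not functions from the paper.

```python
def success_rate(extents, foot_y, H, theta_h):
    """Fraction of frames whose lower body lies entirely inside the ROI.

    extents: per-frame (knee_top_y, foot_bottom_y) pairs; y grows downward.
    foot_y:  bottom of the silhouette measured during calibration.
    H:       runner's mean height in pixels; the ROI height is theta_h * H.
    """
    roi_top, roi_bottom = foot_y - theta_h * H, foot_y
    inside = sum(1 for top, bottom in extents
                 if top >= roi_top and bottom <= roi_bottom)
    return inside / len(extents)


def smallest_theta(extents, foot_y, H, step=0.05):
    """Smallest theta_h on the grid 0, step, ..., 1 that reaches 100% success."""
    for i in range(int(round(1 / step)) + 1):
        theta = i * step
        if success_rate(extents, foot_y, H, theta) == 1.0:
            return theta
    return None  # no grid value contains every frame
```

Running `smallest_theta` once per scenario and taking the maximum reproduces the decision rule in the text: the chosen θh must work for the most demanding scenario, high-knee running.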

Figure 4. The success rate for different values of θh

F. The Width of the Lower Body Region

The next step is to find the relationship between the width of the lower body region and the mean height H. We conduct experiments similar to those in Section III.E. Let θw be the ratio of the ROI’s width to H. We vary θw from 0.05 to 1.00 in steps of 0.05 and compute the success rate for each value. The results are illustrated in Figure 5.

In the ideal case, when a user walks or runs exactly in place, the ROI’s width can be set to 0.45H. However, a user tends to drift within a small area when running or jogging, so his/her horizontal position can change slightly. To ensure that the ROI captures the user’s whole lower body throughout a session, we set the ROI’s width to about 0.75 times the runner’s height. This value approximates the length of a typical treadmill’s running belt.
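Combining Sections III.E and III.F, the ROI can be computed once per session from the silhouette’s bounding box, measured while the user stands still. The sketch below makes three placement assumptions of its own: the ROI is bottom-aligned with the feet, horizontally centered on the silhouette, and clipped to the frame. The names `Box` and `compute_roi` are hypothetical. The values θh = 0.5 and θw = 0.75 are the ones chosen above.

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive


def compute_roi(silhouette: Box, frame_w: int, frame_h: int,
                theta_h: float = 0.5, theta_w: float = 0.75) -> Box:
    """Fixed lower-body ROI from the calibration silhouette (y grows downward)."""
    H = silhouette.bottom - silhouette.top          # runner height in pixels
    roi_h = round(theta_h * H)
    roi_w = round(theta_w * H)
    cx = (silhouette.left + silhouette.right) // 2  # horizontal center of the body
    left = max(0, cx - roi_w // 2)                  # clip at the left edge
    right = min(frame_w, left + roi_w)              # clip at the right edge
    bottom = min(frame_h, silhouette.bottom)
    top = max(0, bottom - roi_h)
    return Box(left, top, right, bottom)
```

For example, a 400-pixel-tall silhouette centered at x = 320 in a 640×480 frame yields a 300×200 ROI spanning x = 170..470 and y = 240..440.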

Figure 5. The success rate for different values of θw

Therefore, when the user starts running, the system automatically detects the runner’s lower body and processes only the lower body region. Experiments with different cameras at various resolutions show that lower body region detection does not depend on camera resolution.

G. Speed Estimation

We estimate the level of speed from the runner’s steps. To measure how fast the runner moves, we compute the step frequency in steps per second (SPS). The traditional method counts footsteps directly, but this requires special devices such as a pedometer. We therefore propose a low-cost, device-independent alternative. We observe that the difference between two consecutive frames increases with the step frequency, so there is a relationship between these two quantities. We quantify this difference with a measure called the Ratio of Difference of Foreground (RDoF).

Definition 2: In each frame, each pixel is classified as either a foreground pixel or a background pixel. Let ΔF(t) be the number of pixels in the ROI that change from background (in frame t − 1) to foreground (in frame t), and let n be the total number of pixels in the ROI. The Ratio of Difference of Foreground (RDoF) is the ratio of ΔF(t) to n, i.e., RDoF(t) = ΔF(t)/n.
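Definition 2 translates directly into code. The sketch assumes binary foreground masks (True = foreground) and an ROI given as row and column bounds. The function name `rdof` and the ROI tuple layout are illustrative.

```python
Mask = list[list[bool]]  # True = foreground pixel


def rdof(prev: Mask, cur: Mask, roi: tuple[int, int, int, int]) -> float:
    """Ratio of Difference of Foreground inside the ROI.

    roi = (top, bottom, left, right); bottom and right are exclusive.
    Delta F(t) counts pixels that are background in frame t-1 and
    foreground in frame t; n is the number of pixels in the ROI.
    """
    top, bottom, left, right = roi
    delta_f = sum(1 for y in range(top, bottom) for x in range(left, right)
                  if not prev[y][x] and cur[y][x])
    n = (bottom - top) * (right - left)
    return delta_f / n
```

As in the definition, only background-to-foreground transitions are counted. Pixels that return to the background do not contribute.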

Figure 6. Raw value of RDoF and the value after applying Kalman Filter

In our experiments, the raw RDoF value is unstable from frame to frame. We therefore smooth it with a Kalman filter [7], which estimates the current RDoF from the noisy instantaneous measurement and the previous estimate. Figure 6 shows RDoF before and after applying the Kalman filter.
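The paper does not give the filter’s state model or noise parameters, so the following sketch assumes a common minimal choice: a scalar random-walk model. Under this model, the true RDoF stays roughly constant between frames (process noise variance `q`), and each raw RDoF is a noisy measurement of it (measurement noise variance `r`). The default values of `q` and `r` are placeholders that would need tuning. `ScalarKalman` and `smooth` are hypothetical names.

```python
class ScalarKalman:
    """1-D Kalman filter with a random-walk state model (assumed, not from the paper)."""

    def __init__(self, q: float = 1e-5, r: float = 1e-3,
                 x0: float = 0.0, p0: float = 1.0):
        self.q, self.r = q, r    # process / measurement noise variances
        self.x, self.p = x0, p0  # current estimate and its variance

    def update(self, z: float) -> float:
        # Predict: the state is assumed unchanged; its uncertainty grows by q.
        p_pred = self.p + self.q
        # Correct: blend prediction and measurement z using the Kalman gain.
        k = p_pred / (p_pred + self.r)
        self.x = self.x + k * (z - self.x)
        self.p = (1 - k) * p_pred
        return self.x


def smooth(values: list[float], q: float = 1e-5, r: float = 1e-3) -> list[float]:
    """Filter a sequence of raw RDoF values, starting from the first value."""
    kf = ScalarKalman(q, r, x0=values[0])
    return [kf.update(z) for z in values]
```

A larger ratio `q / r` makes the filter follow changes in jogging intensity faster, at the cost of passing through more frame-to-frame noise.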


We examine the relationship between RDoF and step frequency in three scenarios: (1) normal walking, (2) jogging, and (3) high-knee running. For each scenario, we label the frames in which a foot touches the floor. We then estimate the runner’s instantaneous step frequency (in SPS) at each contact as the inverse of the time until the next contact, i.e., the number of frames between the two contacts multiplied by the delay between frames. The results are shown in Figure 7. The relationship between RDoF and SPS follows an exponential function whose coefficients differ from session to session. Ideally, the relationship would not depend on environmental conditions such as the camera-to-runner distance or the camera resolution. In practice, however, the real SPS and the calculated one differ somewhat. Since we only need the runner’s speed approximately, we estimate separate coefficients for each case. The exponential function has the form:
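The labeling procedure above converts directly into reference SPS values. This sketch assumes a constant frame rate, so the delay between frames is 1/fps. `sps_from_contacts` is a hypothetical helper that returns one SPS value for each pair of consecutive foot contacts.

```python
def sps_from_contacts(contact_frames: list[int], fps: float) -> list[float]:
    """Instantaneous steps per second between consecutive foot contacts.

    contact_frames: sorted indices of frames where a foot touches the floor.
    An interval of k frames lasts k / fps seconds and covers one step,
    so the step frequency over that interval is fps / k.
    """
    return [fps / (b - a) for a, b in zip(contact_frames, contact_frames[1:])]
```

For example, at 30 fps, contacts at frames 0, 15, 30, and 40 give 2, 2, and 3 steps per second.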

$$\nu = \alpha \cdot e^{a x + b} + \beta$$

where ν is the runner’s step frequency (in SPS), x is the RDoF, and α, β, a, b are coefficients.
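The four coefficients in this model are not independent. The factor e^b can be absorbed into α, so the data can determine only three of them. Writing A = α e^b gives an equivalent three-parameter form:

$$
\nu = \alpha\, e^{a x + b} + \beta = \left(\alpha e^{b}\right) e^{a x} + \beta = A\, e^{a x} + \beta, \qquad A = \alpha e^{b}.
$$

When fitting the coefficients, b can therefore be fixed (for example, b = 0) without loss of generality.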

Based on this observation, our system works as follows. In the calibration phase, it estimates the coefficients while the user moves, and it then uses these coefficients for the whole session.
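The paper does not say how the coefficients are estimated during calibration. One simple option, assumed here, is separable least squares. Because α and e^b appear only as a product, the sketch fits the equivalent form ν = A·e^{ax} + β with A = α·e^b. For a fixed a, this model is linear in A and β, so ordinary least squares solves for them. A grid search then picks the a with the smallest squared error. The calibration pairs (RDoF, SPS), the grid for a, and the names `fit_linear`, `fit_exponential`, and `predict_sps` are all hypothetical.

```python
import math


def fit_linear(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Least-squares fit ys ~ A * xs + beta; returns (A, beta, squared error)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # constant feature (e.g. a == 0): only the offset is identifiable
        return 0.0, my, sum((my - y) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    A = sxy / sxx
    beta = my - A * mx
    sse = sum((A * x + beta - y) ** 2 for x, y in zip(xs, ys))
    return A, beta, sse


def fit_exponential(rdof: list[float], sps: list[float],
                    a_grid: list[float]) -> tuple[float, float, float]:
    """Fit sps ~ A * exp(a * rdof) + beta over a grid of a; returns (A, a, beta)."""
    best = None
    for a in a_grid:
        feats = [math.exp(a * x) for x in rdof]  # model is linear in this feature
        A, beta, sse = fit_linear(feats, sps)
        if best is None or sse < best[0]:
            best = (sse, A, a, beta)
    _, A, a, beta = best
    return A, a, beta


def predict_sps(x: float, A: float, a: float, beta: float) -> float:
    """Map a (smoothed) RDoF value to an estimated step frequency."""
    return A * math.exp(a * x) + beta
```

A finer grid, or a local refinement around the best grid point, would improve the estimate of a. A general nonlinear least-squares solver is another option.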

Figure 7. The relationship between RDoF and SPS

IV. EXPERIMENTS

We conduct experiments with 20 volunteers in different environments and at various speeds. In each experiment, we compare the calculated SPS with the corresponding real SPS. To obtain the real SPS, we manually label each frame in which the user’s foot touches the floor.

The results are shown in Figure 8 for three levels of movement: high-knee running (A), jogging (B), and normal walking (C). The dashed lines show the real SPS and the solid lines the estimated SPS. In case (A), a volunteer runs fast with high knees. In case (B), a volunteer jogs at an average speed, and there are some minor errors between the calculated and real values. In case (C), a volunteer walks slowly, and the calculated and real values are approximately identical. Overall, the calculated SPS closely follows the real SPS.

The experiments are performed on a Core Quad 2.4 GHz machine (4 processors). On average, our system takes about 8 milliseconds to process each frame, so it runs in real time.
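As a quick check of the real-time claim, 8 ms per frame corresponds to a throughput of about 125 frames per second. That comfortably exceeds the frame rate of a typical webcam. The 30 fps figure used below is an assumed typical value, not one reported in the experiments.

$$
\frac{1000\ \text{ms/s}}{8\ \text{ms/frame}} = 125\ \text{frames/s} > 30\ \text{frames/s}
$$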

Figure 8. The comparison of real SPS and calculated SPS

V. CONCLUSION

We proposed a new method to estimate a user’s activity level while jogging. The key idea is that when a user runs or jogs fast, the difference in foreground between consecutive frames is high. We also proposed a new measure, RDoF, to quantify this difference. We studied the exponential relationship between RDoF and the number of steps a user takes within a time slot (e.g., a second or a minute). Our system can also predict when the feet touch the floor. It can be used to create an in-place journey into a jungle, or a virtual gallery or museum that explorers can visit while staying at home. In addition, the per-pixel operations are independent of one another, so the processing can be parallelized. As a result, the processing time can be reduced considerably when the system runs on GPU-based hardware.

REFERENCES

[1] L. Yount, “The Lucent Library of Science and Technology – Virtual Reality”, 2005, pp. 7-50.

[2] A. B. Craig, W. R. Sherman, J. D. Will, “Developing Virtual Reality Applications: Foundations of Effective Design”, Elsevier, 2009.

[3] E. Dehling, “The Reactive Virtual Trainer”, Master thesis in Human Media Interaction, University of Twente, 2011.

[4] S. Babu, et al., “Virtual human physiotherapist framework for personalized training and rehabilitation”, Graphics Interface, 2005.

[5] K. Kim, T. H. Chalidabhongse, D. Harwood and L. Davis, “Background modeling and subtraction by codebook construction”, IEEE ICIP 2004.

[6] Z. Zivkovic, “Improved Adaptive Gaussian Mixture Model for Background Subtraction”, IEEE ICPR 2004.

[7] R. E. Kalman, “A New Approach to Linear Filtering and Prediction Problems”, Transactions of the ASME, Journal of Basic Engineering, 1960.

[8] T. Xiaofeng, L. Jia, W. Tao, Z. Yimin, “Automatic player labeling, tracking and field registration and trajectory mapping in broadcast soccer video”, ACM Trans. Intell. Syst. Technol. 2, 2, Article 15 (2011).

[9] S. Ali, M. Shah, “Human Action Recognition in Videos Using Kinematic Features and Multiple Instance Learning”, IEEE Trans. Pattern Anal. Mach. Intell. 32, 2 (2010), 288-303.

[10] M. Piccardi, "Background subtraction techniques: a review", IEEE SMC 2004.

[11] P. Huang, A. Hilton, “Football Player Tracking for Video Annotation”, CVMP 2006.