
Tennis Stroke Detection in Video Sequences

DAVID LUCAS SEGARRA

Master’s Degree Project Stockholm, Sweden 2005

TRITA-NA-E05110

Numerisk analys och datalogi, KTH, 100 44 Stockholm
Department of Numerical Analysis and Computer Science, Royal Institute of Technology, SE-100 44 Stockholm, Sweden


Master’s Thesis in Computer Science (20 credits) at the IT Program,

Royal Institute of Technology year 2005 Supervisor at Nada was Stefan Carlsson

Examiner was Stefan Carlsson


Abstract

The rapid development of digital media has given us access to so much information, of so many kinds, that it is sometimes difficult to handle. Soon it will be possible to store digital video in the same way that text is stored today. This abundance of video data will make search engines and editing tools for video necessary, and that is the problem this work addresses: recognizing different types of human activity in video sequences.

The Learning, Recognition & Visualization group is part of the Computer Vision and Active Perception Laboratory (CVAP) in the Department of Numerical Analysis and Computing Science (NADA) at the Royal Institute of Technology (KTH). Its research focuses on learning, recognition and visualization in sports videos and 3D scenes.

This thesis addresses the problem of tracking tennis players in a tennis video sequence using image processing algorithms. The collected data are then analyzed by geometric, mathematical and physical procedures in order to detect the strokes. The objective is to detect the forehands, backhands and volleys of the players during a whole set, and to demonstrate the validity of a stroke detection algorithm. Such a capability has potential applications in automatic editing, broadcasting, archiving, browsing and training.

The suggested algorithm is based on finding the feet position of the tennis players and their acceleration, since before a stroke there is almost always a short run. On one set of tennis video, the algorithm extracts 86% of the stroke frames and rules out 57% of the video. The results are promising when the player hits the ball away from the body, in other words, when the player attacks the ball.

To cope with strokes played without acceleration, a racquet and ball detector or an audio-based system should be added, but this is beyond the scope of this Master's thesis.


Acknowledgements

Having completed this Master's thesis, I wish to thank my supervisor Stefan Carlsson for his advice, support and understanding, and Gareth Loy for his constant help and patience. I would also like to thank my family, who have always supported me unconditionally. Finally, I am grateful to my friends, both in Spain and in Stockholm, who encouraged me at every moment. Special thanks to Adrián, Juansa, Raúl, Maria and Irina for just being who they are.


Table of Contents

1 Introduction
    1.1 Project Description
    1.2 Related Work
    1.3 Commercial Products
2 Tracking Algorithm
3 Computing Feet Position
4 Projective Transformation
5 Stroke Detection
    5.1 Getting Acceleration with Least Square Filter
    5.2 Detecting Irrelevant Shots
    5.3 Generating Stroke Sequences
    5.4 Generating Output Data
6 Experimental Evaluation
7 Conclusion
8 References
Appendix A - Tennis Court Diagram
Appendix B - Projective Transformation Theory
Appendix C - MATLAB


List of Figures

Figure 1.1 Flow chart of the chapters
Figure 2.1 Different frames
Figure 2.2 Outputs of the tracking algorithm
Figure 3.1 Head, torso and feet locations of the player in the baseline and ZOOM
Figure 3.2 Head, torso and feet locations of the player in the net and ZOOM
Figure 3.3 Head, torso and feet locations of the player
Figure 4.1 Tennis player path (1) in a bird view of the court
Figure 4.2 Tennis player path (2) in a bird view of the court
Figure 5.1 Stroke detection flow chart
Figure 5.2 Getting acceleration with Least Square Filter flow chart
Figure 5.3 Plot of speed in x axis with derivative of the feet position
Figure 5.4 Comparison of normal speed and smooth speed
Figure 5.5 Comparison of normal acceleration and smooth acceleration
Figure 5.6 Detecting irrelevant shots flow chart
Figure 5.7 Two examples of irrelevant shots
Figure 5.8 Stroke sequences flow chart
Figure 5.9 Graph of the first ball of the set
Figure 5.10 Some important frames of the first ball sequence of set
Figure 5.11 Comparison between real strokes, alarm frames and stroke sequences
Figure 5.12 Output data flow chart
Figure 6.1 Evolution of the main features (trust detected strokes, irrelevant video and false alarms) with the decreasing of the thresholds
Figure 6.2 First 14 detected strokes
Figure 6.3 Not detected strokes
Figure A.1 Tennis court diagram

List of Tables

Table 6.1 Evolution of the main features (trust detected strokes, irrelevant video and false alarms) with the decreasing of the thresholds


Chapter 1 Introduction

This chapter gives a brief introduction, the motivation of the thesis and an outline of the chapters. It also summarizes the background and the existing commercial products.

1.1 Project Description

This thesis addresses automatic high-level content-based video retrieval, with an emphasis on recognizing events in videos of tennis games. With the increasing use of digital multimedia, there is a corresponding increase in the need for tools that enable fast and efficient indexing, querying and browsing of multimedia databases. The novelty of this thesis is the use of an acceleration-based algorithm to detect the strokes of a tennis player in a video recording of a complete tennis game.

Annotating digital tennis footage with metadata, such as a description of the strokes played, would give easy access to that footage in a video archive. A tennis coach, for example, could retrieve training footage of particular strokes of his or her player. The same technique could be used to follow improvement over time, or to analyze match footage of an opponent in order to detect weaknesses. Another possible application is the automated collection of match information, giving match commentators or home viewers direct access to current match statistics.

Next, the outline of the project and its features is explained. Figure 1.1 shows the scheme of the proposed system and how the chapters map onto it.

First, the input video is a whole set of a tennis match, taken from Wimbledon Classic Matches - Borg Vs McEnroe 1980 (second set). This video is analyzed with the tracking algorithm explained in Chapter 2 - Tracking Algorithm. The first aim of this step is to identify the court frames, in other words, to keep only the shots we need from the whole footage. Then, within these frames, the player (head and torso location) and the court lines are located and tracked.


Figure 1.1 Flow chart of the chapters

Second, the feet location is approximated with a simple linear equation system, based on the head and torso location at two different positions of the player on the court. Chapter 3 - Computing Feet Position explains this.

Then, in Chapter 4 - Projective Transformation, the objective is to show the tennis court lines and the feet location of the player in a bird's-eye view. This is carried out by a linear mapping of a planar surface.

Last, Chapter 5 - Stroke Detection describes the acceleration-based algorithm and the procedures used in the thesis to detect the strokes of the tennis player located at the bottom of the image. The acceleration-based algorithm works on the feet location of the player in the bird's-eye view.

1.2 Related Work

Automatic indexing and retrieval of high-level video information based on content is a challenging research area in which a lot of effort has been invested. We should ask ourselves what content means, and why the problem of indexing video by content is still unsolved. From a technical point of view, it is generally accepted that content is too subjective to be characterized completely, as it usually depends on the objects, context, domain, background, etc. Some of the literature characterizes content by color, shape, texture and motion. These approaches have the merit of being applicable to generic images and video, but their main problem is that they capture only low-level information, whereas end users almost always want to interact at a high level when accessing images and video segments from a database. Manual generation of high-level annotations is cumbersome, time consuming and expensive. Hence, there is a need for algorithms that can automatically infer high-level content from data.

In 1998, G. Sudhir, John C. M. Lee and Anil K. Jain, in the paper "Automatic Classification of Tennis Video for High-level Content-based Retrieval" [1], presented an automatic analysis of tennis video to facilitate content-based retrieval. The approach generates an image model of the tennis court lines, derived from knowledge of the dimensions and connectivity of a tennis court and of the typical camera geometry used when capturing a tennis video. A court line detection algorithm and a robust player-tracking algorithm are then developed. In addition, tennis court clips are selected from the raw input footage using a color-based algorithm.

In 2002, Corrado Calvo, Alessandro Micarelli and Enver Sangineto, in the paper "Automatic Annotation of Tennis Video Sequences" [2], proposed an automatic method for annotating tennis video sequences. The Hough Transform is used to detect the court lines, and the court is divided into regions by the intersections between vertical and horizontal lines. Player positions are extracted by looking for edge pixels whose orientation differs from that of the court lines. Once these data have been collected, game actions such as base-line rally, passing shot, serve-volley or net game are annotated by analyzing the initial and final positions of the players on the court.

In 2003, Hisashi Miyamori, in the paper "Automatic Annotation of Tennis Action for Content-Based Retrieval by Integrated Audio and Visual Information" [3], introduced a novelty: a method for automatically annotating tennis actions through the integrated use of audio and video information. The audio information is used to extract ball-hitting times, and the actions of the player are identified by comparing the ball-hitting times with the player and ball locations.

1.3 Commercial Products

Lucent Technologies is the company that has shown the most interest in tennis visualization systems. Their system, called LucentVision [4], uses real-time video analysis to obtain the motion trajectories of the players and the ball, and offers a rich set of visualization options based on this trajectory data. LucentVision uses eight cameras placed around a tennis stadium to track the players and the ball. Two of them track the players, each viewing one half of the court, and the remaining six track the motion of the ball during serves. Once a sporting event is stored in a database in the form of motion trajectories, scores and other domain-specific information, viewers can explore and interact with a virtual version of the real event. The system can show coverage maps (color-coded maps representing the player trajectories), comparative graphs of the speeds of the players, virtual replays of the serves, service landing positions for each player, etc.


Chapter 2 Tracking Algorithm

This chapter describes the algorithm used to locate and track the player and the court. The algorithm is based on [5] and can be divided into three parts.

1. Identifying court frames

'Court frames' are defined as frames from the video showing an overview of the tennis court taken from a camera high in the grandstand. In these frames both players are visible, as is the majority of the court. These frames can be identified automatically by searching for evidence of a tennis court appropriately sized and positioned in the image. The tennis court is detected using templates of the court corners, manually selected from several randomly chosen frames. Normalized cross-correlation is then used to find the most likely locations for each corner template, and these are compared to a rigid model describing the court corner locations in order to estimate whether or not the image contains a tennis court at an appropriate scale and location to be a 'court frame'. This identification is not perfect, and some mistakes are introduced. The solution to this problem is explained in chapter 5, section 5.2. Different kinds of frames can be observed in Figure 2.1.

2. Locating player in foreground

Once court frames have been extracted from the video, the location of the player in the foreground is determined. As with most sports footage, the players appear with strong contrast against the background of the playing field. To exploit this, a color model is constructed for the grass court surface. The location of the tennis court has already been determined; guided by this information, the portion of the image containing the court surface is extracted, and chrominance and luminance color models are constructed using the approach of Cai and Goshtasby [6]. Pixels that are well represented by the color models are removed and additional color models are built using only the remaining pixels; this process is repeated to build a set of color models that describe the court surface. Even though the players appear in the regions sampled when creating the color models, their area is so small compared to the comparatively uniform expanse of the court surface that they are not well represented by the resulting color models. Thus the color models can be used to distinguish the players from the background of the court surface.

The approximate apparent size of the players is known as a function of their location on the court (i.e., their distance from the camera) and, as this is Wimbledon, the players are dressed predominantly in white. Applying the color models to the frame to detect court-like colors gives an image with dark blobs in the court region at places where the court surface is occluded by the players. Potential player head and torso locations are found in this image using the Fast Radial Symmetry Transform (Loy and Zelinsky) [7] and by looking for dark blobs of an appropriate size. The additional knowledge that the players are dressed in white allows the torso candidates to be filtered further. Thus a player can be located in a given court frame.

Figure 2.1 Different frames: a) Detected court frame; b) Zoom of the player frame; c) Zoom of the player frame; d) Grandstand public frame.


3. Tracking player location

Once the player location is known, a template of the player's head is extracted and used to track the player in subsequent frames using normalized cross-correlation. The template is discarded and updated every ten frames to allow recovery from tracking failures. It is also discarded and updated if the correlation between the template and the tracked player falls below a certain threshold, or if a new sequence of court frames begins (identified by non-continuous frame numbers).

The outputs of the tracking algorithm are shown in Figure 2.2. From top to bottom and left to right, this figure shows:

- The located player in a court frame, with the potential head (red) and torso (green) markers.
- The zoom of the located player.
- Maximum of torso and head locations.
- X and Y head location.

Figure 2.2 Outputs of the tracking algorithm.
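The template handling in step 3 can be sketched as follows. This is a minimal illustration in Python, not the code used in the thesis: the zero-mean variant of normalized cross-correlation, the function names and the score threshold `min_score=0.6` are assumptions made for the example (the thesis only states that "a certain threshold" is used).

```python
import math


def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equally sized grey-level patches."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0


def must_refresh_template(frame_no, last_frame_no, template_age, score,
                          max_age=10, min_score=0.6):
    """True when the head template should be discarded and re-extracted.

    max_age=10 follows the text; min_score is a hypothetical value.
    """
    new_sequence = frame_no != last_frame_no + 1  # non-continuous frame numbers
    return new_sequence or template_age >= max_age or score < min_score


# Toy 3x3 "head" template and the same patch under brighter lighting.
template = [[10, 200, 10], [200, 255, 200], [10, 200, 10]]
brighter = [[v + 20 for v in row] for row in template]
score = ncc(brighter, template)
print(score, must_refresh_template(101, 100, 3, score))
```

Subtracting the mean makes the score insensitive to a uniform brightness change, which is why the brighter patch still matches its template.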


Chapter 3 Computing Feet Position

This chapter describes the approximation used to compute the feet position from the head and torso locations, which are the outputs of the tracking algorithm explained in chapter 2. A simple linear equation system is used, based on two different landmarks of the player on the court. The head and torso locations are taken as exact; the feet location is approximate. Both landmarks are taken from the same player, in our case John McEnroe.

A simple linear equation is used because the camera zoom is fixed, so the apparent distance between the head (or torso) and the feet varies with the position of the player on the court. Figure 3.1 shows the player near the baseline waiting for the serve, together with a zoom. Figure 3.2 shows the player near the net, together with a zoom. Once we have these two positions, the following linear equation system for the apparent height of the player is established:

$$
\begin{aligned}
\text{player\_height\_net} &= (A \cdot \text{player\_head\_net}) + B \\
\text{player\_height\_baseline} &= (A \cdot \text{player\_head\_baseline}) + B
\end{aligned}
$$

Solving this system gives A = 0.3333 and B = 6.2001. The feet position of the player in each frame is therefore approximated as:

$$
\begin{aligned}
\text{player\_feet\_X} &= \text{player\_torso\_X} \\
\text{player\_feet\_Y} &= \text{player\_head\_Y} + \left[ (A \cdot \text{player\_head\_Y}) + B \right]
\end{aligned}
$$

On the X axis the torso position is used, because torso movements are far more stable than head movements. For example, the head drops when the player is about to serve; if the head position were used, this movement would produce a false acceleration on the X axis and therefore a false stroke alarm.


On the Y axis, however, the feet position is calculated by adding the player height at each point to the head position. The results of this approximation are quite satisfactory. A more precise feet location could be obtained by image processing, but this is beyond the scope of this Master's thesis. Figure 3.3 illustrates the validity of our feet location approximation.
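The feet approximation of this chapter can be sketched in a few lines of Python. The calibration landmarks below are hypothetical pixel values chosen only for illustration (the thesis does not list the measured values); the defaults A = 0.3333 and B = 6.2001 are the ones reported above, and the function names are assumptions.

```python
def solve_height_model(head_y_net, height_net, head_y_baseline, height_baseline):
    """Solve height = A * head_y + B from the two calibration landmarks."""
    a = (height_net - height_baseline) / (head_y_net - head_y_baseline)
    b = height_net - a * head_y_net
    return a, b


def feet_position(head_y, torso_x, a=0.3333, b=6.2001):
    """Feet X is the torso X; feet Y is the head Y plus the apparent height."""
    return torso_x, head_y + (a * head_y + b)


# Hypothetical landmarks (pixels): the player looks smaller near the net.
A, B = solve_height_model(head_y_net=60.0, height_net=26.2,
                          head_y_baseline=180.0, height_baseline=66.2)
feet_x, feet_y = feet_position(head_y=150.0, torso_x=210.0)
print(A, B, feet_x, feet_y)
```

With only two landmarks the system is exactly determined, so no fitting is involved; more landmarks would require a least-squares solution instead.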

Figure 3.1 Head, torso and feet locations of the player in the baseline and ZOOM.



Figure 3.2 Head, torso and feet locations of the player in the net and ZOOM.



Figure 3.3 Head, torso and feet locations of the player.


Chapter 4 Projective Transformation

In this chapter, the objective is to show the tennis court lines and the feet location of the player in a bird's-eye view. This is carried out by a linear mapping of a planar surface, based on the theory explained in Appendix B - Projective Transformation Theory. The resulting linear system of equations is:

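$$
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
=
\begin{pmatrix}
h_{11} & h_{12} & h_{13} \\
h_{21} & h_{22} & h_{23} \\
h_{31} & h_{32} & h_{33}
\end{pmatrix}
\begin{pmatrix} \alpha \\ \beta \\ 1 \end{pmatrix}
\tag{4.1}
$$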

This states that the homogeneous image coordinates of a point on a planar surface and the homogeneous affine coordinates defined for the point on that surface are related by a linear transformation. If we can identify this linear mapping, it can be used directly to render the planar surface in 3D, thus avoiding triangulation. Because homogeneous coordinates are defined only up to scale, the equality in (4.1) holds up to an arbitrary scale factor, which we can cancel by division:

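$$
\begin{aligned}
x &= \frac{h_{11}\alpha + h_{12}\beta + h_{13}}{h_{31}\alpha + h_{32}\beta + h_{33}} \\
y &= \frac{h_{21}\alpha + h_{22}\beta + h_{23}}{h_{31}\alpha + h_{32}\beta + h_{33}}
\end{aligned}
\tag{4.2}
$$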


By simple manipulation we can write these equations as:

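$$
\begin{aligned}
h_{11}\alpha + h_{12}\beta + h_{13} - h_{31}\alpha x - h_{32}\beta x - h_{33} x &= 0 \\
h_{21}\alpha + h_{22}\beta + h_{23} - h_{31}\alpha y - h_{32}\beta y - h_{33} y &= 0
\end{aligned}
\tag{4.3}
$$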

We see that each corresponding point gives two linear homogeneous equations for determining the nine unknown parameters h11 ... h33 of the linear mapping matrix H. Since this matrix is only defined up to an arbitrary scale factor, there are actually only eight parameters. If we divide each term in the linear equations by the element h33, we are left with only eight unknowns, since the element h33 cancels. A homogeneous linear equation can always be normalized in this way, reducing the number of unknowns by one. Four corresponding points therefore provide the eight equations needed.

Solving this linear system of equations gives the parameters of the linear mapping matrix H. Then, for each frame, the new x and y positions of the court lines and the feet are calculated with equation (4.2). The results of our linear mapping transformation are shown, with some paths of the tennis player, in Figure 4.1 and Figure 4.2.
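As an illustration of equations (4.2) and (4.3), the sketch below estimates H from four point correspondences with h33 normalized to 1 and then maps a point. It is a minimal example under stated assumptions, not the thesis implementation: the image corner coordinates are hypothetical pixel values, the target is a singles court of 8.23 m by 23.77 m, and the helper names are invented for the example.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def estimate_homography(src_pts, dst_pts):
    """Estimate H (h33 = 1) from four (alpha, beta) -> (x, y) pairs, eq. (4.3)."""
    rows, rhs = [], []
    for (al, be), (x, y) in zip(src_pts, dst_pts):
        rows.append([al, be, 1.0, 0.0, 0.0, 0.0, -al * x, -be * x])
        rhs.append(x)
        rows.append([0.0, 0.0, 0.0, al, be, 1.0, -al * y, -be * y])
        rhs.append(y)
    h = solve_linear_system(rows, rhs) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(h, al, be):
    """Map (alpha, beta) to (x, y) with eq. (4.2)."""
    w = h[2][0] * al + h[2][1] * be + h[2][2]
    return ((h[0][0] * al + h[0][1] * be + h[0][2]) / w,
            (h[1][0] * al + h[1][1] * be + h[1][2]) / w)


# Hypothetical court corners in the image (pixels) and on the court (metres).
image_corners = [(100.0, 300.0), (540.0, 300.0), (200.0, 100.0), (440.0, 100.0)]
court_corners = [(0.0, 0.0), (8.23, 0.0), (0.0, 23.77), (8.23, 23.77)]
H = estimate_homography(image_corners, court_corners)
feet_on_court = apply_homography(H, 320.0, 300.0)  # middle of the near baseline
print(feet_on_court)
```

Here the source points are image coordinates and the targets are bird's-eye court coordinates, which is the direction needed to draw the player path on the court diagram. With more than four correspondences the same rows could be stacked and solved in the least-squares sense.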

Figure 4.1 Tennis player path (1) in a bird view of the court.


Figure 4.2 Tennis player path (2) in a bird view of the court.


Chapter 5 Stroke Detection

This chapter describes the acceleration-based algorithm and the procedures used in the project to detect the strokes of the tennis player located at the bottom of the image. The flow chart of the stroke detection is shown in Figure 5.1. Its boxes are grouped into larger blocks labeled with the corresponding section, following the logical development of this chapter: section 5.1 is Getting Acceleration with Least Square Filter, section 5.2 is Detecting Irrelevant Shots, section 5.3 is Generating Stroke Sequences and section 5.4 is Generating Output Data. Each section is outlined below.

Our first objective is to find the acceleration of the tennis player. The first step is to find the feet position using the approximation explained in chapter 3. A least square filter is then applied to smooth the speed and acceleration data obtained by simple differentiation of the feet position. This process is explained in section 5.1.

Additionally, since the tracking algorithm is not perfect, some irrelevant shots appear in our input data and therefore in our feet position, so an irrelevant shots detector is needed. Empirically, the best option is to set a threshold on the acceleration on the x axis, obtained by differentiating the feet position twice. This is described in section 5.2.

Section 5.3 shows our approach to generating the stroke sequences from the acceleration data obtained in section 5.1. This process is also based on empirical observation, with two different thresholds. After many tests, it can be concluded that it is quite satisfactory for our objective.

Finally, in section 5.4, the output data is obtained simply by eliminating the irrelevant shots (output of section 5.2) from the stroke sequence frames (output of section 5.3). Our final objective, all the sequences of stroke frames, is thus generated automatically.


The flow chart of the stroke detection is shown in Figure 5.1:

Figure 5.1 Stroke detection flow chart.


5.1 Getting Acceleration with Least Square Filter

The flow chart of this section is shown in Figure 5.2:

Figure 5.2 Getting acceleration with Least Square Filter flow chart.

Figure 5.3 plots the speed on the x axis, computed as the derivative of the feet position. The frames shown (0-270) cover the first ball of the first game of the set.

Figure 5.3 Plot of speed in x axis with derivative of the feet position.

[Boxes of the Figure 5.2 flow chart: Feet position -> Least square filter -> Speed data -> Least square filter -> Acceleration data]


It can be observed that there are many peaks, which can cause false alarms in our detection. The irregular plot must be smoothed before thresholds for the stroke detection can be established. If x is modeled with constant velocity V:

$$
x = Vt + k \tag{5.1}
$$

then our objective is to find the V and k that minimize the squared error:

$$
\min_{V,k} \sum_i \left( V t_i + k - x(t_i) \right)^2 \tag{5.2}
$$

Smooth speed data can be obtained from the feet position by solving the constrained linear least-squares problem

$$
\min_{\mathbf{x}} \frac{1}{2} \left\| C\mathbf{x} - d \right\|_2^2 \tag{5.3}
$$

where

$$
C = \begin{pmatrix}
t_{i-win/2} & 1 \\
\vdots & \vdots \\
t_{i-1} & 1 \\
t_i & 1 \\
t_{i+1} & 1 \\
\vdots & \vdots \\
t_{i+win/2} & 1
\end{pmatrix},
\qquad
\mathbf{x} = \begin{pmatrix} V \\ k \end{pmatrix},
\qquad
d = \begin{pmatrix}
x_{i-win/2} \\
\vdots \\
x_{i-1} \\
x_i \\
x_{i+1} \\
\vdots \\
x_{i+win/2}
\end{pmatrix}
\tag{5.4}
$$

Here the bold $\mathbf{x}$ is the vector of unknowns, not the position, and win is the size in frames of the window used for smoothing. Applying the least square filter with a window of 6 frames to the feet position on the x axis, the smooth speed on the x axis at frame i is the value of V. The same process is used for the acceleration data: the least square filter is applied to the smooth speed, this time with a window of 20 frames.
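The least square filter of equations (5.1) to (5.4) amounts to fitting a straight line over a sliding window and keeping its slope. The sketch below is one possible Python rendering, not the thesis code: it uses the closed-form slope of an ordinary (unconstrained) line fit, measures time in frames, and shortens the window at both ends of the sequence, all of which are assumptions of the example. The input needs at least two frames.

```python
def least_squares_slope(ts, xs):
    """Slope V of the line x = V*t + k that minimizes eq. (5.2) over (ts, xs)."""
    n = len(ts)
    t_mean = sum(ts) / n
    x_mean = sum(xs) / n
    num = sum((t - t_mean) * (x - x_mean) for t, x in zip(ts, xs))
    den = sum((t - t_mean) ** 2 for t in ts)
    return num / den


def windowed_derivative(values, win):
    """Smoothed derivative: line fit over frames i - win/2 .. i + win/2."""
    half = win // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(least_squares_slope(list(range(lo, hi)), values[lo:hi]))
    return out


# Synthetic feet positions with constant acceleration: x = 0.01 * t^2.
positions = [0.01 * t * t for t in range(60)]
speed = windowed_derivative(positions, win=6)        # window from the text
acceleration = windowed_derivative(speed, win=20)    # window from the text
print(speed[30], acceleration[30])
```

On this synthetic input the filter recovers the exact speed (0.02 per frame times the frame number) and acceleration away from the ends of the sequence, because a symmetric line fit of a parabola returns its derivative at the window centre.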


Figure 5.4 illustrates the comparison of both speeds and how the smoothing is working.

Figure 5.4 Comparison of normal speed and smooth speed.

Figure 5.5 illustrates the comparison of both accelerations and how the smoothing is working.

Figure 5.5 Comparison of normal acceleration and smooth acceleration.


5.2 Detecting Irrelevant Shots

The flow chart of this section is shown in Figure 5.6:

Figure 5.6 Detecting irrelevant shots flow chart.

As noted in chapter 2 (Tracking Algorithm), the tracking algorithm is not perfect. Some shots, such as crowd shots or zooms of the player, are wrongly inserted as player frames. After many tests, setting a threshold on the normal acceleration on the x axis proved to be the best way to remove these frames. Normal acceleration means the acceleration obtained by differentiating the feet position twice. The least square filter is not used in this case because the peaks that generated false alarms in section 5.1 are now very useful. In our case the threshold is 0.3, and 236 irrelevant shots are detected during the whole set.
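A minimal Python sketch of this detector follows. The threshold 0.3 is the value given above; comparing the magnitude of the acceleration, the alignment of each acceleration sample with a frame index, and the sample data are assumptions made for the example.

```python
def second_difference(values):
    """Unsmoothed ("normal") acceleration: the discrete derivative applied twice."""
    speed = [b - a for a, b in zip(values, values[1:])]
    return [b - a for a, b in zip(speed, speed[1:])]


def irrelevant_frames(feet_x, threshold=0.3):
    """Frame indices whose raw x acceleration magnitude exceeds the threshold."""
    acc = second_difference(feet_x)
    # acc[j] is centred on frame j + 1 (assumed alignment).
    return [j + 1 for j, a in enumerate(acc) if abs(a) > threshold]


# Hypothetical feet x positions: a cut to another shot makes the position jump.
feet_x = [4.0, 4.02, 4.05, 4.07, 9.5, 9.52, 9.55]
print(irrelevant_frames(feet_x))
```

A cut to a crowd or zoom shot makes the tracked position jump within a single frame, so the unsmoothed second difference spikes far above anything a running player can produce.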


Figure 5.7 illustrates two examples of irrelevant shots we need to remove.

Figure 5.7 Two examples of irrelevant shots.


5.3 Generating Stroke Sequences

The flow chart of this section is shown in Figure 5.8:

[Boxes of the flow chart: Acceleration data -> "> X threshold?" / "> Y threshold?" -> Remove consecutive frames -> Fill frames with -7 frames on the left, +25 frames on the right -> Sort and remove repeated frames -> STROKE SEQUENCES]

Figure 5.8 Stroke sequences flow chart.

Starting from the acceleration data obtained in section 5.1, the stroke sequences are extracted in 4 steps:

1) Thresholds: Two thresholds are necessary (acceleration on the x axis and on the y axis). A comparative study of different threshold values and their results is presented later in chapter 6 (Experimental Evaluation). For now, the values 0.004 for the x axis threshold and 0.01 for the y axis threshold are assumed. These values were chosen by observing the acceleration plots and the real video simultaneously.


The graphs of the first ball of the set are shown in Figure 5.9. There are six plots, three for the X axis and three for the Y axis. These graphs show frames vs. position (m), frames vs. speed (m/s) and frames vs. acceleration (m/s²). The acceleration thresholds used are also indicated. In Figure 5.9, a high level of the X POSITION indicates that the player is moving to the right, and a low level indicates that the player is moving to the left. In the same way, a high level of the Y POSITION indicates that the player is approaching the net, and a low level indicates that the player is approaching the baseline. Both scales are shown in meters, using the official size of the tennis court. See Appendix A - Tennis Court Diagram.

Figure 5.9 Graph of the first ball of the set: Frames vs. Position (m), Speed (m/s) and Acceleration (m/s²) in axis X and Y and the acceleration thresholds used.


Additionally, some important frames of the first ball sequence of the set are shown in Figure 5.10.

Figure 5.10 Some important frames of the first ball sequence of set.


2) Alarm frames: Once the thresholds are defined, we pick only the first frame of each run of consecutive frames that exceeds the thresholds (see Figure 5.9). These frames are called Alarm frames. We pick only the first frame because we want to generate stroke sequences in the output of the system.

3) Alarm vector: The 7 frames before and the 25 frames after each Alarm frame are joined with the Alarm frames to compose the Alarm vector.

4) Stroke sequences: Finally, the Alarm vector is sorted in ascending order. Since the separation between the Alarm frames was not checked beforehand, the Alarm vector may contain repeated frames after the filling with 7 frames on the left and 25 on the right. These repetitions are deleted. The result of this step is the stroke sequences.

Figure 5.11 compares the real strokes with the Alarm frames and then, after the whole process of section 5.3, the real strokes with the stroke sequences.
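The four steps above can be sketched in Python as follows. The thresholds (0.004 and 0.01) and the padding (7 frames before, 25 after) come from the text; treating a frame as over threshold when either axis exceeds its threshold, comparing magnitudes, and the sample accelerations are assumptions of this example rather than details confirmed by the thesis.

```python
def alarm_frames(acc_x, acc_y, thr_x=0.004, thr_y=0.01):
    """Step 2: first frame of every run of consecutive frames over a threshold."""
    alarms = []
    previous_over = False
    for frame, (ax, ay) in enumerate(zip(acc_x, acc_y)):
        over = abs(ax) > thr_x or abs(ay) > thr_y  # assumed: either axis
        if over and not previous_over:
            alarms.append(frame)
        previous_over = over
    return alarms


def stroke_sequences(alarms, n_frames, before=7, after=25):
    """Steps 3 and 4: pad each alarm frame, then sort and drop repeated frames."""
    frames = set()
    for a in alarms:
        frames.update(range(max(0, a - before), min(n_frames, a + after + 1)))
    return sorted(frames)


# Hypothetical smoothed accelerations for 100 frames.
acc_x = [0.0] * 100
acc_y = [0.0] * 100
acc_x[10] = acc_x[11] = acc_x[12] = 0.005   # one run on the x axis
acc_x[30] = 0.006                           # a second run starts on x ...
acc_y[31] = 0.02                            # ... and continues on y
alarms = alarm_frames(acc_x, acc_y)
frames = stroke_sequences(alarms, n_frames=100)
print(alarms, frames[0], frames[-1])
```

Collecting the padded frames in a set performs the removal of repetitions directly, so overlapping windows such as the two in this example merge into one continuous stroke sequence.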

Figure 5.11 Comparison between real strokes, alarm frames and stroke sequences.


5.4 Generating Output Data

Lastly, irrelevant shots should be ruled out from our stroke sequences. This process is shown as a flow chart in Figure 5.12.

Figure 5.12 Output data flow chart.


Chapter 6 Experimental evaluation

In this chapter, experimental results are presented and discussed. First, and only for the purpose of evaluating our algorithm, high-level annotations must be generated manually. As noted before, this task is cumbersome and time consuming: we had to check which frames contained strokes. Once the real strokes are known, we compute three measures for our system: trust detected strokes (%), irrelevant video (%) and false alarms.

Trust detected strokes (%)

Trust detected strokes is the percentage of real strokes that are included in our stroke sequences (the output data of the system). In our case, during the second set of Wimbledon Classic Matches - Borg vs McEnroe 1980, there were 100 strokes (forehands, backhands and volleys), excluding serves. Of these strokes, 86% are detected. See Figure 6.2 and Figure 6.3.

Irrelevant video (%)

Irrelevant video is the percentage of the video that the system rules out because it does not contain significant information for our objective. The percentage of irrelevant video is 57%; in other words, only 43% of the tennis video appears in the stroke sequences (the output data of the system).

False alarms

False alarms are alarms that the system generates because the acceleration of the player exceeds our thresholds although no stroke is associated with them. This is the most difficult problem to cope with. In this experiment, 119 false alarms were detected. The most likely reason for this high number is that the player often accelerates without hitting the ball.
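The three measures can be computed from the manual annotations as sketched below. All values are invented for illustration, the variable names are hypothetical, and the rule used to count a false alarm (no annotated stroke within the window of 7 frames before to 25 frames after the alarm frame) is an assumption of this sketch rather than a definition taken from the system.

```matlab
% Illustrative evaluation sketch; all values below are made up.
n_frames = 200;                              % length of the video in frames
real_strokes = [30 80 150];                  % manually annotated stroke frames
alarm_frames = [27 100 147];                 % alarm frames of the detector
stroke_frames = [20:52, 93:125, 140:172];    % resulting stroke sequences

% Percentage of real strokes that fall inside the stroke sequences.
detected = ismember(real_strokes, stroke_frames);
detected_pct = 100 * sum(detected) / numel(real_strokes);

% Percentage of the video that is ruled out.
irrelevant_pct = 100 * (1 - numel(stroke_frames) / n_frames);

% Assumed rule: an alarm is false when no real stroke lies in its window
% (7 frames before to 25 frames after the alarm frame).
false_alarms = 0;
for k = 1:numel(alarm_frames)
    win = (alarm_frames(k)-7):(alarm_frames(k)+25);
    if ~any(ismember(real_strokes, win))
        false_alarms = false_alarms + 1;
    end
end
```

In this toy example two of the three strokes are covered, the stroke at frame 80 is missed, and the alarm at frame 100 is a false alarm.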


Although the empirically chosen thresholds give good results, we also tested how the main features (trust detected strokes, irrelevant video and false alarms) change under a linear variation of the thresholds. The system was run again with the thresholds scaled from 1.5 * thresholds down to 0.5 * thresholds in steps of 0.1 * thresholds. The results are given in Table 6.1.

VARIATION OF    FALSE     TRUST           IRRELEVANT
THRESHOLDS      ALARMS    DETECTED (%)    VIDEO (%)

1.5*th           62       62              74
1.4*th           76       69              71
1.3*th           77       72              69
1.2*th           86       77              66
1.1*th          101       83              60
th (1)          119       86              57
0.9*th          123       87              52
0.8*th          126       83              47
0.7*th          122       83              42
0.6*th          140       86              36
0.5*th          157       84              31

(1) The value of th is X_threshold = 0.004 m/s² and Y_threshold = 0.01 m/s².

Table 6.1 Evolution of the main features (trust detected strokes, irrelevant video and false alarms) as the thresholds decrease.
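The sweep itself is a simple loop over scale factors. The sketch below runs it on a synthetic acceleration trace, so its numbers have nothing to do with Table 6.1; the signal, the variable names and the use of a single axis are assumptions of this illustration. It counts, for each factor, the frames above the scaled threshold and the number of runs of consecutive frames (one alarm per run).

```matlab
% Threshold sweep on a synthetic acceleration signal (illustrative only).
acc = 0.008 * sin((1:300) / 7);      % made-up acceleration trace
th = 0.004;                          % base threshold (value used for the X axis)
factors = (15:-1:5) / 10;            % 1.5, 1.4, ..., 0.5

n_over = zeros(size(factors));       % frames above the scaled threshold
n_alarms = zeros(size(factors));     % runs of consecutive frames above it
for k = 1:numel(factors)
    over = find(abs(acc) > factors(k) * th);
    n_over(k) = numel(over);
    if ~isempty(over)
        n_alarms(k) = sum([true, diff(over) ~= 1]);
    end
end
disp([factors', n_alarms', n_over']);
```

Lowering the threshold can only add frames to the set above it, so `n_over` never decreases along the sweep; the number of alarms carries no such guarantee, because neighbouring runs can merge.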

The evolution of trust detected strokes and irrelevant video is easier to see in a graph based on the data of Table 6.1; it is shown in Figure 6.1. The results are broadly what one would expect. As the thresholds decrease from 1.5 * th to 0.9 * th, the percentage of detected strokes rises from 62% to 87%; below that it stays between 83% and 87%. Conversely, the more we decrease the thresholds, the less irrelevant video we rule out (from 74% down to 31%). Furthermore, the chosen thresholds (X_threshold = 0.004 m/s² and Y_threshold = 0.01 m/s²) give a good balance between both features.

Additionally, we have evaluated precision and recall, the basic measures used in evaluating search strategies. Precision is the ratio of the number of relevant items retrieved to the total number of items retrieved, relevant and irrelevant. Recall is the ratio of the number of relevant items retrieved to the total number of relevant items in the database. Both are usually expressed as percentages. Our system reaches precision and recall rates of 39.3% and 86% respectively, which is quite encouraging.
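As a small worked example of these two definitions, the following sketch computes both measures from retrieval counts. The counts are invented and are not the counts of this experiment; the variable names are hypothetical.

```matlab
% Precision and recall from retrieval counts (the counts are invented).
relevant_retrieved   = 40;   % retrieved items that are relevant
irrelevant_retrieved = 10;   % retrieved items that are not relevant
relevant_total       = 50;   % relevant items in the database

precision = 100 * relevant_retrieved / (relevant_retrieved + irrelevant_retrieved);
recall    = 100 * relevant_retrieved / relevant_total;
fprintf('precision = %.1f %%, recall = %.1f %%\n', precision, recall);
```

Retrieving more items can raise recall while lowering precision, which is the trade-off observed when the thresholds are decreased.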



Figure 6.1 Evolution of the main features (trust detected strokes, irrelevant video and false alarms) as the thresholds decrease.



Figure 6.2 First 14 detected strokes



Figure 6.3 Not detected strokes


Chapter 7 Conclusion

In this thesis we have presented an automatic system for tennis video annotation. The main objective was to facilitate content-based retrieval based on high-level information. The developed tennis tracking system is completely automatic and shows promising detection results. It can follow the tennis player around the court and includes 86% of the strokes in the output video sequence.

The acceleration-based algorithm is the novelty of this system compared to previous studies. We did not use a position-based algorithm because we consider that position alone does not carry sufficient information for stroke detection. Specifically, in our case, head and torso locations were not significant for stroke detection. Between a speed-based and an acceleration-based algorithm, we chose the latter: most of the time there is an acceleration before a shot, whether the player is running fast or slowly.

In Chapter 6 we presented preliminary experimental results showing precision and recall rates of 39.3% and 86% respectively. The recall value is the most encouraging result. Previous studies [2, 3] obtained lower recall values, despite the use of additional audio information. The high recall is most likely due to the choice of an acceleration-based algorithm for stroke detection. Only a few strokes (14 out of 100) in the whole set were lost. After comparing the lost stroke frames with the tennis video, we conclude that in these cases the player was caught unaware, so the stroke was made without acceleration (see Figure 6.3).

The number of false alarms is high, hence the low precision. The false alarm frames were also analyzed in depth in the video, which revealed the main limitation of our system: accelerations of the player when attempting to make a stroke, or when abandoning the movement because the ball was out and the point was over, trigger stroke alarms. This problem cannot be solved with the tools used in this thesis. Racquet and ball detectors could be implemented to reduce the false alarms. Alternatively, audio information could be used in the same way as in [3] to extract ball-hitting times; the player's actions would then be identified by comparing ball-hitting times and accelerations with player and ball locations. All these improvements are compatible with our system but go beyond the scope of this Master's thesis, and they should be approached as future work.


Chapter 8 References

[1] G. Sudhir, John C. M. Lee, Anil K. Jain. “Automatic Classification of Tennis Video for High-level Content-based Retrieval”. IEEE Workshop on Content-based Access of Image and Video Databases, pp. 81-90, 1998.

[2] Corrado Calvo, Alessandro Micarelli, Enver Sangineto. “Automatic Annotation of Tennis Video Sequences”. L. Van Gool (Ed.): DAGM 2002, LNCS 2449, pp. 540-547, 2002. © Springer-Verlag Berlin Heidelberg 2002.

[3] Hisashi Miyamori. “Automatic Annotation of Tennis Action for Content-Based Retrieval by Integrated Audio and Visual Information”. E.M. Bakker et al. (Eds.): CIVR 2003, LNCS 2728, pp. 331-341, 2003. © Springer-Verlag Berlin Heidelberg 2003.

[4] Gopal Pingali, Agata Opalach, Yves Jean, Ingrid Carlbom. “Visualization of Sports using Motion Trajectories: Providing Insights into Performance, Style, and Strategy”. 12th Annual IEEE Visualization Conference (Vis 2001), San Diego, California, USA, October 21-26, 2001.

[5] Gareth Loy. An introduction to computer vision. Australian National University, Canberra, Australia. 2001.

[6] J. Cai and A. Goshtasby. “Detecting human faces in color images”. Image and Vision Computing, 18:63-75, 1999.

[7] Gareth Loy, Alexander Zelinsky. “A Fast Radial Symmetry Transform for Detecting Points of Interest”. IEEE PAMI, Vol. 25, No. 8, pp. 959-973, 2003.

[8] Stefan Carlsson. Geometric Computing in Image Analysis and Visualization. Department of Numerical Analysis and Computer Science, KTH, Stockholm, Sweden. 2002.


Appendix A – Tennis Court Diagram

Figure A.1 Tennis court diagram

< http://www.signaturelandscapes.com/images/diagram_tennis_court.gif>


Appendix B – Projective Transformation Theory

Next, the theory of linear mapping transformations is explained, based on [8]. Frequently we know that the 3D surface we are computing is actually a planar surface. This is common, e.g., when reconstructing buildings and other man-made objects. In this case we can render the whole plane more efficiently, avoiding the triangulation. This is because the projection of a plane into an image can be expressed as a linear mapping in homogeneous coordinates. Consider a perspective camera mapping:

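$$
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix} =
\begin{pmatrix}
m_{11} & m_{12} & m_{13} & m_{14} \\
m_{21} & m_{22} & m_{23} & m_{24} \\
m_{31} & m_{32} & m_{33} & m_{34}
\end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
\tag{B.1}
$$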

Suppose the point (X, Y, Z)^T lies in a plane containing the points (X1, Y1, Z1), (X2, Y2, Z2) and (X3, Y3, Z3). We can then express it as the linear combination:

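$$
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix} =
\begin{pmatrix} X_1 \\ Y_1 \\ Z_1 \end{pmatrix}
+ \alpha \begin{pmatrix} X_2 - X_1 \\ Y_2 - Y_1 \\ Z_2 - Z_1 \end{pmatrix}
+ \beta \begin{pmatrix} X_3 - X_1 \\ Y_3 - Y_1 \\ Z_3 - Z_1 \end{pmatrix}
\tag{B.2}
$$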

This can be written as:

$$
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix} =
\begin{pmatrix}
X_2 - X_1 & X_3 - X_1 & X_1 \\
Y_2 - Y_1 & Y_3 - Y_1 & Y_1 \\
Z_2 - Z_1 & Z_3 - Z_1 & Z_1
\end{pmatrix}
\begin{pmatrix} \alpha \\ \beta \\ 1 \end{pmatrix}
\tag{B.3}
$$

By adding a trivial identity we can write this as:

$$
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} =
\begin{pmatrix}
X_2 - X_1 & X_3 - X_1 & X_1 \\
Y_2 - Y_1 & Y_3 - Y_1 & Y_1 \\
Z_2 - Z_1 & Z_3 - Z_1 & Z_1 \\
0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} \alpha \\ \beta \\ 1 \end{pmatrix}
\tag{B.4}
$$

Combining this with equation (B.1) we get:

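$$
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix} =
\begin{pmatrix}
m_{11} & m_{12} & m_{13} & m_{14} \\
m_{21} & m_{22} & m_{23} & m_{24} \\
m_{31} & m_{32} & m_{33} & m_{34}
\end{pmatrix}
\begin{pmatrix}
X_2 - X_1 & X_3 - X_1 & X_1 \\
Y_2 - Y_1 & Y_3 - Y_1 & Y_1 \\
Z_2 - Z_1 & Z_3 - Z_1 & Z_1 \\
0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} \alpha \\ \beta \\ 1 \end{pmatrix}
\tag{B.5}
$$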


But this is the product of a 3×4 matrix and a 4×3 matrix, which is a 3×3 matrix. We can therefore write:

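$$
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix} =
\begin{pmatrix}
h_{11} & h_{12} & h_{13} \\
h_{21} & h_{22} & h_{23} \\
h_{31} & h_{32} & h_{33}
\end{pmatrix}
\begin{pmatrix} \alpha \\ \beta \\ 1 \end{pmatrix}
\tag{B.7}
$$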

This states that the homogeneous image coordinates of a point on a planar surface and the homogeneous affine coordinates defined for the point on that surface are related by a linear transformation. If we can identify this linear mapping, it can be used directly to render the planar surface in 3D, thus avoiding the triangulation.
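In practice the mapping can be identified from four point correspondences, for example the four corners of the court. The sketch below estimates the mapping in the image-to-court direction (the inverse of a plane-to-image homography is again a homography) by fixing the last matrix entry to 1 and solving the resulting 8×8 linear system. The image coordinates are invented, the variable names are hypothetical, and the backslash solver is a choice of this sketch; the court dimensions 10.97 m × 23.77 m are those used in Program 1 of Appendix C.

```matlab
% Estimate a plane-to-plane mapping H from four point correspondences
% (illustrative sketch; the image coordinates below are invented).
img_x = [100 300 380 20];          % court corners in the image (pixels)
img_y = [ 50  50 250 250];
crt_x = [0 10.97 10.97 0];         % same corners on the court plane (metres)
crt_y = [0 0 23.77 23.77];

% Each correspondence gives two linear equations in the eight unknowns
% h11..h32 (h33 is fixed to 1).
A = zeros(8, 8);
b = zeros(8, 1);
for k = 1:4
    A(2*k-1, :) = [img_x(k) img_y(k) 1 0 0 0 -img_x(k)*crt_x(k) -img_y(k)*crt_x(k)];
    A(2*k, :)   = [0 0 0 img_x(k) img_y(k) 1 -img_x(k)*crt_y(k) -img_y(k)*crt_y(k)];
    b(2*k-1) = crt_x(k);
    b(2*k)   = crt_y(k);
end
h = A \ b;
H = [h(1) h(2) h(3); h(4) h(5) h(6); h(7) h(8) 1];

% Map a point: multiply in homogeneous coordinates, then divide by the
% third component.
p = H * [img_x(3); img_y(3); 1];
mapped = p(1:2) / p(3);
```

Mapping one of the four image corners back through `H` should return the corresponding court corner, which is a quick sanity check on the estimate.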


Appendix C – MATLAB

MATLAB vs C++

First of all, why did we use MATLAB to program our system, and not C++? Here we explain the reasons for our choice. MATLAB provides a fast and efficient way to prototype algorithms. While the run-time of the resulting code is typically quite a bit slower than that of C++, it is much easier to debug and to visualize results in MATLAB. Additionally, it has many useful high-level functions that are easily combined, making it a more suitable platform for experimentation. C++ is less intuitive, typically much more time consuming to debug, and requires the programmer to deal with many additional issues (how to set up classes, what data types to use, linking to external libraries, etc.). However, C++ code can run much faster than MATLAB code, so it is well suited for the final implementation of a product or real-time system. C/C++ code can also be incorporated into a MATLAB project.

MATLAB programs

Program 1

%function to compute chapter 2, 3 and 4
function [data]=trackingalgorithm_computefeetposition_projectivetransformation(dir_name,datafile,frames_to_play)
%CHAPTER 2 - TRACKING ALGORITHM
% read data from file
data = dlmread(datafile);
frame_nums = 1:size(data,1);
if nargin<3
    frames_to_play = frame_nums;
end
frames = data(frame_nums,1);
torso_x = data(frame_nums,2);
torso_y = data(frame_nums,3);
head_xS = data(frame_nums,4);
head_yS = data(frame_nums,5);
court_feat_x = data(frame_nums,6:6+13);
court_feat_y = data(frame_nums,6+14:6+13+14);
torso_max = data(frame_nums,end-1);
head_max = data(frame_nums,end);
detected = zeros(size(frames));
% determine indices of beginning and end of each court sequence
frame_bool = zeros(1,max(frames));
frame_bool(frames) = 1;
% note: this variable shadows the built-in diff inside this function
diff = frame_bool(2:end) - frame_bool(1:end-1);


start_court_seq = diff(frames-1)==1;
start_ind = find(start_court_seq==1);
end_court_seq = diff(frames(1:end-1))==-1;
end_ind = find(end_court_seq==1);
if end_ind(1) < start_ind(1)
    start_ind = [1, start_ind];
end
if end_ind(end) < start_ind(end)
    end_ind = [end_ind, length(frames)];
end
% use median filter to smooth torso and head positions
fr = 2; % filter radius
for i = 1:length(start_ind)
    torso_x = medfilt1_section(torso_x,start_ind(i),end_ind(i),fr);
    torso_y = medfilt1_section(torso_y,start_ind(i),end_ind(i),fr);
    head_xS = medfilt1_section(head_xS,start_ind(i),end_ind(i),fr);
    head_yS = medfilt1_section(head_yS,start_ind(i),end_ind(i),fr);
end
% initialise figure
set(gcf,'DoubleBuffer','on'); %handle the figure
rrr = 50;
batch_size = 1;
data=zeros(1,39);
for qqq = 1:length(frames_to_play)
    i = frames_to_play(qqq);
    % to load in 'batch_size' frames at a time
    mov_i = mod(i-1,batch_size)+1;
    if mov_i==1
        read_to = min(i+batch_size-1, length(frames));
        im_array = readvideo(dir_name, frames(i:read_to));
    end
    im_full = double(im_array{mov_i})/255;
    im = im_full(1:2:end, 1:2:end, :);
    subplot(2,2,[1 3]); imagesc(im); axis image; axis off; xlabel(i); %adjust axis, remove labels
    if (torso_max(i) < 0.1) | (head_max(i) < 1)
        title(sprintf('frame %g, i = %g, %s',frames(i),i,'PLAYER NOT DETECTED'));
        detected(i) = 0;
    else
        detected(i) = 1;
        hold on;
        h = plot(torso_x(i),torso_y(i),'go'); set(h,'LineWidth',2); %green circle in torso
        h = plot(head_xS(i),head_yS(i),'ro'); set(h,'LineWidth',2); %red circle in head


        %CHAPTER 3 - COMPUTING FEET POSITION
        %h1=(a*y1)+b;
        %h2=(a*y2)+b;
        aeq=0.3333;
        beq=6.2001;
        matrix_feet_pos_3D(i,:)=[torso_x(i),(head_yS(i)+((head_yS(i)*aeq)+beq))];
        h=plot(matrix_feet_pos_3D(i,1),matrix_feet_pos_3D(i,2),'yo'); set(h,'LineWidth',2); hold off;
        title(sprintf('frame %g, i = %g, t max = %2.2g, h max = %2.2g',frames(i),i,torso_max(i),head_max(i)));
    end
    draw_court([court_feat_x(i,:)' court_feat_y(i,:)'], [1 1 0]);
    matrix_torso_pos_3D(i,:)=[torso_x(i),torso_y(i)];
    %CHAPTER 4 - PROJECTIVE TRANSFORMATION
    pts_new = [1,4,14,11,1];
    pts_square=[1,4,14,11];
    court_feat_x_new=court_feat_x(i,:);
    court_feat_y_new=court_feat_y(i,:);
    matrix_court_pos_x_3D(i,:)=[court_feat_x_new(1,pts_new)];
    matrix_court_pos_y_3D(i,:)=[court_feat_y_new(1,pts_new)];
    origin_square_x=[court_feat_x_new(1,pts_square)];
    origin_square_y=[court_feat_y_new(1,pts_square)];
    destination_square_x=[0 10.97 10.97 0];
    destination_square_y=[0 0 23.77 23.77];
    [a,b,c,d,e,f,g,h]=projective_transformation_param(origin_square_x,origin_square_y,destination_square_x,destination_square_y);
    [matrix_feet_pos_2D(i,1),matrix_feet_pos_2D(i,2)]=projective_transformation_applied(a,b,c,d,e,f,g,h,matrix_feet_pos_3D(i,1),matrix_feet_pos_3D(i,2));
    [matrix_court_pos_x_2D(i,1),matrix_court_pos_y_2D(i,1)]=projective_transformation_applied(a,b,c,d,e,f,g,h,matrix_court_pos_x_3D(i,1),matrix_court_pos_y_3D(i,1));
    [matrix_court_pos_x_2D(i,2),matrix_court_pos_y_2D(i,2)]=projective_transformation_applied(a,b,c,d,e,f,g,h,matrix_court_pos_x_3D(i,2),matrix_court_pos_y_3D(i,2));
    [matrix_court_pos_x_2D(i,3),matrix_court_pos_y_2D(i,3)]=projective_transformation_applied(a,b,c,d,e,f,g,h,matrix_court_pos_x_3D(i,3),matrix_court_pos_y_3D(i,3));
    [matrix_court_pos_x_2D(i,4),matrix_court_pos_y_2D(i,4)]=projective_transformation_applied(a,b,c,d,e,f,g,h,matrix_court_pos_x_3D(i,4),matrix_court_pos_y_3D(i,4));
    [matrix_court_pos_x_2D(i,5),matrix_court_pos_y_2D(i,5)]=projective_transformation_applied(a,b,c,d,e,f,g,h,matrix_court_pos_x_3D(i,5),matrix_court_pos_y_3D(i,5));
    subplot(2,2,[2 4]); hold on;
    h=plot(matrix_court_pos_x_2D(i,:),matrix_court_pos_y_2D(i,:),matrix_feet_pos_2D(i,1),matrix_feet_pos_2D(i,2),'yo'); axis ij; axis([-3 13 -10 33]); set(h,'LineWidth',2);
    hold off;
    %generate the data matrix with all the data
    data(i,:)=[frames(i),matrix_feet_pos_2D(i,1),matrix_feet_pos_2D(i,2),torso_x(i),torso_y(i),head_xS(i),head_yS(i), ...
        matrix_feet_pos_3D(i,1),matrix_feet_pos_3D(i,2),court_feat_x(i,:),court_feat_y(i,:),torso_max(i),head_max(i)];
    drawnow;
end


% function to compute H matrix of projective transformation
function [a,b,c,d,e,f,g,h]=projective_transformation_param(origin_square_x,origin_square_y,destination_square_x,destination_square_y)
alfa=[origin_square_x; ...
      origin_square_y; ...
      ones(size(origin_square_x)); ...
      zeros(size(origin_square_x)); ...
      zeros(size(origin_square_x)); ...
      zeros(size(origin_square_x)); ...
      (-origin_square_x.*destination_square_x); ...
      (-origin_square_y.*destination_square_x)]';
beta=[zeros(size(origin_square_x)); ...
      zeros(size(origin_square_x)); ...
      zeros(size(origin_square_x)); ...
      origin_square_x; ...
      origin_square_y; ...
      ones(size(origin_square_x)); ...
      (-origin_square_x.*destination_square_y); ...
      (-origin_square_y.*destination_square_y)]';
A=[alfa;beta];
B=[destination_square_x,destination_square_y]';
X=inv(A)*B;
a=X(1); b=X(2); c=X(3); d=X(4); e=X(5); f=X(6); g=X(7); h=X(8);

% function to compute feet location in bird view
function [x,y]=projective_transformation_applied(a,b,c,d,e,f,g,h,point_x,point_y)
x=((a*point_x)+(b*point_y)+c)./((g*point_x)+(h*point_y)+1);
y=((d*point_x)+(e*point_y)+f)./((g*point_x)+(h*point_y)+1);

% function to draw court
function draw_court(feat_pts,color)
%hold on;
% draw border
pts = [1,4,14,11,1];
h = plot(feat_pts(pts,1), feat_pts(pts,2), 'r-'); set(h,'Color',color);
% draw tram lines
pts = [2,12];
h = plot(feat_pts(pts,1), feat_pts(pts,2), 'g-'); set(h,'Color',color);
pts = [3,13];
h = plot(feat_pts(pts,1), feat_pts(pts,2), 'g-'); set(h,'Color',color);
% centre line
pts = [6,9];
h = plot(feat_pts(pts,1), feat_pts(pts,2), 'g-'); set(h,'Color',color);
% rest of service boxes
pts = [5,7];
h = plot(feat_pts(pts,1), feat_pts(pts,2), 'g-'); set(h,'Color',color);
pts = [8,10];
h = plot(feat_pts(pts,1), feat_pts(pts,2), 'g-'); set(h,'Color',color);
hold off;



% function to apply one dimensional median filter
function torso_x = medfilt1_section(torso_x,start_ind,end_ind,fr);
range = [repmat(start_ind,1,fr), start_ind:end_ind, repmat(end_ind,1,fr)];
temp = medfilt1(torso_x(range), 5);
torso_x(start_ind:end_ind) = temp(fr+1:end-fr);

% function to build scaling vector same height as image
function scaling_im = make_scaling_im(im,court_feat);
% build scaling vec same height as image
near_size = court_feat(14,1) - court_feat(11,1);
far_size = court_feat(4,1) - court_feat(1,1);
baseline(1) = mean(court_feat(1:4,2));
baseline(2) = mean(court_feat(11:14,2));
baseline = round(baseline);
court_height = baseline(2)-baseline(1);
d_scale = (near_size - far_size)/court_height;
scaling_vec = [1:size(im,1)]*d_scale;
scaling_vec = scaling_vec - scaling_vec(baseline(2)) + near_size;
scaling_vec = 1/near_size*scaling_vec;
scaling_im = repmat(scaling_vec',1,size(im,2));

%function to read a video into a cell array
function im_array = readvideo(filename,frames,override);
%READVIDEO read video file
% IM_ARRAY = READVIDEO(FILENAME, FRAMES) reads a video FILENAME into the
% cell array IM_ARRAY. The Ith element of IM_ARRAY{I} will then contain the
% rgb image corresponding to the Ith frame of the video. FRAMES should be
% vector of frames to be read. Note sequences of continuous frame are read
% more quickly than isolated frames.
%
% The code reads the video frames as 24-bit RGB images and uses the FFMPEG
% libraries libavformat and libavcodec to decode the video, see
% http://ffmpeg.sourceforge.net/ for more information and supported formats.
%
% IM_ARRAY = READVIDEO(FILENAME, FRAMES, 'override') to load more than 100
% frames at once. Loading too many frames uses a lot of memory and can hang
% your computer so use override with care. You can change the number of
% frames loadable before requiring 'override' by changing the max_limit
% variable in readvideo.m
%
% Video frame numbers are indexed from 0 whereas the cell array is indexed
% from 1.
%
% Gareth Loy & Josephine Sullivan, KTH, Stockholm, 2004.

% maximum number of frames that can be loaded at once without the override
max_limit = 100;
% check if file exists (readvideo_mex may seg fault if the file doesn't exist)
fid = fopen(filename, 'r');
if (fid == -1)
    error(sprintf('Can''t open file "%s" for reading;\n it may not exist, or you may not have read permission.', ...
        filename));


else
    fclose(fid);
end
if (nargin < 3) & (length(frames)>max_limit)
    error(sprintf('You have asked to load %g frames. \nTo load more than %g frames you must use the override option. \nSee help readvideo.', ...
        length(frames), max_limit));
end
im_array = readvideo_mex(filename,frames);

Program 2

%function to compute stroke sequences
function [stroke_sequences_data,alarms]=stroke_sequences(data,win1,win2,thresx,thresy)
%compute accelerations by least square filter
round1=round(win1/2);
round2=round(win2/2);
posx=data(:,3);
speedlsqx=lsqlin_filter(data(:,1),data(:,3),win1);
accelerationlsqx=lsqlin_filter(data(round1+1:end-round1,1),speedlsqx,win2);
posy=data(:,4);
speedlsqy=lsqlin_filter(data(:,1),data(:,4),win1);
accelerationlsqy=lsqlin_filter(data(round1+1:end-round1,1),speedlsqy,win2);
%thresholds
frames1=find(abs(accelerationlsqx)>thresx | abs(accelerationlsqy)>thresy);
hitframes=data(frames1,1)';
%alarm frames
Z=hitframes;
D=diff(Z);
A=Z(find(D~=1));
frames3=A;
%alarm vector
gf=zeros(length(frames3),1);
for i=1:length(frames3)
    if i==1
        gf=(frames3(i):frames3(i)+32);
    else
        gf=[gf,frames3(i):frames3(i)+32];
    end
end
gf2=gf-7;
%stroke sequences
A2=sort(gf2);
D2=diff(A2);
A3=A2(find(D2==1));
frames2=A3;



%irrelevant shots
irrelevant_shots_data=irrelevant_shots(data);
stroke_sequences_data2=separate(frames2,irrelevant_shots_data);
stroke_sequences_data3=separate(stroke_sequences_data2,irrelevant_shots_data-1);
stroke_sequences_data4=separate(stroke_sequences_data3,irrelevant_shots_data-2);
stroke_sequences_data5=separate(stroke_sequences_data4,irrelevant_shots_data-3);
stroke_sequences_data=stroke_sequences_data5;
%alarms (assigned to the declared output; MATLAB names are case-sensitive)
D2=diff(A3);
alarms=A3(find(D2~=1));

%function to compute irrelevant shots
function [irrelevant_shots_data]=irrelevant_shots(data)
posx=data(:,3);
speedx=diff(data(:,3));
accelerationx=diff(diff(data(:,3)));
posy=data(:,4);
speedy=diff(data(:,4));
accelerationy=diff(diff(data(:,4)));
irrelevant_shots_data2=find(abs(accelerationx)>0.3);
%index with the frames found above (the original referenced an undefined variable)
irrelevant_shots_data=data(irrelevant_shots_data2,1)';

%function to implement our least square filter
function [y]=lsqlin_filter(t,x,window)
%t is the horizontal axis
%x is the axis we want to approximate by least square problem
%window is the size of window we take. For example if we want from t-3 to t+3, window=6
for i=1:(length(t)-window)
    C1=t(i:i+window);
    C=[C1,ones(size(C1))];
    d=x(i:i+window);
    z=lsqlin(C,d);
    if i==1
        y=z(1,1);
    else
        y=[y,z(1,1)];
    end
end

%function to delete B data from A data
function [Z]=separate(A,B)
c=A;
for i=1:length(B);
    c(find(B(i)==A))=0;
end
salida=c(find(c~=0));
Z=salida;


Program 3

%function to compare speeds (taking the derivative vs smoothing with least square filter) and represent them
function speed_comparison(data,win1,win2)
posx=data(:,3);
speedx=diff(data(:,3));
accelerationx=diff(diff(data(:,3)));
posy=data(:,4);
speedy=diff(data(:,4));
accelerationy=diff(diff(data(:,4)));
round1=round(win1/2);
round2=round(win2/2);
posx=data(:,3);
speedlsqx=lsqlin_filter(data(:,1),data(:,3),win1);
accelerationlsqx=lsqlin_filter(data(round1+1:end-round1,1),speedlsqx,win2);
posy=data(:,4);
speedlsqy=lsqlin_filter(data(:,1),data(:,4),win1);
accelerationlsqy=lsqlin_filter(data(round1+1:end-round1,1),speedlsqy,win2);
plot(data(1:end-1,1),speedx);
hold;
plot(data(round1+1:end-round1,1)',speedlsqx,'r');

Program 4

%function to compare accelerations (taking the derivative twice vs smoothing with least square filter) and represent them
function acceleration_comparison(data,win1,win2)
posx=data(:,3);
speedx=diff(data(:,3));
accelerationx=diff(diff(data(:,3)));
posy=data(:,4);
speedy=diff(data(:,4));
accelerationy=diff(diff(data(:,4)));
round1=round(win1/2);
round2=round(win2/2);
posx=data(:,3);
speedlsqx=lsqlin_filter(data(:,1),data(:,3),win1);
accelerationlsqx=lsqlin_filter(data(round1+1:end-round1,1),speedlsqx,win2);
posy=data(:,4);
speedlsqy=lsqlin_filter(data(:,1),data(:,4),win1);
accelerationlsqy=lsqlin_filter(data(round1+1:end-round1,1),speedlsqy,win2);
plot(data(1:end-2,1),accelerationx);
hold;
plot(data(round1+round2+1:end-round2-round1,1)',accelerationlsqx,'r');
