VECTOR REPRESENTATION OF EMOTION FLOW FOR POPULAR MUSIC
Chia-Hao Chung and Homer Chen
National Taiwan University
Emails: {b99505003, homer}@ntu.edu.tw
ABSTRACT
The flow of emotion expressed by music through time is a
useful feature for music information indexing and
retrieval. In this paper, we propose a novel vector representation of emotion flow for popular music. It
exploits the repetitive verse-chorus structure of popular
music and connects a verse (represented by a point) and
its corresponding chorus (another point) in the valence-
arousal emotion plane. The proposed vector representation gives users an intuitive, at-a-glance visual snapshot of the emotion flow of a popular song, making it more effective than the point and curve representations of music emotion flow. Because many other genres also have a repetitive music structure, the vector representation has a wide range of applications.
Index Terms—Affective content, emotion flow,
music emotion representation, music structure.
1. INTRODUCTION
It is commonly agreed that music listening is an appealing
experience for most people because music evokes emotion
in listeners. As emotion conveyed by music is important
to music listening, there is a strong need for effective
extraction and representation of music emotion from the
music organization and retrieval perspective. This paper focuses on music emotion representation.
A typical approach to music emotion representation
condenses the entire emotion flow of a song to a single
emotion. This approach is adopted by most music emotion
recognition (MER) systems [1]–[3]. It works by selecting
a certain segment from the song and mapping the musical
features extracted from the segment to a single emotion.
The emotion representation is either a label, such as
happy, angry, sad, or relaxed, or the coordinates of a point
in, for example, the valence-arousal (VA) emotion plane
[4]. The former is a categorical representation, while the
latter is a dimensional representation [5]. A user can query songs through either form of single-point music emotion
representation, and a music retrieval system responds to
the query with songs that match the emotion specified by
the user [6], [7].
However, the emotion of a music piece varies as it unfolds in time [8]. This dynamic nature has not been fully explored for music emotion representation, perhaps because the emotion flow of music is difficult to qualify or quantify in data collection and model training [1]. The closest work is music emotion tracking [9]–[12], which generates a sequence of points at regular intervals to form an affect curve in the emotion plane [13].
Four examples are shown in Fig. 1, where each curve is
generated by dividing a full song into 30-second segments
with 10-second hop size and by predicting the VA values
of all segments. Each curve depicts the emotion of a song
from the beginning to the end. We can see that the
variation of music emotion can be quite complex and that
a point representation cannot properly capture the
dynamics of music emotion.
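As a concrete sketch of how such a curve is produced, the following Python fragment slides a 30-second window with a 10-second hop over a song and collects one VA point per segment. Here `predict_va` is a hypothetical stand-in for a trained MER regressor; the actual pipeline used in this paper is described in Section 4.

```python
import numpy as np

def affect_curve(song, sr, predict_va, win_s=30, hop_s=10):
    """Slide a win_s-second window with a hop_s-second hop over the
    waveform and predict one (valence, arousal) point per segment."""
    win, hop = int(win_s * sr), int(hop_s * sr)
    points = []
    for start in range(0, len(song) - win + 1, hop):
        segment = song[start:start + win]
        points.append(predict_va(segment, sr))  # -> (valence, arousal)
    return np.asarray(points)  # shape: (num_segments, 2)
```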
The representation of emotion flow for music should
be easy to visualize, yet sufficiently informative to convey
the dynamics of music emotion. The conventional point representation of music emotion is the simplest one; however, it carries no dynamic information about music emotion. On the other hand, the affect curve does show the dynamics of music emotion, but it is too complex for users to specify. Clearly, simplicity and
informativeness are two competing criteria, and a certain
degree of tradeoff between them is necessary in practice.
It has been reported that the emotion expressed by a
music piece has to do with music structure. Schubert et al.
[14] showed that music emotion flow can be attributed to
the changes of music structure. Yang et al. [15] reported that the boundaries between contrasting segments of a
music piece have rapid changes of VA values.

Fig. 1. Affect curves of four songs in the VA plane, where diamonds indicate the beginning and circles indicate the end of the songs. The black curve is Smells Like Teen Spirit by Nirvana, the blue curve is Are We the Waiting by Green Day, the green curve is Dying in the Sun by The Cranberries, and the red curve is Barriers by Aereogramme.

Wang et al. [16] showed that exploiting the music structure of popular
music for segment selection improves the performance of
an MER system. For popular music, the music structure
usually consists of a number of repetitive musical sections
[17]. Each musical section refers to a song segment that
has its own musical role such as verse or chorus. As
shown in Fig. 2, popular music typically has repetitive
verse-chorus structure and its emotion flow changes
significantly during the transition between verse and
chorus sections.
The burgeoning evidence of the strong relation between music structure and emotion flow motivates us to
develop an effective representation of emotion flow for
music retrieval. The proposed emotion flow representation
of a song is a vector in the VA emotion plane, pointing
from the emotion of a verse to the emotion of its
corresponding chorus. This representation is simple and
intuitive, which is made possible by exploiting the
repetitive property of music structure of popular music.
We focus on popular music in this paper because it has perhaps the largest daily listener base and because its structure normally falls within a finite set of well-known patterns [18]–[22].
In summary, the primary contributions of this paper
include:
• A study of the music structure of popular music, such as pop, R&B, and rock songs, is conducted to demonstrate the repetitive property of the music structure of popular music (Section 2).
• A novel vector representation of emotion flow for popular music is proposed, and a comprehensive comparison of the proposed vector representation with the point and curve representations is presented (Sections 3 and 4).
• A performance study is conducted to demonstrate the accuracy and effectiveness of the vector representation in capturing the emotion flow of a song (Section 5).
2. MUSIC STRUCTURE OF POPULAR MUSIC
Music is an art form of organized sounds. A popular song
can be divided into a number of musical sections, such as
introduction (intro), verse, chorus, bridge, instrumental
solo, and ending (outro) [18]. Such sections are arranged (possibly repeatedly) in a particular pattern referred to as
musical form. Recovering the musical form is called
music structure analysis and can be considered a segmentation process that detects the temporal position
and duration of each segment [19]. Here, we briefly
review the common musical sections and their musical
roles.
Intro and outro indicate the beginning and the ending
sections, respectively, of a song and usually only contain
instrumental sounds without singing voice and lyrics.
However, not every song has an intro or outro. For example, composers may place a verse or a chorus at the beginning or the end of a song to make the song sound special. The sections corresponding to verse or chorus normally express a flow of emotion as the music unfolds. The verse usually has low energy, and it is where the story of the song is narrated. Compared to the verse, the chorus is emotive and leaves a significant impression on listeners [20]. Other structural elements, such as the bridge and the instrumental solo, are optional and function as transitional sections that avoid monotonous composition and make the song colorful. A bridge provides a transition between other types of sections, and an instrumental solo is a transitional section consisting predominantly of instrumental sounds.
To investigate music structure, we conduct an analysis
of NTUMIR-60, which is a dataset consisting of 60 English popular songs [23]. Because the state-of-the-art
automatic music structure analysis is not as accurate as
expected [19], [21], we perform the analysis manually.
The results are shown in Table 1. We can see that verse and chorus indeed make up a large portion of a song and appear, on average, 3.13 and 2.37 times per song, respectively. This is consistent with the finding by musicologists that the combination of verse and chorus (aka the verse-chorus form) is a musical form widely used by songwriters of popular music [20]. It also suggests that verse and chorus are the most memorable sections of a song [22] and carry the main affective content of the song; the corresponding emotion flow gives listeners an affective sensation.
3. MUSIC EMOTION REPRESENTATION
In either the categorical or the dimensional approach, the
typical representation of music emotion represents the
affective content of a song by a single emotion. The
categorical approach describes emotion using a finite
number of discrete affective terms [24], [25], whereas the
dimensional approach defines emotion in a continuous
space, such as the VA plane [26], [27]. In this section, we first review the point and curve representations in the
dimensional approach and then present the vector
representation in detail.
Table 1. Music structure statistics of the 60 English popular songs of the NTUMIR-60 dataset.

                      Intro   Verse   Chorus   Others   Outro
Times per song         0.93    3.13     2.37     1.28    0.48
Proportion of song     0.09    0.44     0.29     0.11    0.07
Fig. 2. (a) Music structure of Smells Like Teen Spirit by Nirvana. (b) The arousal values and (c) the valence values of all 30-second segments of the song.
3.1. Point and curve representation
In the dimensional approach, the VA values of a music
segment can be predicted from the extracted features of
the music segment through a regression formulation of the
MER problem [26]. The emotion of the music segment is
represented by a point in the VA plane. Given all the music segments of a song, one may select one of them to represent the whole song. This gives rise to the single-point representation of music emotion in the VA plane, and a user only has to specify the coordinates of the point in the VA plane to retrieve the corresponding song. Although this method provides an intuitive way for music retrieval, as discussed in Section 1, it is impossible to represent the emotion flow of a whole song by a single point in the VA plane. In addition, it is difficult to determine automatically which music segment best represents the entire song.
By dividing a song into a number of segments and
predicting the VA values of each music segment [11], [12],
the collection of VA points forms an affect curve of the song in the VA plane. One may also represent valence and
arousal of the song separately, each as a function of time.
Although such affect curves can indeed show the emotion
flow of a song, the representation is too complex to be
adopted in a music retrieval system, because most users
are unable to precisely specify the affect curve of a song
even if it is a familiar one. In addition, how to measure the
similarity (or distance) between two affect curves with
different lengths is an open issue. Therefore, a simple
approach is desirable.
3.2. Vector representation
By exploiting the repetitive property of music structure of
popular music, we can represent the characteristic of
emotion flow in a much simpler way than the affect curve
representation. As discussed in Section 2, the verse-chorus
form is a common music structure of popular music and
has a strong relation to the emotion flow of a song.
Therefore, we leverage it to construct the emotion flow
representation of a song. The resulting representation is a
vector pointing from a verse to its corresponding chorus in
the VA emotion plane, as illustrated in Fig. 3.
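In symbols (notation introduced here only for clarity), if a verse and its corresponding chorus map to the points $\mathbf{p}_v = (v_v, a_v)$ and $\mathbf{p}_c = (v_c, a_c)$ in the VA plane, the proposed representation is the displacement

$$\mathbf{e} = \mathbf{p}_c - \mathbf{p}_v = (v_c - v_v,\; a_c - a_v),$$

whose orientation gives the direction of the emotion flow and whose length $\|\mathbf{e}\|$ gives its strength.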
Besides the positional information of the verses and
choruses in the VA plane, the vector representation
indicates the direction and strength of the emotion flow of
a song. Therefore, the vector representation is more
informative than the point representation. Since the two
terminals of a vector represent the emotions of a verse and
its corresponding chorus of a song, this representation is
more intuitive and simpler to use than the affect curve,
which does not explicitly present the structural information of a song.
Indeed, the vector representation expresses the main emotion flow of a song as characterized by the verse-chorus form. Table 2 shows a qualitative comparison of
the point representation, the affect curve representation,
and the proposed vector representation. We can see that
the vector representation of emotion flow is novel, simple,
and intuitive. Users can easily search songs by specifying
a vector in the VA plane as the query, and a music
retrieval system can quickly respond to the query
according to the proximity of a candidate song to the vector. In practice, a set of candidate songs can be
generated and ordered according to the proximity when
presented to the user. With this representation of music
emotion flow, many innovative music retrieval
mechanisms can be developed to match the needs of a
specific application.
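A minimal retrieval sketch along these lines is given below. The function name, data layout, and proximity measure (sum of Euclidean distances between corresponding endpoints) are illustrative assumptions; the paper does not prescribe a specific proximity computation.

```python
import numpy as np

def rank_by_vector_query(query, library):
    """Rank songs by proximity of their emotion vectors to a query vector.

    query:   (verse_point, chorus_point), each array-like (valence, arousal).
    library: dict mapping song id -> (verse_point, chorus_point).
    Proximity: sum of Euclidean distances between corresponding endpoints
    (one plausible choice among several).
    """
    qv, qc = np.asarray(query[0]), np.asarray(query[1])
    scores = {
        song: np.linalg.norm(qv - np.asarray(v)) + np.linalg.norm(qc - np.asarray(c))
        for song, (v, c) in library.items()
    }
    return sorted(scores, key=scores.get)  # closest songs first
```

The ranked list returned by such a function is exactly the ordered set of candidate songs described above.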
Although we focus on popular music in this paper, the
repetitive property of music structure can also be found in
other genres, such as the sonata form and the rondo form
of classical music [28]. The vector representation is good
for the visualization of the emotion flow of such music as well.
4. IMPLEMENTATION
The MER system described in [26] serves as the platform
to generate the VA values of musical sections (segments).
The MER system consists of two main steps, as shown in
Fig. 4. The first step performs regression model training,
and the second step takes musical sections as inputs and
generates their VA values. The details of regression model
training and vector representation generation are described
in this section.
Fig. 3. Illustration of the vector representation of music emotion flow. The two terminals of the vector represent a verse and its
corresponding chorus in the VA plane.
Table 2. A comparison of the point, the curve, and the proposed vector representations.

                          Point   Curve   Vector
Locational information      X*      X       X
Dynamic information                 X       X
Structural information                      X
Complexity                 Low    High   Medium

* An X mark means yes.
4.1. Regression model training
Adopting the dimensional approach for MER, we define valence and arousal as real values in [−1, 1] and formulate the prediction of VA values as a regression problem. Denote the input training data by $(\mathbf{x}_i, y_i)$, $1 \le i \le N$, where $\mathbf{x}_i$ is the feature vector of the $i$th input and $y_i$ is the real value to be predicted for the $i$th input. A regression model (regressor) is trained by minimizing the mean squared difference between the prediction and the annotated value [26].
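Written out, the training objective for a regressor $f$ over the $N$ training samples is

$$\min_{f}\;\frac{1}{N}\sum_{i=1}^{N}\bigl(f(\mathbf{x}_i) - y_i\bigr)^2.$$

(SVR, adopted below, in practice replaces the squared loss with an ε-insensitive loss plus a regularization term; the mean squared error remains the quantity used to judge the fit.)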
The dataset NTUMIR-60, which is composed of 60
English popular songs, is used for training and testing. For
fair comparison, each song is converted to a uniform format (22,050 Hz, 16 bits, and mono channel PCM WAV)
and normalized to the same volume level. Then, each song is manually trimmed to a 30-second segment for the subjective test and the feature extraction. In the
subjective test, each segment is annotated by 40
participants, and the mean of the annotated VA values is
calculated and used as the ground truth of the segment.
Then the MIRToolbox [29] is applied to extract 177
features including the following five types of acoustic
features: two dynamic features (the mean and the standard
deviation of root-mean-squared energy), five rhythmic
features (fluctuation peak, fluctuation centroid, tempo, pulse clarity, and event density), 142 spectral features (the
mean and the standard deviation of centroid, brightness,
spread, skewness, kurtosis, rolloff 85%, rolloff 95%,
entropy, flatness, roughness, irregularity, 20 MFCCs, 20
delta MFCCs, and 20 delta-delta MFCCs), six timbre
features (the mean and the standard deviation of zero
crossing rate, low energy, and spectral flux), and 22 tonal
features (12-bin chromagram concatenated with the mean
and the standard deviation of chromagram peak,
chromagram centroid, key clarity, HCDF, and mode). The quality of NTUMIR-60 for MER is evaluated and reported
in [23].
The regression models of arousal and valence are trained independently. For accuracy, support vector regression (SVR) [30], [31] with a radial basis function kernel is adopted to train the regressors. A grid search is applied to find the best kernel parameter γ and the best penalty parameter C [32], where γ ∈ {10⁻⁴, 10⁻³, 10⁻², 10⁻¹} and C ∈ {1, 10¹, 10², 10³, 10⁴}.
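A minimal scikit-learn sketch of this training setup is shown below. The placeholder data, variable names, and use of scikit-learn (rather than, say, LIBSVM directly) are assumptions for illustration; the parameter grids are those stated above.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 177))   # placeholder for the 177 extracted features
y = rng.uniform(-1, 1, size=60)  # placeholder for mean VA annotations in [-1, 1]

# Grids from the paper: gamma in {1e-4, ..., 1e-1}, C in {1, ..., 1e4}.
param_grid = {
    "gamma": [1e-4, 1e-3, 1e-2, 1e-1],
    "C": [1e0, 1e1, 1e2, 1e3, 1e4],
}

# One regressor per emotion dimension: run this separately for the
# valence and arousal annotations.
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=10, scoring="r2")
search.fit(X, y)
regressor = search.best_estimator_
```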
To evaluate the performance of the regressors, ten-fold cross validation is conducted: the whole dataset is randomly divided into 10 parts, nine of them for training and the remaining one for testing. The above process is repeated 50 times. The average performance in terms of the R-squared value [33] is 0.21 for valence and 0.76 for arousal. This result is comparable to those reported in previous work [23], [26].
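The evaluation protocol can be sketched as follows (reusing the placeholder `X` and `y` above; the per-run reshuffling and 50 repetitions follow the description in the text):

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold
from sklearn.svm import SVR

def repeated_cv_r2(X, y, repeats=50, folds=10, **svr_params):
    """Average R-squared over `repeats` runs of `folds`-fold cross
    validation, reshuffling the data before each run."""
    scores = []
    for seed in range(repeats):
        kf = KFold(n_splits=folds, shuffle=True, random_state=seed)
        for train_idx, test_idx in kf.split(X):
            model = SVR(kernel="rbf", **svr_params).fit(X[train_idx], y[train_idx])
            scores.append(r2_score(y[test_idx], model.predict(X[test_idx])))
    return float(np.mean(scores))
```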
4.2. Generating vector representation
The audio segmentation method proposed in [34] is
applied to segment each song of the NTUMIR-60 dataset.
All verses and choruses are manually selected from each song based on the segmentation result, and their VA values are estimated independently. In our current implementation, the vector representation of a song in the VA plane is generated by connecting the point
representing the average verse with that representing the
average chorus. Fig. 5 shows the resulting vector
representations of all songs of the NTUMIR-60 dataset.
We can see that each vector clearly describes the emotion
flow of a song. For example, a vector in the first quadrant
pointing to the upper right corner indicates that the
corresponding song drives listeners toward a positive and
exciting feeling, whereas a vector in the second quadrant
pointing toward the upper left corner indicates that the
song it represents would drive listeners toward a negative and aggressive mood. We also see that, for most songs, the arousal value of the representative chorus is higher than that of the corresponding verse; that is, the emotion vectors usually point upward. This reflects the fact
that the chorus is typically more exciting than its
corresponding verse [20].
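Under this averaging scheme, generating a song's vector representation reduces to a few lines. The sketch below assumes the verse and chorus segments are already given and reuses the hypothetical `predict_va` regressor from the earlier sketch:

```python
import numpy as np

def song_emotion_vector(verse_segments, chorus_segments, sr, predict_va):
    """Connect the average verse point to the average chorus point.

    Each segment's (valence, arousal) is predicted independently; the
    resulting vector points from the mean verse emotion (tail) to the
    mean chorus emotion (head).
    """
    verse_pts = np.array([predict_va(s, sr) for s in verse_segments])
    chorus_pts = np.array([predict_va(s, sr) for s in chorus_segments])
    tail = verse_pts.mean(axis=0)   # average verse emotion
    head = chorus_pts.mean(axis=0)  # average chorus emotion
    return tail, head
```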
5. EVALUATION
An experiment is conducted to evaluate the effectiveness
of the proposed vector representation of music emotion flow in comparison with two ad hoc methods. The
effectiveness of a method is measured in terms of the
approximation error between the method and the emotion
flow of a song. All songs of the NTUMIR-60 dataset are
considered in this experiment.
Fig. 5. The proposed vector representation provides an intuitive visualization of music emotion flow in the VA plane. This chart shows the emotion flows of all the songs in the NTUMIR-60 dataset. Each blue diamond represents the emotion of the verses of a song, each red circle represents the emotion of the choruses, and each diamond is connected to the corresponding circle by a line segment.

Fig. 4. Overview of an MER system.
As discussed in Section 1, the emotion flow of a song
is difficult for a subject to specify; therefore, we use the
affect curve generated by MER as the ground truth.
Specifically, the affect curve of each song is generated by
dividing the full song into 30-second segments with 10-
second hop size and by predicting the VA values of all
segments. Then, a k-means algorithm [35] is applied to
partition the collection of VA points into two clusters. The
center points of these two clusters are used as the reference for calculating the approximation error of the proposed vector representation and comparing it with that of the two ad hoc methods.
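A sketch of this ground-truth construction with scikit-learn (the input is the array of VA points forming a song's affect curve, e.g. from the sliding-window sketch in Section 1):

```python
from sklearn.cluster import KMeans

def reference_points(affect_points):
    """Partition the VA points of an affect curve into two clusters and
    return the two cluster centers used as the reference."""
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(affect_points)
    return km.cluster_centers_  # shape (2, 2): two (valence, arousal) centers
```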
The first ad hoc method randomly selects two 30-
second segments from a song and constructs a vector
representation from them. The second ad hoc method
selects the first segment from the 30th to 60th second of a
song and the second segment from the last 60th to last
30th second of the song. The VA values of the two
selected segments are predicted independently.
Two distance measures are considered: Euclidean
distance and cosine similarity [36]. The former measures the difference in length between two vectors, and the latter measures their angular difference.
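Under one plausible reading of this description (the exact computation is not spelled out in the text), the two error measures for a pair of vectors given by their endpoints can be computed as:

```python
import numpy as np

def vector_errors(tail_a, head_a, tail_b, head_b):
    """Length difference (via Euclidean norms) and angular difference
    (cosine distance) between two emotion-flow vectors."""
    va = np.asarray(head_a, dtype=float) - np.asarray(tail_a, dtype=float)
    vb = np.asarray(head_b, dtype=float) - np.asarray(tail_b, dtype=float)
    length_diff = abs(np.linalg.norm(va) - np.linalg.norm(vb))
    cosine_dist = 1.0 - np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb))
    return length_diff, cosine_dist
```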
The experimental results are shown in Table 3. Note
that the process of randomly selecting two segments from
a song is repeated 100 times and the average results are
presented in the first column of Table 3. Compared with
the two ad hoc methods, the vector representation has the
smallest approximation error in both Euclidean distance
and cosine distance. This shows the effectiveness of the
vector representation in capturing the emotion flow of
popular music.
In Fig. 6, the vector representation of the emotion flow of each song is plotted together with the affect curve of
the song and the emotion of each verse and chorus
identified for the song. We can see that most vectors are
located in the repetitive region of the affect curves. The
dangling parts of an affect curve normally correspond to
the intro and outro sections of the song, and hence they
are of no concern. We can also see that the verses are
located on one side of the affect curve of a song while the
choruses are located on the other side. Thus, using the
average verse and average chorus for the vector
representation can effectively characterize the affect curve and the emotion flow.
6. CONCLUSION
In this paper, we have investigated the repetitive property of music structure and described a novel approach that represents the emotion flow of popular music by a vector in the VA plane. The vector emerges from a representative verse of a song and ends at the corresponding chorus.

Fig. 6. Most vectors (represented by a diamond-circle pair) generated by our method are in the repetitive region of the affect curves (shown in grey). The hollow diamond represents the emotion of a verse, and the hollow circle represents the emotion of a chorus of a song.

Table 3. Results of Euclidean and cosine distances between the ground truth and three different approaches.

                     Random   F30L30¹   Vector
Euclidean distance    0.10     0.10      0.07
Cosine distance²      0.21     0.20      0.14

¹ F30L30 means that the first segment is from the 30th to the 60th second and the second segment is from the last 60th to the last 30th second of a song.
² Cosine distance is defined as 1 minus cosine similarity.

We have also compared the proposed vector representation with point and curve representations of music emotion and
shown that the proposed method is an intuitive and
effective representation of emotion flow for popular music.
This property of our method is supported by experimental
results. This work is motivated by the increasing need for
effective music content representation and analysis in response to the explosive content growth. With the
proposed vector representation, the proximity of emotion
flow between two songs can be easily measured, which is
essential to music retrieval, and many innovative music
retrieval applications can be developed.
REFERENCES
[1] Y.-H. Yang and H. H. Chen, Music Emotion Recognition, CRC Press, 2011.
[2] Y.-H. Yang and H. H. Chen, “Machine recognition of
music emotion: A review,” ACM Trans. Intell. Syst. Technol., vol. 3, no. 3, article 40, 2012.
[3] Y. E. Kim, E. M. Schmidt, R. Migneco, B. G. Morton, P. Richardson, J. Scott, J. A. Speck, and D. Turnbull, “Music
emotion recognition: A state of the art review,” in Proc. 11th Int. Soc. Music Inform. Retrieval Conf., pp. 255-266, Utrecht, Netherlands, 2010.
[4] J. A. Russell, “A circumplex model of affect,” J. Pers. Soc. Psychol., vol. 39, no. 6, pp. 1161-1178, 1980.
[5] T. Eerola and J. K. Vuoskoski, “A comparison of the discrete and dimensional models of emotion in music,” Psychol. Music, vol. 39, no. 1, pp. 18-49, 2010.
[6] X. Zhu, Y.-Y. Shi, H.-G. Kim, and K.-W. Eom, “An integrated music recommendation system,” IEEE Trans. Consum. Electron., vol. 53, no. 2, pp. 917-925, 2006.
[7] Y.-H. Yang, Y.-C. Lin, H.-T. Cheng, and H. H. Chen, “Mr. Emo: Music retrieval in the emotion plane,” in Proc. ACM Multimedia, pp. 1003-1004, Vancouver, Canada, 2008.
[8] E. Schubert, “Measurement and time series analysis of emotion in music,” Ph.D. dissertation, School of Music &
Music Education, University of New South Wales, Sydney, Australia, 1999.
[9] L. Lu, D. Liu, and H.-J. Zhang, “Automatic mood detection and tracking of music audio signals,” IEEE Trans. Audio, Speech, Language Process., vol. 14, no. 1, pp. 5-18, 2006.
[10] M. D. Korhonen, D. A. Clausi, and M. E. Jernigan, “Modeling emotional content of music using system identification,” IEEE Trans. Syst. Man, Cybern. B, Cybern., vol. 36, no. 3, pp. 588-599, 2006.
[11] R. Panda and R. P. Paiva, “Using support vector machines for automatic mood tracking in audio music,” Audio Engineering Soc. Convention 130, London, UK, 2011.
[12] E. M. Schmidt, D. Turnbull, and Y. E. Kim, “Feature selection for content-based, time-varying musical emotion regression,” in Proc. ACM Int. Conf. Multimedia Inform. Retrieval, pp. 267-274, Philadelphia, USA, 2010.
[13] A. Hanjalic and L.-Q. Xu, “Affective video content
representation and modeling,” IEEE Trans. Multimedia, vol. 7, no. 1, pp. 143-154, 2005.
[14] E. Schubert, S. Ferguson, N. Farrar, D. Taylor, and G. E. McPherson, “Continuous response to music using discrete emotion faces,” in Proc. 9th Int. Symp. Computer Music Modelling and Retrieval, pp. 1-17, London, UK, 2012.
[15] Y.-H. Yang, C.-C. Liu, and H. H. Chen, “Music emotion classification: A fuzzy approach,” in Proc. ACM Multimedia, pp. 81-84, Santa Barbara, USA, 2006.
[16] X. Wang, Y. Wu, X. Chen, and D. Yang, “Enhance popular
music emotion regression by importing structure information,” in Proc. Asia-Pacific Signal and Inform.
Process. Association Annu. Summit and Conf., pp. 1-4, Kaohsiung, Taiwan, 2013.
[17] B. Horner and T. Swiss, Key Terms in Popular Music and Culture, Blackwell Publishing, 1999.
[18] N. C. Maddage, C. Xu, M. S. Kankanhalli, and X. Shao,
“Content-based music structure analysis with applications to music semantics understanding,” in Proc. ACM Multimedia, pp. 112-119, NY, USA, 2004.
[19] J. Paulus, M. Müller, and A. Klapuri, “Audio-based music structure analysis,” in Proc. 11th Int. Soc. Music Inform. Retrieval Conf., pp. 625-636, Utrecht, Netherlands, 2010.
[20] C. Doll, “Rockin' out: expressive modulation in verse–chorus form,” Music Theory Online, vol. 17, 2011.
[21] J. B. L. Smith, C.-H. Chuan, and E. Chew, “Audio properties of perceived boundaries in music,” IEEE Trans. Multimedia, vol. 16, no. 5, pp. 1219-1228, 2014.
[22] M. Cooper and J. Foote, “Summarizing popular music via structural similarity analysis,” in Proc. IEEE Workshop on Applications of Signal Process. Audio and Acoustic, pp. 127-130, New Paltz, NY, USA, 2003.
[23] Y.-H. Yang, Y.-F. Su, Y.-C. Lin, and H. H. Chen, “Music
emotion recognition: The role of individuality,” in Proc. ACM Int. Workshop on Human-centered Multimedia, pp. 13-21, Augsburg, Bavaria, Germany, 2007.
[24] X. Hu, J. S. Downie, C. Laurier, M. Bay, and A. F. Ehmann, “The 2007 MIREX audio mood classification task: Lessons learned,” in Proc. 9th Int. Conf. Music Inform. Retrieval, pp. 462-467, Philadelphia, USA, 2008.
[25] C. Laurier, J. Grivolla, and P. Herrera, “Multimodal music mood classification using audio and lyrics,” in Proc. IEEE
7th Int. Conf. Machine Learning and Applications, pp. 688-693, San Diego, California, USA, 2008.
[26] Y.-H. Yang, Y.-C. Lin, Y.-F. Su, and H. H. Chen, “A regression approach to music emotion recognition,” IEEE Trans. Audio, Speech, Language Process., vol. 16, no. 2, pp. 448-457, 2008.
[27] E. M. Schmidt and Y. E. Kim, “Projection of acoustic features to continuous valence-arousal mood labels via
regression,” in Proc. 10th Int. Soc. Music Inform. Retrieval Conf., Kobe, Japan, 2009.
[28] M. Hickey, “Assessment rubrics for music composition,” Music Educators Journal, vol. 85, no. 4, pp. 26-33, 1999.
[29] O. Lartillot and P. Toiviainen, “A MATLAB toolbox for musical feature extraction from audio,” in Proc. Int. Conf. Digital Audio Effects, pp. 237-244, Bordeaux, France, 2007.
[30] A. J. Smola and B. Schölkopf, “A tutorial on support vector regression,” Stat. Comput., vol. 14, no. 3, pp. 199-222, 2004.
[31] C.-C. Chang and C.-J. Lin, “LIBSVM: A library for support
vector machines,” ACM Trans. Intell. Syst. Technol., vol. 2, no. 3, article 27, 2011.
[32] C.-W. Hsu, C.-C. Chang, and C.-J. Lin, “A practical guide to support vector classification,” Technical report, National Taiwan University, 2010 [Online]. Available at: http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf.
[33] A. Sen and M. S. Srivastava, Regression Analysis: Theory, Methods, and Applications, Springer Science & Business Media, 1990.
[34] J. Foote and M. Cooper, “Media segmentation using self-similarity decomposition,” in Proc. SPIE Storage and Retrieval for Multimedia Databases, vol. 5021, pp. 167-175, 2003.
[35] S. P. Lloyd, “Least squares quantization in PCM,” IEEE Trans. Inform. Theory, vol. 28, no. 2, pp. 129-137, 1982.
[36] L. Lee, “Measures of distributional similarity,” in Proc. 37th Annu. Meeting of the Association for Computational Linguistics on Computational Linguistics, pp. 25-32, PA, USA, 1999.