BARREAU Pierrick – Activity Recognition System for baby monitoring
31 August 2012

Activity Recognition System for Baby Monitoring

Tutor: Dr Cathal Gurrin
Practicum Coordinator
Dublin City University

31/08/2012

Final report for a final-year project at the INSA Lyon, Telecommunications, Services & Usages department
BARREAU Pierrick INSA Lyon, Telecommunications Dept.
DCU, MSc. In Electronic Commerce
Summary
1. Context of the study ........................................................................................................................ 2
1.1 Practicum presentation ........................................................................................................... 2
1.2 SensAnalytics: Goals and motivations ..................................................................................... 2
1.3 Market Research: Process and Findings .................................................................................. 2
2. Product specifications ..................................................................................................................... 4
2.1 Product definition .................................................................................................................... 4
2.2 Functional analysis .................................................................................................................. 5
3. State of the art Overview ................................................................................................................ 7
3.1 Baby activity recognition: Characteristics and Challenges ...................................................... 7
3.2 Current state-of-the-art .......................................................................................................... 8
3.3 Complete solution overview ................................................................................................. 17
4. Solution development & optimization ........................................................................................... 18
4.1 Development environment ................................................................................................... 18
4.2 The jAudio library .................................................................................................................. 19
4.2.1 Presentation and reliability .................................................................................................. 19
4.2.2 How will we use it? ............................................................................................................... 20
4.3 Solution development ........................................................................................................... 21
4.3.1 Recording sound with Android ............................................................................................. 21
4.3.2 Signal pre-processing system ............................................................................................... 22
4.3.3 Feature extractors ................................................................................................................ 22
4.3.4 Matching function ................................................................................................................ 22
4.4.1 Recognition testing ............................................................................................................... 22
4.4.2 Performance testing ............................................................................................................. 23
4.4.3 Future improvements ........................................................................................................... 24
5. Experience Feedback ..................................................................................................................... 25
Appendices ............................................................................................................................................ 26
References ............................................................................................................................................. 34
1. Context of the study
1.1 Practicum presentation
Currently pursuing a Master's in Electronic Commerce at Dublin City University, I carried out an
innovative company-creation project (called the Practicum) over the summer with a team of five people.
In order to meet my obligations as an INSA Lyon engineering student at the same time, I turned this
exercise into a Research & Development project fulfilling the requirements of both programmes.
Before detailing the content of this R&D project, let us introduce the Practicum
principles. Similar to the Innovation project conducted during the fourth year of the
Telecommunications department's curriculum, the Practicum aims to assess our understanding of
the subjects (both technical and business) taught during the year. The outcome is a start-up creation
project containing insights into the business aspects (business model and processes, marketing)
and a technical mock-up proving the viability of the concept supported by the team.
As part of an international team composed of three business- and two engineering-background students,
we developed a start-up called SensAnalytics. I took on the roles of quality manager, business analyst and
developer, which gave me a complete overview of the R&D process and allowed me to carry out the
technical implementation required for my engineering degree. Let us introduce the initial goals and
motivations of the team.
1.2 SensAnalytics: Goals and motivations
In our everyday lives, millions of events take place around and inside our bodies. We as humans
naturally capture and interpret some of this data; however, most of it is lost or not understood. Our
start-up SensAnalytics seeks to acquire these complex data and turn them into human-readable
communication.
Wanting to establish our product in an original market not yet touched by recent high
technologies, we aimed at delivering a product for the baby market. The most promising segment
appeared to be baby care, because it is parents' largest expense budget after food. From
there, we chose to design a baby monitor, because it is the most technology-related product in that
segment. Our initial idea was to build a sustainable technological advantage through the use of a
heterogeneous Wireless Sensor Network (WSN) gathering complex biometric data in order to draw
conclusions about the child's status (health, sleep cycle, emotions…) using machine learning
algorithms. However, our idea changed as we considered the market environment.
1.3 Market Research: Process and Findings
In order to best answer the needs of the parenting market it aims to serve, SensAnalytics' first task
was to analyse the worldwide market in order to identify and gather all the positive business
drivers that would help establish its products.
Following the process outlined in Figure 1, the whole team began by defining the core elements of
its offer. We summarized our project as an improvement to existing baby monitors that allows
parents to access valuable analytics at any time and any place. The core element of the value
proposition is access to comprehensive data that helps parents in their daily decision-making and
empowers the child to communicate with them. After determining our target market to be the UK on
the basis of a set of indicators, we surveyed a pool of first-time parents to collect their needs. We
then studied the marketplace, identifying the strengths and weaknesses of the competitors' offers,
and designed an offer that best fills the gap between the two.
In order to put our decisions in context, let us review our key findings.
- 78% of respondents rank security guarantees (reliable communication, medical certifications
…) as the most important characteristic of a baby monitor. Price is the second
determining factor in the purchase decision, with 82% ranking it above size and simplicity of use.
- 84% of our target market owns a smartphone, a higher penetration rate
than in the UK population as a whole (62%).
- 96% have already searched the Internet to check whether their child was developing normally for
their age. They also admit feeling stressed about their child's mental and physical development,
and frustrated by the poor results of a Google search.
- The baby monitor marketplace is crowded and involves big players such as Philips and
Motorola. It is difficult to establish a hardware product, as the R&D process involves heavy
costs and competitors have already rationalised their production-chain expenses.
- The baby monitor smartphone app market is competitive as well, but with low-quality
products and no established big players.
These findings allow us to analyse the market environment and define a product that best fills
the gap between users' needs and the current competitors' offers.
Figure 1: Market Study Process
2. Product specifications
2.1 Product definition
After analysing the results of the survey and the market analysis, we defined our final solution and
opted for a three-step product specification integrating the success factors presented earlier. As
illustrated below, these three steps take place over a three-year plan to continuously develop a
sustainable business.
In the following technical study, we only consider the app-to-app baby monitor. It works by using two
smartphones, one placed with the child as a monitor, and the other with the parent as a receiver.
The monitor detects auditory events, in particular crying and talking, and then notifies the receiver,
which alerts parents about the activity and allows them to listen to the monitor in real time. Unlike
traditional baby-monitoring solutions, our product works over Wi-Fi and 3G, so parents can monitor
their child from any location.
Thus, this first development step consists in offering a simple smartphone baby-monitor application
with the most important features, ensuring a secure, simple and efficient service that is easily
accessible and open to significant sophistication in future steps.
Features:
- Simple audio monitoring: the user can listen to their child at any time, anywhere.
- Two-way audio talk: the user can talk to their baby at any time, anywhere via their smartphone, and
the audio is played on the other smartphone's speakers.
- Alerts when the baby is awake or crying, with sensitivity control.
- Customizable events associated with actions: if the baby cries, the user can configure the
application to automatically play a song or another audio track.
- Sleep cycle analysis via auto-generated tables.
Figure 2: Product specification plan
Having identified the core product we aim to bring to the market, we start the development of
version 1.0 of the mobile application by conducting a functional analysis. This first step allows us to
write a complete specification with requirements and constraints, and to identify the different
development tasks the R&D team will have to perform.
2.2 Functional analysis
External analysis
The functions resulting from the external functional analysis are listed in the table below. In order to
translate their importance and give concrete objectives to the development teams, each of them is
given a weight, along with one or more indicators of success and a target range the product has to
comply with at the end of the development.
N°  | Service functions (FS)              | Weight | Objective indicator(s)                               | Range
Related to baby security
S1  | Help configuring settings           | 2      | Learning time                                        | 1-5 min
S2  | Acquire baby data                   | 5      | Data loss                                            | < 10%
S3  | Recognize baby activity             | 4      | Positive recognition rate / False positive rate      | > 70% / < 30%
S4  | Trigger actions accordingly         | 3      | Nb possible actions / Nb possible events             | 5 / 5
Related to baby evolution
E1  | Gather baby evolution information   | 2      | Nb milestone info / Nb medical info / Nb monitored info | 10 / 5 / all possible
E2  | Compare with norms                  | 4      | Nb comparison indicators / Comparison time           | all possible / < 10 s
N°  | Constraint functions (FC)                     | Weight | Objective indicator(s)                           | Range
Constraints related to parents
1.1 | Intuitive interface                           | 4      | Learning time / Languages supported              | 1-5 min / ENG, FR
1.2 | Interface accessible everywhere               | 3      | Access supported                                 | Wi-Fi, 3G, Internet
1.3 | Need reliability and security guarantees      | 3      | Application downtime / Security guarantees       | < 48 h/year / interference resilience, battery monitoring
1.4 | Need quality certifications                   | 2      | ISO norms                                        | ISO 9001
Constraints related to baby
2.1 | Should be harmless to health                  | 5      | Intensive test phases / Smartphone size / Toxic products | > 3 / H > 6 cm, W > 6 cm / 0%
2.2 | Should not be reachable from bed              | 2      | Min acquisition range                            | > 50 cm
2.3 | Should be strong                              | 2      | Resilience to shock                              | Yes
2.4 | Baby-side application should be silent        | 4      | Block calls & messages / Ensure silent mode      | Yes / 100% of use time
Constraints related to smartphones
3.1 | Should be hosted on smartphones with good battery life and computation power | 4 | Battery life / Processor power | > 10 h / > 1.5 GHz
3.2 | Should use little computation                 | 3      | CPU use rate                                     | < 15%
3.3 | Should use little battery                     | 3      | Battery impact                                   | < 10%
3.4 | Should always keep top priority               | 2      | Priority downtime                                | < 5%
Constraints related to server
4.1 | Should be hosted on a server with good response time and high availability | 4 | Server response time / Server downtime | < 1 s / < 48 h/year
4.2 | Should have enough space to store data        | 3      | Database initial capacity / DB capacity growth   | > 1 TB / > 5 TB/year
Constraints related to environment
5.1 | Should adapt to background noise              | 4      | SNR                                              | > 5 dB
5.2 | Should be resilient to interference           | 5      | Inter-symbol interference                        | BER < 15%
5.3 | Should provide good acquisition range         | 3      | Max acquisition range                            | > 5 m
Constraints related to price
6.1 | Cheap app purchasing cost                     | 5      | Price                                            | £0
6.2 | Should only use smartphones                   | 3      | Extra devices                                    | 0
Legal constraints
7.1 | Medical data stored anonymously               | 5      | Anonymous storage                                | Yes
7.2 | Sensitive data are secured                    | 5      | Development security guidelines / Communication protocol | Common Criteria level 2 / SSL-TLS
7.3 | Customers informed about stored data          | 5      | Terms and conditions                             | Yes
Constraints related to the mobile network
8.1 | Should always be connected                    | 4      | Alerts to parents                                | Yes
8.2 | Should ensure alerts are always forwarded     | 5      | Message loss                                     | < 2%
The “Octopus” chart resulting from the analysis is presented in Appendix 1.
Internal analysis
Having identified the market requirements and constraints, we then conducted a FAST analysis (see
Appendix 2), which gives us the different development parts that need to be considered. The diagram
highlights the main function of the system, performed through six service functions. Each of these
service functions involves technical functions internal to the system, corresponding to physical
and hardware solutions. Considering the set of identified technical functions and selected solutions,
the FAST diagram imposes several features for the implementation:
- An efficient network architecture,
- A reliable app-server communication over 3G network,
- An activity recognition system,
- An intuitive application design aimed at parents.
Partnering with another engineer, we split the workload into two parts. I chose to focus on the
network architecture and on the implementation of the activity recognition system. In the following,
I outline the R&D process associated with the ARS solution.
In order to explain our solution progressively, we first briefly highlight the main aspects and
challenges of the baby activity recognition field. With a clearer understanding of the domain, we
review the most useful sound features and the most common techniques to consider for waking and
cry detection. By assessing them against the product's functional requirements, we conclude on the
solution that best fits our needs.
3. State of the art Overview
3.1 Baby activity recognition: Characteristics and Challenges
Like every human activity recognition field, baby activity recognition faces the same difficulty:
translating the complex stimuli of the human body into computer-understandable data [1]. What our
brain performs innately through cortex and synapse interactions demands a great deal of
computational power from machines and a lot of careful study from scientists [2]. Moreover, in the
same way that our understanding of someone's behaviour is not entirely reliable, computer
algorithms can only be partially trusted when drawing conclusions about emotions or activity.
Two characteristics of the domain, inherent to human nature, therefore emerge:
- The complexity of the computation algorithms (regarding both implementation and execution).
- The uncertainty of the results, which makes any solution only partially reliable.
The fact that we only use sound as input for the recognition is also decisive. The auditory
environment surrounding the baby can at any moment be polluted by noises coming from different
sources that we cannot identify upfront [3]. These noises can trick the algorithms by presenting the
same characteristics as a baby cry or waking signal and thus alter the results (a situation called
false positives). They can also overlap and distort an actual baby signal, causing the algorithm to
miss the activity (a situation called false negatives) [4]. A major challenge for
our solution is therefore to reliably differentiate baby signals from any background noise. A first
processing step will thus isolate and amplify this signal to ensure that the recognition algorithms are
always passed signals of sufficient quality.
Other difficulties are inherent to the evolution of the voice in the early stages of the child's life. All
humans have different vocal attributes, but some frequencies are similar, and activity retrieval based
on sound can thus be done quite reliably. However, during the first 6 to 12 months of an infant's
existence, the voice evolves to reach its first stabilized form. This initial state greatly influences cry
and voice signals, and varies depending on ethnic origins [5-6], possible diseases [7-8],
the prenatal conditions of birth (drug [9] or alcohol [10] consumption by the
mother, pre-term/full-term birth [10-11]…) and auditory capabilities [12]. This can be seen as a problem,
as it creates a requirement for recognizing specific cases, but since the research field is currently well
documented on their sound characteristics, it can also be seen as a future opportunity, as the
application could be turned into a disease detector [7-8]. However, even given these slight changes,
the neonatal cry is a reasonably patterned vocal behaviour considered to have an innate biological
function.
To sum up, our product needs a pre-processing signal segmentation step and must take into account
the sound features that are relevant to the majority of the baby population. Considering that the
activity recognition aims at triggering alerts and actions, false negatives (when a real cry signal is not
detected) are far more critical than false positives (when the algorithm is tricked into recognizing a
fake cry signal), because parents would rather be alerted more often than needed than miss an
important moment. The solution will thus be selected according to three characteristics:
- Its false negative rate.
- Its ability to recognize specific cases.
- Its computational power demand.
To conclude, even if this research field is still full of technical challenges yet to be answered, some
solutions are able to recognize a baby's cry and waking status with a promising success rate. Let us
review the current state of the art and compare the candidates against our product's requirements.
3.2 Current state-of-the-art
Generally, automatic classification of infant activity is a pattern recognition problem. It
comprises two main stages: signal processing and pattern classification [13]. For our product,
however, a preliminary stage is added, which consists in detecting infant cries in audio
records. Once cry samples are detected and extracted from the audio records, the
signal processing and pattern recognition steps can be applied. The signal processing step aims at
normalizing, cleaning and filtering the raw signal before using suitable feature extraction techniques
to build a vector of relevant values. This vector then serves as input for the classification algorithms,
which compare it against their norms to decide whether a given activity is recognized.
Each step has its own set of technical solutions that can then be associated to form a
complete baby activity recognition system. We will analyse the different techniques available at each
stage and conclude on the most suitable combination for the product.
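The staged pipeline described above can be sketched as function composition. All four stage bodies below are placeholder assumptions of ours (the function names and thresholds are invented, not the product's); they only show how the stages plug together:

```python
def detect_cry_segments(audio):
    """Preliminary stage: extract candidate cry segments from the record."""
    return [audio]  # placeholder: treat the whole record as one candidate

def preprocess(segment):
    """Signal processing: normalise / clean / filter the raw samples."""
    peak = max(abs(s) for s in segment) or 1.0
    return [s / peak for s in segment]  # simple peak normalisation

def extract_features(segment):
    """Build a vector of relevant values (here: just the mean energy)."""
    return [sum(s * s for s in segment) / len(segment)]

def classify(features, norm=0.1):
    """Pattern classification: compare the vector against a norm."""
    return features[0] > norm

def recognise(audio):
    return any(classify(extract_features(preprocess(seg)))
               for seg in detect_cry_segments(audio))

print(recognise([0.0, 0.8, -0.7, 0.9, -0.6]))  # True: energetic segment
print(recognise([0.0, 0.0, 0.0, 0.0, 0.0]))    # False: silence
```

Each placeholder body would be replaced by one of the techniques reviewed below.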
3.2.1 Signal Pre-processing
The pre-processing step isolates the baby's sound signal by filtering and amplifying it. The challenge
here is to design a digital filter that can process the sound in real time without excessive
resources. We opt for a low-pass Finite Impulse Response (FIR) filter with its cutoff at the highest
frequency of the baby cry spectrum. As the fundamental frequency and the formants range from
0.1 to 10 kHz, we opt for a FIR filter at 10 kHz with an attenuation of -30 to -50 dB in the stop band
and a ripple of 3 dB in the pass band. It will filter out the high frequencies coming from mobile networks
or home appliances surrounding the baby [4-5]. By computing a time-domain convolution, we end up
with a filtered signal. Combined with a peak detector and an amplifier, the resulting sound is then
altered to amplify only the frequencies coming from the infant.
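As an illustration of this step, here is a minimal windowed-sinc low-pass FIR followed by a time-domain convolution. The 101-tap length and the Hamming window are our assumptions; the report only fixes the 10 kHz cutoff and the attenuation targets:

```python
import math

def fir_lowpass(cutoff_hz, fs_hz, num_taps=101):
    """Windowed-sinc low-pass FIR design (Hamming window)."""
    fc = cutoff_hz / fs_hz                      # normalised cutoff
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)                            # normalise DC gain to 1
    return [t / gain for t in taps]

def convolve(signal, taps):
    """Plain time-domain convolution, as described in the text."""
    return [sum(taps[j] * signal[i - j]
                for j in range(len(taps)) if 0 <= i - j < len(signal))
            for i in range(len(signal))]

fs = 44100
taps = fir_lowpass(10000, fs)
in_band  = [math.sin(2 * math.pi * 1000  * n / fs) for n in range(2000)]
out_band = [math.sin(2 * math.pi * 18000 * n / fs) for n in range(2000)]
kept    = max(abs(s) for s in convolve(in_band,  taps)[200:1800])
removed = max(abs(s) for s in convolve(out_band, taps)[200:1800])
print(kept > 0.9, removed < 0.05)               # True True
```

The 1 kHz tone (in band) passes nearly unchanged while the 18 kHz tone (out of band) is strongly attenuated; a real implementation would replace the naive convolution with an optimized routine.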
The audio also has to be sampled at an appropriate frequency in order to reduce the computational
complexity while keeping sufficient sound quality for the later cry detection and feature extraction
steps. An 8 kHz sampling frequency is generally used for infant speech analysis [14], but 20 kHz
sampling with 16-bit quantization has also been used successfully by Robb et al. for determining
the fundamental frequency and formants of baby cries [28]. Both will be tested during the
implementation.
3.2.2 Features extraction
Once the signal has been cleaned, we can study the most important features for baby activity
recognition and their extraction techniques. Most of the techniques found in the literature relate to
cry detection. The waking process is well described in theory but has not been addressed by
researchers; we therefore suggest our own way to detect it at the end of this section.
3.2.2.1 STE and STZC approach
Because of the physical limitations of human beings, speech analysis systems have to consider
short-duration speech segments. Indeed, speech over short time intervals can be considered
stationary. Overlapping these 10-30 ms segments by half is a common method to reduce the amount
of computation needed to analyse the infant cry signal [15].
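A framing helper of this kind might look as follows; the 16 ms length and 50% overlap come from the text, while the function name is ours:

```python
def split_frames(signal, frame_len, hop):
    """Overlapping frames: one frame of frame_len samples every hop samples."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

fs = 8000
frame_len = int(0.016 * fs)          # 16 ms -> 128 samples at 8 kHz
hop = frame_len // 2                 # 50 % overlap
one_second = [0.0] * fs              # stand-in signal
frames = split_frames(one_second, frame_len, hop)
print(len(frames), len(frames[0]))   # 124 128
```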
The combination of two mathematical tools may be used to detect cry events from a pre-processed
audio record: the Short-Time Energy (STE) and the Short-Time Zero Crossing (STZC).
Short-Time Energy (STE)
Short-time energy (STE) is defined as the average of the square of the sample values in a suitable
window. It can be mathematically described as follows [15]:
E(n) = (1/N) · Σ_{m=0}^{N-1} [ x(n+m) · w(m) ]²
where x is the signal and w(m) are the coefficients of a suitable window function of length N. As
previously mentioned, short-time processing of speech should use segments between 10 and 30 ms
in length. For signals sampled at 8 kHz, a window of 128 samples (a 16 ms segment) is
suitable. STE estimation is useful as a speech detector because there is a noticeable difference
in average energy between voiced and unvoiced speech, and between speech and silence
[15]. This technique is usually paired with short-time zero crossing for a robust detection scheme.
Figure 3: Pre-processing system overview
Formula 1: STE formula
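A direct reading of the STE definition can be sketched in a few lines; the 400 Hz test tone is an arbitrary stand-in for voiced speech, and the Hamming window is our choice of "suitable window":

```python
import math

def short_time_energy(frame, window):
    """Average of the squared, windowed samples over one frame."""
    n = len(frame)
    return sum((frame[m] * window[m]) ** 2 for m in range(n)) / n

N = 128                                          # 16 ms at 8 kHz
hamming = [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]
voiced  = [math.sin(2 * math.pi * 400 * n / 8000) for n in range(N)]
silence = [0.001] * N
print(short_time_energy(voiced, hamming) > short_time_energy(silence, hamming))  # True
```

The clear gap between the two energies is exactly the property the detector relies on.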
BARREAU Pierrick – Activity Recognition System for baby monitoring
31
ao
ût
20
12
10
Short-Time Zero Crossing (STZC)
Short-time zero crossing (STZC) is defined as the rate at which the signal changes sign. STZC
estimation is useful as a speech detector because there are noticeably fewer zero crossings in voiced
speech than in unvoiced speech. It can be mathematically described as follows [15]:

Z(n) = (1/2N) · Σ_{m=0}^{N-1} | sgn(x(n+m)) − sgn(x(n+m−1)) |

where sgn(x) = 1 if x ≥ 0, and −1 otherwise.
Figure 4 displays the results of short-time signal detection using both the STE and STZC tools. STZC
makes it possible to envelop periods when the signal changes sign at a significant rate (identified as
speech events), while STE detects significant normalized energy within these envelopes to conclude
on infant cry events.
In order to consistently pick up desired cry events, a desired cry was defined as a voiced segment of
sufficiently long duration and sufficiently noticeable STE. We can express this with two quantifiable
threshold conditions that must both be met to constitute a desired cry:
(1) Normalized energy > 0.05: to eliminate non-voiced artefacts and cry precursors
(breathing, whimpering).
(2) Signal envelope period > 0.1 seconds: to eliminate impulsive voiced artefacts such as
coughing.
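The STZC definition and the two thresholds above can be sketched together; the numeric inputs to `is_desired_cry` in the example are invented for illustration:

```python
def sgn(x):
    return 1 if x >= 0 else -1

def short_time_zero_crossing(frame):
    """Rate at which the frame changes sign."""
    n = len(frame)
    return sum(abs(sgn(frame[m]) - sgn(frame[m - 1]))
               for m in range(1, n)) / (2 * n)

def is_desired_cry(norm_energy, envelope_seconds,
                   energy_thresh=0.05, duration_thresh=0.1):
    """Both threshold conditions from the text must be met."""
    return norm_energy > energy_thresh and envelope_seconds > duration_thresh

# A cough-like event: energetic enough, but its envelope is too short.
print(is_desired_cry(norm_energy=0.2, envelope_seconds=0.04))  # False
print(is_desired_cry(norm_energy=0.2, envelope_seconds=0.5))   # True
```

This mirrors the logic used to discard the cough in Figure 4(a): the energy criterion alone is not enough without the duration criterion.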
In Figure 4, each cry envelope is bounded by the STZC, and the voiced portion of each cry is bounded
by where the STE meets the t = 0 axis. Figure 4(a) contains two false signals where the STZC suggests
an infant vocalization has occurred. However, there is no significant STE indicating the presence of a
voiced infant cry until the third vocalization. Even though this third vocalization meets the
normalized energy threshold of a voiced event, its duration does not meet the minimum time
period. This third vocalization was actually a cough.
Figure 4: Cry signal detection examples
Formula 2: STZC formula
The STZC in Figure 4(b) suggests that five vocalizations have occurred, four of which meet the
criterion for a voiced cry. However, two of these voiced vocalizations are impulsive and of too short a
duration, and are thus ruled out as cries by the envelope period threshold. The final
vocalization lacks the energy to be analysed as a cry event.
3.2.2.2 Frequency domain approach
Another approach to cry detection is to study the frequency domain of the signal by extracting:
- The vocal fundamental frequency (F0), which is the lowest frequency of the voice waveform.
- The formant frequencies, which indicate the acoustic resonances of the human vocal tract. They
are measured as amplitude peaks in the frequency spectrum of the sound (see Figure 5).
- The Mel-Frequency Cepstral Coefficient (MFCC) features, which capture the spectral
discriminant of each signal.
To study the spectral domain, a first step is to transform the signal representation from the time to
the frequency domain using the Discrete Fourier Transform (DFT). This pictures the main
frequencies of a signal. Our product requires a fast and computation-efficient algorithm to compute
the DFT; after reviewing a benchmark of existing Fast Fourier Transform (FFT) algorithms [16], we
choose the solution of Pei-Chen et al. [17], as it allows real-time FFT computation using few
computational resources.
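As a toy illustration of spectral F0 picking, here is a naive O(N²) DFT (which an optimized FFT such as [17] would replace) followed by a peak search over the magnitude spectrum; the test tone is placed exactly on a DFT bin so the peak is unambiguous:

```python
import math

def dft_magnitudes(x):
    """|DFT| of a real signal up to the Nyquist bin (naive O(N^2))."""
    N = len(x)
    mags = []
    for k in range(N // 2):
        re = sum(x[n] * math.cos(2 * math.pi * k * n / N) for n in range(N))
        im = -sum(x[n] * math.sin(2 * math.pi * k * n / N) for n in range(N))
        mags.append(math.hypot(re, im))
    return mags

fs, N = 8000, 256
f0 = 13 * fs / N                 # 406.25 Hz, exactly on DFT bin 13
x = [math.sin(2 * math.pi * f0 * n / fs) for n in range(N)]
mags = dft_magnitudes(x)
peak_bin = max(range(len(mags)), key=lambda k: mags[k])
print(peak_bin * fs / N)         # 406.25
```

In practice the peak search runs on a smoothed spectrum, as discussed next.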
Once the frequency domain of the signal is determined, the fundamental frequency and the
formants can be measured using a peak detector, i.e. a function that finds maxima in the value range.
To increase the reliability of the detection, some techniques smooth the signal to help real
maxima stand out. The Smoothed Spectrum Method (SSM) seems the most promising, with an
efficiency of 97.99% against 95.50% for a classical local-maximum detector and 96.86% for
cepstrum analysis [23]. The idea is to use a weighted addition to smooth the spectrum and increase
the detection reliability.
Figure 5: Signal frequency domain representation
To determine the MFCCs, we follow the more complex process proposed by Vempada et al. [18]:
- Divide the cry signal into a sequence of frames with a frame size of 20 ms and a shift of 10 ms.
- Apply a Hamming window over each frame.
- Compute the magnitude spectrum of each windowed frame by applying the DFT.
- Compute the Mel spectrum by passing the DFT output through a Mel filter bank.
- Apply the DCT to the log Mel-frequency coefficients to derive the desired MFCCs.
The computation of these coefficients is CPU-intensive and can only be supported in real time on
large, optimized infrastructures. Yet it offers interesting further development, as new
initiatives to improve the algorithm are under way and because it allows distinguishing the
cry cause among 3 main types (hunger, pain, wet diaper) with good reliability [18].
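The five steps above can be sketched end-to-end in simplified form. The filter count (20) and coefficient count (13) are common defaults we assume, not values from [18], and the naive DFT stands in for an optimized FFT:

```python
import math

def hamming(N):
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def power_spectrum(frame):
    """Naive DFT power spectrum up to the Nyquist bin."""
    N = len(frame)
    spec = []
    for k in range(N // 2 + 1):
        re = sum(frame[n] * math.cos(2 * math.pi * k * n / N) for n in range(N))
        im = -sum(frame[n] * math.sin(2 * math.pi * k * n / N) for n in range(N))
        spec.append(re * re + im * im)
    return spec

def mel(f):  return 2595 * math.log10(1 + f / 700)
def imel(m): return 700 * (10 ** (m / 2595) - 1)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters evenly spaced on the mel scale."""
    pts = [imel(i * mel(fs / 2) / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int(round(p * n_fft / fs)) for p in pts]
    bank = []
    for i in range(1, n_filters + 1):
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(bins[i - 1], bins[i]):          # rising edge
            filt[k] = (k - bins[i - 1]) / (bins[i] - bins[i - 1])
        for k in range(bins[i], bins[i + 1]):          # falling edge
            filt[k] = (bins[i + 1] - k) / (bins[i + 1] - bins[i])
        bank.append(filt)
    return bank

def mfcc(frame, fs, n_filters=20, n_coeffs=13):
    w = hamming(len(frame))
    spec = power_spectrum([s * wi for s, wi in zip(frame, w)])
    bank = mel_filterbank(n_filters, len(frame), fs)
    log_mel = [math.log(max(sum(f * s for f, s in zip(filt, spec)), 1e-12))
               for filt in bank]
    # DCT-II of the log mel energies
    return [sum(log_mel[m] * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m in range(n_filters)) for c in range(n_coeffs)]

fs = 8000
frame = [math.sin(2 * math.pi * 440 * n / fs) for n in range(160)]  # one 20 ms frame
coeffs = mfcc(frame, fs)
```

Production systems compute this per overlapping frame, which is exactly where the CPU cost noted above comes from.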
3.2.2.3 Rhythmic organisation of the sound
A final approach to cry detection is to consider it as a dynamic signal. Rhythmic organisation
analysis looks at the durations of the infant's noise bursts and pauses. By monitoring the
magnitude spectrum of the infant's expiratory sounds over time, an algorithm proposed by Sandford
Zeskind et al. [19] tries to find temporal feature correlations among different individuals. However,
even though this solution can run in real time without requiring efficient hardware, recent
investigations have shown that rhythmic organisation is not yet a reliable indicator for cry detection.
3.2.2.4 Waking detection system
In the literature, the detection of infant waking is mainly addressed by recognizing cries. However,
we believe parents can find value in knowing when their child is awake, not only to respond to crying
but also to feed or change them. Current research attempts focus mainly on sleep stage recognition
using complex biometric sensors such as electroencephalograms (EEG), accelerometers or Galvanic
Skin Response (GSR) sensors [20-21], but no dedicated auditory study of the temporal waking process
of an infant can be found.
According to Karraker et al. [22], the waking process has some detectable auditory events such as
giggles, sheet movements or shocks. These are sudden noises, and thus sudden changes in the signal
spectrum. This gave us the idea of monitoring changes in the signal spectrum over time. When sudden
peaks appear in several previously determined frequency ranges (e.g. the voice spectrum) at repeated
instants, conclusive evidence of an infant waking can be inferred. To support that idea,
one approach would be to compute the power spectral density (PSD) of the signal for every sample
window and keep track of past PSDs. If a sudden change appears at a specified frequency, a counter is
incremented. If no other change is detected within a given number of samples, the counter is reset to
zero. Otherwise, if the counter exceeds a threshold value, the waking activity is recognized.
The frequencies and thresholds involved in this solution will be defined during test sessions with
babies, as it is a rather empirical system. The assumptions behind this idea will also be tested further
with different baby noises and environments before it is added to the customer-facing application.
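The counter-based scheme described above might be sketched as follows. Every value involved (jump factor, reset window, required count, frequency band) is a placeholder to be tuned in the planned test sessions:

```python
def waking_detector(psd_frames, band, jump=4.0, window=10, needed=3):
    """Count sudden per-band PSD jumps; enough jumps close together -> awake.

    psd_frames: one PSD (list of per-bin power) per analysis window.
    band: (lo, hi) bin range to watch, e.g. the voice spectrum.
    """
    lo, hi = band
    count, since_last = 0, 0
    prev = None
    for psd in psd_frames:
        energy = sum(psd[lo:hi])
        if prev is not None:
            if energy > jump * max(prev, 1e-12):   # sudden peak in the band
                count += 1
                since_last = 0
            else:
                since_last += 1
                if since_last > window:            # quiet again: reset evidence
                    count = 0
        if count >= needed:
            return True
        prev = energy
    return False

quiet = [[0.1] * 8 for _ in range(30)]             # steady background
restless = [[0.1] * 8, [1.0] * 8] * 5              # repeated sudden bursts
print(waking_detector(quiet, band=(2, 6)))         # False
print(waking_detector(restless, band=(2, 6)))      # True
```

The reset-on-silence behaviour keeps one isolated noise (a door closing) from accumulating into a false waking alert.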
BARREAU Pierrick – Activity Recognition System for baby monitoring – 31 août 2012
3.2.2.5 Sound feature extractors' comparison
As previously said, any feature extractor can be coupled with any classification algorithm to form a
complete activity recognition system. Therefore, before examining the pattern detection techniques,
we decide below which features will be chosen for the final solution, considering our product
requirements: reliability, computational cost, adaptability and evolution potential towards new
features.
With a false positive/negative rate of 75.6/86.5% (measured on a real database) [15], the STE and STZC
approach seems promising, but it needs to be complemented by another solution in order to improve its
reliability. It can run with few resources if optimized, and it is adaptable and can evolve towards
further speech-related functionalities. It could, for example, detect when the baby pronounces its first
intelligible words [14].
The frequency domain approach is a very interesting solution. The fundamental frequency can be
detected at a reliable rate (97.9%) using the Smoothed Spectrum Method (SSM), and it helps detect a
cry successfully in 99% of cases (when associated with a neural network and evaluated on the Baby
Chillanto database) [24]. It does not require extensive computational resources; its only drawback is
that it can only be used for very specific detection tasks.
As for the formants, their detection characteristics are similar to those of the fundamental frequency.
Their computation can be done in real time but can impact smartphone performance; extensive
testing will be done on this part after implementation. The formants being a good indicator of the
human vocal tract, they could constitute the basis for further functionalities related to emotions and
speech. Moreover, by monitoring the signal frequency spectrum, changes can be detected, so the
waking recognition system could subsequently be implemented and tested.
The MFCCs are too CPU-intensive to be kept for implementation [24-26]. With the rise of
smartphones' CPU power in the upcoming years, a real-time computation could be imagined, but it is
currently unfeasible. They would, however, be very interesting for classifying cry causes and
emotions.
Finally, the rhythmic organisation is not chosen either, because of its low reliability rate (30-40%). If
further investigations are made in that area and new reliable temporal indicators are found, this
solution could become interesting, as it does not require much computational power [19]. It would
also bring more context to the cry signal and, with the study of expiratory bursts, open new
perspectives for detecting safety risks and diseases.
Considering these points, we choose to implement the extraction of the STE/STZC, the fundamental
frequency and the formants as the features used by the child status detection algorithms. The
comparison of the techniques is summarized in the chart below.
Technique | Reliability rate | Computation cost | Adaptability | Evolution potential
STE/STZC | + | + | + | +
Fundamental frequency (F0) | ++ | ++ | - | -
Formants | ++ | + | + | +
Mel frequency (MFCC) | ++ | -- | ++ | ++
Rhythm | -- | ++ | ++ | ++
3.2.3 Pattern recognition algorithms
Once the features have been extracted and form a vector of values, there are two main approaches to
recognize a pattern from this data:
- A static matching function, which compares the values against known, identified norms and gives
a matching score between the signal and an ideal activity-related signal. If the score is better
than a decision threshold, the activity is recognized.
- Machine learning algorithms, which, rather than processing the data directly, act as a black box
that learns its own classification and regression model from previous outputs and concludes
directly on a recognized activity, given the vector's position in the data space.
Let us further detail and compare them.
3.2.3.1 Matching functions
The design of a matching function is empirical and involves three decisions that can severely impact its
performance. Firstly, different functions can be employed. The simplest one adapted to our case
is the weighted differential addition (see Formula 3). Given the set of features we have previously
determined (normalised energy (STE), signal envelope period (STZC), fundamental frequency (F0) and
formants (F1-Fx)), the function is the weighted sum of the differences between the feature values of a
given signal and those of an ideal activity signal. If the result of that function is lower than a threshold,
then the activity is recognized.

w = Σ (i = 1 to n) wi · |vsignal,i − vnorm,i|

where w is the output score, n the number of features, wi the weight of feature i, vsignal,i the feature
value for the studied signal, and vnorm,i the corresponding value for the ideal signal (Formula 3:
weighted differential addition formula).
Once the function has been defined, the feature weights and the threshold value must be
determined. The weights can be attributed considering the importance of each feature for the
activity recognition and its reliability (increased if reliable, lowered otherwise), but also the usual gap
between the signal and the norm, in order to reduce the unwanted impact of a non-determinant
feature difference. The threshold value is defined through testing and experiments in order to lower
the false positive and false negative rates.
Once the matching function has been designed, it can be deployed anywhere. Considering our small
set of features, it uses little computational power. Its only drawback is that the determination of
the weights and threshold values must be repeated every time a new feature is added. However, once
the matching function class has been implemented, it can be reused for other functionalities without
any additional development.
3.2.3.2 Machine learning algorithms
Machine learning is the branch of artificial intelligence that studies and develops architectures and
algorithms to equip an agent (a machine which is usually a computer) with certain behaviour and an
ability to build internal models from empirical training data in order to solve a certain task [27].
Among these, we focus on the Support Vector Machine (SVM) and the Neural Network (NN), which
are often used for auditory event and activity classification.
BARREAU Pierrick – Activity Recognition System for baby monitoring
31
ao
ût
20
12
15
Figure 7: Neural network model
Support Vector Machine (SVM)
An SVM is a binary classifier, i.e. it can be used to conclude whether a given activity is recognized or
not. It contains an internal regression model which separates the value space into two parts: the
recognized pattern space and the rest. When the SVM receives a feature vector, it projects it into the
value space and decides on the recognition of the activity from the position of the resulting point
relative to the regression model. Training algorithms (see Figure 6) are used to make the SVM "learn"
(i.e. build) this internal regression model.
These algorithms are based on training samples. At each iteration, the SVM is presented with a set of
sample feature vectors and their associated activity (e.g. crying / not crying). By processing these
examples, the SVM maps them into its internal value space and computes the regression model (e.g.
segmenting the space between the crying and not-crying activities). Once the SVM is trained, it can
recognize the pattern in unlabelled feature vectors in a time- and computation-efficient manner.

Figure 6: Support Vector Machine principles overview
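To make the decision step concrete, here is a minimal sketch of how a trained linear SVM could classify a feature vector; the weight vector and bias shown are placeholders standing in for the result of a real training run, not values from our system:

```java
// A trained linear SVM keeps a weight vector w and a bias b; classification
// simply checks on which side of the separating hyperplane w.x + b = 0 the
// feature vector falls.
public class LinearSvm {
    private final double[] weights;
    private final double bias;

    public LinearSvm(double[] weights, double bias) {
        this.weights = weights;
        this.bias = bias;
    }

    /** true = pattern recognized (e.g. "crying"), false = everything else. */
    public boolean classify(double[] features) {
        double score = bias;
        for (int i = 0; i < weights.length; i++) {
            score += weights[i] * features[i];
        }
        return score > 0;
    }
}
```

The training algorithms mentioned above are precisely what produces the weights and bias; classification itself stays cheap, which is why a trained SVM fits on a smartphone.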
Neural Network (NN)
A neural network is a multi-categorical classifier. It is composed of an interconnected, multi-layered
set of entities called "neurons", where each neuron can be "activated", outputting its "activity",
which is a level of confidence in the recognition of a pattern. Each neuron is connected to the
neurons of the next layer by weighted links.
The whole concept relies on the "firing" function φ(). Each neuron j first computes the weighted sum
of the outputs y1 … yn of the previous layer,

xj = Σ (i = 1 to n) wi · yi

and when this sum exceeds a certain threshold, the neuron is activated and outputs the value

yj = φ(xj)

as explained in Figure 7. The decision-making algorithm is thus the combination of multiple neurons'
decisions. Initially, scientists configure the network hierarchy and the "firing" rule of each neuron.
The training algorithms then "teach" the network by changing the weights assigned to the links.
For each data sample, some neurons will "fire" (i.e. tell the next level that they recognize the
pattern) and some will not. During training, data samples are labelled as belonging to one category
or another. Their features are extracted and serve as inputs to the neural network. The objective of
the training algorithms is then to minimize the quadratic error of the output, by reducing the weights
of the neurons that were wrong and increasing those of the others, depending on the level of
confidence they output.
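The firing rule described above can be sketched directly from the two formulas of Figure 7, using a simple step function as one possible choice of φ (the class is an illustration, not code from our application):

```java
// One neuron: xj = sum(wi * yi) over the previous layer's outputs,
// then yj = phi(xj) with a simple step "firing" rule.
public class Neuron {
    private final double[] weights;  // one weight per input link
    private final double threshold;  // firing threshold used by phi()

    public Neuron(double[] weights, double threshold) {
        this.weights = weights;
        this.threshold = threshold;
    }

    private double phi(double x) {
        return x > threshold ? 1.0 : 0.0;  // step function: fire or stay silent
    }

    /** Outputs of the previous layer in, activity yj out. */
    public double activate(double[] inputs) {
        double x = 0.0;
        for (int i = 0; i < weights.length; i++) {
            x += weights[i] * inputs[i];
        }
        return phi(x);
    }
}
```

A full network is then layers of such neurons; training adjusts the weights array of each one.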
3.2.3.3 Pattern recognition algorithm comparison
As previously said, the matching function is an interesting solution because it does not require much
computational power and, after a careful design and testing stage, can achieve pattern recognition
with good false positive/negative rates (around 90% [24]). Moreover, once the Java class has been
implemented, it can easily be reused for other functionalities. Its only drawback is that the design
stage (determination of the feature weights and threshold values) must be performed again for each
new functionality, thus requiring expensive experiments and testing.
On the other hand, machine learning algorithms feature better recognition rates (from 95 up to 99%
[24]). Their main advantage is that once the neural network (NN) has been designed, and once the
SVM or NN training algorithms and procedures have been defined, deploying new functionalities on
top of these solutions only requires computation power and time; no extensive development is
needed. The only drawback is that every time a new feature is considered, the form of the input
value vector changes, so the algorithm has to be retrained from scratch to rebuild its internal model.
Moreover, the complexity of the training algorithms (e.g. back-propagation for feed-forward
networks) leads to strong requirements on the computational power of the infrastructure that
supports the operation [27]. However, once trained, these algorithms can recognize complex
patterns quickly and efficiently and can be deployed on a smartphone platform. It is thus possible to
run the training algorithms on an on-demand cloud computing platform to meet this need and avoid
huge infrastructure costs at the foundation of our start-up. Deployment and integration of pre-trained
recognition algorithms would then be performed directly within the mobile application.
Nevertheless, the training phase also raises ethical issues. To achieve good enough performance, the
algorithms need to be trained with samples recorded in real-life conditions. But as stated by Robb et
al. [28], the techniques employed for eliciting cry vocalizations and their subsequent use for research
purposes can be subject to ethical questioning. Moreover, this could become a communication and
marketing problem with parents if the technical principles at the root of our technology are publicly
denounced. However, auditory databases of baby cries have already been gathered by scientists and
could be used. These solutions therefore raise extra requirements for transparency and a careful
definition of the sample collection techniques, along with a dedicated risk strategy.
To conclude, considering that our actual product only has to recognize 3 activities and would mainly
perform cry recognition, we choose SVMs over NNs because they need less design effort and less
computation power for training. If the importance of the app's activity recognition functionality
increases in the future, we might consider migrating to a neural network solution. The chart below
summarizes the assessment of the different solutions. The matching function is chosen over the SVM
for a first attempt because it is simpler to implement and performs most of the job. It allows a faster
release of the first version of the mobile app and does not raise the extra risk of communication
problems related to ethical issues.
Technique | Matching function | Support Vector Machine | Neural Network
Reliability | + | ++ | ++
Computation power | ++ | + | -
Evolution potential | - | + | +
Ethical issues | ++ | -- | --
Development simplicity | ++ | - | --
3.3 Complete solution overview
When aggregating all these design choices into one solution, we end up with the Activity Recognition
System architecture detailed in Figure 8 below.

Figure 8: Activity recognition system overview
4. Solution development & optimization
The most difficult part of designing an activity recognition system is the choice of its components
(the features, their extractors, and the pattern recognition technique), which we justified previously.
Once they are set, developing the application itself first involves configuring the development
environment, which we review below, and then implementing the solution. In our case, the
implementation was eased by the high-level functions provided by the Android API. We therefore
detail the development part less, as it presents fewer technical challenges and has less impact than
the previous part.
4.1 Development environment
Before starting to develop any application, it is important to install and configure an appropriate
development environment. Google provides (in addition to the operating system) a set of tools for
Android application development projects.
The development environment we use is composed of several layers with specific roles:
- A Java runtime environment - JRE
- A Java development Kit - JDK
- An Android development Kit - SDK
- A development platform – Eclipse
- Modules and libraries related to the project
- An Android device
The architecture of the development environment is detailed in the following diagram.

Figure 9: Development Environment overview
Each of these components has a specific role and provides a set of services to the layer above.
Ultimately, we use Eclipse with add-ons for Android development and a library called jAudio to
perform the auditory feature extraction (see §4.2). We define the components in the chart below.
JRE: The JRE (Java Runtime Environment) is a Java virtual machine that allows executing Java applications on a device. Most users already have a JRE installed on their computer, especially to browse the Internet and run specific Java applications. However, a JRE does not allow creating Java applications.

JDK: The JDK (Java Development Kit) includes development tools such as compilers, debuggers and Java libraries used to create Java applications. A JDK usually includes a JRE, so installing a JDK is sufficient to have a JRE.

SDK: The SDK is a development kit provided by Google that includes a set of tools for Android development projects. In particular, it includes APIs (sets of classes with functions available to developers), code examples, technical documentation, and an emulator. It is freely available on Google's website.

Eclipse: Eclipse is a multi-language software development environment comprising an integrated development environment (IDE) and an extensible plug-in system. It is written mostly in Java and can be used to develop applications in Java and, by means of various plug-ins, in other languages.

ADT: Google provides an Eclipse-compatible module (the Android Development Tools) to assist Android application development projects.

Additional Java libraries: It is possible to import additional Java libraries into the project to take advantage of existing Java classes and functionalities. For example, in our project we imported the jAudio library to perform audio processing.

Test devices: An application can be tested either on the emulator provided by the Android SDK or directly on an Android smartphone. An emulator must be configured before use, which notably means specifying the screen type, the size of the SD card, etc.
4.2 The jAudio library
4.2.1 Presentation and reliability
jAudio is a framework for feature extraction designed to eliminate the duplication of effort in
calculating features from an audio signal. This system meets the needs of audio processing
researchers by providing a library of analysis algorithms suitable for a wide array of sound
analysis tasks. It provides an easy-to-use GUI that makes the process of selecting the desired
features straightforward, as well as a command-line interface to drive its services from scripts.
The common process of using jAudio is as follows. The system takes a sequence of audio files as input.
In the GUI, users select the features they wish to have extracted (letting jAudio take care of all
dependency problems) and either execute directly from the GUI or save the settings for batch
processing. The output is either an ACE XML file or an ARFF file, depending on the user's preference.
jAudio was designed around several technical specifications and design decisions that address
common audio feature extraction issues. Many of these design decisions match our needs for the
implementation of the cry detection system presented above:
Java based
jAudio is implemented in Java in order to capitalize on Java's cross-platform portability and design
advantages. A custom low-level audio layer was implemented to supplement Java's limited core
audio support and to allow those writing jAudio features to deal directly with arrays of sample
values rather than with low-level issues such as buffering and format conversions. By importing the
jAudio library into our development environment, we can directly use the implemented jAudio
classes and feature extraction methods. This gives us a homogeneous, Java-based code base across
the Android application and the back-end audio processing implementation.
XML & ARFF output
jAudio supports multiple output formats, including both its native ACE XML format and the ARFF
format. Both provide structured data that can easily be extracted and used as input for matching
functions.
Handling dependencies
In order to reduce the complexity of calculations, it is often advantageous to reuse the results of an
earlier calculation in other modules. jAudio provides a simple way for a feature class to declare which
features it requires in order to be calculated. An example is the magnitude spectrum of a signal: it is
used by a number of features, but only needs to be calculated once. Just before execution begins,
jAudio reorders the feature calculations such that every feature is computed only after all of its
dependencies. Furthermore, the user need not know the dependencies of the selected features: any
feature selected for output will automatically and silently calculate its dependent features as needed,
without replication. This is especially interesting in terms of calculation speed and power
consumption.
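The dependency handling described here amounts to ordering the feature calculations topologically. A sketch of the idea, assuming acyclic dependencies and using illustrative feature names rather than jAudio's actual classes:

```java
import java.util.*;

// Order feature calculations so that every feature runs only after its
// dependencies (e.g. the magnitude spectrum before anything derived from it),
// and so that a shared dependency is scheduled only once.
public class FeatureScheduler {
    public static List<String> order(Map<String, List<String>> deps) {
        List<String> ordered = new ArrayList<>();
        Set<String> done = new HashSet<>();
        for (String feature : deps.keySet()) {
            visit(feature, deps, done, ordered);
        }
        return ordered;
    }

    private static void visit(String f, Map<String, List<String>> deps,
                              Set<String> done, List<String> ordered) {
        if (done.contains(f)) return;       // already scheduled, never replicated
        done.add(f);
        for (String d : deps.getOrDefault(f, Collections.emptyList())) {
            visit(d, deps, done, ordered);  // schedule dependencies first
        }
        ordered.add(f);
    }
}
```

With two features both depending on "MagnitudeSpectrum", the spectrum is computed once and first, which is exactly the saving jAudio provides.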
Extensibility
Effort was made to make it as easy as possible to add new features and their documentation to
the system. An abstract class is provided that includes everything needed to implement a feature.
Moreover, meta-features are templates that can be applied to any feature to create new ones;
examples include Derivative, Mean, and Standard Deviation. Each of these meta-features can be
applied automatically to all features without the user needing to create the derived features
explicitly. This allows us to build exactly the features we previously selected.
4.2.2 How will we use it?
As previously stated, jAudio allows us to define our own feature extractors using its low-level audio
characteristic extraction library. This allows us to implement the STE and STZC computation
components. There are already built-in functions to extract the fundamental frequency and the
formants. However, we will need to run tests on a benchmark of smartphones to verify that the
jAudio solutions do not use too much computational power and memory.
Once the feature extractors are developed, the Eclipse IDE allows us to link all the libraries
transparently. Thanks to the ADT plug-in, direct interaction between the Android SDK and our
software development platform is possible. And once the jAudio library has been added to the Java
build path, code gathering Android, jAudio and standard Java libraries compiles successfully.
4.3 Solution development
The principal component of an Android application is the Activity. It is a single, focused thing that the
user can do, and the entry point into the SDK: from that central point, one can invoke any object
necessary for the application. In order to start the development with a good overview of which
objects needed to be implemented, we first drew a UML class diagram (Figure 10). It separates the
concerns between 4 main components:
- The activity recognition system, and how to record sound using the Android SDK.
- The pre-processing system, and how to filter the sound to improve its quality.
- The feature extractors, and how to use jAudio to quickly craft our own extractors.
- The matching functions, and how to adapt the weights and threshold to make recognition more reliable.

Figure 10: UML class diagram
4.3.1 Recording sound with Android
The AudioRecord object is provided by Android to pull sound directly from any audio source of the
smartphone. We configure it to take the microphone as input (MediaRecorder.AudioSource.MIC).
As previously said, we try two different sampling frequencies: when using the Eclipse emulator
(AVD), the sampling frequency is set to 8 kHz, as the emulator cannot support more; when deployed
on a real-world smartphone, it is set to 20 kHz. As for the audio encoding, we choose to quantize on
16 bits (using AudioFormat.ENCODING_PCM_16BIT), as this proved to give good enough results for
Robb et al. [28]. Finally, we set the channel configuration to CHANNEL_IN_MONO to effectively pull
voice sound from the microphone. Thus we end up with:

audioRecord = new AudioRecord(MediaRecorder.AudioSource.MIC, 8000,
        AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, bufferSize);

Figure 11: Audio recording source code

This creates a stream to which we can apply the noise suppression (via the NoiseSuppressor object),
echo cancellation (via the EchoCanceler object) and signal normalization (via the
AutomaticGainControl object) pre-processing algorithms. We can then store this stream as an array
of shorts in a buffer in order to forward it to the filtering section and then to the feature extractors.
The full code used to record sound on Android is provided in Appendix 4.
4.3.2 Signal pre-processing system
In order to further remove the noise coming from the external environment, we filter the signal
using a digital FIR filter at 10 kHz, with an attenuation of -30 to -50 dB in the stop band and a ripple of
3 dB in the pass band (see §3.2.1). We use Matlab's sptool functionality, configured to employ the
Hanning window method, to generate it. This produces a table of coefficients that we store in the
Coefficients attribute of the Filter class. To filter the signal, we then just have to implement a
convolution algorithm.
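As an illustration, the convolution step could be implemented as follows; the three-tap moving-average coefficients used in the example are placeholders for the table generated by Matlab:

```java
// Apply a FIR filter by direct convolution: y[n] = sum_k h[k] * x[n-k].
public class FirFilter {
    private final double[] coefficients;  // in the app, generated by Matlab's sptool

    public FirFilter(double[] coefficients) {
        this.coefficients = coefficients;
    }

    public double[] filter(short[] signal) {
        double[] out = new double[signal.length];
        for (int n = 0; n < signal.length; n++) {
            double acc = 0.0;
            // only use taps that do not reach before the start of the signal
            for (int k = 0; k < coefficients.length && k <= n; k++) {
                acc += coefficients[k] * signal[n - k];
            }
            out[n] = acc;
        }
        return out;
    }
}
```

The inner loop cost grows with the number of taps, which is why the filter order matters for smartphone CPU use.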
4.3.3 Feature extractors
Using the jAudio library, we can directly apply an FFT algorithm (using the FFT object) to the
buffer. Then, using the PeakFinder object, we can determine the fundamental frequency and the
signal formants. To implement the STE and STZC, we use the FeatureExtractor interface, which
provides a set of common methods that fits our project well.
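For illustration, the two quantities computed by these extractors can be sketched as follows on a single frame of 16-bit samples (the method names are ours, not jAudio's):

```java
// Short-time energy (STE) and short-time zero-crossing count (STZC) for one
// frame of 16-bit PCM samples.
public class ShortTimeFeatures {
    /** Mean squared amplitude of the frame. */
    public static double energy(short[] frame) {
        double sum = 0.0;
        for (short s : frame) {
            sum += (double) s * s;
        }
        return sum / frame.length;
    }

    /** Number of sign changes across the frame. */
    public static int zeroCrossings(short[] frame) {
        int count = 0;
        for (int i = 1; i < frame.length; i++) {
            if ((frame[i - 1] >= 0) != (frame[i] >= 0)) {
                count++;
            }
        }
        return count;
    }
}
```

Both run in a single pass over the frame, which is what makes the STE/STZC approach so cheap on a smartphone.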
4.3.4 Matching function
The implementation of the matching function is straightforward. It is a class whose attributes are the
weights, the threshold and the ideal feature values for activity recognition. In order to be able to
tune the sensitivity of the app to the baby's voice characteristics, we define getters and setters to
update the weights and the threshold.
The pattern recognition is performed by the computeFunction method, a simple implementation of
the weighted differential addition formula presented in section 3.2.3.1. The source code can be
found in Appendix 5.
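As Appendix 5 is not reproduced here, the shape of the class can be sketched as follows (attribute and method names other than computeFunction are our assumptions):

```java
// Weighted differential addition: sum of weighted absolute differences between
// the measured feature values and the ideal ("norm") values for an activity.
public class MatchingFunction {
    private double[] weights;
    private double[] normValues;  // ideal feature values for the activity
    private double threshold;

    public MatchingFunction(double[] weights, double[] normValues, double threshold) {
        this.weights = weights;
        this.normValues = normValues;
        this.threshold = threshold;
    }

    // setter kept to let the app tune its sensitivity at run time
    public void setThreshold(double threshold) { this.threshold = threshold; }

    public double computeFunction(double[] signalValues) {
        double score = 0.0;
        for (int i = 0; i < weights.length; i++) {
            score += weights[i] * Math.abs(signalValues[i] - normValues[i]);
        }
        return score;
    }

    /** The activity is recognized when the score stays under the threshold. */
    public boolean recognizes(double[] signalValues) {
        return computeFunction(signalValues) < threshold;
    }
}
```

Lowering the threshold through the setter makes recognition stricter, which is how the app's sensitivity can be adapted to a particular baby.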
4.4 Solution testing and optimization
4.4.1 Recognition testing
In order to test our ARS efficiently, we need a database to benchmark the system against. We first
thought of using the Baby Chillanto database [30] and asked the researchers involved at the
Instituto Nacional de Astrofisica Optica y Electronica for access to it. However, our requests went
unanswered, so we chose to build our own baby sound database. The main drawback is that we are
unable to provide standard figures to compare with other existing solutions.
To build this collection, we searched online (mainly on sound sharing platforms such as
findsounds.com) and gathered 15 baby sound samples. We then played these sounds near the
smartphone microphone in various environments and assessed our system. By constantly refining
our weights and threshold, we finally reached a recognition rate of 40% (6 samples out of 15).
A second optimisation step was to quantify the effect of the audio effects added during sound
acquisition on the recognition rate. By successively disabling these extra functionalities, we
discovered that the two most important pre-processing effects were the noise suppression and
signal normalisation algorithms. Considering those results, we chose to disable the echo cancellation
algorithm, saving computational resources in the process.
The results of the tests supporting these conclusions are summarized in the chart below:

With all effects | Without noise suppression | Without normalisation | Without echo canceler
40% (6/15) | 13% (2/15) | 13% (2/15) | 40% (6/15)
As previously said in the functional analysis, our design goal is a recognition rate of at least 70% for a
reliable baby monitor. We are far from this requirement, mainly because the matching function
algorithm showed its limits. As future work, we plan to migrate towards an SVM solution (see
§4.4.3).
4.4.2 Performance testing
To test the performance of our application, we use two separate environments:
- The Android Virtual Devices (AVD), which emulate a smartphone on a computer. Directly
integrated within Eclipse, they allow testing the application on several platforms without
the need to buy them physically. We use this tool to test the application on the Samsung
Galaxy Nexus and the Motorola MT870.
- Two real-world smartphones (the HTC Sense and the HTC One).
The logs and resource consumption can be viewed directly in the Eclipse console. This allowed us to
observe the performance of our application when deployed on a broad range of smartphones.
As previously said in the functional analysis (see §2.2), the application requires a smartphone CPU
clocked at 1.5 GHz or more; we therefore choose the smartphones available as AVDs according to
that characteristic. Moreover, our design goal is to provide an application that uses at worst 15% of
the CPU. The application's CPU use depends on the smartphone's performance and on the Android
version deployed [31]. We therefore benchmarked different smartphones on different Android OS
versions.
The performance test results are summarized in the chart below. The presented CPU use
percentages were determined by taking the average maximal rate reported by Eclipse during a test
session.

Smartphone | HTC Sense (real) | HTC One (real) | Samsung Galaxy (AVD) | Motorola MT870 (AVD)
Android 4.1 | 19.4% | / | 20.5% | /
Android 4.0 | / | / | 21.2% | /
Android 3.2 | / | 20.2% | 22.4% | 24.3%
Android 2.3.3 | / | / | 26.7% | 27.6%
As we can see, the performance design goals are rarely reached. However, we looked at some
possible optimisations. By recording the sound asynchronously, we save resources: processing is
given priority, while recording takes place when resources are free. To do so, we change the
implementation of the SoundRecorder so that it extends the AsyncTask class. This change only
requires implementing the doInBackground() method, which contains the code to
record sound. We also chose to disable the echo cancellation algorithm, because it does not strongly
affect the recognition performance (see §4.4.1). With those changes, the performance results are as
follows:
Smartphone | HTC Sense (real) | HTC One (real) | Samsung Galaxy (AVD) | Motorola MT870 (AVD)
Android 4.1 | 13.8% | / | 14.5% | /
Android 4.0 | / | / | 16.4% | /
Android 3.2 | / | 14.5% | 17.3% | 19.3%
Android 2.3.3 | / | / | 19.8% | 20.3%
For the most recent smartphones and OS versions, the design goals are met, but only narrowly.
Further work needs to be conducted to improve the resource use of the app. A possible future
improvement would be to store and share the acquired audio signal in a dynamic buffer.
4.4.3 Future improvements
As the matching function proved limited as a pattern recognition technique, we plan to implement a
Support Vector Machine (SVM) integrated into the Android application using the Native
Development Kit (NDK). Indeed, the NDK allows programming in C/C++, a language more suitable
than Java for implementing this type of solution. The ultimate goal is to implement advanced
training algorithms and reach a 60-70% recognition rate.
Moreover, we plan to use a memory buffer shared among threads to store the audio signal pulled
from the microphone. This would allow the recorder to store sound continuously and asynchronously
while the activity recognition system object consumes that data to conclude on a recognised baby
state. The goal would then be to reach a CPU use rate lower than 10%.
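A minimal sketch of such a shared buffer, using plain Java synchronization rather than any Android-specific construct (the class and its names are illustrative, not implemented code):

```java
// Bounded buffer shared between the recorder thread (producer) and the
// activity recognition thread (consumer): the recorder stores frames
// continuously while the recognizer consumes them at its own pace.
public class SharedAudioBuffer {
    private final short[][] frames;
    private int head = 0, tail = 0, count = 0;

    public SharedAudioBuffer(int capacity) {
        frames = new short[capacity][];
    }

    /** Called by the recorder; blocks while the buffer is full. */
    public synchronized void put(short[] frame) {
        while (count == frames.length) {
            try { wait(); } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;                       // drop the frame if interrupted
            }
        }
        frames[tail] = frame;
        tail = (tail + 1) % frames.length;
        count++;
        notifyAll();
    }

    /** Called by the recognizer; blocks until a frame is available. */
    public synchronized short[] take() {
        while (count == 0) {
            try { wait(); } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return new short[0];          // empty frame if interrupted
            }
        }
        short[] frame = frames[head];
        head = (head + 1) % frames.length;
        count--;
        notifyAll();
        return frame;
    }
}
```

Decoupling the two threads this way is what should let recording proceed whenever the CPU is otherwise idle.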
5. Experience Feedback
The practicum aims at bringing to market a focused and adapted product. The technical solution has
thus constantly been adjusted as the project went on, to fit the market needs and the business
model evolutions. This challenging experience of continuously refining network architecture and
software to answer evolving user issues, while keeping the whole product coherent, was my first real
experience of an R&D process.
Moreover, I had the chance to collaborate with individuals from different backgrounds (business,
management, and marketing) and from different countries (France, Ireland, and Spain). This
association of multiple competences, work methodologies and cultures within a team constitutes an
interesting insight into what a real-world international start-up environment can be. In addition to
the technical knowledge I developed throughout the solution's implementation, I also had the
opportunity to help the business team define our key value proposition and business model, along
with identifying potential future prospects. This complete overview of the development of a project,
from both a business and a technical perspective, added an entrepreneurial competence to my
resume.
Furthermore, with users' growing desire to capture their daily activity and the maturity of wireless
body sensor networks, the importance of pattern recognition systems will grow in the upcoming years.
As I have a strong interest in these technologies, and more particularly in machine learning
algorithms, this technological study fits well with my professional career expectations.
Finally, collaborating with the CLARITY research center1 allowed me to gain a first experience of
research work. Indeed, as our project was supervised by Cathal Gurrin and Alan Smeaton, two senior
members of that center, the state-of-the-art review and the solution definition were performed in a
laboratory context. This master's thesis project thus gave me the opportunity to immerse myself in a
highly technological start-up working with several important stakeholders of the field.
1 CLARITY Center for Sensor Web Technologies: http://www.clarity-centre.org/
Appendices
Appendix 1: The Octopus Chart
[Octopus chart: the main function (FP), constraint functions FC1-FC8 and service functions FS-S1 to
FS-S4, FS-E1 and FS-E2 relate the product to its environment elements: Parents, Baby, Smartphone,
Server, Mobile Network, Physical Environment, Legal environment, and Cost.]
Appendix 2: FAST diagrams
Main Function (FP)
Service Functions (FS)
FS1: “Configure Settings”
FS2: “Acquire baby data”
FS3: “Recognize baby activity”
FS4: “Trigger actions”
FS5: “Gather evolution data”
FS6: “Compare with norms”
Appendix 3: Sound recording code
package com.example.sensanalytics;

import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;

import android.annotation.TargetApi;
import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;
import android.media.audiofx.AcousticEchoCanceler;
import android.media.audiofx.AutomaticGainControl;
import android.media.audiofx.NoiseSuppressor;
import android.os.AsyncTask;
import android.util.Log;

public class SoundRecorder extends AsyncTask<Void, Integer, Void> {

    private File file;
    private Boolean isRecording;
    private int frequency = 8000;
    private int channelConfiguration = AudioFormat.CHANNEL_IN_MONO;
    private int audioEncoding = AudioFormat.ENCODING_PCM_16BIT;
    private AudioRecord audioRecord;

    public File getFile() { return file; }
    public void setFile(File file) { this.file = file; }
    public Boolean getIsRecording() { return isRecording; }
    public void setIsRecording(Boolean isRecording) { this.isRecording = isRecording; }
    @Override
    @TargetApi(16)
    protected Void doInBackground(Void... arg0) {
        setIsRecording(true);
        try {
            DataOutputStream dos = new DataOutputStream(
                    new BufferedOutputStream(new FileOutputStream(getFile())));
            int bufferSize = AudioRecord.getMinBufferSize(frequency,
                    channelConfiguration, audioEncoding);
            audioRecord = new AudioRecord(MediaRecorder.AudioSource.MIC, frequency,
                    channelConfiguration, audioEncoding, bufferSize);

            // Apply the Acoustic Echo Canceler algorithm on the recorded sound
            Boolean isAvailable = AcousticEchoCanceler.isAvailable();
            if (isAvailable) {
                AcousticEchoCanceler aec =
                        AcousticEchoCanceler.create(audioRecord.getAudioSessionId());
                if (!aec.getEnabled()) aec.setEnabled(true);
            }

            // Apply the Noise Suppression algorithm on the recorded sound
            isAvailable = NoiseSuppressor.isAvailable();
            if (isAvailable) {
                NoiseSuppressor ns = NoiseSuppressor.create(audioRecord.getAudioSessionId());
                if (!ns.getEnabled()) ns.setEnabled(true);
            }

            // Normalize the output signal
            isAvailable = AutomaticGainControl.isAvailable();
            if (isAvailable) {
                AutomaticGainControl agc =
                        AutomaticGainControl.create(audioRecord.getAudioSessionId());
                if (!agc.getEnabled()) agc.setEnabled(true);
            }

            int r = 0;
            short[] audioBuffer = new short[bufferSize];
            // Capture must be started before read() can return audio data
            audioRecord.startRecording();
            while (isRecording && r < 50) {
                int bufferReadResult = audioRecord.read(audioBuffer, 0, bufferSize);
                for (int i = 0; i < bufferReadResult; i++) {
                    dos.writeShort(audioBuffer[i]);
                    Log.e("info", "Wrote value: " + audioBuffer[i]);
                }
                r++;
            }
            audioRecord.stop();
            audioRecord.release();
            dos.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    public void stopRecording() {
        setIsRecording(false);
    }

    @Override
    protected void onProgressUpdate(Integer... values) {
        super.onProgressUpdate(values);
    }
}
Appendix 4: Matching function source code
package com.example.sensanalytics;

public class matchingFunction {

    private int[] weights = {1, 2, 2, 2, 3, 3};
    private double[] idealValues = {880, 1020, 3340, 4510, 0.05, 0.1};
    private double threshold = 0.8;

    // Getters and setters to allow updating the weights and threshold
    // for sensitivity control
    public int[] getWeights() { return weights; }
    public void setWeights(int[] weights) { this.weights = weights; }
    public double getThreshold() { return threshold; }
    public void setThreshold(double threshold) { this.threshold = threshold; }

    // Compute the matching function: a weighted sum of the normalized
    // deviations between extracted and ideal feature values
    public double computeFunction(double[] extractedFeaturesValues) {
        double res = 0;
        for (int i = 0; i < weights.length; i++) {
            res += weights[i] * (extractedFeaturesValues[i] - idealValues[i])
                    / Math.max(extractedFeaturesValues[i], idealValues[i]);
        }
        return res;
    }
}
References
[1] Bao L., Intille S. S. "Activity Recognition from User-Annotated Acceleration Data", In Proceedings
of the Second International Conference in Pervasive Computing (PERVASIVE '04). Vienna, Austria, pp.
1-17, 2004.
[2] Stikic M., Laerhoven K. V., Schiele B., "Exploring semi-supervised and active learning for activity
recognition", 12th IEEE International Symposium on Wearable Computers, 2008, pp. 81-88.
[3] Wasz-Hockert, O., Lind, J., Vuorenkoski, V., Partanen, T. and Valanne, E., “The infant cry: a
spectrographic and auditory analysis”, Clinics in Developmental Medicine No. 29, London: Spastics
International Publications, 1988.
[4] Clarkson B., “Extracting context from environmental audio”, Digest of Papers. Second
International Symposium on Wearable sensors 1998, 1998, pp. 154-155.
[5] Murry, T. “Acoustic and perceptual characteristics of infant cries”. In: Murry, T., Murry, J. (Eds.),
Infant Communication: Cry and Early Speech. TX: College Hill Press, 1980, pp. 251-271.
[6] Wasz-Hockert, O., Michelsson, K. and Lind, J. (1985) Twenty-five years of Scandinavian cry
research. In: Lester, B.M, Boukydis, C.F.Z. (Eds.) Infant Crying: Theoretical and Research Perspectives.
Plenum, New York, pp. 83-104.
[7] Michelsson, K., Sirvio, P., Koivisto, M., Sovijarvi, A. and Wasz-Hockert, O., “Spectrographic analysis
of pain cry in neonates with cleft palate”, Biol. Neonate 26, 1975, pp. 353-358.
[8] Michelsson, K., Sirvio, P. and Wasz-Hockert, O., “Sound spectrographic cry analysis of infants with
bacterial meningitis”, Devel. Med. Child Neurol. 19, 1977, pp. 309-315.
[9] Blinick, G., Travolga, W.N. and Antopol, W. “Variations in birth cries of new-born infants from
narcotic addicted and normal mothers”, Am. J. Obstet. Gynecol. 110, 1971, pp. 48-958.
[10] Cacace, A. T., Robb, M. P., Saxman, J. H., Risemberg, H., Koltai, P., "Acoustic features of normal-
hearing pre-term infant cry", International journal of pediatric otorhinolaryngology, Volume 33, Issue
3, 1995, pp. 213 – 224.
[11] Murray, A.D., Javel, E. and Watson, C.S., "Prognostic validity of auditory brainstem evoked
response screening in new-born infants", Am. J. Otolaryngol. 6, 1985, pp. 120-131.
[12] Oller, D.K., Eilers, R.E., Bull, D.H. and Carney, A.E., "Prespeech vocalizations of a deaf infant: a
comparison with normal metaphonalogical development", J. Speech Hear. Res. 28, 1985, pp. 47-63.
[13] Saraswathy, J., Hariharan, M., Yaacob, S., Khairunizam, W., " Automatic Classification of Infant
Cry: A Review", International Conference on Biomedical Engineering, 2012, pp. 534-549.
[14] Kevin Kuo, “Feature Extraction and Recognition of Infant Cries”, 2010 IEEE International
Conference on Electro/Information Technology (EIT), 2010, pp. 1-5.
[15] Kondoz, A. M., “Digital Speech”, John Wiley & Sons Ltd, West Sussex, England, 2004.
[16] Balducci, M., Ganapathiraju, A., Hamaker, J., Picone, J., "Benchmarking Of FFT Algorithms", IEEE
Proceedings on Southeastcon ‘97. 'Engineering New Century', 1997, pp. 328-330.
[17] Pei-Chen, L., Yun-Yun, L., "Real-Time FFT Algorithm Applied To On-Line Spectral Analysis", Circuit
System Signal Process, Vol. 8, No. 4, 1999, pp. 377-393.
[18] Vempada, R.R., Kumar, B.S.A., Rao, K.S., "Characterization of infant cries using spectral and
prosodic features", National Conference on Communications (NCC), 2012, pp. 1-5.
[19] Sanford Zeskind, P., Parker-Price, S., Barr R.G., "Rhythmic organization of the sound of infant
crying", Developmental Psychobiology Volume 26, Issue 6, 1993, pp. 321–333.
[20] Sadeh, A., Acebo, C., Seifer, R., Aytur, S., Carskadon, M.A., "Activity-Based Assessment of Sleep
Wake Patterns during the 1st Year of Life", Infant Behavior and Development Vol.18, 1995, pp. 329-
337.
[21] Heiss, J.E., Held, C.M., Estévez, P.A., Perez, C.A., Holzmann, C.A., Pérez, J.P., "Classification of
Sleep Stages in Infants: A Neuro Fuzzy Approach", IEEE Engineering in Medicine And Biology, 2003,
pp. 147-151.
[22] Karraker, K., "The Role of Intrinsic and Extrinsic Factors in Infant Night Waking", Journal of Early
& Intensive Behavior Intervention, Vol. 5 Issue 3, 2008, pp. 108-121.
[23] Várallyay Jr., G., Benyó, Z., Illényi, A., Farkas, Z., Kovács, L., "Acoustic analysis of the infant cry:
classical and new methods", Proceedings of the 26th Annual International Conference of the IEEE
EMBS, 2004, pp. 313-316.
[24] Saraswathy, J., Hariharan, M., Yaacob, S., Khairunizam, W., "Automatic Classification of Infant
Cry: A Review", International Conference on Biomedical Engineering (ICoBE), 2012, pp. 543-548.
[25] Garcia, J.O., Reyes García, C.A., "Mel-frequency cepstrum coefficients extraction from infant cry
for classification of normal and pathological cry with feed-forward neural networks”, INAOE, IEEE,
2003.
[26] Mansouri Jam, M., Sadjedi, H., "Identification of hearing disorder by multi-band entropy
cepstrum extraction from infant’s cry", IEEE, 2009.
[27] Martel J. Convolutional Neural Networks - A Short Introduction to Deep Learning. Not published
yet. 2012.
[28] Robb, M.P., Goberman, A.M., Cacace, A.T., "Methodological Issues in the Acoustic Analysis of
Infant Crying",
[29] Aggarwal, J.K., Ryoo, M.S., "Human Activity Analysis: a Review", ACM Computing Surveys (CSUR)
Surveys, Vol. 43, Issue 3, Article 16, 2011.
[30] O.F. Reyes-Galaviz, S. Cano-Ortiz and C. Reyes-Garca, “Evolutionary-neural system to classify
infant cry units for pathologies identification in recently born babies”, in 8th Mexican International
Conference on Artificial Intelligence,MICAI 2009, Guanajuato, Mexico, pp. 330–335, 2009.
[31] Huang, J., Xu, Q., Tiwana, B., Mao, Z.M., Zhang, M., Bahl, P., "Anatomizing application
performance differences on smartphones", Proceedings of the 8th international conference on
Mobile systems, applications, and services, 2010, pp. 165-178.