voice over ip for sony ericsson phones

8/8/2019 Voice Over IP for Sony Ericsson Phones


Master Thesis
Software Engineering
Thesis no: MSE-2005:16
October 2005

Voice over IP for Sony Ericsson Cellular Phones

Petter Theander, Thomas Hultgren

School of Engineering
Blekinge Institute of Technology
Box 520

SE-372 25 Ronneby
Sweden



This thesis is submitted to the School of Engineering at Blekinge Institute of Technology in partial fulfillment of the requirements for the degree of Master of Science in Software Engineering. The thesis is equivalent to 2 x 20 weeks of full time studies.

Contact Information:
Author(s):
Petter Theander
E-mail: [email protected]

Thomas Hultgren
E-mail: [email protected]

External advisor(s):
Tobias Åkesson
Company/Organisation: Sony Ericsson Mobile Communications AB
Address: Nya Vattentornet, SE-221 83 Lund
Phone: +46 46 193 986

Pär Olsson
Company/Organisation: Sony Ericsson Mobile Communications AB
Address: Nya Vattentornet, SE-221 83 Lund
Phone: +46 46 212 67 03

University advisor(s):
Håkan Grahn
School of Engineering, BTH

School of Engineering
Blekinge Institute of Technology
Box 520
SE-372 25 Ronneby
Sweden

Internet: www.bth.se/tek
Phone: +46 457 38 50 00
Fax: +46 457 271 25



ABSTRACT

This report presents an investigation of the possibilities to implement voice over IP (VoIP) in Sony Ericsson cellular phones. The results from this investigation show that it is partially possible to implement such a solution. The best option for doing so is to make use of the support for the Session Initiation Protocol and the Real-time Transport Protocol offered by the architecture. Another goal is to evaluate whether Bluetooth is able to handle the requirements of the solution. The concept is demonstrated by implementing a prototype. Measurements on this prototype show that Bluetooth will be able to handle the requirements of most IP-based voice communication with respect to latency and bandwidth.

Keywords: VoIP, Cellular phone, SIP, RTP



Contents

1 Introduction
2 A Need For New Communication Technologies
2.1 Circuit-switched Networks
2.2 Packet-switched Networks
2.3 The Internet
3 The Initial Idea
3.1 Background
3.2 Vision
3.3 The Basic Idea
3.3.1 Making an Outgoing Call
3.3.2 Handling Incoming Calls
3.4 Technical Requirements
3.4.1 The Cellular Phone
3.4.2 The Base Unit
4 Investigating the Options
4.1 Interview Methodology
4.2 Interview Results
4.3 Investigating the Current Architecture
4.4 IP Multimedia Subsystem
4.4.1 The SEMC IMS Architecture
5 Design of the VoIP Prototype
5.1 Solution Design
5.1.1 Maintaining Flexibility and Modularity using SIP
5.1.2 Using SIP and SDP for Negotiating the Media Format
5.1.3 Bluetooth with IP Capabilities
5.1.4 Overview of the SIP Solution
5.2 Prototype Design and IMS Relationship
5.2.1 SEMC IMS Client Interaction
5.2.2 IMS SL and the VoIP Server
5.2.3 The VoIPCore Component
5.2.4 The VoIPMediaHandler Component
5.2.5 The VoIP Callback Interface
5.3 Scenarios
5.3.1 Registering with a SIP Registrar
5.3.2 Sending a SIP Invite Request
5.3.3 Starting the Media Session
5.3.4 Requesting to Talk
5.3.5 Incoming Request Talk
5.3.6 Incoming SIP Invite Request
5.3.7 Sending a SIP Bye Request
5.3.8 Incoming Bye Request
6 Prototype Implementation
6.1 Bluetooth Connectivity
6.2 The VoIP Prototype
6.2.1 Changes in the Underlying Architecture
6.2.2 No Support for Full-duplex Audio
7 Evaluation of the Prototype
7.1 Answers to the Research Questions
7.1.1 Reasonable Response Times
7.1.2 Possible to Implement IP-Telephony
7.1.3 Support for New Communication Technologies
7.2 Suggestions for Further Research
8 Discussion and Related Work
8.1 Network Address Translation
8.1.1 VoIP in NAT Situations
8.1.2 Avoiding the NAT Problem
8.2 VoIP Security
8.3 Public Safety
8.4 Related Solutions
9 Conclusions
Acknowledgements
Bibliography
A The Session Initiation Protocol
A.1 Introduction to SIP
A.2 The Architecture of a SIP Network
A.2.1 User Agents
A.2.2 Registrars
A.2.3 Location Servers
A.2.4 Redirect Servers
A.2.5 Proxy Servers
A.3 Signaling in SIP
A.3.1 Responses
A.3.2 Requests
A.4 SIP Message Format
A.4.1 Request Line
A.4.2 Status Line
A.4.3 Headers
A.4.4 Bodies
A.5 Bridging SIP and the PSTN
B The Session Description Protocol
C The Real-time Transport Protocol
D Glossary


Chapter 1

Introduction

This master thesis work was undertaken to investigate the possibilities of introducing a new communication technology into an already established communication interface. As new communication technologies are emerging more rapidly today than a couple of years ago, the need to merge them is also growing. The general trend amongst emerging technologies is that they are more or less exclusively developed to fulfill the needs of voice communication in an IP-based packet-switched network such as the Internet. Such technologies are commonly known as Voice over IP (VoIP). Traditional telephony technologies, like the Public Switched Telephone Network (PSTN), were on the other hand designed to work in circuit-switched networks.

The motivation for undertaking this investigative work was a general disappointment we observed with the fact that a new communication technology often meant that users were forced to use a computer, with no other really good alternatives. Thus, there was a need for a solution that made it possible to use the emerging technologies in a more comfortable way, for example through a cellular phone.

To us, this shortcoming was a major drawback, and probably one of the factors that pose a problem when introducing a new communication technology. These facts led to the initial solution proposal presented in chapter 3. This proposal was sent to Sony Ericsson Mobile Communications (SEMC), and earned us the opportunity to undertake more extensive research into what is actually needed in order to introduce support for a new communication technology in a cellular phone.

This report presents an investigation of the possibilities for introducing a new communication technology, like VoIP, into a Sony Ericsson cellular phone. The investigation is based on the following research questions:

1. Will Bluetooth be able to handle the communication between the cellular phone and the base unit in accordance with what is seen as "normal" response times and quality in traditional telephony?

2. Is it possible to integrate IP-telephony support into a cellular phone based on the Sony Ericsson architecture?

3. Is it possible to use any pre-existing techniques from the Sony Ericsson mobile phone architecture in order to ease the implementation?

4. Is it possible to integrate support for more communication technologies based on the selected communication protocols and the Sony Ericsson mobile phone architecture?

Interviews and the implementation of a prototype were used to evaluate whether the SEMC architecture supports new technologies. We find that the best option is to use the Session Initiation Protocol (SIP) and the Real-time Transport Protocol (RTP), which are both supported through the SEMC IP Multimedia Subsystem (IMS) architecture. We will also see that a solution based on SIP and RTP supports interaction with other voice communication technologies through the use of gateways. The evaluation of the prototype showed that Bluetooth will suffice for most voice communication with respect to latency and bandwidth.


Chapter 2

A Need For New Communication Technologies

In order to understand why new voice communication technologies are being introduced when a working and well-accepted system already exists, one must understand the main differences between the traditional Public Switched Telephone Network (PSTN), which is circuit-switched, and the new IP-based technologies, which are used in packet-switched networks. For this reason, the most important aspects of circuit-switched and packet-switched networks are summarized below, along with their respective benefits and drawbacks.

2.1 Circuit-switched Networks

There exist different types of circuit-switched networks. The first, and probably the simplest, is a dedicated cable between two users. This system is, however, not very flexible when it comes to adding more users, as each user would need a dedicated cable to every other user. This would in fact mean that the number of cables in the network grows quadratically with the number of users, since n users need n(n-1)/2 cables [1]. To solve this issue, a switch can be introduced, so that adding a new user only implies connecting the new user to the switch. In the simplest case, the task of the switch is to form a connection between two users, and in this way attach the two as if they were actually connected by the same dedicated cable [1].

Although simplified, this is the main concept of a circuit-switched network: the network simply allocates resources along a path between two or more end users to form a dedicated line [1]. Over the years this paradigm has of course been refined and developed. Today's circuit-switched networks use, e.g., Frequency Division Multiplexing (FDM), digital transmission, and Time Division Multiplexing (TDM) to better utilize the capacity of their bearers (cables) [1]. The main task of the circuit-switched telephony network is still the same, i.e., to manage and set up dedicated paths and resources between end users, regardless of what is actually being sent over the connection. This means that much of the intelligence in the system resides in the network, as it is the network that decides how to set up the path and manages it throughout an entire call session [1].
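The scaling problem of dedicated cables can be made concrete with a short calculation. The sketch below, added for illustration, counts the cables needed for n users in a full mesh, where every pair has its own line (n(n-1)/2, which grows quadratically), against the n cables needed when every user is attached once to a central switch. The function names are hypothetical.

```python
def mesh_cables(users: int) -> int:
    """Cables needed if every pair of users has its own dedicated line."""
    # Each of the n users connects to the other n - 1 users, and every
    # cable is shared by two users, so the total is n(n - 1) / 2.
    return users * (users - 1) // 2


def switched_cables(users: int) -> int:
    """Cables needed if every user is connected once to a central switch."""
    return users


for n in (2, 10, 100, 1000):
    print(f"{n:>5} users: mesh={mesh_cables(n):>7}  switch={switched_cables(n):>5}")
```

For 1000 users the mesh needs 499 500 cables while the switched design needs only 1000, which is why switching became the basis of the telephone network.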

2.2 Packet-switched Networks

Packet-switched networks were designed with a focus on data transmission, i.e., with the bursty nature of data transmissions in mind (the amount of data sent during a session is not constant) [1].

In a packet-switched network, a packet of data is created by one node in the network, and the address of the receiver is attached to the packet. The packet is then sent to the first network node, or router as it is called in packet-switched networks. The router examines the packet's address field and forwards the packet to the next appropriate node in the network. When the packet arrives at its destination, the data in the packet is processed [1]. One could say that a packet-switched network operates in much the same way as the traditional postal service.
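The per-hop decision described above can be sketched as a forwarding-table lookup. The example below assumes IPv4 addressing and the longest-prefix-match rule used by IP routers; the table entries and router names are hypothetical and only illustrate how a router picks the next node from the destination address alone.

```python
import ipaddress

# Hypothetical forwarding table: destination prefix -> next-hop router.
FORWARDING_TABLE = {
    ipaddress.ip_network("0.0.0.0/0"): "default-gateway",
    ipaddress.ip_network("192.168.0.0/16"): "router-B",
    ipaddress.ip_network("192.168.10.0/24"): "router-C",
}


def next_hop(destination: str) -> str:
    """Return the next hop for a packet using longest-prefix matching."""
    address = ipaddress.ip_address(destination)
    # Keep only the prefixes that contain the destination address ...
    matches = [net for net in FORWARDING_TABLE if address in net]
    # ... and prefer the most specific one (the longest prefix).
    best = max(matches, key=lambda net: net.prefixlen)
    return FORWARDING_TABLE[best]


print(next_hop("192.168.10.7"))  # router-C
print(next_hop("192.168.99.1"))  # router-B
print(next_hop("8.8.8.8"))       # default-gateway
```

No state about the "call" is kept anywhere: each packet is forwarded independently, which is the key contrast with the dedicated path of a circuit-switched network.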

The nature of a packet-switched network means that no dedicated resources are allocated by the network, i.e., the network offers no quality of service (QoS) guarantees [1]. This in turn means that no resources are wasted when sending bursty data, as they are in circuit-switched networks, where the dedicated line stays reserved even when nothing is being sent. The fact that most packet-switched networks do not offer any QoS means that a client using the network cannot assume that a sent packet is actually received by the recipient. It therefore becomes the client's responsibility to handle the QoS aspects of a session [1]. This is, however, only true when using UDP and not TCP, as TCP adds transport control functionality, such as retransmission of lost packets, to handle these issues.
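Over UDP, the recovery described above falls to the application. A minimal sketch of one classic scheme, stop-and-wait retransmission with sequence numbers, is shown below. The lossy channel is simulated with a seeded random generator rather than a real network, and all names and parameter values are hypothetical; TCP performs this kind of recovery, with far more elaborate windowing and timers, inside the transport layer.

```python
import random


def lossy_channel(packet, loss_rate, rng):
    """Deliver the packet, or return None to model a silently dropped datagram."""
    return None if rng.random() < loss_rate else packet


def send_reliably(payloads, loss_rate=0.3, max_retries=50, seed=1):
    """Stop-and-wait: resend each numbered packet until its ACK arrives."""
    rng = random.Random(seed)
    received = []
    transmissions = 0
    for seq, data in enumerate(payloads):
        for _ in range(max_retries):
            transmissions += 1
            delivered = lossy_channel((seq, data), loss_rate, rng)
            if delivered is None:
                continue  # data lost: the sender times out and retransmits
            # The receiver keeps each sequence number only once, so a
            # retransmission caused by a lost ACK is not duplicated.
            if not received or received[-1][0] != seq:
                received.append(delivered)
            ack = lossy_channel(seq, loss_rate, rng)
            if ack == seq:
                break  # acknowledged: move on to the next packet
        else:
            raise TimeoutError(f"packet {seq} was never acknowledged")
    return [data for _, data in received], transmissions


print(send_reliably(["hello", "voice", "world"]))
```

For real-time voice, retransmission is often pointless because a packet that arrives too late is as useless as a lost one, which is one reason VoIP media is usually carried over UDP rather than TCP.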

2.3 The Internet

The Internet was designed as a dumb network whose sole purpose is to provide connectivity between senders and receivers, no matter what type of data is carried [1]. The Internet is constructed as a packet-switched network with the Internet Protocol (IP) as its base for addressing and routing. Therefore, the structure of the Internet is independent of the actual bearer of the data, as long as the endpoints of each network support the IP paradigm.

As the Internet is a dumb network and only provides unreliable transmission, it is left to the sender and receiver of the data to handle retransmissions, flow control, error detection, etc. The network (the Internet) itself is almost stateless and does not keep track of whether the packets sent actually arrive [1]. This very fact makes the network itself very fault tolerant: if one node in the network malfunctions, this is only perceived by the receiver as a loss of packets, and a resend can be issued. The resent packets can then take another route through the network [1]. This is a great step away from what is seen as normal conditions in traditional circuit-switched networks, such as the PSTN, where QoS is central. However, the better utilization of resources in a packet-switched network, the sheer size of the Internet, and the fact that it is more or less free to use have led voice communication to shift towards solutions for packet-switched networks [1].
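The way resent packets can take another route when a node fails can be illustrated by recomputing a path over the remaining routers. The sketch below uses a breadth-first search over a small hypothetical topology; real IP routers learn alternative routes through routing protocols such as OSPF or BGP rather than through a central search, so this only demonstrates the principle.

```python
from collections import deque


def shortest_path(links, source, target, failed=frozenset()):
    """Breadth-first search for a fewest-hops path that avoids failed nodes."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the predecessor chain back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in links.get(node, ()):
            if neighbour not in previous and neighbour not in failed:
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # no route left


# Hypothetical five-router network; each link is listed in both directions.
LINKS = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "E"],
    "D": ["B", "E"],
    "E": ["C", "D"],
}

print(shortest_path(LINKS, "A", "D"))                # ['A', 'B', 'D']
print(shortest_path(LINKS, "A", "D", failed={"B"}))  # ['A', 'C', 'E', 'D']
```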

It is with these facts at hand that we first became interested in the evolving communication technologies, and thus started to think about the possibilities of integrating the new technologies into an already established communication interface.


Chapter 3

The Initial Idea

3.1 Background

In this chapter the initial idea is presented. This idea was used as reference material when we applied for a master thesis project at SEMC. As stated, this is the initial idea, and as can be seen throughout this report, it has since been adapted and modified. The deviations from the initial idea are quite natural, as the idea presented in this section was not derived from any pre-study, but rather from creative thinking and logical reasoning. In short, it was clear to us from the very start that this material would mostly serve as a description of one potential solution for implementing IP-telephony in cellular phones, since the idea was derived without any insight into what possibilities were available in the SEMC architecture. For now we will leave it at this, and describe the idea that earned us a position within SEMC to investigate the true possibilities for IP-telephony within their architecture.

The source of the idea was our dissatisfaction with the fact that, in order to use a new communication technology such as IP-telephony, one was more or less forced either to buy a new phone or to sit in front of a computer. This of course means that one has to change phones depending on which communication technology one would like to use. The fact that a new communication technology imposes the need to use new physical equipment is, in our opinion, one of the main obstacles when introducing new technologies, as people are often reluctant to change their behavioral patterns [bok].

3.2 Vision

To address the problems described above, we concluded that it would be a good idea to gather all communication technologies under one physical interface. In order to overcome people's reluctance to change, it was decided that a cellular phone could be a good hardware interface for all the different technologies. This decision was based on the fact that the cellular phone is already a well-accepted way to handle communication, both voice and video. It also has the advantage, compared to other solutions, of being mobile. This means that one would always be free to choose among the supported communication technologies, independent of physical location.

The freedom to choose communication technology, and the possibility to support new technologies fairly easily without changing the physical equipment, would also lead to economic benefits. This would be true for both companies and home users, as they could easily shift to the most cost-effective communication technology. The greatest economic gains would of course be for large companies, due to their larger traffic volumes.

The value of a solution like this will only increase in the future, as new technologies and communication protocols emerge ever more rapidly. Being able to support these new technologies without major hardware modifications will therefore be even more important than it is today. Another benefit of having this solution available on the market is that it can contribute to the development of new communication technologies and protocols, as they can more easily be introduced to the market.


3.3 The Basic Idea

The general idea, which can be seen in figure 3.1, revolves around a cellular phone (1), which is connected via Bluetooth (2) to a base unit (3), which in turn is connected to an appropriate bearer for that specific media type (4).

Figure 3.1: An overview of the basic idea

3.3.1 Making an Outgoing Call

When a call is made using this solution, the cellular phone first checks whether it is within coverage of the base unit. If it does not have coverage, it initiates the call as a normal cellular call, i.e., using GSM, UMTS, etc. If the cellular phone does have coverage from a base unit, it passes the connection information to the base unit, which in turn selects the most appropriate bearer based on the connection information given. The base unit then sets up the call between the cellular phone and the intended recipient.
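The outgoing-call flow above can be sketched as two small decisions, one on the phone and one in the base unit. The sketch assumes that the connection information is a dialled string and that the base unit selects the bearer by matching its prefix against a configurable table; the table contents, the "sip:" prefix rule, and the function names are hypothetical and not part of the original proposal.

```python
# Hypothetical routing table in the base unit: prefix of the connection
# information -> bearer. The empty prefix acts as the default route.
ROUTING_TABLE = {
    "sip:": "IP-telephony",
    "": "PSTN",
}


def choose_bearer(connection_info, routing_table=ROUTING_TABLE):
    """Base unit: select the bearer whose prefix matches most specifically."""
    for prefix in sorted(routing_table, key=len, reverse=True):
        if connection_info.startswith(prefix):
            return routing_table[prefix]
    raise LookupError(f"no bearer configured for {connection_info!r}")


def place_call(connection_info, base_unit_in_range):
    """Cellular phone: hand the call to the base unit if it is reachable,
    otherwise make a normal cellular call."""
    if not base_unit_in_range:
        return "GSM/UMTS"
    return choose_bearer(connection_info)


print(place_call("sip:bob@example.org", base_unit_in_range=True))  # IP-telephony
print(place_call("+46461234567", base_unit_in_range=True))        # PSTN
print(place_call("+46461234567", base_unit_in_range=False))       # GSM/UMTS
```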

3.3.2 Handling Incoming Calls

When the base unit receives an incoming call on one of the connected bearers, one of the following things can happen: if the cellular phone is within coverage of the base unit, the base unit sets up the call with that specific cellular phone. If the cellular phone is not within the coverage area of the base unit, the call could, for instance, be connected to the reception or forwarded to, e.g., an answering machine.
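The incoming-call rule can likewise be written as a small decision function in the base unit. The sketch assumes that the base unit knows which phones are currently reachable over Bluetooth; the phone identifiers, the fallback names, and the function name are hypothetical.

```python
def route_incoming_call(callee, phones_in_range, fallback="answering machine"):
    """Base unit: decide where an incoming call on any bearer should go.

    callee: identifier of the cellular phone the call is meant for.
    phones_in_range: identifiers of phones currently reachable over Bluetooth.
    fallback: destination when the phone is out of range, e.g. "reception".
    """
    if callee in phones_in_range:
        return ("set up call", callee)
    return ("forward", fallback)


in_range = {"phone-1"}
print(route_incoming_call("phone-1", in_range))               # ('set up call', 'phone-1')
print(route_incoming_call("phone-2", in_range, "reception"))  # ('forward', 'reception')
```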

3.4 Technical Requirements

In this section the technical requirements for the solution are presented, together with a more technically detailed presentation of the components needed by the solution and suggestions on how these components could be implemented.


3.4.1 The Cellular Phone

The main requirement for the cellular phone in this solution is that it has Bluetooth capabilities. This is quite natural, as Bluetooth is the bearer for all data traffic between the cellular phone and the base unit. However, the exact Bluetooth requirements are not fixed, since there are alternative ways to solve the actual data transfer over Bluetooth. One of these is to let the cellular phone implement the Bluetooth profile normally used for headsets. This means that the base unit can communicate with the cellular phone using the same standard as if it were just sending audio to an ordinary headset. This solution would, however, also require that the cellular phone is able to communicate the connection information using one of the Bluetooth profiles for data communication. The second alternative is to simply handle all communication, i.e., control information and voice packets, using normal data communication, without separating the two. Apart from the requirements already mentioned, there will of course also be requirements for codec support, coverage handling, etc.
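The second alternative, sending control information and voice over one data connection, needs some way of telling the two kinds of packets apart. A minimal sketch is a type-and-length header in front of every packet, so that both can share a byte stream such as a Bluetooth serial-port connection. The packet types, the 3-byte header layout, and the function names are assumptions made for this illustration; they are not part of any Bluetooth profile.

```python
import struct

# Hypothetical packet types for multiplexing on a single data link.
CONTROL = 0x01
VOICE = 0x02

# Header: type (1 byte) and payload length (2 bytes), network byte order.
HEADER = struct.Struct("!BH")


def frame(packet_type, payload):
    """Prefix a payload with its type and length so both kinds share one link."""
    return HEADER.pack(packet_type, len(payload)) + payload


def deframe(stream):
    """Split a received byte stream back into (type, payload) tuples."""
    packets = []
    offset = 0
    while offset < len(stream):
        packet_type, length = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        packets.append((packet_type, stream[offset:offset + length]))
        offset += length
    return packets


stream = frame(CONTROL, b"DIAL sip:bob@example.org") + frame(VOICE, b"\x01\x02\x03")
print(deframe(stream))
```

The alternative based on the headset profile avoids this framing for audio, at the cost of running a second, separate data connection for the control information.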

3.4.2 The Base Unit

The base unit could almost be seen as a router between different bearers and communication technologies. This means that the main purpose of the base unit is to redirect and repack the data received. This further means that there are real-time requirements when handling these packets, if unacceptable delays are to be avoided. The handling and repacking of voice data must also be done without any noticeable loss of sound quality.

In order to make the base unit as flexible as possible, a modular design is suggested. This means that the base unit could support new communication technologies just by adding a software module. Figure 3.2 describes the module-based base unit.

Figure 3.2: Overview of the module-based base unit

Bluetooth Interface. This part of the base unit represents the communication interface towards the cellular phone, and is used when receiving and sending data. This data can be both control and voice packets.

Packet Handling. This layer is used to filter the incoming packets, which are received on the Bluetooth interface, according to their type, i.e., control and audio packets. These packets are then forwarded to the appropriate module. The packet handling layer is also responsible for repacking the data received by the base unit into the correct Bluetooth packet type before forwarding it to the Bluetooth interface.

Communication Logic. This module is responsible for handling the connection logic, i.e., the logic needed for setting up and maintaining the connection between the incoming and outgoing interfaces. This means that it is this module that selects which bearer to use and manages the connection with the cellular phone. The bearer is chosen by looking up the connection information given in a routing table, which is intended to be manually configurable.

Audio Transformation. This module transforms the incoming Bluetooth voice packets into an intermediate format. In the opposite direction, when packets arrive at the base unit from a bearer, this module transforms the intermediate format into voice packets for Bluetooth.


Bearer Packing. These modules are represented in figure 3.1 as "PSTN", "IP-telephony" and "...". They are used to repack audio between the intermediate format and the format expected by the specific bearer. This means that it is these modules that decide which communication technologies and protocols are supported. The intention is to make this module layer easy to expand, and thereby introduce support for new technologies. It should also be mentioned that care must be taken when choosing the intermediate format, in order to maintain flexibility.

Bearer Interfaces. These modules are the physical interfaces needed by the software modules discussed above, e.g., hardware interfaces for PSTN, LAN, WLAN, etc. The hardware interfaces that are available also affect which communication technologies can be supported.
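The layering described above can be sketched in a few lines: the packet handling layer dispatches on packet type, the audio transformation turns Bluetooth voice data into an intermediate format, and bearer packing modules are plugged in through a registry so that a new technology only requires registering one more module. The sketch assumes signed 16-bit linear PCM as the intermediate format and little-endian samples from the phone; these choices, and all names in the code, are hypothetical.

```python
import struct
from typing import Callable, Dict, List

# Hypothetical intermediate audio format: signed 16-bit linear PCM samples.
IntermediateAudio = List[int]

# Bearer packing modules, keyed by bearer name ("PSTN", "IP-telephony", ...).
BEARER_PACKERS: Dict[str, Callable[[IntermediateAudio], bytes]] = {}


def bearer_packer(name: str):
    """Decorator that plugs a new bearer packing module into the base unit."""
    def register(func: Callable[[IntermediateAudio], bytes]):
        BEARER_PACKERS[name] = func
        return func
    return register


def to_intermediate(bluetooth_payload: bytes) -> IntermediateAudio:
    """Audio transformation: assumes little-endian 16-bit PCM from the phone."""
    count = len(bluetooth_payload) // 2
    return list(struct.unpack(f"<{count}h", bluetooth_payload[:2 * count]))


@bearer_packer("IP-telephony")
def pack_for_ip(samples: IntermediateAudio) -> bytes:
    """Example module: 16-bit samples in network (big-endian) byte order."""
    return struct.pack(f">{len(samples)}h", *samples)


def handle_packet(packet_type: str, payload: bytes, bearer: str,
                  on_control: Callable[[bytes], object]):
    """Packet handling: control packets go to the communication logic,
    audio packets go through audio transformation to the bearer module."""
    if packet_type == "control":
        return on_control(payload)
    if packet_type == "audio":
        return BEARER_PACKERS[bearer](to_intermediate(payload))
    raise ValueError(f"unknown packet type: {packet_type!r}")
```

The choice of intermediate format is the main design decision here: every bearer module converts only to and from that one format, so adding a bearer does not require touching the existing modules.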


Chapter 4

Investigating the Options

In order to understand the problem domain and the options, the first thing undertaken was a series of interviews with people who have insight into the current phone architecture and the future development of cellular phones at SEMC. Interviews were a natural means of obtaining initial knowledge about the capabilities offered by today's phone architecture at SEMC, as we had no previous personal knowledge of the internal architecture of their cellular phones. This lack of previous knowledge means that the ideas presented so far in this report will be modified quite a bit. However, it is our opinion that the initial idea presented previously may still be of interest, as it at least presents our vision for the project, and this was in fact what earned us the possibility to conduct this master thesis at SEMC. It should also be pointed out that many of the ideas presented in the initial proposal will be possible to implement using the technology we finally decided to use. In the rest of this chapter the main focus will be on the options offered by the SEMC architecture, i.e., which parts of the architecture can be used to implement a solution that fulfills the vision for this master thesis.

4.1 Interview Methodology

We had no previous knowledge at all of what was offered by the SEMC architecture, and this influenced the way the interviews were conducted quite a bit. It made us decide to use an iterative interview process to investigate the options offered by the architecture. This means that the first interviews were conducted with SEMC personnel who had a fairly good system overview, but did not possess detailed knowledge about all parts of the system. These initial interviews gave us the needed initial knowledge of the architecture. After having gained an initial understanding of which parts of the architecture could be of interest, the interviews entered a new phase. As the architecture is quite complex, this phase more or less lasted throughout the entire project. The interviews in this new phase aimed at gaining in-depth knowledge about the different capabilities offered by the architecture. For this reason, the interviews were conducted with different persons, depending on who would be most likely to have the needed information. As some parts of the relevant architecture are developed abroad, some interviews were conducted through telephone conferences, or when people from the concerned departments were visiting.

4.2 Interview Results

After conducting the initial interviews, it became apparent that the solution for implementing IP-telephony in SEMC's cellular phones would have to be closely connected to SEMC's IP Multimedia Subsystem (IMS) architecture. In fact, it became quite clear that this was the best, and maybe the only, option if we were to implement a working IP-telephony prototype within the time frame of the master thesis project.

Another issue that was revealed during the later phases of the interviews was that there may be complications in handling IP-based connections over Bluetooth in a satisfactory manner, as this is something that has not really been used extensively. According to the interviewee, this should not pose any major problems, as it should probably be rather straightforward to fix.


4.3 Investigating the Current Architecture

Even though the indications from the initial interviews were quite unanimous, i.e., IMS was the way to go, we still decided to look into the phone architecture first-hand. The reason for doing so was twofold: one reason was to investigate the options, and the other was to familiarize ourselves with the phone architecture. The knowledge gained was also used to direct the interview process and the questions in its later phases.

This investigation proved to be quite valuable for two reasons. First and foremost, we learned how applications in a cellular phone are generally designed and implemented. This may seem trivial, but the truth is that the internal architecture of a phone differs quite a bit from what is seen as normal application development. In a Windows-based environment, for instance, one does not really need to care about process registration and inter-process communication in the same way as in an embedded system.

The second reason was that we became certain that IMS really was the only option, given the time frame. This became clear as the architectural investigations found no good support for redirecting and managing voice calls in a packet-switched manner, since there simply was no design support in the current base architecture for manipulating, or even getting hold of, audio streams in a satisfactory manner. The investigations also showed that there was insufficient native support for media protocols that could be used for transporting media data over IP connections.

These facts meant that if we were to implement a solution with only the support found in the current base architecture, we would first have to make modifications to the current architecture, and secondly develop, or at least implement, a whole new protocol stack. As this would have shifted the attention away from the initial goals, and would have taken too long to realize, the focus from then on was to investigate IMS further, along with the capabilities offered by the SEMC IMS architecture.

4.4 IP Multimedia Subsystem

IP Multimedia Subsystem (IMS) is a term used for merging third generation (3G) mobile cellular networks with the Internet [2]. IMS is in fact one of the first steps away from the traditional circuit-switched domain. Although there have been data and Internet capabilities in circuit-switched networks, like the PSTN and the 2G mobile networks, these networks are optimized for handling voice transmissions, and only offer custom data capabilities through the use of a modem. IMS, on the other hand, follows the current trend and makes use of the packet-switched capabilities of third generation networks [2]. It should be noted that it is not IMS that brings packet-switched capabilities to the phone, as this is a feature of the third generation network. IMS is rather a term for a system managing the QoS, billing, and mobility aspects that are needed in addition to the packet-switched capabilities of the third generation network, in order to make it appealing to both network operators and end users. In short, IMS is a system for making use of the Internet Protocol in a mobile network.

4.4.1 The SEMC IMS Architecture

IMS is quite a general term, but it represents the transition towards an architecture that better conforms to the capabilities needed in a packet-switched data network like the Internet. This section presents the capabilities offered by the SEMC implementation of IMS, focusing on the aspects that are of value for implementing IP telephony. The presentation below is, however, only a summary. For a complete overview we refer to appendices A, B, and C, which were written as an investigation and pre-study of the internal capabilities offered by each relevant part of the IMS architecture.

Session Initiation Protocol. Along with the SEMC implementation of the IMS architecture, there will be support for the Session Initiation Protocol (SIP) [3], a standard for initiating and managing media sessions over an IP network. For more detailed information about SIP, see appendix A.
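To make the text-based nature of SIP concrete, the following Python sketch composes a minimal INVITE request. It only illustrates the message format described in RFC 3261; in the prototype this formatting is handled by the IMS SL, and the addresses, tag, branch and Call-ID values below are made-up examples. A real INVITE would normally also carry an SDP body.

```python
# Hypothetical sketch: compose a minimal SIP INVITE request (RFC 3261 syntax).
# All addresses, tags and the Call-ID are made-up example values.


def build_sip_request(method: str, request_uri: str,
                      headers: dict[str, str], body: str = "") -> str:
    """Return a SIP request as text: start line, headers, blank line, body."""
    lines = [f"{method} {request_uri} SIP/2.0"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Content-Length counts the bytes of the body (UTF-8 here).
    lines.append(f"Content-Length: {len(body.encode('utf-8'))}")
    return "\r\n".join(lines) + "\r\n\r\n" + body


invite = build_sip_request(
    "INVITE",
    "sip:bob@example.com",
    {
        "Via": "SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bK776asdhds",
        "Max-Forwards": "70",
        "To": "<sip:bob@example.com>",
        "From": "<sip:alice@example.com>;tag=1928301774",
        "Call-ID": "a84b4c76e66710@192.0.2.10",
        "CSeq": "314159 INVITE",
        "Contact": "<sip:alice@192.0.2.10>",
    },
)
print(invite)
```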

Session Description Protocol. The SEMC IMS architecture will also support the Session Description Protocol (SDP) [4], which is used in combination with SIP. An SDP description is carried in the body of a SIP message and describes the media that will be used once the session has been established through SIP signaling. For more information about SDP, see appendix B.
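As an illustration of what such a description contains, the Python sketch below builds a minimal SDP offer (RFC 4566 syntax) for a single audio stream. The address, port and codec list (PCMU and GSM, which have the static RTP payload types 0 and 3) are assumptions chosen for the example; in the prototype the SDP body is produced through the IMS SL rather than by hand.

```python
# Sketch: build a minimal SDP offer describing one audio stream.
# Address, port, session id and payload types are illustrative assumptions.


def build_sdp_offer(ip: str, rtp_port: int, payloads: dict[int, str]) -> str:
    """payloads maps RTP payload type numbers to 'encoding/clock-rate' strings."""
    lines = [
        "v=0",
        f"o=- 2890844526 2890844526 IN IP4 {ip}",
        "s=-",
        f"c=IN IP4 {ip}",
        "t=0 0",
        # One media line listing every payload type we are willing to receive.
        f"m=audio {rtp_port} RTP/AVP " + " ".join(str(pt) for pt in payloads),
    ]
    lines += [f"a=rtpmap:{pt} {enc}" for pt, enc in payloads.items()]
    return "\r\n".join(lines) + "\r\n"


offer = build_sdp_offer("192.0.2.10", 49170, {0: "PCMU/8000", 3: "GSM/8000"})
print(offer)
```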

Real-time Transport Protocol. The SEMC IMS architecture also supports the Real-time Transport Protocol (RTP) [5], which is used to carry real-time data streams, such as audio and video, over an IP network. RTP supports real-time delivery through timestamps and sequence numbers carried in the packet header. Parallel to every RTP session there is also a control session, which uses the RTP Control Protocol (RTCP) [5]. RTCP is used for synchronization between sender and receiver, as well as for other session-specific control information. For more detailed information about RTP and RTCP, see appendix C.
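To show where these timestamps and sequence numbers live, the Python sketch below packs and unpacks the 12-byte fixed RTP header defined in RFC 3550 (section 5.1). It is independent of the SEMC RTP component, omits CSRC lists and header extensions, and the payload type, sequence number, timestamp and SSRC values are arbitrary examples.

```python
import struct

RTP_VERSION = 2


def pack_rtp_header(payload_type: int, seq: int, timestamp: int,
                    ssrc: int, marker: bool = False) -> bytes:
    """Pack the 12-byte RTP fixed header, without CSRCs or extension."""
    byte0 = RTP_VERSION << 6                     # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data: bytes) -> dict:
    """Decode the fixed header fields from the first 12 bytes of a packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


hdr = pack_rtp_header(payload_type=3, seq=1, timestamp=160, ssrc=0x1234ABCD)
print(len(hdr), unpack_rtp_header(hdr))
```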

Chapter 5

Design of the VoIP Prototype

This chapter describes the design of the VoIP prototype. It first describes how the protocols investigated in the pre-study (appendices A, B, and C) can be used to fulfill the goals of this project, and then gives a detailed description of the VoIP prototype and its relation to the SEMC architecture. To illustrate the design, the last section of this chapter describes a set of scenarios showing the interaction between the different parts (VoIP UI, VoIP-server, IMS SL, etc.).

5.1 Solution Design

As the initial investigation showed, some architectural restrictions narrowed the options for implementing a working IP-telephony prototype within the given time frame. As a result, the focus shifted towards the capabilities offered by the SEMC IMS architecture, i.e., SIP, SDP, and RTP. That said, it is our opinion that these protocols are powerful and would have been one of the best choices for the prototype even if other options had been available.

One of the goals of this project was to investigate and make use of the possibilities offered by the SEMC architecture, and the main option offered is the IMS architecture. The other goals were to have a flexible solution that could easily be adapted to new communication technologies, and that could use Bluetooth as the communication interface. These are all capabilities of the initial idea, which is not surprising, as the initial idea proposal was constructed specifically to stress them.

The remainder of this chapter presents the possibilities offered by SIP, SDP and RTP, and how these protocols can be used to fulfill the goals of this project. The solutions presented are contrasted with what was proposed in the initial idea, in order to show that a solution based on the IMS capabilities can fulfill the goals of this project, and to some extent even surpass our original vision.

This chapter assumes that the reader is familiar with the capabilities offered by SIP, SDP and RTP. The necessary background information can be found in appendices A, B, and C.

5.1.1 Maintaining Flexibility and Modularity using SIP

As this section shows, it is fully possible to maintain the modularity concept presented in the initial idea by using SIP. In fact, almost all aspects of the base unit presented in the initial idea can be built from the facilities provided by a normal SIP solution. The main difference is that instead of one central base unit with many different capabilities, a SIP solution has a "virtual" base unit with the same capabilities, distributed among the different servers found in a normal SIP network, i.e., registrar, proxy and gateway servers.

The registrar and proxy servers handle user registrations and route the call signaling to and from the end users of the system. Gateways act as bridges between different communication technologies; in fact, well-tested and accepted bridging between PSTN and SIP already exists in the form of PSTN gateways. These are all capabilities offered by the base unit presented in our initial idea.

In short, with SIP, new technologies can be supported by adding a new type of gateway to the network. Moreover, because SIP allows the different servers and gateways in a network to be separated, it offers much better load balancing, reliability and flexibility than the initial idea did.

5.1.2 Using SIP and SDP for Negotiating the Media Format

Instead of using a fixed intermediate format for communication between the user interface and the base unit, as described in the initial idea, and then translating this intermediate format into the bearer-specific media format and protocol, a SIP/SDP solution can simply skip this translation, because SIP and SDP allow the parties to negotiate which media format and protocol to use. The parties of the call tell each other their capabilities and select a format they have in common. This means that no intermediate processing of the media format or protocol is needed, unlike in the initial idea. This is of course only true if the recipient is also connected to a technology capable of handling SIP and SDP. If, e.g., the recipient is using PSTN, the SIP and SDP communication takes place between the user interface (in this case a cellular phone) and the PSTN gateway, and the gateway handles the conversion between the negotiated format and the PSTN.
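The matching of capabilities can be pictured as an intersection of the two parties' format lists. The Python sketch below is a deliberately simplified, hypothetical version of that step: real SDP offer/answer (RFC 3264) works per media stream and also covers ports, directions and format parameters, and keeping the offerer's order is just one possible policy.

```python
# Simplified sketch of capability matching in an offer/answer exchange:
# the answerer keeps only the formats it also supports, in the offerer's order.


def negotiate(offered: list[str], supported: set[str]) -> list[str]:
    """Return the offered formats that the answerer supports, preserving order."""
    return [fmt for fmt in offered if fmt in supported]


offer = ["AMR/8000", "PCMU/8000", "GSM/8000"]   # caller's preference order (example)
phone = {"GSM/8000", "PCMU/8000"}              # assumed capabilities of the callee
common = negotiate(offer, phone)
print(common)  # ['PCMU/8000', 'GSM/8000']
```

An empty result means the parties share no format, in which case the session cannot be set up as offered.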

5.1.3 Bluetooth with IP Capabilities

SIP is an IP-based protocol. This means that in order to have direct communication between the cellular phone and the recipient using SIP, the cellular phone also needs an IP connection to the rest of the network. The obvious choice of Bluetooth profile is therefore one that allows normal IP-based communication over Bluetooth, i.e., a Bluetooth connection that tunnels IP traffic. In the IMS-based solution we decided to use the Bluetooth Network Access Point (NAP) role of the Personal Area Networking profile to provide the needed connection.

5.1.4 Overview of the SIP Solution

As can be seen in figure 5.1, the entity depicted as the base unit in the initial idea is now represented by several network-connected servers. The solution nevertheless offers the same possibilities as the initial idea; for instance, it is still possible to place calls to and from different communication technologies through gateways. Bridging technologies through special-purpose gateway servers actually has some benefits that our initial idea lacked. First and foremost, there is even greater flexibility for new technologies, as there is no requirement to add a gateway for the new technology in one's own domain (or base unit); the only requirement is that the service is offered by someone connected to the Internet, and that access to it is allowed. Another benefit is better load balancing than the initial solution offered.

Figure 5.1 also shows the option to communicate with other SIP-capable entities on the network or the Internet. This is done using normal SIP signaling (appendix A) between the caller and the recipient. After the call session has been established, the session data is transmitted over a peer-to-peer (P2P) connection, using a protocol negotiated with SIP and SDP. The scenario is almost the same when a call is placed to a different technology, e.g., PSTN. The main difference is that the SIP signaling and P2P session establishment take place between the gateway and the caller, i.e., the SIP-enabled cellular phone. If PSTN users are allowed to initiate calls, the reverse happens: the gateway is informed of the incoming call and initiates SIP signaling and session establishment towards the recipient, in this case the SIP-enabled cellular phone. The gateway then answers the PSTN call and starts converting the media data between the two sides. Communication with other technologies works in the same way; the difference lies only in the translation and protocol capabilities of the gateway.

Figure 5.1: Overview of the SIP solution

5.2 Prototype Design and IMS Relationship

The VoIP prototype is a client-server solution, i.e., one application, the VoIP-server, runs as a server. The client, or user interface, interacts with the VoIP-server to get information about incoming calls as well as to initiate calls. The VoIP-server in turn uses the SIP capabilities offered by the IMS Service Layer (SL).

As time was limited, and as the purpose was to create a prototype rather than a finished product, the main focus was on the VoIP-server. Little effort was therefore spent on a polished user interface. The VoIP-server does, however, offer support for a client, so integrating a user interface at a later stage should require little work.

This section describes the internal structure of the VoIP prototype. It starts with a description of the IMS architecture and how it generally interacts with its clients and vice versa, which is needed to understand the other design descriptions in this section.

5.2.1 SEMC IMS Client Interaction

The parts of the SEMC IMS architecture that are of interest for the VoIP solution can be split into two categories: the IMS SL (service layer) and RTP. The IMS SL is the part of the underlying architecture that supplies the VoIP-server with support for handling SIP sessions. This makes it quite easy to send SIP requests like register and invite: the only thing needed is to set the SIP-specific parameters and call the corresponding functionality in the IMS SL. In the same manner, by implementing the IMS SL callback interfaces, the VoIP-server is notified by the IMS SL when incoming SIP requests, like invites and byes, are received, and can act accordingly.

Not only does the IMS SL offer support for sending and receiving SIP requests and responses, it also helps the overlying application, in this case the VoIP-server, to set up the media session offered in the SIP invites. This is achieved through IMS SL-specific interfaces that an overlying application should implement, directly or indirectly. Through the different stages of an invite, or other request, the IMS SL calls the application's implementations of these interfaces in order to handle the current operation.

5.2.2 IMS SL and the VoIP Server

The VoIP-server can be split into two parts: the VoIPCore, which is the actual running application, and the VoIPMediaHandler, which handles the media sessions. The VoIP-server uses the IMS SL for all SIP requests and responses, and, as mentioned, the IMS SL also helps the overlying application set up the negotiated media session. Figure 5.2 shows that the VoIPCore component uses the IMS SL to handle SIP requests, and that incoming SIP requests are delivered to the VoIPCore as events sent by the IMS SL.

Figure 5.2: Interaction between the VoIP Server and the SEMC IMS Architecture

Figure 5.2 also shows that the IMS SL uses the VoIPMediaHandler component, through the IMS-specific interfaces that the VoIPMediaHandler implements. The VoIPMediaHandler's responsibility is to set up the actual media sessions, using other parts of the IMS architecture, mainly RTP and CStreamingMedia. Once the connection between the two peers has been established using RTP, the VoIPMediaHandler is responsible for making sure that data is recorded and sent, as well as received and played.

The actual recording and playback of data is done using the StreamingMedia component, which allows recording and playback to and from a memory buffer, a must for this solution. The StreamingMedia component is also supposed to support full-duplex audio, i.e., simultaneous recording and playback. This proved not to be entirely true, as discussed in the implementation chapter.

5.2.3 The VoIPCore Component

This component is the part of the VoIP-server solution that is the actual running server application, i.e., the component that a user of the VoIP-server, such as a VoIP client (GUI), uses to make outgoing calls and to receive incoming calls. A public interface called IVoIPCore was therefore created, which defines the functionality needed by a client, e.g., registering with a SIP registrar or ending a VoIP call. All of the methods defined by the IVoIPCore interface are asynchronous, so a public callback interface is needed for the client to learn what happened with its requests (function calls). The callback interface is also needed because the VoIPCore component must notify the client when incoming calls are received. The VoIP callback interface is further explained in section 5.2.5.

In short, the VoIPCore component notifies the client about the status of ongoing SIP transactions. To be able to do this, it implements callback interfaces offered by the IMS SL. Figure 5.3 shows which interfaces the VoIPCore component implements, as well as some of its methods.

Figure 5.3: The main functionality of the VoIPCore component

5.2.4 The VoIPMediaHandler Component

The VoIPMediaHandler component handles the media sessions. This means that it is responsible for sending and receiving the voice data transmitted between the peers using the Real-time Transport Protocol. To do this, the VoIPMediaHandler uses a utility component offered by the SEMC IMS, called RTP.
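As a rough illustration of what sending voice data with RTP involves, the Python sketch below wraps recorded audio frames in RTP packets, advancing the sequence number by one and the timestamp by the number of samples per packet. It does not use the SEMC RTP component; the 33-byte frames, the 160-sample step (20 ms of 8 kHz audio, as with GSM full-rate) and payload type 3 are assumptions chosen for the example.

```python
import struct
from typing import Iterable, Iterator

SAMPLES_PER_FRAME = 160  # assumption: 20 ms frames of 8 kHz audio


def packetize(frames: Iterable[bytes], payload_type: int, ssrc: int,
              first_seq: int = 0, first_ts: int = 0) -> Iterator[bytes]:
    """Wrap each recorded audio frame in an RTP packet with increasing seq/timestamp."""
    seq, ts = first_seq, first_ts
    for frame in frames:
        # 0x80 = version 2, no padding, no extension, no CSRCs; marker bit clear.
        header = struct.pack("!BBHII", 0x80, payload_type & 0x7F, seq, ts, ssrc)
        yield header + frame
        seq = (seq + 1) & 0xFFFF                      # 16-bit wrap-around
        ts = (ts + SAMPLES_PER_FRAME) & 0xFFFFFFFF    # advances by samples, not packets


# Three silent 33-byte frames, starting just before the sequence number wraps.
packets = list(packetize([b"\x00" * 33] * 3, payload_type=3, ssrc=42, first_seq=65535))
```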

Besides making sure that data is sent and received, the VoIPMediaHandler component is also responsible for recording and playing this data. This is accomplished using the StreamingMedia component, which can both record and play streaming media.

Before the VoIPMediaHandler can do all this, it must be informed by the VoIPCore that a new session is about to start. The VoIPMediaHandler component therefore implements the IVoIPMedia interface. Using the functionality provided by the IVoIPMedia interface, the VoIPCore component can allocate (and, when needed, deallocate) the resources required before the media session starts. The design of the VoIPMediaHandler, along with the functionality offered by implementing the IVoIPMedia interface, can be seen in figure 5.4.

Figure 5.4: The main functionality of the VoIPMediaHandler Component

5.2.5 The VoIP Callback Interface

To be notified about ongoing as well as incoming SIP requests, a client using the VoIP-server needs to implement the ICBVoIP interface. This is because the functionality that the VoIP-server offers the client is asynchronous, and the reason for that is obvious: the client UI should not be locked while it waits for a specific function to complete. The results of such an operation are therefore delivered through a callback interface, in this case ICBVoIP. The functionality that ICBVoIP offers can be seen in figure 5.5.

Figure 5.5: The functionality that the VoIP Callback Interface provides
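The asynchronous request/callback pattern described above can be sketched in Python as follows. The class and method names are hypothetical stand-ins for IVoIPCore and ICBVoIP, whose real methods are only shown in figures 5.3 and 5.5, and the worker thread and queue are our own assumption about how "asynchronous" could be realized; the placeholder always reports a 200 response instead of talking to a registrar.

```python
import queue
import threading
from abc import ABC, abstractmethod


class VoIPCallback(ABC):
    """Hypothetical stand-in for a client callback interface such as ICBVoIP."""

    @abstractmethod
    def on_register_result(self, status_code: int) -> None: ...

    @abstractmethod
    def on_incoming_call(self, caller: str) -> None: ...


class AsyncCore:
    """Hypothetical stand-in for IVoIPCore: requests return at once and
    results are delivered later through the callback on a worker thread."""

    def __init__(self, callback: VoIPCallback):
        self._callback = callback
        self._requests: queue.Queue = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def register(self, registrar: str) -> None:
        self._requests.put(("register", registrar))   # does not block the UI

    def deliver_incoming_call(self, caller: str) -> None:
        self._requests.put(("incoming", caller))      # e.g. triggered by an IMS SL event

    def wait_idle(self) -> None:
        """Demo helper: block until all queued requests have been handled."""
        self._requests.join()

    def _run(self) -> None:
        while True:
            kind, arg = self._requests.get()
            try:
                if kind == "register":
                    # Placeholder: a real core would send REGISTER and await the response.
                    self._callback.on_register_result(200)
                elif kind == "incoming":
                    self._callback.on_incoming_call(arg)
            finally:
                self._requests.task_done()


class RecordingUI(VoIPCallback):
    """Example client that simply records the notifications it receives."""

    def __init__(self) -> None:
        self.events = []

    def on_register_result(self, status_code: int) -> None:
        self.events.append(("registered", status_code))

    def on_incoming_call(self, caller: str) -> None:
        self.events.append(("ringing", caller))


ui = RecordingUI()
core = AsyncCore(ui)
core.register("sip:registrar.example.com")
core.deliver_incoming_call("sip:bob@example.com")
core.wait_idle()
print(ui.events)
```

Because a single worker handles the queue in order, the client sees the notifications in the order the requests were made, while the calling thread is never blocked.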

5.3 Scenarios

This section uses scenarios to show how the VoIP-server interacts with the rest of the system, and vice versa, in its most crucial parts. Each scenario consists of a sequence diagram and a text explaining it.

5.3.1 Registering with a SIP Registrar

In order to send and receive invites (make and receive calls), the client must first have registered with a SIP server. Figure 5.6 is a sequence diagram of the register scenario.

1. When the register method in the VoIPCore component is called, it sets up the parameters needed for a successful SIP registration.

2. After this setup has been completed, the register functionality of the IMS SL is called, and upon a response from the SIP server (or some network error) a response code is received. The user of the VoIPCore component is notified through a callback method.

Figure 5.6: The VoIPCore component uses the IMS SL to perform a SIP registration
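Step 2 only states that a response code is received and passed on to the user. As a hypothetical illustration of that last step, the Python sketch below extracts the status code from a SIP response and maps it to a coarse outcome; the outcome strings, and the choice to treat 401/407 as a request for credentials (the usual SIP digest-authentication challenge), are our assumptions rather than the prototype's behavior.

```python
def parse_status_line(response: str) -> tuple[int, str]:
    """Split the first line of a SIP response, e.g. 'SIP/2.0 200 OK', into (code, reason)."""
    first_line = response.split("\r\n", 1)[0]
    version, code, reason = first_line.split(" ", 2)
    if version != "SIP/2.0":
        raise ValueError(f"not a SIP/2.0 response: {first_line!r}")
    return int(code), reason


def registration_outcome(code: int) -> str:
    """Map a final response code to what a client callback might report (hypothetical)."""
    if 200 <= code < 300:
        return "registered"
    if code in (401, 407):
        return "credentials required"   # resend REGISTER with credentials
    return "failed"


code, reason = parse_status_line("SIP/2.0 200 OK\r\nCSeq: 1 REGISTER\r\n\r\n")
print(code, reason, registration_outcome(code))
```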

5.3.2 Sending a SIP Invite Request

Once registered, it should be possible to send and receive invitations to media sessions via SIP. This section describes what happens when a SIP invite is sent to another user who accepts the invitation.

Figure 5.7: The VoIPCore component uses the IMS SL to initialize a SIP invite request

1. When the Invite method is called in the VoIPCore component, it sets up the parameters needed for a SIP invite request.

2. The next thing it does is to request that the IMS SL sends the invite.

Figure 5.8: The IMS SL uses the VoIPMediaHandler component to create the SIP invite and to set up the media streams

3. The IMS SL uses functionality implemented in the VoIPMediaHandler component both to create the SDP part of the SIP invite (GetSupportedMedia) and to prepare the upcoming media session by creating and opening sockets (OpenMediaSockets).

4. After the invite has been sent and a response has been received from the remote end, the IMS SL uses the VoIPMediaHandler to determine which media sessions matched (CompareMedia). Using that information, the IMS SL closes the sockets that will not be used (CloseMediaSockets) and completes the setup of the media session sockets (SetConnectionInfo).

5. The IMS SL notifies the VoIPCore component about the status of the sent SIP invite request, and the status is forwarded to the user of the VoIPCore component.
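The socket-related steps above (OpenMediaSockets, CompareMedia, CloseMediaSockets, SetConnectionInfo) can be illustrated with ordinary UDP sockets. This Python sketch is not the IMS SL or VoIPMediaHandler interface; the class, its methods and the loopback addresses are hypothetical, and the media comparison is reduced to a set of accepted stream names.

```python
import socket


class MediaSockets:
    """Hypothetical sketch of the open / compare / close / connect steps."""

    def __init__(self, media: list[str]):
        self.sockets: dict[str, socket.socket] = {}
        for name in media:                          # one RTP socket per offered stream
            s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            s.bind(("127.0.0.1", 0))                # let the OS pick a free port
            self.sockets[name] = s

    def local_port(self, name: str) -> int:
        """Port that would be advertised in the SDP offer."""
        return self.sockets[name].getsockname()[1]

    def keep_only(self, accepted: set[str]) -> None:
        """Close the sockets for streams the remote end did not accept."""
        for name in list(self.sockets):
            if name not in accepted:
                self.sockets.pop(name).close()

    def set_connection_info(self, name: str, remote_ip: str, remote_port: int) -> None:
        # connect() on a UDP socket only fixes the default peer; nothing is sent.
        self.sockets[name].connect((remote_ip, remote_port))

    def close_all(self) -> None:
        """Release every remaining socket, e.g. when the session ends."""
        for s in self.sockets.values():
            s.close()
        self.sockets.clear()


ms = MediaSockets(["audio", "video"])
print("offering audio on port", ms.local_port("audio"))
ms.keep_only({"audio"})                        # the answer accepted audio only
ms.set_connection_info("audio", "127.0.0.1", 49170)
```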

5.3.3 Starting the Media Session

After an invite has been sent (or received) and accepted, all preconditions are in place (i.e., the correct sockets for sending and receiving data have been set up) to finally start a conversation. A normal phone call, whether using a cellular phone, a standard PSTN-connected phone, or a VoIP phone, is usually full-duplex, i.e., both participants can talk at the same time. Because of the current limitations in the architecture mentioned in chapter 6, we have been forced to use half-duplex conversations, i.e., only one participant may talk at a time.

Figure 5.9: Preparing the VoIPMediaHandler for the actual media session

1. When an invite process has been successfully completed, the VoIPCore component calls the StartSession method in the VoIPMediaHandler to get it ready to either start listening or talking.

2. The VoIPMediaHandler creates the components needed to record and play back audio.

5.3.4 Requesting to Talk

When the user wants to say something to the other participant, he must make a talk "request".

Figure 5.10: Interaction between the different components when requesting to talk

1. Once the talk request has been received by the VoIPMediaHandler component, it requests an audio channel for recording.

2. When the request has been approved (which happens immediately unless some other part is using that channel) and the channel thus opened, the recorder is configured.

3. Once the recorder has been successfully configured, a message representing a request-talk is sent to the remote end.

4. When an ack from the remote end is received, the recorder is started and the VoIPCore component's user is notified.

5. Every time new data is available to send to the remote end, an RTP packet is created and sent. This continues until a request-talk is received from the remote end, signaling that it is time to start listening instead (see the incoming request talk scenario).
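The request-talk handshake in this scenario and the next amounts to a small state machine per endpoint. The following Python sketch models it under stated assumptions: the state names, message strings and the `send` function are hypothetical, and opening, configuring and starting the audio channels is reduced to comments.

```python
from enum import Enum, auto


class State(Enum):
    LISTENING = auto()
    REQUESTING = auto()   # request-talk sent, waiting for the remote ack
    TALKING = auto()


class HalfDuplexFloor:
    """Hypothetical model of the request-talk / ack token exchange.
    `send` stands in for whatever carries the control messages."""

    def __init__(self, send):
        self.state = State.LISTENING
        self.send = send

    def request_talk(self) -> None:
        if self.state is State.LISTENING:
            # Steps 1-3: open and configure the recorder, then ask the peer.
            self.send("REQUEST_TALK")
            self.state = State.REQUESTING

    def on_message(self, msg: str) -> None:
        if msg == "ACK" and self.state is State.REQUESTING:
            self.state = State.TALKING        # step 4: start recorder, notify user
        elif msg == "REQUEST_TALK":
            # Peer wants the token: stop recording, prepare the player, acknowledge.
            self.state = State.LISTENING
            self.send("ACK")


outbox_a, outbox_b = [], []
a = HalfDuplexFloor(outbox_a.append)
b = HalfDuplexFloor(outbox_b.append)

a.request_talk()               # A asks for the token
b.on_message(outbox_a.pop())   # B receives REQUEST_TALK and replies ACK
a.on_message(outbox_b.pop())   # A receives ACK and may now talk
print(a.state, b.state)        # State.TALKING State.LISTENING
```

If both ends request the token at the same moment, this sketch leaves both of them listening, so a real implementation would need some tie-break rule.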

5.3.5 Incoming Request Talk

This scenario describes what happens when the remote end wants to talk, i.e., when an incoming request-talk is received.

Figure 5.11: Interaction between the different components when a "request talk" is received

1. When a request-talk message is received, the current recording is stopped (if there is one) and an audio channel for playback is requested.

2. Once the request has been approved (which happens immediately unless some other part is using that channel) and the channel thus opened, the player is configured.

3. Once the player has been successfully configured, a message representing an ack is sent to the remote end.

4. When the first data packet (RTP) arrives, a buffer for temporarily holding RTP packets is created. The data from the packet is unpacked and sent to the player for playback.

5. Every time a new RTP packet is received, it is put in the buffer.

6. Whenever the player runs out of data, the next packet is retrieved from the buffer, unpacked, and sent to the player.
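Steps 4 to 6 describe a simple receive buffer between the network and the player. The Python sketch below illustrates the idea; as an assumption beyond the text, it orders packets by RTP sequence number with a heap (ignoring 16-bit wrap-around), whereas the prototype may simply have used first-in, first-out order. The class and method names are hypothetical.

```python
import heapq
from typing import List, Optional, Tuple


class PlayoutBuffer:
    """Hypothetical receive buffer between the RTP socket and the player."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, bytes]] = []

    def push(self, seq: int, payload: bytes) -> None:
        """Step 5: store a received packet, ordered by RTP sequence number."""
        heapq.heappush(self._heap, (seq, payload))

    def next_for_player(self) -> Optional[bytes]:
        """Step 6: called when the player runs out of data; None means underrun."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[1]


buf = PlayoutBuffer()
for seq, data in [(2, b"c"), (0, b"a"), (1, b"b")]:   # packets arrive out of order
    buf.push(seq, data)
played = [buf.next_for_player() for _ in range(4)]
print(played)  # [b'a', b'b', b'c', None]
```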

5.3.6 Incoming SIP Invite Request

This scenario deals with what happens when an incoming SIP invite has been received by the IMS SL.

Figure 5.12: The interaction between the IMS SL and the VoIP-server when a SIP invite is received

1. When an incoming SIP invite request is received by the underlying architecture, it notifies the VoIPCore component, which in turn notifies its user.

2. Should the user accept the incoming invite, this is forwarded to the underlying architecture, which sets up the media session sockets in much the same way as in the invite scenario above. Once this is completed, the StartSession method is called in the VoIPMediaHandler (see Starting the Media Session above), and the VoIPCore component's user is notified of the result.

3. If the user chooses to reject the incoming SIP invite, this is simply forwarded to the underlying architecture, which notifies the VoIPCore component when it is completed. The result is forwarded to the user of the VoIPCore.

5.3.7 Sending a SIP Bye Request

Whenever the user feels that the conversation is over, he or she may terminate the media session. The remote user may also terminate the conversation, which is covered in the next scenario (Incoming Bye Request).

Figure 5.13: The interaction between the VoIP-server and the IMS SL when sending a SIP bye

1. When the VoIPCore component receives a terminate request from its user, it simply forwards this request to the underlying architecture.

2. The IMS SL makes sure that all media-specific sockets are closed by calling functionality implemented in the VoIPMediaHandler component.

3. When the termination is completed, the StopSession method in the VoIPMediaHandler is called, which deallocates resources, and the VoIPCore component's user is notified that the termination has finished.

5.3.8 Incoming Bye Request

This scenario describes what happens when a SIP bye request is received from the remote end.

Figure 5.14: The interaction between the VoIP-server and the IMS SL when a SIP bye is received

1. When an incoming SIP bye request destined for the VoIPCore is received, the StopSession method in the VoIPMediaHandler is called in order to deallocate resources, and the VoIPCore component's user is notified.

2. The underlying IMS SL architecture makes sure that the media-specific connections are closed by calling functionality implemented in the VoIPMediaHandler.

Chapter 6

Prototype Implementation

This chapter briefly presents what was implemented to make a working prototype. The aim is simply to give some insight into the more important things that had to be implemented to make the prototype a reality, focusing on the most important aspects and issues encountered during the implementation.

6.1 Bluetooth Connectivity

One of the first things done, after realizing that IMS would be one of the key factors in making our VoIP prototype a reality, was to look into what requirements IMS had on the data connection it should use. This was especially important, as one of the goals of the master thesis was to see whether Bluetooth would suffice as the data carrier for the chosen solution.

The investigation of what IMS needs in order to use a certain data carrier soon revealed that it could handle any type of normal data account, such as GSM/UMTS-based packet-switched and circuit-switched accounts. However, as was indicated during the investigative interviews, there was no current support for handling and managing Bluetooth-based accounts. This obstacle was remedied by implementing a module that created a Bluetooth-based account for every paired device in the vicinity that provides a network access service. Having created the accounts, we shifted focus to the "connection manager", the module used for setting up the connection described by a data account. Once this module had been altered to support Bluetooth accounts, there were no longer any obstacles to using a Bluetooth connection in the same way as any other connection: it could be used by IMS as well as by any other service on the phone, e.g., the web browser.
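The account-creation step can be sketched as a filter over the list of paired devices. The Python below is hypothetical: the thesis does not describe the phone's data-account or connection-manager APIs, so the PairedDevice and DataAccount types and the bearer name are invented for illustration. The only external fact it relies on is that NAP is identified by the Bluetooth service class UUID 0x1116.

```python
from dataclasses import dataclass

NAP_SERVICE_CLASS = 0x1116   # Bluetooth service class UUID16 for NAP


@dataclass
class PairedDevice:          # hypothetical view of a paired device
    name: str
    address: str
    service_classes: set[int]


@dataclass
class DataAccount:           # hypothetical data account record
    name: str
    bearer: str
    bt_address: str


def create_nap_accounts(devices: list[PairedDevice]) -> list[DataAccount]:
    """Create one Bluetooth data account per paired device that offers NAP."""
    return [
        DataAccount(name=f"BT {d.name}", bearer="bluetooth-pan", bt_address=d.address)
        for d in devices
        if NAP_SERVICE_CLASS in d.service_classes
    ]


paired = [
    PairedDevice("Office PC", "00:11:22:33:44:55", {0x1116, 0x1105}),  # NAP, OBEX push
    PairedDevice("Headset", "00:AA:BB:CC:DD:EE", {0x1108, 0x111E}),    # headset, hands-free
]
accounts = create_nap_accounts(paired)
print(accounts)
```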

6.2 The VoIP Prototype

As the design chapter showed, the VoIP solution is implemented as two major blocks, i.e., the VoIPCore and the VoIPMediaHandler, and these parts interact with the SEMC IMS in order to get things done. This section describes the issues revealed during the implementation of these components.

6.2.1 Changes in the Underlying Architecture

One thing that influenced the implementation of the VoIPCore and VoIPMediaHandler was that some parts of the needed architecture were undergoing changes. This meant that there were some uncertainties when work first started on the VoIP solution, and that working code sometimes had to be discarded because an update of the underlying architecture made it obsolete. As a result, development took longer than it would have if all design aspects could have been accounted for from the very start.

6.2.2 No Support for Full-duplex Audio

Another thing revealed during the implementation was that there was no actual support for full-duplex audio in the base platform. This meant that it was only possible to either record or play back audio, but not both at the same time. Since this limitation resided in the base architecture of the platform, which is developed by a third-party company, there was very little we could do about it. As the goal of this thesis was to investigate and develop a prototype proving the possibilities of supporting new communication technologies with the cellular phone as the interface, this was obviously a major drawback, as it limits the scope to half-duplex solutions.

However, after having implemented large parts of the VoIP solution, it is our opinion that once this limitation in the architecture is removed, handling full-duplex audio conversations will pose no problem. To work around the problem temporarily, and still provide some proof that a VoIP solution with the cellular phone as the interface is possible, we shifted towards a half-duplex solution.

We decided that the simplest approach would be to pass an application-specific token between the recipient and the caller using RTCP. This approach was chosen because RTCP has good support for this kind of token passing: it already provides the possibility to create and pass application-specific data with different subtypes (the APP packet type).
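The token messages themselves are not specified in the thesis; the Python sketch below only shows how an APP packet of the kind RTCP allows is laid out according to RFC 3550 (section 6.7). The four-character name "VOIP" and the subtype numbers chosen for request-talk and ack are hypothetical.

```python
import struct

RTCP_APP = 204   # RTCP packet type for application-defined packets


def pack_rtcp_app(subtype: int, ssrc: int, name: bytes, data: bytes = b"") -> bytes:
    """Build an RTCP APP packet: header, SSRC, 4-byte name, optional data."""
    if len(name) != 4:
        raise ValueError("name must be exactly 4 ASCII bytes")
    if len(data) % 4:
        raise ValueError("application data must be a multiple of 32 bits")
    if not 0 <= subtype < 32:
        raise ValueError("subtype is a 5-bit field")
    length_words = (12 + len(data)) // 4 - 1     # packet length in 32-bit words minus one
    header = struct.pack("!BBHI", (2 << 6) | subtype, RTCP_APP, length_words, ssrc)
    return header + name + data


def parse_rtcp_app(packet: bytes) -> tuple[int, bytes]:
    """Return (subtype, name) of an APP packet."""
    byte0, pt, _length, _ssrc = struct.unpack("!BBHI", packet[:8])
    if pt != RTCP_APP:
        raise ValueError("not an APP packet")
    return byte0 & 0x1F, packet[8:12]


# Hypothetical token messages: subtype 1 = request talk, subtype 2 = ack.
REQUEST_TALK, ACK = 1, 2
pkt = pack_rtcp_app(REQUEST_TALK, ssrc=0x1234ABCD, name=b"VOIP")
print(pkt.hex(), parse_rtcp_app(pkt))
```

In a conforming implementation the APP packet would be sent inside a compound RTCP packet that begins with a sender or receiver report, as RFC 3550 requires.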

This token passing solves the problem by letting only the holder of the token speak while the other party listens. The token passing is described in the design scenarios "Requesting to Talk" and "Incoming Request Talk" in the previous chapter. However, it must be understood that these scenarios are not part of the actual VoIP design, but were added as a workaround for the fact that the platform does not handle full-duplex audio. We also believe that once the audio problem is fixed, shifting from the half-duplex solution to a real full-duplex VoIP solution will be straightforward. In fact, a full-duplex solution requires less work, as it needs none of the token passing and state handling that the half-duplex workaround requires.

Chapter 7

Evaluation of the Prototype

In this chapter we look back at the initial research goals and see what was actually concluded. During the evaluation of the possibilities for IP telephony in SEMC cellular phones, we have also come across a topic that we feel needs further investigation; this topic is presented below as well.

7.1 Answers to the Research Questions

This section presents the answers to the research questions. Most of the answers have been tested with the help of the VoIP prototype, while others have been derived logically, i.e., from the capabilities offered by SIP and related concepts already available on the market.

7.1.1 Reasonable Response Times

Question: Will Bluetooth be able to handle the communication between the cellular phone and the base unit in accordance with what is seen as "normal" response times and quality in traditional telephony?

Answer: With the help of the prototype, the Bluetooth connection has been empirically tested to see whether it provides acceptable latency for voice communication. However, as stated before, our current prototype only operates with half-duplex audio, i.e., audio in only one direction at a time. This means that the tests do not actually show whether Bluetooth is able to handle real, full-duplex VoIP communication. To provide a likely answer to the question, we refer to what is normally seen as acceptable latency for real-time voice communication. The general opinion is that latencies below 400 ms are acceptable to the parties of a conversation, although latency below 150 ms is recommended [6].

The Bluetooth connection has been empirically tested against this criterion by measuring the one-way latency between the cellular phone and the PC providing the IP connection, i.e., the latency imposed on the data when traveling over the Bluetooth connection itself.

The results presented below come from tests with two different packet sizes. These were chosen to match the real sizes of the data packets traversing the link during a normal conversation, i.e., the best and worst quality settings of the codec in question.

The test was conducted in a normal open office environment, over distances of 1-12 meters. Because the test was conducted in an office environment, the results should not be interpreted as a true test of Bluetooth, but rather as an indication of the capabilities offered to the solution in question; the results may be influenced by sources of disturbance in the surrounding environment, such as wireless LAN, and by performance variations of the cellular phone and/or PC.

Packet Size (bytes)  Distance (m)  Average (ms)  Max. (ms)  Min. (ms)
194                  1             36            65         17
424                  1             40            61         35
Diff.                -             -4            4          -18
194                  4             34            61         19
424                  4             40            63         32
Diff.                -             -6            -2         -13
194                  8             33            62         18
424                  8             39            62         24
Diff.                -             -6            0          -6
194                  12            37            75         22
424                  12            44            72         32
Diff.                -             -7            3          -10

Table 7.1: Bluetooth latency measurement results (Diff. = 194-byte value minus 424-byte value at the same distance)

As can be concluded from Table 7.1, packet size and distance seem to have little impact on the latency, although this presumably holds only up to a certain point. Within the tested range, the larger packets add 4-7 ms to the average latency, and even the highest measured value, 75 ms, is well below the recommended 150 ms. What the measurements show is that the Bluetooth connection in itself should be able to handle voice communication, even for the larger packets, in a "normal" open office environment.
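The comparison above can be reproduced directly from the numbers in Table 7.1. The sketch below, a minimal illustration rather than the tool used for the measurements, recomputes the "Diff." rows and classifies the worst measured value against the 150 ms and 400 ms limits cited from [6]. Note that such limits normally refer to the end-to-end (mouth-to-ear) delay, whereas the table covers only the Bluetooth hop, so the comparison indicates how much of the delay budget the link consumes rather than the delay of a complete call.

```python
# Table 7.1: (packet size in bytes, distance in m) -> (average, max, min) latency in ms
TABLE_7_1 = {
    (194, 1): (36, 65, 17), (424, 1): (40, 61, 35),
    (194, 4): (34, 61, 19), (424, 4): (40, 63, 32),
    (194, 8): (33, 62, 18), (424, 8): (39, 62, 24),
    (194, 12): (37, 75, 22), (424, 12): (44, 72, 32),
}

RECOMMENDED_MS = 150  # latency below this is recommended [6]
ACCEPTABLE_MS = 400   # latency below this is generally acceptable [6]


def diff_rows(table):
    """Recompute the 'Diff.' rows: 194-byte value minus 424-byte value per distance."""
    distances = sorted({distance for _, distance in table})
    return {
        d: tuple(small - large for small, large in zip(table[(194, d)], table[(424, d)]))
        for d in distances
    }


def classify(latency_ms):
    """Place a one-way latency in the categories described in the text."""
    if latency_ms < RECOMMENDED_MS:
        return "recommended"
    if latency_ms < ACCEPTABLE_MS:
        return "acceptable"
    return "too high"


worst = max(maximum for _, maximum, _ in TABLE_7_1.values())
print(diff_rows(TABLE_7_1))  # {1: (-4, 4, -18), 4: (-6, -2, -13), 8: (-6, 0, -6), 12: (-7, 3, -10)}
print(worst, classify(worst))  # 75 recommended
```

Under this reading, roughly 75 ms of the recommended 150 ms would remain for the rest of the path (codec, buffering, and Internet transport) even in the worst measured case.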

7.1.2 Possible to Implement IP-Telephony

Question: Is it possible to integrate IP-telephony support into a cellular phone based on the Sony Ericsson architecture? Is it possible to use any pre-existing techniques from the Sony Ericsson mobile phone architecture in order to ease the implementation?

Answer: As described in this report, there is evolving support for implementing solutions like VoIP in the SEMC architecture, mainly because of the features of the SEMC IMS. However, it is at present not possible to implement a fully working VoIP solution, due to the lack of support for full-duplex audio in the base architecture, i.e., the architecture on which the current phones are built. This is, however, only a temporary problem, and as soon as it has been remedied, little work will be needed to convert the current solution into a true full-duplex VoIP prototype.

7.1.3 Support for New Communication Technologies

Question: Is it possible to integrate support for more communication technologies based on the selected communication protocols and the Sony Ericsson mobile phone architecture?

Answer: We have already seen that new communication technologies can be supported through the use of media gateways. This support is not directly part of the SEMC architecture; it follows from the fact that the SEMC architecture supports the Session Initiation Protocol (SIP). Since SIP supports new communication technologies through gateways, the support is separated from the internal architecture, which brings benefits such as increased flexibility and load balancing.

7.2 Suggestions for Further Research

During the investigation of the possibilities for a VoIP solution based on the SEMC architecture, we have come across a topic that we feel might need further research and attention. As we have seen, the transition from traditional telephony (mobile or ordinary PSTN) towards a packet-switched technology is quite a big step. One fact that makes IP-telephony even more interesting is that packet-switched communication is supported by third-generation (3G) networks. However, as the focus shifts more and more towards VoIP communication transported over the Internet, the main requirement becomes that each entity or cellular phone maintains a constant IP connection. This in turn means that the cellular phone will potentially be as exposed to malicious attacks as every other entity connected to the Internet. Adding to our concerns, we believe that parts of the SEMC architecture and base architecture were not designed with a focus on handling potentially unsafe data. In our opinion, further resources need to be devoted to investigating potential gaps in the design and implementation of the IMS and base architecture, in order to make sure that all unsafe data communication is treated as potentially harmful.

Chapter 8

Discussion and Related Work

While conducting this master thesis, we have naturally had a general interest in what is happening in different areas related to VoIP. We have noticed that the general interest in VoIP has more or less exploded during the last years (2004-2005). As the general public becomes more and more interested in using this new technology, there is also an increased focus on the strengths and weaknesses of VoIP. This chapter briefly presents the issues that we have found most interesting. The topics have been selected to address aspects that are important to both developers and the general public.

8.1 Network Address Translation

One major problem when dealing with SIP-based VoIP is NAT. NAT is short for Network Address Translation, a method used for mapping IP addresses between different address scopes [7]. This often means translating between a public, globally recognized IP address and an IP address residing within a private network. One of the main reasons for employing NAT is the shortage of public global IP addresses on the Internet. This has led to the creation of private networks with their own private IP addresses, where NAT is used to allow computers on the private networks to access the Internet [8]. To make things worse, there exist different types of NATs [9], which means that their behavior can differ from one situation to another.

The general idea, although simplified, is that when an entity on the private network wishes to exchange information with an entity residing on another network, the outgoing request is passed through the NAT. The NAT then creates a session for the outgoing request and rewrites its address and port fields: the internal IP address and source port of the initiating entity are replaced by the public address of the NAT and a port chosen by the NAT. The recipient therefore perceives the request as if it originated from the NAT, and sends its answer to the address of the NAT, with the destination port that the NAT set in the outgoing request. When the NAT receives the answer on that port from the destination address, it remembers that it assigned that session (port) to a request from a certain entity on the private network, and can thus forward the answer to the intended recipient. For more extensive information on NAT, see [10].
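The translation described above can be sketched as a small mapping table. This is a toy model under stated assumptions, not a description of any particular NAT product: the class and method names are hypothetical, and it allocates a new public port for every (private endpoint, remote endpoint) pair and only accepts replies from that remote endpoint, which roughly corresponds to a symmetric NAT. Other NAT types [9] reuse mappings or filter replies differently.

```python
import itertools
from typing import Optional, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


class ToyNat:
    """Hypothetical NAT: rewrites outgoing sources and maps replies back."""

    def __init__(self, public_ip: str, first_port: int = 50000) -> None:
        self.public_ip = public_ip
        self._next_port = itertools.count(first_port)
        self._outbound = {}  # (private endpoint, remote endpoint) -> public port
        self._inbound = {}   # public port -> (private endpoint, remote endpoint)

    def translate_outgoing(self, private: Endpoint, remote: Endpoint) -> Endpoint:
        """Create (or reuse) a session and return the rewritten source address and port."""
        key = (private, remote)
        if key not in self._outbound:
            port = next(self._next_port)
            self._outbound[key] = port
            self._inbound[port] = key
        return (self.public_ip, self._outbound[key])

    def translate_incoming(self, remote: Endpoint, public_port: int) -> Optional[Endpoint]:
        """Return the private endpoint for a reply, or None if no session matches."""
        session = self._inbound.get(public_port)
        if session is None or session[1] != remote:
            return None  # no matching session: the packet is dropped
        return session[0]


nat = ToyNat("198.51.100.7")
phone, server = ("192.168.0.10", 5060), ("203.0.113.20", 5060)
print(nat.translate_outgoing(phone, server))                  # ('198.51.100.7', 50000)
print(nat.translate_incoming(server, 50000))                  # ('192.168.0.10', 5060)
print(nat.translate_incoming(("203.0.113.99", 5060), 50000))  # None
```

Only the IP and transport headers pass through this table, which is the root of the VoIP problem discussed next.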

8.1.1 VoIP in NAT Situations

NAT is quite troublesome for SIP/SDP-based VoIP solutions like the one described in this report. The problem is that IP and port information is not only stored in the IP header of the packet, but also inside the SIP messages themselves, both in SIP headers and in the SDP body. This means that addressing information is embedded in the application layer, which a normal NAT is unaware of. As a result, a message sent by a VoIP client residing behind a NAT contains addressing information belonging to the private domain of that client. When the recipient tries to reply, it fails, as the address it is trying to reach does not exist in its addressing realm.
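The following sketch illustrates why a NAT that only rewrites IP headers is not enough. It scans the connection (c=) lines of an SDP body for RFC 1918 private IPv4 addresses, which is exactly the information a remote party would receive from a client behind a NAT. It is a minimal illustration under assumptions: it only looks at IPv4 c= lines, ignores the SIP headers (such as Via and Contact) that carry the same problem, and the function name and example SDP are made up for illustration.

```python
import ipaddress

# RFC 1918 private IPv4 ranges
PRIVATE_NETS = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def private_sdp_addresses(sdp: str) -> list:
    """Return the private IPv4 addresses found in SDP connection (c=) lines."""
    found = []
    for line in sdp.splitlines():
        if not line.startswith("c="):
            continue
        # c=<nettype> <addrtype> <connection-address>, e.g. "c=IN IP4 192.168.0.10"
        fields = line[2:].split()
        if len(fields) == 3 and fields[0] == "IN" and fields[1] == "IP4":
            address = ipaddress.ip_address(fields[2].split("/")[0])  # drop optional /ttl
            if any(address in net for net in PRIVATE_NETS):
                found.append(str(address))
    return found


offer = "\r\n".join([
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 192.168.0.10",
    "s=-",
    "c=IN IP4 192.168.0.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0",
])
print(private_sdp_addresses(offer))  # ['192.168.0.10']
```

A remote party that sends RTP to 192.168.0.10:49170 will not reach this client, regardless of how the NAT translated the packet carrying the offer.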

8.1.2 Avoiding the NAT Problem

There exist different types of NATs. This is one of the reasons why the NAT issue, or rather issues, is even harder to solve, as a solution that is fully functional in one situation might be inadequate in another. For this reason there exist different types of solutions, all with their own benefits and drawbacks. Some solutions can handle every type of NAT, but this comes at the expense of complexity.

Common solutions for handling the VoIP NAT problem are application-layer-aware firewalls and NATs, MIDCOM, TURN, and STUN [8].

MIDCOM is an architecture for controlling and modifying firewalls and NATs from a trusted MIDCOM agent [8]. This of course requires support from the firewalls and NATs, which in a MIDCOM architecture are referred to as middleboxes. In short, a SIP client residing behind a NAT should also implement a MIDCOM agent. This agent is then allowed to modify the settings and port forwarding of the NAT (middlebox), provided that it is trusted by the middlebox [11]. The address information written in the SIP and SDP messages is thus provided by the MIDCOM agent, and should therefore be valid.

STUN and TURN approach the problem in a slightly different manner. Instead of trying to control the NAT, they use the properties of the NAT to avoid the problem. The general idea is that a STUN or TURN server resides on the public network. The SIP client exchanges information with this server in order to find out which public IP address and port it should write in the SIP and SDP messages. The exact configuration and information exchange between the SIP client and the STUN or TURN server varies depending on the type of NAT being used.

Which solution to use depends largely on the scale one is operating at, and on which NAT situation one is trying to solve. For more information about the NAT issue and proposed solutions, please see [12].
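To make the STUN idea concrete, the sketch below builds a STUN Binding request and extracts the public address and port from a Binding success response. It follows the message format of RFC 5389 (a 20-byte header with the magic cookie 0x2112A442 and the XOR-MAPPED-ADDRESS attribute), which postdates the original STUN specification available at the time of this work, and it handles IPv4 only. Sending the request over UDP and retransmission are left out; the example response is constructed locally with made-up addresses.

```python
import os
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id: bytes) -> bytes:
    """20-byte STUN header: type, length 0 (no attributes), magic cookie, transaction ID."""
    if len(transaction_id) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + transaction_id


def parse_mapped_address(response: bytes, transaction_id: bytes):
    """Return the (public_ip, public_port) reported in an IPv4 XOR-MAPPED-ADDRESS."""
    msg_type, msg_len, cookie = struct.unpack("!HHI", response[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE or response[8:20] != transaction_id:
        raise ValueError("not a matching STUN Binding success response")
    offset, end = 20, 20 + msg_len
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", response[offset:offset + 4])
        value = response[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and value[1] == 0x01:  # family 0x01 = IPv4
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            address = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return ".".join(str(b) for b in struct.pack("!I", address)), port
        offset += 4 + attr_len + (-attr_len % 4)  # attribute values are padded to 4 bytes
    raise ValueError("no IPv4 XOR-MAPPED-ADDRESS attribute found")


# Usage with a locally constructed response (made-up public address 203.0.113.5:40000).
tid = os.urandom(12)
request = build_binding_request(tid)
xor_port = 40000 ^ (MAGIC_COOKIE >> 16)
xor_addr = struct.unpack("!I", bytes([203, 0, 113, 5]))[0] ^ MAGIC_COOKIE
attribute = struct.pack("!HHBBHI", XOR_MAPPED_ADDRESS, 8, 0, 0x01, xor_port, xor_addr)
response = struct.pack("!HHI", BINDING_SUCCESS, len(attribute), MAGIC_COOKIE) + tid + attribute
print(len(request), parse_mapped_address(response, tid))  # 20 ('203.0.113.5', 40000)
```

The returned address and port are what the client would then write in its SIP headers and SDP instead of its private address.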

8.2 VoIP Security

As VoIP solutions become more widespread, the concern for security has also become more apparent. The main concern is eavesdropping, which means that someone might listen to your call [13]. In VoIP, this might be more than just listening to the actual voice data; it could also mean that someone picks up the metadata sent in order to set up the call. Since this data contains system-specific information about ports and capabilities, it can be used for denial-of-service attacks, unwanted advertising, and hijacking of services [13,14].

To solve these issues, there is a need to secure both the signaling protocol and the media session, i.e., both the call setup and the actual sending of voice data [14]. In a solution like the one in this report, which is based on the Session Initiation Protocol (SIP), one could use either end-to-end security or hop-by-hop security [14]. End-to-end security is achieved using SIP features specifically developed for establishing a secure connection between the caller and the callee. SIP does not, however, provide any mechanisms for hop-by-hop security, i.e., security between the different SIP entities that take part in the call signaling [14]. Security between hops is instead handled by IPsec (IP Security) or TLS (Transport Layer Security) [14]. The need for hop-by-hop security arises from the fact that intermediate SIP entities, like SIP proxies, might need to read or write information in the SIP message [14]. Since security along the signaling path is handled separately between each pair of entities, the entire messages can be encrypted and secured. This means that information like the Via, From, and To headers will not be visible to outside parties, which would not be the case with end-to-end security alone, since intermediate SIP entities must still be able to read such headers. Outside parties will thus not be able to figure out who is calling whom and through which servers [14]. It should, however, be noted that IPsec and TLS have limitations: IPsec can only be used between SIP entities that have a static relationship, i.e., it enables a secure connection between known entities, while TLS does not work with UDP [14]. To secure an entire call, one might need to combine different techniques: end-to-end or hop-by-hop security for the call signaling and for exchanging information over the established secure connection, and then, for example, SRTP (Secure Real-time Transport Protocol) for the actual media [15]. One should also be aware that security has an impact on the performance of the call session [14].

8.3 Public Safety

An issue that has gained more attention as VoIP solutions have become more widely used is public safety. This issue arises when VoIP is used in emergency situations. The basis for being able to use VoIP at all in an emergency is that the VoIP service provider offers some form of emergency handling. This handling could be more or less advanced, i.e., the service provider could offer a special emergency solution or just put you through to the emergency service using PSTN bridging. The problem with forwarding emergency calls using PSTN bridging is that it may cause confusion, as the number received by the emergency services is the phone number of the PSTN gateway. This is troublesome, as emergency services use the caller's phone number to find the geographical position of the caller. This works when a call really originates from a PSTN phone, but when the call originates from a VoIP phone, the emergency response may be directed to the address of the PSTN gateway instead of the caller's actual address [16].

Another cause for concern when using VoIP for critical calls is that one cannot expect the same quality of service from VoIP as from PSTN, since the Internet, over which the call is placed, is a best-effort network. Even though the quality of service of VoIP is improving all the time, this must be considered. As VoIP uses normal computer networks for placing calls, there is also an increased risk of not being able to place an emergency call during a power outage [17]. To understand the severity of this, consider a power failure in a normal family home: all computer communication stops working, as the home network relies on power to function properly, i.e., all equipment, like cable modems, routers, and VoIP boxes, needs an external power supply in order to function.

As VoIP becomes more widely used, the public safety issue has also received more focus, and different solutions have been proposed to handle it. The proposed solutions vary in complexity, from manually entering your location when signing on to the VoIP network [17] to solutions like direct trunking, where emergency calls are automatically routed to public safety answering points [18].

It is, however, our opinion that the solution presented in this report, where the VoIP client is implemented in a cellular phone, offers a good answer to these issues, as it makes it possible to route all emergency calls through the cellular network instead of relying on the capabilities offered by the VoIP network. Although some argue that one should not rely on other services to provide emergency handling, as this will slow down the development of VoIP, we still believe that a solution like the one presented in this report will serve a purpose until VoIP emergency handling has matured and a standard for public safety using VoIP has been developed.
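The routing idea can be sketched as a simple bearer selection in the phone. This is a hypothetical illustration, not part of the prototype: the function, the enum, and the hard-coded number list are assumptions (real phones typically obtain emergency numbers from the SIM card and the network, and the list differs between countries). Emergency numbers go to the cellular network whenever it is available; everything else is placed as a VoIP call.

```python
from enum import Enum


class Bearer(Enum):
    CELLULAR = "cellular"  # ordinary call over the mobile network
    VOIP = "voip"          # SIP/RTP call over the IP connection


# Hypothetical, hard-coded list for illustration only.
EMERGENCY_NUMBERS = frozenset({"112", "911", "999"})


def choose_bearer(dialled: str, cellular_available: bool) -> Bearer:
    """Route emergency calls via the cellular network when possible, other calls via VoIP."""
    number = dialled.strip()
    if number in EMERGENCY_NUMBERS and cellular_available:
        return Bearer.CELLULAR
    # Non-emergency calls, or emergency calls without cellular coverage, fall back to VoIP.
    return Bearer.VOIP


print(choose_bearer("112", cellular_available=True))           # Bearer.CELLULAR
print(choose_bearer("+15555550100", cellular_available=True))  # Bearer.VOIP
```

Falling back to VoIP when there is no cellular coverage is a design choice of this sketch; it keeps the VoIP emergency handling as a last resort rather than removing it.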

8.4 Related Solutions

This section presents products and solutions with capabilities related to the solution presented in this report. The goal is to provide a simple comparison between the capabilities offered by our solution and those of other solutions.

When work first started on this master thesis project, the products on the VoIP market were, in our opinion, quite immature, and this was basically what made us interested in the subject for this project.

Most solutions revolved around softphones for different platforms or USB-based solutions for handling the voice. The USB solutions were often based on a handset that connects to a computer, which in turn runs a softphone application. The handsets in such solutions are more or less only used as a microphone and loudspeaker [19,20].

Other solutions that were common, and still are, use ATA (analog telephone adapter) boxes. These boxes translate from the VoIP protocol (like SIP) to standard PSTN signaling, which enables the use of standard PSTN telephones, wired or wireless [21]. Although these solutions are still used, the general trend is towards Wi-Fi enabled Pocket PCs [22], and even more so towards solutions that combine VoIP capabilities with the capabilities of cellular phones [23], much like the solution in this report. Most of these products have, however, not yet reached the market.

Chapter 9

Conclusions

This report has presented the investigative work on the possibilities for implementing VoIP in Sony Ericsson cellular phones, using the SEMC architecture. The possibilities to support such a solution over Bluetooth have also been investigated.

The investigations in this report have shown that there is partial support for VoIP in the SEMC architecture. In order to have full VoIP support, the issue of the base architecture only handling half-duplex audio must be addressed. It has also been concluded that the best option for implementing a VoIP solution in a Sony Ericsson cellular phone is to use the Session Initiation Protocol (SIP) for call signalling and the Real-time Transport Protocol (RTP) for media streaming. The SIP and RTP protocols are supported through the SEMC IMS architecture. It has also been concluded that a SIP- and RTP-based solution could support other communication technologies, like PSTN, through the use of gateways.

The VoIP support in the SEMC architecture was empirically tested by implementing a prototype. Measurements performed on this prototype show that Bluetooth will fulfil the requirements of most VoIP solutions with respect to latency and bandwidth.

Acknowledgements

First of all, we would like to thank everyone at UMTS and GSM Services at Sony Ericsson Mobile Communications AB in Lund, Sweden. Everyone has been very understanding, helpful, and willing to spend time answering questions regarding the SEMC mobile phone architecture and the development environment.

We would like to give special thanks to Anna Göransson, Gary Cole, Håkan Grahn, Mikael Kanstrup, Pär Olsson, Suri Maddhula, and Tobias Åkesson, as they have been particularly helpful.

Bibliography

[1] Gonzalo Camarillo. SIP De
