CHAPTER 2
THEORETICAL FOUNDATION
2.1 Theoretical Foundation
This section explains the terms generally used throughout this thesis report.
It also describes all the required theories that support the statements made and
provide the solution to overcome the problem.
In this project, there are three essential steps that will be the milestones of the
project: creating an eye-tracking algorithm to detect the eyeballs, synchronizing the
eyeballs’ movement with the pointer inside the Android platform application, and
creating a gaze-tracking algorithm to determine which way on the screen the user’s
eyes are looking.
2.1.1 Android
Android is a software stack comprising not only an operating system but also
middleware and key applications. Android Inc. was founded in Palo Alto,
California, U.S. by Andy Rubin, Rich Miner, Nick Sears, and Chris White in 2003.
Android Inc. was later acquired by Google in 2005. Since the original release, there
have been a number of updates to the original version of Android [10]. The
following is an overview of the differences between several Android releases [10]:
Figure 2.1.1: Android Releases Differences Overview
Google Android presents a software stack for mobile devices constituting an
operating system, middleware and key applications. The Android SDK provides the
necessary tools and APIs required to develop custom applications using the Java
programming language. Android OS and Android SDK comes with a set of core
applications, all written using the Java programming language, including an email
client, SMS program, calendar, maps, browser, contacts, and others [11].
Google's Android is more widely available and talked about than any other
mobile OS. Unlike a few years ago, smartphones today attract a mass of users who
are not professionals but are looking for more entertainment out of their cell phones.
Google Android phones have become fairly popular because of the many
advantages they offer, which are:
• Open Platform: Google Android, with the Android OS and the Google Android
SDK, is an open platform, which means that the Google code is available for people
to look at and edit, making their projects fairly innovative and giving anyone
quality features to program into the system.
• In addition to this, Google's open-source platform also indicates that the
device can be used on multiple networks. An Android phone makes itself
available on most popular networks these days. This helps when we are
switching from one network to another because we won't have to make a complete
change in whatever we are using.
• Android OS permits third parties to develop applications for the Android
phone that can be installed and used by anyone. This is in contrast to many
other platforms that demand permission before software can be installed.
With a Google Android phone, we are free to choose which software we
wish to install.
• An Android phone comes with a guarantee that it works well with Google
products. Google products have a huge customer base for the variety of
features and flexibility they offer. Whether it's YouTube, Gmail, Google
Docs, or any other Google product, an Android phone gives us access to a
wide variety of applications that we can comfortably use on both our phone
and our computer.
• Eventually this platform will work on netbooks and computers. This means
that we could have devices that share the same platform, giving us the ability
to purchase applications that will work on all our devices [11].
In the most-used Android platform version, 2.3.3 or Gingerbread, the
Android Compatibility Definition Document clearly states that an Android device is
highly recommended to have a front-facing camera with at least VGA resolution
(that is, 640 × 480 pixels) [12]. This requirement is one of the main reasons that
Android was chosen as the base of this project's development.
2.1.1.1 Software
Android applications are written in Java – a relatively easy to learn, friendly
language for new developers. Android apps are developed on a computer – PC or
Mac – and then compiled and sent to the device for testing. If we don’t have an
Android device yet, there are emulators that simulate an Android device on our
computer, meaning that we can still develop an Android game or application without
owning one [13].
The major components of Android are:
• Linux kernel. Android relies on Linux 2.6. The kernel acts as an abstraction
layer between the hardware and the rest of the software stack and provides
core system services: security, memory management, process management,
network stack, and driver model.
• Android runtime. Android includes a set of core libraries that provides most
of the functionality available in the core libraries of the Java programming
language. Every Android application runs in its own process, with its own
instance of the Dalvik virtual machine. Dalvik has been written so that a
device can run multiple VMs efficiently. The Dalvik VM executes files in the
Dalvik Executable (.dex) format which is optimized for minimal memory
footprint.
• Libraries. Android includes a set of C/C++ libraries used by various
components of the Android system. These capabilities are exposed to
developers through the Android application framework.
• Application framework. Developers have full access to the same framework
APIs used by the core applications. The application architecture is designed
to simplify the reuse of components.
• Applications. Android ships with a set of core applications including an email
client, SMS program, calendar, maps, browser, contacts, and others [14].
Figure 2.1.2: Android Major Components
To develop an Android application, it is required to use a customized IDE
(Integrated Development Environment) supporting Android platform development.
The most-used Android-supporting IDE is Eclipse because it is the easiest and most
hassle-free development tool for Android at the time of this writing [13]. Other
alternatives include NetBeans, IntelliJ IDEA, DeuterIDE, and many more.
2.1.1.1.1 SDK (Software Development Kit)
The Android SDK is composed of modular packages that we can download
separately using the Android SDK Manager. There are several different packages
available for the Android SDK. The table below describes most of the available
packages and where they're located once we download them [15].
Package: SDK Tools
Description: Contains tools for debugging and testing, plus other utilities that are
required to develop an app. If we have just installed the SDK starter package, then
we already have the latest version of this package. We should make sure to keep it
up to date.
File Location: <sdk>/tools/

Package: SDK Platform-tools
Description: Contains platform-dependent tools for developing and debugging our
application. These tools support the latest features of the Android platform and are
typically updated only when a new platform becomes available. They are always
backward compatible with older platforms, but we must be sure that we have the
latest version of these tools when we install a new SDK platform.
File Location: <sdk>/platform-tools/

Package: Documentation
Description: An offline copy of the latest documentation for the Android platform
APIs.
File Location: <sdk>/docs/

Package: SDK Platform
Description: There is one SDK Platform available for each version of Android. It
includes an android.jar file with a fully compliant Android library. In order to build
an Android app, we must specify an SDK platform as our build target.
File Location: <sdk>/platforms/<android-version>/

Package: System Images
Description: Each platform version offers one or more different system images
(such as for ARM and x86). The Android emulator requires a system image to
operate. We should always test our app on the latest version of Android, and using
the emulator with the latest system image is a good way to do so.
File Location: <sdk>/platforms/<android-version>/

Package: Sources for Android SDK
Description: A copy of the Android platform source code that is useful for stepping
through the code while debugging our app.
File Location: <sdk>/sources/

Package: Samples for SDK
Description: A collection of sample apps that demonstrate a variety of the platform
APIs. These are a great resource for browsing Android app code. The API Demos
app in particular provides a huge number of small demos we should explore.
File Location: <sdk>/platforms/<android-version>/samples/

Package: Google APIs
Description: An SDK add-on that provides both a platform we can use to develop
an app using special Google APIs and a system image for the emulator so we can
test our app using the Google APIs.
File Location: <sdk>/add-ons/

Package: Android Support
Description: A static library we can include in our app sources in order to use
powerful APIs that aren't available in the standard platform. For example, the
support library contains versions of the Fragment class that are compatible with
Android 1.6 and higher (the class was originally introduced in Android 3.0) and the
ViewPager APIs that allow us to easily build a side-swipeable UI.
File Location: <sdk>/extras/android/support/

Package: Google Play Billing
Description: Provides the static libraries and samples that allow us to integrate
billing services in our app with Google Play.
File Location: <sdk>/extras/google/

Package: Google Play Licensing
Description: Provides the static libraries and samples that allow us to perform
license verification for our app when distributing with Google Play.
File Location: <sdk>/extras/google/

Table 2.1.1: Android SDK Available Packages
2.1.1.1.2 Java
Android uses the Java programming language to develop applications for the
Android platform. Therefore, it is necessary to install the Java SE Development Kit
(JDK) on the computer that will be used to develop Android applications. Java
Platform, Standard Edition (Java SE) lets us develop and deploy Java applications
on desktops and servers, as well as in today's demanding embedded environments.
Java offers the rich user interface, performance, versatility, portability, and security
that today’s applications require [16].
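As a minimal illustration of the Java source structure used throughout Android development, consider the following sketch; the class and method names here are invented for the example:

```java
// Minimal Java program illustrating the basic class/method structure of a
// Java source file; all names are invented for this example.
public class HelloAndroid {

    // Small helper so the greeting logic can be reused directly.
    static String buildGreeting(String platform) {
        return "Hello, " + platform + "!";
    }

    public static void main(String[] args) {
        System.out.println(buildGreeting("Android"));
    }
}
```

Every Android application is compiled from Java sources of this general shape, although on a device the code runs inside the Dalvik VM rather than a standard Java VM.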
2.1.1.1.3 NDK (Native Development Kit)
The NDK is a toolset that allows us to implement parts of our app using
native-code languages such as C and C++. For certain types of apps, this can be
helpful, as it lets us reuse existing code libraries written in these languages and
possibly increase performance [17].
Before downloading the NDK, we should understand that the NDK will not
benefit most apps. As a developer, we need to balance its benefits against its
drawbacks. Notably, using native code on Android generally does not result in a
noticeable performance improvement, but it always increases our app complexity. In
general, we should only use the NDK if it is essential to our app—never because we
simply prefer to program in C/C++ [17].
Typical good candidates for the NDK are self-contained, CPU-intensive
operations that don't allocate much memory, such as signal processing, physics
simulation, and so on. When examining whether or not we should develop in native
code, think about our requirements and see if the Android framework APIs provide
the functionality that we need [17].
To conclude, the NDK provides [18]:
• A set of tools and build files used to generate native code libraries from C
and C++ sources
• A way to embed the corresponding native libraries into application package
files (.apks) that can be deployed on Android devices
• A set of native system headers and libraries that will be supported in all
future releases of the Android platform, starting from Android 1.5
• Documentation, samples, and tutorials [18]
2.1.1.1.4 C/C++
C is a programming language originally developed for developing the Unix
operating system. It is a low-level and powerful language, but it lacks many modern
and useful constructs. C++ is a newer language, based on C, that adds many more
modern programming language features that make it easier to program in than C [19].
Basically, C++ maintains all aspects of the C language, while providing new
features to programmers that make it easier to write useful and sophisticated
programs [19].
For example, C++ makes it easier to manage memory and adds several
features to allow "object-oriented" programming and "generic" programming.
Basically, it makes it easier for programmers to stop thinking about the nitty-gritty
details of how the machine works and think about the problems they are trying to
solve [19].
C++ is a powerful general-purpose programming language. It can be used to
create small programs or large applications. It can be used to make CGI scripts or
console-only DOS programs. C++ allows us to create programs to do almost
anything we need to do. The creator of C++, Bjarne Stroustrup, has put together a
partial list of applications written in C++ [19].
2.1.1.1.5 Application Components [20]
Application components are the essential building blocks of an Android
application. Each component is a different point through which the system can enter
our application. Not all components are actual entry points for the user and some
depend on each other, but each one exists as its own entity and plays a specific
role—each one is a unique building block that helps define our application's overall
behavior [21].
A unique aspect of the Android system design is that any application can start
another application’s component. For example, if we want the user to capture a photo
with the device camera, there's probably another application that does that and our
application can use it, instead of developing an activity to capture a photo ourselves.
We don't need to incorporate or even link to the code from the camera application.
Instead, we can simply start the activity in the camera application that captures a
photo. When complete, the photo is even returned to our application so we can use it.
To the user, it seems as if the camera is actually a part of our application [21].
When the system starts a component, it starts the process for that application
(if it's not already running) and instantiates the classes needed for the component.
For example, if our application starts the activity in the camera application that
captures a photo, that activity runs in the process that belongs to the camera
application, not in our application's process. Therefore, unlike applications on most
other systems, Android applications don't have a single entry point (there's no main()
function, for example) [21].
Because the system runs each application in a separate process with file
permissions that restrict access to other applications, our application cannot directly
activate a component from another application. The Android system, however, can.
So, to activate a component in another application, we must deliver a message to the
system that specifies our intent to start a particular component. The system then
activates the component for us [21].
There are four different types of application components. Each type serves a
distinct purpose and has a distinct lifecycle that defines how the component is
created and destroyed [21].
Here are the four types of application components [21]:
2.1.1.1.5.1 Activities [21]
An activity represents a single screen with a user interface. For example, an
email application might have one activity that shows a list of new emails, another
activity to compose an email, and another activity for reading emails. Although the
activities work together to form a cohesive user experience in the email application,
each one is independent of the others. As such, a different application can start any
one of these activities (if the email application allows it). For example, a camera
application can start the activity in the email application that composes new mail, in
order for the user to share a picture.
An activity is implemented as a subclass of Activity.
2.1.1.1.5.2 Services [21]
A service is a component that runs in the background to perform long-
running operations or to perform work for remote processes. A service does not
provide a user interface. For example, a service might play music in the background
while the user is in a different application, or it might fetch data over the network
without blocking user interaction with an activity. Another component, such as an
activity, can start the service and let it run or bind to it in order to interact with it.
A service is implemented as a subclass of Service.
2.1.1.1.5.3 Content Providers [21]
A content provider manages a shared set of application data. We can store the
data in the file system, an SQLite database, on the web, or any other persistent
storage location our application can access. Through the content provider, other
applications can query or even modify the data (if the content provider allows it). For
example, the Android system provides a content provider that manages the user's
contact information. As such, any application with the proper permissions can query
part of the content provider (such as ContactsContract.Data) to read and write
information about a particular person.
Content providers are also useful for reading and writing data that is private
to our application and not shared. For example, the Note Pad sample application uses
a content provider to save notes.
A content provider is implemented as a subclass of ContentProvider and
must implement a standard set of APIs that enable other applications to perform
transactions.
2.1.1.1.5.4 Broadcast Receivers [21]
A broadcast receiver is a component that responds to system-wide broadcast
announcements. Many broadcasts originate from the system—for example, a
broadcast announcing that the screen has turned off, the battery is low, or a picture
was captured. Applications can also initiate broadcasts—for example, to let other
applications know that some data has been downloaded to the device and is available
for them to use. Although broadcast receivers don't display a user interface, they may
create a status bar notification to alert the user when a broadcast event occurs. More
commonly, though, a broadcast receiver is just a "gateway" to other components and
is intended to do a very minimal amount of work. For instance, it might initiate a
service to perform some work based on the event.
A broadcast receiver is implemented as a subclass of BroadcastReceiver and
each broadcast is delivered as an Intent object.
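All four component types are declared to the system in the application's AndroidManifest.xml file. A minimal illustrative fragment is shown below; every class name in it is hypothetical, invented only to show where each component type is declared:

```xml
<!-- Illustrative manifest fragment; all component class names are hypothetical. -->
<application android:label="EyeTrackerDemo">
    <!-- An activity marked as the application's launcher entry point. -->
    <activity android:name=".MainActivity">
        <intent-filter>
            <action android:name="android.intent.action.MAIN" />
            <category android:name="android.intent.category.LAUNCHER" />
        </intent-filter>
    </activity>
    <!-- A background service. -->
    <service android:name=".TrackingService" />
    <!-- A content provider identified by its authority string. -->
    <provider android:name=".NotesProvider"
              android:authorities="com.example.notes" />
    <!-- A broadcast receiver listening for a system broadcast. -->
    <receiver android:name=".BootReceiver">
        <intent-filter>
            <action android:name="android.intent.action.BOOT_COMPLETED" />
        </intent-filter>
    </receiver>
</application>
```

The system reads these declarations to know which components exist and which intents each one can respond to.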
2.1.1.2 Hardware
When building a mobile application, it's important that we always test our
application on a real device before releasing it to users. This section describes how
to set up our development environment and Android-powered device for testing
and debugging on the device [22].
We can use any Android-powered device as an environment for running,
debugging, and testing our applications. The tools included in the SDK make it easy
to install and run our application on the device each time we compile. We can install
our application on the device directly from Eclipse or from the command line with
ADB. If we don't yet have a device, check with the service providers in our area to
determine which Android-powered devices are available [22].
2.1.1.2.1 Minimum Specifications
While Android is designed to support a wide variety of hardware platforms
and configurations, this section provides recommended minimum device
requirements [23].
Feature: Chipset
Minimum Requirement: ARM-based
Notes: For the first release, Android is primarily targeted towards mobile handsets,
and portions of the platform, such as Dalvik VM graphics processing, currently
assume an ARM architecture.

Feature: Memory
Minimum Requirement: 128 MB RAM; 256 MB external flash
Notes: Android can boot and run in configurations with less memory, but it isn't
recommended.

Feature: Storage
Minimum Requirement: Mini or Micro SD
Notes: Not necessary for basic bring-up, but recommended.

Feature: Primary Display
Minimum Requirement: HVGA required
Notes: The current Android interface targets a touch-based HVGA-resolution
display no smaller than 2.8 inches in size. However, smaller displays will suffice
for initial porting.

Feature: Navigation Keys
Minimum Requirement: 5-way navigation with 5 application keys, power, camera,
and volume controls

Feature: Camera
Minimum Requirement: 2 MP CMOS
Notes: Not required for basic bring-up.

Feature: USB
Minimum Requirement: Standard mini-B USB interface
Notes: Android uses the USB interface for flashing the device system images and
debugging a running device.

Feature: Bluetooth
Minimum Requirement: 1.2 or 2.0
Notes: Not required for initial bring-up.

Table 2.1.2: Android Device Minimum Requirements
If available, our Android device can also benefit from the following optional
device characteristics [23]:
• QWERTY keyboard
• WiFi
• GPS
2.1.2 User Interface
User Interface, abbreviated UI, is the junction between a user and a computer
program. An interface is a set of commands or menus through which a user
communicates with a program. A command-driven interface is one in which we enter
commands. A menu-driven interface is one in which we select command choices
from various menus displayed on the screen [24].
The user interface is one of the most important parts of any program because
it determines how easily we can make the program do what we want. A powerful
program with a poorly designed user interface has little value [24].
2.1.2.1 Graphical User Interface
Graphical User Interface, abbreviated GUI, is a program interface that takes
advantage of the computer's graphics capabilities to make the program easier to use.
Well-designed graphical user interfaces can free the user from learning complex
command languages. Graphical user interfaces, such as Microsoft Windows and the
one used by the Apple Macintosh, feature the following basic components [25]:
• Pointer: A symbol that appears on the display screen and that we move to
select objects and commands. Usually, the pointer appears as a small angled
arrow. Text-processing applications, however, use an I-beam pointer that is
shaped like a capital I.
• Pointing device: A device, such as a mouse or trackball, that enables us to
select objects on the display screen.
• Icons: Small pictures that represent commands, files, or windows. By moving
the pointer to the icon and pressing a mouse button, we can execute a
command or convert the icon into a window. We can also move the icons
around the display screen as if they were real objects on our desk.
• Desktop: The area on the display screen where icons are grouped is often
referred to as the desktop because the icons are intended to represent real
objects on a real desktop.
• Windows: We can divide the screen into different areas. In each window, we
can run a different program or display a different file. We can move windows
around the display screen, and change their shape and size at will.
• Menus: Most graphical user interfaces let us execute commands by selecting
a choice from a menu.
2.1.2.2 User Interface Control
A user interface control is an element of a graphical user interface, such as a
button, menu, list box, text window, or dialog box [26].
2.1.3 Human-Computer Interaction
Human-computer interaction (HCI) is an area of research and practice that
emerged in the early 1980s, initially as a specialty area in computer science
embracing cognitive science and human factors engineering. HCI has expanded
rapidly and steadily for three decades, attracting professionals from many other
disciplines and incorporating diverse concepts and approaches. To a considerable
extent, HCI now aggregates a collection of semi-autonomous fields of research and
practice in human-centered informatics. However, the continuing synthesis of
disparate conceptions and approaches to science and practice in HCI has produced a
dramatic example of how different epistemologies and paradigms can be reconciled
and integrated in a vibrant and productive intellectual project [27].
In interaction with a computer, the human's input is the data output by the
computer, and vice versa. Input in humans occurs mainly through the senses and
output through the motor controls of the effectors. Vision, hearing, and touch are
the most important senses in HCI. The fingers, voice, eyes, head, and body position
are the primary effectors [28].
2.1.3.1 Eye Tracking
Eye tracking is a research method that determines what part of an
advertisement consumers look at, by tracking the pattern of their eye movements
[29].
In the simplest terms, eye tracking is the measurement of eye activity. The
concept is basic, but the process and interpretation can be quite complex [30].
Eye tracking data is collected using either a remote or head-mounted ‘eye
tracker’ connected to a computer. While there are many different types of non-
intrusive eye trackers, they generally include two common components: a light
source and a camera. The light source (usually infrared) is directed toward the eye.
The camera tracks the reflection of the light source along with visible ocular
features such as the pupil. This data is used to extrapolate the rotation of the eye
and ultimately the direction of gaze. The eye tracker also detects additional
information such as blink frequency and changes in pupil diameter. The aggregated
data is written to a file that is compatible with eye-tracking analysis software.

There are many different methods of exploring eye data. The most common is to
analyze the visual path of one or more participants across an interface such as a
computer screen. Each eye data observation is translated into a set of pixel
coordinates. From there, the presence or absence of eye data points in different
screen areas can be examined. This type of analysis is used to determine which
features are seen, when a particular feature captures attention, how quickly the eye
moves, what content is overlooked, and virtually any other gaze-related question.
Beyond the analysis of visual attention, eye data can be examined to measure the
cognitive state and workload of a participant [30].
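The translation of eye observations into screen pixel coordinates described above can be sketched as a simple per-axis linear calibration, with gains and offsets estimated beforehand by having the user fixate known calibration points. This is an illustrative simplification (real trackers typically fit richer, often polynomial, models), and every name below is invented for the sketch:

```java
// Sketch of mapping a normalized pupil position (0..1 in the camera image)
// to a pixel position on the screen using a per-axis linear calibration.
// All names and coefficients are illustrative assumptions.
public class GazeMapper {
    private final double gainX, offsetX, gainY, offsetY;

    public GazeMapper(double gainX, double offsetX, double gainY, double offsetY) {
        this.gainX = gainX; this.offsetX = offsetX;
        this.gainY = gainY; this.offsetY = offsetY;
    }

    // Returns {x, y} pixel coordinates on a screen of the given resolution.
    public int[] toScreen(double pupilX, double pupilY, int screenW, int screenH) {
        int x = (int) Math.round((gainX * pupilX + offsetX) * screenW);
        int y = (int) Math.round((gainY * pupilY + offsetY) * screenH);
        // Clamp to the screen so small calibration errors never leave the display.
        x = Math.max(0, Math.min(screenW - 1, x));
        y = Math.max(0, Math.min(screenH - 1, y));
        return new int[] { x, y };
    }
}
```

With an identity calibration (gain 1, offset 0), a pupil at the center of the camera image maps to the center of the screen; the calibration step exists precisely to correct the real, non-identity relationship between eye rotation and gaze point.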
2.1.3.2 Eye Gaze
Eye Gaze allows us to control the computer by looking at it, while wearing
special glasses, head-mounted boxes etc. By tracking a laser beam’s reflection in the
eye, the direction in which the eye is looking is determined. The system needs to be
tuned and is very expensive, but also very accurate [28].
2.1.3.3 Blink Control
Blink Control is an assistive technology to assist people to overcome their
disability by providing other means of control and communication [31].
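As a sketch of how blink control can be turned into a discrete input event, the hypothetical detector below registers a blink when an "eye openness" score (for example, derived from eyelid distance or an eye detector's confidence) stays below a threshold for several consecutive camera frames. The threshold and frame count are assumptions for the sketch, not measured values:

```java
// Illustrative blink detector: fires once when the openness score has been
// below the threshold for minFrames consecutive frames. Names and values
// are assumptions for this sketch.
public class BlinkDetector {
    private final double threshold;
    private final int minFrames;
    private int closedFrames = 0;

    public BlinkDetector(double threshold, int minFrames) {
        this.threshold = threshold;
        this.minFrames = minFrames;
    }

    // Feed one openness score per camera frame; returns true exactly on the
    // frame where a blink is confirmed.
    public boolean update(double openness) {
        if (openness < threshold) {
            closedFrames++;
            return closedFrames == minFrames;  // fire once per blink
        }
        closedFrames = 0;  // eye reopened: reset for the next blink
        return false;
    }
}
```

Requiring several consecutive closed frames distinguishes a deliberate blink from single-frame detection noise, which matters when a blink is used as a "click" in an assistive interface.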
2.1.4 Input Devices
An input device is any hardware device that sends data to the computer.
Without input devices, a computer would only be a display device and would not
allow users to interact with it, much like a TV [32].
2.1.4.1 Front-Facing Camera
Front-facing camera is a camera on the front of the phone, facing the user.
This enables two-way video chat, and is also useful for capturing self-portraits. The
resolution and quality of the front-facing camera is often inferior to the rear, main
camera [33].
2.1.5 Computer Vision
Computer Vision, abbreviated CV, is the process of taking a live raster image
(represented as a matrix of numeric values) and interpreting it into higher-level data
abstractions and symbolic objects (such as humans, limbs, faces, props, poses,
gestures, etc.). In this way, Computer Vision is the inverse of Computer Graphics (in
a similar fashion, the camera is the dual of the projector) [34].
Further, computer vision is the area of research that deals with computers
abstracting high-level data constructs from video-based data streams. By applying
intelligence to the signal, the purpose is to get the machine to recognize objects,
humans, poses, and gestures with a decent degree of accuracy. The broader field is
called Machine Vision. Computer vision is a subset of Computer Science. Aligned
fields include Robotics, Signal Processing, and Artificial Intelligence [34].
2.1.5.1 OpenCV
OpenCV (Open Source Computer Vision) is a library of programming
functions mainly aimed at real-time computer vision application development.
Originally developed by Intel, it takes particular advantage of the Intel Integrated
Performance Primitives (IPP) libraries [34].
OpenCV has a modular structure, which means that the package includes
several shared or static libraries. The following modules are available [35]:
• core - a compact module defining basic data structures, including the dense
multi-dimensional array Mat and basic functions used by all other modules.
• imgproc - an image-processing module that includes linear and non-linear
image filtering, geometrical image transformations (resize, affine and
perspective warping, generic table-based remapping), color space conversion,
histograms, and so on.
• video - a video analysis module that includes motion estimation, background
subtraction, and object tracking algorithms.
• calib3d - basic multiple-view geometry algorithms, single and stereo camera
calibration, object pose estimation, stereo correspondence algorithms, and
elements of 3D reconstruction.
• features2d - salient feature detectors, descriptors, and descriptor matchers.
• objdetect - detection of objects and instances of the predefined classes (for
example, faces, eyes, mugs, people, cars, and so on).
• highgui - an easy-to-use interface to video capturing, image and video
codecs, as well as simple UI capabilities.
• gpu - GPU-accelerated algorithms from different OpenCV modules.
• ... Some other helper modules, such as FLANN and Google test wrappers,
Python bindings, and others.
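The objdetect module's cascade classifiers (such as the Haar cascades commonly used for face and eye detection) rely internally on an integral image, which lets the sum of any rectangular pixel region be computed in constant time. A minimal plain-Java sketch of this underlying technique, written independently of OpenCV itself:

```java
// Integral image: ii[y][x] holds the sum of all pixels above and to the left
// of (x, y). Cascade detectors use it to evaluate Haar-like rectangle
// features with only four array lookups each.
public class IntegralImage {
    private final long[][] ii;  // extra zero row/column simplifies lookups

    public IntegralImage(int[][] img) {
        int h = img.length, w = img[0].length;
        ii = new long[h + 1][w + 1];
        for (int y = 1; y <= h; y++)
            for (int x = 1; x <= w; x++)
                ii[y][x] = img[y - 1][x - 1] + ii[y - 1][x]
                         + ii[y][x - 1] - ii[y - 1][x - 1];
    }

    // Sum of the w-by-h rectangle whose top-left corner is (x, y).
    public long rectSum(int x, int y, int w, int h) {
        return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x];
    }
}
```

Because each rectangle sum costs the same regardless of its size, a cascade can evaluate thousands of candidate eye regions per frame, which is what makes real-time detection on a phone feasible.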
2.1.5.2 OpenCV4Android
OpenCV Manager is an Android service targeted at managing OpenCV library
binaries on end users' devices. It allows sharing the OpenCV dynamic libraries of
different versions between applications on the same device.
The Manager provides the following benefits [36]:
• Less memory usage. All apps use the same binaries from service and do not
keep native libs inside themselves;
• Hardware specific optimizations for all supported platforms;
• Trusted OpenCV library source. All packages with OpenCV are published on
Google Play service;
• Regular updates and bug fixes;
2.1.6 Display Calibration
Display calibration is the process of using a display’s setting controls to
adjust the on-screen image so that it matches the original source content. This allows
the calibrated display to accurately reproduce the color and clarity of the video
signals from any source device, be it a computer, digital signage player,
cable/satellite box, or Blu-Ray player. Whether it is to optimize a TV display for the
best image quality possible or to ensure that we are working on a properly profiled
monitor, display calibration removes any doubt that the digital content we are
viewing and/or creating is accurately represented on-screen [37].
2.2 Theoretical Frameworks
The research methodology that the author will use in developing the
application is to combine algorithms from available sources to create a stable, fast,
and effective algorithm to detect the user's eyes and gaze. An extensive literature
study of eye tracking and eye gaze will also be done to support the author in finding
the right and exact solution. Available sources include Internet journals, websites,
governmental papers, scientific papers, and books. The author will also consult the
thesis supervisor, who is already an expert in the computer vision field.
Below are the development steps in developing the Android Eye-Tracking Based
User Interface Control Application:
• Analyze similar applications and possible solutions which are being
developed
• Create an overview of the existing systems: eye-tracking applications, Android
native applications with a pointer, eye-tracking algorithms, and blink control
algorithms, from trusted and detailed sources
• Combine and improve existing algorithms into our own customized, efficient,
and effective algorithms
• Compare the differences, both advantages and disadvantages, between the
existing system and the new solution
• Design the application prototype
• Implement the application prototype
• Test the application prototype
• Evaluate the prototype