CHAPTER 2
THEORETICAL FOUNDATION
2.1 Theoretical Foundation
This section explains the terms generally used throughout this thesis report.
It also describes all the required theories that support the statements made and
provide the solution to overcome the problem.
In this project, there are three essential steps that will be the milestones of the
project: creating an eye-tracking algorithm to detect the eyeballs, synchronizing the
eyeballs’ movement with the pointer inside the Android platform application, and
creating a gaze-tracking algorithm to determine which way on the screen the user’s
eyes are looking.
2.1.1 Android
Android is a software stack comprising not only an operating system but also
middleware and key applications. Android Inc. was founded in Palo Alto,
California, U.S. by Andy Rubin, Rich Miner, Nick Sears, and Chris White in 2003.
Android Inc. was later acquired by Google in 2005. Since the original release, there
have been a number of updates to the original version of Android [10]. The
following is an overview of the differences between several Android releases [10]:
Figure 2.1.1: Android Releases Differences Overview
Google Android presents a software stack for mobile devices constituting an
operating system, middleware and key applications. The Android SDK provides the
necessary tools and APIs required to develop custom applications using the Java
programming language. Android OS and Android SDK comes with a set of core
applications, all written using the Java programming language, including an email
client, SMS program, calendar, maps, browser, contacts, and others [11].
Google's Android is more widely available and talked about than any other
mobile OS. Unlike a few years ago, smartphones today attract a mass of users who
are not professionals but are looking for more entertainment out of their cell phones.
Google Android phones have become fairly popular because of the many
advantages they offer, which are:
• Open Platform: Google Android, with the Android OS and the Google Android
SDK, is an open platform, which means that the Google code is available for people
to look at and edit, making their projects fairly innovative and giving anyone
quality features to program into the system.
• In addition to this, Google's open-source platform also indicates that the
device can be used on multiple networks. An Android phone makes itself
available on most popular networks these days. This helps when we are
switching from one network to another because we won't have to make a complete
change in whatever we are using.
• Android OS permits third parties to develop applications for the Android
phone that can be installed and used by anyone. This is in contrast to many
other platforms that demand permission before software can be installed.
With a Google Android phone, we are free to choose which software we
wish to install.
• An Android phone comes with a guarantee that it works well with Google
products. Google products have a huge customer base for the variety of
features and flexibility they offer. Whether it's YouTube, Gmail, Google
Docs, or any other Google product, an Android phone gives us access to a
wide variety of applications that we can comfortably use on both our phone
and our computer.
• Eventually this platform will work on netbooks and computers. This means
that we could have devices that share the same platform, giving us the ability
to purchase applications that will work on all our devices [11].
In the most-used Android platform version, 2.3.3 or Gingerbread, the
Android Compatibility Definition Document clearly states that an Android device is
highly recommended to have a front-facing camera with at least VGA resolution
(that is, 640 × 480 pixels) [12]. This requirement is one of the main reasons that
Android was chosen as the base of this project's development.
2.1.1.1 Software
Android applications are written in Java – a relatively easy to learn, friendly
language for new developers. Android apps are developed on a computer – PC or
Mac – and then compiled and sent to the device for testing. If we don’t have an
Android device yet, there are emulators that simulate an Android device on our
computer, meaning that we can still develop an Android game or application without
owning one [13].
The major components of Android are:
• Linux kernel. Android relies on Linux 2.6. The kernel acts as an abstraction
layer between the hardware and the rest of the software stack and provides
core system services: security, memory management, process management,
network stack, and driver model.
• Android runtime. Android includes a set of core libraries that provides most
of the functionality available in the core libraries of the Java programming
language. Every Android application runs in its own process, with its own
instance of the Dalvik virtual machine. Dalvik has been written so that a
device can run multiple VMs efficiently. The Dalvik VM executes files in the
Dalvik Executable (.dex) format which is optimized for minimal memory
footprint.
• Libraries. Android includes a set of C/C++ libraries used by various
components of the Android system. These capabilities are exposed to
developers through the Android application framework.
• Application framework. Developers have full access to the same framework
APIs used by the core applications. The application architecture is designed
to simplify the reuse of components.
• Applications. Android ships with a set of core applications including an email
client, SMS program, calendar, maps, browser, contacts, and others [14].
Figure 2.1.2: Android Major Components
To develop an Android application, it is required to use a customized IDE
(Integrated Development Environment) supporting Android platform development.
The most-used Android-supporting IDE is Eclipse because it is the easiest and most
hassle-free development tool for Android at the time of this writing [13]. Other
alternatives include NetBeans, IntelliJ IDEA, DeuterIDE, and many more.
2.1.1.1.1 SDK (Software Development Kit)
The Android SDK is composed of modular packages that we can download
separately using the Android SDK Manager. There are several different packages
available for the Android SDK. The table below describes most of the available
packages and where they're located once we download them [15].
Package: SDK Tools
Description: Contains tools for debugging and testing, plus other utilities that are
required to develop an app. If we have just installed the SDK starter package, then
we already have the latest version of this package. We should make sure to keep it
up to date.
File Location: <sdk>/tools/

Package: SDK Platform-tools
Description: Contains platform-dependent tools for developing and debugging our
application. These tools support the latest features of the Android platform and are
typically updated only when a new platform becomes available. They are always
backward compatible with older platforms, but we must be sure that we have the
latest version of these tools when we install a new SDK platform.
File Location: <sdk>/platform-tools/

Package: Documentation
Description: An offline copy of the latest documentation for the Android platform
APIs.
File Location: <sdk>/docs/

Package: SDK Platform
Description: There is one SDK Platform available for each version of Android. It
includes an android.jar file with a fully compliant Android library. In order to build
an Android app, we must specify an SDK platform as our build target.
File Location: <sdk>/platforms/<android-version>/

Package: System Images
Description: Each platform version offers one or more different system images
(such as for ARM and x86). The Android emulator requires a system image to
operate. We should always test our app on the latest version of Android, and using
the emulator with the latest system image is a good way to do so.
File Location: <sdk>/platforms/<android-version>/

Package: Sources for Android SDK
Description: A copy of the Android platform source code that is useful for stepping
through the code while debugging our app.
File Location: <sdk>/sources/

Package: Samples for SDK
Description: A collection of sample apps that demonstrate a variety of the platform
APIs. These are a great resource for browsing Android app code. The API Demos
app in particular provides a huge number of small demos we should explore.
File Location: <sdk>/platforms/<android-version>/samples/

Package: Google APIs
Description: An SDK add-on that provides both a platform we can use to develop
an app using special Google APIs and a system image for the emulator so we can
test our app using the Google APIs.
File Location: <sdk>/add-ons/

Package: Android Support
Description: A static library we can include in our app sources in order to use
powerful APIs that aren't available in the standard platform. For example, the
support library contains versions of the Fragment class that are compatible with
Android 1.6 and higher (the class was originally introduced in Android 3.0) and the
ViewPager APIs that allow us to easily build a side-swipeable UI.
File Location: <sdk>/extras/android/support/

Package: Google Play Billing
Description: Provides the static libraries and samples that allow us to integrate
billing services in our app with Google Play.
File Location: <sdk>/extras/google/

Package: Google Play Licensing
Description: Provides the static libraries and samples that allow us to perform
license verification for our app when distributing with Google Play.
File Location: <sdk>/extras/google/

Table 2.1.1: Android SDK Available Packages
2.1.1.1.2 Java
Android uses the Java programming language to develop applications for the
Android platform. Therefore, it is necessary to install the Java SE Development Kit
(JDK) on the computer that will be used to develop Android applications. Java
Platform, Standard Edition (Java SE) lets us develop and deploy Java applications
on desktops and servers, as well as in today's demanding embedded environments.
Java offers the rich user interface, performance, versatility, portability, and security
that today’s applications require [16].
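As a minimal illustration of the Java source structure used throughout Android development, consider the following sketch; the class and method names here are invented for the example:

```java
// Minimal Java program illustrating the basic class/method structure of a
// Java source file; all names are invented for this example.
public class HelloAndroid {

    // Small helper so the greeting logic can be reused directly.
    static String buildGreeting(String platform) {
        return "Hello, " + platform + "!";
    }

    public static void main(String[] args) {
        System.out.println(buildGreeting("Android"));
    }
}
```

Every Android application is compiled from Java sources of this general shape, although on a device the code runs inside the Dalvik VM rather than a standard Java VM.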
2.1.1.1.3 NDK (Native Development Kit)
The NDK is a toolset that allows us to implement parts of our app using
native-code languages such as C and C++. For certain types of apps, this can be
helpful, as it lets us reuse existing code libraries written in these languages and
possibly increase performance [17].
Before downloading the NDK, we should understand that the NDK will not
benefit most apps. As a developer, we need to balance its benefits against its
drawbacks. Notably, using native code on Android generally does not result in a
noticeable performance improvement, but it always increases our app complexity. In
general, we should only use the NDK if it is essential to our app—never because we
simply prefer to program in C/C++ [17].
Typical good candidates for the NDK are self-contained, CPU-intensive
operations that don't allocate much memory, such as signal processing, physics
simulation, and so on. When examining whether or not we should develop in native
code, think about our requirements and see if the Android framework APIs provide
the functionality that we need [17].
To conclude, the NDK provides [18]:
• A set of tools and build files used to generate native code libraries from C
and C++ sources
• A way to embed the corresponding native libraries into application package
files (.apks) that can be deployed on Android devices
• A set of native system headers and libraries that will be supported in all
future releases of the Android platform, starting from Android 1.5
• Documentation, samples, and tutorials [18]
2.1.1.1.4 C/C++
C is a programming language originally developed for developing the Unix
operating system. It is a low-level and powerful language, but it lacks many modern
and useful constructs. C++ is a newer language, based on C, that adds many more
modern programming language features that make it easier to program in than C [19].
Basically, C++ maintains all aspects of the C language, while providing new
features to programmers that make it easier to write useful and sophisticated
programs [19].
For example, C++ makes it easier to manage memory and adds several
features to allow "object-oriented" programming and "generic" programming.
Basically, it makes it easier for programmers to stop thinking about the nitty-gritty
details of how the machine works and think about the problems they are trying to
solve [19].
C++ is a powerful general-purpose programming language. It can be used to
create small programs or large applications. It can be used to make CGI scripts or
console-only DOS programs. C++ allows us to create programs to do almost
anything we need to do. The creator of C++, Bjarne Stroustrup, has put together a
partial list of applications written in C++ [19].
2.1.1.1.5 Application Components [20]
Application components are the essential building blocks of an Android
application. Each component is a different point through which the system can enter
our application. Not all components are actual entry points for the user and some
depend on each other, but each one exists as its own entity and plays a specific
role—each one is a unique building block that helps define our application's overall
behavior [21].
A unique aspect of the Android system design is that any application can start
another application’s component. For example, if we want the user to capture a photo
with the device camera, there's probably another application that does that and our
application can use it, instead of developing an activity to capture a photo ourselves.
We don't need to incorporate or even link to the code from the camera application.
Instead, we can simply start the activity in the camera application that captures a
photo. When complete, the photo is even returned to our application so we can use it.
To the user, it seems as if the camera is actually a part of our application [21].
When the system starts a component, it starts the process for that application
(if it's not already running) and instantiates the classes needed for the component.
For example, if our application starts the activity in the camera application that
captures a photo, that activity runs in the process that belongs to the camera
application, not in our application's process. Therefore, unlike applications on most
other systems, Android applications don't have a single entry point (there's no main()
function, for example) [21].
Because the system runs each application in a separate process with file
permissions that restrict access to other applications, our application cannot directly
activate a component from another application. The Android system, however, can.
So, to activate a component in another application, we must deliver a message to the
system that specifies our intent to start a particular component. The system then
activates the component for us [21].
There are four different types of application components. Each type serves a
distinct purpose and has a distinct lifecycle that defines how the component is
created and destroyed [21].
Here are the four types of application components [21]:
2.1.1.1.5.1 Activities [21]
An activity represents a single screen with a user interface. For example, an
email application might have one activity that shows a list of new emails, another
activity to compose an email, and another activity for reading emails. Although the
activities work together to form a cohesive user experience in the email application,
each one is independent of the others. As such, a different application can start any
one of these activities (if the email application allows it). For example, a camera
application can start the activity in the email application that composes new mail, in
order for the user to share a picture.
An activity is implemented as a subclass of Activity.
2.1.1.1.5.2 Services [21]
A service is a component that runs in the background to perform long-
running operations or to perform work for remote processes. A service does not
provide a user interface. For example, a service might play music in the background
while the user is in a different application, or it might fetch data over the network
without blocking user interaction with an activity. Another component, such as an
activity, can start the service and let it run or bind to it in order to interact with it.
A service is implemented as a subclass of Service.
2.1.1.1.5.3 Content Providers [21]
A content provider manages a shared set of application data. We can store the
data in the file system, an SQLite database, on the web, or any other persistent
storage location our application can access. Through the content provider, other
applications can query or even modify the data (if the content provider allows it). For
example, the Android system provides a content provider that manages the user's
contact information. As such, any application with the proper permissions can query
part of the content provider (such as ContactsContract.Data) to read and write
information about a particular person.
Content providers are also useful for reading and writing data that is private
to our application and not shared. For example, the Note Pad sample application uses
a content provider to save notes.
A content provider is implemented as a subclass of ContentProvider and
must implement a standard set of APIs that enable other applications to perform
transactions.
2.1.1.1.5.4 Broadcast Receivers [21]
A broadcast receiver is a component that responds to system-wide broadcast
announcements. Many broadcasts originate from the system—for example, a
broadcast announcing that the screen has turned off, the battery is low, or a picture
was captured. Applications can also initiate broadcasts—for example, to let other
applications know that some data has been downloaded to the device and is available
for them to use. Although broadcast receivers don't display a user interface, they may
create a status bar notification to alert the user when a broadcast event occurs. More
commonly, though, a broadcast receiver is just a "gateway" to other components and
is intended to do a very minimal amount of work. For instance, it might initiate a
service to perform some work based on the event.
A broadcast receiver is implemented as a subclass of BroadcastReceiver and
each broadcast is delivered as an Intent object.
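All four component types are declared to the system in the application's AndroidManifest.xml file. A minimal illustrative fragment is shown below; every class name in it is hypothetical, invented only to show where each component type is declared:

```xml
<!-- Illustrative manifest fragment; all component class names are hypothetical. -->
<application android:label="EyeTrackerDemo">
    <!-- An activity marked as the application's launcher entry point. -->
    <activity android:name=".MainActivity">
        <intent-filter>
            <action android:name="android.intent.action.MAIN" />
            <category android:name="android.intent.category.LAUNCHER" />
        </intent-filter>
    </activity>
    <!-- A background service. -->
    <service android:name=".TrackingService" />
    <!-- A content provider identified by its authority string. -->
    <provider android:name=".NotesProvider"
              android:authorities="com.example.notes" />
    <!-- A broadcast receiver listening for a system broadcast. -->
    <receiver android:name=".BootReceiver">
        <intent-filter>
            <action android:name="android.intent.action.BOOT_COMPLETED" />
        </intent-filter>
    </receiver>
</application>
```

The system reads these declarations to know which components exist and which intents each one can respond to.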
2.1.1.2 Hardware
When building a mobile application, it's important that we always test our
application on a real device before releasing it to users. This section describes how
to set up our development environment and Android-powered device for testing
and debugging on the device [22].
We can use any Android-powered device as an environment for running,
debugging, and testing our applications. The tools included in the SDK make it easy
to install and run our application on the device each time we compile. We can install
our application on the device directly from Eclipse or from the command line with
ADB. If we don't yet have a device, check with the service providers in our area to
determine which Android-powered devices are available [22].
2.1.1.2.1 Minimum Specifications
While Android is designed to support a wide variety of hardware platforms
and configurations, this section provides recommended minimum device
requirements [23].
Feature: Chipset
Minimum Requirement: ARM-based
Notes: For the first release, Android is primarily targeted towards mobile handsets,
and portions of the platform, such as Dalvik VM graphics processing, currently
assume an ARM architecture.

Feature: Memory
Minimum Requirement: 128 MB RAM; 256 MB external flash
Notes: Android can boot and run in configurations with less memory, but it isn't
recommended.

Feature: Storage
Minimum Requirement: Mini or Micro SD
Notes: Not necessary for basic bring-up, but recommended.

Feature: Primary Display
Minimum Requirement: HVGA required
Notes: The current Android interface targets a touch-based HVGA-resolution
display no smaller than 2.8 inches in size. However, smaller displays will suffice
for initial porting.

Feature: Navigation Keys
Minimum Requirement: 5-way navigation with 5 application keys, power, camera,
and volume controls

Feature: Camera
Minimum Requirement: 2 MP CMOS
Notes: Not required for basic bring-up.

Feature: USB
Minimum Requirement: Standard mini-B USB interface
Notes: Android uses the USB interface for flashing the device system images and
debugging a running device.

Feature: Bluetooth
Minimum Requirement: 1.2 or 2.0
Notes: Not required for initial bring-up.

Table 2.1.2: Android Device Minimum Requirements
If available, our Android device can also benefit from the following optional
device characteristics [23]:
• QWERTY keyboard
• WiFi
• GPS
2.1.2 User Interface
User Interface, abbreviated UI, is the junction between a user and a computer
program. An interface is a set of commands or menus through which a user
communicates with a program. A command-driven interface is one in which we enter
commands. A menu-driven interface is one in which we select command choices
from various menus displayed on the screen [24].
The user interface is one of the most important parts of any program because
it determines how easily we can make the program do what we want. A powerful
program with a poorly designed user interface has little value [24].
2.1.2.1 Graphical User Interface
Graphical User Interface, abbreviated GUI, is a program interface that takes
advantage of the computer's graphics capabilities to make the program easier to use.
Well-designed graphical user interfaces can free the user from learning complex
command languages. Graphical user interfaces, such as Microsoft Windows and the
one used by the Apple Macintosh, feature the following basic components [25]:
• Pointer: A symbol that appears on the display screen and that we move to
select objects and commands. Usually, the pointer appears as a small angled
arrow. Text-processing applications, however, use an I-beam pointer that is
shaped like a capital I.
• Pointing device: A device, such as a mouse or trackball, that enables us to
select objects on the display screen.
• Icons: Small pictures that represent commands, files, or windows. By moving
the pointer to the icon and pressing a mouse button, we can execute a
command or convert the icon into a window. We can also move the icons
around the display screen as if they were real objects on our desk.
• Desktop: The area on the display screen where icons are grouped is often
referred to as the desktop because the icons are intended to represent real
objects on a real desktop.
• Windows: We can divide the screen into different areas. In each window, we
can run a different program or display a different file. We can move windows
around the display screen, and change their shape and size at will.
• Menus: Most graphical user interfaces let us execute commands by selecting
a choice from a menu.
2.1.2.2 User Interface Control
A user interface control is an element of a graphical user interface, such as a
button, menu, list box, text window, or dialog box [26].
2.1.3 Human-Computer Interaction
Human-computer interaction (HCI) is an area of research and practice that
emerged in the early 1980s, initially as a specialty area in computer science
embracing cognitive science and human factors engineering. HCI has expanded
rapidly and steadily for three decades, attracting professionals from many other
disciplines and incorporating diverse concepts and approaches. To a considerable
extent, HCI now aggregates a collection of semi-autonomous fields of research and
practice in human-centered informatics. However, the continuing synthesis of
disparate conceptions and approaches to science and practice in HCI has produced a
dramatic example of how different epistemologies and paradigms can be reconciled
and integrated in a vibrant and productive intellectual project [27].
In interaction with a computer, the human's input is the data output by the
computer, and vice versa. Input in humans occurs mainly through the senses and
output through the motor controls of the effectors. Vision, hearing, and touch are
the most important senses in HCI. The fingers, voice, eyes, head, and body position
are the primary effectors [28].
2.1.3.1 Eye Tracking
Eye tracking is a research method that determines what part of an
advertisement consumers look at, by tracking the pattern of their eye movements
[29].
In the simplest terms, eye tracking is the measurement of eye activity. The
concept is basic, but the process and interpretation can be quite complex [30].
Eye tracking data is collected using either a remote or head-mounted ‘eye
tracker’ connected to a computer. While there are many different types of non-
intrusive eye trackers, they generally include two common components: a light
source and a camera. The light source (usually infrared) is directed toward the eye.
The camera tracks the reflection of the light source along with visible ocular
features such as the pupil. This data is used to extrapolate the rotation of the eye
and ultimately the direction of gaze. The eye tracker also detects additional
information such as blink frequency and changes in pupil diameter. The aggregated
data is written to a file that is compatible with eye-tracking analysis software.

There are many different methods of exploring eye data. The most common is to
analyze the visual path of one or more participants across an interface such as a
computer screen. Each eye data observation is translated into a set of pixel
coordinates. From there, the presence or absence of eye data points in different
screen areas can be examined. This type of analysis is used to determine which
features are seen, when a particular feature captures attention, how quickly the eye
moves, what content is overlooked, and virtually any other gaze-related question.
Beyond the analysis of visual attention, eye data can be examined to measure the
cognitive state and workload of a participant [30].
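The translation of eye observations into screen pixel coordinates described above can be sketched as a simple per-axis linear calibration, with gains and offsets estimated beforehand by having the user fixate known calibration points. This is an illustrative simplification (real trackers typically fit richer, often polynomial, models), and every name below is invented for the sketch:

```java
// Sketch of mapping a normalized pupil position (0..1 in the camera image)
// to a pixel position on the screen using a per-axis linear calibration.
// All names and coefficients are illustrative assumptions.
public class GazeMapper {
    private final double gainX, offsetX, gainY, offsetY;

    public GazeMapper(double gainX, double offsetX, double gainY, double offsetY) {
        this.gainX = gainX; this.offsetX = offsetX;
        this.gainY = gainY; this.offsetY = offsetY;
    }

    // Returns {x, y} pixel coordinates on a screen of the given resolution.
    public int[] toScreen(double pupilX, double pupilY, int screenW, int screenH) {
        int x = (int) Math.round((gainX * pupilX + offsetX) * screenW);
        int y = (int) Math.round((gainY * pupilY + offsetY) * screenH);
        // Clamp to the screen so small calibration errors never leave the display.
        x = Math.max(0, Math.min(screenW - 1, x));
        y = Math.max(0, Math.min(screenH - 1, y));
        return new int[] { x, y };
    }
}
```

With an identity calibration (gain 1, offset 0), a pupil at the center of the camera image maps to the center of the screen; the calibration step exists precisely to correct the real, non-identity relationship between eye rotation and gaze point.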
2.1.3.2 Eye Gaze
Eye Gaze allows us to control the computer by looking at it, while wearing
special glasses, head-mounted boxes etc. By tracking a laser beam’s reflection in the
eye, the direction in which the eye is looking is determined. The system needs to be
tuned and is very expensive, but also very accurate [28].
2.1.3.3 Blink Control
Blink Control is an assistive technology to assist people to overcome their
disability by providing other means of control and communication [31].
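As a sketch of how blink control can be turned into a discrete input event, the hypothetical detector below registers a blink when an "eye openness" score (for example, derived from eyelid distance or an eye detector's confidence) stays below a threshold for several consecutive camera frames. The threshold and frame count are assumptions for the sketch, not measured values:

```java
// Illustrative blink detector: fires once when the openness score has been
// below the threshold for minFrames consecutive frames. Names and values
// are assumptions for this sketch.
public class BlinkDetector {
    private final double threshold;
    private final int minFrames;
    private int closedFrames = 0;

    public BlinkDetector(double threshold, int minFrames) {
        this.threshold = threshold;
        this.minFrames = minFrames;
    }

    // Feed one openness score per camera frame; returns true exactly on the
    // frame where a blink is confirmed.
    public boolean update(double openness) {
        if (openness < threshold) {
            closedFrames++;
            return closedFrames == minFrames;  // fire once per blink
        }
        closedFrames = 0;  // eye reopened: reset for the next blink
        return false;
    }
}
```

Requiring several consecutive closed frames distinguishes a deliberate blink from single-frame detection noise, which matters when a blink is used as a "click" in an assistive interface.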
2.1.4 Input Devices
An input device is any hardware device that sends data to the computer.
Without input devices, a computer would only be a display device and would not
allow users to interact with it, much like a TV [32].
2.1.4.1 Front-Facing Camera
Front-facing camera is a camera on the front of the phone, facing the user.
This enables two-way video chat, and is also useful for capturing self-portraits. The
resolution and quality of the front-facing camera is often inferior to the rear, main
camera [33].
2.1.5 Computer Vision
Computer Vision, abbreviated CV, is the process of taking a live raster image
(represented as a matrix of numeric values) and interpreting it into higher-level data
abstractions and symbolic objects (such as humans, limbs, faces, props, poses,
gestures, etc.). In this way, Computer Vision is the inverse of Computer Graphics (in
a similar fashion, the camera is the dual of the projector) [34].
Further, computer vision is the area of research that deals with computers
abstracting high-level data constructs from video-based data streams. By applying
intelligence to the signal, the purpose is to get the machine to recognize objects,
humans, poses, and gestures with a decent degree of accuracy. The broader field is
called Machine Vision. Computer vision is a subset of Computer Science. Aligned
fields include Robotics, Signal Processing, and Artificial Intelligence [34].
2.1.5.1 OpenCV
OpenCV (Open Source Computer Vision) is a library of programming
functions mainly aimed at real-time computer vision application development.
Originally developed by Intel, it takes particular advantage of the Intel Integrated
Performance Primitives (IPP) libraries [34].
OpenCV has a modular structure, which means that the package includes
several shared or static libraries. The following modules are available [35]:
• core - a compact module defining basic data structures, including the dense
multi-dimensional array Mat and basic functions used by all other modules.
• imgproc - an image-processing module that includes linear and non-linear
image filtering, geometrical image transformations (resize, affine and
perspective warping, generic table-based remapping), color space conversion,
histograms, and so on.
• video - a video analysis module that includes motion estimation, background
subtraction, and object tracking algorithms.
• calib3d - basic multiple-view geometry algorithms, single and stereo camera
calibration, object pose estimation, stereo correspondence algorithms, and
elements of 3D reconstruction.
• features2d - salient feature detectors, descriptors, and descriptor matchers.
• objdetect - detection of objects and instances of the predefined classes (for
example, faces, eyes, mugs, people, cars, and so on).
• highgui - an easy-to-use interface to video capturing, image and video
codecs, as well as simple UI capabilities.
• gpu - GPU-accelerated algorithms from different OpenCV modules.
• ... Some other helper modules, such as FLANN and Google test wrappers,
Python bindings, and others.
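The objdetect module's cascade classifiers (such as the Haar cascades commonly used for face and eye detection) rely internally on an integral image, which lets the sum of any rectangular pixel region be computed in constant time. A minimal plain-Java sketch of this underlying technique, written independently of OpenCV itself:

```java
// Integral image: ii[y][x] holds the sum of all pixels above and to the left
// of (x, y). Cascade detectors use it to evaluate Haar-like rectangle
// features with only four array lookups each.
public class IntegralImage {
    private final long[][] ii;  // extra zero row/column simplifies lookups

    public IntegralImage(int[][] img) {
        int h = img.length, w = img[0].length;
        ii = new long[h + 1][w + 1];
        for (int y = 1; y <= h; y++)
            for (int x = 1; x <= w; x++)
                ii[y][x] = img[y - 1][x - 1] + ii[y - 1][x]
                         + ii[y][x - 1] - ii[y - 1][x - 1];
    }

    // Sum of the w-by-h rectangle whose top-left corner is (x, y).
    public long rectSum(int x, int y, int w, int h) {
        return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x];
    }
}
```

Because each rectangle sum costs the same regardless of its size, a cascade can evaluate thousands of candidate eye regions per frame, which is what makes real-time detection on a phone feasible.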
2.1.5.2 OpenCV4Android
OpenCV Manager is an Android service targeted at managing OpenCV library
binaries on end users' devices. It allows sharing the OpenCV dynamic libraries of
different versions between applications on the same device.
The Manager provides the following benefits [36]:
• Less memory usage. All apps use the same binaries from service and do not
keep native libs inside themselves;
• Hardware specific optimizations for all supported platforms;
• Trusted OpenCV library source. All packages with OpenCV are published on
Google Play service;
• Regular updates and bug fixes;
2.1.6 Display Calibration
Display calibration is the process of using a display’s setting controls to
adjust the on-screen image so that it matches the original source content. This allows
the calibrated display to accurately reproduce the color and clarity of the video
signals from any source device, be it a computer, digital signage player,
cable/satellite box, or Blu-Ray player. Whether it is to optimize a TV display for the
best image quality possible or to ensure that we are working on a properly profiled
monitor, display calibration removes any doubt that the digital content we are
viewing and/or creating is accurately represented on-screen [37].
2.2 Theoretical Frameworks
The research methodology that the author will use in developing the
application is to combine algorithms from available sources to create a stable, fast,
and effective algorithm to detect the user's eyes and gaze. An extensive literature
study of eye tracking and eye gaze will also be done to support the author in finding
the right and exact solution. Available sources include Internet journals, websites,
governmental papers, scientific papers, and books. The author will also consult the
thesis supervisor, who is already an expert in the computer vision field.
Below are the development steps in developing the Android Eye-Tracking Based
User Interface Control Application:
• Analyze similar applications and possible solutions which are being
developed
• Create an overview of the existing systems: eye-tracking applications, Android
native applications with a pointer, eye-tracking algorithms, and blink control
algorithms, from trusted and detailed sources
• Combine and improve existing algorithms into our own customized, efficient,
and effective algorithms
• Compare the differences, both advantages and disadvantages, between the
existing system and the new solution
• Design the application prototype
• Implement the application prototype
• Test the application prototype
• Evaluate the prototype