media:contexual_inquiry_and_task_analysis.doc

Contextual Inquiry and Task Analysis for Group 3

Group members

Jeremy Syn - Worked on various parts (Problem and Solution Overview, Task Analysis Questions, Storyboard

Sketching)

Michael So - Did the list of 6 tasks, did storyboard sketching, and took part in interviews. Also, did the footnote.

Henry Su - Did the Contextual Inquiry - Interview Descriptions section, and one of the story boards, was there for

the interviews (co-conducted 2 of them)

How-Kil "Eric" Chung - Was there for interviews (like everyone was), also worked on Description of Users,

Analysis of Approach, and parts of Interface Description.

Everyone worked a bit on each other's parts.

Description of Users

MegaMan56 - This user is a junior in college. He is a psychology major who chose the field to better understand

social situations, especially conflicts that arise partly from personal stubbornness.

He was born in the United States, with English being his first language, and is concurrently learning Mandarin.

His likes include listening to music and watching movies. His tech level is that of being familiar with computing

for everyday use (such as email) and that for research (for college, for example). He has a somewhat basic usage

plan for his cellular phone: he uses it as a phone and for texting. He doesn't use it for games or web surfing, and

doesn't take many pictures with it, either. This user has vacationed in Europe (Italy and France), so he

has a tourist's point of view, and tourists are among our key target users.

KirbySuperstar1995 - This user is a third year in college, dual majoring in Math and Music. He was born in the

United States, so he uses English as his primary language. He did learn Spanish in high school, but doesn't have

a strong grasp of it since it was just a rudimentary high school course taken such a long time ago. He doesn't

know a secondary language besides his high school Spanish. He has many family members living in foreign

countries such as France and Spain. Some of his likes are music, math, women, and travel. He feels

uncomfortable being American when visiting overseas countries, because he feels out of place and he gives off an

obvious touristy aura. He has a pretty good background in terms of technological usability and even knows a bit

of programming (C++). He owns an average cell phone with nothing special in particular. He has visited his

family in France before so his experiences will provide us with significant information.

Charmander♥ - This user is a senior in college. The tech level of this person is higher than a normal user. He

does not keep up with the latest gadgets and technology but has done stuff like making Facebook applications. He

has also grown up bilingual and was born in Hong Kong. His hobbies include watching movies, TV, and playing

video games. As a user, he dislikes things such as repeating oneself. For example, when people try to say

something, they sometimes have to repeat themselves because they are not understood. Going along with this, in

general, he dislikes communication issues, such as when people who don't understand a language try to use a

dictionary; not only is it slow and disruptive but it is often wrong as well, because often the first definition/word

in the dictionary will be used without regard to meaning. The rationale for choosing this user (along with the fact

that he was someone we had access to) is that he is someone who has dealt with (and deals with)

communication issues on a more frequent basis and for a larger variety of contexts, as opposed to one who is

simply traveling. This user gives us a bit of a fresh look at the topic.

Footnote:

Our Target Users in General: We have a diverse set of potential target users for our language translation application. The following is a list of our target users and the rationale behind each:

ESL (English as a Second Language) people: ESL people are learning a new language (English, to be exact), so a translation device will ease their learning experience. Our translation application can help them in their studies. They can take pictures of words or sentences they do not know, which appear in places such as textbooks, chalkboards, or shirts, and get a translation quickly and easily.

Vacationers (Americans visiting some other country, or even visiting a Chinatown, etc.): A vacationer may not understand the foreign country's language, so to understand it, a vacationer needs a translation device.

English speakers/readers who want to read a restaurant menu written solely in a foreign language (common in Chinatown as well): Because they cannot read the foreign language printed on the menu, a translation device will make those words understandable, and they will be able to successfully order their preferred menu item(s).

People who watch foreign dramas but don't necessarily understand the language completely: These viewers do not understand everything the actors say. Usually, if they cannot follow by listening, they turn on the subtitles. But if the only subtitles available are also in a foreign language, a device that translates the subtitles would be very helpful.

The traditional translation device user (for various purposes): These users already rely on translation devices, so they clearly need translations.

Everyday people who want to communicate in any area where a foreign language is used (it doesn't have to be a specific area like Chinatown; it can be anywhere within a city, as with Charmander♥).

Problem and Solution Overview

The problem that some people have when traveling overseas is that they do not have a strong grasp of the native

language. They may feel really lost and confused without the availability of proper language aids. Without the

ability to read signs or communicate with the natives, one may feel insecure traveling in that kind of

environment. This problem could even extend to visiting local areas and shops where the locals or workers do

not speak fluent English. When looking at a menu with no helpful translations next to it, the person won't know

what to order. The solution we are proposing is a mobile application that allows you to take a picture of some

text, such as menu items or signs, and even symbolic signs such as road signs, and the application will translate it

into a language of your choice. So say for example, you travel to China; you can take a picture of a sign and your

mobile application will tell you what the sign means for your convenience. This solution is an approach to create

a comfortable way to travel in an unknown, foreign area.

Contextual Inquiry - Interview Descriptions:

First of all, due to the nature of the problem our application solves, it was not possible to do true contextual

interviews or to reach all of our potential target users. The reason is that we would either need to find a person

who cannot read English but can read another language (perhaps a very recent immigrant), or, we would have to

take an interviewee to a foreign country (where the interviewee does not know the language). Clearly, setting up

such contextual interviews in a matter of a week or two is nearly impossible, not to mention the capital required,

if we were to go to a foreign country. Thus, instead, we used the "recall" strategy mentioned in the article, to give

our interviews a contextual flavor. This means that we tried to avoid having the interviewees summarize their

experiences with foreign languages, and instead asked them to talk about specific instances, while we asked questions

that would help them recall the finer details. Also, because these interviews were not truly contextual, they were

done in restaurants over a meal. All three interviewees were friends of different group members, so to make the

interviews more objective, we had the people who didn't know the interviewee conduct the interview.

Some similarities among the interviewees were that none of them were seriously dependent on translation

devices or human translators. That is, they either only needed them during their travels (pleasure, not business), or

they were already bilingual to some extent. However, these "light-duty" users are still an important subset of the

user group, because we expect that many of our users would either use the application for recreational purposes

or for infrequent references. Another shared aspect among the three interviewees was that all of them are college

students, albeit from different technological backgrounds. Because younger people tend to have less trouble

learning new languages, similar interviews with older subjects may reveal additional information.

The first interviewee, MegaMan56, talked about his vacation to France and Italy. He went with friends who

did not speak or read French or Italian, but they had a tour guide. In one instance, the tour guide let them loose.

MegaMan56 and his friends wanted to go to the Notre Dame Cathedral, but did not know how to get there. He

asked a street officer, who was in a bad mood because she was being pestered by English-speakers, and she did

not know English very well. He noted that to avoid this uncomfortable situation, he would have needed to

research the information before leaving, as many signs were not in English.

The second interviewee, KirbySuperstar1995, talked about his trip to France with his family, to celebrate a

wedding of an extended family member. Only one of his family members was bilingual in English and French,

so he sometimes had to figure things out on his own. For example, he went to a barbershop where the hairdresser

did not speak any English at all. He noted that because no translator was present, hand gestures proved to be

useful. He also mentioned that it would be nice if he could write down his intentions, and have it translated

automatically to French. However, he didn't carry around a translation device or dictionary, because firstly, he

didn't have one, and secondly, he didn't feel the need to get one. He felt that it would only draw negative

attention because it would make him seem more tourist-like to the local French. He did, however, carry around

some "learn-it-now" CDs, to quickly learn some commonly used French phrases. Upon reflecting on his travel,

KirbySuperstar1995 realized that different situations varied in difficulty. For instance, eating at a restaurant was

relatively easy because most waiters knew some English, and you can always point and look at pictures. Riding a

taxi was also relatively trouble-free because he could just point to a map and gesture, "go here". However, using

the subway system was very difficult, because there were no English translations, and the maps were pretty

complicated to begin with. Lastly, the interviewee mentioned that as a whole, it is feasible to travel around a big

foreign city without a translator (human or machine), but in the more rural areas, where there is less bilingualism,

a translator may be necessary.

The third interviewee, Charmander♥, talked about several experiences dealing with the language barrier. In

one instance, he wanted to ask a restaurant owner for some information, to help him build a Facebook

application. However, the restaurant owner did not know English, and so just shooed his team out. Clearly, a

translator, be it person or device, would have prevented this outcome. Charmander♥ also talked about his

immigration experience to the US from Hong Kong, when he was in elementary school. Although Hong Kong

was bilingual as well, there were many words Americans used that weren't used in Hong Kong. This situation

was most often encountered when he was reading textbooks. The interviewee thus had to carry around an

electronic translation device, to translate those new English words to Chinese. Unfortunately, the device was only

capable of translating one word at a time, so sometimes the meaning of a sentence still wasn't clear because, by

translating a word at a time, the translator could not put the words in context. At times, the teacher was helpful

and had other bilingual students translate for him. However, he noted that the student translators (and even adult

translators, for that matter), were imperfect, as they often cut out "non-essential" words for the sake of

conciseness, at the expense of an authentic translation. As he grew up, oftentimes he found himself in the

opposite situation: now, he often has to translate from Chinese to English. In this case, the classical electronic

translator would not be as helpful--unless you knew how to type in the Chinese characters, perhaps using pinyin

or zhuyin. He circumvented this problem by guessing the [English] answer, and checking if the Chinese

translation given matched the character in question.

List of Tasks

Easy

* Choosing from a variety of possible translations - There are many instances where a word can have multiple

translations in another language. The ability to view these multiple translations is supported whereas certain

other translation devices give the user only one translation.

* Selecting language options - This encompasses not only what languages you want to translate to and from, but

also the ability to customize which language dictionaries are stored on the phone. This task is supported because

a user usually has languages he or she wants translations from and languages the user will never bother to have

translations from. For instance, a user may frequently visit Japan and China, so the user will most frequently use

the Chinese and Japanese translations provided by our application. Having a French dictionary would therefore

be useless to that particular user. But if the user happens to want to take a vacation to France, the user has the

ability to download a French dictionary into their phone. And when the vacation is over, the user can remove that

dictionary.
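The download-and-remove behavior described for language options can be sketched roughly as follows. This is only an illustrative model in Python, not Android code: the class `DictionaryManager`, its method names, and the dictionary sizes are all hypothetical placeholders, and the storage limit is an assumption based on the note that dictionaries take up space.

```python
class DictionaryManager:
    """Hypothetical model of the language dictionaries stored on the phone."""

    def __init__(self, capacity_mb):
        self.capacity_mb = capacity_mb   # assumed storage budget for dictionaries
        self.installed = {}              # language name -> size in MB

    def used_mb(self):
        return sum(self.installed.values())

    def install(self, language, size_mb):
        """Add a dictionary if it fits; return True on success."""
        if language in self.installed:
            return True
        if self.used_mb() + size_mb > self.capacity_mb:
            return False
        self.installed[language] = size_mb
        return True

    def remove(self, language):
        """Delete a dictionary, for example once a vacation is over."""
        self.installed.pop(language, None)

    def available(self):
        """Languages that can be translated without downloading anything."""
        return sorted(self.installed)


# A user who normally needs Chinese and Japanese adds French for one trip.
manager = DictionaryManager(capacity_mb=100)
manager.install("Chinese", 30)    # sizes are made-up placeholders
manager.install("Japanese", 30)
manager.install("French", 30)
during_trip = manager.available()
manager.remove("French")
after_trip = manager.available()
```

Keeping the installed set small is the point of the task: only the languages the user actually needs occupy space on the phone.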

Moderate

* Provide meaning for symbols and signs - In foreign areas, such as overseas countries and shopping malls,

there are many unique and foreign pictorial signs that prove to be unfamiliar to the user. Our application will be

able to communicate to the user what these foreign pictures mean, with a short verbal description. The signs do

not even need to be in a foreign country. It could be a sign used in the user's native country, but the user does not

know what the sign means. The user will finally know what it means thanks to our translator application.

* Save result for future reference - There may be words that the user frequently sees or uses, so saving those

words would eliminate the need and hassle of repeatedly translating the words. For example, if the user has an

unfortunate small bladder such that the user goes to the bathroom often, and if the user has bad memory such that

the user will not remember the translation for bathroom even if the user did a translation of it already, having

saved a translation of a bathroom sign is helpful. This is also useful for users who wish to learn the foreign

language; the saved results feature can be used like flash cards.
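One possible way to model saved results, including the flash-card use mentioned above, is sketched below in Python. The names `SavedTranslation` and `Favorites` and the image path are hypothetical and only for illustration; the report does not specify how saved results are stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SavedTranslation:
    source_text: str   # text as it appeared in the picture
    translation: str   # translation the user chose to keep
    image_path: str    # location of the original picture (placeholder value)


class Favorites:
    """Hypothetical store for translations the user wants to reuse."""

    def __init__(self):
        self._by_translation = {}

    def save(self, entry):
        # Key on the translated word so the user can search in their own language.
        self._by_translation[entry.translation.lower()] = entry

    def find(self, translation):
        return self._by_translation.get(translation.lower())

    def flash_cards(self):
        """Return (front, back) pairs: foreign text on the front, meaning on the back."""
        return [(e.source_text, e.translation) for e in self._by_translation.values()]


favorites = Favorites()
favorites.save(SavedTranslation("男", "male", "pictures/restroom_sign.jpg"))
entry = favorites.find("Male")   # lookup is case-insensitive
cards = favorites.flash_cards()
```

Storing the original picture alongside the text lets the user match a saved word against a real sign later, without translating it again.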

Hard

* Selecting region of picture to translate (should not be necessary for 75% of cases) - For example, there

could be many signs next to each other, but the user only wants to translate a specific sign. Our application

supports the ability to take a picture, and then select the desired region in the scene for translation, eliminating the

undesirable parts of the scene.

* Advanced options such as multi-shot mode, where every x seconds, a picture is taken (may be good for

TV shows) - For example, the user is watching a foreign show with foreign subtitles. Chinese dramas, for

instance, have Chinese subtitles on screen and the user watching them does not know how to read or understand

Chinese. The foreign subtitles usually stay on the screen for several seconds, and then a new subtitle comes on

the screen, and so on. With the multi-shot mode, the application will automatically take multiple pictures in

succession every x seconds (x being specified by the user). With multi-shot mode, the user can easily

capture pictures of the different subtitles, which change about every x seconds. With those pictures, the user can

have the translations done via our application.
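The timing side of multi-shot mode can be sketched as follows. This is a simplified Python illustration under stated assumptions: the function names `capture_times` and `drop_repeats` are hypothetical, no real camera is involved, and the recognized subtitle strings are placeholders standing in for text recognition output.

```python
def capture_times(interval_s, duration_s):
    """Times (seconds from start) at which a picture would be taken."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    shots = int(duration_s // interval_s)
    return [i * interval_s for i in range(shots + 1)]


def drop_repeats(recognized):
    """Collapse consecutive shots that captured the same subtitle."""
    unique = []
    for text in recognized:
        if not unique or unique[-1] != text:
            unique.append(text)
    return unique


# The user picks x = 3 seconds and watches for 10 seconds.
times = capture_times(3, 10)

# Placeholder recognition results: a subtitle that stays on screen for
# longer than x seconds is captured more than once.
subtitles = drop_repeats(["line A", "line A", "line B", "line B"])
```

If x is shorter than the time a subtitle stays on screen, the same subtitle is captured repeatedly, so removing consecutive repeats avoids translating it more than once.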

Task Analysis Questions

1. Who is going to use the system?

The users are people who travel to overseas countries or other areas in general whose main language is

foreign to them. This system could also be useful for those who want to start learning a new language.

All of the users we interviewed have some sort of experience in one of these areas.

2. What tasks do they now perform?

The tasks the users perform now are utilizing dictionaries, electronic translation devices, and human

translators.

3. What tasks are desired?

The main desired task is being able to read a language that you would otherwise not be able to

read without having learned it. The point of this system is to remove that language barrier and to allow

the user to communicate with the foreign environment with ease.

4. How are the tasks learned?

The tasks will be learned through a simple, picture-based instruction manual that makes

usage easier to convey to the user. A brief description will accompany the picture for each step of the process,

providing only the most essential information to the user. These tasks should not be difficult

to execute.

5. Where are the tasks performed?

These tasks are performed in areas where the user is not familiar with the surrounding language. When

a user goes to a foreign land, as KirbySuperstar1995 did when he visited France, and

sees something he cannot read, he can pull out his phone, capture the image, and then

translate it to English so that he can finally read it.

6. What's the relationship between user and data?

The user can save the images and translations that they have taken so that they can look at them again at a

later time.

7. What other tools does the user have?

The users are also able to take multi-shots of images to catch fast changing texts such as certain billboard

signs or words on a television screen. The user can also pick their language options so that more than one

language is available to them.

8. How do users communicate with each other?

Users will be able to send other users the images that they take, which may be useful to them as

well. For example, if you are traveling with a family member but temporarily separate from them, you

can send them your images so that the next time they come around they'll already have the translation.

9. How often are the tasks performed?

The tasks are performed whenever the user feels like he wants to know what something means but he can't

read it because of the language barrier.

10. What are the time constraints on the task?

The user won't always have a lot of time on their hands. They may not be able to stay there all day and so

the user would have to take pictures quickly and move on in order to experience all the events that the

area has to offer.

11. What happens when things go wrong?

If things go wrong, the user can just retake the picture or refocus on a part of the picture or look up an

alternative definition that would make the context clearer.

Interface Design

Functionality summary

There are a lot of things we can do with the pictures. First, you can take a picture. That implies that you can

store pictures and organize/manage them as well. You can send the pictures to others, which will be useful

especially with translations. You can also view the pictures in different ways (zoom functions, highlight, and

selection), which will be useful especially when specifying which part to translate and, if you want the translation

directly on the picture, where to put the text. Finally, you can extract text or specific pictures (like signs) from

the image (if applicable).

Text can also be manipulated in this program. Any text keyed in or taken from a picture can be translated.

Several translations can be available as well, especially if there is not enough context. The translation can happen

with locally or remotely stored dictionaries (the cell phone should act as a first resource and also as a cache).

In both cases (pictures and text), the "to" and "from" languages should be easily changeable for the specific pictures/texts.
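The idea of using the phone as a first resource and as a cache, with a remote dictionary behind it, might look roughly like this. It is a minimal Python sketch: `TranslationLookup` is a hypothetical name, and the remote service is faked with an in-memory table because the report does not name an actual service.

```python
class TranslationLookup:
    """Check the on-phone dictionary first, then fall back to a remote source."""

    def __init__(self, local_dictionary, remote_lookup):
        self.local = dict(local_dictionary)   # word -> list of translations
        self.remote_lookup = remote_lookup    # callable: word -> list of translations
        self.remote_calls = 0                 # counted only to show the cache working

    def translate(self, word):
        if word in self.local:
            return self.local[word]
        self.remote_calls += 1
        result = self.remote_lookup(word) or []
        if result:
            self.local[word] = result         # cache on the phone for next time
        return result


# Stand-in for a remote dictionary (assumption: a real one would be reached over the Internet).
FAKE_REMOTE = {"gare": ["train station"]}

lookup = TranslationLookup({"sortie": ["exit"]}, lambda word: FAKE_REMOTE.get(word, []))
first = lookup.translate("gare")    # not stored locally, so fetched remotely
second = lookup.translate("gare")   # now served from the phone's cache
local_only = lookup.translate("sortie")
```

After the first remote lookup the word is answered locally, which matters when Internet access is slow or costly.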

There's also an area outside of the main functionality that we got from contextual interviews, which is more

like a tourist guidebook but more detailed. It has some common/basic phrases and monuments/attractions like

most tour books do, but the common phrases are vocalized. We also have a database of signs and important

colors and features (for example, in Japan, stop signs are inverted triangles rather than octagons, and the green "go" light is commonly called

blue). We will also have a description of the culture and how people in the country interact, so a person can be

more assimilated and less intrusive (not be like the typical annoying tourist).

We also have a "multi-shot" ability, which automatically takes a shot at a set time interval.

This can be helpful when text changes (like billboards or movies) and also useful for walking around so you have

some context. You can play the shots back like a movie, complete with translations. Where the translations appear (such as

on or below the original text) can be changed in a separate "options" menu.

Using the map functionality in Android, you can also keep a pin on the map that corresponds to the

picture/movie, so you know where the picture was taken, and also it will help anyone you send the picture to.

An options menu is also available. From here, you can change things such as where the translated text should

appear by default (directly on the original part of the picture, a little to the side, or in a separate box/text file).

Language options are also available, like which languages are available directly from the phone (since

dictionaries take space, it would be nicer to have only some that the user uses). Naturally, some other things like

default to and from languages should be in here as well (it should auto-detect the language of Android on first run

and also the "from language" depending on GPS).

User Interface Description/Sketches

3 scenarios of example tasks/sketches

Easy - Choosing from a variety of possible translations

This scenario shows an example of when using the multiple translations would be useful. The user is in a foreign

library and looks at a sign above the bookshelves. He doesn't know what it means so he pulls out his mobile

phone and snaps a picture of it. The mobile application then translates the sign into English for the user to read,

but the translation doesn't make any sense to him. He then uses the application's multi translation function and

finds the next available translation for it, this time making sense contextually.
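Stepping through alternative translations, as the user does in this scenario, can be modeled with a small helper like the one below. It is a Python sketch with a hypothetical class name, `TranslationChoices`; the example uses the French word "livre", which can mean either "pound" or "book", and the order of the candidates is made up for illustration.

```python
class TranslationChoices:
    """Lets the user step through the candidate translations of one text."""

    def __init__(self, candidates):
        if not candidates:
            raise ValueError("at least one translation is required")
        self.candidates = list(candidates)
        self.index = 0

    def current(self):
        return self.candidates[self.index]

    def next(self):
        """Move to the next candidate, wrapping around after the last one."""
        self.index = (self.index + 1) % len(self.candidates)
        return self.current()


# A sign in a library contains "livre"; the first candidate shown is the wrong sense.
choices = TranslationChoices(["pound", "book"])
shown_first = choices.current()
shown_next = choices.next()
```

Wrapping around means the user can always return to the first suggestion by continuing to press "next".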

Moderate - Save result for future reference

This scenario first shows a user using the application to translate a Chinese word ("male", in this case, referring to

the bathroom). He decides to save the word that is translated. He then shops around the mall, and twenty

minutes later, needs to use the bathroom again. He is confused by all the different Chinese signs everywhere,

so he goes back to the application to look at his "Favorite translations" list. On it, he finds "male", and clicks it, and

out comes the original picture with the Chinese word. He recognizes it on a door, and happily runs there.

Hard - Selecting region of picture to translate

This scenario depicts a user selecting a region of the picture to translate. In the first frame,

there is the user and a bunch of signs in front of him. In the second frame, the user wants to translate one of the

signs. So in the third frame, the user takes out his cell phone and takes a picture of the signs. Then in the fourth

frame, he uses the touch screen cropping feature to select the sign he wants to translate. When he has selected the

desired sign, he hits translate. So in the fifth frame, the user gets a translation of what the sign says. And finally

in the last frame, the user has a reaction.
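The cropping step in the fourth frame can be sketched as below. This Python illustration treats a picture as a plain grid of pixel values, which is an assumption made to keep the example self-contained; the function names `clamp_region` and `crop` are hypothetical.

```python
def clamp_region(left, top, right, bottom, width, height):
    """Normalize a dragged rectangle and keep it inside the picture.

    The corners may arrive in any order (the user can drag in any
    direction) and may fall outside the picture.
    """
    x0, x1 = sorted((left, right))
    y0, y1 = sorted((top, bottom))
    x0 = max(0, min(x0, width))
    x1 = max(0, min(x1, width))
    y0 = max(0, min(y0, height))
    y1 = max(0, min(y1, height))
    return x0, y0, x1, y1


def crop(pixels, region):
    """Return only the selected part of the picture (rows of pixel values)."""
    x0, y0, x1, y1 = region
    return [row[x0:x1] for row in pixels[y0:y1]]


# A tiny 4x3 "picture" whose pixels are just their own (x, y) coordinates.
picture = [[(x, y) for x in range(4)] for y in range(3)]

# The user drags from the lower right toward the upper left, overshooting the top edge.
region = clamp_region(3, 2, 1, -5, width=4, height=3)
selected = crop(picture, region)
```

Only the cropped region would then be passed on for text recognition and translation, so neighboring signs are left out.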

Analysis of Approach

Android affords specific technologies, not to mention specific features as well, that will be particularly

beneficial to our project. The first is naturally going to be the ability to take relatively detailed pictures. In order

to recognize text from a picture, we should be able to take decent pictures so our text recognition program will be

able to decipher the text (especially between text and non-text pixels). It should also have the capabilities to use

and store said picture, and Android affords a certain level of RAM, nonvolatile memory (for storage) and

computing speed necessary for somewhat intensive mobile applications like this. We can use a touch screen and a

keyboard and other inputs of that nature. This will make inputting text (for translation, for specifying which part

of the picture/screen to look at, option selection and navigating databases/dictionaries, etc) easier, as opposed to

just the typical cell phone input, which would force us to find less optimal ways of navigating. Android also

affords Internet access, so we can use the Internet as a way to store data (like a database of

words/dictionaries, etc), since an Android phone, being mobile, offers limited storage space. This will also

allow us to be able to keep used data and other such things on the phone itself (for fast access and as a memory

saving technique) but also allow us to keep a large database of extensive and complete information. Android also

affords a Linux kernel, so much open source code could potentially be recompiled or ported to work with the mobile device.

Programs written for Windows might also run under WINE where it is available (WINE does not run Apple software). This is particularly useful for

finding translation programs and image to text conversion programs. This will also give us leeway with how we

use the code and programs that we end up using. Internet will also be useful for accessing databases online and

parsing web pages (like getting several different translations by using different services). Internet is also

imperative to sharing information with others, whether through personal contacts or general help of other users of

the application (user generated content and other such things).

There are plenty of other devices that are used for translation purposes. The biggest issue with those is that

they often don't have a camera or have poor camera quality. Since taking pictures is the central task of our application, this is

a major reason why other devices are not suited. And while other devices will often have better storage,

computing, and human interaction abilities (such as touch screens and bigger screens), they will often not be

portable, which severely decreases the mobility of the device. Our other main competitor, the traditional machine

translator, has been described as something that costs a lot for something that doesn't even work. The devices will

often translate word by word or common phrases without attention to grammar, sentence structure or content.

Also, the translations will often be wrong without the ability to analyze the words. There is no recourse or

multiple choices in these situations either. They also don't have the benefit of a large database that is readily

available through wireless communication and the Internet that follows. Finally, human translators are an issue as

well; although the most reliable in both image to text conversion and in translation, they often introduce some lost

in translation issues (the translated phrase will work but some of the meaning in diction may be lost in

translation). Also, they require constant upkeep (both emotional and economic) and reduce independence for

the user. As for PDAs, Internet access may be prohibitively costly, and the keypad may be an

issue for issuing commands (although PDAs still have a touch screen). However, the issues aren't that different since

PDAs and smart phones are converging anyway.

The approach we are taking is good for a few reasons. Firstly, having this application on a small mobile device

is good since people would rather carry less than more, with the same functionality. Also, the camera will put less

stress on the user to decipher the text they see and try and type or draw it out on the screen (drawing is a lot less

reliable anyway with the current technology). There's also the issue of the user being unfamiliar with the

character set of the language in question, so they can't type it in. Being able to automate a lot of functions that

would put stress on the human is important. A touch screen will also make certain on-screen options more

intuitive.

There are cons to the approach as well, however. Even if we use commercial code, the language translation will

never be 100% perfect with our current technology (or at least not to the level of a human translator). Speed may also

be a slight issue (we haven't tested it yet, however). This approach is also potentially costly.
