HUMAN-CENTERED COMPUTING
Frank Shipman
Professor, Department of Computer Science and Engineering
Associate Director, Center for the Study of Digital Libraries
Texas A&M University
Outline
Short discussion of research area
Supporting access to sign language video
  Observations of a potential user community cause a redefinition of the problem
Multi-application user interest modeling
  Iterative design moving from concept to a relatively complete system
Research “Area”
Many interests: multimedia, new media, computers and education, computers and design, software engineering, computer-supported cooperative work, human-computer interaction, knowledge-based systems
Best descriptions I have come up with:
  Cooperative problem solving systems: systems where humans and computers cooperatively solve problems (humans are part of the overall system)
  Intelligent user interfaces: interactive systems that process information in non-trivial ways
[Diagram: research area at the overlap of AI, HCI, IR, and MM]
What is human-centered computing?
Developing software or computational techniques with a deep understanding of the human activities they will support
Implications
  Most often need to study the human activity before designing the software
  Design may be (likely will be) a cooperative problem solving system rather than a software system
Cooperative Problem Solving System
What is a cooperative problem solving system?
  A system that includes human and software components to perform a task or solve a problem
Implications
  Take advantage of the asymmetry of partners in system design
  Evaluation of the overall system involves humans
First Example: Supporting Access to Sign Language Video
Sharing Sign Language Video
Opportunity
  Cameras in laptops and attached to computers enable easy capture of sign language video
  Video sharing sites (e.g., YouTube) allow the publication of such expressions
Practice
  Pointers to the videos are passed around in other media (e.g., email, Facebook)
  Some sites specifically support the sign language community
Sharing Sign Language Video
Locating a sign language video on a particular topic is still difficult
The community-specific sites have limited collections
  People must upload to the site, or
  must add a pointer for each video to the site
Locating desired videos within the large video sharing sites relies on metadata (e.g., tags)
  Tags must be accurately applied, indicating both the language and the topic
How Good is Text-based Search?
Search for sign language discussions of the top 10 news queries for 2011 from Yahoo!
Queries performed with the addition of “ASL” and “sign language”

               In Sign Language   Not in Sign Language   Total
On Topic       50 (45.5%)         27 (24.5%)             77 (70%)
Not on Topic   24 (21.8%)          9 (8.2%)              33 (30%)
Total          74 (67.3%)         36 (32.7%)             110 (100%)
Duarte, Gutierrez-Osuna, and Shipman, Texas A&M University
Why Tags Are Not Enough
Consider results from the first page of results for the query “sign language”
Tags are ambiguous
  In sign language vs. about sign language
  Different meanings of “sign language”
  “Sign language” as a song title
Automatic Identification of SL Video
Our approach is to develop a technique that can automatically identify if a video is in sign language
To run on a site the size of YouTube:
  Should be accurate enough to be run without human verification of results
  Should be efficient enough to be run during video upload without significant extra resources
What is Sign Language Video?
We decided to scope the problem by focusing on the equivalent of sign language documents
  Recorded by an individual with the intent of being watched
What we are not trying to identify (yet)
  Videos of sign language conversations
  Sign language translations
Related and Prior Work
Work on sign language recognition
  Recognizing what is being said in sign language
  Often assumes the video is in sign language
  Too heavyweight for our purpose
Detecting sign language
  Recognizing when a person starts signing, for more efficient resource utilization
  Not designed to work on likely false positives
Designing a SL-Video Classifier
Our classifier
  processes a randomly selected 1-minute segment from the middle of the video (see the segment-selection sketch below)
  returns a yes/no decision on whether the video is a SL video
Design method
  Use standard video processing techniques
  Five video features selected based on their expected relation to SL video
  Test classifiers provided with one or more of the features
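A minimal sketch of the segment selection, in Python. The slides do not specify the exact sampling rule, so the assumption here is that “middle” means a random 1-minute window drawn from the middle third of the video; `n_frames` and `fps` are hypothetical parameters.

```python
import random

def middle_segment(n_frames: int, fps: float, seconds: int = 60) -> tuple[int, int]:
    """Pick a random 1-minute window of frames from the middle third of a video."""
    seg = int(seconds * fps)     # segment length in frames
    third = n_frames // 3        # start of the middle third
    # Clamp the start so the window stays inside (or near) the middle third.
    hi = max(third + 1, 2 * third - seg)
    start = random.randrange(third, hi)
    return start, min(start + seg, n_frames)
```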
Video Processing
Background modeling
  Convert to greyscale
  Dynamic model (to cope with changes in signer body position and lighting): BP_t = 0.96 * BP_{t-1} + 0.04 * P_t
Foreground object detection
  Pixels different from the background model by more than a threshold are foreground pixels
  Spatial filter removes regions of foreground pixels smaller than a minimum threshold
Face location to determine position of foreground relative to the face
  Videos without a single main face are not considered as potential SL videos
(A code sketch of these steps follows below.)
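A minimal sketch of the background-modeling and foreground-detection steps above, assuming OpenCV and NumPy; the threshold and minimum-region values are illustrative, and the face-location step is omitted.

```python
import cv2
import numpy as np

def foreground_masks(frames, alpha=0.04, diff_thresh=25, min_region=50):
    """Yield a binary foreground mask per frame using a running-average background.

    The dynamic model BP_t = (1 - alpha) * BP_{t-1} + alpha * P_t copes with
    slow changes in signer body position and lighting (alpha = 0.04 here).
    """
    background = None
    for frame in frames:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
        if background is None:
            background = gray.copy()
        cv2.accumulateWeighted(gray, background, alpha)
        # Pixels differing from the background by more than a threshold are foreground.
        mask = (cv2.absdiff(gray, background) > diff_thresh).astype(np.uint8)
        # Spatial filter: drop foreground regions smaller than min_region pixels.
        n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
        for i in range(1, n):
            if stats[i, cv2.CC_STAT_AREA] < min_region:
                mask[labels == i] = 0
        yield mask
```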
Five Visual Features
VF1: overall amount of activity
VF2: distribution of activity in camera view
VF3: rate of change in activity
VF4: symmetry of motion
VF5: non-facial movement
An SVM classifier worked best (see the sketch below).
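A minimal sketch of the classification step, assuming scikit-learn; the five per-video features VF1..VF5 are assumed to be precomputed scalars, and the kernel and parameters are illustrative rather than the originals.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_sl_classifier(features: np.ndarray, labels: np.ndarray):
    """features: (n_videos, 5) array of VF1..VF5; labels: 1 = SL, 0 = non-SL."""
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    clf.fit(features, labels)
    return clf  # clf.predict(new_features) yields the yes/no decision
```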
Corpus for Evaluation
Created corpus of 98 SL videos and 94 likely false positive (non-SL) videos
  Majority of non-SL videos were likely false positives based on visual analysis: a person facing the camera moving their hands and arms (e.g., a gesturing presenter or weather forecaster)
  A small number of non-SL videos were selected as false positives based on tag search; the number was kept small because these are likely easier than the others to detect
Evaluation Method
Common method for testing classifiers
  Each classifier tested on 1000 executions in each context
  Training and testing sets randomly selected for each execution
Metrics
  Precision: % of videos classified as SL videos that really are SL videos
  Recall: % of SL videos correctly classified as SL videos
  F1 score: harmonic mean of precision and recall
(A sketch of this protocol follows below.)
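A minimal sketch of the protocol, assuming scikit-learn: repeated random stratified splits with a fixed number of training videos per class, averaging the three metrics across runs.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score

def evaluate(X, y, make_classifier, n_train_per_class=15, runs=1000):
    """Average (precision, recall, F1) over `runs` random train/test splits."""
    scores = []
    for seed in range(runs):
        # Stratified split: n_train_per_class videos of each class for training.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=2 * n_train_per_class,
            stratify=y, random_state=seed)
        clf = make_classifier()
        clf.fit(X_tr, y_tr)
        pred = clf.predict(X_te)
        scores.append((precision_score(y_te, pred),
                       recall_score(y_te, pred),
                       f1_score(y_te, pred)))
    return np.mean(scores, axis=0)
```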
Overall Results
All five features, varying size of training set
While larger training sets improve recall, the effect is fairly small
Later results are with 15 training videos/class.

# Training Videos/Class   Precision   Recall   F1 Score
15                        81.73%      86.47%   0.84
30                        83.62%      88.11%   0.85
45                        80.67%      91.00%   0.85
60                        82.21%      90.83%   0.86
All But One Feature
Comparing the results when one feature is removed from the classifier
Removing VF4 (symmetry of motion) has the largest effect, meaning it has the most useful information not found in the other features

Video Feature Removed   Precision   Recall   F1 Score
VF1                     80.36%      86.25%   0.83
VF2                     78.34%      85.41%   0.82
VF3                     78.90%      83.62%   0.81
VF4                     72.80%      74.30%   0.74
VF5                     78.86%      85.60%   0.82
Only One Feature
Comparing the results when only one feature is provided to the classifier
Again, VF4 (symmetry of motion) has the most valuable information
VF4 alone does better than the other four features combined

Video Feature   Precision   Recall   F1 Score
VF1             70.48%      60.14%   0.65
VF2             73.57%      53.26%   0.62
VF3             65.65%      64.03%   0.65
VF4             75.95%      83.69%   0.80
VF5             56.31%      49.52%   0.53
Discussion of Failures (False Positives)
Our non-SL videos were chosen to be hard
Precision of ~80% means about one in five videos identified as sign language was really one of these
Performance on a typical video sharing site would be much better because most non-SL videos would be easy to classify
We are happy with this performance
Discussion of Failures (False Negatives)
Examining the SL videos not recognized by the classifier
  Some failures were due to signers frequently turning away from the camera
  Others were due to the background being similar in color to the signer’s skin tone
  Still others were due to movement in the background
Backing off our requirement that the signer face the camera and improving our background model would help in many of these cases
HCC Conclusions
Examined current practice to determine the need for a system
  Identified the new problem of locating SL videos
  Quantified the difficulty with existing tools
  Developed a method
  Tested with real-world data
Future work
  Deploy the system to test if it meets the need
Example 2: Multi-Application User Interest Modeling
Task: Information Triage
Many tasks involve selecting and reading more than one document at once
Information triage places different demands on attention than single-document reading activities
Continuum of types of reading: working in overview (metadata), reading at various levels of depth (skimming), reading intensively
How can we bring users’ attention to content they will find valuable?
User Interest Modeling
User model: a system’s representation of characteristics of its user
  Generally used to adapt/personalize the system
  Can be preferences, accessibility issues, etc.
User interest model: a representation of the user’s interests
  Motivation: information overload
  History: many of the concepts are found in work on information filtering (early 1990s)
Interest Modeling for Information Triage
Prior interest models tend to assume one application
  Example: a browser observing page views and time on page
Multiple applications are involved in information triage (searching, reading, and organizing)
When applications do share a user model, it is with regard to a well-known domain model
  Example: knowledge models shared by educational applications
  Not possible here, since triage deals with decisions about relative value among documents of likely value
Acquiring the User Interest Model
Explicit methods
  Users tend not to provide explicit feedback
  Long-tail assumptions not applicable
Implicit methods
  Reading time has been used in many cases
  Scrolling and mouse events have been shown somewhat predictive
  Annotations have been used to identify passages of interest
Problem: individuals vary greatly and have idiosyncratic work practices
Potential Value? A First Study
Study designed to look at:
  deciding what to keep
  expressing an initial view of relationships
Part of a larger study: 8 subjects in the role of a reference librarian, selecting and organizing information on ethnomathematics for a teacher
Setting: top 20 search results from NSDL and top 20 search results from Google, presented in VKB 2
Subjects used VKB 2 to organize and a Web browser to read
After the task, subjects were asked to identify:
  5 documents they found most valuable
  5 documents they found least valuable
Many User Actions Anticipate Document Assessment
Correlated actions (p < .01), from most to least correlated:
  Number of object moves
  Scroll offset
  Number of scrolls
  Number of border color changes
  Number of object resizes
  Total number of scroll groups
  Number of scrolling direction changes
  Number of background color changes
  Time spent in document
  Number of border width changes
  Number of object deletions
  Number of document accesses
  Length of document in characters
(In the original slide, actions were color-coded by source: blue from VKB, white from the browser. A correlation sketch follows below.)
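A minimal sketch of this kind of correlation analysis, assuming per-document action counts and numeric document assessments; `pearsonr` is one plausible statistic, and the original analysis may have used a different test.

```python
import numpy as np
from scipy.stats import pearsonr

def correlated_actions(action_counts, assessments, alpha=0.01):
    """action_counts: {action name: per-document counts};
    assessments: per-document value ratings.
    Returns significant actions sorted from most to least correlated."""
    results = []
    for name, counts in action_counts.items():
        r, p = pearsonr(np.asarray(counts, float), np.asarray(assessments, float))
        if p < alpha:
            results.append((name, r, p))
    return sorted(results, key=lambda t: abs(t[1]), reverse=True)
```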
Interest Models
Based on the data from the first study, we developed four interest models
Three were mathematically derived (see the sketch below):
  Reading-Activity Model
  Organizing-Activity Model
  Combined Model
One hand-tuned model included human assessment based on observations of user activity and interviews with users.
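A minimal sketch of a “mathematically derived” model, assuming it is a linear regression from logged activity features (reading actions, organizing actions, or both) to the users’ document assessments; the slides do not detail the original derivation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def derive_model(activity_features: np.ndarray, assessments: np.ndarray):
    """activity_features: (n_docs, n_actions) counts; assessments: per-doc value."""
    model = LinearRegression()
    model.fit(activity_features, assessments)
    return model  # model.predict(new_features) estimates interest
```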
Evaluation of Models
16 subjects with the same:
  Task (collecting information on ethnomathematics for a teacher) and
  Setting (20 NSDL and 20 Google results)
Different rating of documents
  Subjects rated all documents on a 5-point Likert scale (1 = “not useful”, 5 = “very useful”)
Predictive Power of Models
Models limited due to data from the original study
Used aggregated user activity and user evaluations to evaluate the models
Lower residue indicates better predictions (a sketch of the residue computation follows below)
Combined model better than the reading-activity model (p = 0.02) and the organizing-activity model (p = 0.07)

Model                       Avg. Residue   Std. Dev.
Reading-activity model      0.258          0.192
Organizing-activity model   0.216          0.146
Combined model              0.175          0.138
Hand-tuned model            0.197          0.134
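A minimal sketch of the residue computation, assuming “residue” is the absolute difference between a model’s predicted interest and the aggregated user rating for each document, both normalized to [0, 1]; the original definition may differ.

```python
import numpy as np

def avg_residue(predicted, ratings):
    """Return (mean, std) of per-document absolute prediction error."""
    residues = np.abs(np.asarray(predicted, float) - np.asarray(ratings, float))
    return residues.mean(), residues.std()
```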
Architecture for Interest Modeling
Results of the study motivated development of infrastructure for multi-application interest modeling
[Architecture diagram: location/overview, organizing, and reading applications connected to a User Interest Estimation Engine and an Interest Profile Manager, which maintains the Interest Profile]
New Tools: VKB 3
[Screenshots: main layer and system layer, with a new document object]
User expression via coloring document objects’ user layer leads to user interests
System layer used to indicate documents’ relations to inferred interests
New Tools: WebAnnotate
[Screenshots: WebAnnotate toolbar, annotation suggestions, and annotation-based visualizations]
Evaluation of the New Design
20 subjects organized 40 documents about “antimatter” returned by Yahoo! search
Subjects assessed the relevance of each document at the end of the task
10 subjects worked with and 10 without suggestions/thumbnails
Measured
  Task switching
  Time on documents
Results: Task Switching
Fewer but longer reading sessions with the new interface
Average reading time
  10.7 seconds with new features
  4.3 seconds without
  p < 0.0001
Interpretation: people are doing more in-depth reading

Per-subject correlations between reading time and document value (discussed on the next slide):

Group 1 (new interface)        Group 2 (old interface)
ID   Coef.    Sigma            ID   Coef.    Sigma
1    0.429    0.018            11   0.277    0.093
2    0.397    0.014            12   0.111    0.565
3    0.356    0.087            13   0.210    0.205
4    0.409    0.011            14   -0.148   0.376
5    0.576    0.008            15   0.367    0.024
6    0.206    0.214            16   0.633    < 0.0001
7    0.137    0.412            17   0.116    0.489
8    0.438    0.006            18   0.114    0.495
9    0.629    < 0.0001         19   0.101    0.547
10   0.170    0.309            20   0.240    0.147
Results: Document Attention
6 of 10 subjects with the new interface had significant correlations between reading time and document value
Only 2 subjects with the old interface had significant correlations
Interpretation: new-interface users located and spent more time on documents of value to their task
HCC Conclusions
Question simplifying assumptions
  Recognized that users are engaged with multiple documents and multiple applications simultaneously
Iterate between design and user studies
  Design software as an extensible environment, enabling easier redesign
New system resulted in more in-depth reading and more time spent on relevant documents
Broad View of Computer Science
Many really important problems require cooperative problem solving systems
Solutions that assume we can vary the behavior of only one of the computer and the user are less likely to succeed
Need CPS design, development, and evaluation skills
  Recognize whether the problem is one of computation, representation, or interaction
You can be part of solving big problems
Contact Information
Email: [email protected]
Web: www.csdl.tamu.edu/~shipman