HUMAN-CENTERED COMPUTING
Frank Shipman
Professor, Department of Computer Science and Engineering
Associate Director, Center for the Study of Digital Libraries
Texas A&M University
Outline
Short discussion of research area
Supporting access to sign language video
  Observations of a potential user community cause a redefinition of the problem
Multi-application user interest modeling
  Iterative design moving from concept to a relatively complete system
Research “Area”
Many interests: multimedia, new media, computers and education, computers and design, software engineering, computer-supported cooperative work, human-computer interaction, knowledge-based systems
Best descriptions I have come up with:
  Cooperative problem solving systems: systems where humans and computers cooperatively solve problems (humans are part of the overall system)
  Intelligent user interfaces: interactive systems that process information in non-trivial ways
[Diagram: research area at the overlap of AI, HCI, IR, and MM]
What is human-centered computing?
Developing software or computational techniques with a deep understanding of the human activities they will support
Implications
  Most often need to study the human activity before designing the software
  Design may be (likely will be) a cooperative problem solving system rather than a software system
Cooperative Problem Solving System
What is a cooperative problem solving system?
  A system that includes human and software components to perform a task or solve a problem
Implications
  Take advantage of the asymmetry of partners in system design
  Evaluation of the overall system involves humans
First Example: Supporting Access to Sign Language Video
Sharing Sign Language Video
Opportunity
  Cameras in laptops and attached to computers enable easy capture of sign language video
  Video sharing sites (e.g., YouTube) allow the publication of such expressions
Practice
  Pointers to the videos are passed around in other media (e.g., email, Facebook)
  Some sites specifically support the sign language community
Sharing Sign Language Video
Locating a sign language video on a particular topic is still difficult
The community-specific sites have limited collections
  People must upload to the site, or
  must add a pointer for each video to the site
Locating desired videos within the large video sharing sites relies on metadata (e.g., tags)
  Tags must be accurately applied, indicating both the language and the topic
How Good is Text-based Search?
Search for sign language discussions of the top 10 news queries for 2011 from Yahoo!
Queries performed with the addition of “ASL” and “sign language”

               In Sign Language   Not in Sign Language   Total
On Topic       50 (45.5%)         27 (24.5%)             77 (70%)
Not on Topic   24 (21.8%)          9 (8.2%)              33 (30%)
Total          74 (67.3%)         36 (32.7%)             110 (100%)
Duarte, Gutierrez-Osuna, and Shipman, Texas A&M University
Why Tags Are Not Enough
Consider results from the first page of results for the query “sign language”
Tags are ambiguous
  In sign language vs. about sign language
  Different meanings of “sign language”
  “Sign language” as a song title
Automatic Identification of SL Video
Our approach is to develop a technique that can automatically identify if a video is in sign language
To run on a site the size of YouTube:
  Should be accurate enough to be run without human verification of results
  Should be efficient enough to be run during video upload without significant extra resources
What is Sign Language Video?
We decided to scope the problem by focusing on the equivalent of sign language documents
  Recorded by an individual with the intent of being watched
What we are not trying to identify (yet)
  Videos of sign language conversations
  Sign language translations
Related and Prior Work
Work on sign language recognition
  Recognizing what is being said in sign language
  Often assumes the video is in sign language
  Too heavyweight for our purpose
Detecting sign language
  Recognizing when a person starts signing, for more efficient resource utilization
  Not designed to work on likely false positives
Designing a SL-Video Classifier
Our classifier
  processes a randomly selected 1-minute segment from the middle of the video (see the segment-selection sketch below)
  returns a yes/no decision on whether the video is a SL video
Design method
  Use standard video processing techniques
  Five video features selected based on their expected relation to SL video
  Test classifiers provided with one or more of the features
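A minimal sketch of the segment selection, in Python. The slides do not specify the exact sampling rule, so the assumption here is that “middle” means a random 1-minute window drawn from the middle third of the video; `n_frames` and `fps` are hypothetical parameters.

```python
import random

def middle_segment(n_frames: int, fps: float, seconds: int = 60) -> tuple[int, int]:
    """Pick a random 1-minute window of frames from the middle third of a video."""
    seg = int(seconds * fps)     # segment length in frames
    third = n_frames // 3        # start of the middle third
    # Clamp the start so the window stays inside (or near) the middle third.
    hi = max(third + 1, 2 * third - seg)
    start = random.randrange(third, hi)
    return start, min(start + seg, n_frames)
```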
Video Processing
Background modeling
  Convert to greyscale
  Dynamic model (to cope with changes in signer body position and lighting): BP_t = 0.96 * BP_{t-1} + 0.04 * P_t
Foreground object detection
  Pixels different from the background model by more than a threshold are foreground pixels
  Spatial filter removes regions of foreground pixels smaller than a minimum threshold
Face location to determine position of foreground relative to the face
  Videos without a single main face are not considered as potential SL videos
(A code sketch of these steps follows below.)
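A minimal sketch of the background-modeling and foreground-detection steps above, assuming OpenCV and NumPy; the threshold and minimum-region values are illustrative, and the face-location step is omitted.

```python
import cv2
import numpy as np

def foreground_masks(frames, alpha=0.04, diff_thresh=25, min_region=50):
    """Yield a binary foreground mask per frame using a running-average background.

    The dynamic model BP_t = (1 - alpha) * BP_{t-1} + alpha * P_t copes with
    slow changes in signer body position and lighting (alpha = 0.04 here).
    """
    background = None
    for frame in frames:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
        if background is None:
            background = gray.copy()
        cv2.accumulateWeighted(gray, background, alpha)
        # Pixels differing from the background by more than a threshold are foreground.
        mask = (cv2.absdiff(gray, background) > diff_thresh).astype(np.uint8)
        # Spatial filter: drop foreground regions smaller than min_region pixels.
        n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
        for i in range(1, n):
            if stats[i, cv2.CC_STAT_AREA] < min_region:
                mask[labels == i] = 0
        yield mask
```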
Five Visual Features
VF1: overall amount of activity
VF2: distribution of activity in camera view
VF3: rate of change in activity
VF4: symmetry of motion
VF5: non-facial movement
An SVM classifier worked best (see the sketch below).
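A minimal sketch of the classification step, assuming scikit-learn; the five per-video features VF1..VF5 are assumed to be precomputed scalars, and the kernel and parameters are illustrative rather than the originals.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_sl_classifier(features: np.ndarray, labels: np.ndarray):
    """features: (n_videos, 5) array of VF1..VF5; labels: 1 = SL, 0 = non-SL."""
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    clf.fit(features, labels)
    return clf  # clf.predict(new_features) yields the yes/no decision
```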
Corpus for Evaluation
Created corpus of 98 SL videos and 94 likely false positive (non-SL) videos
  Majority of non-SL videos were likely false positives based on visual analysis: a person facing the camera moving their hands and arms (e.g., a gesturing presenter or weather forecaster)
  A small number of non-SL videos were selected as false positives based on tag search; the number was kept small because these are likely easier than the others to detect
Evaluation Method
Common method for testing classifiers
  Each classifier tested on 1000 executions in each context
  Training and testing sets randomly selected for each execution
Metrics
  Precision: % of videos classified as SL videos that really are SL videos
  Recall: % of SL videos correctly classified as SL videos
  F1 score: harmonic mean of precision and recall
(A sketch of this protocol follows below.)
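A minimal sketch of the protocol, assuming scikit-learn: repeated random stratified splits with a fixed number of training videos per class, averaging the three metrics across runs.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score

def evaluate(X, y, make_classifier, n_train_per_class=15, runs=1000):
    """Average (precision, recall, F1) over `runs` random train/test splits."""
    scores = []
    for seed in range(runs):
        # Stratified split: n_train_per_class videos of each class for training.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=2 * n_train_per_class,
            stratify=y, random_state=seed)
        clf = make_classifier()
        clf.fit(X_tr, y_tr)
        pred = clf.predict(X_te)
        scores.append((precision_score(y_te, pred),
                       recall_score(y_te, pred),
                       f1_score(y_te, pred)))
    return np.mean(scores, axis=0)
```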
Overall Results
All five features, varying size of training set
While larger training sets improve recall, the effect is fairly small
Later results are with 15 training videos/class.

# Training Videos/Class   Precision   Recall   F1 Score
15                        81.73%      86.47%   0.84
30                        83.62%      88.11%   0.85
45                        80.67%      91.00%   0.85
60                        82.21%      90.83%   0.86
All But One Feature
Comparing the results when one feature is removed from the classifier
Removing VF4 (symmetry of motion) has the largest effect, meaning it has the most useful information not found in the other features

Video Feature Removed   Precision   Recall   F1 Score
VF1                     80.36%      86.25%   0.83
VF2                     78.34%      85.41%   0.82
VF3                     78.90%      83.62%   0.81
VF4                     72.80%      74.30%   0.74
VF5                     78.86%      85.60%   0.82
Only One Feature
Comparing the results when only one feature is provided to the classifier
Again, VF4 (symmetry of motion) has the most valuable information
VF4 alone does better than the other four features combined

Video Feature   Precision   Recall   F1 Score
VF1             70.48%      60.14%   0.65
VF2             73.57%      53.26%   0.62
VF3             65.65%      64.03%   0.65
VF4             75.95%      83.69%   0.80
VF5             56.31%      49.52%   0.53
Discussion of Failures (False Positives)
Our non-SL videos were chosen to be hard
Precision of ~80% means about one in five videos identified as sign language was really one of these
Performance on a typical video sharing site would be much better because most non-SL videos would be easy to classify
We are happy with this performance
Discussion of Failures (False Negatives)
Examining the SL videos not recognized by the classifier
  Some failures were due to signers frequently turning away from the camera
  Others were due to the background being similar in color to the signer’s skin tone
  Still others were due to movement in the background
Backing off our requirement that the signer face the camera and improving our background model would help in many of these cases
HCC Conclusions
Examined current practice to determine the need for a system
  Identified the new problem of locating SL videos
  Quantified the difficulty with existing tools
  Developed a method
  Tested with real-world data
Future work
  Deploy the system to test if it meets the need
Example 2: Multi-Application User Interest Modeling
Task: Information Triage
Many tasks involve selecting and reading more than one document at once
Information triage places different demands on attention than single-document reading activities
Continuum of types of reading: working in overview (metadata), reading at various levels of depth (skimming), reading intensively
How can we bring users’ attention to content they will find valuable?
User Interest Modeling
User model: a system’s representation of characteristics of its user
  Generally used to adapt/personalize the system
  Can be preferences, accessibility issues, etc.
User interest model: a representation of the user’s interests
  Motivation: information overload
  History: many of the concepts are found in work on information filtering (early 1990s)
Interest Modeling for Information Triage
Prior interest models tend to assume one application
  Example: a browser observing page views and time on page
Multiple applications are involved in information triage (searching, reading, and organizing)
When applications do share a user model, it is with regard to a well-known domain model
  Example: knowledge models shared by educational applications
  Not possible here, since triage deals with decisions about relative value among documents of likely value
Acquiring the User Interest Model
Explicit methods
  Users tend not to provide explicit feedback
  Long-tail assumptions not applicable
Implicit methods
  Reading time has been used in many cases
  Scrolling and mouse events have been shown somewhat predictive
  Annotations have been used to identify passages of interest
Problem: individuals vary greatly and have idiosyncratic work practices
Potential Value? A First Study
Study designed to look at:
  deciding what to keep
  expressing an initial view of relationships
Part of a larger study: 8 subjects in the role of a reference librarian, selecting and organizing information on ethnomathematics for a teacher
Setting: top 20 search results from NSDL and top 20 search results from Google, presented in VKB 2
Subjects used VKB 2 to organize and a Web browser to read
After the task, subjects were asked to identify:
  5 documents they found most valuable
  5 documents they found least valuable
Many User Actions Anticipate Document Assessment
Correlated actions (p < .01), from most to least correlated:
  Number of object moves
  Scroll offset
  Number of scrolls
  Number of border color changes
  Number of object resizes
  Total number of scroll groups
  Number of scrolling direction changes
  Number of background color changes
  Time spent in document
  Number of border width changes
  Number of object deletions
  Number of document accesses
  Length of document in characters
(In the original slide, actions were color-coded by source: blue from VKB, white from the browser. A correlation sketch follows below.)
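A minimal sketch of this kind of correlation analysis, assuming per-document action counts and numeric document assessments; `pearsonr` is one plausible statistic, and the original analysis may have used a different test.

```python
import numpy as np
from scipy.stats import pearsonr

def correlated_actions(action_counts, assessments, alpha=0.01):
    """action_counts: {action name: per-document counts};
    assessments: per-document value ratings.
    Returns significant actions sorted from most to least correlated."""
    results = []
    for name, counts in action_counts.items():
        r, p = pearsonr(np.asarray(counts, float), np.asarray(assessments, float))
        if p < alpha:
            results.append((name, r, p))
    return sorted(results, key=lambda t: abs(t[1]), reverse=True)
```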
Interest Models
Based on the data from the first study, we developed four interest models
Three were mathematically derived (see the sketch below):
  Reading-Activity Model
  Organizing-Activity Model
  Combined Model
One hand-tuned model included human assessment based on observations of user activity and interviews with users.
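A minimal sketch of a “mathematically derived” model, assuming it is a linear regression from logged activity features (reading actions, organizing actions, or both) to the users’ document assessments; the slides do not detail the original derivation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def derive_model(activity_features: np.ndarray, assessments: np.ndarray):
    """activity_features: (n_docs, n_actions) counts; assessments: per-doc value."""
    model = LinearRegression()
    model.fit(activity_features, assessments)
    return model  # model.predict(new_features) estimates interest
```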
Evaluation of Models
16 subjects with the same:
  Task (collecting information on ethnomathematics for a teacher) and
  Setting (20 NSDL and 20 Google results)
Different rating of documents
  Subjects rated all documents on a 5-point Likert scale (1 = “not useful”, 5 = “very useful”)
Predictive Power of Models
Models limited due to data from the original study
Used aggregated user activity and user evaluations to evaluate the models
Lower residue indicates better predictions (a sketch of the residue computation follows below)
Combined model better than the reading-activity model (p = 0.02) and the organizing-activity model (p = 0.07)

Model                       Avg. Residue   Std. Dev.
Reading-activity model      0.258          0.192
Organizing-activity model   0.216          0.146
Combined model              0.175          0.138
Hand-tuned model            0.197          0.134
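A minimal sketch of the residue computation, assuming “residue” is the absolute difference between a model’s predicted interest and the aggregated user rating for each document, both normalized to [0, 1]; the original definition may differ.

```python
import numpy as np

def avg_residue(predicted, ratings):
    """Return (mean, std) of per-document absolute prediction error."""
    residues = np.abs(np.asarray(predicted, float) - np.asarray(ratings, float))
    return residues.mean(), residues.std()
```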
Architecture for Interest Modeling
Results of the study motivated development of infrastructure for multi-application interest modeling
[Architecture diagram: location/overview, organizing, and reading applications connected to a User Interest Estimation Engine and an Interest Profile Manager, which maintains the Interest Profile]
New Tools: VKB 3
[Screenshots: main layer and system layer, with a new document object]
User expression via coloring document objects’ user layer leads to user interests
System layer used to indicate documents’ relations to inferred interests
New Tools: WebAnnotate
[Screenshots: WebAnnotate toolbar, annotation suggestions, and annotation-based visualizations]
Evaluation of the New Design
20 subjects organized 40 documents about “antimatter” returned by Yahoo! search
Subjects assessed the relevance of each document at the end of the task
10 subjects worked with and 10 without suggestions/thumbnails
Measured
  Task switching
  Time on documents
Results: Task Switching
Fewer but longer reading sessions with the new interface
Average reading time
  10.7 seconds with new features
  4.3 seconds without
  p < 0.0001
Interpretation: people are doing more in-depth reading

Per-subject correlations between reading time and document value (discussed on the next slide):

Group 1 (new interface)        Group 2 (old interface)
ID   Coef.    Sigma            ID   Coef.    Sigma
1    0.429    0.018            11   0.277    0.093
2    0.397    0.014            12   0.111    0.565
3    0.356    0.087            13   0.210    0.205
4    0.409    0.011            14   -0.148   0.376
5    0.576    0.008            15   0.367    0.024
6    0.206    0.214            16   0.633    < 0.0001
7    0.137    0.412            17   0.116    0.489
8    0.438    0.006            18   0.114    0.495
9    0.629    < 0.0001         19   0.101    0.547
10   0.170    0.309            20   0.240    0.147
Results: Document Attention
6 of 10 subjects with the new interface had significant correlations between reading time and document value
Only 2 subjects with the old interface had significant correlations
Interpretation: new-interface users located and spent more time on documents of value to their task
HCC Conclusions
Question simplifying assumptions
  Recognized that users are engaged with multiple documents and multiple applications simultaneously
Iterate between design and user studies
  Design software as an extensible environment, enabling easier redesign
New system resulted in more in-depth reading and more time spent on relevant documents
Broad View of Computer Science
Many really important problems require cooperative problem solving systems
Solutions that assume we can vary the behavior of only one of the computer and the user are less likely to succeed
Need CPS design, development, and evaluation skills
  Recognize whether the problem is one of computation, representation, or interaction
You can be part of solving big problems
Contact Information
Email: [email protected]
Web: www.csdl.tamu.edu/~shipman