Personalized Object Recognition for Augmenting Human Memory
HOSUB LEE 1, CAMERON UPRIGHT 2, STEVEN ELIUK 2, ALFRED KOBSA 1
1: UNIVERSITY OF CALIFORNIA, IRVINE
2: SAMSUNG RESEARCH AMERICA
2016-09-12 WAHM 2016 1
Summary
Personalized Object Recognition System
◦ Users can easily create their own image classifiers via Google Glass (and a server)
◦ Users may use it to augment their memory (e.g., as a memory enhancer)
[Figure: Google Glass sends classification requests (“What is this?”) and training requests (w/ new data) to the server; the server updates the ML model (w/ new data) and returns classification results (This is “Jessica”).]
Introduction (1/2)
Wearable Computing: Google Glass
◦ Users can easily collect image data about their surroundings through the Google Glass camera
Deep Learning: Convolutional Neural Networks (CNN)
◦ CNNs, a type of neural network, loosely mimic how the human brain perceives images
◦ CNN-based image classifiers have reached near-human accuracy levels
[Figure: Google Glass (camera) and a convolutional neural network.]
Introduction (2/2)
Problem
◦ Most deep learning applications thus far have been developed for the general population
◦ Some users may want to build their “own” applications
◦ People with memory problems: “What was this?”
◦ Professor who gives a lecture to 300 students: “What was your name?”
Solution
◦ We collect user-generated image data via Google Glass
◦ We train a personalized deep learning model (CNN) on the user-generated image data
◦ We run an image classifier based on the personalized deep learning model upon user request
Related Work
Wearable Visual Recognition Systems
◦ Wearable personal imaging device recognizing human faces (Steve Mann, 1997)
◦ Object recognition system recognizing American Sign Language (Thad Starner et al., 1998)
◦ Wearable system recognizing 24 different types of objects (Antonio Torralba et al., 2003)
◦ Image recognition app for Google Glass (AlchemyAPI, 2013)
◦ Emotion recognition software for Google Glass (Fraunhofer, 2014)
◦ Google Glass application retrieving meta information from images (Way et al., 2015)
Limitations
◦ Prototypes were cumbersome to wear
◦ No consideration of personalized machine learning models
◦ Concepts only, no implementations
[Figure: Steve Mann, 1997 and AlchemyAPI, 2013. Annotations: “Communication units”; “Google Glass, but ML model for public”.]
Personalized Object Recognition System: DeepEye
SYSTEM ARCHITECTURE
WORKFLOW – TRAINING AND CLASSIFICATION
EXPERIMENT
DeepEye: System Architecture
Client-Server Model
◦ Client: Google Glass
◦ Collect images and send them to the server with a specific task type (training or classification)
◦ Server: Linux workstation w/ Caffe deep learning framework
◦ [Training] train (or update) the CNN using finetuning whenever new image data is available
◦ [Classification] classify an image through the most recently trained CNN
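The division of work above can be sketched as a small request dispatcher. This is only an illustration, not the DeepEye implementation: the `Request` and `Server` types and the `handle` method are hypothetical names, the task-type strings are borrowed from the workflow slides, and the handlers merely record what a real server would pass on to Caffe.

```python
from dataclasses import dataclass, field
from typing import Optional

# Task-type strings as they appear on the workflow slides.
TASK_TRAIN = "caffe::train"
TASK_CLASSIFY = "caffe::classify"


@dataclass
class Request:
    """Hypothetical message sent from the Glass client to the server."""
    task_type: str
    image: bytes
    label: Optional[str] = None  # only training requests carry a label


@dataclass
class Server:
    """Hypothetical server state; no real CNN is trained in this sketch."""
    training_data: list = field(default_factory=list)
    model_version: int = 0  # 0 means no model has been trained yet

    def handle(self, request: Request) -> str:
        if request.task_type == TASK_TRAIN:
            return self._train(request)
        if request.task_type == TASK_CLASSIFY:
            return self._classify(request)
        raise ValueError(f"unknown task type: {request.task_type!r}")

    def _train(self, request: Request) -> str:
        if request.label is None:
            raise ValueError("training request needs a label")
        self.training_data.append((request.label, request.image))
        # A real server would finetune the CNN here; we only bump a counter.
        self.model_version += 1
        return f"model updated to version {self.model_version}"

    def _classify(self, request: Request) -> str:
        if self.model_version == 0:
            return "no trained model yet"
        # A real server would run the most recently trained CNN on the image.
        return f"classified with model version {self.model_version}"
```

Keeping the version counter on the server mirrors the slide's point that classification always goes through the most recently trained model.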
[Figure: Google Glass and server diagram with the training path (training request w/ new data, ML model update) and the classification path (classification request and result) labeled.]
DeepEye: Workflow – Training (1/3)
Labeling
◦ User enters the name of the target object (i.e., its label) through Google Voice Input
Data Collection
◦ DeepEye then takes a photo of the object every five seconds
◦ DeepEye transmits each collected image w/ the task type (caffe::train) to the server
◦ The process repeats until the user explicitly terminates the training task
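The data collection loop can be sketched as follows. This is a minimal illustration under stated assumptions: `take_photo`, `send`, and `should_stop` are hypothetical callbacks standing in for the Glass camera, the network call, and the user's stop action, none of which are described in the slides.

```python
import time
from typing import Callable

CAPTURE_INTERVAL_SECONDS = 5  # from the slide: one photo every five seconds
TASK_TRAIN = "caffe::train"   # task type named on the slide


def collect_training_images(
    label: str,
    take_photo: Callable[[], bytes],
    send: Callable[[str, str, bytes], None],
    should_stop: Callable[[], bool],
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Capture and send photos until the user stops; return how many were sent."""
    sent = 0
    while not should_stop():
        image = take_photo()
        send(TASK_TRAIN, label, image)
        sent += 1
        sleep(CAPTURE_INTERVAL_SECONDS)
    return sent
```

Passing `sleep` in as a parameter is a design choice for this sketch only: it lets the loop be exercised without actually waiting five seconds per photo.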
[Screenshots: Initial Screen; Labeling via voice; Data Collection]
DeepEye: Workflow – Training (2/3)
Training: Finetuning
◦ Train a new model by reusing a model that was already fully trained on a larger dataset
◦ Exploit the pre-trained CNN’s parameter values representing generic visual features like edges
◦ Focus on updating parameters representing object-specific (high-level) features for our image data
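The idea behind finetuning can be shown with a toy example. This is not Caffe and not the authors' network: it is a pure-Python sketch in which a fixed layer (`FROZEN_WEIGHTS`, an invented stand-in for pre-trained generic features) is never updated, and only a new output layer is trained on the new data. All names and values here are hypothetical.

```python
import math

# Stand-in for pre-trained lower layers: fixed weights that are never updated.
FROZEN_WEIGHTS = [[1.0, -1.0], [0.5, 0.5]]


def frozen_features(x):
    """Apply the frozen layer followed by a ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in FROZEN_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def finetune_head(samples, labels, epochs=300, lr=0.1):
    """Train only the new output layer (weights and bias) by gradient descent."""
    weights = [0.0] * len(FROZEN_WEIGHTS)
    bias = 0.0
    # The lower layer is frozen, so its outputs can be computed once.
    features = [frozen_features(x) for x in samples]
    for _ in range(epochs):
        for f, y in zip(features, labels):
            p = sigmoid(sum(w * v for w, v in zip(weights, f)) + bias)
            error = p - y  # gradient of the log loss w.r.t. the pre-activation
            weights = [w - lr * error * v for w, v in zip(weights, f)]
            bias -= lr * error
    return weights, bias


def predict(x, weights, bias):
    """Probability of class 1 for input x."""
    f = frozen_features(x)
    return sigmoid(sum(w * v for w, v in zip(weights, f)) + bias)
```

For example, `finetune_head([[2, 0], [3, 1], [0, 2], [1, 3]], [1, 1, 0, 0])` learns a head that separates the two groups while `FROZEN_WEIGHTS` stays untouched. In a real CNN the same effect is obtained by starting from pre-trained parameters and letting mainly the upper layers change.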
[Figure: Finetuning CNN. Generic features: edges. High-level features: shapes.]
DeepEye: Workflow – Training (3/3)
Training: Finetuning (cont’d)
[Figure: Training Process]
DeepEye: Workflow – Classification
Classification
◦ User takes a photo of the object by tapping the Google Glass touchpad
◦ DeepEye sends the image w/ the task type (caffe::classify) to the server
◦ Server uses the latest trained CNN to execute the Caffe classification command on the image
◦ Server then sends the classification result (w/ probability) back to DeepEye
◦ DeepEye displays the result to the user through Google Glass’s heads-up display
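The "result w/ probability" step can be illustrated as below. The slides do not say how the probability is computed; this sketch assumes the common softmax over the network's raw class scores, and the function names and example labels are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(scores, labels):
    """Return the most likely label and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

A call such as `top_prediction([2.0, 0.5, 0.1], ["Jessica", "mug", "keys"])` yields the label "Jessica" together with its probability, which is the pair the server would send back for display.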
[Figure: Classification]
DeepEye: Experiment (1/2)
10-Class Object Recognition
◦ We evaluated the predictive power of the trained CNNs via DeepEye in a real-world scenario
Training Data
◦ We selected 10 personal objects belonging to a member of our research team
◦ We collected 100 images for each class, and augmented them by creating four variations of each image
◦ Three rotated by 90, 180, and 270 degrees, and one mirrored
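The four variations can be sketched on a tiny image stored as a list of pixel rows. The rotation direction (clockwise here) and the mirror axis (left to right here) are assumptions, since the slide does not specify them, and the function names are hypothetical.

```python
def rotate_90(image):
    """Rotate a row-major image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def mirror(image):
    """Flip an image left to right."""
    return [row[::-1] for row in image]


def augment(image):
    """Return the four variations: three rotations and one mirror image."""
    r90 = rotate_90(image)
    r180 = rotate_90(r90)
    r270 = rotate_90(r180)
    return [r90, r180, r270, mirror(image)]
```

If the originals are kept alongside their four variations (an assumption; the slide does not say), each class grows from 100 to 500 images, or 5,000 images across the 10 classes.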
Validation Data
◦ We also collected 30 additional images for each class (under different photographing conditions)
[Figure: Training and Validation Data (sample)]
DeepEye: Experiment (2/2)
Validation Accuracy
◦ For up to 7 different objects, the trained CNNs showed near-perfect performance
◦ Accuracy diminished slightly as the number of object categories increased from 8 to 9
◦ The final trained CNN’s validation accuracy was 97% with a loss of 0.116
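The two reported figures can be made concrete with a small sketch. The slides do not define the loss; this example assumes mean cross-entropy over the validation set (30 images x 10 classes = 300 images), which is a common choice for CNN classifiers. The function names are hypothetical.

```python
import math


def accuracy(probabilities, true_indices):
    """Fraction of samples whose highest-probability class is the true class."""
    correct = sum(
        1
        for probs, truth in zip(probabilities, true_indices)
        if max(range(len(probs)), key=probs.__getitem__) == truth
    )
    return correct / len(true_indices)


def mean_cross_entropy(probabilities, true_indices):
    """Average negative log-probability assigned to the true class."""
    total = sum(
        -math.log(probs[truth])
        for probs, truth in zip(probabilities, true_indices)
    )
    return total / len(true_indices)
```

Under this definition a model can score high accuracy and still show a nonzero loss, because the loss also penalizes correct predictions made with low confidence.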
Training Time
◦ It took about 7 minutes to train the final model in our GPU environment (GeForce GTX 970)
[Figure: Validation Accuracy]
Discussion and Future Work
Google Glass
◦ Google Glass emits a lot of heat when it uses the camera continuously
◦ The Google Glass battery drains quickly (< 2 hours)
Scalability and Applicability
◦ Tested on small datasets only (100-class object recognition?)
◦ Tested on the object recognition task only (face recognition?)
Effectiveness
◦ Need to assess the usability of the system for people with memory disorders
◦ Need to verify whether the system can improve their memory and cognitive abilities
Conclusion
In This Paper
◦ We developed a personalized object recognition system for augmenting human memory
◦ Wearable computing + deep learning
◦ We used a finetuning approach to efficiently train personalized deep learning models
◦ We plan to test the system with more complex object recognition tasks
◦ We also plan to verify its effectiveness in augmenting human memory and perception
Thank You!
ANY QUESTIONS?