Exploring Capturable Everyday Memory for Autobiographical Authentication, at UbiComp 2013

1. Exploring Capturable Everyday Memory for Autobiographical Authentication. Sauvik Das, Eiji Hayashi, Jason Hong. {sauvikda, ehayashi, jasonh}@cs.cmu.edu

2. Mobile Authentication Today
3. Mobile Authentication Is: Underutilized (up to 50% do not use authentication). Outdated (ported from the desktop world). http://confidenttechnologies.com/files/Infographic%20small%20image.png
4. Smartphones know a lot about us.
5. We carry them everywhere.
6. They are our communication hubs.
7. They know where we are.
8. They know where we're going.
9. We use them to browse the web.
10. And to take photos.
11. And the apps! Oh, the apps we use!
12. Key Observation: Smartphone Logs, Human Memory, Capturable Everyday Memory.
13. Key Observation: Capturable Everyday Memory can be used as the basis for a series of autobiographical challenge-response questions. This is what we call Autobiographical Authentication.
14. What did you eat for lunch yesterday?
15. What application did you use at 1pm?
16. Who did you call yesterday at 4pm?
17. CONTRIBUTIONS
18. A Model of Capturable Everyday Memory: How well can users answer questions about capturable everyday memory? What factors affect their performance?
19. A Framework for Autobiographical Authentication: How can we move from raw, noisy question-answer responses to an authentication decision? How do we handle inaccuracies/lapses in human memory?
20. METHODOLOGY
21. Approach: Ran 3 studies to get a handle on both questions that could be asked and can be asked. 2 MTurk studies based on self-report data. 1 field study with ground-truth data.
22. FIELD STUDY
23. myAuth: Indexed knowledge on the phone (sensor readings, system-level content providers). Capable of asking 13 questions.
24-26. QType: Description
    FBApp: What application did you use on ?
    FBLoc: Where were you on ?
    FBOCall: Who did you call on ?
    FBInCall: Who called you on ?
    FBOSMS: Who did you SMS message on ?
    FBInSMS: Who SMS messaged you on ?
    FBIntSrc: What did you search the internet for on ?
    FBIntVis: What website did you visit on ?
    NAOSMS: Name someone you SMS messaged in the last 24 hours.
    NAInSMS: Name someone who SMS messaged you in the last 24 hours.
    NAOCall: Name someone you called in the last 24 hours.
    NAInCall: Name someone who called you in the last 24 hours.
    NAApp: Name an application you used in the last 24 hours.
27. Recognition, Recall
28. Study Design: Answer 5 questions a day for 14 days. Incentive to answer questions correctly. Could skip questions if they chose.
29. Descriptive Stats: 24 users. Average age: 25 (s.d. 6.25, range 18-43). 14 male (58%). 2167 question-answer responses collected.
30. FIELD STUDY RESULTS
31. 1381 questions answered correctly (64%) + 168 near misses (8%).
32. Recognition > Recall: 68% of recognition questions answered correctly vs. 62% of recall questions (p = 0.008).
33. Performance Stable Over Time: No difference between first and last 20% of responses (64.1% vs. 64.8%, p = 0.73).
34. Question Type Matters
35. Time Bucketing Does Not Help: No difference between Fact-Based and Name-Any questions (64% vs. 63%, p = 0.5).
36. AUTOBIOGRAPHICAL AUTHENTICATION
37. Users only get 64% of answers correct. But performance is stable and errors are systematic.
38. Given the systematic and stable nature of user errors, we can make an authentication decision given both correct and incorrect answers.
39. Confidence Estimator: $C(u \mid seq, a) = P(u \mid seq, a) \cdot S(seq \mid u)$
    $C(u \mid seq, a)$: Confidence that the attempting authenticator is the user. Range: 0 to $P(u)$.
    $P(u \mid seq, a)$: Probability that the observed sequence comes from the user, given an adversary model.
    $S(seq \mid u)$: Value, from 0 to 1, that the observed sequence of responses matches what we expect from the user.
    $u$: user model. $seq$: observed question/answers. $a$: adversary model.
40. AutoAuth takes in a sequence of autobiographical question-answer responses and an adversary model, and outputs a confidence score from 0 to $P(u)$.
41. EVALUATION
42. Evaluation Plan: Use field study data to run simulations. What confidence scores would users get versus impersonators? Simulate 5 probable adversaries, weak and strong.
43. 5 Adversaries Simulated (weakest to strongest): Naïve Adversary, Observing Adversary, Always Correct Adversary, Empirical Observing Adversary, Empirical Knows Correct Adversary.
44. Naïve Adversary: Simulates a complete stranger who steals the phone. Has a 1/10 chance of guessing the correct answer. Random guess on recognition questions.
45. Always Correct Adversary: Simulates a stalker, or software that has compromised the knowledge base. Always answers correctly.
46. Empirical Knows Correct Adversary: Simulates an adversary with all correct answers and an understanding of how an average user answers questions. Purposely gets certain questions wrong to best simulate the average user.
47. USER PERFORMANCE: Evaluation Results
51. ADVERSARY PERFORMANCE: Evaluation Results
52. Against Naïve
53. Against ACA
54. Against EKCA
55. Evaluation Take-Aways: Users always get relatively high confidence scores. We can easily defend against simple adversaries. Advanced adversaries do better but can also be detected when modeled against.
56. CONCLUSION
57. Summary: People answered 64% of autobiographical questions correctly, on average. Their errors can be signals, too! Autobiographical Authentication is promising and robust against some tough adversaries.
58. Limitations: Autobiographical Authentication is slow (22 seconds on average per question). Requires constant device usage to replenish the knowledge base. It remains unclear how users will react to this sort of authentication in practice.
59. Questions?
60. Practical Use Cases: Password reset. Scalable, dynamic authentication. Tiered authentication.
61. Why might we want this? Scalable by context. Authentication is dynamic. Shoulder-surfing, social engineering, and brute-force attacks are all much harder to execute.
62. Usability Sanity Check
63. DERIVING THE AUTOAUTH EQUATION
64. Systematic Response Error Model: Given an observed sequence of responses, $seq$, and a user, $u$, we want something like $P(u \mid seq)$.
65. Using Bayes' Law: $P(u \mid seq) = \frac{P(seq \mid u)\,P(u)}{P(seq)}$
    $P(u)$: Prior probability that it is the user.
    $P(seq \mid u)$: Probability that we would observe this sequence of responses from the user.
    $P(seq)$: Overall probability that we would observe this sequence of responses.
66. Adopting an Adversary: $P(seq)$ is hard to compute perfectly, because it requires knowledge of all possible impersonators. But we can break it into two components: $P(seq) = P(seq \mid u) + P(seq \mid a)$, where $a$ is the adversary model.
67. Modified Bayesian Equation: $P(u \mid seq, a) = \frac{P(seq \mid u)\,P(u)}{P(seq \mid u) + P(seq \mid a)}$
68. Problem: To Get a High Score: $P(u \mid seq, a) = \frac{P(seq \mid u)\,P(u)}{P(seq \mid u) + P(seq \mid a)}$. Minimize this term: $P(seq \mid a)$.
69. Unlike Adversary Model Attack: A clever impersonator can get a high score simply by being unlike the adversary model, even if s/he is not like the user.
70. Add Bit-String Similarity: $S(seq \mid u) = \frac{n - |E(seq \mid u) - correct(seq)|}{n}$
    $n$: Number of responses.
    $E(seq \mid u)$: Expected answer correctness from user.
    $correct(seq)$: Actual answer correctness from authenticator.
    $S(seq \mid u)$: Bit-string similarity.
71. Final Equation: $C(u \mid seq, a) = P(u \mid seq, a) \cdot S(seq \mid u)$
    $C(u \mid seq, a)$: Confidence that the attempting authenticator is the user. Range: 0 to 100.
    $P(u \mid seq, a)$: Probability that the observed sequence comes from the user, given an adversary model.
    $S(seq \mid u)$: Bit-string similarity between the correctness of the observed sequence and the expected correctness of the observed sequence.
72. Modeling Capturable Everyday Memory: Used a mixed-effects logistic regression with users as a random effect. Each user had his/her own baseline likelihood of getting a question correct (intercept).
73. Fixed Effect: Description
    Age: Integer age.
    Gender: Male or Female.
    Time to Answer: Number of seconds it took the user to answer the question.
    Time since Correct Answer: Number of hours since the correct answer event occurred.
    Day of Study: The day number of the study (0-13).
    Correct Answer Entropy: The Shannon entropy of correct answers for this question type.
    Answer Uniqueness: Inverse of the percentage of times that the correct answer to this question was the correct answer to this type of question.
    Confidence: Self-reported confidence in the answer (1-5).
    Ease of Remembering Answer: Self-reported ease of remembering the answer (1-5).
    Difficulty of Others Guessing: Self-reported perceived difficulty of others guessing the answer (1-5).
    Answer Type: Recognition or Recall.
    Question Type: The question type.
74. Model Coefficients
75. Unique Answers Hard to Remember
76. Systematic Response Error Model: A simplified system with two question types, training data, and an observed authentication attempt.
    Training Data Probability Distribution (QType: Probability Correct)
    QT1: 0.7
    QT2: 0.4
    Observed Question-Answer Response Sequence (#, QType, Correct?)
    1, QT1, Yes
    2, QT2, No
77. Systematic Response Error Model: We can calculate the probability that we observe a sequence of responses given the user: $P(seq \mid u) = P(correct(QT1) \mid u)\,P(incorrect(QT2) \mid u) = 0.7 \times (1 - 0.4) = 0.42$
78. Adopting an Adversary: We can simplify the calculation of $P(seq)$ by adopting a specific adversary model, $a$.
79. Location Question With Map
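The worked example on slides 76 and 77 and the equations on slides 67, 70 and 71 can be traced in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the prior of 0.5, the per-question adversary probabilities (taken from the 1/10 guess rate of the naive adversary on slide 44), and the rule that a question is "expected correct" when its probability is at least 0.5 are all illustrative choices that the slides do not specify. The absolute difference in the similarity term is read here as a count of mismatched bits.

```python
from math import prod

# Per-question-type probability that the user answers correctly
# (values from the simplified two-type example on slide 76).
USER_MODEL = {"QT1": 0.7, "QT2": 0.4}

# Assumption: a naive adversary with a 1/10 chance on every question type.
NAIVE_ADVERSARY = {"QT1": 0.1, "QT2": 0.1}

# Observed sequence from slide 76: (question type, answered correctly?).
OBSERVED = [("QT1", True), ("QT2", False)]


def sequence_likelihood(seq, model):
    """P(seq | model): product of the per-response probabilities."""
    return prod(model[q] if ok else 1.0 - model[q] for q, ok in seq)


def posterior(seq, user_model, adversary_model, prior_user):
    """Modified Bayesian equation from slide 67."""
    p_user = sequence_likelihood(seq, user_model)
    p_adv = sequence_likelihood(seq, adversary_model)
    return p_user * prior_user / (p_user + p_adv)


def similarity(seq, user_model):
    """Bit-string similarity from slide 70 for a non-empty sequence.

    Assumption: the expected bit is 1 when the user's probability of a
    correct answer is at least 0.5, and the distance is a mismatch count.
    """
    expected = [user_model[q] >= 0.5 for q, _ in seq]
    observed = [ok for _, ok in seq]
    mismatches = sum(e != o for e, o in zip(expected, observed))
    return (len(seq) - mismatches) / len(seq)


def confidence(seq, user_model, adversary_model, prior_user=0.5):
    """Final confidence score from slide 71: posterior times similarity."""
    return (posterior(seq, user_model, adversary_model, prior_user)
            * similarity(seq, user_model))


if __name__ == "__main__":
    print(sequence_likelihood(OBSERVED, USER_MODEL))
    print(confidence(OBSERVED, USER_MODEL, NAIVE_ADVERSARY))
```

Under these assumptions the observed sequence reproduces the 0.42 likelihood from slide 77, matches the expected bit string exactly, and scores roughly 0.41, which stays inside the 0 to P(u) range stated on slide 39.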
