
Post on 25-Jul-2015


Personal Information Search and Discovery

Amélie Marian
Rutgers University

[email protected]


Amélie Marian - Rutgers University - SinFra 2015


Personal data is everywhere


Personal data is exploding

• More and more devices and systems are capturing all parts of our lives:
  – Actively: emails, social media, calendar, contacts…
  – Passively: GPS, records of financial transactions, records of purchases
  – Stealthily: clicks, searches, interactions, TV viewing habits


The time for Personal Information Management Systems is now!

“A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.”

- Vannevar Bush, The Atlantic Monthly, 1945


Personal Information Management Challenges

• Data fragmentation
  – Storage, archiving
  – Data integration
  – Data maintenance
  – Synchronization
  – Data quality
• Data ownership
  – Access control
  – Privacy
  – Sharing
• Functionalities
  – Search
  – Knowledge discovery and data mining
  – Internet of things


Saving Personal Data – Old School


Searching Personal Data – Old School…

File cabinet around 1888


Personal Information Management – the Digital Age

% grep -r PIMS /usr/amelie/presentations


First-generation PIMS – Desktop based

• Storage
  – Archival, safe-keeping
• Organization
  – Structure
  – Different file types
• Finding and re-finding information
  – Different from traditional IR/Web search systems
  – Keyword searches not ideal


Desktop Search Tools

• Google Desktop Search (defunct)
• Apple Spotlight
• Windows Search

These tools use IR-style keyword searches with some metadata filtering

• They lead to frustration when users cannot find information they know they have


Some Past PIMS projects

• Lifestreams
  – Time-oriented streams
• Stuff I’ve Seen
  – History of web behavior
• Haystack
  – Uniform data model
• Connections, Seetrieve
  – Task-based organization
• Dataspaces
  – Semantic connections, data integration
• deskWeb
  – Looks at the social network graph

Various uses of
  – Context
  – Time
  – Social network

Limitations
  – Limited data integration
  – Local storage
  – Basic functionalities


A changing landscape

Cloud-based model

Heterogeneous data types and formats

Need for richer functionalities


The Future of Personal Information Search


Life-logging

From Memex to MyLifeBits

Memex: memory index or memory extender
  – Hypertext system proposed by Vannevar Bush in 1945
  – Compress and store all of an individual's books, records, and communications…
  – Provide an “enlarged intimate supplement to one's memory”

MyLifeBits
  – Microsoft Research project with Gordon Bell
  – All documents read or produced by Bell, CDs, emails, web pages browsed, phone and instant messaging conversations, etc.


Hypermnesia

Exceptionally exact or vivid memory, especially as associated with certain mental illnesses

For a user: We cannot live knowing that any word, any move will leave a trace.

For the ecosystem: We cannot store all the data we produce – lack of storage resources

“Forgetting Is Key to a Healthy Mind”, Scientific American (image: Aaron Goodman)

A key issue is selecting which information to keep


Memory Tasks

• The “five Rs” memory tasks (Sellen and Whittaker, CACM 2010)
  – Recollecting
  – Reminiscing
  – Retrieving
  – Reflecting
  – Remembering intentions


Recollecting

• Task-based memory process
• Retracing steps to recollect information
  – “Where did I leave my keys?”
  – “When was the last time I saw Pierre?”
• Follow a series of cues to identify information

Need: Connections between memory objects (integration and navigation)
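
One way to picture "following a series of cues" is as a walk over a graph whose nodes are memory objects and whose edges are the connections between them. The sketch below is only an illustration under that assumption: the object identifiers, the links, and the `trace_cues` helper are all hypothetical and are not taken from the talk or from any existing system.

```python
from collections import deque

# Hypothetical memory objects (nodes) and the connections between them (edges).
# The identifiers and links are invented for illustration only.
LINKS = {
    "calendar:lunch-with-pierre": ["email:lunch-invite", "photo:restaurant"],
    "email:lunch-invite": ["contact:pierre"],
    "photo:restaurant": ["gps:downtown"],
    "contact:pierre": [],
    "gps:downtown": [],
}


def trace_cues(start, target, links):
    """Breadth-first search returning the shortest chain of connected objects."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for neighbor in links.get(path[-1], []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no chain of cues connects the two objects


path = trace_cues("calendar:lunch-with-pierre", "gps:downtown", LINKS)
print(path)
```

Starting from a calendar entry, the search passes through the photo taken at the restaurant to reach the location, which mirrors retracing one's steps from one cue to the next.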


Reminiscing

• Browsing through past memories to re-live them

• Experience-based (no specific goal in mind)
  – E.g., looking at old photos

Need: Connections between memory objects (integration and navigation)


Retrieving

• Retrieving specific information
  – Files, documents, pictures
  – Data snippets
• Use of metadata
• Can be combined with recollection

Need: Query model, indexes, and search algorithms
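
As a rough illustration of how a query model, an index, and a search routine fit together, the sketch below combines a keyword inverted index with exact-match metadata filters. It is a minimal example under simplifying assumptions (whitespace tokenization, no ranking); the documents, field names, and the `build_index` and `search` helpers are hypothetical.

```python
from collections import defaultdict

# Hypothetical personal documents; ids, text, and metadata are made up.
DOCS = {
    1: {"text": "slides for the PIMS talk", "type": "presentation", "year": 2015},
    2: {"text": "notes from the PIMS reading group", "type": "note", "year": 2014},
    3: {"text": "restaurant receipt", "type": "receipt", "year": 2015},
}


def build_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in doc["text"].lower().split():
            index[term].add(doc_id)
    return index


def search(index, docs, keywords, **metadata):
    """Return sorted ids of documents that contain every keyword
    and whose metadata equals every given filter value."""
    hits = set(docs)
    for word in keywords:
        hits &= index.get(word.lower(), set())
    return sorted(
        doc_id for doc_id in hits
        if all(docs[doc_id].get(key) == value for key, value in metadata.items())
    )


INDEX = build_index(DOCS)
print(search(INDEX, DOCS, ["pims"]))
print(search(INDEX, DOCS, ["pims"], year=2015))
```

The keyword alone matches two documents; adding the `year` filter narrows the result to one, which is the kind of metadata use the slide refers to.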


Reflecting

• Learning from the past
  – Identify patterns
  – Personal data analysis
• Towards a Personal Knowledge Base (PKB)
  – Individual vs. shared knowledge
  – Privacy concerns

Need: Knowledge Discovery and Mining techniques designed for personal data


Remembering Intentions

• Focus on prospective memory
  – To-do lists
  – Appointment reminders
• Active focus of commercial companies
  – Google Now
  – Notification apps (time- or location-based)
  – Microsoft Personal Agent project?

Need: NLP techniques designed for personal data


One more wish: Serendipity

• Hearing by chance a song that is going to totally obsess you

• A suggested book that will change your life

• Entering this small restaurant that you will remember forever

This is serendipitous

• A perfect search engine
• A perfect recommendation system
• A perfect computer assistant

Efficient but not exciting: they lack serendipity

Design programs that would help introduce serendipity into our lives
  – Focus on the experience


Digital Self Project at Rutgers University

• Personal data is rich in contextual information
  – We remember our data based on contextual cues
• Individualized context-aware personal information management tool
  – Integrate users’ fragmented data
  – Support personal information search
  – Build a personal knowledge base

Faculty
  – Amélie Marian
  – Thu Nguyen
  – Alex Borgida

Students (past and present)
  – Daniela Vianna
  – Valia Kalokiri
  – Alicia-Michelle Yong
  – Chaolun Xia


Digital Self Architecture

• Data Collection
  – Identification, retrieval, storage
  – Personal Extraction Tool: https://github.com/ameliemarian/DigitalSelf
• Data Integration
  – Multidimensional, context-aware, unified data model
  – w5h Model
• Search
  – Based on the natural memory retrieval process
  – Context-aware, approximate
  – w5h Search
• Knowledge Discovery
  – Find connections and patterns
  – Integrates user behavior and feedback


w5h - Context-aware Data Model

• Personal information is rich in contextual information
  – Metadata
  – Application data
  – Environment knowledge
• Cognitive psychology
  – Contextual cues are strong triggers for autobiographical memories
• Personal information can be modeled and indexed along six dimensions
  – what, who, where, when, why, and how: the w5h model
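
To make the six dimensions concrete, the sketch below describes each piece of personal data by a set of values per dimension and scores it against a partial query. This is not the Digital Self implementation: the `W5hRecord` class, the `score` function (fraction of queried dimensions that match), and the sample records are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class W5hRecord:
    """A hypothetical personal data item described along the six w5h dimensions."""
    what: frozenset = frozenset()
    who: frozenset = frozenset()
    where: frozenset = frozenset()
    when: frozenset = frozenset()
    why: frozenset = frozenset()
    how: frozenset = frozenset()


def score(record, query):
    """Fraction of the query's dimensions for which the record shares a value.

    `query` maps a dimension name to the values the user remembers.
    This is an assumed, deliberately simple approximate-match score.
    """
    if not query:
        return 0.0
    matched = sum(
        1 for dimension, values in query.items()
        if getattr(record, dimension) & frozenset(values)
    )
    return matched / len(query)


# Invented example records.
photo = W5hRecord(
    what=frozenset({"photo", "dinner"}),
    who=frozenset({"pierre"}),
    where=frozenset({"paris"}),
    when=frozenset({"2015-07"}),
    how=frozenset({"phone camera"}),
)
email = W5hRecord(
    what=frozenset({"email", "invitation"}),
    who=frozenset({"pierre"}),
    when=frozenset({"2015-06"}),
)

query = {"who": ["pierre"], "where": ["paris"]}
print(score(photo, query), score(email, query))
```

Because the score tolerates unmatched dimensions, the email still receives partial credit for the remembered person even though it carries no location, which is one simple reading of "context-aware, approximate" search.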


Preliminary Results - MRR

Bold: statistically significant (p<0.05)

w5h: context-aware search over w5h indexes
Text: MongoDB text index over integrated data
Solr: text index on raw data
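
For readers unfamiliar with the metric, Mean Reciprocal Rank (MRR) averages, over all queries, the reciprocal of the rank at which the first relevant result appears; a query with no relevant result contributes 0. The short sketch below computes it on made-up rankings; the query results and item names are hypothetical and unrelated to the experiments reported on this slide.

```python
def mean_reciprocal_rank(ranked_results, relevant):
    """Average over queries of 1/rank of the first relevant item (0 if none)."""
    total = 0.0
    for results, targets in zip(ranked_results, relevant):
        for rank, item in enumerate(results, start=1):
            if item in targets:
                total += 1.0 / rank
                break  # only the first relevant item counts
    return total / len(ranked_results)


# Three invented queries: the target is ranked 1st, 2nd, and not returned.
ranked = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
targets = [{"a"}, {"e"}, {"z"}]
print(mean_reciprocal_rank(ranked, targets))
```

Here the three reciprocal ranks are 1, 1/2, and 0, so the MRR is 0.5; values closer to 1 mean the sought item tends to appear at the top of the result list.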


Conclusions

• The time for better Personal Information Management is now!
  – Many exciting research challenges
  – Important ethical and societal implications
• The ability to search and recover past information is a critical feature of future PIMS
  – Need to take the specificities of personal information search into account


References

PIMS:
As We May Think, Vannevar Bush, The Atlantic Monthly, 1945.
Personal Information Management, W. Jones and J. Teevan, editors, University of Washington Press, 2007.
Beyond total capture: a constructive critique of lifelogging, Sellen and Whittaker, CACM 2010.
A tool for personal data extraction, Vianna, Yong, Xia, Marian, and Nguyen, IIWeb 2014.
Microsoft’s Stuff I’ve Seen project (Dumais et al., SIGIR 2003)
MyLifeBits (Gemmell, Bell, and Lueder, CACM 2006)
deskWeb (Zerr et al., SIGIR 2010)
Connections (Soules and Ganger, SOSP 2005)
Seetrieve (Gyllstrom and Soules, IUI 2008)
LifeStreams (Fertig, Freeman, and Gelernter, CHI 1996)
Haystack (Karger et al., CIDR 2005)

Data Integration:
A survey of approaches to automatic schema matching, Rahm and Bernstein, 2001.
Principles of Data Integration, Doan, Halevy, and Ives, 2012.
Principles of dataspace systems, Halevy, Franklin, and Maier, CACM, 2006.
