1 dr alexiei dingli introduction to web science reusing knowledge

1

Dr Alexiei Dingli

Introduction to Web Science

Reusing knowledge

2

• Acquire

• Model

• Reuse

• Retrieve

• Publish

• Maintain

Six challenges of the Knowledge Life Cycle

3

• Three reusable types of objects

– Ontologies

– Problem Solving Methods

– Knowledge Bases

• Plus we can also use additional sources (WWW)

Reusing knowledge

4

• Locating the knowledge to be reused is difficult

• Distributed agents may be unaware that the knowledge they need is available (this is the challenge of knowledge retrieval)

• Knowledge may simply be in the wrong form for the task

Problems with reuse

5

• Question answering

• Dialogue systems

Two particular reuse tasks

6

• The main aim of QA is to present the user with a short answer to a question rather than a list of possibly relevant documents.

• As it becomes more and more difficult to find answers on the WWW using standard search engines, question answering technology will become increasingly important.

What is Question Answering?

7

• Clearly there are many different types of questions:

– When was Mozart born?

• Question requires a single fact as an answer.

• Answer may be found verbatim in text, e.g. “Mozart was born in 1756”.

– How did Socrates die?

• Finding an answer may require reasoning.

• In this example, “die” has to be linked with drinking poison (hemlock).

Question Types (1)

8

– How do I assemble a bike?

• The full answer may require fusing information from many different sources.

• The complexity can range from simple lists to script-based answers.

– Is the Earth flat?

• Requires a simple yes/no answer.

Question Types (2)

9

• The biggest independent evaluations of question answering systems have been carried out at TREC (Text Retrieval Conference)

– Five hundred factoid questions are provided and the groups taking part have a week in which to process the questions and return one answer per question.

– No changes are allowed to your system between the time you receive the questions and the time you submit the answers.

Evaluating QA Systems

10

A Generic QA Framework

[Figure: the Questions and the Document Collection feed a Search Engine; the Top n documents go to Document Processing, which also receives the Questions and outputs the Answers.]

• A search engine is used to find the n most relevant documents in the document collection

• These documents are then processed with respect to the question to produce a set of answers which are passed back to the user

• Most of the differences between question answering systems are centred around the document processing stage
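The two stages of this framework can be sketched in a few lines. This is only a toy illustration under stated assumptions: the function names (`search`, `process_documents`, `answer`), the word-overlap ranking and the "extract four-digit years" processing step are all hypothetical stand-ins, not part of any real QA system.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def search(question, collection, n=2):
    """Toy search engine: rank documents by word overlap with the question."""
    q = set(tokenize(question))
    ranked = sorted(collection, key=lambda d: len(q & set(tokenize(d))), reverse=True)
    return ranked[:n]

def process_documents(question, documents):
    """Toy document processing (assumption): candidate answers are 4-digit years."""
    years = Counter()
    for doc in documents:
        years.update(re.findall(r"\b\d{4}\b", doc))
    return [year for year, _ in years.most_common()]

def answer(question, collection, n=2):
    """Search first, then process only the top n documents."""
    return process_documents(question, search(question, collection, n))

collection = [
    "Mozart was born in 1756 in Salzburg.",
    "Wolfgang Amadeus Mozart, born 1756, was a prolific composer.",
    "Salzburg is a city in Austria.",
]
answers = answer("When was Mozart born?", collection)
```

Swapping in a different `process_documents` changes the system without touching the search stage, which is where the slide says most systems differ.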

11

• The answers to the majority of factoid questions are easily recognised named entities, such as countries, cities, dates, people's names, etc.

• The relatively simple techniques of gazetteer lists and named entity recognisers allow us to locate these entities within the relevant documents – the most frequent of which can be returned as the answer

• This leaves just one issue that needs solving – how do we know, for a specific question, what type the answer should be?

A Simplified Approach
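A minimal sketch of the gazetteer idea, assuming the expected answer type is already known. The gazetteer contents, the example documents and the function name `most_frequent_entity` are made up for illustration.

```python
from collections import Counter

# Hypothetical, deliberately tiny gazetteer lists.
GAZETTEERS = {
    "location": {"Salzburg", "Vienna", "Austria"},
    "person": {"Mozart", "Haydn"},
}

def most_frequent_entity(documents, expected_type):
    """Count gazetteer hits of the expected type and return the most frequent one."""
    counts = Counter()
    for doc in documents:
        # Crude tokenisation: drop commas and full stops, split on whitespace.
        for token in doc.replace(",", " ").replace(".", " ").split():
            if token in GAZETTEERS[expected_type]:
                counts[token] += 1
    return counts.most_common(1)[0][0] if counts else None

docs = [
    "Mozart was born in Salzburg.",
    "Salzburg, the birthplace of Mozart, is in Austria.",
    "Mozart later moved to Vienna.",
]
best_location = most_frequent_entity(docs, "location")
```

Here `best_location` is "Salzburg" because it is the location mentioned most often in the retrieved documents.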

12

• The simplest way to determine the expected type of an answer is to look at the words which make up the question:

• who – suggests a person

• when – suggests a date

• where – suggests a location

A Simplified Approach (1)

13

• Clearly this division does not account for every question but it is easy to add more complex rules:

• country – suggests a location

• how much – suggests an amount of money

• author – suggests a person

• birthday – suggests a date

• college – suggests an organization

• These rules can be easily extended as we think of more questions to ask

A Simplified Approach (2)
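The rules on these two slides can be written as an ordered lookup table. This is a sketch, not a complete classifier: the function name `expected_answer_type`, the type labels and the choice to test the specific keyword rules before the generic question words are assumptions.

```python
import re

# Checked in order: specific keyword rules first, generic question words last.
RULES = [
    ("how much", "money"),
    ("country", "location"),
    ("author", "person"),
    ("birthday", "date"),
    ("college", "organization"),
    ("who", "person"),
    ("when", "date"),
    ("where", "location"),
]

def expected_answer_type(question):
    """Return the answer type suggested by the first matching rule, or None."""
    q = question.lower()
    for keyword, answer_type in RULES:
        # Whole-word match, so that "who" does not fire on "whole".
        if re.search(r"\b" + re.escape(keyword) + r"\b", q):
            return answer_type
    return None

examples = {
    q: expected_answer_type(q)
    for q in [
        "When was Mozart born?",
        "Which country hosted the tournament?",
        "How much does a ticket cost?",
        "Is the Earth flat?",
    ]
}
```

Extending the system means appending a pair to `RULES`; questions that match no rule, such as the yes/no question above, return `None`.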

14

• The most frequently occurring instance of the right type might not be the correct answer.

– For example, if you are asking when someone was born, it may be that their death was more notable and hence will appear more often (e.g. John F Kennedy’s assassination).

• There are many questions for which correct answers are not named entities:

– How did Ayrton Senna die? – in a car crash

Problems (1)

15

• The gazetteer lists and named entity recognisers are unlikely to cover every type of named entity that may be asked about:

– Even those types that are covered may well not be complete.

– It is of course relatively easy to build new lists, e.g. Birthstones.

Problems (2)

16

• Amber

• Precious

• Diamond

• Asia

• Summer

• Holly

• Are these people's names?

Does a gazetteer of people's names contain all the names?
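A small sketch of why a names gazetteer alone cannot settle the question: the same token may sit in several lists. The lists below and the function name `candidate_types` are hypothetical.

```python
# Hypothetical gazetteers; the point is the overlap, not the coverage.
PERSON_NAMES = {"Amber", "Precious", "Diamond", "Asia", "Summer", "Holly"}
OTHER_LISTS = {
    "gemstone": {"Amber", "Diamond"},
    "location": {"Asia"},
    "season": {"Summer"},
    "plant": {"Holly"},
}

def candidate_types(token):
    """Return every entity type whose gazetteer contains the token."""
    types = [name for name, entries in OTHER_LISTS.items() if token in entries]
    if token in PERSON_NAMES:
        types.insert(0, "person")
    return types

amber_types = candidate_types("Amber")
```

A token with more than one candidate type needs context to disambiguate, and a token with none (any name missing from the list) is simply not recognised.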

17

• A sequence of utterances

• Exchange of information among multiple dialogue participants

• Stays coherent over time

• Driven by a certain goal

– finding the most suitable restaurant in a foreign city,

– booking the cheapest flight to a given city,

– controlling the state of the devices in a home,

– or the goal might also be the interaction itself (chatting)

Dialogue (1)

18

• The most natural means of communication for humans, perceived as very expressive, efficient and robust

• However, dialogue is a very complex protocol

– it follows certain conventions or protocols that are adopted by the participants

– humans usually use their extensive knowledge and reasoning capabilities to understand the conversational partner

– the dialogue utterances are often imperfect – ungrammatical or elliptical

Dialogue (2)

19

• People often utter partial phrases to avoid repetition

– A: At what time is “Titanic” playing?

– B: 8pm

– A: And “The 5th element”?

• It is necessary to keep track of the conversation to complete such phrases

Ellipsis
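Keeping track of the conversation can be sketched as remembering the intent of the last complete question. The class name `ShowtimeDialogue`, the matching rules and the showtime data are assumptions made for illustration; only the 8pm answer comes from the example dialogue.

```python
class ShowtimeDialogue:
    """Remembers the last question type so elliptical follow-ups can be completed."""

    # Made-up showtimes; the 10pm entry is invented for the example.
    SHOWTIMES = {"titanic": "8pm", "the 5th element": "10pm"}

    def __init__(self):
        self.last_intent = None  # conversation state carried between turns

    def handle(self, utterance):
        text = utterance.lower().strip().rstrip("?")
        if text.startswith("at what time"):
            self.last_intent = "showtime"  # full question: remember its intent
        elif not (text.startswith("and ") and self.last_intent):
            # A fragment with no earlier question to complete it.
            return "Sorry, I did not understand."
        for title, time in self.SHOWTIMES.items():
            if title in text:
                return time
        return "Which film?"

dialogue = ShowtimeDialogue()
first = dialogue.handle('At what time is "Titanic" playing?')
second = dialogue.handle('And "The 5th element"?')
```

The second utterance is only answerable because the first one left `last_intent` set; a fresh `ShowtimeDialogue` cannot interpret it.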

20

• Some words can only be interpreted in context:

– Previous context (anaphora)

• “The monkey took the banana and ate it”

– Future context (cataphora)

• “Give me that. The book by the lamp.”

– Temporal/spatial

• “The man behind me will be dead tomorrow.”

• (Who is the man? When did/will he die?)

Deixis

21

• The meaning of a discourse may be far from literal.

– B: I can’t reach him.

– A: There is the telephone.

– B: I am not in my office.

– A: Okay.

• Undertones & implications are often employed for effect or efficiency

Indirect Meaning

22

• People seem to know very well when they can take their turn

– There is little overlap (5%)

– Gaps are often a few tenths of a second

– Appears fluid, but it is not obvious why

• A computational model of overlap does not exist

– this causes problems for dialogue systems

Turn Taking

23

• Phrases like “a-ha”, “yes”, “hmm” or “eh” are often uttered to fill pauses in the conversation, or to indicate attention or reflection

• The challenge here is to recognize when they should be understood as a request for turn taking and when they should be ignored

Conversational fillers

24

• Flight and train timetable information and reservation

• Smart homes

• Automated directory enquiries

– Yellow pages enquiries

– Weather information

Most common dialogue domains

25

Components of a Dialogue System

26

• Transforms speech to text

• Two basic types

– Grammar-based ASR

• The set of accepted phrases is defined by regular/context-free grammars (i.e. a language model in the form of a grammar)

• Usually speaker independent

– Dictation machine

• Recognizes “any utterance”

• N-gram language model

• Often speaker dependent

Automatic Speech Recognition
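The two kinds of language model can be contrasted on text alone, leaving the acoustic side of ASR out entirely. In this sketch the regular grammar, the training sentences and the function names are all assumptions; the bigram model uses add-one smoothing so unseen word pairs still get a small probability.

```python
import re
from collections import Counter

# Grammar-based: only phrases generated by this (regular) grammar are accepted.
GRAMMAR = re.compile(r"(book|cancel) a (flight|train) to (london|paris)")

def grammar_accepts(utterance):
    return GRAMMAR.fullmatch(utterance.lower()) is not None

# N-gram: any word sequence gets a score, likely sequences a higher one.
def train_bigrams(sentences):
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()
        contexts.update(words[:-1])            # how often each word precedes another
        bigrams.update(zip(words, words[1:]))  # how often each word pair occurs
    return contexts, bigrams

def bigram_probability(sentence, contexts, bigrams, vocab_size):
    words = ["<s>"] + sentence.lower().split()
    p = 1.0
    for prev, cur in zip(words, words[1:]):
        # Add-one smoothing keeps unseen pairs from scoring zero.
        p *= (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
    return p

corpus = [
    "book a flight to london",
    "book a train to paris",
    "cancel a flight to paris",
]
contexts, bigrams = train_bigrams(corpus)
vocab_size = len({w for s in corpus for w in s.split()})
p_fluent = bigram_probability("book a flight to paris", contexts, bigrams, vocab_size)
p_scrambled = bigram_probability("paris to flight a book", contexts, bigrams, vocab_size)
```

The grammar rejects anything outside its phrase set, while the bigram model accepts "any utterance" but ranks the fluent word order above the scrambled one.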

27

• Analyzes a textual utterance and returns its formal semantic representation

– Logical formula

– Named entities

– etc.

Natural Language Understanding

28

• Coordinates activity of all components

• Maintains representation of the current state of the dialogue

• Communicates with external applications

• Decides about the next dialogue step

Dialogue Manager

29

• Finite-state

– dialogue flow determined by a finite-state automaton

• Frame-based

– form filling

• Plan (task) based

– a dynamic plan is constructed to reach the dialogue goal

• … in practice, you often find extended versions or combinations of the above-mentioned approaches!

Three types of DM

30

Finite State Automata
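A finite-state dialogue manager can be sketched as a transition table in which the user's reply selects the next state. The states, prompts and booking scenario below are hypothetical.

```python
# System prompt for each state (hypothetical flight-booking flow).
PROMPTS = {
    "ask_destination": "Where would you like to go?",
    "ask_date": "On which day?",
    "confirm": "Shall I book it?",
}

# Transition function per state: the reply decides which state comes next.
TRANSITIONS = {
    "ask_destination": lambda reply: "ask_date",
    "ask_date": lambda reply: "confirm",
    "confirm": lambda reply: "done" if reply.lower() == "yes" else "ask_destination",
}

def run(user_replies, start="ask_destination"):
    """Feed replies through the automaton; return final state, path and answers."""
    state, visited, answers = start, [], {}
    for reply in user_replies:
        if state == "done":
            break
        visited.append(state)
        answers[state] = reply  # a later answer overwrites an earlier one
        state = TRANSITIONS[state](reply)
    return state, visited, answers

final_state, visited, answers = run(["Paris", "Monday", "no", "Rome", "Friday", "yes"])
```

The system keeps the initiative throughout: the user can only answer the prompt of the current state, and saying "no" at confirmation restarts the fixed sequence.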

31

Frame Based
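Form filling can be sketched as a frame of slots that any utterance may fill in any order, with the system asking only for what is still missing. The slot names, the regular-expression patterns and the questions are assumptions for illustration.

```python
import re

# Hypothetical slots with naive patterns for spotting their values.
SLOT_PATTERNS = {
    "origin": re.compile(r"\bfrom (\w+)"),
    "destination": re.compile(r"\bto (\w+)"),
    "day": re.compile(r"\bon (\w+)"),
}
QUESTIONS = {
    "origin": "Where are you leaving from?",
    "destination": "Where are you going to?",
    "day": "On which day?",
}

def update_frame(frame, utterance):
    """Fill every slot whose pattern matches the utterance."""
    for slot, pattern in SLOT_PATTERNS.items():
        match = pattern.search(utterance.lower())
        if match:
            frame[slot] = match.group(1)
    return frame

def next_question(frame):
    """Ask for the first slot that is still empty; None means the form is complete."""
    for slot in SLOT_PATTERNS:
        if slot not in frame:
            return QUESTIONS[slot]
    return None

frame = update_frame({}, "A ticket from Malta to London please")
question_after_first_turn = next_question(frame)
frame = update_frame(frame, "on Friday")
```

Unlike the finite-state approach, one utterance can fill several slots at once, so the system skips questions the user has already answered.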

32

• Takes a problem-solving approach

– There are goals to be reached

– Plans are made to reach those goals

– The goals and plans of the other participants must be iteratively inferred or predicted

• Potential for handling complicated dialogues

– suffers from today’s technological limitations

– in more complex cases the planning problem can become computationally intractable

• Example: Bathroom consultant

Plan Based

33

• Produces a textual utterance (the so-called surface realization) from an internal (formal) representation of the answer

• The surface realization can include formatting information

– Speaking style, pauses

– Background sounds

Natural Language Generation

34

• Transforms the surface realization into an acoustic representation (sound signal)

Text-To-Speech

35

• Commercial systems:

– small vocabulary (~100 words)

– closed domain

– system initiative

• Research systems:

– larger (but still small) vocabulary (~10000 words)

– closed domain

– (limited) mixed initiative

Typical parameters

36

• System-initiative:

– system always has control, user only responds to system questions

• User-initiative:

– user always has control, system passively answers user questions

• Mixed-initiative:

– control switches between system and user using fixed rules

• Variable-initiative:

– control switches between system and user dynamically based on participant roles, dialogue history, etc.

Different Initiatives

37

• Several possible input/output modalities to communicate with dialogue systems

– speech, text, pointing, graphics, gestures, facial expressions, body positions, emotions, etc.

• No single “most convenient” modality (different modalities have different advantages)

– entering day of week: click on a calendar

– entering Zip code: use keyboard

– performing commands: speech

– complex query: express it as typed natural language

• Several modalities are useful

– when one modality is not applicable, e.g. eyes or hands are busy, silent environment

– or when one is difficult to use, e.g. small devices with limited keyboard and small screen

Multi Modal Dialogue Systems

38

• Comic

• Companions

Case Study

39

The Comic Avatar

40

Wizard of Oz

41

Putting it together

42

The Companions Architecture

43

The Companions Robot

44

The Companions Interface 1

45

The Companions Interface 2

46

Questions?
