Maastricht - September, 18th, 2008
Extraction of Socio-Semantic Data from Chat Conversations in Collaborative Learning Communities
Traian Rebedea1, Stefan Trausan-Matu1,2, Costin Chiru1
1 “Politehnica” University of Bucharest, Department of Computer Science and Engineering
2 Research Institute for Artificial Intelligence of the Romanian Academy
{traian.rebedea, stefan.trausan, costin.chiru} @ cs.pub.ro
September, 18th, 2008 - Extraction of Socio-Semantic Data from Chat Conversations in Collaborative Learning Communities - Maastricht
Overview
1. Introduction
2. Theoretical background
3. Implementation
    Detecting conversation’s topics
    Assessing learners’ competencies
    Discovering implicit voices
    Conversation graph
4. Conclusions
Context
Computer-assisted learning
    Developing tools to support the learning process
    Evaluation of these tools (and of the learning process)
    Determining the learners’ performances
Computer Supported Collaborative Learning (CSCL)
    Main idea: “rather than speaking about ‘acquisition of knowledge,’ many people prefer to view learning as becoming a participant in a certain discourse” (Sfard, 2000)
    Focus on studying interactions between the participants in chat conversations in small groups
Objectives
Automatic extraction of useful social and semantic information from conversations
    Determining relationships between utterances
    Utterances that have influenced the further development of the conversation
    The performance / competency of each participant
Designing an interface for the visualisation of a conversation
Applicable to chats, discussion forums, etc., and to face-to-face discussions
Experiments
Languages: English (advantage: existing NLP tools) and Romanian
Computer Science courses (HCI, NLP and Algorithm Design) at “Politehnica” University of Bucharest
Small groups of 4-5 students; all of the students must be graded (over 100 students / course)
The conversations have well-determined subjects
    Collaborative, team work
    Competitive
Also used chat transcripts from Virtual Math Teams, Drexel University, Philadelphia, US
Socio-cultural Paradigm
The role of socially established artefacts in communication and learning (Vygotsky)
Bakhtin focuses on the role of language and discourse, and especially of speech and dialog: “… Any true understanding is dialogic in nature.”
Lotman considers text as a “thinking device”
Bakhtin’s Dialogism
Bakhtin’s ideas
    Dialogism
    Polyphony
    Inter-animation of voices
Bakhtin: “The specific totality of ideas, thoughts and words is everywhere passed through several unmerged voices, taking on a different sound in each” (referring to Dostoevsky’s novels)
Dual nature of voices: community and individuality
Voices in Chats
Utterances should be the units of analysis
An utterance contains at least one voice: that of the participant who issued it
Most of the utterances contain multiple voices
The inter-animation of the voices: discussion threads of the conversation
Discussion Threads
Foreword
Chat transcripts are read from HTML or XML files
ConcertChat environment (Fraunhofer)
    Advantages for collaborative work
    Enables the use of explicit references to previous utterances or a whiteboard
Implementation in C#.NET
Techniques
Tokenization
Stop-words, emoticons and usual abbreviations ( :) , :D , brb, thx, …) are eliminated
WordNet for identifying synonyms
Misspellings are searched using the Google API
The ontology can be extended with words discovered in the chat, specific to the conversation’s domain
Pattern analysis
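The tokenization and filtering steps could look roughly like the sketch below. It is written in Python for brevity, although the slides state that the actual tool is implemented in C#.NET. The stop-word and chat-noise lists, and the name `preprocess`, are illustrative assumptions and not the lists used by the authors.

```python
import re

# Illustrative lists only; the slides do not give the actual lists.
STOP_WORDS = {"the", "a", "an", "is", "are", "i", "we", "to", "of", "and"}
CHAT_NOISE = {":)", ":d", "brb", "thx"}  # emoticons and usual abbreviations

def preprocess(utterance: str) -> list[str]:
    """Tokenize an utterance and drop stop-words, emoticons and abbreviations."""
    tokens = []
    for raw in utterance.lower().split():
        if raw in CHAT_NOISE:          # emoticons are matched before stripping punctuation
            continue
        word = re.sub(r"[^\w'-]", "", raw)  # strip surrounding punctuation
        if word and word not in STOP_WORDS and word not in CHAT_NOISE:
            tokens.append(word)
    return tokens

tokens = preprocess("Thx :) I think wikis are the best")
```

Synonym lookup (WordNet) and spelling correction are left out of this sketch.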
Detecting the Topics
Each word in the chat becomes a candidate concept
    Synset list
    Frequency
Clustering algorithm for the concepts’ unification
    If the synsets of two concepts have a common word
        The two synset lists are merged
        The frequency of the resulting concept = sum of the frequencies of the unified concepts
The resulting concepts are the main topics of the conversation
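A minimal sketch of the unification step, assuming each candidate concept is given as a pair (synset as a set of words, frequency). The synsets in the example are hand-written stand-ins for WordNet output, and the function name `unify_concepts` is hypothetical.

```python
def unify_concepts(candidates):
    """Merge candidate concepts whose synsets share at least one word.

    candidates: list of (synset, frequency) pairs, synset being a set of words.
    Returns the unified concepts, most frequent first.
    """
    concepts = []  # kept pairwise disjoint at all times
    for synset, freq in candidates:
        merged_set, merged_freq = set(synset), freq
        remaining = []
        for other_set, other_freq in concepts:
            if merged_set & other_set:      # common word: unify the two concepts
                merged_set |= other_set
                merged_freq += other_freq   # frequencies are summed
            else:
                remaining.append((other_set, other_freq))
        concepts = remaining + [(merged_set, merged_freq)]
    return sorted(concepts, key=lambda c: -c[1])

# Hand-written synsets, for illustration only.
topics = unify_concepts([
    ({"wiki", "wikis"}, 3),
    ({"blog", "weblog"}, 2),
    ({"wikis", "wikipedia"}, 1),
    ({"forum"}, 1),
])
```

Here the two wiki-related candidates are unified into one concept with frequency 4, which becomes the top topic.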
Detecting the Topics (2)
Assessing the Competencies
Graphics: evaluate the competency of each participant starting from the chat topics (concepts represented as synsets)
Uses other criteria like the nature of the utterances: questions, agreements, references, etc. are treated differently
Parameters:
    Factors for references
    Bonuses for agreements, penalties for disagreements
    A minimum value that is awarded to any line in the chat
    Penalties for (dis-)agreement, as they present less originality
Value of an Utterance
The value of each utterance is computed by relating it to an abstract utterance
Abstract utterance: built from the most important concepts identified in the chat; we only consider the concepts that have a frequency greater than a given threshold
Every utterance in the chat is scaled in the interval 0 to 100, by comparison to the abstract utterance
Synsets are used for every word
An utterance with a score of 0 does not contain any concept from the abstract one, and an utterance with a score of 100 contains all the concepts from the abstract one
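One way to read the scoring is sketched below. The slides only fix the two end points (0 for no concept, 100 for all concepts), so the linear scale in between is an assumption, as is the name `utterance_value`.

```python
def utterance_value(tokens, abstract_concepts):
    """Score an utterance from 0 to 100 against the abstract utterance.

    tokens: the words of the utterance (already filtered).
    abstract_concepts: list of synsets (sets of words), one per concept whose
        frequency exceeds the chosen threshold.
    Assumption: the score grows linearly with the share of concepts covered.
    """
    if not abstract_concepts:
        return 0.0
    words = set(tokens)
    hits = sum(1 for synset in abstract_concepts if words & synset)
    return 100.0 * hits / len(abstract_concepts)

# Hand-written abstract utterance with four concepts, for illustration only.
abstract = [{"wiki", "wikis"}, {"blog", "weblog"}, {"forum"}, {"chat"}]
value = utterance_value(["think", "wikis", "blog"], abstract)
```

The example utterance covers two of the four concepts, so it scores 50 under this assumption.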
Computing the Competencies
At the start of the conversation, each participant has a null competency.
For each utterance in the chat, the values of the competencies are modified accordingly:
    The participant that issued the current utterance receives its score, possibly downgraded if it is a (dis-)agreement;
    All the participants that are literally present in the current utterance are rewarded with a percentage of its value;
    The participant that issued the utterance referred to by the current one is rewarded for an agreement and penalized for a disagreement, with a constant value;
    The participant that issued the utterance referred to by the current one, when the current one is not a (dis-)agreement, is rewarded with a fraction of the value of this utterance;
    If the current utterance has a score of 0, the issuer will receive a minimum score (for participation).
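The update rules above can be sketched as follows. All parameter values are hypothetical, since the slides do not give them. Two readings are also assumptions: "literally present" is taken to mean that a participant's name appears in the utterance, and "a fraction of the value of this utterance" is taken to refer to the current utterance.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical parameter values; the slides do not specify them.
MIN_SCORE = 1.0             # minimum awarded for participation
AGREEMENT_DISCOUNT = 0.5    # downgrade applied to (dis-)agreements
MENTION_SHARE = 0.1         # percentage for participants named in the utterance
REFERENCE_SHARE = 0.2       # fraction for the issuer of a referred utterance
AGREEMENT_BONUS = 5.0
DISAGREEMENT_PENALTY = 5.0

@dataclass
class Utterance:
    author: str
    value: float                      # 0..100, relative to the abstract utterance
    kind: str = "plain"               # "plain", "agreement" or "disagreement"
    refers_to: Optional[int] = None   # index of the referred utterance, if any
    mentions: tuple = ()              # participants named in the text

def compute_competencies(utterances):
    """Return final scores and the per-utterance history (for the graphics)."""
    scores = {u.author: 0.0 for u in utterances}   # null competency at start
    history = []
    for u in utterances:
        own = u.value * (AGREEMENT_DISCOUNT if u.kind != "plain" else 1.0)
        if u.value == 0:
            own = MIN_SCORE
        scores[u.author] += own
        for name in u.mentions:
            if name in scores:
                scores[name] += MENTION_SHARE * u.value
        if u.refers_to is not None:
            target = utterances[u.refers_to].author
            if u.kind == "agreement":
                scores[target] += AGREEMENT_BONUS
            elif u.kind == "disagreement":
                scores[target] -= DISAGREEMENT_PENALTY
            else:
                scores[target] += REFERENCE_SHARE * u.value
        history.append(dict(scores))
    return scores, history

final, history = compute_competencies([
    Utterance("A", 60.0),
    Utterance("B", 0.0, kind="disagreement", refers_to=0),
    Utterance("C", 40.0, refers_to=0, mentions=("B",)),
])
```

The `history` list holds one snapshot per utterance, which is the kind of data a competency-over-time plot would need.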
Competencies’ Graphics
    Oy axis: value of competency
    Ox axis: the number of the utterance
Discovering Implicit Voices
We have explicit references
We want to discover more references
    Why? Haste and lack of attention
The method
    List of patterns that consist of a set of words (expressions) and a local subject called the referred word
    If an utterance matches one of the patterns, we determine what word in the utterance is the referred word (e.g. “I don’t agree with your assessment”)
    We search for this word in a predetermined number of the most recent previous utterances
    If we can find this word in one of these utterances, then we have discovered an implicit relationship between the two utterances, the current one referring to the identified one
    During the identification process, the synsets of the words are used
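A rough sketch of the method, assuming the patterns are expressed as regular expressions whose first group captures the referred word. The two patterns, the window size and the synset table are invented for illustration; the slides do not list the real ones.

```python
import re

# Hypothetical patterns: group 1 captures the referred word.
PATTERNS = [
    re.compile(r"\bi (?:don't |do not )?agree with (?:your|the) (\w+)"),
    re.compile(r"\bwhat do you mean by (\w+)"),
]
WINDOW = 5  # assumed number of most recent previous utterances to search

def find_implicit_reference(utterances, current, synsets=None, window=WINDOW):
    """Return the index of the utterance implicitly referred to, or None.

    utterances: list of utterance texts in chat order.
    synsets: optional dict mapping a word to a set of its synonyms.
    """
    synsets = synsets or {}
    text = utterances[current].lower()
    for pattern in PATTERNS:
        match = pattern.search(text)
        if not match:
            continue
        referred = match.group(1)
        candidates = {referred} | synsets.get(referred, set())
        # Walk backwards over at most `window` previous utterances.
        for idx in range(current - 1, max(current - window, 0) - 1, -1):
            earlier_words = set(re.findall(r"\w+", utterances[idx].lower()))
            if candidates & earlier_words:
                return idx
    return None

chat = [
    "My evaluation is that wikis are the best tool",
    "Blogs are easier to set up",
    "I don't agree with your assessment",
]
ref = find_implicit_reference(chat, 2, {"assessment": {"evaluation", "appraisal"}})
```

In the example the referred word "assessment" does not occur earlier, but its synonym "evaluation" does, so the third utterance is linked to the first.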
Discovering Implicit Voices (2)
There are a number of empirical methods
Examples
    Short agreement / disagreement, then B refers to A
        A: I think wikis are the best
        B: I disagree
    REF A, REF B explicit and B a short (dis)agreement, then C implicitly refers to A (transitivity)
        A: I think wikis are the best
        (…)
        B: I disagree [REF A]
        (…)
        C: Maybe we should talk about them anyway [REF B]
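The two empirical rules in the examples can be sketched as below. The list of short replies is hypothetical, and for the first rule the sketch assumes that an unreferenced short (dis)agreement points at the immediately preceding utterance, which the slide example suggests but does not state.

```python
# Hypothetical list of short (dis)agreement replies.
SHORT_REPLIES = {"i agree", "i disagree", "yes", "no", "ok"}

def is_short_reply(text):
    return text.strip().lower().rstrip(".!") in SHORT_REPLIES

def infer_implicit(utterances, explicit_refs):
    """Return implicit references as a dict {utterance index: referred index}.

    utterances: list of utterance texts in chat order.
    explicit_refs: dict {utterance index: index it explicitly refers to}.
    """
    implicit = {}
    # Rule 1 (assumed reading): a short (dis)agreement without an explicit
    # reference refers to the utterance just before it.
    for i, text in enumerate(utterances):
        if i > 0 and i not in explicit_refs and is_short_reply(text):
            implicit[i] = i - 1
    # Rule 2 (transitivity): C refers to B, B is a short (dis)agreement that
    # refers to A, so C implicitly refers to A.
    for src, dst in explicit_refs.items():
        if is_short_reply(utterances[dst]) and dst in explicit_refs:
            implicit[src] = explicit_refs[dst]
    return implicit

chat = [
    "I think wikis are the best",
    "I disagree",
    "Maybe we should talk about them anyway",
]
rule1 = infer_implicit(chat[:2], {})
rule2 = infer_implicit(chat, {1: 0, 2: 1})
```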
Conversation Graph
Conversation is a graph
    Vertices = utterances
    Edges = references between utterances
The graph is directed and acyclic, so it can be topologically sorted
Using the graph:
    Segmentation of the chat in discussion threads
    Determining the strength of an utterance
    Graphical representation of the conversation
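A small sketch of the graph, using the standard-library `graphlib` module for the topological sort. Treating each discussion thread as a connected component of the reference graph is an assumption of this sketch; the slides do not say how the segmentation is done.

```python
from graphlib import TopologicalSorter

def build_graph(n, references):
    """Map each utterance index to the set of utterances it refers to.

    references: list of (source, target) pairs, source referring to target.
    """
    graph = {i: set() for i in range(n)}
    for src, dst in references:
        graph[src].add(dst)
    return graph

def threads(graph):
    """Group utterances into threads (connected components, ignoring direction)."""
    parent = {v: v for v in graph}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path compression
            v = parent[v]
        return v

    for src, targets in graph.items():
        for dst in targets:
            parent[find(src)] = find(dst)
    groups = {}
    for v in graph:
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())

graph = build_graph(5, [(1, 0), (2, 1), (4, 3)])
# Referred utterances come before the utterances that refer to them.
order = list(TopologicalSorter(graph).static_order())
chat_threads = threads(graph)
```

With these references, utterances 0 to 2 form one thread and utterances 3 and 4 another.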
Utterances’ Strength
The importance of an utterance in a conversation can be computed using:
    Length
    The importance of the words
Another approach: an utterance is important if it influences the further evolution of the conversation
An important utterance is referenced by many further utterances
Thus, the importance can be considered as a measure of the strength of the utterance
The utterance is strong if it influences the rest of the conversation (like breaking news on TV)
Computed recursively:
    Utterance strength = 1 + param1 * number of references + param2 * sum of the references’ strength
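The recursive formula can be sketched as follows, reading "references" as the later utterances that refer to the one being scored. The values of `param1` and `param2` are arbitrary placeholders, since the slides do not give them.

```python
from functools import lru_cache

def utterance_strengths(n, references, param1=0.5, param2=0.25):
    """Compute the strength of each of the n utterances.

    references: list of (source, target) pairs, source referring to target.
    param1, param2: placeholder weights, not the values used by the authors.
    """
    referenced_by = {i: [] for i in range(n)}
    for src, dst in references:
        referenced_by[dst].append(src)

    @lru_cache(maxsize=None)
    def strength(i):
        refs = referenced_by[i]
        # The recursion terminates because the conversation graph is acyclic.
        return 1 + param1 * len(refs) + param2 * sum(strength(r) for r in refs)

    return [strength(i) for i in range(n)]

# Utterance 1 refers to 0, utterance 2 refers to 1.
strengths = utterance_strengths(3, [(1, 0), (2, 1)])
```

An utterance that nobody refers to has strength 1, and strength accumulates backwards along reference chains, so the first utterance in the example is the strongest. For very long chats an iterative pass in reverse topological order would avoid deep recursion.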
Visual Representation
Conclusions
Social-semantic data extracted from conversations:
    Discovery and visualisation of the discourse
    Determining important utterances
    Assessing the competencies
    Searching for references between utterances
Successfully integrated ideas and techniques from:
    Socio-cultural and dialogic paradigm
    Classical cognitive paradigm: ontologies and knowledge-based processing
    Natural language processing
Conclusions (2)
Machine learning for the automatic discovery of the rules that define implicit references
    A chat annotation tool has been built
    Started creating an annotated chat corpus to be used as a gold standard
Improving the method used to compute the competences by integrating SNA techniques
Use domain ontologies and/or pLSA
Current and further work is part of the LTfLL FP7 project
Thank You!