from%keystrokes%to%annotated%process% · pdf...

1
From keystrokes to annotated process data: enriching the output of Inputlog with linguis9c informa9on Lieve Macken & Veronique Hoste (LT 3 ) Mariëlle Leijten (FWO/UA) & Luuk van Waes (UA) Introduc9on Keystroke logging is an unobtrusive way to monitor wriMen language produc9on. The method is applied to study a wide range of topics from a cogni9ve, strategic or developmental perspec9ve a.o. professional wri9ng in educa9onal sePngs, second language wri9ng, spelling errors, revision strategies and transla9on processes. Inputlog is wordprocessor independent keystrokelogging program that not only registers keystrokes, mouse movements, clicks and pauses in MS word, but also in any other Windowsbased applica9on. Moreover, also speech input via Dragon Naturally Speaking (Nuance) can be logged. Inputlog is freely available for research purposes via www.inputlog.net Keystroke logging tools are widely used to monitor written language production. Basically, they record all keystrokes, including backspaces and deletions, together with timing information. To open the way for more linguistically-oriented writing process research, we enhanced the keystroke- logging program Inputlog by aggregating the logged process data from the keystroke (character) level to the word level. The logged process data are further enriched with different kinds of linguistic information: part-of-speech tags, lemmata, chunk boundaries, syllable boundaries and word frequency. Snota9on The Snota9on is the de facto standard for represen9ng the nonlinearity of the wri9ng process. It displays all revisions at their posi9ons in the text and can be automa9cally generated from the keystrokelogging data. Conven9ons: | i A break in the wri9ng process with sequen9al number i {inser9on} i An inser9on occurring aZer break i [dele9on] i A dele9on occurring aZer break i Linguis9c annota9ons Final product data and dele9ons are further enriched with linguis9c annota9ons. To op9mize the results of the shallow linguis9c analysis, the dele9ons in context are analyzed. LT 3 tools suite: PoS tagger/lemma9zer/chunker Syllabifica9on Word frequency informa9on derived from Wikipedia The resul9ng output is presented in a table format and will be rendered in XML format. Focus on English and Dutch as proofofconcept Postedi9ng example Use of Inputlog in a transla9on context To study postedi9ng of Machine Transla9ons Temporal informa9on can shed light on the MT passages that were difficult to process Logging of all windowsbased events to study external sources that were consulted A dedicated parser has been developed that dis9ls from the logged process data wordlevel revisions, deleted fragments and final product data. The parser can handle nested revisions. Within word typing errors are excluded from further analyses. Acknowledgements This research was funded by Flanders Research Founda9on (FWO)

Upload: vunhi

Post on 08-Mar-2018

217 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: From%keystrokes%to%annotated%process% · PDF filein%educaonal%sengs,%second%language%wri9ng,%spelling%errors,%revision% ... they record all ... part-of-speech tags, lemmata, chunk

From  keystrokes  to  annotated  process  data:  enriching  the  output  of  Inputlog  with  linguis9c  informa9on  Lieve  Macken  &  Veronique  Hoste  (LT3)    Mariëlle  Leijten  (FWO/UA)  &  Luuk  van  Waes  (UA)  

Introduc9on    Keystroke  logging  is  an  unobtrusive  way  to  monitor  wriMen  language  produc9on.  The  method  is  applied  to  study  a  wide  range  of  topics  from  a  cogni9ve,  strategic  or  developmental  perspec9ve  a.o.  professional  wri9ng  in  educa9onal  sePngs,  second  language  wri9ng,  spelling  errors,  revision  strategies  and  transla9on  processes.      Inputlog  is  word-­‐processor  independent  keystroke-­‐logging  program  that  not  only  registers  keystrokes,  mouse  movements,  clicks  and  pauses  in  MS  word,  but  also  in  any  other  Windows-­‐based  applica9on.  Moreover,  also  speech  input  via  Dragon  Naturally  Speaking  (Nuance)  can  be  logged.  Inputlog  is  freely  available  for  research  purposes  via  www.inputlog.net  

Keystroke logging tools are widely used to monitor written language production. Basically, they record all keystrokes, including backspaces and deletions, together with timing information. To open the way for more linguistically-oriented writing process research, we enhanced the keystroke-logging program Inputlog by aggregating the logged process data from the keystroke (character) level to the word level. The logged process data are further enriched with different kinds of linguistic information: part-of-speech tags, lemmata, chunk boundaries, syllable boundaries and word frequency.

S-­‐nota9on    The  S-­‐nota9on  is  the  de  facto  standard  for  represen9ng  the  non-­‐linearity  of  the  wri9ng  process.  It    displays  all  revisions  at  their  posi9ons  in  the  text  and  can  be  automa9cally  generated  from  the  keystroke-­‐logging  data.    Conven9ons:            |i                                                A  break  in  the  wri9ng  process  with    sequen9al  number  i            {inser9on}i                      An  inser9on  occurring  aZer  break  i            [dele9on]  i                        A  dele9on  occurring  aZer  break  i            

Linguis9c  annota9ons    Final  product  data  and  dele9ons  are  further  enriched  with  linguis9c  annota9ons.  To  op9mize  the  results  of  the  shallow  linguis9c  analysis,  the  dele9ons  in  context  are  analyzed.    LT3  tools  suite:  •  PoS  tagger/lemma9zer/chunker  •  Syllabifica9on  •  Word  frequency  informa9on  derived  from  Wikipedia  

The  resul9ng  output  is  presented  in  a  table  format  and  will  be  rendered  in  XML  format.    Focus  on  English  and  Dutch  as  proof-­‐of-­‐concept  

Post-­‐edi9ng  example    Use  of  Inputlog  in  a  transla9on  context  •  To  study  post-­‐edi9ng  of  Machine  Transla9ons  •  Temporal  informa9on  can  shed  light  on  the  MT  passages  that  were  

difficult  to  process  •  Logging  of  all  windows-­‐based  events  to  study  external  sources  that  

were  consulted  A  dedicated  parser  has  been  developed  that  dis9ls  from  the  logged  process  data  word-­‐level  revisions,  deleted  fragments  and  final  product  data.  The  parser  can  handle  nested  revisions.    

Within  word  typing  errors  are  excluded  from  further  analyses.    

Acknowledgements    

This  research  was  funded  by  Flanders  Research  Founda9on  (FWO)