From keystrokes to annotated process data: enriching the output of Inputlog with linguistic information
Lieve Macken & Veronique Hoste (LT3), Mariëlle Leijten (FWO/UA) & Luuk van Waes (UA)
Introduction
Keystroke logging is an unobtrusive way to monitor written language production. The method is applied to study a wide range of topics from a cognitive, strategic or developmental perspective, among others professional writing in educational settings, second language writing, spelling errors, revision strategies and translation processes.
Inputlog is a word-processor-independent keystroke-logging program that registers keystrokes, mouse movements, clicks and pauses not only in MS Word but also in any other Windows-based application. Speech input via Dragon Naturally Speaking (Nuance) can also be logged. Inputlog is freely available for research purposes via www.inputlog.net
Keystroke logging tools are widely used to monitor written language production. Basically, they record all keystrokes, including backspaces and deletions, together with timing information. To open the way for more linguistically-oriented writing process research, we enhanced the keystroke-logging program Inputlog by aggregating the logged process data from the keystroke (character) level to the word level. The logged process data are further enriched with different kinds of linguistic information: part-of-speech tags, lemmata, chunk boundaries, syllable boundaries and word frequency.
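The aggregation from the keystroke level to the word level can be illustrated with a minimal sketch. It does not use Inputlog's actual log format: the event representation (a timestamp in milliseconds plus a key, with "BACK" standing for backspace), the `Word` record and the function name are all assumptions made for illustration, and only linear typing with backspaces inside the current word is handled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    text: str
    start_ms: int         # time of the first keystroke of the word
    end_ms: int           # time of the last keystroke that changed the word
    pause_before_ms: int  # gap between the previous keystroke and the word's first key


def aggregate_to_words(events: List[Tuple[int, str]]) -> List[Word]:
    """Group (timestamp_ms, key) events into words (hypothetical event format)."""
    words: List[Word] = []
    chars: List[str] = []
    start = end = pause = 0
    last_time = None  # timestamp of the previous event of any kind
    for t, key in events:
        if key == "BACK":
            # Backspace only edits the word currently being typed.
            if chars:
                chars.pop()
                end = t
        elif key.isspace():
            # Whitespace closes the current word, if there is one.
            if chars:
                words.append(Word("".join(chars), start, end, pause))
                chars = []
        else:
            if not chars:
                start = t
                pause = 0 if last_time is None else t - last_time
            chars.append(key)
            end = t
        last_time = t
    if chars:
        words.append(Word("".join(chars), start, end, pause))
    return words
```

Typing "the", a space, and then "cat" after a one-second pause yields two `Word` records; the second one carries the one-second gap in `pause_before_ms`, which is the kind of word-level timing information that character-level logs do not expose directly.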
S-notation
The S-notation is the de facto standard for representing the non-linearity of the writing process. It displays all revisions at their positions in the text and can be automatically generated from the keystroke-logging data.
Conventions:
|i  A break in the writing process with sequential number i
{insertion}i  An insertion occurring after break i
[deletion]i  A deletion occurring after break i
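To make the conventions concrete, the sketch below reads an S-notation string and recovers the final text together with the deleted fragments, including nested revisions. It is not the parser used in Inputlog. It assumes a plain-text rendering in which the break number is written as digits directly after the closing bracket or the bar, that the input is well-formed, and that the written text itself contains none of the characters `{ } [ ] |`; the function name is hypothetical.

```python
from typing import List, Optional, Tuple


def parse_s_notation(s: str) -> Tuple[str, List[Tuple[Optional[int], str]]]:
    """Return (final_text, deletions) for a well-formed S-notation string.

    Each deletion is a pair (break_number, deleted_text).
    """
    stack: List[List[str]] = [[]]  # stack[0] collects the final text
    deletions: List[Tuple[Optional[int], str]] = []

    def read_index(pos: int) -> Tuple[str, int]:
        # Read the run of digits starting at pos; return it and the next position.
        j = pos
        while j < len(s) and s[j].isdigit():
            j += 1
        return s[pos:j], j

    i = 0
    while i < len(s):
        ch = s[i]
        if ch in "{[":
            stack.append([])  # open a nested insertion or deletion
            i += 1
        elif ch in "}]":
            content = "".join(stack.pop())
            idx, i = read_index(i + 1)
            if ch == "}":
                stack[-1].extend(content)  # insertions stay in the enclosing text
            else:
                deletions.append((int(idx) if idx else None, content))
        elif ch == "|":
            _, i = read_index(i + 1)  # break markers carry no text
        else:
            stack[-1].append(ch)
            i += 1
    return "".join(stack[0]), deletions
```

For the string `The {big }1|2cat sat[s]2|1`, the function returns the final text `The big cat sat` and one deletion, `s`, made after break 2. Using a stack of buffers is what lets a deletion inside an insertion (or the reverse) be resolved at the correct nesting level.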
Linguistic annotations
Final product data and deletions are further enriched with linguistic annotations. To optimize the results of the shallow linguistic analysis, the deletions are analyzed in context.
LT3 tools suite:
• PoS tagger/lemmatizer/chunker
• Syllabification
• Word frequency information derived from Wikipedia
The resulting output is presented in a table format and will be rendered in XML format. The focus is on English and Dutch as a proof of concept.
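As a rough illustration of how table rows with linguistic annotations could be rendered in XML, the sketch below serializes one row per token with Python's standard `xml.etree.ElementTree` module. The element names (`tokens`, `token`) and the attribute keys are assumptions made for this example only; the actual Inputlog output schema is not described here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def tokens_to_xml(tokens: List[Dict[str, object]]) -> str:
    """Serialize annotated tokens to an XML string (hypothetical schema).

    Each token is a dict with a "text" key; every other key (for example
    pos, lemma, chunk) becomes an attribute of the <token> element.
    """
    root = ET.Element("tokens")
    for tok in tokens:
        attrs = {key: str(value) for key, value in tok.items() if key != "text"}
        element = ET.SubElement(root, "token", attrs)
        element.text = str(tok["text"])
    return ET.tostring(root, encoding="unicode")
```

Keeping the annotations as attributes mirrors the column layout of the table view: each column maps to one attribute, so further annotation layers can be added without changing the structure.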
Post-editing example
Use of Inputlog in a translation context:
• To study post-editing of machine translations
• Temporal information can shed light on the MT passages that were difficult to process
• Logging of all Windows-based events to study external sources that were consulted
A dedicated parser has been developed that distils word-level revisions, deleted fragments and final product data from the logged process data. The parser can handle nested revisions.
Within-word typing errors are excluded from further analysis.
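One way such a filter could work is sketched below. The criterion is a hypothetical heuristic, not the rule applied by the Inputlog parser: a deletion counts as a within-word typing error when the deleted fragment contains no whitespace and is attached to a letter or digit on at least one side, so that it covers only part of a word. The function name and its three arguments are likewise assumptions.

```python
def is_within_word(before: str, deleted: str, after: str) -> bool:
    """Heuristic check whether a deletion stays inside a single word.

    before / after: the text immediately left and right of the deleted fragment.
    """
    # Empty fragments and fragments spanning whitespace are not within-word edits.
    if not deleted or any(ch.isspace() for ch in deleted):
        return False
    touches_left = bool(before) and before[-1].isalnum()
    touches_right = bool(after) and after[0].isalnum()
    # Attached to word characters on either side: only part of a word was deleted.
    return touches_left or touches_right
```

Under this heuristic, deleting the `r` from a freshly typed `car` is flagged as a typing error, while deleting the whole word `cat` between two spaces is kept as a word-level revision.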
Acknowledgements
This research was funded by the Flanders Research Foundation (FWO).