![Page 1: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/1.jpg)
Conference September 2013
Text analysis software needs more common sense and less intelligence!
John S. Lemon, University of Aberdeen
![Page 2: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/2.jpg)
Open Day 2013
IT Services
John S. Lemon
Student Liaison Officer
![Page 3: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/3.jpg)
Introduction
• History – setting the scene
• Problem – move from quantitative to qualitative
• Etc.
![Page 4: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/4.jpg)
Introduction
• History – setting the scene
• Problem – move from quantitative to qualitative
• How – analysis / reporting
• Quantity – increases each year
• Constraints
  – Reports required earlier each year
  – Very limited budget
![Page 5: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/5.jpg)
Disclaimer
• I am not a statistician – I just have to present reports
• When I started at university in 1975 almost all data was numeric / quantitative
• For the purposes of this paper I emulated a naive user
• To carry out the analysis there is no budget for:
  – Software
  – Training
![Page 6: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/6.jpg)
History
• IT Services (formerly DISS & DIT) runs an annual survey of:
  – Staff
  – Students
• Purpose is to identify satisfaction with facilities and service
• Originally on paper and scanned – almost entirely tick boxes
• Moved to web but retained ‘tick box’ format
![Page 7: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/7.jpg)
History
• Converted to WebHost around 2008/9
• Still retained the mainly quantitative original format
![Page 8: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/8.jpg)
History
• SNAP had been used to create Student Course Evaluation Forms (SCEF)
• On paper since 1999 – two sides of Likert scales
• Only one free text box
• 60,000 forms scanned / year
• In 2010 deemed to be ‘not green’ / ecological
• Move to special web based software
• Move to free text comments
![Page 9: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/9.jpg)
History
• This is the 2007 paper form
• As SCEF forms had changed approach it was decided the annual survey would do the same
• Fewer tick boxes
![Page 10: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/10.jpg)
History
• From 2011 some check boxes but more free text options.
![Page 11: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/11.jpg)
Problem - quantitative to qualitative
• Report generation could no longer rely on
  – charts
  – tables
• No thought given to how to cope with free text
• First year one person (me)
  – ‘Skimmed’ the responses
  – Subdivided according to which area of service was commented on
  – Passed to section heads for action and responses
![Page 12: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/12.jpg)
Problem - quantitative to qualitative
• Second year – manual coding
• Excel file of case number and free text comments
• Plus extra columns for coding comments / categorisation
• Code values were “Positive”, “Negative” or “Ambiguous”
• Limited number of categories
• Needed consistency so one person coded all
![Page 13: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/13.jpg)
Problem - quantitative to qualitative
• Once coded, loaded into SPSS
• Merged with original file
• Produced tables and charts combining demographic data and coded values
• Extremely labour intensive
• Needed an iterative approach for accuracy
  – Categories were too broad or too detailed
  – Codes were too restrictive
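The merge and tabulation step described above was done in SPSS. As a rough illustration only, the same idea can be sketched in plain Python. All the rows, the column names (`case`, `category`, `code`, `group`) and both helper functions are made up for this sketch and are not taken from the actual survey files.

```python
from collections import Counter

# Hypothetical rows standing in for the Excel coding sheet:
# case number, free text comment, category and code value.
coded = [
    {"case": 1, "comment": "Wifi keeps dropping", "category": "Wireless", "code": "Negative"},
    {"case": 2, "comment": "Help desk staff were great", "category": "Service Desk", "code": "Positive"},
    {"case": 3, "comment": "Printing is OK I suppose", "category": "Printing", "code": "Ambiguous"},
]

# Hypothetical demographic rows standing in for the original survey file.
demographics = [
    {"case": 1, "group": "Student"},
    {"case": 2, "group": "Staff"},
    {"case": 3, "group": "Student"},
]

def merge_on_case(coded_rows, demographic_rows):
    """Join the two tables on case number, keeping only coded cases."""
    by_case = {row["case"]: row for row in demographic_rows}
    return [{**by_case[row["case"]], **row} for row in coded_rows if row["case"] in by_case]

def crosstab(rows, row_key, col_key):
    """Count how often each (row_key, col_key) pair occurs."""
    return Counter((row[row_key], row[col_key]) for row in rows)

merged = merge_on_case(coded, demographics)
table = crosstab(merged, "group", "code")
for (group, code), n in sorted(table.items()):
    print(f"{group:8} {code:10} {n}")
```

The join and the count are the easy part. The labour is in filling in the `category` and `code` columns by hand, which is why the iterative recoding was so expensive.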
![Page 14: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/14.jpg)
Problem - quantitative to qualitative
• This year attempted a new approach
• Use software
• New / updated versions of:
  – SNAP (11)
  – NVivo (10)
  – STAFS – SPSS Text Analysis For Surveys (4)
• Also consider use of concordance software
![Page 15: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/15.jpg)
Problem - quantitative to qualitative
• Why choose these four products?
  – SNAP
    • Already had so no extra cost
    • Had SNAP format files so no translating / transforming the data
  – NVivo
    • Like SNAP already had on site
    • Claims that it would meet all requirements
    • Takes data from many sources
![Page 16: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/16.jpg)
Problem - quantitative to qualitative
• Why choose these four products?
  – SPSS Text Analysis For Surveys
    • Reads SPSS files which SNAP would create
    • Export coded categories back to SPSS
    • Being considered for site licence
  – Concordance
    • Language / literature department recommendation
    • Cheap
    • Appeared easy to use
![Page 17: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/17.jpg)
SNAP
• Survey had been done in SNAP so tried first
• New features are:
  – Word ‘cloud’
  – Auto coding of text / words
• Can combine all the free text questions into one new ‘derived’ / auto-recoded variable
![Page 18: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/18.jpg)
SNAP
• Not very helpful
• Is there a difference between ‘computer’ and ‘computers’?
[Figure: word cloud of free text comments. Words shown include work, good, computers, computer, time, service, staff, eduroam, MyAberdeen, internet, Services, access, library, university, helpful, help, problems, slow, Campus, students.]
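A word cloud is essentially a picture of word counts, which is why ‘computer’ and ‘computers’ appear as two separate words. The sketch below is an assumption about the general technique, not a description of how SNAP works internally. The stop list, the sample comments and the `fold_plurals` helper are all invented for illustration.

```python
import re
from collections import Counter

# Illustrative stop list; SNAP's own list is not reproduced here.
STOP_WORDS = {"the", "a", "is", "in", "are", "and", "to", "of", "my"}

def word_counts(comments, stop_words=STOP_WORDS):
    """Count lower-cased words across all comments, skipping stop words."""
    counts = Counter()
    for comment in comments:
        for word in re.findall(r"[a-z'-]+", comment.lower()):
            if word not in stop_words:
                counts[word] += 1
    return counts

def fold_plurals(counts):
    """Naively add a plural's count to its singular when both forms occur."""
    folded = Counter()
    for word, n in counts.items():
        singular = word[:-1]
        if word.endswith("s") and singular in counts:
            folded[singular] += n
        else:
            folded[word] += n
    return folded

comments = [
    "The computers in the library are slow",
    "My computer is slow",
    "Library computers are helpful",
]
raw = word_counts(comments)
print(raw["computer"], raw["computers"])   # counted as two different words
print(fold_plurals(raw)["computer"])       # one combined count after folding
```

Folding a trailing "s" is about the crudest rule possible. It handles ‘computers’ but does nothing for different names for the same thing.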
![Page 19: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/19.jpg)
SNAP
• Not only ‘computer(s)’ presented problems
• But all the different terms students use for the wireless network
• These are the more obvious spellings – ignoring the misspellings
• Not ideal as did not allow for synonyms
![Page 20: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/20.jpg)
SNAP - limitations
• Has a ‘Stop’ list – words to exclude
• No equivalent list to create synonyms
• Would like to be able to do: {wifi,wi-fi,eduroam,resnet,wireless}={wireless}
• Not just a limitation of SNAP word cloud
• In the time available could not find how to export auto-coded variables to SPSS
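The wished-for synonym list above is simple to express outside SNAP. A minimal sketch, assuming a plain lookup table applied before counting: the synonym set is the one from the slide, while the sample comments and the `normalise` helper are hypothetical.

```python
import re
from collections import Counter

# The synonym set from the slide: every variant maps to one canonical term.
SYNONYMS = {
    "wireless": {"wifi", "wi-fi", "eduroam", "resnet", "wireless"},
}

# Invert to a lookup table: variant -> canonical term.
CANONICAL = {variant: canon for canon, variants in SYNONYMS.items() for variant in variants}

def normalise(comment):
    """Lower-case, tokenise and replace each known variant with its canonical term."""
    tokens = re.findall(r"[a-z0-9-]+", comment.lower())
    return [CANONICAL.get(token, token) for token in tokens]

comments = ["Eduroam is slow", "The wifi drops", "Wi-Fi in halls (ResNet) is poor"]
counts = Counter(token for c in comments for token in normalise(c))
print(counts["wireless"])
```

The table only covers spellings someone has thought to list, so misspellings still need to be found and added by hand.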
![Page 21: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/21.jpg)
Concordance
• Cheaper but very limited
• No ability to easily export the results
• Positive point is it shows need for synonyms !!
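For readers who have not met one, a concordance lists each occurrence of a word with a few words of context either side (keyword in context). The sketch below illustrates that general idea and does not reproduce the product that was tested. The `kwic` function, its prefix matching rule and the sample comments are assumptions.

```python
import re

def kwic(comments, keyword, width=3):
    """Return (left context, matched word, right context) for each hit.

    A word counts as a hit when it starts with `keyword`, so that
    variant spellings show up side by side.
    """
    hits = []
    for comment in comments:
        words = re.findall(r"[\w'-]+", comment)
        for i, word in enumerate(words):
            if word.lower().startswith(keyword.lower()):
                left = " ".join(words[max(0, i - width):i])
                right = " ".join(words[i + 1:i + 1 + width])
                hits.append((left, word, right))
    return hits

comments = [
    "The wireless in the library is slow",
    "Cannot get wirless to work in halls",
]
for left, word, right in kwic(comments, "wir"):
    print(f"{left:>20} | {word:^10} | {right}")
```

Seeing ‘wireless’ and ‘wirless’ lined up like this is exactly what shows the need for a synonym list.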
![Page 22: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/22.jpg)
NVivo
• Very powerful
• Accepts data from a wide variety of sources:
  – Text
  – Video
  – Pictures
  – Web
  – Social media
  – Etc.
![Page 23: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/23.jpg)
NVivo
• Data needed some pre-preparation before input
• Some of the concepts weren’t obvious
• Took a number of attempts to get the data into the correct format
• It will combine terms
  – But may not be exactly what you want
  – Some of the words for ‘connect’ are quite imaginative to say the least
![Page 24: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/24.jpg)
NVivo
![Page 25: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/25.jpg)
NVivo
• Depending on how ‘tight’ or ‘loose’ the word associations were made, could end up with entirely different results / word clouds
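How NVivo decides which words belong together is not documented in these slides. Purely as an illustration of why a ‘tight’ or ‘loose’ setting changes the result, the sketch below groups words by string similarity using Python's standard `difflib`. This is a stand-in technique, not NVivo's method, and the vocabulary and cutoff values are invented.

```python
from difflib import SequenceMatcher

def group_terms(term, vocabulary, cutoff):
    """Return the words whose similarity to `term` is at least `cutoff`."""
    return sorted(
        word for word in vocabulary
        if SequenceMatcher(None, term, word).ratio() >= cutoff
    )

vocabulary = ["connect", "connected", "connection", "connectivity", "contact", "slow"]
tight = group_terms("connect", vocabulary, cutoff=0.8)
loose = group_terms("connect", vocabulary, cutoff=0.7)
print(tight)
print(loose)
```

With the tighter cutoff only the close variants of ‘connect’ are grouped. The looser cutoff also pulls in ‘contact’, which is a different idea altogether, so the counts behind any word cloud change with the setting.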
![Page 26: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/26.jpg)
NVivo
• Found difficulty in:
  – Trying to get the data categorised
  – Exporting the results to merge back to SPSS
  – Alternatively, trying to produce tables and charts linked to demographic data within NVivo
• Problems with all the different software were:
  – Time to learn all idiosyncrasies
  – Impatient line managers
  – Nomenclature
![Page 27: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/27.jpg)
STAFS
• Appears to be very powerful and comprehensive
• Very large manual
• Like NVivo, has different nomenclature for the aspects of analysis
• Will read data from SPSS files
  – Providing the text fields are less than 4000 characters in length
• Looked the most promising to solve the problem
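The 4000 character limit is worth checking before the data is loaded. A minimal sketch, assuming the responses are already available as a mapping of case number to comment text (the `responses` data and the `too_long` helper are hypothetical):

```python
MAX_CHARS = 4000  # limit quoted on the slide: fields must be shorter than this

def too_long(responses, limit=MAX_CHARS):
    """Return the case numbers whose comment is `limit` characters or longer."""
    return [case for case, text in responses.items() if len(text) >= limit]

responses = {101: "Wifi is slow", 102: "x" * 4000, 103: "y" * 3999}
print(too_long(responses))
```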
![Page 28: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/28.jpg)
STAFS
• Foolishly left it until last for evaluation
• Very little time left to get to grips with yet another set of concepts
• The deadline for the report was approaching so not a lot of time
• Also only a trial version, which lasted 14 days
• Appears to have a bit more intelligence in matching words together
![Page 29: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/29.jpg)
STAFS
![Page 30: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/30.jpg)
STAFS
• Has the ability to indicate “good” and “bad” phrases in green and red
• It also highlights the context in amber
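The general idea of marking “good” and “bad” phrases can be sketched with two phrase lists. This is a naive illustration, not how STAFS works, and the phrase lists and the `code_comment` function are invented. The output codes reuse the “Positive”, “Negative” and “Ambiguous” values from the manual coding.

```python
# Illustrative phrase lists; the real STAFS resource file is not reproduced here.
GOOD = {"helpful", "fast", "easy to use"}
BAD = {"slow", "unreliable", "keeps dropping"}

def code_comment(comment, good=GOOD, bad=BAD):
    """Code a comment as Positive, Negative or Ambiguous from phrase hits."""
    text = comment.lower()
    good_hits = sorted(p for p in good if p in text)
    bad_hits = sorted(p for p in bad if p in text)
    if good_hits and not bad_hits:
        code = "Positive"
    elif bad_hits and not good_hits:
        code = "Negative"
    else:
        code = "Ambiguous"
    return code, good_hits, bad_hits

print(code_comment("Staff were helpful but the wifi is slow"))
print(code_comment("Not helpful at all"))
```

The second comment is coded “Positive” because the list knows ‘helpful’ but nothing about ‘not’. That is the gap between matching words and common sense.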
![Page 31: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/31.jpg)
STAFS
• Problem is that the file that ‘drives’ this appears to be rather general in approach
• To really be useful in future it needs tailoring
• Ran out of time to really develop expertise in this
• Potential to apply a level of ‘common sense’
• Not easy to actually do in the time available
• Export back to merge with SPSS appeared OK
• But had to abandon any further experiments
![Page 32: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/32.jpg)
What was used finally
• Time for testing / experimentation had run out
• Only one course of action
  – By hand
  – One person – me
• Scale of problem
  – When loaded into Word as single spaced, normal margins, 12 pt Calibri
  – Just under 500 pages
• A ream of paper
![Page 33: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/33.jpg)
Next year
• Try and get a longer trial period for STAFS
• Experiment with this year’s data to provide a coding file
• Use STAFS from the start
![Page 34: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/34.jpg)
Conclusion
• Don’t try and learn a lot of new software when there are deadlines from “management”
• Word clouds don’t help much
• A concordance really only highlights speeling idiosyncrasies
• Care must be taken when allowing software to make choices in coding
![Page 35: Conference September 2013](https://reader035.vdocuments.site/reader035/viewer/2022062315/5681661e550346895dd97097/html5/thumbnails/35.jpg)
Conclusion
• Does text analysis software have intelligence?
• Up to a point
• Does it have common sense?
• Of the four tried only one does
BUT
• It needs teaching “common sense” and that takes time
• Just like a child !!