www.the-data-mine.co.uk
Best of UseR! 2011
A personal & biased view with an emphasis on data visualisation
Andy Pryke
Birmingham R User Meeting, 20th March 2012
My Bias…
I work in commercial data mining, data analysis and data visualisation
Background in computing and artificial intelligence
Use R to write programs which analyse data
Using Google Visualisation API from R
Speaker: Markus Gesmann, Lloyds
Motivation: Display statistics about publications on a website
• 18 different charts are available through the Google API
• Requires internet access and is viewed through a web browser
• Data is embedded in HTML, with a call to Google's JavaScript visualisation API
• Using rApache you can mix HTML & R (a bit like Sweave)
• Can update the data & look of a chart from R by modifying the object returned by the plotting method
Google Visualisation API - Code
install.packages("googleVis")
library("googleVis")
demo("googleVis")
demo(package="googleVis")
# Example from demo:
require(datasets)
states <- data.frame(state.name, state.x77)
GeoStates <- gvisGeoChart(states, "state.name", "Illiteracy",
                          options=list(region="US", displayMode="regions",
                                       resolution="provinces",
                                       width=600, height=400))
plot(GeoStates)
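A rough sketch of what "modifying the object returned by the plotting method" can look like. As far as I understand it, a gvis object is a nested list whose `html` element holds the pieces of the page as character strings. The extra footer text is made up for illustration, and the guard simply skips the example if googleVis is not installed.

```r
# Same data as the demo: one row per US state
states <- data.frame(state.name, state.x77)

if (requireNamespace("googleVis", quietly = TRUE)) {
  GeoStates <- googleVis::gvisGeoChart(
    states, "state.name", "Illiteracy",
    options = list(region = "US", displayMode = "regions",
                   resolution = "provinces", width = 600, height = 400))

  # The returned object is a list; the HTML and JavaScript live under $html
  print(names(GeoStates))
  print(names(GeoStates$html))

  # Change the look of the page by editing one piece of the object.
  # The note text is hypothetical, purely for illustration.
  GeoStates$html$footer <- paste0("<p>Illiteracy by US state</p>",
                                  GeoStates$html$footer)

  # plot(GeoStates)  # would open the modified chart in a web browser
}
```

The same idea applies to the data: rebuild or edit the relevant part of the object, then plot it again.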
Google Visualisation API – More info
Use at Lloyds: http://lloyds.com/stats
Video demo: http://goo.gl/zfQdG
Google Visualisation API - Examples
More Information…
In use on Lloyds website: http://lloyds.com/stats
Original Slides: http://web.warwick.ac.uk/statsdept/user-2011/TalkSlides/Contributed/16Aug_0950_Kaleid_Ib_2-Gesmann.pdf - Includes good list of other interesting packages
Nomograms for visualising relationships between three variables
Jonathan Rougier - Dept Mathematics, Univ. Bristol
Kate Milner - Crossroads Veterinary Centre, Buckinghamshire
How to Use R, in a Moroccan Marketplace, to Improve the Life of Donkeys
It's hard to weigh donkeys in North Africa, but useful to know their weight when prescribing drugs.
1) Measure the weight, height, girth, body condition, age and gender of donkeys.
2) Use R to create a predictive model of weight.
3) Create a nomographic model which can be used by vets on the ground.
How Heavy is that Donkey?
Initial Model – Complex !
sqrt(Weight) ~ BCSis + Gender + Age + log(HeartGirth) + log(Height) + log(HeartGirth):log(Height) + BCSis:log(HeartGirth) + Gender:log(HeartGirth) + Age:log(HeartGirth) + BCSis:log(Height) + Gender:log(Height) + Age:log(Height)
How Heavy is that Donkey?
Use stepAIC in the MASS package to simplify the model…
Final Model:
sqrt(Weight) ~ BCSis + Age + log(HeartGirth) + log(Height)
Still hard to use in a dusty marketplace though…
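The simplification step can be sketched roughly as follows. I don't have the donkey measurements, so the data frame is simulated, and its factor levels, ranges and generating coefficients are all invented. Only the starting formula and the call to stepAIC come from the slides. On simulated data the selected terms need not match the final model from the talk.

```r
library(MASS)  # recommended package, ships with R

set.seed(42)
n <- 300

# Synthetic stand-in data: NOT the donkey measurements from the talk
donkeys <- data.frame(
  BCSis      = factor(sample(c("2", "3", "4"), n, replace = TRUE)),
  Gender     = factor(sample(c("female", "male"), n, replace = TRUE)),
  Age        = factor(sample(c("<2y", "2-5y", ">5y"), n, replace = TRUE)),
  HeartGirth = runif(n, 90, 130),  # cm
  Height     = runif(n, 90, 115)   # cm
)

# Made-up generating process, so that there is some structure to find
donkeys$Weight <- (-28 + 5.5 * log(donkeys$HeartGirth) +
                     3.2 * log(donkeys$Height) +
                     0.4 * as.integer(donkeys$BCSis) +
                     rnorm(n, sd = 0.3))^2

# Initial model, as written on the slide
full <- lm(sqrt(Weight) ~ BCSis + Gender + Age + log(HeartGirth) + log(Height) +
             log(HeartGirth):log(Height) + BCSis:log(HeartGirth) +
             Gender:log(HeartGirth) + Age:log(HeartGirth) +
             BCSis:log(Height) + Gender:log(Height) + Age:log(Height),
           data = donkeys)

# Backward elimination by AIC (the default when no scope is given)
reduced <- stepAIC(full, trace = FALSE)
print(formula(reduced))

# Predict for one animal; squaring is a naive back-transform from sqrt(Weight)
new_donkey <- donkeys[1, ]
new_donkey$HeartGirth <- 110
new_donkey$Height <- 100
predicted_weight <- predict(reduced, newdata = new_donkey)^2
```

Even the reduced model needs logs, a square and a handful of coefficients, which is why a paper chart is attractive in the field.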
Solution - Nomograms
“Graphical representation of formula allowing calculations to be made using paper and a ruler”
Published in books & on charts to make complex calculations possible before calculators & computers
Ron Doerfler, 2009, The Lost Art of Nomography, The UMAP Journal, 30(4), pp. 457-493.
http://myreckonings.com/wordpress/wp-content/uploads/JournalArticle/The Lost Art of Nomography.pdf
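To show why a ruler is enough, here is a minimal sketch of the simplest kind of nomogram, with three parallel scales. It assumes a cut-down formula, sqrt(Weight) = intercept + b_girth * log(HeartGirth) + b_height * log(Height), with hypothetical coefficients. It is not the nomogram from the talk, which also handles body condition and age.

```r
# Hypothetical coefficients, for illustration only
intercept <- -28
b_girth   <- 5.5
b_height  <- 3.2

# Vertical position of a tick on each scale.
# Left scale sits at x = 0, right scale at x = 1, weight scale at x = 0.5.
left_pos  <- function(girth)  b_girth  * log(girth)
right_pos <- function(height) b_height * log(height)
mid_pos   <- function(weight) (sqrt(weight) - intercept) / 2

# "Reading" the chart: a straight line between the two outer ticks crosses
# the middle scale at the average of their heights.
read_nomogram <- function(girth, height) {
  y_mid <- (left_pos(girth) + right_pos(height)) / 2
  (intercept + 2 * y_mid)^2  # invert mid_pos() to get the weight label
}

weight_from_chart   <- read_nomogram(girth = 110, height = 100)
weight_from_formula <- (intercept + b_girth * log(110) + b_height * log(100))^2
```

The two weights agree because the midpoint of the line encodes the sum of the two outer terms. The logs and the square are baked into where the tick marks are printed, so nobody has to compute them in the field.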
More information…
Jonty’s home page, with links to slides & code: http://www.maths.bris.ac.uk/~MAZJCR/#pres
Presentation Slides: http://www.maths.bris.ac.uk/~MAZJCR/jontyUseR.pdf
The Design package also has a nomogram() function. It is no longer on CRAN, but old versions are available.
Easy interactive ggplots
Speaker: Richie Cotton
Clever use of the packages ggplot2 and gWidgetstcltk together, allowing clear and simple code for interactive control of charts
Example data: Chromium exposure of welders. Took air concentrations & urine samples (pre/post exposure)
More Information…
Links at: http://www.bitly.com/jV1NBn
Code linked directly from http://4dpiecharts.com/2011/08/17/user2011-easy-interactive-ggplots-talk/
See also: package gWidgets - wraps 5 UI toolkits
Predicting Personality from Social Network Data
Speaker: Daniel Chapsky, Hampshire College
This was quite a fast talk, but one of my favourite pieces of work, so apologies if I've misinterpreted anything!
The "Big 5" theory of personality holds that 5 dimensions can predict attitudes, views and behaviour
This work attempts to build a model which predicts someone's "big 5" values from Online Social Network (OSN) data
Predicting Personality - Data
• 615 respondents
• 100 question open source personality test, "IPIP NEO"
• Data from last.fm, Netflix, etc – e.g. genres listened to
• Distance from home town to current residence
  - liberality correlates with amount of moving around
• Mean income, education level
• Race inferred from surname
• Data was continuous
• Missing data was inferred using Gibbs sampling
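As an illustration of the "distance from home town to current residence" feature, here is a small base R sketch of a great-circle distance. The talk lists the geosphere package for this kind of thing. The function below is my own stand-in, assuming a spherical earth with a mean radius of 6371 km, and the respondent is hypothetical.

```r
# Haversine great-circle distance in km between two (lat, lon) points in degrees
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) just above 1
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Hypothetical respondent: grew up in Birmingham, now lives in London
moved_km <- haversine_km(52.4862, -1.8904, 51.5074, -0.1278)
print(round(moved_km))
```

Because the function is vectorised, it can be applied to whole columns of home and current coordinates in one call.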
Predicting Personality – Model
Continuous Bayesian networks
- discrete needs more data
- Often weaker prediction than black box
+ Clear semantics
+ Works with limited evidence
+ Hybrid network
Predicting Personality – Packages
Database connectivity - RMySQL
Web scraping / API connection - RCurl, RJSONIO, XML
Inference through mashups - psych, geosphere
Data cleaning - plyr, reshape2, BayesTree, mice, tm, mvoutlier
Bayesian network construction - bnlearn, pcalg
Parallelization of optimization - foreach, snow
Graphics - latticist, bnlearn, ggplot2
Agreeableness = 42.4
- 1.26(Sex.Missing)
- 2.47(Sex.Male)
- 25.99(Home.Teen.Prop)
- 0.63(Movie.Dystopia-Political)
- 0.49(Movie.Action-thriller)
+ 6.51(Wall.Status.Ratio)
+ 0.08(Conscientiousness)
- 0.29(Neuroticism)
R² = 0.46
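The equation above is easy to turn into a function. The coefficients are copied from the slide. The argument names are mine, and I'm assuming the two Sex terms are 0/1 indicators and that the other inputs are on whatever scales the original model used, which the slides don't spell out.

```r
# Linear predictor for Agreeableness, with coefficients as shown on the slide
agreeableness <- function(sex_missing, sex_male, home_teen_prop,
                          movie_dystopia_political, movie_action_thriller,
                          wall_status_ratio, conscientiousness, neuroticism) {
  42.4 -
    1.26  * sex_missing -
    2.47  * sex_male -
    25.99 * home_teen_prop -
    0.63  * movie_dystopia_political -
    0.49  * movie_action_thriller +
    6.51  * wall_status_ratio +
    0.08  * conscientiousness -
    0.29  * neuroticism
}

# Hypothetical respondent; every input value here is invented
example_score <- agreeableness(sex_missing = 0, sex_male = 1,
                               home_teen_prop = 0.2,
                               movie_dystopia_political = 3,
                               movie_action_thriller = 5,
                               wall_status_ratio = 0.4,
                               conscientiousness = 50, neuroticism = 40)
print(example_score)
```

With every input set to zero the function returns the intercept, 42.4, which is a quick check that the signs were copied correctly.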
More Information
Original Slides: http://web.warwick.ac.uk/statsdept/user-2011/TalkSlides/Contributed/17Aug_1115_FocusIII_5-DataMining_2-Chapsky.pdf