Post on 23-Aug-2020
Scientific and Large Data Visualization
Introduction to Information Visualization
Massimiliano Corsini
Visual Computing Lab, ISTI-CNR, Italy
Next lessons – Overview
• Introduction to Information Visualization
• Motivations
• Data Types, Graph Types and Visual Perception
• Multidimensional Data
• Graph Drawing
• Practice
– Visualization on the Web
– Javascript, WebGL
– D3.js
Lessons 14-15 – Intro to InfoVis
• Introduction and motivations
• Ingredients of effective visualization
• Data types
• Graph types
• Visual perception
Information Visualization
• Information visualization is the study of (interactive) visual representations of abstract data to reinforce human cognition. [Wikipedia]
• The use of computer-supported, interactive, visual representations of abstract data to amplify cognition. [Card et al. 1999]
• The use of computer graphics and interaction to assist humans in solving problems. [Purchase et al. 2008]
Information Visualization
• The purpose of information visualization is to amplify cognitive performance, not just to create interesting pictures. [Card 2007]
• Infographics is a visual tool for communication, for understanding and for analysis. [Alberto Cairo]
Information Visualization
• Difference from Scientific Visualization:
– Information visualization also treats abstract data (numerical and non-numerical data).
– In scientific visualization the spatial representation is given.
• Difference from Visual Analytics:
– In Visual Analytics the emphasis is on the reasoning/interaction loop.
InfoGraphics – Charts
From the Wall Street Journal
InfoGraphics – Diagrams
Juan Colombata & Enzo Oliva – La Voz del Interior (Argentina)
Motivations
Motivations
Which country is close to its historical maximum?
Motivations
Easier to answer… Why?
Rationale
• The human visual system (HVS) is very good at identifying and analyzing patterns.
• We can visualize data to make it easy for our brain to analyze, for example to make comparisons.
Effectiveness
• To be effective, a data visualization should take several factors into account:
– Data type
– Goal (function)
– Visual perception system
Functional Art
• Function does not dictate our choices, but it does restrict them.
• This is particularly true for infographics.
Example – Comparisons
From “The functional art” by Alberto Cairo
Cleveland and McGill 1984
Figure: scale of visual encodings, with arrows pointing toward “Less accurate”.
Adapted from “The functional art” by Alberto Cairo
Key Ingredients
• In the following we focus on:
– Data types
  • One dimension, two dimensions, N dimensions
  • Quantitative, Ordinal, Nominal
– Graph types
– Visual Perception
• We give some design guidelines along the way.
Data
• Information is obtained from data (!)
• Structured / Unstructured.
• Generated by sensors, by computers, by humans, etc.
Variable Types
• According to Stevens (1946):
– Nominal
  • Labels (e.g. apples, oranges, bananas)
– Ordinal
  • For ordering things (e.g. ranks of movies)
– Interval
  • Interval scale of measurements (e.g. time of departure/arrival)
– Ratio
  • Measures defined on a ratio scale (e.g. the mass of an object)
S. S. Stevens, “On the Theory of Scales of Measurement”, Science, 103, pp. 677-680, 1946.
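The four scales are cumulative: each one allows everything the previous one allows, plus something more. A minimal Python sketch of this idea follows. The names `Scale` and `permissible_statistics` are hypothetical, and only one representative statistic per level is listed.

```python
from enum import IntEnum


class Scale(IntEnum):
    """Stevens's four scales, ordered from weakest to strongest."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# One representative statistic that first becomes meaningful at each level
# (illustrative selection, not an exhaustive list).
_INTRODUCED_AT = {
    Scale.NOMINAL: "mode",
    Scale.ORDINAL: "median",
    Scale.INTERVAL: "mean",
    Scale.RATIO: "coefficient of variation",
}


def permissible_statistics(scale):
    """Return the statistics meaningful on `scale`, including inherited ones."""
    return [_INTRODUCED_AT[level] for level in Scale if level <= scale]


print(permissible_statistics(Scale.ORDINAL))
```

For example, a movie ranking (ordinal) supports a median but not a mean, while the mass of an object (ratio) supports all four.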
Variable Types
• Category
– Stevens's nominal class (e.g. country names, type of disease)
• Ordinal
– Labels expressing degree (e.g. cold, hot, very hot)
– In general, encoded as integer data.
• Quantitative
– Intervals, measures, etc.
– In general, real-numbered data.
Data Dimensions
• Common dimensions:
– Univariate, bivariate, trivariate
– Multivariate (N > 3 dimensions)
• Variables can be dependent or independent.
• Each case is a point in an N-dimensional space (data point).
Data Dimensions
• A set of data can be represented by a table with N columns (one for each variable).

          Variable 1   Variable 2   Variable 3
Data 1
Data 2
Data 3
Data 4
…
Data Relationships
• A table can be used to represent a relationship between different data.

          UserId    Gameid
Data 1    Smith     023923
Data 2    James     238548
Data 3    Frank     385753
Data 4    Powell    357352
…
Data Relationships
• 1-to-1
• 1-to-many
• Many-to-many
• Relationships may also have attributes
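The kind of relationship stored in a two-column table can be checked mechanically. The following Python sketch assumes the relationship is given as a list of (left, right) pairs. The function name `cardinality` is hypothetical, and a many-to-1 table is reported as "1-to-many" read in the other direction.

```python
from collections import defaultdict


def cardinality(pairs):
    """Classify a list of (left, right) pairs as 1-to-1, 1-to-many or many-to-many."""
    rights_per_left = defaultdict(set)
    lefts_per_right = defaultdict(set)
    for left, right in pairs:
        rights_per_left[left].add(right)
        lefts_per_right[right].add(left)
    left_fans_out = any(len(s) > 1 for s in rights_per_left.values())
    right_fans_out = any(len(s) > 1 for s in lefts_per_right.values())
    if left_fans_out and right_fans_out:
        return "many-to-many"
    if left_fans_out or right_fans_out:
        return "1-to-many"
    return "1-to-1"


# The UserId/Gameid rows shown in the Data Relationships table.
plays = [("Smith", "023923"), ("James", "238548"),
         ("Frank", "385753"), ("Powell", "357352")]
print(cardinality(plays))
```

With only those four rows every user has one game and every game one user, so the check reports a 1-to-1 relationship. A fifth row giving Smith a second game would turn it into 1-to-many.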
What we want..
• A set of well-formed and interconnected tables of data.
What we have..
• Data does not come in the form we would like.
• Data may have inconsistencies:
– Corrupted data
– Missing data
– Several values may be equivalent (e.g. text field “R&D”, “Research”, “Research and Development”, “r*d”)
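One common way to handle equivalent values is a lookup table that maps every known variant to a single canonical label. The sketch below uses the variants from the example above. The names `ALIASES` and `normalize` are hypothetical, and choosing “Research and Development” as the canonical form is an assumption.

```python
CANONICAL = "Research and Development"

# Known spellings of the same department, keyed in lower case.
ALIASES = {
    "r&d": CANONICAL,
    "research": CANONICAL,
    "research and development": CANONICAL,
    "r*d": CANONICAL,
}


def normalize(label):
    """Map a free-text label to its canonical form; unknown labels pass through."""
    key = " ".join(label.split()).lower()  # collapse whitespace, ignore case
    return ALIASES.get(key, label)


print([normalize(v) for v in ["R&D", " research ", "r*d", "Sales"]])
```

Labels that are not in the table are returned unchanged, so unexpected variants stay visible and can be added to the table later.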
Data Processing Pipeline
• Data does not come in the form we would like.
• Typical data processing pipeline (from raw data to clean structured data):
1. Collect data.
2. Simplify the data (extract a subset of interest).
3. Clean and structure the data.
• Metadata can also be an output of the pipeline.
Data Collection
• Search and download data.
• Parse text.
• Convert between different formats.
– Examples: CSV to JSON, MySQL to HTML, etc.
• Merge heterogeneous sources.
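As an illustration of format conversion, here is a minimal sketch of CSV to JSON using only the Python standard library. The function name `csv_to_json` and the sample text are assumptions made for the example.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return json.dumps(list(reader), indent=2)


# Hypothetical sample with the same columns as the UserId/Gameid table.
sample = "UserId,Gameid\nSmith,023923\nJames,238548\n"
print(csv_to_json(sample))
```

Every value stays a string, which here preserves the leading zero of the game id. Converting columns to numbers is a separate, explicit cleaning step.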
Data Simplification
• Filter/Selection
– Remove unwanted data
– Remove invalid data (null values)
• Aggregation
– Collapse several data points into a single one
– Replace some values with minimum, maximum, average, total, etc.
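Both operations can be sketched in a few lines of Python. The record layout (a list of dictionaries with hypothetical `sensor` and `value` fields) and the function names are assumptions for the example.

```python
from collections import defaultdict
from statistics import mean


def drop_invalid(rows, field):
    """Filter: keep only the rows where `field` is present and not None."""
    return [row for row in rows if row.get(field) is not None]


def aggregate(rows, key, field, how=mean):
    """Aggregation: collapse the rows sharing `key` into one value per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[field])
    return {group: how(values) for group, values in groups.items()}


readings = [
    {"sensor": "a", "value": 1.0},
    {"sensor": "a", "value": 3.0},
    {"sensor": "b", "value": None},  # invalid: null value
    {"sensor": "b", "value": 5.0},
]
clean = drop_invalid(readings, "value")
print(aggregate(clean, "sensor", "value"))
```

Passing `how=min`, `how=max` or `how=sum` gives the other aggregates listed above.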
Data Processing Pipeline
• Typically performed automatically using scripts.
• Human-guided data transformations are also possible (through macro-operations)
– Use appropriate tools (e.g. OpenRefine - http://openrefine.org)
Metadata
• Data which describes the data
– Role of variables
– Type of variables
– Constraints
– Dependencies
• Collection and processing operations can also be described.
How to Present Data?
• How to present data graphically?
– To allow visual analysis
– To highlight patterns
– To answer some specific questions
– Etc.
Quantitative Values
• We have just mentioned the work by Cleveland & McGill (1984)
– Position
– Length
– Angle/slope
– Area
– Volume
– Color saturation/shading
Perceptual Accuracy
From “Data Visualization” Course by John C. Hart, for Coursera, 2015.

QUANTITATIVE   ORDINAL       NOMINAL
Position       Position      Position
Length         Density       Hue
Angle          Saturation    Texture
Slope          Hue           Connection
Area           Texture       Containment
Volume         Connection    Density
Density        Containment   Saturation
Saturation     Length        Shape
Hue            Angle         Length
               Slope         Angle
               Area          Slope
               Volume        Area
                             Volume
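A ranking like this can be used as a lookup when assigning variables to visual encodings. The sketch below assumes that each column is ordered from most to least accurate, as the slide title suggests. The names `RANKING` and `best_encoding` are hypothetical.

```python
# Encodings per data type, assumed to be ordered from most to least accurate.
RANKING = {
    "quantitative": ["position", "length", "angle", "slope", "area",
                     "volume", "density", "saturation", "hue"],
    "ordinal": ["position", "density", "saturation", "hue", "texture",
                "connection", "containment", "length", "angle", "slope",
                "area", "volume"],
    "nominal": ["position", "hue", "texture", "connection", "containment",
                "density", "saturation", "shape", "length", "angle",
                "slope", "area", "volume"],
}


def best_encoding(data_type, taken=()):
    """Return the highest-ranked encoding for `data_type` that is not yet in use."""
    for encoding in RANKING[data_type]:
        if encoding not in taken:
            return encoding
    return None


# Position is already used by another variable, so a nominal variable gets hue.
print(best_encoding("nominal", taken={"position"}))
```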
Table vs Graphs
• Presenting tables directly is preferred when:
– There are few data points
– Precise values are important
Univariate Data
• Few interesting solutions.
• Statistical description:
– Mean, median, standard deviation, quartiles.
• Warning! (some data that appear to be univariate are actually bivariate).
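The statistical description listed above can be computed with Python's standard `statistics` module. This is a minimal sketch: the function name `describe` and the sample values are assumptions, the standard deviation is the sample one, and the quartiles use the module's default interpolation method.

```python
import statistics


def describe(values):
    """Summarize a univariate sample with the usual descriptive statistics."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "quartiles": (q1, q2, q3),
    }


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
print(describe(sample))
```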
Stemplots
• Also called stem-and-leaf plots.
• Used to display quantitative data, generally from small datasets (50 or fewer observations).
• Easy to print.
• Easy to read.
Figure from Data Visualization Catalogue (http://datavizcatalogue.com)
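Because a stemplot is plain text, it is easy to generate. The sketch below assumes non-negative integers and uses the tens as stems and the units as leaves. The function name `stem_plot` and the sample values are hypothetical.

```python
from collections import defaultdict


def stem_plot(values, stem_unit=10):
    """Render a stem-and-leaf plot of non-negative integers as text."""
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, stem_unit)
        stems[stem].append(leaf)
    lines = []
    # Print empty stems too, so that gaps in the distribution stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return "\n".join(lines)


print(stem_plot([12, 15, 21, 27, 27, 43]))
```

Each row reads as stem followed by leaves, so the row for stem 2 with leaves 1, 7 and 7 stands for the values 21, 27 and 27.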
Stemplots
Figure from Data Visualization Catalogue (http://datavizcatalogue.com)
Box and Whisker Plots
• A Box and Whisker Plot (or Box Plot) is a convenient way of visually displaying a data distribution through its quartiles.
• Advantages, visible at a glance:
– Key values (average, median, 25th percentile, etc.)
– Whether there are any outliers and what their values are.
– Whether the data is skewed and in what direction.
Figure from Data Visualization Catalogue (http://datavizcatalogue.com)
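The numbers behind a box plot can be computed as follows. This is a sketch under stated assumptions: outliers are the points farther than 1.5 times the interquartile range from the box (a common convention that the slides do not specify), and quartiles use the "inclusive" method of `statistics.quantiles`. The function name `box_plot_stats` is hypothetical.

```python
import statistics


def box_plot_stats(values, k=1.5):
    """Compute quartiles, whisker ends and outliers for a box plot."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],    # smallest value that is not an outlier
        "whisker_high": inside[-1],  # largest value that is not an outlier
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 40])
print(stats)
```

In this sample the value 40 lies beyond the upper fence, so it is reported as an outlier and the upper whisker stops at 9.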
Box and Whisker Plots
Figure from Data Visualization Catalogue (http://datavizcatalogue.com)
Line Charts / Line Graphs
• Invented by the Scottish engineer and statistician William Playfair (1759-1823)
• Two quantitative variables, typically:
– X → time or intervals, Y any
• The line indicates that there are also intermediate values
Figure from Data Visualization Catalogue (http://datavizcatalogue.com)
Bar Charts
• Bivariate Data
– One nominal variable (typically independent)
– One quantitative variable (typically dependent)
• Horizontal/Vertical bars
• Do not confuse with histograms.
Bar Charts
3D?! Not a very good idea..
Histograms
• Bivariate Data
– One independent and one dependent variable
– The first variable is quantized into intervals (bins)
Figure from Data Visualization Catalogue (http://datavizcatalogue.com)
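Quantizing a variable into bins can be sketched as follows. The sketch assumes equal-width bins over a known range [lo, hi]. The function name `histogram` and the sample values are hypothetical.

```python
def histogram(values, bins, lo, hi):
    """Count how many values fall into each of `bins` equal-width intervals."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for value in values:
        if lo <= value <= hi:
            # A value equal to `hi` belongs to the last bin, not to a new one.
            index = min(int((value - lo) / width), bins - 1)
            counts[index] += 1
    return counts


print(histogram([1, 2, 2, 3, 5, 8, 9, 10], bins=5, lo=0, hi=10))
```

The bin index is the independent variable and the count is the dependent one, which is what separates a histogram from a bar chart over a nominal variable.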
Pie Charts
• Bivariate Data
– One independent and one dependent variable
• Good for a quick visual check.
• Not good for:
– Many values.
– Accurate comparisons.
Figure from Data Visualization Catalogue (http://datavizcatalogue.com)
Figure from Data Visualization Catalogue (http://datavizcatalogue.com)
Pie Charts
3D?! Not more readable.
Figure by Luiz Salomão.
Donut Charts
• Essentially, a pie chart with the center area cut out.
• Lets the reader focus on arc length instead of comparing the proportions between slices.
• More space-efficient.
Figure from Data Visualization Catalogue (http://datavizcatalogue.com)
Sunburst Charts
• Show a hierarchy through a series of rings. Each ring corresponds to a level in the hierarchy.
• The hierarchy moves outwards from the center.
• Colour can be used to highlight hierarchical groupings or specific categories.
Figure from Data Visualization Catalogue (http://datavizcatalogue.com)
Sunburst Charts
Produced by Space Radar app (https://github.com/zz85/space-radar)
Scatter Plots
• Bivariate Data
– Two independent variables
• Good for identifying relationships, outliers and clusters.
Figure: scatter plot with axes Variable 1 and Variable 2.
Scatter Plots
Figure: scatter plots with axes Variable 1 and Variable 2, annotated with “Outliers”, “High Density” and “Clusters”.
Surface Graphs
• Trivariate Data
– Three continuous variables
– Two independent and one dependent
Surface Graphs
• Color may be associated with the dependent variable.
Surface Graphs
• Level curves can also be used.
3D Scatter Plots
• Trivariate Data
– Three quantitative variables
• Same concept as the 2D Scatter Plot
Colored Scatter Plots
• Trivariate Data
• Color can encode a variable or a category
Bubble Charts
• Trivariate Data
– Three quantitative variables
• Do not allow for accurate comparison.
• Colors can be used to show different categories.
Bubble Charts
Figure from http://gapminder.com
Choropleth Map
• Bivariate Data
– Quantitative value over geographical areas/regions
• Regions are coloured, shaded or patterned according to the data value.
• Good for an overview, not for accurate comparison.
• Small areas can be underemphasized.
Word Cloud
• Word Clouds display how frequently words appear in a given body of text, by making the size of each word proportional to its frequency.
• Arrangement and color can vary a lot.
• Word Clouds can be used to compare two bodies of text or to give a quick idea of recurring keywords (e.g. used by researchers to summarize the content of their papers).
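The sizing rule can be sketched in Python: count the words, then scale each font size linearly with the count. The function name `word_sizes`, the tokenization rule and the `max_size` parameter are assumptions. The layout step of a real word cloud is left out.

```python
import re
from collections import Counter


def word_sizes(text, max_size=40):
    """Map each word to a font size proportional to its frequency in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())
    # The most frequent word gets `max_size`; the others scale linearly.
    return {word: max_size * count / top for word, count in counts.items()}


sizes = word_sizes("data data data chart chart map")
print(sizes)
```

In practice, very common words (articles, prepositions) are usually filtered out before counting, otherwise they dominate the cloud.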
Word Clouds
• Disadvantages:
– Long words are emphasized over short words.
– Words whose letters contain many ascenders and descenders may receive more attention.
– They do not support accurate comparison and are mainly used for aesthetic reasons.
Word Clouds (example)
Produced by www.jasondavies.com/wordcloud.
Word Clouds (example)
Produced by www.jasondavies.com/wordcloud.
Word Clouds (example)
From www.wordclouds.com.
Summary
• A brief introduction to Information Visualization has been given.
• Information comes from data, which should be collected and processed properly.
• A quick overview of graph types has been given.
Questions?