data visualization for libraries - caul-cbua.ca data visualization... · basic principles of data...
TRANSCRIPT
Agenda
● Library data - what data, why visualize, some examples● Basic principles of data visualization, types of charts, good and bad● Excel features● Google Data Studio● Tableau Public (Desktop)
Trends, Patterns, Relationships
● Budget● Circulation activity● Collections● Instruction● E-Resource usage data (COUNTER, Proxy server, other)● Interlibrary loan● Services: virtual reference, staff-logged transactions, equipment checkouts
(e.g. laptops), study rooms bookings● Website activity (Google Analytics)● Gate counts● Other?
Metrics that Matter!
Kinds of Data● Numeric data - "countable" = Metrics
○ e.g., FT uses of a journal○ Can do arithmetic on - add, median/mean, min/max - meaningfully○ may be absolute or percentage○ don't be fooled by data that looks like numbers but isn't, see below
● Categorical data = Dimensions○ labels like different collections of books: Stacks, Reference, Oversize○ includes "nominal" data that looks like numbers but isn't really, like ISBNs, room numbers○ may have a "natural" order like LC classes, rainbow colors○ Often best to present in “most to least”
● Dates/times - can be metrics (e.g. median pub years) or dimensions (month of year); has natural order that should override “most to least”
● Ordinal data - rankings - the number does not represent a quantity of anything
Basic Principles of Data VisualizationKnow what charts are best for what data
○ Bar charts - comparison of dimensional things (LC classes, types of patrons, databases)■ horizontal bars for bigger labels
○ Line charts - trend over time (budget, circ or any service activity, e-resource use)
○ Scatter charts - correlation between two metrics variables, data point can be categorical
○ Histograms - bar charts that show frequency where the order of the bars is meaningful (book imprints grouped in decades)
○ Heat maps - visualize tiers of quantities across a matrix or literal map○ Pareto and Treemaps - proportions of 100%
Basic Principles of Data VisualizationAvoid any type of chart that displays 2-d area in data values
● Principle: people don't judge area as quantity very well - we're linear (length)● pie charts/donut charts● stacked bar and area charts● cartoon figures as stylized "bars"● radar charts (LibQual)
Bar Charts● Vertical or horizontal - horizontal best when labels don't show well in vertical● Good for showing metric data in dimensions● Always start Y scale at zero
Data from UPEI Library Website Google Analytics using Google Data Studio
Line chartsBest for showing data with continuity, usually time, especially a few sets of related data
SJSU
UPEI Library Service Desk Transactions
Line Charts
● Usually start Y scale at zero or show it if go negative
● Intervals must be equal in size (e.g. years, months, quarters, etc.)
● Average slope of 45 degrees is often good
https://eagereyes.org/basics/banking-45-degrees
Showing contributions towards a wholeWhen your data should add up to 100%
● Pie charts - most well known, but not well used● Treemaps● Pareto charts
Pie ChartUse rarely, with very few slices, data labelled
Video: Salvaging the Pie - http://www.darkhorseanalytics.com/blog/salvaging-the-pie
Pareto Charts● Emphasizes most important contributions towards a total● Data organized by frequency/largest to smallest● Visualize cumulative percentage
Histograms
Example: Ebook usage in EBSCO Ebook package by Publication Decade
Frequency summary of large raw data (graph version of pivot table) esp. for continuous data that needs to be grouped into ranges
Scatter chart (or Scatter plot)
Often categorical data with two variables to detect correlation or odd interactions between the two variables.
data points are countries
Heat MapTo illustrate patterns in a set of data that can be arranged in 2D, often a matrix or map.
Reference Desk Transaction Counts Fall 2010-Spring 2012
When to just use a Table instead of a Graphic Chart
● Users want to see individual values, not just trends and patterns● Users want to see precise values● Values need to be shown in various groupings, e.g., subtotals● Data includes different units, e.g. dollars, percentages, unit counts
Charts are best when it's the trend, pattern, or relationship among the data points, not their precise values, that matters most
Taken from Stephen Few, Show Me the Numbers
Library dashboards
● What is a dashboard? ideally live data, not just static tables● Why not? maintenance - keep fresh!● Ball State U - basic stats - http://www.bsu.edu/libraries/dashboard/public.php● Georgia Tech - http://www.library.gatech.edu/dashboard/● Portland State U - http://library.pdx.edu/about/library-data-dashboard/● Indiana University - Kokomo
http://iuk.edu/library/about_library/dashboard/index.php● SJSU - http://library.sjsu.edu/dashboard/dashboard● NMSU - http://lib.nmsu.edu/dashboard/libuse.shtml (note submenus)
Steps to Make Good Visualizations1. Plan what data you need and how to get it2. Acquire raw data3. Filter, refine, and process for use/ingest into charts/software4. Ingest into display tools and create visualizations
Plan what you need - what do you need to reveal?
Circulation data: what to include, exclude (reserves?), patron type info?
Collection data: volumes versus titles, locations, e-materials
E-Resource usage data: COUNTER, non-COUNTER, current, titles over time, Linking/OpenURL data
ILL data: by material type, by patron characteristics
Proxy server data: off-campus, on-campus, products without real usage reports
Website use data: Google Analytics? Feedback form use?
Other data: Staff-logged transaction data, virtual ref service?
Excel Charts● Most likely to copy-and-paste to Word or elsewhere - link versus image● Excel 2016 has many new options, not all are good practice
○ Pareto○ Treemap○ Histogram○ Waterfall - cumulative effect of positive and negative values○ Box & Whisker - shows median, median of top and bottom halves and min/max values○ Sunburst - hierarchical data - rings of pie charts
Data Viz Products"Business Intelligence"
● Google Data Studio● Tableau● Microsoft Power BI● Qlik● others
Concept:
Data in sources - spreadsheets, APIs, SQL databases, etc.
Connect sources to BI platform to make reports, dashboards, etc.
Google Data StudioSteps:
1. Make sure your data is cleaned up, with one header row2. Create new report3. Create new data source - select what to link to - must select one worksheet at
a time from a spreadsheet with multiple worksheets/tabs4. If needed create new "metrics" from text columns (e.g. day of week)5. Connect the data source to the report (can add multiple sources to a single
report)6. Start to add charts, text labels, etc. Can have multiple pages.7. Share report - if for website, "More" - "Public on the Web"
Google Data Studio
https://datastudio.google.com/
Does not appear with usual list of Apps, not even "more"
Can connect to Google Sheets, Google Analytics, many other data sources
If data source is live/updated, Data studio report is too
Demo Google Data Studio
TableauCommercial Product
Purchase options: Public, Desktop, Server, Online (hosted Server)
Public is free:
● 10GB space● Some connectors not available in free version● Reports are entirely public● Limit to 1 million rows of data (per source?)● Download app
Demo Tableau Public
Tableau Web Data ConnectorsFacebook: http://files.tableaujunkie.com/facebooksearch/pagefeedwebconnect.html
Instagram: https://illonage.github.io/
Many others: http://tableau.github.io/webdataconnector/community/
Recommended Reading and Q&A● Edward, Tufte. The Visual Display of Quantitative Information. Graphics
Press, Cheshire, USA [many reprints and revisions, original was 1983]● Few, Stephen. Show Me the Numbers : Designing Tables and Graphs to
Enlighten. Oakland, Calif. : Analytics Press, c2004.● Few, Stephen. Information Dashboard Design. Sebastopol, Calif. : O'Reilly,
2006.● Magnuson, Lauren. Data Visualization : A Guide to Visual Storytelling for
Libraries. Rowman & Littlefield Publishers, 2016. A LITA Guide.
Questions? Discussion?