data visualization for libraries - caul-cbua.ca data visualization... · basic principles of data...

40
Data Visualization for Libraries Principles, Excel, Data Studio, Tableau

Upload: others

Post on 25-Jun-2020

37 views

Category:

Documents


1 download

TRANSCRIPT

Data Visualization for Libraries

Principles, Excel, Data Studio, Tableau

Agenda

● Library data - what data, why visualize, some examples● Basic principles of data visualization, types of charts, good and bad● Excel features● Google Data Studio● Tableau Public (Desktop)

Trends, Patterns, Relationships

● Budget● Circulation activity● Collections● Instruction● E-Resource usage data (COUNTER, Proxy server, other)● Interlibrary loan● Services: virtual reference, staff-logged transactions, equipment checkouts

(e.g. laptops), study rooms bookings● Website activity (Google Analytics)● Gate counts● Other?

Metrics that Matter!

Basic Principle of Data Visualization

Keep it Simple

Which is easier to understand?

From Stephen Few Show Me the Numbers

Chartjunk

Kinds of Data● Numeric data - "countable" = Metrics

○ e.g., FT uses of a journal○ Can do arithmetic on - add, median/mean, min/max - meaningfully○ may be absolute or percentage○ don't be fooled by data that looks like numbers but isn't, see below

● Categorical data = Dimensions○ labels like different collections of books: Stacks, Reference, Oversize○ includes "nominal" data that looks like numbers but isn't really, like ISBNs, room numbers○ may have a "natural" order like LC classes, rainbow colors○ Often best to present in “most to least”

● Dates/times - can be metrics (e.g. median pub years) or dimensions (month of year); has natural order that should override “most to least”

● Ordinal data - rankings - the number does not represent a quantity of anything

Basic Principles of Data VisualizationKnow what charts are best for what data

○ Bar charts - comparison of dimensional things (LC classes, types of patrons, databases)■ horizontal bars for bigger labels

○ Line charts - trend over time (budget, circ or any service activity, e-resource use)

○ Scatter charts - correlation between two metrics variables, data point can be categorical

○ Histograms - bar charts that show frequency where the order of the bars is meaningful (book imprints grouped in decades)

○ Heat maps - visualize tiers of quantities across a matrix or literal map○ Pareto and Treemaps - proportions of 100%

Basic Principles of Data VisualizationAvoid any type of chart that displays 2-d area in data values

● Principle: people don't judge area as quantity very well - we're linear (length)● pie charts/donut charts● stacked bar and area charts● cartoon figures as stylized "bars"● radar charts (LibQual)

Bar Charts● Vertical or horizontal - horizontal best when labels don't show well in vertical● Good for showing metric data in dimensions● Always start Y scale at zero

Data from UPEI Library Website Google Analytics using Google Data Studio

Grouped bar chart - time as dimension

Bad "bar" charts

People see area, obscures linear values

Stacked bar chart:How much work to see trends in each data set?

Instead of stacked bar when lots of dataPanel bar chart - keep the scales the same!

Unit Charts - Bad!Which is easier to understand quickly?

Line chartsBest for showing data with continuity, usually time, especially a few sets of related data

SJSU

UPEI Library Service Desk Transactions

Line Charts

● Usually start Y scale at zero or show it if go negative

● Intervals must be equal in size (e.g. years, months, quarters, etc.)

● Average slope of 45 degrees is often good

https://eagereyes.org/basics/banking-45-degrees

Why stacked area is badLook at the blue data - are Central sales going up?

Radar Charts: Very bad!!From UPEI's 2013 LibQual report.

Showing contributions towards a wholeWhen your data should add up to 100%

● Pie charts - most well known, but not well used● Treemaps● Pareto charts

Pie ChartUse rarely, with very few slices, data labelled

Video: Salvaging the Pie - http://www.darkhorseanalytics.com/blog/salvaging-the-pie

Taken from Magnuson book

Which helps you understand the data better?

TreemapsArea (2d) properly represents proportion of 100%

Pareto Charts● Emphasizes most important contributions towards a total● Data organized by frequency/largest to smallest● Visualize cumulative percentage

Histograms

Example: Ebook usage in EBSCO Ebook package by Publication Decade

Frequency summary of large raw data (graph version of pivot table) esp. for continuous data that needs to be grouped into ranges

Scatter chart (or Scatter plot)

Often categorical data with two variables to detect correlation or odd interactions between the two variables.

data points are countries

Heat MapTo illustrate patterns in a set of data that can be arranged in 2D, often a matrix or map.

Reference Desk Transaction Counts Fall 2010-Spring 2012

When to just use a Table instead of a Graphic Chart

● Users want to see individual values, not just trends and patterns● Users want to see precise values● Values need to be shown in various groupings, e.g., subtotals● Data includes different units, e.g. dollars, percentages, unit counts

Charts are best when it's the trend, pattern, or relationship among the data points, not their precise values, that matters most

Taken from Stephen Few, Show Me the Numbers

Library dashboards

● What is a dashboard? ideally live data, not just static tables● Why not? maintenance - keep fresh!● Ball State U - basic stats - http://www.bsu.edu/libraries/dashboard/public.php● Georgia Tech - http://www.library.gatech.edu/dashboard/● Portland State U - http://library.pdx.edu/about/library-data-dashboard/● Indiana University - Kokomo

http://iuk.edu/library/about_library/dashboard/index.php● SJSU - http://library.sjsu.edu/dashboard/dashboard● NMSU - http://lib.nmsu.edu/dashboard/libuse.shtml (note submenus)

Steps to Make Good Visualizations1. Plan what data you need and how to get it2. Acquire raw data3. Filter, refine, and process for use/ingest into charts/software4. Ingest into display tools and create visualizations

Plan what you need - what do you need to reveal?

Circulation data: what to include, exclude (reserves?), patron type info?

Collection data: volumes versus titles, locations, e-materials

E-Resource usage data: COUNTER, non-COUNTER, current, titles over time, Linking/OpenURL data

ILL data: by material type, by patron characteristics

Proxy server data: off-campus, on-campus, products without real usage reports

Website use data: Google Analytics? Feedback form use?

Other data: Staff-logged transaction data, virtual ref service?

Tips specific to library data

LC Class:

● Too many for any useful graph - how to handle?

Excel Charts● Most likely to copy-and-paste to Word or elsewhere - link versus image● Excel 2016 has many new options, not all are good practice

○ Pareto○ Treemap○ Histogram○ Waterfall - cumulative effect of positive and negative values○ Box & Whisker - shows median, median of top and bottom halves and min/max values○ Sunburst - hierarchical data - rings of pie charts

Demo Excel Charts

Data Viz Products"Business Intelligence"

● Google Data Studio● Tableau● Microsoft Power BI● Qlik● others

Concept:

Data in sources - spreadsheets, APIs, SQL databases, etc.

Connect sources to BI platform to make reports, dashboards, etc.

Google Data StudioSteps:

1. Make sure your data is cleaned up, with one header row2. Create new report3. Create new data source - select what to link to - must select one worksheet at

a time from a spreadsheet with multiple worksheets/tabs4. If needed create new "metrics" from text columns (e.g. day of week)5. Connect the data source to the report (can add multiple sources to a single

report)6. Start to add charts, text labels, etc. Can have multiple pages.7. Share report - if for website, "More" - "Public on the Web"

Google Data Studio

https://datastudio.google.com/

Does not appear with usual list of Apps, not even "more"

Can connect to Google Sheets, Google Analytics, many other data sources

If data source is live/updated, Data studio report is too

Demo Google Data Studio

TableauCommercial Product

Purchase options: Public, Desktop, Server, Online (hosted Server)

Public is free:

● 10GB space● Some connectors not available in free version● Reports are entirely public● Limit to 1 million rows of data (per source?)● Download app

Demo Tableau Public

Tableau Web Data ConnectorsFacebook: http://files.tableaujunkie.com/facebooksearch/pagefeedwebconnect.html

Instagram: https://illonage.github.io/

Many others: http://tableau.github.io/webdataconnector/community/

Recommended Reading and Q&A● Edward, Tufte. The Visual Display of Quantitative Information. Graphics

Press, Cheshire, USA [many reprints and revisions, original was 1983]● Few, Stephen. Show Me the Numbers : Designing Tables and Graphs to

Enlighten. Oakland, Calif. : Analytics Press, c2004.● Few, Stephen. Information Dashboard Design. Sebastopol, Calif. : O'Reilly,

2006.● Magnuson, Lauren. Data Visualization : A Guide to Visual Storytelling for

Libraries. Rowman & Littlefield Publishers, 2016. A LITA Guide.

Questions? Discussion?