The Technology of InfoVis · 2019-06-19

  • INFORMATION VISUALIZATION

    The Technology of InfoVis

    Information Visualization - 4

    version 0.1 ©benoit

  • INFORMATION VISUALIZATION

    The Tech of InfoVis

    • Overview of computing environments (or architecture)

    • Common tools to create & to view visualizations

    • Data file types

    • Fundamentals of the relationships of HTML, CSS, JS, .json, and the Document Object Model (DOM)

  • [Figure: a triangle of computing architectures. Labels, in order: Watson and mega-computers, mainframes; mid-size computers; networks of computers (cloud, distributed, “info ecologies”); local networks; desktops and mobile devices.]

    This triangle suggests that there are many complex computer architectures, using a variety of machines, from very powerful computers down to very powerful desktops and mobile devices.

    InfoVis people who don’t work in mainframe environments (that’s most of us) work with the Internet: the programming tools and data stores commonly associated with its “n-tier” architecture.

    So whether we’re programming for our own laptop or our institution’s in-house and client needs, the concepts and hands-on skills are very much related.

    Let’s work our way down from these large architectures to something we’ll use all the time - the Internet, our desktops & mobile devices.

  • INFORMATION VISUALIZATION

    Overview of computing environments

    Creating and displaying visualizations are affected by your knowledge of the message you want to convey, the data themselves, and the computing tools you employ.

    IBM, SAS, Oracle, and SPSS are data-rich companies that provide services and tools and, in particular, offer white papers to “train” their clients in the value of visualization and of their products. Their computing environments may be large platforms, such as midrange computers or mainframes, or Cloud-based networks of servers.

    SAS, known for statistical analysis, is moving into the DataVis market (https://www.sas.com/fr_fr/insights/big-data/data-visualization.html).

    IBM and others also provide their own programming languages and networks for visualization, such as IBM’s Design Language (https://www.ibm.com/design/language/). Oracle uses the Java programming language and Big Data “information ecologies” (tailored computing equipment and software) such as Hadoop, Spark, AWS and Kafka clusters.

    https://www.sas.com/fr_fr/whitepapers/data-visualization-techniques-106006.html

    https://www.ibm.com/design/v1/language/experience/data-visualization/process/

    https://cloud.oracle.com/


  • INFORMATION VISUALIZATION

    Overview of computing environments

    • Creating and displaying visualizations are affected by your knowledge of the message you want to convey, the data themselves, and the computing tools you employ.

    These large-scale environments are usually bound to ideas of “Business Analytics”, “Data Visualization”, “Visual Analysis”, “Big Data and Data Analytics”.

    The EMC2 Education Series text Big Data and Data Analytics is a great, slightly technical guide to integrating R (statistical analysis software) and specialty languages, such as Carrot and Pig (really!). The goal is to optimize computational efficiency by addressing “space complexity” [how much memory a task needs] and “time complexity” [how many steps it takes], both usually expressed in Big-O notation.

    Hadoop and Spark are popular frameworks for this kind of large-scale, distributed processing.

    http://navylive.dodlive.mil/2015/11/09/the-history-of-veterans-day/final_infographic_veteransday2015_jpg-jpg/

  • INFORMATION VISUALIZATION

    Overview of computing environments

    Info ecosystems integrate scripting, statistics, text-file and database access, and administrative tools into a purposefully designed, holistic architecture.

  • INFORMATION VISUALIZATION

    Overview of computing environments

    • Most of us will not use these larger-scale architectures but rather the more familiar client-server architecture of websites and web services.

    • Server: usually a dedicated computer running software, such as Apache, that receives and responds to webpage requests.

    • The server “listens” on “ports”; e.g., requests* from a client for a webpage (.html) are [usually] received on port 80 (port 443 for https); MySQL database requests are often on port 3306, servlets on 8080, and so on.

    • Client: this is usually you and your patrons using web browsers. When you send a request (such as a search or a login) from browser to server, the server responds by executing the script identified in the web form’s action attribute. These scripts may communicate with a relational database server or file servers, and may use other software on the server to perform calculations or otherwise prepare the data for the client.

    *Note: an administrator can alter the config.xml and server.xml files to “listen” on different ports, ignore certain domains, process or not process certain script languages, etc.

    http://apache.org
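    The idea of a server listening on ports can be made concrete with a small sketch. It uses JavaScript's built-in URL class to show which host, port, and path a request is aimed at. The function name describeRequest and the path /servlet/report are made up for illustration; the default ports (80 for http, 443 for https) are the standard values.

```javascript
// Default ports for the two web protocols (standard values).
const DEFAULT_PORTS = { "http:": 80, "https:": 443 };

// Return the host, port, and path that a browser would use for a request.
// describeRequest is a hypothetical helper name, not part of any library.
function describeRequest(address) {
  const url = new URL(address);
  // url.port is an empty string when the address relies on the default port.
  const port = url.port === "" ? DEFAULT_PORTS[url.protocol] : Number(url.port);
  return { host: url.hostname, port: port, path: url.pathname };
}

console.log(describeRequest("https://info.com/"));
console.log(describeRequest("http://info.com:8080/servlet/report"));
```

    The first call relies on the default port for https; the second names port 8080 explicitly, as a servlet address might.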

  • INFORMATION VISUALIZATION

    Overview of computing environments

    [Figure: a server and its clients, with arrows numbered 1 to 6 that correspond to the steps below.]

    1) Client browser sends a “request” object to the server, e.g., https://info.com/, using

    2) Server receives request; activates program identified in the action statement of the web form, … action=“myscript.php”

    3) The script starts; it may be written in any of many languages and may use tools, such as R, to calculate and prepare data

    4) Typically scripts access relational databases, file servers, and other web servers for the raw data

    5) The server prepares the data, often integrating JavaScript, CSS, and other script libraries to create a “virtual page” that is streamed back to the client computer as the “response”

    6) The user’s browser loads the virtual page and processes it just as it would any webpage.

    This is often called an “n-tier” architecture.
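    The six steps can be imitated in a few lines of JavaScript, with plain functions standing in for the tiers. This is only a sketch: the sample rows, the minVisitors parameter, and the function handleRequest are invented for illustration, and a real server would run a separate program (such as the myscript.php named in the form) rather than a JavaScript function.

```javascript
// Data tier: a stand-in for a relational database (hypothetical sample rows).
const database = [
  { city: "Boston", visitors: 120 },
  { city: "Lyon", visitors: 95 },
];

// Script tier: server-side scripts, keyed by the name used in the form's action.
const scripts = {
  "myscript.php": function (params) {
    // Steps 3 and 4: fetch the raw data and prepare it (here, a simple filter).
    const rows = database.filter((row) => row.visitors >= params.minVisitors);
    // Step 5: build the "virtual page" that is sent back as the response.
    const items = rows.map((r) => "<li>" + r.city + ": " + r.visitors + "</li>");
    return "<ul>" + items.join("") + "</ul>";
  },
};

// Step 2: the server receives the request and activates the named script.
function handleRequest(request) {
  const script = scripts[request.action];
  if (!script) {
    return { status: 404, body: "Not found" };
  }
  return { status: 200, body: script(request.params) };
}

// Steps 1 and 6: the client sends a request and receives the response.
const response = handleRequest({ action: "myscript.php", params: { minVisitors: 100 } });
console.log(response.status, response.body);
```

    Each tier only talks to its neighbor: the client never touches the database directly, which is the point of the n-tier arrangement.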


  • INFORMATION VISUALIZATION

    Drilling down to our tools

    3) The script starts; it may be written in any of many languages and may use tools, such as R, to calculate and prepare data

    Compiled languages: Java, C++

    Script languages: Python, PHP, Perl, JavaScript

    Common tools used on the server to prepare data: R, SQL, and other scripts that retrieve, clean, and cluster/classify data before integrating the data into the “virtual page” that is streamed back to the client.

    In the same way, a program running on the server may contact other websites and extract data from them, too, as part of preparing data for the client.

    Note: sometimes the prepared data are saved as a file on the server (e.g., mydata.dat or yourdata.txt). The script that’s returned to the client references this saved file by its name, and the file is downloaded to the client as needed.

    There are other tools that aren’t mentioned here. The ones in the image represent the most commonly-employed languages in 2019.

  • INFORMATION VISUALIZATION

    Common Tools

    • Tools for creating the visualization

    • Text editors (BBEdit, Atom, NotePad, etc.), where you write everything from scratch, such as creating a webpage that links to visualization libraries; d3.js is a very popular choice.

    • IDEs (Integrated Development Environments) are stand-alone applications that help you lay out designs and integrate data. These tools focus on the UX and usually shield you from the more difficult aspects of data access; Vega, Vega-Lite, Tableau, and others.

    • Code libraries - some scripting/programming languages have built-in code that facilitates visualizations, or they rely on importing external libraries. Python programmers, for example, usually import matplotlib, seaborn, ggplot (popularly used with R), bokeh, geoplotlib, or plotly. JavaScript libraries include d3.js.

    • There are also techniques and tools that help webpages be more efficient when accessing networked data; these include Ajax (Asynchronous JavaScript and XML) and Node.js (a JavaScript runtime that runs on the server).

    Important! Students enrolled in an InfoVis course can get free access to Tableau through the company’s student training program. Visit the site and check it out.

    https://www.tableau.com

    https://matplotlib.org

    http://seaborn.pydata.org

    http://ggplot.yhathq.com

    https://d3js.org

  • INFORMATION VISUALIZATION

    Data and Data Types

    • When discussing data for our purposes, we’ll focus on files. The most common data formats we’ll encounter are data stored as …

    • unstructured text, usually in files with the extensions .txt or .dat

    • tab-delimited text or tab-separated values (.tsv), often data exported from spreadsheets or an RDBMS, where each column or field of data is separated by a tab (\t) character; we may equally encounter comma-separated values (.csv), where the fields of data are separated by commas.

    • semi-structured text: files with an inherent structure

    • xml: Extensible Markup Language

    • json: JavaScript Object Notation. This is very popular!

    The programming languages mentioned earlier have built-in commands that read these file types and load the values into RAM. Then our script takes over to apply the data to the visualization.
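    As a rough illustration of reading these file types, the sketch below parses a small tab-separated text and a small JSON text into JavaScript values. JSON.parse is built into the language; parseDelimited is a helper written here for illustration, and it assumes the first line holds the column names and that no field contains the delimiter. The sample data are made up.

```javascript
// Split delimited text (tab- or comma-separated) into an array of row objects.
// Assumes the first line holds the column names and no field contains the delimiter.
function parseDelimited(text, delimiter) {
  const lines = text.trim().split(/\r?\n/);
  const headers = lines[0].split(delimiter);
  return lines.slice(1).map(function (line) {
    const values = line.split(delimiter);
    const row = {};
    headers.forEach(function (name, i) {
      row[name] = values[i];
    });
    return row;
  });
}

// Hypothetical sample data, as it might appear in a .tsv and a .json file.
const tsvText = "name\tage\nSuky\t3\nBixy\t5";
const jsonText = '[{"name": "Suky", "age": 3}, {"name": "Bixy", "age": 5}]';

const fromTsv = parseDelimited(tsvText, "\t");
const fromJson = JSON.parse(jsonText); // JSON support is built in

console.log(fromTsv[0].name, typeof fromTsv[0].age);   // delimited values arrive as strings
console.log(fromJson[1].name, typeof fromJson[1].age); // JSON keeps numbers as numbers
```

    Note the difference in the loaded values: delimited text yields strings that may need converting, while JSON carries its own basic types.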

  • INFORMATION VISUALIZATION

    Data and Data Types (optional)

    • Think about data types from the computer’s point of view and look at the size of the datum. From small to large …

    • byte, char, boolean: char stands for “character” and is typically 8 bits, like a byte; a boolean is “true or false”, or 0/1.

    • int, float, double, long: these are all number formats. Integers are the smallest; the sizes of float, double, and long vary by the computer itself, but range from 32 bits to 64 and perhaps now even 128! Not all languages use all four; rely on int and float. E.g., var i = 27;

    • pointer: a pointer is a variable that holds the memory address of another variable. We won’t be using them. Pointers are part of what’s hidden when using extra libraries; the benefit is that such libraries are ultra fast.

    • string: a contiguous block of characters (typically alphanumeric: A-Z, a-z, 0-9), e.g., var s = "Tom and Gerry"; Strings are usually created using double quotes " " or single quotes ' '. Don’t mix and match within one string.

    • Date: a data type all its own, an object that contains lots of commands (or methods) that let us convert the date into all kinds of representations, such as 06/19/2019 or 2019/06/19 or 2019年6月19日

    • Array: a handy container that holds other data types. Each element in the array can be identified by its position in the array. Start counting from 0, not 1. For instance, var arrayOfCats = ["Suky", "Bixy", "Bunny", "Kiki"]; then in our code we can say something like console.log("Today, the star of the show is " + arrayOfCats[1]); which names Bixy, the second cat.
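    A short JavaScript sketch of the types above. JavaScript is looser than the list suggests (it has a single number type rather than separate int, float, double, and long), so treat this as one language's view. The variable names are illustrative only.

```javascript
const i = 27;                 // number (JavaScript has one 64-bit number type)
const s = "Tom and Gerry";    // string
const isReady = true;         // boolean
const arrayOfCats = ["Suky", "Bixy", "Bunny", "Kiki"];

// A Date built from explicit UTC parts; months count from 0, so 5 is June.
const d = new Date(Date.UTC(2019, 5, 19));

console.log(typeof i, typeof s, typeof isReady); // number string boolean
console.log(Array.isArray(arrayOfCats));          // true
console.log(arrayOfCats[0]);                      // Suky (position 0 is the first element)
console.log(arrayOfCats[1]);                      // Bixy
console.log(d.toISOString().slice(0, 10));        // 2019-06-19
```

    The same Date object can be formatted in many other ways; toISOString is used here only because its output does not depend on the computer's language or time zone settings.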

    for lots of details see https://www.javatpoint.com/java-data-types

    https://javascript.info/types

    https://www.w3schools.com/js/js_datatypes.asp

    https://www.w3schools.com/sql/sql_datatypes.asp https://www.journaldev.com/16774/sql-data-types


  • INFORMATION VISUALIZATION

    Data and Data Types (Encoding)

    • Data we read from the Net should be intelligible on screen; if the page is written in English, show English letters; if in Arabic, show Arabic, etc. The way to do this is to use the UTF-8 standard. [https://www.fileformat.info/info/unicode/utf8.htm; https://www.w3schools.com/charsets/ref_html_utf8.asp]

    • The letterforms we see on the computer have to be uniquely identifiable. Unicode gives every character in the world’s writing systems a unique number (its code point); UTF-8 is the encoding that stores those numbers as bytes.

    • When reading/writing data files, you may have to “Save as…” with encoding UTF-8.

    • Every letter/character can be represented in several ways:

    • as an HTML “named entity”, e.g., à is &agrave; and À is &Agrave;. In some programming languages, such as Python, we can use the full name of the letter (LATIN CAPITAL LETTER A WITH GRAVE).
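    To see the difference between a character's Unicode number and its UTF-8 bytes, here is a minimal sketch using only built-in JavaScript features (codePointAt, String.fromCodePoint, and TextEncoder). The helper name describeCharacter is invented for illustration.

```javascript
// Describe one character: its Unicode code point and its UTF-8 bytes.
// describeCharacter is a hypothetical helper name.
function describeCharacter(ch) {
  const codePoint = ch.codePointAt(0);
  return {
    decimal: codePoint,                                  // the character's Unicode number
    hex: codePoint.toString(16).toUpperCase(),           // the same number in hexadecimal
    utf8Bytes: Array.from(new TextEncoder().encode(ch)), // bytes stored in a UTF-8 file
  };
}

console.log(describeCharacter("\u00E0")); // à: decimal 224, hex "E0", bytes [195, 160]
console.log(describeCharacter("A"));      // A: decimal 65, hex "41", bytes [65]
console.log(String.fromCodePoint(0xC0));  // À
```

    A plain letter such as A needs one byte in UTF-8, while à needs two; this is why a file saved in a different encoding can show the wrong letterforms.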


  • INFORMATION VISUALIZATION

    Data and Data Types (Encoding)

    • Every character and even every color can be identified in several ways in a web page. For InfoVis, we can programmatically manipulate these values to change the color and/or opacity of data. This technique helps end-users see which data are more or less relevant to their inquiries.

    • The character itself: ü, é, à, ñ, ø, ß, з, я, 明 (provided the data are saved with UTF-8 encoding and the webpage is set to read UTF-8).

    • The letter à as a decimal reference (&#224;), a hexadecimal reference (&#xE0;), and a named entity (&agrave;)

    • Same with color codes: rgba (values for red, green, blue, and alpha, or opacity):

    • red in HTML is the named color “red”

    • red in RGBA is rgba(255, 0, 0, 1): 255 (FF in hex) means 100% red; the final 1 means 100% opaque

    • a half-transparent red is rgba(255, 0, 0, 0.5)
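    Because an rgba value is just text built from four numbers, a script can compute it. The sketch below shows one way this might look; the function names and the rule that maps relevance to opacity are assumptions made for illustration, not a prescribed method.

```javascript
// Build a CSS rgba() string from red, green, blue (0-255) and alpha (0-1).
function rgba(red, green, blue, alpha) {
  return "rgba(" + red + ", " + green + ", " + blue + ", " + alpha + ")";
}

// Convert one 0-255 channel to its two-digit hexadecimal form.
function toHex(channel) {
  return channel.toString(16).padStart(2, "0").toUpperCase();
}

// Hypothetical rule: the more relevant a datum (0 to 1), the more opaque it is drawn.
function colorForRelevance(relevance) {
  return rgba(255, 0, 0, relevance);
}

console.log(rgba(255, 0, 0, 1));     // rgba(255, 0, 0, 1)
console.log(toHex(255));             // FF
console.log(colorForRelevance(0.5)); // rgba(255, 0, 0, 0.5)
```

    In a page, the returned string would be assigned to a style property such as backgroundColor.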

  • INFORMATION VISUALIZATION

    Quick example

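    /* CSS */
    html, body { width: 100%; height: 100%; margin: 0; padding: 20px; font-family: "Raleway", 'Avenir Next'; }
    #thebox { background-color: coral; margin: 10px; padding: 15px; }

    // JavaScript. The HTML markup was lost from this transcript; the code assumes
    // the page contains an element with id "thebox" and another with id "no".
    function changeColors() {
      var div = document.getElementById('thebox');
      var j = 255; // red channel, stepped down from 255

      (function theLoop(i) {
        setTimeout(function() {
          j -= 1;
          var col = "rgba(" + j + ",0,0,1)";
          div.style.backgroundColor = col;                // repaint the box
          document.getElementById("no").innerHTML = col;  // show the current value
          if (--i) { theLoop(i); }                        // schedule the next step
          if (i == 128) { div.style.color = "white"; }    // switch the text color halfway
        }, 100);
      })(255);
    }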

  • INFORMATION VISUALIZATION

    Data and Data Types (Encoding)

    • Unicode! Check out the character code charts on unicode.org

    Notice that each graphic has a unique identifier; many of them depict cultural phenomena.

    http://www.unicode.org/charts/

  • INFORMATION VISUALIZATION

    Bringing it home …

    • So far we’ve looked briefly at the ideas of

    • Large-scale computing that leads to Big Data/Data Science

    • The Internet as an n-tier architecture that we use in our work

    • Issues of file types, the data in those files, and their encoding so we can process them intelligently for computers and for people

    • Now, let’s review a few fundamentals about the relationship of HTML, CSS, JS, and .json for creating visualizations.

  • INFORMATION VISUALIZATION

    HTML, CSS, JS, JSON (reminder)

    • HyperText Markup Language (HTML) - a semi-structured file type because tags identify the container for the data, which consist of the raw “text”. Note, though, that the tags are arbitrary with respect to meaning. For instance, there’s nothing inherent in the bold tag (<b>) that means an author, while in .xml we can create our own semantic tag, such as <author>. The webpage provides the data; we’re responsible for the presentation of the data. [This division of data/presentation is vital in contemporary computing.]

    • Cascading Style Sheets (CSS) - commands that override the default presentation of an HTML tag. CSS commands can be in a separate .css file, inline within a tag, or within the <head> section of a webpage.

    • JavaScript - is not Java - it’s a scripting language that lets us interact with the data and the CSS commands and, most of all, capture and respond to end-user input. The inputs are usually mouse movements and clicks, called events. For instance, a user clicking on an arrow icon in a webpage might trigger the page to open a new container to show more text.

    • JSON - a semi-structured text file that uses a particular syntax to hold data and represent the hierarchical relationships of those data with each other: the parent-child relationship of nodes. JSON and .tsv files are very commonly employed because their structure + data feed directly into data structures. For instance, a .json file maps naturally to a Python dictionary; in JavaScript, the data feed well into the DOM, or Document Object Model - a hierarchical tree of data.

    • DOM - the DOM tree is a set of data nodes in a hierarchical order. Each node can be identified uniquely by its id attribute, or selected by its class (a group of tags) or by its name. By knowing the id of a node, JavaScript can manipulate the data + the presentation of the data by changing the CSS commands and/or by updating the content of the tag. This is absolutely vital!
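    The parent-child idea shared by JSON and the DOM can be sketched without a browser. The example below parses a small JSON text and walks its tree, recording each node at its depth. The sample data and the function walk are made up for illustration; they assume each node has a name and, optionally, a children array.

```javascript
// A small hierarchical data set in JSON syntax (hypothetical sample data).
const jsonText = `{
  "name": "library",
  "children": [
    { "name": "books", "children": [{ "name": "fiction" }, { "name": "history" }] },
    { "name": "maps" }
  ]
}`;

// Visit every node, recording its name indented by its depth below the root.
function walk(node, depth, visited) {
  visited.push("  ".repeat(depth) + node.name);
  (node.children || []).forEach(function (child) {
    walk(child, depth + 1, visited);
  });
  return visited;
}

const tree = JSON.parse(jsonText);
const outline = walk(tree, 0, []);
console.log(outline.join("\n"));
```

    Walking a DOM tree follows the same pattern: start at the root, then visit each node's children in turn.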


  • INFORMATION VISUALIZATION

    HTML, CSS, JS, JSON (reminder)

    [Example page: a heading that reads “This h1 is blue!”, annotated to show three ways of applying CSS: external stylesheet, internal stylesheet, and inline style command. The page ends with this script: document.getElementById("dateTimeStamp").innerHTML = new Date();]

    Note that some visualization languages work best if you use inline commands.

  • INFORMATION VISUALIZATION

    HTML, CSS, JS, JSON (reminder)

    function showHideDiv() {
      var x = document.getElementById("demo");
      if (x.style.display === "none") {
        x.style.display = "block";  // hidden, so show it
      } else {
        x.style.display = "none";   // visible, so hide it
      }
    }

    [Page text: “Welcome to our new site” and “And the rest of our cool content appears here.”]

    document.getElementById("dateTimeStamp").innerHTML = new Date();

    A link to a .js file that is in the same folder as this .html page.

    A link to an external library. Notice the https! Your browser might not allow mixing http and https.

    This JavaScript function is in the html page itself. Notice the function is activated when there’s a “click event”. The tag with the ID of “demo” is linked to the function (showHideDiv()). If the div is visible on the screen, it is hidden upon click; if it is invisible, the click makes it visible.

    This script doesn’t use a function - it is always interpreted as the browser reads the webpage (starting from the <html> tag, called the “root”, and then down every node to the closing </html> tag). This script looks for an element with the id “dateTimeStamp” and updates the contents/text (innerHTML) with the actual date and time. The date and time come from Date, a built-in JavaScript object.
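    The show/hide logic itself does not depend on the browser, so it can be tried out on a plain object that merely looks like a DOM element. In this sketch, toggleDisplay and fakeDiv are names invented for illustration; in a page, the element would come from document.getElementById("demo").

```javascript
// Flip an element between hidden and visible, and report its new state.
// Works on any object shaped like a DOM element (one with a style.display property).
function toggleDisplay(element) {
  element.style.display = element.style.display === "none" ? "block" : "none";
  return element.style.display;
}

// A stand-in for document.getElementById("demo"), so the sketch runs without a browser.
const fakeDiv = { style: { display: "block" } };

console.log(toggleDisplay(fakeDiv)); // none
console.log(toggleDisplay(fakeDiv)); // block
```

    Separating the logic from the page in this way also makes it easier to check that a function behaves as intended before wiring it to a click event.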

  • INFORMATION VISUALIZATION

    Common work scenarios

    A common situation on the job is that Tom has many Excel spreadsheet files and he wants to create an interactive visualization. Rather than use the built-in graphs of Excel, he applies some of his knowledge of webpages. So he …

    • Exports the data and saves it as a .csv file (even though he saved it as a “tab-delimited file”, the name of the file itself may end up being .csv).

    • Opens the .csv file and, by typing or cutting and pasting, adds the HTML tags that turn his data into a table. This is time-consuming.

    • Adds a CSS command to alternate the background color of the records in the table. (Good for you, Tom!)

  • INFORMATION VISUALIZATION

    Common work scenarios

    … Tom leaves his .csv files on the server. In his webpage he adds an Ajax command to load his data into his browser. The data are stored in a JavaScript array. Then Tom just iterates through the data …

    function loadDoc() {
      var xhttp = new XMLHttpRequest();
      // Called by the browser each time the state of the request changes.
      xhttp.onreadystatechange = function() {
        // 4 = request finished and response ready; 200 = HTTP status "OK".
        if (this.readyState == 4 && this.status == 200) {
          document.getElementById("demo").innerHTML = this.responseText;
        }
      };
      xhttp.open("GET", "TomsData.csv", true); // true = asynchronous
      xhttp.send();
    }

    In this example, Tom’s function loads the data when he wants them to be loaded. For instance, perhaps he has an option for the end-users to add data when they click on a particular button.

    Notice the tight integration of web technologies and JavaScript. The “request/response” idea we know from the client-server (C/S) architecture is expressed here in an XMLHttpRequest object and its “response”.

    The browser calls the onreadystatechange function each time the state of the request changes. A readyState of 4 means the request has finished and the response has arrived; a status of 200 is the server’s HTTP “OK” code, meaning that the requested “TomsData.csv” was found and sent successfully.

    The script identifies a div (“demo”) just as we’ve done before and changes its content by assigning the text received from the server (responseText) to the div’s innerHTML.

    When the data are retrieved, the webpage is updated with the new content!
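    Once the CSV text has arrived, iterating through it can replace the manual table-building described earlier. The following is a minimal sketch, assuming simple CSV with no quoted fields and no commas inside values; the function names and sample data are invented for illustration. In the page, csvToTable would be given this.responseText.

```javascript
// Escape the characters that have a special meaning in HTML.
function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Turn CSV text into the HTML for a table, one <tr> per record.
// Assumes simple CSV: no quoted fields and no commas inside values.
function csvToTable(csvText) {
  const rows = csvText.trim().split(/\r?\n/).map(function (line) {
    const cells = line.split(",").map(function (value) {
      return "<td>" + escapeHtml(value.trim()) + "</td>";
    });
    return "<tr>" + cells.join("") + "</tr>";
  });
  return "<table>" + rows.join("") + "</table>";
}

// Hypothetical sample data standing in for the contents of TomsData.csv.
console.log(csvToTable("city,visitors\nBoston,120"));
```

    Real spreadsheet exports often contain quoted fields, so a production page would more likely rely on a parsing library than on split.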

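    Important note: Notice “GET”? Usually we send data over the Net using a “GET” or a “POST” request. GET appends the data to the URL, so the data are visible in the address bar, browser history, and server logs, and their length is limited in practice (the limit varies by browser and server). POST sends the data in the body of the request, so they are not exposed in the URL and can be much larger. Neither is encrypted unless the site uses https.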
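    The difference between the two can be seen by building the data by hand. This sketch uses JavaScript's built-in URLSearchParams; the address https://info.com/load and the field names are hypothetical, chosen only to show where the data end up.

```javascript
// Build the address for a GET request: the data travel in the URL itself.
function buildGetUrl(base, data) {
  const query = new URLSearchParams(data).toString();
  return base + "?" + query;
}

// For a POST request the same data would travel in the request body instead.
function buildPostBody(data) {
  return new URLSearchParams(data).toString();
}

// Hypothetical form data.
const data = { file: "TomsData.csv", year: "2019" };
console.log(buildGetUrl("https://info.com/load", data)); // https://info.com/load?file=TomsData.csv&year=2019
console.log(buildPostBody(data));                         // file=TomsData.csv&year=2019
```

    The encoded data are identical in both cases; only their position in the request differs.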

  • INFORMATION VISUALIZATION

    Common work scenario

    • You’ve been asked to create an online info vis service. What do you do?

    • What tools and skills are already available at work?

    • Who is going to maintain and scale the projects after you have documented them?

    • Should we use Tableau or d3.js or other third-party software?

    • Does your office play a role, or does the central IT office process all requests and services on its own schedule?

    • Recommendation? InfoProfessionals should be adept at web design. By combining a knowledge of C/S architecture, file read/write, and JavaScript libraries, we can focus on the data rather than on the people creating the service … so we can be vital participants in making data available in many formats for whatever the need will be.

  • Next steps?

    • It’s time for hands-on practice with HTML, CSS, JS, .json and preparing to use d3.js

    • If you’ve downloaded Tableau, experiment with it, too.

    • It’s easier to control the aesthetic and data functions of a visualization by using d3.js but it may be more work for some, hence Tableau’s popularity.

    • Students vary in preparation with web-oriented skills; but focus on the message you want to send and how it may be misinterpreted by others … how does the computing technology influence your ability to create the message and others to see/use the data and interpretation? It’s all in design that combines aesthetics + tech!