2018 Data Science Institute

Data Visualization in JavaScript

Here are the resources and the walkthrough for this session.

Our schedule

8:30 - 8:45 a.m. Introductions and setup
8:45 - 9:15 a.m. Visualization principles
9:15 - 9:30 a.m. Your first visualization
9:30 - 9:45 a.m. Changing it around
9:45 - 10 a.m. Wrap-up
12:30 p.m. Dataviz office hours in AJB W337

Setting up

This session requires a laptop, a text editor for storing snippets and a web browser, preferably Chrome, for consistency.

  1. Go to CodePen and log in using your GitHub, Twitter or Facebook account.

  2. Create a new pen.

  3. Open the Settings.
    • Set the Pen title to My Data Science Institute page
    • Set Pen description to Visualizations created during the 2018 Data Science Institute at the University of Iowa.
    • Set Add Class(es) to <html> to section container.
    • Go to CSS and Quick-add Bulma
    • Go to JavaScript and add https://cdn.plot.ly/plotly-latest.min.js and https://vega.github.io/datalib/datalib.min.js
    • Go to Behavior and make sure AUTOSAVE and Auto-Updating Preview ENABLED are checked.
  4. Save & Close

  5. In HTML paste the following code

     <h1 class="title">Data visualization in JavaScript</h1>

     <h2 class="subtitle">Part of the University of Iowa 2018 Data Science Institute</h2>
    
     <p>Visualization #1</p>
    
     <div class="chart" id="simpleBar"></div>
    
  6. In CSS paste the following code

     .chart {
       width: 400px;
       clear: both;
     }
    

Introduction

This class will use two JavaScript libraries based on D3, a JavaScript library for data and document manipulation that you may have heard of.

D3 can be used to create powerful visualizations and data analysis workflows on its own, but it’s far from easy. Its creator’s dissertation advisor called it a “visualization kernel.”

Plotly.js simplifies the matter substantially. It’s the guts of the Plot.ly web service, and the company has open sourced it along with connectors to Python, R, Julia, Scala and MATLAB.

Datalib comes out of the same research group as D3 and provides syntax that should feel “natural” to anyone who is familiar with SQL.

Visualization principles

Visualization involves several major, and competing, features.

  1. Accuracy
  2. Legibility (a.k.a. ease of use)
  3. Precision
  4. Aesthetics
  5. Information density
  6. Narrative
  7. Explorability

Balancing these features is primarily accomplished through decisions about how the data is encoded. The sum total of your encoding decisions adds up to your visualization, more or less.

Grammars of graphics: Marks and transformations

Marks

Shapes  
Lines  
Icons  
Text/Labels  
Aesthetics (Colors, patterns, etc.)  

Transformations

Scales  
Axes  
Coordinate systems  
Facets/repetition  

Encodings (Text)

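Encoding              | Quantitative | Ordinal | Categorical | Relational | Group
----------------------|--------------|---------|-------------|------------|------
Position              | Good         | Good    | Good        | Good       | Size
Text                  | Good         | Good    | Good        | Good       | Label
Density               | Good         | Good    | Good        |            | Fill
Length                | Good         | Good    |             |            | Size
Area                  | Good         | Good    |             |            | Size
Angle                 | Good         | Good    |             |            | Size
Enclosure             |              |         | Good        | Good       | Line
Font weight           |              | Good    | Good        |            | Label
Color                 |              | Good    | Good        |            | Fill
Saturation/Brightness |              | Good    | Good        |            | Fill
Line thickness        |              | Good    | Good        |            | Line
Line pattern          |              |         | Good        |            | Line
Pattern               |              |         | Good        |            | Fill
Font                  |              |         | Good        |            | Label
Shape (icon)          |              |         | Good        |            | Label
Connection            |              |         |             | Good       | Line
Line ending           |              |         |             | Good       | Line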
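
One way to put the table to work is to treat it as data and ask it questions. The following is a minimal sketch in plain JavaScript; the `encodingTable` array and the `encodingsFor` helper are hypothetical names made up for illustration, and the rows simply restate the table.

```javascript
// Short names for the four data types in the table.
var Q = "quantitative", O = "ordinal", C = "categorical", R = "relational";

// Each row lists the data types the table marks as "Good" for that encoding.
var encodingTable = [
  {encoding: "Position", suits: [Q, O, C, R], group: "Size"},
  {encoding: "Text", suits: [Q, O, C, R], group: "Label"},
  {encoding: "Density", suits: [Q, O, C], group: "Fill"},
  {encoding: "Length", suits: [Q, O], group: "Size"},
  {encoding: "Area", suits: [Q, O], group: "Size"},
  {encoding: "Angle", suits: [Q, O], group: "Size"},
  {encoding: "Enclosure", suits: [C, R], group: "Line"},
  {encoding: "Font weight", suits: [O, C], group: "Label"},
  {encoding: "Color", suits: [O, C], group: "Fill"},
  {encoding: "Saturation/Brightness", suits: [O, C], group: "Fill"},
  {encoding: "Line thickness", suits: [O, C], group: "Line"},
  {encoding: "Line pattern", suits: [C], group: "Line"},
  {encoding: "Pattern", suits: [C], group: "Fill"},
  {encoding: "Font", suits: [C], group: "Label"},
  {encoding: "Shape (icon)", suits: [C], group: "Label"},
  {encoding: "Connection", suits: [R], group: "Line"},
  {encoding: "Line ending", suits: [R], group: "Line"}
];

// Return the names of the encodings marked "Good" for one data type.
function encodingsFor(dataType) {
  return encodingTable
    .filter(function (row) { return row.suits.indexOf(dataType) !== -1; })
    .map(function (row) { return row.encoding; });
}

console.log(encodingsFor("quantitative"));
console.log(encodingsFor("relational"));
```

Asking for `"quantitative"` returns the six encodings at the top of the table, which is a quick reminder of why bar length and position are the usual first choices for numbers.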

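Encodings (Color coded)

[Figure: encodings table, color coded]

Encodings (Color coded 2)

[Figure: encodings table, color coded, second version]

Encodings (Color coded, grouped)

[Figure: encodings table, color coded and grouped]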

Visualization bestiary

Scatterplot

[Figures: scatterplot; scatterplot (size); scatterplot (colored)]

Bars

[Figures: bar; stacked bar; dodged bar; pie]

Lines

[Figures: line; step; ribbon; ribbon with line; area; contour]

Maps

[Figures: five map examples]

Complex

[Figures: radar; box; bullet; multipanel]

Our first viz

The data we’re using is a standard R dataset of passengers on the Titanic, helpfully posted to GitHub by Vincent Arel-Bundock.

Example data

Name                                             | PClass | Age | Sex    | Survived | SexCode
-------------------------------------------------|--------|-----|--------|----------|--------
Artagaveytia, Mr Ramon                           | 1st    | 71  | male   | 0        | 0
Astor, Colonel John Jacob                        | 1st    | 47  | male   | 0        | 0
Astor, Mrs John Jacob (Madeleine Talmadge Force) | 1st    | 19  | female | 1        | 1
Aubert, Mrs Leontine Pauline                     | 1st    | NA  | female | 1        | 1
Barkworth, Mr Algernon H                         | 1st    | NA  | male   | 1        | 0
Baumann, Mr John D                               | 1st    | NA  | male   | 0        | 0
Baxter, Mrs James (Helene DeLaudeniere Chaput)   | 1st    | 50  | female | 1        | 1
Baxter, Mr Quigg Edmond                          | 1st    | 24  | male   | 0        | 0
Beattie, Mr Thomson                              | 1st    | 36  | male   | 0        | 0
Beckwith, Mr Richard Leonard                     | 1st    | 37  | male   | 1        | 0
Beckwith, Mrs Richard Leonard (Sallie Monypeny)  | 1st    | 47  | female | 1        | 1
Behr, Mr Karl Howell                             | 1st    | 26  | male   | 1        | 0
Birnbaum, Mr Jakob                               | 1st    | 25  | male   | 0        | 0

Code

In the JavaScript tab, add the following to set up and load the data:

// JavaScript utility function: convert an array of records
// into an object holding one array of values per field
function objectsToArrays(input) {
  var output = {};
  var fields = dl.keys(input[0]);
  fields.forEach(function (field) {
    var vals = input.map(function (d) { return d[field]; });
    output[field] = vals;
  });
  return output;
}


// load data on Titanic passengers
var data = dl.csv('https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/datasets/Titanic.csv');

// print two records
console.log(data[0]);
console.log(data[1]);

The next step is to create the summary statistics:


// create summary stats, leaving out records whose class is recorded as "*"
data = data.filter(function (d) { return d.PClass !== "*"; });
var groupedData = dl.groupby("PClass").summarize({"Survived": ["average"]}).execute(data);

console.log(groupedData);
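
To make the grouping step less of a black box, here is a minimal sketch of the same kind of calculation in plain JavaScript, with no Datalib involved. The `sample` records and the `averageBy` helper are hypothetical, made up for illustration; the output field is named `average_Survived` only to mirror the field name this walkthrough reads from the Datalib result.

```javascript
// Hypothetical records, shaped like the passenger rows in the example data.
var sample = [
  {PClass: "1st", Survived: 1},
  {PClass: "1st", Survived: 0},
  {PClass: "2nd", Survived: 1},
  {PClass: "2nd", Survived: 0},
  {PClass: "2nd", Survived: 0},
  {PClass: "3rd", Survived: 0},
  {PClass: "*", Survived: 0}
];

// Average one numeric field within each group, keeping groups in first-seen order.
function averageBy(rows, groupField, valueField) {
  var totals = {};
  var order = [];
  rows.forEach(function (d) {
    var key = d[groupField];
    if (!totals.hasOwnProperty(key)) {
      totals[key] = {sum: 0, count: 0};
      order.push(key);
    }
    totals[key].sum += Number(d[valueField]);
    totals[key].count += 1;
  });
  return order.map(function (key) {
    var out = {};
    out[groupField] = key;
    out["average_" + valueField] = totals[key].sum / totals[key].count;
    return out;
  });
}

// Drop the "*" class first, then average Survived within each class.
var cleaned = sample.filter(function (d) { return d.PClass !== "*"; });
var summary = averageBy(cleaned, "PClass", "Survived");
console.log(summary);
```

Because `Survived` is coded as 0 or 1, its average within a class is the share of passengers in that class who survived.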

Then we “flip” it into the format Plotly expects: one array of values per field, rather than one object per row.

var flippedData = objectsToArrays(groupedData);

console.log(flippedData);
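
To see the flip on its own, here is a minimal sketch that does the same row-to-column conversion with the built-in `Object.keys` instead of `dl.keys`. The `rowsToColumns` name and the numbers in `exampleRows` are hypothetical placeholders, not the real survival rates.

```javascript
// Convert an array of records into an object of arrays, one array per field.
function rowsToColumns(rows) {
  var columns = {};
  Object.keys(rows[0]).forEach(function (field) {
    columns[field] = rows.map(function (d) { return d[field]; });
  });
  return columns;
}

// Hypothetical summary rows (made-up values).
var exampleRows = [
  {PClass: "1st", average_Survived: 0.6},
  {PClass: "2nd", average_Survived: 0.4},
  {PClass: "3rd", average_Survived: 0.2}
];

var exampleColumns = rowsToColumns(exampleRows);
console.log(exampleColumns.PClass);
console.log(exampleColumns.average_Survived);
```

The two arrays line up by position, which is what lets them be used directly as the x and y values of a chart.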

Finally we draw the chart:

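// describe a bar chart: classes on the x-axis, average survival on the y-axis
var chart1 = {};
chart1.x = flippedData.PClass;
chart1.y = flippedData.average_Survived;
chart1.type = "bar";

// draw it into the <div> with the id "simpleBar"
Plotly.newPlot("simpleBar", [chart1]);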

console.log(flippedData);
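
For the “changing it around” part of the session, one possible direction is to pass a layout object alongside the trace. This is a sketch rather than the session's official next step: the values in `exampleData` are made-up placeholders, the titles are invented, and the names `styledTrace` and `styledLayout` are hypothetical. The drawing call is guarded so the snippet also runs outside a browser.

```javascript
// Hypothetical column-oriented values standing in for flippedData.
var exampleData = {
  PClass: ["1st", "2nd", "3rd"],
  average_Survived: [0.6, 0.4, 0.2]
};

// A bar trace with an explicit bar color.
var styledTrace = {
  x: exampleData.PClass,
  y: exampleData.average_Survived,
  type: "bar",
  marker: {color: "steelblue"}
};

// Titles, a fixed 0-to-1 y-axis range and percent tick labels.
var styledLayout = {
  title: {text: "Survival rate by passenger class"},
  xaxis: {title: {text: "Passenger class"}},
  yaxis: {title: {text: "Share who survived"}, range: [0, 1], tickformat: ".0%"}
};

// Only draw when Plotly and a page are available, for example in CodePen.
if (typeof Plotly !== "undefined" && typeof document !== "undefined") {
  Plotly.newPlot("simpleBar", [styledTrace], styledLayout);
}
```

In a pen this redraws the chart in the `simpleBar` element. Fixing the y-axis range at 0 to 1 keeps the bars comparable if the data changes later.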

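Running with scissors

This version splits the passengers by sex and draws the average survival for women and men as two bar series, grouped by class.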

// JavaScript utility function: convert an array of records
// into an object holding one array of values per field
function objectsToArrays(input) {
  var output = {};
  var fields = dl.keys(input[0]);
  fields.forEach(function (field) {
    var vals = input.map(function (d) { return d[field]; });
    output[field] = vals;
  });
  return output;
}


// load data on Titanic passengers
var data = dl.csv('https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/datasets/Titanic.csv');



// print two records
// console.log(data[0]);
// console.log(data[1]);

// create summary stats
data = data.filter(function (d) { return d.PClass !== "*"; });
var women = data.filter(function (d) { return d.Sex === "female"; });
var men = data.filter(function (d) { return d.Sex === "male"; });


// average survival by class, separately for women and men
var groupedWomen = dl.groupby("PClass").summarize({"Survived": ["average"]}).execute(women);
var groupedMen = dl.groupby("PClass").summarize({"Survived": ["average"]}).execute(men);

var flippedWomen = objectsToArrays(groupedWomen);
var flippedMen = objectsToArrays(groupedMen);


var chart1 = {};
chart1.x = flippedWomen.PClass;
chart1.y = flippedWomen.average_Survived;
chart1.type = "bar";
chart1.name = "women";

var chart2 = {};
chart2.x = flippedMen.PClass;
chart2.y = flippedMen.average_Survived;
chart2.type = "bar";
chart2.name = "men";

// barmode "group" places the bars for women and men side by side
var layout = {barmode: "group"};

Plotly.newPlot("simpleBar", [chart1, chart2], layout);

console.log(groupedMen);
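
The code above repeats the same steps once for women and once for men. As a sketch of how that could be generalized, the following builds one trace per group with a helper, in plain JavaScript. The `passengers` records are made up for illustration, and `survivalTrace` and `groupedTraces` are hypothetical names; the drawing call is guarded so the snippet also runs outside a browser.

```javascript
// Hypothetical passenger records (not the real data).
var passengers = [
  {PClass: "1st", Sex: "female", Survived: 1},
  {PClass: "1st", Sex: "male", Survived: 0},
  {PClass: "1st", Sex: "male", Survived: 1},
  {PClass: "2nd", Sex: "female", Survived: 1},
  {PClass: "2nd", Sex: "male", Survived: 0},
  {PClass: "3rd", Sex: "female", Survived: 0},
  {PClass: "3rd", Sex: "female", Survived: 1},
  {PClass: "3rd", Sex: "male", Survived: 0}
];

// Build one bar trace: the average of Survived per PClass for a subset of rows.
function survivalTrace(rows, name) {
  var classes = [];
  var sums = {};
  var counts = {};
  rows.forEach(function (d) {
    if (classes.indexOf(d.PClass) === -1) {
      classes.push(d.PClass);
      sums[d.PClass] = 0;
      counts[d.PClass] = 0;
    }
    sums[d.PClass] += Number(d.Survived);
    counts[d.PClass] += 1;
  });
  return {
    x: classes,
    y: classes.map(function (c) { return sums[c] / counts[c]; }),
    type: "bar",
    name: name
  };
}

// One trace per value of Sex.
var groupedTraces = ["female", "male"].map(function (sex) {
  var subset = passengers.filter(function (d) { return d.Sex === sex; });
  return survivalTrace(subset, sex);
});

// Only draw when Plotly and a page are available, for example in CodePen.
if (typeof Plotly !== "undefined" && typeof document !== "undefined") {
  Plotly.newPlot("simpleBar", groupedTraces, {barmode: "group"});
}
```

Swapping the list `["female", "male"]` for the values of another field would produce a different grouping without any further copy and paste.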