Loras College 2014 Business Analytics Symposium | Andy Stevens: Big Data Analytics

Posted on 30-Oct-2014


DESCRIPTION

This session will cover issues and advice for implementing Big Data Analytics in a Research and Development context. In addition to the basics, it will discuss the past, present and future and touch on relevant mathematics, statistics, science, technology, economics, business, history and even some literature. For more information on the Loras College 2014 Business Analytics Symposium, the Loras College MBA in Business Analytics or the Loras College Business Analytics Certificate, visit www.loras.edu/mba or www.loras.edu/bigdata.

TRANSCRIPT

Big Data Analytics, R&D

Robert Andrew Stevens, CFA

John Deere

Disclaimer

The information, views, and opinions contained in this presentation are those of the author and do not necessarily reflect the views and opinions of John Deere.

Outline = Favorite Quotes

1. “when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind”

2. “it takes all the running you can do, to keep in the same place”

3. “The future is already here – it’s just not evenly distributed”

4. “The essence of strategy is the timing of the sunk cost commitment”

5. “Americans can always be counted on to do the right thing...”

“when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind”

“I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced to the stage of science, whatever the matter may be.”

Lecture on “Electrical Units of Measurement” (3 May 1883), published in Popular Lectures Vol. I, p. 73; quoted in Encyclopaedia of Occupational Health and Safety (1998) by Jeanne Mager Stellman, p. 1992

http://en.wikiquote.org/wiki/William_Thomson

http://en.wikipedia.org/wiki/Lord_Kelvin

William Thomson, 1st Baron Kelvin

1824–1907

a.k.a.: Lord Kelvin

Occupation: mathematical physicist and engineer

What is Analytics?

Turning Data into Decisions

[Figure: Production Viewed as a System*. Stages shown: Suppliers of Materials and Equipment; Receipt and Test of Materials; Tests of Process, Machines, Methods, Costs; Production, Assembly, Inspection; Distribution; Consumers; Consumer Research; Design and Redesign]

* Deming, W.E. Out of the Crisis, 1986 (p. 4)

Take Action!

The Road to Earlier Discovery and Shorter Decision Cycles

Big Data in R&D at John Deere

Primarily machine data: CAN and GPS

Volume: immeasurable

Velocity: fast and furious

Variety: nothing is the same

Value: TBD

“it takes all the running you can do, to keep in the same place”

The Red Queen's race is an incident that appears in Lewis Carroll's Through the Looking-Glass and involves the Red Queen, a representation of a Queen in chess, and Alice constantly running but remaining in the same spot.

“Well, in our country,” said Alice, still panting a little, “you'd generally get to somewhere else — if you run very fast for a long time, as we've been doing.” “A slow sort of country!” said the Queen. “Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!”

http://en.wikipedia.org/wiki/Red_Queen's_race

http://en.wikipedia.org/wiki/Lewis_Carroll

Charles Lutwidge Dodgson

1832–1898

Pen name: Lewis Carroll

Occupation: Writer, mathematician, Anglican cleric, photographer, artist

The Problem/Opportunity

[Chart series: Data generated; Data captured and stored; Data analyzed]

[Remember DIKW: Data, Information, Knowledge, Wisdom?]

Ideally, if nothing changes…

[Chart: Today, Transition, Vision]

But the data generated might grow faster than we can manage

[Ever hear of “The Internet of Things” ?]

[Chart: Today, Transition, Vision]

So, maybe we should try to do something like this…

[“If you want to get somewhere else, you must run at least twice as fast as that!”]

[Chart: Today, Transition, Vision]
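The argument in these slides is that if data generated grows faster than our capacity to capture and analyze it, the share we actually analyze keeps shrinking, however hard we work. A minimal toy model makes the arithmetic concrete. It assumes (hypothetically) that data volume and analysis capacity each compound at a constant annual rate; the function name, the 25% starting share and the 20%/40%/80% growth rates are illustrative assumptions, not figures from the talk.

```python
def share_analyzed(years: int, data_growth: float, capacity_growth: float,
                   initial_share: float = 1.0) -> float:
    """Fraction of generated data still analyzed after `years` years.

    Toy assumption (hypothetical): data volume and analysis capacity each
    compound at a constant annual rate, so the share scales by
    ((1 + capacity_growth) / (1 + data_growth)) ** years, capped at 100%.
    """
    ratio = (1 + capacity_growth) / (1 + data_growth)
    return min(1.0, initial_share * ratio ** years)


# Illustrative only: data grows 40%/yr and we analyze 25% of it today.
DATA_GROWTH = 0.40
START = 0.25
for label, capacity in [("slower", 0.20), ("same pace", 0.40), ("twice as fast", 0.80)]:
    shares = [share_analyzed(t, DATA_GROWTH, capacity, START) for t in (0, 5, 10)]
    print(f"{label:>13}: " + ", ".join(f"{s:.0%}" for s in shares))
```

Under these assumed rates, matching the data's growth only holds the share steady (the Red Queen's "keep in the same place"); capacity has to grow faster than the data before the analyzed share rises.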

A Solution: Data Science

• Applies everywhere

• Practical/feasible?

• In R&D?

http://www.dataists.com/2010/09/the-data-science-venn-diagram

Data Science in R&D

1. Multidisciplinary Investigations (25%)

2. Models and Methods for Data (20%)

3. Computing with Data (15%)

4. Pedagogy (15%)

5. Tool Evaluation (5%)

6. Theory (20%)

Cleveland, W. S. (2001). Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics. ISI Review, 69, 21-26.

http://www.stat.purdue.edu/~wsc/papers/datascience.pdf

“The future is already here – it’s just not evenly distributed”

William Gibson, quoted in The Economist, December 4, 2003

http://www.economist.com/printedition/2003-12-06

http://en.wikipedia.org/wiki/William_Gibson

William Gibson

1948–

CERN: Solving the Mysteries of the Universe with Big Data

The Large Hadron Collider Computing Challenge

• Data volume
– High rate, large number of channels, 4 experiments
– 15 PetaBytes of new data each year; 30 PB in 2013

• Overall compute power
– Event complexity, number of events, thousands of users
– 200 k cores; 350 k cores
– 45 PB of disk storage; 150 PB storage

http://openlab.web.cern.ch/sites/openlab.web.cern.ch/files/presentations/Jarp_Big_Data_Boston_final.pdf (09/12/13)

The Scientific Method

1. Formulation of a question

2. Hypothesis

3. Prediction

4. Testing

5. Analysis

http://en.wikipedia.org/wiki/Scientific_method

An 18th-century depiction of early experimentation in the field of chemistry

“The essence of strategy is the timing of the sunk cost commitment”

Verbal communication during UIUC MBA Strategic Management class

http://www.amazon.com/Economic-Foundations-Strategy-Organizational-Science/dp/1412905435

http://business.illinois.edu/facultyprofile/faculty_profile.aspx?ID=99

Professor of Business Administration and Caterpillar Chair of Business

University of Illinois at Urbana-Champaign

Joseph T. Mahoney

1958–

What happens to Q as P → 0?

• Change “Household” to “Firm”

• Change “chocolate” to “software”

• Now what happens to Q as P → 0?

• How could that happen in a Big Data Analytics, R&D context?

http://catalog.flatworldknowledge.com/bookhub/reader/2992?e=coopermicro-ch07_s01

Figure 7.1 The Demand Curve of an Individual Household

The One-Day MBA

http://www.engineeringtoolbox.com/cash-flow-diagrams-d_1231.html

http://en.wikipedia.org/wiki/Net_present_value

$$NPV = \sum_{t=0}^{n} \frac{F_t}{(1+i)^t}$$

$F_0$ = sunk cost investment

• Assuming $F_t$ does not decrease* for $t > 0$, what happens to NPV as $F_0 \to 0$?

• How could that happen in a Big Data Analytics, R&D context?

• What are the implications for strategy?
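The NPV question can be explored numerically. This is a minimal, hypothetical sketch rather than anything from the talk: it assumes the sunk-cost investment $F_0$ is entered as a negative cash flow (an outflow), and the five annual inflows of 30 and the 10% discount rate $i$ are made-up numbers. It evaluates the NPV formula for a shrinking $F_0$ while holding $F_t$ fixed for $t > 0$.

```python
def npv(cash_flows: list[float], rate: float) -> float:
    """Net present value: sum over t = 0..n of F_t / (1 + i)**t.

    cash_flows[0] is F_0, the up-front sunk-cost investment, entered as a
    negative number; cash_flows[t] for t > 0 are the later net cash flows.
    """
    return sum(f / (1 + rate) ** t for t, f in enumerate(cash_flows))


# Hypothetical project: five annual inflows of 30, discounted at i = 10%.
future_flows = [30.0] * 5
rate = 0.10

# Shrink the up-front commitment toward zero, holding F_t (t > 0) fixed.
for f0 in (-100.0, -50.0, -10.0, 0.0):
    value = npv([f0] + future_flows, rate)
    print(f"F_0 = {f0:7.1f}  ->  NPV = {value:7.2f}")
```

With these assumed numbers, NPV climbs from about 13.72 to about 113.72 as $F_0$ goes from -100 to 0: as the sunk-cost commitment vanishes, the project keeps the full present value of its future cash flows.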

Avoid Sunk Cost Commitments and Vendor Lock-in with Open Source

• Apache: http://www.apache.org/ – Hadoop, Hive, Mahout, Pig, Spark…

• GRASS GIS: http://grass.osgeo.org/

• Java: http://www.java.com/ + Cassandra

• Julia: http://julialang.org/

• Perl: http://www.perl.org/

• Python: http://www.python.org/

• R: http://cran.us.r-project.org/ + RHIPE

• Scala: http://scala-lang.org/ + Scalding

• SQL:
– http://www.mysql.com/
– http://www.postgresql.org/ + PostGIS

“Americans can always be counted on to do the right thing...”

“...after they have exhausted all other possibilities.”

Also famous for: “We shall never surrender” (and for opposing Neville Chamberlain's “peace in our time”), and many other lines relevant to The War on Data

http://www.quotedb.com/quotes/2313

https://en.wikipedia.org/wiki/Winston_churchill

Sir Winston Churchill

1874–1965

Profession: Member of Parliament, statesman, soldier, journalist, historian, author, painter

Tips for winning The War on Data

Teamwork

Statistics

Partner with IT

Learn-Do-Teach

Replenish your toolbox

Math

Pop Quiz

What are the 3 most important things in Real Estate?

1. Location

2. Location

3. Location

What are the 3 most important things in Statistics?

1. Look at the data

2. Look at the data

3. Look at the data

… especially for Big Data Analytics:

1. Look at the data before you analyze it: Exploratory Data Analysis (EDA)

2. Look at the data while you analyze it: model diagnostics

3. Look at the data after you analyze it: visualization and communication
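A classic illustration of why summaries are no substitute for looking at the data is Anscombe's quartet (F. J. Anscombe, 1973): four small x/y datasets whose means, variances and correlations nearly coincide even though their scatter plots look completely different. The sketch below is not from the talk; it simply computes those summaries with Python's standard library, and the helper names `pearson_r` and `summarize` are hypothetical. Actually plotting the four sets (for example with matplotlib, not shown) is what reveals the differences.

```python
from statistics import mean, variance

# Anscombe's quartet (Anscombe, 1973); sets I-III share the same x values.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from centered sums."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def summarize(xs, ys):
    """The usual numeric summaries people report instead of plotting."""
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "var_x": variance(xs),
        "var_y": variance(ys),
        "r": pearson_r(xs, ys),
    }


for name, (xs, ys) in QUARTET.items():
    s = summarize(xs, ys)
    # Near-identical summaries; only distinct_x hints that set IV differs,
    # and nothing here separates sets I-III without a plot.
    print(f"{name:>3}: mean_x={s['mean_x']:.2f} mean_y={s['mean_y']:.2f} "
          f"var_x={s['var_x']:.2f} var_y={s['var_y']:.2f} r={s['r']:.3f} "
          f"distinct_x={len(set(xs))}")
```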

Other Survival Tips

• Visualization and Communication
– Tools: R & Rmd, GGobi, Tableau, ArcGIS/GRASS…
– Presentations: Tell them 3X, 5Ws

• Collaboration: working as a team
– File and code version control
– Google's R Style Guide

• Reproducible Research best practices
– Avoid errors like those made by Potti (Duke) and Rogoff & Reinhart (Harvard)

• http://en.wikipedia.org/wiki/Anil_Potti

• http://en.wikipedia.org/wiki/Reinhart-Rogoff

Summary = Favorite Quotes

1. “when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind”

2. “it takes all the running you can do, to keep in the same place”

3. “The future is already here – it's just not evenly distributed”

4. “The essence of strategy is the timing of the sunk cost commitment”

5. “Americans can always be counted on to do the right thing...”

“Those who cannot remember the past are condemned to repeat it.” – George Santayana

Q & A

Contact Information

E-mail: stevensroberta@johndeere.com (business)

robertandrewstevens@gmail.com (personal)

LinkedIn: http://www.linkedin.com/pub/robert-andrew-stevens-cfa/6a/a04/315

Twitter: https://twitter.com/RobertAndrewSt3

GitHub: https://github.com/robertandrewstevens
