
Fall 2018

CptS 475/575: Data Science

What is Data Science? Part II


Assefaw Gebremedhin, CptS 475/575: Data Science, http://scads.eecs.wsu.edu

What is Data Science?

Outline:
• Big Data and Data Science hype
  • and getting past the hype
• Why now?
• Landscape of perspectives
• Skill set needed


Landscape of perspectives

Example 1. Drew Conway’s Venn diagram of DS (2010)


Landscape of perspectives

Example 2. Vasant Dhar, “Data Science and Prediction”, Communications of the ACM, Dec 2013.
http://cacm.acm.org/magazines/2013/12/169933-data-science-and-prediction/fulltext

Dhar makes the following three major points in the article:
• Data Science is the study of the generalizable extraction of knowledge from data.
• A common requirement in assessing whether new knowledge is actionable for decision making is its predictive power, not just its ability to explain the past.
• A data scientist requires an integrated skill set spanning math, machine learning, statistics, computer science, along with a deep understanding of the craft of problem formulation to engineer effective solutions.


A figure taken from Dhar’s article: Projected growth rate of unstructured and structured data


Landscape of perspectives

Example 3. Berman et al., “Realizing the Potential of Data Science”, Communications of the ACM, April 2018.
https://cacm.acm.org/magazines/2018/4/226372-realizing-the-potential-of-data-science/fulltext

Key insights from the article:
• Data science can help connect previously disparate disciplines, communities, and users to provide richer and deeper insights into current and future challenges.
• Data science encompasses a broad set of areas, including
  • data-focused algorithmic innovation and machine learning;
  • data mining and the use of data for discovery;
  • collection, organization, stewardship and preservation of data;
  • privacy challenges and policy associated with data; and
  • pedagogy to support the education and training of data-savvy professionals.
• There is a growing gap between commercial and academic research practice for data systems that needs to be addressed.


The Data Life Cycle: a picture taken from the Berman et al. article

“Data never exists in a vacuum. Like a biological organism, data has a life cycle, from birth through an active life to “immortality” or some form of expiration. Also like a living and intelligent organism, it survives in an environment that provides physical support, social context, and existential meaning.”

https://cacm.acm.org/magazines/2018/4/226372-realizing-the-potential-of-data-science/fulltext


The data life cycle vs. scientific communities

• The data life cycle diagram suggests a seamless set of actions and transformations on data, but in many scientific communities and disciplines today these steps are isolated.
  • Domain scientists may focus on generating and using data
  • Computer scientists often focus on platform and performance issues, including mining, organizing, modeling, and visualizing, as well as the mechanisms for eliciting meaning from the data through machine learning and other approaches
  • The physical processes of acquisition and instrument control are often the focus of engineering
  • Statisticians may focus on the mathematics of models for risk and inference
  • Information scientists and library scientists may focus on stewardship and preservation of data and the “back-end” of the pipeline, following acquisition, decisions, and action in the realm of publishing, archiving, and curation
• There is significant opportunity for bridging gaps in development of effective life cycles for valuable data
  • within and among the computer science, information science, domain, and physical science and engineering communities, and
  • among machine learning, data analytics, statistics, and operations research communities.


Skill Set Needed: A Data Science Profile

• Computer science
• Math
• Statistics
• Machine Learning
• Domain expertise
• Data visualization
• Communication and presentation skills


Author Schutt’s data science profile


So What Is a Data Scientist, Really?

• In Industry:
  • A data scientist is someone who knows how to extract meaning from and interpret data, which requires both tools and methods from statistics and machine learning, as well as being human.
  • She spends a lot of time in the process of collecting and cleaning data. This process requires persistence, statistics, and software engineering skills.
  • Once she gets the data into shape, a crucial part is exploratory data analysis, which combines visualization and data sense.
  • She will find patterns and build models and algorithms, some with the intention of understanding product usage and others to serve as prototypes that ultimately get baked back into the product.
  • She may design experiments, and she is a critical part of data-driven decision making.
  • She will communicate with team members, engineers, and leadership in clear language and with data visualizations so that even if her colleagues are not immersed in the data themselves, they will understand the implications.


So What Is a Data Scientist, Really?

• In Academia:
  • An academic data scientist is a scientist, trained in anything from social science to biology, who works with large amounts of data, and must grapple with computational problems posed by the structure, size, messiness, and the complexity and nature of the data, while simultaneously solving a real-world problem.


Speaking of data scientists in academia…

EXPERIMENTAL THEORETICAL COMPUTATIONAL (Simulation)

The 4th PARADIGM (Data) connectedness


Impact


Economic Impact

• Google, market cap (Jan 1, 2010): $189 billion
• Cisco Systems (networking gear), market cap (Jan 1, 2010): $112 billion
• Facebook, market cap: $50 billion (source: www.bizjournals.com/austin/news/2010/11/15/facebooks...)

Slide credit: Barabasi (Network Science)