Big Data, Data Science, Artificial Intelligence and Digital Transformation: Is there a Shangri La?

Wagner Meira Jr., PhD

Department of Computer Science, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brazil

February 17, 2020

What is Shangri La?

Shangri-La is a fictional place described in the 1933 novel Lost Horizon by British author James Hilton. Shangri-La has become synonymous with any earthly paradise – a permanently happy land, isolated from the world.

Big Data?

Big Data: Is it a solved problem?

IoT?

Real time?

Heterogeneity?

Data Protection and Privacy Assurance?

Data Science

Data Science is an area that aims to systematize processes and practices to explore, analyze and generate models that enable description, prediction and prescription based on diverse data. Overall, it targets better performance and efficacy of organizations and life quality of both citizens and societies.

Data Science models and transforms data towards supporting decision making, through computational thinking tasks.

Data Science Process

Data Science Areas

Data Science Complexities

Complexity refers to sophisticated characteristics in data science systems.

Data science problems may be viewed as complex systems involving comprehensive system complexities.

Data complexity

Behavior complexity

Domain complexity

Social complexity

Environment complexity

Learning complexity

Deliverable complexity

Data Science Jobs

Data scientists handle raw data, analyze it with a variety of techniques, and present insights in a form that is useful for anticipating business problems. A data scientist uses machine learning to predict the future based on past patterns. The average salary (US) for a data scientist is $119,000.

Data analysts analyze data, which also requires creating systems that help business users draw out insights and that ensure data quality. Their role is to collect, process, and perform statistical analyses of data. A data analyst finds meaningful information in the available data, typically using R or SAS. Not just the IT industry but companies in all kinds of industries (e.g., healthcare, automotive, finance, retail, and insurance) need data analysts to run their business. The average annual salary (US) for data analysts is $62,000.

Data Science Jobs

The role of a data architect is to create data management systems that integrate, protect and maintain data sources and the company’s information. Data architects are responsible for database architecture and for the design, creation and optimization of data. They are expected to master technologies such as Pig, Spark, SQL, XML, and Hive. The average annual salary (US) for this career is $100,000.

Data engineers do not analyze data themselves; they build the software infrastructure that lets other professionals do that work. They can do this because they have in-depth knowledge of Hadoop and Big Data technologies such as MapReduce, Hive, and Pig, as well as NoSQL and SQL technologies. Their role is to develop, test and maintain large-scale processing systems. More than 50 percent of the work is data wrangling, where data engineers with a background in software engineering excel. The average salary (US) for this job is $95,000.

Artificial Intelligence

Artificial intelligence (AI), sometimes called machine intelligence, is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans. Colloquially, the term “artificial intelligence” is often used to describe machines (or computers) that mimic “cognitive” functions that humans associate with the human mind, such as “learning” and “problem solving”.

Analytical AI has only characteristics consistent with cognitive intelligence; generating a cognitive representation of the world and using learning based on past experience to inform future decisions.

Human-inspired AI has elements from cognitive and emotional intelligence; understanding human emotions, in addition to cognitive elements, and considering them in their decision making.

Humanized AI shows characteristics of all types of competencies (i.e., cognitive, emotional, and social intelligence), is able to be self-conscious and is self-aware in interactions.

Impact on society?

Digital Transformation

Digital transformation (DX) is the reworking of the products, processes and strategies within an organization by leveraging current technologies.

Common challenges:

Scale: How can an established organization that operates on an analog business model fundamentally change the way it identifies, develops, and launches new ventures without losing effectiveness?

Talent: How can organizations that desire digital transformation train, retain, and attract the most talented individuals to change their organization without uprooting or losing sight of collaborators that made them great companies in the first place?

Metrics: How do newly digital organizations measure their successes and failures in comparison to their formerly analog selves?

Digital Transformation

Go Digital: It is a matter of infrastructure and investment.

Be Digital: It depends on changes in culture and practices, and on much more investment.

What’s the slope of your organization?

Digital Transformation

Analytics across time

Data

Report

Statistical analysis

Descriptive models

Predictive models

Human-in-the-loop models

Analytics across time: Navigation information

Data: vehicle location

Report: location history

Statistical analysis: probability distribution of route duration

Descriptive models: segmentation of route duration per time period

Predictive models: estimate time duration considering conditions

Human-in-the-loop models: route adaptation assisted by app.
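
The progression above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the trip log, the three time periods, and every function name are hypothetical, and the "predictive model" is deliberately naive (the mean duration of the matching period).

```python
from collections import defaultdict
from statistics import mean, quantiles

# Data / report: a hypothetical trip log for one route,
# as (departure hour, duration in minutes) pairs.
TRIPS = [(7, 42), (8, 55), (8, 58), (9, 47), (12, 31), (13, 29), (14, 30),
         (17, 50), (18, 61), (18, 57), (19, 44), (22, 24), (23, 22)]


def period(hour):
    """Map a departure hour to a coarse time period (an assumed segmentation)."""
    if 7 <= hour < 10:
        return "morning peak"
    if 17 <= hour < 20:
        return "evening peak"
    return "off-peak"


def duration_quartiles(trips):
    """Statistical analysis: empirical quartiles of route duration."""
    return quantiles([duration for _, duration in trips], n=4)


def segment_by_period(trips):
    """Descriptive model: mean route duration per time period."""
    groups = defaultdict(list)
    for hour, duration in trips:
        groups[period(hour)].append(duration)
    return {name: mean(values) for name, values in groups.items()}


def predict_duration(hour, segments):
    """Predictive model (naive): the mean duration of the matching period."""
    return segments[period(hour)]


def rank_departures(candidate_hours, segments):
    """Human-in-the-loop: rank the options; the driver makes the final choice."""
    return sorted(candidate_hours, key=lambda h: predict_duration(h, segments))


segments = segment_by_period(TRIPS)
ranking = rank_departures([8, 13, 18], segments)
```

Each function corresponds to one rung of the ladder; a real navigation app would replace the per-period mean with a model that takes traffic and weather conditions into account.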

Data Science Issues

Data

Models and techniques

Technology

Skills and culture

*Data*

Data Science Effort Distribution

Top Data Science Methods

Models and Techniques

Top Analytics Software

Technology Landscape

Data Science Skills

Computational thinking

Analytical ability

Quantitative ability

Algorithmic ability

Computational literacy

Data Science Ecosystem

Leaders: Understand the potential of DS and create the conditions for its development.

Data Scientists: Design and implement models, methods and techniques that are data intensive.

Translators: Identify opportunities and promote the matching between demands and resources available.

End users: Collaborators that will be empowered by Data Science.

Road to Data Science

Are we done?

FATES

Fairness

Accountability

Transparency

Ethics

Safety and Security

Fairness

Fairness means that the models we build are used to make unbiased decisions (e.g., classifications) or predictions.

Defining fairness formally is an active area of research, of interest to computer scientists, social scientists, and legal scholars.

Example: A ProPublica study shows that a machine learning model used by courts in the US to predict recidivism is biased against Black defendants relative to white defendants. This study led academics to show the impossibility of satisfying two different, but reasonable, notions of fairness simultaneously.
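
The tension between fairness notions can be made concrete with a small numeric sketch. The counts below are synthetic, not taken from the ProPublica study, and the two notions compared (equal positive predictive value and equal false positive rate across groups) are an assumption about which notions the slide refers to. The helper names are hypothetical.

```python
def make_group(tp, fp, fn, tn):
    """Build (predicted_high_risk, reoffended) records from confusion counts."""
    return ([(True, True)] * tp + [(True, False)] * fp
            + [(False, True)] * fn + [(False, False)] * tn)


def fairness_rates(records):
    """Return (false positive rate, positive predictive value) for one group."""
    tp = sum(1 for pred, actual in records if pred and actual)
    fp = sum(1 for pred, actual in records if pred and not actual)
    tn = sum(1 for pred, actual in records if not pred and not actual)
    fpr = fp / (fp + tn)   # share of non-reoffenders flagged as high risk
    ppv = tp / (tp + fp)   # share of flagged people who did reoffend
    return fpr, ppv


# Synthetic groups with different base rates of reoffending.
group_a = make_group(tp=40, fp=10, fn=10, tn=40)  # base rate 50 / 100
group_b = make_group(tp=16, fp=4, fn=4, tn=76)    # base rate 20 / 100

fpr_a, ppv_a = fairness_rates(group_a)
fpr_b, ppv_b = fairness_rates(group_b)
```

In this toy data a "high risk" label is equally reliable in both groups (same positive predictive value), yet people in group A who do not reoffend are flagged four times as often as those in group B. When base rates differ, equalizing one measure generally unbalances the other.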

Accountability

Accountability means to determine and assign responsibility – to someone or something – for a judgment made by a machine. Assigning responsibility can be elusive because there are people, processes, and organizations as well as algorithms, models, and data behind any judgment.

Accountability

Example: When Google ads for high-paying jobs were shown more to men than to women, it was not at all clear whom or what to blame. Who is responsible for the discriminatory results? We can think of a few reasons why they may have appeared:

The advertiser’s targeting of the ad

Google explicitly programming the system to show the ad less often to females

Male and female consumers respond differently to ads and Google’s targeting algorithm responds to the difference (e.g., Google learned that males are more likely to click on this ad than females are)

More competition existing for advertising to females causing the advertiser to win fewer ad slots for females

Some third party (e.g., a hacker) manipulating the ad ecosystem

Some other reason we haven’t thought of.

Some combination of the above.

Transparency

Transparency means being open and clear to the end user about how an outcome, e.g., a classification, a decision, or a prediction, is made. Transparency can enable accountability.

The massive amounts of data collected by third parties about our behavior mean that others have more information about us than we have about ourselves. This lack of transparency between data collectors and data subjects underlies the “inverse privacy” problem: the inaccessibility of data collected by others about us.

Transparency

The EU GDPR’s “right to explanation” calls for transparency of data-driven automated decision-making (from Article 13, paragraph 2):

2. In addition to the information referred to in paragraph 1, the controller shall, at the time when personal data are obtained, provide the data subject with the following further information necessary to ensure fair and transparent processing: ...

(f) the existence of automated decision-making, including profiling, referred to in Article 22(1) and (4) and, at least in those cases, meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing for the data subject.

“Meaningful information about the logic involved” suggests that data collectors are required to provide data subjects with some kind of justification.

Explainable AI

Ethics

Ethics for data science means paying attention to both the ethical and privacy-preserving use and collection of data as well as the ethical decisions that the automated systems we build will make.

1. The ethical issues relate to fairness, accountability, and transparency with respect to the data collected about individuals and organizations. What data needs to be collected and for what purposes are the data intended to be used? How transparent to the end user are these policies?

2. Machines will be programmed to make ethical decisions, some of which have no right or wrong answer. The canonical “Trolley Car Problem” raises the ethical question of whether it is better to kill one person or five. The ethical dilemma is that there is no right answer to this question.

Safety and Security

Safety and security means ensuring that the systems we build are safe (do no harm) and secure (guard against malicious behavior).

If we cannot ensure their safety, then consumers will not trust them. One longstanding technical challenge is to verify the safety of a digital controller interacting with a physical environment in the presence of uncertainty. One must reason about a combinatorial number of possible events and many relevant high-dimensional variables.

A new technical challenge is to verify AI systems trained on big data, e.g., a smart car’s cameras use computer vision models built on deep neural networks (DNNs). Examples of incorrect behaviors are self-driving cars crashing into guardrails. This dimension aligns nicely with the need for accountability and transparency of machine-learning algorithms and models.

Safety and Security

Data science raises new security vulnerabilities.

Not only do we need to protect our network, our computers, our devices, and our software, but now we need to protect our data and our machine learning algorithms and models.

Attackers can tamper with the data, thus producing a model that makes wrong decisions or predictions. The field of adversarial machine learning studies how malicious actors can manipulate training and test data and attack machine learning algorithms. The distinctive context here is that algorithms are working in an environment that adapts and learns from the system’s behavior to wreak havoc. Whereas for safety, our trained systems need to work in unpredictable environments, for security they work in adversarial ones.
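
A toy illustration of tampering with training data, assuming a one-dimensional nearest-centroid classifier and synthetic points (neither comes from the talk). The attacker injects a few mislabeled points, which is one simple form of the data poisoning that adversarial machine learning studies. All names are hypothetical.

```python
from statistics import mean


def train_threshold(samples):
    """Fit a 1-D nearest-centroid classifier and return its decision threshold.

    samples is a list of (feature, label) pairs with label 0 or 1.
    Inputs above the threshold are classified as 1.
    """
    centroid_0 = mean(x for x, y in samples if y == 0)
    centroid_1 = mean(x for x, y in samples if y == 1)
    return (centroid_0 + centroid_1) / 2


def accuracy(threshold, samples):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum((x > threshold) == bool(y) for x, y in samples)
    return hits / len(samples)


# Synthetic data: class 0 clusters at low values, class 1 at high values.
clean_train = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1), (9, 1)]
test_set = [(1.5, 0), (2.5, 0), (4.5, 0), (6.5, 1), (7.5, 1), (8.5, 1)]

# Poisoning: the attacker adds high-valued points labeled as class 0,
# dragging the class-0 centroid (and the threshold) upward.
poison = [(9, 0), (10, 0), (10, 0)]
poisoned_train = clean_train + poison

clean_threshold = train_threshold(clean_train)
poisoned_threshold = train_threshold(poisoned_train)
clean_accuracy = accuracy(clean_threshold, test_set)
poisoned_accuracy = accuracy(poisoned_threshold, test_set)
```

Three injected points are enough to move the decision boundary so that a legitimate class-1 input is misclassified, although the learning algorithm itself was never modified.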

Conclusions

Shangri La is not just a technical and/or technological issue.

Analytics will happen anyway.

There is no single solution for all demands.

One professional profile does not fulfill all demands.

Technology keeps advancing fast, despite some clear definitions.

The relevance of algorithms also comes with responsibilities.

Making algorithms compatible with ethics and legal requirements may be hard.

Research and development opportunities at all levels.

Optimistic view about CS and its impact on society. Another opportunity!
