Post on 05-Sep-2019


Katja Rausch – Big Data PSB 2019

Uses of Big Data

Understanding and Targeting Customers

create predictive models.

Understanding and Optimizing Business Processes

optimize stock, or track goods using geographic positioning and RFID sensors.

Personal Quantification and Performance Optimization

Armband sensor collects data on calorie consumption, activity levels, and sleep patterns

Improving Sports Performance

track the performance of players in football or baseball games

IBM SlamTracker tool for tennis tournaments

sensor technology in sports equipment (basketballs, golf clubs)

Improving Security and Law Enforcement

Optimizing Cities and Countries

High-Frequency Trading

Improving Healthcare and Public Health

decode entire DNA strings in minutes.

predict epidemics and disease outbreaks

Disrupting data processes

Big Data disrupts established data processes.

Transition from the data warehouse paradigm to a data lake, the cloud, and machine learning, along with deep learning and AI.

Timeline of the term « Big Data »

1997 : The term « big data » is used for the first time in an article published by the ACM

1999 : « big data » appears in a title: “Big Data for Scientific Visualization”

2010 : Kenneth Cukier publishes a Special Report in The Economist titled “Data, data everywhere.”

Big Data on the cover of The Economist

February 2010: Kenneth Cukier publishes a Special Report in The Economist titled “Data, data everywhere.”

Cukier: “…the world contains an unimaginably vast amount of digital information which is getting ever vaster ever more rapidly…

… The effect is being felt everywhere, from business to science, from governments to the arts. Scientists and computer engineers have coined a new term for the phenomenon: ‘big data’.”

Big Data : Definition

24.02.2019

Big Data covers 3 dimensions

3V

Volume

Variety

Velocity

Definition : Extremely large data sets that may be analysed computationally to reveal patterns, trends, and associations, especially relating to human behaviour and interactions. (Oxford Dictionary)

Variety

Structured data. Data that reside in fixed fields. Examples of structured data include relational databases or data in spreadsheets.

Unstructured data. Data that do not reside in fixed fields. Examples include free-form text (e.g., books, articles, body of e-mail messages), untagged audio, image and video data.

Semi-structured data. Data that do not conform to fixed fields but contain tags and other markers to separate data elements. Examples of semi-structured data include XML or HTML-tagged text.

Big Data comes as structured, semi-structured, and unstructured data, with a rapidly growing amount of unstructured data (sensors, video streams from cameras monitoring points of interest...)
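As a rough illustration of the three categories above, the following minimal Python sketch reads one small sample of each kind. The sample records, field names, and tag names are invented for this example and do not come from the slides.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: every value sits in a fixed, named field (like a spreadsheet).
structured = io.StringIO("id,name,city\n1,Ada,Paris\n2,Bob,Luxembourg\n")
rows = list(csv.DictReader(structured))

# Semi-structured: no fixed fields, but tags separate the data elements.
semi_structured = "<order><customer>Ada</customer><item qty='2'>book</item></order>"
order = ET.fromstring(semi_structured)
customer = order.findtext("customer")
quantity = int(order.find("item").get("qty"))

# Unstructured: free-form text; any structure has to be inferred.
unstructured = "Hi team, Ada called about her book order and wants it shipped to Paris."
words = unstructured.lower().replace(",", " ").replace(".", " ").split()
mentions_ada = "ada" in words

print(rows[0]["city"], customer, quantity, mentions_ada)
```

The structured and semi-structured samples can be queried by field or tag name, while the free-form text needs at least basic text processing before anything can be looked up in it.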

Source: http://www.cisco.com/c/en/us/solutions/service-provider/vni-network-traffic-forecast/infographic.html, p. 5

Volume : Major Drivers of Data Demand: Streaming Video & Social Media

Velocity

Big Data collection is mainly done in real-time.

For time-sensitive processes such as fraud detection, the data is analyzed as it is collected.

It is possible to scan 5 million business events per day to identify potential fraud, and to analyze 500 million detailed daily call records in real time.
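To make the idea of scanning events as they arrive more concrete, here is a minimal Python sketch of a sliding-window check. The rule itself (more than `max_events` events per account within `window_seconds`) and the sample events are assumptions made for illustration, not a real fraud model, and the events are assumed to arrive in time order.

```python
from collections import defaultdict, deque


def flag_bursts(events, window_seconds=60, max_events=3):
    """Yield (timestamp, account) each time an account exceeds
    max_events events inside the sliding time window."""
    recent = defaultdict(deque)  # account -> timestamps still inside the window
    for timestamp, account in events:
        window = recent[account]
        window.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while window and timestamp - window[0] > window_seconds:
            window.popleft()
        if len(window) > max_events:
            yield timestamp, account


# Hypothetical stream of (seconds, account) events.
events = [(0, "A"), (10, "A"), (15, "B"), (20, "A"), (30, "A"), (200, "A")]
alerts = list(flag_bursts(events))
print(alerts)
```

Because each event is handled once and only the recent timestamps per account are kept in memory, this style of processing can keep up with a continuous stream instead of waiting for a nightly batch.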

Major Big Data technologies

Business intelligence (BI)

A type of application software designed to report, analyze, and present data. BI tools are used to read data stored in a data warehouse or data mart. BI tools create standard reports or display real-time management dashboards.

Data warehouse

Specialized database optimized for reporting, often used for storing large amounts of structured data. Data is uploaded using ETL tools from operational data stores, and reports are often generated using business intelligence tools.

Data mart. Subset of a data warehouse.

Extract, transform, and load (ETL)

Software tools used to extract data from outside sources, transform them to fit operational needs, and load them into a database or data warehouse.
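The three ETL steps can be sketched in a few lines of Python using only the standard library. The sample CSV data, the `orders` table, and the cleaning rules are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from an operational system.
raw = io.StringIO("order_id,amount,currency\n1, 19.90 ,eur\n2,5.00,EUR\n3,,eur\n")


def extract(source):
    """Extract: read the raw rows from the outside source."""
    return list(csv.DictReader(source))


def transform(rows):
    """Transform: drop rows without an amount, normalise types and currency codes."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue
        clean.append((int(row["order_id"]), float(amount), row["currency"].strip().upper()))
    return clean


def load(rows, connection):
    """Load: write the cleaned rows into the target database."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL, currency TEXT)"
    )
    connection.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    connection.commit()


connection = sqlite3.connect(":memory:")
load(transform(extract(raw)), connection)
row_count = connection.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
total = connection.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(row_count, total)
```

Keeping the three steps as separate functions mirrors the definition above: each step can be replaced on its own, for example by swapping the in-memory database for a real warehouse connection.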

Big Data techniques

A/B testing

A technique in which a control group is compared with a variety of test groups in order to determine what treatments (i.e., changes) will improve a given objective variable, e.g., marketing response rate. Also known as split testing or bucket testing.
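One common way to evaluate such a comparison is a two-proportion z-test, sketched below in Python. The slides do not name a specific statistical test, so this choice is an assumption, and the conversion counts are invented for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare the response rates of control (a) and test (b) groups.

    Returns the z statistic and the two-sided p-value.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups respond equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical campaign: 100 of 1000 responses in control, 130 of 1000 in the test group.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(round(z, 2), round(p_value, 3))
```

A small p-value suggests that the difference in response rate is unlikely to be due to chance alone; with several test groups, each one would be compared against the control in the same way.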

Machine learning

A subspecialty of “artificial intelligence”. Algorithms that allow computers to evolve behaviors based on empirical data. A major focus is to automatically learn to recognize complex patterns and make intelligent decisions based on data.

Natural language processing (NLP)

An application of machine learning combined with linguistics: computer algorithms are used to analyze human (natural) language.

Neural networks (the basis of deep learning)

Computational models, inspired by the structure of biological neural networks, that find nonlinear patterns in data. Neural network applications can involve supervised learning or unsupervised learning.
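The machine learning and neural network entries above both rest on the idea of adjusting a model from examples. As a minimal sketch of supervised learning, the Python code below trains a single artificial neuron (a perceptron) on the logical OR function. The training data, learning rate, and number of epochs are choices made for this illustration; a real neural network stacks many such units with nonlinear activations.

```python
def predict(weights, bias, inputs):
    """Step activation: output 1 if the weighted sum is positive, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=10, learning_rate=1.0):
    """Learn weights from labelled examples with the perceptron update rule."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            # Nudge the weights toward the correct answer when the guess is wrong.
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Labelled examples of the logical OR function (the empirical data).
samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_perceptron(samples)
predictions = [predict(weights, bias, inputs) for inputs, _ in samples]
print(predictions)
```

Nothing about OR is written into the code; the behaviour emerges from the labelled examples, which is the sense in which the computer is said to learn from data.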

Major ethical issue

Digital Divide

GAFA (Google, Apple, Facebook, Amazon)

Big Data & Digital Divide

Only a relatively small number of entities have the infrastructures and skills to acquire, hold, process and benefit from big data

William Ury – Harvard – « The Power of a Positive No »

Big Data & Digital Divide

While the question of who owns data is a legal one, the consequences of inequality of access pose ethical questions.

Who can access data?

Who governs data access?

High-profile ethical breaches

In June 2014, a Facebook-Cornell University study caused shock when it was revealed that Facebook had been experimenting on the emotional state of 700,000 of its users. The general public was outraged that the company had violated ethical guidelines and “harmed” its users.

“… (1) [d]ata through intervention or interaction with the individual, or (2) [i]dentifiable private information.”

Since then, Facebook has established an ethics review process for research based on user data.

Big Data’s real-time data collection brings new ethical challenges

Ethical issues & data issues

Ethical issues: Virtue, Utility, Responsibility, Freedom, Equality, Justice

Data issues: consent, de-identification, accountability, ownership, access, respect of human rights, intellectual property, group discrimination, power usage

Modern Big Data/Analytics environment

1 Collection

2 Storage

3 Process

4 Use

5 Destruction

Transparency, Accountability, Responsibility, Privacy, Integrity

The User’s Rights

UN Universal Declaration of Human Rights – 1948 – All human beings are born free and equal in dignity and rights. Main ideas : Freedom and security.

The Nuremberg Code – 1947 – Main ideas : 10 principles for human experimentation, based on consent and on benefit to society

The Declaration of Helsinki – 1964 – Ethical principles for doctors involved in medical research. Main idea : safeguarding people's health

The Belmont Report – 1979 – Ethical principles for the protection of human subjects in research

The Nuremberg Code: a set of 10 research ethics principles for human experimentation, established as a result of the Nuremberg trials at the end of the Second World War.

The World Medical Association's Declaration of Helsinki was first adopted in 1964. In its 40-year lifetime the Declaration has been revised 5 times and has risen to a position of prominence as a guiding statement of ethical principles for doctors involved in medical research.

European Union ?

Regulation (EU) 2016/679, the European Union’s ('EU') new General Data Protection Regulation (‘GDPR’), regulates the processing by an individual, a company or an organisation of personal data relating to individuals in the EU.

Applicable since 25 May 2018

What does it govern ?

The rules don’t apply to data processed by an individual for purely personal reasons or for activities carried out in one's home, provided there is no connection to a professional or commercial activity.

Examples

Applies: a company with an establishment in the EU provides travel services to customers based in the Baltic countries and, in that context, processes personal data of natural persons.

Doesn’t apply: an individual uses their own private address book to invite friends via email to a party that they are organising (household exception).

What does it mean for us ?
