Katja Rausch – Big Data PSB 2019
Uses of Big Data
Understanding and Targeting Customers
create predictive models.
Understanding and Optimizing Business Processes
optimize stock levels, or track goods and delivery routes using geographic positioning and RFID sensors.
Personal Quantification and Performance Optimization
Armband sensor collects data on calorie consumption, activity levels, and sleep patterns
Improving Sports Performance
track the performance of players in football or baseball games
IBM SlamTracker tool for tennis tournaments
sensor technology in sports equipment (basketballs, golf clubs)
Improving Security and Law Enforcement
Optimizing Cities and Countries
High-Frequency Financial Trading
Improving Healthcare and Public Health
decode entire DNA strings in minutes.
predict epidemics and disease outbreaks
Disrupting data processes
Big Data is a disruptive data process:
a transition from the data warehouse paradigm
to data lakes, the cloud, and machine learning,
along with deep learning and AI.
1997: Term “big data” used for the first time in an article published by the ACM
1999: “big data” mentioned in a title, “Big Data for Scientific Visualization”
2010: Kenneth Cukier publishes in The Economist a Special Report
titled “Data, data everywhere.”
Big Data on the cover of The Economist
February 2010 Kenneth Cukier publishes in The Economist a Special Report titled, “Data, data everywhere.”
Cukier: “…the world contains an unimaginably vast amount of digital information which is getting ever vaster ever more rapidly…
… The effect is being felt everywhere, from business to science, from governments to the arts. Scientists and computer engineers have coined a new term for the phenomenon: ‘big data’.”
24.02.2019 15
Big Data covers 3 dimensions
3V
Volume
Variety
Velocity
Definition : Extremely large data sets that may be analysed computationally to reveal patterns, trends, and associations, especially relating to human behaviour and interactions. (Oxford Dictionary)
Variety
Structured data. Data that reside in fixed fields. Examples of structured data include relational databases or data in spreadsheets.
Unstructured data. Data that do not reside in fixed fields. Examples include free-form text (e.g., books, articles, body of e-mail messages), untagged audio, image and video data.
Semi-structured data. Data that do not conform to fixed fields but contain tags and other markers to separate data elements. Examples of semi-structured data include XML or HTML-tagged text.
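The three categories above can be illustrated with a minimal Python sketch. The sample records, field names, and the four-digit order-number pattern are invented for illustration only; the point is that each category needs a different way of getting at the data.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Structured: fixed fields, as in a spreadsheet or a relational table.
structured = "customer_id,city,amount\n17,Paris,42.50\n18,Luxembourg,13.00\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: no fixed fields, but tags separate the data elements.
semi_structured = "<order><customer>17</customer><note>gift wrap</note></order>"
order = ET.fromstring(semi_structured)
customer = order.findtext("customer")

# Unstructured: free-form text; any structure has to be inferred.
# The four-digit pattern is an assumption made for this example.
unstructured = "Hi, my order 4711 arrived late. Please call me back."
order_numbers = re.findall(r"\b\d{4}\b", unstructured)

print(rows[0]["city"], customer, order_numbers)
```

Structured data can be addressed by field name, semi-structured data by tag, while unstructured text requires a guess about what to look for.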
Big Data comes in structured, semi-structured and unstructured forms,
with a rapidly growing amount of unstructured data
(sensors, video streams from surveillance cameras monitoring points of interest...)
Source: http://www.cisco.com/c/en/us/solutions/service-provider/vni-network-traffic-forecast/infographic.html, pg 5
Volume : Major Drivers of Data Demand: Streaming Video & Social Media
Velocity
Big Data collection is mainly done in real time.
For time-sensitive processes such as fraud detection, the data is analyzed as it is collected.
It is possible to scan 5 million business events per day to identify potential fraud, and to analyze in real time 500 million detailed records of daily calls.
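As a rough illustration of analyzing events as they arrive, here is a minimal sketch of a sliding-window counter. The class name, the 60-second window, and the threshold of 3 events are all hypothetical choices for this example; real fraud detection relies on far richer signals.

```python
from collections import defaultdict, deque


class SlidingWindowCounter:
    """Flags an account that produces too many events within a short window."""

    def __init__(self, window_seconds=60, max_events=3):
        self.window_seconds = window_seconds
        self.max_events = max_events
        self.events = defaultdict(deque)  # account -> recent timestamps

    def observe(self, account, timestamp):
        """Record one event; return True if the account now looks suspicious."""
        recent = self.events[account]
        recent.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while recent and timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        return len(recent) > self.max_events


detector = SlidingWindowCounter(window_seconds=60, max_events=3)
# (account, timestamp in seconds) pairs, processed one at a time as a stream.
stream = [("A", 0), ("A", 10), ("B", 15), ("A", 20), ("A", 30), ("A", 200)]
flags = [detector.observe(account, ts) for account, ts in stream]
print(flags)
```

Each event is judged the moment it arrives, using only a small amount of per-account state, which is what makes this style of check feasible at high velocity.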
Major Big Data technologies
Business intelligence (BI)
A type of application software designed to report, analyze, and present data. BI tools are used to read data stored in a data warehouse or data mart. They can create standard reports or display real-time management dashboards.
Data warehouse
Specialized database optimized for reporting, often used for storing large amounts of structured data. Data is uploaded using ETL tools from operational data stores, and reports are often generated using business intelligence tools.
Data mart. Subset of a data warehouse.
Extract, transform, and load (ETL)
Software tools used to extract data from outside sources, transform them to fit operational needs, and load them into a database or data warehouse.
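The extract, transform, and load steps can be sketched with the Python standard library alone. The CSV content, the `sales` table, and the cleaning rules below are assumptions made for illustration; an in-memory SQLite database stands in for a data warehouse.

```python
import csv
import io
import sqlite3

# Extract: read rows from an outside source (an in-memory CSV stands in for a file).
source = io.StringIO(
    "name,country,revenue\n alice ,lu,1200.50\nBOB,fr,\ncarol,LU,300\n"
)
extracted = list(csv.DictReader(source))


# Transform: normalise text fields and drop rows that have no revenue.
def transform(row):
    if not row["revenue"].strip():
        return None
    return (
        row["name"].strip().title(),
        row["country"].strip().upper(),
        float(row["revenue"]),
    )


transformed = [t for t in map(transform, extracted) if t is not None]

# Load: write the cleaned rows into a table of the (stand-in) warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (name TEXT, country TEXT, revenue REAL)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?)", transformed)
warehouse.commit()

# A BI-style report then reads from the warehouse, not from the raw source.
report = warehouse.execute(
    "SELECT country, SUM(revenue) FROM sales GROUP BY country ORDER BY country"
).fetchall()
print(report)
```

The final query mirrors how a BI tool would read the warehouse to produce a standard report.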
Big Data techniques
A/B testing
A technique in which a control group is compared with a variety of test groups in order to determine what treatments (i.e., changes) will improve a given objective variable, e.g., marketing response rate. Also known as split testing or bucket testing.
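One common way to decide whether a test group really outperforms the control group is a two-proportion z-test. The sketch below is one possible analysis, not the only one, and the visitor and conversion counts are made up for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (a) and test (b) groups.

    Returns the z statistic and the two-sided p-value.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups behave the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical campaign: 200 of 5000 respond to the control e-mail,
# 260 of 5000 respond to the changed version.
z, p_value = two_proportion_z_test(200, 5000, 260, 5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference in response rate is unlikely to be due to chance alone; the significance threshold is a choice the analyst has to make in advance.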
Machine learning
A subspecialty of “artificial intelligence”. Algorithms that allow computers to evolve behaviors learned on empirical data. A major focus is to automatically learn to recognize complex patterns and make intelligent decisions based on data.
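As a very small instance of learning from empirical data, the sketch below fits a straight line to observations by least squares and uses it to predict an unseen case. The data points are invented, and real machine learning systems use far richer models, but the principle of deriving behavior from data rather than from hand-written rules is the same.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations, e.g. advertising spend (x) versus sales (y).
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)

# The learned parameters are then applied to a value not seen during fitting.
prediction = slope * 10 + intercept
print(slope, intercept, prediction)
```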
Natural language processing (NLP)
A field combining computer science and linguistics that uses computer algorithms to analyze human (natural) language. Many NLP techniques are types of machine learning.
Neural networks (the basis of deep learning)
Computational models, inspired by the structure of biological neural networks, that find nonlinear patterns in data. Neural network applications can involve supervised learning or unsupervised learning.
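To make "nonlinear patterns" concrete, the sketch below wires up a tiny two-layer network that computes XOR, a pattern no single linear unit can represent. The weights and thresholds are hand-picked for illustration; in practice they would be learned from training data.

```python
def step(x):
    """Threshold activation: the unit fires (1) when its input is positive."""
    return 1 if x > 0 else 0


def xor_network(a, b):
    """Two hidden units feeding one output unit, with hand-set weights."""
    hidden_or = step(a + b - 0.5)   # fires if at least one input is 1
    hidden_and = step(a + b - 1.5)  # fires only if both inputs are 1
    # Output fires when "or" is on and "and" is off, which is XOR.
    return step(hidden_or - hidden_and - 0.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```

The hidden layer is what allows the network to combine two linear decisions into a nonlinear one.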
Big Data & Digital Divide
Only a relatively small number of entities have the infrastructure and skills to acquire, hold, process and benefit from big data.
While the question of who owns data is a legal one, the consequences of unequal access pose ethical questions.
Who can access data?
Who governs data access?
High-profile ethical breaches
In June 2014, a Facebook-Cornell University study caused shock when it was revealed that Facebook had been experimenting on the emotional state of 700,000 of its users. The general public
was outraged that the company had violated ethical guidelines and “harmed” its users.
(1) [d]ata through intervention or interaction with the individual, or (2) [i]dentifiable private information.”
Since then, Facebook has established an ethics review
process for research based on user data.
Ethical issues & data issues
Virtue
Utility
Responsibility
Freedom
Equality
Justice
consent, de-identification
accountability
ownership
access
respect of human rights
intellectual property
group discrimination
power usage
Modern Big Data/Analytics environment
1 Collection
2 Storage
3 Process
4 Use
5 Destruction
Transparency
Accountability
Responsibility
Privacy
Integrity
The User’s Rights
UN Universal Declaration of Human Rights - 1948 – All human beings are born free and equal in dignity and rights. Main ideas: freedom and security.
The Nuremberg Code of Ethics - 1947 – Main ideas: 10 principles for human experimentation, based on consent and benefit to society.
The Declaration of Helsinki - 1964 – Ethical principles for doctors involved in medical research. Main idea: safeguarding people's health.
The Belmont Report – 1979 – Ethical principles for the protection of human subjects in research.
The Nuremberg Code: a set of 10 research ethics principles for human experimentation, established as a result of the Nuremberg trials at the end of the Second World War.
The World Medical Association's Declaration of Helsinki was first adopted in 1964. In its 40-year lifetime the Declaration has been revised 5 times and has risen to a position of prominence as a guiding statement of ethical principles for doctors involved in medical research.
European Union ?
Regulation (EU) 2016/679, the European Union’s (‘EU’) new General Data Protection Regulation (‘GDPR’), regulates the processing by an individual, a company or an organisation of personal data relating to individuals in the EU.
Applicable since 25 May 2018
What does it govern ?
The rules don’t apply to data processed by an individual for purely personal reasons or for activities carried out in one's home, provided there is no connection to a professional or commercial activity.
Examples
Applies when a company with an establishment in the EU provides travel services to customers based in the Baltic countries and, in that context, processes personal data of natural persons.
Doesn’t apply to an individual using their own private address book to invite friends via email to a party that they are organising (household exception).