
Page 1: Vidya Jyothi Prof V. K. Samaranayake Memorial Oration 2016 ... · PowerPoint Presentation Author: ucsc Created Date: 7/7/2016 12:00:48 PM


Vidya Jyothi Prof V. K. Samaranayake Memorial Oration 2016

Big Myths and Big Opportunities in Big Data Analytics: A Data Engineering Perspective

Professor Saman Halgamuge

Optimization and Pattern Recognition Group

The University of Melbourne


Outline

• A personal tribute to the great visionary Prof V. K. Samaranayake: stories of inspiration

• Why is Data Science/Data Engineering the continuation of his legacy?

• What is Big Data? How does it affect us?

• Data Engineering: an engineering enabled approach to data analytics

• Some examples of our applied research in Big Data Analytics in collaboration with industry and other organisations

• Two examples of Big Data Analytic Problems relevant to Sri Lanka


Two Pioneers of Statistics, Data Science and Computing in the Region

Prof Prasanta Chandra Mahalanobis, founder of the Indian Statistical Institute in Calcutta

Prof V. K. Samaranayake, founder of the University of Colombo School of Computing


Vision, Courage, Efficiency and Strategy


Inspiring, Caring and Nurturing

The inventor of iHelmet – Forbes Asia 30 under 30: Consumer Tech (2016)

Minor Planet (26441) named after him (2010)

At the age of 15: Young Computer Scientist (YCS) Gold Award, Sri Lanka Association for the Software Industry (SLASI), 2006


Real complex problems may not have many clues or reliable data!


Big Data: A Data Engineering Perspective

• The 3 Vs of Big Data:
– Velocity
– Variety (Variability, Veracity)
– Volume

• Need to capture, curate, store, visualize, update, process/analyse and pay attention to information privacy

• Data engineers develop and use electronic/mechanical/chemical/biological hardware/“wetware” for
– Capturing data
– Storing data
– Processing/analysing big data quickly (Big Data Analytics)
– Information privacy/security
– …


Big Data Analytics: Good old wine in new bottles?

• We create data profiles about our behaviours that can be captured, stored and sold!

• Yet some problems are still unsolved: e.g. MH 370

• How does Big Data Analytics differ from “Data Analysis”, “Pattern Recognition”, “Data Mining”…?

• Need to develop new and faster methods to analyse large volumes of data of multiple varieties/uncertainties/inaccuracies!

• Sometimes we know only a little or nothing about the information hidden in Data!


Big Data Analytics: Machine Learning Approaches

• Supervised Learning (know everything)

• Unsupervised Learning (know nothing)
– Self Organizing Maps
– Growing Self Organizing Maps (our past work)
– Deep Unsupervised Learning (our new work)

• Semi-supervised Learning (know something)
– Near Supervised Learning: most of the available methods
– Near Unsupervised Learning (NUL): our recent work

• What are the major differences?
– Speed, Accuracy, Applicability…


Data Analytics: Classification/Clustering Approaches: Supervised

Approach: Supervised Learning

[Figure: chart with axes “Percentage of known data labels” and “Percentage of known classes”, each marked at 50% and 100%; Supervised Learning is highlighted.]

Example: separate bananas from apples.

We teach a child using some “samples”:

– Apples are round; bananas are not round
– They taste different
– They look different
– …

The child can then learn to differentiate apples from bananas.

Train a child: a supervisor is needed.
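The apples-and-bananas example can be sketched as a tiny supervised classifier. This is only an illustration: the two features (roundness and length), the sample values and the nearest-centroid rule are assumptions made here, not something stated on the slide.

```python
from statistics import mean

# Hypothetical labelled samples: (roundness 0..1, length in cm) -> label.
TRAINING = [
    ((0.95, 7.0), "apple"),
    ((0.90, 8.0), "apple"),
    ((0.20, 18.0), "banana"),
    ((0.25, 20.0), "banana"),
]


def fit_centroids(samples):
    """Average the feature vectors of each labelled class."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(features, centroids[label]))
    return min(centroids, key=distance)


centroids = fit_centroids(TRAINING)
print(predict(centroids, (0.92, 7.5)))   # a round, short fruit
print(predict(centroids, (0.15, 19.0)))  # an elongated fruit
```

The labels play the role of the supervisor: without them `fit_centroids` has nothing to average per class.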


Data Analytics: Classification/Clustering Approaches: Unsupervised

Approach: Unsupervised Learning

[Figure: chart with axes “Percentage of known data labels” and “Percentage of known classes”, each marked at 50% and 100%; Supervised Learning and Unsupervised Learning are marked.]

Example: no clue about the different types of fruit in the basket.

If Supervised Learning is about training a child, Unsupervised Learning is about asking a learned or experienced person, such as a professor, for an opinion or best guess.

No training is possible.
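A minimal sketch of the unlabelled fruit basket, assuming plain k-means as a stand-in clustering method (the talk itself refers to Self Organizing Maps, which are not implemented here). The feature values and the choice of two clusters are hypothetical.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids (an arbitrary choice)."""
    centroids = [points[i] for i in range(k)]
    assignment = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids


# Hypothetical unlabelled basket: (roundness 0..1, length in cm), no fruit names.
fruits = [(0.95, 7.0), (0.20, 18.0), (0.90, 8.0),
          (0.25, 20.0), (0.92, 7.5), (0.18, 19.0)]
labels, centres = kmeans(fruits, k=2)
print(labels)
```

The algorithm only reports group numbers; deciding what each group means still needs the "experienced person".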


Near Unsupervised Learning (our ARC funded work)

[Figure: chart with axes “Percentage of known data labels” and “Percentage of known classes”, each marked at 50% and 100%, showing the regions Supervised Learning, Semi-Supervised Learning, Near-Unsupervised Learning and Unsupervised Learning.]


Positive Unlabelled Learning (current research with applications)

[Figure: chart with axes “Percentage of known data labels” and “Percentage of known classes”, each marked at 50% and 100%, showing the regions Supervised Learning, Semi-Supervised Learning, Near-Unsupervised Learning, Unsupervised Learning and Positive Unlabelled Learning.]


My Interpretation of Data Analytics Space (marked in yellow)

[Figure: chart with axes “Percentage of known data labels” and “Percentage of known classes”, each marked at 50% and 100%, showing Supervised Learning, Semi-Supervised Learning, Unsupervised Learning and Positive Unlabelled Learning, with the data analytics space marked in yellow.]

Good news for enthusiastic computer scientists, mathematicians, engineers…: this is mostly unexplored territory as we move into the era of “BIG DATA”.


BIG DATA APPLICATIONS: LEARNING FROM THE SMALLEST AND THE OLDEST LIFE FORMS

THE MICROBIAL WORLD

Image: http://commons.wikimedia.org/


Application in Metabolomics

• Metabolomics (2013) =>


Application in Neural Engineering


Computational Neuroscience


Micro Electrode Array: New Technology demanding Big data Analytics


Culture Preparation & Data Acquisition

Dissection of newborn mice and extracting the cortex

Culture Preparation

Plating MEAs

Maintaining cultures

Electrophysiological recording

Images courtesy of Multichannel Systems, Potter Lab (Georgia Tech), Nature Protocols

[Figure labels: electrodes, neurons, MEA voltage signals]


High frequency features

Processing chain: Data acquisition → High Pass Filtering → Spike Detection → Burst Detection → Network Burst (NB) Detection

• Spike analysis: firing rate, amplitude (dep/hyp), spike width
• Burst analysis: rate, duration, spikes in bursts
• NB analysis: rate, intervals, duration, spikes in NBs, channels in NBs, jitter

[Figure: channels plotted against time (s), annotated with a spike, a burst, a network burst (NB) and a tonic channel.]
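The high-frequency chain (filter, detect spikes, group spikes into bursts) can be sketched for a single channel as follows. Everything specific here is an assumption for illustration: the running-median baseline removal is a crude stand-in for a real high-pass filter, the threshold rule (a multiple of a median-based noise estimate) is one common heuristic, and the window sizes, thresholds and synthetic signal are made up. Times are in samples, not seconds.

```python
from statistics import median


def high_pass(signal, window=11):
    """Crude high-pass stand-in: subtract a running-median baseline."""
    half = window // 2
    return [
        x - median(signal[max(0, i - half): i + half + 1])
        for i, x in enumerate(signal)
    ]


def detect_spikes(signal, k=5.0, refractory=3):
    """Sample indices where |signal| exceeds k times a robust noise estimate."""
    noise = median(abs(x) for x in signal) / 0.6745
    threshold = k * noise
    spikes, last = [], -refractory
    for i, x in enumerate(signal):
        if abs(x) > threshold and i - last >= refractory:
            spikes.append(i)
            last = i
    return spikes


def detect_bursts(spike_times, max_interval=10, min_spikes=3):
    """Group spikes whose inter-spike interval is at most max_interval samples."""
    bursts, current = [], []
    for t in spike_times:
        if current and t - current[-1] > max_interval:
            if len(current) >= min_spikes:
                bursts.append(current)
            current = []
        current.append(t)
    if len(current) >= min_spikes:
        bursts.append(current)
    return bursts


# Synthetic channel: slow drift + small periodic noise + four injected spikes.
signal = [0.01 * i + 0.1 * ((i * 7) % 5 - 2) for i in range(200)]
for idx, amplitude in [(50, 5.0), (55, 5.0), (60, 5.0), (150, -5.0)]:
    signal[idx] += amplitude

spikes = detect_spikes(high_pass(signal))
bursts = detect_bursts(spikes)
print(spikes, bursts)
```

Network burst detection would repeat the grouping step across channels; it is left out to keep the sketch short.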


Low frequency features

Processing chain: Data acquisition → Low Pass Filtering → LFP Detection → LFP analysis

• LFP analysis: rate, amplitudes, intervals, width

[Figure: recordings compared for Drug A and Drug B.]

• Local Field Potentials (LFPs): low frequency events that represent ionic gradients near electrodes, resulting from the activity of multiple neurons

• Can be used to differentiate drugs


Data Analytic System

Microelectrode Array (MEA) data → select concentrations and time points → Data Analysis Pipeline → Model

Data Analysis Pipeline steps shown: feature extraction, normalization, outlier removal, exploratory data analysis, feature selection, machine learning

The pipeline yields signature network activity patterns for:
• genotypes
• drugs

The model is then used to predict classes for unlabelled drugs or genotypes.
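A minimal sketch of the back half of such a pipeline: outlier removal, normalization, a model, and class prediction for an unlabelled recording. The slide does not say which methods are used, so the median-based outlier rule, the z-score scaling, the nearest-centroid model, the two features and all values below are assumptions. Feature extraction and feature selection are omitted.

```python
from statistics import mean, median, pstdev


def robust_outlier_mask(rows, limit=3.5):
    """True for rows whose every feature lies within `limit` robust z-scores."""
    columns = list(zip(*rows))
    centres = [median(c) for c in columns]
    # 1.4826 * MAD approximates the standard deviation for Gaussian data;
    # fall back to 1.0 when the MAD is zero to avoid dividing by zero.
    scales = [1.4826 * median(abs(v - m) for v in c) or 1.0
              for c, m in zip(columns, centres)]
    return [all(abs(v - m) / s <= limit for v, m, s in zip(row, centres, scales))
            for row in rows]


def fit(rows, labels):
    """Z-score normalise the features, then store one centroid per class."""
    columns = list(zip(*rows))
    stats = [(mean(c), pstdev(c) or 1.0) for c in columns]
    scaled = [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]
    centroids = {}
    for label in set(labels):
        members = [r for r, l in zip(scaled, labels) if l == label]
        centroids[label] = [mean(col) for col in zip(*members)]
    return stats, centroids


def predict(model, row):
    """Scale a new recording with the training statistics and pick the nearest centroid."""
    stats, centroids = model
    scaled = [(v - m) / s for v, (m, s) in zip(row, stats)]
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(scaled, centroids[c])))


# Hypothetical features per recording: (burst rate, spike width).
rows = [(2.0, 1.0), (2.2, 1.1), (1.8, 0.9), (2.1, 1.0),
        (4.0, 1.0), (4.2, 0.9), (3.8, 1.1), (4.1, 1.0),
        (40.0, 1.0)]  # last row: a recording artefact
labels = ["wild type"] * 4 + ["transgenic"] * 4 + ["wild type"]

keep = robust_outlier_mask(rows)
clean_rows = [r for r, k in zip(rows, keep) if k]
clean_labels = [l for l, k in zip(labels, keep) if k]
model = fit(clean_rows, clean_labels)
print(predict(model, (3.9, 1.0)))
```

Removing the artefact before normalization matters here: a single extreme value would otherwise inflate the standard deviation and compress the genuine class difference.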


Epileptic Phenotypes in Cultured Neuronal Networks

[Figure: neuronal networks on a microelectrode array; extracellular voltages plotted as electrodes/channels against time.]


Epileptic Phenotypes in Cultured Neuronal Networks

[Figure: electrode recordings over time for wild type cultures and for cultures with genetic mutations (“transgenic”), showing a change in signal dynamics.]

Anti-epileptic drugs?

Electrical stimulations?


Epileptic Phenotypes in Cultured Neuronal Networks

[Figure: spontaneous network activity and network activity after drug application, for wild type cultures and for cultures with genetic mutations, showing a change in signal dynamics.]

Features:
• Amplitudes
• Frequencies
• Action potential (spike) timing
• Bursting
• Synchronization
• Action propagation patterns

Exploratory data analysis + pattern recognition

BIG DATA PROBLEM CREATED DUE TO NEW MULTI ELECTRODE ARRAY TECHNOLOGY


Clustering Weekly Recordings of Neuronal Cultures

[Figure: cluster map with nodes labelled Transgenic (with mutation) and Wild Type.]

Biological variability:
• Variability between cultures containing neurons from different animals
• Variability within different cultures containing neurons from the same animal
• Variability among different “culture ages”

Goal: investigating separability between transgenic and wild type cultures amidst biological variability

Unsupervised learning method: Growing Self Organizing Map, with nodes representing 3 weeks of recordings each of cultures from 6 different mice

PhD project of Dulini Mendis, supervised by Steve Petrou and Saman Halgamuge


Can we eradicate Cancer?: CRISPR

[Figure: CRISPR technology; the culprit “target gene” sits among clean data, incomplete data and noise.]

Typical scenario: data of interest hidden among noise and largely incomplete data

CRISPR: Clustered regularly interspaced short palindromic repeats


Data Engineering: CRISPR

[Figure: CRISPR data surrounded by noise.]

Generate more data where needed (informed by “smart” learning algorithms)

CRISPR: Clustered regularly interspaced short palindromic repeats


Big Data in Marketing and Advertising

• Information collected about you and me:

– Anytime, Anywhere, Anything

• We create data profiles about our behaviours that can be captured, stored and sold!

• Targeted advertising using our browsing history

• US internet advertising revenue: $40bn in 2012

• Internet: only 13-19% of total media advertising expenditure


Part 1: Synergy in Online Advertising Sequences

20 % 50 % 30 %

Part 2: Credit Allocation for Online Advertising Channels (Attribution)

How does On-line Advertising “convert” us to purchase?
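Credit allocation can be illustrated with linear attribution, a common baseline in which every channel on a converting path receives an equal share of that conversion. This is not presented as the method used in the talk, which the slide does not describe. The channel names and paths are invented, and were chosen so that the shares come out as round numbers.

```python
from collections import defaultdict


def linear_attribution(paths):
    """Split one unit of credit equally among the channels on each converting path."""
    credit = defaultdict(float)
    for path in paths:
        for channel in path:
            credit[channel] += 1.0 / len(path)
    total = sum(credit.values())
    # Express each channel's credit as a percentage of all conversions.
    return {channel: 100 * value / total for channel, value in credit.items()}


# Hypothetical converting paths (ordered ad exposures before a purchase).
paths = (
    [["search"]] * 3
    + [["search", "email"]] * 2
    + [["display", "search"]] * 2
    + [["email"]] * 2
    + [["display"]]
)
shares = linear_attribution(paths)
print(shares)
```

Linear attribution ignores the order of exposures, so it cannot capture synergy between channels in a sequence; that is exactly the gap that sequence-aware models try to close.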


“Not Enough Data available in Sri Lanka”?

Crop Variation in Sri Lanka

Decision Makers

Example: Forward planning in Agriculture

Locally collected inaccurate data

NOISE

How far are we away from “Precision Agriculture”?

Photo credit: https://mosesorganic.org/wp-content/uploads/Publications/Broadcaster/March2014/cover-crop-field.jpg

Can Data Engineers help?


“Not Enough Energy Storage in Sri Lanka”?

Geothermal heating and cooling

Sources:
http://www.smartgrids.eu/News_2014_and_before
https://www.bspq.com.au/tesla-powerwall-review
http://thephilanews.com/eu-u-s-coordinating-electric-vehicle-smart-grid-development-41215.htm

Least Cost Storage & Transmission Assets

Two current relevant PhD projects: Khalid Abdulla and Hansani Weeratunga

Why are some people afraid of considering off-the-grid solutions?

We consider them as options for other places, for example in Indonesia


“Unreliable, inefficient, expensive…”?

• Utilizing renewable energy resources such as solar, wind, geothermal, hydro power, tidal and biofuel in a controlled manner with clever storage management would overcome the drawbacks of a single system.

• It would lead to a hybrid renewable energy system that can support an off-the-grid electrification system, which enables a reliable power supply and increases the overall percentage of renewable energy generation capacity.

Source: http://energy.gov/eere/femp/renewable-energy-technologies-federal-projects

Can we predict the peak demand and reduce its dominance?
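One way storage can reduce the dominance of the peak is simple peak shaving: discharge the battery when demand exceeds a target and recharge when demand is below it. The sketch below assumes hourly steps, a lossless battery that starts full, and a made-up demand profile. It takes the demand as known rather than predicted, so it illustrates only the storage side of the question.

```python
def peak_shave(demand, target, capacity, max_power):
    """Greedy dispatch: discharge above `target`, recharge below it.

    demand is in kW per hourly step; capacity is in kWh; max_power in kW.
    Returns the power drawn from the grid at each step.
    """
    stored = capacity  # assumption: the battery starts full
    grid = []
    for load in demand:
        if load > target:
            discharge = min(load - target, max_power, stored)
            stored -= discharge
            grid.append(load - discharge)
        else:
            charge = min(target - load, max_power, capacity - stored)
            stored += charge
            grid.append(load + charge)
    return grid


# Hypothetical evening peak (kW per hour).
demand = [2, 2, 3, 6, 8, 5, 3, 2]
grid = peak_shave(demand, target=5, capacity=4, max_power=3)
print(max(demand), max(grid), grid)
```

With a demand forecast, the target could be chosen in advance so that the stored energy is just enough to cover the expected peak.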


Acknowledgment

This work is partially supported by 15 University of Melbourne PhD Scholarships, by the Australian Research Council grants “Near Unsupervised computational methods for exploring omic data” (DP150103512) and “Discovering Patterns using Near Unsupervised Learning to Support the Quick Detection of New Animal Disease Outbreaks Caused by Viruses” (LP140100670), and by YourGene Australia.

Current PhD students C. Wijetunga, D. Mendis, Y. Deerasooriya, H. Weeratunga, P. Hamead, C. Jayawardena, D. Herath, D. Senanayake, A. Khalaj, W. Wei, Y. Sun and K. Abdulla; previous students K. Amarasinghe, Z. Li, U. Premaratne, S. Jayasekara, D. Jayasundara, D. Alahakoon and K. Chan; and research collaborators S. L. Tang, S. Petrou, U. Kayande, K. Steer, A. Wirth, B. Chang, M. Premaratne, A. Hsu, I. Saeed, U. Roessner, M. Kirley, K. Verspoor, J. Browne, G. Narsilio, D. Ackland and A. Bacic are acknowledged.


Thank you

[email protected]