Information Theory (IT) of Machine Learning for Big Data

Alex(ander) Jung, Aalto University

October 24, 2017

Outline

1 Introduction

2 The IT Age

3 IT for Machine Learning Research

4 Wrap Up

About Me

MSc (2008) and PhD (2012) in electrical engineering/signal processing at TU Vienna

Post-doc stay at ETH Zurich, 2012

Assistant Professor at TU Vienna, 2013-2015

since 2015, Assistant Professor for Machine Learning at Aalto CS

My Research Group

heading the group “Machine Learning for Big Data”

currently five PhD students, several MSc and BSc students

research revolves around fundamental limits and efficient algorithms for machine learning involving massive, decentralised datasets (big data)

My Teaching

since 2015, CS-E3210 “Machine Learning: Basic Principles” (this year 600 students)

since 2016, CS-E4020 “Convex Optimization for Big Data” (this year 50 students)

from 2018, CS-E4800 “Artificial Intelligence” (expected at least 100 students)

Some Brainy Quotes on The Data Deluge

“We’re Drowning in Information and Starving for Knowledge.” - Rutherford D. Rogers

“There is Nothing More Practical Than a Good Theory.” - Kurt Lewin

A Father of IT

Claude Elwood Shannon (1916 - 2001)

The Communication Problem

characterize a noisy channel by a single number C (its capacity)

reliable communication is possible at any rate R (in bit/s) with R < C
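
To make the role of C concrete, here is a minimal sketch for one specific channel model. The slides do not fix a channel, so the band-limited AWGN channel and the Shannon-Hartley formula C = B log2(1 + SNR) are assumptions chosen for illustration; the function names `awgn_capacity` and `rate_is_achievable` are hypothetical.

```python
import math


def awgn_capacity(bandwidth_hz: float, snr_linear: float) -> float:
    """Capacity in bit/s of a band-limited AWGN channel (Shannon-Hartley).

    Assumed model for illustration only: C = B * log2(1 + SNR),
    with SNR given as a linear power ratio (not in dB).
    """
    if bandwidth_hz <= 0 or snr_linear < 0:
        raise ValueError("bandwidth must be positive and SNR non-negative")
    return bandwidth_hz * math.log2(1.0 + snr_linear)


def rate_is_achievable(rate_bps: float, bandwidth_hz: float, snr_linear: float) -> bool:
    """Reliable communication is possible only for rates strictly below C."""
    return rate_bps < awgn_capacity(bandwidth_hz, snr_linear)


if __name__ == "__main__":
    # 1 MHz of bandwidth at a linear SNR of 15 gives log2(16) = 4 bit/s per Hz.
    capacity = awgn_capacity(1e6, 15.0)
    print(f"capacity: {capacity:.0f} bit/s")
    print("3 Mbit/s achievable:", rate_is_achievable(3e6, 1e6, 15.0))
    print("5 Mbit/s achievable:", rate_is_achievable(5e6, 1e6, 15.0))
```

The point of the sketch is that a single number separates the rates at which reliable communication is possible from those at which it is not, independent of the particular coding scheme.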

The Evolution of IT

Shannon’s key paper on channel capacity was published in 1948

it took decades to find efficient coding methods ...

milestone is the invention of Turbo Codes (TC) in the 1990s

TC come close to capacity using “simple” hardware (your mobile)

TC used nowadays in 3G and 4G mobile telephony standards, satellite communication, and wireless network standards (WiMAX)

recent focus on network information theory

A Modern Communication System

Ski Resort Marketing

you are working in the marketing agency of a ski resort

hard disk full of webcam snapshots (gigabytes of data)

want to group them into “winter” and “summer” images

you have only a few hours for this task ...

Webcam Snapshots

$i$-th snapshot represented by feature vector $x^{(i)} \in \mathbb{R}^d$

find labels: $y^{(i)} = 1$ if the $i$-th image is from summer, else $y^{(i)} = 0$
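
As a rough illustration of this representation, the sketch below maps each snapshot to a feature vector and stores the labels alongside. The slides do not say which features are used, so the choice of d = 3 (mean red, green and blue intensity), the synthetic images and the label values are all assumptions; `snapshot_features` is a hypothetical helper.

```python
import numpy as np


def snapshot_features(image: np.ndarray) -> np.ndarray:
    """Map an H x W x 3 RGB image to a feature vector in R^3.

    Assumed feature choice for illustration: the mean intensity
    of each colour channel, so d = 3.
    """
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an H x W x 3 RGB image")
    return image.reshape(-1, 3).mean(axis=0)


rng = np.random.default_rng(0)

# Synthetic stand-ins for webcam snapshots: 4 images of 8 x 8 pixels.
snapshots = rng.uniform(0.0, 1.0, size=(4, 8, 8, 3))

# Row i of X is the feature vector x^(i); y[i] is the label y^(i).
X = np.stack([snapshot_features(img) for img in snapshots])
y = np.array([1, 0, 0, 1])  # made-up labels: 1 = summer, 0 = winter

print(X.shape)  # one row per snapshot, d = 3 columns
```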

Labeled Webcam Snapshots

randomly select N = 6 snapshots

manually categorise/label them (y = 1 for summer)
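
The random selection step could look as follows. This is only a sketch: the total number of snapshots (10,000) and the helper name `pick_snapshots_to_label` are assumptions, and the manual labeling itself of course happens outside the code.

```python
import numpy as np


def pick_snapshots_to_label(num_snapshots: int, num_labeled: int = 6, seed=None) -> np.ndarray:
    """Draw distinct snapshot indices uniformly at random for manual labeling."""
    if num_labeled > num_snapshots:
        raise ValueError("cannot label more snapshots than are available")
    rng = np.random.default_rng(seed)
    # replace=False guarantees that no snapshot is selected twice.
    return rng.choice(num_snapshots, size=num_labeled, replace=False)


# Hypothetical archive of 10,000 snapshots, N = 6 of them get labeled by hand.
indices = pick_snapshots_to_label(10_000, num_labeled=6, seed=42)
print(sorted(indices.tolist()))
```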

Towards an ML Problem

we have only a few labeled snapshots

need an algorithm/method/software app to automatically label all snapshots as either “winter” or “summer”

interpret this ML problem as a communication problem ...

Machine Learning Problem = Communication Problem

labeled dataset $X^{(\mathrm{train})}$ provides training data for learning the channel

classifier $\hat{y}(x)$ decodes the features x to recover the true label y

when is reliable classification possible?

what are good classifiers $\hat{y}(x)$?
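
The analogy can be sketched in a few lines of simulation. Everything specific here is an assumption made for illustration and not taken from the slides: the label acts as the transmitted message, a hypothetical `observation_channel` turns it into a noisy scalar feature (say, mean image brightness, with snowy winter images assumed brighter), the channel is learned from N = 6 labeled samples, and a nearest-mean rule plays the role of the decoder.

```python
import numpy as np


def observation_channel(y: np.ndarray, rng: np.random.Generator, noise_std: float = 0.1) -> np.ndarray:
    """Hypothetical channel: label y in {0, 1} -> noisy scalar feature x.

    Assumed class means: 0.4 for summer (y = 1), 0.8 for winter (y = 0).
    """
    means = np.where(y == 1, 0.4, 0.8)
    return means + noise_std * rng.standard_normal(y.shape)


def fit_nearest_mean(x_train: np.ndarray, y_train: np.ndarray):
    """'Learn the channel' by estimating one feature mean per label."""
    return x_train[y_train == 0].mean(), x_train[y_train == 1].mean()


def decode(x: np.ndarray, mean0: float, mean1: float) -> np.ndarray:
    """Classifier as decoder: choose the label whose estimated mean is closest to x."""
    return (np.abs(x - mean1) < np.abs(x - mean0)).astype(int)


rng = np.random.default_rng(0)

# Training set: N = 6 labeled snapshots, three per class.
y_train = np.array([0, 0, 0, 1, 1, 1])
x_train = observation_channel(y_train, rng)
mean0, mean1 = fit_nearest_mean(x_train, y_train)

# Fresh snapshots with unknown labels are "decoded" by the classifier.
y_test = rng.integers(0, 2, size=10_000)
x_test = observation_channel(y_test, rng)
error_rate = np.mean(decode(x_test, mean0, mean1) != y_test)
print(f"empirical error rate: {error_rate:.4f}")
```

Increasing `noise_std` in this toy model degrades the channel and drives the error rate up, which is one way to get a feel for the question of when reliable classification is possible.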

A Modern Machine Learning Problem: Weather Prediction

will there be sun tomorrow in Helsinki? (research collaboration with the Finnish Meteorological Institute)

Take Home Messages

machine learning is a particular form of communication

fundamental limits are given by the capacities of observation channels

need efficient coding/decoding algorithms for machine learning

Reading Material

C. E. Shannon, “A Mathematical Theory of Communication”, The Bell System Technical Journal, vol. 27, pp. 379-423 and 623-656, July and October 1948.

see our papers at https://users.aalto.fi/~junga1/
