Information Theory (IT) of Machine Learning for Big Data
Alex(ander) Jung, Aalto University
October 24, 2017
Outline
1 Introduction
2 The IT Age
3 IT for Machine Learning Research
4 Wrap Up
About Me
MSc (2008) and PhD (2012) in electrical engineering/signal processing at TU Vienna
Post-Doc stay at ETH Zurich 2012
Assistant Professor TU Vienna 2013-2015
since 2015, Ass. Prof. for Machine Learning at Aalto CS
My Research Group
heading the group “Machine Learning for Big Data”
currently five PhD students, several MSc and BSc students
research revolves around fundamental limits and efficient algorithms for machine learning involving massive, decentralised datasets (big data)
My Teaching
since 2015, CS-E3210 “Machine Learning: Basic Principles” (this year 600 students)
since 2016, CS-E4020 “Convex Optimization for Big Data” (this year 50 students)
from 2018, CS-E4800 “Artificial Intelligence” (expected at least 100 students)
Some Brainy Quotes on The Data Deluge
“We’re Drowning in Information and Starving for Knowledge.” - Rutherford D. Rogers.
“There is Nothing More Practical Than a Good Theory.” - Kurt Lewin.
A Father of IT
Claude Elwood Shannon (1916 - 2001)
The Communication Problem
characterize noisy channel by single number C (capacity)
reliable communication possible for rates (in bit/s) < C
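To make the role of the single number C concrete, here is a minimal sketch for one textbook channel. The binary symmetric channel (each bit flipped with probability p) is an assumption chosen for illustration and does not appear in the slides; its capacity is C = 1 - H2(p), measured in bits per channel use rather than bit/s. The function names are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H2(p) in bits, with H2(0) = H2(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc_capacity(p: float) -> float:
    """Capacity (bits per channel use) of a binary symmetric channel
    that flips each transmitted bit with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("crossover probability must lie in [0, 1]")
    return 1.0 - binary_entropy(p)


def reliable_rate(rate: float, p: float) -> bool:
    """True if the rate lies strictly below capacity, i.e. reliable
    communication is possible in principle."""
    return rate < bsc_capacity(p)
```

For example, `reliable_rate(0.4, 0.11)` is true and `reliable_rate(0.6, 0.11)` is false, since the capacity at p = 0.11 is roughly one half bit per channel use.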
The Evolution of IT
Shannon’s key paper on channel capacity published 1948
it took some years to find efficient coding methods ...
milestone is invention of Turbo Codes (TC) in 1990s
TC come close to capacity using “simple” hardware (your mobile)
TC used nowadays in
3G and 4G mobile telephony standards
satellite communication
wireless network standards (WiMAX)
recent focus on network information theory
A Modern Communication System
Ski Resort Marketing
you are working in the marketing agency of a ski resort
hard disk full of webcam snapshots (gigabytes of data)
want to group them into “winter” and “summer” images
you have only a few hours for this task ...
Webcam Snapshots
$i$th snapshot represented by feature vector $x^{(i)} \in \mathbb{R}^d$
find labels $y^{(i)} = 1$ if $i$th image from summer, else $y^{(i)} = 0$
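The slides do not say which features are used. As a hedged sketch, one could summarise each snapshot by its mean red, green and blue intensity, which gives a feature vector with d = 3. The `feature_vector` helper and the tiny pixel lists below are hypothetical and only illustrate the idea.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each in 0..255


def feature_vector(pixels: List[Pixel]) -> List[float]:
    """Map a snapshot to x in R^3: the mean red, green and blue
    intensity over all pixels, scaled to the interval [0, 1]."""
    n = len(pixels)
    if n == 0:
        raise ValueError("snapshot contains no pixels")
    return [sum(p[c] for p in pixels) / (255.0 * n) for c in range(3)]


# Made-up two-pixel "snapshots", purely for illustration.
snowy = [(240, 240, 250), (230, 235, 245)]
grassy = [(60, 140, 50), (80, 160, 70)]

x_winter = feature_vector(snowy)   # bright in all three channels
x_summer = feature_vector(grassy)  # green channel dominates
```

With features like these, a summer image would tend to show a dominant green channel, while a snowy winter image is bright in all three channels.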
Labeled Webcam Snapshots
select randomly N = 6 snapshots
manually categorise/label them (y = 1 for summer)
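The random selection step can be sketched as follows. The total number of snapshots (10,000) and the fixed seed are assumptions for illustration; only N = 6 comes from the slide.

```python
import random
from typing import List


def pick_for_labeling(num_snapshots: int, n_labeled: int = 6, seed: int = 0) -> List[int]:
    """Return the sorted indices of n_labeled snapshots drawn uniformly
    at random without replacement."""
    if n_labeled > num_snapshots:
        raise ValueError("cannot label more snapshots than exist")
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    return sorted(rng.sample(range(num_snapshots), n_labeled))


# Indices of the six snapshots that a human would then label by hand.
chosen = pick_for_labeling(num_snapshots=10_000)
```

Sampling without replacement guarantees six distinct snapshots, so no manual labeling effort is spent on duplicates.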
Towards an ML Problem
we have few labeled snapshots
need an algorithm/method/software-app to automatically label all snapshots as either “winter” or “summer”
interpret this ML problem as communication problem ...
Machine Learning Problem = Communication Problem
labeled dataset $X^{(\mathrm{train})}$ provides training data for learning the channel
classifier y(x) decodes the feature x to detect true label y
when is reliable classification possible?
what are good classifiers y(x)?
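One simple candidate for such a decoder is a nearest-centroid rule: learn one average feature vector per label from the labeled snapshots, then assign each new snapshot to the closer average. This is an illustrative assumption, not the classifier proposed in the talk, and the six training vectors below are made up.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def centroid(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def train(X_train: List[Vector], y_train: List[int]) -> Tuple[List[float], List[float]]:
    """Learn the 'channel' from labeled data: one centroid per label (0 and 1)."""
    winter = [x for x, y in zip(X_train, y_train) if y == 0]
    summer = [x for x, y in zip(X_train, y_train) if y == 1]
    if not winter or not summer:
        raise ValueError("need at least one labeled example per class")
    return centroid(winter), centroid(summer)


def classify(x: Vector, centroids: Tuple[List[float], List[float]]) -> int:
    """Decode feature vector x: return 1 (summer) if x is strictly closer
    to the summer centroid than to the winter centroid, else 0 (winter)."""
    c_winter, c_summer = centroids
    return 1 if math.dist(x, c_summer) < math.dist(x, c_winter) else 0


# Hypothetical N = 6 labeled snapshots (mean RGB features, y = 1 for summer).
X_train = [[0.90, 0.90, 0.95], [0.85, 0.90, 0.90], [0.80, 0.85, 0.90],
           [0.30, 0.60, 0.20], [0.35, 0.55, 0.25], [0.25, 0.65, 0.30]]
y_train = [0, 0, 0, 1, 1, 1]
centroids = train(X_train, y_train)
```

Calling `classify([0.28, 0.58, 0.22], centroids)` then decodes a greenish snapshot as summer. Whether such a rule is reliable depends on how well the features separate the two classes, which is the question the slide raises.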
A Modern Machine Learning Problem: Weather Prediction
will there be sun tomorrow in Helsinki? (research collaboration with Finnish Meteorological Institute)
Take Home Messages
machine learning is particular form of communication
fundamental limits by capacities of observation channels
efficient coding/decoding algorithms for machine learning
Reading Material
C.E. Shannon, “A Mathematical Theory of Communication”, The Bell System Technical Journal, Vol. 27, pp. 379–423, 623–656, July, October, 1948.
see our papers at https://users.aalto.fi/~junga1/