applied machine learning past, present, and future: a personal vie · 2016. 8. 2. · past,...

67
1 Applied machine learning Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC San Diego July 9, 2013 [email protected]

Upload: others

Post on 17-Oct-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

1

Applied machine learning

Past, present, and future: A personal view

Charles ElkanComputer Science and Engineering

UC San Diego

July 9, 2013

[email protected]

Page 2: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

2

Page 3: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

What is applied machine learning?

BI, DSS, KDD, fast data, big data, unstructured data, dataviz, NoSQL, Hadoop, Hive, Pig, map-reduce.

Convert data into knowledge + capture value = statistics + optimization

Statistics = machine learning = data mining (≠ data snooping)Optimization = microeconomics + operations research

JARGON

3

Page 4: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

What’s new?

Traditional data:• Tables in databases

New types of data:• Documents• Networks• Videos• XML

• And scale!

4

Page 5: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Half empty or half full?

However much data we have, important data is always missing.

5

Page 6: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

What we can learn from statistics

6

Page 7: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Traditional data

Database systems are cost centers, not profit centers.• Business question: how to turn data into profit?

What’s new? Big data and fast data.• Example: 29 billion rows, 50 thousand columns.

7

Page 8: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

From data to predictions to actions

What can we do with traditional data?• Answer: Predict or recognize, then take actions.

Essential to apply decision theory:• What are the probabilities of alternative outcomes?• What are the costs and benefits of alternative actions?

8

Page 9: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

9

Page 10: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Beyond decision theory ...

What can we do with traditional data?• Answer: Predict or recognize, then take actions.

Microeconomics and micropolitics!• Who pays the costs, who reaps the benefits?

10

Page 11: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Protecting confidentiality

Problem: Toyota wants to learn using Facebook information.• But Facebook users expect their data to stay private?

Solution: Assume some users opt-in to sharing with Toyota. • Facebook computes and publishes weights making these

people representative of all Facebook users.11

Page 12: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Privacy can be almost free

12

Page 13: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

From predictions to suggestions

• Many organizations use “look-alike” models to identify prospects for targeting.

• But if you buy one TV, will you buy another?• And, are you influenceable?

• A recommender system predicts “if you choose this, then you are likely to choose that.”

13

Page 14: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

14

Page 15: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

15

Page 16: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Schwan Food Company

16

Page 17: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

17

Page 18: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Predicting behavior, e.g. “buy” based on “view”

Page 19: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Predicting the behavior of shoppers

A customer's actions can include { look at product, add to cart, finish checkout, write review, return for refund, ... }.

19

Page 20: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Dyadic prediction

Task: Given labels for some dyads, predict labels of other dyads.20

Page 21: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Latent feature models

• Each user, each movie has its own values for latent features.• A prediction is the dot-product of latent vectors.• Infer the most predictive vector for each user and movie.

21

Page 22: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

The PrincessDiaries

The Lion King

Braveheart

Lethal Weapon

Independence Day

AmadeusThe Color Purple

Dumb and Dumber

Ocean’s 11

Sense and Sensibility

Gus

Dave

Latent features are hidden dimensions

Dimension 1

Dimension 2

Page 23: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

What’s new in our LFL approach

1. Using side-information about users and items2. Allowing any set of discrete labels3. Predicting calibrated probabilities.4. Learning from unbalanced data5. Scaling to billions of pairs6. Unifying disparate problems in a single framework.

23

Page 24: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Users do not provide opinions at random

Yahoo! survey answersYahoo! music ratings

Likelihood of

selecting

Page 25: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

The LFL model

The log-linear framework is a model of discrete choice.Finding latent vectors is essentially factor analysis.

One latent vector per person-label; also one per item-label.• Vy captures effects of person r and item c attributes.• vy captures effects of attributes specific to the (r,c) pair.

25

Page 26: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

The LFL model

1. Important algorithmic innovations:1. For scalability, train using stochastic gradient descent.2. Use L2 regularization to prevent overfitting.3. Alternative loss functions: Maximum likelihood, AUC,

absolute or squared error, and more.

26

Page 27: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Factor analysis inside discrete choice

Compared to conjoint analysis in marketing:• Simultaneous modeling of consumers and products• Attributes are inferred from revealed preferences• Can handle billions of observations• Modern algorithms give improved accuracy.

27

Page 28: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

LFL applied to link prediction

Task: Given data about known people and connections, infer which connections do exist but are unknown.

28

Page 29: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Learning from data is vital

Three theoretical models, one trained LFL model (green).• On all datasets, LFL gives the highest accuracy.• No theoretical model is best always.

29

Page 30: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Solving the cold-start problem

How can we predict choices for new customers and items?• Blue: LFL with rating data only. • Red: LFL with movie and user demographic data.

30

Sheet4

Page 1

Standard Cold-start users Cold-start users + movies0.0000

0.2000

0.4000

0.6000

0.8000

1.0000

1.2000

0.7162

0.8039

0.9608

0.7063 0.71180.7451

Baseline

LFL

Setting

Te

st

set

MA

E

Page 31: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Part II: New types of data

31

Page 32: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

What’s new about social media?

One-to-one

One-to-many

Many-to-one

Many-to-many

Page 33: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

What’s important about social media?

Communication is about feelings as much as about facts.• Shared feelings drive actions.

How to understand opinions in text automatically?• Sentiment analysis ...

Page 34: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Sentiment analysis in 2002

Page 35: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Earlier: Sentiment analysis in 2001

• ... labels designate level of quality, such as interestingness, appropriateness, timeliness, humor, style of language, obscenity, sentiment

• ... a classifier means effective to automatically associate a quality value to items of data

35

Page 36: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Sentiment analysis in 2010

36

Page 37: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Correlation versus causation

Page 38: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Opinion analysis in 2013

38

Page 39: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Some reviews are more helpful than others

39

Page 40: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

But, how can new helpful reviews emerge?

40

Page 41: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Measuring helpfulness automatically

41

Page 42: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Why are public search engines so good?

42

Page 44: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Needed: Search with more understanding

LaVerne Council, Chief Information Officer of Johnson & Johnson:

“... allow anyone to ask a question ... folks that have given us access to their email ... data mining for answers to that question

... help us solve a very hairy issue for one of our products ... one of the associates had completed his thesis in college on that very topic ... they weren’t in the same company

... we were able to really come back with answers.”

44

Page 45: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Squid: A new search engine

45

Page 46: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Applications are in verticals

46

Page 47: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Research challenge: Fewer topics, better fit

47

Page 48: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Medline and PubMed

48

Page 49: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Learning to label documents

Each article is labeled by humans with up to 30 labels selected from 26,853 choices.• One million documents are indexed per year.• Ten minutes per document requires 100 staff.

Budget cuts are causing delays in indexing• Can we learn 26,853 classifiers from 2M documents?

49

Page 50: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Research challenges

1. Share preprocessing between 26,853 classifiers.2. Balance overfitting and underfitting without cross-validation.3. Handle needle-in-a-haystack labels.4. Estimate accurate, calibrated probabilities.5. Set 26,853 thresholds individually to maximize F1 scores.6. Measure the accuracy of human indexers.

50

Page 51: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Making optimal decisions about labels

Definition of F1 score: α = 2tp/(2tp + fp + fn).

Example: α = 0.8 means four correct labels (tp) for each wrong label (fp) and each missed label (fn).

Theorem: If the optimal F1 score is α, then the optimal prediction for document x has probability threshold α/2:

ŷ = I( p(y=1|x) > α/2 ).

[Results on next slide: Laptop with four gigabytes of memory.]

51

Page 52: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

F1 score results

Observations:• The MTI software used at NIH works for some labels only.• SVD-based regression with 500 topics dominates Adaboost.• Machine learning approaches the accuracy of expert indexers.

52

Page 53: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Towards semantic processing

• Recursive neural nets for language understanding www.socher.org/index.php/Main/ParsingNaturalScenes AndNaturalLanguageWith RecursiveNeuralNetworks

53

Page 54: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Part III: Back to making decisions

Many learning and decision-making applications have a short-term view.• But, if you use more credit now, are you more likely to

default in the future?• What about priming, saturation, and spontaneity?

We need to choose actions to maximize long-term benefit.

54

Page 55: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Reinforcement learning

Mathematical framework is Markov decision processes (MDPs).

www.cns.atr.jp/cnb/crp/

55

Page 56: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Data-driven individualized marketing

The agent is a vendor; each customer is one random instance of the environment.• The agent takes an action (sends a catalog, etc.) then

gets a reward (profit from a purchase, etc.).The agent must learn how the environment evolves and a long-term optimal policy.

56

Page 57: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Where is the learning in RL?

How the environment evolves is unknown.• Online RL: The agent learns while interacting.• Batch RL: The agent learns from historical data.

Technical challenge: We must learn a good policy from data collected using an unknown different policy.

57

Page 58: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

In 1948 ...

58

Page 59: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Management Science 14(7)503‒507.59

Sears, Roebuck

Page 60: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

In 2013: Managing a wind turbine with storage

2002 2004 2006 2008 20100

50

100

150

200

250

300

Year$/MWh

Training Testing

1998 2000 2002 2004 20060

5

10

15

20

Year

m/s

Training Testing

60

Page 61: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

How one electricity market works

At midnight, the price for each hour the next day is revealed.• The agent then chooses how much to promise to supply.

Electricity generation depends nonlinearly on wind strength.• Failure to supply => 2x penalty.• Overproduction => dumping or storage.

Max storage is 30, 60, or 120 hours of average production.

61

Page 62: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Research challenges

Multidimensional, continuous state space• State vector contains wind speeds, storage level, and

prices (w1, ..., w24, s24, p1, ..., p24)

Multidimensional, continuous action space• Action vector is commitments (a1, a2, ..., a24)

Training period: 2 yearsTest period: 3 years

62

No more toys!

Page 63: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

NSPI algorithm

• Stage 1: Learn a linear transition model:

• Stage 2: Learn coefficients of an approximate Q function:

• At midnight each test day, choose the best action vector:

63

Page 64: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

64

Page 65: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Compared to previous research

Observations:• NSPI yields higher profit given limited storage. • The marginal benefit of storage is diminishing.

65

Page 66: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Needed to be more realistic ...

• Storing electricity is at best 75% efficient.• Storage should be used for general arbitrage.• Wind speeds and electricity prices are not independent.• Tomorrow’s prices may depend on today’s actions.

• Ultimately, game theory.

66

Page 67: Applied machine learning Past, present, and future: A personal vie · 2016. 8. 2. · Past, present, and future: A personal view Charles Elkan Computer Science and Engineering UC

Discussion

• Applied machine learning = inference from data + optimization

• More data = more opportunities

• Success = domain understanding + methods + leadership

67