
DESCRIPTION

Transfer learning and SE

TRANSCRIPT

TRANSFER LEARNING AND SE TIM@MENZIES.US

WVU, JULY 2013

SOUND BITES

•  Ye olde worlde SE

•  “The” model of SE (defects, effort, etc)

•  21st century SE

•  Models (plural)
•  No generality in models
•  But, perhaps, generality in how we find those models

•  Transfer learning

2

3

WHAT IS TRANSFER LEARNING?

•  Source = old = Domain1 = <Eg1, P1>

•  Target = new = Domain2 = <Eg2, P2>

•  If we move from domain1 to domain2, do we have to start afresh?

•  Or can we learn faster in “new” …
•  … using lessons learned from “old”?

•  NSF funding (2013..2017):

•  Transfer learning in Software Engineering
•  Menzies, Layman, Shull, Diep

4

WHO CARES? (WHAT’S AT STAKE?)

•  “Transfer” is a core scientific issue

•  Lack of transfer is the scandal of SE

•  Replication in Empirical SE is rare

•  Conclusion instability
•  It all depends.

•  The full stop syndrome

•  The result?

•  A funding crisis

5

MANUAL TRANSFER (WAR STORIES)

•  Brazil, SEL, 2002: need domain knowledge (but now gone)?

•  NSF, SEL, 2006: need better automatic support

•  Kitchenham, Mendes et al, TSE 2007: for = against

•  Zimmermann, FSE 2009: cross works 4/600 times

6

WAR STORIES (EFFORT ESTIMATION)

Effort = a · loc^x · y

•  Learned using Boehm’s methods
•  20 × 66% samples of NASA93
•  COCOMO attributes
•  Linear regression (log pre-processor)
•  Sort the coefficients found for each member of x, y
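The slide’s recipe (log pre-processor, then linear regression) can be sketched as follows. This is only an illustration on synthetic data: the coefficients 2.9 and 1.1 are invented, the effort-multiplier terms y are omitted, and nothing here comes from NASA93.

```python
import numpy as np

# Synthetic stand-in for effort data: effort = a * loc**x (hypothetical a, x).
rng = np.random.default_rng(0)
loc = rng.uniform(5, 100, size=50)      # project sizes in KLOC
true_a, true_x = 2.9, 1.1
effort = true_a * loc ** true_x

# Log pre-processor: the power law becomes linear,
#   log(effort) = log(a) + x * log(loc)
X = np.column_stack([np.ones_like(loc), np.log(loc)])
coef, *_ = np.linalg.lstsq(X, np.log(effort), rcond=None)
a_hat, x_hat = np.exp(coef[0]), coef[1]
print(a_hat, x_hat)
```

With noise-free data the fit recovers the generating coefficients; repeating the fit on resampled subsets (the 20 × 66% scheme) would show how stable the sorted coefficients are.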

7

WAR STORIES (DEFECT ESTIMATION)

8

BUT THERE IS HOPE

•  Maybe we’ve been looking in the wrong direction

•  SE project data = surface features of an underlying effect
•  Go beneath the surface

9

Focused too much on what we can see at first glance

Did not check the nuances of the hidden structure beneath

10

BUT THERE IS HOPE

With new data mining technologies, a truer picture emerges where we can see what is going on

12/1/2011 11

BUT THERE IS HOPE

ESEM 2011: How to Find Relevant Data for Effort Estimation

TIM MENZIES, EKREM KOCAGUNELI

THERE IS HOPE

•  Maybe we’ve been looking in the wrong direction

•  SE project data = surface features of an underlying effect
•  Go beneath the surface

13

US DOD MILITARY PROJECTS (LAST DECADE)

14

You must segment to find relevant data

DOMAIN SEGMENTATIONS

15

Q: What to do about rare zones?

A: Select the nearest ones from the rest. But how?

IN THE LITERATURE: WITHIN VS CROSS = ??

BEFORE THIS WORK

16

Kitchenham et al., TSE 2007:

•  Within-company learning (just use local data)
•  Cross-company learning (just use data from other companies)

Results mixed:
•  No clear win from cross or within

Cross vs within are not rigid boundaries:
•  They are soft borders
•  And we can move a few examples across the border
•  After making those moves, “cross” is the same as “local”

SOME DATA DOES NOT DIVIDE NEATLY ON EXISTING DIMENSIONS

17

THE LOCALITY(1) ASSUMPTION

18

Data divides best on one attribute; e.g.:
1.  development centers of developers;
2.  project type (e.g. embedded);
3.  development language;
4.  application type (MIS, GNC, etc.);
5.  targeted hardware platform;
6.  in-house vs outsourced projects;
7.  etc.

If Locality(1): hard to use data across these boundaries
•  Then harder to build effort models:
•  Need to collect local data (slow)

THE LOCALITY(N) ASSUMPTION

19

Data divides best on a combination of attributes.

If Locality(N):
•  Easier to use data across these boundaries
•  Relevant data spread all around
•  Little diamonds floating in the dust

HOW TO FIND RELEVANT TRAINING DATA?

20

              independent attributes
              w    x    y    z  | class
similar 1     0    1    1    1  |    2
similar 2     0    1    1    1  |    3
different 1   7    7    6    2  |    5
different 2   1    9    1    8  |    8
different 3   5    4    2    6  |   10
alien 1      74   15   73   56  |   20
alien 2      77   45   13    6  |   40
alien 3      35   99   31   21  |   60
alien 4      49   55   37    4  |   80

Use similar? Use more variant? Use aliens?

VARIANCE PRUNING

21

              independent attributes
              w    x    y    z  | class
similar 1     0    1    1    1  |    2
similar 2     0    1    1    1  |    3
different 1   7    7    6    2  |    5
different 2   1    9    1    8  |    8
different 3   5    4    2    6  |   10
alien 1      74   15   73   56  |   20
alien 2      77   45   13    6  |   40
alien 3      35   99   31   21  |   60
alien 4      49   55   37    4  |   80

1)  Sort the clusters by “variance”
2)  Prune those high-variance things
3)  Estimate on the rest

“Easy path”: cull the examples that hurt the learner

PRUNE !

KEEP !
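The three steps above can be sketched in a few lines. The cluster names and class values are the toy numbers from the table, not TEAK’s actual implementation:

```python
import statistics

# Class values per cluster, taken from the toy table above.
clusters = {
    "similar":   [2, 3],
    "different": [5, 8, 10],
    "alien":     [20, 40, 60, 80],
}

# 1) Sort the clusters by variance of their class values,
# 2) prune the high-variance ones, 3) estimate on the rest.
by_variance = sorted(clusters, key=lambda k: statistics.pvariance(clusters[k]))
keep = by_variance[:2]      # cull the noisiest cluster(s)
print(keep)
```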

TEAK: CLUSTERING + VARIANCE PRUNING (TSE, JAN 2011)

22

•  TEAK is a variance-based instance selector
•  It is built via GAC trees
•  TEAK is a two-pass system
•  First pass selects low-variance relevant projects
•  Second pass retrieves projects to estimate from

ESSENTIAL POINT

23

TEAK finds local regions important to the estimation of particular cases

TEAK finds those regions via locality(N)

•  Not locality(1)

WITHIN AND CROSS DATASETS

24

Note: all Locality(1) divisions

EXPERIMENT 1: PERFORMANCE COMPARISON OF WITHIN AND CROSS-SOURCE DATA

25

•  TEAK run on within & cross data for each dataset group (lines separate groups)
•  LOOCV used for runs
•  20 runs performed for each treatment
•  Results evaluated w.r.t. MAR, MMRE, MdMRE and Pred(30); see http://goo.gl/6q0tw
•  If within data outperforms cross, the dataset is highlighted in gray
•  Only 2 datasets are highlighted
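For reference, the evaluation measures named above can be sketched directly from their definitions (the three actual/predicted values below are made up):

```python
import statistics

def mres(actual, predicted):
    """Magnitude of Relative Error per project: |actual - predicted| / actual."""
    return [abs(a - p) / a for a, p in zip(actual, predicted)]

def mar(actual, predicted):
    """Mean Absolute Residual."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mmre(actual, predicted):
    """Mean Magnitude of Relative Error."""
    return statistics.mean(mres(actual, predicted))

def mdmre(actual, predicted):
    """Median Magnitude of Relative Error."""
    return statistics.median(mres(actual, predicted))

def pred(actual, predicted, level=0.30):
    """Pred(30): fraction of estimates within 30% of the actual."""
    return sum(m <= level for m in mres(actual, predicted)) / len(actual)

actual, predicted = [100, 50, 20], [110, 40, 30]
print(mar(actual, predicted), mmre(actual, predicted),
      mdmre(actual, predicted), pred(actual, predicted))
```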

EXPERIMENT 2: RETRIEVAL TENDENCY OF TEAK FROM WITHIN AND CROSS-SOURCE DATA

26

EXPERIMENT 2: RETRIEVAL TENDENCY OF TEAK FROM WITHIN AND CROSS-SOURCE DATA

27

Diagonal (WC) vs. Off-Diagonal (CC) selection percentages sorted

Percentiles of diagonals and off-diagonals

HIGHLIGHTS

28

1.  Don’t listen to everyone
•  When listening to a crowd, first filter the noise

2.  Once the noise clears: bits of me are similar to bits of you
•  Probability of selecting cross or within instances is the same

3.  Cross-vs-within is not a useful distinction
•  Locality(1) not informative
•  Enables “cross-company” learning

SO, THERE IS HOPE

•  Maybe we’ve been looking in the wrong direction

•  SE project data = surface features of an underlying effect
•  Go beneath the surface

•  Assuming locality(N), not locality(1)

•  No cross-, no within-
•  It’s all data we can learn from

29

TSE, 2013 : LOCAL VS. GLOBAL MODELS FOR EFFORT ESTIMATION AND DEFECT PREDICTION TIM MENZIES, ANDREW BUTCHER (WVU) ANDRIAN MARCUS (WAYNE STATE) THOMAS ZIMMERMANN (MICROSOFT) DAVID COK (GRAMMATECH)

Do not focus on what we can see at first glance

Check the nuances of the hidden structure beneath

31

THERE IS HOPE

32

Cluster then learn (using envy)

•  Seek the fence where the grass is greener on the other side.

•  Learn from there

•  Test on here

•  Cluster to find “here” and “there”

33

ENVY = THE WISDOM OF THE COWS

34

@attribute recordnumber real
@attribute projectname {de,erb,gal,X,hst,slp,spl,Y}
@attribute cat2 {Avionics, application_ground, avionicsmonitoring, … }
@attribute center {1,2,3,4,5,6}
@attribute year real
@attribute mode {embedded,organic,semidetached}
@attribute rely {vl,l,n,h,vh,xh}
@attribute data {vl,l,n,h,vh,xh}
…
@attribute equivphyskloc real
@attribute act_effort real
@data
1,de,avionicsmonitoring,g,2,1979,semidetached,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,25.9,117.6
2,de,avionicsmonitoring,g,2,1979,semidetached,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,24.6,117.6
3,de,avionicsmonitoring,g,2,1979,semidetached,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,7.7,31.2
4,de,avionicsmonitoring,g,2,1979,semidetached,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,8.2,36
5,de,avionicsmonitoring,g,2,1979,semidetached,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,9.7,25.2
6,de,avionicsmonitoring,g,2,1979,semidetached,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,2.2,8.4
…

DATA = MULTI-DIMENSIONAL VECTORS

CAUTION: DATA MAY NOT DIVIDE NEATLY ON RAW DIMENSIONS

The best description for SE projects may be synthesized dimensions extracted from the raw dimensions

35

FASTMAP

36

Fastmap (Faloutsos, 1995): O(2N) generation of an axis of large variability

•  Pick any point W
•  Find X, the point furthest from W
•  Find Y, the point furthest from X

Let c = dist(X,Y). Every point has distances a, b to X and Y:

•  x = (a² + c² − b²) / (2c)
•  y = √(a² − x²)

Find median(x), median(y); recurse on the four quadrants
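A minimal sketch of the Fastmap pass described above; the four test points are hypothetical:

```python
import math

def fastmap_axis(points):
    """One Fastmap pass (Faloutsos, 1995): find two distant pivots X, Y
    in O(2N) distance calls, then project every point onto the X-Y axis."""
    w = points[0]                                        # pick any point W
    x = max(points, key=lambda p: math.dist(w, p))       # X: furthest from W
    y = max(points, key=lambda p: math.dist(x, p))       # Y: furthest from X
    c = math.dist(x, y)
    out = []
    for p in points:
        a, b = math.dist(p, x), math.dist(p, y)
        px = (a**2 + c**2 - b**2) / (2 * c)              # position along the axis
        py = math.sqrt(max(a**2 - px**2, 0.0))           # distance off the axis
        out.append((px, py))
    return out

pts = [(0, 0), (1, 0), (4, 0), (10, 0)]
proj = fastmap_axis(pts)
print(proj)
```

Because the test points are collinear, every off-axis distance comes out zero and the axis positions reproduce the spacing along the line.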

HIERARCHICAL PARTITIONING

Grow:
•  Find two orthogonal dimensions
•  Find median(x), median(y)
•  Recurse on four quadrants

Prune:
•  Combine quadtree leaves with similar densities
•  Score each cluster by median score of class variable

37
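The grow phase (split at median(x), median(y); recurse on four quadrants) can be sketched as a quadtree. This simplification skips the prune phase (no leaf merging or scoring), and the grid of points is hypothetical:

```python
import statistics

def quadtree(points, min_size=4):
    """Recursively split 2-D points at median(x), median(y) into quadrants,
    stopping when a leaf is small enough."""
    if len(points) <= min_size:
        return [points]                      # small enough: leaf cluster
    mx = statistics.median(p[0] for p in points)
    my = statistics.median(p[1] for p in points)
    quads = [[], [], [], []]
    for p in points:
        quads[(p[0] > mx) * 2 + (p[1] > my)].append(p)
    leaves = []
    for q in quads:
        if q and len(q) < len(points):       # guard against no-progress splits
            leaves.extend(quadtree(q, min_size))
        elif q:
            leaves.append(q)
    return leaves

pts = [(i % 5, i // 5) for i in range(25)]   # a 5x5 grid of toy points
leaves = quadtree(pts)
print(len(leaves), sum(len(l) for l in leaves))
```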

38

Learning via “envy”

•  Seek the fence where the grass is greener on the other side.

•  Learn from there

•  Test on here

•  Cluster to find “here” and “there”

39

ENVY = THE WISDOM OF THE COWS


HIERARCHICAL PARTITIONING

Grow:
•  Find two orthogonal dimensions
•  Find median(x), median(y)
•  Recurse on four quadrants

Prune:
•  Combine quadtree leaves with similar densities
•  Score each cluster by median score of class variable
•  This cluster envies the neighbor with a better score and max abs(score(this) − score(neighbor))

41

Where is grass greenest?
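The envy rule above, sketched with hypothetical cluster scores and neighbor lists (lower score = better):

```python
# Hypothetical cluster scores, e.g. median defects per cluster (lower = better).
scores = {"c1": 9.0, "c2": 4.0, "c3": 7.0}
neighbors = {"c1": ["c2", "c3"], "c2": ["c1"], "c3": ["c1"]}

def envied(cluster):
    """Return the neighbor this cluster 'envies': one with a better score,
    maximizing abs(score(this) - score(neighbor)); None if grass is no greener."""
    better = [n for n in neighbors[cluster] if scores[n] < scores[cluster]]
    if not better:
        return None
    return max(better, key=lambda n: abs(scores[cluster] - scores[n]))

print(envied("c1"))
```

Here c1 envies c2, the neighbor offering the largest improvement; c2 envies nobody, so it learns from its own region.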

Q: HOW TO LEARN RULES FROM NEIGHBORING CLUSTERS?

A: It doesn’t really matter
•  Many competent rule learners

But to evaluate global vs local rules:
•  Use the same rule learner for local vs global rule learning

This study uses WHICH (Menzies [2010]):
•  Customizable scoring operator
•  Faster termination
•  Generates very small rules (good for explanation)

42

DATA FROM HTTP://PROMISEDATA.ORG/DATA

Effort reduction = { NasaCoc, China }: COCOMO or function points

Defect reduction = { lucene, xalan, jedit, synapse, etc. }: CK metrics (OO)

Clusters have an untreated class distribution.

Rules select a subset of the examples:
•  generate a treated class distribution

43

[Figure: class distributions plotted as 25th, 50th, 75th and 100th percentiles on a 0–100 scale, for untreated vs global vs local treatments]

Distributions have percentiles:

global: treated with rules learned from all data

local: treated with rules learned from the neighboring cluster

Lower median efforts/defects (50th percentile)

Greater stability (75th – 25th percentile)

Decreased worst case (100th percentile)
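The three claims above can be checked mechanically from percentile summaries; a sketch with made-up defect counts, not the paper’s data:

```python
import statistics

def summary(xs):
    """25th, 50th, 75th and 100th percentiles of a class distribution."""
    q25, q50, q75 = statistics.quantiles(xs, n=4)
    return {"25th": q25, "50th": q50, "75th": q75, "100th": max(xs)}

# Hypothetical defect counts per project under each treatment.
untreated = [2, 5, 8, 12, 20]
local = [1, 2, 3, 4, 6]

u, l = summary(untreated), summary(local)
assert l["50th"] < u["50th"]                          # lower median
assert l["75th"] - l["25th"] < u["75th"] - u["25th"]  # greater stability
assert l["100th"] < u["100th"]                        # decreased worst case
print(u, l)
```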

BY ANY MEASURE, LOCAL BETTER THAN GLOBAL

44

RULES LEARNED IN EACH CLUSTER

What works best “here” does not work “there”

•  Misguided to try and tame conclusion instability
•  Inherent in the data

Can’t tame conclusion instability.

•  Instead, you can exploit it
•  Learn local lessons that do better than overly generalized global theories

45


Do not focus on what we can see at first glance

Check the nuances of the structures within our data

•  Cluster, then envy

47

SO THERE IS HOPE

48

Conclusion

LACK OF TRANSFER = THE GREAT SCANDAL OF SE

•  Replication in Empirical SE is rare

•  Conclusion instability

•  “It all depends.” is not good enough

•  A funding crisis

49

BUT THERE IS HOPE

•  Maybe we’ve been looking in the wrong direction

•  SE project data = surface features of an underlying effect
•  Go beneath the surface

•  Assuming locality(N), not locality(1)

•  No cross-, no within-
•  It’s all data we can learn from

50

Do not focus on what we can see at first glance

Check the nuances of the structures within our data

•  Cluster, then envy

51

BUT THERE IS HOPE

With new data mining technologies, a truer picture emerges where we can see what is going on

52

BUT THERE IS HOPE

53
