

Gradient boosting

From Wikipedia, the free encyclopedia

Gradient boosting is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion like other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function.

The idea of gradient boosting originated in the observation by Leo Breiman[1] that boosting can be interpreted as an optimization algorithm on a suitable cost function. Explicit regression gradient boosting algorithms were subsequently developed by Jerome H. Friedman,[2][3] simultaneously with the more general functional gradient boosting perspective of Llew Mason, Jonathan Baxter, Peter Bartlett and Marcus Frean.[4][5] The latter two papers introduced the abstract view of boosting algorithms as iterative functional gradient descent algorithms: that is, algorithms that optimize a cost functional over function space by iteratively choosing a function (weak hypothesis) that points in the negative gradient direction. This functional gradient view of boosting has led to the development of boosting algorithms in many areas of machine learning and statistics beyond regression and classification.

Contents

1 Informal introduction
2 Algorithm
3 Gradient tree boosting
    3.1 Size of trees
4 Regularization
    4.1 Shrinkage
    4.2 Stochastic gradient boosting
    4.3 Number of observations in leaves
    4.4 Penalize Complexity of Tree
5 Usage
6 Names
7 See also
8 References

Informal introduction

(This section follows the exposition of gradient boosting by Li.[6])

Like other boosting methods, gradient boosting combines weak learners into a single strong learner, in an iterative fashion. It is easiest to explain in the least-squares regression setting, where the goal is to learn a model $F$ that predicts values $\hat{y} = F(x)$, minimizing the mean squared error $(\hat{y} - y)^2$ to the true values $y$ (averaged over some training set).

At each stage $m$, $1 \le m \le M$, of gradient boosting, it may be assumed that there is some imperfect model $F_m$ (at the outset, a very weak model that just predicts the mean $y$ in the training set could be used). The gradient boosting algorithm does not change $F_m$ in any way; instead, it improves on it by constructing a new model that adds an estimator $h$ to provide a better model $F_{m+1}(x) = F_m(x) + h(x)$. The question is now, how to find $h$? The gradient boosting solution starts with the observation that a perfect $h$ would imply

$F_{m+1}(x) = F_m(x) + h(x) = y$

or, equivalently,

$h(x) = y - F_m(x)$.


Therefore, gradient boosting will fit $h$ to the residual $y - F_m(x)$. Like in other boosting variants, each $F_{m+1}$ learns to correct its predecessor $F_m$. A generalization of this idea to loss functions other than squared error (and to classification and ranking problems) follows from the observation that residuals $y - F(x)$ are the negative gradients of the squared error loss function $\frac{1}{2}(y - F(x))^2$. So, gradient boosting is a gradient descent algorithm, and generalizing it entails "plugging in" a different loss and its gradient.
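To make this concrete, here is a minimal sketch of the residual-fitting loop for least-squares regression, using scikit-learn's DecisionTreeRegressor as the weak learner; the function names, the depth-3 trees, and the round count are illustrative choices of ours, not part of the article (X and y are assumed to be NumPy arrays):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def fit_least_squares_boosting(X, y, n_rounds=100):
        """Each round fits a small tree h to the current residuals y - F_m(x)."""
        f0 = y.mean()                 # weak initial model: just the mean of y
        F = np.full(len(y), f0)       # current predictions F_m(x_i)
        trees = []
        for _ in range(n_rounds):
            h = DecisionTreeRegressor(max_depth=3)
            h.fit(X, y - F)           # fit h to the residuals
            F += h.predict(X)         # F_{m+1}(x) = F_m(x) + h(x)
            trees.append(h)
        return f0, trees

    def boosted_predict(f0, trees, X):
        return f0 + sum(h.predict(X) for h in trees)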

    Algorithm

In many supervised learning problems one has an output variable $y$ and a vector of input variables $x$ connected together via a joint probability distribution $P(x, y)$. Using a training set $\{(x_1, y_1), \ldots, (x_n, y_n)\}$ of known values of $x$ and corresponding values of $y$, the goal is to find an approximation $\hat{F}(x)$ to a function $F^*(x)$ that minimizes the expected value of some specified loss function $L(y, F(x))$:

$F^* = \arg\min_{F} \mathbb{E}_{x,y}[L(y, F(x))]$.

The gradient boosting method assumes a real-valued $y$ and seeks an approximation $\hat{F}(x)$ in the form of a weighted sum of functions $h_i(x)$ from some class $\mathcal{H}$, called base (or weak) learners:

$F(x) = \sum_{i=1}^{M} \gamma_i h_i(x) + \mathrm{const}$.

In accordance with the empirical risk minimization principle, the method tries to find an approximation $\hat{F}(x)$ that minimizes the average value of the loss function on the training set. It does so by starting with a model consisting of a constant function $F_0(x)$, and incrementally expanding it in a greedy fashion:

$F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma)$,

$F_m(x) = F_{m-1}(x) + \arg\min_{f \in \mathcal{H}} \sum_{i=1}^{n} L(y_i, F_{m-1}(x_i) + f(x_i))$,

where $f$ is restricted to be a function from the class $\mathcal{H}$ of base learner functions.

However, the problem of choosing at each step the best $f$ for an arbitrary loss function $L$ is a hard optimization problem in general, and so we'll "cheat" by solving a much easier problem instead.

The idea is to apply a steepest descent step to this minimization problem. If we only cared about predictions at the points of the training set, and $f$ were unrestricted, we'd update the model per the following equations, where we view $L(y, f)$ not as a functional of $f$, but as a function of the vector of values $f(x_1), \ldots, f(x_n)$:

$F_m(x) = F_{m-1}(x) - \gamma_m \sum_{i=1}^{n} \nabla_{F_{m-1}} L(y_i, F_{m-1}(x_i))$,

$\gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n} L\big(y_i, F_{m-1}(x_i) - \gamma \nabla_{F_{m-1}} L(y_i, F_{m-1}(x_i))\big)$.

But as $f$ must come from a restricted class of functions (that's what allows us to generalize), we'll just choose the one that most closely approximates the gradient of $L$. Having chosen $f$, the multiplier $\gamma$ is then selected using line search just as shown in the second equation above.

In pseudocode, the generic gradient boosting method is:[2][7]


Input: training set $\{(x_i, y_i)\}_{i=1}^{n}$, a differentiable loss function $L(y, F(x))$, number of iterations $M$.

Algorithm:

1. Initialize model with a constant value:

   $F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma).$

2. For $m = 1$ to $M$:

   1. Compute so-called pseudo-residuals:

      $r_{im} = -\left[\dfrac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F(x) = F_{m-1}(x)} \quad \text{for } i = 1, \ldots, n.$

   2. Fit a base learner $h_m(x)$ to the pseudo-residuals, i.e. train it using the training set $\{(x_i, r_{im})\}_{i=1}^{n}$.

   3. Compute the multiplier $\gamma_m$ by solving the following one-dimensional optimization problem:

      $\gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, F_{m-1}(x_i) + \gamma h_m(x_i)).$

   4. Update the model:

      $F_m(x) = F_{m-1}(x) + \gamma_m h_m(x).$

3. Output $F_M(x)$.
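As a sketch of this pseudocode (not a reference implementation), the following Python function extends the least-squares loop above to an arbitrary differentiable loss supplied via its gradient; it assumes SciPy and scikit-learn are available, and the depth-3 trees are again an illustrative choice:

    import numpy as np
    from scipy.optimize import minimize_scalar
    from sklearn.tree import DecisionTreeRegressor

    def gradient_boost(X, y, loss, loss_grad, M=100):
        """Generic gradient boosting per the pseudocode above.

        loss(y, F) returns the total loss on the training set;
        loss_grad(y, F) returns dL/dF componentwise at predictions F.
        """
        # Step 1: initialize with the loss-minimizing constant.
        gamma0 = minimize_scalar(lambda g: loss(y, np.full(len(y), g))).x
        F = np.full(len(y), gamma0)
        learners, gammas = [], []
        for m in range(M):
            r = -loss_grad(y, F)                              # step 2.1
            h = DecisionTreeRegressor(max_depth=3).fit(X, r)  # step 2.2
            hx = h.predict(X)
            # Step 2.3: one-dimensional line search for the multiplier.
            gamma = minimize_scalar(lambda g: loss(y, F + g * hx)).x
            F = F + gamma * hx                                # step 2.4
            learners.append(h)
            gammas.append(gamma)
        return gamma0, learners, gammas

    # Squared-error loss: the pseudo-residuals reduce to ordinary residuals.
    sq_loss = lambda y, F: float(np.sum((y - F) ** 2))
    sq_grad = lambda y, F: F - y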

Gradient tree boosting

Gradient boosting is typically used with decision trees (especially CART trees) of a fixed size as base learners. For this special case Friedman proposes a modification to the gradient boosting method which improves the quality of fit of each base learner.

Generic gradient boosting at the $m$-th step would fit a decision tree $h_m(x)$ to the pseudo-residuals. Let $J$ be the number of its leaves. The tree partitions the input space into $J$ disjoint regions $R_{1m}, \ldots, R_{Jm}$ and predicts a constant value in each region. Using the indicator notation, the output of $h_m(x)$ for input $x$ can be written as the sum:

$h_m(x) = \sum_{j=1}^{J} b_{jm} \mathbf{1}_{R_{jm}}(x),$

where $b_{jm}$ is the value predicted in the region $R_{jm}$.[8]

Then the coefficients $b_{jm}$ are multiplied by some value $\gamma_m$, chosen using line search so as to minimize the loss function, and the model is updated as follows:

$F_m(x) = F_{m-1}(x) + \gamma_m h_m(x), \quad \gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, F_{m-1}(x_i) + \gamma h_m(x_i)).$

Friedman proposes to modify this algorithm so that it chooses a separate optimal value $\gamma_{jm}$ for each of the tree's regions, instead of a single $\gamma_m$ for the whole tree. He calls the modified algorithm "TreeBoost". The coefficients $b_{jm}$ from the tree-fitting procedure can then be simply discarded and the model update rule becomes:

$F_m(x) = F_{m-1}(x) + \sum_{j=1}^{J} \gamma_{jm} \mathbf{1}_{R_{jm}}(x), \quad \gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L(y_i, F_{m-1}(x_i) + \gamma).$
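Here is a sketch of the TreeBoost per-region update, under the same assumptions as the earlier sketches; the tree's apply method recovers which leaf region $R_{jm}$ each training point falls in (the function name is ours):

    import numpy as np
    from scipy.optimize import minimize_scalar

    def treeboost_update(tree, X, y, F, loss):
        """One TreeBoost step: a separate optimal gamma_jm per leaf region."""
        leaves = tree.apply(X)            # leaf index of every training point
        step = np.zeros(len(y))
        for j in np.unique(leaves):
            in_region = leaves == j       # points x_i falling in R_jm
            gamma_jm = minimize_scalar(
                lambda g: loss(y[in_region], F[in_region] + g)).x
            step[in_region] = gamma_jm
        return F + step                   # F_m = F_{m-1} + sum_j gamma_jm 1_{R_jm}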


Size of trees

$J$, the number of terminal nodes in trees, is the method's parameter which can be adjusted for a data set at hand. It controls the maximum allowed level of interaction between variables in the model. With $J = 2$ (decision stumps), no interaction between variables is allowed. With $J = 3$ the model may include effects of the interaction between up to two variables, and so on.

Hastie et al.[7] comment that typically $4 \le J \le 8$ works well for boosting and results are fairly insensitive to the choice of $J$ in this range, $J = 2$ is insufficient for many applications, and $J > 10$ is unlikely to be required.
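As a usage illustration (not from the article): in scikit-learn's GradientBoostingRegressor the terminal-node count $J$ is controlled by the max_leaf_nodes parameter (or indirectly by max_depth); the value 6 below is just an example inside the recommended range:

    from sklearn.ensemble import GradientBoostingRegressor

    # max_leaf_nodes corresponds to the terminal-node count J;
    # 6 lies in the 4 <= J <= 8 range suggested above.
    model = GradientBoostingRegressor(max_leaf_nodes=6)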

    Regularization

Fitting the training set too closely can lead to degradation of the model's generalization ability. Several so-called regularization techniques reduce this overfitting effect by constraining the fitting procedure.

One natural regularization parameter is the number of gradient boosting iterations $M$ (i.e. the number of trees in the model when the base learner is a decision tree). Increasing $M$ reduces the error on the training set, but setting it too high may lead to overfitting. An optimal value of $M$ is often selected by monitoring prediction error on a separate validation data set. Besides controlling $M$, several other regularization techniques are used.
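For instance, scikit-learn's staged_predict makes it easy to monitor validation error as a function of $M$ and pick the best iteration count; the dataset and sizes below are illustrative:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import train_test_split

    X, y = make_regression(n_samples=500, noise=10.0, random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

    model = GradientBoostingRegressor(n_estimators=500).fit(X_tr, y_tr)
    # staged_predict yields predictions after 1, 2, ..., M trees.
    val_mse = [np.mean((y_val - p) ** 2) for p in model.staged_predict(X_val)]
    best_M = int(np.argmin(val_mse)) + 1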

    Shrinkage

An important part of the gradient boosting method is regularization by shrinkage, which consists in modifying the update rule as follows:

$F_m(x) = F_{m-1}(x) + \nu \cdot \gamma_m h_m(x), \quad 0 < \nu \le 1,$

where the parameter $\nu$ is called the "learning rate".

Empirically it has been found that using small learning rates (such as $\nu < 0.1$) yields dramatic improvements in a model's generalization ability over gradient boosting without shrinking ($\nu = 1$).[7] However, it comes at the price of increased computational time both during training and querying: a lower learning rate requires more iterations.
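In the generic sketch after the pseudocode above, shrinkage is a one-line change to step 2.4; the value 0.05 is an arbitrary illustrative choice:

    # Inside the boosting loop of the earlier sketch:
    nu = 0.05                  # learning rate, 0 < nu <= 1
    F = F + nu * gamma * hx    # F_m(x) = F_{m-1}(x) + nu * gamma_m * h_m(x)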

Stochastic gradient boosting

Soon after the introduction of gradient boosting, Friedman proposed a minor modification to the algorithm, motivated by Breiman's bagging method.[3] Specifically, he proposed that at each iteration of the algorithm, a base learner should be fit on a subsample of the training set drawn at random without replacement.[9] Friedman observed a substantial improvement in gradient boosting's accuracy with this modification.

The subsample size is some constant fraction $f$ of the size of the training set. When $f = 1$, the algorithm is deterministic and identical to the one described above. Smaller values of $f$ introduce randomness into the algorithm and help prevent overfitting, acting as a kind of regularization. The algorithm also becomes faster, because regression trees have to be fit to smaller data sets at each iteration. Friedman[3] obtained that $0.5 \le f \le 0.8$ leads to good results for small and moderate sized training sets. Therefore, $f$ is typically set to 0.5, meaning that one half of the training set is used to build each base learner.

Also, like in bagging, subsampling allows one to define an out-of-bag estimate of the prediction performance improvement by evaluating predictions on those observations which were not used in the building of the next base learner. Out-of-bag estimates help avoid the need for an independent validation data set, but often underestimate actual performance improvement and the optimal number of iterations.[10]
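In the generic sketch above, the modification amounts to fitting each base learner on a random subsample drawn without replacement each iteration (here $f = 0.5$):

    # Inside the boosting loop: fit h_m on a random half of the training set.
    n = len(y)
    idx = np.random.choice(n, size=n // 2, replace=False)   # f = 0.5
    h = DecisionTreeRegressor(max_depth=3).fit(X[idx], -loss_grad(y[idx], F[idx]))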

Number of observations in leaves

Gradient tree boosting implementations often also use regularization by limiting the minimum number of observations in trees' terminal nodes (this parameter is called n.minobsinnode in the R gbm package[10]). It is used in the tree-building process by ignoring any splits that lead to nodes containing fewer than this number of training set instances.


Imposing this limit helps to reduce variance in predictions at leaves.
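In scikit-learn the analogous knob is min_samples_leaf; the value 10 below is illustrative:

    from sklearn.ensemble import GradientBoostingRegressor

    # Candidate splits that would leave fewer than 10 training instances
    # in a terminal node are ignored during tree building.
    model = GradientBoostingRegressor(min_samples_leaf=10)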

Penalize Complexity of Tree

Another useful regularization technique for gradient boosted trees is to penalize the complexity of the learned model.[11] The model complexity can be defined as proportional to the number of leaves in the learned trees. The joint optimization of loss and model complexity corresponds to a post-pruning algorithm that removes branches which fail to reduce the loss by a threshold. Other kinds of regularization, such as an $\ell_2$ penalty on the leaf values, can also be added to avoid overfitting.
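In the style of the boosted-trees notes cited as [11], such a penalized objective can be written as follows, where $T$ is the number of leaves in a tree $h$, $w_j$ are its leaf values, and the penalty weights $\alpha$ and $\lambda$ are our symbols, chosen to avoid a clash with the multipliers $\gamma_m$ above:

$$\mathrm{Obj} = \sum_{i=1}^{n} L(y_i, \hat{y}_i) + \sum_{k} \Omega(h_k), \qquad \Omega(h) = \alpha T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2.$$

Each additional leaf must then reduce the training loss by at least $\alpha$ to pay for itself, which is exactly the post-pruning threshold described above.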

    Usage

Recently, gradient boosting has gained some popularity in the field of learning to rank. The commercial web search engines Yahoo[12] and Yandex[13] use variants of gradient boosting in their machine-learned ranking engines.

    Names

The method goes by a variety of names. Friedman introduced his regression technique as a "Gradient Boosting Machine" (GBM).[2] Mason, Baxter et al. described the generalized abstract class of algorithms as "functional gradient boosting".[4][5]

A popular open-source implementation[10] for R calls it a "Generalized Boosting Model". Commercial implementations from Salford Systems use the names "Multiple Additive Regression Trees" (MART) and TreeNet, both trademarked.

See also

AdaBoost
Random forest

References

1. Breiman, L. "Arcing the Edge" (June 1997). http://statistics.berkeley.edu/sites/default/files/tech-reports/486.pdf
2. Friedman, J. H. "Greedy Function Approximation: A Gradient Boosting Machine" (February 1999). http://www-stat.stanford.edu/~jhf/ftp/trebst.pdf
3. Friedman, J. H. "Stochastic Gradient Boosting" (March 1999). https://statweb.stanford.edu/~jhf/ftp/stobst.pdf
4. Mason, L.; Baxter, J.; Bartlett, P. L.; Frean, Marcus (1999). "Boosting Algorithms as Gradient Descent". In S. A. Solla, T. K. Leen, and K. Müller. Advances in Neural Information Processing Systems 12. MIT Press. pp. 512–518. http://papers.nips.cc/paper/1766-boosting-algorithms-as-gradient-descent.pdf
5. Mason, L.; Baxter, J.; Bartlett, P. L.; Frean, Marcus (May 1999). Boosting Algorithms as Gradient Descent in Function Space. http://maths.dur.ac.uk/~dma6kp/pdf/face_recognition/Boosting/Mason99AnyboostLong.pdf
6. Cheng Li. "A Gentle Introduction to Gradient Boosting". Northeastern University. Retrieved 19 August 2014. http://www.ccs.neu.edu/home/vip/teach/MLcourse/4_boosting/slides/gradient_boosting.pdf
7. Hastie, T.; Tibshirani, R.; Friedman, J. H. (2009). "10. Boosting and Additive Trees". The Elements of Statistical Learning (2nd ed.). New York: Springer. pp. 337–384. ISBN 0-387-84857-6. http://www-stat.stanford.edu/~tibs/ElemStatLearn/
8. Note: in the case of usual CART trees, the trees are fitted using least-squares loss, and so the coefficient $b_{jm}$ for the region $R_{jm}$ is equal to just the value of the output variable, averaged over all training instances in $R_{jm}$.
9. Note that this is different from bagging, which samples with replacement because it uses samples of the same size as the training set.
10. Ridgeway, Greg (2007). Generalized Boosted Models: A guide to the gbm package. http://cran.r-project.org/web/packages/gbm/gbm.pdf
11. Tianqi Chen. Introduction to Boosted Trees. http://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf
12. Cossock, David and Zhang, Tong (2008). Statistical Analysis of Bayes Optimal Subset Ranking, page 14. http://www.stat.rutgers.edu/~tzhang/papers/it08-ranking.pdf
13. Yandex corporate blog entry about the new ranking model "Snezhinsk" (in Russian). http://webmaster.ya.ru/replies.xml?item_no=5707&ncrnd=5118
