BIG DATA: What it is, how to do it right!
James Luck, Principal Data Scientist
Clockwork Solutions
805 Las Cimas Parkway #100, Austin, TX 78746
800-994-1336

Post on 21-Mar-2018

AT&T Consulting

TODAY’S PRESENTATION
• I’m not trying to sell you anything.
• This is a high-level approach to understanding and implementing Big Data.
• Based upon my recent experiences talking with people just like you!
• Every organization has the same issues & concerns.
• YOU can avoid the mistakes others have made!

JAMES LUCK BIO
• James is a Principal Data Scientist with ClockWork Solutions.
• He has 25+ years of experience in data analytics, in addition to extensive telecommunications and managed services development. He holds advanced degrees in both Aerospace and Electrical Engineering, and an MBA.
• Previously, James was a Senior Consultant for AT&T Consulting, providing clients with assistance in designing and mapping out their own Big Data programs.
• Prior to AT&T, he was a scientist and PhD candidate at Sandia Labs and Air Force Research Lab. His experience there includes a variety of projects using complex data analytics in orbital systems and Synthetic Aperture Radar.

AGENDA
• What is Big Data?
• Terminology
• From Business Analytics to Big Data
• Implementing Big Data – Infrastructure
• Implementing Big Data – Data Analytics

FOCUS ON A SUCCESSFUL PROJECT!
• All good Big Data projects succeed in the same way.
• All failed Big Data projects fail in their own unique way!
Avoid lists of DON’T’S
Focus on taking actions that will make you successful

BIG DATA
What is it?

WHAT IS BIG DATA?
• Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making. (Gartner, IT Glossary)
• A “term of art” used to describe large data projects.
• Any project where collecting, storing, retrieving, or processing the data becomes a significant part of the problem.
Almost all organizations have a Variety problem, not really a Volume problem.

WE KNOW WHAT IT IS….. SO WHAT’S THE PROBLEM?
• Overly-broad definition
• No common industry understanding
• Every organization has a slightly different definition
• Vendors can label a wide variety of hardware and software products and services as Big Data
Too much focus on hardware and software products
You can’t “buy” Big Data

THE VALUE OF BIG DATA
• You’re sitting on a Gold Mine and you don’t even know it!
• Your data contains a wealth of insights and information unavailable from any other source.
• Use the data you already own to run your organization – it’s FREE!
• There is no outside data you can purchase that will tell you more about your business than your own data.

THE PROMISE OF BIG DATA
What they tell you……
“Give us all your data, everything, sales, marketing, customer surveys, manufacturing, accounting, structured, unstructured, text, logical, numeric data. We’ll crunch it together and produce insights and actionable information that will enable you to run your business better.”
What they don’t tell you……
It’s amazingly expensive and time-consuming to do Big Data that way. Think millions of dollars and several years.
The good news……
It doesn’t have to be all-or-nothing. Organizations get excellent results with a focused, programmatic approach.

KEEP IN MIND…..
• Big Data can INFORM your business practices
• Help you to make informed decisions
• Big Data CANNOT tell you how to run your business!
Big Data cannot create your business goals or your mission, vision or values for you.

So why do Amazon, Google, Yahoo, Microsoft do this so well?
It’s Their Core Business! They ARE Big Data
The vast majority of organizations have other core missions that they augment with Big Data

SOME TERMINOLOGY….

BIG DATA & FRIENDS
• Business Intelligence
    • The aggregation and processing of business data to provide a 360-degree view of the business. Focus around aggregating, visualizing and reporting on the overall business
• Data Analysis / Analytics
    • The overall process of analyzing data, from collecting data through analysis through visualization
• Data Science
    • The overarching term for tools and techniques to extract information from data
• Data Mining
    • Tools and techniques for discovering patterns in data sets
• Predictive Analytics
    • Tools and techniques that analyze trends and historical data to make predictions
    • NOTE: You can predict trends, you CANNOT predict the future!
• Text Analytics
    • Data analytics for text
• Business Analytics
    • General name for data analytics performed on business data

FROM BUSINESS ANALYTICS TO BIG DATA
Where Are We Today?

CURRENT BUSINESS ANALYTICS PARADIGM
• Focus on data products for managing the business
• Typical questions:
    • How many calls did we take yesterday?
    • How much did we sell yesterday?
    • How much inventory do we have?
    • Focus on KPIs, metrics
• Looking for changes from the norm
• Using descriptive statistics
    • Summarize data
    • Mean, variance, trends
• Reporting
    • Charts, graphs, trend plots
• All about “monitoring the machine”
    • Focus on a narrow set of data
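As a rough illustration of the descriptive statistics and "changes from the norm" mentioned on this slide, here is a minimal Python sketch. The function names, the sample call counts, and the two-standard-deviation threshold are all assumptions made for the example; they do not come from the presentation.

```python
from statistics import mean, pstdev, pvariance


def summarize(values):
    """Return the mean, population variance, and linear trend (slope per step)."""
    n = len(values)
    avg = mean(values)
    var = pvariance(values, avg)
    # Least-squares slope of each value against its position in the series.
    x_avg = (n - 1) / 2
    numerator = sum((i - x_avg) * (v - avg) for i, v in enumerate(values))
    denominator = sum((i - x_avg) ** 2 for i in range(n))
    return {"mean": avg, "variance": var, "trend": numerator / denominator}


def flag_changes(values, threshold=2.0):
    """Return the positions of values more than `threshold` std devs from the mean."""
    avg = mean(values)
    sd = pstdev(values, avg)
    if sd == 0:
        return []  # a perfectly flat series has no outliers
    return [i for i, v in enumerate(values) if abs(v - avg) > threshold * sd]


# Hypothetical daily call counts; the last day is unusually busy.
daily_calls = [100, 102, 98, 101, 99, 100, 150]
print(summarize(daily_calls))
print(flag_changes(daily_calls))
```

This is the "monitoring the machine" style of analysis: it summarizes one narrow series and flags departures from its usual level, nothing more.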

BIG DATA ANALYTICS VS “TRADITIONAL” BUSINESS ANALYTICS
• Business Analytics PLUS a whole lot more
• Use much larger set of data
• Many different data types & combinations
    • Structured, unstructured, logical, text
• Typically can’t process with traditional systems
    • New algorithms and approaches
• Predictive Analytics
    • Who is most likely to buy this widget?
    • If a device fails, how likely is it to fail again in 30 days?
• Data Mining
    • What makes customers unhappy?
• Text Analytics
    • Sentiment analysis
    • Topic Modeling
• Visualization
    • Heat maps of customer satisfaction by county
• Finding relationships
    • What factors most affect employee retention?
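The repeat-failure question above can be given a first, rough answer from historical records before any model is built. The sketch below is an assumption-laden illustration, not the presenter's method: the record layout `(device_id, date)`, the function name, and the sample data are hypothetical, and the result is a simple historical rate rather than a predictive model.

```python
from collections import defaultdict
from datetime import date, timedelta


def repeat_failure_rate(failures, window_days=30):
    """Estimate how often a failure is followed by another within the window.

    `failures` is a list of (device_id, date) records. Failures too recent to
    have a complete follow-up window are left out of the estimate.
    """
    if not failures:
        return None
    by_device = defaultdict(list)
    for device_id, day in failures:
        by_device[device_id].append(day)
    last_observed = max(day for _, day in failures)
    window = timedelta(days=window_days)
    eligible = repeats = 0
    for days in by_device.values():
        days.sort()
        for i, day in enumerate(days):
            if day + window > last_observed:
                continue  # follow-up window not fully observed yet
            eligible += 1
            if i + 1 < len(days) and days[i + 1] - day <= window:
                repeats += 1
    return repeats / eligible if eligible else None


# Hypothetical failure log.
failures = [
    ("A", date(2018, 1, 1)),
    ("A", date(2018, 1, 15)),
    ("B", date(2018, 1, 5)),
    ("C", date(2018, 1, 10)),
    ("C", date(2018, 3, 20)),
]
print(repeat_failure_rate(failures))
```

A baseline like this is useful mainly as a yardstick: a predictive model for the same question should do measurably better than the plain historical rate.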

SOME PITFALLS……
• Literally, thousands of techniques
    • Which one(s) should you use?
• These techniques require a lot of skill to use properly
    • Data cleanliness requirements, robustness
    • Have to know how to interpret results
• Generally not possible to verify results
    • How do you check that 100,000 trouble tickets were properly categorized by an algorithm?
    • Can verify a small few, can’t check them all
• Rely upon “goodness-of-fit” to check quality
• Algorithms don’t lend themselves to auto-run tools
• Identified relationships may not actually exist.
    • Artifact of a particular data set
You need experienced algorithms people (data scientists) to pick algorithms, build models and interpret results properly.
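One common way to act on "can verify a small few, can't check them all" is to audit a random sample and report the estimated accuracy with an uncertainty range. The sketch below assumes this approach; it is not described in the presentation. The function name, the sample size of 400, the `check` callback (which in practice would stand in for a human reviewer), and the synthetic tickets are all hypothetical.

```python
import math
import random


def audit_sample(labeled_items, check, sample_size=400, seed=0, z=1.96):
    """Estimate labeling accuracy from a random sample.

    Returns (observed accuracy, (low, high)), where the range is a Wilson
    score interval at the confidence level implied by `z` (1.96 ~ 95%).
    """
    rng = random.Random(seed)
    sample = rng.sample(labeled_items, min(sample_size, len(labeled_items)))
    n = len(sample)
    correct = sum(1 for item in sample if check(item))
    p = correct / n
    # Wilson score interval for a binomial proportion.
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, (centre - half, centre + half)


# Synthetic stand-in for 100,000 categorized trouble tickets:
# (ticket_id, category assigned by the algorithm, true category).
tickets = [
    (i, "billing", "billing" if i % 10 else "outage") for i in range(100_000)
]
accuracy, (low, high) = audit_sample(tickets, check=lambda t: t[1] == t[2])
print(f"sample accuracy {accuracy:.3f}, interval {low:.3f} to {high:.3f}")
```

The interval makes the limitation explicit: the audit supports a statement about overall quality, not a guarantee about any individual ticket.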

IMPLEMENTING BIG DATA
It’s not about the infrastructure!

THE MOST COMMON BIG DATA FAIL
Failure to (1) create Use Cases that are (2) tied to Business Goals

BUILDING AN INFRASTRUCTURE
ID Business Goals, Gather Stakeholders, Create Use Cases for the business → Create a Strategy & Roadmap that meets the Use Case requirements → Implement infrastructure

BIG DATA INFRASTRUCTURE FAILS
• Buying from vendors before you have a plan
• Building an infrastructure BEFORE you define use cases
• Neglecting to engage stakeholders
• Not having a well-defined Strategy & Roadmap (S&R) plan
• Neglecting to use existing systems
• Underestimating storage & processing requirements
• Big Data ≠ Hadoop
• You don’t need Hadoop to implement Big Data

IMPLEMENTING BIG DATA
Data Analytics

FIRST THINGS FIRST
• No one really knows where their data is or what they have
    ➢ Perform a data survey before you start!
• You will spend 90% of your time doing data clean-up
    ➢ Accept this as a fact. Don’t expect results for the first few months.
Decide if these limitations are workable for you!
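A data survey can start very small. The following Python sketch shows one possible first pass, assuming the data can be read as rows of named fields: it counts, per field, how many values are present, missing, and distinct. The function name, the field names, and the tiny inline CSV are hypothetical and only illustrate the idea.

```python
import csv
import io
from collections import defaultdict


def survey(rows):
    """Profile dict-like rows: per-field row count, missing count, distinct count."""
    stats = defaultdict(lambda: {"rows": 0, "missing": 0, "distinct": set()})
    for row in rows:
        for field, value in row.items():
            entry = stats[field]
            entry["rows"] += 1
            if value is None or str(value).strip() == "":
                entry["missing"] += 1  # empty or absent values count as missing
            else:
                entry["distinct"].add(str(value).strip())
    return {
        field: {
            "rows": entry["rows"],
            "missing": entry["missing"],
            "distinct": len(entry["distinct"]),
        }
        for field, entry in stats.items()
    }


# Tiny inline stand-in for an exported table.
sample_csv = io.StringIO(
    "ticket_id,county,category\n"
    "1,Travis,billing\n"
    "2,,outage\n"
    "3,Travis,\n"
)
report = survey(csv.DictReader(sample_csv))
for field, summary in report.items():
    print(field, summary)
```

Even a profile this simple gives an early indication of how much of the clean-up effort mentioned above is likely to be needed.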

BIG DATA IS A TEAM SPORT
• Business Analyst
    • Gather requirements, create use cases
• Data Engineer
    • Design, build, maintain Big Data infrastructure
• Data Scientists
    • Select algorithms, build, verify models
• Data Curators
    • Acquire and preserve data sets
    • Handle data governance and quality issues
• Data Visualizers
    • Create data products from the information gleaned from the data

BIG DATA IS A PROGRAMMATIC APPROACH
Identify Business Goals → Create Use Cases → Big Data Analytics → Insights, Data Products → Implement into business processes → Measure and Evaluate
Why a Programmatic Approach?
Not every use case will produce desired results
Repeat until results achieved

BIG DATA IS A PROGRAMMATIC APPROACH
Identify Business Goals → Create Use Cases → Data Analytics → Insights, Data Products → Implement into business processes → Measure and Evaluate
Most breakdowns occur

DATA ANALYTICS FAILS
• Failing to assemble a team
• Creating random data products and trying to feed those back into the business
• Poor Use Cases & business goals (yes, again!)
• Failing to implement recommendations
• Failing to integrate data products into business processes
    • What are they supposed to do with these things?
• Failing to measure the impact on the business
    • Can’t justify why you’re doing this
• Failing to continually implement / improve until results are achieved

QUESTIONS?