Privacy Overview and Data Mining
CSC 301 Spring 2018
Howard Rosenthal
Course Notes:
- Much of the material in the slides comes from the books below and their associated support materials, as well as many of the references at the class website
  - Baase, Sara and Henry, Timothy, A Gift of Fire: Social, Legal, and Ethical Issues for Computing Technology (5th Edition), Pearson, March 9, 2017, ISBN-13: 978-0134615271
  - Quinn, Michael, Ethics for the Information Age (7th Edition), Pearson, Feb. 21, 2016, ISBN-13: 978-0134296548
Lesson Goals
- Basic principles in privacy
- Defining privacy
- Threats to privacy
- Impacts of technology on privacy
- Securing personal privacy
- Technology excursion: Data Mining
There Are Many Aspects To Security and Privacy
What Is Privacy?
- Is privacy a natural right?
- Is privacy a type of property?
  - If you invade a person's privacy it can be a major coercive force
- Privacy used to be fairly simple
  - Your home could not be invaded, nor your property seized, without due process
- Today your private information is everywhere
  - On the net
  - On your phone
  - On your computer
  - In the cloud
  - In your employer's databases
  - With the government
- Even if the people you give information to do not misuse that information, the information is more susceptible to theft via hacking or other mischief than ever before
  - Recently the Federal Government's Office of Personnel Management was hacked and detailed information on everyone with a security clearance was stolen
  - Government accepted very little responsibility for this theft
There Are Three Key Aspects To Privacy
- Freedom from intrusion
- Control over information about oneself
- Freedom from surveillance (physical, electronic, etc.)
Our Privacy Is Always Being Threatened
- There are many threats to our privacy
  - Intentional use or misuse of information by businesses or government
  - Unauthorized use or release by insiders who maintain the information
  - Theft of information by criminals or hostile governments
  - Inadvertent leakage through negligence or carelessness
  - Our own actions, such as posting too much data on the Internet, where it can be used for either benign (B) or malicious (M) purposes
    - Give to one charity and ten others will come knocking (B)
    - List of "off color" movies you may have watched (M), used to discredit you
    - Divorce proceeding (M), sometimes used by politicians
    - Stealing financial data (M), used to open loans, buy homes, etc., all in your name
New Technology Creates Many New Opportunities To Invade Our Privacy
- Some of these threats combine low tech techniques, such as eavesdropping or looking over a shoulder, with high tech techniques
  - Government and private databases
  - Sophisticated tools for surveillance and data analysis
  - Vulnerability of data
- Search engines collect many terabytes of data daily.
  - Data is analyzed to target advertising and develop new services.
  - Who gets to see this data? Why should we care?
  - This same data, when aggregated, creates a detailed biography of you
  - Data collected for one purpose will find other uses
  - Assume that everything in cyberspace is recorded and replicated
- You create new potential security leaks every day
  - Facebook
  - E-mails
  - Texts
  - Map instructions
  - Twitter
  - If information is on a public Web site, it is available to everyone
  - If you post pictures of your vacation while you are on it you may come home to an empty house
Re-identification
- Re-identification is the process of identifying individuals using anonymous data.
  - Re-identification has become much easier due to the quantity of information and power of data search and analysis tools
- A collection of small items can be aggregated to provide a detailed picture
- Your search history could identify who you are.
  - Working backwards from the metadata is often possible with enough computing power and data.
- Reporters often use anonymous data as they work towards identifying individuals.
- If information is on a public Web site, it is often available to everyone
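A minimal sketch of how re-identification can work in principle: records with names removed are linked back to a public list through shared attributes. The data, the field names, and the choice of ZIP code, birth year, and sex as linking attributes are all illustrative assumptions, not taken from the slides.

```python
# Hypothetical "anonymized" records: names removed, but ZIP code,
# birth year, and sex are still present.
anonymized_records = [
    {"zip": "06050", "birth_year": 1970, "sex": "F", "diagnosis": "asthma"},
    {"zip": "06050", "birth_year": 1985, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "06111", "birth_year": 1992, "sex": "F", "diagnosis": "flu"},
]

# Hypothetical public list (for example, a voter roll) that includes names.
public_records = [
    {"name": "Pat Example", "zip": "06050", "birth_year": 1970, "sex": "F"},
    {"name": "Sam Sample", "zip": "06050", "birth_year": 1985, "sex": "M"},
    {"name": "Lee Placeholder", "zip": "06050", "birth_year": 1985, "sex": "M"},
]

LINKING_FIELDS = ("zip", "birth_year", "sex")


def reidentify(anonymous, public, keys=LINKING_FIELDS):
    """Return (name, diagnosis) pairs where exactly one public record matches."""
    names_by_key = {}
    for person in public:
        key = tuple(person[k] for k in keys)
        names_by_key.setdefault(key, []).append(person["name"])

    matches = []
    for record in anonymous:
        candidates = names_by_key.get(tuple(record[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches.append((candidates[0], record["diagnosis"]))
    return matches


print(reidentify(anonymized_records, public_records))
```

Only the first record is linked to a name here: the second matches two people and the third matches nobody. More data and more attributes tend to make unique matches more common.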
Personal Security and Privacy Are Often Threatened By Our Own Actions
Everything You Access May Be Monitored
- Search Engines
  - May record all your searches
    - If you search for a book on Amazon you'll get e-mails about that book or others every few days
  - Some of your searches you may want to keep private
    - Looking for a new job
    - Searching for certain specific products
    - Medical searches
- Smartphones
  - Are often transmitting location data
    - Great if a phone is lost or stolen
    - Horrible if a house thief gets the data
  - Passwords and codes for key accounts are often stored without your knowledge and then uploaded to the cloud with other data
    - If the cloud is hacked your information may be on the market without your knowledge
  - Contact lists can be compromised
  - Photos may be gathered and subjected to various forms of analysis
- Software
  - Many pieces of software record all types of data
  - This data may ultimately be collected and analyzed
  - Sometimes it simply sits forgotten until someone decides to see what's there
Managing Personal Data: Terminology and Principles (1)
- Personal information is any information relating to an individual person
- Invisible information gathering
  - Data collected without your knowledge
  - Always read the fine print
    - How often do you click agree when downloading?
  - This is an ethical issue
- Cookies
  - Files a Web site stores on a visitor's computer
- Secondary use
  - Use of personal information for a purpose other than the purpose for which it was provided
    - Sale of consumer information to marketers or other businesses
    - Use of information in various databases to deny someone a job
    - Use of vehicle registrations by the IRS to find persons with high incomes
    - Use of text messages to prosecute for a crime
    - Using your information in an illegal manner after stealing or gleaning it from legitimate sources
Managing Personal Data: Terminology and Principles (2)
- Data mining
  - Searching and analyzing masses of data to find patterns and develop new information or knowledge
- Computer matching
  - Combining and comparing information from different databases (using social security number, for example) to match records.
- Computer profiling
  - Analyzing data to determine characteristics of people most likely to engage in a certain behavior
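As a rough illustration of computer matching, the sketch below joins two record sets on a shared identifier. The two "databases", their field names, and the function name `match_records` are hypothetical; the slides name the social security number only as an example key.

```python
# Two hypothetical databases held by different organizations.
employment_db = [
    {"ssn": "111-11-1111", "employer": "Acme"},
    {"ssn": "222-22-2222", "employer": "Globex"},
]
benefits_db = [
    {"ssn": "111-11-1111", "benefit": "unemployment"},
    {"ssn": "333-33-3333", "benefit": "housing"},
]


def match_records(left, right, key="ssn"):
    """Combine rows from both databases that share the same key value."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


matched = match_records(employment_db, benefits_db)
print(matched)
```

In this toy data, one person appears in both databases, so the match produces a single combined record that neither database held on its own.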
Informed Consent Provides An Ethical Framework For Information Collection
- Informed Consent
  - You must agree before your information can be collected or used
  - Could be used to pressure you if you are denied a service without agreeing to share this data
  - LoJack collects information about your car location continuously: was this informed consent?
  - The AAA tried collecting information by asking you if you'd like to hook data collectors into your car, then they reported that data to the insurance side of the house
- Two common forms for providing informed consent are opt out and opt in:
  - In opt out a person must request (usually by checking a box) that an organization not use information.
  - In opt in the collector of the information may use information only if the person explicitly permits use (usually by checking a box).
- Discussion Question:
  - Have you seen opt-in and opt-out choices? Where? How were they worded? Were any of them deceptive?
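The difference between the two policies comes down to what happens when a person does nothing. A small sketch, with a hypothetical function name and policy labels chosen only for illustration:

```python
def may_use_data(policy, box_checked):
    """Decide whether an organization may use a person's information.

    policy: "opt_in" or "opt_out" (labels assumed for this sketch).
    box_checked: True if the person checked the box on the form.
    """
    if policy == "opt_in":
        # Use is allowed only if the person explicitly permits it.
        return box_checked
    if policy == "opt_out":
        # Use is allowed unless the person explicitly objects.
        return not box_checked
    raise ValueError("unknown policy: " + policy)


# A person who ignores the checkbox entirely:
print(may_use_data("opt_in", box_checked=False))
print(may_use_data("opt_out", box_checked=False))
```

Under opt in, inaction keeps the data unused; under opt out, inaction permits use.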
Fair Information Principles
- A basic set of principles for businesses to handle data in an ethical way
  - Inform people when you collect data
  - Collect only the data that is needed
  - Make opt in your default
  - Offer opt out methods that can be used at any time
    - It is harder to ensure that all data is deleted if you opt in and then opt out
  - Keep data only as long as it is needed
  - Maintain accuracy of data
  - Protect the data. Use all reasonable security methods to do so.
  - Develop policies for responding to law enforcement requests
- Many government organizations are developing guidelines
  - FTC Fair Information Practice Principles
Data Mining
http://www.tutorialspoint.com/data_mining/dm_quick_guide.htm
What Is Data Mining?
- Data mining is defined as extracting information from huge sets of data.
  - In other words, we can say that data mining is the procedure of mining knowledge from data.
  - Data mining can integrate many different data sets
- The information or knowledge extracted from data mining can be used for any of the following applications
  - Profiling: this is where privacy really gets involved
  - Customer Retention
  - Pattern Analysis
  - Market Analysis
  - Fraud Detection
  - Production Control
  - Science Exploration
Data Mining Tasks
- Data mining deals with the kind of patterns that can be mined. On the basis of the kind of data to be mined, there are two categories of functions involved in Data Mining:
  - The Descriptive Function deals with the general properties of data in the database.
    - Class/Concept Description
    - Mining of Frequent Patterns
    - Mining of Associations
    - Mining of Correlations
    - Mining of Clusters
  - Classification and Prediction is the process of finding a model that describes the data classes or concepts. The purpose is to be able to use this model to predict the class of objects whose class label is unknown. This derived model is based on the analysis of sets of training data. The derived model can be presented in the following forms:
    - Classification (IF-THEN) Rules
    - Decision Trees
    - Mathematical Formulae
    - Neural Networks
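To make the idea of a derived model concrete, here is a minimal sketch that learns a single IF-THEN rule from labeled training data and then applies it to an unlabeled object. The spending figures, the class labels, and the single-threshold form of the rule are assumptions for illustration; real classifiers are usually far richer.

```python
def classify(amount, threshold):
    """IF amount >= threshold THEN big spender ELSE budget spender."""
    return "big spender" if amount >= threshold else "budget spender"


def learn_threshold(training):
    """Pick the threshold that misclassifies the fewest training rows."""
    values = sorted({amount for amount, _ in training})
    best_threshold, best_errors = None, None
    # Candidate thresholds are midpoints between neighboring values.
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        errors = sum(
            1 for amount, label in training
            if classify(amount, threshold) != label
        )
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


# Hypothetical training data: (monthly spending, known class label).
training_data = [
    (20, "budget spender"),
    (35, "budget spender"),
    (50, "budget spender"),
    (120, "big spender"),
    (200, "big spender"),
    (310, "big spender"),
]

threshold = learn_threshold(training_data)
print(threshold, classify(90, threshold), classify(60, threshold))
```

The learned rule is the model; the final line uses it on two customers whose labels were not in the training data.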
Descriptive Tasks In Data Mining (1)
- The Class/Concept Description refers to the data to be associated with the classes or concepts. For example, in a company, the classes of items for sale include computers and printers, and concepts of customers include big spenders and budget spenders. Such descriptions of a class or a concept are called class/concept descriptions. These descriptions can be derived in the following two ways:
  - Data Characterization refers to summarizing the data of the class under study. This class under study is called the Target Class.
  - Data Discrimination refers to the mapping or classification of a class with some predefined group or class.
- Mining of Frequent Patterns looks at patterns that occur frequently in transactional data. The kinds of frequent patterns include:
  - The Frequent Item Set is a set of items that frequently appear together, for example, milk and bread.
  - The Frequent Subsequence is a sequence of patterns that occurs frequently, such as purchasing a camera followed by a memory card.
  - The Frequent Substructure refers to different structural forms, such as graphs, trees, or lattices, which may be combined with item sets or subsequences.
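A frequent item set can be found, in the simplest case, by counting how often items appear together across transactions. The sketch below counts pairs only; the baskets and the minimum support value are made up for illustration, and production tools use more scalable algorithms.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs whose share of transactions is at least min_support."""
    counts = Counter()
    for basket in transactions:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    total = len(transactions)
    return {
        pair: count / total
        for pair, count in counts.items()
        if count / total >= min_support
    }


# Hypothetical shopping baskets.
baskets = [
    ["milk", "bread", "eggs"],
    ["milk", "bread"],
    ["bread", "butter"],
    ["milk", "bread", "butter"],
    ["eggs", "tea"],
]

pairs = frequent_pairs(baskets, min_support=0.5)
print(pairs)
```

With these baskets, bread and milk appear together in three of five transactions, which is the only pair that clears the 50% support threshold.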
Descriptive Tasks In Data Mining (2)
- Mining of Association
  - This refers to the process of uncovering the relationship among data and determining association rules.
  - Associations are used in retail sales to identify items that are frequently purchased together, helping to identify potential buyers
    - For example, a retailer generates an association rule that shows that 70% of the time milk is sold with bread while only 30% of the time biscuits are sold with bread.
- Mining of Correlations
  - Additional analysis performed to uncover interesting statistical correlations between associated attribute-value pairs or between two item sets, to analyze whether they have a positive, negative, or no effect on each other.
  - Want to understand if there is actual causation
- Mining of Clusters
  - Cluster refers to a group of similar kinds of objects.
  - Cluster analysis refers to forming groups of objects that are very similar to each other but are highly different from the objects in other clusters.
  - Can group by gender, age, home location, language, ...
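The milk and bread example can be read as the confidence of an association rule: of the baskets that contain bread, what fraction also contain milk? The sketch below assumes that reading and builds ten hypothetical baskets to reproduce the 70% and 30% figures.

```python
def confidence(transactions, antecedent, consequent):
    """Fraction of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return 0.0
    both = sum(1 for t in with_antecedent if consequent in t)
    return both / len(with_antecedent)


# Ten hypothetical baskets, all containing bread.
baskets = [{"bread", "milk"}] * 7 + [{"bread", "biscuits"}] * 3

milk_given_bread = confidence(baskets, "bread", "milk")
biscuits_given_bread = confidence(baskets, "bread", "biscuits")
print(milk_given_bread, biscuits_given_bread)
```

A high confidence value says the items tend to be bought together; it does not by itself show that one purchase causes the other.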
Classification and Prediction Functions
- Classification: It predicts the class of objects whose class label is unknown. Its objective is to find a derived model that describes and distinguishes data classes or concepts. The derived model is based on the analysis of a set of training data, i.e. data objects whose class labels are known.
- Prediction: It is used to predict missing or unavailable numerical data values rather than class labels. Regression analysis is generally used for prediction. Prediction can also be used to identify distribution trends based on available data.
- Outlier Analysis: Outliers may be defined as the data objects that do not comply with the general behavior or model of the data available.
- Evolution Analysis: Evolution analysis refers to describing and modeling regularities or trends for objects whose behavior changes over time.
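For prediction of a numerical value, the simplest form of regression analysis fits a straight line to known points and reads the missing value off that line. A minimal sketch using ordinary least squares on made-up data (the variable names and values are assumptions):

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical known values: x could be months as a customer, y purchases.
known_x = [1, 2, 3, 4]
known_y = [3, 5, 7, 9]

slope, intercept = fit_line(known_x, known_y)
predicted = slope * 5 + intercept  # estimate the unavailable value at x = 5
print(slope, intercept, predicted)
```

Unlike classification, the output here is a number, not a class label.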
Data Warehousing
- Data warehousing is the process of constructing and using the data warehouse. A data warehouse is constructed by integrating the data from multiple heterogeneous sources. It supports analytical reporting, structured and/or ad hoc queries, and decision making.
- Data warehousing involves data cleaning, data integration, and data consolidation. To integrate heterogeneous databases, we have the following two approaches:
  - Query Driven Approach
  - Update Driven Approach
Query Driven Approach
- This is the traditional approach to integrating heterogeneous databases.
- Builds wrappers and integrators on top of multiple heterogeneous databases. These integrators are also known as mediators.
- The process of the Query Driven Approach
  - When a query is issued on the client side, a metadata dictionary translates the query into one or more queries appropriate for the individual heterogeneous sites involved.
  - These queries are then mapped and sent to the local query processors.
  - The results from the heterogeneous sites are integrated into a global answer set.
- Advantages
  - Government doesn't get to keep a large database of information on permanent file
  - Don't need to maintain a large IT infrastructure
- Disadvantages
  - The Query Driven Approach needs complex integration and filtering processes.
  - It is very inefficient and very expensive for frequent queries.
  - This approach is expensive for queries that require aggregations (constant regrouping) of data
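The three-step process above can be sketched as a tiny mediator. Everything here is hypothetical: the two sources, their schemas, the `METADATA` dictionary, and the function names are stand-ins chosen to show the flow, not a real integration product.

```python
# Two hypothetical sources that store the same person under different field names.
dmv_records = [
    {"ssn": "111", "plate": "ABC123"},
    {"ssn": "222", "plate": "XYZ789"},
]
phone_records = [
    {"subscriber_id": "111", "number": "555-0100"},
]

SOURCES = {"dmv": dmv_records, "phone": phone_records}

# Metadata dictionary: maps the global field name to each source's local field.
METADATA = {
    "dmv": {"person_id": "ssn"},
    "phone": {"person_id": "subscriber_id"},
}


def local_query(source_name, field, value):
    """Stand-in for the query processor that runs at each source."""
    return [row for row in SOURCES[source_name] if row[field] == value]


def mediator_query(person_id):
    """Translate one global query per source, then merge the answers."""
    answer_set = []
    for source_name, mapping in METADATA.items():
        local_field = mapping["person_id"]           # step 1: translate
        rows = local_query(source_name, local_field, person_id)  # step 2: send
        for row in rows:                             # step 3: integrate
            answer_set.append({"source": source_name, **row})
    return answer_set


result = mediator_query("111")
print(result)
```

Nothing is stored centrally in this design: every call goes back to each source, which is why frequent or aggregate queries become expensive.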
Update Driven Approach
- Today's data warehouse systems follow the update-driven approach rather than the traditional approach discussed earlier.
- In the update-driven approach, the information from multiple heterogeneous sources is integrated in advance and stored in a warehouse.
  - This includes data scrubbing: the process of validating data for correctness in advance
- This information is available for direct querying and analysis.
- Advantages
  - This approach provides high performance.
  - The data can be copied, processed, integrated, annotated, summarized and restructured in the semantic data store in advance.
    - In other words, we store data in the way(s) we want to look at it
  - Query processing does not require an interface with the processing at the local original data sources.
    - Much less intrusive and resource intensive to pull the data once, rather than whenever you want to query
- Disadvantages
  - Must maintain a large infrastructure to import, store and maintain data
  - Privacy concerns since the government now has access to so much data
    - The whole debate on the Patriot Act centered around whether or not the government could continuously collect and store metadata from the ISPs and cell/land-line phone providers
    - A political/privacy argument conflicted with a technical argument
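A minimal sketch of the update-driven idea, using Python's built-in `sqlite3` module as a stand-in warehouse. The two sources, the table layout, and the scrubbing rules (drop rows with no identifier, normalize names) are assumptions made for illustration.

```python
import sqlite3

# Two hypothetical sources with different shapes.
source_a = [("111", " alice "), ("222", "Bob")]
source_b = [
    {"id": "333", "full_name": "carol"},
    {"id": "", "full_name": "nobody"},  # invalid: missing identifier
]


def build_warehouse(conn):
    """Integrate and scrub both sources in advance, then store the result."""
    conn.execute(
        "CREATE TABLE person (person_id TEXT PRIMARY KEY, name TEXT, origin TEXT)"
    )
    rows = [(pid, name, "a") for pid, name in source_a]
    rows += [(r["id"], r["full_name"], "b") for r in source_b]
    for person_id, name, origin in rows:
        if not person_id:  # scrubbing: reject records with no identifier
            continue
        clean_name = name.strip().title()  # scrubbing: normalize the name
        conn.execute(
            "INSERT INTO person VALUES (?, ?, ?)",
            (person_id, clean_name, origin),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
build_warehouse(conn)

# Later queries hit the warehouse directly, not the original sources.
names = [row[0] for row in conn.execute("SELECT name FROM person ORDER BY person_id")]
print(names)
```

The cost is paid once at load time; after that, queries are fast, but a complete copy of the data now sits in one place.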
Data Warehousing and Data Mining
- Online Analytical Mining integrates with Online Analytical Processing to discover knowledge across multidimensional databases.
On-line Analytical Mining
- On-line Analytical Mining (OLAM) has the following important attributes
  - High quality of data in data warehouses
    - Data mining tools need to work on integrated, consistent, and cleaned data, which is very costly to produce during preprocessing.
    - The data warehouses constructed by such preprocessing are valuable sources of high quality data for OLAP and data mining as well.
  - A complex information processing infrastructure surrounds each data warehouse
    - Information processing infrastructure refers to accessing, integration, consolidation, and transformation of multiple heterogeneous databases, web-accessing and service facilities, reporting and OLAP analysis tools.
  - On-line Analytical Processing (OLAP)-based exploratory data analysis
    - Exploratory data analysis is required for effective data mining.
    - OLAP provides facilities for data mining on various subsets of data and at different levels of abstraction.
  - Online selection of data mining functions
    - Integrating OLAP with multiple data mining functions and online analytical mining provides users with the flexibility to select desired data mining functions and swap data mining tasks dynamically.
Steps In Data Mining
- Data Cleaning
  - Noise and inconsistent data are removed.
- Data Integration
  - Multiple data sources are combined.
- Data Selection
  - Data relevant to the analysis task are retrieved from the database.
- Data Transformation
  - Data is transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations.
- Data Mining
  - Intelligent methods are applied in order to extract data patterns.
- Pattern Evaluation
  - Data patterns are evaluated.
- Knowledge Presentation
  - Knowledge is represented, often graphically
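The steps above can be sketched as a chain of small functions, one per step. The store data, the item categories, and the spending threshold are all invented for this example, and the "mining" step is deliberately trivial (ranking customers by total spending).

```python
from collections import defaultdict

# Hypothetical purchase records from two stores.
store_a = [
    {"customer": "ann", "item": "camera", "amount": 400.0},
    {"customer": "bob", "item": "milk", "amount": None},  # noisy row
]
store_b = [
    {"customer": "ann", "item": "memory card", "amount": 30.0},
    {"customer": "cy", "item": "bread", "amount": 2.5},
    {"customer": "bob", "item": "camera", "amount": 350.0},
]


def clean(rows):
    """Data cleaning: drop rows with a missing amount."""
    return [r for r in rows if r["amount"] is not None]


def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [row for source in sources for row in source]


def select(rows, wanted_items):
    """Data selection: keep only rows relevant to the analysis."""
    return [r for r in rows if r["item"] in wanted_items]


def transform(rows):
    """Data transformation: aggregate spending per customer."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["customer"]] += r["amount"]
    return dict(totals)


def mine(totals):
    """Data mining: rank customers by total spending (a trivial pattern)."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def evaluate(patterns, threshold):
    """Pattern evaluation: keep only customers at or above the threshold."""
    return [(c, t) for c, t in patterns if t >= threshold]


combined = integrate(clean(store_a), clean(store_b))
electronics = select(combined, {"camera", "memory card"})
totals = transform(electronics)
big_spenders = evaluate(mine(totals), threshold=400.0)

# Knowledge presentation: a plain text report stands in for a chart.
for customer, total in big_spenders:
    print(customer, total)
```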
The Process of Knowledge Discovery
Multi-Dimensional Databases
- Multidimensional structures use a variation of the relational model to organize data and express the relationships between data.
  - More complex than the typical row/column relational database. Each cell within a multidimensional structure contains aggregated data related to elements along each of its dimensions
- Time is an additional dimension used in the analysis of data
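One way to picture a multidimensional structure is as a table of cells keyed by one value per dimension, where each cell holds an aggregate. The sketch below uses product, region, and quarter as dimensions; the sales facts and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, quarter, amount).
sales = [
    ("camera", "east", "Q1", 400.0),
    ("camera", "east", "Q1", 350.0),
    ("camera", "west", "Q2", 500.0),
    ("printer", "east", "Q1", 120.0),
]


def build_cube(facts):
    """Each cell, keyed by (product, region, quarter), holds total sales."""
    cube = defaultdict(float)
    for product, region, quarter, amount in facts:
        cube[(product, region, quarter)] += amount
    return dict(cube)


def roll_up(cube, keep):
    """Aggregate away every dimension whose index is not in `keep`."""
    rolled = defaultdict(float)
    for key, total in cube.items():
        rolled[tuple(key[i] for i in keep)] += total
    return dict(rolled)


cube = build_cube(sales)
by_product = roll_up(cube, keep=(0,))        # collapse region and quarter
by_quarter = roll_up(cube, keep=(2,))        # collapse product and region
print(cube[("camera", "east", "Q1")], by_product, by_quarter)
```

Rolling up along different dimensions gives different views of the same cells, including a view over time when the quarter dimension is kept.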
Example Of A Multi-Dimensional Database Structure