Protecting Your Data: Backups, Archives & Data Preservation
DataONE Community Engagement & Outreach Working Group
Lesson Topics
  Key Digital Preservation Concepts
  Backups: Things to Consider
  Data Preservation
  Recommended Practices
Learning Objectives
  After completing this lesson, the participant will be able to:
    Define the differences between backups and archiving data
    Identify significant issues related to data backups
    Identify why backup plans are important and how they can fit into larger backup procedures
    Discuss what data preservation covers
    List several recommended practices
The DataONE Data Life Cycle
Differences at a Glance
Data Protection, Backups, Archiving, Preservation
  Data Protection
    Includes topics such as: backups, archives, & preservation; also includes physical security, encryption, and others not addressed here
    More information about these topics can be found in the “References” section
Data Protection, Backups, Archiving, Preservation (continued)
  Terms “backups” and “archives” are often used interchangeably, but do have different meanings
    Backups: copies of the original file are made before the original is overwritten
    Archives: preservation of the file
  Data Preservation
    Includes archiving in addition to processes such as data rescue, data reformatting, data conversion, metadata
A Closer Look: Backups vs. Archiving
  Backups
    Used to take periodic snapshots of data in case the current version is destroyed or lost
    Backups are copies of files stored for the short or near-long term
    Often performed on a somewhat frequent schedule
Why Perform Backups?
  Limit loss of data, some of which may not be reproducible
    Accidental deletions
    Fires, natural disasters
    Software bugs, hardware failures
  Save time, money, productivity
  Help prepare for disasters
  Reproduce results of past procedures (if they were based on older files)
  Respond to data requests
  Limit liability
Backups: Things to Consider
  What are the existing policies that might affect how and when you do data backups?
    May be separate project, office, department, funding source, or organizational policies
    Policies may differ between groups; which has precedence?
    Are backups already part of a larger data management or contingency plan for your group?
  Who is responsible for performing backups?
    Users? System administrators? Both?
  Do these various policies fit your needs?
Backups: Things to Consider (continued)
  How often should you do backups to capture significant change?
    Cost versus benefit
    Continually? Daily? Weekly? Monthly?
  What kind of backups should you perform?
    Partial: backing up only those files that have changed since the last backup
    Full: backing up all files
    How often and what kind will depend upon what kind of data you have and how unique it is
  What about non-digital files (such as papers)?
    Consider digitizing files
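The difference between a partial and a full backup can be illustrated with a short Python sketch. It is only an illustration, not a replacement for real backup software: the function names `files_to_back_up` and `back_up` are hypothetical, and it assumes that a file's modification time is a good enough signal that the file has changed since the last backup.

```python
import os
import shutil


def files_to_back_up(source_dir, last_backup_time=None):
    """Return the relative paths of the files that should be copied.

    With last_backup_time=None this selects every file (a full backup).
    Otherwise it selects only files modified after that time (a partial backup).
    """
    selected = []
    for root, _dirs, names in os.walk(source_dir):
        for name in names:
            path = os.path.join(root, name)
            if last_backup_time is None or os.path.getmtime(path) > last_backup_time:
                selected.append(os.path.relpath(path, source_dir))
    return sorted(selected)


def back_up(source_dir, backup_dir, last_backup_time=None):
    """Copy the selected files into backup_dir and return what was copied."""
    copied = files_to_back_up(source_dir, last_backup_time)
    for rel in copied:
        target = os.path.join(backup_dir, rel)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        # copy2 also preserves timestamps, which helps later verification.
        shutil.copy2(os.path.join(source_dir, rel), target)
    return copied
```

Calling `back_up(source, destination)` performs a full backup; passing the time of the previous run as `last_backup_time` (seconds since the epoch) copies only the files changed since then.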
Backups: Things to Consider (continued)
  Where will you back up your files?
    May depend upon project requirements, etc.
    Personal external disk, centralized computer storage (Dropbox), “cloud” storage (Amazon, Google)
    CDs and DVDs, while cheap and convenient, are not good media for backups
  What metadata is needed when using these systems?
  Are the files backed up individually or as one large file?
    Consider that not all backups may be immediately available, depending on how the files are packaged
  Good practice to keep backups in a different location than the source data
    If both are kept in the same place, a disaster can destroy both versions of the data
Considerations
  How are backups carried out?
    Manually may work for single files, but requires that the user remembers to perform regular backups and can be time-consuming
    Automated backups can be run on a set schedule that doesn’t require the user to remember
  What do I do if I need to get a file from backups?
    Backup mode may determine how the file can be retrieved
    You should know how to obtain files from backups, where they are located, and who to contact
    You need to know this information beforehand, as often you need a file from a backup in an emergency!
  Understanding the backup process is part of creating good data management practices
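As one possible illustration of an automated backup, the sketch below writes a timestamped archive of a directory. The function name `make_snapshot` and the `backup_YYYYMMDD_HHMMSS` naming pattern are assumptions made for this example; the function is meant to be called by whatever scheduler is available (for example cron or Windows Task Scheduler) so that nobody has to remember to run it.

```python
import shutil
from datetime import datetime
from pathlib import Path


def make_snapshot(source_dir, backup_root, now=None):
    """Write a timestamped .zip archive of source_dir into backup_root.

    backup_root should live outside source_dir (ideally on other media),
    otherwise the archive would end up inside the data it is copying.
    """
    now = now or datetime.now()
    backup_root = Path(backup_root)
    backup_root.mkdir(parents=True, exist_ok=True)
    base_name = backup_root / f"backup_{now:%Y%m%d_%H%M%S}"
    # make_archive appends the ".zip" extension and returns the final path.
    archive = shutil.make_archive(str(base_name), "zip", root_dir=str(source_dir))
    return Path(archive)
```

Because each run produces a new file, older snapshots stay available until a retention policy removes them.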
Considerations
  How do you verify a backup has been successfully performed?
    Most backup software will have a log file that contains details of the backup (which files, when the backup was created)
    However, don’t rely solely on the log file
    Even if a log file states the backup was successful, you still need to check the backup to make sure the files are there and accessible
    Test by trying to pull a file from the backup and restore it to another location
    Hardware and software failures can happen after backups and log files are made
    Make sure your system is backing up the correct files
Considerations
  How do you verify a backup has been successfully performed?
    If you are working with someone, such as an IT group, who helps manage and perform backups, confirm and verify that the backup process has been successfully completed
    Since manually checking all of the files in your backup is probably not possible, you should use other methods such as checking file sizes, date stamps, and checksum values.
    Checksums are mathematical calculations based upon a specific file. If the calculated checksums match between the backup copy and the original file, chances are the file is the same and was not modified when copied or stored.
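The checksum comparison described above might look like the following in Python. This is a minimal sketch using the standard `hashlib` module; SHA-256 is chosen here as an assumption (the lesson does not name a specific algorithm), and the helper names `sha256_of` and `same_content` are made up for the example.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file as a hexadecimal string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(original_path, backup_path):
    """True if the original and the backup copy have matching checksums."""
    return sha256_of(original_path) == sha256_of(backup_path)
```

Storing the checksum alongside each backup also makes it possible to re-check the copy later, after the original has changed or been removed.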
Considerations
  Are there backups of the backups?
    Necessary for high-value data
    Usually different copies of backups are kept in different locations
  How long do you keep your backups?
    Depends upon specific situation, and should be determined in concert with stakeholders and resource managers
    Understand relevant guidelines, policies and rules for retention of data
  What are the long term storage and access solutions that are relevant for the project?
    What to do when funding ends or key staff depart?
    Changes in the status of the project, funding, or key staff are important reasons to have a full understanding of related options and requirements for storage and access
Data in Real Life
  A design firm was handling their own backups. The system was working and the backup software was reporting that the data was successfully backed up. The administrator checked the backups immediately after they were done and confirmed they were good.
  After a computer virus erased most of their files, they went back to their backups. Unfortunately they found that the backups were all blank and all of the data was gone. Only after some investigation did they discover that the computer tapes (which contained the backups) were placed against a wall that had an elevator on the other side of it. When the elevator went past, the magnets inside erased all of the tapes.
  Had they checked their backups properly, they probably would have noticed this before there was an emergency
Final Considerations
  Can you read data from older backups?
    Media changes. You may no longer be able to read older versions and formats such as floppy disks, Jaz and Zip drives, WordPerfect files, etc.
    Media can degrade quickly, unexpectedly, inconsistently
    Even if you can open a file today, that doesn’t mean you can a month from now
  How will you dispose of outdated data?
    Make decision to copy, archive
  Remember: back up the data you can’t afford to lose!
Data Preservation
  By managing and preserving your data well, data rescue may not be necessary. Why?
    Through the addition of relevant metadata, proper file naming (helps keep the file from getting lost in the system), use of proper file formats (lets you open the file without having to convert it), backups (limit loss of files), and suitable media types (limit degradation of files), you may limit or prevent the need for data rescue.
    A good data management plan is another tool to help limit the need for data rescue.
Processes Related to Data Preservation
  Includes backups and archiving in addition to processes such as data conversion, data reformatting, and data rescue
    Older files may no longer be in a usable format and may require conversion or “rescue” before the data can be used. Data reformatting, conversion, and backup become even more important as projects finish up and/or are no longer funded. Data may have been kept at the end of the project, but if no one is managing the data, data may be left in formats that are no longer usable or in locations that are no longer accessible.
  Additionally, data preservation requires planning, structure, and ongoing management and assessment
Preservation Formats and Version Strategies
  Create useful, relevant metadata
  Data Conversions and Formats
    Use non-proprietary, standard formats
    Convert text files from .doc or .xls to .txt, image files to .tiff or .pdf
    Be sure to check files after converting them, as data, metadata, and formatting loss can occur
  Versioning
    Use consecutive numbers and letters to help keep track of changes to a file throughout various edits and revisions. This will help you quickly differentiate between files with similar names.
  File Naming
    Use file names that are consistent, descriptive, and concise so that you can find and quickly identify the file at a later time.
    Rename files that have a default file name when exported such as “image.jpg” or “archive.zip”
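One way to keep file names consistent and versioned is to generate them from a fixed pattern. The pattern below (`project_description_YYYYMMDD_vNN.ext`) and the function name `versioned_name` are assumptions made for this sketch, not a DataONE convention; any scheme works as long as it is applied consistently.

```python
import re
from datetime import date


def versioned_name(project, description, when, version, extension):
    """Build a name such as 'lake-survey_water-temp_20160921_v02.csv'."""

    def clean(text):
        # Lowercase, and replace runs of other characters with one hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    ext = extension.lstrip(".")
    return f"{clean(project)}_{clean(description)}_{when:%Y%m%d}_v{version:02d}.{ext}"


print(versioned_name("Lake Survey", "Water Temp", date(2016, 9, 21), 2, ".csv"))
# prints: lake-survey_water-temp_20160921_v02.csv
```

Zero-padded version numbers and year-first dates sort correctly in an ordinary file listing, which makes the newest revision easy to spot.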
Recommended Practices
  Create a preservation policy that clearly identifies:
    roles
    responsibilities
    where the data is backed up
    how often the files are backed up
    how to access the files
    recommended file formats to be used
    policies for migrating data to assure data are not lost due to media degradation or changing formats or programs
  Review your preservation policy and plan periodically to ensure it is still valid and applicable
Recommended Practices (continued)
  Minimize or remove reliance on users to perform own manual backups (if possible)
    Implement standardized and automatic backups
    If possible, put experts in charge of this task (computer staff) as they are more likely to keep up-to-date regarding software updates, hardware issues, best practices, etc.
  Don’t assume backups are being performed for you
    You don’t want to find out after the fact that no backups have been performed
    If you are using third-party software (like Yahoo or Google Mail), what happens if they lose your files?
  Use non-proprietary, standard formats
    Convert text files from .doc or .xls to .txt, image files to .tiff or .pdf
Recommended Practices (continued)
  Check your backups manually
    Start with log files, as they may tell you the backup was unsuccessful
    Do not rely solely on the log files; they may be incorrect or the data may have become corrupted after the file was transferred
    Look at file dates and file sizes to see if they match; calculate a checksum on the original and archived file and make sure they match
    Ensure you can read files off of older backups and archives.
  Have multiple versions of backups on multiple formats in multiple places
  Good data management will limit the amount of data rescue that needs to be performed on older data
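Comparing file dates and sizes across a whole backup can be scripted. The following Python sketch walks the source directory and reports files that are missing from the backup or whose size or modification time differs. The function name `compare_backup` and the two-second time tolerance are assumptions for this example; the tolerance allows for file systems that store timestamps at a coarser resolution.

```python
import os


def compare_backup(source_dir, backup_dir, mtime_tolerance=2.0):
    """Return {relative path: problem} for files that look wrong in the backup."""
    problems = {}
    for root, _dirs, names in os.walk(source_dir):
        for name in names:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, source_dir)
            dst = os.path.join(backup_dir, rel)
            if not os.path.isfile(dst):
                problems[rel] = "missing from backup"
                continue
            src_stat, dst_stat = os.stat(src), os.stat(dst)
            if src_stat.st_size != dst_stat.st_size:
                problems[rel] = "size differs"
            elif abs(src_stat.st_mtime - dst_stat.st_mtime) > mtime_tolerance:
                problems[rel] = "modification time differs"
    return problems
```

An empty result only means the quick checks passed. Matching sizes and dates do not prove the contents are identical, so a checksum comparison and a trial restore are still worth doing for important files.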
Data in Real Life
  In 2011, a software bug caused some Gmail users to lose access to their email. Fortunately, Google had backups!
Summary
  Data preservation is more than just backing up and archiving your files
  organizational infrastructure, technological situation, resources
  When devising a preservation strategy, one needs to consider how often to perform backups, where to back up, accessibility to backups and how long to keep the files
  There are many reasons we need to perform backups, primarily to prevent data loss
  Check for backups on outdated media and test your backups often!
References
  1. Stanford University Libraries, Data Management Plans, (Stanford University Libraries), https://library.stanford.edu/research/data-management-services (accessed 09/21/2016)
  2. Albanesius, Chloe, Google: Storage software update led to e-mail bug, http://www.pcmag.com/article2/0,2817,2381168,00.asp (accessed 09/21/2016)
  3. Van den Eynden, Veerle, Corti, Louise, Woollard, Matthew, Bishop, Libby and Horton, Laurence, Managing and Sharing Data, http://www.data-archive.ac.uk/media/2894/managingsharing.pdf, and companion materials, https://www.ukdataservice.ac.uk/manage-data/handbook (accessed 09/21/2016)
  For more information about physical security, encryption, and data disposal, visit: http://www.data-archive.ac.uk/media/2894/managingsharing.pdf
About
  Participate in our GitHub repo: https://dataoneorg.github.io/dataone_lessons/
  The full slide deck (in PowerPoint) may be downloaded from: http://www.dataone.org/education-modules
  Suggested citation: DataONE Education Module: Data Management. DataONE. Retrieved November 12, 2016. From http://www.dataone.org/sites/all/documents/L01_DataManagement.pptx
  Copyright license information: No rights reserved; you may enhance and reuse for your own purposes. We do ask that you provide appropriate citation and attribution to DataONE.