1
Managing & Accessing Data Effectively with Databases
RCS 2017 Brown Bag Series
Bob Freeman, PhD
Director, Research TechOps
HBS
12 April, 2017
Overview
• RCS Services & whoami
• Benefits and drawbacks of databases (DBs)
• Use cases
• Examples
• Using DBs via external programs
2
RCS Services
Statistical Support
Methodological Consulting
Consultation on “Revise and Resubmit” Requests
Statistical Programming
Data Collection, Cleaning, Parsing, Transfer, and Matching
Advocacy, Researching, and Testing of New Methods and Approaches
Technology Training Support
Use of IT for Research Consulting
HBS Compute Grid, Storage, & FASRC Odyssey Training
Statistics Package Training
Programming Package Training
Database System Training
Visualization Training
Research Infrastructure
Support Research Grid Access, Training, Management, and Support
License Negotiation, Installation, Training, and Help-Desk Support for HBS Research Software
Management of HBS Researchers on Odyssey Supercomputing Cluster and Amazon Web Services
Database Programming and Administration
Data Security and Usage Agreement Support
3
4
Benefits & Drawbacks of DBs
Types of DBs?
• Three major types: relational, hierarchical, & network
• Most common: relational DB management systems
5
Relational DBs…
• Data is stored in tables
• Columns (also known as fields) describe the data
• Rows (also known as records) contain the data
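To make the table/column/row vocabulary concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `sales_reps` table and its contents are hypothetical, made up purely for illustration.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Columns (fields) describe the data: each has a name and a type.
# The table name and columns are hypothetical.
conn.execute(
    "CREATE TABLE sales_reps (id INTEGER PRIMARY KEY, name TEXT, franchise TEXT)"
)

# Rows (records) contain the data: one row per sales rep.
conn.executemany(
    "INSERT INTO sales_reps (name, franchise) VALUES (?, ?)",
    [("Ana", "North"), ("Ben", "North"), ("Caro", "South")],
)

cursor = conn.execute("SELECT id, name, franchise FROM sales_reps ORDER BY id")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(columns)
for row in rows:
    print(row)

conn.close()
```

Each tuple printed is one record; the list of names printed first is the set of fields that every record shares.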
6
Relational DBs…
7
Relational DBs…
8
Why Use a DB?
• DBs were designed for storing and retrieving data
• Include powerful analysis tools
• Can handle large & complex datasets
• Have evolved to handle disparate data types: free-form (unstructured) text, images,
• Keeps the act of storing data separate from analysis
• Can enforce quality control (constraints) during data entry (e.g., ensure dates are proper & realistic)
• Offload work from the front-end program (e.g., R or Stata) to the DB
• Ask the DB to bring in only the data that you need, i.e., no need to load a 1000-column-wide data file
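Three of the points above (constraints at data entry, offloading work to the DB, and pulling only the data you need) can be sketched with Python's built-in sqlite3 module. The `visits` table, its columns, the date range, and the sample rows are all hypothetical; a real project would choose its own rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table. The CHECK constraint accepts only well-formed
# YYYY-MM-DD dates that fall inside an assumed "realistic" range.
conn.execute(
    """
    CREATE TABLE visits (
        id         INTEGER PRIMARY KEY,
        visit_date TEXT NOT NULL
                   CHECK (date(visit_date) IS visit_date
                          AND visit_date BETWEEN '2000-01-01' AND '2030-12-31'),
        region     TEXT NOT NULL,
        amount     REAL NOT NULL,
        notes      TEXT
    )
    """
)

candidates = [
    ("2017-04-12", "East", 100.0, "first visit"),
    ("2017-04-13", "East", 50.0, None),
    ("2017-04-13", "West", 25.0, None),
    ("1850-01-01", "West", 10.0, "unrealistic year"),
    ("not-a-date", "East", 10.0, "improper date"),
]

# Quality control happens at data entry: the DB refuses the bad rows.
rejected = []
for row in candidates:
    try:
        conn.execute(
            "INSERT INTO visits (visit_date, region, amount, notes) "
            "VALUES (?, ?, ?, ?)",
            row,
        )
    except sqlite3.IntegrityError:
        rejected.append(row[0])

# Offload the aggregation to the DB and bring back only two columns,
# rather than loading every column of every row into the front end.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM visits GROUP BY region ORDER BY region"
).fetchall()

print("rejected:", rejected)
print("totals:", totals)

conn.close()
```

The front-end program only ever sees the small summary, and the malformed dates never make it into the table in the first place.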
9
Relational DBs…
10
Why Not Use a DB?
• Small data, simple solution (more than you need)
• Learning curve for SQL, but can be rather small*
• Data is not appropriate type or shape
• "It's a black box!" or "My data is trapped!"
*Depends on how you wish to engineer things!
11
12
Use Cases
Relational DBs…
13
Research Inbox Metrics
14
Data from FY16 and Q1-Q2 FY17
Using DBs for a Data-driven Workflow
• Goal: Contrast groups sharing promotional materials against those working solo to see if there is a difference in sales
• Data: 20 franchises with approximately 4–6 sales reps/franchise
• Approach: Create a mobile-enabled, web-based sharing system, tracking movements through uploading, viewing, and sharing. In 2–4 weeks!
• Tech: FileMaker Pro, FileMaker Pro Server, and FileMaker WebDirect (no web programming!)
15
DB Use Case #2
16
17
Examples
DB Use Examples
• FileMaker Pro
• SQLite
• MariaDB
18
19
Using DBs via External Programs
Simple GUI Add-ons
• Navicat, MySQL Workbench, etc.
• Give you GUI controls in addition to SQL commands
• Can be very powerful
20
Access the DB via Direct Connections
• Direct connections are one method:
• Go through the company's DB driver (connector, library)
• You work/program in your environment of choice, as long as it can connect to the driver/library
• Each DB requires its own driver
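As a rough illustration of a direct connection, the sketch below uses Python's built-in sqlite3 module as the driver. The helper `fetch_all` and the `franchises` table are hypothetical names introduced for this example. For a server DB you would swap in that vendor's own driver module, and details such as the placeholder style (`?` here) can differ from driver to driver.

```python
import sqlite3


def fetch_all(connection, sql, params=()):
    """Run a query on an already-open DB-API connection and return all rows."""
    cursor = connection.cursor()
    try:
        cursor.execute(sql, params)
        return cursor.fetchall()
    finally:
        cursor.close()


# sqlite3 plays the role of the DB-specific driver in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE franchises (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO franchises (name, city) VALUES (?, ?)",
    [("A", "Boston"), ("B", "Austin")],
)

# Parameters are passed separately from the SQL text, not pasted into it.
boston = fetch_all(conn, "SELECT name FROM franchises WHERE city = ?", ("Boston",))
print(boston)

conn.close()
```

Only the `connect` call is tied to a particular DB; the rest of the program works on whatever connection it is handed.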
21
Access the DB via ODBC
• Open Database Connectivity is the other method!
• Modular, generic approach
• Uses a broker (manager) to hide DB-specific commands
• Your program can use generic SQL
• Each DB requires its own driver between ODBC Manager and the DB
• Great Windows support; easier said than done on Mac
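One possible way to go through ODBC from Python is the third-party pyodbc package. This is a sketch under several assumptions: the driver name, database name, and helper function names below are hypothetical, the connection-string keywords shown are the commonly used ones but ultimately depend on the driver, and `query_via_odbc` only works once pyodbc, an ODBC manager, and the DB's ODBC driver are installed.

```python
def build_odbc_connection_string(driver, server, database, uid=None, pwd=None):
    """Assemble a 'KEY=value;...' ODBC connection string (keywords vary by driver)."""
    parts = [f"DRIVER={{{driver}}}", f"SERVER={server}", f"DATABASE={database}"]
    if uid is not None:
        parts.append(f"UID={uid}")
    if pwd is not None:
        parts.append(f"PWD={pwd}")
    return ";".join(parts)


def query_via_odbc(connection_string, sql):
    """Send generic SQL through the ODBC manager and return all rows."""
    import pyodbc  # third-party; imported lazily so the helper above runs without it

    connection = pyodbc.connect(connection_string)
    try:
        cursor = connection.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        connection.close()


# Hypothetical driver and database names; use whatever is registered
# with the ODBC manager on your machine.
conn_str = build_odbc_connection_string(
    "MariaDB ODBC 3.1 Driver", "localhost", "research"
)
print(conn_str)
```

Switching to a different DB would mean changing the driver named in the connection string, while the query code stays the same.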
22
Thoughts
• Implicit assumption: work is done outside of the DB
• Demo is for local machine (localhost), but there's no reason why you cannot specify:
host=researchgrid.hbs.edu
^^ coming very soon to a compute grid near you
23
Pseudocode Example
load necessary libraries for database
create and open the database connection
initialize the pointer (cursor) into the database system
execute the SQL query
fetch the query results
iterate over the data and print the results
close the database connection
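The seven pseudocode steps map almost one-to-one onto Python's built-in sqlite3 module. The sketch below is one way to write them, not the code from the live demo; the `metrics` table and its two rows are hypothetical and exist only so the query has something to return.

```python
# Step 1: load necessary libraries for database
import sqlite3

# Step 2: create and open the database connection
conn = sqlite3.connect(":memory:")

# Setup for this sketch only: a hypothetical table with made-up rows.
conn.execute("CREATE TABLE metrics (quarter TEXT, requests INTEGER)")
conn.executemany(
    "INSERT INTO metrics (quarter, requests) VALUES (?, ?)",
    [("Q1", 10), ("Q2", 12)],
)

# Step 3: initialize the pointer (cursor) into the database system
cur = conn.cursor()

# Step 4: execute the SQL query
cur.execute("SELECT quarter, requests FROM metrics ORDER BY quarter")

# Step 5: fetch the query results
results = cur.fetchall()

# Step 6: iterate over the data and print the results
for quarter, requests in results:
    print(f"{quarter}: {requests}")

# Step 7: close the database connection
conn.close()
```

With a different DB, the changes would mostly be confined to the first two steps: the library that is loaded and how the connection is opened.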
24
ODBC Use Examples
Attempting the Impossible!!
• FileMaker Pro
• SQLite
• MariaDB
25
• Please talk to your peers, and…
• We wish you success in your research!
• http://intranet.hbs.edu/dept/research/
• https://grid.rcs.hbs.org/
• https://training.rcs.hbs.org/
• @hbs_rcs
Research Computing Services
26