Large Databases in Industry
Wendy Moncur
Department of Computing Science, University of Aberdeen

Large Databases in Industry
• Database design & management in a major bank
• Case study: 6000-table Personnel database

My background
• Database Administrator (DBA) at one of the UK’s largest banks.
• Designed databases for high performance & availability.
• Platform: DB2 & SQL
• Largest database: 6000 tables

DBA Salaries
• DBA average minimum salary: £41,896
• DBA average maximum salary: £47,147
Source: http://www.itjobswatch.co.uk (2008)

What does a DBA do?
• Database design & creation
• Quality assurance of SQL
• Database optimisation
• Performance management
• Database administration
• Security

Database design & creation
Process of fitting a database design to clients’ requirements.
Database design achieved in 3 phases:
1. Conceptual: model data independent of all physical considerations
2. Logical: refine and map conceptual model onto relational model (or some other database model such as object-oriented)
3. Physical: map logical model onto a specific DBMS

Quality assurance of SQL
• Review application code written by developers
  • Understand application
• Use EXPLAIN to check individual SQL statements
  • May need to change application or indexes
• Are indexes used?
• Is the run time acceptable?
  • Batch
  • Online
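The "are indexes used?" check can be sketched as follows. This is only an illustration: SQLite's `EXPLAIN QUERY PLAN` (via Python's `sqlite3`) stands in for DB2's EXPLAIN, and the table, columns and index name are hypothetical.

```python
import sqlite3

# SQLite stands in for DB2 here; table and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (
        EmployeeId INTEGER PRIMARY KEY,
        Surname    TEXT NOT NULL,
        Firstname  TEXT NOT NULL,
        Region     TEXT NOT NULL
    );
    CREATE INDEX ix_employee_region ON Employee (Region);
""")

def access_path(sql):
    """Return the 'detail' text of each step in SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]

# One statement filters on an indexed column, the other on an unindexed one.
indexed = access_path("SELECT Surname FROM Employee WHERE Region = 'North'")
unindexed = access_path("SELECT Region FROM Employee WHERE Surname = 'Jenkins'")

for label, plan in (("by Region", indexed), ("by Surname", unindexed)):
    uses_index = any("INDEX" in step for step in plan)
    print(label, "-> index used" if uses_index else "-> full table scan", plan)
```

A statement that falls back to a full table scan is a candidate for either a new index or a change to the application's SQL.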

Database optimisation
• Improve indexing
  • Delete redundant indexes
  • Check order of columns in multi-column indexes matches application needs, e.g. Personnel table with index on Surname, FirstName versus FirstName, Surname
  • Confirm whether indexes should be Ascending or Descending
• Verify clustering key is appropriate
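Why ascending versus descending can matter is easiest to see with a mixed-direction sort. The sketch below assumes a hypothetical Employee table with Region and StartDate columns and uses SQLite rather than DB2; it compares the plan for the same query under two index definitions.

```python
import sqlite3

# Hypothetical query: regions A-Z, newest starters first within each region.
QUERY = ("SELECT Region, StartDate FROM Employee "
         "ORDER BY Region ASC, StartDate DESC")

def plan_with(index_definition):
    """Build a fresh database with one index and return the plan steps for QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee ("
                 "EmployeeId INTEGER PRIMARY KEY, Region TEXT, StartDate TEXT)")
    conn.execute(f"CREATE INDEX ix_region_start ON Employee ({index_definition})")
    steps = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
    conn.close()
    return steps

both_ascending = plan_with("Region ASC, StartDate ASC")
mixed = plan_with("Region ASC, StartDate DESC")

print("ASC/ASC index :", both_ascending)   # plan includes a temporary sort step
print("ASC/DESC index:", mixed)            # index order already matches the query
```

When the index directions match the ORDER BY, the rows come back in index order and the extra sort step disappears from the plan.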

Performance management
Clients will specify:
• Database implementation date
• Online transaction times in milliseconds
• Batch process run times
• Recoverability of data

Database administration
Once the database is ‘live’:
• Backup and recovery strategy
  • How far back?
  • How many transactions lost? Depends on business data held
• Reorganisation strategy
  • Trade off between availability & performance
• Implementation of changes on database
• Application implementation: stability tests

Security
• Control access to data in test & production
  • Even test data may be sensitive
• Sample data from live database
  • LEB: “Baroness Gardner of Parkes”
  • Coutts is the bank for the Queen ….
• Only DBAs should have access to delete or modify the database….
• Use views to control users’ & developers’ information access

Other DBA Deliverables
Documentation for:
• Requirements specification
  • As defined by clients, developers, managers, contractors
• Design decisions, in case of problems/upgrades
• Application design reviews and tests
• Handover to Production

Career structure
1. Graduate
2. Trainee DBA
3. DBA
  • May be split into production or development
  • Production: £££ for being on call
  • Development: less stress!
4. Consultant/Team leader

Case study: the monster database
• 6000+ tables
• 18000+ indexes

Part 1: Challenges
• “One size fits all”
• External supplier
• 6000+ tables
• 18000+ indexes
• 1 tablespace
• Short timescale

Challenges: “one size fits all”?
One size does not fit all.
Performance of SQL statements dependent on:
• Database design
• Index design
• The DATA

Challenges: “one size fits all”?
Every company has different requirements.
Customers demand high performance... and control the budget.
Service Level Agreements (SLAs) dictate …
• Minimum transaction speed
• Number of concurrent users
• Number of remote locations
• Daily system availability
Database must be tailored to achieve site-specific SLAs.

Challenges: external supplier
• Software package & database from external supplier.
• Cannot change this.

Challenges: 6,000+ tables
• Cannot change tables: no denormalisation allowed.
• Supplied program code demands these tables exist.
• Cannot change supplied program code unless essential.

Challenges: 18,000+ indexes
Can change indexes:
• Unique indexes
• Clustering indexes
• Secondary indexes

Unique index
• Defines what makes a row unique.
• Components of the index cannot be changed.
• Order of components can be changed.

Unique index
E.g. for table “EMPLOYEE”:
Unique index = DateOfBirth, Firstname, Surname.
Most queries ask for data where only Surname, Firstname are known.
  SELECT Surname, Firstname, DateOfBirth
  FROM Employee
  WHERE Surname = 'Jenkins' AND Firstname = 'Malcolm';
Recommendation: Change order of unique index to Surname, Firstname, DateOfBirth.
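A small experiment illustrates this recommendation. It is a sketch on SQLite, not DB2, and the Salary column and index name are assumptions added for illustration. Both column orders reject the same duplicate row, but only the reordered index lets the optimiser seek directly on Surname and Firstname.

```python
import sqlite3

QUERY = ("SELECT Surname, Firstname, DateOfBirth FROM Employee "
         "WHERE Surname = 'Jenkins' AND Firstname = 'Malcolm'")

def try_order(column_order):
    """Create the unique index in the given column order; report behaviour."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee ("
                 "Surname TEXT, Firstname TEXT, DateOfBirth TEXT, Salary INTEGER)")
    conn.execute(f"CREATE UNIQUE INDEX ux_employee ON Employee ({column_order})")
    conn.execute("INSERT INTO Employee VALUES ('Jenkins', 'Malcolm', '1970-01-01', 30000)")
    try:
        # Same key values again: uniqueness should reject this row.
        conn.execute("INSERT INTO Employee VALUES ('Jenkins', 'Malcolm', '1970-01-01', 99999)")
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        duplicate_rejected = True
    plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
    conn.close()
    return duplicate_rejected, plan

original_rejects, original_plan = try_order("DateOfBirth, Firstname, Surname")
reordered_rejects, reordered_plan = try_order("Surname, Firstname, DateOfBirth")

print("original :", original_rejects, original_plan)    # a SCAN of every entry
print("reordered:", reordered_rejects, reordered_plan)  # a SEARCH on Surname, Firstname
```

The set of components is unchanged, so the definition of a unique row is unchanged; only the access path improves.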

Clustering indexes
• Define the physical order in which rows of data should be stored.
• Components of the index can be changed.
• Order of components can be changed.

Clustering indexes
E.g. table “EMPLOYEE”:
Clustering index = DateOfBirth
Yet most queries order by EmploymentStartDate
  SELECT EmploymentStartDate, Surname, Firstname
  FROM Employee
  WHERE Surname = 'Jenkins' AND Firstname = 'Malcolm'
  ORDER BY EmploymentStartDate;
Recommendation: Change clustering index to use EmploymentStartDate.
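SQLite has no DB2-style clustering index, so the sketch below uses the nearest analogue as an assumption: a `WITHOUT ROWID` table, whose rows are stored in primary-key order. The table names and the EmployeeId column are hypothetical, and the WHERE clause is dropped to isolate the effect of physical order on ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Ordinary table: rows are stored by rowid, not by any date column.
    CREATE TABLE EmployeeHeap (
        EmployeeId INTEGER PRIMARY KEY,
        Surname TEXT, Firstname TEXT, EmploymentStartDate TEXT
    );
    -- WITHOUT ROWID table: rows are stored in primary-key order,
    -- which here begins with EmploymentStartDate.
    CREATE TABLE EmployeeClustered (
        EmploymentStartDate TEXT NOT NULL,
        EmployeeId INTEGER NOT NULL,
        Surname TEXT, Firstname TEXT,
        PRIMARY KEY (EmploymentStartDate, EmployeeId)
    ) WITHOUT ROWID;
""")

def plan(table):
    """Plan steps for reading the whole table in EmploymentStartDate order."""
    sql = ("EXPLAIN QUERY PLAN SELECT EmploymentStartDate, Surname, Firstname "
           f"FROM {table} ORDER BY EmploymentStartDate")
    return [row[3] for row in conn.execute(sql)]

heap_plan = plan("EmployeeHeap")
clustered_plan = plan("EmployeeClustered")

print("stored by rowid     :", heap_plan)       # needs a temporary sort
print("stored by start date:", clustered_plan)  # already in the requested order
```

When the stored order matches the order most queries ask for, the sort step is avoided; that is the effect the recommendation is after.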

Secondary indexes
• Not unique.
• Do not dictate how the data is to be held.
• Created to improve performance of queries and updates.
• Increase cost of insert and update, as they must be created and maintained along with the table.
Recommendation: Drop superfluous secondary indexes.

Challenge: Short timescale
• At least 4 test environments: Vanilla, Unit test, System test, Pre-live
• 96,000 objects! ((6,000 tables + 18,000 indexes) * 4 environments)
• 3 months
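The arithmetic behind the 96,000 figure, spelled out. The working-days figure is an assumption added only to give a feel for the daily workload; it is not from the slides.

```python
tables = 6_000
indexes = 18_000
environments = ["Vanilla", "Unit test", "System test", "Pre-live"]

objects_per_environment = tables + indexes            # 24,000
total_objects = objects_per_environment * len(environments)

# Assumption (not from the slides): roughly 21 working days per month.
working_days = 3 * 21
objects_per_day = total_objects / working_days

print(f"{total_objects:,} objects in total")
print(f"about {objects_per_day:,.0f} objects per working day over 3 months")
```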

Tools
Use tools to…
• Check performance of each SQL statement
• Manage change process

Check performance
“EXPLAIN”
• Evaluates route to data for every SQL statement.
• Identifies what indexes are used
• Doesn’t identify redundant indexes
• Doesn’t identify indexes that need to be changed.
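Because EXPLAIN reports on one statement at a time, finding redundant indexes means aggregating its output over the whole workload. A minimal sketch of that idea, again with SQLite standing in for DB2 and with a hypothetical table, index set and two-statement workload:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (
        EmployeeId INTEGER PRIMARY KEY,
        Surname TEXT, Firstname TEXT, Region TEXT, Grade TEXT
    );
    CREATE INDEX ix_emp_name   ON Employee (Surname, Firstname);
    CREATE INDEX ix_emp_region ON Employee (Region);
    CREATE INDEX ix_emp_grade  ON Employee (Grade);
""")

# Hypothetical workload: every SQL statement the application issues.
workload = [
    "SELECT Grade FROM Employee WHERE Surname = 'Jenkins' AND Firstname = 'Malcolm'",
    "SELECT Surname FROM Employee WHERE Region = 'North'",
]

all_indexes = {row[0] for row in
               conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")}

# Collect every index name that appears in any statement's plan.
used = set()
for sql in workload:
    for row in conn.execute("EXPLAIN QUERY PLAN " + sql):
        detail = row[3]
        used.update(name for name in all_indexes if name in detail)

unused = sorted(all_indexes - used)
print("candidates to drop:", unused)
```

An index that no statement in the workload uses is only a candidate for dropping; it may still enforce uniqueness or serve a rarely run batch job.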

Manage change process
Rigorous control needed
Achieved through…
• Consistent naming standards
• Detailed record of every change
• Consistent route through environments, no short cuts
• DBA tools
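The "no short cuts" rule can be expressed as a simple check. The sketch assumes changes are promoted through the four environments in the order they are listed in this deck; the function name and the history format are hypothetical.

```python
# Assumed promotion order, taken from the list of test environments.
ROUTE = ["Vanilla", "Unit test", "System test", "Pre-live"]

def next_environment(history):
    """Return the only environment a change may go to next, or None when done.

    `history` is the list of environments the change has already been
    applied to, in order. Any skipped or reordered step is rejected.
    """
    if history != ROUTE[:len(history)]:
        raise ValueError(f"change skipped or reordered an environment: {history}")
    return ROUTE[len(history)] if len(history) < len(ROUTE) else None

print(next_environment([]))                        # first stop on the route
print(next_environment(["Vanilla", "Unit test"]))  # next stop after unit test

try:
    next_environment(["Vanilla", "System test"])   # Unit test was skipped
except ValueError as error:
    print("rejected:", error)
```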

Part 1: Recap of challenges
Can’t change:
• “One size fits all”
• External supplier
• 6000+ tables
Can change:
• 18000+ indexes
• 1 tablespace
• Short timescale

Part 2: The Production Database
• Does it perform?
• Can the right people use it?
• If disaster strikes, can the data be recovered?

Does the database perform?
• Database performance monitored against Service Level Agreements (SLAs).
• Regular health checks carried out:
  • Data stored in sequence?
  • Enough space?
• If sub-standard performance, further database design work done.
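Monitoring against an SLA amounts to comparing measured response times with the agreed limit. The sketch below is illustrative only: the 500 ms limit, the 95% target and the sample timings are invented, not figures from the bank.

```python
def sla_breaches(response_times_ms, limit_ms):
    """Return the measured response times that exceed the SLA limit."""
    return [t for t in response_times_ms if t > limit_ms]

def meets_sla(response_times_ms, limit_ms, required_fraction=0.95):
    """True if at least `required_fraction` of transactions finish within the limit."""
    within = sum(1 for t in response_times_ms if t <= limit_ms)
    return within / len(response_times_ms) >= required_fraction

# Hypothetical online transaction timings in milliseconds.
samples = [120, 180, 95, 210, 640, 150, 170, 130, 110, 160]

print("breaches:", sla_breaches(samples, limit_ms=500))
print("meets 95% target:", meets_sla(samples, limit_ms=500))
```

A failed check is the trigger for the further database design work mentioned above.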

Can the right people access the data?
PERSONNEL database
• Personnel team
  • Query & update data at individual or regional level
• DBA
  • Backup/restore data
  • Reorganise data
  • Change database definitions
  • Update statistics on data
• Chief executive
  • Employee statistics
• Staff member
  • Their own data
![Page 38: Large Databases in Industry Wendy Moncur](https://reader034.vdocuments.site/reader034/viewer/2022051400/55616649d8b42a72628b4ec2/html5/thumbnails/38.jpg)
Can the right people use the Can the right people use the database?database?
Different people, different information needs.Different people, different information needs.
Sensitive data – salary, health, discipline…Sensitive data – salary, health, discipline…
Solution Solution VIEWSVIEWS Transaction ManagementTransaction Management
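A sketch of how views can give each audience only what it needs. The table, columns, view names and rows are all hypothetical. SQLite has no user privileges, so this shows only the view definitions; in a DBMS with privileges, access would be granted on the views rather than on the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (
        EmployeeId INTEGER PRIMARY KEY,
        Surname TEXT, Firstname TEXT, Region TEXT,
        Salary INTEGER, HealthNotes TEXT
    );
    INSERT INTO Employee VALUES
        (1, 'Jenkins', 'Malcolm', 'North', 30000, 'none'),
        (2, 'Smith',   'Ann',     'North', 40000, 'none'),
        (3, 'Brown',   'Raj',     'South', 50000, 'none');

    -- General staff lookup: no salary or health columns exposed.
    CREATE VIEW StaffDirectory AS
        SELECT EmployeeId, Surname, Firstname, Region FROM Employee;

    -- Chief executive: aggregate statistics only, no individual rows.
    CREATE VIEW RegionStatistics AS
        SELECT Region, COUNT(*) AS Headcount, AVG(Salary) AS AverageSalary
        FROM Employee
        GROUP BY Region;
""")

directory_columns = [d[0] for d in
                     conn.execute("SELECT * FROM StaffDirectory").description]
stats = conn.execute("SELECT * FROM RegionStatistics ORDER BY Region").fetchall()

print(directory_columns)
print(stats)
```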

If disaster strikes, can the data be recovered?
Robust backup & recovery strategies for:
• Hardware failure
• Software failure
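A toy backup-and-restore cycle makes the "how many transactions lost?" question concrete. It uses Python's `sqlite3.Connection.backup` on in-memory databases as a stand-in; a real DB2 strategy would also involve transaction logs, which this sketch does not model.

```python
import sqlite3

# The "live" database with one committed row.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE Employee (EmployeeId INTEGER PRIMARY KEY, Surname TEXT)")
live.execute("INSERT INTO Employee VALUES (1, 'Jenkins')")
live.commit()

# Take a backup copy of the live database.
backup = sqlite3.connect(":memory:")
live.backup(backup)

# A further transaction commits after the backup was taken...
live.execute("INSERT INTO Employee VALUES (2, 'Smith')")
live.commit()

# ...and then disaster strikes: the live database is gone.
live.close()

# Recover into a fresh database from the backup copy.
restored = sqlite3.connect(":memory:")
backup.backup(restored)
rows = restored.execute(
    "SELECT EmployeeId, Surname FROM Employee ORDER BY EmployeeId").fetchall()

print(rows)  # only the row that existed when the backup was taken
```

Everything committed after the last backup is lost in this sketch, which is why backup frequency has to follow from the business value of the data held.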

Part 2: Recap of Production Database issues
• Database must perform to acceptable level.
• Only the right people should have access to any data item.
• No matter what, the data must be recoverable.

Summary
• MSc learning relevant to real world
• Everything is bigger out there!
• Grounding in basic understanding lets you handle complex challenges