Database Management System (Systém riadenia bázy dát)
![Page 1: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/1.jpg)
Ján GENČI
PDT
2009
Database Management System (Systém riadenia bázy dát)
![Page 2: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/2.jpg)
Contents
• RAID
• Two-phase multiway sort-merge
• Physical data organization
• Indexing
• System catalog
• Relational algebra operations (briefly)
• Implementation of relational algebra operations
![Page 3: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/3.jpg)
Contents (topics we will not get to)
• Transaction processing
• Parallel processing
• Failure recovery
![Page 4: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/4.jpg)
References [1]
• Hector Garcia-Molina, Jeffrey D. Ullman, Jennifer D. Widom: Database System Implementation. Prentice Hall, 1999. ISBN-10: 0130402648, 653 pp.
• Database Systems: The Complete Book, 2001
![Page 5: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/5.jpg)
References [2]
• Elmasri R., Navathe S. B.: Fundamentals of Database Systems. 4th ed., Pearson Education, 2001; 5th ed., 2006, 1030 pp. (ch. 13-15 or 13-19; 120 or 220 pages, respectively)
![Page 6: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/6.jpg)
References [3]
• Ramakrishnan R., Gehrke J.: Database Management Systems. McGraw-Hill Science/Engineering/Math; 3rd ed., 2002, 906 pp. (ch. 7-14; 220 pages)
![Page 7: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/7.jpg)
References [4]
• Abraham Silberschatz, Henry Korth, S. Sudarshan: Database System Concepts. McGraw-Hill Science/Engineering/Math; 5th ed., 2005, approx. 920 pp. (ch. 11-14 or 11-17; 170 or 290 pages, respectively)
![Page 8: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/8.jpg)
RAID
Figures (most of them) from [2]
![Page 9: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/9.jpg)
RAID
• Originally: Redundant Arrays of Inexpensive Disks
• Currently: Redundant Array of Independent Disks
• Chen, Lee, Gibson, Katz, and Patterson (1994), ACM Computing Surveys, Vol. 26, No. 2 (June 1994).
• http://sk.wikipedia.org/wiki/RAID (a nicely illustrated treatment, in Slovak)
![Page 10: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/10.jpg)
RAID 0
![Page 11: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/11.jpg)
RAID 1, 2
![Page 12: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/12.jpg)
RAID 3, 4, 5, 6
![Page 13: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/13.jpg)
RAID – further combinations
• 10, 01: combinations of the basic RAID levels
• Performance:
  – Block-interleaved distributed-parity disk arrays (RAID 5) have the best small read, large read, and large write performance of any redundant disk array.
  – Small write requests are somewhat inefficient compared with redundancy schemes such as mirroring.
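The parity idea behind RAID 5 can be sketched in a few lines. This is only an illustration under simplifying assumptions: blocks are in-memory byte strings of equal size, the parity block is not rotated across disks, and the helper names (`xor_blocks`, `make_stripe`, `rebuild`, `small_write_parity`) are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_stripe(data_blocks):
    """Return the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def rebuild(stripe, lost_index):
    """Recompute the block at lost_index from all surviving blocks."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)

def small_write_parity(old_data, new_data, old_parity):
    """New parity after overwriting a single data block (read-modify-write)."""
    return xor_blocks([old_parity, old_data, new_data])
```

The last helper hints at why small writes are comparatively expensive: updating one data block requires reading the old data and old parity and then writing both the new data and the new parity, whereas mirroring only writes the two copies.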
![Page 14: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/14.jpg)
Two-phase multiway sort-merge
Partially based on a presentation by Simonas Šaltenis, Advanced Algorithm Design and Analysis
![Page 15: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/15.jpg)
Purpose of the algorithm
• Sorting a very large collection of data (data size > memory size)
• Classic algorithm – Wirth's sort-merge algorithm (Wirth N.: Algoritmy a dátové štruktúry)
![Page 16: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/16.jpg)
Principle – phase 1
1. Create the largest possible "runs" (sorted sequences of elements), ideally by loading as much data as fits into the available memory and sorting it, e.g. with quicksort
2. Merge the runs
![Page 17: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/17.jpg)
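Principle – phase 2
[Figure: k-way merge. File X holds the sorted runs Run 1, Run 2, …, Run k = n/m. The current page of each run is loaded into an input buffer Bf1, Bf2, …, Bfk with positions p1, p2, …, pk. The element min(Bf1[p1], Bf2[p2], …, Bfk[pk]) is repeatedly moved to the output buffer Bfo at position po. The next page of run i is read when pi = B; the output buffer is written to File Y when Bfo is full; each run ends at EOF.]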
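The two phases can be sketched as follows. This is a minimal illustration, not a tuned implementation: it assumes integer keys, one value per line in temporary run files, and it lets `heapq.merge` play the role of the min(...) selection over the input buffers; `memory_limit` (the number of items that fit in memory) and the function names are assumptions made for the example.

```python
import heapq
import os
import tempfile

def create_runs(values, memory_limit):
    """Phase 1: sort chunks of at most memory_limit items and spill each to a file."""
    run_paths, chunk = [], []

    def flush():
        chunk.sort()
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{v}\n" for v in chunk)
        run_paths.append(path)
        chunk.clear()

    for v in values:
        chunk.append(v)
        if len(chunk) == memory_limit:
            flush()
    if chunk:
        flush()
    return run_paths

def read_run(path):
    """Stream one sorted run from disk, one value at a time."""
    with open(path) as f:
        for line in f:
            yield int(line)

def merge_runs(run_paths):
    """Phase 2: k-way merge, always emitting the smallest current element."""
    yield from heapq.merge(*(read_run(p) for p in run_paths))

def external_sort(values, memory_limit):
    run_paths = create_runs(values, memory_limit)
    try:
        return list(merge_runs(run_paths))
    finally:
        for p in run_paths:
            os.remove(p)
```

Returning a list keeps the example short; a real merge would write the output buffer to the result file whenever it fills up.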
![Page 18: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/18.jpg)
Evaluation
• Phase 1: O(n), Phase 2: O(n)
• Total: O(n) I/Os!
• Only files of "limited" size can be sorted
  – Phase 2 can merge a maximum of m-1 runs (m – number of buffers).
  – Which means: N/M (number of runs) ≤ m-1
![Page 19: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/19.jpg)
Sorting very large files
[Figure: merge tree. Phase 1 produces sorted runs of size M. Each pass of Phase 2 merges m-1 runs into one, giving runs of size (m-1)M, then (m-1)^2 M, and finally (m-1)^3 M = N.]
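The growth of run sizes across merge passes can be checked with a small calculation. The sketch below assumes N, M and m are given in the same unit (items or blocks) and that every pass merges exactly m-1 runs; the function names are hypothetical.

```python
def merge_passes(n_items, memory_items, n_buffers):
    """Number of merge passes needed when each pass merges n_buffers - 1 runs."""
    fan_in = n_buffers - 1
    if fan_in < 2:
        raise ValueError("need at least 3 buffers to merge")
    runs = -(-n_items // memory_items)  # ceiling division: runs after phase 1
    passes = 0
    while runs > 1:
        runs = -(-runs // fan_in)
        passes += 1
    return passes

def max_sortable(memory_items, n_buffers, passes):
    """Largest N sortable with the given number of merge passes: (m-1)^passes * M."""
    return (n_buffers - 1) ** passes * memory_items
```

With M = 100 and m = 11, one merge pass handles up to 1,000 items and three passes handle up to 100,000, matching the (m-1)^3 M = N bound in the figure.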
![Page 20: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/20.jpg)
Questions
![Page 21: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/21.jpg)
DBMS – structures and algorithms
![Page 22: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/22.jpg)
![Page 23: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/23.jpg)
Primary (physical) organizations
![Page 24: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/24.jpg)
What we will cover
• Supported data types
• Record formation
• Organization (ordering) of records
  – physical
  – logical
• "Placement" of the DBMS within the OS
![Page 25: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/25.jpg)
Supported data types
• So-called built-in data types
• For the purposes of storing data, what matters to us is the size of the data type (sizeof(type))
• The "semantics" of a type is supported by the implementation (HW or SW) of the relevant operations (out of scope)
![Page 26: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/26.jpg)
Storage Record Formats
• A fixed-length record
• A record with variable-length fields
• A variable-field record with separator characters.
![Page 27: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/27.jpg)
Storage Record Formats [2]
![Page 28: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/28.jpg)
Fixed length record
• Size of items is recorded in the system catalog
![Page 29: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/29.jpg)
Variable length records
• Result of item(s) of variable length
[Figure: two layouts of a variable-length record. (a) F1 $ F2 $ F3 $ F4 $ (Fi = item i): the individual fields are separated by delimiters. (b) F1 F2 F3 F4 preceded by an array of pointers to the items of the record.]
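The three record formats can be sketched with Python's `struct` module. The field layout (an integer id, a 20-byte name, a float salary), the `$` separator and the 2-byte offsets are assumptions chosen for illustration, not a format used by any particular DBMS.

```python
import struct

# Fixed-length record: field sizes would come from the system catalog.
FIXED = struct.Struct("<i20sd")  # int id, 20-byte name, float salary

def pack_fixed(emp_id, name, salary):
    return FIXED.pack(emp_id, name.encode("utf-8"), salary)

def unpack_fixed(raw):
    emp_id, name, salary = FIXED.unpack(raw)
    return emp_id, name.rstrip(b"\x00").decode("utf-8"), salary

SEP = b"$"  # separator character; assumed never to occur inside a field

def pack_separated(fields):
    return b"".join(f.encode("utf-8") + SEP for f in fields)

def unpack_separated(raw):
    return [part.decode("utf-8") for part in raw.split(SEP)[:-1]]

def pack_with_offsets(fields):
    """Variable-length record prefixed by an array of 2-byte field offsets."""
    encoded = [f.encode("utf-8") for f in fields]
    header_size = 2 * (len(encoded) + 1)
    offsets, pos = [], header_size
    for e in encoded:
        offsets.append(pos)
        pos += len(e)
    offsets.append(pos)  # end-of-record sentinel
    return struct.pack(f"<{len(offsets)}H", *offsets) + b"".join(encoded)

def get_field(raw, i):
    """Read field i directly through the offset array, without scanning."""
    start, end = struct.unpack_from("<2H", raw, 2 * i)
    return raw[start:end].decode("utf-8")
```

The offset array gives direct access to the i-th field, whereas the separator format has to be scanned from the start of the record.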
![Page 30: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/30.jpg)
NULL value representation
• In practice, most sources are "silent" about how this is implemented
• With variable-length records, a null pointer to the record item can be used
• ORACLE, in the documentation for ORA7, presented storing NULL values via a bitmap prefix of the record
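A bitmap prefix can be sketched as below. This is a generic illustration of the idea, not Oracle's actual on-disk format: it assumes one bit per field (set when the field is NULL), string-valued fields, and a 2-byte length in front of each non-NULL value.

```python
def encode_record(values):
    """Prefix the record with a bitmap; bit i is set when field i is NULL (None)."""
    n = len(values)
    bitmap = bytearray((n + 7) // 8)
    body = bytearray()
    for i, v in enumerate(values):
        if v is None:
            bitmap[i // 8] |= 1 << (i % 8)
        else:
            data = str(v).encode("utf-8")
            body += len(data).to_bytes(2, "little") + data
    return bytes(bitmap) + bytes(body)

def decode_record(raw, n_fields):
    """Inverse of encode_record; the field count would come from the catalog."""
    bitmap_len = (n_fields + 7) // 8
    bitmap, pos, values = raw[:bitmap_len], bitmap_len, []
    for i in range(n_fields):
        if bitmap[i // 8] & (1 << (i % 8)):
            values.append(None)
        else:
            length = int.from_bytes(raw[pos:pos + 2], "little")
            values.append(raw[pos + 2:pos + 2 + length].decode("utf-8"))
            pos += 2 + length
    return values
```

NULL fields occupy no space in the record body, and an empty string stays distinguishable from NULL.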
![Page 31: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/31.jpg)
Physical organization of records
[Figure: two page layouts for fixed-length records. "Packed" organization: slots 1, 2, …, N are stored contiguously, followed by free space; the page header stores the number of records N. "Unpacked" (bitmap) organization: slots 1, 2, 3, …, M; the page header stores the number of slots M and a bitmap (bits M … 3 2 1) marking which slots are occupied.]
![Page 32: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/32.jpg)
Physical organization of records 2
[Figure: page i with variable-length records. The data area holds the records with rid = (i,1), rid = (i,2), …, rid = (i,N), followed by free space. The slot directory stores the number of entries in the slot directory (N) and one entry per record (record lengths 20, 16, …, 24 in the example). In addition to the length of each record, the slot directory contains a pointer to the beginning of each record.]
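A slotted page can be sketched as follows. The sketch makes several simplifying assumptions: the slot directory is kept as a Python list instead of being serialized at the end of the page, each directory entry is assumed to cost 4 bytes, slot numbers start at 0 (the figure counts from 1), and deleted space is not compacted.

```python
class SlottedPage:
    """Records grow from the front of the page; a slot directory maps slot -> (offset, length)."""

    SLOT_ENTRY_SIZE = 4  # assumed bytes per directory entry (offset + length)

    def __init__(self, size=4096):
        self.data = bytearray(size)
        self.free_start = 0
        self.slots = []  # slot number -> (offset, length), or None if deleted

    def free_space(self):
        return len(self.data) - self.free_start - self.SLOT_ENTRY_SIZE * len(self.slots)

    def insert(self, record):
        if len(record) + self.SLOT_ENTRY_SIZE > self.free_space():
            raise ValueError("page full")
        offset = self.free_start
        self.data[offset:offset + len(record)] = record
        self.free_start += len(record)
        self.slots.append((offset, len(record)))
        return len(self.slots) - 1  # slot number; rid = (page id, slot number)

    def get(self, slot_no):
        entry = self.slots[slot_no]
        if entry is None:
            return None
        offset, length = entry
        return bytes(self.data[offset:offset + length])

    def delete(self, slot_no):
        self.slots[slot_no] = None  # slot numbers of other records stay valid
```

Because other structures refer to a record only by its rid, a record can be moved within the page by updating its directory entry, without invalidating any external reference.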
![Page 33: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/33.jpg)
Placing records into physical blocks
• Spanned
• Unspanned
![Page 34: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/34.jpg)
Logical organizations of records
• Sequential
• Hashed
• Heap
• Evaluation with respect to the insert, find and delete operations
![Page 35: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/35.jpg)
Sequential organization
![Page 36: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/36.jpg)
Evaluation – sequential organization
• Insert – expensive operation (on average N/2 records have to be shifted) – overflow areas
• Find – binary search on the ordering attribute is possible, O(log2 N); otherwise O(N), i.e. N/2 or N
• Delete – expensive operation (on average N/2 records have to be shifted) – records can be marked as deleted and the file packed later
![Page 37: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/37.jpg)
Internal hashing
![Page 38: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/38.jpg)
Evaluation – hashing
• Insert – O(1) if collisions are not considered; with collisions, O(N) in the worst case
• Find – O(1) on the hash attribute, O(N) on other attributes
• Delete – O(1)
• The structure must be dimensioned for the maximum number of records
![Page 39: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/39.jpg)
External hashing
![Page 40: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/40.jpg)
Evaluation – external hashing
• Same as internal hashing
• Collisions are resolved with overflow blocks (see the next slide)
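External hashing with overflow blocks can be sketched as below. The sketch is an in-memory model only: each `Bucket` object stands for one disk block, the block capacity and the number of buckets are arbitrary assumptions, and Python's built-in `hash` stands in for the hash function.

```python
class Bucket:
    """One disk block holding up to `capacity` records plus an overflow pointer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []     # (key, value) pairs stored in this block
        self.overflow = None  # next block in the overflow chain

class HashFile:
    def __init__(self, n_buckets=4, block_capacity=2):
        self.block_capacity = block_capacity
        self.buckets = [Bucket(block_capacity) for _ in range(n_buckets)]

    def _primary(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        block = self._primary(key)
        while len(block.records) >= block.capacity:
            if block.overflow is None:
                block.overflow = Bucket(self.block_capacity)
            block = block.overflow
        block.records.append((key, value))

    def find(self, key):
        """Return (value, blocks_read); value is None when the key is absent."""
        block, blocks_read = self._primary(key), 0
        while block is not None:
            blocks_read += 1
            for k, v in block.records:
                if k == key:
                    return v, blocks_read
            block = block.overflow
        return None, blocks_read
```

As long as a bucket has no overflow chain, a lookup costs one block read; long chains push the cost toward the O(N) worst case.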
![Page 41: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/41.jpg)
External hashing – overflow blocks
![Page 42: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/42.jpg)
Extendible hashing
![Page 43: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/43.jpg)
Evaluation – extendible hashing
• Same as external hashing
• The advantage is that the "size of the hash array" can be extended dynamically
![Page 44: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/44.jpg)
Heap
• Records are unordered – there is no ordering attribute
• We lose the option of binary search and of a primary index (both available only on an ordering attribute)
• Very efficient INSERT operation
![Page 45: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/45.jpg)
Place of the DBMS within the OS
• Cooked files (e.g. NTFS): DBMS → OS services → Filesystem → Driver
• Raw devices: DBMS → OS services → Driver (no filesystem layer)
![Page 46: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/46.jpg)
Questions
![Page 47: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/47.jpg)
Indexing
Largely based on [2]
All figures from [2]
![Page 48: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/48.jpg)
Index
• An alternative way of accessing data
• Locating a record by its content
![Page 49: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/49.jpg)
Classification of indexes
• By number of levels:
  – Single-level
  – Multi-level
• By indexed attribute:
  – Primary
  – Clustering
  – Secondary
• By number of indexed records:
  – Dense – all records are in the index
  – Sparse – only some of the records are in the index
![Page 50: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/50.jpg)
Primary index
• Indexes the ordering attribute
• Sparse index
• "Anchor" record
• INSERT problem
![Page 51: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/51.jpg)
Clustering index
• Also over a "non-ordering" attribute
• The primary organization is sorted by the given attribute when the index is built
![Page 52: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/52.jpg)
Clustering index
• During normal operation the primary organization is not modified; overflow blocks are used instead
![Page 53: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/53.jpg)
Secondary index
• Index over a non-ordering (but key) attribute
• Dense index
![Page 54: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/54.jpg)
Secondary index
• Over a non-key attribute (repeating values)
![Page 55: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/55.jpg)
Interim evaluation
• So far only single-level indexes
• Benefit (N – number of records, r – records per block)
  – Search on an "ordered" key – log2 N
  – Search on a "non-ordered" key – N/2
  – Search on a non-key attribute – N
  – Primary index – log2(N/r)
  – Secondary index – log2 N (the number of blocks read is substantially smaller, owing to the higher blocking factor)
![Page 56: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/56.jpg)
Example – sequential file (ordering attribute)
• Ordered file with r = 30,000 records
• Block size B = 1024 bytes
• Records are of fixed size and are unspanned
• Record length R = 100 bytes
• The blocking factor bfr = floor(B/R) = floor(1024/100) = 10 records per block
• The number of blocks b = ceiling(r/bfr) = ceiling(30,000/10) = 3000 blocks
• A binary search would need approximately ceiling(log2 b) = ceiling(log2 3000) = 12 block accesses
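The arithmetic of this example can be reproduced directly. The function name is hypothetical; the symbols r, B, R, bfr and b follow the slide.

```python
import math

def blocks_and_binary_search(r, B, R):
    """Return (bfr, b, block accesses) for an ordered file of unspanned fixed-size records."""
    bfr = B // R                   # blocking factor: floor(B / R)
    b = math.ceil(r / bfr)         # number of data blocks
    accesses = math.ceil(math.log2(b))
    return bfr, b, accesses
```

Calling `blocks_and_binary_search(30_000, 1024, 100)` gives a blocking factor of 10, 3000 blocks and 12 block accesses.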
![Page 57: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/57.jpg)
Primary index
• To refresh your memory
![Page 58: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/58.jpg)
Example – primary index
• Key field of the file is V = 9 bytes long, a block pointer is P = 6 bytes
• Size of an index entry Ri = (9 + 6) = 15 bytes, blocking factor bfri = floor(B/Ri) = floor(1024/15) = 68 entries per block
• The total number of index entries ri is equal to the number of blocks in the data file: 3000
• The number of index blocks is hence bi = ceiling(ri/bfri) = ceiling(3000/68) = 45 blocks
• A binary search on the index file would need ceiling(log2 bi) = ceiling(log2 45) = 6 block accesses
• To search for a record using the index, we need one additional block access to the data file, for a total of 6 + 1 = 7 block accesses
![Page 59: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/59.jpg)
Example – secondary index
• As in example 1: r = 30,000, R = 100 bytes, B = 1024 bytes
• To do a linear search, we would require b/2 = 3000/2 = 1500 block accesses (on the average, 3000 in the worst case)
• Suppose V = 9 and P = 6, so bfri = 68
  – The secondary index is dense, so the total number of index entries ri is equal to the number of records: 30,000
  – The number of blocks needed for the index is bi = ceiling(ri/bfri) = ceiling(30,000/68) = 442 blocks
  – A binary search on this secondary index needs ceiling(log2 bi) = ceiling(log2 442) = 9 block accesses
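The index-size arithmetic is the same for a sparse primary index and a dense secondary index; only the number of index entries differs. A short sketch (the function name is hypothetical; B, V, P, bfri and bi follow the slides):

```python
import math

def single_level_index_cost(n_entries, B, V, P):
    """Return (bfri, bi, binary-search block accesses on the index)."""
    bfri = B // (V + P)               # index entries per block
    bi = math.ceil(n_entries / bfri)  # number of index blocks
    return bfri, bi, math.ceil(math.log2(bi))
```

With B = 1024, V = 9 and P = 6, a primary index with 3000 entries (one per data block) gives 45 index blocks and 6 accesses, while a dense secondary index with 30,000 entries gives 442 index blocks and 9 accesses.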
![Page 60: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/60.jpg)
Comparison of (single-level) indexes
![Page 61: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/61.jpg)
Multi-Level Indexes
• Because a single-level index is an ordered file, we can create a primary index to the index itself; in this case, the original index file is called the first-level index and the index to the index is called the second-level index.
• We can repeat the process, creating a third, fourth, ..., top level until all entries of the top level fit in one disk block
• A multi-level index can be created for any type of first-level index (primary, secondary, clustering) as long as the first-level index consists of more than one disk block
![Page 62: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/62.jpg)
Multilevel indexes
• First level – dense or sparse
• Further levels – sparse only
• Top level – a single block
• A search requires approx. log_bfri(bi) block accesses
• INSERT problem!
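The number of levels can be computed by repeatedly indexing the level below until a single block remains. The sketch assumes the same fan-out (bfri) on every level and one entry per block of the level below; the function name is hypothetical.

```python
import math

def index_levels(first_level_entries, fan_out):
    """Return the number of blocks at each level, from level 1 up to the top."""
    if first_level_entries < 1:
        raise ValueError("index must contain at least one entry")
    levels, entries = [], first_level_entries
    while True:
        blocks = math.ceil(entries / fan_out)
        levels.append(blocks)
        if blocks == 1:
            return levels
        entries = blocks  # one entry per block of the level below
```

For the dense secondary index from the earlier example (30,000 entries, fan-out 68) this gives levels of 442, 7 and 1 blocks, so a search reads one block per level (3) plus one data block.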
![Page 63: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/63.jpg)
Dynamic Multilevel Indexes Using B-Trees and B+-Trees
• Because of the insertion and deletion problem, most multi-level indexes use B-tree or B+-tree data structures, which leave space in each tree node (disk block) to allow for new index entries
• These data structures are variations of search trees that allow efficient insertion and deletion of new search values.
• In B-Tree and B+-Tree data structures, each node corresponds to a disk block
• Each node is kept between half-full and completely full
![Page 64: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/64.jpg)
Dynamic Multilevel Indexes Using B-Trees and B+-Trees (contd.)
• An insertion into a node that is not full is quite efficient; if a node is full the insertion causes a split into two nodes
• Splitting may propagate to other tree levels
• A deletion is quite efficient if a node does not become less than half full
• If a deletion causes a node to become less than half full, it must be merged with neighboring nodes
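The split and underflow rules can be illustrated on a single B+-tree leaf. This is a deliberately reduced sketch: it handles only keys (no record pointers), only one node, and does not propagate the separator into a parent; `capacity` is the assumed maximum number of keys per node.

```python
import bisect

def insert_into_leaf(keys, key, capacity):
    """Insert key into a sorted leaf (modified in place); return (left, right, separator).

    right and separator are None when no split was needed.
    """
    bisect.insort(keys, key)
    if len(keys) <= capacity:
        return keys, None, None
    mid = (len(keys) + 1) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]  # in a B+-tree the separator is copied up to the parent

def underflows(keys, capacity):
    """True when the node is less than half full and must borrow or merge."""
    return len(keys) < (capacity + 1) // 2
```

Inserting the separator into the parent may overflow the parent in turn, which is how a split propagates to higher levels.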
![Page 65: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/65.jpg)
Difference between B-tree and B+-tree
• In a B-tree, pointers to data records exist at all levels of the tree
• In a B+-tree, all pointers to data records exist at the leaf-level nodes
• A B+-tree can have fewer levels (or a higher capacity of search values) than the corresponding B-tree
![Page 66: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/66.jpg)
B-tree structure
![Page 67: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/67.jpg)
B+-tree structure
![Page 68: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/68.jpg)
B+-tree example
![Page 69: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/69.jpg)
B-tree example - numbers
![Page 70: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/70.jpg)
B+-tree example - numbers
![Page 71: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/71.jpg)
B-tree – duplicate keys
![Page 72: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/72.jpg)
Questions
![Page 73: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/73.jpg)
System catalog
Based on a presentation
by Ľubomír Miškovič
![Page 74: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/74.jpg)
What is the system catalog
• The system catalog stores data that describe each database (metadata)
• It contains a description of:
  – Items, records, files and the relationships between them
  – The conceptual schema, the external schemas and the internal schema. The mappings between schemas at the different levels are also described here
![Page 75: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/75.jpg)
Simplified model of a database system environment
![Page 76: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/76.jpg)
Contents of the system catalog
• Catalogs for relational DBMSs contain
  – Relation names
  – Attribute names
  – Attribute domains
  – Primary keys
  – Secondary key attributes
  – Foreign keys
  – Constraints
![Page 77: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/77.jpg)
Contents of the system catalog
• They further contain descriptions of
  – External views
  – Storage structures and indexes for the internal level
  – Security and authorization information that defines user access to database views
  – Login names of the creators or owners of each relation
![Page 78: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/78.jpg)
Contents of the system catalog
• They store information such as
  – Record size
  – Current number of records
  – Number of indexes
  – Name of the creator of each relation
![Page 79: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/79.jpg)
Ways of implementing the system catalog
• A system catalog may be created for each database in the system, or it may be shared by all databases
• The system catalog may consist of tables whose structure is the same as that of ordinary database tables, or of a special structure
![Page 80: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/80.jpg)
Example of system catalogs for Informix
• Systables – describes each table in the database. It contains one row for each table, view or synonym defined in the database. This includes all tables in the database, including the system catalog tables
• Syscolumns – defines each column in the database. There is one row for each column defined in a table or view
• Sysindex – describes the indexes in the database. It contains one row for each index defined in the database
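The idea of a catalog stored as ordinary tables can be tried out with SQLite, which ships with Python. Note that this is an analogy, not Informix: SQLite's `sqlite_master` table and `PRAGMA table_info` play roughly the roles described above for the table, index and column catalogs. The table and index names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, name TEXT, dno INTEGER)")
conn.execute("CREATE INDEX idx_employee_dno ON employee (dno)")

def list_objects(conn, kind):
    """Names of catalog entries of the given kind ('table', 'index', 'view', ...)."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = ? ORDER BY name", (kind,))
    return [name for (name,) in rows]

def list_columns(conn, table):
    """(column name, declared type) pairs for a table."""
    # PRAGMA table_info returns rows of (cid, name, type, notnull, dflt_value, pk)
    return [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]
```

Here the catalog is queried with the same SELECT machinery as user data, which corresponds to the "tables with the same structure as database tables" implementation option.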
![Page 81: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/81.jpg)
Systables
![Page 82: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/82.jpg)
syscolumns
![Page 83: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/83.jpg)
Relationship between the tables
![Page 84: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/84.jpg)
Oracle
![Page 85: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/85.jpg)
Postgres
![Page 86: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/86.jpg)
Questions
![Page 87: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/87.jpg)
Relational algebra (RA) and implementation of RA operations
Based on [2]
![Page 88: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/88.jpg)
Relational algebra
• Relation – a subset of a Cartesian product: R ⊆ D1 × ... × Dn
• Relational algebra:
  – A formal language for the relational model
  – A basic set of operations for retrieval queries
![Page 89: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/89.jpg)
Relational algebra operations
• Selection σ
• Projection π
• Cartesian product ×
• Join ⋈ (theta-, equi-, natural-)
• Set operations (union-compatible):
  – Intersection ∩
  – Union ∪
  – Difference \
![Page 90: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/90.jpg)
Elementary condition EC and condition C
• Definition: An elementary (simple) condition EC is a clause of the form:
<Attribute> <Operator> <Value>
where the operator is from the set of relational operators {=, <, >, <=, >=, ≠}.
• Definition: A condition C is a clause of the form:
[NOT] EC1 [{OR | AND} [[NOT] EC2] …]
![Page 91: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/91.jpg)
Examples
• (O1): σ SSN='123456789' (EMPLOYEE)
• (O2): σ DNUMBER>5 (DEPARTMENT)
• (O3): σ DNO=5 (EMPLOYEE)
• (O4): σ DNO=5 AND SALARY>30000 AND SEX='F' (EMPLOYEE)
• (O5): σ ESSN='123456789' AND PNO=10 (WORKS_ON)
![Page 92: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/92.jpg)
SELECT operation
• Definition: σc(R) = { ti ∈ R | c(ti) } (3-valued logic)
• Implementation:
  – Linear search
  – Binary search
  – Using a primary index (or hash key)
  – Using a primary index to retrieve multiple records
  – Using a clustering index to retrieve multiple records
  – Using a secondary (B+-tree) index on an equality comparison
  – ...
![Page 93: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/93.jpg)
S1: Linear search (brute force)
Retrieve every record in the file, and test whether its attribute values satisfy the selection condition.
for every ti
    if (c(ti) == TRUE)
        output(ti)
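A runnable version of the scan, including the three-valued logic mentioned in the definition of selection, might look as follows. It assumes records are dictionaries, NULL is represented by `None`, and a comparison with NULL yields UNKNOWN (also `None`); the helper names `ec`, `and_` and `linear_search` are hypothetical.

```python
import operator

OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt,
       "<=": operator.le, ">=": operator.ge, "≠": operator.ne}

def ec(attribute, op, value):
    """Elementary condition <Attribute> <Operator> <Value>; None stands for UNKNOWN."""
    def test(record):
        field = record.get(attribute)
        return None if field is None else OPS[op](field, value)
    return test

def and_(*conds):
    """Three-valued AND: FALSE dominates, then UNKNOWN, otherwise TRUE."""
    def test(record):
        results = [c(record) for c in conds]
        if any(r is False for r in results):
            return False
        return None if any(r is None for r in results) else True
    return test

def linear_search(records, condition):
    """S1: scan every record and keep those for which the condition is TRUE."""
    return [t for t in records if condition(t) is True]
```

A record whose condition evaluates to UNKNOWN is not output, exactly like one that evaluates to FALSE.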
![Page 94: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/94.jpg)
S2: Binary search
If the selection condition involves an equality comparison on a key attribute on which the file is ordered, binary search can be used.
σ SSN='123456789' (EMPLOYEE)
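Binary search over the blocks of an ordered file can be sketched as below. The sketch assumes the file is a list of non-empty blocks, each block a list of records sorted on the key, and it counts every inspected block as one block access; `key_of` is a hypothetical accessor defaulting to the SSN attribute.

```python
def binary_search_blocks(blocks, key, key_of=lambda rec: rec["SSN"]):
    """Return (record or None, number of blocks read); blocks are ordered on the key."""
    lo, hi, reads = 0, len(blocks) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]
        reads += 1
        if key < key_of(block[0]):
            hi = mid - 1
        elif key > key_of(block[-1]):
            lo = mid + 1
        else:
            # The key can only be in this block; scan it in memory.
            for rec in block:
                if key_of(rec) == key:
                    return rec, reads
            return None, reads
    return None, reads
```

For a file of 3000 blocks the loop reads at most 12 blocks, in line with the ceiling(log2 3000) estimate for an ordered file of that size.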
![Page 95: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/95.jpg)
95
S3: Using a primary index (or hash key)
If the selection condition involves an equality comparison on a key attribute with a primary index (or hash key), use the primary index (or hash key) to retrieve the record. Note that this condition retrieves a single record (at most).
σ_{SSN='123456789'}(EMPLOYEE)
![Page 96: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/96.jpg)
96
S4: Using a primary index to retrieve multiple records
If the comparison condition is >, >=, <, or <= on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, then retrieve all subsequent (or, for < and <=, all preceding) records in the ordered file.
σ_{DNUMBER>5}(DEPARTMENT) (selectivity, distribution)
σ_{DNO=5 AND SALARY>30000 AND SEX='F'}(EMPLOYEE)
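A sketch of S4 for the condition DNUMBER > 5, under simplifying assumptions: the file is an in-memory list ordered on the key, and a sorted list of key values stands in for the primary index. The sample data is made up.

```python
from bisect import bisect_right

# Hypothetical DEPARTMENT file, ordered on the key DNUMBER.
DEPARTMENT = [{"DNUMBER": n} for n in (1, 4, 5, 7, 9)]

# Stand-in for the primary index: key values in file order.
index_keys = [d["DNUMBER"] for d in DEPARTMENT]

def select_greater(records, keys, value):
    """Locate the boundary via the index, then read all subsequent records."""
    start = bisect_right(keys, value)   # first position whose key is > value
    return records[start:]

result = select_greater(DEPARTMENT, index_keys, 5)
```

How much this saves depends on selectivity: if most keys exceed the bound, nearly the whole file is read anyway.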
![Page 97: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/97.jpg)
97
S5: Using a clustering index to retrieve multiple records
If the selection condition involves an equality comparison on a non-key attribute with a clustering index (for example, DNO = 5 in O3), use the index to retrieve all the records satisfying the condition.
σ_{DNO=5}(EMPLOYEE) (if clustered on DNO)
![Page 98: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/98.jpg)
98
S6: Using a secondary (B+-tree) index on an equality comparison
This search method can be used to retrieve a single record if the indexing field is a key (has unique values) or to retrieve multiple records if the indexing field is not a key. This can also be used for comparisons involving >, >=, <, or <=.
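A sketch of S6 in which a sorted list of (value, record position) pairs stands in for the leaf level of a B+-tree; a real B+-tree would reach the same leaf entries through its internal nodes. The file is deliberately not ordered on SALARY, and all data and function names are made up.

```python
from bisect import bisect_left, bisect_right

# Hypothetical EMPLOYEE file, NOT ordered on SALARY (a non-key field).
EMPLOYEE = [
    {"SSN": "1", "SALARY": 40000},
    {"SSN": "2", "SALARY": 25000},
    {"SSN": "3", "SALARY": 40000},
    {"SSN": "4", "SALARY": 55000},
]

# Dense secondary index: one (value, position) entry per record, sorted by value.
index = sorted((r["SALARY"], pos) for pos, r in enumerate(EMPLOYEE))
index_values = [value for value, _ in index]

def lookup_equal(value):
    """Equality comparison: may return several records for a non-key field."""
    lo = bisect_left(index_values, value)
    hi = bisect_right(index_values, value)
    return [EMPLOYEE[pos] for _, pos in index[lo:hi]]

def lookup_greater(value):
    """Range comparison (>): every index entry to the right of the bound."""
    lo = bisect_right(index_values, value)
    return [EMPLOYEE[pos] for _, pos in index[lo:]]
```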
![Page 99: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/99.jpg)
99
S7: Conjunctive selection using an individual index
If an attribute involved in any single simple condition in the conjunctive condition has an access path that permits the use of one of the Methods S2 (binary search) to S6 (B+-tree), use that condition to retrieve the records and then check whether each retrieved record satisfies the remaining simple conditions in the conjunctive condition.
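S7 applied to O4 might look as follows, assuming an index exists only on DNO. A dict from value to record positions stands in for that index, and the sample rows are made up.

```python
from collections import defaultdict

# Hypothetical EMPLOYEE file.
EMPLOYEE = [
    {"SSN": "123456789", "DNO": 5, "SALARY": 30000, "SEX": "M"},
    {"SSN": "453453453", "DNO": 5, "SALARY": 35000, "SEX": "F"},
    {"SSN": "987654321", "DNO": 4, "SALARY": 43000, "SEX": "F"},
    {"SSN": "666884444", "DNO": 5, "SALARY": 38000, "SEX": "M"},
]

# Stand-in for an index on DNO: value -> positions of matching records.
dno_index = defaultdict(list)
for pos, record in enumerate(EMPLOYEE):
    dno_index[record["DNO"]].append(pos)

def select_o4():
    # Step 1: use the one indexed condition (DNO = 5) to fetch candidates.
    candidates = (EMPLOYEE[pos] for pos in dno_index.get(5, []))
    # Step 2: check the remaining simple conditions on each candidate.
    return [r for r in candidates if r["SALARY"] > 30000 and r["SEX"] == "F"]
```

Only the three DNO = 5 records are fetched; the record in department 4 is never read.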
![Page 100: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/100.jpg)
100
S8:Conjunctive selection using a composite index
If two or more attributes are involved in equality conditions in the conjunctive condition and a composite index (or hash structure) exists on the combined fields (for example, if an index has been created on the composite key (ESSN, PNO) of the WORKS_ON file for O5), we can use the index directly.
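For O5, a composite index can be imitated with a dict keyed by the pair (ESSN, PNO). This is a sketch with made-up rows, and the dict is only a stand-in for a real composite index or hash structure.

```python
# Hypothetical WORKS_ON file with composite key (ESSN, PNO).
WORKS_ON = [
    {"ESSN": "123456789", "PNO": 1, "HOURS": 32.5},
    {"ESSN": "123456789", "PNO": 10, "HOURS": 7.5},
    {"ESSN": "453453453", "PNO": 10, "HOURS": 20.0},
]

# Stand-in for the composite index: (ESSN, PNO) -> record position.
composite_index = {(r["ESSN"], r["PNO"]): pos for pos, r in enumerate(WORKS_ON)}

def select_o5(essn, pno):
    """Answer ESSN=... AND PNO=... with a single index lookup."""
    pos = composite_index.get((essn, pno))
    return None if pos is None else WORKS_ON[pos]
```

Both equality conditions are resolved by one lookup, so no candidate records need to be filtered afterwards.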
![Page 101: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/101.jpg)
101
JOIN operation
• R ⋈_c S = { ti ∈ R, tj ∈ S | c(ti, tj) == TRUE }
• Implementation:
– Nested-loop join (brute force)
– Single-loop join (using an access structure to retrieve the matching records)
– Sort-merge join
– Hash-join
![Page 102: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/102.jpg)
102
J1. Nested-loop join (brute force)
For each record t in R (outer loop), retrieve every record s from S (inner loop) and test whether the two records satisfy the join condition c (any theta-join condition is allowed).
for each ti in R
    for each sj in S
        if (c(ti, sj) == TRUE)
            output(ti.sj)
Improvement: nested-block join (block nested-loop join), which processes the files block by block instead of record by record.
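Both variants can be sketched in Python. The assumptions here are that records are dicts with distinct attribute names in R and S, that a "block" is simply a slice of a list, and that the sample tables and the `block_size` parameter are made up.

```python
def nested_loop_join(R, S, c):
    """Brute force: test every pair (t, s) against the join condition c."""
    for t in R:
        for s in S:
            if c(t, s):
                yield {**t, **s}

def blocks(records, size):
    """Split a file into consecutive blocks of at most `size` records."""
    for i in range(0, len(records), size):
        yield records[i:i + size]

def block_nested_loop_join(R, S, c, block_size=2):
    """S is scanned once per block of R rather than once per record of R."""
    for r_block in blocks(R, block_size):
        for s_block in blocks(S, block_size):
            for t in r_block:
                for s in s_block:
                    if c(t, s):
                        yield {**t, **s}

EMPLOYEE = [{"SSN": "1", "DNO": 5}, {"SSN": "2", "DNO": 4}, {"SSN": "3", "DNO": 5}]
DEPARTMENT = [{"DNUMBER": 4, "DNAME": "Administration"},
              {"DNUMBER": 5, "DNAME": "Research"}]

def on_dno(t, s):
    return t["DNO"] == s["DNUMBER"]

plain = [(row["SSN"], row["DNAME"]) for row in nested_loop_join(EMPLOYEE, DEPARTMENT, on_dno)]
blocked = [(row["SSN"], row["DNAME"]) for row in block_nested_loop_join(EMPLOYEE, DEPARTMENT, on_dno)]
```

The two functions produce the same set of result rows; the blocked version only changes how often the inner file is read.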
![Page 103: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/103.jpg)
103
J2. Single-loop join (using an access structure to retrieve the matching records)
If an index (or hash key) exists for one of the two join attributes (say, B of S), retrieve each record t in R, one at a time (single loop), and then use the access structure to retrieve directly all matching records s from S that satisfy s[B] = t[A] (equi-join).
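A sketch of J2, in which a dict built on attribute B of S stands in for the existing index or hash key (in a DBMS the access structure would already exist and would not be built per query). Names and data are illustrative.

```python
from collections import defaultdict

def single_loop_join(R, S, a, b):
    """Equi-join on t[a] == s[b], using an access structure on S."""
    # Stand-in for the pre-existing index on attribute b of S.
    index_on_b = defaultdict(list)
    for s in S:
        index_on_b[s[b]].append(s)

    # Single loop over R; matching S records come straight from the index.
    for t in R:
        for s in index_on_b.get(t[a], []):
            yield {**t, **s}

EMPLOYEE = [{"SSN": "1", "DNO": 5}, {"SSN": "2", "DNO": 4}, {"SSN": "3", "DNO": 5}]
DEPARTMENT = [{"DNUMBER": 4, "DNAME": "Administration"},
              {"DNUMBER": 5, "DNAME": "Research"}]

joined = [(row["SSN"], row["DNAME"])
          for row in single_loop_join(EMPLOYEE, DEPARTMENT, "DNO", "DNUMBER")]
```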
![Page 104: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/104.jpg)
104
J3. Sort-merge join
If the records of R and S are physically sorted (ordered) by value of the join attributes A and B, respectively, we can implement the join in the most efficient way possible.
Both files are scanned concurrently in order of the join attributes, matching the records that have the same values for A and B. If the files are not sorted, they may be sorted first by using external sorting.
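The concurrent scan can be sketched as below. The in-memory `sorted` calls stand in for external sorting, join values may repeat on both sides, and the sample data is made up.

```python
def sort_merge_join(R, S, a, b):
    """Equi-join on t[a] == s[b] by merging two sorted files."""
    R = sorted(R, key=lambda t: t[a])   # stand-in for external sorting
    S = sorted(S, key=lambda s: s[b])
    i = j = 0
    result = []
    while i < len(R) and j < len(S):
        if R[i][a] < S[j][b]:
            i += 1
        elif R[i][a] > S[j][b]:
            j += 1
        else:
            value = R[i][a]
            # Find the run of S records that share this join value.
            j_end = j
            while j_end < len(S) and S[j_end][b] == value:
                j_end += 1
            # Pair every R record having this value with the whole S run.
            while i < len(R) and R[i][a] == value:
                for s in S[j:j_end]:
                    result.append({**R[i], **s})
                i += 1
            j = j_end
    return result

R = [{"A": 2, "X": "b"}, {"A": 1, "X": "a"}, {"A": 2, "X": "c"}]
S = [{"B": 3, "Y": "r"}, {"B": 2, "Y": "p"}, {"B": 2, "Y": "q"}]
pairs = [(row["X"], row["Y"]) for row in sort_merge_join(R, S, "A", "B")]
```

After sorting, each file is read once, except that a run of equal values in S is revisited for every matching R record.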
![Page 105: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/105.jpg)
105
J4. Hash-join
• The records of files R and S are both hashed to the same hash file, using the same hashing function on the join attributes A of R and B of S as hash keys.
• First, a single pass through the file with fewer records (say, R) hashes its records to the hash file buckets (partitioning phase - records of R are partitioned into the hash buckets).
• In the second phase (probing phase), a single pass through the other file (S) then hashes each of its records to probe the appropriate bucket, and that record is combined with all matching records from R in that bucket.
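The two phases can be sketched as follows, assuming the hash file of the smaller relation fits in memory (a dict of buckets stands in for it). Function and variable names are illustrative.

```python
from collections import defaultdict

def hash_join(R, S, a, b):
    """Equi-join on t[a] == s[b]; R is assumed to be the smaller file."""
    # Partitioning phase: one pass over R fills the hash buckets.
    buckets = defaultdict(list)
    for t in R:
        buckets[hash(t[a])].append(t)

    # Probing phase: one pass over S, each record probes a single bucket.
    for s in S:
        for t in buckets.get(hash(s[b]), []):
            if t[a] == s[b]:          # guard against hash collisions
                yield {**t, **s}

DEPARTMENT = [{"DNUMBER": 4, "DNAME": "Administration"},
              {"DNUMBER": 5, "DNAME": "Research"}]
EMPLOYEE = [{"SSN": "1", "DNO": 5}, {"SSN": "2", "DNO": 4}, {"SSN": "3", "DNO": 5}]

joined = [(row["DNAME"], row["SSN"])
          for row in hash_join(DEPARTMENT, EMPLOYEE, "DNUMBER", "DNO")]
```

Each file is read exactly once, which is why the smaller file is chosen for the partitioning phase.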
![Page 106: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/106.jpg)
106
PROJECT operation
π_{<attribute list>}(R)
• Implementation:
– Straightforward to implement if <attribute list> includes a key of relation R: the result has the same number of records as R.
– If <attribute list> does not include a key of R, duplicate tuples must be eliminated (sorting, hashing).
– An index can be used in some cases.
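Duplicate elimination by hashing can be sketched with a Python set, whose membership test is hash-based. The sample rows are made up, and the projected attributes (DNO, SEX) deliberately do not include a key.

```python
def project(records, attributes):
    """Project onto the given attributes and eliminate duplicate tuples."""
    seen = set()        # hash-based duplicate detection
    result = []
    for record in records:
        row = tuple(record[attr] for attr in attributes)
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result

EMPLOYEE = [
    {"SSN": "1", "DNO": 5, "SEX": "M"},
    {"SSN": "2", "DNO": 5, "SEX": "F"},
    {"SSN": "3", "DNO": 5, "SEX": "M"},
    {"SSN": "4", "DNO": 4, "SEX": "F"},
]

without_key = project(EMPLOYEE, ["DNO", "SEX"])
with_key = project(EMPLOYEE, ["SSN", "DNO"])
```

When the key SSN is included, no two projected tuples can coincide, so the duplicate check never removes anything.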
![Page 107: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/107.jpg)
107
SET operation
• The CARTESIAN PRODUCT operation R × S is quite expensive, because its result includes a record for each combination of records from R and S.
• Can be improved by processing at the block level
• UNION, INTERSECTION, and SET DIFFERENCE apply only to union-compatible relations (that have the same number of attributes and the same attribute domains).
• Implementation - sort-merge technique and hashing
![Page 108: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/108.jpg)
108
Sort-merge technique (for the SET operation)
• The two relations are sorted on the same attributes.
• After sorting, a single scan through each relation is sufficient to produce the result.
• For example, we can implement the UNION operation, R ∪ S, by scanning and merging both sorted files concurrently, and whenever the same tuple exists in both relations, only one is kept in the merged result.
• For the INTERSECTION operation, R ∩ S, we keep in the merged result only those tuples that appear in both relations.
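A sketch of the merge step for UNION and INTERSECTION, assuming tuples are Python tuples, each input relation contains no duplicates, and in-memory `sorted` stands in for external sorting.

```python
def merge_union(R, S):
    """R ∪ S: merge both sorted files, keeping one copy of common tuples."""
    R, S = sorted(R), sorted(S)
    i = j = 0
    out = []
    while i < len(R) and j < len(S):
        if R[i] < S[j]:
            out.append(R[i])
            i += 1
        elif R[i] > S[j]:
            out.append(S[j])
            j += 1
        else:                   # same tuple in both relations: keep one
            out.append(R[i])
            i += 1
            j += 1
    out.extend(R[i:])           # leftovers from whichever file remains
    out.extend(S[j:])
    return out

def merge_intersection(R, S):
    """R ∩ S: keep only tuples that appear in both sorted files."""
    R, S = sorted(R), sorted(S)
    i = j = 0
    out = []
    while i < len(R) and j < len(S):
        if R[i] < S[j]:
            i += 1
        elif R[i] > S[j]:
            j += 1
        else:
            out.append(R[i])
            i += 1
            j += 1
    return out

R = [(3,), (1,), (2,)]
S = [(2,), (4,), (3,)]
```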
![Page 109: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/109.jpg)
109
Hashing (for the SET operation)
• One table is partitioned and the other is used to probe the appropriate partition.
• For example, to implement R ∪ S, first hash (partition) the records of R; then, hash (probe) the records of S, but do not insert duplicate records in the buckets.
• To implement R ∩ S, first partition the records of R to the hash file. Then, while hashing each record of S, probe to check if an identical record from R is found in the bucket, and if so add the record to the result file.
• To implement R - S, first hash the records of R to the hash file buckets. While hashing (probing) each record of S, if an identical record is found in the bucket, remove that record from the bucket.
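The three hashing variants can be sketched with a dict of buckets standing in for the hash file. The sketch assumes tuples are hashable Python tuples and that S itself contains no duplicates; all names are illustrative.

```python
from collections import defaultdict

def partition(R):
    """Hash the records of R into buckets (no duplicates within a bucket)."""
    buckets = defaultdict(list)
    for t in R:
        if t not in buckets[hash(t)]:
            buckets[hash(t)].append(t)
    return buckets

def hash_union(R, S):
    buckets = partition(R)
    for s in S:                          # probe; insert only if not present
        if s not in buckets[hash(s)]:
            buckets[hash(s)].append(s)
    return [t for bucket in buckets.values() for t in bucket]

def hash_intersection(R, S):
    buckets = partition(R)
    result = []
    for s in S:                          # probe; output records found in R
        if s in buckets.get(hash(s), []):
            result.append(s)
    return result

def hash_difference(R, S):
    buckets = partition(R)
    for s in S:                          # probe; remove records found in S
        bucket = buckets.get(hash(s), [])
        if s in bucket:
            bucket.remove(s)
    return [t for bucket in buckets.values() for t in bucket]

R = [(1,), (2,), (3,)]
S = [(2,), (3,), (4,)]
```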
![Page 110: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/110.jpg)
110
Implementing Aggregate Operations
• The aggregate operators (MIN, MAX, COUNT, AVERAGE, SUM), when applied to an entire table, can be computed by a table scan or by using an appropriate index, if available.
• For example, consider the following SQL query: SELECT MAX(SALARY) FROM EMPLOYEE;
• If an (ascending) index on SALARY exists for the EMPLOYEE relation, then the optimizer can decide on using the index to search for the largest value by following the rightmost pointer in each index node from the root to the rightmost leaf.
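The "follow the rightmost pointer" idea can be sketched on a toy tree. The node layout (dicts with `children` or `keys`) and the salary values are made up for illustration and do not reflect any particular DBMS index format.

```python
def index_max(node):
    """Follow the rightmost pointer from the root down to the rightmost leaf."""
    while "children" in node:
        node = node["children"][-1]
    return node["keys"][-1]      # largest key in an ascending index

def index_min(node):
    """Symmetric case for MIN: follow the leftmost pointers."""
    while "children" in node:
        node = node["children"][0]
    return node["keys"][0]

# Toy ascending index on SALARY: internal nodes hold children, leaves hold keys.
salary_index = {"children": [
    {"children": [{"keys": [25000, 30000]}, {"keys": [38000]}]},
    {"children": [{"keys": [40000, 43000]}, {"keys": [55000]}]},
]}
```

Only one node per tree level is visited, so neither the table nor the rest of the index has to be scanned.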
![Page 111: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/111.jpg)
111
Implementing Aggregate Operations
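• An index can also be used for the COUNT, AVERAGE, and SUM aggregates, but only if it is a dense index (one index entry for every record in the file).
• The associated computation is then applied to the values in the index instead of the records of the table.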
![Page 112: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/112.jpg)
112
GROUP BY clause
• When a GROUP BY clause is used in a query, the aggregate operator must be applied separately to each group of tuples.
• In this case, the computation is more complex - the table must first be partitioned into subsets of tuples, where each partition (group) has the same value for the grouping attributes.
• Sorting or hashing is used to partition the file into the appropriate groups.
• If a clustering index exists on the grouping attributes, then the records are already partitioned (grouped) into the appropriate subsets.
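Hash-based grouping can be sketched with a dict from grouping value to the values being aggregated. The example corresponds to a query such as SELECT DNO, AVG(SALARY) FROM EMPLOYEE GROUP BY DNO; the function name and sample rows are made up.

```python
from collections import defaultdict

def group_by_avg(records, group_attr, value_attr):
    """Partition records by group_attr (hashing), then average each group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_attr]].append(record[value_attr])
    return {key: sum(values) / len(values) for key, values in groups.items()}

EMPLOYEE = [
    {"SSN": "1", "DNO": 5, "SALARY": 30000},
    {"SSN": "2", "DNO": 5, "SALARY": 40000},
    {"SSN": "3", "DNO": 4, "SALARY": 43000},
]

averages = group_by_avg(EMPLOYEE, "DNO", "SALARY")
```

With a clustering index on DNO the partitioning pass would be unnecessary, since records of the same group are already stored together.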
![Page 113: Systém riadenia bázy dát (Database Management System)](https://reader036.vdocuments.site/reader036/viewer/2022081501/568151a6550346895dbfd432/html5/thumbnails/113.jpg)
113
• Questions