LSST Database server
Osman AÏDEL
Thursday 22 October 2015
Plan
- Infrastructure
- State of the art
- Post ingestion procedure
- Indexes
Thursday 20 April 2023 | Osman AÏDEL | 2
Infrastructure
Server: NEC Express5800/120Rg-1
- 2 processors Intel(R) Xeon(R) CPU 5160 @ 3.00GHz (dual core)
- 40 GB RAM
- 2 SAS disks (15k rpm) of 300 GB (RAID 1)
- QLogic 4 Gb/s FC dual-port card
Storage: Pillar Axiom 600
- 1 SAN Slammer
- 8 x Brick SATA v2 (13 x 2 TB)
File system: ext4, block size 4 KB
- 12 TB allocated in RAID 5
State of the art
- 18 databases
- Top 5: SDSS DR7 Survey Stripe 82
State of the art
Table RunDeepForcedSource
- MyISAM engine
- 87 columns
- Single primary key
- 7 indexes (keys)
- 2 billion rows
- Row size: 395 B
- Data size: around 710 GB
- Index size: around 150 GB
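
As a rough consistency check on the RunDeepForcedSource figures, the sketch below multiplies the row count by the row size. It assumes fixed-length rows with no per-row overhead, decimal gigabytes, and that "2 billion" is a rounded value; the helper names are illustrative and do not come from the slides.

```python
def table_size_bytes(n_rows: int, row_bytes: int) -> int:
    """Raw data size for fixed-length rows, ignoring any storage overhead."""
    return n_rows * row_bytes


def implied_rows(data_bytes: float, row_bytes: int) -> float:
    """Row count implied by a data size and a fixed row size."""
    return data_bytes / row_bytes


ROW_BYTES = 395            # row size quoted on the slide
N_ROWS = 2_000_000_000     # "2 billion rows", presumably rounded (assumption)
GB = 10**9                 # decimal gigabyte (assumption about the slide's unit)

size = table_size_bytes(N_ROWS, ROW_BYTES)
print(f"2e9 rows x 395 B = {size / GB:.0f} GB")
print(f"710 GB / 395 B = {implied_rows(710 * GB, ROW_BYTES) / 1e9:.2f} billion rows")
```

Under these assumptions the quoted 710 GB corresponds to roughly 1.8 billion rows, which is compatible with "2 billion" being a rounded figure.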
Post ingestion procedure
Total execution time: 154 hours (6 days 10 hours)
https://dev.lsstcorp.org/trac/wiki/Summer2013/ConfigAndStackTestingPlans/DedupeForcedSources
Post ingestion procedure
Step 2: enabling keys on RunDeepForcedSource
– MySQL needs to sort the data before building the index
– MySQL has two algorithms for this sort
  • Key cache: recommended for small indexes
  • Filesort: recommended for huge indexes
– From the mysql client
  • Anyone can run it
  • Very difficult to force the filesort algorithm
  • Unpredictable switching from the filesort to the key cache algorithm
Post ingestion procedure
Step 2: enabling keys on RunDeepForcedSource
– From the myisamchk command
  • Possibility to parallelize the index building
  • More flexible than the mysql client
  • Best performance single-threaded with myisam_sort_buffer_size=2GB
  • A bug prevents sort_buffer_size from exceeding 4 GB
  • Local access is required
  • The table is locked
– Consumes a lot of disk space (1 TB)
Steps 6-7: updating data on RunDeepForcedSource
– Both steps were initially merged into one step
– Execution time very, very long: more than 4 days
Post ingestion procedure
Procedure only run at CC-IN2P3:
– Qserv: not yet
– At NCSA: not yet
Suggestions:
– Step 1: loading data (40 hours)
  • Create table RunDeepForcedSource without primary key
  • Alter table add primary key
– Steps 2-3 could be grouped together
– RunDeepForcedSource is, by default, a heap table; it might be interesting to sort the data at the datafile level
– Partitioning, but on which column?
Indexes
id  First name  Last name
1   a           a
3   z           z
5   e           e
8   r           r
7   t           t
2   y           y
4   u           u
6   i           i
9   k           k
0   p           p
SELECT * FROM mytable WHERE id = 5;
Without an index => full table scan. Significant cost => O(n)
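
The full scan can be sketched in a few lines. This only illustrates the O(n) cost and is not how MySQL is implemented; the `full_scan` helper and the in-memory list standing in for `mytable` are assumptions made for the example.

```python
# Rows of the toy table from the slide, in storage order: (id, first_name, last_name).
MYTABLE = [
    (1, "a", "a"), (3, "z", "z"), (5, "e", "e"), (8, "r", "r"), (7, "t", "t"),
    (2, "y", "y"), (4, "u", "u"), (6, "i", "i"), (9, "k", "k"), (0, "p", "p"),
]


def full_scan(rows, wanted_id):
    """Return (matching rows, number of rows examined) without using an index."""
    matches = []
    examined = 0
    for row in rows:
        examined += 1          # every row is read, whether it matches or not
        if row[0] == wanted_id:
            matches.append(row)
    return matches, examined


matches, examined = full_scan(MYTABLE, 5)
print(matches, examined)
```

Since nothing tells the scan that `id` is unique, it cannot stop at the first match, so all 10 rows are examined to return a single one.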
Indexes
[Figure: the mytable rows (id, first name, last name) next to an index on id: sorted keys 0 to 9 under the upper-level keys 3, 6, 9]
CREATE INDEX idx ON mytable(id);
SELECT * FROM mytable WHERE id = 5;
Low cost => O(log n)
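
A minimal sketch of why the indexed lookup is logarithmic: a sorted list of (id, row position) pairs searched by bisection stands in for the index tree. Real MySQL indexes are B-trees stored on disk, so this is a simplification; `build_index` and `index_lookup` are hypothetical names used only for this illustration.

```python
from bisect import bisect_left

# Rows of the toy table in storage order: (id, first_name, last_name).
MYTABLE = [
    (1, "a", "a"), (3, "z", "z"), (5, "e", "e"), (8, "r", "r"), (7, "t", "t"),
    (2, "y", "y"), (4, "u", "u"), (6, "i", "i"), (9, "k", "k"), (0, "p", "p"),
]


def build_index(rows):
    """Sorted list of (key, row position), a stand-in for the index on id."""
    return sorted((row[0], pos) for pos, row in enumerate(rows))


def index_lookup(index, rows, wanted_id):
    """Binary-search the index, then fetch the row it points to (or None)."""
    # (wanted_id, -1) sorts before any real entry with the same key.
    i = bisect_left(index, (wanted_id, -1))
    if i < len(index) and index[i][0] == wanted_id:
        return rows[index[i][1]]
    return None


idx = build_index(MYTABLE)
print(index_lookup(idx, MYTABLE, 5))
```

The bisection takes on the order of log2(n) steps: a handful for 10 keys, and about 31 for 2 billion keys.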
Indexes
[Figure: the mytable rows (id, first name, last name) next to the index on id: sorted keys 0 to 9 under the upper-level keys 3, 6, 9]
-> Optimizer based on cost
-> Index selectivity = cardinality / number of rows
-> Example: index selectivity = 1
SELECT * FROM mytable WHERE id > 3;
High cost -> full scan
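
The selectivity formula and the range query can be checked on the toy table. The sketch below computes selectivity as the number of distinct values divided by the number of rows, and the fraction of rows matched by `id > 3`. The optimizer's cost model itself is not reproduced here, and the helper names are illustrative.

```python
IDS = [1, 3, 5, 8, 7, 2, 4, 6, 9, 0]   # id column of the toy table, in storage order


def selectivity(values):
    """Index selectivity = cardinality (distinct values) / number of rows."""
    return len(set(values)) / len(values)


def matched_fraction(values, predicate):
    """Fraction of rows satisfying a predicate."""
    return sum(1 for v in values if predicate(v)) / len(values)


print(selectivity(IDS))                          # 1.0: every id is distinct
print(matched_fraction(IDS, lambda v: v > 3))    # 0.6: six of the ten rows match
```

With 60% of the rows matching, going through the index would mean one extra lookup per matching row, so reading the whole table sequentially can be the cheaper plan even though the index is perfectly selective.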
Indexes
[Figure: datafile laid out in blocks; block size: 4 KB]