TRANSCRIPT
- 1. Thesis presentation, Yakham NDIAYE, November 13, 2001: Interoperability of a Scalable Distributed Data Manager with an Object-Relational DBMS
2.
- Develop techniques for the interoperability of a DBMS with an external SDDS file.
- Examine the architectural issues that determine how efficient such a coupling can be.
- Validate our technical choices through prototyping and experimental performance analysis.
- Our approach lies at the intersection of main-memory DBMSs, object-relational DBMSs with foreign functions, and distributed/parallel DBMSs.
Objective 3.
- Multicomputers
- SDDSs
- AMOS-II & DB2 DBMSs
- Coupling SDDS and AMOS-II
- Coupling SDDS and DB2
- Experimental analysis
- Conclusion
Plan 4. Multicomputers
- A collection of loosely coupled computers
- - Computers interconnected by high-speed local area networks.
- Cost/Performance
- - Potentially offers storage and processing capabilities rivaling a supercomputer at a fraction of the cost.
- New architectural concepts
- - Offer applications the cumulative CPU and storage capabilities of a large number of interconnected computers.
5.
- New data structures designed specifically for multicomputers
- Data are structured
- - Records with keys
- - Parallel scans & function shipping
- Data are on servers
- - Waiting for access
- Overflowing servers split into new servers
- - Appended to the file without informing the clients
- Queries come from multiple autonomous clients
- - Access initiators
- - Not using any centralized directory for access computations
- For more, see: http://ceria.dauphine.fr
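The behavior listed above (keyed records, servers that split on overflow, clients that address buckets without a central directory) can be pictured with a small sketch. Everything in it is an assumption made for illustration: the slides do not say which SDDS scheme is used, so range partitioning is chosen here, and the names `Bucket`, `SDDSFile`, and `Client` as well as the image-correction step are hypothetical.

```python
from bisect import bisect_right


class Bucket:
    """One server's in-memory bucket holding keys in [low, high)."""

    def __init__(self, low, high, capacity):
        self.low, self.high, self.capacity = low, high, capacity
        self.records = {}


class SDDSFile:
    """Hypothetical range-partitioned file that grows by bucket splits."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.buckets = [Bucket(float("-inf"), float("inf"), capacity)]

    def locate(self, key):
        # Server-side view: the bucket that really owns the key.
        lows = [b.low for b in self.buckets]
        return self.buckets[bisect_right(lows, key) - 1]

    def insert(self, key, value):
        bucket = self.locate(key)
        bucket.records[key] = value
        if len(bucket.records) > bucket.capacity:
            self._split(bucket)

    def _split(self, bucket):
        # Move the upper half of the keys to a new bucket; clients are not told.
        keys = sorted(bucket.records)
        middle = keys[len(keys) // 2]
        new = Bucket(middle, bucket.high, self.capacity)
        for k in keys[len(keys) // 2:]:
            new.records[k] = bucket.records.pop(k)
        bucket.high = middle
        self.buckets.insert(self.buckets.index(bucket) + 1, new)


class Client:
    """Client with a private, possibly outdated image of bucket boundaries."""

    def __init__(self, sdds):
        self.sdds = sdds
        self.image = [float("-inf")]  # lower bounds the client knows about
        self.forwards = 0

    def lookup(self, key):
        guess_low = self.image[bisect_right(self.image, key) - 1]
        owner = self.sdds.locate(key)  # stands in for server-side forwarding
        if owner.low != guess_low:
            # Addressing error: count the forward and correct the image.
            self.forwards += 1
            self.image = sorted(set(self.image) | {owner.low})
        return owner.records.get(key)


sdds_file = SDDSFile(capacity=4)
for k in range(1, 21):
    sdds_file.insert(k, f"r{k}")
client = Client(sdds_file)
```

A first `client.lookup(15)` is misdirected because the client still believes the file has one bucket; the second lookup of the same key goes straight to the right bucket, since the image was corrected without consulting any directory.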
SDDS 6.
- AMOS-II: Active Mediating Object System
- A main-memory database system.
- Declarative query language: AMOSQL.
- External data sources capability.
- External programs interface with AMOS-II using:
- - Call-level interface (call-in)
- - Foreign functions (call-out)
- For more, see the AMOS-II page: http://www.dis.uu.se/~udbl/
AMOS-II DBMS 7.
- IBM object-relational DBMS: DB2 Universal Database.
- Typical representative of a commercial object-relational DBMS.
- Able to handle external data through user-defined functions (UDFs).
DB2 Universal Database 8. Coupling Strategies
- AMOS-SDDS Strategy:
- - For a scalable RAM file supporting database queries;
- - Use a DBMS for manipulations best handled through the query language;
- - Direct fast data access for manipulations not supported well, or at all, by a DBMS;
- - Distributed query processing with function shipping.
9. AMOS-SDDS System
[Figure: AMOS-SDDS scalable parallel query processing]
10. Coupling Strategies
- SD-AMOS Strategy:
- - Uses AMOS-II as the memory manager at each SDDS storage site;
- - Scalable generalization of a parallel DBMS;
- - Data partitioning becomes dynamic.
11. SD-AMOS System
[Figure: SD-AMOS scalable parallel query processing]
12. Coupling SDDS & DB2
- DB2-SDDS Strategy:
- - Coupling of a DBMS with an external data repository offering direct fast data access.
- - Use of an SDDS file by a DBMS as an external data repository.
- - Offer the user a richer interface than that of the SDDS manager, in particular through the DBMS query language.
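One way to picture this strategy is a pair of row-producing functions that hand SDDS records to the DBMS, which then applies its own operators (ordering, filtering) on top. The sketch below is only an illustration under assumptions: the `BUCKETS` data, the `(ssn, name, city)` layout, the strict bounds of `range_query`, and both function names are hypothetical stand-ins, not the thesis code or the DB2 API.

```python
from typing import Dict, Iterator, List, Tuple

Row = Tuple[int, str, str]  # (ssn, name, city): an assumed record layout

# Hypothetical stand-in for the SDDS file: one dict of records per server.
BUCKETS: List[Dict[int, Tuple[str, str]]] = [
    {1: ("Ann", "Paris"), 40: ("Bob", "Dakar")},
    {120: ("Cy", "Uppsala"), 75: ("Dee", "Paris")},
]


def scan() -> Iterator[Row]:
    """Yield every record of every bucket, one row at a time."""
    for bucket in BUCKETS:
        for ssn, (name, city) in bucket.items():
            yield (ssn, name, city)


def range_query(key_min: int, key_max: int) -> Iterator[Row]:
    """Yield records whose key lies strictly between key_min and key_max."""
    for row in scan():
        if key_min < row[0] < key_max:
            yield row


# The caller plays the DBMS role: it sorts the returned rows by name.
rows_by_name = sorted(range_query(1, 100), key=lambda r: r[1])
```

Producing rows lazily mirrors how a table function is consumed one row at a time; a real coupling would fetch the records from the SDDS servers instead of a local list.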
13. Coupling SDDS & DB2
[Figure: DB2-SDDS overall architecture]
Register a user-defined external table function:
CREATE FUNCTION scan(Varchar(20)) RETURNS TABLE (ssn integer, name Varchar(20), city Varchar(20)) EXTERNAL NAME 'interface!fullscan'
14. Coupling SDDS & DB2
Foreign functions to access SDDS records from DB2:
range(cleMin, cleMax) -> list of records such that cleMin < key < cleMax
scan(nom_fichier) -> list of all records of the file
Sample queries:
- Parallel scan: all SDDS records.
select * from table( scan(fichier) ) as table_sdds(SSN, NAME, CITY)
- Range query: SDDS records where the key is between 1 and 100.
select * from table( range(1, 100) ) as table_sdds(SSN, NAME, CITY) order by Name
15.
- Six Pentium III 700 MHz machines with 256 MB of RAM, running Windows 2000.
- Connected by a 100 Mbit/s Ethernet network.
- One site is used as the client and the other five as servers.
- We run several servers on the same machine (up to 3 per machine).
- The file scales from 1 to 15 servers.
The Hardware 16.
- Benchmark data:
- - Table Person (SS#, Name, City).
- - Size: 20,000 to 300,000 tuples of 25 bytes.
- - 50 cities.
- - Random distribution.
- Benchmark query: couples of persons in the same city
- - Query 1: the file resides at a single AMOS-II.
- - Query 2: the file resides at AMOS-SDDS.
- - Join evaluation: two strategies.
- Measures:
- - Speed-up & scale-up
- - Processing time of aggregate functions
Benchmark queries 17. Server Query Processing
- E-strategy
- - Data stay external to AMOS, within the SDDS bucket
- - Custom foreign functions perform the query
- I-strategy
- - Data are dynamically imported into AMOS-II
- - Possibly with local index creation
- - Deleted after processing
- - Good for joins
- - AMOS performs the query
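The benchmark join (couples of persons in the same city) can be evaluated with or without an index, which is the choice the I-strategy offers once data are imported. The following is a minimal sketch of the two evaluation methods on toy data. It is not the AMOS-II implementation; the sample rows, the function names, and the choice to report each couple once are assumptions.

```python
from collections import defaultdict
from itertools import combinations

# Toy Person rows: (ssn, name, city). Illustrative data only.
persons = [
    (1, "Ann", "Paris"),
    (2, "Bob", "Dakar"),
    (3, "Cy", "Paris"),
    (4, "Dee", "Paris"),
    (5, "Eve", "Dakar"),
]


def nested_loop_join(rows):
    """Compare every pair of rows: the work grows quadratically."""
    return [(a[0], b[0]) for a, b in combinations(rows, 2) if a[2] == b[2]]


def index_join(rows):
    """Build a city index first, then pair rows only within each city."""
    index = defaultdict(list)
    for row in rows:
        index[row[2]].append(row)
    pairs = []
    for group in index.values():
        pairs.extend((a[0], b[0]) for a, b in combinations(group, 2))
    return pairs


couples_nested = sorted(nested_loop_join(persons))
couples_indexed = sorted(index_join(persons))
```

Both functions return the same couples; the indexed version pays a one-time build cost to avoid comparing persons from different cities.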
18. Speed-up
Elapsed time of Query 2 according to the strategy, for a file of 20,000 records distributed over 1 to 5 servers.
[Figures: I-Strategy for Query 2: elapsed time; E-Strategy for Query 2: elapsed time; Elapsed time per tuple of Query 2 according to the strategy]
E-Strategy:
| Server nodes | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Elapsed time (s) | 1,344 | 681 | 468 | 358 | 288 |
| Time per tuple (ms) | 67.2 | 34 | 23.4 | 17.9 | 14.4 |
I-Strategy:
| Server nodes | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Nested-loop (s) | 128 | 78 | 64 | 55 | 48 |
| Index lookup (s) | 60 | 39 | 37 | 36 | 32 |
19.
- The results show a significant advantage of the I-Strategy over the E-Strategy for evaluating the join query.
- With 5 servers, the I-Strategy is 6 times faster with the nested loop and 9 times faster when an index is created.
- This favorable result led us to study the scale-up characteristics of AMOS-SDDS on a file that scales up to 300,000 tuples.
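The ratios quoted above can be rechecked from the measured elapsed times. The short calculation below copies the figures from the speed-up slide. It reads the first series (1,344 s down to 288 s) as the E-Strategy, which is the reading the 6x and 9x ratios imply. The helper name `speed_up` is an assumption.

```python
# Elapsed times in seconds for 1 to 5 servers, copied from the speed-up slide.
e_strategy = [1344, 681, 468, 358, 288]  # assumed to be the E-Strategy series
nested_loop = [128, 78, 64, 55, 48]      # I-Strategy, nested-loop join
index_lookup = [60, 39, 37, 36, 32]      # I-Strategy, join with index lookup


def speed_up(times):
    """Speed-up relative to the single-server time, rounded to 2 decimals."""
    return [round(times[0] / t, 2) for t in times]


e_speed_up = speed_up(e_strategy)

# Advantage of the I-Strategy over the E-Strategy with 5 servers.
ratio_nested = e_strategy[-1] / nested_loop[-1]
ratio_index = e_strategy[-1] / index_lookup[-1]
```

With these numbers the E-Strategy speed-up on 5 servers is about 4.67, while the I-Strategy series flatten earlier because their single-server times are already much lower.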
Discussion
20. Scaling the number of servers
[Figure: Elapsed time of join queries to AMOS-SDDS]
Q1 = AMOS-SDDS join; Q2 = AMOS-SDDS join with count.
Time per tuple (extrapolated for AMOS-SDDS):
| File size | 20,000 | 60,000 | 100,000 | 160,000 | 200,000 | 240,000 | 300,000 |
|---|---|---|---|---|---|---|---|
| # SDDS servers | 1 | 3 | 5 | 8 | 10 | 12 | 15 |
| Q1 (ms) | 3.05 | 5.02 | 6.84 | 11.36 | 12.77 | 16.25 | 18.55 |
| Q2 (ms) | 2.55 | 3.08 | 3.35 | 6.16 | 6.39 | 8.43 | 8.75 |
| Q1 w. extrap. (ms) | 3.05 | 5.02 | 6.84 | 8.28 | 9.6 | 10.64 | 12.72 |
| Q2 w. extrap. (ms) | 2.55 | 3.08 | 3.35 | 3.11 | 3.2 | 2.84 | 2.94 |
| AMOS-II (ms) | 2.30 | 7.17 | 12.01 | 19.41 | 24.12 | 29.08 | 36.44 |
21. Scaling the number of servers
- Results are extrapolated to 1 server per machine.
- - Basically, the CPU component of the elapsed time is divided by 3.
- The extrapolated processing time of the join query with count shows linear scalability of the system.
- Processing time per tuple stays nearly constant (2.94 ms at 300,000 tuples on 15 servers) when the file size and the number of servers increase by the same factor.
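The scale-up claim can be checked against the per-tuple times reported for the scaled file. The sketch below copies the extrapolated Q2 series and the single-site AMOS-II series from the scaling table and compares how much each grows while the file grows 15-fold. The helper name `growth` is an assumption.

```python
# Time per tuple in ms as the file grows from 20,000 to 300,000 tuples.
file_sizes = [20_000, 60_000, 100_000, 160_000, 200_000, 240_000, 300_000]
q2_extrapolated = [2.55, 3.08, 3.35, 3.11, 3.2, 2.84, 2.94]  # AMOS-SDDS
amos_ii = [2.30, 7.17, 12.01, 19.41, 24.12, 29.08, 36.44]    # single site


def growth(per_tuple_ms):
    """Ratio of the last per-tuple time to the first, rounded to 2 decimals."""
    return round(per_tuple_ms[-1] / per_tuple_ms[0], 2)


size_factor = file_sizes[-1] // file_sizes[0]
q2_growth = growth(q2_extrapolated)
amos_ii_growth = growth(amos_ii)
```

On these figures the per-tuple time of the single-site system grows roughly in step with the file size, while the extrapolated AMOS-SDDS time changes by about 15 percent.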
Expected time per tuple of join queries to AMOS-SDDS
22. Aggregate Function count
Elapsed time of aggregate function Count under AMOS-SDDS, over a 100,000-tuple file. Elapsed time for AMOS-II = 280 ms.
| # servers | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| E-Strategy (ms) | 10 | 10 | 10 | 10 | 10 |
| I-Strategy (ms) | 1,462 | 761 | 511 | 440 | 341 |
23. Aggregate Function max
Elapsed time of aggregate function Max under AMOS-SDDS, over a 100,000-tuple file. Elapsed time for AMOS-II = 471 ms.
| # servers | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| E-Strategy (ms) | 420 | 210 | 140 | 110 | 90 |
| I-Strategy (ms) | 1,663 | 831 | 561 | 491 | 390 |
24.
- Contrary to the join query, the external strategy (E-Strategy) wins for the evaluation of aggregate functions.
- For the count function, the improvement is about 34 times.
- For the max function, the improvement is about 4 times.
- This is due to the import cost of the I-Strategy and to an SDDS property: the current number of records is a parameter of a bucket.
- Linear speed-up: processing time decreases with the number of servers.
- The use of external functions can thus be very advantageous for certain kinds of operations.
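The reason count is nearly free under the external strategy is that each bucket already knows how many records it holds, whereas the internal strategy must first import every record. A minimal sketch of that difference follows. The `Bucket` class and the function names are hypothetical, and the real system runs the per-bucket work on separate servers in parallel.

```python
class Bucket:
    """Server bucket that keeps its current record count as a parameter."""

    def __init__(self, keys):
        self.keys = list(keys)
        self.record_count = len(self.keys)  # maintained by the bucket itself


def external_count(buckets):
    """E-strategy style: add up the per-bucket counters, touch no record."""
    return sum(b.record_count for b in buckets)


def external_max(buckets):
    """Each bucket computes a partial max; the client keeps the largest."""
    return max(max(b.keys) for b in buckets if b.keys)


def imported_count(buckets):
    """I-strategy style: copy all records into a local store, then count."""
    imported = [k for b in buckets for k in b.keys]
    return len(imported)


buckets = [Bucket(range(0, 5)), Bucket(range(5, 12)), Bucket([])]
```

`external_count` costs one counter read per bucket regardless of file size, which matches the flat 10 ms row in the count measurements; `external_max` still has to scan each bucket, so it benefits from more servers instead.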
Discussion
25. SD-AMOS performance measurements
Creation time of a 3,000,000-record file. The bucket size is 750,000 records of 100 bytes.
[Figure: Global and moving average insertion time of a record]
26. SD-AMOS performance measurements
[Figures: Elapsed time of range query; Average time per tuple]
27.
- The average insertion time of a record, including splits, is 0.15 ms.
- The average access time to a record in a distributed file is 0.12 ms.
- - This is 100 times faster than access to a traditional file on disk.
- Linear scalability: the insertion time and the access time per tuple remain constant when the file size and the number of servers increase.
Discussion
28. DB2-SDDS performance measurements
[Figures: Elapsed time of range query; Time per tuple]
Compared: (i) access time to the data in a DB2 table, (ii) access time to an SDDS file from the DB2 external functions (DB2-SDDS), and (iii) direct access time to an SDDS file from an SDDS client.
29.
- Direct access to an SDDS file is much faster than access to a DB2 table: 0.02 ms versus 0.07 ms.
- Access to external data from DB2 (0.08 ms) is slightly slower than access to internal data (0.07 ms).
- - This difference is the coupling cost.
- An application has:
- - fast direct access to the data;
- - access through the DBMS, using the query language.
Discussion 30.
- We have coupled an SDDS manager with the main-memory DBMS AMOS-II and with DB2 to improve current technologies for high-performance databases and for coupling with external data repositories.
- The experiments reported in the thesis demonstrate the efficiency of the system.
- AMOS-SDDS and DB2-SDDS: use of an SDDS file by a DBMS, with parallel query processing on the server sites.
- SD-AMOS: appears as a scalable generalization of a parallel main-memory DBMS in which data partitioning becomes automatic.
Conclusion 31.
- Other types of DBMS queries.
- A scalable distributed query decomposer at the client.
- The design of a scalable distributed query optimizer that handles dynamic data partitioning appears challenging.
Future Work
32. End. Thank you for your attention. CERIA, Université Paris IX Dauphine