
1. Thesis presentation: Interoperability of a Scalable Distributed Data Manager with an Object-Relational DBMS. Yakham NDIAYE, November 13th, 2001

Develop techniques for the interoperability of a DBMS with an external SDDS file.

Examine the architectural issues that determine how to make such a coupling most efficient.

Validate our technical choices through prototyping and experimental performance analysis.

Our approach lies at the crossroads of main-memory DBMSs, object-relational DBMSs with foreign functions, and distributed/parallel DBMSs.

Objective 3.

Multicomputers

AMOS-II & DB2 DBMSs

Coupling SDDS and AMOS-II

Coupling SDDS and DB2

Experimental analysis

Conclusion

Plan 4. Multicomputers

A collection of loosely coupled computers

Computers inter-connected by high-speed local area networks.

Cost/Performance

Potentially offers storage and processing capabilities rivaling a supercomputer at a fraction of the cost.

New architectural concepts

Offer applications the cumulative CPU and storage capabilities of a large number of interconnected computers.

New data structures specifically for Multicomputers

Data are structured

- records with keys

- parallel scans & function shipping

Data are on servers

- waiting for access

Overflowing servers split into new servers

- appended to the file without informing the clients

Queries come from multiple autonomous clients

- Access initiators

- Not using any centralized directory for access computations

For more, see: http://ceria.dauphine.fr
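The behaviour listed above (keyed records, overflowing buckets that split, a file that grows by appending servers) can be sketched as a toy in Python. This is not the SDDS implementation: the class names, the capacity of 4 records, the range partitioning and the split-at-the-middle-key rule are all assumptions made for illustration. The single Python list of buckets merely stands in for the network; a real SDDS client has no such global view and computes addresses from its own image of the file.

```python
class Bucket:
    """One server's bucket, holding the records with low <= key < high."""

    def __init__(self, low, high):
        self.low, self.high = low, high
        self.records = {}  # key -> record


class ToyFile:
    """Toy range-partitioned file whose buckets split when they overflow."""

    def __init__(self, capacity=4):  # capacity is an arbitrary illustrative value
        self.capacity = capacity
        self.buckets = [Bucket(float("-inf"), float("inf"))]

    def _bucket_for(self, key):
        # The bucket ranges always cover the whole key space, so one matches.
        return next(b for b in self.buckets if b.low <= key < b.high)

    def insert(self, key, record):
        bucket = self._bucket_for(key)
        bucket.records[key] = record
        if len(bucket.records) > self.capacity:
            self._split(bucket)

    def _split(self, bucket):
        # Assumed rule: the upper half of the keys moves to a new bucket.
        keys = sorted(bucket.records)
        middle = keys[len(keys) // 2]
        new_bucket = Bucket(middle, bucket.high)
        for key in keys[len(keys) // 2:]:
            new_bucket.records[key] = bucket.records.pop(key)
        bucket.high = middle
        self.buckets.append(new_bucket)  # the file grows by one server

    def scan(self):
        # Every bucket is visited; on real servers these visits run in parallel.
        return [rec for b in self.buckets for rec in b.records.values()]

    def range_query(self, key_min, key_max):
        # Only buckets whose interval overlaps (key_min, key_max) are searched.
        hits = []
        for b in self.buckets:
            if b.low < key_max and b.high > key_min:
                hits += [(k, r) for k, r in b.records.items() if key_min < k < key_max]
        return sorted(hits)


sdds_file = ToyFile()
for ssn in range(20):
    sdds_file.insert(ssn, f"person-{ssn}")
```

After the 20 inserts the toy file spans several buckets, none above capacity, and `sdds_file.range_query(3, 8)` returns the records with keys strictly between 3 and 8.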

SDDS 6.

AMOS-II: Active Mediating Object System

A main-memory database system.

Declarative query language: AMOSQL.

External data sources capability.

External programs interface with AMOS-II using:

- Call-level interface (call-in)

- Foreign functions (call-out)

See the AMOS-II page for more:

http://www.dis.uu.se/~udbl/

AMOS-II DBMS 7.

IBM object-relational DBMS

DB2 Universal Database.

A typical representative of commercial object-relational DBMSs.

Can handle external data through user-defined functions (UDFs).

DB2 Universal Database 8. Coupling Strategies

AMOS-SDDS strategy:

- For a scalable RAM file supporting database queries;

- Use a DBMS for manipulations best handled through the query language;

- Direct fast data access for manipulations not supported well, or at all, by a DBMS;

- Distributed query processing with function shipping.

9. AMOS-SDDS System

[Figure: AMOS-SDDS scalable parallel query processing]

10. Coupling Strategies

SD-AMOS strategy:

- Uses AMOS-II as the memory manager at each SDDS storage site;

- Scalable generalization of a parallel DBMS;

- Data partitioning becomes dynamic.

11. SD-AMOS System

[Figure: SD-AMOS scalable parallel query processing]

12. Coupling SDDS & DB2

DB2-SDDS strategy:

- Coupling of a DBMS with an external data repository with direct fast data access.

- Use of an SDDS file by a DBMS as an external data repository.

- Offer the user a richer interface than that of the SDDS manager, in particular through the DBMS query language.

13. Coupling SDDS & DB2

[Figure: DB2-SDDS overall architecture]

Register a user-defined external table function:

    CREATE FUNCTION scan(Varchar(20))
    RETURNS TABLE (ssn integer, name Varchar(20), city Varchar(20))
    EXTERNAL NAME 'interface!fullscan'

14. Coupling SDDS & DB2

Foreign functions to access SDDS records from DB2:

- range(cleMin, cleMax) -> list of the records whose key satisfies cleMin < key < cleMax

- scan(nom_fichier) -> list of all the records of the file

Sample queries:

- Parallel scan: all SDDS records.

    select * from table( scan(fichier) ) as table_sdds(SSN, NAME, CITY)

- Range query: SDDS records where the key is between 1 and 100.

    select * from table( range(1, 100) ) as table_sdds(SSN, NAME, CITY) order by Name

15.

Six Pentium III 700 MHz with 256 MB of RAM running Windows 2000

On a 100 Mbit/s Ethernet network.

One site is used as the client and the other five as servers.

We run several servers on the same machine (up to 3 per machine).

The file scales from 1 to 15 servers.

The Hardware 16.

Benchmark data:

Table Person (SS#, Name, City).

Size: 20,000 to 300,000 tuples of 25 bytes.

50 cities.

Random distribution.

Benchmark query: pairs of persons in the same city

Query 1: the file resides at a single AMOS-II.

Query 2: the file resides at AMOS-SDDS.

Join evaluation: two strategies.

Measures:

- Speed-up & scale-up

- Processing time of aggregate functions

Benchmark queries 17. Server Query Processing

E-strategy

Data stay external to AMOS

within the SDDS bucket

Custom foreign functions perform the query

I-strategy

Data are dynamically imported into AMOS-II

Possibly with local index creation

Deleted after processing

Good for joins

AMOS performs the query
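The results slides name two ways of evaluating the benchmark join (persons in the same city): a nested loop and an index lookup. A minimal Python sketch of the difference follows. It is not the thesis code or AMOS-II's evaluator: the function names, the sample rows and the choice to report each unordered pair once are assumptions made for illustration.

```python
from collections import defaultdict


def nested_loop_pairs(persons):
    """Compare every record with every later record: O(n^2) comparisons."""
    pairs = []
    for i, (ssn_a, _name_a, city_a) in enumerate(persons):
        for ssn_b, _name_b, city_b in persons[i + 1:]:
            if city_a == city_b:
                pairs.append((ssn_a, ssn_b))
    return pairs


def index_lookup_pairs(persons):
    """Build a local index on city once, then probe it for each record."""
    index = defaultdict(list)
    for ssn, _name, city in persons:
        index[city].append(ssn)
    pairs = []
    for ssn, _name, city in persons:
        # Only records of the same city are touched by the probe.
        for other in index[city]:
            if other > ssn:  # report each unordered pair once
                pairs.append((ssn, other))
    return pairs


# Hypothetical sample of the Person (SS#, Name, City) table.
persons = [
    (1, "Ann", "Paris"),
    (2, "Bob", "Dakar"),
    (3, "Cy", "Paris"),
    (4, "Di", "Dakar"),
    (5, "Ed", "Paris"),
]
```

Both functions return the same pairs; the index version pays a build cost up front and then avoids comparing records from different cities, which is the trade-off behind creating a local index after import.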

18. Speed-up

Elapsed time of Query 2 according to the strategy, for a file of 20,000 records distributed over 1 to 5 servers.

[Figure: Elapsed time per tuple of Query 2 according to the strategy]

E-Strategy for Query 2: elapsed time
Server nodes              1      2      3      4      5
Elapsed time (s)      1,344    681    468    358    288
Time per tuple (ms)    67.2     34   23.4   17.9   14.4

I-Strategy for Query 2: elapsed time
Server nodes              1      2      3      4      5
Nested loop (s)         128     78     64     55     48
Index lookup (s)         60     39     37     36     32

19.

The results show a significant advantage of the I-Strategy over the E-Strategy for evaluating the join query.

With 5 servers, the I-Strategy is 6 times faster with the nested loop, and 9 times faster if an index is created.

This favorable result led us to study the scale-up characteristics of AMOS-SDDS on a file that scales up to 300,000 tuples.
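The factors of 6 and 9 can be rechecked from the measured elapsed times. The sketch below assumes that the row running from 1,344 s down to 288 s belongs to the E-Strategy, which is the only reading consistent with those two factors; the variable and function names are illustrative.

```python
# Elapsed times in seconds for Query 2 on 20,000 records, keyed by server count.
e_strategy_s = {1: 1344, 2: 681, 3: 468, 4: 358, 5: 288}  # assumed E-Strategy row
i_nested_loop_s = {1: 128, 2: 78, 3: 64, 4: 55, 5: 48}
i_index_lookup_s = {1: 60, 2: 39, 3: 37, 4: 36, 5: 32}


def speed_up(times):
    """Speed-up relative to the single-server run: T(1) / T(n)."""
    return {n: times[1] / t for n, t in times.items()}


def advantage(slow, fast, servers):
    """How many times faster `fast` is than `slow` at a given server count."""
    return slow[servers] / fast[servers]


nested_loop_gain = advantage(e_strategy_s, i_nested_loop_s, 5)
index_lookup_gain = advantage(e_strategy_s, i_index_lookup_s, 5)
e_speed_up_at_5 = speed_up(e_strategy_s)[5]
```

Under that assumption the two gains come out at 288/48 and 288/32. The same helper shows that the E-Strategy itself speeds up by a factor of about 4.7 from 1 to 5 servers.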

Discussion

20. Scaling the number of servers

[Figure: Elapsed time of join queries to AMOS-SDDS]

Q1 = AMOS-SDDS join; Q2 = AMOS-SDDS join with count.

Time per tuple (extrapolated for AMOS-SDDS)
File size (tuples)     20,000   60,000  100,000  160,000  200,000  240,000  300,000
# SDDS servers              1        3        5        8       10       12       15
Q1 (ms)                  3.05     5.02     6.84    11.36    12.77    16.25    18.55
Q2 (ms)                  2.55     3.08     3.35     6.16     6.39     8.43     8.75
Q1 w. extrap. (ms)       3.05     5.02     6.84     8.28      9.6    10.64    12.72
Q2 w. extrap. (ms)       2.55     3.08     3.35     3.11      3.2     2.84     2.94
AMOS-II (ms)             2.30     7.17    12.01    19.41    24.12    29.08    36.44

21. Scaling the number of servers

Results are extrapolated to 1 server per machine.

- Basically, the CPU component of the elapsed time is divided by 3.

The extrapolated processing time of the join query with count shows linear scalability of the system.

Processing time per tuple remains constant (2.94 ms) when the file size and the number of servers increase by the same factor.
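The scale-up claim can be checked against the per-tuple figures of the preceding table. The sketch below copies two of its rows; the AMOS-II value printed as "2 9.08" is assumed to be 29.08, and the helper name is illustrative.

```python
file_sizes = [20_000, 60_000, 100_000, 160_000, 200_000, 240_000, 300_000]
servers = [1, 3, 5, 8, 10, 12, 15]

# Time per tuple in ms, copied from the table.
q2_extrapolated_ms = [2.55, 3.08, 3.35, 3.11, 3.2, 2.84, 2.94]
amos2_ms = [2.30, 7.17, 12.01, 19.41, 24.12, 29.08, 36.44]  # 29.08 is an assumed repair


def growth(series):
    """Ratio between the last and the first value of a series."""
    return series[-1] / series[0]


size_growth = growth(file_sizes)
q2_growth = growth(q2_extrapolated_ms)
amos2_growth = growth(amos2_ms)
```

The file grows 15-fold. Over that range the extrapolated Q2 time per tuple changes by a factor of about 1.15, while the single-site AMOS-II time per tuple grows by a factor of nearly 16.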

[Figure: Expected time per tuple of join queries to AMOS-SDDS]

22. Aggregate Function count

[Figure: Elapsed time of aggregate function Count under AMOS-SDDS]

Elapsed time for AMOS-II = 280 ms

Elapsed time over a 100,000-tuple file on AMOS-SDDS
# servers              1      2      3      4      5
E-Strategy (ms)       10     10     10     10     10
I-Strategy (ms)    1,462    761    511    440    341

23. Aggregate Function max

[Figure: Elapsed time of aggregate function Max under AMOS-SDDS]

Elapsed time for AMOS-II = 471 ms

Elapsed time over a 100,000-tuple file on AMOS-SDDS
# servers              1      2      3      4      5
E-Strategy (ms)      420    210    140    110     90
I-Strategy (ms)    1,663    831    561    491    390

24.

Unlike for the join query, the external strategy (E-Strategy) wins for the evaluation of aggregate functions.

For the count function, the improvement is about 34 times.

For the max function, the improvement is about 4 times.

This is due to the import cost and to an SDDS property: the current number of records is a parameter of a bucket.

Linear speed-up: processing time decreases with the number of servers.

The use of external functions can thus be very advantageous for certain kinds of operations.
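Why count is so cheap under the external strategy can be sketched as follows. This is a toy, not the SDDS code: the class and function names are hypothetical, and threads stand in for the server sites. The only property taken from the slides is that each bucket keeps its current number of records as a parameter.

```python
from concurrent.futures import ThreadPoolExecutor


class Bucket:
    """Toy bucket that keeps its record count as a maintained parameter."""

    def __init__(self):
        self.records = []
        self.count = 0  # updated on every insert, so reading it costs O(1)

    def insert(self, record):
        self.records.append(record)
        self.count += 1

    def local_max(self, field):
        # Max has no such shortcut: the bucket must visit all its records.
        return max((r[field] for r in self.records), default=None)


def file_count(buckets):
    # No record is read and nothing is imported into the DBMS.
    return sum(b.count for b in buckets)


def file_max(buckets, field):
    # Each bucket computes a partial result; threads stand in for servers.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda b: b.local_max(field), buckets))
    return max(p for p in partials if p is not None)


buckets = [Bucket() for _ in range(3)]
for ssn in range(1, 10):
    buckets[ssn % 3].insert({"ssn": ssn, "city": f"city-{ssn % 2}"})

total = file_count(buckets)
highest_ssn = file_max(buckets, "ssn")
```

In this sketch count reads one number per bucket whatever the file size, whereas max scans each bucket but the scans proceed side by side. That matches the shape of the measurements: a flat time for count and a time for max that shrinks as servers are added.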

Discussion

25. SD-AMOS performance measurements

Creation time of a file of 3,000,000 records. The bucket size is 750,000 records of 100 bytes.

[Figure: Global and moving average insertion time of a record]

26. SD-AMOS performance measurements

[Figures: Elapsed time of range query; Average time per tuple]

27.

The average insertion time of a record, splits included, is 0.15 ms.

The average access time to a record on a distributed file is 0.12 ms.

- This is 100 times faster than with a traditional file on disk.

Linear scalability: the insertion time and the access time per tuple remain constant when the file size and the number of servers increase.

Discussion

28. DB2-SDDS performance measurements

[Figures: Elapsed time of range query; Time per tuple]

Compared: (i) access time to the data in a DB2 table, (ii) access time to an SDDS file from the DB2 external functions (DB2-SDDS), and (iii) direct access time to an SDDS file from an SDDS client.

29.

Access to an SDDS file is much faster than access to a DB2 table: 0.02 ms versus 0.07 ms.

Access to external data from DB2 (0.08 ms) is slower than access to internal data (0.07 ms).

- This is the coupling cost.

An application has:

- fast direct access to the data

- access by the query language, through the DBMS

Discussion 30.

We have coupled an SDDS manager with the main-memory DBMS AMOS-II and with DB2, to improve current technologies for high-performance databases and for coupling with external data repositories.

The experiments reported in the thesis demonstrate the efficiency of the system.

AMOS-SDDS and DB2-SDDS: use of an SDDS file by a DBMS, with parallel query processing on the server sites.

SD-AMOS: a scalable generalization of a parallel main-memory DBMS in which data partitioning becomes automatic.

Conclusion 31.

Other types of DBMS queries.

A scalable distributed query decomposer at the client.

The design of a scalable distributed query optimizer that handles dynamic data partitioning appears challenging.

Future Work

32. End

Thank You for Your Attention

CERIA, Université Paris IX Dauphine
