session: e03 db2 performance update
Post on 25-Apr-2022
May 19, 2008 • 1:30 p.m. – 2:30 p.m.
Platform: Linux, UNIX, Windows
Berni Schiefer
IBM Toronto Lab
Session: E03
DB2 Performance Update

Agenda
• Basics
• Benchmarks
• Performance Proof Points
• The great new stuff …
• Summary

Basics – Platforms/OS
• The basic fundamentals haven’t changed
• You still want/need a balanced (I/O, Memory, CPU) configuration
  • We recommend 4GB-8GB RAM / core
  • 6-20 disks per core where feasible
• Use a recommended, generally available 64-bit OS
  • Applies to Linux, Windows, AIX, Solaris, HP-UX
  • e.g. AIX 5.3 TL07, SLES10 SP1, RHEL5.2 etc.
• All performance measurements/assumptions are with a 64-bit DB2 server
• Clients can be 32-bit or 64-bit or mixed
  • Even LOCAL clients
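The per-core guidance above can be turned into a quick sizing check. The sketch below is only an illustration of that arithmetic; the function name `balanced_config` is hypothetical, and the only inputs are the ranges quoted on the slide (4GB-8GB RAM per core, 6-20 disks per core).

```python
def balanced_config(cores: int) -> dict:
    """Return the RAM and disk ranges implied by the slide's per-core guidance."""
    ram_per_core_gb = (4, 8)   # slide: 4GB-8GB RAM per core
    disks_per_core = (6, 20)   # slide: 6-20 disks per core where feasible
    return {
        "ram_gb": (cores * ram_per_core_gb[0], cores * ram_per_core_gb[1]),
        "disks": (cores * disks_per_core[0], cores * disks_per_core[1]),
    }


# Example: a 16-core server.
print(balanced_config(16))
```

For 16 cores this gives a range of 64-128 GB of RAM and 96-320 disks, which is the "balanced" envelope the slide describes.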

Basics - Storage
• Disk spindles still matter
  • With sophisticated storage subsystems and storage virtualization, it just requires more sleuthing than ever to find them
  • Drives keep getting bigger, 146GB now the norm
• Be leery of Storage Administrators who tell you
  • “Don’t worry, it doesn’t matter”
  • “The cache will take care of it”
• Make the Storage Administrator your best friend!
  • Take them out for lunch/dinner, whatever it takes!

Benchmarks
• DB2 is THE performance leader
  • for OLTP and Data Warehousing

TPC-H result on IBM Balanced Warehouse E7100
[Chart: IBM System p6 570 and DB2 9.5 create top 10TB TPC-H performance. QphH: IBM p6 570 / DB2 9.5: 343,551; HP Integrity Superdome-DC Itanium 2 / Oracle 11g: 208,457; HP Integrity Superdome / SQL Server 2008: 63,651]
TPC Benchmark, TPC-H, and QphH are trademarks of the Transaction Processing Performance Council. For further TPC-related information, please see http://www.tpc.org.
• DB2 9.5 on IBM System p6 570 (128 core POWER6 4.7GHz): 343,551 QphH@10000GB, 32.89 US $ per QphH@10000GB, available: April 15, 2008
• Oracle 11g Enterprise Ed w/ Partitioning on HP Integrity Superdome-DC Itanium 2, HP-UX 11i v3 64 bit (128 core Intel Itanium 2 1.6 GHz): 208,457 QphH@10000GB, 27.97 US $ per QphH@10000GB, available: September 10, 2008
• SQL Server 2005 on HP Integrity Superdome-DC Itanium 2, Windows (64 core Intel Itanium 2 1.6GHz): 63,651 QphH@10000GB, 38.54 US $ per QphH@10000GB, available: August 30, 2008
Latest POWER6 hardware combined with DB2 9.5 and DS4800 storage produce outstanding data warehouse performance
Delivers 1.65x faster performance than best Oracle result
Loaded 10TB data @ 6TB / hour (incl. data load, index creation, runstats)
Results as of 2008/03/24
TPC-H is a benchmark that uses DSS-type queries; it is the best candidate for measuring data warehouse performance.
DB2 9.5 provides a significant proof point for the new IBM Balanced Warehouse E7100.
Delivers 2-3x the performance of existing Oracle results.
10TB database built in just 1h40m with DB2 9.5 (compared to Oracle/Sun 18h13m and Oracle/HP 5h51m) on DS4800 storage.
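The two headline figures on this slide (1.65x and 6TB/hour) can be re-derived from the numbers quoted above. This is just a worked check of the arithmetic, using the QphH values from the footnote and the 1h40m build time from the notes; the variable names are illustrative.

```python
# Figures quoted on the slide and in its footnote.
db2_qphh = 343_551      # DB2 9.5 on IBM System p6 570
oracle_qphh = 208_457   # Oracle 11g on HP Integrity Superdome-DC

speedup = db2_qphh / oracle_qphh

# The notes give a build time of 1h40m for the 10TB database.
load_hours = 1 + 40 / 60
load_rate_tb_per_hour = 10 / load_hours

print(f"DB2 vs. Oracle: {speedup:.2f}x")
print(f"Load rate: {load_rate_tb_per_hour:.1f} TB/hour")
```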

[Chart: TPC-C performance comparison on 4 processor Intel Xeon 7350. tpmC: IBM x3850 / DB2 9.5: 516,752; HP DL580 / SQL Server 2005: 407,079]
TPC Benchmark, TPC-C, and tpmC are trademarks of the Transaction Processing Performance Council. For further TPC-related information, please see http://www.tpc.org.
• DB2 9.5 on IBM System x3850, Red Hat Enterprise Linux Advanced Platform (4-way Intel Quad Core Xeon 7350 2.93 GHz): 516,752 tpmC @ $2.59/tpmC, available: April 15, 2008
• SQL Server 2005 on HP DL580G5, Microsoft Windows Server 2003 Enterprise x64 (4-way Intel Quad Core Xeon 7350 2.93GHz): 407,079 tpmC @ $1.71/tpmC, available: September 5, 2007
Latest System x server (x3850 M2) combined with DB2 9.5 and Red Hat Enterprise Linux delivers outstanding OLTP performance
First data server to cross the half-million tpmC ceiling with 4 processors
With about 1.1 Billion web users in the world, the performance delivered in this benchmark would handle purchase and delivery of items to all these web users every 4 days
Results as of 2008/03/24
TPC-C result on IBM System x3850 M2 with Linux
TPC-C is a benchmark emulating an OLTP workload; DB2 is the TPC-C leader. This chart shows the relative performance on 4 processors.
IBM System x with DB2 9.5 on Red Hat beats SQL Server by 27%.
DB2 9.5 has a good relationship with Red Hat.
Oracle numbers not available.

[Chart, titled "IBM System p 570 and DB2 9 leader on SAP R/3 2-tier SD": tpmC for IBM System 550 / DB2 9.5: 629,159; HP Integrity rx6600 Itanium 2 9050 DC, 1.6GHz: 372,140; IBM System p 570 1.9GHz POWER5: 371,044]
DB2 9.5 on System p550 takes industry leadership in 8 core TPC-C benchmark.
Demonstrates excellent performance of DB2 and POWER6 with AIX 5L
Demonstrates superior per core performance for DB2 9.5 on POWER6 processors
TPC-C on IBM System p550 and DB2 9.5/AIX 5.3
Results as of 2008/03/24
Configuration | Processors/Cores/Threads | Software | tpmC | $/tpmC | Submitted/Available
IBM System p 550 4.2GHz POWER6 | 4/8/16 | DB2 9.5, AIX 5.3 | 629,159 | $2.49 | 3/20/08 / 4/20/08
HP Integrity rx6600 Itanium 2 9050 DC, 1.6GHz | 4/8/16 | SQL Server 2005, Windows 2003 | 372,140 | $1.81 | 6/11/07 / 6/11/07
IBM System p 570 1.9GHz POWER5 | 4/8/16 | Oracle 10g, AIX 5.3 | 371,044 | $5.26 | 7/12/04 / 9/30/04
Also the leader on SAP R/3.
SAP SD benchmarks are sales and distribution benchmarks designed to test the performance of database components and SAP applications.
DB2 9 outperforms SQL Server and Oracle once again …
• IBM System p 570, 8 processors / 16 cores / 32 threads, POWER6 4.7 GHz, 128 KB L1 cache and 4 MB L2 cache per core, 32 MB L3 cache per processor, 8000 benchmark users, AIX 5.3, DB2 9, available: May 2007
• HP ProLiant DL580 G5, 4 processors / 16 cores / 16 threads, Quad-Core Intel Xeon Processor X7350 2.93 GHz, 64 KB L1 cache per core and 4 MB L2 cache per 2 cores, 3705 benchmark users, Windows Server 2003 Enterprise Edition, SQL Server 2005, available: Sept 2007
• HP Integrity rx8620, 16-way XMP, Intel Itanium 2 1.5 GHz, 32 KB L1 cache, 256 KB L2 cache, 6 MB L3 cache per processor, 2880 benchmark users, HP-UX 11i, Oracle 9i, available: Dec 2003

[Chart: DB2 9 Top TPC-C Performer among Data server vendors on 8 Processors. tpmC, higher is better: DB2 9: 1,616,162; SQL Server 2005: 520,467; Oracle 10g: 254,471]
TPC Benchmark, TPC-C, and tpmC are trademarks of the Transaction Processing Performance Council. For further TPC-related information, please see http://www.tpc.org.
• DB2 9 on IBM System p570, IBM AIX 5L V5.3 (8 P, 16 C 4.7 GHz POWER6): 1,616,162 tpmC @ $3.54/tpmC, available: November 21, 2007
• SQL Server 2005 on Unisys ES7000, Microsoft Windows Server 2003 Enterprise x64 Edition (8 P, 16 C Intel Dual Core Xeon MP 3.4 GHz): 520,467 tpmC @ $2.73/tpmC, available: May 1, 2007
• Oracle 10g on NEC Express5800, Red Hat Enterprise Linux AS 4.0 (8 P, 8 C Intel Itanium2 1.6GHz): 254,471 tpmC @ $5.32/tpmC, available: February 17, 2006
• Oracle 10g on HP Integrity rx6600, HP-UX 11i v2 64 bit (2 P, 4 C Intel Itanium2 1.6GHz): 230,569 tpmC @ $2.63/tpmC, available: December 1, 2006
• SQL Server 2000 on HP ProLiant ML350G4p, Microsoft Windows Server 2003 Enterprise Edition (1 P, 1 C Intel Xeon 3.4GHz): 42,432 tpmC @ $1.96/tpmC, available: March 29, 2005
[Chart: DB2 9 Best TPC-C performance per CPU/Core among Data servers. Y-axis: tpmC per core; legend: DB2 9, SQL Server, Oracle 10g; values shown: 101,010, 57,642, 42,432]
Top Performer on POWER6
Results as of 2008/03/24
DB2 9 has the best TPC-C number on 8 processors – 3.1x better than SQL Server and an amazing 6.3x better than Oracle 10g.
DB2 9 also has the best TPC-C performance per core among the competitors: 2-2.5x more tpmC per core than the competitors, which means better TCO with DB2.
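The per-core values on the chart (101,010, 57,642 and 42,432) can be reproduced by dividing three of the tpmC results in the footnote by their core counts. The sketch below does only that division; which footnote entry belongs to which bar is inferred from the matching numbers, not stated on the slide.

```python
# (tpmC, cores) taken from the TPC-C footnote entries.
results = {
    "DB2 9 / IBM System p570": (1_616_162, 16),
    "Oracle 10g / HP Integrity rx6600": (230_569, 4),
    "SQL Server 2000 / HP ProLiant ML350G4p": (42_432, 1),
}

# tpmC per core for each published result.
per_core = {name: tpmc / cores for name, (tpmc, cores) in results.items()}

db2 = per_core["DB2 9 / IBM System p570"]
for name, value in per_core.items():
    print(f"{name}: {value:,.0f} tpmC per core ({db2 / value:.2f}x vs. DB2 baseline)")
```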

SPECjAppServer 2004 World Record
[Chart: DB2 9.5 has best SPECjAppServer 2004 results. JOPS@Standard: 40-core System p5-595 / DB2 9.5: 14,004; 64-core HP Superdome / Oracle 10g: 10,519]
SPEC and the benchmark name SPECjAppServer 2004 are registered trademarks of the Standard Performance Evaluation Corporation. For the latest SPECjAppServer 2004 benchmark results, visit http://www.spec.org/.
DB2 9.5 has 1/3 more performance with ½ the number of cores!
Results as of 2008/03/24
Illustrates advantage of combining DB2 with WebSphere
SPECjAppServer is the only official multi-tier end-to-end performance benchmark for J2EE technologies. It emulates information flow among an automotive dealership, manufacturing, supply chain management, and an order/inventory system.
DB2 was the first to publish on every version of the SPECjAppServer benchmark!
Only DB2 has published both single-database (i.e. non-XA) and multi-database (i.e. XA 2PC) results. Others were all single-database.
DB2 9 also has the best performance per core.
Only DB2 has leading results with both WebSphere and WebLogic on IBM and non-IBM platforms.
• Sun/WLS/DB2 – 8253.21 SPECjAppServer2004 JOPS@Standard – WLS 10 on Sun Blade 6000 10x 8 cores T6300 UltraSPARC T1 1.4GHz running Solaris 10 8/07, and DB2 9 on 48 cores Sun E6900 UltraSPARC IV+ 1.95GHz
• HP/WLS/Oracle – 7629.45 SPECjAppServer2004 JOPS@Standard – WLS 9.2 on 6x 8 cores IA64 rx6600 1.6GHz running HP-UX 11iv3, and Oracle 10g EE 10.2.0.2 on HP Superdome 64x1.6GHz 256GB RAM running HP-UX 11v3
• IBM/WAS/DB2 - 4368 SPECjAppServer2004 JOPS@Standard – WAS 6.1 on xSeries Blade Center with 20 HS20 on SLES 9 40 cores, and DB2 9 on p5-570 POWER5+ 1.9GHz 16 cores 128GB RAM AIX 5.3
• Sun/WLS/ORA – 4099 SPECjAppServer2004 JOPS@Standard – WLS 9.0 on Sun T2000 cluster 7x8 core Solaris 10, Oracle 10g EE on E6900 UltraSPARC IV+ 40x1.5GHz, Solaris 10

TPoX performance with DB2 9.5
For more information on TPoX please visit tpox.sourceforge.net
[Chart: TPoX Throughput with DB2 9 and DB2 9.5. Y-axis: txns/sec; categories: 15M order inserts, 3M custacc inserts, 21K security inserts, Full Document Replacement; series: DB2 9, DB2 9.5]
Transaction Processing over XML (TPoX) is an open source application-level XML database benchmark based on a financial application scenario
DB2 9.5 yields 10%-54% throughput improvement over DB2 9 for TPoX inserts and full document replacement
The TPoX schema consists of order, security, holding and customer account tables. There are 15M documents in the order table, 3M documents in the customer account table and 21K documents in the security table.
Each customer has one or multiple accounts. Each account has one or multiple holdings. A holding is a certain number of shares of a security. A security can be a stock, a bond or a mutual fund. Customers place orders to buy or sell securities for their account(s).
In DB2 9 there were no subdocument updates; you could only replace an entire document. Replacing full documents has improved by ~54% with DB2 9.5

TPoX performance on Intel
[Chart: Relative TPoX performance and CPU utilization using DB2 9.5. Throughput improvement / CPU utilization: Intel Tulsa, DB2 9, 16GB: 1.00 / 97%; Intel Tigerton, DB2 9.5, 16GB: 1.30 / 56%; Intel Tigerton, DB2 9.5, Compression and In-lining, 16GB: 1.90 / 82%; Intel Tigerton, DB2 9.5, Compression and In-lining, 32GB: 2.20 / 90%]
DB2 9 to DB2 9.5 size reduction of 67%
With the new Quad-Core Intel Tigerton processors – 2.2x TPoX throughput on DB2 9.5 with new XML features (in-lining and compression)
Database size (GB): DB2 9: 81; DB2 9.5: 57; Compressed DB2 9.5: 27
Tulsa processor - 4 Socket Dual-Core Intel Xeon processor 7100 series
Tigerton processor - 4 Socket Quad-Core Intel Xeon processor 7300 series
The first bar is the baseline result on DB2 9 with Tulsa processors.
With the Tigerton processors, there was a 30% improvement in throughput moving to quad core and ~44% idle CPU, since the system was I/O bound.
Applying the new XML features in DB2 9.5 (in-lining and compression) allowed us to increase throughput by 1.9x and improve CPU utilization to ~82%. Since I/O costs were reduced by compression, we were able to drive more throughput at the cost of more user CPU.
Doubling the memory on Tigerton allowed us to achieve 2.2x throughput with DB2 9.5, since there was even less I/O and hence we were able to drive more CPU utilization.
The storage space was also reduced by 67%, which results in disk savings ($).
Common storage: DS4800 with 78 disks, RAID5
Equivalent OS level: SLES 10 64-bit SP1
Intel Tulsa system referred to in slides: 16 GB of memory, DB2 9; fastest Tulsa is 3.5GHz

DB2 / SAP AMD Opteron Virtualization performance on VMware ESX 3.0.1
[Chart: DB2 9 virtualization capabilities outperform SQL Server 2005. SD users: 2 VCPU IBM System x3755, AMD Opteron 8220 SE 2.8GHz: 445; 2 VCPU Dell PowerEdge 6950, AMD Opteron 8220 SE 2.8GHz: 350]
Virtualization enables superior efficiency allowing you to maximize use of unused server capacity and hardware resources with the least overhead
DB2 on VMWare ESX provides an effective and scalable production ready platform for hosting multiple virtualized transaction processing workloads
Self-Tuning Memory Manager allows DB2 to automatically adapt in dynamic resource allocation environments.
DB2 also offers automatic storage which enables storage virtualization
For SAP Benchmark and related information please see http://www.sap.com/benchmark
For information on DB2 scalability on VMWare see http://www.vmware.com/pdf/db2_scalability_wp_vi3.pdf
Virtualization makes it possible to run multiple operating systems and multiple applications on the same computer at the same time, increasing the utilization and flexibility of hardware. VMware is the most widely deployed software for optimizing and managing IT environments through virtualization.
DB2 on VMware provides an effective production-ready platform for hosting multiple virtualized transaction processing workloads.
DB2 9 has the highest number of SD (Sales and Distribution) users: ~30% more than SQL Server.
Same processors in both cases; DB2 has higher performance by ~30%.
There are more automatic parameters now in DB2 9.5
Db cfg
Database heap (4KB) (DBHEAP) = AUTOMATIC
SQL statement heap (4KB) (STMTHEAP) = AUTOMATIC
Default application heap (4KB) (APPLHEAPSZ) = AUTOMATIC
Application Memory Size (4KB) (APPL_MEMORY) = AUTOMATIC
Statistics heap size (4KB) (STAT_HEAP_SZ) = AUTOMATIC
Dbm cfg

DB2 & AIX LPAR Mobility
• DB2 9.5 on AIX 5.3
• 2 CPU Dedicated LPAR
• 14 GB RAM
• Virtual IO Server
[Chart: Partition Migration Impact on OLTP. Y-axis: tpmC; X-axis: Interval; markers: Start of LPAR Migration, End of LPAR Migration]
Challenge: To migrate an LPAR with DB2 9.5 running a TPC-C-like workload from one physical host to another.
Outcome: During the memory copy portion of the migration, an 18% degradation in throughput was observed.
At the final switch-over phase, throughput stopped for a few seconds.
Immediately after the migration completed, DB2 9.5 was running at peak performance!
At 14GB RAM, the entire migration took about 5 minutes.

The Great New Stuff
• When you think about the new features …
• As always
  • “It depends”
  • We don’t know everything (yet)
  • Your mileage will vary
  • Please tell us what you think!

Today’s UNIX/Linux Architecture
[Diagram: each circle is an OS process. A client with the UDB Client Library connects to the UDB Server via shared memory & semaphores, TCP/IP, named pipes, … Instance level (per-instance): listeners (db2tcpcm, db2ipccm) and the idle agent pool (db2agent (idle); idle, pooled agent or subagent). Per-application: coordinator agents (db2agent) and active subagents (db2agntp). Database level (per-database): buffer pool(s), prefetchers (db2pfchr), page cleaners (db2pclnr), logging subsystem (db2loggr, db2loggw) with log buffer, deadlock detector (db2dlock). Flows: async I/O prefetch requests; parallel, big-block read requests; parallel page write requests; write log requests; victim notifications. Storage: data disks, log disks.]

New UNIX/Linux Architecture
[Diagram: each circle is an OS thread, all inside a single, multi-threaded process. A client with the UDB Client Library connects to the UDB Server via shared memory & semaphores, TCP/IP, named pipes, … Instance level (per-instance): listeners (db2tcpcm, db2ipccm) and the idle agent pool (db2agent (idle); idle, pooled agent or subagent). Per-application: coordinator agents (db2agent) and active subagents (db2agntp). Database level (per-database): buffer pool(s), prefetchers (db2pfchr), page cleaners (db2pclnr), logging subsystem (db2loggr, db2loggw) with log buffer, deadlock detector (db2dlock). Flows: async I/O prefetch requests; parallel, big-block read requests; parallel page write requests; write log requests; victim notifications. Storage: data disks, log disks.]

Performance Advantages of the Threaded Architecture on UNIX/Linux
• Context switching between threads is generally faster than between processes
  • No need to switch address space
  • Less cache “pollution”
• Operating system threads require less context than processes
  • Share address space, context information (such as uid, file handle table, etc.)
  • Memory savings
• Significantly fewer system file descriptors used
  • All threads in a process can share the same file descriptors
  • No need to have each agent maintain its own file descriptor table
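The "share address space" point can be illustrated outside DB2 with a few lines of generic threading code. This is an analogy only, not DB2 code: the `agent` function and `counter` object are hypothetical stand-ins, showing that threads in one process update the same in-memory object directly, with no shared-memory segment or per-worker copy needed.

```python
import threading

# One object living in the single address space of the process.
counter = {"value": 0}
lock = threading.Lock()


def agent(increments: int) -> None:
    """Stand-in for an agent thread: updates the shared object in place."""
    for _ in range(increments):
        with lock:  # serialize updates to the shared counter
            counter["value"] += 1


threads = [threading.Thread(target=agent, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter["value"])
```

With separate processes, the same pattern would need an explicit shared-memory segment or inter-process messaging, which is the overhead the slide says the threaded model avoids.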

Performance characteristics of threaded architecture
[Chart: Relative performance on Linux x64 with threaded DB2 9.5. Y-axis: relative throughput on Linux x64; series: DB2 9, DB2 9.5]
[Chart: Decrease in Agent Memory Footprint with DB2 9.5. Y-axis: per-agent memory footprint (MB), lower is better; categories: Linux x64, AIX; series: DB2 9, DB2 9.5]
Savings of up to 1 MB per agent due to new threaded architecture
Increased throughput by 14 % on Linux x64 internal OLTP workload
With the new threaded architecture, throughput increased by 14% on Linux x64 and the agent footprint decreased by 1MB. On AIX we also see ~0.6MB decrease in footprint

Tuning hints/tips
• Be current on your OS maintenance
• Use large pages where feasible
  • 64K pages selected automatically on AIX
• Ensure the resource limits assigned to the few DB2 processes are “unlimited”
• Set the NUM_IOCLEANERS configuration parameter to automatic
  • It uses the # of CPUs as a key factor
  • Don’t want to have too many cleaners

XML Enhancements in DB2 9.5
• Base Table In-lining (BTI) and Compression
  • Store small XML docs in the XML column in the base table - no .xda storage needed
  • In-lined documents can be compressed
• XML Load
  • For bulk inserts of XML documents
• Faster Insert with XML Schema Validation
  • Up to 5x faster than DB2 9
• XML Update
  • Based on XQuery Update Facility, a standardized extension to XQuery - allows you to modify, insert, or delete individual elements and attributes
  • 2-3x faster than stored procedure approach in DB2 9
• Extensive path length reduction and optimizer improvements
Also:
• Instant compatible schema evolution. Schema evolution is an XML schema change. As I understand it: if your schema changed and you saved it under the same name, you can still validate your documents against it with the same validation statement as before, provided the schema became less restrictive. If it became more restrictive, you have to save it under a different name and validate only the new documents that conform to it.
• Enabling existing customers
  – Non-Unicode, Offline Load, Replication, Federation
• Richer tool support (not mentioned on the slide; it is performance-related in a way, because it makes the user's work faster and more effective)
  – IBM Data Studio, RDA, DB2W, and many more
  – Altova, Skytide, and many more

Base Table In-lining & Compression
[Chart: TPoX throughput by workload type. Y-axis: Tx/min; categories: Inserts, Queries, Mixed; series: DB2 9, DB2 9.5, In-lined DB2 9.5, In-lined, compressed DB2 9.5]
Up to 3x improvement in throughput with In-lining and Compression of the XML data
XML document structures are now stored more efficiently on DB2 9.5
Obtained a 30% reduction in space just by using DB2 9.5 to load the data.
With in-lining and compression achieved space savings of ~68%
Database size (GB): DB2 9: 16.5; DB2 9.5: 11.8; In-lined DB2 9.5: 10.9; Compressed DB2 9.5: 5.3
Database size reduction with compressed XML tables in DB2 9.5
17% improvement in inserts, ~ 3x improvement in queries and 2.2x improvement in mixed transactions
We save space when migrating to DB2 9.5 because internal structures are now stored more efficiently
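The space-saving percentages quoted on this slide follow from the database sizes in the table. The sketch below recomputes them relative to the DB2 9 baseline; the assignment of sizes to configurations is read from the flattened table and should be treated as an assumption.

```python
# Database sizes in GB, as read from the slide's table.
sizes_gb = {
    "DB2 9": 16.5,
    "DB2 9.5": 11.8,
    "In-lined DB2 9.5": 10.9,
    "Compressed DB2 9.5": 5.3,
}

baseline = sizes_gb["DB2 9"]

# Percentage reduction of each configuration relative to DB2 9.
reduction_pct = {
    name: (1 - size / baseline) * 100
    for name, size in sizes_gb.items()
    if name != "DB2 9"
}

for name, pct in reduction_pct.items():
    print(f"{name}: {pct:.1f}% smaller than DB2 9")
```

The results come out at roughly 28.5% for DB2 9.5 alone and 67.9% with in-lining and compression, in line with the "30%" and "~68%" quoted on the slide.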

XML Load Performance
LOAD support for XML is new in DB2 9.5!
Tests done on TPoX tables in DB2 9.5 show that LOAD outperforms IMPORT by a factor of 3-8x
[Chart: Population of an XML table using Load is faster than Import. Y-axis: elapsed time (sec), lower is better; categories: No Indexes, 10 Indexes, Schema Validation; series: XML Import, XML Load]
Increase CPU and DISK PARALLELISM to speed up LOAD
Build indexes during LOAD rather than loading then creating the indexes
Always run RUNSTATS after loading tables
With no indexes, 7x improvement in throughput using load.
With 10 indexes, 4x improvement with load.
Schema validation can also be done during load; the results show that schema validation is ~8x better with load than import.
RUNSTATS during LOAD forces CPU_PARALLELISM to 1.

Best Practices for XML
• Use Base Table In-lining and Row Compression for workloads that …
  • Tend to be more I/O-bound rather than CPU-bound
  • Contain statements that involve XML columns
  • Do not touch large numbers of XML documents per statement (be aware of temping)
• Use Load instead of Import to insert XML documents
• Use the new XML Update facility rather than XML Update Stored Procedure
  • Filter XML documents passed to the XML Update transform rather than filtering within the XML Update transform
  • Apply normal tunings for update workloads
  • Use parameter markers in your xml update statement to avoid recompilation
Here is an example of the new XML Update using an SQL-style parameter marker ("?") to avoid recompilation. In this example we also filter the data passed to the update transform with a where clause.
update xmlcustomer
set info = xmlquery('transform
    copy $new := $INFO
    modify do replace value of $new/customer_info/phone with $z
    return $new'
  passing cast(? as varchar(15)) as "z")
where cid = ?
For more information on how to use the new XML Update see http://www.ibm.com/developerworks/db2/library/techarticle/dm-0710nicola/
Here is another example of filtering docs passed to the transform:
for $i in db2-fn:xmlcolumn("XMLCUSTOMER.INFO")[/customerinfo/name="John Smith"]
return
  transform
  copy $new := $i
  modify do delete $new/customerinfo/phone
  return $new;

Fast Redistribute Utility
• Enables rapid incremental growth of a data warehouse
  • Rapidly moves rows from partitions with more data to partitions with less/no data including space reclamation
• High performing as it …
  • Reduces active log space requirement
  • Reduces code path
  • Performs multiple activities in a single pass of the data
  • Redistributes multiple tables in parallel
• With DB2 9.5, a redistribute command is equivalent to these steps in DB2 9
  1. Dropping and re-creating the indexes
  2. Redistribute
  3. Running REORG on the table
  4. Executing RUNSTATS on the table
A RUNSTATS profile has to be defined beforehand for these steps to apply

Fast Redistribute Details
Area | DB2 9 Redistribute | DB2 9.5 Redistribute (aka Fast Redistribute)
Implementation and Architecture | Uses standard SQL inserts and deletes | Bypass runtime; parallel architecture; single pass of data; data compaction; parallel processing of tables
Performance | Low – record level processing | High – page level processing
Logging Requirements | High: full SQL logging; high disk requirements | Low: minimal logging
Indexes | Incremental indexing (slow and heavy logging). No sorting. Very costly | Single table scan, parallel sorting, parallel index rebuild
Disk Requirements | Fully logged, large active log space and total log space/archive required | No additional disk requirements
Catalog Contention | Low: tables are not created and dropped | Low: tables are not created and dropped
Post Redistribute steps required | Reorg, runstats, re-bind; possibly re-create indexes | Only re-binding of packages
Fast redistribute on DB2 9 and DB2 9.5 does not support replicated MQTs – drop and recreate them after redistribution

Fast Redistribute Performance Data
Up to 83% improvement in total time to redistribute all the data
More consistent redistribution times – time to redistribute ½ the data is ~½ the overall time in DB2 9.5
[Chart: Elapsed time to redistribute table on DB2 9 and DB2 9.5. Y-axis: relative elapsed time, lower is better; X-axis: percentage of table redistributed (50, 100); series: DB2 9 Redistribute, DB2 9.5 Fast Redistribute]
Linux x64: ~13% reduction in total time to redistribute the table for 50% of the data; for 100% of the data, an improvement of 83%
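A percentage reduction in elapsed time is easy to misread as a speed-up factor, so it can help to convert between the two. The sketch below assumes that the "improvement" figures on this slide are reductions in elapsed time (the chart's axis is relative elapsed time); the helper name `speedup_from_reduction` is hypothetical.

```python
def speedup_from_reduction(reduction_pct: float) -> float:
    """Convert a percentage reduction in elapsed time into a speed-up factor."""
    remaining_fraction = 1 - reduction_pct / 100
    return 1 / remaining_fraction


# Figures quoted for Linux x64: 13% for half the data, 83% for all of it.
half = speedup_from_reduction(13)
full = speedup_from_reduction(83)

print(f"50% of data: {half:.2f}x faster")
print(f"100% of data: {full:.2f}x faster")
```

Under that reading, an 83% reduction corresponds to running nearly six times faster, while a 13% reduction is only about 1.15x.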

Fast Redistribute Hints / Tips
• Use separate disks for the logs and for the table space containers
• Create a large temporary table space for each node and increase the buffer pools and sort heap sizes needed for index creation
• Define RUNSTATS profiles for all tables to be redistributed
• Back up the affected table spaces before and after redistribution
We place the table space in backup pending mode after redistribution in DB2 9.5

LOB Performance Enhancements
• Large Objects are becoming more prevalent
  • BLOB, CLOB, DBCLOB
• DB2 9.5 enhances client/server LOB performance
  • Blocking of cursors containing LOB columns
  • DB2 automatically chooses the best performing method to send LOB data back to the client
  • CLI also supports RTNEXTALL for blocking cursors with LOB columns
• Improved performance by reducing the number of network flows required to retrieve a LOB

LOB Performance Improvements
Performance benefit depends on the row size
• The smaller the row size, the more LOBs can be blocked and the more significant the improvement
• With a larger row size there is not much savings in network traffic with blocking, so the performance benefit for medium and large sizes is not as high as the improvement for small ones
[Chart: Elapsed time to retrieve LOBs on pages with different sizes. Y-axis: elapsed time (sec), lower is better; X-axis: page size (4K, 8K, 16K, 32K); series: DB2 9, DB2 9.5]
smaller row size -> more rows in a query block -> significant improvement (40% for LOB size=4k)
larger row size -> more network traffic, limited by the size of the TCP/IP packets -> increase in the number of send calls (performance benefit of DDF for medium and large-sized LOBs is < 20%)
For large pages, consider setting DB2SOSNDBUF > 64k and DB2SORCVBUF > 64k; these increase the send and receive buffers in DB2
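The relationship between row size and blocking benefit can be sketched with a toy model. Everything here is an assumption made for illustration: the function `network_flows`, the 32 KB block size, and the rule that one block costs one network flow are not taken from the slides or from DB2 internals.

```python
import math


def network_flows(num_rows: int, row_bytes: int, block_bytes: int = 32 * 1024) -> int:
    """Toy model: flows needed when rows are packed into fixed-size blocks.

    Assumes one network flow per block and at least one row per block.
    """
    rows_per_block = max(1, block_bytes // row_bytes)
    return math.ceil(num_rows / rows_per_block)


# Without blocking, assume one flow per row; with blocking, one per block.
for row_kb in (4, 8, 16, 32):
    flows = network_flows(1000, row_kb * 1024)
    print(f"{row_kb}K rows: {flows} flows instead of 1000")
```

In this model 4K rows need only 125 flows for 1000 rows, while 32K rows still need 1000, which mirrors the slide's point that small rows benefit most from blocking.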

Decimal Floating Point (DFP)
• How to represent numbers
  • BIGINT – good for whole numbers
  • FLOAT - Binary floating point is fast but can be inaccurate for business applications
  • DECIMAL - SQL DECIMAL data type
    • implemented in software
  • DECFLOAT(n) – new datatype
    • 16 and 34 digit precision
• IBM POWER6 is the first Unix microprocessor to support decimal floating point arithmetic in hardware
  • Provides additional performance acceleration
8 bytes and 16 bytes for DECFLOAT
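The difference between binary and decimal floating point can be seen with a small experiment. The sketch below uses Python's standard `decimal` module as an analogy for DECFLOAT; it is not DB2 code, and the two contexts with 16 and 34 digits of precision are stand-ins chosen to match the precisions listed on the slide.

```python
from decimal import Context, Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly.
binary_sum = 0.1 + 0.2
print(binary_sum == 0.3)

# Decimal arithmetic keeps these business-style values exact.
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum == Decimal("0.3"))

# Contexts with 16 and 34 significant digits, mirroring the two precisions.
ctx16 = Context(prec=16)
ctx34 = Context(prec=34)
third16 = ctx16.divide(Decimal(1), Decimal(3))
third34 = ctx34.divide(Decimal(1), Decimal(3))
print(third16)
print(third34)
```

The first comparison is false and the second is true, which is the inaccuracy the slide warns about for FLOAT in business applications.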

Decimal Floating Point Performance on IBM POWER6 Hardware with AIX
• Significant range of speed-up
• Depends on
  • How CPU-bound
  • Number/kind of math expressions
• Have seen up to 6x faster performance
• In one complex expression with mainly aggregation DECFLOAT(16) was 1.6x faster than DECIMAL(15,4) when using DFP on POWER6
• Have seen ~40% gains in SAP BW environments
[Chart: Elapsed Time Reduction for a Sample Query (on POWER6 hardware). Y-axis: elapsed time (sec), lower is better; series: Float/Double, Decimal, DFP]
Q1 in TPC-H does a lot of calculations on DFP columns; we do sums and averages across these columns, so the performance benefit is more recognizable.
select
  l_returnflag,
  l_linestatus,
  sum(l_quantity) as sum_qty,
  sum(l_extendedprice) as sum_base_price,
  sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
  sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
  avg(l_quantity) as avg_qty,
  avg(l_extendedprice) as avg_price,
  avg(l_discount) as avg_disc,
  count_big(*) as count_order
from
  tpcd.lineitem
where
  l_shipdate <= date ('1998-12-01') - 117 day
group by
  l_returnflag,
  l_linestatus
d b

Decimal Floating Point Performance on x64 Hardware running Linux
• Significant range of speed-up
• Depends on
  • How CPU-bound
  • Number/kind of math expressions
• In one complex expression with mainly aggregation DECFLOAT(16) was 1.4x faster than DECIMAL(15,4) when using the DFPAL library
[Chart: Elapsed Time Reduction for a Sample Query (on x64 hardware). Y-axis: elapsed time (sec), lower is better; series: Float/Double, Decimal, DFP]
DFP hardware support exists on POWER6, but we only use the support on AIX, not pLinux, today.
Q1 in TPC-H does a lot of calculations on DFP columns; we do sums and averages across these columns, so the performance benefit is more recognizable.
select
  l_returnflag,
  l_linestatus,
  sum(l_quantity) as sum_qty,
  sum(l_extendedprice) as sum_base_price,
  sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
  sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
  avg(l_quantity) as avg_qty,
  avg(l_extendedprice) as avg_price,
  avg(l_discount) as avg_disc,
  count_big(*) as count_order
from
  tpcd.lineitem
where
  l_shipdate <= date ('1998-12-01') - 117 day
group by
  l_returnflag

MDC Rollout
• Faster DELETE along cell or slice boundaries
• Immediate Index Cleanup Rollout (implemented in DB2 v8.2.2)
• Deferred Index Cleanup Rollout (new in DB2 9.5)
[Diagram: MDC table with year, nation and colour dimensions. Example cells: (1997, Mexico, blue), (1997, Canada, blue), (1997, Mexico, yellow), (1997, Canada, yellow), (1998, Canada, yellow), (1998, Mexico, yellow). The cell for (1997, Canada, yellow) is called out; each cell contains one or more blocks.]
MDC tables are multi-dimensional clustered tables that deliver fast performance by introducing block indexes, which point to blocks or groups of records instead of to individual records.
By physically organizing data in an MDC table into blocks according to clustering values, and then accessing these blocks using block indexes, MDC is able to provide significant additional performance benefits. In the example above, the block indexes on nation, year and colour improve performance for queries using these fields.

Deferred index cleanup rollout
• Since DB2 v8.2.2, rollout deletion has provided faster, block-based deletes and reduced logging compared with regular deletes
  • Required row-level processing and logging for each index
  • Performance dependent on the number of indexes
• DB2 9.5 provides further enhancements
  • Deferred index cleanup
    • Enabled by setting DB2_MDC_ROLLOUT to DEFER, or with SET CURRENT MDC ROLLOUT MODE DEFERRED
    • Removes index keys after the transaction commits
    • Cleans up multiple indexes in parallel
    • Reduces logging - instead of logging one record for every RID removed from the indexes, only one record per index page is logged
    • The application doing the delete does not need to wait before processing other transactions

MDC rollout performance data
Deferred rollout excluding the wait time for async index cleanup is the fastest; transactions do not have to wait until the index cleanup is finished
Significant reduction in log space usage with deferred rollout
[Chart: Time to delete rows using different options of MDC rollout. Y-axis: relative time to delete rows, lower is better; X-axis: % of table deleted (30, 97); series: Delete (no rollout), Immediate rollout, Deferred rollout (including async cleanup), Deferred rollout (excluding async cleanup)]
[Chart: Log space needed to delete rows using various MDC rollout options. Y-axis: % log space needed, lower is better; X-axis: % of table deleted (30, 97); series: Delete (no rollout), Immediate rollout, Deferred rollout]
On 11 million rows (134260 pages), 16K page, 16K extent size, 4 nodes, 8 RID indexes. The deferred rollout not including async cleanup time is there to show how fast the delete statement finishes.

When to use deferred index cleanup rollout
• Consider deferred index cleanup when
  • There are a number of RID indexes
  • Large number of deletes
  • Several rollouts planned for a particular table
  • In limited log space environments
  • Doing a lot of roll-in/roll-out in the short maintenance window – transactions do not have to wait for index cleanup to occur
Limited log space environments - customers may need to break down the delete into smaller ones by deleting first N rows. Using deferred index cleanup can avoid the problem because not only the deferred index cleanup reduces the logging, but it also performs internal commit for the cleanup
Deferred rollout has the following drawbacks1) blocks cannot be reused immediately after deletion. 2) index scan will be slower while the indexes are being cleanup. 3) additional memory is needed to memorize which blocks have been rolled out.
Immediate index cleanup is default in DB2 9.5
Optimizer enhancements in DB2 9.5
• Real Time Statistics (RTS)
  • Table statistics updated automatically over time with variables including UDI (update/delete/insert)
  • Potentially significant query plan improvements
  • Slight overhead
• FFNR (Fetch First N Rows), OFNR (Optimize for N Rows) and Group-By with min/max
  • Improved costing and better query/sub-query plan alternatives for FFNR, OFNR and Group-By
• Performance Results:
  • Up to 99% improvement for such queries
  • < 1% overhead for RTS
STMM Updated for 9.5
• The self-tuning memory manager (STMM) has been enhanced in DB2 9.5 with feedback from customers
• We have seen very positive results
  • Particularly in OLTP environments
  • Some ISVs now use it out of the box
• The DB2 performance team uses it to tune OLTP benchmarks
  • But we turn it off at the end once tuning is complete
• We continue to work on enhancements
  • Particularly in the data warehousing environment
Optimizer enhancements in DB2 9.5
• In-List Cardinality Improvements
  • DB2 has enhanced the costing of IN-list predicates that are to be converted to nested loop joins
  • Performance Results:
    • Various individual query improvements for SAP-SSQJ and internal query workloads (30%-99%)
• Improved filter-factor/selectivity estimation of ‘between’ predicates using parameter markers
  • Performance Results:
    • 5% improvement in overall throughput, SAP-SD (AIX)
    • 12% faster average response time, SAP-SD (Linux)
The IN-list-to-join optimization already existed in DB2 9; DB2 9.5 improves this feature.
Unicode Enhancements
• Unicode standards exist to support the worldwide interchange and processing of texts of diverse languages
• Unicode has the ability to encode 1.1 million characters
• Unicode is the database creation default for new databases in DB2 9.5
  • With functional and performance enhancements for Unicode
• UCA400 Collation improvements for sorting and organizing data
  • Normalized Unicode, Thai and Slovakian
Changes in DB2 9.5:
• Codepage conversion (caching values, decreasing function calls, minimizing tracing, short-cutting sqlnlsIconv) in the libg11n library
• Rearranged DB2 code to minimize cycles spent on inter-library glue code
• Addition of adjustable ICU key buffering for binary sort
• Eliminated excessive ICU tracing
ICU (International Components for Unicode) is a standard library that DB2 follows. ICU tracing: tracing of the code in the ICU library. ICU key buffer: essentially how information/mappings are passed within DB2 (the size of the buffer).
Unicode normalization is a form of text normalization that transforms equivalent characters or sequences of characters into a consistent underlying representation so that they may be easily compared. Normalization is important when comparing text strings for searching and sorting (collation).
http://unicode.org/reports/tr10/
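The normalization note above, and the "1.1 million characters" figure from the slide, can be checked with Python's standard `unicodedata` module. This is only an illustration of the Unicode concepts; it does not use ICU or DB2, and the `nfc_equal` helper is a name introduced here for the example.

```python
import sys
import unicodedata

# The Unicode code space is 17 planes of 65,536 code points each,
# i.e. roughly the 1.1 million quoted on the slide.
CODE_SPACE = 17 * 0x10000
assert CODE_SPACE == sys.maxunicode + 1 == 1_114_112

composed = "\u00e9"     # "é" as a single precomposed code point
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT


def nfc_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

The two spellings of "é" differ as raw code point sequences, yet `nfc_equal(composed, decomposed)` is true, which is why normalization matters before searching and sorting.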
Unicode Performance Results
[Chart: Overall Unicode Performance Improvement in DB2 9.5. Y-axis: elapsed time (sec), 0-200 (lower is better). Series: DB2 9, DB2 9.5.]
[Chart: UCA 400 Collation Improvement in DB2 9.5. Y-axis: elapsed time (sec), 0-400 (lower is better). Categories: Normalized, Thai, Slovakian. Series: DB2 9, DB2 9.5.]
From internal tests, overall performance improvement of ~30% when using Unicode in DB2 9.5
Performance improvements of 11-14% for Normalized Unicode, Thai and Slovakian with UCA400 Collation
Parallel Index Create
• DB2 9.5 parallelizes index create to exploit extra processors
• A CPU-bound index create in DB2 9 will see a substantial performance boost in DB2 9.5
• Improvements between 20% and 2x, depending on the number of CPUs & the I/O capacity of the system
• Up to degree 6 parallelism
• Controlled by registry variable DB2_SMP_INDEX_CREATE
[Diagram labels: Container (x3); Subagent db2agntp; Parallel scan & sort; Table queue; Coordinating Agent db2agent; Index build. Up to 6 agents, depending on active CPUs & number of nodes (1:N). In DB2 9, just one DB2 agent handles all this in the non-SMP case.]
DB2 Audit Enhancements in DB2 9.5
• Introduction of auditing at the database level instead of at the instance level
  • One audit log file per database
  • Audit policies can be created and associated with a table, user, group or role, thereby enabling fine-grained auditing
• Customizable audit log path
• Introduction of Archiving
  • Quick method of switching the active log file to an archived log file and starting a new active log file
  • Allows follow-on operations such as backup, extraction and deletion to have zero effect on the performance of the server
• EXECUTE Category
  • A new database-level category that audits the execution of SQL statements
  • Can optionally include input data host variables and parameter markers
DB2 Audit Performance Results
[Chart: Relative throughput of an internal OLTP workload collecting DB2 Audit data. Y-axis: relative throughput, 0-8. Series: DB2 9, DB2 9.5.]
From internal tests on an OLTP workload, collecting db2audit data is ~8x faster on DB2 9.5
With DB2 9.5, the audit log is on a separate disk from the DB2 logger, table space container disks and database directory
DB2 Audit Hints / Tips
• Use asynchronous logging
  • Set AUDIT_BUF_SZ to > 0
• Place audit logs and archived audit logs on separate disks, different from the DB2 logger, database directory and table space container disks
• Only audit the tables needed, with the level of audit required
• In a DPF environment, set the log path for each node on different disks
• Use the filter capability of audit to selectively audit data
• Use the EXECUTE category instead of the CONTEXT category when auditing SQL statements
Workload Management (WLM)
• Provides a foundation for more predictable performance through improved resource & request management
  • Explicit control of CPU priority for different classes of work
  • Controlling ‘rogue’ queries
  • Finer-grained monitoring capabilities than DB2 9
• Integrated with AIX WLM for control of CPU consumption by service class
  • Agent priority and prefetch priority provided on all platforms
• Thresholds allow control of activity execution & monitoring
  • Query exceeded a threshold? Capture information, or even block it
  • Too many queries or utilities bombarding the system? Queue up the excess ones
• Aggregate statistics and individual event records can be captured on activity in any service class
  • Allows monitoring to be as narrow or broad as required
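The "queue up the excess ones" behaviour of a concurrency threshold can be pictured with a toy admission controller in Python. This is a conceptual sketch only: the `ConcurrencyThreshold` class and its methods are hypothetical names for illustration and do not correspond to DB2 WLM syntax or internals.

```python
from collections import deque


class ConcurrencyThreshold:
    """Toy admission control: run up to `limit` activities, queue the rest."""

    def __init__(self, limit: int):
        self.limit = limit
        self.running = set()
        self.queued = deque()

    def submit(self, activity: str) -> str:
        """Admit the activity if under the limit, otherwise queue it (FIFO)."""
        if len(self.running) < self.limit:
            self.running.add(activity)
            return "running"
        self.queued.append(activity)
        return "queued"

    def finish(self, activity: str):
        """Complete an activity and admit the oldest queued one, if any."""
        self.running.remove(activity)
        if self.queued:
            admitted = self.queued.popleft()
            self.running.add(admitted)
            return admitted
        return None
```

With a limit of 2, a third submitted query waits in the queue and is admitted as soon as one of the first two finishes.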
WLM Examples
• Example 1
  • Two CPU-intensive workloads occupy the same system & compete for resources
  • Simple DB2 WLM Workloads & Service Classes plus AIX WLM allow the high priority workload to use most of the resources
• Example 2
  • Two I/O-intensive scan workloads occupy the same system & compete for resources
  • Simple DB2 WLM Workloads & Service Classes using PREFETCH PRIORITY allow the high priority workload to use most of the resources
[Chart: CPU-bound OLTP workloads competing for system resources. Y-axis: Tx/s, 0-700. Configurations: No WLM; DB2 WLM + AIX WLM; DB2 WLM + AGENTPRI. Series: low priority workload, high priority workload.]
[Chart: Disk-bound BI workloads competing for system resources. Y-axis: query throughput, 0-700. Configurations: No WLM; DB2 WLM + PREFETCH PRIORITY. Series: low priority workload, high priority workload.]
WLM Hints / Tips
• For CPU-intensive environments, use CPU prioritization with AIX WLM CPU shares, or DB2 WLM AGENTPRI ‘nice’ values
• For BI environments, prefetch prioritization provides the best control of scan-oriented workloads
• Use fine-grained service class definitions to choose which applications to monitor
  • WLM enables activity (event) monitoring with much lower overhead than statement event monitors in previous versions of DB2
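As a rough illustration of share-based CPU prioritization, the sketch below assumes a plain proportional-share model: each active class is entitled to its shares divided by the total shares of the classes that are currently active. The class names and share values are hypothetical, and the sketch ignores tiers, limits and other controls a real workload manager applies.

```python
def cpu_targets(shares, active):
    """Return each active class's CPU target as a fraction of the machine.

    shares: mapping of class name -> configured shares (hypothetical values).
    active: set of class names that currently have work to run.
    """
    live = {name: s for name, s in shares.items() if name in active}
    total = sum(live.values())
    return {name: s / total for name, s in live.items()}
```

With shares of 9 and 1, the high priority class is entitled to 90% of the CPU while both are busy, and the low priority class can still use the whole machine when it runs alone.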
Many more new DB2 9.5 capabilities exist …
Summary
• DB2 is the performance benchmark leader
  • TPC-C, TPC-H, XML, SAP, SPEC-J …
  • Leader in the Virtualization space
• New features in DB2 9.5 that further boost performance
  • threaded architecture
  • XML enhancements
  • LOB blocking
  • …
• Initial performance results and usage guidance
Berni Schiefer, IBM Toronto Lab
schiefer@ca.ibm.com
Session E03: DB2 9.5 Performance Update