Trends and Directions for Database Technology

Curt Cotner, IBM Fellow

Session Code: G018


10/05/2012

Please Note:

IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion.

Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.

The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.

Acknowledgements and Disclaimers:

© Copyright IBM Corporation 2011. All rights reserved.

– U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

IBM, the IBM logo, ibm.com, Infosphere Warehouse and SAS are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml.

Other company, product, or service names may be trademarks or service marks of others.

Availability. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates.

The workshops, sessions and materials have been prepared by IBM or the session speakers and reflect their own views. They are provided for informational purposes only, and are neither intended to, nor shall have the effect of being, legal or other guidance or advice to any participant. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided AS-IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other materials. Nothing contained in this presentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.

All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.

DBMS is the Bedrock of Modern Business

• Mature
• High performance
• Available
• Reliable
• Consistent
• Durable

New Technology Emerges

Timeline (1990s, 2000s, 2010s): XML databases, in-memory databases, object databases, NoSQL databases

NoSQL Datastores

Current NoSQL Landscape

• Document stores: 12+
• Graph stores: 11+
• Key-value stores: 23+
• Tabular stores: 6+
• XML stores: 9+
• Object stores: 12+
• Others

There are currently more than 100 NoSQL systems, and it is not clear which of them will survive.

Why did NoSQL Datastores Arise?

• Some applications want extremely rapid development iterations
– So rapid that they cannot afford to negotiate schema changes with a DBA
• Some application groups aren’t comfortable with SQL and don’t want to get involved in learning relational database technology
• Need for a simple, low-latency, low-overhead API to access data that scales to thousands of Web servers (e.g. cached Web data)
• Need to scale out on cheap commodity nodes with locally attached SATA disks
• Increasing use of distributed analytics

NoSQL Datastores

Two Worlds

Transactional
• Custom high-end OLTP for financial applications
• Scale-out datastores for Cloud/Web 2.0
• Examples
– MemcacheDB, Cassandra, Dynamo, Voldemort, SimpleDB, GigaSpaces, WebSphere eXtreme Scale

Analytics
• Managing updates
• Support for random access and indexing
• Scale-out content store
• Examples
– Bigtable, HBase, Hypertable

Focus on
• Commodity servers, networking, disks
• Easy elasticity and scalability to multiple racks (10s to 100s of servers)
• Fault tolerance and high availability

Give up
• Relational data model
• SQL APIs
• Complex queries (joins, secondary indexes, ACID transactions)

DB2 is already providing XML support

Applications include business reports, SOA, Web services, forms, etc.

DB2 is making investments to support key-value data stores

• Get(Key) → Value
• Put(Key, Value)
• Remove(Key)

Key-value stores are often used to cache data and objects for Web 2.0 applications.
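The Get/Put/Remove interface is small enough to layer on an ordinary relational table, which is one way a relational engine can serve key-value workloads. The sketch below is a hypothetical illustration, not DB2's key-value support: it uses Python's built-in `sqlite3` module as a stand-in engine, and the `KeyValueStore` class and its `kv` table are names invented for this example.

```python
import sqlite3
from typing import Optional


class KeyValueStore:
    """Minimal Get/Put/Remove API on top of a relational table.

    Hypothetical sketch: sqlite3 stands in for the relational engine, and the
    table name 'kv' and its columns are illustrative only.
    """

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        # The PRIMARY KEY gives an index on k, so get() is an index lookup.
        self.conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v BLOB)")

    def put(self, key: str, value: bytes) -> None:
        # Upsert: insert a new pair or overwrite the existing value.
        with self.conn:  # commits the transaction on success
            self.conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))

    def get(self, key: str) -> Optional[bytes]:
        row = self.conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0] if row else None

    def remove(self, key: str) -> bool:
        with self.conn:
            cur = self.conn.execute("DELETE FROM kv WHERE k = ?", (key,))
        # True if a row was actually deleted.
        return cur.rowcount > 0
```

Because the key is the table's primary key, each `get` is an index lookup, and writes go through the engine's transactions; passing a file path instead of `:memory:` would also persist the data on disk.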

NoSQL Graph Store

• Easy database design
• Schema not pre-defined
• Easy adaptation as needs evolve

Example graph:
– Curt Cotner owns 1995 Ford (Car)
– Curt Cotner owns 123 Maple Ave, Chicago (House)
– Curt Cotner owns 2001 Thunderjet (Boat)
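A graph like the one above can be written as subject, predicate, object triples. In this minimal in-memory sketch (the `TripleStore` class and the `owns`, `isA`, and `locatedIn` predicate names are illustrative assumptions), adding a new kind of relationship is just adding another triple, with no schema change.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Minimal schema-less graph store: a set of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        # No schema to alter: any predicate can be introduced at any time.
        self.triples.add((subject, predicate, obj))

    def match(self, subject: Optional[str] = None, predicate: Optional[str] = None,
              obj: Optional[str] = None) -> List[Triple]:
        # None acts as a wildcard in that position of the pattern.
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TripleStore()
for thing, kind in [("1995 Ford", "Car"), ("123 Maple Ave, Chicago", "House"),
                    ("2001 Thunderjet", "Boat")]:
    store.add("Curt Cotner", "owns", thing)
    store.add(thing, "isA", kind)

# A new relationship type needs no schema migration.
store.add("123 Maple Ave, Chicago", "locatedIn", "Chicago")
```

A query such as `store.match("Curt Cotner", "owns")` returns every owned item regardless of its kind, which is the flexibility the slide attributes to graph stores.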

NoSQL Graph Store based on DB2

• Open source Jena code ported to use DB2
• Uses DB2 logging, indexing, compression, etc.
• Makes use of DB2 high availability features
• Can scale out with DB2 pureScale or Parallel Sysplex
• Used in measurements with Rational Jazz
• DB2 graph store outperforms the open source solution 4:1
• DB2 eliminates the scalability and availability limitations found with the open source solution
• Will be provided to all DB2 and Informix customers at no added charge in 1H2012
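A graph store built on a relational engine typically keeps triples in a table and turns graph patterns into joins, so the engine's indexing, logging, and compression apply to graph data. The sketch below shows that general idea with `sqlite3` standing in for DB2; the `triples` table layout and its indexes are assumptions for illustration, not the schema the Jena port actually uses.

```python
import sqlite3
from typing import Iterable, List, Tuple


def build_store(triples: Iterable[Tuple[str, str, str]]) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # One row per (subject, predicate, object) triple; the primary key
    # doubles as an index for subject-first lookups.
    conn.execute(
        "CREATE TABLE triples (s TEXT NOT NULL, p TEXT NOT NULL, o TEXT NOT NULL,"
        " PRIMARY KEY (s, p, o))"
    )
    # A second index serves predicate/object-first patterns such as "?x isA Boat".
    conn.execute("CREATE INDEX triples_pos ON triples (p, o, s)")
    with conn:
        conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    return conn


def owned_things_by_kind(conn: sqlite3.Connection, owner: str) -> List[Tuple[str, str]]:
    # The graph pattern (owner owns ?thing) . (?thing isA ?kind) becomes a self-join.
    return conn.execute(
        """
        SELECT kind.o, owns.o
        FROM triples AS owns
        JOIN triples AS kind ON kind.s = owns.o AND kind.p = 'isA'
        WHERE owns.s = ? AND owns.p = 'owns'
        ORDER BY kind.o
        """,
        (owner,),
    ).fetchall()


TRIPLES = [
    ("Curt Cotner", "owns", "1995 Ford"), ("1995 Ford", "isA", "Car"),
    ("Curt Cotner", "owns", "123 Maple Ave, Chicago"), ("123 Maple Ave, Chicago", "isA", "House"),
    ("Curt Cotner", "owns", "2001 Thunderjet"), ("2001 Thunderjet", "isA", "Boat"),
]
```

Each additional hop in a graph pattern adds one more self-join, which is why index design on the triple table matters for graph workloads.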

Cloud Computing: Hottest Topic in the Industry…

IBM Workload Deployer – What is it?

• Secure, self-service cloud management hardware appliance for managing shared application deployments
• Pre-optimized for high performance and scalability, with pre-configured workload patterns for ease of use
• Unmatched IBM middleware management (apply maintenance, federate cells, etc.; not a black box)
– Can also manage black-box images to support other products
• Enables consistent and repeatable deployment of application environments based on patterns (Virtual Systems and Virtual Applications)
• Dispenses hardened middleware patterns into a pool/cloud of virtualized hardware running a supported hypervisor, e.g. VMware ESX, z/VM, or PowerVM
– “Bring your own cloud”
• Integrates with existing infrastructure management tools through programmable REST APIs
• License management provides the ability to set license thresholds per product to maintain cloud-wide compliance
• Elasticity of application environments through support for adding virtual images to dispensed patterns
• Fine-grained control of deployments, including IP address mapping and naming details for deployed patterns

DB Multi-tenancy: Sharing the Costs of Hardware Resources and Maintenance

• Multitenancy can further reduce the hardware and maintenance costs of a database in the cloud
• Multitenancy: multiple companies or users using the same software with a level of isolation
– Tenants are companies or users that would historically have installed and used a single instance of the software solely for their own use
– Multitenancy allows these companies/users to share the same software while keeping a level of isolation
• Analogous to users running various applications on the same operating system
– The point is to share the management and hardware costs among a number of “tenants”
– Tenants, like the distinct users of an operating system, require a level of isolation

Figure (size of tenants vs. number of tenants): a few large tenants, more medium tenants, and a long tail of small tenants.
– Large tenants: isolation by databases; shared: N/A (hardware only); multi-tenant (MT) application or non-MT application
– Medium tenants: isolation by tables; shared: database; MT application
– Small tenants: isolation by rows; shared: tables; MT application

Database Multi-Tenancy Models

Separate Instances/DBs
• Tenant A and Tenant B each have their own instance/database, accessed by a multi-tenant app on the app server

Separate Schemas/Tenants
• Tenants w/ different schemas (e.g. customization)
• Larger catalog footprint
• Table statistics
• Tenant-specific backup/restore

Shared Tables
• Tenants w/ same schema
• Smallest metadata
• Difficult tenant-specific optimizations and tooling
• FGAC (fine-grained access control)/row permissions for security

Moving from separate instances/DBs toward shared tables gives higher multi-tenancy and better resource utilization, at the cost of higher query optimization/runtime complexity and higher security worries.
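In the shared-tables model, every row carries a tenant key and every tenant's query must be restricted to that key. This hypothetical `sqlite3` sketch enforces the filter in application code (the `orders` table and `TenantScopedDB` class are invented for illustration); FGAC/row permissions would instead enforce it inside the database, so a predicate missing from application code could not leak another tenant's rows.

```python
import sqlite3
from typing import List, Tuple


class TenantScopedDB:
    """Shared-tables multi-tenancy: one table for all tenants, keyed by tenant_id."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        # tenant_id leads the primary key, so the key's index groups each
        # tenant's rows together and tenant-filtered lookups can use it.
        self.conn.execute(
            "CREATE TABLE orders ("
            " tenant_id TEXT NOT NULL,"
            " order_id INTEGER NOT NULL,"
            " amount REAL NOT NULL,"
            " PRIMARY KEY (tenant_id, order_id))"
        )

    def insert_order(self, tenant: str, order_id: int, amount: float) -> None:
        with self.conn:
            self.conn.execute("INSERT INTO orders VALUES (?, ?, ?)", (tenant, order_id, amount))

    def orders(self, tenant: str) -> List[Tuple[int, float]]:
        # Every read is scoped to the calling tenant.
        return self.conn.execute(
            "SELECT order_id, amount FROM orders WHERE tenant_id = ? ORDER BY order_id",
            (tenant,),
        ).fetchall()
```

Two tenants can both use `order_id` 1 without colliding, because the tenant key is part of the primary key.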

Shared Development Scenario

Shared objects and/or private objects

• Most objects could be shared across multiple tenants.
• Tenants w/ different schemas (e.g. customization) can have private objects.
• Greatly reduces the cases where you need to create a new instance/subsystem to keep changes to shared objects from impacting other users.

Diagram: Tenant A and Tenant B use a multi-tenant app on an app server that accesses a common set of shared objects.

Shared Data Test Scenario

Diagram: Tenant A and Tenant B updates flow through a multi-tenant app on an app server to shared tables, whose rows are shared by all tenants.

• Tenants with the same schema
• Most rows are shared across tenants
• Updates are held off to the side in tenant-keyed rows visible only to that tenant
• Huge disk savings for SAP customers, since they often have 6–15 copies of production used for various types of development, testing, and training
• Should be easy and fast to discard tenant updates for repetitive testing scenarios
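The scheme described above behaves like copy-on-write at row level: reads see a tenant's private row if one exists and fall back to the shared row otherwise, and resetting a tenant just drops its private rows. This is a minimal in-memory model of that idea under assumed names (`OverlayTable`, dictionaries keyed by primary key), not how a database engine would physically store it.

```python
from typing import Dict, Hashable, Optional, Tuple

_DELETED = object()  # marker: the tenant deleted a shared row in its private view


class OverlayTable:
    """Shared base rows plus per-tenant private changes (row-level copy-on-write)."""

    def __init__(self, base: Dict[Hashable, object]) -> None:
        self.base = dict(base)  # rows shared by all tenants, keyed by primary key
        self.private: Dict[Tuple[str, Hashable], object] = {}  # (tenant, key) -> row or _DELETED

    def update(self, tenant: str, key: Hashable, row: object) -> None:
        self.private[(tenant, key)] = row  # the shared row is never modified

    def delete(self, tenant: str, key: Hashable) -> None:
        self.private[(tenant, key)] = _DELETED

    def read(self, tenant: str, key: Hashable) -> Optional[object]:
        if (tenant, key) in self.private:
            row = self.private[(tenant, key)]
            return None if row is _DELETED else row
        return self.base.get(key)

    def rows(self, tenant: str) -> Dict[Hashable, object]:
        # The tenant's view: shared rows overridden by its private changes.
        merged = dict(self.base)
        for (owner, key), row in self.private.items():
            if owner != tenant:
                continue
            if row is _DELETED:
                merged.pop(key, None)
            else:
                merged[key] = row
        return merged

    def reset(self, tenant: str) -> None:
        # Discarding a tenant's test changes only touches that tenant's private rows.
        for k in [k for k in self.private if k[0] == tenant]:
            del self.private[k]
```

Storage grows only with the number of changed rows per tenant, which is where the disk savings over full copies of production come from.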

Traditional Systems Landscape

Data flow (serving applications): OLTP → ETL → Staging Area → ETL → ODS → ETL → EDW → ETL → Data Marts

Historical reasons:
• Different access patterns → impact on performance
• EDW as the data integration hub → again, impact on performance
• Different life-cycle characteristics → and again, impact on performance
• Different Service Level Agreements (SLAs)
– Lack of broadly available workload management capabilities
– Choice of lower cost-of-acquisition offerings

Negative ramifications:
• Complexity, both in systems management and in applications
• Difficulties in supporting real-time analytics
• Inability to meet ever more demanding SLA requirements
• High total cost of ownership

Visionary Systems Landscape

Data flow (serving applications): OLTP → ELT → Staging Area → ELT → ODS → ELT → EDW → ELT → Data Marts

Benefits:
• Consolidating all the components into a single system
• Uniform access to any data
• Efficient data movement within the system (ideally, no network)
• Opportunity to remove (that is, consolidate) some of the layers

Challenges:
• Mixed workload management capabilities
• Ensuring continuous availability, security, and reliability
• Providing universal processing capabilities that deliver the best performance for both transactional and analytical workloads without the need for excessive tuning

Approaches:
• Columnar stores
• In-memory databases
• Hardware acceleration, special-purpose processors
• Appliances
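The switch from ETL to ELT in this landscape means data is first loaded into the target system as-is and then transformed there with set-oriented SQL, instead of being reshaped by an external tool in transit. A minimal sketch of that pattern, assuming hypothetical `staging_sales` and `sales_by_region` tables and using `sqlite3` in place of a warehouse engine:

```python
import sqlite3
from typing import List, Tuple


def run_elt(raw_rows: List[Tuple[str, str]]) -> List[Tuple[str, float]]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staging_sales (region TEXT, amount_text TEXT)")
    conn.execute("CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total REAL)")

    # Extract + Load: land the raw extract unchanged in a staging table.
    conn.executemany("INSERT INTO staging_sales VALUES (?, ?)", raw_rows)

    # Transform: clean, convert, and aggregate inside the engine with one SQL
    # statement, so the data never leaves the database between load and transform.
    conn.execute(
        """
        INSERT INTO sales_by_region (region, total)
        SELECT UPPER(TRIM(region)), SUM(CAST(amount_text AS REAL))
        FROM staging_sales
        GROUP BY UPPER(TRIM(region))
        """
    )
    conn.commit()
    result = conn.execute("SELECT region, total FROM sales_by_region ORDER BY region").fetchall()
    conn.close()
    return result
```

For example, `run_elt([(" east", "10.5"), ("EAST", "4.5"), ("west ", "7")])` normalizes the region names and sums the amounts entirely inside the engine.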

Columnar Data Store Model

• Transactional database engines typically use a row-oriented data store model.
• Query engines that are optimized for analytical queries sometimes use a column-oriented approach.
• In a columnar store, the data of a specific column is stored sequentially.
• If attributes are not required for a specific query execution, they can simply be skipped, causing no I/O or decompression overhead.

Advantages:
• Scan only the columns required: large reduction in I/O
• High compression rates
• Good CPU cache locality while processing column data

Challenges:
• Can be expensive to combine the qualifying columns into the answer set (projection)
• Random I/O created during projections can eliminate the benefits of I/O reduction
• Additional storage is required so that values across vertical slices can be merged
• Multiple I/Os per record for all write operations (INSERT, UPDATE, DELETE)

Figure: columns Col1, Col2, Col3 … ColN, each stored as a contiguous sequence of values (row1, row2, … rowN). Multiple storage blocks store data exclusively for one column.

Approaches:

• In-memory

• SSD

• Multi-core friendly scan patterns
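The I/O argument for column stores can be illustrated by counting how many stored values a filtered aggregate must read under each layout. The sketch below is a simplified model (the table, column names, and the "values read" metric are assumptions; real engines read pages, not individual values), plus a run-length encoder showing why a column of repeated values compresses well.

```python
from typing import Dict, List, Sequence, Tuple


def to_columns(rows: Sequence[tuple], names: Sequence[str]) -> Dict[str, List]:
    """Pivot row tuples into one list per column (a column-major layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def sum_where_rows(rows, names, target, filter_col, value) -> Tuple[float, int]:
    """Row layout: every attribute of every row is read, even unused ones."""
    t, f = names.index(target), names.index(filter_col)
    total, values_read = 0, 0
    for row in rows:
        values_read += len(row)
        if row[f] == value:
            total += row[t]
    return total, values_read


def sum_where_columns(cols, target, filter_col, value) -> Tuple[float, int]:
    """Column layout: only the filter column and the target column are read."""
    f_col, t_col = cols[filter_col], cols[target]
    values_read = len(f_col) + len(t_col)
    total = sum(v for fv, v in zip(f_col, t_col) if fv == value)
    return total, values_read


def run_length_encode(values: Sequence) -> List[Tuple[object, int]]:
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    runs: List[List] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


NAMES = ("id", "region", "amount", "note")
ROWS = [(1, "EU", 10, "a"), (2, "EU", 20, "b"), (3, "US", 5, "c"), (4, "US", 7, "d")]
COLS = to_columns(ROWS, NAMES)
```

Summing `amount` where `region` is "US" reads 16 values in the row layout but only 8 in the column layout; the gap widens as tables get wider. The flip side, listed under Challenges, is that reassembling full rows from columns (projection) and writing a row both touch every column.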

Memory Hierarchy

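Approximate access latency in CPU cycles, with a distance analogy in miles:
• CPU: this room
• L1 cache: 1–2 cycles (Palace Park)
• L2/L3 cache: 6–20 cycles (Berget)
• DRAM: 100–400 cycles (Stuttgart)
• SSD: 5,000 cycles (Chicago)
• HDD: 1,000,000 cycles (2 times to the Moon and back)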
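The HDD analogy holds if each CPU cycle is read as roughly one mile (an interpretation of the slide's "cycles" and "miles" labels, not stated outright). With a mean Earth–Moon distance of about 238,900 miles, two round trips come to:

```latex
2 \times 2 \times 238{,}900\ \text{mi} \approx 955{,}600\ \text{mi} \approx 10^{6}\ \text{mi}
```

which matches the roughly 1,000,000-cycle disk access.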

Challenges:

• Disk storage is realistically unavoidable for fast and reliable recovery
– Logging
– Backup
• Database growth and capacity planning challenges
– Non-deterministic compression rates
• Cost

Approaches:
• Enhancing disk-based DBMS with in-memory capabilities
– Optimizer and run-time awareness
• Storage Class Memory

Storage Class Memory

• Need to close the gap between DRAM and HDD
• HDD growth focus has always been areal density

Recommended reading: IBM Journal of Research and Development, Vol. 52, No. 4/5

Projected 2020 characteristics of SCM devices:
• Capacity: 1 TB
• Read or write access time: 100 ns
• Data rate: > 1 GB/s
• Sustained I/O rate: 238K SIO/s
• Sustained bandwidth: 975 MB/s
• Write endurance: 10^12 writes

(Illustration: thumb drive)

• Goal: create compact, robust storage (and memory) systems with greatly improved cost/performance ratios
• Defining characteristics:
– Nonvolatility
– Solid-state implementation (no moving parts)
– Very low latencies (tens to hundreds of ns)
– Low cost per bit
– Physical durability during practical use
• Access latency improved relatively modestly
– 10% vs. 45% CAGR for chip performance
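To see how quickly these two growth rates diverge, compound them over an illustrative 10-year period (the period is chosen for the example, not taken from the slide):

```latex
(1.10)^{10} \approx 2.59, \qquad (1.45)^{10} \approx 41.1, \qquad
\frac{(1.45)^{10}}{(1.10)^{10}} = \left(\frac{1.45}{1.10}\right)^{10} \approx 15.8
```

After a decade at these rates, chip performance would have pulled ahead of access latency by a factor of about 16, which is the widening gap that storage class memory is meant to close.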

IBM DB2 Analytics Accelerator: Deep DB2 Integration

Diagram: Applications, DBA tools, the z/OS console, and others reach DB2 through application interfaces (standard SQL dialects) and operation interfaces (e.g. DB2 commands). DB2 components shown: Data Manager, Buffer Manager, IRLM, Log Manager, and IDAA, with IDAA connecting DB2 to Netezza.

• DB2 (z/OS on System z, 10's of processors, 100's of GB of memory): superior availability, reliability, security, workload management, OLTP performance, ...
• Netezza: industry-leading DW performance, ease of use

IDAA: Query Execution

Diagram: An application submits queries through the DB2 for z/OS application interface, and the DB2 optimizer decides where each query runs. Queries that cannot be, or should not be, off-loaded to IDAA run in DB2's own query execution run-time; the others are passed through the IDAA DRDA requestor to IDAA, which consists of an SMP host and multiple SPUs, each with a CPU, an FPGA, and memory.
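The off-load decision can be modelled as a two-step rule: first check whether a query is eligible at all ("cannot be off-loaded"), then whether off-loading is worthwhile ("should not be off-loaded"). The sketch below is a toy model with hypothetical fields and an arbitrary cost threshold; it is not DB2's actual optimizer logic.

```python
from dataclasses import dataclass


@dataclass
class Query:
    text: str
    estimated_cost: float  # hypothetical optimizer cost estimate (arbitrary units)
    offloadable: bool      # hypothetical eligibility flag, e.g. all referenced data is in the accelerator


def route(query: Query, cost_threshold: float = 1_000.0) -> str:
    """Return 'IDAA' or 'DB2' for a query (toy rule; the threshold is arbitrary)."""
    if not query.offloadable:
        return "DB2"  # cannot be off-loaded
    if query.estimated_cost < cost_threshold:
        return "DB2"  # should not be off-loaded: cheap queries stay in DB2
    return "IDAA"
```

Keeping both checks in the DB2 optimizer is what lets applications use the same SQL interface regardless of where a query ends up running.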

IDAA: Beta Program Results

Figure: bar chart of elapsed time in seconds (0 to 10,000) for Queries 1–9, DB2 only vs. DB2 with IDAA.

Query     Total rows  Total rows  DB2 only          DB2 with IDAA     Times
          reviewed    returned    Hours   Sec(s)    Hours   Sec(s)    faster
Query 1   2,813,571   853,320     2:39    9,540     0.0     5         1,908
Query 2   2,813,571   585,780     2:16    8,220     0.0     5         1,644
Query 3   8,260,214   274         1:16    4,560     0.0     6         760
Query 4   2,813,571   601,197     1:08    4,080     0.0     5         816
Query 5   3,422,765   508         0:57    4,080     0.0     70        58
Query 6   4,290,648   165         0:53    3,180     0.0     6         530
Query 7   361,521     58,236      0:51    3,120     0.0     4         780
Query 8   3,425.29    724         0:44    2,640     0.0     2         1,320
Query 9   4,130,107   137         0:42    2,520     0.1     193       13
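The "Times faster" column is the DB2-only elapsed time divided by the DB2-with-IDAA elapsed time, rounded to a whole number. This short check recomputes it from the table's seconds columns (values copied from the table) and also derives an aggregate ratio over all nine queries, which the slide itself does not report.

```python
from typing import List, Tuple

# (query, DB2-only seconds, DB2-with-IDAA seconds, reported "times faster")
RESULTS: List[Tuple[str, int, int, int]] = [
    ("Query 1", 9_540, 5, 1_908),
    ("Query 2", 8_220, 5, 1_644),
    ("Query 3", 4_560, 6, 760),
    ("Query 4", 4_080, 5, 816),
    ("Query 5", 4_080, 70, 58),
    ("Query 6", 3_180, 6, 530),
    ("Query 7", 3_120, 4, 780),
    ("Query 8", 2_640, 2, 1_320),
    ("Query 9", 2_520, 193, 13),
]


def speedup(db2_seconds: float, idaa_seconds: float) -> int:
    """Per-query acceleration: DB2-only time over DB2-with-IDAA time, rounded."""
    return round(db2_seconds / idaa_seconds)


def aggregate_speedup(results: List[Tuple[str, int, int, int]]) -> float:
    """Total DB2-only time over total DB2-with-IDAA time across all queries."""
    total_db2 = sum(r[1] for r in results)
    total_idaa = sum(r[2] for r in results)
    return total_db2 / total_idaa
```

Across the whole set, the queries take 41,940 seconds in DB2 alone versus 296 seconds with IDAA, a ratio of about 142. The aggregate sits far below most per-query figures because Queries 5 and 9 account for 263 of the 296 accelerated seconds.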

Curt Cotner

Session: Trends and Directions for Database Technology