2016-10 open source databases - lawrence - v2.1.pdf
TRANSCRIPT
IBM North America
© 2016 IBM Corporation
Linux on Power - Open Source Databases
Kevin Lawrence, IBM - NA Power Systems - Server Solutions Ecosystem
Open Source Databases
Linux on Power - Open Source Databases
• “By 2018, more than 70% of new in-house
applications will be developed on an OSDBMS,
and 50% of existing commercial RDBMS
instances will have been converted or will be in
process”*
*Gartner - The State of Open Source RDBMSs, 2015, by Donald Feinberg and Merv Adrian, published April 21, 2015.
Database Ecosystem
• Many database choices spanning commercial to open source products and relational and non-relational models; no single ‘winner takes all’.
• Relational DB strengths: transactional integrity and a large ecosystem around SQL.
• NoSQL DBs are much lower cost and give clients a simple data model with dynamic control over storing and retrieving primarily unstructured data types.
• The four primary flavors of NoSQL DBs are all available on POWER8:
  • Key/Value Store (example is Redis)
  • Document Store (example is MongoDB)
  • Columnar Store (example is Cassandra)
  • Graph Store (example is Neo4j)
Types of Databases
• Relational database management systems (RDBMS) support the relational
(table-oriented) data model. The schema of a table (relation schema) is defined
by the table name and a fixed number of attributes with fixed data types. A record
(entity) corresponds to a row in the table and consists of the values of each
attribute. (An open source example is Postgres/EnterpriseDB.)
• Document Databases (eg – MongoDB) store data in Documents. Documents
contain one or more Fields. Data can be queried based on any combination of
fields in a document. The appeal of these systems is that they are very general
purpose, have large application ecosystems and map very nicely to many of
today’s object-oriented programming styles.
• Key Value Store Databases (eg – Redis) are the most basic type of non-relational
DBs. They store a Key and its associated Values.
• Wide Column Stores (eg – Cassandra) let the number of Columns that are
stored vary from row to row. The appeal of these systems is their very high
performance and scalability.
• Graph Databases (eg – Neo4j) focus on storing relationships between data
items and can be queried to discover both simple and more complex
relationships in the data.
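The relational model described in the first bullet can be sketched with Python's built-in sqlite3 module, used here only as a convenient stand-in for a relational database such as Postgres. The table name `customer`, its columns, and the sample rows are hypothetical and chosen purely for illustration.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for an RDBMS.
conn = sqlite3.connect(":memory:")

# The relation schema: a table name plus a fixed set of typed attributes.
# Table and column names are hypothetical.
conn.execute(
    "CREATE TABLE customer ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT NOT NULL)"
)

# Each record (entity) is one row holding a value for every attribute.
conn.executemany(
    "INSERT INTO customer (id, name, city) VALUES (?, ?, ?)",
    [(1, "Ada", "Austin"), (2, "Grace", "Boston"), (3, "Linus", "Austin")],
)

# A structured SQL query over the fixed schema.
austin_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM customer WHERE city = ? ORDER BY id", ("Austin",)
    )
]
print(austin_names)
```

Because every row must supply the same attributes, the database can enforce types and constraints up front, which is the trade-off against the more flexible NoSQL models listed above.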
Types of Databases with Open Source Examples
• Wide column store - Example: Cassandra
• Graph - Example: Neo4j
• Document store - Example: MongoDB
• Key-value store - Example: Redis
• Relational - Example: EnterpriseDB
Common Linux on Power OSDBs
• MongoDB
  Classification: NoSQL - Document Store
  Optimized for: Document model, document stores, semi-structured or unstructured data
  Common use cases: Single view of customer records, enterprise content management, catalogs, personalization
• Redis
  Classification: NoSQL - In-memory Key Value Store
  Optimized for: Data queues, strings, lists, counts, caching, statistics, text, session IDs, pictures, videos
  Common use cases: Live in-memory cache, data queues, user session data, shopping cart data
• Cassandra
  Classification: NoSQL - Wide Column Store
  Optimized for: NoSQL environments that need very high performance and scalability, very high data volumes
  Common use cases: Messaging, fraud detection, Internet of Things data (sensor data, log data, telco call detail records)
• Neo4j
  Classification: NoSQL - Graph Store
  Optimized for: Data stored as edges, nodes, or attributes (graphs)
  Common use cases: Fraud detection, social network analysis, location-aware apps, master data management, machine learning
• Postgres (EnterpriseDB)
  Classification: Open source object-relational database
  Optimized for: Wide variety of transactional work at lower TCO, from relational/structured queries to object store and retrieval
  Common use cases: Oracle RDBMS migrations and take-outs
• MariaDB
  Classification: Open source relational database
  Optimized for: Lower cost transactional SQL-based queries and updates
  Common use cases: Migrations from Oracle MySQL, Turbo LAMP stack
Redis
• Main points: Stores simple values or data structures by key. Blazing fast.
• Exploits POWER8: Redis Labs on Power utilizes IBM POWER8
servers, the IBM FlashSystem, the IBM CAPI-Flash card and the
Redis Labs Enterprise Cluster (RLEC) for Flash software.
• Other features: Master-slave replication, automatic failover
• Best used: For rapidly changing data with a foreseeable database
size (should fit mostly in memory).
• For example: To store real-time stock prices. Real-time analytics.
Leaderboards. Real-time communication. And wherever you used
memcached before.
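The leaderboard use case can be sketched as follows. A plain in-memory Python dict stands in for the key-value store, so this is not the Redis API; the helper names `set_score` and `top_n` and the key `game:1` are hypothetical.

```python
# A plain dict stands in for the key-value store; this is NOT the Redis API.
store = {}


def set_score(board, player, score):
    """Store a player's score under a board key (hypothetical helper)."""
    store.setdefault(board, {})[player] = score


def top_n(board, n):
    """Return the n highest-scoring (player, score) pairs for a board."""
    scores = store.get(board, {})
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]


set_score("game:1", "alice", 120)
set_score("game:1", "bob", 95)
set_score("game:1", "carol", 150)
set_score("game:1", "bob", 130)  # rapidly changing data: overwritten in place

leaders = top_n("game:1", 2)
print(leaders)
```

Everything lives in memory and is addressed by key, which is why this style suits rapidly changing data of a foreseeable size.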
MongoDB
• Main point: Retains some friendly properties of SQL (queries, indexes).
• Exploits POWER8 features: Performance. Testing of MongoDB with CAPI Flash
on POWER8 is just starting.
• Other features: Master/slave replication (auto failover with replica
sets), sharding, integrated text search, geospatial indexing
• Data center aware
• Best used: If you need dynamic queries. If you prefer to define
indexes, not map/reduce functions. If you need good performance on
a big DB. If you wanted CouchDB, but your data changes too much,
filling up disks.
• For example: Most popular NoSQL Document DB.
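To illustrate what "dynamic queries" over documents means, here is a minimal sketch in plain Python. Documents are modeled as dicts and the `find` helper is hypothetical; it only mimics the idea of filtering on any combination of fields and is not the MongoDB driver API.

```python
# Documents are modeled as plain dicts; fields can differ between documents.
products = [
    {"sku": "A1", "type": "book", "price": 12, "author": "Knuth"},
    {"sku": "B2", "type": "book", "price": 30},
    {"sku": "C3", "type": "lamp", "price": 30, "color": "red"},
]


def find(documents, **criteria):
    """Return documents whose fields equal every given criterion.

    Hypothetical helper: a document missing a requested field never matches.
    """
    return [
        doc
        for doc in documents
        if all(
            field in doc and doc[field] == value
            for field, value in criteria.items()
        )
    ]


books_at_30 = find(products, type="book", price=30)
red_items = find(products, color="red")
print([d["sku"] for d in books_at_30], [d["sku"] for d in red_items])
```

The query criteria are chosen at call time rather than fixed in advance, which is the property the slide contrasts with predefined map/reduce functions.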
Cassandra
• Main point: Stores huge datasets and retrieves them in "almost" SQL (CQL3)
• Exploits Power 8 features : Apache
• Other features: CQL3 is the official interface and is very similar to SQL, but with some limitations that come from the scalability
(most notably: no JOINs, no aggregate functions).
• Querying by key, or key range (secondary indices are also available).
• Highly scalable and highly available with no single point of failure
• NoSQL column family implementation
• Very high write throughput and good read throughput. Writes can be much faster than reads (when reads are disk-bound)
• SQL-like query language (since 0.8) and support for search through secondary indexes
• Tunable consistency and support for replication
• Flexible schema
• Map/reduce possible with Apache Hadoop
• Very good and reliable cross-datacenter replication
• Best used: When you need to store data so huge that it doesn't fit on a single server, but still want a friendly, familiar interface to it.
• For example: Web analytics, to count hits by hour, by browser, by IP, etc. Transaction logging. Data collection from huge
sensor arrays.
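The web analytics example (counting hits by hour and by browser, then reading back by key or key range) can be sketched in plain Python. A `Counter` keyed by `(hour, browser)` stands in for rows addressed by a key; this is not CQL or any Cassandra driver API, and the helper names and sample data are hypothetical.

```python
from collections import Counter

# Counts keyed by (hour, browser); a stand-in for rows addressed by a key.
hits = Counter()


def record_hit(hour, browser):
    """Write path: increment the counter for one (hour, browser) key."""
    hits[(hour, browser)] += 1


for hour, browser in [
    ("2016-10-01T09", "firefox"),
    ("2016-10-01T09", "chrome"),
    ("2016-10-01T09", "chrome"),
    ("2016-10-01T10", "chrome"),
    ("2016-10-01T11", "safari"),
]:
    record_hit(hour, browser)


def hits_for_hour(hour):
    """Query by key: total hits recorded in one hour."""
    return sum(count for (h, _), count in hits.items() if h == hour)


def hits_in_range(first_hour, last_hour):
    """Query by key range: total hits with first_hour <= hour <= last_hour."""
    return sum(
        count for (h, _), count in hits.items() if first_hour <= h <= last_hour
    )


print(hits_for_hour("2016-10-01T09"))
print(hits_in_range("2016-10-01T10", "2016-10-01T11"))
```

Writes here are simple increments on a known key, and reads are restricted to a key or a contiguous key range, which mirrors the access pattern the slide describes.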
Neo4j
• Main point: NoSQL Graph database optimized for connected data
• Exploits POWER8 features: Neo4j on POWER8 offers 56 TB of extended memory, drastically increasing the size at which real-time graph queries are possible. Real-time graph processing with Neo4j on POWER8 supports both standard operational requirements and analytic insights that normally require offline processing. IBM POWER8 hardware allows Neo4j to scale both up and out for graphs of greater size than ever before.
• Other features: HTTP/REST (or embedding in Java)
• Full ACID (Atomicity, Consistency, Isolation, Durability) conformity (including durable data)
• Integrated pattern-matching-based query language ("Cypher")
• Indexing of keys, nodes and relationships
• Advanced path-finding with multiple algorithms
• Optimized for reads
• Has transactions (in the Java API)
• Clustering, replication, caching, online backup, advanced monitoring and High Availability are commercially licensed
• Best used: For graph-style, rich or complex, interconnected data.
• For example: For searching routes in social relations, public transport links, road maps, or network topologies.
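The route-searching example can be sketched with a breadth-first search over an adjacency list. This is plain Python rather than Cypher or the Neo4j API, and it does not claim to be one of Neo4j's built-in path-finding algorithms; the station names and the `shortest_route` helper are hypothetical.

```python
from collections import deque

# Hypothetical transport links, stored as an adjacency list (node -> neighbors).
links = {
    "Airport": ["Central", "Harbor"],
    "Central": ["Airport", "Museum", "Stadium"],
    "Harbor": ["Airport", "Stadium"],
    "Museum": ["Central"],
    "Stadium": ["Central", "Harbor", "University"],
    "University": ["Stadium"],
}


def shortest_route(graph, start, goal):
    """Breadth-first search: return a route with the fewest hops, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


route = shortest_route(links, "Airport", "University")
print(route)
```

The query follows relationships from node to node instead of joining tables, which is the access pattern a graph database is built to make efficient.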
EnterpriseDB (Postgres)
• Main Point: Enterprise class, Open Source, Relational Database
• Easily integrates with or supplants Oracle Database: many applications written for Oracle run on Postgres
Advanced Server without modification, and Oracle-skilled developers can use it with minimal re-training.
• Performance: EDB running on POWER8 brings a cost-effective, enterprise-class solution to CIOs and IT managers
running little-endian Red Hat Enterprise Linux 7.x on POWER8. EDB Postgres Advanced Server on POWER8
offers 2x higher performance over Intel-based systems for OLTP applications, high-performance multi-threading,
more cache and greater data bandwidth.
• Scalability: Reliably handles multi-terabyte data sets supporting millions of users with guaranteed transactional
integrity and continuous availability.
• TCO: Reduces operating costs by requiring fewer systems at a lower acquisition cost.
• DBMS Convergence: Supports traditional structured, semi-structured, and unstructured data types, reducing the
need to deploy costly, one-off NoSQL data silos and supporting adoption of Postgres and migration of workloads
from proprietary databases.
• Services: Brings together two industry leaders committed to Open Source offerings. The EDB Postgres Management,
Integration, and Migration Suites support replication, HA, database monitoring/management and data integration for
mission-critical enterprise applications.
Modernize your Database with POWER8 and EnterpriseDB
• 79% 3-year TCO reduction
• 30% fewer servers
• 84% reduction in SW licensing cost with fewer cores and EnterpriseDB
• 29% reduction in HW costs and maintenance
• 68% reduction in core count
[Chart: Solution TCO for 3 years, split into Environmentals, HW and SW, comparing S822LC/20c/2.926 with EnterpriseDB against HP DL380p/Brwell (2s) with OracleEE]
• Assumptions: 7 Power S822LC servers (65% utilization) deliver performance equivalent to 10 x86 servers (40% utilization)
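The server-count figure follows directly from the stated assumption of 7 Power servers replacing 10 x86 servers. A minimal sketch of that arithmetic is below; the helper name `percent_reduction` is hypothetical, and the other percentages on the slide depend on pricing and core-count inputs that the slide does not list, so they are not reproduced here.

```python
def percent_reduction(baseline, new):
    """Percentage by which `new` is smaller than `baseline`."""
    return 100 * (baseline - new) / baseline


# Server counts taken from the slide's stated assumption.
x86_servers = 10
power_servers = 7

server_reduction = percent_reduction(x86_servers, power_servers)
print(f"{server_reduction:.0f}% fewer servers")
```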
Modernize your Database with POWER8/PowerKVM and MongoDB vs x86/VMware and Oracle EE
• 85% 3-year TCO reduction
• 90% reduction in SW licensing cost with fewer cores and MongoDB
• 23% reduction in HW costs and maintenance
• 45% reduction in core count
• Assumptions:
  • 7 x Power S822LC/20c servers with PowerKVM (40% utilization) deliver performance equivalent to 10 x HP DL380/E5-2699 v4/44c servers with VMware (40% utilization)
  • Performance is based on SPECint_rate
[Chart: Solution TCO for 3 years, split into Environmentals, HW and SW, comparing S822LC/20c/2.926 with MongoDB against HP DL380/BWL/44c/2.2 with OracleEE]
Hortonworks Announcement
• Announced at IBM Edge:
Hortonworks HDP is coming to Power!
• What is Hortonworks’ HDP?
It is an Enterprise-ready open source
Apache™ Hadoop® distribution based
on a centralized architecture (YARN).
• HDP addresses the complete needs of
data-at-rest, powers real-time customer
applications and delivers robust
analytics that accelerate decision
making and innovation
Trademarks and notes
IBM Corporation 2016
• IBM, the IBM logo and ibm.com are registered trademarks, and other company, product, or service names may be trademarks or service marks of International Business Machines Corporation in the United States, other countries, or both. A current list of IBM trademarks is available on the web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml.
• Other company, product, and service names may be trademarks or service marks of others.
• References in this publication to IBM products or services do not imply that IBM intends to make them available in all countries in which IBM operates.
• IBM and IBM Credit LLC do not, nor intend to, offer or provide accounting, tax or legal advice to clients. Clients should consult with their own financial, tax and legal advisors. Any tax or accounting treatment decisions made by or on behalf of the client are the sole responsibility of the customer.
• IBM Global Financing offerings are provided through IBM Credit LLC in the United States, IBM Canada Ltd. in Canada, and other IBM subsidiaries and divisions worldwide to qualified commercial and government clients. Rates and availability are based on a client’s credit rating, financing terms, offering type, equipment type and options, and may vary by country. Some offerings are not available in certain countries. Other restrictions may apply. Rates and offerings are subject to change, extension or withdrawal without notice.
Welcome to the Waitless World.