Cassandra, Couchbase and Spring Data in the Enterprise

© 2014 SpringOne 2GX. All rights reserved. Do not distribute without permission. Spring Data Cassandra Matthew Adams Senior Consultant, SciSpike, LLC

Upload: spring-io

Post on 14-Jun-2015


DESCRIPTION

Speakers: Matthew Adams, SciSpike; Michael Nitschinger, Couchbase
Track: Data / Integration

Spring Data Cassandra brings Cassandra support to the Spring Data umbrella of projects, offering Spring Data's familiar Repository concepts & POJO persistence. This talk will focus first on POJO persistence over Cassandra, including automatic Cassandra schema generation and Spring context configuration using both XML & Java. Then, the talk will dig deeper into some of the lower-level features that Spring Data Cassandra is built upon (AKA "Spring CQL"), which make plain, old Cassandra development simpler & easier.

Couchbase Server is well known as one of the leaders in the NoSQL space, heavily used in enterprises and startups alike where low latency matters even at hundreds of thousands of operations per second. This talk will not only give a short introduction to the benefits of adopting Couchbase, but also show how to integrate it into your Java Enterprise landscape through its support for Spring Data. You will learn how to fully control your database schema from the application through entities and repositories, how to deploy new application versions or scale out your cluster without a single second of downtime, and how to integrate easily with Elasticsearch.

TRANSCRIPT


Spring Data Cassandra

Matthew Adams Senior Consultant, SciSpike, LLC

Contents

• Cassandra Overview

• Spring Data Cassandra

• Spring CQL


Cassandra Overview


Cassandra: History

• Originated from Facebook, now at Apache

• Supported by DataStax

• Born from Amazon Dynamo & Google BigTable

• Distributed
  • All nodes are peers, no master/slave: ✓
  • Always available: ✓
  • Partition/fault tolerant: ✓
  • Consistent: tunable (any, one, two, three, quorum, all, etc.)
• Provided language bindings/drivers:
  • Java, C#, Python, Node.js, etc.
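The "tunable" consistency levels can be illustrated with a little arithmetic. The sketch below assumes the usual Cassandra rules: a quorum is a majority of the replicas (floor(RF / 2) + 1), and a read is guaranteed to overlap the most recent write when the number of replicas read plus the number of replicas written exceeds the replication factor. The class and method names are illustrative and are not part of any driver.

```java
// Toy arithmetic for tunable consistency; names are hypothetical.
final class ConsistencySketch {

    // A quorum is a strict majority of the replicas.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Reads overlap the latest write when R + W > RF.
    static boolean readsSeeLatestWrite(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println("Quorum for RF=" + rf + ": " + q);
        System.out.println("QUORUM reads + QUORUM writes overlap: " + readsSeeLatestWrite(q, q, rf));
        System.out.println("ONE reads + ONE writes overlap: " + readsSeeLatestWrite(1, 1, rf));
    }
}
```

With a replication factor of 3, quorum reads and writes always overlap, while reading and writing at level ONE trades that guarantee for lower latency.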


Cassandra: Strengths & Weaknesses

• Strengths
  • Storing lots of data
  • Very fast writes
  • Fast reads
  • Time-series data
  • Fault tolerant
  • Automated replication
• Weaknesses
  • Limited data model
• Neither/Both
  • No ad-hoc querying


Cassandra: Concepts

• Keyspace
  • Identified by name
  • Contains tables (AKA "column families")
  • Spans nodes in racks in data centers
• Table
  • Identified by name
  • Has rows
• Row
  • Contains columns (up to 2 billion!)
  • Can have a different number of columns
• Column
  • Identified by name
  • Has data type


Cassandra: Primary Key

• Required for each table

• Uniquely identifies row

• Node

• Cluster

• Partition Key

• Determines node

• Has one or more columns

• Cluster Key

• Determines disk location

• Has zero or more columns

[Diagram: Primary Key = Partition Key + Cluster Key]
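The two roles of the primary key can be sketched with a toy in-memory model. This is only an illustration under simplifying assumptions: the node is picked with String.hashCode() rather than Cassandra's real partitioner, and a sorted map stands in for the on-disk ordering by cluster key. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model: partition key -> node, cluster key -> order within the partition.
final class PrimaryKeySketch {

    // One map per node: partition key -> rows sorted by cluster key.
    private final List<Map<String, TreeMap<Long, String>>> nodes = new ArrayList<>();

    PrimaryKeySketch(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
    }

    // The partition key alone determines the node (simplified hash, an assumption).
    int nodeFor(String partitionKey) {
        return Math.floorMod(partitionKey.hashCode(), nodes.size());
    }

    // Inserting an existing (partition key, cluster key) pair overwrites the row.
    void insert(String partitionKey, long clusterKey, String data) {
        nodes.get(nodeFor(partitionKey))
             .computeIfAbsent(partitionKey, k -> new TreeMap<>())
             .put(clusterKey, data);
    }

    // Rows of one partition come back ordered by cluster key.
    List<String> readPartition(String partitionKey) {
        TreeMap<Long, String> rows = nodes.get(nodeFor(partitionKey)).get(partitionKey);
        return rows == null ? new ArrayList<>() : new ArrayList<>(rows.values());
    }
}
```

All rows that share a partition key land on the same node, which is why range queries over the cluster key within one partition are cheap.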

Cassandra: Cassandra Query Language (CQL)

• Similar to SQL

• Data definition language (DDL)
  • CREATE: KEYSPACE, TABLE, INDEX, …
  • ALTER: KEYSPACE, TABLE, …
  • DROP: KEYSPACE, TABLE, INDEX, …
• Data manipulation language (DML)
  • INSERT INTO table (column1, …) VALUES (value1, …) …
  • UPDATE table SET column1 = value1, … WHERE …
  • DELETE FROM table … WHERE … // deletes entire row
  • DELETE column1, … FROM table … WHERE … // only deletes columns
  • TRUNCATE table // deletes all rows from table


Cassandra: Querying

• SELECT column1, … FROM table
    WHERE keyedColumn1 op criterion1 [AND …]
    [ORDER BY clusteredColumn1, …]
    [LIMIT n]
    [ALLOW FILTERING]

• Can use * for all columns

• op can be =, <, >, <=, >=, IN

• Can also SELECT COUNT(*), certain functions, DISTINCT, etc.


Cassandra: Query Limitations & Considerations

• No ad hoc querying by design!

• Expected queries drive the schema

• Implies a denormalized schema

• Query criteria must hit keys and possibly indexes

• Otherwise rejected by default

• To prevent query rejection, use ALLOW FILTERING


Spring Data Cassandra


Spring Data Cassandra: Overview

• Enables plain old Java objects (POJOs) to be mapped to Cassandra tables

• Familiar Spring Data repository pattern

• Declare Repository interface in terms of your POJOs

• Spring Data Cassandra provides basic implementation

• Supports Spring XML & Java Config


Spring Data Cassandra: Basic Support

• Basic CRUD operations out of the box

• Save is an upsert (just like Cassandra's INSERT & UPDATE)

• Delete & find is by primary key(s)

• Primary key classes supported, but unnecessary

• Spring Data expects user-defined primary key class for compound ids

• Compound ids are the norm in C*

• SDC* provides map-based id class to ease pain

• Use BasicMapId's static & builder methods

• Not type-safe, but convenient

• DATACASS-164: Support strongly-typed, user-defined primary key


repo.findOne(id().with("sensorId", id).with("time", time));
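To make the "save is an upsert" and "map-based id" points concrete, here is a minimal sketch that uses no Spring classes at all. MapId below is a hypothetical stand-in for the idea behind BasicMapId (column names mapped to values), and the HashMap-backed store is an assumption made purely for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy store keyed by a map-based compound id; not the Spring Data Cassandra API.
final class MapIdSketch {

    // Hypothetical map-based id: convenient, but names and value types are unchecked.
    static final class MapId extends LinkedHashMap<String, Object> {
        private static final long serialVersionUID = 1L;

        MapId with(String name, Object value) {
            put(name, value);
            return this;
        }
    }

    static MapId id() {
        return new MapId();
    }

    private final Map<Map<String, Object>, String> rows = new HashMap<>();

    // Upsert: inserts the row or silently overwrites an existing row with the same key.
    void save(MapId id, String data) {
        rows.put(id, data);
    }

    String findOne(MapId id) {
        return rows.get(id);
    }

    int count() {
        return rows.size();
    }
}
```

Because the id is just a map, a misspelled column name or a value of the wrong type is only detected at runtime, which is the trade-off the slide calls "not type-safe, but convenient".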

Spring Data Cassandra: Entity & Repository

@Table // identifies class as persistent
public class SensorReading {

    @PrimaryKeyColumn(ordinal = 0, type = PARTITIONED) // identifies partition key column
    private String sensorId;

    @PrimaryKeyColumn(ordinal = 0) // identifies cluster key column
    private Date timestamp;

    private String data;
}

// only need to identify entity
public interface SensorReadingRepository
        extends CassandraRepository<SensorReading> {}

Spring Data Cassandra: Query Support

• No Spring Data-style dynamic querying with findBy* methods!

• Mirrors Cassandra's philosophy of no ad hoc queries

• Custom queries must be supplied

• @Query annotations on repository methods

• Customizable properties file(s)

o Default is "classpath*:META-INF/cassandra-named-queries.properties"

• DATACASS-109: Enhance XML schema to allow for named queries

• Custom queries should use placeholders (currently zero-based numeric)

o DATACASS-117: Add support for named query placeholders
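The zero-based numeric placeholders (?0, ?1, …) simply refer to the method arguments by position. The sketch below shows that substitution rule with plain string handling. It is not how Spring Data Cassandra binds values internally, and production code should rely on the driver's bound statements instead of building CQL strings; the class name and the quoting rule are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrates positional ?N placeholders only; not the library's implementation.
final class PlaceholderSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\?(\\d+)");

    static String bind(String cql, Object... args) {
        Matcher m = PLACEHOLDER.matcher(cql);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int index = Integer.parseInt(m.group(1)); // zero-based argument position
            if (index >= args.length) {
                throw new IllegalArgumentException("No argument for placeholder ?" + index);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(render(args[index])));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Simplified literal rendering: numbers as-is, everything else as a quoted string.
    private static String render(Object value) {
        if (value instanceof Number) {
            return value.toString();
        }
        return "'" + value.toString().replace("'", "''") + "'";
    }

    public static void main(String[] args) {
        System.out.println(bind(
            "SELECT * FROM sensorreading WHERE sensorid = ?0 AND timestamp >= ?1 AND timestamp < ?2",
            "s1", 100L, 200L));
    }
}
```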


Spring Data Cassandra: Custom @Query

public interface SensorReadingRepository
        extends CassandraRepository<SensorReading> {

    @Query("SELECT * FROM sensorreading " +
            "WHERE sensorid = ?0 AND timestamp >= ?1 " +
            "AND timestamp < ?2")
    List<SensorReading> findSensorReadingsInDateRange(String sensorId,
            Date beginInclusive, Date endExclusive);
}

Spring Data Cassandra: Custom Query in Properties File

# in META-INF/cassandra-named-queries.properties or other
SensorReading.findSensorReadingsInDateRange=\
    SELECT * FROM sensorreading \
    WHERE sensorid = ?0 AND timestamp >= ?1 AND timestamp < ?2

Spring Data Cassandra: Custom XML Query (DATACASS-109)

<cass:entity
    class="com.springone2gx.sdc.demo.domain.SensorReading">
  <cass:query name="findSensorReadingsInDateRange"
      value="SELECT * FROM sensorreading WHERE sensorid = ?0 AND timestamp >= ?1 AND timestamp &lt; ?2" />
  <!-- or -->
  <cass:query name="findSensorReadingsInDateRange"><![CDATA[
    SELECT * FROM sensorreading
    WHERE sensorid = ?0 AND timestamp >= ?1 AND timestamp < ?2
  ]]></cass:query>
</cass:entity>

Order of precedence will probably be properties, XML, then annotations

Spring Data Cassandra: Entity Mapping

• XML overrides annotations

• @Table: corresponds to class

• Table name (optional, default lower cased simple class name)

• Whether to preserve case ("force quote", optional, default false)


Spring Data Cassandra: Property (Field) Mapping

• XML overrides annotations

• @Column: corresponds to field

• Column name (optional, default lower cased field name)

• Whether to preserve case ("force quote", optional, default false)

• @PrimaryKeyColumn: same as @Column plus…

• Order of column in table with respect to other columns (required, ordinal)

• Whether it’s a clustered or partition column (optional, default clustered)

• Ordering of column, ascending or descending (optional, default ascending)
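The default naming rules described on the two mapping slides (lower-cased simple class name for the table, lower-cased field name for the column) can be sketched with plain reflection. This only mirrors the stated defaults; it is not the library's mapping code, and the helper names are hypothetical.

```java
import java.util.Date;
import java.util.Locale;

// Mirrors the default naming rules from the slides; helper names are illustrative.
final class NamingSketch {

    // Same shape as the entity used in the talk, without any annotations.
    static final class SensorReading {
        String sensorId;
        Date timestamp;
        String data;
    }

    // Default table name: lower-cased simple class name.
    static String defaultTableName(Class<?> type) {
        return type.getSimpleName().toLowerCase(Locale.ROOT);
    }

    // Default column name: lower-cased field name.
    static String defaultColumnName(Class<?> type, String fieldName) {
        try {
            return type.getDeclaredField(fieldName).getName().toLowerCase(Locale.ROOT);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("Unknown field: " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(defaultTableName(SensorReading.class));
        System.out.println(defaultColumnName(SensorReading.class, "sensorId"));
    }
}
```

This is why the custom queries in the talk refer to the table as sensorreading and to the column as sensorid.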


Spring Data Cassandra Demo https://github.com/SciSpike/springone2gx


Spring CQL


Spring CQL: Overview

• Spring CQL is to Cassandra what Spring JDBC is to SQL DBs

• A collection of convenient classes to help you interact directly with Cassandra via CQL and DataStax's Java driver

• Maven artifact:

• spring-cql (not spring-data-cassandra)

• Namespace:

• org.springframework.cassandra (not org.springframework.data.cassandra)


Spring CQL v. Spring Data Cassandra

• Spring Data Cassandra is for…

• …mapping POJOs to tables

• …producing repositories easily

• …dynamically creating tables (during dev & testing, at least)

• Spring CQL is for…

• …interacting directly with Cassandra via CQL or Java driver

• Spring Data Cassandra is built on top of Spring CQL

• CassandraTemplate extends CqlTemplate

• AbstractCassandraConfiguration extends AbstractClusterConfiguration


Spring CQL: Highlights

• CqlTemplate

• Fluent API for creation/alteration of keyspaces, tables, indexes, columns

• Java & XML configuration support

• Spring CQL XML schema is similar to but differs from SDC* XML schema!

• Spring DataAccessException translation


Spring CQL: CqlTemplate

• Primary artifact is CqlTemplate

• Like JdbcTemplate: takes care of boilerplate code for you

• All it needs is a Session; can be used standalone without Spring context

• Several different types of methods

• READ: query*(..)

• INSERT/UPDATE: execute*(..)

• DELETE: truncate(..)

• Bulk insert: ingest(..)

• Etc: count(..), describeRing(..)


Spring CQL: Reading with CqlTemplate

• ResultSet query(String cql)

• Executes CQL then returns ResultSet

• void query(String cql, RowCallbackHandler handler)

• Executes CQL then calls handler.processRow(row) for each row

• List<T> query(String cql, RowMapper<T> mapper)

• Executes CQL, stores each result of mapper.mapRow(row, index) in a List, then returns the List: "poor man's object mapper"

• T query(String cql, ResultSetExtractor<T> extractor)

• Executes CQL then calls extractor.extractData(resultSet)
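The row-mapper style of reading can be shown without a cluster. In this sketch a list of maps stands in for a ResultSet, and SketchRowMapper is a hypothetical interface that only imitates the shape of mapRow(row, index); neither is the Spring CQL API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Imitates the template/callback pattern; types here are stand-ins, not Spring CQL classes.
final class RowMapperSketch {

    // Hypothetical callback with the same shape as mapRow(row, index).
    interface SketchRowMapper<T> {
        T mapRow(Map<String, Object> row, int rowNum);
    }

    // The "template" owns the iteration; the caller only supplies the per-row mapping.
    static <T> List<T> query(List<Map<String, Object>> resultSet, SketchRowMapper<T> mapper) {
        List<T> results = new ArrayList<>();
        int rowNum = 0;
        for (Map<String, Object> row : resultSet) {
            results.add(mapper.mapRow(row, rowNum++));
        }
        return results;
    }

    // Convenience for building a fake row.
    static Map<String, Object> row(String sensorId, String data) {
        Map<String, Object> row = new HashMap<>();
        row.put("sensorid", sensorId);
        row.put("data", data);
        return row;
    }
}
```

A caller would pass a lambda such as (row, i) -> row.get("sensorid") and receive one mapped object per row, which is the sense in which the slide calls this a "poor man's object mapper".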


Spring CQL: More Reading with CqlTemplate

• T queryForObject(String cql, Class<T> requiredType)

• Converts first column of first row to given type & returns

• requiredType is typically String, Integer, Long, etc.

• List<T> queryForList(String cql, Class<T> elementType)

• Converts first column of each row in results to given type & returns

• List<Map<String, Object>> queryForListOfMap(String cql)

• Returns a list of rows


Spring CQL: Reading Asynchronously with CqlTemplate

• ResultSetFuture queryAsynchronously(String cql)

• Executes CQL then returns ResultSetFuture

• void queryAsynchronously(String cql, Runnable listener)

• Like Java driver: listener doesn't receive anything except an invocation

• void queryAsynchronously(String cql, AsynchronousQueryListener listener)

• Better because AsynchronousQueryListener receives ResultSetFuture in listener.onQueryComplete(future)

• There are many overloads of these query*(..) methods


Spring CQL: CqlTemplate Non-Read Queries

• void execute(String cql)

• void execute(Insert insert)

• void execute(Update update)

• void execute(Truncate truncate)

• void execute(Batch batch)

• T execute(SessionCallback<T> callback)

• Returns result of callback.doInSession(session)

• There are also many executeAsynchronously(..) methods


Spring CQL: Bulk Inserts

• void ingest(String cql, List<List<?>> rows)

• void ingest(String cql, Object[][] rows)

• These methods take an INSERT or UPDATE statement, create and cache a PreparedStatement, then execute the PreparedStatement for each row asynchronously
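The "prepare once, cache, execute per row asynchronously" flow can be sketched with CompletableFuture. Everything here is a stand-in: a string plays the role of the PreparedStatement, a list records the executions, and the sketch waits for all rows at the end so the example is deterministic (the slides do not say whether the real method waits).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of a bulk ingest; no Cassandra driver involved.
final class IngestSketch {

    final AtomicInteger prepareCount = new AtomicInteger();
    final List<String> executed = Collections.synchronizedList(new ArrayList<String>());
    private final Map<String, String> preparedCache = new ConcurrentHashMap<>();

    // Prepares a statement the first time a CQL string is seen, then reuses it.
    private String prepare(String cql) {
        return preparedCache.computeIfAbsent(cql, c -> {
            prepareCount.incrementAndGet();
            return "prepared[" + c + "]";
        });
    }

    void ingest(String cql, Object[][] rows) {
        String prepared = prepare(cql);
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (Object[] row : rows) {
            // Each row is bound and executed on another thread.
            futures.add(CompletableFuture.runAsync(
                () -> executed.add(prepared + " " + Arrays.toString(row))));
        }
        // Assumption for the sketch: wait until every row has been executed.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
    }
}
```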


Spring CQL: Fluent API

• Use import static for maximum fluency

template.execute(/* new CreateTableCqlGenerator( */
    createTable("foo")
        .with(COMPACT_STORAGE)
        .with(COMMENT, "my comment").ifNotExists()
        .partitionKeyColumn("id", timeuuid())
        .partitionKeyColumn("thingy", inet())
        .clusteredKeyColumn("category", text(), DESCENDING)
        .column("data", text())
/* ).toCql() */);

CREATE TABLE IF NOT EXISTS foo (id timeuuid, thingy inet, category text, data text,
    PRIMARY KEY ((id, thingy), category))
    WITH CLUSTERING ORDER BY (category DESC) AND COMPACT STORAGE AND comment = 'my comment';

Spring CQL Demo https://github.com/SciSpike/springone2gx


Spring Data Cassandra: Summary

• Spring Data Cassandra…

• …enables POJO persistence over Cassandra datastores

• …provides easy Repository pattern support

• Spring CQL…

• ...is helpful with raw CQL usage


Spring Data Cassandra: Coordinates

• Professional Consulting & Support

• http://www.scispike.com


• Project Home

• http://projects.spring.io/spring-data-cassandra

• Source Repository

• https://github.com/spring-projects/spring-data-cassandra

• Issue Tracker

• https://jira.spring.io/browse/DATACASS

• Community Support

• http://stackoverflow.com/questions/tagged/spring-data-cassandra

• Contributions Welcome!

• Source & documentation pull requests

• Issues

• Community assistance on StackOverflow


Couchbase and Spring Data in the Enterprise

Michael Nitschinger, Software Engineer at Couchbase @daschl

Couchbase is a document-oriented database, designed for performance, scalability and continuous uptime.

Enterprise & Internet Company Customers


“What benefits does the adoption of Couchbase bring to my company?”

Couchbase 101: Architecture

Core Principles

• Easy Scalability: Grow and shrink clusters easily and with no downtime

• Consistent High Performance: Sub-millisecond latency and consistent throughput, even under high load

• Always On 24x365: No downtime for software upgrades, hardware maintenance, etc.

• Flexible Data Model: Schema is dictated by the application, not the database.

“Bold claims! How does it work?”

Evolution from memcached

• Key contributors to memcached
• Evolved into Membase
  • Distributed and persisted key-value store
• Evolved into Couchbase
  • Document Store with JSON
  • Map-Reduce Indexing
  • Cross-Data Center Replication

Architecture

[Diagram: architecture of a Couchbase Server node, in two layers]

• RAM Cache, Indexing & Persistence Management (C)
  • Ports: 11210 Memcapable 2.0, 11211 Memcapable 1.0, 8092 Query API
  • Components: Moxi, Query Engine, object-level cache, Couchbase EP Engine, storage interface, new persistence layer, disk persistence
• Server/Cluster Management & Communication (Erlang/OTP)
  • Ports: HTTP 8091, Erlang port mapper 4369, distributed Erlang 21100 - 21199
  • Components: heartbeat, process monitor, global singleton supervisor, configuration manager, rebalance orchestrator, node health monitor, vBucket state and replication manager, HTTP REST management API/Web UI
  • Some components run on each node, others once per cluster


Rebalance

[Diagram: rebalance from one node to two. Before: 8 GB RAM, 3 IO workers, 1024 partitions in total. After the rebalance operation: two nodes with 8 GB RAM, 3 IO workers and 512 partitions each; 16 GB RAM, 6 IO workers, 1024 partitions in total.]

[Diagram: rebalance from two nodes to four. Before: 16 GB RAM, 6 IO workers, 1024 partitions in total (512 partitions per node). After the rebalance operation: four nodes with 8 GB RAM, 3 IO workers and 256 partitions each; 32 GB RAM, 12 IO workers, 1024 partitions in total. Each application server holds a copy of the cluster map (MAP).]
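The partition counts in the rebalance diagrams follow from spreading a fixed set of 1024 partitions evenly over the nodes. The sketch below reproduces only that arithmetic with a round-robin assignment, which is an assumption for illustration; it says nothing about how the real rebalance orchestrator chooses which partitions to move.

```java
// Even spread of a fixed number of partitions over a growing cluster (toy model).
final class RebalanceSketch {

    static final int PARTITIONS = 1024;

    // Returns how many partitions each node owns after an even, round-robin spread.
    static int[] distribute(int nodeCount) {
        int[] perNode = new int[nodeCount];
        for (int partition = 0; partition < PARTITIONS; partition++) {
            perNode[partition % nodeCount]++;
        }
        return perNode;
    }

    public static void main(String[] args) {
        for (int nodes : new int[] { 1, 2, 4 }) {
            System.out.println(nodes + " node(s): " + distribute(nodes)[0] + " partitions per node");
        }
    }
}
```

The total number of partitions never changes; adding nodes only changes how many each node owns, along with the cluster's total RAM and IO workers.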

“How does my application access it?”

Couchbase 101: Data Access

Retrieval Operations

[Diagram: an application server sends get to a Couchbase Server node. Components shown: EP Engine, RAM cache, disk write queue, replication queue, replica Couchbase cluster machine.]

Storage Operations

[Diagram: same components; the application server sends set/add/replace.]

Consistency

[Diagram: same components; the application server sends set/add/replace and get.]

Ejection and Cache Misses

[Diagram: same components; the RAM cache is FULL (90%) after repeated set/add/replace, NRU documents are ejected, and a get requests a non-resident document.]
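The four data-access diagrams can be read as one flow: a write lands in the RAM cache and is queued for disk and for replication, reads are served from RAM, and an ejected (non-resident) document is fetched from disk on the next get. The toy model below sketches that flow under simplifying assumptions: plain maps and queues stand in for the real components, the NRU policy and the 90% threshold are not modeled, and all names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Toy single-node model of the write and read paths; not Couchbase code.
final class NodeSketch {

    final Map<String, String> ramCache = new HashMap<>();
    final Map<String, String> disk = new HashMap<>();
    final Queue<String> diskWriteQueue = new ArrayDeque<>();
    final Queue<String> replicationQueue = new ArrayDeque<>();

    // Write path: update RAM first, then queue the key for disk and replication.
    void set(String key, String document) {
        ramCache.put(key, document);
        diskWriteQueue.add(key);
        replicationQueue.add(key);
    }

    // Drains the disk write queue, persisting the current value of each queued key.
    void flushDiskQueue() {
        String key;
        while ((key = diskWriteQueue.poll()) != null) {
            String document = ramCache.get(key);
            if (document != null) {
                disk.put(key, document);
            }
        }
    }

    // Ejection frees RAM, but only for documents that are already safely on disk.
    boolean eject(String key) {
        if (!disk.containsKey(key) || diskWriteQueue.contains(key)) {
            return false;
        }
        ramCache.remove(key);
        return true;
    }

    // Read path: serve from RAM; on a cache miss, load from disk back into RAM.
    String get(String key) {
        String document = ramCache.get(key);
        if (document == null) {
            document = disk.get(key);
            if (document != null) {
                ramCache.put(key, document);
            }
        }
        return document;
    }
}
```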

SDKs are cluster map aware

[Diagram: application servers, each holding a copy of the cluster map (MAP), connected to a node with 8 GB RAM, 3 IO workers and 1024 partitions]

Key Partitioning

[Diagram: application servers, each holding a copy of the cluster map (MAP), connected to a node with 8 GB RAM, 3 IO workers and 1024 partitions]

ClientHashFunction(“[email protected]”) => Partition[0..1023] {25}
ClusterMap[P(25)] => [x.x.x.x] => IP of Server Responsible for Partition 25
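The two-step lookup on this slide (hash the key to a partition, then look the partition up in the cluster map) can be sketched as follows. The hash here is a plain CRC32 modulo 1024, chosen as an assumption for illustration; the SDK's actual hash function may differ in detail. The cluster map is modeled as an array from partition number to server address, and the addresses are made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Client-side key -> partition -> server lookup (simplified sketch).
final class KeyPartitioningSketch {

    static final int PARTITIONS = 1024;

    // Step 1: hash the document key to a partition in [0, 1023].
    static int partitionFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % PARTITIONS);
    }

    // Step 2: the cluster map names the server responsible for that partition.
    static String serverFor(String key, String[] clusterMap) {
        return clusterMap[partitionFor(key)];
    }

    // Builds an example cluster map that spreads partitions round-robin over the servers.
    static String[] exampleClusterMap(String... servers) {
        String[] map = new String[PARTITIONS];
        for (int partition = 0; partition < PARTITIONS; partition++) {
            map[partition] = servers[partition % servers.length];
        }
        return map;
    }

    public static void main(String[] args) {
        String[] clusterMap = exampleClusterMap("10.0.0.1", "10.0.0.2");
        String key = "user::42";
        System.out.println(key + " -> partition " + partitionFor(key)
            + " -> server " + serverFor(key, clusterMap));
    }
}
```

Because every client computes the same partition for a key, no central lookup is needed; after a rebalance only the cluster map changes, not the hash.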

Couchbase 101: SDKs

Supported SDKs


• Start here: http://www.couchbase.com/communities/

• NodeJS, PHP, Ruby, and Python clients are wrappers around the libcouchbase C library, so libcouchbase must be installed first

• Community clients also available

Stable SDK

• Current Version: 1.4.4
• Asynchronous, cache-like API on top of Futures
• Cluster topology aware
• Unifies underlying protocol semantics
• Start here: http://www.couchbase.com/communities/java/getting-started
• Docs: http://docs.couchbase.com/couchbase-sdk-java-1.4/index.html

Preview: SDK 2.0

• Current Version: 2.0.0-beta
• Complete rewrite compared to 1.*
• Fully asynchronous based on RxJava
• Document oriented API & Support for N1QL
• First-class support for JVM languages (Scala,..) & Java 8
• Start here: http://blog.couchbase.com/couchbase-java-sdk-200-beta-1
• Docs: http://docs.couchbase.com/prebuilt/java-sdk-2.0-beta/topics/overview.html

Spring Data Couchbase

• Current Version: 1.1.4 (part of the release trains)
• Depends on the 1.4.4 Java SDK
• Key Features:
  • Work with Entities instead of JSON
  • Template as a lightweight wrapper
  • Full CrudRepository for easy access
  • Easy configuration management
  • JMX & @Cacheable

“Talk is cheap. Show me the code.” Linus Torvalds

Demo

We are hiring! couchbase.com/careers


Thank you! Questions?

@daschl

[email protected]

github.com/daschl
