Cassandra, Couchbase and Spring Data in the Enterprise
DESCRIPTION
Speakers: Matthew Adams, SciSpike; Michael Nitschinger, Couchbase
Data / Integration Track
Spring Data Cassandra brings Cassandra support to the Spring Data umbrella of projects, offering Spring Data's familiar Repository concepts & POJO persistence. This talk will focus first on POJO persistence over Cassandra, including automatic Cassandra schema generation and Spring context configuration using both XML & Java. Then, the talk will dig deeper into some of the lower-level features that Spring Data Cassandra is built upon (AKA "Spring CQL"), which make plain, old Cassandra development simpler & easier.
Couchbase Server is well-known as one of the leaders in the NoSQL space, heavily used in enterprises and startups alike where low latency matters even at hundreds of thousands of operations per second. This talk will not only give a short introduction to the benefits of adopting Couchbase, but also show how to integrate it into your Java Enterprise landscape through its support for Spring Data. You will learn how to fully control your database schema from the application through entities and repositories, how to deploy new application versions or scale out your cluster without a single second of downtime, and how to integrate easily with Elasticsearch.
TRANSCRIPT
© 2014 SpringOne 2GX. All rights reserved. Do not distribute without permission.
Spring Data Cassandra
Matthew Adams Senior Consultant, SciSpike, LLC
Cassandra: History
• Originated from Facebook, now at Apache
• Supported by DataStax
• Born from Amazon Dynamo & Google BigTable
• Distributed
• All nodes are peers, no master/slave: ✓
• Always available: ✓
• Partition/fault tolerant: ✓
• Consistent: tunable (any, one, two, three, quorum, all, etc.)
• Provided language bindings/drivers:
• Java, C#, Python, Node.js, etc.
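The tunable consistency levels listed above trade latency against the guarantee that a read sees the latest write. A minimal sketch of the usual arithmetic, assuming a single data center and a replication factor `rf`: a quorum is a majority of replicas, and a read is guaranteed to overlap the latest write when read replicas plus write replicas exceed `rf`. The class and method names are illustrative and not part of any driver.

```java
final class ConsistencyMath {
    // Majority of replicas for a given replication factor.
    static int quorum(int rf) {
        return rf / 2 + 1;
    }

    // True when every set of read replicas must intersect every set of write replicas.
    static boolean overlaps(int readReplicas, int writeReplicas, int rf) {
        return readReplicas + writeReplicas > rf;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println("quorum(3) = " + q);                      // 2
        System.out.println("QUORUM/QUORUM: " + overlaps(q, q, rf));  // true
        System.out.println("ONE/ONE: " + overlaps(1, 1, rf));        // false
    }
}
```

With a replication factor of 3, reading and writing at quorum touches 2 replicas each, so at least one replica is common to both operations.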
Cassandra: Strengths & Weaknesses
• Strengths
• Storing lots of data
• Very fast writes
• Fast reads
• Time-series data
• Fault tolerant
• Automated replication
• Weaknesses
• Limited data model
• Neither/Both
• No ad-hoc querying
Cassandra: Concepts
• Keyspace
• Identified by name
• Contains tables (AKA "column families")
• Spans nodes in racks in data centers
• Table
• Identified by name
• Has rows
• Row
• Contains columns (up to 2 billion!)
• Can have different number of columns
• Column
• Identified by name
• Has data type
Cassandra: Primary Key
• Required for each table
• Uniquely identifies row
• Partition Key
• Determines node
• Has one or more columns
• Cluster Key
• Determines disk location
• Has zero or more columns
[Diagram: a Primary Key made up of a Partition Key followed by a Cluster Key; labels: Node, Cluster]
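The two roles of the primary key can be sketched in plain Java. This is an illustration under loud assumptions: `nodeFor` uses `String.hashCode()` modulo the node count as a stand-in for Cassandra's real partitioner, and a `TreeMap` stands in for on-disk ordering by cluster key. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PrimaryKeySketch {
    // Stand-in for the partitioner: the node depends on the partition key only.
    static int nodeFor(String partitionKey, int nodeCount) {
        return Math.floorMod(partitionKey.hashCode(), nodeCount);
    }

    // partition key -> rows kept sorted by cluster key
    private final Map<String, TreeMap<Long, String>> partitions = new HashMap<>();

    void insert(String partitionKey, long clusterKey, String data) {
        partitions.computeIfAbsent(partitionKey, k -> new TreeMap<>()).put(clusterKey, data);
    }

    // Rows of one partition, returned in cluster key order.
    List<String> read(String partitionKey) {
        TreeMap<Long, String> rows = partitions.get(partitionKey);
        return rows == null ? new ArrayList<>() : new ArrayList<>(rows.values());
    }

    public static void main(String[] args) {
        PrimaryKeySketch table = new PrimaryKeySketch();
        table.insert("sensor-1", 30L, "c");
        table.insert("sensor-1", 10L, "a");
        table.insert("sensor-1", 20L, "b");
        System.out.println("node index: " + nodeFor("sensor-1", 4));
        System.out.println(table.read("sensor-1")); // [a, b, c]
    }
}
```

All rows sharing a partition key land on the same node, and within that partition the cluster key fixes their order regardless of insertion order.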
Cassandra: Cassandra Query Language (CQL)
• Similar to SQL
• Data definition language (DDL)
• CREATE: KEYSPACE, TABLE, INDEX, …
• ALTER: KEYSPACE, TABLE, …
• DROP: KEYSPACE, TABLE, INDEX, …
• Data manipulation language (DML)
• INSERT INTO table (column1, …) VALUES (value1, …) …
• UPDATE table SET column1 = value1, … WHERE …
• DELETE FROM table … WHERE … // deletes entire row
• DELETE column1, … FROM table … WHERE … // only deletes columns
• TRUNCATE table // deletes all rows from table
Cassandra: Querying
• SELECT column1, … FROM table
WHERE keyedColumn1 op criterion1 [AND …]
[ORDER BY clusteredColumn1, …]
[LIMIT n]
[ALLOW FILTERING]
• Can use * for all columns
• op can be =, <, >, <=, >=, IN
• Can also SELECT COUNT(*), certain functions, DISTINCT, etc.
Cassandra: Query Limitations & Considerations
• No ad hoc querying by design!
• Expected queries drive the schema
• Implies a denormalized schema
• Query criteria must hit keys and possibly indexes
• Otherwise rejected by default
• To prevent query rejection, use ALLOW FILTERING
Spring Data Cassandra: Overview
• Enables plain old Java objects (POJOs) to be mapped to Cassandra tables
• Familiar Spring Data repository pattern
• Declare Repository interface in terms of your POJOs
• Spring Data Cassandra provides basic implementation
• Supports Spring XML & Java Config
Spring Data Cassandra: Basic Support
• Basic CRUD operations out of the box
• Save is an upsert (just like Cassandra's INSERT & UPDATE)
• Delete & find are by primary key(s)
• Primary key classes supported, but unnecessary
• Spring Data expects user-defined primary key class for compound ids
• Compound ids are the norm in C*
• SDC* provides map-based id class to ease pain
• Use BasicMapId's static & builder methods
• Not type-safe, but convenient
• DATACASS-164: Support strongly-typed, user-defined primary key
repo.findOne(id().with("sensorId", id).with("timestamp", time));
Spring Data Cassandra: Entity & Repository
@Table // identifies class as persistent
public class SensorReading {
  @PrimaryKeyColumn(ordinal = 0, type = PARTITIONED) // identifies partition key column
  private String sensorId;
  @PrimaryKeyColumn(ordinal = 1) // identifies cluster key column
  private Date timestamp;
  private String data;
  …
}
public interface SensorReadingRepository
    extends CassandraRepository<SensorReading> {} // only need to identify entity
Spring Data Cassandra: Query Support
• No Spring Data-style dynamic querying with findBy* methods!
• Mirrors Cassandra's philosophy of no ad hoc queries
• Custom queries must be supplied
• @Query annotations on repository methods
• Customizable properties file(s)
o Default is "classpath*:META-INF/cassandra-named-queries.properties"
• DATACASS-109: Enhance XML schema to allow for named queries
• Custom queries should use placeholders (currently zero-based numeric)
o DATACASS-117: Add support for named query placeholders
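How zero-based numeric placeholders line up with method arguments can be sketched as follows. This is an illustration only: it renders the CQL text by substitution so the mapping is visible, which is an assumption made for the example and not how Spring Data Cassandra binds values. `PlaceholderSketch` and `render` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlaceholderSketch {
    // Matches ?0, ?1, ?12, ... and captures the index.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\?(\\d+)");

    // Replaces each ?n with the n-th argument, quoting strings as CQL literals.
    static String render(String cql, Object... args) {
        Matcher m = PLACEHOLDER.matcher(cql);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int index = Integer.parseInt(m.group(1));
            if (index >= args.length) {
                throw new IllegalArgumentException("No argument for ?" + index);
            }
            Object arg = args[index];
            String literal = (arg instanceof String)
                ? "'" + ((String) arg).replace("'", "''") + "'"
                : String.valueOf(arg);
            m.appendReplacement(out, Matcher.quoteReplacement(literal));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(
            "SELECT * FROM sensorreading WHERE sensorid = ?0 AND timestamp >= ?1 AND timestamp < ?2",
            "s-1", 100L, 200L));
    }
}
```

The first method argument fills `?0`, the second `?1`, and so on; a placeholder without a matching argument is rejected.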
Spring Data Cassandra: Custom @Query
public interface SensorReadingRepository
    extends CassandraRepository<SensorReading> {
  @Query("SELECT * FROM sensorreading " +
      "WHERE sensorid = ?0 AND timestamp >= ?1 " +
      "AND timestamp < ?2")
  List<SensorReading> findSensorReadingsInDateRange(String sensorId,
      Date beginInclusive, Date endExclusive);
}
Spring Data Cassandra: Custom Query in Properties File
# in META-INF/cassandra-named-queries.properties or other
SensorReading.findSensorReadingsInDateRange=\
SELECT * FROM sensorreading \
WHERE sensorid = ?0 AND timestamp >= ?1 AND timestamp < ?2
Spring Data Cassandra: Custom XML Query (DATACASS-109)
…
<cass:entity
class="com.springone2gx.sdc.demo.domain.SensorReading">
<cass:query name="findSensorReadingsInDateRange"
value="SELECT * FROM sensorreading WHERE sensorid = ?0 AND timestamp >= ?1 AND timestamp &lt; ?2" />
<!-- or -->
<cass:query name="findSensorReadingsInDateRange"><![CDATA[
SELECT * FROM sensorreading
WHERE sensorid = ?0 AND timestamp >= ?1 AND timestamp < ?2
]]></cass:query>
</cass:entity>
…
Order of precedence will probably be properties, XML, then annotations
Spring Data Cassandra: Entity Mapping
• XML overrides annotations
• @Table: corresponds to class
• Table name (optional, default lower cased simple class name)
• Whether to preserve case ("force quote", optional, default false)
Spring Data Cassandra: Property (Field) Mapping
• XML overrides annotations
• @Column: corresponds to field
• Column name (optional, default lower cased field name)
• Whether to preserve case ("force quote", optional, default false)
• @PrimaryKeyColumn: same as @Column plus…
• Order of column in table with respect to other columns (required, ordinal)
• Whether it’s a clustered or partition column (optional, default clustered)
• Ordering of column, ascending or descending (optional, default ascending)
Spring CQL: Overview
• Spring CQL is to Cassandra what Spring JDBC is to SQL DBs
• A collection of convenient classes to help you interact directly with Cassandra via CQL and DataStax's Java driver
• Maven artifact:
• spring-cql (not spring-data-cassandra)
• Namespace:
• org.springframework.cassandra (not org.springframework.data.cassandra)
Spring CQL v. Spring Data Cassandra
• Spring Data Cassandra is for…
• …mapping POJOs to tables
• …producing repositories easily
• …dynamically creating tables (during dev & testing, at least)
• Spring CQL is for…
• …interacting directly with Cassandra via CQL or Java driver
• Spring Data Cassandra is built on top of Spring CQL
• CassandraTemplate extends CqlTemplate
• AbstractCassandraConfiguration extends AbstractClusterConfiguration
Spring CQL: Highlights
• CqlTemplate
• Fluent API for creation/alteration of keyspaces, tables, indexes, columns
• Java & XML configuration support
• Spring CQL XML schema is similar to but differs from SDC* XML schema!
• Spring DataAccessException translation
Spring CQL: CqlTemplate
• Primary artifact is CqlTemplate
• Like JdbcTemplate: takes care of boilerplate code for you
• All it needs is a Session; can be used standalone without Spring context
• Several different types of methods
• READ: query*(..)
• INSERT/UPDATE: execute*(..)
• DELETE: truncate(..)
• Bulk insert: ingest(..)
• Etc: count(..), describeRing(..)
Spring CQL: Reading with CqlTemplate
• ResultSet query(String cql)
• Executes CQL then returns ResultSet
• void query(String cql, RowCallbackHandler handler)
• Executes CQL then calls handler.processRow(row) for each row
• List<T> query(String cql, RowMapper<T> mapper)
• Executes CQL, stores each result of mapper.mapRow(row, index) in a List, then returns the List: "poor man's object mapper"
• T query(String cql, ResultSetExtractor<T> extractor)
• Executes CQL then calls extractor.extractData(resultSet)
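The "poor man's object mapper" idea behind the RowMapper overload can be sketched without any Cassandra dependency. Everything here is a stand-in: `RowMapper` is a local interface that mimics the `mapRow(row, index)` callback named above, a row is modeled as a `Map` of column names to values, and `query` takes an in-memory result list instead of executing CQL. These are not the Spring CQL types.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RowMapperSketch {
    // Local stand-in for the mapper callback; a row is a column-name map.
    interface RowMapper<T> {
        T mapRow(Map<String, Object> row, int index);
    }

    static final class Reading {
        final String sensorId;
        final String data;

        Reading(String sensorId, String data) {
            this.sensorId = sensorId;
            this.data = data;
        }
    }

    // Maps every row in order and collects the results in a List.
    static <T> List<T> query(List<Map<String, Object>> resultSet, RowMapper<T> mapper) {
        List<T> mapped = new ArrayList<>(resultSet.size());
        for (int index = 0; index < resultSet.size(); index++) {
            mapped.add(mapper.mapRow(resultSet.get(index), index));
        }
        return mapped;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("sensorid", "s-1");
        row.put("data", "21.5C");
        List<Map<String, Object>> resultSet = new ArrayList<>();
        resultSet.add(row);

        List<Reading> readings = query(resultSet,
            (r, i) -> new Reading((String) r.get("sensorid"), (String) r.get("data")));
        System.out.println(readings.get(0).sensorId + " " + readings.get(0).data);
    }
}
```

The caller owns the column-to-field mapping, which is what separates this style from the entity mapping Spring Data Cassandra performs on annotated POJOs.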
Spring CQL: More Reading with CqlTemplate
• T queryForObject(String cql, Class<T> requiredType)
• Converts first column of first row to given type & returns
• requiredType is typically String, Integer, Long, etc.
• List<T> queryForList(String cql, Class<T> elementType)
• Converts first column of each row in results to given type & returns
• List<Map<String, Object>> queryForListOfMap(String cql)
• Returns a list of rows
Spring CQL: Reading Asynchronously with CqlTemplate
• ResultSetFuture queryAsynchronously(String cql)
• Executes CQL then returns ResultSetFuture
• void queryAsynchronously(String cql, Runnable listener)
• Like Java driver: listener doesn't receive anything except an invocation
• void queryAsynchronously(String cql, AsynchronousQueryListener listener)
• Better because AsynchronousQueryListener receives ResultSetFuture in listener.onQueryComplete(future)
• There are many overloads of these query*(..) methods
Spring CQL: CqlTemplate Non-Read Queries
• void execute(String cql)
• void execute(Insert insert)
• void execute(Update update)
• void execute(Truncate truncate)
• void execute(Batch batch)
• T execute(SessionCallback<T> callback)
• Returns result of callback.doInSession(session)
• There are also many executeAsynchronously(..) methods
Spring CQL: Bulk Inserts
• void ingest(String cql, List<List<?>> rows)
• void ingest(String cql, Object[][] rows)
• These methods take an INSERT or UPDATE statement, create and cache a PreparedStatement, then execute the PreparedStatement for each row asynchronously
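The prepare-once, execute-per-row pattern described above can be sketched with standard-library types only. All names are hypothetical stand-ins: `Prepared` replaces the driver's prepared statement, the "execution" just records what would be sent, and this sketch waits for all rows to finish, which is an assumption made so the example is deterministic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class IngestSketch {
    // Stand-in for a prepared statement: remembers the CQL it was built from.
    static final class Prepared {
        final String cql;
        Prepared(String cql) { this.cql = cql; }
    }

    final AtomicInteger prepareCalls = new AtomicInteger();
    final Map<String, Prepared> cache = new ConcurrentHashMap<>();
    final List<String> executed = Collections.synchronizedList(new ArrayList<>());

    // Prepares a statement once per distinct CQL string and caches it.
    Prepared prepare(String cql) {
        return cache.computeIfAbsent(cql, c -> {
            prepareCalls.incrementAndGet();
            return new Prepared(c);
        });
    }

    // Simulated asynchronous execution of one row against the statement.
    CompletableFuture<Void> executeAsync(Prepared statement, List<?> row) {
        return CompletableFuture.runAsync(() -> executed.add(statement.cql + " <- " + row));
    }

    // Submits every row asynchronously, then waits for all of them.
    void ingest(String cql, List<List<?>> rows) {
        Prepared statement = prepare(cql);
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        for (List<?> row : rows) {
            pending.add(executeAsync(statement, row));
        }
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
    }
}
```

Calling `ingest` twice with the same CQL prepares the statement only once, so the per-row cost is limited to binding and execution.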
Spring CQL: Fluent API
• Use import static for maximum fluency
template.execute(/* new CreateTableCqlGenerator( */
createTable("foo")
.with(COMPACT_STORAGE)
.with(COMMENT, "my comment").ifNotExists()
.partitionKeyColumn("id", timeuuid())
.partitionKeyColumn("thingy", inet())
.clusteredKeyColumn("category", text(), DESCENDING)
.column("data", text())
/* ).toCql() */);
CREATE TABLE IF NOT EXISTS foo (id timeuuid, thingy inet, category text, data text,
PRIMARY KEY ((id, thingy), category))
WITH CLUSTERING ORDER BY (category DESC) AND COMPACT STORAGE AND comment = 'my comment';
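How a fluent specification can turn into a CREATE TABLE statement is sketched below. This is a hypothetical, stripped-down builder written for illustration, not the Spring CQL implementation: column types are plain strings, table options such as COMPACT STORAGE and comment are left out, and the method names merely echo the ones used above.

```java
import java.util.ArrayList;
import java.util.List;

final class CreateTableSketch {
    private final String name;
    private boolean ifNotExists;
    private final List<String> columns = new ArrayList<>();
    private final List<String> partitionKeys = new ArrayList<>();
    private final List<String> clusterKeys = new ArrayList<>();
    private final List<String> clusteringOrder = new ArrayList<>();

    private CreateTableSketch(String name) { this.name = name; }

    static CreateTableSketch createTable(String name) { return new CreateTableSketch(name); }

    CreateTableSketch ifNotExists() { ifNotExists = true; return this; }

    CreateTableSketch partitionKeyColumn(String column, String type) {
        columns.add(column + " " + type);
        partitionKeys.add(column);
        return this;
    }

    CreateTableSketch clusteredKeyColumn(String column, String type, boolean descending) {
        columns.add(column + " " + type);
        clusterKeys.add(column);
        clusteringOrder.add(column + (descending ? " DESC" : " ASC"));
        return this;
    }

    CreateTableSketch column(String column, String type) {
        columns.add(column + " " + type);
        return this;
    }

    // Renders the collected specification as a single CQL statement.
    String toCql() {
        StringBuilder cql = new StringBuilder("CREATE TABLE ");
        if (ifNotExists) cql.append("IF NOT EXISTS ");
        cql.append(name).append(" (").append(String.join(", ", columns));
        cql.append(", PRIMARY KEY ((").append(String.join(", ", partitionKeys)).append(")");
        for (String key : clusterKeys) cql.append(", ").append(key);
        cql.append("))");
        if (!clusteringOrder.isEmpty()) {
            cql.append(" WITH CLUSTERING ORDER BY (")
               .append(String.join(", ", clusteringOrder)).append(")");
        }
        return cql.append(";").toString();
    }
}
```

Each builder method returns `this`, which is what lets the calls chain; the partition key columns are grouped in an inner pair of parentheses so a compound partition key stays distinct from the cluster key.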
Spring Data Cassandra: Summary
• Spring Data Cassandra…
• …enables POJO persistence over Cassandra datastores
• …provides easy Repository pattern support
• Spring CQL…
• …is helpful with raw CQL usage
Spring Data Cassandra: Coordinates
• Professional Consulting & Support
• http://www.scispike.com
• Project Home
• http://projects.spring.io/spring-data-cassandra
• Source Repository
• https://github.com/spring-projects/spring-data-cassandra
• Issue Tracker
• https://jira.spring.io/browse/DATACASS
• Community Support
• http://stackoverflow.com/questions/tagged/spring-data-cassandra
• Contributions Welcome!
• Source & documentation pull requests
• Issues
• Community assistance on StackOverflow
Couchbase and Spring Data in the Enterprise
Michael Nitschinger, Software Engineer at Couchbase @daschl
Core Principles
• Easy Scalability: Grow and shrink clusters easily and with no downtime
• Consistent High Performance: Sub-millisecond latency and consistent throughput, even under high load
• Always on 24x365: No downtime for software upgrades, hardware maintenance, etc.
• Flexible Data Model: Schema is dictated by the application, not the database
Evolution from memcached
• Key contributors to memcached
• Evolved into Membase
• Distributed and persisted key-value store
• Evolved into Couchbase
• Document Store with JSON
• Map-Reduce Indexing
• Cross-Data Center Replication
Architecture
[Diagram: architecture of a Couchbase Server node]
• RAM Cache, Indexing & Persistence Management (C)
o Query Engine: Query API on port 8092
o Moxi: Memcapable 1.0 on port 11211
o Memcapable 2.0 on port 11210
o Couchbase EP Engine: Object-level Cache, Disk Persistence
o Storage interface: New Persistence Layer
• Server/Cluster Management & Communication (Erlang/OTP)
o HTTP on port 8091: REST management API/Web UI
o Erlang port mapper on port 4369
o Distributed Erlang on ports 21100-21199
o On each node: Heartbeat, Process monitor, Global singleton supervisor, Configuration manager
o One per cluster: Rebalance orchestrator, Node health monitor
o vBucket state and replication manager
Rebalance
[Diagram: Application Servers, each labeled MAP, in front of the cluster during a Rebalance Operation]
• Before: 1 server with 8 GB RAM, 3 IO Workers, 1024 Partitions
• After rebalance to 2 servers: 512 Partitions each; TOTAL 16 GB RAM, 6 IO Workers, 1024 Partitions
• After rebalance to 4 servers: 256 Partitions each; TOTAL 32 GB RAM, 12 IO Workers, 1024 Partitions
Retrieval Operations
[Diagram: Application Server sends get to Couchbase Server; EP Engine with RAM Cache, Disk Write Queue and Replication Queue; Replica Couchbase Cluster Machine]
Storage Operations
[Diagram: same components; Application Server sends set/add/replace]
Consistency
[Diagram: same components; Application Server sends set/add/replace and get]
Ejection and Cache Misses
[Diagram: same components; RAM Cache FULL (90%); NRU Documents Ejected; repeated set/add/replace, then get for a Non-Resident Document]
Key Partitioning
[Diagram: Application Servers, each labeled MAP; cluster with 1024 Partitions; server with 8 GB RAM, 3 IO Workers]
ClientHashFunction(“[email protected]”) => Partition[0..1023] {25}
ClusterMap[P(25)] => [x.x.x.x] => IP of Server Responsible for Partition 25
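The two lookups above (key to partition, partition to server) can be sketched in a few lines. The hash used here, CRC32 modulo 1024, is an assumption chosen for illustration; the real client hash function may differ in detail. The cluster map is a hypothetical array that spreads the 1024 partitions evenly over the given servers, and the addresses are made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class KeyPartitionSketch {
    static final int PARTITIONS = 1024;

    // Hashes the document key to a partition id in [0, 1023].
    static int partitionFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % PARTITIONS);
    }

    // Hypothetical cluster map: index = partition id, value = server address.
    static String[] buildClusterMap(String... servers) {
        String[] map = new String[PARTITIONS];
        for (int p = 0; p < PARTITIONS; p++) {
            map[p] = servers[p * servers.length / PARTITIONS];
        }
        return map;
    }

    // Client-side routing: no server round trip is needed to find the owner.
    static String serverFor(String key, String[] clusterMap) {
        return clusterMap[partitionFor(key)];
    }

    public static void main(String[] args) {
        String[] clusterMap = buildClusterMap("10.0.0.1", "10.0.0.2");
        String key = "user::42";
        System.out.println(key + " -> partition " + partitionFor(key)
            + " -> " + serverFor(key, clusterMap));
    }
}
```

Because the partition count stays fixed at 1024, adding servers only changes the map: with two servers each owns 512 partitions, with four each owns 256, and keys never need to be rehashed.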
Supported SDKs
• Start here: http://www.couchbase.com/communities/
• NodeJS, PHP, Ruby, and Python clients are wrappers around the libcouchbase C library, so libcouchbase must be installed first
• Community clients also available
Stable SDK
• Current Version: 1.4.4
• Asynchronous, cache-like API on top of Futures
• Cluster topology aware
• Unifies underlying protocol semantics
• Start here: http://www.couchbase.com/communities/java/getting-started
• Docs: http://docs.couchbase.com/couchbase-sdk-java-1.4/index.html
Preview: SDK 2.0
• Current Version: 2.0.0-beta
• Complete rewrite compared to 1.*
• Fully asynchronous based on RxJava
• Document oriented API & Support for N1QL
• First-class support for JVM languages (Scala, …) & Java 8
• Start here: http://blog.couchbase.com/couchbase-java-sdk-200-beta-1
• Docs: http://docs.couchbase.com/prebuilt/java-sdk-2.0-beta/topics/overview.html
Spring Data Couchbase
• Current Version: 1.1.4 (part of the release trains)
• Depends on the 1.4.4 Java SDK
• Key Features:
• Work with Entities instead of JSON
• Template as a lightweight wrapper
• Full CrudRepository for easy access
• Easy configuration management
• JMX & @Cacheable