Cassandra DB Not Only SQL

Upload: chuck

Post on 25-Feb-2016


TRANSCRIPT

Page 1: Cassandra DB

Cassandra DB
Not Only SQL

Page 2: Cassandra DB

Table of Contents

- Background and history
- Used Applications
- What is Cassandra? – Overview
- Replication & Consistency
- Writing, Reading, Querying and Sorting
- APIs & Installation
- World Database in Cassandra
- Using Hector API
- Administration tools

Page 3: Cassandra DB

Background

Influential technologies:

- Dynamo – fully distributed design (infrastructure)
- BigTable – sparse data model

Page 4: Cassandra DB

Other NoSQL databases

- NoSQL: MongoDB, Neo4J, HyperGraphDB, Memcached, Tokyo Cabinet, Redis, CouchDB
- Big Data NoSQL: Hypertable, Cassandra, Riak, Voldemort, HBase

Page 5: Cassandra DB

Bigtable / Dynamo

- Bigtable: HBase, Hypertable
- Dynamo: Riak, Voldemort
- Cassandra: combination of both

Page 6: Cassandra DB

CAP Theorem

- Consistency
- Availability
- Partition Tolerance

Page 7: Cassandra DB

Applications

- Facebook
- Google Code
- Apache
- Digg
- Twitter
- Rackspace
- Others…

Page 8: Cassandra DB

What Is Cassandra?

- O(1) node lookup
- Key – Value Store
- Column based data store
- Highly Distributed – decentralized (no master/slave)
- Elasticity
- Durable, Fault-tolerant – Replications
- Sparse
- ACID NoSQL!

Page 9: Cassandra DB

Overview – Data Model

- Keyspace
  - Uppermost namespace
  - Typically one per application
- Column
  - Basic unit of storage – Name, Value and timestamp
- ColumnFamily
  - Associates records of a similar kind
  - Record-level Atomicity
  - Indexed
- SuperColumn
  - Columns whose values are columns
  - Array of columns
- SuperColumnFamily
  - ColumnFamily whose values are only SuperColumns
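As a rough illustration of how these terms nest, the following Java sketch models a keyspace, column families and columns as in-memory maps. It is only a toy under stated assumptions: the class names (DataModelSketch, Keyspace, ColumnFamily, Column) are invented for this example and are not Cassandra or Hector types, and SuperColumns are left out for brevity.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy in-memory model of the slide's terms; NOT Cassandra's real storage code. */
class DataModelSketch {

    /** Column: the basic unit of storage (name, value, timestamp). */
    static final class Column {
        final String name;
        final String value;
        final long timestamp;

        Column(String name, String value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** ColumnFamily: row key -> (column name -> Column). Rows are sparse:
     *  each row stores only the columns that were actually written to it. */
    static final class ColumnFamily {
        final Map<String, TreeMap<String, Column>> rows = new TreeMap<>();

        void put(String rowKey, Column column) {
            rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column.name, column);
        }

        /** Returns the column, or null if the row or the column does not exist. */
        Column get(String rowKey, String columnName) {
            TreeMap<String, Column> row = rows.get(rowKey);
            return row == null ? null : row.get(columnName);
        }
    }

    /** Keyspace: the uppermost namespace, holding column families by name. */
    static final class Keyspace {
        final String name;
        final Map<String, ColumnFamily> columnFamilies = new TreeMap<>();

        Keyspace(String name) {
            this.name = name;
        }

        /** Returns the named column family, creating it on first use. */
        ColumnFamily columnFamily(String cfName) {
            return columnFamilies.computeIfAbsent(cfName, k -> new ColumnFamily());
        }
    }

    public static void main(String[] args) {
        Keyspace world = new Keyspace("World");
        ColumnFamily cities = world.columnFamily("Cities");
        cities.put("ORANJESTAD", new Column("population", "33000", 1L));
        System.out.println(cities.get("ORANJESTAD", "population").value);
    }
}
```

In this model a SuperColumn would add one more map level (super column name -> columns) inside each row.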

Page 10: Cassandra DB

Examples

Column – City: ORANJESTAD

    {"id": 1, "name": "ORANJESTAD", "population": 33000, "capital": true}

SuperColumns – Country: Aruba

    {"id": "aa", "name": "Aruba", "fullName": "Aruba", "location": "Caribbean, island in the Caribbean Sea, north of Venezuela", "coordinates": {"latitudeType": "N", "latitude": 12.5, "longitudeType": "W", "longitude": 69.96667}, …

Page 11: Cassandra DB

Replication & Consistency

- Consistency Level is based on the Replication Factor (N), not on the number of nodes in the system.
- There are a few options to set how many replicas must respond to declare success.
- Query all replicas on every read.
- Every Column has a value and a timestamp – the latest timestamp wins.
- Read repair – read one replica and check the checksum/timestamp to verify.
- R (number of replicas to read from) + W (number of replicas to write to) > N (replication factor).
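The R + W > N rule and the "latest timestamp wins" rule can be sketched in a few lines of Java. This is a minimal illustration, not Cassandra's implementation: the class ConsistencySketch, the Response type and the method names are all hypothetical, and a real read repair would also send the newest value back to the stale replicas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy illustration of R + W > N and timestamp-based conflict resolution. */
class ConsistencySketch {

    /** One replica's answer for a column: its value and the write timestamp. */
    static final class Response {
        final String replica;
        final String value;
        final long timestamp;

        Response(String replica, String value, long timestamp) {
            this.replica = replica;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** True when every set of R read replicas must share at least one
     *  replica with every set of W written replicas (N = replication factor). */
    static boolean overlaps(int r, int w, int n) {
        return r + w > n;
    }

    /** Latest timestamp wins; returns null when there are no responses. */
    static Response newest(List<Response> responses) {
        Response best = null;
        for (Response x : responses) {
            if (best == null || x.timestamp > best.timestamp) {
                best = x;
            }
        }
        return best;
    }

    /** Replicas whose answer is older than the newest one; these are the
     *  candidates a read repair would update. */
    static List<String> staleReplicas(List<Response> responses) {
        Response best = newest(responses);
        List<String> stale = new ArrayList<>();
        for (Response x : responses) {
            if (x.timestamp < best.timestamp) {
                stale.add(x.replica);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        List<Response> responses = Arrays.asList(
                new Response("node-a", "old", 1L),
                new Response("node-b", "new", 2L));
        System.out.println(overlaps(2, 2, 3));
        System.out.println(newest(responses).value);
        System.out.println(staleReplicas(responses));
    }
}
```

With N = 3, reading from 2 replicas and writing to 2 replicas satisfies the rule, while R = 1 and W = 1 does not.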

Page 12: Cassandra DB

The Ring - Partitioning

- Each NODE has a single, unique TOKEN
- Each NODE claims a RANGE of its neighbors in the ring
- Partitioning – map from Key Space to Token – can be random or Order Preserving
- Snitching – map from Nodes to Physical Location
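The token ring can be pictured as a sorted map from token to node. The Java sketch below is a simplified illustration under assumptions: the class RingSketch and its methods are made up for this example, each node is taken to own the range ending at its own token, and the token() helper is a stand-in hash rather than the function a real partitioner uses.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy token ring: each node owns the range that ends at its own token. */
class RingSketch {
    private final TreeMap<Long, String> tokenToNode = new TreeMap<>();

    /** Registers a node with its single, unique token. */
    void addNode(String node, long token) {
        tokenToNode.put(token, node);
    }

    /** Owner of a key token: the first node whose token is >= the key's
     *  token, wrapping around to the lowest token at the end of the ring. */
    String ownerOf(long keyToken) {
        if (tokenToNode.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Long, String> entry = tokenToNode.ceilingEntry(keyToken);
        return (entry != null ? entry : tokenToNode.firstEntry()).getValue();
    }

    /** Stand-in for a random partitioner: maps a row key to a token in
     *  [0, 100). An assumption for the demo, not Cassandra's hash. */
    static long token(String rowKey) {
        return Math.floorMod(rowKey.hashCode(), 100);
    }

    public static void main(String[] args) {
        RingSketch ring = new RingSketch();
        ring.addNode("node-a", 25);
        ring.addNode("node-b", 50);
        ring.addNode("node-c", 75);
        System.out.println(ring.ownerOf(token("ORANJESTAD")));
    }
}
```

Because the lookup depends only on the sorted tokens, any node can route a key without asking a master.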

Page 13: Cassandra DB

Writing

- No Locks
- Append support without read ahead
- Atomicity guarantee for a key (in a ColumnFamily)
- Always Writable!!!
- SSTables – Key/data – SSTable file for each column family
- Fast

Page 14: Cassandra DB

Reading

- Wait for R responses
- Wait for N – R responses in the background and perform read repair
- Read multiple SSTables
- Slower than writes (but still fast)
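The difference between the write and read paths can be sketched with a toy store in Java. The sketch assumes an in-memory buffer (Cassandra calls it a memtable, a term the slides do not use) that is flushed into immutable sorted tables standing in for SSTables. The class WritePathSketch and its methods are hypothetical, and the commit log, compaction and disk I/O are all left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** Toy write/read path: append-style writes, reads that merge several tables. */
class WritePathSketch {

    /** A stored value together with its write timestamp. */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private TreeMap<String, Cell> memtable = new TreeMap<>();
    private final List<SortedMap<String, Cell>> sstables = new ArrayList<>();

    /** A write never looks at flushed tables: it only updates the in-memory buffer. */
    void write(String key, String value, long timestamp) {
        memtable.merge(key, new Cell(value, timestamp),
                (old, fresh) -> fresh.timestamp >= old.timestamp ? fresh : old);
    }

    /** Freezes the buffer into an immutable table and starts a new buffer. */
    void flush() {
        if (memtable.isEmpty()) {
            return;
        }
        sstables.add(Collections.unmodifiableSortedMap(memtable));
        memtable = new TreeMap<>();
    }

    /** A read consults the buffer and every flushed table; latest timestamp wins. */
    String read(String key) {
        Cell newest = memtable.get(key);
        for (SortedMap<String, Cell> table : sstables) {
            Cell cell = table.get(key);
            if (cell != null && (newest == null || cell.timestamp > newest.timestamp)) {
                newest = cell;
            }
        }
        return newest == null ? null : newest.value;
    }

    int sstableCount() {
        return sstables.size();
    }

    public static void main(String[] args) {
        WritePathSketch store = new WritePathSketch();
        store.write("ORANJESTAD", "33000", 1L);
        store.flush();
        store.write("ORANJESTAD", "34000", 2L);
        System.out.println(store.read("ORANJESTAD"));
    }
}
```

The read has to check every table that might hold the key, which is one way to see why reads are slower than writes in this design.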

Page 15: Cassandra DB

Compare with MySQL (RDBMS)

Comparing a 50GB database:

- MySQL: ~300ms write, ~350ms read
- Cassandra: ~0.12ms write, ~15ms read

Page 16: Cassandra DB

Queries

- Single column
- Slice
  - Set of names / range of names
  - Simple slice -> columns
  - Super slice -> supercolumns
- Key range
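The two kinds of slice (a set of names, or a range of names) map naturally onto a sorted map. The Java sketch below is only an illustration: the class SliceSketch and its methods are hypothetical, a row is modelled as a plain NavigableMap of column name to value, and no Cassandra or Hector API is involved.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy slice queries over one row whose columns are kept sorted by name. */
class SliceSketch {

    /** Slice by an explicit set of column names; names missing from the row are skipped. */
    static Map<String, String> byNames(NavigableMap<String, String> row, List<String> names) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String name : names) {
            if (row.containsKey(name)) {
                out.put(name, row.get(name));
            }
        }
        return out;
    }

    /** Slice by a range of column names, both ends inclusive, in stored order. */
    static NavigableMap<String, String> byRange(NavigableMap<String, String> row,
                                                String start, String end) {
        return row.subMap(start, true, end, true);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> city = new TreeMap<>();
        city.put("id", "1");
        city.put("name", "ORANJESTAD");
        city.put("population", "33000");
        city.put("capital", "true");

        System.out.println(byNames(city, Arrays.asList("population", "missing")));
        System.out.println(byRange(city, "id", "name"));
    }
}
```

A key range query is the same idea applied one level up, to row keys instead of column names.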

Page 17: Cassandra DB

Sorting

- Sorting is set on writing
- Sorting is set by the type of the Column/Supercolumn keys
- Sorting/keys types:
  - Bytes
  - UTF8
  - Ascii
  - LexicalUUID
  - TimeUUID
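To see why the key type decides the order, the Java sketch below stores the same two UUID column names under two different comparators. It is an illustration under assumptions: the class SortingSketch and its helpers are hypothetical, Java's UUID.compareTo is used as a stand-in for a lexical-style ordering, and the timeUuid() helper builds version 1 UUIDs by hand purely for the demo.

```java
import java.util.Comparator;
import java.util.TreeMap;
import java.util.UUID;

/** Toy demo: the comparator chosen when the row is created fixes the column order. */
class SortingSketch {

    /** Builds a version 1 (time-based) UUID from a raw 60-bit timestamp.
     *  The low 64 bits are fixed because only the ordering matters here. */
    static UUID timeUuid(long timestamp) {
        long timeLow = timestamp & 0xFFFFFFFFL;
        long timeMid = (timestamp >>> 32) & 0xFFFFL;
        long timeHigh = (timestamp >>> 48) & 0x0FFFL;
        long mostSigBits = (timeLow << 32) | (timeMid << 16) | (1L << 12) | timeHigh;
        return new UUID(mostSigBits, 0x8000000000000000L);
    }

    /** TimeUUID-style order: compare by the embedded timestamp. */
    static final Comparator<UUID> TIME_ORDER = Comparator.comparingLong(UUID::timestamp);

    /** Lexical-style stand-in: Java's natural UUID order, which is not chronological. */
    static final Comparator<UUID> LEXICAL_ORDER = Comparator.naturalOrder();

    /** The order is fixed here, at creation, and applied on every insert. */
    static TreeMap<UUID, String> newRow(Comparator<UUID> comparator) {
        return new TreeMap<>(comparator);
    }

    public static void main(String[] args) {
        UUID early = timeUuid(5L);
        UUID late = timeUuid(1L << 32);

        TreeMap<UUID, String> byTime = newRow(TIME_ORDER);
        byTime.put(late, "late");
        byTime.put(early, "early");

        TreeMap<UUID, String> lexical = newRow(LEXICAL_ORDER);
        lexical.put(late, "late");
        lexical.put(early, "early");

        System.out.println(byTime.values());
        System.out.println(lexical.values());
    }
}
```

The two rows hold identical data but iterate in different orders, and neither can be re-sorted at query time without rewriting the row.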

Page 18: Cassandra DB

Drawbacks

- No joins (for speed)
- Not able to sort at query time
- No real SQL support (although some APIs support a very small subset of it)

Page 19: Cassandra DB

APIs

- APIs are available for a large number of languages, including C++, Java, Python, PHP, Ruby, Erlang, Haskell, C#, Javascript and more…
- Thrift interface – driver-level interface – hard to use.
- Hector – a Java Cassandra client – simple column-based client – does what Cassandra is intended to do.
- Kundera – JPA-supporting Java client – tries to translate JPA classes and attributes to Cassandra – good on inserts, still hard and problematic with queries.

Page 20: Cassandra DB

Cassandra Installation

- Install the prerequisite – basically the latest Java SE release
- Extract the Cassandra zip files to your chosen path
- Run bin/cassandra.bat -f
- The Cassandra node is up and running

Page 21: Cassandra DB

World database in Cassandra

- World – Keyspace
- Countries – SuperColumn Family
  - CountryDetails – SuperColumn
  - Border – SuperColumns
  - Coordinates – SuperColumn
  - GDP – SuperColumn
  - Language – SuperColumns
- Cities – Column Family

Page 22: Cassandra DB

Using Hector API - definitions

Creating a Cassandra Cluster:

    Cluster cluster = HFactory.getOrCreateCluster("WorldCluster", "localhost:9160");

Adding a keyspace:

    columnFamilyDefinition.setKeyspaceName(WORLD_KEYSPACE);

Adding a Column:

    // WORLD_KEYSPACE and CITY_CF are String constants, and columnDefinition is a
    // column definition created elsewhere; the slide does not show them.
    BasicColumnFamilyDefinition columnFamilyDefinition = new BasicColumnFamilyDefinition();
    columnFamilyDefinition.setKeyspaceName(WORLD_KEYSPACE);
    columnFamilyDefinition.setName(CITY_CF); // ColumnFamily name
    columnFamilyDefinition.addColumnDefinition(columnDefinition);

Page 23: Cassandra DB

Using Hector API - definitions

Adding a SuperColumn:

    BasicColumnFamilyDefinition superCfDefinition = new BasicColumnFamilyDefinition();
    superCfDefinition.setKeyspaceName(WORLD_KEYSPACE);
    superCfDefinition.setName(COUNTRY_SUPER);
    superCfDefinition.setColumnType(ColumnType.SUPER);

Adding all definitions to the cluster:

    ColumnFamilyDefinition cfDefStandard = new ThriftCfDef(columnFamilyDefinition);
    ColumnFamilyDefinition cfDefSuper = new ThriftCfDef(superCfDefinition);
    KeyspaceDefinition keyspaceDefinition = HFactory.createKeyspaceDefinition(
        WORLD_KEYSPACE, "org.apache.cassandra.locator.SimpleStrategy", 1,
        Arrays.asList(cfDefStandard, cfDefSuper));
    cluster.addKeyspace(keyspaceDefinition);

Page 24: Cassandra DB

Using Hector API - inserting

Creating a Column Template:

    // keyspaceOperator, columnFamilyName and stringSerializer are defined elsewhere; the slide does not show them.
    ColumnFamilyTemplate<String, String> template = new ThriftColumnFamilyTemplate<String, String>(
        keyspaceOperator, columnFamilyName, stringSerializer, stringSerializer);

Adding a Row into a Column Family:

    ColumnFamilyUpdater<String, String> updater = template.createUpdater("a key");
    updater.setString("key", "value");
    try {
        template.update(updater);
    } catch (HectorException e) {
        // do something ...
    }

Page 25: Cassandra DB

Using Hector API - inserting

Creating a Super Column Template:

    SuperCfTemplate<String, String, String> template = new ThriftSuperCfTemplate<String, String, String>(
        keyspaceOperator, columnFamilyName, stringSerializer, stringSerializer, stringSerializer);

Adding a Row into a SuperColumn Family:

    // value is the String to store; the slide does not show where it comes from.
    SuperCfUpdater<String, String, String> updater = template.createUpdater("a key");
    HSuperColumn<String, String, ByteBuffer> superColumn = updater.addSuperColumn("sc name");
    superColumn.setString("column name", value);
    superColumn.update();
    try {
        template.update(updater);
    } catch (HectorException e) {
        // do something ...
    }

Page 26: Cassandra DB

Using Hector API - reading

Reading all Rows and their columns from a Column Family (using CQL):

    // factory and stringSerializer are defined elsewhere; the slide does not show them.
    CqlQuery<String, String, String> cqlQuery = new CqlQuery<String, String, String>(
        factory.getKeyspaceOperator(), stringSerializer, stringSerializer, stringSerializer);
    cqlQuery.setQuery("select * from City");
    QueryResult<CqlRows<String, String, String>> result = cqlQuery.execute();

Reading all columns from a Row in a SuperColumn Family:

    SuperCfTemplate<String, String, String> superColumn =
        HectorFactory.getFactory().getSuperColumnFamilyTemplate("SuperColumnFamily");
    SuperCfResult<String, String, String> superRes = superColumn.querySuperColumns("key");
    Collection<String> columnNames = superRes.getSuperColumns();

Page 27: Cassandra DB

Using Hector API - reading

Reading a SuperColumn from a Row in a SuperColumn Family:

    SuperColumnQuery<String, String, String, String> query = HFactory.createSuperColumnQuery(
        keyspaceOperator, stringSerializer, stringSerializer, stringSerializer, stringSerializer);
    query.setColumnFamily("SuperColumnFamily");
    query.setKey("key");
    query.setSuperName("SuperColumnName");
    QueryResult<HSuperColumn<String, String, String>> result = query.execute();
    // Iterate over the columns inside the returned SuperColumn.
    for (HColumn<String, String> col : result.get().getColumns()) {
        String name = col.getName();
        String value = col.getValue();
    }

Every query has options to fetch only part of the rows, by setting a start value and an end value (the rows are sorted on insert), and only part of the columns, by setting the column names explicitly.

Page 28: Cassandra DB

Administration tools

- Cassandra – node activator
- Nodetool – bootstrapping and monitoring
- Cassandra-cli – Application Console
- Sstable2json – Export
- Json2sstable – Import