Cassandra DB
Cassandra DB: Not Only SQL
Table of Contents
  Background and history
  Used Applications
  What is Cassandra? – Overview
  Replication & Consistency
  Writing, Reading, Querying and Sorting
  APIs & Installation
  World Database in Cassandra
  Using Hector API
  Administration tools
Background
Influential technologies:
  Dynamo – Fully distributed design (infrastructure)
  BigTable – Sparse data model
Other NoSQL databases
  NoSQL: MongoDB, Neo4J, HyperGra, Memcach, Tokyo Ca, Redis, CouchDB
  Big Data NoSQL: Hypertable, Cassandra, Riak, Voldemort, HBase
Bigtable / Dynamo
  Bigtable: HBase, Hypertable
  Dynamo: Riak, Voldemort
  Cassandra: combination of both
CAP Theorem
  Consistency
  Availability
  Partition Tolerance
Applications
  Facebook
  Google Code
  Apache
  Digg
  Twitter
  Rackspace
  Others…
What Is Cassandra?
  O(1) node lookup
  Key – Value Store
  Column based data store
  Highly Distributed – decentralized (no master/slave)
  Elasticity
  Durable, Fault-tolerant – Replications
  Sparse
  ACID NoSQL!
Overview – Data Model
Keyspace
  Uppermost namespace
  Typically one per application
Column
  Basic unit of storage – name, value and timestamp
ColumnFamily
  Associates records of a similar kind
  Record-level atomicity
  Indexed
SuperColumn
  Columns whose values are columns
  Array of columns
SuperColumnFamily
  ColumnFamily whose values are only SuperColumns
Examples
Column – City: ORANJESTAD
  {"id": 1, "name": "ORANJESTAD", "population": 33000, "capital": true}
SuperColumns – Country: Aruba
  {"id": "aa", "name": "Aruba", "fullName": "Aruba", "location": "Caribbean, island in the Caribbean Sea, north of Venezuela", "coordinates": {"latitudeType": "N", "latitude": 12.5, "longitudeType": "W", "longitude": 69.96667}, ….
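To make the data model above concrete, here is a minimal in-memory sketch of a column family as a sparse map of rows to sorted columns, where each column carries a name, a value and a timestamp. The class and method names (`ColumnFamilySketch`, `put`, `get`) are hypothetical and chosen for illustration only; this is not Cassandra's storage code, and the timestamp rule used on overwrite is a simplification.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy in-memory model of a column family; names here are illustrative assumptions. */
class ColumnFamilySketch {
    /** A column is the basic unit of storage: name, value and timestamp. */
    static final class Column {
        final String name;
        final String value;
        final long timestamp;

        Column(String name, String value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // row key -> (column name -> column); only columns that were written exist (sparse)
    private final Map<String, TreeMap<String, Column>> rows = new TreeMap<>();

    /** Writes a column; an existing column is kept if its timestamp is strictly newer. */
    void put(String rowKey, Column column) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .merge(column.name, column,
                   (old, neu) -> neu.timestamp >= old.timestamp ? neu : old);
    }

    /** Returns the column value, or null when the row or column is absent. */
    String get(String rowKey, String columnName) {
        TreeMap<String, Column> row = rows.get(rowKey);
        if (row == null) {
            return null;
        }
        Column c = row.get(columnName);
        return c == null ? null : c.value;
    }

    public static void main(String[] args) {
        ColumnFamilySketch city = new ColumnFamilySketch();
        city.put("ORANJESTAD", new Column("population", "33000", 2L));
        city.put("ORANJESTAD", new Column("population", "29000", 1L)); // older write, ignored
        System.out.println(city.get("ORANJESTAD", "population"));
    }
}
```

A SuperColumnFamily could be sketched the same way with one more level of nesting (row key, supercolumn name, column name).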
Replication & Consistency
  Consistency Level is based on the Replication Factor (N), not on the number of nodes in the system.
  There are a few options to set how many replicas must respond to declare success
  Query all replicas on every read
  Every Column has a value and a timestamp – latest timestamp wins
  Read repair – read one replica and check the checksum/timestamp to verify
  R (number of replicas to read from) + W (number of replicas to write to) > N (replication factor)
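The R + W > N rule can be checked with simple arithmetic: if the read set and the write set together exceed the number of replicas, they must share at least one replica, so a read sees the latest successful write. The sketch below assumes N is the replication factor; the class and method names are hypothetical.

```java
/** Arithmetic sketch of the R + W > N overlap rule; names are illustrative assumptions. */
class QuorumSketch {
    /** Majority of n replicas, for example 2 of 3 or 3 of 5. */
    static int quorum(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("replication factor must be at least 1");
        }
        return n / 2 + 1;
    }

    /** True when any r replicas read must overlap any w replicas written, out of n. */
    static boolean readsSeeLatestWrite(int r, int w, int n) {
        if (r < 1 || w < 1 || r > n || w > n) {
            throw new IllegalArgumentException("need 1 <= r, w <= n");
        }
        return r + w > n;
    }

    public static void main(String[] args) {
        int n = 3;
        int q = quorum(n);
        System.out.println("quorum reads and writes overlap: " + readsSeeLatestWrite(q, q, n));
        System.out.println("read one, write one overlap: " + readsSeeLatestWrite(1, 1, n));
    }
}
```

With N = 3, quorum reads and quorum writes (2 + 2 > 3) overlap, while reading one replica and writing one replica (1 + 1) does not.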
The Ring - Partitioning
  Each NODE has a single, unique TOKEN
  Each NODE claims a RANGE of its neighbors in the ring
  Partitioning – Map from Key Space to Token – Can be random or Order Preserving
  Snitching – Map from Nodes to Physical Location
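The ring lookup described above can be sketched with a sorted map from tokens to nodes. This sketch assumes that a node owns the range ending at its own token, and that a "random" partitioner derives the token from an MD5 hash of the key; the class and method names are hypothetical and the real partitioners differ in detail.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Toy token ring; an illustration under stated assumptions, not Cassandra's implementation. */
class TokenRingSketch {
    // token -> node name, kept sorted so the ring can be walked in token order
    private final TreeMap<BigInteger, String> ring = new TreeMap<>();

    /** Each node has a single, unique token. */
    void addNode(String node, BigInteger token) {
        if (ring.containsKey(token)) {
            throw new IllegalArgumentException("token already taken: " + token);
        }
        ring.put(token, node);
    }

    /** Random partitioning: hash the key to a non-negative token. */
    static BigInteger randomToken(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Owner is the first node whose token is >= the key token, wrapping around the ring. */
    String ownerOf(BigInteger token) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<BigInteger, String> entry = ring.ceilingEntry(token);
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        TokenRingSketch ring = new TokenRingSketch();
        ring.addNode("A", BigInteger.valueOf(100));
        ring.addNode("B", BigInteger.valueOf(200));
        ring.addNode("C", BigInteger.valueOf(300));
        System.out.println(ring.ownerOf(BigInteger.valueOf(150)));
        System.out.println(ring.ownerOf(randomToken("ORANJESTAD")));
    }
}
```

An order-preserving partitioner would replace `randomToken` with a mapping that keeps key order, which allows key-range scans at the cost of uneven load.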
Writing
  No Locks
  Append support without read ahead
  Atomicity guarantee for a key (in a ColumnFamily)
  Always Writable!!!
  SSTables – Key/data – SSTable file for each column family
  Fast
Reading
  Wait for R responses
  Wait for N – R responses in the background and perform read repair
  Read multiple SSTables
  Slower than writes (but still fast)
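The read path above relies on "latest timestamp wins" plus read repair. A minimal sketch of that reconciliation step follows: given the replica responses, pick the newest value and list which replicas returned an older copy and would need repairing. The types and method names are hypothetical, and the real coordinator logic is more involved.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy reconciliation of replica responses; names are illustrative assumptions. */
class ReadRepairSketch {
    /** A value together with the timestamp of the write that produced it. */
    static final class Versioned {
        final String value;
        final long timestamp;

        Versioned(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Latest timestamp wins; on a tie the earliest response in the list is kept. */
    static Versioned resolve(List<Versioned> responses) {
        if (responses.isEmpty()) {
            throw new IllegalArgumentException("no replica responses");
        }
        Versioned winner = responses.get(0);
        for (Versioned v : responses) {
            if (v.timestamp > winner.timestamp) {
                winner = v;
            }
        }
        return winner;
    }

    /** Indexes of replicas whose copy is older than the winner and should be repaired. */
    static List<Integer> staleReplicas(List<Versioned> responses) {
        long newest = resolve(responses).timestamp;
        List<Integer> stale = new ArrayList<>();
        for (int i = 0; i < responses.size(); i++) {
            if (responses.get(i).timestamp < newest) {
                stale.add(i);
            }
        }
        return stale;
    }
}
```

In this sketch the caller would return `resolve(...)` to the client after R responses and, in the background, write the winning value back to every replica listed by `staleReplicas(...)`.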
Compare with MySQL (RDBMS)
Compare a 50GB database:
  MySQL: ~300ms write, ~350ms read
  Cassandra: ~0.12ms write, ~15ms read
Queries
  Single column
  Slice
    Set of names / range of names
    Simple slice -> columns
    Super slice -> supercolumns
  Key range
Sorting
  Sorting is set on writing
  Sorting is set by the type of the Column/Supercolumn keys
  Sorting/key types:
    Bytes
    UTF8
    Ascii
    LexicalUUID
    TimeUUID
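Because the sort order is fixed when data is written, a range-of-names slice is just a contiguous view over already-sorted columns. The sketch below models one row as a sorted map whose comparator is chosen at construction time; the class name, the method names and the choice to treat both slice bounds as inclusive are assumptions for illustration.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of one row whose columns are sorted at write time; names are illustrative. */
class SliceSketch {
    // column name -> value, ordered by the comparator fixed at construction
    private final NavigableMap<String, String> columns;

    /** The comparator plays the role of the column key type (for example UTF8). */
    SliceSketch(Comparator<String> comparator) {
        this.columns = new TreeMap<>(comparator);
    }

    /** Inserts keep the row sorted; no sorting happens at query time. */
    void put(String name, String value) {
        columns.put(name, value);
    }

    /** Range slice: all columns from start to end, both bounds inclusive in this sketch. */
    NavigableMap<String, String> slice(String start, String end) {
        return columns.subMap(start, true, end, true);
    }

    public static void main(String[] args) {
        SliceSketch row = new SliceSketch(Comparator.naturalOrder());
        row.put("population", "33000");
        row.put("id", "1");
        row.put("name", "ORANJESTAD");
        row.put("capital", "true");
        System.out.println(row.slice("id", "name").keySet());
    }
}
```

Choosing a different comparator (for example `Comparator.reverseOrder()`) changes the order in which a slice returns columns, which mirrors why the order cannot be changed at query time.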
Drawbacks
  No joins (for speed)
  Not able to sort at query time
  Does not really support SQL (although some APIs support a very small portion of it)
APIs
  Many APIs for a large number of languages, including C++, Java, Python, PHP, Ruby, Erlang, Haskell, C#, JavaScript and more…
  Thrift interface – driver-level interface – hard to use.
  Hector – a Java Cassandra client – simple column-based client – does what Cassandra is intended to do.
  Kundera – JPA-supporting Java client – tries to translate JPA classes and attributes to Cassandra – good on inserts, still hard and problematic with queries.
Cassandra Installation
  Install the prerequisite – basically the latest Java SE release
  Extract the Cassandra zip files to your chosen path
  Run bin/cassandra.bat -f
  Cassandra node is up and running
World database in Cassandra
  World – Keyspace
  Countries – SuperColumn Family
    CountryDetails – SuperColumn
    Border – SuperColumns
    Coordinates – SuperColumn
    GDP – SuperColumn
    Language – SuperColumns
  Cities – Column Family
Using Hector API - definitions
Creating a Cassandra Cluster:
  Cluster cluster = HFactory.getOrCreateCluster("WorldCluster", "localhost:9160");
Adding a keyspace (the keyspace name is set on the column family definition created in the next step):
  columnFamilyDefinition.setKeyspaceName(WORLD_KEYSPACE);
Adding a Column:
  // columnDefinition is assumed to be built beforehand; it is not shown on the slide
  BasicColumnFamilyDefinition columnFamilyDefinition = new BasicColumnFamilyDefinition();
  columnFamilyDefinition.setKeyspaceName(WORLD_KEYSPACE);
  columnFamilyDefinition.setName(CITY_CF); // ColumnFamily name
  columnFamilyDefinition.addColumnDefinition(columnDefinition);
Using Hector API - definitions
Adding a SuperColumn:
  BasicColumnFamilyDefinition superCfDefinition = new BasicColumnFamilyDefinition();
  superCfDefinition.setKeyspaceName(WORLD_KEYSPACE);
  superCfDefinition.setName(COUNTRY_SUPER);
  superCfDefinition.setColumnType(ColumnType.SUPER);
Adding all definitions to the cluster:
  ColumnFamilyDefinition cfDefStandard = new ThriftCfDef(columnFamilyDefinition);
  ColumnFamilyDefinition cfDefSuper = new ThriftCfDef(superCfDefinition);
  KeyspaceDefinition keyspaceDefinition = HFactory.createKeyspaceDefinition(
      WORLD_KEYSPACE, "org.apache.cassandra.locator.SimpleStrategy", 1,
      Arrays.asList(cfDefStandard, cfDefSuper));
  cluster.addKeyspace(keyspaceDefinition);
Using Hector API - inserting
Creating a Column Template:
  ColumnFamilyTemplate<String, String> template =
      new ThriftColumnFamilyTemplate<String, String>(
          keyspaceOperator, columnFamilyName, stringSerializer, stringSerializer);
Adding a Row into a Column Family:
  ColumnFamilyUpdater<String, String> updater = template.createUpdater("a key");
  updater.setString("key", "value");
  try {
      template.update(updater);
  } catch (HectorException e) {
      // do something ...
  }
Using Hector API - inserting
Creating a Super Column Template:
  SuperCfTemplate<String, String, String> template =
      new ThriftSuperCfTemplate<String, String, String>(
          keyspaceOperator, columnFamilyName, stringSerializer, stringSerializer, stringSerializer);
Adding a Row into a SuperColumn Family:
  SuperCfUpdater<String, String, String> updater = template.createUpdater("a key");
  HSuperColumn<String, String, ByteBuffer> superColumn = updater.addSuperColumn("sc name");
  superColumn.setString("column name", value);
  superColumn.update();
  try {
      template.update(updater);
  } catch (HectorException e) {
      // do something ...
  }
Using Hector API - reading
Reading all rows and their columns from a Column Family (using CQL):
  CqlQuery<String, String, String> cqlQuery = new CqlQuery<String, String, String>(
      factory.getKeyspaceOperator(), stringSerializer, stringSerializer, stringSerializer);
  cqlQuery.setQuery("select * from City");
  QueryResult<CqlRows<String, String, String>> result = cqlQuery.execute();
Reading all columns from a Row in a SuperColumn Family:
  SuperCfTemplate<String, String, String> superColumn =
      HectorFactory.getFactory().getSuperColumnFamilyTemplate("SuperColumnFamily");
  SuperCfResult<String, String, String> superRes = superColumn.querySuperColumns("key");
  Collection<String> columnNames = superRes.getSuperColumns();
Using Hector API - reading
Reading a SuperColumn from a Row in a SuperColumn Family:
Every query has options to get part of the rows, by setting a start value and an end value (the rows are sorted on insert), and part of the columns, by setting the column names explicitly.
  SuperColumnQuery<String, String, String, String> query = HFactory.createSuperColumnQuery(
      keyspaceOperator, stringSerializer, stringSerializer, stringSerializer, stringSerializer);
  query.setColumnFamily("SuperColumnFamily");
  query.setKey("key");
  query.setSuperName("SuperColumnName");
  QueryResult<HSuperColumn<String, String, String>> result = query.execute();
  for (HColumn<String, String> col : result.get().getColumns()) {
      String name = col.getName();
      String value = col.getValue();
  }
Administration tools
  Cassandra – node activator
  Nodetool – bootstrapping and monitoring
  Cassandra-cli – Application Console
  Sstable2json – Export
  Json2sstable – Import