cassandra explained 100922022328 phpapp01

Upload: samrajcse

Post on 07-Apr-2018

219 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    1/50

    Cassandra Explained

    Disruptive Code

    September 22, 2010

    Eric [email protected]

    @jericevanshttp://blog.sym-link.com

    mailto:[email protected]:[email protected]
  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    2/50

    Outline

    Background

    Description

    API(s) Code Samples

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    3/50

    Background

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    4/50

    The Digital Universe

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    5/50

    Consolidation

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    6/50

    Old Guard

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    7/50

    Vertical Scaling Sucks

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    8/50

    Influential Papers

    BigTable

    Strong consistency

    Sparse map data model GFS, Chubby, et al

    Dynamo

    O(1) distributed hash table (DHT) BASE (aka eventual consistency)

    Client tunable consistency/availability

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    9/50

    NoSQL

    HBase

    MongoDB

    Riak Voldemort

    Neo4J

    Cassandra

    Hypertable

    HyperGraphDB

    Memcached Tokyo Cabinet

    Redis

    CouchDB

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    10/50

    NoSQL Big data

    HBase

    MongoDB

    Riak Voldemort

    Neo4J

    Cassandra

    Hypertable

    HyperGraphDB

    Memcached Tokyo Cabinet

    Redis

    CouchDB

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    11/50

    Bigtable / Dynamo

    Bigtable

    HBase

    Hypertable

    Dynamo

    Riak

    Voldemort

    Cassandra ??

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    12/50

    Dynamo-Bigtable Lovechild

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    13/50

    CAP Theorem Pick Two

    CP

    Bigtable

    Hypertable

    HBase

    AP

    Dynamo

    Voldemort

    Cassandra

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    14/50

    CAP Theorem Pick Two

    Consistency Availability

    Partition Tolerance

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    15/50

    Description

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    16/50

    Properties

    Symmetric

    No single point of failure

    Linearly scalable Ease of administration

    Flexible partitioning, replica placement

    Automated provisioning High availability (eventual consistency)

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    17/50

    P2P Routing

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    18/50

    P2P Routing

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    19/50

    Partitioning

    Random

    128bit namespace, (MD5)

    Good distribution

    Order Preserving

    Tokens determine namespace

    Natural order (lexicographical)

    Range / cover queries

    Yours ??

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    20/50

    Replica Placement

    SimpleSnitch

    Default

    N-1 successive nodes

    RackInferringSnitch

    Infers DC/rack from IP

    PropertyFileSnitch

    Configured w/ a properties file

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    21/50

    Bootstrap

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    22/50

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    23/50

    Bootstrap

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    24/50

    Choosing Consistency

    Read

    Level Description

    ZERO N/AANY N/A

    ONE 1 replica

    QUORUM (N / 2) +1

    ALL All replicas

    Write

    Level Description

    ZERO Hail MaryANY 1 replica (HH)

    ONE 1 replica

    QUORUM (N / 2) +1

    ALL All replicas

    R + W > N

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    25/50

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    26/50

    Quorum ((N/2) + 1)

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    27/50

    Data Model

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    28/50

    Overview

    Keyspace

    Uppermost namespace

    Typically one per application

    ColumnFamily Associates records of a similar kind

    Record-level Atomicity

    Indexed Column

    Basic unit of storage

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    29/50

    Sparse Table

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    30/50

    Column

    name

    byte[]

    Queried against (predicates)

    Determines sort order value

    byte[]

    Opaque to Cassandra

    timestamp

    long

    Conflict resolution (Last Write Wins)

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    31/50

    Column Comparators

    Bytes

    UTF8

    TimeUUID

    Long

    LexicalUUID

    Composite (third-party)

    http://github.com/edanuff/CassandraCompositeType

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    32/50

    API

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    33/50

    RPC

    THRIFT

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    34/50

    RPC

    THRIFT

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    35/50

    RPC

    AVRO

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    36/50

    RPC

    AVRO

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    37/50

    Idiomatic Client Libraries

    Pelops, Hector (Java)

    Pycassa (Python)

    Cassandra (Ruby)

    Others

    http://wiki.apache.org/cassandra/ClientOptions

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    38/50

    Code Samples

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    39/50

    Pycassa Python Client API

    # creating a connectionfrom pycassa import connecthosts = ['host1:9160', 'host2:9160']client = connect('Keyspace1', hosts)

    # creating a column family instancefrom pycassa import ColumnFamilycf = ColumnFamily(client, Standard1)

    # reading/writing a columncf.insert('key1', {'name': 'value'})print cf.get('key1')['name']

    1. http://github.com/vomjom/pycassa

    http://github.com/vomjom/pycassahttp://github.com/vomjom/pycassa
  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    40/50

    Address Book Setup

    # conf/cassandra.yamlkeyspaces:- name: AddressBook

    column_families:

    - name: Addressescompare_with: BytesTyperows_cached: 10000keys_cached: 50

    comment: 'No comment'

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    41/50

    Adding an entry

    key = uuid()

    columns = {'first': 'Eric',

    'last': 'Evans','email': '[email protected]','city': 'San Antonio','zip': 78250

    }

    addresses.insert(key, columns)

    mailto:'[email protected]:'[email protected]
  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    42/50

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    43/50

    Indexing (manual)

    # conf/cassandra.yamlkeyspaces:- name: AddressBook

    column_families:

    - name: Addressescompare_with: BytesTyperows_cached: 10000keys_cached: 50

    comment: 'No comment'- name: ByCitycompare_with: UTF8Type

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    44/50

    Updating the index

    key = uuid()

    columns = {'first': 'Eric',

    'last': 'Evans','email': '[email protected]','city': 'San Antonio','zip': 78250

    }

    addresses.insert(key, columns)byCity.insert('San Antonio', {key: ''})

    mailto:'[email protected]:'[email protected]
  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    45/50

    Indexing (auto)

    # conf/cassandra.yamlkeyspaces:- name: AddressBook

    column_families:

    - name: Addressescompare_with: BytesTyperows_cached: 10000keys_cached: 50

    comment: 'No comment' column_metadata:- name: cityindex_type: KEYS

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    46/50

    Querying the Index

    from pycassa.index import create_index_expressionfrom pycassa.index import create_index_clause

    e = create_index_expression('city', 'San Antonio')clause = create_index_clause([e])

    results = address.get_indexed_slices(clause)

    for (key, columns) in results.items(): print %(first)s %(last)s % columns

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    47/50

    Timeseries

    # conf/cassandra.yamlkeyspaces:- name: Sites

    column_families:

    - name: Statscompare_with: LongType

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    48/50

    Logging values

    # time as long (milliseconds since epoch)tstmp = long(time() * 1e6)

    stats.insert('org.apache', {tstmp: value})

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    49/50

    Slicing

    begin = long(start * 1e6)

    stats.get_range('org.apache',column_start=begin)

    end = long((start + 86400) * 1e6)

    stats.get_range(start='org.apache',finish='org.debian',column_start=begin,column_finish=end)

  • 8/6/2019 Cassandra Explained 100922022328 Phpapp01

    50/50

    Questions?