The Hadoop RDBMS Replace Oracle with Hadoop John Leach CTO and Co-Founder J


Post on 22-Feb-2016


TRANSCRIPT


The Hadoop RDBMS
Replace Oracle with Hadoop

John Leach, CTO and Co-Founder J


The Hadoop RDBMS

Standard ANSI SQL
Horizontal Scale-Out
Real-Time Updates
ACID Transactions
Powers OLAP and OLTP
Seamless BI Integration

who we are

Splice Machine Proprietary and Confidential


serialization and write pipelining

Serialization Goals
  Disk Usage Parity with Data Supplied
  Predicate evaluation uses byte[] comparisons (sorted)
  Memory and CPU efficient (fast)
  Lazy Serialization and Deserialization

Write Pipelining Goals
  Non-blocking Writes
  Transactional Awareness
  Small Network Footprint
  Handle Failure, Location, and Retry Semantics


Single Column Encoding

All columns encoded in a single cell
  separated by 0x00 byte
Nulls are encoded either as “explicit null” or as an absent field
Cell value prefixed by an Index containing
  which fields are present in cell
  whether the field is
    Scalar (1-9 Bytes)
    Float (4 Bytes)
    Double (8 Bytes)
    Other (1 – N Bytes)


Example Insert

Table Schema: (a int, b string)
Insert row (1,‘bob’):
  All columns packed together
    1 0x00 ‘bob’
  Index prepended
    {1(s),2(o)} 0x00 1 0x00 ‘bob’
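
The packing steps on this slide can be sketched in a few lines of Python. This is a toy model, not Splice Machine's actual byte layout: the text form of the index (`1s,2o`), the helper names `type_tag`, `encode_value`, and `encode_row`, and the decimal-text value encoding are all assumptions made for illustration. It also assumes values never contain a 0x00 byte; a real encoder would need to escape or avoid the separator.

```python
from typing import Dict

SEP = b"\x00"  # field separator named on the slide


def type_tag(value: object) -> str:
    """Hypothetical one-letter type tag: s = scalar, d = double, o = other."""
    if isinstance(value, int):
        return "s"
    if isinstance(value, float):
        return "d"
    return "o"


def encode_value(value: object) -> bytes:
    """Toy value encoding (assumption): text form of the value as UTF-8."""
    return str(value).encode("utf-8")


def encode_row(fields: Dict[int, object]) -> bytes:
    """Pack the non-null fields of a row into one cell.

    `fields` maps 1-based column position to value. Null columns are
    simply left out, which models the "absent field" form of null.
    """
    present = sorted(fields)
    index = ",".join(f"{pos}{type_tag(fields[pos])}" for pos in present)
    body = SEP.join(encode_value(fields[pos]) for pos in present)
    return index.encode("utf-8") + SEP + body


insert_cell = encode_row({1: 1, 2: "bob"})  # index, 0x00, 1, 0x00, bob
null_cell = encode_row({1: 1})              # column b absent
```

Because absent columns cost nothing, the same function also models a partial update: writing only column a produces a cell whose index lists field 1 alone.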


Example Insert w/ nulls

Row (1,null)
  nulls left absent
    1
  Index prepended (field B is not present)
    {1(s)} 0x00 1


Example: Update

Row already present: {1(s),2(o)}
set a = 2
  Pack entry
    2
  Prepend index (field B is not present)
    {1(s)} 0x00 2


Decoding

Indexes are cached
  Most data looks like its predecessor
Values are read in reverse timestamp order
  Updates before inserts
Seek through bytes for fields of interest
  Once a field is populated, ignore all other values for that field.


Example Decoding

Start with (NULL,NULL)
2 KeyValues present:
  {1(s)} 0x00 2
  {1(s),2(o)} 0x00 1 0x00 ‘bob’
Read first KeyValue, fill field 1
  Row: (2,NULL)
Read second KeyValue, skip field 1 (already filled), fill field 2:
  Row: (2,‘bob’)
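
The newest-first merge walked through above can be sketched as follows. The cell layout used here (a text index such as `1s,2o`, then 0x00-separated values) is a simplified stand-in chosen for illustration, and `decode_cell` and `merge_row` are hypothetical names; the real encoding is binary and is not specified in these slides.

```python
from typing import Dict, List, Optional

SEP = b"\x00"


def decode_cell(cell: bytes) -> Dict[int, bytes]:
    """Split a toy cell into {1-based field position: raw value bytes}."""
    index, _, body = cell.partition(SEP)
    entries = index.decode("utf-8").split(",") if index else []
    values = body.split(SEP) if entries else []
    # Each index entry is "<position><type letter>", e.g. "2o".
    return {int(entry[:-1]): raw for entry, raw in zip(entries, values)}


def merge_row(cells_newest_first: List[bytes],
              num_columns: int) -> List[Optional[bytes]]:
    """Fill each field from the newest cell that carries it."""
    row: List[Optional[bytes]] = [None] * num_columns
    filled = set()
    for cell in cells_newest_first:
        for pos, raw in decode_cell(cell).items():
            if pos not in filled:  # older values for a filled field are ignored
                row[pos - 1] = raw
                filled.add(pos)
        if len(filled) == num_columns:
            break  # every field populated; no need to read older cells
    return row


# The update (a = 2) is newer than the original insert (1, 'bob').
cells = [b"1s\x002", b"1s,2o\x001\x00bob"]
row = merge_row(cells, num_columns=2)
```

Stopping as soon as every field is filled is what makes reading updates before inserts cheap: older cells are only touched for columns the newer cells did not carry.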


Index Decoding

Index encoded differently depending on number of columns present and type

Uncompressed: 1 bit for present, 2 bits for type
Compressed: Run-length encoded (field 1-3, scalar, 5-8 double…)
Sparse: Delta encoded (index,type) pairs
Sparse compressed: Run-length encoded (index,type) pairs
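
As a rough illustration of the uncompressed form, the sketch below spends three bits per column: one presence bit followed by a two-bit type code. The bit order, the numeric type codes, and the function names `pack_index` and `unpack_index` are assumptions; the slides give only the bit budget, not the layout.

```python
from typing import List, Optional

# Hypothetical 2-bit type codes for the four field kinds named earlier.
TYPE_CODES = {"scalar": 0, "float": 1, "double": 2, "other": 3}
TYPE_NAMES = {code: name for name, code in TYPE_CODES.items()}


def pack_index(columns: List[Optional[str]]) -> bytes:
    """Pack one entry per column; None marks an absent column."""
    bits = 0
    for col_type in columns:
        bits <<= 3
        if col_type is not None:
            bits |= 0b100 | TYPE_CODES[col_type]  # presence bit + type code
    nbytes = (3 * len(columns) + 7) // 8  # round up to whole bytes
    return bits.to_bytes(nbytes, "big")


def unpack_index(data: bytes, num_columns: int) -> List[Optional[str]]:
    """Inverse of pack_index for a known column count."""
    bits = int.from_bytes(data, "big")
    out: List[Optional[str]] = []
    for i in range(num_columns):
        shift = 3 * (num_columns - 1 - i)
        group = (bits >> shift) & 0b111
        out.append(TYPE_NAMES[group & 0b11] if group & 0b100 else None)
    return out


packed = pack_index(["scalar", None, "other"])
```

At three bits per column this form grows with table width even when most columns are absent, which is presumably why the sparse and run-length variants exist.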


Write Pipeline

Asynchronous but guaranteed delivery
Operate in Bulk
  Row or Size bounded
Highly Configurable
Utilizes Cached Region Locations
Server component modeled after Java’s NIO
  Attach Handlers for different RDBMS features
Handle retries, failure, and SQL semantics
  Wrong Region, Region Too Busy, Primary Key Violation, Unique Constraint Violation


Write Pipeline Base Element

Rows are encoded into custom KVPairs
  all rows for a family and column are grouped together
  <byte[],byte[]>
Exploded into Put only to write to HBase
Timestamps added on server side
Supports Snappy compression


Write Pipeline Client

Tree Based Buffer
  Table -> Region -> N Buffers
  Rows are buffered on client side in memory
  N is configurable
When buffer fills
  asynchronously write batch to Region
Handles HBase “difficulties” gracefully
  Wrong Region
    Re-bucket
  Too Busy
    Add delay and possibly back-off
  etc.
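
A minimal sketch of the client-side behavior described above, using one buffer per region rather than N, and flushing synchronously where the real pipeline writes asynchronously. Everything here is hypothetical: the `WriteBuffer` class, the `locate` and `send` callbacks, and the `WrongRegion` and `RegionTooBusy` exception types stand in for HBase client machinery that the slides do not detail.

```python
import time
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple

KV = Tuple[bytes, bytes]


class WrongRegion(Exception):
    """Hypothetical signal: the region no longer hosts these rows."""


class RegionTooBusy(Exception):
    """Hypothetical signal: the server rejected the batch under load."""


class WriteBuffer:
    def __init__(self, locate: Callable[[bytes], str],
                 send: Callable[[str, List[KV]], None],
                 max_rows: int = 1000, max_retries: int = 5,
                 base_delay: float = 0.05) -> None:
        self.locate = locate  # row key -> region name (stands in for the location cache)
        self.send = send      # ships one batch to one region; may raise
        self.max_rows = max_rows
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.buffers: DefaultDict[str, List[KV]] = defaultdict(list)

    def add(self, key: bytes, value: bytes) -> None:
        region = self.locate(key)
        self.buffers[region].append((key, value))
        if len(self.buffers[region]) >= self.max_rows:
            self.flush(region)  # row-bounded flush

    def flush(self, region: str) -> None:
        batch = self.buffers.pop(region, [])
        if not batch:
            return
        for attempt in range(self.max_retries):
            try:
                self.send(region, batch)
                return
            except RegionTooBusy:
                # Too Busy: add delay, backing off exponentially.
                time.sleep(self.base_delay * (2 ** attempt))
            except WrongRegion:
                # Wrong Region: re-bucket rows by their current location;
                # they go out with the next flush of that region.
                for key, value in batch:
                    self.buffers[self.locate(key)].append((key, value))
                return
        raise RuntimeError(f"giving up on region {region}")

    def flush_all(self) -> None:
        for region in list(self.buffers):
            self.flush(region)
```

Re-bucketing only helps if `locate` reflects refreshed region locations by the time it is called again; in this sketch that refresh is left to the caller.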


Write Pipeline Server Side

Coprocessor based
Limited number of concurrent writes to a server
  excess write requests are rejected
  prevents IPC thread starvation
SQL Based Handlers for parallel writes
  Indexes, Primary Key Constraints, Unique Constraints
Writes occur in a single WALEdit on each region
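
The reject-rather-than-queue policy can be illustrated with a non-blocking semaphore. This is a sketch of the idea only: `WriteGate` and `WriteRejected` are hypothetical names, and the actual server component is an HBase coprocessor written in Java, not Python.

```python
import threading
from typing import Callable, List, TypeVar

T = TypeVar("T")


class WriteRejected(Exception):
    """Raised when the server is already at its concurrent-write limit."""


class WriteGate:
    def __init__(self, max_concurrent: int) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def write(self, apply_batch: Callable[[List], T], batch: List) -> T:
        # Non-blocking acquire: an excess request fails immediately
        # instead of parking a handler thread while it waits.
        if not self._slots.acquire(blocking=False):
            raise WriteRejected("too many concurrent writes")
        try:
            return apply_batch(batch)
        finally:
            self._slots.release()
```

Failing fast pushes the waiting onto the client, which can delay and retry, so server threads stay free for requests that can make progress.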


Interests

Other items we have done or are interested in…
  Burstable Tries Implementation of Memstore
  Pluggable Cost Based Genetic Algorithm for Assignment Manager
  Columnar Representations and in-memory processing
  Concurrent Bloom Filter (i.e. Thread Safe BitSet)

We are hiring
Just Completed $15M Series B
[email protected]