Physical Data Storage
Stephen Dawson-Haggerty
Data Sources

[Diagram: sMAP data sources feeding time-series databases and StreamFS, with Hadoop/HDFS alongside; applications include data exploration/visualization, control loops, demand response, analytics, mobile feedback, and fault detection]
• Expected workload
• Related work
• Server architecture
• API
• Performance
• Future directions
[Figure: Dent circuit meter reporting through sMAP]
Write Workload
• sMAP sources
  – HTTP/REST protocol for exposing physical information
  – Data trickles in as it is generated
  – Typical data rates: 1 reading per 1-60 s
• Bulk imports
  – Existing databases
  – Migrations
Read Workload
• Plotting engine
• Matlab & Python adaptors for analysis
• Mobile apps
• Batch analysis

Dominated by range queries.
Latency is important for interactive data exploration.
[Architecture diagram: readingdb's time-series interface (bucketing, RPC, compression) sits on a key-value store with a page cache, lock manager, and storage allocator; a streaming pipeline serves insert, resample, aggregate, and query; the storage mapper is kept in MySQL and accessed over SQL]
Time series interface
db_open()
db_query(streamid, start, end) Query points in a range
db_next(streamid, ref), db_prev(...) Query points near a reference time
db_add(streamid, vector) Insert points into the database
db_avail(streamid) Retrieve storage map
db_close()
All data is part of a stream, identified only by streamid
A stream is a series of tuples: (timestamp, sequence, value, min, max)
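To make the call pattern concrete, here is a minimal C sketch of a client using this interface. The struct layout, argument types, and the stub bodies are assumptions for illustration; only the function names, their rough arguments, and the (timestamp, sequence, value, min, max) tuple come from the slides.

#include <stdio.h>
#include <stdint.h>

/* One reading, following the stream tuple above.
 * Field widths are assumptions; the slides only name the fields. */
struct point {
    uint32_t timestamp;
    uint32_t sequence;
    double   value;
    double   min;
    double   max;
};

/* Stub bodies so the sketch compiles; a real client would go through
 * readingdb's RPC layer instead. */
static int  db_open(void)  { return 0; }
static void db_close(void) { }
static int  db_add(int streamid, const struct point *vector, int n) {
    printf("add %d points to stream %d (first ts=%u)\n",
           n, streamid, vector[0].timestamp);
    return 0;
}
static int  db_query(int streamid, uint32_t start, uint32_t end) {
    printf("query stream %d over [%u, %u)\n", streamid, start, end);
    return 0;
}

int main(void) {
    if (db_open() != 0)
        return 1;

    /* Trickle in two readings, the normal sMAP write pattern. */
    struct point v[2] = {
        { 1300000000, 0, 21.5, 21.5, 21.5 },
        { 1300000030, 1, 21.7, 21.7, 21.7 },
    };
    db_add(42, v, 2);

    /* Range query of the kind the plotting engine issues. */
    db_query(42, 1300000000, 1300003600);

    db_close();
    return 0;
}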
Storage Manager: BDB
• Berkeley Database: embedded key-value store
• Store binary blobs using B+ trees (see the sketch below)
• Very mature: around since 1992; supports transactions, free threading, replication
• We use version 4
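For orientation, a minimal sketch of storing and fetching one opaque blob through the Berkeley DB 4.x C API, the same B+ tree, binary-blob usage the bullets describe. The file name, key string, and blob contents are placeholders, not the actual readingdb schema.

#include <string.h>
#include <stdio.h>
#include <db.h>    /* Berkeley DB C API */

int main(void) {
    DB *dbp;
    if (db_create(&dbp, NULL, 0) != 0)
        return 1;

    /* One B+ tree database file holding binary blobs. */
    if (dbp->open(dbp, NULL, "readings.db", NULL, DB_BTREE, DB_CREATE, 0644) != 0)
        return 1;

    /* Keys and values are opaque byte strings (DBTs). */
    char keybuf[] = "stream42:1300000000";
    char blob[]   = "...compressed bucket bytes...";
    DBT key, data, result;
    memset(&key, 0, sizeof(key));
    memset(&data, 0, sizeof(data));
    memset(&result, 0, sizeof(result));
    key.data  = keybuf; key.size  = sizeof(keybuf);
    data.data = blob;   data.size = sizeof(blob);

    dbp->put(dbp, NULL, &key, &data, 0);             /* store the blob   */
    if (dbp->get(dbp, NULL, &key, &result, 0) == 0)  /* and read it back */
        printf("fetched %u bytes\n", (unsigned)result.size);

    dbp->close(dbp, 0);
    return 0;
}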
RPC Evolution
• First: shared memory
  – Low latency
• Move to threaded TCP
• Google protocol buffers
  – Zig-zag integer representation (see the sketch below), multiple language bindings
  – Extensible across multiple versions
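As background on the zig-zag representation mentioned above: protocol buffers map signed integers onto unsigned ones so that values near zero stay short when varint-encoded, which is what makes delta-encoded readings cheap on the wire. A self-contained sketch of that mapping (the standard protobuf definition, not readingdb-specific code):

#include <stdio.h>
#include <stdint.h>

/* Zig-zag mapping: 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ...
 * Small-magnitude signed values become small unsigned values, which
 * then occupy few bytes as varints.
 * (Assumes arithmetic right shift for negative ints, as protobuf does.) */
static uint32_t zigzag32(int32_t n)    { return ((uint32_t)n << 1) ^ (uint32_t)(n >> 31); }
static int32_t  unzigzag32(uint32_t z) { return (int32_t)(z >> 1) ^ -(int32_t)(z & 1); }

int main(void) {
    int32_t samples[] = { 0, -1, 1, -2, 150, -150 };
    for (int i = 0; i < 6; i++) {
        uint32_t z = zigzag32(samples[i]);
        printf("%6d -> zigzag %3u -> %6d\n", samples[i], z, unzigzag32(z));
    }
    return 0;
}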
On-Disk Format
• All data stores perform poorly with one key per reading
  – Index size is high
  – Unnecessary
• Solution: bucket readings, keyed by (streamid, timestamp) (key layout sketched below)
• Excellent locality of reference with B+ tree indexes
  – Data sorted by streamid and timestamp
  – Range queries translate into mostly large sequential IOs
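One way to realize the (streamid, timestamp) bucket key so that the B+ tree's default byte-wise comparison gives the sort order described above is to pack both fields big-endian. This layout is an assumption for illustration, not the documented readingdb key format:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Pack (streamid, bucket-start timestamp) big-endian so that memcmp()
 * order, which is what a B+ tree uses by default, sorts first by stream
 * and then by time. */
static void pack_key(unsigned char key[8], uint32_t streamid, uint32_t bucket_start) {
    for (int i = 0; i < 4; i++) {
        key[i]     = (streamid     >> (24 - 8 * i)) & 0xff;
        key[4 + i] = (bucket_start >> (24 - 8 * i)) & 0xff;
    }
}

int main(void) {
    unsigned char a[8], b[8], c[8];
    pack_key(a, 7, 1300000000);   /* stream 7, earlier bucket */
    pack_key(b, 7, 1300003600);   /* stream 7, later bucket   */
    pack_key(c, 8, 1299000000);   /* a different stream       */

    /* All of stream 7's buckets sort adjacently and in time order, so a
     * range query over one stream turns into a mostly sequential scan. */
    printf("a < b: %d\n", memcmp(a, b, 8) < 0);   /* prints 1 */
    printf("b < c: %d\n", memcmp(b, c, 8) < 0);   /* prints 1 */
    return 0;
}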
• Represent in memory with a materialized structure: 32 bytes/record
  – Inefficient on disk: lots of repeated data, missing fields
• Solution: compression (see the sketch below)
  – First: delta-encode each bucket into a protocol buffer
  – Second: Huffman tree or run-length encoding (zlib)
• Combined compression is 2x better than gzip or either step alone
• 1M records/second compress/decompress on modest hardware
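A self-contained sketch of the two-step idea for one bucket's timestamps: delta-encode them, then hand the (highly repetitive) deltas to zlib. The fixed-rate sample data, buffer size, and use of compress2() are illustrative assumptions; the actual readingdb encoder and its protocol-buffer framing are not reproduced here.

#include <stdio.h>
#include <stdint.h>
#include <zlib.h>

int main(void) {
    /* One bucket of timestamps at a steady 30 s period. */
    uint32_t ts[100];
    for (int i = 0; i < 100; i++)
        ts[i] = 1300000000 + 30 * (uint32_t)i;

    /* Step 1: delta-encode. After the first entry, every delta is 30,
     * so the buffer becomes extremely repetitive. */
    uint32_t delta[100];
    delta[0] = ts[0];
    for (int i = 1; i < 100; i++)
        delta[i] = ts[i] - ts[i - 1];

    /* Step 2: let zlib squeeze the deltas. */
    unsigned char dest[1024];    /* comfortably larger than compressBound(400) */
    uLongf dest_len = sizeof(dest);
    if (compress2(dest, &dest_len, (const Bytef *)delta, sizeof(delta),
                  Z_BEST_SPEED) != Z_OK)
        return 1;

    printf("%zu raw bytes -> %lu compressed bytes\n",
           sizeof(delta), (unsigned long)dest_len);
    return 0;
}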
On-Disk Format

[Diagram: each bucket is compressed and stored as a value within a BDB page]
Other Services: Storage Mapping
• What is in the database?
  – Compute a set of tuples (start, end, n)
• The desired interpretation is "the data source was alive"
• Different data sources have different ways of maintaining this information and maintaining confidence
  – Sometimes you have to infer it from the data (see the sketch below)
  – Sometimes data sources give you liveness/presence guarantees: "I haven't heard from you in an hour, but I'm still alive!"
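When the storage map has to be inferred from the data itself, one simple rule is to split a stream's sorted timestamps into (start, end, n) ranges wherever the gap exceeds a liveness threshold. The threshold and sample data below are assumptions for illustration:

#include <stdio.h>
#include <stdint.h>

struct range { uint32_t start, end; int n; };

int main(void) {
    /* Sorted timestamps for one stream: alive, a long silence, alive again. */
    uint32_t ts[] = { 100, 130, 160, 190,
                      5000, 5030, 5060 };
    int nts = sizeof(ts) / sizeof(ts[0]);
    uint32_t max_gap = 120;    /* assumed liveness threshold: 2 minutes */

    /* Walk the timestamps, starting a new (start, end, n) range whenever
     * the source looked dead for longer than max_gap. */
    struct range map[8];
    int nranges = 0;
    map[0] = (struct range){ ts[0], ts[0], 1 };
    for (int i = 1; i < nts; i++) {
        if (ts[i] - ts[i - 1] > max_gap) {
            nranges++;
            map[nranges] = (struct range){ ts[i], ts[i], 1 };
        } else {
            map[nranges].end = ts[i];
            map[nranges].n++;
        }
    }
    nranges++;

    for (int i = 0; i < nranges; i++)
        printf("(start=%u, end=%u, n=%d)\n", map[i].start, map[i].end, map[i].n);
    return 0;
}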
readingdb6
• Up since December, supporting Cory Hall, SDH Hall, and most other LoCal deployments
  – Behind www.openbms.org
• > 2 billion points in 10k streams
  – 12 GB on disk ≈ 5 bytes/record including the index
  – So... we fit in memory!
• Imports at around 300k points/sec
  – We maxed out the NIC
[Performance charts: low-latency RPC, compression ratios, write load]

Importing old data: 150k points/sec; continuous write load: 300-500 points/sec
Future thoughts
• A component of a cloud storage stack for physical data
• Hadoop adaptor: improve MapReduce performance over the HBase solution
• The data is small: 2 billion points in 12 GB
  – We can go a long time without distributing this very much
  – Distribution is probably necessary for reasons other than performance
THE END