Copyright©2014 NTT Corp. All Rights Reserved.
Durability Simulator Design for OpenStack Swift (Interactive Durability Calculation Tools)
Kota Tsuyuzaki [IRC: kota_] [email protected] NTT Software Innovation Center
Copyright(c)2009-2014 NTT CORPORATION. All Rights Reserved.
NTT Confidential
Outline
• Goal & Benefits
• How to calculate?
• Demo
Etherpad: https://etherpad.openstack.org/p/kilo-swift-durability-simulator
Issue
User: I want to build a durable object storage system using OpenStack Swift. I also want to know its durability, to confirm that it will be enough for our SLA.
Issue
User (to OpenStack Providers A, B, and C): Hey, guys. Could you tell me the Swift system architecture and the storage durability you support?
Issue
OpenStack Providers (answering the User):
A: 7-9s durability with 3 copies
B: 9-9s durability with 3 copies
C: 11-9s durability with 3 copies
User: WHAT’S HAPPENING!? WHICH IS CORRECT?
Goal & Benefits
• Goal
• Build durability calculation tools supported (or recommended) by the Swift community
• Make it easy to get the calculation result from both the hardware specs of the system components and the Swift configuration (e.g. # of disks, size of each disk, # of partitions)
• Benefits
• Swift administrators (even near-beginners) can easily find the durability of their own system
• The calculation definition can be standardized among Swift providers
• Swift users can choose the policy for their use case (Replica? EC? How many parities are best for you?)
How to calculate the durability?
For Replica Case
How to Calculate EC Durability?
• Calculation Using Markov Model (Markov Process) [1]
• 2 Replica -> k = 1, m = 1 (i.e. data is lost when 2 fragments are lost)
• 3 Replica -> k = 1, m = 2 (i.e. data is lost when 3 fragments are lost)
• Reference:
• [1]: "Reliability Mechanisms for Very Large Storage Systems", http://www.ssrc.ucsc.edu/Papers/xin-mss03.pdf
Consideration for Swift’s Partition
• Redundancy Set [1]:
• Definition (from [1])
• A block group composed of data blocks or objects and their associated replicas or parity blocks. A single redundancy set will typically contain 1MB to 1TB, though we expect that redundancy sets will be at least 1GB to minimize bookkeeping overhead and reduce the likelihood that two redundancy sets will be stored on the same set of object storage systems.
• Assuming a Redundancy Set as a Partition
Figure: the Swift Ring. MD5*(URL) = index selects one of the partitions, and the partition table maps that partition to device ids.
idx   Copy 1   Copy 2   Copy 3
0     1        5        7
…     …        …        …
8     3        2        6
Partition table from part to device id.
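The lookup on this slide can be sketched roughly as follows. This is a minimal illustration, not Swift's actual ring code: the function name `partition_for`, the `part_power` parameter, and the example path are assumptions made for the sketch, and Swift itself also mixes a cluster-specific secret into the hash, which is omitted here. Only rows 0 and 8 of the table are shown on the slide, so only those are filled in.

```python
import hashlib
import struct


def partition_for(path: str, part_power: int) -> int:
    """Map an object path to a partition index (illustrative only)."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    # Read the first 4 bytes of the digest as a big-endian unsigned int,
    # then keep only its highest `part_power` bits as the index.
    top32 = struct.unpack_from(">I", digest)[0]
    return top32 >> (32 - part_power)


# Hypothetical partition table: partition index -> device ids of the 3 copies.
# Rows 0 and 8 come from the slide; the elided rows are left out.
partition_table = {0: (1, 5, 7), 8: (3, 2, 6)}

part = partition_for("/account/container/object", part_power=4)
print(part, partition_table.get(part, "row elided on the slide"))
```

With `part_power=4` there are 2^4 = 16 partitions, so every path lands on an index between 0 and 15, and each partition plays the role of one redundancy set.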
Markov Process (1)
• Definition:
• Absorbing State: The end state in the state transition model.
• P: Transition Probability Matrix

$$
P = \begin{pmatrix} Q & U \\ O & I \end{pmatrix}
  = \begin{pmatrix} 1-2\mu & 2\mu & 0 \\ v & 1-(\mu+v) & \mu \\ 0 & 0 & 1 \end{pmatrix}
$$

• Q: Transition Probability Matrix among Temporary States
• U: Probability Matrix from Temporary States into the Absorbing State
• O: Zero Matrix, I: Identity Matrix
States: State0 and State1 are Temporary States; State2 is the Absorbing State.
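Markov Process (2)
• The limit of the State Transition Matrix (P) as time t goes to infinity gives the average # of state transitions (M) from the initial state to the absorbing state
• MTTDL (time until the absorbing state is reached) is calculated from the sum of each row of M

$$
\lim_{t \to \infty} P^{t} = \begin{pmatrix} O & MU \\ O & I \end{pmatrix},
\qquad M = (I - Q)^{-1},
\qquad MTTDL_{rs} = M \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}
$$

State Transition Matrix for 2 replica:

$$
P = \begin{pmatrix} 1-2\mu & 2\mu & 0 \\ v & 1-(\mu+v) & \mu \\ 0 & 0 & 1 \end{pmatrix}
$$

$$
M = \frac{1}{2\mu^{2}} \begin{pmatrix} \mu+v & 2\mu \\ v & 2\mu \end{pmatrix},
\qquad
MTTDL_{rs} = \frac{1}{2\mu^{2}} \begin{pmatrix} 3\mu+v \\ 2\mu+v \end{pmatrix}
$$

Probability for Data Lost = N / MTTDL_{rs}

$$
\text{Durability} = 1 - \frac{N}{MTTDL_{rs}}
 = 1 - 2N\mu^{2} \begin{pmatrix} \dfrac{1}{3\mu+v} \\[2ex] \dfrac{1}{2\mu+v} \end{pmatrix}
$$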
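The 2-replica result can be checked with a few lines of exact arithmetic. The sketch below is only an illustration under stated assumptions: the function name `mttdl_two_replica` and the values chosen for μ and v are hypothetical (they are not from the slides), and μ and v are treated as per-step transition probabilities, as in the matrix P.

```python
from fractions import Fraction


def mttdl_two_replica(mu, v):
    """Expected # of steps until data loss, starting from State0 and State1.

    Uses Q = [[1 - 2mu, 2mu], [v, 1 - (mu + v)]] and M = (I - Q)^-1.
    """
    # Entries of I - Q.
    a, b = 2 * mu, -2 * mu
    c, d = -v, mu + v
    det = a * d - b * c  # equals 2 * mu**2
    # M = (I - Q)^-1 via the 2x2 inverse formula.
    m = [[d / det, -b / det], [-c / det, a / det]]
    # MTTDL_rs = M * (1, 1)^T, i.e. the row sums of M.
    return [sum(row) for row in m]


mu = Fraction(1, 1000)  # hypothetical disk failure probability per step
v = Fraction(1, 10)     # hypothetical repair probability per step
from_state0, from_state1 = mttdl_two_replica(mu, v)
print(float(from_state0), float(from_state1))  # 51500.0 51000.0
```

Both values agree with the closed forms (3μ + v) / (2μ²) and (2μ + v) / (2μ²). Using `Fraction` keeps the comparison exact instead of relying on floating-point tolerance.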
For EC Case
Erasure Code Definition
• Object Size (bytes): n
• # of Sliced Raw Objects: k
• # of Parities: m
• Total # of Fragments: k + m
• Fragment Size (bytes): n / k (+ checksum)
• Total Stored Size (bytes): Fragment Size * (k + m)
Figure: an object is encoded into k data fragments and m parity fragments, and decoded back from them.
Terminology Reference: http://specs.openstack.org/openstack/swift-specs/specs/swift/erasure_coding.html
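As a quick worked example of these definitions, the sizes can be computed as below. This is a sketch only: the function name `ec_sizes`, the sample object size, the k = 10, m = 4 scheme, and rounding n / k up to a whole byte are all assumptions for illustration, and the checksum size is left as a parameter because the slide does not give it.

```python
def ec_sizes(n: int, k: int, m: int, checksum: int = 0):
    """Return (fragment_size, total_stored_size) in bytes.

    n: object size, k: # of data fragments, m: # of parity fragments.
    """
    # Fragment Size = n / k (+ checksum); rounding up is an assumption.
    fragment = (n + k - 1) // k + checksum
    # Total Stored Size = Fragment Size * (k + m).
    total = fragment * (k + m)
    return fragment, total


print(ec_sizes(n=1_000_000, k=10, m=4))  # (100000, 1400000)
print(ec_sizes(n=1_000_000, k=1, m=2))   # (1000000, 3000000), i.e. 3 replica
```

In this example the k = 10, m = 4 scheme stores 1.4x the object size and tolerates 4 lost fragments, while 3 replica (k = 1, m = 2) stores 3x and tolerates 2.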
How to Calculate EC Durability?
• Basic Idea
• Extension of the Durability Calculation for the Replica Model
• Calculation Using Markov Model (Markov Process) [1]
• Replica Model based on Markov Process:
• 2 Replica -> k = 1, m = 1 (i.e. data is lost when 2 fragments are lost)
• 3 Replica -> k = 1, m = 2 (i.e. data is lost when 3 fragments are lost)
※ A Markov Process makes it possible to calculate the durability with matrix calculation. [3]
Durability Calculation Algorithms
• Algorithms
• State: Status (exists or lost) of all fragments
• Each state transition occurs with constant probability
• μ = Disk Failure Rate, v = Fragment Repair Rate
• Each rate is related to the # of fragments
• E.g. for RAID, it is related to the # of devices
• Extend the states up to m + 1 (i.e. data lost)
• D = # of Devices (RAID5), N = k + m (N fragments located in the system)
Figure: state transitions for “m” parities EC, over states 0, 1, …, m-1, m, m+1.
• Failure transitions: 0 -> 1 with Nμ, …, m-1 -> m with (N-(m-1))μ, m -> m+1 with (N-m)μ
• Repair transitions: 1 -> 0 with v, …, m -> m-1 with mv
• Self-transition labels: -Nμ, -(N-1)μ-v, …, -(N-(m-1))μ-mv
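The chain on this slide can be turned into a small calculation along the following lines. This is a sketch under stated assumptions, not the simulator's implementation: it assumes that state i means "i fragments lost", that the failure probability out of state i is (N - i)μ and the repair probability is i·v (read off the v and mv labels), and it uses the discrete-time form from the 2-replica matrix P. The names `build_q` and `mttdl` and the sample values of μ and v are hypothetical.

```python
from fractions import Fraction


def build_q(n_frags, m, mu, v):
    """Q over the temporary states 0..m (state i = i fragments lost)."""
    size = m + 1
    q = [[Fraction(0)] * size for _ in range(size)]
    for i in range(size):
        fail = (n_frags - i) * mu  # one more fragment is lost
        repair = i * v             # one lost fragment is repaired
        q[i][i] = 1 - fail - repair
        if i + 1 < size:
            q[i][i + 1] = fail     # from state m, failure leaves Q (data lost)
        if i > 0:
            q[i][i - 1] = repair
    return q


def mttdl(q):
    """Solve (I - Q) x = 1; x[i] is the expected # of steps from state i."""
    size = len(q)
    # Augmented matrix [I - Q | 1].
    a = [[(1 if i == j else 0) - q[i][j] for j in range(size)] + [Fraction(1)]
         for i in range(size)]
    # Gauss-Jordan elimination with exact fractions.
    for col in range(size):
        pivot = next(r for r in range(col, size) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [x / pv for x in a[col]]
        for r in range(size):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [row[-1] for row in a]


mu = Fraction(1, 1000)  # hypothetical disk failure probability per step
v = Fraction(1, 10)     # hypothetical repair probability per step
print(float(mttdl(build_q(2, 1, mu, v))[0]))   # 2 replica: 51500.0
print(float(mttdl(build_q(3, 2, mu, v))[0]))   # 3 replica
print(float(mttdl(build_q(14, 4, mu, v))[0]))  # EC with k = 10, m = 4
```

With N = 2 and m = 1 this reduces to the 2-replica matrix, so the first value matches (3μ + v) / (2μ²). Solving (I - Q) x = 1 directly avoids forming the full inverse M when only the row sums are needed.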
Demo
Kota Tsuyuzaki [IRC: kota_] [email protected]
NTT Software Innovation Center
Questions?
Etherpad: https://etherpad.openstack.org/p/kilo-swift-durability-simulator