
Scaling Hyperledger Fabric Using Pipelined Execution and Sparse Peers

Parth Thakkar∗, IBM Research, India

    [email protected]

Senthilnathan Natarajan†, IBM Research, India

    [email protected]

ABSTRACT

Permissioned blockchains are becoming popular as data management systems in the enterprise setting. Compared to traditional distributed databases, blockchain platforms provide increased security guarantees but significantly lower performance. Further, these platforms are quite expensive to run for the low throughput they provide. The following are two ways to improve performance and reduce cost: (1) make the system utilize allocated resources efficiently; (2) allow rapid and dynamic scaling of allocated resources based on load. We explore both of these in this work.

We first investigate the reasons for the poor performance and scalability of the dominant permissioned blockchain flavor called Execute-Order-Validate (EOV). We do this by studying the scaling characteristics of Hyperledger Fabric, a popular EOV platform, using vertical scaling and horizontal scaling. We find that the transaction throughput scales very poorly with these techniques. At least in the permissioned setting, the real bottleneck is transaction processing, not the consensus protocol. With vertical scaling, the allocated vCPUs go under-utilized. In contrast, with horizontal scaling, the allocated resources get wasted due to redundant work across nodes within an organization.

To mitigate the above concerns, we first improve resource efficiency by (a) improving CPU utilization with a pipelined execution of the validation & commit phases; (b) avoiding redundant work across nodes by introducing a new type of peer node called sparse peer that selectively commits transactions. We further propose a technique that enables rapid scaling of resources. Our implementation, SmartFabric, built on top of Hyperledger Fabric, demonstrates 3× higher throughput, 12-26× faster scale-up time, and provides Fabric's throughput at 50% to 87% lower cost.

PVLDB Reference Format:
Parth Thakkar and Senthilnathan Natarajan. Scaling Hyperledger Fabric Using Pipelined Execution and Sparse Peers. PVLDB, 14(1): XXX-XXX, 2020. doi:XX.XX/XXX.XX

∗Currently employed at Microsoft Research, India
†Committer, Hyperledger Fabric

This work is licensed under the Creative Commons BY-NC-ND 4.0 International License. Visit https://creativecommons.org/licenses/by-nc-nd/4.0/ to view a copy of this license. For any use beyond those covered by this license, obtain permission by emailing [email protected]. Copyright is held by the owner/author(s). Publication rights licensed to the VLDB Endowment.
Proceedings of the VLDB Endowment, Vol. 14, No. 1 ISSN 2150-8097. doi:XX.XX/XXX.XX

arXiv:2003.05113v2 [cs.DC] 1 Mar 2021

1 INTRODUCTION

Blockchain technologies gained popularity as they provide a way to eliminate intermediaries and decentralize applications. A blockchain is a ledger that records transactions and data, replicated across multiple peer nodes that do not trust one another. Each node holds an identical copy of the ledger as a chain of blocks. Each block is a logical sequence of transactions and encloses the hash of its immediate previous block, thereby guaranteeing the ledger's immutability. A block is created by executing a consensus protocol among the nodes.

For enterprise use-cases that involve interaction between multiple organizations, a permissioned blockchain network is suitable as it supports authenticated participants and privacy. Different organizations own nodes in a permissioned network, and transaction logic is implemented using smart-contracts. There are two dominant blockchain transaction models: Execute-Order-Validate (EOV) and Order-Execute (OE). Hyperledger Fabric [17] and Quorum [4], which follow the EOV and OE models, respectively, are the two most popular permissioned blockchain platforms, as mentioned in the Forbes Blockchain 50 [6]. This paper focuses on the EOV model.

Hyperledger Fabric's performance is of significant concern for enterprises due to steady growth in network usage. For example, tracking the provenance of ingredients used in food products, such as protein bars, chocolates, and other packaged foods, requires storing a large number of records at a high rate. In their current form, permissioned blockchain platforms cannot provide the performance needed by large provenance use-cases and the finance industry—stock exchanges, credit card companies such as Visa [14], and mobile payment platforms such as AliPay [1] (a peak of 325k tps). Further, with innovations in tokens [13, 20] and related applications such as decentralized marketplaces [5, 10–12], the throughput requirement will only increase. Hence, it is necessary to improve the performance of Fabric proactively to support this ongoing growth. Moreover, at Fabric's current performance level, nodes get overprovisioned to satisfy peak load, as there exists no mechanism to scale a network up or down. As a result, operational cost increases unnecessarily.

Although many efforts [22, 23, 25–27, 30, 34–36] have proposed various optimizations to improve Fabric's performance, none have comprehensively studied the impact of scaling techniques, such as vertical and horizontal scaling, to identify bottlenecks. Hence, in this paper, we study the efficiency of


various scaling techniques and identify bottlenecks. Then, we re-architect Fabric transparently to improve performance. In general, the consensus layer is assumed to be the bottleneck. Though that is valid for permissionless blockchains such as Ethereum [19] and Bitcoin [31] due to Proof of Work (PoW) consensus, we found that the validation and commit of a block is the bottleneck in Fabric, not the consensus layer, as it uses Raft [32]. Our four major contributions are:

(1) We conducted experiments to understand the scalability of Fabric using vertical and horizontal scaling. We find that the transaction throughput scales very poorly with these techniques. With vertical scaling, the allocated vCPUs go under-utilized, while with horizontal scaling, the allocated resources get wasted due to redundant work across nodes within an organization.

(2) We re-architected Fabric to enable pipelined execution of the validation & commit phases without violating serializability isolation. We also enabled the validation phase to validate multiple transactions, belonging to different blocks, in parallel. We achieve this by introducing a waiting-transactions dependency graph that tracks 7 types of dependencies between transactions. As a result, the performance improved by 1.4× while the CPU utilization increased from 40% to 70%.

(3) We introduced a new type of node called sparse peer that selectively commits transactions to avoid the duplication of CPU & IO intensive tasks. Thus, the performance improved by 2.4×. With both sparse peers and pipelined execution of phases, the performance improved by 3×.

(4) We built an auto-scaling framework that can split a full peer into multiple sparse peers or merge multiple sparse peers into a full peer. This helps in scaling up a network quickly to handle an overload situation and reduce transaction invalidation. Our approach reduced the scale-up time of a network by 12-26× while increasing the scale-down time only slightly.

The remainder of this paper is structured as follows. §2 provides background on Hyperledger Fabric and motivates our work through various experiments. §3 and §4 describe our proposed architecture for Fabric. §5 evaluates the proposed architecture to showcase the achieved improvement, and §6 presents related work.

2 BACKGROUND AND MOTIVATION

2.1 Hyperledger Fabric

Hyperledger Fabric consists of three entities—client, peer, and orderer. The transaction flow in Fabric involves all three entities and comprises four phases—execution, ordering, validation, and commit, as depicted in [17]. Figure 1 shows the various components of a peer that are involved in all of these phases except the ordering phase. The communication between the different entities happens via gRPC [8].

Figure 1: Various components in a peer (endorser with smart-contracts, gossip, block queue, endorsement policy validator, serializability validator, committer with worker threads, state DB, block store, and history DB).

Phase 1—Simulation. A client submits a transaction proposal to a peer to invoke a smart-contract, which implements the transaction logic. The smart-contract executes the logic and reads/writes the states it manages by issuing get(), put(), and range() commands. Once the transaction execution completes, the endorser cryptographically signs/endorses the transaction response (which includes the transaction proposal, read-write set, and range query info) before sending it to the client. The read-write set includes all the keys read and written by the contract. The range query info includes the start and end keys passed to the range query. The client can submit the transaction proposal to multiple peers simultaneously, depending upon the endorsement policy [29] defined for the smart-contract. Each smart-contract maintains its own blockchain states. One smart-contract can invoke other contracts to read/write their blockchain states.
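For concreteness, below is a minimal Go chaincode sketch showing the get()/put()/range() interface that Fabric v1.4 exposes to smart-contracts via its shim API; the contract itself and the key names are hypothetical examples, not part of the paper's system.

```go
package main

import (
	"github.com/hyperledger/fabric/core/chaincode/shim"
	pb "github.com/hyperledger/fabric/protos/peer"
)

// TransferContract is a hypothetical smart-contract. During simulation,
// every state access below is recorded into the read-write set and range
// query info that the endorser signs and returns to the client.
type TransferContract struct{}

func (c *TransferContract) Init(stub shim.ChaincodeStubInterface) pb.Response {
	return shim.Success(nil)
}

func (c *TransferContract) Invoke(stub shim.ChaincodeStubInterface) pb.Response {
	// get(): reads are recorded as (key, version) entries in the read set.
	balance, err := stub.GetState("acc1")
	if err != nil || balance == nil {
		return shim.Error("account not found")
	}
	// put(): writes are buffered into the write set, not applied yet.
	if err := stub.PutState("acc1", balance); err != nil {
		return shim.Error(err.Error())
	}
	// range(): the start and end keys are recorded in the range query info,
	// which the serializability validator later re-executes.
	iter, err := stub.GetStateByRange("acc1", "acc9")
	if err != nil {
		return shim.Error(err.Error())
	}
	defer iter.Close()
	return shim.Success(nil)
}

func main() {
	if err := shim.Start(new(TransferContract)); err != nil {
		panic(err)
	}
}
```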

Phase 2—Ordering. The client submits the endorsed transaction response to the ordering service. The ordering service, which consists of orderer nodes from different organizations, employs a consensus protocol [32] to order the received transactions and create a block. Each block has a sequence number called the block number, the hash of the previous block, the hash of the current block, a list of ordered transactions (i.e., the commit order), and the orderer's signature. The orderer nodes broadcast these blocks to the peers.

Phase 3—Validation. The gossip component of the peer receives blocks and stores them in the block queue. Before committing a block, the peer runs every transaction in the block through the following two validators: (1) The endorsement policy validator verifies that the transaction response has been signed by enough organizations, as specified in the contract's endorsement policy. If a transaction invoked multiple smart-contracts, the policy of each of the smart-contracts must be satisfied. A transaction is marked invalid if it does not have adequate endorsements. (2) The serializability validator applies optimistic concurrency control (OCC) [28] using the read-write set present in the transaction response. To facilitate OCC, Fabric adds a version identifier to each state stored in the blockchain. The version of a state is stored as $(B, T)$, where $B$ denotes the block number and $T$ denotes the transaction number within the block that last updated or created this state. The validator checks that the states the transaction read to decide on its write-set have not been modified by any preceding valid transaction. Further, the validator re-executes range queries present in the range query info to detect any phantom read [18].

Note that the endorsement policy validator validates the transactions in a block in parallel. In contrast, the serializability validator validates each transaction serially, to take into account the writes of previous valid transactions in the block.
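To make the serializability check concrete, here is a minimal Go sketch of the OCC version comparison described above; it is our illustration, not Fabric's actual code, and the types are simplified assumptions.

```go
package sketch

// version identifies the (block, transaction) that last wrote a state.
type version struct{ block, tx uint64 }

// validateReadSet performs the OCC check: a transaction is valid only if
// every state it read still carries the version it observed at simulation
// time, i.e., no preceding valid transaction has modified it since.
func validateReadSet(readSet map[string]version, committed func(key string) (version, bool)) bool {
	for key, readVer := range readSet {
		cur, exists := committed(key)
		if !exists || cur != readVer {
			return false // stale read: mark the transaction invalid
		}
	}
	return true
}
```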

Figure 2: Vertical and horizontal scaling: throughput (Smallbank, YCSB) and CPU utilization for (a) vertical scaling with 1 peer per org and (b) horizontal scaling with 16 vCPUs per peer.

Figure 3: Validation and commit time for (a) vertical scaling and (b) horizontal scaling, along with observed vs. theoretical max throughput.

Phase 4—Commit. After the validation phase, the committer first stores the block in the block store, which is a chain of blocks stored in a file system along with the transaction validity information. Second, the committer applies the write-sets of all valid transactions to the state database, which maintains all active states. Third, it stores metadata about the updates of all valid transactions in the history database, which maintains the version history (but not the values) of both active and inactive states.

2.2 Impact of Scaling Techniques on EOV

In this section, we quantify the impact of vertical and horizontal scaling on Hyperledger Fabric's performance. In vertical scaling, we add more compute power to the existing nodes. In contrast, in horizontal scaling, we add more nodes.

The blockchain network topology used for all experiments consisted of four organizations, each hosting N peer nodes, an ordering service based on the Raft consensus protocol [32] with five nodes, and M clients to generate load on the network. The values of M and N were changed depending upon the experiment. Each node was hosted on a virtual machine with 32 GB RAM, 16 vCPUs of Intel Xeon E5-2683 v3 2.00 GHz, SSD storage, and 1 Gbps of network bandwidth. As the primary performance metric, we measured the transaction throughput, which is the maximum rate at which a peer commits transactions, beyond which the peer's block queue would overflow. Note that our results hold for networks with a large number of organizations too, because the bottlenecks identified in this section are not related to the number of organizations in a network.

Workload and Configuration. Unless specified otherwise, in the default configuration, each organization hosted one peer, and all organizations were members of a single channel hosting eight smart-contracts. All eight smart-contracts implement either the Smallbank [15] or the YCSB [21] benchmark. The Smallbank benchmark consists of 6 operations on a set of 100k accounts; 5 of the 6 involve both reads and writes, and 1 involves only reads. To generate load, we chose operations and accounts uniformly at random. This workload is light on IO, as the value size was 10 bytes. YCSB is a key-value store benchmark with 100k keys, and each transaction reads and writes two keys. This workload is IO heavy, as the value size was 1 KB. The keys were chosen using a Zipfian distribution with a skewness factor of 0.5. The block size was 100 transactions for all experiments.

Base Case Performance. Instead of using vanilla Hyperledger Fabric v1.4 as the base case, we used an optimized version that avoids redundant block serialization and deserialization at various phases within a peer using a block deserialization cache, as proposed in FastFabric [26]. With this cache, the throughput improved by 1.67×, i.e., from 1040 tps to 1760 tps, for the Smallbank benchmark in the default configuration.

Infrastructure Cost in Public Cloud. From cost estimators [2, 3, 7, 9] published by public cloud providers, such as AWS, Azure, Google Cloud, and IBM Cloud, we identified that the infrastructure cost is a linear function of the number of allocated vCPUs. For example, a virtual machine (VM) with 8 vCPUs costs almost 8× as much as a VM with 1 vCPU. Similarly, two 8-vCPU VMs cost almost the same as one VM with 16 vCPUs. We use this trend while explaining the cost of infrastructure in the rest of the paper.

2.2.1 Vertical Scaling. Figure 2(a) plots the peak throughput and CPU utilization for different numbers of allocated vCPUs. With an increase in the allocated vCPUs from 1 to 16, the throughput of the Smallbank benchmark increased sub-linearly from 320 tps to 1760 tps, while CPU utilization decreased from 95% to 40%. In other words, while the infrastructure cost increased by 16×, the performance improved only by 5.5×. YCSB showed similar results. When the number of vCPUs was low, both the endorser and the validator contended for CPU resources, which led to high CPU utilization. With an increase in the number of vCPUs, the CPU contention reduced, and the endorsement policy validator utilized multiple vCPUs to validate transactions in parallel, resulting in higher throughput. Moreover, during the commit phase (IO heavy), the validation phase (CPU heavy) does not execute, and vice-versa, which results in CPU under-utilization. This is because the validator cannot validate block (i+1) in parallel with the commit of block i, as the state updates performed during the commit could make the validator read stale or incomplete state. As mentioned in §2.1, the serializability validator validates each transaction serially, further leading to low CPU utilization.

Figure 3(a) plots the time taken by the validation and commit phases to process a block of 100 transactions under the Smallbank workload. As expected, the validation time reduced significantly, from 86 ms to 19 ms, as the number of vCPUs increased. The commit time also reduced, from 55 ms to 36 ms, when the number of vCPUs increased from 1 to 4 (due to a reduction in CPU contention between the endorser and the committer). Beyond 4 vCPUs, no reduction was observed in the commit time. YCSB showed a similar trend.

Figure 3 also plots the theoretical maximum throughput and the actual throughput achieved. We define the theoretical maximum throughput as $\frac{|B| \times 1000\,\text{ms}}{(V + C)\,\text{ms}}$, where $|B|$ is the number of transactions in block $B$, $V$ is the validation time of block $B$, and $C$ is the commit time of block $B$. In other words, it is the maximum number of transactions that can be validated and committed in a second. When there was only 1 vCPU, the actual throughput was much lower than the theoretical maximum. This is because the endorser occupied some CPU cycles, making the validator/committer wait.
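As a sanity check (our arithmetic, using the Smallbank measurements reported above for 16 vCPUs: $|B| = 100$, $V \approx 19$ ms, $C \approx 36$ ms):

$$\frac{|B| \times 1000}{V + C} = \frac{100 \times 1000}{19 + 36} \approx 1818 \text{ tps},$$

which closely matches the observed peak of 1760 tps, confirming that the sum of validation and commit time bounds the throughput.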

Figure 4: Impact of dynamic scaling by vCPUs & peers: (a) block queue length over time while scaling from 4 to 16 vCPUs; (b) ledger growth (block height) while scaling from 4 to 16 vCPUs; (c) block height over time while dynamically adding peers.

Takeaway 1. Vertical scaling of peers can help to a degree. However, it is necessary to have pipelined execution of the validation and commit phases to utilize the full potential of the allocated vCPUs and get a better return on investment.

2.2.2 Horizontal Scaling. Figure 2(b) plots the throughput and CPU utilization for different numbers of peers per organization. Compared to 1 peer per organization, 4 peers increased the throughput by only 1.5× (for both workloads) while the infrastructure cost increased by 4×. This is because all 4 peers within each organization validated and committed every block, i.e., redundant work within an organization. With an increase in the number of peers, only the endorsement of transactions was load-balanced between peers by the clients, which also reduced the CPU utilization. As the validation and commit phases are costlier than the simulation phase, due to multiple signature verifications and synchronous disk IO, the throughput increase was not significant. Beyond 4 peers per organization, the throughput did not increase. As load balancing of endorsement requests increased the throughput, we wanted to find a peer's peak throughput with zero endorsement requests. Hence, we did not submit any endorsement requests to one peer while using the other peers to load-balance the endorsement requests. We observed that the non-endorsing peer achieved a peak throughput of 3300 tps. Figure 3(b) plots the validation and commit time for Smallbank. With an increase in the number of peers, the validation time stayed almost the same. However, the commit time reduced dramatically from 34 ms to 20 ms due to a reduction in both CPU & IO contention between the endorser and the committer.

Takeaway 2. Horizontal scaling of a Fabric network by adding more peers can help reduce the endorser's load, i.e., the simulation phase. Still, it does not help the validation and commit phases much due to redundant work. Further, it does not justify the additional infrastructure cost paid for the performance gain. Redundant work is needed across organizations, as one organization does not trust another. However, the same does not apply to peers within an organization. It is necessary to avoid redundant work within an organization to improve performance and reduce cost.

2.3 Mitigation of an Overloaded Situation

2.3.1 Vertical Scaling. To study the impact of dynamic vCPU scaling on performance, we overloaded the peers in a network by generating more load than they could handle at 4 vCPUs. We then increased the number of vCPUs to 16, one peer at a time, after the block queue's length reached 200. Figure 4(a) plots the time taken to reduce the block queue length from 200 to 0 for each peer across organizations. Though the peers scaled immediately, it took around 50 to 70 secs to reduce the queue length to 0. Figure 4(b) plots the ledger block height over time while the number of vCPUs increased from 4 to 16. The block height is simply the last committed block number. As expected, the ledger commit rate increased. Though the queue size became 0 within 70 secs and the ledger commit rate increased, there was a significant impact on the number of failed transactions due to serializability violations (as shown in Figure 5). As more blocks were waiting in the queue to update the ledger state, new transactions were endorsed using very stale states. Thus, many transactions in each block were invalidated later by the serializability validator due to a mismatch between the state versions present in the read set and the committed ledger state. After scaling the peers, it took 180 seconds to reduce the number of failed transactions to < 2.

Figure 5: Failed transactions over time.

Takeaway 3. Dynamic scaling by vCPUs is efficient, as it can quickly react to increased load. However, this approach is limited by the number of available vCPUs.

2.3.2 Horizontal Scaling. To study the time taken to add a new peer to an existing organization, we ran one peer per organization and then added a new peer at the 2nd, 5th, and 10th minute. Figure 4(c) plots the time taken by the new peers to sync up with the existing peers so that they can process new endorsement requests and new blocks. We observed that the time taken was proportional to the existing peers' block height, as the new peer had to fetch all old blocks, validate them, and commit them one by one. As there were no endorsement requests on the new peer during the sync-up, it could catch up with the existing peers by committing transactions at a rate of 3300 tps (while with endorsement, it was only 1760 tps).

Takeaway 4. With dynamic scaling by peers, the new peer took a significant amount of time to sync up with the existing peers and become available for handling load. This is because the new peer validated and committed all blocks, i.e., redundant work.

3 DESIGN OF PIPELINED EXECUTION

As mentioned in takeaway 1 in §2.2, we need to increase the CPU utilization. We can do that by making the validator (CPU heavy) validate block (i+1) without waiting for the committer (IO heavy) to commit block i. Besides, we can validate transactions in parallel (even across blocks). For the former, the challenge is to ensure that the validator does not read any stale state. For the latter, the challenge is to ensure that the serializable schedule is equivalent to the transaction ordering in the blocks. Towards achieving this, we propose the new architecture shown in Figure 6. The two fundamental ideas of the proposed architecture are as follows:


Figure 6: Proposed architecture to enable pipelined execution of validation and commit phases (block queue, transaction dependency extractor, waiting-transactions dependency graph, validator workers, dirty state trie, write set trie, result map manager, and commit manager).

(1) Exploit the read-write set and range query info present in each transaction to construct a dependency graph. This graph helps choose transactions for parallel validation without violating serializability.

(2) Maintain a dirty state buffer to keep all uncommitted writes of valid transactions. This buffer helps avoid reading a stale state while pipelining the validation and commit phases. Thus, the validator can validate a future block without waiting for past blocks to get committed (see the sketch below).
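A minimal Go sketch (our illustration, not the authors' implementation) of the dirty-state lookup just described: reads consult the buffer of uncommitted writes first and fall back to the committed State DB only on a miss.

```go
package sketch

// versionedValue pairs a state's value with the version of the
// transaction that last wrote it.
type versionedValue struct {
	Value   []byte
	Version string // e.g., "blockNum:txNum"
}

// stateReader layers the dirty state buffer over the committed State DB.
type stateReader struct {
	dirty   map[string]versionedValue // uncommitted writes of valid transactions
	stateDB map[string]versionedValue // committed states (stand-in for LevelDB/CouchDB)
}

// get reads the dirty state first, so pipelined validation of a future
// block never observes a version that a not-yet-committed block has
// already superseded.
func (r *stateReader) get(key string) (versionedValue, bool) {
	if v, ok := r.dirty[key]; ok {
		return v, true
	}
	v, ok := r.stateDB[key]
	return v, ok
}
```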

3.1 Validation Phase

The proposed validation phase consists of three components: (1) a transaction dependency extractor, (2) validators, and (3) a result map manager, as shown in Figure 6. In step ①, the extractor reads a block from the queue and updates the dependency graph. In step ②, each free validator worker performs the following: (i) retrieves a transaction from the graph that has no out-edges and validates it; (ii) if validation passes, applies the write-set to the dirty state; (iii) removes the transaction from the graph. In step ③, the worker adds the validation result to the result map before moving back to step ②.

3.1.1 Transaction dependency extractor. The extractor adds each transaction in a block to the waiting-transactions dependency graph. This graph contains a node per transaction; we use node and transaction interchangeably. Let us assume $T_i$ and $T_j$ are two transactions such that $T_i$ appeared before $T_j$ ($T_j$ can be in the same block or any later block). An edge from $T_j$ to $T_i$ (i.e., $T_i \leftarrow T_j$) denotes that $T_i$ must be validated before validating $T_j$. Note that a dependency edge is always from $T_j$ to $T_i$ because $T_i$ appeared earlier than $T_j$. This ensures that the serializable schedule is the same as the transaction order present in the blocks. The following are the seven dependencies by which $T_j$ can depend upon $T_i$:

(1) read-write ($T_i \xleftarrow{rw(k)} T_j$): $T_i$ writes a new value to the state $k$, and $T_j$ reads a previous version of that state;

(2) write-read ($T_i \xleftarrow{wr(k)} T_j$): $T_j$ writes a new value to the state $k$, and $T_i$ reads a previous version of that state;

(3) write-write ($T_i \xleftarrow{ww(k)} T_j$): both $T_i$ and $T_j$ write to the same state $k$;

(4) phantom-read ($T_i \xleftarrow{pr(k)} T_j$): $T_j$ performs a range query, and $T_i$ creates a new or deletes an existing state $k$ that would match that range query;

(5) endorsement-policy read-write ($T_i \xleftarrow{ep\text{-}rw(s)} T_j$): $T_i$ updates the endorsement policy of a smart-contract $s$, and $T_j$ invokes $s$ using a previous version of the policy;

(6) endorsement-policy write-read ($T_i \xleftarrow{ep\text{-}wr(s)} T_j$): $T_j$ updates the endorsement policy of a smart-contract $s$, and $T_i$ invokes $s$ using a previous version of the policy;

(7) endorsement-policy write-write ($T_i \xleftarrow{ep\text{-}ww(s)} T_j$): both $T_i$ and $T_j$ update the endorsement policy of the same smart-contract $s$.

No cycle is possible in the graph, as an edge is always from a new transaction to an old one.

Fate dependencies: While all seven dependencies are necessary to choose a set of transactions for parallel validation, the following three dependencies also decide the validity of dependent transactions:

(1) read-write (rw) dependency
(2) phantom-read (pr) dependency
(3) endorsement policy read-write (ep-rw) dependency

Hence, these dependencies are called fate dependencies. When there is a fate dependency from $T_j$ to $T_i$, i.e., $T_i \xleftarrow{rw/pr/ep\text{-}rw} T_j$, and $T_i$ turns out to be valid, $T_j$ must be invalid. For instance, if $T_i$ modified a state $k$ so that its new version is 10, but $T_j$ had read $k$ at version 9 (rw dependency), $T_i$ being valid implies that $T_j$ is invalid. For any other dependency, $T_j$ can be valid or invalid irrespective of the validation result of $T_i$. For instance, if $T_i$ read $k$ at version 9 and $T_j$ modified it to version 10 (wr dependency), the dependency only ensures that $T_j$ does not modify the state before $T_i$ is validated. However, both can be valid or invalid independently.

Detecting dependencies: To track dependencies, we maintain a map from state $k$ to $(T_R^k, T_W^k)$, where $T_R^k$ and $T_W^k$ are the sets of transactions in the graph that respectively read and write $k$. When adding a new transaction, we look up $T_R^k$ and $T_W^k$ for every state $k$ it reads or writes and add the appropriate dependencies. When adding a transaction $T_j$ that performs a range query, we need to find the writers of every state in that range to detect $pr$ dependencies. Simply iterating through the range and looking up every state in the map would not be feasible because the ranges could be large or unbounded. Therefore, we store the keys internally in an ordered tree (e.g., a trie) to perform both point and range queries quickly. Thus, by running $T_j$'s range query on this tree, we can efficiently find all states in the range that are being modified by some waiting transaction. We can then add $pr$ edges from $T_j$ to the writers of those states.

Illustration of dependency extraction. Let us assume that at the start of a peer, blocks B1 and B2, listed in Table 1, are waiting in the queue. Note that an entry in the write-set also includes the new value, but we do not present it for brevity. v1 → v2 denotes that the version is being incremented.


Table 1: Read-write sets of transactions.

Block  Transaction  Read Set (key, version)  Write Set
B1     T1           (k6, v1)                 (k1, v1 → v2)
B1     T2           (k1, v1)                 (k6, v1 → v2)
B1     T3           -                        (k6, v1 → v2)
B1     T4           -                        (k5, v1)
B2     T5           range(k2, k5)            (k7, v1 → v2)
B2     T6           (k4, v1)                 (k4, v1 → v2)

    Figure 7: Dependency graph of transactions

Table 2: Validation process of transactions.

Row 1. Workers: w1 = T1, w2 = T4, w3 = idle. Dirty state: empty. State DB: (k1, v1), (k4, v1), (k6, v1). Valid/Invalid: none. Dependency graph: same as Figure 7.
Row 2. Workers: w1 = T3, w2 = T6, w3 = idle. Dirty state: (k1, v2), (k5, v1). State DB: (k1, v1), (k4, v1), (k6, v1). Valid: T1, T4. Invalid: T2, T5. Dependency graph: only T3 & T6, no edges.
Row 3. Dirty state: (k1, v2), (k5, v1), (k6, v2), (k4, v2). State DB: (k1, v1), (k4, v1), (k6, v1). Valid: T1, T4, T3, T6. Invalid: T2, T5. Dependency graph: empty.
(Committer commits block B1.)
Row 4. Dirty state: (k4, v2). State DB: (k1, v2), (k4, v1), (k5, v1), (k6, v2). Valid: T6. Invalid: T5.
(Committer commits block B2.)
Row 5. Dirty state: empty. State DB: (k1, v2), (k4, v2), (k5, v1), (k6, v2).

The dependency extractor would fetch block B1 to start constructing the dependency graph. Transaction T1 gets added as the first node in the graph. T2 then gets added with two dependencies on T1: (a) an rw dependency because T2 reads state k1 while T1 updates it, and (b) a wr dependency because T2 updates k6 while T1 reads it. Similarly, a ww dependency gets added from T3 to T2, as shown in Figure 7. The fourth transaction in block B1 has no dependency on other transactions in the same block. Once the dependency extractor adds all transactions in block B1 to the graph, it fetches block B2 and adds each of its transactions to the graph. Transaction T5 has a phantom-read dependency on T4 because the former queries the range [k2, k5] while the latter updates k5, which belongs to that range. Transaction T6 has a wr dependency on T5 because it updates k4 while T5 reads the previous version of the same key using a range query.

Operations exposed to other components: The extractor exposes the following two operations to enable other components to access the dependency graph: (1) GetNextTransaction() → T_i: returns a reference to the oldest transaction T_i in the graph that does not depend on any other transaction (but other transactions can depend on it). (2) RemoveFromGraph(T_i, isValid): removes T_i from the dependency graph. If isValid is true, i.e., T_i is valid, it marks all transactions depending on T_i via a fate dependency as invalid and removes them from the graph.

3.1.2 Validators. The logic of the endorsement policy validator and the serializability validator remains the same as the one discussed in §2.1, except for the following modifications: (1) validators can validate later blocks without waiting for the committer to commit earlier blocks; (2) validators read from the dirty state buffer before resorting to the state DB, to avoid reading stale states; (3) multiple workers validate transactions in parallel.

Each free worker calls GetNextTransaction() to get the next transaction to be processed. First, the endorsement policy validator checks whether the policy is satisfied. On success, the same worker executes the serializability check using OCC [28]. If the transaction passes both validations, the worker first applies the write-set to the dirty state and then calls RemoveFromGraph() to update the dependency graph. Finally, the worker adds the validation result to the result map. To ensure that the validator does not read a stale state when it validates block j before committing block i (where j > i), a read request (such as a read of the endorsement policy or the version of a state) first goes to the dirty state. Only on a miss does the read reach the state DB. We store the dirty state as a trie, with the state's key stored along a path and its value & version stored in a leaf. This trie structure enables the serializability validator to validate the range queries present in the range query info for phantom reads. Every range query is executed on both the trie and the state DB.
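Putting the pieces together, here is a sketch of the worker loop; the method names follow the operations defined in the text, while the interfaces themselves are our assumptions.

```go
package sketch

type tx interface{ ID() string }

type depGraph interface {
	GetNextTransaction() tx             // oldest tx with no out-edges; nil when drained
	RemoveFromGraph(t tx, isValid bool) // also invalidates fate-dependents if valid
}

type dirtyState interface{ ApplyWriteSet(t tx) }

type resultMap interface{ AddValidationResult(id string, isValid bool) }

type validators interface {
	CheckEndorsementPolicy(t tx) bool
	CheckSerializability(t tx) bool // reads dirty state first, then state DB
}

// validatorWorker sketches steps 2 and 3 of §3.1: fetch a dependency-free
// transaction, validate it, expose its writes, release its dependents, and
// record its result for the committer.
func validatorWorker(g depGraph, d dirtyState, r resultMap, v validators) {
	for {
		t := g.GetNextTransaction()
		if t == nil {
			return
		}
		valid := v.CheckEndorsementPolicy(t) && v.CheckSerializability(t)
		if valid {
			d.ApplyWriteSet(t) // make writes visible before releasing dependents
		}
		g.RemoveFromGraph(t, valid)
		r.AddValidationResult(t.ID(), valid)
	}
}
```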

Illustration of Validators. Let us assume we have three workers, w1, w2, and w3, to validate transactions, as shown in Table 2 (only the first three rows are relevant for this illustration). Workers w1 and w2 would get transactions T1 and T4, respectively, by calling GetNextTransaction(). As every other transaction in the dependency graph has an out-edge, w3 would stay idle. Suppose the state database initially has three states, as shown in the first row of Table 2. For simplicity, let us further assume workers w1 and w2 always find the endorsement policy to be satisfied. In other words, the endorsement policy validator would always return success.

Once the endorsement policy check succeeds, the worker performs the serializability check, in which the state versions present in the read-set of the transaction must match the respective state versions present in the dirty state or state DB. As per OCC, if there is a mismatch, the transaction is marked invalid. As workers w1 and w2 find their respective transactions to pass the serializability check, both transactions would be marked valid. Each worker then applies its transaction's write-set to the dirty state (refer to the second row in Table 2). Finally, the workers would call RemoveFromGraph() to update the dependency graph.

Given that transaction T2 has a fate dependency on transaction T1, T2 would be marked invalid because T1 is valid.


Similarly, since transaction T5 has a fate dependency on transaction T4, T5 would be marked invalid because T4 is valid. Thus, the dependency graph would have only two transactions (T3 and T6) left, and they would have no edges between them. Next, worker w1 would pick T3, and w2 would pick T6. As per OCC, both transactions would be valid, and the dirty state gets updated with their write-sets. On calling RemoveFromGraph(), the dependency graph would be left empty. Note that only after the committer commits a block would the corresponding entries in the dirty state be removed.

3.1.3 Result map manager. This component manages a map from transaction identifier to validation result. It exposes the following two operations: (1) AddValidationResult(T_i, isValid) adds the validation result of a transaction to the map—called by both the extractor and the validator. (2) PopValidationResult(T_i) returns the validation result associated with a given transaction after deleting it—called by the committer.

3.2 Commit Phase

The logic of the commit phase does not change significantly compared to the vanilla Fabric discussed in §2.1. As shown in Figure 6, in step ❶, whenever the committer becomes free, it reads a block from the queue and retrieves the list of transactions. In step ❷, the committer fetches the validation results by calling PopValidationResult() for each transaction. If the validation result is not yet available for a transaction, the call blocks until the result is available. Once the committer collects the validation results of all transactions, in step ❸, it stores the block in the block store and applies the valid write-sets to the state & history DB as in vanilla Fabric. In step ❹, it calls the validation manager to remove the dirty state associated with the just-committed block, as the validator can now read those states from the state DB itself.
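A sketch of the blocking result map and the committer loop described above; the Block and store types are our simplifying assumptions, while the two operation names come from §3.1.3.

```go
package sketch

import "sync"

// ResultMap holds validation results until the committer pops them.
type ResultMap struct {
	mu      sync.Mutex
	cond    *sync.Cond
	results map[string]bool // txID -> isValid
}

func NewResultMap() *ResultMap {
	m := &ResultMap{results: map[string]bool{}}
	m.cond = sync.NewCond(&m.mu)
	return m
}

func (m *ResultMap) AddValidationResult(txID string, isValid bool) {
	m.mu.Lock()
	m.results[txID] = isValid
	m.mu.Unlock()
	m.cond.Broadcast()
}

// PopValidationResult blocks until the result for txID is available
// (step 2 of the commit phase), then deletes and returns it.
func (m *ResultMap) PopValidationResult(txID string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	for {
		if v, ok := m.results[txID]; ok {
			delete(m.results, txID)
			return v
		}
		m.cond.Wait()
	}
}

type Block struct{ TxIDs []string }

type stores interface {
	AppendBlock(b *Block, validity map[string]bool)      // block store
	ApplyValidWrites(b *Block, validity map[string]bool) // state & history DB
}

// committer sketches steps 1-4: collect results, persist, clear dirty state.
func committer(queue <-chan *Block, m *ResultMap, s stores, clearDirty func(*Block)) {
	for b := range queue {
		validity := map[string]bool{}
		for _, id := range b.TxIDs {
			validity[id] = m.PopValidationResult(id) // may block (step 2)
		}
		s.AppendBlock(b, validity)      // step 3
		s.ApplyValidWrites(b, validity) // step 3
		clearDirty(b)                   // step 4
	}
}
```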

Illustration of Committer. Let us continue with the scenario listed in Table 2. In parallel with the validation manager, the committer would fetch block B1 and wait for the validation results of T1, T2, T3, and T4. Once these validation results are available, the committer would commit B1, as shown in Table 2. Then, the dirty state and results associated with the block would be deleted. Next, the committer would fetch block B2 and immediately find the validation results for all of its transactions. Hence, the committer would commit it and update the dirty state and result list.

3.3 Proof of Correctness

The ordering service predetermines the commit order of transactions. Therefore, we have to ensure that our approach validates transactions in a way that is equivalent to the predecided serial schedule. Let us say the order of transactions in a block is T1, T2, ..., Tn. To validate a transaction, say T5, the validator checks that all states read by T5 have not been modified by any previous transaction (present in the same or previous blocks). If the last transaction that updated any of those states was T1, we do not need to wait for the validation result of any transaction between them to validate T5. We only need to ensure that if T1 is valid, the validator sees T1's updates while validating T5. This requirement is easily

met by rw-dependencies. Since T5 has an rw-dependency on T1, the validator workers will never get T5 before T1 is removed from the graph. Once a validator worker receives T1 via GetNextTransaction(), it first validates T1. If valid, it applies T1's updates to the dirty state. Only then does it call RemoveFromGraph(T1). Note that to process any transaction, all transactions it depends upon must be removed from the graph. Thus, when a worker receives T5, T1's updates would be either in the dirty state (if T1 is valid but not yet committed) or in the State DB (if T1 is valid and committed). Thus, the worker can correctly validate T5. If T1 is valid, T5 would be invalid. Instead of wasting validator resources, we term the rw dependency a fate dependency and immediately invalidate the dependent transaction if the dependee is valid.

However, rw-dependencies are not sufficient. Consider transactions Ti and Tj (i < j) in block B1 that both write to the same state. Without a ww-dependency, Ti could update the dirty state after Tj, overwriting Tj's update in the dirty state. Now, let us say that before B1 is committed, this peer started validating the next block, which contains a transaction Tk that has read the state written by Tj. This could happen if Tk was endorsed on a peer where B1 was already committed (e.g., on a much faster peer, or if this peer has just joined the network). Now, the validator would wrongly mark Tk as invalid because it would see Ti's write in the dirty state while expecting Tj's write. To avoid such anomalies, we track ww and wr dependencies and ensure that updates to a state are applied in the correct order. The same argument can be extended to the ep-rw, ep-wr, and ep-ww dependencies. Phantom read is just a particular case of the rw-dependency.

4 DESIGN OF SPARSE PEER

From takeaway 2 in §2.2, we know that multiple peers in an organization do redundant work, limiting horizontal scaling efficiency. To avoid this redundancy, we propose a new peer type called a sparse peer. Unlike a full peer in vanilla Fabric, a sparse peer need not validate and commit all transactions within a block. The concept of a sparse peer is inspired by sharding in distributed databases.

4.1 Sparseness in Validation and Commit

The key idea behind a sparse peer is that it can selectively validate and commit transactions. If all sparse peers within an organization select non-overlapping sets of transactions, we can avoid redundant work. Towards achieving this, first, we define a deterministic selection logic such that each sparse peer selects a different set of transactions. Second, we change the validator and committer to apply the selection logic to the received block. Third, as an optional feature, we make the peer pass the selection logic to the orderer so that the orderer itself can apply the filter and send only the required transactions in a sparse block, reducing both network bandwidth utilization and disk IO.

(1) Transaction selection filter: Each sparse peer owns a filter and applies it to a received block to identify which transactions to consider. The filter is simply a list of smart-contracts. The admin assigns/updates each peer's filter by issuing a request via gRPC to the peer process.


Figure 8: Merkle tree based block hash computation (leaves group transactions by the smart-contract(s) they invoke; the root serves as the block hash in the hash chain).

The sparse peer only validates and commits those transactions in a block that invoked a smart-contract specified in the filter. When a transaction has invoked multiple smart-contracts, the sparse peer considers the transaction even if the filter contains only one of those smart-contracts.

(2) Validation and commit based on a filter: The dependency extractor only considers transactions that invoked a smart-contract present in the filter. When the committer reads the block, it marks transactions that did not invoke any smart-contract present in the filter as not validated, but stores the whole block in the block store. The rest of the validator and committer logic remains the same. However, the receipt and storage of the full block would not reduce network bandwidth utilization and disk IO. Since disk IO is in the critical path of peers, this limits the maximum possible throughput. As an optional feature, next, we allow sparse peers to pass their filter directly to the orderer.

(3) Block dissemination based on filters: If the orderers themselves apply the filter and send only the appropriate transactions via a sparse block to each sparse peer, we can save both network bandwidth utilization and disk IO. Hence, each sparse peer sends its filter to the orderer to which it has connected. For each block, the orderer applies the filter and sends only the required transactions to the peer. However, this creates a problem with the hash chain and its verification. In vanilla Fabric, the orderer computes a block hash at block creation and stores it in the block. The block hash is computed using all transactions' bytes within that block and the hash present in the previous block. When a peer receives a block, it can check its integrity by verifying the hash chain. Further, this hash chain is the source of truth of a blockchain network. If we make the orderer send only a subset of the transactions in a block, the peer cannot verify the hash chain integrity.

Sparse block. To fix this problem, we propose a sparse block that includes (1) a Merkle tree to represent the block hash (as shown in Figure 8); (2) only the subset of transactions remaining after applying the filter; (3) the applied filter; (4) the transaction identifier (T-ID) of each transaction. In our Raft-based consensus, the leader node constructs the Merkle tree, while followers apply the filter before sending the block to their connected peers. When a sparse peer receives a sparse block, it can verify the hash chain integrity by verifying the Merkle tree root hash using the subset of transactions. The list of transaction identifiers is sent with a sparse block to enable the validator to check for duplicates and mark them invalid. In vanilla Fabric, the orderer does not peek into the transactions. We break that assumption, as the orderer needs to find the transactions associated with each smart-contract and the transactions that invoke multiple smart-contracts. Since the orderers already have access to the entire transactions, even in vanilla Fabric, a motivated party could always peek into them. Hence, our approach does not weaken the trust model.
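A minimal sketch (our illustration, with assumed types) of how a sparse peer could verify a sparse block against the Merkle root: it recomputes the hashes of the transaction groups it actually received and combines them with the opaque sibling hashes shipped in the block for the groups that were filtered out.

```go
package sketch

import (
	"bytes"
	"crypto/sha256"
)

// merkleNode is a node of a (possibly pruned) binary Merkle tree. A pruned
// subtree carries only its shipped hash; a leaf carries the raw bytes of
// the transactions the peer received for one smart-contract group.
type merkleNode struct {
	hash  []byte
	left  *merkleNode
	right *merkleNode
	txs   [][]byte
}

// computeRoot walks the tree bottom-up, hashing received transaction bytes
// at the leaves and concatenated child hashes at internal nodes.
func computeRoot(n *merkleNode) []byte {
	if n.hash != nil {
		return n.hash // pruned subtree: use the hash shipped by the orderer
	}
	if n.left == nil && n.right == nil {
		h := sha256.New()
		for _, tx := range n.txs {
			h.Write(tx)
		}
		return h.Sum(nil)
	}
	h := sha256.Sum256(append(computeRoot(n.left), computeRoot(n.right)...))
	return h[:]
}

// verifySparseBlock checks the recomputed root against the root recorded in
// the block header (which is also chained into the next block's hash).
func verifySparseBlock(root *merkleNode, expectedRoot []byte) bool {
	return bytes.Equal(computeRoot(root), expectedRoot)
}
```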

4.2 Distributed Simulation

In vanilla Fabric, a smart-contract can invoke another contract hosted on the same channel on the same peer. With sparse peers, smart-contracts would be placed on different peers. Hence, we enable distributed simulation, in which a smart-contract hosted on one peer can invoke a contract hosted on another by making a gRPC request. The filter DB holds each sparse peer's filter and is used to decide which peer to contact for a given contract. As the distributed simulation happens over a network and the endorser holds a read lock on the whole state DB [36], the commit operation would get delayed if there are many distributed simulations. Hence, we adopt the technique proposed by Meir et al. [30] to remove the state DB's read-write lock.

4.3 Distributed Validation and Commit

Due to the distributed simulation, we now need distributed validation and commit. Consider a transaction T invoking smart-contracts S1, S2, ..., Sn, and sparse peers P1, P2, ..., Pn, where each peer Pi has a filter Fi with a single smart-contract Si. The transaction invoking all n smart-contracts is considered valid when it satisfies both the policy and serializability checks of each contract invocation. Hence, to commit this transaction, we require agreement from all n sparse peers during the validation phase.

Strawman protocol. Each peer Pi validates the parts of the transaction that involved smart-contracts Si ∈ Fi, and then it broadcasts the results to every other peer. Once a peer has received a valid result for all smart-contracts S1 ... Sn, it can consider the transaction valid and commit it. If an invalid result is received, the peer does not need to wait for any more results and can proceed by invalidating the transaction. As peers within an organization are trusted, there are no security issues. This approach has a significant drawback. Since the peers could be at different block heights due to different block commit rates (as a result of heterogeneous hardware or workloads), a transaction requiring distributed commit could block other transactions in the block for a considerable amount of time. As a result, the committer could get blocked at the PopValidationResult() call.

An improved protocol using deferred transactions. To avoid the committer getting blocked, we introduce deferred transactions. As a result, the committer can commit all local transactions (i.e., a partial block commit) without waiting for the distributed transactions' validation results. Further, the committer marks and stores distributed transactions as deferred during the commit (as this is needed for recovery after a peer failure). Whenever the result becomes available, the deferred transaction is committed and removed from the graph. Note that any local transaction with a dependency on a deferred transaction must be deferred as well, even if it is in a different block.
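A sketch (our interpretation, with assumed types) of the deferral bookkeeping: the committer commits local transactions immediately and parks cross-peer transactions until all remote verdicts arrive, short-circuiting as soon as one invalid verdict is known.

```go
package sketch

// deferredTracker parks transactions that await validation verdicts from
// other sparse peers, so the committer can proceed with a partial block.
type deferredTracker struct {
	pending map[string]map[string]bool // txID -> peer IDs still owed a verdict
	valid   map[string]bool            // accumulated verdict (all parts must be valid)
}

// deferTx marks txID as waiting on the given peers.
func (d *deferredTracker) deferTx(txID string, awaiting []string) {
	set := map[string]bool{}
	for _, p := range awaiting {
		set[p] = true
	}
	d.pending[txID] = set
	d.valid[txID] = true
}

// onVerdict records a remote verdict; it returns (done, valid), where done
// reports whether the deferred transaction can now be committed or invalidated.
func (d *deferredTracker) onVerdict(txID, peerID string, isValid bool) (bool, bool) {
	set, ok := d.pending[txID]
	if !ok {
		return false, false
	}
	delete(set, peerID)
	if !isValid {
		d.valid[txID] = false // one failed contract invalidates the whole tx
	}
	if len(set) == 0 || !isValid { // an invalid verdict short-circuits the wait
		verdict := d.valid[txID]
		delete(d.pending, txID)
		delete(d.valid, txID)
		return true, verdict
	}
	return false, false
}
```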


Table 3: Transaction RW-sets

Tx   Read Set                              Write Set
T1   S1{k7, v1}                            S1{k7, v2}
T2   S2{k3, v1}                            S1{k4, v2}
T3   S1{k4, v1}, S2{k5, v1}, S3{k6, v1}    S1{k4, v2}, S2{k3, v2}, S3{k6, v2}

Table 4: State DB at each peer

Peer P1: S1: (k4, v1), (k7, v1)
Peer P2: S2: (k3, v1), (k5, v1)
Peer P3: S3: (k6, v1)

Figure 9: Transaction dependency graph at each peer. Peer P1 (filter F = {S1}) holds T1, T2, and T3, with rw(k4) and ww(k4) edges from T3 to T2; peer P2 (filter F = {S2}) holds T2 and T3, with a wr(k3) edge from T3 to T2; peer P3 (filter F = {S3}) holds only T3.

Table 5: Distributed validation of transactions

Row 1. Workers: P1: w1 = T1, w2 = T2; P2: w1 = T2; P3: w1 = T3. Dirty state, valid/invalid transactions: none yet.
Row 2. Dirty state: P1: (k7, v2). Valid: P1: T1. Waiting: on P1, T2 awaits P2; on P2, T2 awaits P1; on P3, T3 awaits P1 and P2.
(P1 informs P2 that T2 is S1-valid; P2 informs P1 that T2 is S2-valid; P3 informs both P1 and P2 that T3 is S3-valid.)
Row 3. Workers: P2: w1 = T3. Dirty state: P1: (k7, v2), (k4, v2). Valid: P1: T1, T2; P2: T2. Invalid: P1: T3. Waiting: on P3, T3 awaits P1 and P2.
(P1 informs P2 and P3 that T3 is invalid.)
Row 4. Dirty state: P1: (k7, v2), (k4, v2). Valid: P1: T1, T2; P2: T2. Invalid: P1: T3; P2: T3; P3: T3.

4.3.1 Illustration of Distributed Validation. Let us assume we have three sparse peers, P1, P2, and P3, in an organization, each with only one smart-contract in its filter. Let us assume each peer has received a full block with three transactions, as listed in Table 3. Table 4 presents the current committed state at each peer's state database. On each peer, the dependency extractor would apply the filter and construct the dependency graph shown in Figure 9. As peer P1 has smart-contract S1 in its filter, all three transactions are added, while P2 and P3 have only {T2, T3} and {T3} in their dependency graphs, respectively. Note that even though T3 reads from and writes to all three smart-contracts, peer P2 would consider only S2-states in the read-write set while constructing the dependency graph. The same is true for the other two peers. This does not affect correctness, as the example below shows.

Let us assume we have two validation workers, w1 and w2, per peer. Each worker would pick a transaction with no out-edges, as shown in row one of Table 5. Peer P1 would find T1 and T2 to be valid based on the OCC [28] check. However, only the write-set of T1 would be applied to the dirty state; T2 would wait for the validation result from P2. Similarly, peer P3 would wait for the validation results of T3 from P1 and P2. Once all the required validation results are shared between the peers, T2 would be marked valid on peer P1, and the dirty state would be updated. However, on peer P2, the dirty state would not be updated with T2's write-set, as T2 does not write to S2. As T3 has a fate dependency on T2 (it reads k4, which T2 updates), peer P1 would invalidate T3 and share the result with the other two peers so that they can invalidate it too. The last row in Table 5 presents the final result.

    4.4 Auto-Scaling Primitives

    From takeaway 4 in §2.3.2, we know that dynamically scaling a Fabric network by adding more peers takes significant time. A newly added vanilla peer copies, validates, and commits all the blocks sequentially, so it takes a long time before its State DB is in sync with the other peers. We propose to dramatically reduce the horizontal scale-up time by directly copying a 'snapshot' of the State DB at a given block 𝐵 and then continuing the regular validation and commit of blocks that come after block 𝐵.

    However, Fabric's State DB stores only the latest states as of the last committed block, and these may get overwritten by future blocks. For example, Table 6 presents the states in the State DB after committing each block.

    Table 6: KV pairs in State DB

    Block#   State DB
    1        (𝑘1, 𝑣1)
    2        (𝑘1, 𝑣1), (𝑘2, 𝑣1)
    3        (𝑘1, 𝑣1), (𝑘2, 𝑣1), (𝑘3, 𝑣1)
    4        (𝑘2, 𝑣2), (𝑘3, 𝑣1)

    To copy the State DB at block 3, we need to copy the values of 𝑘1, 𝑘2, and 𝑘3 at version 𝑣1. As the peer continues to commit blocks, the latest state changes: after committing block 4, 𝑘1 no longer exists in the State DB, and 𝑘2 has a new version 𝑣2. In such cases, the older versions of 𝑘1 and 𝑘2 can be obtained from the write-sets of transactions present in the block store. To copy states from the State DB as of a given block, we add an index of the form {smart-contract, block number, transaction number} ↦ list of {key, isDelete, isDeferred}, which tracks the state modifications made by every transaction. As we scale up by adding sparse peers, we only copy a part of the State DB; the states to be copied can be identified with a range query on this index. For example, if a newly joined sparse peer wants the state of smart-contract 𝑆1 as of block 𝐵, it requests it from a full peer. The full peer runs the range query [Start = {𝑆1, 0, 0}, End = {𝑆1, 𝐵+1, 0}) on the new index and finds the keys modified in this range. It then serves, directly from the State DB, every key whose current version was created within the requested block range. If a state 𝑘 cannot be served from the State DB because it was deleted or modified by a block 𝐵′ > 𝐵, we find the last transaction in the block range [0, 𝐵] that wrote 𝑘 and extract 𝑘's value from that transaction's write-set.

    Once the State DB is copied, the new peer can start sharing the load. Simultaneously, the admin can update the filter of the full peer (i.e., remove a contract) to make it a sparse peer.


    Figure 10: Vertical Scaling. Throughput (tps) and CPU utilization (%) vs. vCPUs per peer (1 to 16) for Fabric, FastFabric, and SmartFabric: (a) Smallbank, 1 peer/org; (b) Smallbank, 4 peers/org; (c) YCSB, 4 peers/org.

    Figure 11: Horizontal Scaling. Throughput (tps) and CPU utilization (%) vs. peers per org (1 to 6) for Fabric, FastFabric (EP), and SmartFabric: (a) Smallbank; (b) YCSB.

    If the new peer will later be split further into more sparse peers, we also need to copy the index and the block store in the background, which can happen slowly. If, instead, the new peer is spun up only to handle a transient load spike, the State DB copy alone suffices.

    5 EVALUATION

    We implemented our proposals by adding 15K lines of Go code to Hyperledger Fabric v1.4. Hereafter, we refer to our approach as SmartFabric. We compare SmartFabric with vanilla Fabric and with FastFabric [26], which claims to achieve 20K tps. A follow-up work [25] showed that the stable implementation [24] of FastFabric achieved 14K tps on a benchmark similar to Smallbank. Briefly, in order to scale, FastFabric separates a peer's roles into three node types: Endorser Peers (EP), a FastPeer (FP), and Storage Peers (SP). Only the SP commits both blocks and state to disk; the other nodes keep state only in RAM. Note that we applied one of their optimizations (the block deserialization cache) to both vanilla Fabric (as mentioned in §2.2) and our implementation. First, we focus our evaluation on the following guiding questions:

    Q1. Given a fixed infrastructure cost, how much higher throughput can SmartFabric provide?

    Q2. Given a required throughput, by how much does SmartFabric reduce the infrastructure cost?

    Next, we inspect the internals to show how each of the proposed optimizations performs individually. For all experiments, we use the same default configurations described in §2.2.

    5.1 Vertical Scaling

    To study the impact of vertical scaling on SmartFabric and FastFabric, we considered two scenarios: (1) a single peer per organization; (2) four peers per organization. In scenario (1), sparse peers and FastFabric are not applicable, as they require at least two peers per organization.

    (1) Single peer per organization. Figure 10(a) plots the impact of the number of vCPUs on throughput and CPU utilization under the Smallbank workload. SmartFabric provided 1.4× higher throughput (on average) than Fabric for the same infrastructure cost. Further, as the number of vCPUs increased from 1 to 16, SmartFabric's throughput improved 7×, compared to a 5.5× improvement for Fabric. Similarly, SmartFabric needed only 8 vCPUs to exceed Fabric's throughput with 16 vCPUs, i.e., better performance at half the cost. We obtained similar results with YCSB but omit the plot for brevity. The improvement arises because Fabric underutilizes the CPU beyond a point, while SmartFabric maintains high utilization: with 16 vCPUs, SmartFabric showed 76% CPU utilization versus Fabric's 36%.

    (2) Four peers per organization. Figures 10(b) and 10(c) plot the impact of the number of vCPUs on throughput and CPU utilization for Smallbank and YCSB, respectively. SmartFabric provided 2.7× and 1.7× higher throughput (on average) than Fabric and FastFabric, respectively, for the same infrastructure cost. For example, with 16 vCPUs, SmartFabric provided 7680 tps versus Fabric's 2560 tps and FastFabric's 4480 tps. Further, Fabric's peak throughput with 16 vCPUs was matched by FastFabric with 4 vCPUs, while SmartFabric provided comparable performance with just 2 vCPUs (87% lower cost). SmartFabric showed these improvements due to the combination of pipelined execution and sparse peers. There are two interesting observations about FastFabric's CPU utilization: (1) under Smallbank, FastFabric's and SmartFabric's CPU utilizations are almost the same, yet there is a performance gap; (2) under YCSB, FastFabric's CPU utilization drops suddenly once the number of vCPUs exceeds 4. The reasons for these behaviors are explained in the next section.

    5.2 Horizontal Scaling Comparison

    Figure 11 plots the impact of the number of peers in an organization on throughput and CPU utilization. Each peer had 16 vCPUs. For FastFabric, with 𝑛 nodes, we configured 1 FP, 1 SP, and 𝑛 − 2 EP nodes. FastFabric requires a single dedicated FP node. We kept only one SP because, as in vanilla Fabric, each SP node would do the same work of committing the entire block to disk, whereas adding more EP nodes helps by allowing a higher endorsement rate. With only 2 nodes, we overlapped the EP and SP roles on the same node, because the FP must be a dedicated node. For SmartFabric, we divided the 𝑚 smart-contracts evenly over the 𝑛 nodes: every node's filter first received ⌊𝑚/𝑛⌋ contracts, and then one additional contract was added to each of 𝑚 mod 𝑛 nodes. We call these latter nodes 'unevenly loaded', and the others 'evenly loaded'. In our case, 𝑚 = 8; when 𝑛 = 6, each node gets 1 contract, and two nodes get an additional contract.
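    The following Go sketch, written by us to mirror the description above, shows this filter assignment for 𝑚 = 8 contracts over 𝑛 = 6 peers.

package main

import "fmt"

// assignFilters spreads m contracts over n peers: every peer gets
// floor(m/n) contracts, and the first m mod n peers get one extra
// (these become the 'unevenly loaded' peers).
func assignFilters(contracts []string, n int) [][]string {
	base, extra := len(contracts)/n, len(contracts)%n
	filters := make([][]string, n)
	next := 0
	for i := 0; i < n; i++ {
		take := base
		if i < extra {
			take++
		}
		filters[i] = contracts[next : next+take]
		next += take
	}
	return filters
}

func main() {
	contracts := []string{"S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8"}
	for i, f := range assignFilters(contracts, 6) {
		fmt.Printf("peer %d: %v\n", i+1, f)
	}
	// Two peers end up with 2 contracts each, four with 1 — matching m=8, n=6.
}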

    SmartFabric provided 2.5× and 1.6× higher throughput (on average) than Fabric and FastFabric, respectively, for the same number of peers per organization. Further, with just a single node, SmartFabric achieved more throughput than Fabric did with 4 nodes, i.e., better performance at 75% lower cost. Similarly, SmartFabric provided higher throughput with 2 nodes than FastFabric with 3 or more nodes.


    Figure 12: Effect of imbalanced load on Sparse Peers (block processing time and throughput vs. peers per org; evenly vs. unevenly loaded peers; actual vs. theoretical max throughput).

    Figure 13: FastFabric's block processing time (per node type FP, EP, SP; actual vs. per-type theoretical max throughput).

    Figure 14: Effect of dependencies (Smallbank and YCSB with Zipf s = 0, 0.5, 1.0, 1.5; throughput and goodput, vanilla vs. pipelined).

    Figure 15: Full Blocks (FB) & Sparse Blocks (SB) vs Vanilla Peer (VP), without and with pipelining (Smallbank and YCSB tps).

    Figure 16: Distributed transactions performance (throughput vs. cross-SC invoke % for 2 and 3 smart-contracts; vanilla TPS, SmartFabric total TPS, deferred TPS, and deferred-transaction extra latency).

    Interestingly, no approach provided higher throughput beyond 4 peers per organization. As we have already explained Fabric's behavior in Section 2, we explain the behavior of FastFabric and SmartFabric next.

    FastFabric did not benefit from even the fourth peer, as it was bottlenecked by the SP node. In Section 2, we showed that a vanilla peer that does not endorse transactions can achieve 3300 tps. An SP node is exactly like this, except that it receives pre-validated transactions and only commits them; its peak throughput therefore rises to 4480 tps. Figure 13 shows the block processing times along with the theoretical maximum throughput that each node type can provide. The actual throughput matched the theoretical maximum of the SP nodes.

    SmartFabric also showed no improvement beyond 4 peers, but for a different reason. With four nodes, each node committed transactions for two smart-contracts, and the IO load was balanced. With six nodes, four nodes were evenly loaded because they managed a single contract each, while the other two were unevenly loaded because they managed two. The slowest node was therefore still as slow as in the four-node case, as Figure 12 shows. The theoretical maximum of the evenly loaded peers increased as the number of contracts they managed decreased; nevertheless, the actual throughput was dictated by the unevenly loaded peers. Thus, in practice, the benefit of the sparse-peer optimization depends on the number of contracts, their load distribution characteristics, and the number of peers in the organization.

    5.3 Inspecting Internals

    We now perform an ablation study of the individual optimizations. All experiments were done with 4 peers per organization (16 peers in total), each having 16 vCPUs.

    (1) Pipelined Validation and Commit Phases. In this experiment, all peers were full peers. We first compare the peak performance achieved on the Smallbank workload. Figure 14 plots the throughput achieved with pipelined execution against vanilla Fabric. As expected, there was a 1.4× improvement over vanilla Fabric, while CPU utilization increased from 40% to 60%. The validation manager was efficient enough that the committer never got blocked: the size of the result map was always above 200, because the time taken by the committer (≈22 ms at 3600 endorsement requests per second, eps) was always higher than the time taken by the validators. Further, the end-to-end commit latency (validation + commit) for a block was reduced from 39 ms (at 2560 eps) to 27 ms (at 3600 eps).

    To study the effect of different degrees of dependencies between transactions, we varied the skewness of the Zipf distribution used to select keys in YCSB, going from s = 0.0 (uniform) to s = 1.5 (highly skewed) in steps of 0.5. Figure 14 plots the throughput and goodput achieved with pipelined execution against vanilla Fabric. We consider goodput because some transactions get invalidated due to serializability conflicts. As expected, pipelined execution outperformed vanilla Fabric when the s-value was low. As the s-value increased, more transactions were invalidated, reducing the goodput and the performance gain from pipelining. Interestingly, throughput increased as goodput decreased, because the peer spent less time on the commit.

    (2) Sparse Peer with Full and Sparse Blocks. We evaluate the performance of the two sparse-peer variants proposed in §4. Each organization hosted 4 sparse peers, where each sparse peer's filter contained only 2 non-overlapping smart-contracts. Figure 15(a) plots the throughput achieved with both variants against a network where each organization hosted 4 vanilla peers. As expected, throughput increased significantly, by 2.4× with sparse peers for both workloads. Compared to the sparse peer processing full blocks, the sparse peer processing sparse blocks achieved higher throughput due to reduced IO. Figure 15(b) plots the throughput achieved by combining sparse peers with pipelined execution against vanilla Fabric: throughput increased to 7680 tps, i.e., 3× that of vanilla Fabric.

    (3) Distributed Simulation and Validation. When transactions invoke multiple smart-contracts, SmartFabric employs distributed simulation and validation.


    Table 7: Comparison of our dependency graph with the dependency graphs used in prior permissioned-blockchain work

    Our dependency graph — Goal: validate transactions in parallel while respecting the predetermined commit order. Blockchain flavor: EOV. Dependencies tracked: rw, wr, ww, pr, ep-rw, ep-wr, ep-ww. Dependency scope: across blocks. Cycles: impossible. Supported queries: get, put, range.

    ParBlockchain [16] — Goal: execute transactions in parallel. Blockchain flavor: OE. Dependencies tracked: rw, wr, ww.

    Fabric++ [35] — Goal: reorder transactions to reduce the transaction abort rate within a block. Blockchain flavor: EOV. Dependencies tracked: rw. Dependency scope: single block. Cycles: possible.

    FabricSharp [34] — Goal: abort unserializable transactions before ordering, and avoid transaction aborts within a block. Blockchain flavor: EOV. Dependency scope: across blocks. Cycles: impossible. Supported queries: get, put.

    XOX-Fabric [25] — Goal: re-execute conflicting transactions to reduce the abort rate. Blockchain flavor: EOV. Dependencies tracked: rw, wr, ww.

    To study the proposed system's performance in the presence of distributed transactions, we submitted transactions that invoked multiple Smallbank contracts. Figure 16 compares the throughput of vanilla Fabric against SmartFabric under a varying mix of cross-contract transactions, in two scenarios where such transactions invoke 2 and 3 contracts, respectively. As expected, SmartFabric's performance degraded as the percentage of transactions invoking multiple contracts increased, since each such transaction increases the amount of work every peer must do. However, even with 100% cross-contract transactions, SmartFabric outperformed vanilla Fabric by 2× in both scenarios.

    Interestingly, the number of transactions deferred per second was also not substantial. Even when 100% of transactions invoked two or three contracts, only ≈200 tps and ≈400 tps, respectively, got deferred. In other words, merely 5% to 10% of issued transactions were deferred, even when every transaction required distributed validation. The deferral rate is low because, with pipelined validation and commit phases, transactions were validated well before the committer needed their results. The extra latency experienced by deferred transactions ranged from 30 to 50 ms. Given that blockchain transactions already take several hundred milliseconds [17], this is a small penalty for a large throughput gain. Note, however, that this duration is on the order of one or two block processing times in SmartFabric: without deferring, the average block processing time would increase 2× or worse due to waiting, drastically reducing peak throughput. That would be a significant price to pay for not deferring 5% to 10% of transactions. Thus, deferring transactions is key to this performance.

    (4) Scaling Up and Down. We evaluate the dynamic scaling approach discussed in §4.4. Figure 17 plots the time taken for a new peer to copy states from another peer.

    Figure 17: New peer sync time. (a) Smallbank: for a peer joining after 120 s, 300 s, and 600 s, vanilla Fabric took 155 s, 310 s, and 685 s versus 4 s, 11 s, and 26 s with state copy. (b) YCSB: vanilla Fabric took 165 s, 400 s, and 895 s versus 10 s, 33 s, and 70 s with state copy.

    To perform a fair comparison with vanilla Fabric, we added a new sparse peer with all smart-contracts in its filter (the equivalent of a full peer) and generated the same load on the existing peers. SmartFabric reduced the sync time manyfold compared to a vanilla peer at every join time, because it copied a much smaller amount of data: in general, the block store is several times larger than the state DB, and since we copy only the required states directly from the state DB, the time to add a new peer drops significantly. In vanilla Fabric, scaling down a network simply means turning off a peer; here, we pay a small penalty, since scaling down sparse peers requires copying states from them to other peers. The time to copy these states was similar to the scale-up time, which is quite small. Compared to the benefits of sparse peers, this penalty is insignificant.

    6 RELATED WORK

    In this section, we cover existing work that improves the performance of Hyperledger Fabric; none of it studies performance under different scaling techniques. While many past approaches [16, 35, 34, 25] track dependencies between transactions, they use them differently. Table 7 presents a detailed comparison of our dependency graph with the dependency graphs used in prior permissioned-blockchain work.

    Thakkar et al. [36] conducted a comprehensive performance study of Fabric v1.0, identified its bottlenecks, and provided guidelines for designing applications and operating the network to attain higher throughput. They also implemented a few optimizations in the peer process. These optimizations are already included in Fabric v1.4, and hence our work builds on theirs.

    Sharma et al. [35] used ideas from the database literature to reorder transactions within a block by analyzing their dependencies, with the goal of reducing transaction aborts. Ruan et al. [34] extended [35] to abort unserializable transactions before ordering and to avoid transaction aborts within a block altogether by reordering them. These techniques are orthogonal to our work, as we do not modify the ordering of transactions; we focus on pipelining the execution of different phases and avoiding redundant work within an organization.

    Gorenflo et al. [26] proposed FastFabric, which includes various optimizations such as replacing the state DB with a hash table, storing blocks on a separate server, separating the committer and endorser into different servers, validating transaction headers in parallel, and caching unmarshalled


    blocks, to reach a throughput of 20,000 tps. However, we believe that many of these optimizations are impractical in a production environment. For example, a state DB is a must to support range queries and to persist all current states (which helps recover a peer quickly after a failure), and the proposed approach assumes transactions have no read-write conflicts. Moreover, their work is orthogonal to ours, as they do not (1) execute the validation and commit phases in a pipelined manner; (2) validate transactions in parallel during the serializability check; (3) have a concept of sparseness in a peer; or (4) provide a framework to scale up a Fabric network quickly. Note that we have adopted the block cache proposed in this work, as mentioned in §2.2.

    Gorenflo et al. [25] extended FastFabric [26] with post-order execution of transactions to reduce transaction aborts. They construct a dependency graph at the peer with rw, ww, and wr dependencies; whenever transactions conflict, they re-execute the patch-up code passed with the transaction to reduce aborts. This work is orthogonal to our approach.

    Ruan et al. [33] compared the performance of two permissioned blockchain platforms, Hyperledger Fabric and Quorum, with the distributed databases TiDB and etcd. However, they did not study the impact of vertical and horizontal scaling on Hyperledger Fabric.

    REFERENCES
    [1] Alipay press release. https://www.alibabagroup.com/en/news/press_pdf/p180201.pdf.
    [2] Amazon AWS cost estimator. https://calculator.aws/.
    [3] Azure cost estimator. https://azure.microsoft.com/en-in/pricing/calculator/.
    [4] ConsenSys Quorum. https://consensys.net/quorum/.
    [5] Electrum decentralized marketplace. https://electrumdark.co/.
    [6] Forbes blockchain 50 2021. https://www.forbes.com/sites/michaeldelcastillo/2021/02/02/blockchain-50/?sh=113b8b9231cb.
    [7] Google cloud cost estimator. https://cloud.google.com/products/calculator.
    [8] gRPC. https://grpc.io/.
    [9] IBM cloud cost estimator. https://cloud.ibm.com/estimator/review.
    [10] OpenBazaar decentralized marketplace. https://openbazaar.org/.
    [11] Origami network decentralized marketplace. https://ori.network/.
    [12] Particl decentralized marketplace. https://particl.io/.
    [13] Understanding digital tokens: Market overviews and proposed guidelines for policymakers and practitioners, Token Alliance, Chamber of Digital Commerce. https://morningconsult.com/wp-content/uploads/2018/07/token-alliance-whitepaper-web-final.pdf.
    [14] Visa annual report. https://s1.q4cdn.com/050606653/files/doc_financials/annual/2018/visa-2018-annual-report-final.pdf.

    [15] M. Alomari, M. Cahill, A. Fekete, and U. Rohm. The cost of serializability on platforms that use snapshot isolation. In 2008 IEEE 24th International Conference on Data Engineering, pages 576–585, April 2008.
    [16] M. J. Amiri, D. Agrawal, and A. E. Abbadi. ParBlockchain: Leveraging transaction parallelism in permissioned blockchain systems. 2019.
    [17] E. Androulaki, A. Barger, V. Bortnikov, C. Cachin, K. Christidis, A. De Caro, D. Enyeart, C. Ferris, G. Laventman, Y. Manevich, S. Muralidharan, C. Murthy, B. Nguyen, M. Sethi, G. Singh, K. Smith, A. Sorniotti, C. Stathakopoulou, M. Vukolić, S. W. Cocco, and J. Yellick. Hyperledger Fabric: A distributed operating system for permissioned blockchains. In Proceedings of the Thirteenth EuroSys Conference, EuroSys '18, pages 30:1–30:15, New York, NY, USA, 2018. ACM.
    [18] H. Berenson, P. Bernstein, J. Gray, J. Melton, E. O'Neil, and P. O'Neil. A critique of ANSI SQL isolation levels. In Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, SIGMOD '95, pages 1–10, New York, NY, USA, 1995. Association for Computing Machinery.
    [19] V. Buterin. Ethereum: A next-generation smart contract and decentralized application platform, 2014. Accessed: July 31, 2019.
    [20] Y. Chen. Blockchain tokens and the potential democratization of entrepreneurship and innovation. Business Horizons, 61(4):567–575, 2018.
    [21] B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking cloud serving systems with YCSB. In Proceedings of the 1st ACM Symposium on Cloud Computing, SoCC '10, pages 143–154, New York, NY, USA, 2010. Association for Computing Machinery.
    [22] H. Dang, T. T. A. Dinh, D. Loghin, E.-C. Chang, Q. Lin, and B. C. Ooi. Towards scaling blockchain systems via sharding. In Proceedings of the 2019 International Conference on Management of Data, SIGMOD '19, pages 123–140, New York, NY, USA, 2019. ACM.
    [23] S. Goel, A. Singh, R. Garg, M. Verma, and P. Jayachandran. Resource fairness and prioritization of transactions in permissioned blockchain systems (industry track). In Proceedings of the 19th International Middleware Conference Industry, Middleware '18, pages 46–53, New York, NY, USA, 2018. ACM.
    [24] C. Gorenflo. FastFabric 1.4 implementation. https://github.com/cgorenflo/fabric/tree/fastfabric-1.4.
    [25] C. Gorenflo, L. Golab, and S. Keshav. XOX Fabric: A hybrid approach to blockchain transaction execution. In 2020 IEEE International Conference on Blockchain and Cryptocurrency (ICBC), pages 1–9, 2020.
    [26] C. Gorenflo, S. Lee, L. Golab, and S. Keshav. FastFabric: Scaling Hyperledger Fabric to 20,000 transactions per second. In 2019 IEEE International Conference on Blockchain and Cryptocurrency (ICBC), pages 455–463, May 2019.
    [27] Z. István, A. Sorniotti, and M. Vukolić. StreamChain: Do blockchains need blocks? In Proceedings of the 2nd Workshop on Scalable and Resilient Infrastructures for Distributed Ledgers, SERIAL '18, pages 1–6, New York, NY, USA, 2018. ACM.
    [28] H. T. Kung and J. T. Robinson. On optimistic methods for concurrency control. ACM Trans. Database Syst., 6(2):213–226, June 1981.
    [29] Y. Manevich, A. Barger, and Y. Tock. Endorsement in Hyperledger Fabric via service discovery. IBM Journal of Research and Development, 63(2/3):2:1–2:9, March 2019.
    [30] H. Meir, A. Barger, and Y. Manevich. Increasing concurrency in Hyperledger Fabric. In Proceedings of the 12th ACM International Conference on Systems and Storage, SYSTOR '19, page 179, New York, NY, USA, 2019. ACM.
    [31] S. Nakamoto. Bitcoin: A peer-to-peer electronic cash system, Dec 2008. Accessed: July 31, 2019.
    [32] D. Ongaro and J. Ousterhout. In search of an understandable consensus algorithm. In Proceedings of the 2014 USENIX Annual Technical Conference, USENIX ATC '14, pages 305–320, Berkeley, CA, USA, 2014. USENIX Association.
    [33] P. Ruan, T. T. A. Dinh, D. Loghin, M. Zhang, G. Chen, Q. Lin, and B. C. Ooi. Blockchains vs. distributed databases: Dichotomy and fusion. In Proceedings of the 2021 ACM SIGMOD International Conference on Management of Data, SIGMOD '21, pages 543–557, 2021.
    [34] P. Ruan, D. Loghin, Q.-T. Ta, M. Zhang, G. Chen, and B. C. Ooi. A transactional perspective on execute-order-validate blockchains. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, SIGMOD '20, pages 543–557, New York, NY, USA, 2020. Association for Computing Machinery.
    [35] A. Sharma, F. M. Schuhknecht, D. Agrawal, and J. Dittrich. Blurring the lines between blockchains and database systems: The case of Hyperledger Fabric. In Proceedings of the 2019 International Conference on Management of Data, SIGMOD '19, pages 105–122, New York, NY, USA, 2019. ACM.
    [36] P. Thakkar, S. Nathan, and B. Viswanathan. Performance benchmarking and optimizing Hyperledger Fabric blockchain platform. In 2018 IEEE 26th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), pages 264–276, Sep. 2018.

