
Biscotti: A Ledger for Private and Secure Peer-to-Peer Machine Learning

Muhammad Shayan, University of British Columbia
[email protected]

Clement Fung, University of British Columbia
[email protected]

Chris J.M. Yoon, University of British Columbia
[email protected]

Ivan Beschastnikh, University of British Columbia
[email protected]

Abstract

Centralized solutions for privacy-preserving multi-party ML are becoming increasingly infeasible: a variety of attacks have demonstrated their weaknesses [21, 6, 27]. Federated Learning [10, 26] is the current state of the art in supporting secure multi-party ML: data is maintained on the owner's device and is aggregated through a secure protocol. However, this process assumes a trusted centralized infrastructure for coordination, and clients must trust that the central service does not maliciously use the byproducts of client data.

As a response, we propose Biscotti: a fully decentralized P2P approach to multi-party ML, which leverages blockchain primitives to coordinate a privacy-preserving ML process between peering clients. Our evaluation demonstrates that Biscotti is scalable, fault tolerant, and defends against known attacks. Biscotti is able to protect the performance of the global model at scale even when 48% of the peers are malicious, and at the same time provides privacy while preventing poisoning attacks [22].

1 Introduction

A common thread in ML applications is the collection of massive amounts of training data, which is often centralized for analysis. However, when training ML models in a multi-party setting, users must share their potentially sensitive information with a centralized service.

Federated learning [26] is the current state of the art in secure multi-party ML: clients train a shared model with a secure aggregator without revealing their underlying data or computation. But doing so introduces a subtle threat: clients, who previously acted as passive data contributors, are now actively involved in the training process [22]. This presents new privacy and security challenges [6].

Prior work has demonstrated that adversaries can attack the shared model through poisoning attacks [9, 33], in which an adversary contributes adversarial updates to influence shared model parameters. Adversaries can also attack the privacy of other clients in federated learning. In an information leakage attack, an adversary poses as an honest client and attempts to steal or de-anonymize sensitive training data through careful observation and isolation of a victim's model updates [21, 27].

We claim that solutions to these two attacks are in tension and are inherently difficult to achieve concurrently: client contributions or data can be made public and verifiable to prevent poisoning, but this violates the privacy guarantees of federated learning. Client contributions can be made more private, but this eliminates the potential for holding adversaries accountable. Prior work has attempted to solve these two attacks individually through centralized anomaly detection [40], differential privacy [4, 17, 18], or secure aggregation [10]. However, a private and decentralized solution that solves both attacks concurrently does not yet exist. This is the focus of our work.

Because ML does not require strong consensus or consistency to converge [36], traditional peer-to-peer (P2P) strong consensus protocols such as BFT protocols [12] are too restrictive for ML workloads. To facilitate private, verifiable, crowd-sourced computation, distributed ledgers (blockchains) [31] have emerged. Through design elements such as publicly verifiable proof of work, eventual consistency, and ledger-based consensus, blockchains have been used for a variety of decentralized multi-party tasks such as currency management [19, 31], archival data storage [5, 29], and financial auditing [32]. Despite this wide range of applications, a fully realized, accessible system for large-scale multi-party ML that is robust to both attacks on the global model and attacks on other clients does not exist.

We propose Biscotti: a decentralized public P2P system that co-designs a blockchain ledger with a privacy-preserving multi-party ML process. Peering clients join Biscotti and contribute to a ledger to train a global model, under the assumption that peers are willing to collaborate on building ML models, but are unwilling to share their data. Each peer has a local training data set that matches the desired objective of the training process and increases training data diversity.


Biscotti is designed to support stochastic gradient descent (SGD) [11], an optimization algorithm which iteratively selects a batch of training examples, computes their gradients against the current model parameters, and takes gradient steps in the direction that minimizes the loss function. SGD is general purpose and can be used to train a variety of models, including deep learning [14].
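To make the SGD step concrete, the following is a minimal sketch in Go of one gradient step over a batch for a linear model with squared loss. This is illustrative only (Biscotti computes updates through PyTorch); the function and variable names are our own.

package main

import "fmt"

// sgdStep performs one SGD iteration for a linear model y = w·x with
// squared loss, over a small batch of examples. It returns the update
// ∆w that is added to the current parameters.
func sgdStep(w []float64, batchX [][]float64, batchY []float64, lr float64) []float64 {
    d := len(w)
    delta := make([]float64, d)
    for i, x := range batchX {
        // prediction and error for this example
        pred := 0.0
        for j := 0; j < d; j++ {
            pred += w[j] * x[j]
        }
        err := pred - batchY[i]
        // accumulate the negative gradient of the squared loss
        for j := 0; j < d; j++ {
            delta[j] -= lr * err * x[j] / float64(len(batchX))
        }
    }
    return delta
}

func main() {
    w := []float64{0.1, -0.2}
    batchX := [][]float64{{1, 2}, {3, 4}}
    batchY := []float64{1, 0}
    dw := sgdStep(w, batchX, batchY, 0.01)
    for j := range w {
        w[j] += dw[j]
    }
    fmt.Println("update:", dw, "new weights:", w)
}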

The Biscotti blockchain coordinates ML training between the peers. Each peer has a local training data set that matches the learning goal of the global model and provides training data diversity. Peers in the system are weighed by the value, or stake, that they have in the system. Inspired by prior work [19], Biscotti uses proof of stake in combination with verifiable random functions (VRFs) [28] to select key roles that help to arrange the privacy and security of peer SGD updates. Our use of stake prevents groups of colluding peers from overtaking the system without a sufficient ownership of stake. Biscotti prevents peers from poisoning the model through a Reject on Negative Influence defense [8] and prevents peers from performing an information leakage attack through differentially private noise [4] and Shamir secrets for secure aggregation [39].

The contributions of our Biscotti design include:

• A novel protocol that leverages blockchain primitives to facilitate secure and private ML in a P2P setting
• The translation of established blockchain primitives such as proof of stake [19] and verification into a multi-party ML setting

In our evaluation we deployed Biscotti on Azure and considered its performance, scalability, churn tolerance, and ability to withstand different attacks. We found that Biscotti can train an MNIST softmax model with 100 peers on a 60,000 image dataset in 50 minutes. Biscotti is fault tolerant to node churn every 15 seconds across 50 nodes, and converges even with such churn. Finally, we subjected Biscotti to information leakage attacks [27] and poisoning attacks [22] from prior work and found that Biscotti is resilient to both types of attacks.

2 Challenges and contributions

In this section we describe some of the challenges in designing a peer-to-peer (P2P) solution for multi-party ML and the key pieces in Biscotti's design that resolve each of these challenges.

Poisoning attacks: update validation using RONI. In multi-party ML, peers possess a disjoint, private set of the total training data. They have full control over this hidden dataset and thus a malicious peer can perform poisoning attacks without limitation.

Figure 1: Biscotti uses two mechanisms to provide privacy: (top row) adding noise to SGD updates, where more noise (a smaller ε) means more privacy, and (bottom row) aggregating batches of SGD updates, where more aggregation (a larger batch size) means more privacy.

Biscotti validates an SGD update using the Reject on Negative Influence (RONI) defense [8], in which the effect of a set of candidate data points (or model updates) is evaluated and compared against a known baseline model. In multi-party settings, known baseline models are not available to users, so Biscotti validates an SGD update by evaluating the effect of the isolated update with respect to a peer's local dataset. The performance of the current model with and without the update is evaluated. A validation error is computed, and updates that negatively influence the model are rejected.

Since no peer has access to the global training dataset, using a peer's local data for update verification may introduce false positives and incorrectly reject honest updates. Biscotti solves this by using a majority voting scheme. As more peers are used to validate a peer's SGD update, the baseline dataset approaches the global data distribution, reducing false positives.
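The following is a minimal sketch in Go of a RONI-style check a verifier might run; validationError is a hypothetical stand-in for evaluating the model on the verifier's local data, and the threshold value is illustrative rather than Biscotti's actual parameter.

package main

import "fmt"

// validationError is a stand-in for evaluating a model's error on a
// verifier's local dataset; a real implementation would run the model
// over the data and compute, e.g., classification error.
func validationError(model []float64, localData [][]float64) float64 {
    // illustrative only: pretend error shrinks as the weights grow
    sum := 0.0
    for _, w := range model {
        sum += w
    }
    return 1.0 / (1.0 + sum)
}

// roniAccepts applies a candidate update in isolation and rejects it if
// it degrades validation error on the verifier's local data by more
// than threshold (Reject on Negative Influence).
func roniAccepts(model, update []float64, localData [][]float64, threshold float64) bool {
    with := make([]float64, len(model))
    for i := range model {
        with[i] = model[i] + update[i]
    }
    baseline := validationError(model, localData)
    candidate := validationError(with, localData)
    return candidate-baseline <= threshold
}

func main() {
    model := []float64{0.5, 0.5}
    good := []float64{0.1, 0.1}
    bad := []float64{-0.4, -0.4}
    fmt.Println(roniAccepts(model, good, nil, 0.02)) // true: no negative influence
    fmt.Println(roniAccepts(model, bad, nil, 0.02))  // false: rejected
}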

Information leakage attacks: random verifier peers and differentially private updates using pre-committed noise. Revealing SGD updates to untrusted nodes puts a peer's training data at risk. By observing a peer's updates from each verification round, an adversary can perform an information leakage attack [27] and recover details about the peer's training data.

We prevent attacks during update verification in two ways. First, Biscotti uses verifiable random functions (VRFs) [28] to ensure that malicious peers cannot deterministically be selected to verify a victim's gradient. Second, peers send differentially-private updates [4, 17] to verifier peers (Figure 1, top): before sending a gradient to a verifier, pre-committed ε-differentially private noise is added to the update, masking the peer's gradient in a way that neither the peer nor the attacker can influence or observe. By verifying noised SGD updates, peers in Biscotti can verify the updates of other peers without directly observing their un-noised versions. As shown in Figure 1, a decreasing value of ε provides stronger privacy for a user, at the cost of a higher false positive rate.

Leaking information through the public ledger: secure update aggregation. The goal of a blockchain is to provide complete verifiability and auditing of state in a ledger, but this is counter to the goal of privacy-preserving multi-party ML. For example, information leakage attacks are also possible if SGD updates are stored in the ledger.

The differentially private updates noted above are one piece of the puzzle. The second piece is secure update aggregation: a block in Biscotti does not store updates from individual peers, but rather an aggregate that obfuscates any single peer's contribution during that round of SGD. We use polynomial commitments [23] to hide the values of individual updates before they are aggregated in a block, thereby hiding each update until it is aggregated as part of a block in the ledger (Figure 1, bottom).

The bottom of Figure 1 shows the effect of increasing the number of batched SGD updates. Even when these updates are coming from the same user, it becomes increasingly difficult to perform information leakage on a single instance of a user's SGD update when the number of aggregated examples increases.

Sybil attacks: VRFs based on proof of stake. Adversaries can collude or generate aliases to increase their influence in a sybil attack [16]. For example, an adversary can exploit the majority voting scheme to validate their malicious SGD updates. With an increasing number of peers in the system, an adversary increases their chance of getting malicious SGD updates into the final block.

Inspired by Algorand [19], Biscotti uses three verifiable random functions (VRFs) [28] to select a subset of peers for the different stages of the training process: adding noise to updates, validating an update, and securely aggregating the update. To mitigate the effect of sybils, these VRFs select peers proportionally to peer stake, ensuring that an adversary cannot increase their influence in the system without increasing their total stake.

3 Assumptions and threat model

We assume a public machine learning system that is backed by an auxiliary measure of stake. Stake may be derived from an online data sharing marketplace, a shared reputation score among competing agencies, or auxiliary information on a social network.

Like federated learning, Biscotti assumes that users are willing to collaborate on building ML models among each other, but are unwilling to directly share their data when doing so [26]. Each peer has a private, locally-held training data set that matches the learning goal of the global model, and each peer's dataset has sufficient utility to increase the performance of the global model.

3.1 Design assumptions

Proof of stake. Users in the system are weighed by the value they have in the system [19]. We assume that there is a proof of stake function that is available to all nodes in the system, which takes a peer identifier and returns the fraction of stake that this peer has in the system. Peers accrue stake as they contribute towards building the global model, and lose stake when their proposed updates are rejected (peers use a well-defined stake update mechanism). We assume that at any point in time the majority of stake in the system is honest and that the stake function is properly bootstrapped: upon initialization and onwards, any set of peers that holds a majority of stake in the system is legitimate and will not attempt to subvert the system.
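A minimal sketch in Go of a stake map with the proof of stake function and a linear stake update; the type and method names are illustrative, and the fixed delta mirrors the linear stake update function used in our evaluation (Table 2).

package main

import "fmt"

// StakeMap maps a peer's identifier to its stake in the system.
type StakeMap map[string]float64

// Fraction returns the fraction of total stake held by peer id.
func (s StakeMap) Fraction(id string) float64 {
    total := 0.0
    for _, v := range s {
        total += v
    }
    if total == 0 {
        return 0
    }
    return s[id] / total
}

// Update applies a linear stake update: peers whose update was accepted
// into a block gain delta stake, peers whose update was rejected lose it.
func (s StakeMap) Update(accepted, rejected []string, delta float64) {
    for _, id := range accepted {
        s[id] += delta
    }
    for _, id := range rejected {
        s[id] -= delta
        if s[id] < 0 {
            s[id] = 0
        }
    }
}

func main() {
    stake := StakeMap{"peer1": 10, "peer2": 10, "peer3": 10}
    stake.Update([]string{"peer1"}, []string{"peer3"}, 5)
    fmt.Printf("peer1 fraction: %.2f\n", stake.Fraction("peer1")) // 0.50
}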

Blockchain topology. Each peer is connected to some subset of other peers in a topology that allows flooding or gossip-based dissemination of updates that eventually reach all peers in the system. For example, this could be a random mesh topology with flooding, similar to the one used for transaction and block dissemination in Bitcoin [31].

Machine learning. We assume that ML training parameters are known to all peers: the model, its hyperparameters, its optimization algorithm, the learning goal of the system, and other details (these are distributed in the first block). Peers have local datasets that they wish to keep private and that are sufficiently unique from the rest of the global data. However, a peer's dataset is not so distinct that it cannot be used as the quiz set for validation of other gradients. We assume that the distribution of data between clients in the system is sufficiently uniform that RONI [8] is accurate when used to validate gradients. This is essential for RONI to be used as a defense against poisoning attacks in our setting.

3.2 Attacker assumptions

Peers may be adversarial and send malicious updates to the system to perform a poisoning attack on the shared model or an information leakage attack against a targeted victim peer. In doing so, we assume that an adversary may control multiple peers in the system and collude in a sybil attack [16]. We do not restrict adversaries at the API level: they can provide any arbitrary value when computing, validating, or adding noise to a gradient update. Although adversaries may be able to increase the number of peers they control in the system, we assume that they cannot artificially increase their stake in the system except by providing valid updates that significantly improve the performance of the global model.


When adversaries perform a poisoning attack, we assume that their goal is to influence the final global model's prediction outputs in a way that impacts the performance of the global model on validation sets in an observable manner. Our defense relies on using RONI with local validation datasets to verify the authenticity of gradients: this does not include gradients that are used for specific targeted poisoning attacks [22] that only impact a narrow subset of classes in the model, or attacks on unused parts of the model topology, such as backdoor attacks [6, 20].

When adversaries perform an information leakage attack, we assume that they aim to learn properties of a victim's local dataset. This does not include class-level information leakage attacks on federated learning [21], which attempt to learn the properties of an entire target class. We do not consider the properties of a defined class to be private in our setting, and only consider attacks that are performed by targeting and observing the contributions of an individual target peer.

4 Biscotti design

Biscotti implements peer-to-peer ML based on SGD. For this process, Biscotti's design has the following goals:

• Convergence to an optimal global model
• Peer contributions to the model are verified before being accepted, to prevent poisoning
• Peer training data is kept private and information leakage attacks on training data are prevented
• Colluding peers cannot gain influence without acquiring sufficient stake

Biscotti meets these goals through a blockchain-based design that we describe in this section.

Design overview. Each peer in the system has local data that the peer is incentivized to contribute to the learning procedure, and an amount of preconceived stake in the system. Using Biscotti, peers join a public ledger and collaboratively train a global model. Each block in the distributed ledger represents a single iteration of SGD, and the ledger contains the state of the global model at each iteration. Figure 2 overviews the Biscotti design with a step-by-step illustration of what happens during a single SGD iteration in which a single block is generated.

In each iteration, peers locally compute SGD updates (step 1 in Figure 2). Since SGD updates need to be kept private, each peer first masks their update using differentially private noise. This noise is obtained from a unique set of noising peers for each client, selected by a VRF [28] (steps 2 and 3). By using a unique noising VRF set, Biscotti prevents unmasking of updates through collusion attacks. This noise is committed by the noise providers in advance of the iteration during a bootstrapping phase; therefore, it cannot be used to maliciously mask poisoned updates as valid.

The masked updates are validated by a verification committee (selected using a different VRF, also weighted by peer stake) to defend against poisoning. Each member of the verification committee signs the commitment to the peer's unmasked update if their local RONI check passes (step 4). A majority of the committee must sign an update (step 5) for it to be considered valid. Each unmasked update whose commitment accumulates enough verification signatures is divided into Shamir secret shares (step 6) and given to an aggregation committee (selected using a third weighted VRF) that uses a secure protocol to aggregate the unmasked updates (step 7). Each peer who contributes a share to the final aggregated update receives additional stake in the system, which can be verified through the visibility of commitments in the ledger. The aggregate update is added to the global model to create a newly-generated block, which is added to the ledger: the block is disseminated to all the peers so that the next iteration can take place with an updated global model and stake (step 8).

Next, we describe how we bootstrap the Biscotti training process.

4.1 Initializing the training

Biscotti requires peers to initialize the training process from information distributed in the first (genesis) block. We assume that this block is distributed out of band by a trusted authority and that the information within it is reliable. This block includes key P2P information, like each peer i's public key PKi, and information required for multi-party ML, such as the initial model state w0 and the expected number of iterations T.

Biscotti also requires the initial stake for all peers and the stake update function, which is used to update the weight of peers based on their contributions throughout the learning process.

Lastly, Biscotti requires pre-committed differentially private noise to be committed to the system. This noise is used in the noising protocol of Biscotti (step 2 in Figure 2) and must be pre-committed to prevent information leakage and poisoning attacks.

In summary, each peer that joins Biscotti obtains the following information from the genesis block (a sketch of such a block structure follows this list):

• The commitment public key PK for creating commitments to SGD updates (see Appendix C)
• The public keys PKi of all other peers in the system, which are used to create and verify signatures in the verification stage
• Pre-commitments to T iterations of differentially private noise ζ1..T for each SGD iteration by each peer (see Figure 5 and Appendix B)
• The initial stake distribution among the peers
• A stake update function for updating a peer's stake when a new block is appended
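As a rough illustration, a genesis block carrying this bootstrap information could be represented as follows; this is a sketch in Go with hypothetical field names, not Biscotti's actual block format.

package main

import "fmt"

// GenesisBlock sketches the bootstrap information each peer obtains;
// the field names and types are illustrative.
type GenesisBlock struct {
    CommitmentKey []byte              // public key PK for creating commitments to updates
    PeerKeys      map[string][]byte   // public key PKi of every peer i
    NoiseCommits  map[string][][]byte // per peer: commitments to noise ζ1..ζT, one per iteration
    InitialStake  map[string]float64  // initial stake distribution
    InitialModel  []float64           // w0
    Iterations    int                 // T
}

func main() {
    g := GenesisBlock{
        PeerKeys:     map[string][]byte{"peer1": {0x01}},
        NoiseCommits: map[string][][]byte{"peer1": {{0xaa}, {0xbb}}},
        InitialStake: map[string]float64{"peer1": 10},
        InitialModel: make([]float64, 25),
        Iterations:   100,
    }
    fmt.Println("training for", g.Iterations, "iterations with", len(g.PeerKeys), "peer(s)")
}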


Figure 2: The ordered steps in a single iteration of the Biscotti algorithm.

Figure 3: Block contents at iteration t. Note that wt is computed as wt−1 + ∑u ∆wu, where wt−1 is the global model stored in the block at iteration t−1 and j is the set of verifiers for iteration t.

4.2 Blockchain design

Distributed ledgers are constructed by appending read-only blocks to a chain structure and disseminating blocks using a gossip protocol. Each block maintains a pointer to its previous block as a cryptographic hash of the contents of that block.

Each block in Biscotti contains, in addition to the previous block hash pointer, an aggregate ∑∆wi of SGD updates from multiple peers and a snapshot of the global model wt at iteration t. Each block also contains a list of commitments to each peer i's update, COMM(∆wi), for privacy and verifiability. These commitments provide privacy by hiding the individual updates, yet can be homomorphically combined to verify that the update to the global model by the aggregator, ∑∆wi, was computed honestly. The following equality holds if the list of committed updates equals the aggregate sum:

COMM(∑i ∆wi) = ∏i COMM(∆wi)    (2)

This process continues until the specified number of iterations T is met, upon which the learning process is terminated and each peer extracts the global model from the final block. In summary, each block for an iteration t contains the following (Figure 3):

• The global model wt
• The aggregate ∑∆w, which moves the model from wt−1 to wt
• The commitments to each peer's model update in ∑∆w
• A list of verifier signatures for each commitment, confirming that the update has passed validation
• An updated stake map for all peers in the system
• The SHA-256 hash of the previous block at t−1
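A minimal sketch in Go of the per-iteration block contents listed above, with a SHA-256 pointer to the previous block; the field names and the simplified serialization in Hash are illustrative, not Biscotti's exact wire format.

package main

import (
    "crypto/sha256"
    "fmt"
)

// Block sketches the per-iteration block contents listed above.
type Block struct {
    Iteration    int
    GlobalModel  []float64           // wt
    AggUpdate    []float64           // ∑∆w, moving the model from wt−1 to wt
    Commitments  [][]byte            // COMM(∆wi) for every update in the aggregate
    VerifierSigs map[string][][]byte // verifier signatures per commitment
    StakeMap     map[string]float64  // updated stake for all peers
    PrevHash     [32]byte            // SHA-256 hash of the block at t−1
}

// Hash returns the SHA-256 hash of a (simplified) serialization of the block.
func (b Block) Hash() [32]byte {
    return sha256.Sum256([]byte(fmt.Sprintf("%v", b)))
}

func main() {
    genesis := Block{Iteration: 0, GlobalModel: make([]float64, 25)}
    next := Block{Iteration: 1, PrevHash: genesis.Hash()}
    fmt.Printf("block 1 points to %x...\n", next.PrevHash[:4])
}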

4.3 Using stake for role selection

For each ML iteration in the algorithm, public VRFs designate roles for each peer in the system to prevent colluding peers from dominating the system. These VRFs are weighed based on the stake that peers have in the system.

Each role in the system (noiser, verifier, aggregator) is selected using a different VRF. Peers can hold any or all of the roles in any given iteration. To provide additional privacy, the output of the noiser VRF should be unique to each peer; therefore, it uses a peer i's public key PKi as the random seed in the VRF (more details are given in Section 4.4). The verification and aggregation VRFs are seeded with a global public key and the SHA-256 hash of the previous block to enforce determinism once a block is appended.


Figure 4: Biscotti uses a consistent hashing protocol based on the current stake to determine the roles for each iteration.

For any iteration t, the output of the VRF is globally observed and verifiable. Since an adversary cannot predict the future state of a block until it is created, they cannot speculate on outputs of the VRF and strategically perform attacks.

To assign each role to multiple peers, we use consistent hashing (Figure 4). The initial SHA-256 hash is repeatedly re-hashed: each new hash result is mapped onto a keyspace ring where portions of the keyspace are proportionally assigned to peers based on their stake. This provides the same stake-based property as Algorand [19]: a peer's probability of winning this lottery is proportional to their stake.

Each peer runs each of the VRFs to determine whether they are an aggregator, a verifier, or both during a particular iteration. Each peer that is not an aggregator or verifier also computes their set of noise providers using the noise VRF. This noise VRF set is also accompanied by a proof. When a noising peer receives a noise request, they can use this proof and the peer's public key to verify its correctness, allowing them to determine that they are indeed a noise provider for that particular peer.
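The following is a minimal sketch in Go of the stake-weighted keyspace lottery described above: a seed (e.g., the previous block's SHA-256 hash) is repeatedly re-hashed and each hash is mapped onto a ring whose segments are proportional to stake. The function names are illustrative, and this omits the VRF proofs that make the selection verifiable.

package main

import (
    "crypto/sha256"
    "encoding/binary"
    "fmt"
    "sort"
)

// selectRoles repeatedly hashes a seed and maps each hash onto a [0,1)
// keyspace ring in which every peer owns a segment proportional to its
// stake. It returns n role holders; peers may appear more than once in
// this simplified sketch.
func selectRoles(seed []byte, stake map[string]float64, n int) []string {
    // deterministic, stake-proportional segments of the ring
    ids := make([]string, 0, len(stake))
    total := 0.0
    for id, s := range stake {
        ids = append(ids, id)
        total += s
    }
    sort.Strings(ids)

    var selected []string
    h := sha256.Sum256(seed)
    for len(selected) < n {
        // map the hash to a point on the ring in [0,1)
        point := float64(binary.BigEndian.Uint64(h[:8])) / float64(^uint64(0))
        acc := 0.0
        for _, id := range ids {
            acc += stake[id] / total
            if point < acc {
                selected = append(selected, id)
                break
            }
        }
        h = sha256.Sum256(h[:]) // re-hash for the next slot
    }
    return selected
}

func main() {
    stake := map[string]float64{"peer1": 10, "peer2": 30, "peer3": 60}
    fmt.Println(selectRoles([]byte("hash of previous block"), stake, 3))
}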

4.4 Noising protocol

Sending a peer's SGD update directly to another peer may leak information about their private dataset [27]. To prevent this, peers use differential privacy to hide their updates prior to verification. We adopt our implementation of differential privacy from prior work by Abadi et al. [4], which samples noise from a normal distribution and adds it to SGD updates (see Appendix B for formalisms).

Since the noise is only used in the verification stage, this does not affect the utility of the model once the update is aggregated into the ledger. Biscotti uses secure aggregation to preserve privacy in the aggregation phase, so differential privacy is not required in that stage.

Using pre-committed noise to thwart poisoning. Attackers may maliciously use the noising protocol to execute poisoning or information leakage attacks. For example, a peer can send a poisonous update ∆wpoison and add noise ζp that 'unpoisons' this update into an honest-looking update ∆w, such that ∆wpoison + ζp = ∆w. By doing this, a verifier observes the honest update ∆w, but the poisonous update ∆wpoison is applied to the model.

To prevent this, Biscotti requires that every peer pre-commits the noise vector ζt for every iteration t ∈ [1..T] in the genesis block. Since updates cannot be generated in advance without knowledge of the global model, a peer cannot effectively commit noise that unpoisons an update. Furthermore, Biscotti requires that the noise that is added is taken from a different peer than the one creating the update. This peer is determined using a noising VRF, which further restricts the control that a malicious peer has over the noise used to sneak poisoned updates past verification.

Using a VRF-chosen set of noisers to thwart information leakage. A further issue may occur in which a noising peer A and a verifier B collude in an information leakage attack against a victim peer C. The noising peer A can commit a set of zero noise, which does not hide the original gradient value at all. When the victim peer C sends its noised gradient to the verifier B, B performs an information leakage attack and correlates C's gradient back to its original training data. This attack is viable because the verifier B knows that A is providing the random noise and A provides zero noise.

This attack motivates Biscotti's use of a private VRF that selects a group of noising peers based on the victim C's public key. In doing so, an adversary cannot pre-determine whether their noise will be used in a specific verification round by a particular victim, and also cannot pre-determine if the other peers in the noising committee will be malicious in a particular round. Our results in Figure 12 show that the probability of an information leakage is negligible given a sufficient number of noising peers.

Protocol description. For an ML workload that is expected to run for a specific number of iterations T, each peer i generates T noise vectors ζt and commits these noise vectors into the ledger, storing a table of size N by T (Figure 5). When a peer is ready to contribute an update in an iteration, it runs the noising VRF and contacts each noising peer k, requesting the noise vector ζk pre-committed in the genesis block as COMM(ζk). The peer then uses the verifier VRF to determine the set of verifiers. The peer masks their update using this noise and submits to these verifiers the masked update, a commitment to the noise, and a commitment to the unmasked update. It also submits the noise VRF proof, which attests to the verifiers that its noise is sourced from peers that are part of its noise VRF set.


Figure 5: Peers commit noise to an N by T structure. Each row i contains all the noise committed by a single peer i, and each column t contains potential noise to be used during iteration t. When requesting noise at an iteration t, a peer executes a VRF and requests the noise pre-committed by each selected peer k for that iteration.
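A minimal sketch in Go of pre-committing per-iteration Gaussian noise and masking an update with the noise obtained from the VRF-selected noising peers; the noise scale sigma is illustrative and, unlike Biscotti, this sketch omits the commitments themselves (Appendix B and C).

package main

import (
    "fmt"
    "math/rand"
)

// preCommitNoise generates T noise vectors of dimension d up front
// (one per SGD iteration), drawn from a normal distribution; in Biscotti,
// commitments to these vectors go into the genesis block.
func preCommitNoise(T, d int, sigma float64, rng *rand.Rand) [][]float64 {
    noise := make([][]float64, T)
    for t := 0; t < T; t++ {
        noise[t] = make([]float64, d)
        for j := 0; j < d; j++ {
            noise[t][j] = rng.NormFloat64() * sigma
        }
    }
    return noise
}

// maskUpdate adds the noise vectors obtained from the k VRF-selected
// noising peers to an SGD update before it is sent for verification.
func maskUpdate(update []float64, noiseFromPeers [][]float64) []float64 {
    masked := append([]float64(nil), update...)
    for _, zeta := range noiseFromPeers {
        for j := range masked {
            masked[j] += zeta[j]
        }
    }
    return masked
}

func main() {
    rng := rand.New(rand.NewSource(1))
    update := []float64{0.5, -0.3}
    zeta1 := preCommitNoise(1, 2, 0.1, rng)[0]
    zeta2 := preCommitNoise(1, 2, 0.1, rng)[0]
    fmt.Println("masked update:", maskUpdate(update, [][]float64{zeta1, zeta2}))
}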

4.5 Verification protocol

The verifier peers evaluate the quality of a peer's update with respect to their local dataset and reject the update if it fails their RONI validation. Each verifier receives the following from each peer i:

• The masked SGD update: ∆wi + ∑k ζk
• A commitment to the SGD update: COMM(∆wi)
• The set of k noise commitments: {COMM(ζ1), COMM(ζ2), ..., COMM(ζk)}
• A VRF proof confirming the identity of the k noise peers

When a verifier receives a masked update from another peer, it can confirm that the masked SGD update is consistent with the commitments to the unmasked update and the noise by using the homomorphic property of the commitments [23]. A masked update is legitimate if the following equality holds:

COMM(∆wi + ∑k ζk) = COMM(∆wi) ∗ ∏k COMM(ζk)

The verifier then evaluates the masked SGD update using RONI [8]. In RONI, as long as the isolated update to the model w does not degrade the performance beyond a threshold, the update is accepted and COMM(∆wi) is signed by the verifier.

Verifiers use their local dataset to perform verification and may therefore disagree on the impact of an update. We use a majority voting scheme to accept or reject updates. Once a peer receives a majority of signatures from the verifiers, the update can be disseminated for aggregation.
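To illustrate the homomorphic check above, here is a toy Pedersen-style commitment over a tiny group in Go using math/big. The parameters are deliberately small and insecure, and Biscotti actually uses polynomial commitments [23] over the bn256 curve via kyber, so this is only a sketch of the additive-to-multiplicative property that the verifier relies on.

package main

import (
    "fmt"
    "math/big"
)

// Toy Pedersen commitment COMM(m; r) = g^m * h^r mod p over a tiny group
// (p = 23, subgroup order q = 11). Insecure parameters, for illustration only.
var (
    p = big.NewInt(23)
    q = big.NewInt(11)
    g = big.NewInt(4)
    h = big.NewInt(9)
)

func commit(m, r int64) *big.Int {
    gm := new(big.Int).Exp(g, big.NewInt(m), p)
    hr := new(big.Int).Exp(h, big.NewInt(r), p)
    return new(big.Int).Mod(new(big.Int).Mul(gm, hr), p)
}

func main() {
    // an update value and a noise value, each with a blinding factor
    update, rU := int64(3), int64(5)
    noise, rN := int64(7), int64(2)

    // the verifier's check: COMM(update+noise) equals COMM(update) * COMM(noise)
    sumM := (update + noise) % q.Int64()
    sumR := (rU + rN) % q.Int64()
    left := commit(sumM, sumR)
    right := new(big.Int).Mod(new(big.Int).Mul(commit(update, rU), commit(noise, rN)), p)
    fmt.Println("homomorphic equality holds:", left.Cmp(right) == 0)
}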

4.6 Aggregation protocol

All peers with a sufficient number of signatures from the verification stage aggregate their SGD updates together and apply the aggregate to the global model. The update equation in SGD (see Appendix A) can be re-written as:

wt+1 = wt + ∑i=1..|∆wverified| ∆wi

where ∆wi is the verified SGD update of peer i, ∆wverified is the set of verified updates, and wt is the global model at iteration t.

However, these updates cannot be directly collected at a peer because they contain sensitive information. Hence, they present a privacy dilemma: no peer should observe an update from any other peer, but the updates must eventually be stored in a block.

The objective of the aggregation protocol is to enable a set of m aggregators, predetermined by the VRF function, to create the next block with the updated global model by learning ∑i ∆wi without learning any individual updates. Biscotti uses a technique that preserves the privacy of the individual updates if at least half of the m aggregators participate honestly in the aggregation phase. This guarantee holds if the VRF function selects a majority of honest aggregators, which is likely when the majority of stake is honest.

Biscotti achieves the above guarantees using polynomial commitments (see Appendix C) combined with verifiable secret sharing [39] of individual updates. An update of length d is encoded within a d-degree polynomial, which can be broken down into n shares such that n = 2 ∗ (d + 1). These n shares are distributed equally among the m aggregators. Since an update can be reconstructed using d + 1 shares, it would require m/2 colluding aggregators to compromise the privacy of an individual update. Therefore, given that any majority of aggregators is honest and does not collude, the privacy of an individual peer update is preserved.

Recall that a peer with a verified update already possesses a commitment C = COMM(∆wi(x)) to its SGD update, signed by a majority of the verifiers from the previous step. To compute and distribute its update shares among the aggregators, peer i runs the following secret sharing procedure:

1. The peer computes the required set of secret shares sm,i = {(z, ∆wi(z)) | z ∈ Z} for each aggregator m. In order to ensure that an adversary does not provide shares from a poisoned update, the peer also computes a set of associated witnesses witm,i = {COMM(Ψz(x)) | Ψz(x) = (∆wi(x) − ∆wi(z)) / (x − z)}. These witnesses allow the aggregator to verify that each secret share belongs to the update ∆wi committed to in C. The peer then sends <C, sm,i, witm,i> to each aggregator, along with the signatures obtained in the verification stage.

2. After receiving the above vector from peer i, the aggregator m runs the following sequence of validations:


(a) m ensures that C has passed the validation phase by verifying that it carries the signatures of a majority of the verification set.

(b) m verifies that, for each share (z, ∆wi(z)) ∈ sm,i, ∆wi(z) is the correct evaluation at z of the polynomial committed to in C (for details, see Appendix C).

Once every aggregator has received shares for the minimum number of updates u required for a block, each aggregator sums its individual shares and shares the aggregate with all of the other aggregators. As soon as an aggregator receives the aggregated d + 1 shares from at least half of the aggregators, it can compute the aggregate sum of the updates and create the next block. The protocol to recover ∑j=1..u ∆wj is as follows:

1. All m aggregators broadcast the sum of their accepted shares and witnesses: <∑i=1..u sm,i, ∑j=1..u witm,j>

2. Each aggregator verifies the aggregated shares broadcast by each of the other aggregators by checking the consistency of the aggregated shares and witnesses.

3. Given that m obtains the shares from m/2 aggregators, including itself, m can interpolate the aggregated shares to determine the aggregated secret ∑j=1..u ∆wj.

Once m has computed ∑i=1..u ∆wi, it can create a block with the updated global model. All commitments to the updates and the signature lists that contributed to the aggregate are added to the block. The block is then disseminated in the network. Any peer in the system can verify that all updates were verified by looking at the signature lists, and can homomorphically combine the commitments to check that the update to the global model was computed honestly (see Section 4.2). If any of these conditions are violated, the block is rejected.
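The following is a minimal sketch in Go of the additive property the aggregation protocol relies on, using standard per-scalar Shamir sharing over a prime field: aggregators sum the shares they hold, and interpolating the summed shares reveals only the sum of the secrets. Biscotti's actual scheme encodes the entire update vector in one polynomial with witnesses (Appendix C); the field modulus and example values here are illustrative.

package main

import "fmt"

const prime = 2147483647 // field modulus (2^31 - 1), illustrative only

// share evaluates the polynomial secret + coeffs[0]*x + coeffs[1]*x^2 + ...
// at x = 1..n, producing n Shamir shares with threshold len(coeffs)+1.
// A real implementation samples coeffs at random; they are passed in here for brevity.
func share(secret int64, n int64, coeffs []int64) [][2]int64 {
    shares := make([][2]int64, n)
    for x := int64(1); x <= n; x++ {
        y := secret % prime
        pow := x
        for _, c := range coeffs {
            y = (y + c*pow) % prime
            pow = (pow * x) % prime
        }
        shares[x-1] = [2]int64{x, y}
    }
    return shares
}

// reconstruct recovers the polynomial's value at x = 0 (the secret) from
// threshold-many shares via Lagrange interpolation over the prime field.
func reconstruct(shares [][2]int64) int64 {
    secret := int64(0)
    for i, si := range shares {
        num, den := int64(1), int64(1)
        for j, sj := range shares {
            if i == j {
                continue
            }
            num = num * (prime - sj[0]) % prime
            den = den * (((si[0]-sj[0])%prime + prime) % prime) % prime
        }
        term := si[1] * num % prime
        secret = (secret + term*modInverse(den)%prime) % prime
    }
    return secret
}

// modInverse computes a^-1 mod prime via Fermat's little theorem.
func modInverse(a int64) int64 {
    result, base, exp := int64(1), a%prime, int64(prime-2)
    for exp > 0 {
        if exp&1 == 1 {
            result = result * base % prime
        }
        base = base * base % prime
        exp >>= 1
    }
    return result
}

func main() {
    // two peers share one scalar of their updates among 5 aggregators (threshold 3)
    a := share(42, 5, []int64{17, 5})
    b := share(100, 5, []int64{9, 23})
    // each aggregator sums the shares it received; interpolating any 3 summed
    // shares reveals only the sum 142, never the individual secrets 42 and 100
    summed := make([][2]int64, 3)
    for i := 0; i < 3; i++ {
        summed[i] = [2]int64{a[i][0], (a[i][1] + b[i][1]) % prime}
    }
    fmt.Println("aggregate of updates:", reconstruct(summed))
}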

4.7 Blockchain consensus

Because VRF-computed subsets are globally observable by each peer, and are based only on the SHA-256 hash of the latest block in the chain, ledger forks should rarely occur in Biscotti. For an update to be included in the ledger at any iteration, the same noising/verification/aggregation committees are used. Thus, race conditions between aggregators will not cause forks in the ledger to occur as frequently as in, e.g., Bitcoin [31].

When a peer observes a more recent ledger state through the gossip protocol, it can catch up by verifying that the computation performed is correct: it runs the VRF for the ledger state and verifies the signatures of the designated verifiers and aggregators for each new block.

In Biscotti, each verification and aggregation step occurs only for a specified duration. Any updates that are not successfully propagated in this period of time are dropped: Biscotti does not append stale updates to the model once competing blocks have been committed to the ledger. This synchronous SGD model is acceptable for large scale ML workloads, which have been shown to be tolerant of bounded asynchrony [36]. However, these stale updates could be leveraged in future iterations to lead to faster convergence if their learning rate is appropriately decayed [14]. We leave this as an optimization for future work.

5 Implementation

We implemented Biscotti in 4,500 lines of Go 1.10 and 1,000 lines of Python 2.7 and released it as an open source project. We use Go to handle all networking and distributed systems aspects of our design. We used PyTorch [34] to generate SGD updates and noise during training. By building on the general-purpose API in PyTorch, Biscotti can support any model that can be optimized using SGD. We use the go-python [3] library to interface between Python and Go.

We use the kyber [2] and CONIKS [1] libraries to implement the cryptographic parts of our system. We use CONIKS to implement our VRF function and kyber to implement the commitment scheme and public/private key mechanisms. To bootstrap clients with the noise commitments and public keys, we use an initial genesis block. We used the bn256 curve API in kyber for generating the commitments and public keys that form the basis of the aggregation protocol and verifier signatures. For signing updates, we use the Schnorr signature [38] scheme instead of ECDSA because multiple verifier Schnorr signatures can be aggregated together into one signature [25]. Therefore, our block size remains constant as the verifier set grows.

6 Evaluation

We had several goals when designing the evaluation of our Biscotti implementation. We wanted to demonstrate that (1) Biscotti can be used to successfully train a variety of ML models in a P2P deployment, (2) Biscotti's design scales to accommodate more nodes, (3) Biscotti is robust to poisoning attacks and information leakage, and (4) Biscotti is tolerant to node churn.

For experiments done in a distributed setting, we deployed Biscotti to train an ML model across 20 Azure A4m v2 virtual machines, each with 4 CPU cores and 32 GB of RAM. We deployed a varying number of peers in each of the VMs. We measure the error of the global model against a partitioned validation dataset and ran each experiment for 100 iterations. To evaluate Biscotti's defense mechanisms, we ran known inference and poisoning attacks on federated learning [27, 22] and measured their effectiveness under various attack scenarios and Biscotti parameters. We also evaluated the performance implications of our design by isolating specific components of our system and varying the VRF committee sizes with different numbers of peers.

By default, we execute Biscotti with the parameter values in Table 2. Table 1 shows the datasets, types of models, number of training examples n, and the number of parameters d in the models that we used for our experiments. In local training we were able to train a model on the Credit Card and MNIST datasets with an accuracy of 100% and 96%, respectively.

Dataset       Model Type   Examples (n)   Params (d)
Credit Card   LogReg       21000          25
MNIST         Softmax      60000          7850

Table 1: The datasets, model types, number of training examples (n), and number of model parameters (d) used in our experiments.

Parameter                                Default Value
Privacy budget (ε)                       2
Number of noisers                        2
Number of verifiers                      3
Number of aggregators                    3
Proportion of secret shares needed (u)   0.125
Initial stake                            Uniform, 10 each
Stake update function                    Linear, +/- 5

Table 2: The default parameters used for all experiments in Biscotti, unless otherwise stated.

6.1 Baseline performance

We start by evaluating how Biscotti generalizes to different workloads and model types. We evaluated Biscotti with logistic regression and softmax classifiers. However, due to the general-purpose PyTorch API, we claim that Biscotti can generalize to models of arbitrary size and complexity, as long as they can be optimized with SGD and can be stored in our block structure. We evaluate logistic regression with a credit card fraud dataset from the UC Irvine data repository [15], which uses an individual's financial and personal information to predict whether or not they will default on their next credit card payment. We evaluate softmax with the canonical MNIST [24] dataset, a task that involves predicting a digit based on its image.

We first execute Biscotti in a baseline deployment and compare it to the original federated learning baseline [26]. For this we partitioned the MNIST dataset [24] into 100 equal partitions, each of which was shared with an honest peer on an Azure cluster of 20 VMs, with each VM hosting 5 peers.

Figure 6: Comparing the time to convergence of Biscotti to a federated learning baseline.

Figure 7: Comparing the iterations to convergence of Biscotti to a federated learning baseline.

These Biscotti/Federated Learning peers collaborated on training an MNIST softmax classifier, and after 100 iterations both models approached the global optimum. The convergence rate over time for both systems is plotted in Figure 6, and the same convergence over the number of iterations is shown in Figure 7. In this deployment, Biscotti takes about 8.3 times longer than Federated Learning (50 minutes vs. 6 minutes), yet achieves similar model performance after 100 iterations.

6.2 Performance cost breakdown

To break down the overhead in Biscotti, we deployed Biscotti over a varying number of peers training on MNIST. We captured the total amount of time spent in each of the major phases of our algorithm in Figure 2: collecting the noise from each of the noising clients (steps 2 and 3), executing RONI and collecting the signatures (steps 4 and 5), and securely aggregating the SGD update (steps 6 and 7). Figure 8 shows the breakdown of the total cost per iteration for each stage under a deployment of 40, 60, 80 and 100 nodes. Biscotti spends most of its time in the verification stage. We found this to be a limitation of the go-python library and of performing RONI on models in PyTorch. Optimizing these components remains future work.


Figure 8: Evaluating the breakdown of performance costs in Biscotti (noising, verification, secure aggregation, and total time per iteration).

Figure 9: The average amount of time taken per iteration under an increasing number of noisers, verifiers, or aggregators (N = 3, 5, 10).

6.3 Scaling up VRF sets in Biscotti

We evaluate Biscotti's performance as we change the size of the VRF-computed noiser/verifier/aggregator sets. For this we deploy Biscotti on Azure with the MNIST dataset and a fixed size of 50 peers, and vary only the number of noisers needed for each SGD update, the number of verifiers used for each verification committee, and the number of aggregators used in secure aggregation. Each time one of these sizes was changed, the change was performed in isolation; the rest of the system used the default values in Table 2. Figure 9 plots the average time taken per iteration in each experiment.

Increasing the number of noisers only incurs additional round trip latencies and adds little overhead to the system. As shown in Figure 8, verification adds the largest overhead to our system, with a large performance penalty when 10 verifiers are used. Lastly, the performance of the system improves when the number of aggregators increases. Since aggregators do not contribute updates to the ledger at a given iteration, this reduces the number of contributors, which lowers the aggregate amount of communication and verification in the system. Additionally, increasing the number of aggregators makes it easier for a sufficient number of secret shares to be collected in any given iteration, reducing the amount of time taken to create the block.

Figure 10: Evaluating the impact of churn on the performance of the model. Churn was injected by failing/joining nodes at a pre-determined rate: the peer to fail is sampled uniformly at random, and a new node joins the system at the same rate.

6.4 Training with node churn

A key feature of Biscotti's P2P design is that it is resilient to node churn (node joins and failures). For example, the failure of any Biscotti node does not block or prevent the system from converging. We evaluate Biscotti's resilience to peer churn by performing an MNIST deployment with 50 peers, failing a peer at random at a specified rate. For each specified time interval, a peer is chosen randomly and is killed. In the next time interval, a new peer joins the network and is bootstrapped, maintaining a constant total of 50 peers. Figure 10 shows the rate of convergence for varying failure rates of 4, 2, and 1 peers failing and joining per minute. When a verifier or aggregator fails, Biscotti defaults to the next iteration, which appears as periods of inactivity (flat regions) in Figure 10.

Even with churn, Biscotti is able to make progress towards the global objective, albeit with some delay. Increasing the failure rate does not harm Biscotti's ability to converge; Biscotti is resilient to churn rates of up to 4 nodes per minute (1 node joining and 1 node failing every 15 seconds).

6.5 Biscotti defense mechanisms

We evaluate Biscotti's ability to resolve the challenges presented in Section 2: security (from poisoning), privacy (using secure aggregation), and privacy (using noise).

Defending against a poisoning attack. We deploy Biscotti and federated learning and subject both systems to a poisoning attack while training on the Credit Card dataset. We introduce 48% of peers into the system with a malicious dataset: all labels in their local dataset are flipped. A successful attack results in a final global model that predicts credit card defaults to be normal, and normal credit card bills to be defaults, resulting in a high error. Figure 11 shows the resulting test error while training on federated learning and Biscotti when 48% of the total stake is malicious.


Figure 11: Comparing federated learning and Biscotti's training error across iterations when 48% of the total stake (peers) in the system is malicious.

Figure 12: Evaluating the frequency of information leakages in Biscotti when under attack by a set of colluding adversaries.

In federated learning, no defense mechanisms exist, and thus a poisoning attack creates a tug of war between honest peers and malicious peers. The performance of the model (top line in Figure 11) jostles between the global objective and the poisoning objective, without the model making progress towards either of the objectives.

The combination of RONI verification and the stake update function ensures that honest peers gain influence over time. Each time a malicious client's update is rejected, their stake in the system is penalized. In the above experiments, after 100 iterations (at the end of training), the stake of honest clients went up from 52% to 58%.

Defending against an information leakage attack. We also evaluate a proposed attack on the noising protocol, which aims to de-anonymize peer gradients. This attack is performed when a verifier colludes with several malicious peers. When bootstrapping the system, the malicious peers pre-commit noise that sums to 0. As a result, when a noising VRF selects these noise elements for verification, the masked gradient is not actually masked with any noise, allowing a malicious peer to perform an information leakage attack on the victim.

We evaluated the effectiveness of this attack by deploying Biscotti with 100 peers and varying the proportion of malicious users (and malicious stake) in the system. Each malicious peer is bootstrapped in the system with zero noise, and performs an information leakage whenever they are chosen as a verifier and the total added noise is zero. We measure the probability of a privacy violation occurring in this deployment, and Figure 12 shows the probability of a privacy violation as the proportion of malicious peers increases.

When the number of noisers for an iteration is 3, an adversary requires at least 15% of the stake to successfully unmask an SGD update. This trend continues when 5 noisers are used: over 30% of the stake must be malicious. When the number of noisers is 10 (which adds minimal overhead according to Figure 9), privacy violations do not occur even with 50% malicious stake. By using a stake-based VRF for selecting noising clients, Biscotti prevents adversaries from performing information leakage attacks on other clients unless their proportion of stake in the system is overwhelmingly large.

7 Limitations

RONI limitations. We used RONI for our poisoning detec-tion mechanism in Biscotti. However, for RONI to work ef-fectively, the data distribution among the clients should ide-ally be IID. This ensures that the verifiers in each round arequalified to evaluate the quality of updates by having datafor classes that are included in the update. Applications liketraining a collective facial recognition system where the datahas a bimodal distribution (e.g., every client possesses databelonging to a different class) would not make significantprogress in Biscotti because updates would be frequently re-jected by RONI.

Fortunately, Biscotti is compatible with other poisoningdetection approaches. In particular, Biscotti can integrateany mechanism that can deal with masked updates and doesnot require history of updates (since previous updates areaggregated in the blockchain).

Stake limitations and data diversity. The stake that a client possesses plays a significant role in determining the chance that a node is selected as a noiser/verifier/aggregator. For verifiers, this is motivated by the fact that the more contributions a peer makes to the training, the better qualified it is to evaluate the quality of future updates. However, this naive verifier selection means that some unpoisoned updates might be rejected because a verifier may be insufficiently qualified to evaluate the quality of those particular updates.
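The stake-weighted role assignment described above can be approximated by the sketch below, which picks verifiers with probability proportional to stake. The real protocol derives this randomness from a VRF over ledger state; a seeded generator stands in for it here, and the peer names and stake values are placeholders.

```python
import numpy as np

def select_by_stake(peers, stake, k, seed):
    """Pick k distinct peers with probability proportional to their stake.
    Biscotti uses a VRF for this randomness; a seeded RNG stands in here."""
    rng = np.random.default_rng(seed)
    probs = np.array([stake[p] for p in peers], dtype=float)
    probs /= probs.sum()
    return list(rng.choice(peers, size=k, replace=False, p=probs))

peers = [f"peer{i}" for i in range(10)]
stake = {p: 1.0 + i for i, p in enumerate(peers)}   # later peers hold more stake
print(select_by_stake(peers, stake, k=3, seed=42))  # richer peers are picked more often
```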

In our future work we plan to implement a stake mechanism that is based on the diversity of data that a peer contributes to the system. This data diversity measure would allow a peer to be matched with a verifier with similar data/classes so that the effectiveness of the update can be established more accurately.

Biscotti's use of stake does not resolve fundamental issues with the stake mechanism itself, which we consider to be outside the scope of our design. For example, Biscotti's stake mechanism suffers from the classic problems of nothing-at-stake and long-range attacks.

8 Related work

We review the broader literature on secure, private, and distributed ML. Many solutions defend against poisoning attacks and information leakage attacks separately, but due to the tradeoff between accountability and privacy, no solution is able to defend against both in a P2P setting.

Securing ML. As an alternative defense against poisoning attacks, Baracaldo et al. [7] employ data provenance. The history of training data in the system is tracked, and when an anomalous data source is detected, the system removes information originating from that source. This approach requires infrastructure to track data provenance, but if such infrastructure exists, Biscotti could potentially encode this information into the block structure and retroactively retrain models upon detection of an anomalous source.

AUROR [40] and ANTIDOTE [37] are other systems that defend multi-party ML systems from poisoning attacks. However, these defenses rely on anomaly detection methods, which require centralized access to information in the system and are infeasible in a P2P setting.

Biscotti is designed for the P2P setting: models, training data, and computational resources are not housed in a central location, and there does not exist a single point of control in the training process.

By using RONI, Biscotti increases its false positive rate in detecting malicious updates, but it is able to decentralize the information required to detect poisoning attacks, while providing data and SGD update privacy.

Privacy-preserving ML. Cheng et al. recently proposed using TEEs and privacy-preserving smart contracts to provide security and privacy to multi-party ML tasks [13]. But some TEEs have been found to be vulnerable [42]; Biscotti does not use TEEs.

Other solutions employ two-party computation [30] and encrypt both the model parameters and the training data when performing multi-party ML. However, this assumes a two-party setting, and systems in this space do not scale well to P2P settings.

Differential privacy is another technique that is often applied to multi-party ML systems to provide privacy [18, 4, 41], but this introduces privacy-utility tradeoffs. Since Biscotti does not apply the noised updates to the model, we do not need to consider the privacy-utility tradeoff when performing distributed SGD. However, the value of ε affects the error rate in verification.

Biscotti does not rely on the assumptions made by prior work to provide privacy, and is novel in simultaneously handling poisoning attacks and preserving privacy.

9 Conclusion

The emergence of large-scale multi-party ML workloads and of distributed ledgers for scalable consensus are two rapid and powerful trends; Biscotti's design lies at their confluence. To the best of our knowledge, this is the first system to provide privacy-preserving peer-to-peer ML through a distributed ledger, while simultaneously considering poisoning attacks. And, unlike prior work that uses distributed ledgers to facilitate privacy-preserving ML, Biscotti does not rely on trusted execution environments or specialized hardware. In our evaluation we demonstrated that Biscotti can coordinate a collaborative learning process across 100 peers and produces a final model that is similar in utility to state-of-the-art federated learning alternatives. We also illustrated its ability to withstand poisoning and information leakage attacks, as well as frequent failures and joins (one node joining and one node leaving every 15 seconds).

Acknowledgments

We would like to thank Heming Zhang for his help with bootstrapping the deployment of Biscotti on PyTorch. We would also like to thank Gleb Naumenko, Nico Ritschel, Jodi Spacek, and Michael Hou for their work on the initial Biscotti prototype. This research has been sponsored by the Huawei Innovation Research Program (HIRP), Project No: HO2018085305. We also acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC), 2014-04870.

References

[1] A CONIKS Implementation in Golang. https://github.com/coniks-sys/coniks-go, 2018.

[2] DEDIS Advanced Crypto Library for Go. https://github.com/dedis/kyber, 2018.

[3] go-python. https://github.com/sbinet/go-python, 2018.

[4] ABADI, M., CHU, A., GOODFELLOW, I., MCMAHAN, B., MIRONOV, I., TALWAR, K., AND ZHANG, L. Deep learning with differential privacy. In 23rd ACM Conference on Computer and Communications Security (2016), CCS.

[5] AZARIA, A., EKBLAW, A., VIEIRA, T., AND LIPPMAN, A. MedRec: Using blockchain for medical data access and permission management. OBD '16.

[6] BAGDASARYAN, E., VEIT, A., HUA, Y., ESTRIN, D., AND SHMATIKOV, V. How To Backdoor Federated Learning. ArXiv e-prints (2018).

[7] BARACALDO, N., CHEN, B., LUDWIG, H., AND SAFAVI, J. A. Mitigating poisoning attacks on machine learning models: A data provenance based approach. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security (2017), AISec.

[8] BARRENO, M., NELSON, B., JOSEPH, A. D., AND TYGAR, J. D. The Security of Machine Learning. Machine Learning 81, 2 (2010).

[9] BIGGIO, B., NELSON, B., AND LASKOV, P. Poisoning Attacks Against Support Vector Machines. In Proceedings of the 29th International Conference on Machine Learning (2012), ICML.

[10] BONAWITZ, K., IVANOV, V., KREUTER, B., MARCEDONE, A., MCMAHAN, H. B., PATEL, S., RAMAGE, D., SEGAL, A., AND SETH, K. Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (2017), CCS.

[11] BOTTOU, L. Large-Scale Machine Learning with Stochastic Gradient Descent. COMPSTAT '10. 2010.

[12] CASTRO, M., AND LISKOV, B. Practical byzantine fault tolerance. In Proceedings of the Third Symposium on Operating Systems Design and Implementation (1999), OSDI '99.

[13] CHENG, R., ZHANG, F., KOS, J., HE, W., HYNES, N., JOHNSON, N., JUELS, A., MILLER, A., AND SONG, D. Ekiden: A platform for confidentiality-preserving, trustworthy, and performant smart contract execution. ArXiv e-prints (2018).

[14] DEAN, J., CORRADO, G. S., MONGA, R., CHEN, K., DEVIN, M., LE, Q. V., MAO, M. Z., RANZATO, M., SENIOR, A., TUCKER, P., YANG, K., AND NG, A. Y. Large scale distributed deep networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1 (2012), NIPS.

[15] DHEERU, D., AND KARRA TANISKIDOU, E. UCI machine learning repository, 2017.

[16] DOUCEUR, J. J. The sybil attack. IPTPS '02.

[17] DWORK, C., AND ROTH, A. The Algorithmic Foundations of Differential Privacy. Foundations and Trends in Theoretical Computer Science 9, 3-4 (2014).

[18] GEYER, R. C., KLEIN, T., AND NABI, M. Differentially private federated learning: A client level perspective. NIPS Workshop: Machine Learning on the Phone and other Consumer Devices (2017).

[19] GILAD, Y., HEMO, R., MICALI, S., VLACHOS, G., AND ZELDOVICH, N. Algorand: Scaling byzantine agreements for cryptocurrencies. In Proceedings of the 26th Symposium on Operating Systems Principles (2017), SOSP.

[20] GU, T., DOLAN-GAVITT, B., AND GARG, S. BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain. ArXiv e-prints (2017).

[21] HITAJ, B., ATENIESE, G., AND PEREZ-CRUZ, F. Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (2017), CCS.

[22] HUANG, L., JOSEPH, A. D., NELSON, B., RUBINSTEIN, B. I., AND TYGAR, J. D. Adversarial Machine Learning. In Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence (2011), AISec.

[23] KATE, A., ZAVERUCHA, G. M., AND GOLDBERG, I. Constant-size commitments to polynomials and their applications.

[24] LECUN, Y., BOTTOU, L., BENGIO, Y., AND HAFFNER, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86, 11 (1998).

[25] MAXWELL, G., POELSTRA, A., SEURIN, Y., AND WUILLE, P. Simple schnorr multi-signatures with applications to bitcoin. Cryptology ePrint Archive, Report 2018/068, 2018.

[26] MCMAHAN, H. B., MOORE, E., RAMAGE, D., HAMPSON, S., AND Y ARCAS, B. A. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (2017), AISTATS.

[27] MELIS, L., SONG, C., DE CRISTOFARO, E., AND SHMATIKOV, V. Inference Attacks Against Collaborative Learning. ArXiv e-prints (2018).

[28] MICALI, S., VADHAN, S., AND RABIN, M. Verifiable random functions. In Proceedings of the 40th Annual Symposium on Foundations of Computer Science (1999), FOCS.

[29] MILLER, A., JUELS, A., SHI, E., PARNO, B., AND KATZ, J. Permacoin: Repurposing bitcoin work for data preservation. In Proceedings of the 2014 IEEE Symposium on Security and Privacy (2014), S&P.

[30] MOHASSEL, P., AND ZHANG, Y. SecureML: A system for scalable privacy-preserving machine learning, 2017.

[31] NAKAMOTO, S. Bitcoin: A peer-to-peer electronic cash system.

[32] NARULA, N., VASQUEZ, W., AND VIRZA, M. zkLedger: Privacy-preserving auditing for distributed ledgers. In 15th USENIX Symposium on Networked Systems Design and Implementation (2018), NSDI.

[33] NELSON, B., BARRENO, M., CHI, F. J., JOSEPH, A. D., RUBINSTEIN, B. I. P., SAINI, U., SUTTON, C., TYGAR, J. D., AND XIA, K. Exploiting Machine Learning to Subvert Your Spam Filter. In Proceedings of the 1st Usenix Workshop on Large-Scale Exploits and Emergent Threats (2008), LEET.

[34] PASZKE, A., GROSS, S., CHINTALA, S., CHANAN, G., YANG, E., DEVITO, Z., LIN, Z., DESMAISON, A., ANTIGA, L., AND LERER, A. Automatic differentiation in PyTorch. In NIPS Autodiff Workshop (2017).

[35] PEDERSEN, T. P. Non-interactive and information-theoretic secure verifiable secret sharing. In Proceedings of the 11th Annual International Cryptology Conference (1992), CRYPTO.

[36] RECHT, B., RE, C., WRIGHT, S., AND NIU, F. Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In Advances in Neural Information Processing Systems 24, NIPS. 2011.

[37] RUBINSTEIN, B. I., NELSON, B., HUANG, L., JOSEPH, A. D., LAU, S.-H., RAO, S., TAFT, N., AND TYGAR, J. D. ANTIDOTE: Understanding and Defending Against Poisoning of Anomaly Detectors. In Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement (2009), IMC.

[38] SCHNORR, C. P. Efficient identification and signatures for smart cards. In Advances in Cryptology - CRYPTO '89 Proceedings (1990).

[39] SHAMIR, A. How to share a secret. Communications of the ACM (1979).

[40] SHEN, S., TOPLE, S., AND SAXENA, P. AUROR: Defending against poisoning attacks in collaborative deep learning systems. In Proceedings of the 32nd Annual Conference on Computer Security Applications (2016), ACSAC.

[41] SHOKRI, R., AND SHMATIKOV, V. Privacy-preserving deep learning. In Proceedings of the 2015 ACM SIGSAC Conference on Computer and Communications Security (2015), CCS.

[42] VAN BULCK, J., MINKIN, M., WEISSE, O., GENKIN, D., KASIKCI, B., PIESSENS, F., SILBERSTEIN, M., WENISCH, T. F., YAROM, Y., AND STRACKX, R. Foreshadow: Extracting the keys to the Intel SGX kingdom with transient out-of-order execution. In Proceedings of the 27th USENIX Security Symposium (2018), USENIX SEC.


Appendix A Distributed SGD

Given a set of training data, a model structure, and a proposed learning task, ML algorithms train an optimal set of parameters, resulting in a model that optimally performs this task. In Biscotti, we assume stochastic gradient descent (SGD) [11] as the optimization algorithm.

In federated learning [26], a shared model is updated through SGD. Each client uses their local training data and their latest snapshot of the shared model state to compute the optimal update on the model parameters. The model is then updated, and clients refresh their local snapshot of the shared model before performing a local iteration again. The model parameters w are updated at each iteration t as follows:

w_{t+1} = w_t - \eta_t \left( \lambda w_t + \frac{1}{b} \sum_{(x_i, y_i) \in B_t} \nabla l(w_t, x_i, y_i) \right) \qquad (1a)

where η_t represents a degrading learning rate, λ is a regularization parameter that prevents over-fitting, B_t represents a batch of local training data examples (x_i, y_i) of size b, and ∇l represents the gradient of the loss function.

SGD is a general learning algorithm that can be used to train a variety of models, including deep learning models [11]. A typical heuristic involves running SGD for a fixed number of iterations or halting when the magnitude of the gradient falls below a threshold. When this occurs, model training is considered complete and the shared model state w_t is returned as the optimal model w*.

In a multi-party ML setting, federated learning assumes that clients possess training data that is not identically and independently distributed (non-IID) across clients. In other words, each client possesses a subset of the global dataset with properties distinct from the global distribution.

When performing SGD across clients with partitioned data sources, we redefine the SGD update \Delta_{i,t} w_g of each client i at iteration t to be:

\Delta_{i,t} w_g = \lambda \eta_t w_g + \frac{\eta_t}{b} \sum_{(x, y) \in B_{i,t}} \nabla l(w_g, x, y) \qquad (1b)

where the distinction is that the gradient is computed on a global model w_g, and the gradient steps are taken using a local batch B_{i,t} of data from client i. When all SGD updates are collected, they are averaged and applied to the model, resulting in a new global model. The process then proceeds iteratively until convergence.
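A compact sketch of Equation (1b) and the averaging step follows, using a least-squares gradient in place of a real model's ∇l; the learning rate, regularization constant, batch size, and synthetic data are placeholders chosen only for illustration.

```python
import numpy as np

def local_update(w_g, X, y, eta, lam, b, rng):
    """One client's SGD update (Equation 1b) on a local batch, using a
    least-squares loss gradient as a stand-in for the model's ∇l."""
    idx = rng.choice(len(X), size=b, replace=False)
    Xb, yb = X[idx], y[idx]
    grad = Xb.T @ (Xb @ w_g - yb) / b           # (1/b) Σ ∇l(w_g, x, y)
    return lam * eta * w_g + eta * grad         # λ η_t w_g + (η_t / b) Σ ∇l

def apply_round(w_g, updates):
    """Average the collected client updates and apply them to the global model."""
    return w_g - np.mean(updates, axis=0)

rng = np.random.default_rng(0)
w_g = np.zeros(3)
clients = [(rng.normal(size=(40, 3)), rng.normal(size=40)) for _ in range(5)]
updates = [local_update(w_g, X, y, eta=0.1, lam=1e-3, b=8, rng=rng) for X, y in clients]
w_g = apply_round(w_g, updates)                 # one round of decentralized SGD
```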

To increase privacy guarantees in federated learning, secure aggregation protocols have been added at the central server [10] such that no individual client's SGD update is directly observable by the server or other clients. However, this relies on a centralized service to perform the aggregation and does not provide security against adversarial attacks on ML.

Appendix B Differentially private SGD

Data: Batch size b, learning rate η_t, privacy budget ε, expected update dimension d
Result: Precommitted noise ζ_t for an SGD update, for all iterations t ∈ [1..T]
for iteration t ∈ [1..T] do
    // Sample noise of length d for each expected sample in the batch
    for example i ∈ [1..b] do
        Sample noise ζ_i = N(0, σ²I)
    end
    ζ_t = (η_t / b) ∑_i ζ_i
    Commit ζ_t to the genesis block at column t
end

Algorithm 1: Precommitting differentially private noise for SGD, taken from Abadi et al. [4].

We use the implementation of differential privacy described by Abadi et al. [4], which samples normally distributed noise at each iteration, as shown in Algorithm 1. Each client commits noise to the genesis block for all expected iterations T. Abadi et al.'s solution also requires that gradients be clipped to a maximum norm of 1; we assume this to also be the case when performing SGD in Biscotti.
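A minimal sketch of Algorithm 1 in NumPy follows. The noise standard deviation σ, the learning-rate schedule, and the "commit to the genesis block" step are placeholders (the noise is simply stored in a table indexed by iteration), so this only illustrates the sampling and aggregation, not the commitment mechanism.

```python
import numpy as np

def precommit_noise(T, b, d, etas, sigma, rng):
    """Sample per-example Gaussian noise for every expected iteration t and
    aggregate it as ζ_t = (η_t / b) Σ_i ζ_i, following Algorithm 1.
    Returns a T x d table standing in for the genesis-block commitments."""
    table = np.zeros((T, d))
    for t in range(T):
        per_example = rng.normal(0.0, sigma, size=(b, d))    # ζ_i ~ N(0, σ²I)
        table[t] = (etas[t] / b) * per_example.sum(axis=0)    # ζ_t
    return table

rng = np.random.default_rng(0)
T, b, d = 100, 16, 10
etas = 0.1 / np.sqrt(np.arange(1, T + 1))    # an assumed degrading learning rate
zetas = precommit_noise(T, b, d, etas, sigma=1.0, rng=rng)

update = np.zeros(d)                         # a client's SGD update (placeholder)
noisy_update = update + zetas[0]             # mask the update with ζ_t at iteration 0
```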

This precommitted noise is designed such that a neutral third party can aggregate a client update \Delta_{i,t} w_g from Equation (1b) and precommitted noise ζ_t from Algorithm 1 without any additional information. The noise is generated without any prior knowledge of the SGD update it will be applied to, while retaining the computation and guarantees provided by prior work. The noisy SGD update follows from aggregation:

\Delta_{i,t} w_g + \zeta_t

Appendix C Polynomial Commitments and Verifiable Secret Sharing

The polynomial commitment scheme of Kate et al. [23] allows committing to a secret polynomial for verifiable secret sharing [39]. This allows the committer to distribute secret shares of a secret polynomial among a set of nodes, along with witnesses that prove in zero-knowledge that each secret share belongs to the committed polynomial. The polynomial commitment is constructed as follows:

Given two groups G_1 and G_2 with generators g_1 and g_2 of prime order p, such that there exists an asymmetric bilinear pairing e : G_1 \times G_2 \rightarrow G_T for which the t-SDH assumption holds, a commitment public key (PK) is generated such that PK = \{g, g^{\alpha}, g^{(\alpha)^2}, \ldots, g^{(\alpha)^t}\} \in G_1^{t+1}, where α is the secret key. The committer can create a commitment to a polynomial \phi(x) = \sum_{j=0}^{t} \phi_j x^j of degree t using the commitment PK such that:

COMM(PK, \phi(x)) = \prod_{j=0}^{\deg(\phi)} \left( g^{\alpha^j} \right)^{\phi_j}

Given a polynomial φ(x) and a commitment COMM(φ(x)), it is trivial to verify whether the commitment was generated using the given polynomial or not. Moreover, we can multiply two commitments to obtain a commitment to the sum of the polynomials in the commitments by leveraging their homomorphic property:

COMM(\phi_1(x) + \phi_2(x)) = COMM(\phi_1(x)) \cdot COMM(\phi_2(x))
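The homomorphic property can be checked with a toy discrete-log commitment over a small prime field. This sketch omits pairings, uses insecure toy parameters, and exposes the "secret" α only to build the public key locally; it is purely an illustration of why the product of two commitments commits to the sum of the polynomials.

```python
# Toy check of COMM(φ1 + φ2) = COMM(φ1) * COMM(φ2) over a small prime field.
# Parameters are insecure and illustrative; the real scheme uses a
# pairing-friendly elliptic-curve group.
p = 2_147_483_647           # toy prime modulus (2^31 - 1)
g = 3
alpha = 123_456_789         # "secret" trapdoor, used here only to build the PK
t = 3
pk = [pow(g, pow(alpha, j, p - 1), p) for j in range(t + 1)]   # g^(α^j)

def commit(coeffs):
    """COMM(PK, φ) = Π (g^(α^j))^(φ_j) mod p for coefficients low-to-high."""
    c = 1
    for base, phi_j in zip(pk, coeffs):
        c = (c * pow(base, phi_j, p)) % p
    return c

phi1 = [5, 0, 7, 2]          # φ1(x) = 5 + 7x² + 2x³
phi2 = [1, 4, 0, 3]          # φ2(x) = 1 + 4x + 3x³
lhs = commit([a + b for a, b in zip(phi1, phi2)])
rhs = (commit(phi1) * commit(phi2)) % p
assert lhs == rhs            # commitment to the sum equals product of commitments
```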

Once the committer has generated COMM(φ(x)), it can carry out an (n, t) secret sharing scheme to share the polynomial among a set of n participants, in such a way that in the recovery phase a subset of at least t participants can compute the secret polynomial. All secret shares (i, φ(i)) shared with the participants are evaluations of the polynomial at a unique point i, and each is accompanied by a commitment to a witness polynomial COMM(ψ_i(x)) such that

\psi_i(x) = \frac{\phi(x) - \phi(i)}{x - i}

By leveraging the divisibility property of the two polynomials \{\phi(x), \psi_i(x)\} and the bilinear pairing function e, it is trivial to verify that the secret share comes from the committed polynomial [23]. This is carried out by evaluating whether the following equality holds:

e(C, g_2) \stackrel{?}{=} e\left( wit_{m,i,z}, \frac{g_2^{\alpha}}{g_2^{z}} \right) \cdot e(g_1, g_2)^{\Delta w_i(z)}

If the above holds, then the share is accepted. Otherwise, the share is rejected.
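The share-and-recover portion of the scheme (without the pairing check) can be illustrated with plain polynomial arithmetic: shares are evaluations φ(i), the witness polynomial ψ_i(x) = (φ(x) - φ(i)) / (x - i) always divides with no remainder, and interpolating one more share than the degree of φ recovers the polynomial. The sketch below works over the reals purely to keep it short; the real construction operates in a finite field.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Toy illustration over the reals (the real scheme works in a finite field).
phi = np.array([7.0, 2.0, 5.0])           # φ(x) = 7 + 2x + 5x², degree 2

# Shares are evaluations of φ at distinct points i = 1..n.
points = np.arange(1, 6, dtype=float)      # n = 5 participants
shares = P.polyval(points, phi)

# Witness polynomial ψ_i(x) = (φ(x) - φ(i)) / (x - i) divides evenly.
i = points[0]
quotient, remainder = P.polydiv(P.polysub(phi, [shares[0]]), [-i, 1.0])
assert np.allclose(remainder, 0.0)

# Interpolating deg(φ) + 1 = 3 shares recovers φ exactly.
recovered = P.polyfit(points[:3], shares[:3], deg=2)
assert np.allclose(recovered, phi)
```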

This commitment scheme is unconditionally binding and computationally hiding, provided the Discrete Logarithm assumption holds [23]. The scheme can be extended to provide unconditional hiding by combining it with Pedersen commitments [35]; however, we do not implement this extension.
