Robust Network Supercomputing with Malicious Processes (Reliably Executing Tasks Upon Estimating the Number of Malicious Processes) Kishori M. Konwar* Sanguthevar Rajasekaran Alexander A. Shvartsman *Computer Science & Engineering Department University of Connecticut Storrs, CT

Post on 12-Jan-2016


Robust Network Supercomputing with Malicious Processes

(Reliably Executing Tasks Upon Estimating the Number of Malicious Processes)

Kishori M. Konwar*, Sanguthevar Rajasekaran, Alexander A. Shvartsman

*Computer Science & Engineering Department, University of Connecticut, Storrs, CT

Motivation

Internet supercomputing is increasingly becoming a powerful tool for harnessing massive amounts of computational resources:

  High-bandwidth Internet connections are widely available.

  There is an enormous number of processes around the world.

  It comes at a cost substantially lower than acquiring a supercomputer or building a cluster of powerful machines.

[Figure: a Master process connected to a group of Workers]

[Figure: a Master process and its Workers, labeled TASKS]

[Figure: a Master process connected to a group of Workers]

PrimeNet Server

PrimeNet Server is a distributed, massively parallel scientific computing Internet supercomputer.

Supported by Entropia.com, it ranks among the most powerful computers in the world.

The project comprises about 30,000 PCs and laptops.

It currently sustains 22,296 billion floating-point operations per second (gigaflops), i.e., operations that involve fractional numbers.

SETI@home

The SETI@home project is a massive distributed cooperative computer.

It is used to analyze gigabytes of data for the Search for Extraterrestrial Intelligence (SETI).

It comprises millions of voluntary machines around the world.

The SETI@home project reported its speed to be more than 57,290 billion floating-point operations per second.

Reliability Issues

The master and perhaps certain workers are reliable:

  they correctly execute the tasks assigned by the server.

However, workers are commonly unreliable:

  they may return incorrect results to the master due to unintended failures caused, e.g., by over-clocked processors;

  they may deceivingly claim to have performed assigned work so as to obtain an incentive, such as a higher rank.

[Figure: a Master process connected to a group of Workers]

Some Previous Studies

[FGLS05] assumed that the worker processes might act maliciously and hence deliberately return wrong results.

  The goal is to design algorithms that enable the master to accept correct results with high probability at a lower cost.

  They provided a randomized algorithm.

  Unfortunately, the cost complexity results depend on several parameters and are hard to interpret.

Some Previous Studies (cont'd)

[GM05] considered the problem of maximizing the expected number of correct results.

  The tasks are dependent.

  Any worker computes correctly with probability p < 1.

  Any incorrectly computed task corrupts all dependent tasks.

  The goal is to compute a schedule that maximizes the expected number of correct results under a given time constraint.

  They showed the optimization problem to be NP-hard.

  They provided some solutions on a restricted DAG.

Overview

  Models of Computation

  Stopping Rule Algorithm based solution

  Detection of Faulty Processors

  Performing Tasks with Faulty Workers

  Conclusions

Models of Computation

Processes take steps in lock step, i.e., in synchrony.

Processes communicate by exchanging messages.

The tasks are independent and idempotent.

Processes are subject to failures and can maliciously return incorrect results.

There are workers $P = \{1, 2, \ldots, n\}$ and a master $M$.

Work Complexities

Available processor steps [CDS01]: work complexity counts all steps taken by processes during the execution of the algorithm, including the steps of idling and waiting non-faulty processes.

Task-oriented work [DHW92]: work is the number of performed tasks, counting multiplicities. This approach does not charge for idling and waiting.
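The difference between the two measures can be made concrete with a small sketch. The `Step` record and the two function names below are hypothetical, introduced only for illustration: each record is one synchronous step of one process, flagged by whether a task was actually performed in that step.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One synchronous step of one process (hypothetical trace record)."""
    process: int
    performed_task: bool  # False means the process was idling or waiting


def available_processor_steps(trace):
    # Charges for every step, including idling and waiting steps.
    return len(trace)


def task_oriented_work(trace):
    # Charges only for performed tasks, counted with multiplicity.
    return sum(1 for step in trace if step.performed_task)


trace = [
    Step(process=1, performed_task=True),
    Step(process=2, performed_task=False),  # process 2 waits in this step
    Step(process=1, performed_task=True),
    Step(process=2, performed_task=True),
]
print(available_processor_steps(trace), task_oriented_work(trace))
```

On this trace the first measure charges for all four steps, while the second charges only for the three steps in which a task was performed.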

A Few Comments

We say that an event $E$ occurs with high probability (w.h.p.) to mean that $\Pr[E] = 1 - O(n^{-\alpha})$ for some constant $\alpha > 0$.

Modeling Failures

Failure model $F_a$

An $f$-fraction, $0 < f < 1/2$, of the $n$ workers may fail.

Each possibly faulty worker independently exhibits faulty behavior with probability $0 < p < 1/2$.

The master has no a priori knowledge of $f$ and $p$.

Modeling Failures (cont'd)

Failure model $F_b$

There is a fixed bound on the $f$-fraction, $0 < f < 1/2$, of the $n$ workers that can be faulty.

Any worker from the remaining $(1-f)$-fraction of the workers fails with probability $0 < p < 1/2$, independently of other workers.

The master knows the values of $f$ and $p$.
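To compare the two failure models side by side, here is a minimal simulation sketch. The function names and the string responses are hypothetical. One modeling choice is an assumption rather than something the slides state: under the second model, workers in the bounded faulty set are treated as always answering wrongly, which is the worst case.

```python
import random


def sample_response_fa(worker, possibly_faulty, p, rng):
    """First model: only workers in the possibly-faulty set misbehave,
    and each of their responses is wrong independently with probability p."""
    if worker in possibly_faulty and rng.random() < p:
        return "wrong"
    return "correct"


def sample_response_fb(worker, faulty, p, rng):
    """Second model: workers in the faulty set are assumed always wrong
    (worst case, an assumption); every other worker fails independently
    with probability p."""
    if worker in faulty:
        return "wrong"
    return "wrong" if rng.random() < p else "correct"


rng = random.Random(0)
n, f, p = 20, 0.25, 0.3
bad = set(rng.sample(range(n), int(f * n)))  # the f-fraction, chosen at random
print([sample_response_fa(w, bad, p, rng) for w in range(n)])
print([sample_response_fb(w, bad, p, rng) for w in range(n)])
```

The practical difference is where the randomness sits: in the first model a worker outside the set never errs, while in the second every worker can err and the master compensates by knowing $f$ and $p$.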

Algorithmic Template

procedure for master process M, task T

  Choose a set $S \subseteq P$

  Send task T to each processor $p \in S$

  Wait for the results from the processes in S

  Decide on the result value v from the responses

procedure for worker $w \in P$

  Wait to receive a task from master M

  Upon receiving a task from M

    Execute the task

    Send the result to M
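The template can be sketched as a single-machine simulation. Everything named here is hypothetical: workers are a dictionary from identifier to a "faulty" flag, a faulty worker returns a tagged bogus value, and the decision rule is a plain majority vote. The template itself leaves both the choice of S and the decision rule open, so the majority vote is an assumption.

```python
import random
from collections import Counter


def worker(task, faulty=False):
    """Hypothetical worker: runs the task, or returns a bogus value if faulty."""
    result = task()
    return ("bogus", result) if faulty else result


def master(task, workers, sample_size, rng):
    chosen = rng.sample(sorted(workers), sample_size)               # choose S from P
    responses = [worker(task, faulty=workers[w]) for w in chosen]   # send T, wait
    value, _count = Counter(responses).most_common(1)[0]            # decide v (majority)
    return value


workers = {i: i < 2 for i in range(9)}  # workers 0 and 1 are faulty
print(master(lambda: 42, workers, sample_size=9, rng=random.Random(1)))
```

With all nine workers sampled, seven honest answers outvote two bogus ones, so the master accepts the correct value.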

$(\epsilon, \delta)$-approximation algorithm

$Z$ is a random variable distributed in the interval $[0,1]$ with mean $\mu_Z$.

$Z_1, Z_2, Z_3, \ldots$ are independently and identically distributed according to the random variable $Z$.

An $(\epsilon, \delta)$-approximation algorithm, with $0 < \epsilon < 1$, $\delta > 0$, for estimating $\mu_Z$ satisfies

$\Pr[\mu_Z(1-\epsilon) \le \tilde{\mu}_Z \le \mu_Z(1+\epsilon)] > 1 - \delta$

where $\tilde{\mu}_Z$ is the estimated value of $\mu_Z$.

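Stopping Rule Algorithm

[Dagum, Karp, Luby, and Ross 1995]

Input parameters: $(\epsilon, \delta)$ with $0 < \epsilon < 1$, $\delta > 0$

Let $\Upsilon_1 = 1 + (1+\epsilon)\Upsilon$ // $\lambda = e - 2 \approx 0.72$ and $\Upsilon = 4\lambda \log(2/\delta)/\epsilon^2$

Initialize $N \leftarrow 0$, $S \leftarrow 0$

While $S < \Upsilon_1$ do: $N \leftarrow N+1$, $S \leftarrow S + Z_N$

Output: $\tilde{\mu}_Z \leftarrow \Upsilon_1/N$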
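The Stopping Rule Algorithm of Dagum, Karp, Luby, and Ross is short enough to sketch directly. The function name `stopping_rule` and the `sample` callback are hypothetical, and the logarithm is taken to be the natural logarithm, which is an assumption about the slide's `log`. The loop only terminates when the mean is positive, matching the requirement in the accompanying theorem.

```python
import math
import random


def stopping_rule(sample, eps, delta):
    """Estimate the mean of a [0, 1]-valued random variable.

    sample() must return one independent draw in [0, 1].
    Returns (estimate, number_of_draws).
    """
    if not (0 < eps < 1 and delta > 0):
        raise ValueError("need 0 < eps < 1 and delta > 0")
    lam = math.e - 2                                   # about 0.72
    upsilon = 4 * lam * math.log(2 / delta) / eps**2   # natural log assumed
    upsilon1 = 1 + (1 + eps) * upsilon
    n, s = 0, 0.0
    while s < upsilon1:   # keep sampling until the running sum reaches the threshold
        n += 1
        s += sample()
    return upsilon1 / n, n


rng = random.Random(7)
# Example use: estimate the probability that a coin with bias 0.3 shows heads.
estimate, draws = stopping_rule(lambda: float(rng.random() < 0.3), eps=0.2, delta=0.05)
print(estimate, draws)
```

The number of draws adapts to the unknown mean: the smaller the mean, the longer the running sum takes to reach the threshold, which is what lets a master estimate an unknown failure rate without fixing the sample size in advance.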

Stopping Rule Theorem

Theorem (Stopping Rule Theorem) [Dagum, Karp, Luby, and Ross]

Let $Z$ be a random variable in $[0,1]$ with $\mu_Z = E[Z] > 0$. Let $\tilde{\mu}_Z$ be the estimate produced and let $N_Z$ be the number of experiments that SRA runs with respect to $Z$ on input $\epsilon$ and $\delta$. Then,

(i) $\Pr[\mu_Z(1-\epsilon) \le \tilde{\mu}_Z \le \mu_Z(1+\epsilon)] > 1 - \delta$,

(ii) $E[N_Z] \le \Upsilon_1/\mu_Z$, and

(iii) $\Pr[N_Z > (1+\epsilon)\Upsilon_1/\mu_Z] \le \delta/2$
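As a worked illustration of parts (ii) and (iii), take the hypothetical values $\epsilon = 0.5$, $\delta = 0.1$, and a true mean of $0.25$. None of these numbers comes from the slides, and the logarithm is assumed to be natural.

```latex
\begin{align*}
\Upsilon &= \frac{4(e-2)\ln(2/\delta)}{\epsilon^{2}}
          = \frac{4(e-2)\ln 20}{0.25} \approx 34.43,\\
\Upsilon_1 &= 1 + (1+\epsilon)\,\Upsilon \approx 1 + 1.5 \times 34.43 \approx 52.64,\\
E[N_Z] &\le \frac{\Upsilon_1}{\mu_Z} \approx \frac{52.64}{0.25} \approx 210.6,\\
\Pr\!\left[N_Z > (1+\epsilon)\frac{\Upsilon_1}{\mu_Z}\right]
       &\approx \Pr[N_Z > 315.9] \le \frac{\delta}{2} = 0.05 .
\end{align*}
```

So for these parameters roughly two hundred samples are expected, and more than 315 are needed with probability at most 5%.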

Algorithm $A_{f,p}$ to estimate $f$ and $p$

Work Complexity of $A_{f,p}$

Theorem: Algorithm $A_{f,p}$ is an $(\epsilon, \delta)$-approximation algorithm, $0 < \epsilon < 1$, $\delta > 0$, for the estimation of $f$ and $p$ with work complexity $O(\log^2 n)$ in one work measure and $O(n \log n)$ in the other, message complexity $O(\log^2 n)$, and time complexity $O(\log n)$, with high probability.

Detection of Faulty Processors

Lemma: It is not possible to perform all $n$ tasks correctly in the failure model $F_a$ with linear work complexity (i.e., $O(n)$) with high probability.

Detection of Faulty Processors

procedure for master process M

  Initially, $F \leftarrow \emptyset$

  For $t = 0, \ldots, k \log n$, $k > 0$

    Choose a set $S \subseteq P \setminus F$

    Send each process $p \in S$ a "test" task

    Wait for the results from the processes in S

    If the response is faulty

      $F \leftarrow F \cup \{p : p \text{ is a faulty process}\}$

    End If

  End For
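The detection loop can be simulated as follows. This is a sketch under stated assumptions, not the authors' implementation: the function name is hypothetical, the master is assumed to know the correct answer to every test task, S is taken to be all of the workers not yet flagged, and a possibly faulty worker answers wrongly with probability p in each round.

```python
import math
import random


def detect_faulty(n, possibly_faulty, p, k, rng):
    """Return the set F of workers caught answering a test task wrongly."""
    detected = set()
    rounds = max(1, math.ceil(k * math.log(n)))   # about k log n rounds
    for _ in range(rounds):
        for w in range(n):
            if w in detected:
                continue                           # S is drawn from P minus F
            answered_wrongly = w in possibly_faulty and rng.random() < p
            if answered_wrongly:                   # master checks against the known answer
                detected.add(w)
    return detected


rng = random.Random(3)
bad = {2, 5, 11}
found = detect_faulty(n=64, possibly_faulty=bad, p=0.4, k=3, rng=rng)
print(sorted(found), found <= bad)
```

A worker is only ever flagged after a wrong answer, so the detected set never contains an honest worker. A faulty worker escapes only if it answers correctly in every round, which happens with probability $(1-p)^{\text{rounds}}$.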

Detection of Faulty Processors

Lemma: The algorithm detects all faulty processes among the $n$ workers in $O(\log n)$ time with $O(n)$ work, with high probability.

Theorem [Karp 04]: Suppose that $a(x)$ is a non-decreasing, continuous function that is strictly increasing on $\{x \mid a(x) > 0\}$, and $m(x)$ is a continuous function. Then for every positive real $x$ and every positive integer $t$,

$\Pr[T(x) > u(x) + t\,a(x)] \le (m(x)/x)^t$

where $u(x)$ is the solution to the equation $u(x) = a(x) + u(m(x))$, with $m_0(x) := x$ and $m_{i+1}(x) := m(m_i(x))$.
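To see how the tail bound is used, consider a hypothetical instance that does not come from the slides. Assume $T(x)$ satisfies a recurrence of the form $T(x) = a(x) + T(h(x))$ with $E[h(x)] \le m(x)$, which is the setting in which Karp's bound is usually stated (the slide does not show the recurrence, so this is an assumption). Take $a(x) = x$ and $m(x) = x/2$, meaning each round costs the current problem size and halves it in expectation.

```latex
\begin{align*}
u(x) &= a(x) + u(m(x)) = x + u(x/2)
      \;\Longrightarrow\; u(x) = x + \tfrac{x}{2} + \tfrac{x}{4} + \cdots = 2x,\\
\Pr[T(x) > u(x) + t\,a(x)] &= \Pr[T(x) > (2+t)\,x]
      \le \left(\frac{m(x)}{x}\right)^{t} = 2^{-t}.
\end{align*}
```

Choosing $t = c \log_2 n$ makes the failure probability $n^{-c}$, which has the shape of a high-probability bound.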

Performing Tasks under $F_a$

procedure for master process M:

  Initially, $C \leftarrow \emptyset$, $J \leftarrow$ set of $n$ tasks

  Randomly choose a set, possibly with repetition, $S \subseteq P$, $|S| = kn/\log n$ workers, where $k > 0$ is a constant

  For $i = 1, \ldots, k' \log n$, $k' > 0$

    Send to each worker $p \in S$ a "test" task

    Collect the responses from all the workers

  End For

  If all the responses from a worker $p \in S$ are correct then

    $C \leftarrow C \cup \{p\}$

  End If

  For $i = 1, \ldots, n/|C|$

    Send $|C|$ jobs from $J$, not sent in previous iterations, one to each worker in $C$

    Collect the responses from the $C$ workers

  End For
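The two phases of this procedure, certifying a sample of workers with test tasks and then distributing the real jobs over the certified set, can be sketched as below. The function name and return shape are hypothetical. The sketch assumes the master can check test answers, that $n \ge 2$, and it uses natural logarithms for both $kn/\log n$ and $k' \log n$. It returns which worker each job is sent to rather than simulating the job results.

```python
import math
import random


def perform_tasks_fa(num_tasks, n, possibly_faulty, p, k, k_prime, rng):
    """Certify a random sample of workers, then assign jobs round-robin to them."""
    size = max(1, int(k * n / math.log(n)))
    sample = sorted({rng.randrange(n) for _ in range(size)})   # chosen with repetition
    rounds = max(1, math.ceil(k_prime * math.log(n)))

    def fails_test(w):
        # A possibly faulty worker answers one test wrongly with probability p.
        return w in possibly_faulty and rng.random() < p

    # C: workers whose responses were correct in every test round.
    certified = [w for w in sample
                 if not any(fails_test(w) for _ in range(rounds))]
    if not certified:
        raise RuntimeError("no sampled worker passed every test")

    # Batches of |C| jobs, one job per certified worker in each iteration.
    assignment = {t: certified[t % len(certified)] for t in range(num_tasks)}
    return certified, assignment


certified, assignment = perform_tasks_fa(
    num_tasks=20, n=32, possibly_faulty={0, 1, 2}, p=0.4, k=4, k_prime=2,
    rng=random.Random(0))
print(len(certified), assignment[0], assignment[19])
```

Testing only a sample of about $n/\log n$ workers for about $\log n$ rounds keeps the testing cost linear in $n$, which is the design point the procedure appears to aim at.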

Work and Time Complexities

Theorem: The algorithm performs all $n$ tasks correctly in $O(\log n)$ time and has $O(n)$ complexity in both work measures, with high probability.

Performing Tasks under $F_b$

procedure for master process M

  For $t = 0, \ldots, k \log n$, $k > 0$

    Choose a random permutation $\pi \in_R S_n$

    Foreach $j \in [n]$

      Send the $j$-th task to processor $\pi(j)$

    End For

    Collect the responses from all the workers

  End For

  Foreach $j \in [n]$

    Choose the majority of the results of computation for the $j$-th task as the result

  End For
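A minimal simulation of this repeated-permutation scheme with majority voting is sketched below. The function name, the `truth` callback standing in for the correct result of task j, and the choice to model every wrong answer as the same value `None` (the worst case, where all wrong answers agree) are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def perform_tasks_fb(n, faulty, p, k, rng, truth=lambda j: j * j):
    """Run about k log n rounds; each round sends task j to a random worker."""
    rounds = max(1, math.ceil(k * math.log(n)))
    votes = [Counter() for _ in range(n)]
    for _ in range(rounds):
        perm = list(range(n))
        rng.shuffle(perm)                        # random permutation of the workers
        for j in range(n):
            w = perm[j]                          # task j goes to worker perm[j]
            wrong = w in faulty or rng.random() < p
            votes[j][None if wrong else truth(j)] += 1
    # The majority value among the collected results is taken as the result.
    return [votes[j].most_common(1)[0][0] for j in range(n)]


rng = random.Random(5)
results = perform_tasks_fb(n=16, faulty={0, 1, 2}, p=0.1, k=6, rng=rng)
print(sum(r == j * j for j, r in enumerate(results)), "of 16 tasks correct")
```

Because a fresh permutation is drawn in every round, no task is stuck with the same faulty worker, so each task's votes are mostly from honest workers when $(1-f)(1-p) > 1/2$.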

Work and Time Complexities

Theorem: The algorithm performs all $n$ tasks correctly in $O(\log n)$ time and has $O(n \log n)$ complexity in both work measures, for $0 < p, f < 1/2$ and $(1-f)(1-p) > 1/2$, with high probability.
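One standard way to see why the condition $(1-f)(1-p) > 1/2$ is enough is a Hoeffding-style bound on the majority vote. The sketch below is not taken from the slides. It assumes $r = k \ln n$ rounds, that the worker handling a fixed task is uniformly random and independent across rounds, and it introduces the names $q$ (a lower bound on the per-round probability of a correct answer) and $X$ (the number of correct answers for that task).

```latex
\begin{align*}
q &= (1-f)(1-p) > \tfrac12, \qquad E[X] \ge rq,\\
\Pr[\text{majority wrong for one task}]
  &\le \Pr\!\left[X \le \tfrac{r}{2}\right]
   \le \exp\!\left(-2r\left(q-\tfrac12\right)^{2}\right)
   = n^{-2k\,(q-1/2)^{2}},\\
\Pr[\text{some task wrong}]
  &\le n \cdot n^{-2k\,(q-1/2)^{2}} = n^{\,1-2k\,(q-1/2)^{2}} .
\end{align*}
```

Any constant $k > 1/\bigl(2(q-1/2)^2\bigr)$ makes the last bound polynomially small, and $n$ tasks over $O(\log n)$ rounds account for the $O(n \log n)$ work.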

Conclusions

Perform tasks under the above models where the tasks are dependent.

The dependency graph can be a DAG.

Quantify work and time complexities in terms of some characteristics of the DAG.