lecture 9-cs648-2013 randomized algorithms


Upload: anshul-yadav

Post on 26-Jun-2015


Page 1: Lecture 9-cs648-2013 Randomized Algorithms

Randomized Algorithms
CS648

Lecture 9
Random Sampling
part-I (Approximating a parameter)

Page 2: Lecture 9-cs648-2013 Randomized Algorithms

Overview of the Lecture

• Randomization framework for estimation of a parameter
  1. Number of balls in a bag
  2. Size of transitive closure of a directed graph

• An inspirational problem from continuous probability

Page 3: Lecture 9-cs648-2013 Randomized Algorithms

AN INSPIRATIONAL PROBLEM FROM CONTINUOUS PROBABILITY

Page 4: Lecture 9-cs648-2013 Randomized Algorithms

Question: n points are selected uniformly at random and independently from the interval [0,1]. What is the expected value of the smallest number?

We shall solve many problems dealing with random points in the interval [0,1] in this course, but we won't require any knowledge of continuous probability theory. All we shall require is the following fact, which is quite obvious:

P(point belongs to an interval of length p) = p

[Figure: the interval [0,1] with sub-intervals of length p marked]

Page 5: Lecture 9-cs648-2013 Randomized Algorithms

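Sampling n points on a line segment

[Figure: the segment [0,1] cut by the n points into consecutive pieces of lengths X_1, X_2, X_3, X_4, …, X_n, X_{n+1}]

Question: What is E[X_i]?
Answer: It appears to depend upon i.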

Page 6: Lecture 9-cs648-2013 Randomized Algorithms

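Sampling n points on a circle (of circumference 1)

[Figure: a circle cut by the n points into arcs of lengths X_1, X_2, X_3, X_4, …, X_{n−1}, X_n]

Question: What is E[X_i]?

By symmetry of the circle, each X_i has an identical probability distribution.
E[X_1] = E[X_2] = … = E[X_n]
E[X_1 + X_2 + … + X_n] = 1
E[X_i] = 1/n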

Page 7: Lecture 9-cs648-2013 Randomized Algorithms

Transforming a line segment to a circle (just a different perspective)

The knot formed by joining the ends of the line segment

Give the knot a uniformly random rotation around the circle

Page 8: Lecture 9-cs648-2013 Randomized Algorithms

Transforming a line segment to a circle (just a different perspective)

Selecting n points uniformly at random on a unit line segment
is the same as
selecting n+1 points uniformly at random on a circle of circumference 1.

The first uniformly random point is the knot.
The next n points are the usual points on the line segment.

[Figure: a circle cut into arcs of lengths X_1, X_2, X_3, X_4, …, X_n, X_{n+1}]

Page 9: Lecture 9-cs648-2013 Randomized Algorithms

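We have got the answer to the problem (without any knowledge of continuous probability theory)

Question: What is E[X_i]?

E[X_1] = … = E[X_i] = … = E[X_{n+1}] = 1/(n+1)

In particular, the expected value of the smallest of the n points is E[X_1] = 1/(n+1).

[Figure: the segment [0,1] cut into pieces of lengths X_1, X_2, X_3, X_4, …, X_n, X_{n+1}]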
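The symmetry argument can be sanity-checked numerically. The following is a minimal simulation sketch, not part of the lecture; the function name `mean_smallest`, the trial count, and the seed are illustrative choices.

```python
import random

def mean_smallest(n, trials=20000, seed=0):
    """Average, over many trials, of the smallest of n uniform points in [0,1]."""
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    total = 0.0
    for _ in range(trials):
        total += min(rng.random() for _ in range(n))
    return total / trials

if __name__ == "__main__":
    for n in (1, 4, 9):
        # The empirical mean should be close to 1/(n+1).
        print(n, round(mean_smallest(n), 3), 1 / (n + 1))
```

The empirical averages should land near 1/(n+1), which is what the circle argument predicts.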

Page 10: Lecture 9-cs648-2013 Randomized Algorithms
Page 11: Lecture 9-cs648-2013 Randomized Algorithms

ESTIMATING THE NUMBER OF BALLS IN A BAG

Page 12: Lecture 9-cs648-2013 Randomized Algorithms

Estimating the number of Balls in a BAG

• There is a bag containing N balls.
• N, the number of balls, is unknown.
• Each ball has a unique label from [1, N].

AIM: To estimate N accurately and with high probability.

For example:“Report a number such that with probability at least 99%,

TOOL: Sampling

[Figure: a bag of balls, each carrying a distinct label]

Page 13: Lecture 9-cs648-2013 Randomized Algorithms

Estimating the number of Balls in a BAG

IDEA: The label of a sample ball provides some info.

X: random variable for the label of a ball sampled uniformly at random from the bag.
Question: What is E[X]?
Answer: E[X] = (N+1)/2


Can we use it to design an algorithm ?

Page 14: Lecture 9-cs648-2013 Randomized Algorithms

Estimating the number of Balls in a BAG

A simple algorithm:
1. Pick a ball uniformly at random from the bag.
2. Let i be its label.
3. Set the estimate to 2i.
4. Report the estimate.

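A rough sketch of the single-sample estimator follows, assuming the estimate is twice the sampled label (suggested by E[X] being about N/2). The names `estimate_once` and `underestimate_rate` are hypothetical; the second function measures how often the estimate falls below N/2.

```python
import random

def estimate_once(n_balls, rng):
    """Sample one label uniformly from [1, N] and report twice its value."""
    label = rng.randint(1, n_balls)    # randint is inclusive on both ends
    return 2 * label

def underestimate_rate(n_balls, trials=20000, seed=1):
    """Fraction of runs in which the estimate is below N/2."""
    rng = random.Random(seed)
    bad = sum(estimate_once(n_balls, rng) < n_balls / 2 for _ in range(trials))
    return bad / trials

if __name__ == "__main__":
    # With one sample the estimate is below N/2 roughly a quarter of the time.
    print(underestimate_rate(1000))
```

A single sample is therefore far from a high-probability guarantee, which motivates repeated sampling.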

Page 15: Lecture 9-cs648-2013 Randomized Algorithms

How good is the estimate ?

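Question: What is P(estimate < N/2)?
Answer: About 1/4, since this happens exactly when the sampled label is less than N/4.

[Figure: number line with marks 1, 2, …, N/4, …, N−1, N]

Question: How to reduce the error probability?
Answer: multiple sampling.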

Page 16: Lecture 9-cs648-2013 Randomized Algorithms

Multiple samplings to improve accuracy and reduce error probability

Question: Which ball among the k sampled balls will have its label closest to N/2?

Question: How many of the k balls are expected to have a label less than N/2?

Answer: k/2.

[Figure: number line with marks 1, 2, …, N]

Page 17: Lecture 9-cs648-2013 Randomized Algorithms

A better algorithm for estimating the number of balls:

1. S ← ∅; // S is a multiset
2. Repeat k times {
      Pick a ball uniformly at random from the bag.
      Let i be its label.
      S ← S ∪ {i};
      Return the ball to the bag;
   }
3. Let j be the (k/2)th largest label in S.
4. Set the estimate to 2j.
5. Report the estimate.
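The repeated-sampling idea can be sketched as below, under the assumption that k labels are drawn with replacement and the estimate is twice the (k/2)-th largest of them. The function name `estimate_balls` and the parameter names are illustrative.

```python
import random

def estimate_balls(n_balls, k, rng):
    """Draw k labels with replacement; report twice the (k/2)-th largest one."""
    # Sorting in decreasing order puts the j-th largest label at index j - 1.
    labels = sorted((rng.randint(1, n_balls) for _ in range(k)), reverse=True)
    j = max(1, k // 2)                 # guard so that k = 1 still works
    return 2 * labels[j - 1]

if __name__ == "__main__":
    rng = random.Random(2)
    print([estimate_balls(1000, 201, rng) for _ in range(5)])
```

Taking a middle order statistic rather than the mean makes the estimate insensitive to a few extreme samples.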

Page 18: Lecture 9-cs648-2013 Randomized Algorithms

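Question: What is P(estimate < N/2)?

[Figure: number line with marks 1, 2, …, N/4, …, N]

Y: number of balls sampled from [1 … N/4]
P(estimate < N/2) = P(Y > k/2)
Y is the sum of k Bernoulli random variables Y_1, …, Y_k, where Y_i = 1 if the i-th sampled label lies in [1 … N/4].
P(Y_i = 1) = 1/4
E[Y] = k/4
The Y_i's are independent.
Applying the Chernoff bound, P(Y > k/2) = P(Y > 2E[Y]), which is exponentially small in k.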
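To see how quickly this failure probability shrinks, one can compute the binomial tail exactly. The sketch below is not the Chernoff bound itself; it assumes Y is Binomial(k, 1/4) and evaluates P(Y > k/2) directly. The name `tail_prob` is illustrative, and `math.comb` requires Python 3.8 or later.

```python
from math import comb

def tail_prob(k, p=0.25):
    """Exact P(Y > k/2) for Y ~ Binomial(k, p)."""
    first = k // 2 + 1                 # smallest integer strictly above k/2
    return sum(comb(k, j) * p**j * (1 - p)**(k - j) for j in range(first, k + 1))

if __name__ == "__main__":
    for k in (10, 40, 160):
        print(k, tail_prob(k))
```

The printed values drop sharply as k grows, consistent with an exponentially decaying bound.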

Page 19: Lecture 9-cs648-2013 Randomized Algorithms

Final result

Theorem: The randomized Monte Carlo performs sampling and reports a number such that with probability at least ,

Page 20: Lecture 9-cs648-2013 Randomized Algorithms

Randomized framework for estimating a parameter

• Let A be a parameter which needs to be estimated.
• Design a randomized experiment such that there is a random variable X such that E[X] = f(A).
• If X takes value a, then return f^{-1}(a) as the estimate for A.

To improve accuracy in estimation:
• Repeat the experiment k times.
• Let X have taken values a_1, …, a_k.
• Calculate a* from a_1, …, a_k such that a* is most likely to be closest to f(A).
• Return f^{-1}(a*).

Page 21: Lecture 9-cs648-2013 Randomized Algorithms

ESTIMATING THE SIZE OF TRANSITIVE CLOSURE OF A DIRECTED GRAPH

Page 22: Lecture 9-cs648-2013 Randomized Algorithms

Estimating size of Transitive Closure of a Directed Graph

Let G = (V, E) be a directed graph on n vertices and m edges. For any v ∈ V,
Reach(v) = { u ∈ V | u is reachable from v }.
The quantity of interest is |Reach(v)|.

Problem: Given a directed graph G = (V, E) on n vertices and m edges, compute |Reach(v)| for each v ∈ V.

Applications: (graph-based databases)
1. A query requires collecting information stored at the nodes reachable from a given node.
2. An estimate of the number of reachable nodes can be used to estimate the time (or processing) required to answer the query.
3. This estimate can be used for optimizing a set of queries to be answered.

Page 23: Lecture 9-cs648-2013 Randomized Algorithms

Estimating size of Transitive Closure of a Directed Graph

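Problem: Given a directed graph G = (V, E) on n vertices and m edges, compute |Reach(v)| for each v ∈ V.

Deterministic Algorithm
1. Perform DFS/BFS from each v ∈ V to compute Reach(v).
2. Record |Reach(v)|;
3. Return the recorded sizes.

Time complexity: O(n(m + n)), since each of the n traversals takes O(m + n) time.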
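A minimal sketch of the deterministic baseline is given below. It assumes the graph is a dictionary mapping every vertex to its list of out-neighbours, and that a vertex counts as reachable from itself; both are conventions chosen here, and `reach_sizes` is an illustrative name.

```python
from collections import deque

def reach_sizes(adj):
    """Return |Reach(v)| for every vertex v by running one BFS per vertex."""
    sizes = {}
    for source in adj:
        seen = {source}                # assumption: v is reachable from itself
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        sizes[source] = len(seen)
    return sizes

if __name__ == "__main__":
    graph = {0: [1], 1: [2], 2: [0], 3: [0], 4: []}
    print(reach_sizes(graph))
```

This is exact but repeats a full traversal per vertex, which is the cost the randomized algorithm aims to avoid.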

Page 24: Lecture 9-cs648-2013 Randomized Algorithms

Estimating size of Transitive Closure of a Directed Graph

Problem: Given a directed graph G = (V, E) on n vertices and m edges, compute |Reach(v)| for each v ∈ V.

Randomized Monte Carlo Algorithm1. For any and every vertex , computes such that

() ()

2. Error Probability < for any constant 3. Time complexity: O(() )

Page 25: Lecture 9-cs648-2013 Randomized Algorithms

Randomized Monte Carlo Algorithm for estimating the size of transitive closure of directed graph

Ingredients

1. A Deterministic O() time algorithm for a problem “MinLabel”.

2. Inference from the inspirational probability problem we discussed today.

Page 26: Lecture 9-cs648-2013 Randomized Algorithms

MIN-Label Problem

Given a directed graph G = (V, E) on n vertices and m edges, where each vertex v ∈ V stores a real number L(v), define
minL(v) = min { L(u) | u ∈ Reach(v) }.

Problem: Given a directed graph G = (V, E) on n vertices and m edges, and the array L(), compute minL(v) for each v ∈ V.

Page 27: Lecture 9-cs648-2013 Randomized Algorithms

MIN-Label Problem

Algorithm1
1. Compute G^R: the graph obtained from G by reversing all edge directions.
2. Sort the vertices in increasing order of their L() value.
3. Repeat until G^R is empty {
      Pick the vertex of least L() value; let it be u;
      Perform DFS/BFS from u in G^R to compute the set of vertices reachable from u;
      For each such vertex w, minL(w) ← L(u);
      Remove these vertices from G^R
   }

Time complexity: O(m + n log n)
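The reverse-graph sweep of Algorithm1 might look as follows. This is a sketch under assumptions: the graph is a dictionary from each vertex to its out-neighbours, labels are given as a dictionary, and "removing" a vertex is modelled by recording its answer in `min_l`. The name `min_labels` is illustrative.

```python
def min_labels(adj, label):
    """Compute minL(v), the smallest label among vertices reachable from v."""
    # Step 1: build the reversed graph.
    rev = {v: [] for v in adj}
    for u in adj:
        for w in adj[u]:
            rev[w].append(u)
    min_l = {}                         # vertices present here count as removed
    # Steps 2-3: visit vertices in increasing order of label.
    for u in sorted(adj, key=lambda v: label[v]):
        if u in min_l:
            continue                   # already claimed by a smaller label
        min_l[u] = label[u]
        stack = [u]
        while stack:                   # DFS in the reversed graph
            x = stack.pop()
            for y in rev[x]:
                if y not in min_l:
                    min_l[y] = label[u]
                    stack.append(y)
    return min_l

if __name__ == "__main__":
    graph = {0: [1], 1: [2], 2: [0], 3: [0], 4: []}
    labels = {0: 0.5, 1: 0.2, 2: 0.9, 3: 0.1, 4: 0.7}
    print(min_labels(graph, labels))
```

Because every vertex and edge is touched only until it is removed, the traversal work totals O(m + n) after the initial sort.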

Page 28: Lecture 9-cs648-2013 Randomized Algorithms

MIN-Label Problem

Algorithm2 (many problems are usually easier on directed acyclic graphs)

1. Compute the strongly connected components (SCCs) of G.
2. Build a DAG (directed acyclic graph) from G by converting each SCC to a vertex.
3. Solve the problem on this DAG using DFS/BFS.

Time complexity: O(m + n)
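One possible realization of Algorithm2 is sketched below. It uses Kosaraju's two-pass method to find the SCCs, which is a choice made here and not something the slides specify; the function name `min_labels_scc` and the dictionary-based graph format are also assumptions.

```python
def min_labels_scc(adj, label):
    """minL(v) for all v via SCC condensation (Kosaraju) and a sweep over the DAG."""
    # Pass 1: iterative DFS on G, recording vertices in order of finish time.
    order, seen = [], set()
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(adj[s]))]
        while stack:
            v, it = stack[-1]
            for w in it:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(adj[w])))
                    break
            else:                      # no unvisited neighbour left: v is finished
                order.append(v)
                stack.pop()
    # Pass 2: sweep the reversed graph in decreasing finish time.
    rev = {v: [] for v in adj}
    for u in adj:
        for w in adj[u]:
            rev[w].append(u)
    comp, comps = {}, []
    for s in reversed(order):
        if s in comp:
            continue
        cid = len(comps)
        comp[s] = cid
        members, stack = [s], [s]
        while stack:
            x = stack.pop()
            for y in rev[x]:
                if y not in comp:
                    comp[y] = cid
                    members.append(y)
                    stack.append(y)
        comps.append(members)
    # Components appear sources-first, so process them in reverse (sinks first).
    best = [0.0] * len(comps)
    for cid in range(len(comps) - 1, -1, -1):
        m = min(label[v] for v in comps[cid])
        for v in comps[cid]:
            for w in adj[v]:
                if comp[w] != cid:
                    m = min(m, best[comp[w]])
        best[cid] = m
    return {v: best[comp[v]] for v in adj}
```

All vertices in one SCC share the same reachable set, so one value per component suffices and each edge is inspected a constant number of times.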

Page 29: Lecture 9-cs648-2013 Randomized Algorithms

Inference from the inspirational problem

If n numbers are selected uniformly at random and independently from [0,1], the expected value of the smallest number is E[X_1] = 1/(n+1).

Question: If some numbers were selected uniformly at random and independently from [0,1], and the smallest among them is t, then what is the right guess for the number of numbers selected?

Answer: 1/t − 1

Page 30: Lecture 9-cs648-2013 Randomized Algorithms

RANDOMIZED MONTE CARLO ALGORITHM FOR ESTIMATING THE SIZE OF TRANSITIVE CLOSURE OF A DIRECTED GRAPH

Page 31: Lecture 9-cs648-2013 Randomized Algorithms

[Figure: example directed graph with a vertex x]

Page 32: Lecture 9-cs648-2013 Randomized Algorithms

[Figure: the example graph with vertex x; some vertices carry the random labels 0.45, 0.71, 0.22, 0.53, 0.83, 0.38]

Page 33: Lecture 9-cs648-2013 Randomized Algorithms

[Figure: the example graph with a random label from [0,1] on every vertex]

Page 34: Lecture 9-cs648-2013 Randomized Algorithms

Estimating size of Transitive Closure of a Directed Graph

A simple algorithm:

1. Assign to each v ∈ V a random number L(v) selected uniformly and independently from [0,1].

2. Compute minL(v) for each v ∈ V;

3. Set the estimate for |Reach(v)| to 1/minL(v) − 1;

4. Return the estimates.
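A single-round version of this estimator could be sketched as follows, assuming the estimate is 1/minL(v) − 1 and reusing a reverse-graph sweep to compute minL. The names `min_labels` and `estimate_reach_once` are illustrative, and labels are drawn from (0, 1] rather than [0, 1] purely to avoid dividing by zero.

```python
import random

def min_labels(adj, label):
    """minL(v) for all v: sweep the reversed graph in increasing label order."""
    rev = {v: [] for v in adj}
    for u in adj:
        for w in adj[u]:
            rev[w].append(u)
    min_l = {}
    for u in sorted(adj, key=lambda v: label[v]):
        if u in min_l:
            continue
        min_l[u] = label[u]
        stack = [u]
        while stack:
            x = stack.pop()
            for y in rev[x]:
                if y not in min_l:
                    min_l[y] = label[u]
                    stack.append(y)
    return min_l

def estimate_reach_once(adj, rng):
    """One round: random labels, then 1/minL(v) - 1 as the guess for |Reach(v)|."""
    label = {v: 1.0 - rng.random() for v in adj}   # values lie in (0, 1]
    min_l = min_labels(adj, label)
    return {v: 1.0 / min_l[v] - 1.0 for v in adj}

if __name__ == "__main__":
    path = {0: [1], 1: [2], 2: [3], 3: []}
    print(estimate_reach_once(path, random.Random(3)))
```

A single round is very noisy, since one small label anywhere in Reach(v) inflates the guess; this is why the experiment is repeated.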

Page 35: Lecture 9-cs648-2013 Randomized Algorithms

Estimating size of Transitive Closure of a Directed Graph

A better algorithm:
For i = 1 to k do
{
   1. Assign to each v ∈ V a random number L(v) selected uniformly and independently from [0,1].

   2. Compute minL(v) for each v ∈ V;

   3. A_v[i] ← minL(v);
}

Estimate for |Reach(v)| ← ??

Return the estimates.

Page 36: Lecture 9-cs648-2013 Randomized Algorithms

Question 1: Which value among {A_v[1], …, A_v[k]} is likely to be closest to 1/|Reach(v)|?

Question 2: How many of {A_v[1], …, A_v[k]} are likely to have value ≥ 1/|Reach(v)|?

Question 3: What is the probability that A_v[i] ≥ 1/|Reach(v)| for any fixed i?
Answer: (Hint: for this to happen, all vertices in Reach(v) must get L() ≥ 1/|Reach(v)|.)

This probability is (1 − 1/|Reach(v)|)^|Reach(v)| ≈ 1/e.

[Figure: the interval [0,1]]

Can you answer Question 2 now? Answer: k/e

Page 37: Lecture 9-cs648-2013 Randomized Algorithms

Estimating size of Transitive Closure of a Directed Graph

A better algorithm:
For i = 1 to k do
{
   1. Assign to each v ∈ V a random number L(v) selected uniformly and independently from [0,1].

   2. Compute minL(v) for each v ∈ V;

   3. A_v[i] ← minL(v);
}
min*(v) ← (k/e)th largest value among {A_v[1], …, A_v[k]};

Estimate for |Reach(v)| ← 1/min*(v);

Return the estimates.
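The repeated version might be sketched as below. It assumes the pivot is the (k/e)-th largest of the k recorded minima and that the estimate is the reciprocal of that pivot; the names `min_labels` and `estimate_reach`, the rounding of k/e, and the use of labels in (0, 1] are choices made for this illustration.

```python
import math
import random

def min_labels(adj, label):
    """minL(v) for all v: sweep the reversed graph in increasing label order."""
    rev = {v: [] for v in adj}
    for u in adj:
        for w in adj[u]:
            rev[w].append(u)
    min_l = {}
    for u in sorted(adj, key=lambda v: label[v]):
        if u in min_l:
            continue
        min_l[u] = label[u]
        stack = [u]
        while stack:
            x = stack.pop()
            for y in rev[x]:
                if y not in min_l:
                    min_l[y] = label[u]
                    stack.append(y)
    return min_l

def estimate_reach(adj, k, rng):
    """k rounds of random labelling; estimate 1 / ((k/e)-th largest minimum)."""
    rounds = {v: [] for v in adj}
    for _ in range(k):
        label = {v: 1.0 - rng.random() for v in adj}   # values lie in (0, 1]
        min_l = min_labels(adj, label)
        for v in adj:
            rounds[v].append(min_l[v])
    j = max(1, round(k / math.e))      # rank of the pivot, at least 1
    return {v: 1.0 / sorted(rounds[v], reverse=True)[j - 1] for v in adj}

if __name__ == "__main__":
    path = {i: [i + 1] for i in range(49)}
    path[49] = []
    est = estimate_reach(path, 400, random.Random(4))
    print(round(est[0], 1))            # vertex 0 reaches all 50 vertices
```

The estimate is meant for large reachable sets; for a vertex that reaches only itself the pivot sits near 1 − 1/e, so the reported value is about 1.6 rather than 1.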

Page 38: Lecture 9-cs648-2013 Randomized Algorithms

Homework

Use the Chernoff bound to get a high-probability bound on the error.
Hint: Proceed along similar lines as in the case of estimating the number of balls in a bag.

Make sincere attempts to do this homework. I shall discuss it briefly at the beginning of the next class.