On the Range Maximum-Sum Segment Query Problem



112/04/19 Chen and Chao 1

On the Range Maximum-Sum Segment Query Problem

Kuan-Yu Chen and Kun-Mao Chao

Department of Computer Science and Information Engineering, National Taiwan University, Taiwan

The Maximum-Sum Segment

Also called the maximum-sum interval or the maximum-scoring region.

Given a sequence of numbers, the maximum-sum segment is simply the contiguous subsequence having the greatest total sum.

Example: <5, -5.1, 1, 3, -4, 2, 3, -4, 7>, with greatest total sum = 8.

Zero prefix-/suffix-sums are possible: <1, 3, -4, 2, 3, -4, 7> and <2, 3, -4, 7> both sum to 8, because the prefix <1, 3, -4> sums to zero.
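For reference, the value 8 in the example can be reproduced with a single linear scan (commonly known as Kadane's algorithm). This sketch is not taken from the slides; the function name `max_sum_segment` and the 0-based inclusive indices are assumptions made for illustration.

```python
def max_sum_segment(a):
    """Return (best_sum, start, end) for a non-empty list, 0-based inclusive."""
    best_sum, best_start, best_end = a[0], 0, 0
    cur_sum, cur_start = a[0], 0
    for k in range(1, len(a)):
        if cur_sum < 0:
            # A negative running sum can only hurt: restart the segment at k.
            cur_sum, cur_start = a[k], k
        else:
            cur_sum += a[k]
        if cur_sum > best_sum:
            best_sum, best_start, best_end = cur_sum, cur_start, k
    return best_sum, best_start, best_end


print(max_sum_segment([5, -5.1, 1, 3, -4, 2, 3, -4, 7]))
```

On the example it reports the segment from position 2 to position 8 (0-based), that is <1, 3, -4, 2, 3, -4, 7>.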

A Relevant Problem: RMQ

Range Minima (Maxima) Query Problem (also called Discrete Range Searching)

Given a sequence of numbers, we wish to preprocess it so that the minimum (maximum) value within a given querying interval can be retrieved efficiently.

Example: <5, -5.1, 1, 3, -4, 2, 3, -4, 7>

[Figure: a querying interval on the sequence with its minimum and maximum marked.]
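A minimal sketch of one well-known RMQ structure, the sparse table, is given below. It is an assumption made for illustration: it needs O(n log n) preprocessing rather than the O(n) preprocessing assumed later in the talk, but it already answers each query in O(1). The class name `SparseTableRMQ` and its methods are hypothetical.

```python
class SparseTableRMQ:
    """Sparse table: O(n log n) preprocessing, O(1) query (0-based, inclusive)."""

    def __init__(self, values, pick=min):
        self.values = list(values)
        self.pick = pick  # min for range minima, max for range maxima
        n = len(self.values)
        # table[k][i] = index of the best value in values[i : i + 2**k]
        self.table = [list(range(n))]
        k = 1
        while (1 << k) <= n:
            prev, half = self.table[-1], 1 << (k - 1)
            self.table.append([self._better(prev[i], prev[i + half])
                               for i in range(n - (1 << k) + 1)])
            k += 1

    def _better(self, i, j):
        # Ties go to the first argument.
        vi, vj = self.values[i], self.values[j]
        return i if self.pick(vi, vj) == vi else j

    def query(self, lo, hi):
        """Index of the best value in values[lo..hi]."""
        k = (hi - lo + 1).bit_length() - 1
        # Two overlapping blocks of length 2**k cover [lo, hi].
        return self._better(self.table[k][lo], self.table[k][hi - (1 << k) + 1])


seq = [5, -5.1, 1, 3, -4, 2, 3, -4, 7]
range_min = SparseTableRMQ(seq, min)
range_max = SparseTableRMQ(seq, max)
print(seq[range_min.query(2, 6)], seq[range_max.query(2, 6)])
```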

Range Maximum-Sum Segment Query Problem

Definition:
The input is a sequence <a1, a2, …, an> of real numbers which is to be preprocessed.
A query is comprised of two intervals S and E.
Our goal is to return the maximum-sum segment whose starting index lies in S and whose end index lies in E.
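The definition can be pinned down with a brute-force reference that simply tries every admissible pair of indices. It is only a specification to test faster methods against, not the algorithm of the talk; the name `rmsq_bruteforce`, the 1-based inclusive intervals, and the two query intervals used below are assumptions.

```python
def rmsq_bruteforce(a, S, E):
    """Best (sum, i, j) with i in S, j in E and i <= j (1-based, inclusive).

    Returns None when no such pair exists.
    """
    best = None
    for i in range(S[0], S[1] + 1):
        for j in range(max(i, E[0]), E[1] + 1):
            total = sum(a[i - 1:j])  # a_i + ... + a_j
            if best is None or total > best[0]:
                best = (total, i, j)
    return best


a = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
print(rmsq_bruteforce(a, (4, 6), (8, 10)))   # disjoint S and E
print(rmsq_bruteforce(a, (3, 9), (6, 12)))   # overlapping S and E
```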

A Nonoverlapping Example

Input sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3

[Figure: a starting region and a disjoint end region marked on the sequence; the returned segment has total sum = 6.]

An Overlapping Example

Input sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3

[Figure: a starting region and an overlapping end region marked on the sequence; the returned segment has total sum = 8.]

Our Results

We propose an algorithm that runs in O(n) preprocessing time and O(1) query time under the unit-cost RAM model.

We show that the RMSQ techniques yield alternative O(n)-time algorithms for the following problems:
- The maximum-sum segment with length constraints
- All maximal-sum segments

Strategy

Reduce the RMSQ problem to the RMQ problem.

Theorem. If there is a <f(n), g(n)>-time solution for the RMQ problem (preprocessing time f(n), query time g(n)), then there is a <f(n)+O(n), g(n)+O(1)>-time solution for the RMSQ problem.

[Figure: RMSQ reduces to RMQ with O(n) additional preprocessing time and O(1) additional query time.]

Cumulative Sum / Prefix Sum

prefix-sum(i) = a1+a2+…+ai

Computing sum(i, j) in O(1) Time

prefix-sum(i) = a1+a2+…+ai

All n prefix sums are computable in O(n) time.

sum(i, j) = prefix-sum(j) - prefix-sum(i-1)

[Figure: prefix-sum(j) covers positions 1 to j; removing prefix-sum(i-1) leaves the segment from i to j.]
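The two formulas translate directly into code. In this sketch the array `P` is padded with `P[0] = 0` so that the 1-based indices of the slides can be used unchanged; the function names are assumptions.

```python
from itertools import accumulate


def prefix_sums(a):
    """P[i] = a_1 + ... + a_i, with P[0] = 0; built in O(n) time."""
    return [0] + list(accumulate(a))


def segment_sum(P, i, j):
    """sum(i, j) = prefix-sum(j) - prefix-sum(i-1), in O(1) time (1-based)."""
    return P[j] - P[i - 1]


a = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
P = prefix_sums(a)
print(P)
print(segment_sum(P, 5, 9))  # 4 - 5 + 4 - 3 + 6
```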

Case 1: Nonoverlapping

sum(i, j) = prefix-sum(j) - prefix-sum(i-1)

Prefix-sum sequence of the input 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3:

[Figure: find the lowest point of the prefix-sum sequence in the starting region and the highest point in the end region, using range minima (maxima) queries.]

To maximize sum(i, j), maximize prefix-sum(j) and minimize prefix-sum(i-1).
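A sketch of the nonoverlapping case follows. The two range queries are written as plain `min`/`max` scans, which stand in for an RMQ structure built on the prefix sums; the function name and the query intervals are assumptions.

```python
from itertools import accumulate


def rmsq_nonoverlapping(P, S, E):
    """Answer a query whose starting region S ends before the end region E begins.

    P is the prefix-sum array with P[0] = 0; S and E are 1-based inclusive pairs.
    """
    assert S[1] < E[0], "this sketch only handles disjoint regions"
    # Lowest prefix-sum(i-1) over i in S (stand-in for a range minima query).
    i = min(range(S[0], S[1] + 1), key=lambda k: P[k - 1])
    # Highest prefix-sum(j) over j in E (stand-in for a range maxima query).
    j = max(range(E[0], E[1] + 1), key=lambda k: P[k])
    return P[j] - P[i - 1], i, j


a = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
P = [0] + list(accumulate(a))
print(rmsq_nonoverlapping(P, (4, 6), (8, 10)))
```

Because every index in S precedes every index in E, the two choices are independent and always form a valid segment.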

Case 2: Overlapping

Some problems may occur.

Prefix-sum sequence of the input 9, -10, 4, -2, 5, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3:

[Figure: when the regions overlap, the lowest point found in the starting region can lie to the right of the highest point found in the end region, so the pair does not form a valid segment. The slide marks this as "Negative Sum !!".]

Case 2: Overlapping

Divide into 3 possible cases.

Prefix-sum sequence of the input 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3:

[Figure: in two of the cases, the lowest point is searched in one region and the highest point in a disjoint region to its right. Each of these is answered by range minima (maxima) queries with preprocessing time f(n) and query time g(n).]

In the remaining case, both ends of the segment fall in the same region. What should we do?
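One way to realize the three cases is sketched below. The exact split is an assumption, since the figure is not recoverable: with S = [s1, s2], E = [e1, e2] and s1 <= e1 <= s2 <= e2, the start lies either left of the overlap (case A), or inside the overlap with the end to the right of it (case B), or both ends lie inside the overlap (case C). Case C is answered by brute force here as a placeholder for the single range query treated next.

```python
from itertools import accumulate


def two_range(P, s_lo, s_hi, e_lo, e_hi):
    """Best (sum, i, j) for disjoint regions; None if either region is empty."""
    if s_lo > s_hi or e_lo > e_hi:
        return None
    i = min(range(s_lo, s_hi + 1), key=lambda k: P[k - 1])  # stand-in for RMQ
    j = max(range(e_lo, e_hi + 1), key=lambda k: P[k])      # stand-in for RMQ
    return P[j] - P[i - 1], i, j


def single_range(P, lo, hi):
    """Brute-force placeholder for the single range query on [lo, hi]."""
    return max((P[j] - P[i - 1], i, j)
               for i in range(lo, hi + 1) for j in range(i, hi + 1))


def rmsq_overlapping(a, S, E):
    P = [0] + list(accumulate(a))
    (s1, s2), (e1, e2) = S, E  # assumed: s1 <= e1 <= s2 <= e2
    candidates = [
        two_range(P, s1, e1 - 1, e1, e2),  # case A
        two_range(P, e1, s2, s2 + 1, e2),  # case B
        single_range(P, e1, s2),           # case C
    ]
    return max((c for c in candidates if c is not None), key=lambda c: c[0])


a = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
print(rmsq_overlapping(a, (3, 9), (6, 12)))
```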

Dealing with the Special Case: Single Range Query

Input sequence: 9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3

[Figure: a single querying interval marked on the sequence; the returned segment has total sum = 6.]

Challenge: Can this special case be reduced to the RMQ problem?

Reduction Procedure

Step 1. Find a partner for each index.
Step 2. Record the sum of each pair in an array.
Step 3. Retrieve the maximum-sum pair by applying the RMQ techniques.

Our First Attempt (1)

Step 1: For each index i, we define the lowest point preceding i as its partner.

[Figure: prefix-sum sequence; the partner of i is the lowest point within the region preceding i.]

Our First Attempt (2)

Step 2: Record sum(partner(i), i) in an array.

[Figure: the lowest point preceding i and the resulting value sum(partner(i), i).]

Our First Attempt (3)

Step 3: Apply the RMQ techniques to the array.

[Figure: RMQ is applied to the sequence of sum(partner(i), i) values; querying the interval retrieves the maximum-sum pair.]

Bump into Difficulties

What if a partner goes beyond the querying interval?

[Figure: partner(i) lies outside the querying interval, so sum(partner(i), i) needs to be updated.]

We might have to update every pair!
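The first attempt and the difficulty it runs into can be reproduced on the running example. In this sketch `partner[i]` is interpreted as the start index k <= i whose preceding prefix sum P[k-1] is lowest; that reading of "lowest point preceding i", the function name, and the query interval [5, 9] are assumptions.

```python
from itertools import accumulate


def first_attempt(a):
    """Return (partner, pair_sum), both 1-based with an unused slot 0."""
    P = [0] + list(accumulate(a))
    n = len(a)
    partner = [0] * (n + 1)
    pair_sum = [0] * (n + 1)
    low = 1  # start index whose preceding prefix sum is lowest so far
    for i in range(1, n + 1):
        if P[i - 1] < P[low - 1]:
            low = i
        partner[i] = low
        pair_sum[i] = P[i] - P[low - 1]  # sum(partner(i), i)
    return partner, pair_sum


a = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
partner, pair_sum = first_attempt(a)

s, e = 5, 9
best = max(range(s, e + 1), key=lambda i: pair_sum[i])  # stand-in for RMQ
print(best, partner[best], pair_sum[best])
print([partner[i] for i in range(s, e + 1)])
```

For the interval [5, 9] the retrieved pair is (3, 9) with sum 8, but index 3 lies outside the interval, and every index from 5 to 9 has the same out-of-range partner. The best segment inside [5, 9] is (7, 9) with sum 7.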

A Better Partner

[Figure: prefix-sum sequence with i, Left_bound(i) and the new partner(i) marked.]

Left_bound(i): the nearest point preceding i that is at least as large as i.
New partner(i): the lowest point between Left_bound(i) and i.
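A sketch of the better partner on the running example is given below. The index conventions are assumptions, since the slide only shows a figure: `left_bound[i]` is the largest start index k <= i with P[k-1] >= P[i] (or 1 if there is none), and `partner[i]` is the start index in [left_bound[i], i] with the lowest P[k-1], taking the rightmost one on ties. The loops are written naively for clarity and do not attempt the O(n) preprocessing claimed in the talk.

```python
from itertools import accumulate


def better_partners(a):
    """Return (P, left_bound, partner, pair_sum); the last three are 1-based."""
    n = len(a)
    P = [0] + list(accumulate(a))
    left_bound = [0] * (n + 1)
    partner = [0] * (n + 1)
    pair_sum = [0] * (n + 1)
    for i in range(1, n + 1):
        k = i
        # Walk left until a point at least as high as P[i] is found.
        while k > 1 and P[k - 1] < P[i]:
            k -= 1
        left_bound[i] = k
        # Lowest point between the left bound and i (rightmost on ties).
        partner[i] = min(reversed(range(k, i + 1)), key=lambda s: P[s - 1])
        pair_sum[i] = P[i] - P[partner[i] - 1]
    return P, left_bound, partner, pair_sum


a = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
P, left_bound, partner, pair_sum = better_partners(a)
print(partner[1:])
print(pair_sum[1:])
```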

Why Is It Better? (1)

It remains the best choice.
It saves lots of update steps.
It turns out that zero or one point needs to be updated.

Why Is It Better? (2) -- Remains the Best

[Figure: Left_bound(i) is the nearest higher point preceding i, and partner(i) is the lowest point between Left_bound(i) and i. The part of the sequence before Left_bound(i) is marked as an impossible region.]

Why Is It Better? (3) -- Minimal-Maximal Property

Height(partner(i)) < Height(j) < Height(i), for all partner(i) < j < i.

[Figure: between partner(i) and i, no one is higher than i and no one is lower than partner(i); i is the maximal point, partner(i) is the minimal point, and the next higher point lies before partner(i).]

Why Is It Better? (4) -- Save Some Updates

[Figure: prefix-sum sequence with a querying interval containing i, while partner(i) lies outside it. Between the next higher point and i, no one is higher than i, so the points of the querying interval that precede i cannot be the right end of the maximum-sum segment.]

Why Is It Better? (5) -- Nesting Property

For two indices i < j, it cannot be the case that partner(i) < partner(j) ≤ i < j.

[Figure: two pairs (partner(i), i) and (partner(j), j), each running from a minimal point to a maximal point.]
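The minimal-maximal and nesting properties can be checked numerically. The sketch below recomputes the partners under assumed index conventions (partner[i] is a start index, so its "point" in the prefix-sum sequence is P[partner[i] - 1]; ties for the lowest point go to the rightmost one) and then tests both statements; all names are hypothetical.

```python
from itertools import accumulate


def partners(a):
    """Return (P, partner) with partner 1-based, computed naively."""
    n = len(a)
    P = [0] + list(accumulate(a))
    partner = [0] * (n + 1)
    for i in range(1, n + 1):
        k = i
        while k > 1 and P[k - 1] < P[i]:  # walk left to the left bound
            k -= 1
        partner[i] = min(reversed(range(k, i + 1)), key=lambda s: P[s - 1])
    return P, partner


def minimal_maximal_holds(P, partner):
    """Every point strictly between the pair is above its low end and below i."""
    for i in range(1, len(P)):
        low = partner[i] - 1  # prefix index of the minimal point
        if any(not (P[low] < P[j] < P[i]) for j in range(low + 1, i)):
            return False
    return True


def nesting_holds(partner):
    """No two pairs satisfy partner(i) < partner(j) <= i < j."""
    n = len(partner) - 1
    return not any(partner[i] < partner[j] <= i
                   for i in range(1, n + 1) for j in range(i + 1, n + 1))


a = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
P, partner = partners(a)
print(minimal_maximal_holds(P, partner), nesting_holds(partner))
```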

Why Is It Better? (6) -- An Example

No overlapping is allowed:
9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3

Nesting property:
9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3

When a Query Comes -- Case 1: No Exceeding

The maximum pair (partner(i), i) lies in the querying interval.

[Figure: retrieve the maximum pair; both partner(i) and i fall inside the querying interval.]

We are done. Output (partner(i), i).

When a Query Comes -- Case 2: Exceeding

The maximum pair (partner(i), i) goes beyond the querying interval.

[Figure: retrieve the maximum pair (partner(i), i); the minimal point partner(i) lies outside the querying interval, while the maximal point i lies inside it.]

- (partner(i), i) is the maximum pair.
- The points of the querying interval that precede i cannot be the right end of the maximum-sum segment.
- Update partner(i) to new_partner(i) inside the querying interval.
- By the nesting property, retrieve the maximum pair (partner(j), j) among the remaining indices j.
- Compare (new_partner(i), i) and (partner(j), j).
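Both query cases can be put together in a short sketch. It is one reading of the slides rather than the authors' exact procedure: the partner definition and index conventions are assumptions, preprocessing is done naively, and every `min`/`max` over a range stands in for an O(1) RMQ lookup.

```python
from itertools import accumulate


def preprocess(a):
    """Return (P, partner, pair_sum) with 1-based partner and pair_sum."""
    n = len(a)
    P = [0] + list(accumulate(a))
    partner, pair_sum = [0] * (n + 1), [0] * (n + 1)
    for i in range(1, n + 1):
        k = i
        while k > 1 and P[k - 1] < P[i]:  # walk left to the left bound
            k -= 1
        partner[i] = min(reversed(range(k, i + 1)), key=lambda s: P[s - 1])
        pair_sum[i] = P[i] - P[partner[i] - 1]
    return P, partner, pair_sum


def single_range_query(P, partner, pair_sum, s, e):
    """Best (sum, start, end) with s <= start <= end <= e (1-based)."""
    i = max(range(s, e + 1), key=lambda t: pair_sum[t])  # maximum pair
    if partner[i] >= s:
        return pair_sum[i], partner[i], i                # Case 1: no exceeding
    # Case 2: update the partner of i to the lowest point inside [s, i].
    new_partner = min(range(s, i + 1), key=lambda t: P[t - 1])
    best = (P[i] - P[new_partner - 1], new_partner, i)
    if i < e:
        # Second lookup among the indices to the right of i.
        j = max(range(i + 1, e + 1), key=lambda t: pair_sum[t])
        best = max(best, (pair_sum[j], partner[j], j), key=lambda c: c[0])
    return best


a = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
P, partner, pair_sum = preprocess(a)
print(single_range_query(P, partner, pair_sum, 2, 15))  # Case 1
print(single_range_query(P, partner, pair_sum, 5, 9))   # Case 2
```

Each query performs at most three range lookups, which is where the O(1) extra query time comes from once the scans are replaced by a real RMQ structure.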

Time Complexity

RMSQ can be reduced to the RMQ problem in O(n) time.

Since under the unit-cost RAM model there is an <O(n), O(1)>-time solution for the RMQ problem, there is an <O(n), O(1)>-time solution for the RMSQ problem.

On the other hand, RMQ can be reduced to the RMSQ problem in O(n) time, too. (Range Maxima Query: between each two adjacent elements, we insert a negative number whose absolute value is larger than both of them.)

[Figure: RMSQ and RMQ reduce to each other with O(n) preprocessing overhead and O(1) query overhead.]
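The reverse reduction can be illustrated as follows. The separator value -(max(|x|, |y|) + 1) is one possible choice satisfying the condition on the slide, and the brute-force `max_segment_sum` stands in for an RMSQ structure; both are assumptions made for this sketch.

```python
def augment(a):
    """Insert a large negative separator between each two adjacent elements."""
    b = []
    for x, y in zip(a, a[1:]):
        b.append(x)
        b.append(-(max(abs(x), abs(y)) + 1))  # |separator| exceeds |x| and |y|
    b.append(a[-1])
    return b


def max_segment_sum(b, lo, hi):
    """Brute-force stand-in for a single-range RMSQ query on b[lo..hi] (0-based)."""
    return max(sum(b[i:j + 1]) for i in range(lo, hi + 1) for j in range(i, hi + 1))


def range_max(b, lo, hi):
    """Maximum of a[lo..hi] (0-based); element a[k] sits at position 2k in b."""
    return max_segment_sum(b, 2 * lo, 2 * hi)


a = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
b = augment(a)
print(range_max(b, 1, 7), max(a[1:8]))
```

Any segment of the augmented sequence that spans a separator sums to less than one of the original elements it touches, so the maximum-sum segment in a range is always a single element, namely the range maximum.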

Use RMSQ Techniques to Solve Two Relevant Problems

1. Finding the maximum-sum segment with length constraints in O(n) time.
- Y.-L. Lin, T. Jiang, K.-M. Chao, 2002
- T.-H. Fan et al., 2003

2. Finding all maximal scoring subsequences in O(n) time.
- W. L. Ruzzo & M. Tompa, 1999

Problem 1: The Maximum-Sum Segment with Length Constraints

Lin, Jiang, and Chao [JCSS 2002] and Fan et al. [CIAA 2003] gave O(n)-time algorithms for this problem.

Length at least L, and at most U.

[Figure: the bounds L and U marked on the sequence.]

Problem 1: Finding the Maximum-Sum Segment with Length Constraints

Length at least L, at most U.

For each index i, find the maximum-sum segment whose starting point lies in [i-U+1, i-L+1] and whose end point is i.

[Figure: one RMSQ query per index i, with the starting interval determined by L and U.]

Runs in O(n) time since each query costs O(1) time.
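The per-index queries can be sketched as below. Because the end point is fixed at i, each query reduces to finding the lowest preceding prefix sum over the starting interval; the `min` scan stands in for the O(1) lookup, so this sketch is not linear-time as written. The function name and the values L = 2, U = 3 are assumptions.

```python
from itertools import accumulate


def max_sum_with_length(a, L, U):
    """Best (sum, start, end), 1-based, with L <= length <= U.

    Assumes 1 <= L <= U and L <= len(a).
    """
    P = [0] + list(accumulate(a))
    best = None
    for i in range(L, len(a) + 1):
        lo, hi = max(1, i - U + 1), i - L + 1  # admissible starting points
        # Query with S = [lo, hi] and E = [i, i].
        k = min(range(lo, hi + 1), key=lambda s: P[s - 1])
        candidate = (P[i] - P[k - 1], k, i)
        if best is None or candidate[0] > best[0]:
            best = candidate
    return best


a = [9, -10, 4, -2, 4, -5, 4, -3, 6, -11, 8, -3, 4, -5, 3]
print(max_sum_with_length(a, 2, 3))
```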

Problem 2: All Maximal-Sum Segments

Ruzzo and Tompa [ISMB 1999] gave an O(n)-time algorithm for this problem.

Recursive definition.

[Figure: the maximum-sum segment S splits the sequence into a left part L(S) and a right part R(S).]

Problem 2: Finding All Maximal Scoring Subsequences

Recursive calls.

[Figure: on the input sequence, an RMSQ query finds S, and the procedure recurses on L(S) and R(S).]

Runs in O(n) time since each query costs O(1) time.
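The recursion can be sketched as follows. Several details are assumptions because the slides do not spell them out: only segments with positive sum are reported, ties are broken arbitrarily, and the brute-force `best_segment` stands in for an O(1) RMSQ query with S = E = [lo, hi]. The test sequence is made up for illustration.

```python
def best_segment(a, lo, hi):
    """Brute-force stand-in for a single-range RMSQ query (1-based, inclusive)."""
    return max((sum(a[i - 1:j]), i, j)
               for i in range(lo, hi + 1) for j in range(i, hi + 1))


def maximal_segments(a, lo=1, hi=None):
    """List of (start, end, sum) found by recursing on L(S) and R(S)."""
    if hi is None:
        hi = len(a)
    if lo > hi:
        return []
    total, i, j = best_segment(a, lo, hi)
    if total <= 0:
        return []  # assumption: only positive-sum segments are reported
    return (maximal_segments(a, lo, i - 1)   # L(S)
            + [(i, j, total)]                # S
            + maximal_segments(a, j + 1, hi))  # R(S)


print(maximal_segments([2, -3, 4, -1, 2, -6, 3]))
```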