Page 1: CSCI x490 Algorithms for Computational Biologycobweb.cs.uga.edu/~cai/courses/compbio/2016spring/... · CSCI x490 Algorithms for Computational Biology Lecture Note 1 (by Liming Cai)

CSCI x490 Algorithms for Computational Biology

Lecture Note 1 (by Liming Cai)

January 19, 2016

Structure of the Course

- Part I. Introduction to Algorithms (Chapter 2)

- Part II. Fundamental Techniques (Chapters 4-6)
    algorithms: exhaustive search, greedy, dynamic programming
    problems: motif finding, sequence alignment, gene finding

- Part III. Advanced Algorithms (Chapters 7-10)
    algorithms: graph-, string-, and tree-algorithms
    problems: DNA sequencing, pattern finding, phylogeny

- Part IV. Probabilistic Methods (Chapters 11-12)
    algorithms: HMM, stochastic grammars, Markov networks
    problems: decoding, learning, inference algorithms

Part I. Introduction to Algorithms

Chapter 2. Algorithms and Complexity

Algorithm: a well-defined procedure that takes an input and produces an output.

Input(x) ⇒ Body ⇒ Output(y)

Example: Algorithm MAX;
    Input: List x = {a_1, ..., a_n};
    Body (a finite series of instructions);
    Output: y, the maximum of a_1, ..., a_n.

An algorithm: a finite process to compute a function or a relation.
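
As a minimal sketch of Algorithm MAX in Python, the body can be a single pass over the list. The function name `algorithm_max` and the decision to reject an empty list are assumptions made here, not part of the notes.

```python
def algorithm_max(x):
    """Return the maximum of a non-empty list x = [a1, ..., an]."""
    if not x:
        raise ValueError("MAX is undefined for an empty list")
    y = x[0]                 # current candidate for the maximum
    for a in x[1:]:          # examine each remaining element once
        if a > y:
            y = a
    return y


print(algorithm_max([3, 9, 2, 7]))  # prints 9
```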

Notation conventions for algorithm writing (pseudo-code)

Memory: variables, arrays, arguments, parameters
Array Access: a_i: the ith location of array a
Assignment: a ← b
Arithmetic: a + b, a − b, a ∗ b, a/b, a^b
Conditional: if condition is true
                 body 1
             else
                 body 2
For Loop: for i ← low to high
              body
While Loop: while condition is true
                body
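
For readers who want to run the examples, the conventions above map onto Python roughly as follows. This is only an illustrative sketch: the function `notation_demo` and its inputs are hypothetical, and Python lists are indexed from 0 whereas the notes index from 1.

```python
def notation_demo(a, low, high):
    """Sum a[low..high] (0-based, inclusive), then halve the sum until it is below 10."""
    total = 0                          # assignment: total ← 0
    for i in range(low, high + 1):     # for i ← low to high (inclusive)
        total = total + a[i]           # array access a_i and arithmetic
    while total >= 10:                 # while condition is true
        total = total / 2
    if total > 5:                      # conditional with both branches
        label = "large"
    else:
        label = "small"
    return total, label, 2 ** 3        # a^b is written a ** b in Python
```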

Example: an iterative algorithm computing the nth number in the Fibonacci series 1, 1, 2, 3, 5, 8, 13, 21, ....

Fibonacci(n)
    F_1 ← 1
    F_2 ← 1
    for i ← 3 to n
        F_i ← F_{i−1} + F_{i−2}
    return (F_n)

How is the algorithm executed?
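
One way to see how the iterative algorithm executes is to run it and print each assignment to F_i. The sketch below is a Python transcription that assumes n ≥ 1; the name `fibonacci`, the `trace` flag, and the list with an unused slot 0 are choices made here, not part of the notes.

```python
def fibonacci(n, trace=False):
    """Iteratively compute the nth Fibonacci number (F_1 = F_2 = 1), n >= 1."""
    F = [0] * (max(n, 2) + 1)   # F[0] is unused so indices match the pseudocode
    F[1] = 1
    F[2] = 1
    for i in range(3, n + 1):   # for i ← 3 to n
        F[i] = F[i - 1] + F[i - 2]
        if trace:
            print(f"F[{i}] = {F[i - 1]} + {F[i - 2]} = {F[i]}")
    return F[n]


fibonacci(6, trace=True)
```

Each loop iteration performs one addition, so the loop body runs n − 2 times for n ≥ 3.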

Recursive algorithms

Rec-Fibonacci(n)
    if n = 1 OR n = 2, return (1)
    else
        T1 = Rec-Fibonacci(n − 1);
        T2 = Rec-Fibonacci(n − 2);
        return (T1 + T2);

How is the algorithm executed?
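
To follow the recursive execution by hand, it can help to print every call with indentation that reflects the recursion depth. The Python sketch below assumes n ≥ 1; the `depth` and `trace` parameters are additions for illustration only.

```python
def rec_fibonacci(n, depth=0, trace=False):
    """Recursively compute the nth Fibonacci number, n >= 1."""
    if trace:
        print("  " * depth + f"Rec-Fibonacci({n})")
    if n == 1 or n == 2:
        return 1
    t1 = rec_fibonacci(n - 1, depth + 1, trace)   # finishes before t2 starts
    t2 = rec_fibonacci(n - 2, depth + 1, trace)
    return t1 + t2


rec_fibonacci(4, trace=True)
```

The printed lines form the call tree in the order the calls are made: the whole subtree for n − 1 is finished before the call for n − 2 begins.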

Advantages with recursive algorithms

Example: recursive StringCopy

Key ingredients for admitting recursive algorithms

More examples: Sum(n), linear search, summation over numbers in a 'triangle', etc.
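
The notes list Sum(n) and linear search without giving their code. A possible recursive formulation in Python is sketched below; the function names and signatures are assumptions. Each function has a base case that is answered directly and a recursive case that reduces the problem to a strictly smaller instance, which is one common reading of the "key ingredients".

```python
def rec_sum(n):
    """Return 1 + 2 + ... + n for n >= 0."""
    if n == 0:                      # base case
        return 0
    return n + rec_sum(n - 1)       # smaller instance: n - 1


def rec_linear_search(a, target, i=0):
    """Return the first index >= i at which target occurs in a, or -1 if absent."""
    if i == len(a):                 # base case: nothing left to search
        return -1
    if a[i] == target:
        return i
    return rec_linear_search(a, target, i + 1)   # smaller instance: a[i+1:]
```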

A more complex example: Towers of Hanoi

TowersOfHanoi(First, Second, Third, n)
    if n = 1, MoveOne(First, Third)
    else
        TowersOfHanoi(First, Third, Second, n − 1)
        MoveOne(First, Third)
        TowersOfHanoi(Second, First, Third, n − 1)
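
A Python sketch of the same procedure follows. Here `MoveOne` is modeled by appending a (source, destination) pair to a list, which is an assumption made so that the moves can be inspected; the notes do not define it.

```python
def towers_of_hanoi(first, second, third, n, moves=None):
    """Move n disks from peg `first` to peg `third`, using `second` as the spare."""
    if moves is None:
        moves = []
    if n == 1:
        moves.append((first, third))                         # MoveOne(First, Third)
    else:
        towers_of_hanoi(first, third, second, n - 1, moves)  # n-1 disks: first -> second
        moves.append((first, third))                         # largest disk: first -> third
        towers_of_hanoi(second, first, third, n - 1, moves)  # n-1 disks: second -> third
    return moves


print(towers_of_hanoi("A", "B", "C", 2))
```

The number of moves m(n) satisfies m(n) = 2m(n − 1) + 1 with m(1) = 1, so m(n) = 2^n − 1.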

Disadvantages with recursive algorithms

May be inefficient!

(1) overhead: use of stacks: push/pop

(2) possible re-computation:
    e.g., RecFib(5) may be computed several times
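
The re-computation can be made concrete by counting how often each argument is passed to the recursive procedure. The Python sketch below is an illustration only; the names `rec_fib_counted` and `calls` are introduced here.

```python
from collections import Counter


def rec_fib_counted(n, calls):
    """Recursive Fibonacci that records every argument it is called with."""
    calls[n] += 1
    if n == 1 or n == 2:
        return 1
    return rec_fib_counted(n - 1, calls) + rec_fib_counted(n - 2, calls)


calls = Counter()
value = rec_fib_counted(10, calls)
print(value, calls[5], sum(calls.values()))
```

In this run, computing RecFib(10) invokes RecFib(5) eight times and makes 109 calls in total, whereas the iterative version performs only eight additions.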

Algorithm Efficiency: how is it defined?

Counting the number of basic operations used

Example

SelectionSort(a, n)
1. for i ← 1 to n − 1
2.     j ← i    {start of inner loop: assume a_i to be the smallest element in a_i, ..., a_n; j memorizes the index of the smallest element}
3.     for k ← i + 1 to n    {search for the smallest element}
4.         if a_k < a_j
5.             j ← k
6.     Swap a_i and a_j
7. return array a
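
A direct Python transcription is sketched below. It uses 0-based indices, so the loop bounds are shifted by one relative to the pseudocode, and it sorts the list in place; the name `selection_sort` is a choice made here.

```python
def selection_sort(a):
    """Sort list a in place in ascending order and return it."""
    n = len(a)
    for i in range(n - 1):            # i ← 1 to n − 1 (0-based: 0 .. n − 2)
        j = i                         # index of the smallest element seen so far
        for k in range(i + 1, n):     # search a[i+1 .. n-1] for something smaller
            if a[k] < a[j]:
                j = k
        a[i], a[j] = a[j], a[i]       # swap a_i and a_j
    return a


print(selection_sort([5, 2, 9, 1, 5]))
```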

Algorithm Efficiency (cont’)

SelectionSort(a, n)
1. for i ← 1 to n − 1
2.     j ← i
3.     for k ← i + 1 to n
4.         if a_k < a_j
5.             j ← k
6.     Swap a_i and a_j
7. return array a

Count the total number of basic operations needed:

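$$
\begin{aligned}
&= c_1 \times n + c_2 \times (n-1) + c_3 \times \sum_{i=1}^{n-1} (n-i) + c_{4,5} \times \sum_{i=1}^{n-1} (n-i-1) + c_6 \times (n-1) + c_7 \\
&= a n^2 + b n + c \quad \text{for some constants } a > 0,\ b,\ c
\end{aligned}
$$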
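
To see where the quadratic comes from, the two sums can be evaluated in closed form and the powers of n collected. The following is a worked expansion of the expression above, not an additional result from the notes.

```latex
\sum_{i=1}^{n-1} (n-i) = \frac{n(n-1)}{2},
\qquad
\sum_{i=1}^{n-1} (n-i-1) = \frac{(n-1)(n-2)}{2}

% Substituting and collecting powers of n:
a = \frac{c_3 + c_{4,5}}{2}, \qquad
b = c_1 + c_2 + c_6 - \frac{c_3 + 3\,c_{4,5}}{2}, \qquad
c = c_{4,5} + c_7 - c_2 - c_6
```

Since the per-line costs are positive, a > 0, so the n^2 term dominates for large n.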

How to count basic operations in recursive algorithms?

RecFib(n)
    if n = 1 OR n = 2 return (1)
    else
        return (RecFib(n − 1) + RecFib(n − 2))

Deriving and solving recurrences!

- Let t(n) be the time needed for computing RecFib(n)

- then
      t(n) = c + t(n − 1) + t(n − 2)
      t(n) = 1, when n = 1, 2

- solve it exactly, or

- estimate lower and upper bounds.
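
As a sketch of both options, take c = 1 (an assumption for illustration; the notes leave c unspecified). Adding 1 to both sides of the recurrence gives t(n) + 1 = (t(n − 1) + 1) + (t(n − 2) + 1), so t(n) + 1 obeys the Fibonacci recurrence with starting value 2, and hence t(n) = 2F_n − 1. For bounds, t(n) ≥ 2t(n − 2) yields t(n) ≥ 2^((n−2)/2), and t(n) ≤ 2t(n − 1) + 1 yields t(n) ≤ 2^n. The Python check below evaluates the recurrence bottom-up and tests these claims for small n; the helper names are hypothetical.

```python
def t_values(n_max, c=1):
    """Return [t(1), ..., t(n_max)] for t(n) = c + t(n-1) + t(n-2), t(1) = t(2) = 1."""
    t = [0, 1, 1]                        # t[0] is a placeholder so indices match n
    for n in range(3, n_max + 1):
        t.append(c + t[n - 1] + t[n - 2])
    return t[1:n_max + 1]


def fib_values(n_max):
    """Return [F_1, ..., F_n_max] with F_1 = F_2 = 1."""
    f = [0, 1, 1]
    for n in range(3, n_max + 1):
        f.append(f[n - 1] + f[n - 2])
    return f[1:n_max + 1]


for n, (t, f) in enumerate(zip(t_values(20), fib_values(20)), start=1):
    assert t == 2 * f - 1                      # exact solution when c = 1
    assert 2 ** ((n - 2) / 2) <= t <= 2 ** n   # exponential lower and upper bounds
```

Either way, the running time of RecFib grows exponentially in n, in contrast to the linear number of additions in the iterative version.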

Page 57: CSCI x490 Algorithms for Computational Biologycobweb.cs.uga.edu/~cai/courses/compbio/2016spring/... · CSCI x490 Algorithms for Computational Biology Lecture Note 1 (by Liming Cai)

Chapter 2. Algorithms and Complexity

How to count basic operations in recursive algorithms ?

RecFib(n)1. if n = 1 OR n = 2 return (1)2. else

return (RecFib(n− 1) + RecFib(n− 2))

Deriving and solving recurrences!

- Let t(n) be the time needed for computing RecFib(n)- then

t(n) = c+ t(n− 1) + t(n− 2)

t(n) = 1, when n = 1, 2

- solve it exactly, or

- estimate lower and upper bounds.

Chapter 2. Algorithms and Complexity

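t(n) = c + t(n − 1) + t(n − 2)

where t(n) = 1, when n = 1, 2

• Estimate the lower and upper bounds of t(n), based on the recursion tree structure:

lower bound: $t(n) \ge 2^{n/2} = (\sqrt{2})^n > 1.414^n$

upper bound: $t(n) \le 2^n$

So $1.414^n < t(n) < 2^n$

• Solve it exactly: assume $t(n) = \alpha^n$, we have

$\alpha^n = c + \alpha^{n-1} + \alpha^{n-2}$

Dropping the constant c (it does not change the growth rate) and dividing by $\alpha^{n-2}$:

$\alpha^2 = \alpha + 1$, i.e., $\alpha^2 - \alpha - 1 = 0$

$\alpha = (1 + \sqrt{5})/2 \approx 1.618$

So $t(n) \approx 1.618^n$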
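A quick numeric check of these estimates can be made by evaluating the recurrence bottom-up. This is only a sketch: the choice c = 1 and the names `t` and `GOLDEN_RATIO` are assumptions made for illustration, not values from the notes.

```python
import math


def t(n, c=1):
    """Evaluate t(n) = c + t(n-1) + t(n-2) with t(1) = t(2) = 1, bottom-up."""
    if n <= 2:
        return 1
    prev2, prev1 = 1, 1
    for _ in range(3, n + 1):
        prev2, prev1 = prev1, c + prev1 + prev2
    return prev1


GOLDEN_RATIO = (1 + math.sqrt(5)) / 2

if __name__ == "__main__":
    for n in (10, 20, 30, 40):
        ratio = t(n + 1) / t(n)                        # should approach 1.618...
        within = math.sqrt(2) ** n < t(n) < 2 ** n     # the two bounds from the tree
        print(n, t(n), round(ratio, 4), within)
```

The ratio of consecutive values settling near 1.618 is what the assumption t(n) = α^n predicts.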

Chapter 2. Algorithms and Complexity

Analysis of recursive algorithms (cont'd)

Another example: search a sorted list L (of indexes i, . . . , j) for a key

BinarySearch(L, i, j, key)
1. if i > j return (0) {base case: list is empty, not found}
2. else
3.   let m ← ⌈(i + j)/2⌉ {get midpoint index}
4.   if key = L_m return (m) {found}
5.   else
6.     if key < L_m return BinarySearch(L, i, m − 1, key) {search the left half list}
7.     else {key > L_m}
8.       return BinarySearch(L, m + 1, j, key) {search the right half list}

Can you write an iterative algorithm for binary search?
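One possible answer to the question above, as a hedged Python sketch rather than the notes' own solution. It departs from the pseudocode in three ways, all assumptions of this example: indexes are 0-based, the midpoint is rounded down, and "not found" is reported as -1 instead of 0. The name `binary_search_iterative` is illustrative.

```python
def binary_search_iterative(sorted_list, key):
    """Return an index of key in sorted_list, or -1 if key is absent."""
    low, high = 0, len(sorted_list) - 1
    while low <= high:                 # an empty range means "not found"
        mid = (low + high) // 2        # midpoint index, rounded down
        if sorted_list[mid] == key:
            return mid
        if key < sorted_list[mid]:
            high = mid - 1             # continue in the left half
        else:
            low = mid + 1              # continue in the right half
    return -1


if __name__ == "__main__":
    data = [2, 3, 5, 7, 11, 13]
    print(binary_search_iterative(data, 7))
    print(binary_search_iterative(data, 4))
```

The loop replaces the two recursive calls: each iteration shrinks the range [low, high] exactly as one recursive call would.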

Chapter 2. Algorithms and Complexity

Let t(n) be the total time for BinarySearch(L, i, j, key), where n = j − i + 1 is the length of the list to be searched.

$t(n) = c + t(\lfloor n/2 \rfloor)$, with base case $t(0) = c'$

then

$t(n) = c + t(\lfloor n/2 \rfloor)$

$t(n) = c + c + t(\lfloor n/2^2 \rfloor)$

$t(n) = c + c + c + t(\lfloor n/2^3 \rfloor)$

. . .

$t(n) = c + c + c + \cdots + c + t(\lfloor n/2^k \rfloor)$, where $n/2^k = 1$

$t(n) = c + c + c + \cdots + c + c + t(0) = k \times c + c + c'$, where $k = \log_2 n$.

So $t(n) = c(\log_2 n + 1) + c'$
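The count of c-terms in this unrolling can be checked mechanically. The sketch below is illustrative only (the name `halving_steps` is an assumption); it follows the recurrence t(n) = c + t(⌊n/2⌋) down to the base case and counts how many times c is added.

```python
def halving_steps(n):
    """Count the c-terms in t(n) = c + t(n // 2), stopping at the base case t(0)."""
    steps = 0
    while n > 0:
        steps += 1      # one constant-cost step for the current list
        n //= 2         # the recursive call works on floor(n / 2) elements
    return steps


if __name__ == "__main__":
    for n in (1, 8, 1000, 10 ** 6):
        print(n, halving_steps(n))
```

For n = 8 the count is 4, matching log2(8) + 1; for a general n it equals ⌊log2 n⌋ + 1, so the running time is logarithmic in the list length.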

Chapter 2. Algorithms and Complexity

Big-O notation for complexity

$O(n)$ includes $n$, $3n + 15$, $1000n$, $0.001n$, etc.

$O(n)$ also includes $\sqrt{n}$, $\log_2 n$, etc.

$O(n)$ does not include $n^2$, $n \log_2 n$, etc.

$O(n^2)$ includes $n^2$, $3n^2$, etc.

$O(n^2)$ also includes $n$, $\sqrt{n}$, etc.

$O(n^{100})$ does not include $2^n$, $n^n$, $n!$, etc.

polynomial time vs exponential time,

i.e., tractable problems vs intractable problems
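To make the last membership claim concrete, the sketch below searches for the point where 2^n overtakes n^100. It is an illustration under stated assumptions: the function name is hypothetical, the scan starts at n = 2 to skip the trivial case n = 1, and Python's exact integers are used so no rounding is involved.

```python
def first_n_where_exponential_wins(degree, start=2):
    """Smallest n >= start with 2**n > n**degree, found by linear scan with exact integers."""
    n = start
    while 2 ** n <= n ** degree:
        n += 1
    return n


if __name__ == "__main__":
    n = first_n_where_exponential_wins(100)
    print("2**n exceeds n**100 from n =", n)
```

Even against a polynomial of degree 100, the exponential takes the lead at an n of only about a thousand and stays ahead from then on, which is why no polynomial bound can contain 2^n.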

Chapter 2. Algorithms and Complexity

Algorithm design techniques (included in this class)

- exhaustive search (including branch-and-bound)
- greedy algorithms
- dynamic programming
- divide-and-conquer
- algorithms based on combinatorics, graph theory, etc.
- machine learning
- randomized algorithms

Part II. Fundamental Techniques

Chapter 4. Exhaustive Search

motif finding, median string problems

Chapter 5. Greedy Algorithms

genome rearrangement

Chapter 6. Dynamic Programming

sequence alignment, multiple alignment, gene finding

Chapter 4. Exhaustive Search

4.4 Regulatory motifs in DNA sequences

Sequence motifs regulate (turn on/off) gene expression

Example:

transcriptional binding sites TCGGGGATTCC

transcription factor: protein that binds to the site

allows RNA polymerase to transcribe downstream genes

called l-mers

Chapter 4. Exhaustive Search

Upstream sequences of genes

CGGGGCTATGCAACTGGGTCGTCACATTCCCCTTTCGATA

TTTGAGGGTGCCCAATAAATGCAACTCCAAAGCGGACAAA

GGATGCAACTGATGCCGTTTGACGACCTAAATCAACGGCC

AAGGATGCAACTCCAGGAGCGCCTTTGCTGGTTCTACCTG

AATTTTCTAAAAAGATTATAATGTCGGTCCATGCAACTTC

CTGCTGTACAACTGAGATCATGCTGCATGCAACTTTCAAC

TACATGATCTTTTGATGCAACTTGGATGATGAGGGAATGC

motifs are underscored.

CGGGGCTATGCAACTGGGTCGTCACATTCCCCTTTCGATA

TTTGAGGGTGCCCAATAAATGCAACTCCAAAGCGGACAAA

GGATGCAACTGATGCCGTTTGACGACCTAAATCAACGGCC

AAGGATGCAACTCCAGGAGCGCCTTTGCTGGTTCTACCTG

AATTTTCTAAAAAGATTATAATGTCGGTCCATGCAACTTC

CTGCTGTACAACTGAGATCATGCTGCATGCAACTTTCAAC

TACATGATCTTTTGATGCAACTTGGATGATGAGGGAATGC

underlines are removed.

All motifs are the same ATGCAACT.
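When every sequence carries an identical copy of the motif, an exhaustive scan over the l-mers of one sequence is enough to recover it. The following Python sketch illustrates that idea; the names `SEQUENCES` and `common_lmers` are assumptions of this example, and the motif length l = 8 is taken as known.

```python
# Upstream sequences copied from the slide above.
SEQUENCES = [
    "CGGGGCTATGCAACTGGGTCGTCACATTCCCCTTTCGATA",
    "TTTGAGGGTGCCCAATAAATGCAACTCCAAAGCGGACAAA",
    "GGATGCAACTGATGCCGTTTGACGACCTAAATCAACGGCC",
    "AAGGATGCAACTCCAGGAGCGCCTTTGCTGGTTCTACCTG",
    "AATTTTCTAAAAAGATTATAATGTCGGTCCATGCAACTTC",
    "CTGCTGTACAACTGAGATCATGCTGCATGCAACTTTCAAC",
    "TACATGATCTTTTGATGCAACTTGGATGATGAGGGAATGC",
]


def common_lmers(sequences, l):
    """Return the set of l-mers that occur, letter for letter, in every sequence."""
    first = sequences[0]
    # Every l-mer shared by all sequences must appear in the first one.
    candidates = {first[i:i + l] for i in range(len(first) - l + 1)}
    return {
        lmer for lmer in candidates
        if all(lmer in seq for seq in sequences[1:])
    }


if __name__ == "__main__":
    print(common_lmers(SEQUENCES, 8))
```

This only works because the copies are exact; it is the easy case that the mutated sequences on the following slide are meant to break.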

Chapter 4. Exhaustive Search

Two mutations in every motif (mutated positions are in lower case).

CGGGGCTATcCAgCTGGGTCGTCACATTCCCCTTTCGATA

TTTGAGGGTGCCCAATAAggGCAACTCCAAAGCGGACAAA

GGATGgAtCTGATGCCGTTTGACGACCTAAATCAACGGCC

AAGGAaGCAACcCCAGGAGCGCCTTTGCTGGTTCTACCTG

AATTTTCTAAAAAGATTATAATGTCGGTCCtTGgAACTTC

CTGCTGTACAACTGAGATCATGCTGCATGCAatTTTCAAC

TACATGATCTTTTGATGgcACTTGGATGATGAGGGAATGC

CGGGGCTATCCAGCTGGGTCGTCACATTCCCCTTTCGATA

TTTGAGGGTGCCCAATAAGGGCAACTCCAAAGCGGACAAA

GGATGGATCTGATGCCGTTTGACGACCTAAATCAACGGCC

AAGGAAGCAACCCCAGGAGCGCCTTTGCTGGTTCTACCTG

AATTTTCTAAAAAGATTATAATGTCGGTCCTTGGAACTTC

CTGCTGTACAACTGAGATCATGCTGCATGCAATTTTCAAC

TACATGATCTTTTGATGGCACTTGGATGATGAGGGAATGC

Underscores are removed.

How do we identify these motifs?

4.5 Profiles for motifs

CGGGGCTatccagctGGGTCGTCACATTCCCCTTTCGATA

TTTGAGGGTGCCCAATAAgggcaactCCAAAGCGGACAAA

GGatggatctGATGCCGTTTGACGACCTAAATCAACGGCC

AAGGaagcaaccCCAGGAGCGCCTTTGCTGGTTCTACCTG

CTAAAAGATTATAATGTCGGTCCttggaactTC

CTGTACATCATGCTGCatgccattTTCAAC

TACATGATCTTTTGatggcactTGGATGATGAGGGAATGC

Motifs are in lower case; they are aligned to build a profile.

Let s = (8, 19, 3, 5, 24, 17, 15) be the array of starting positions of the motifs in the sample sequences.
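The starting positions are all that is needed to cut the candidate motifs out of the sequences. The sketch below assumes 1-based positions, as on the slide; the helper name `extract_lmers` is hypothetical.

```python
# Sequences as printed on the slide (motifs in lower case).
SEQUENCES = [
    "CGGGGCTatccagctGGGTCGTCACATTCCCCTTTCGATA",
    "TTTGAGGGTGCCCAATAAgggcaactCCAAAGCGGACAAA",
    "GGatggatctGATGCCGTTTGACGACCTAAATCAACGGCC",
    "AAGGaagcaaccCCAGGAGCGCCTTTGCTGGTTCTACCTG",
    "CTAAAAGATTATAATGTCGGTCCttggaactTC",
    "CTGTACATCATGCTGCatgccattTTCAAC",
    "TACATGATCTTTTGatggcactTGGATGATGAGGGAATGC",
]
S = (8, 19, 3, 5, 24, 17, 15)  # 1-based starting positions, one per sequence
MOTIF_LENGTH = 8


def extract_lmers(sequences, starts, length):
    """Cut the l-mer that begins at the given 1-based start out of each sequence."""
    return [seq[p - 1:p - 1 + length] for seq, p in zip(sequences, starts)]


motifs = extract_lmers(SEQUENCES, S, MOTIF_LENGTH)
for motif in motifs:
    print(motif)
```

Stacking the extracted l-mers on top of each other gives the alignment from which a profile is counted.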

Then the profile P(s) is built from the aligned motifs:

CGGGGCTatccagctGGGTCGTCACATTCCCCTTTCGATA

TTTGAGGGTGCCCAATAAgggcaactCCAAAGCGGACAAA

GGatggatctGATGCCGTTTGACGACCTAAATCAACGGCC

AAGGaagcaaccCCAGGAGCGCCTTTGCTGGTTCTACCTG

CTAAAAGATTATAATGTCGGTCCttggaactTC

CTGTACATCATGCTGCatgccattTTCAAC

TACATGATCTTTTGatggcactTGGATGATGAGGGAATGC

Consensus of 7 motifs (of length 8)

nucleotide/position   1  2  3  4  5  6  7  8
A                     5  1  0  0  5  5  0  0
C                     0  0  1  4  2  0  6  1
G                     1  1  6  3  0  1  0  0
T                     1  5  0  0  0  1  1  6
Consensus             A  T  G  C  A  A  C  T
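The profile table can be reproduced by counting nucleotides column by column over the seven aligned motifs. This is a minimal sketch; the function names `build_profile` and `consensus` are illustrative, and ties in a column are broken in favor of the first nucleotide in the order A, C, G, T (an assumption, since the slide does not say how ties are handled).

```python
# The seven aligned motifs (lower-case segments of the slide), written in upper case.
MOTIFS = [
    "ATCCAGCT",
    "GGGCAACT",
    "ATGGATCT",
    "AAGCAACC",
    "TTGGAACT",
    "ATGCCATT",
    "ATGGCACT",
]


def build_profile(motifs):
    """Return {nucleotide: [count in column 1, count in column 2, ...]}."""
    length = len(motifs[0])
    profile = {nt: [0] * length for nt in "ACGT"}
    for motif in motifs:
        for j, nt in enumerate(motif):
            profile[nt][j] += 1
    return profile


def consensus(profile):
    """Pick a most frequent nucleotide in every column."""
    length = len(profile["A"])
    return "".join(
        max("ACGT", key=lambda nt: profile[nt][j]) for j in range(length)
    )


profile = build_profile(MOTIFS)
for nt in "ACGT":
    print(nt, *profile[nt])
print("Consensus", consensus(profile))
```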

4.6 The motif finding problem

Given the profile P(s):

nucleotide/position   1  2  3  4  5  6  7  8
A                     5  1  0  0  5  5  0  0
C                     0  0  1  4  2  0  6  1
G                     1  1  6  3  0  1  0  0
T                     1  5  0  0  0  1  1  6
Consensus             A  T  G  C  A  A  C  T

Let $M_{P(s)}(j)$ be the largest count in column j of P(s), e.g., $M_{P(s)}(1) = 5$.

Then the consensus score is $\mathrm{Score}(s, DNA) = \sum_{j=1}^{l} M_{P(s)}(j)$.

E.g., Score(s, DNA) = 5 + 5 + 6 + 4 + 5 + 5 + 6 + 6 = 42.

If the motif has length l and there are t sequences, then

- the best possible alignment has score $l \times t$ (all t motifs are identical);
- the worst possible alignment has score $lt/4$ (in every column the four nucleotides are equally frequent).
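The consensus score is just the sum of the column maxima of the profile. A small sketch, using the counts from the table above; the name `consensus_score` is a hypothetical helper.

```python
# Profile counts copied from the table: one list of 8 column counts per nucleotide.
PROFILE = {
    "A": [5, 1, 0, 0, 5, 5, 0, 0],
    "C": [0, 0, 1, 4, 2, 0, 6, 1],
    "G": [1, 1, 6, 3, 0, 1, 0, 0],
    "T": [1, 5, 0, 0, 0, 1, 1, 6],
}


def consensus_score(profile):
    """Sum, over all columns, of the largest count in that column."""
    columns = zip(*(profile[nt] for nt in "ACGT"))
    return sum(max(column) for column in columns)


l, t = 8, 7  # motif length and number of sequences in the example
score = consensus_score(PROFILE)
print(score, "lies between", l * t / 4, "and", l * t)
```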

Motif Finding Problem: Given a set of DNA sequences, find a set of l-mers, one from each sequence, that maximizes the consensus score.

Input: A t × n matrix of DNA sequences and l, the length of the pattern.

Output: An array of t starting positions $s = (s_1, s_2, \ldots, s_t)$ that maximizes Score(s, DNA).

A related problem is finding a median string.

Given two l-mers v and w, we can compute the Hamming distance $d_H(v, w)$ between v and w as the number of positions at which v and w differ.

E.g., $d_H(\mathrm{ATTGTC}, \mathrm{ACTCTC}) = 2$.

Abusing the notation a little, let $d_H(v, s_i)$ be the Hamming distance between v and the l-mer starting at position $s_i$ in the i-th sequence.

And define

$$d_H(v, s) = \sum_{i=1}^{t} d_H(v, s_i)$$

where $s = (s_1, s_2, \ldots, s_t)$.
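Both distances translate directly into code. The sketch below assumes 1-based starting positions as in the slides; the names `hamming` and `distance_to_positions` and the toy sequences are made up for illustration.

```python
def hamming(v, w):
    """Number of positions at which two equal-length strings differ."""
    if len(v) != len(w):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(a != b for a, b in zip(v, w))


def distance_to_positions(v, dna, s):
    """d_H(v, s): sum of Hamming distances between v and the l-mer that
    starts at 1-based position s_i in the i-th sequence."""
    l = len(v)
    return sum(hamming(v, seq[p - 1:p - 1 + l]) for seq, p in zip(dna, s))


print(hamming("ATTGTC", "ACTCTC"))
# Toy data: "ACG" matches TACGT at position 2 exactly and ACCAA at position 1 with one mismatch.
print(distance_to_positions("ACG", ["TACGT", "ACCAA"], (2, 1)))
```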

Define

$$\mathrm{TotalDistance}(v, DNA) = \min_{\text{all } s} d_H(v, s)$$

where the minimization is taken over all arrays s of starting positions.

A string v is a median string for the set of DNA sequences if TotalDistance(v, DNA) achieves the minimum over all l-mers v.

Median String Problem: Given a set of DNA sequences, find a median string.

Input: A t × n matrix of DNA sequences and the length l.

Output: A string v of length l that minimizes TotalDistance(v, DNA) over all strings of length l.
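A brute-force sketch of the Median String Problem follows. It relies on the observation that the starting positions in different sequences can be chosen independently, so the minimum over all s is the sum of the per-sequence minima. The function names and the three toy sequences are assumptions made for illustration only.

```python
from itertools import product


def hamming(v, w):
    """Number of positions at which two equal-length strings differ."""
    return sum(a != b for a, b in zip(v, w))


def total_distance(v, dna):
    """TotalDistance(v, DNA): for each sequence take the best-matching l-mer,
    then add the distances up."""
    l = len(v)
    return sum(
        min(hamming(v, seq[p:p + l]) for p in range(len(seq) - l + 1))
        for seq in dna
    )


def median_string(dna, l):
    """Try all 4^l strings of length l and keep one with the smallest total distance."""
    candidates = ("".join(letters) for letters in product("ACGT", repeat=l))
    return min(candidates, key=lambda v: total_distance(v, dna))


toy_dna = ["TTACGTT", "GGACGAA", "ACGCCCC"]  # ACG occurs in every sequence
print(median_string(toy_dna, 3), total_distance("ACG", toy_dna))
```

The running time grows as 4^l times the cost of one TotalDistance evaluation, so this is only practical for short patterns.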

The Median String Problem and the Motif Finding Problem are computationally equivalent!

Let s be the starting positions with consensus score Score(s, DNA), and let w be the consensus string of the corresponding profile. Then

$$d_H(w, s) = lt - \mathrm{Score}(s, DNA)$$

E.g., w = ATGCAACT, s = (8, 19, 3, 5, 24, 17, 15), l = 8, t = 7, Score(s, DNA) = 42. Indeed,

$$d_H(w, s) = 2 \times 7 = 14 = 7 \times 8 - 42$$
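The identity can be checked numerically on the running example. The sketch below works directly on the seven aligned motifs from the profile slide (written in upper case); the variable names are illustrative.

```python
MOTIFS = [
    "ATCCAGCT", "GGGCAACT", "ATGGATCT", "AAGCAACC",
    "TTGGAACT", "ATGCCATT", "ATGGCACT",
]
W = "ATGCAACT"  # consensus string of the profile
l, t = len(W), len(MOTIFS)

# d_H(w, s): mismatches between the consensus and each aligned motif, summed.
distance = sum(sum(a != b for a, b in zip(W, motif)) for motif in MOTIFS)

# Score(s, DNA): for each column, the count of its most frequent nucleotide.
score = sum(
    max(column.count(nt) for nt in "ACGT")
    for column in zip(*MOTIFS)
)

print(distance, "=", l * t, "-", score)
```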

But why?

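The consensus string w minimizes $d_H(v, s)$ over all choices of v, because it takes a most frequent nucleotide in every column, which is exactly what Score(s, DNA) counts:

$$d_H(w, s) = \min_{\text{all } v} d_H(v, s) = lt - \mathrm{Score}(s, DNA)$$

Taking the minimum over all s on both sides:

$$\min_{\text{all } s} \min_{\text{all } v} d_H(v, s) = lt - \max_{\text{all } s} \mathrm{Score}(s, DNA)$$

The left-hand side is the goal of the Median String Problem, and the right-hand side is the goal of the Motif Finding Problem.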
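One way to see why the identity holds is to count mismatches column by column for a fixed s. In the sketch below, $c_j(x)$ denotes the number of the t aligned l-mers that have nucleotide x in column j; this symbol is introduced here for illustration and does not appear in the lecture.

```latex
\begin{align*}
d_H(v, s) &= \sum_{j=1}^{l} \bigl(t - c_j(v_j)\bigr)
  && \text{column } j \text{ adds one mismatch per l-mer that differs from } v_j \\
&\ge \sum_{j=1}^{l} \bigl(t - M_{P(s)}(j)\bigr)
  && \text{since } c_j(v_j) \le \max_x c_j(x) = M_{P(s)}(j) \\
&= lt - \mathrm{Score}(s, DNA)
\end{align*}
```

Equality holds exactly when v picks a most frequent nucleotide in every column, that is, when v is a consensus string w.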

The two problems can also be solved using the same technique!

Exhaustive search for the Motif Finding Problem: consider all $(n - l + 1)^t$ arrays of starting positions s.

Exhaustive search for the Median String Problem: consider all $4^l$ l-mers.

The two searches are similar if we consider the 4 nucleotides to be numbers.

The main general issue is to consider all $k^L$ L-mers over a k-letter alphabet.

How do we enumerate them?
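For the Motif Finding Problem, the exhaustive search can be sketched by letting `itertools.product` enumerate all $(n - l + 1)^t$ arrays of starting positions. This is an illustrative sketch on made-up toy sequences, not the enumeration scheme developed in the lecture; the function names are hypothetical, and positions are 0-based internally and reported 1-based.

```python
from itertools import product


def score(s, dna, l):
    """Consensus score of the l-mers that start at the 0-based positions in s."""
    total = 0
    for j in range(l):
        column = [seq[p + j] for seq, p in zip(dna, s)]
        total += max(column.count(nt) for nt in "ACGT")
    return total


def brute_force_motif_search(dna, l):
    """Try every array of starting positions; return the best one (1-based) and its score."""
    n = len(dna[0])  # all rows of the t x n matrix have the same length
    best_s, best_score = None, -1
    for s in product(range(n - l + 1), repeat=len(dna)):
        current = score(s, dna, l)
        if current > best_score:
            best_s, best_score = s, current
    return tuple(p + 1 for p in best_s), best_score


toy_dna = ["TTACGTT", "GGACGAA", "ACGCCCC"]  # 5^3 = 125 candidate arrays for l = 3
best_s, best_score = brute_force_motif_search(toy_dna, 3)
print(best_s, best_score)
```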

4.7 Search trees

NextLeaf(a, L, k)
    for i ← L to 1
        if a_i < k
            a_i ← a_i + 1
            return a
        a_i ← 1
    return a

where a is an L-mer, an array of length L (indexed 1 to L); for each i, 1 ≤ i ≤ L, element a_i has a value ranging from 1 to k.

What does the function NextLeaf do?

NextLeaf(a, L, k) { with comments }
    for i ← L to 1 { from low to high digits }
        if a_i < k
            a_i ← a_i + 1 { increment the first digit that has not yet reached k }
            return a
        a_i ← 1 { the digit has reached k: set it back to 1 and carry to the next higher digit }
    return a

Search tree:

- each node can have k children;
- there are L levels of nodes;
- leaves are L-mers;
- NextLeaf is used to navigate from one leaf to the next one.
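NextLeaf behaves like adding 1 to an L-digit number whose digits run from 1 to k. A direct Python transcription might look as follows; it returns a new tuple instead of modifying the array in place, which is a design choice of this sketch rather than something the pseudocode prescribes.

```python
def next_leaf(a, k):
    """Return the L-mer that follows `a` (digits 1..k, 0-based indexing here).

    After the last leaf (k, ..., k) the result wraps around to (1, ..., 1).
    """
    a = list(a)
    for i in reversed(range(len(a))):  # from the lowest digit to the highest
        if a[i] < k:
            a[i] += 1                  # first digit that can still grow
            return tuple(a)
        a[i] = 1                       # digit overflows: reset it and carry
    return tuple(a)


print(next_leaf((1, 1), 2))     # next leaf in a binary tree of depth 2
print(next_leaf((1, 4, 4), 4))  # a carry through two digits
```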

Enumerate all L-mers for a k-letter alphabet

AllLeaves(L, k)
    a ← (1, ..., 1)
    continue ← TRUE
    while continue
        print a
        a ← NextLeaf(a, L, k)
        if a = (1, ..., 1)
            continue ← FALSE
    return

This only goes through the leaves, not the internal nodes.
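AllLeaves can be written as a Python generator that stops once the counter wraps around. The sketch repeats a `next_leaf` helper so that it is self-contained; mapping the digits 1 to 4 onto A, C, G, T is an assumption made here to show how the L-mers of the Median String search would be produced.

```python
def next_leaf(a, k):
    """Successor of the L-mer `a` with digits 1..k; wraps to (1, ..., 1) at the end."""
    a = list(a)
    for i in reversed(range(len(a))):
        if a[i] < k:
            a[i] += 1
            return tuple(a)
        a[i] = 1
    return tuple(a)


def all_leaves(L, k):
    """Yield every L-mer over the digits 1..k exactly once, in counting order."""
    first = (1,) * L
    a = first
    while True:
        yield a
        a = next_leaf(a, k)
        if a == first:  # wrapped around: every leaf has been visited
            return


# Interpret the digits 1..4 as nucleotides to list all DNA 3-mers.
lmers = ["".join("ACGT"[d - 1] for d in a) for a in all_leaves(3, 4)]
print(len(lmers), lmers[0], lmers[-1])
```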

How big is such a search tree?

- Number of leaves is $k^L$ for a k-letter alphabet.

- Number of internal nodes is $(k^L - 1)/(k - 1)$.

- Total number of nodes is $(k^{L+1} - 1)/(k - 1)$.
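The closed forms can be checked against a level-by-level count, since level i of the tree (root at level 0) holds k^i nodes. A small sketch, with the hypothetical helper `tree_sizes`, assuming k ≥ 2:

```python
def tree_sizes(L, k):
    """Count leaves, internal nodes and all nodes of the complete k-ary tree of depth L."""
    leaves = k ** L
    internal = sum(k ** level for level in range(L))  # levels 0 .. L-1
    return leaves, internal, leaves + internal


L, k = 8, 4  # DNA 8-mers
leaves, internal, total = tree_sizes(L, k)
print(leaves, internal, total)
```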

An alternative search program (going through internal nodes):

NextVertex(a, i, L, k)1. if i < L { not yet at the bottom level, go one level }2. ai+1 ← 1 { deeper, follow the leftmost branch }3. return (a, i+ 1)4. else { do as NextLeaf }5. for j ← L to 1 { when this starts, j = L, bottom level }6. if aj < k { when j 6= L, it is not at bottom level}7. aj ← aj + 1 { but an internal node}8. return(a, j)9. return (a, 0)

Why go through internal nodes?

To prune tree branches (avoiding unnecessary enumeration) and save time!
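
A minimal Python sketch of NextVertex follows. It keeps the slide's 1-based values (entries in 1..k) but stores them in a 0-based list, so `a[i]` plays the role of a_{i+1}. The helper `preorder` is an addition for illustration only; it collects the prefixes in the order in which they are visited.

```python
def next_vertex(a, i, L, k):
    """Return the vertex that follows prefix a[:i] in a preorder walk.

    a is a list of length L with entries in 1..k and is modified in place.
    A returned depth of 0 means the walk is finished.
    """
    if i < L:
        a[i] = 1                      # go one level deeper, leftmost branch
        return a, i + 1
    for j in range(L, 0, -1):         # at a leaf: climb until a sibling exists
        if a[j - 1] < k:
            a[j - 1] += 1
            return a, j
    return a, 0


def preorder(L, k):
    """List every visited prefix, starting from the vertex (1) at depth 1."""
    a, i = [1] * L, 1
    visited = []
    while i > 0:
        visited.append(tuple(a[:i]))
        a, i = next_vertex(a, i, L, k)
    return visited


print(preorder(2, 2))
```

Every non-root vertex is visited exactly once, internal vertices before the leaves below them, which is what makes it possible to decide at an internal vertex whether its subtree is worth entering.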

Chapter 4. Exhaustive Search

The method of branch-and-bound:

While traversing a search tree, it is possible to skip a whole subtree rooted at a certain vertex.

How?

At each vertex, we calculate a bound: the most optimistic score of any leaf within its subtree (which will be discussed later).

If even this bound cannot beat the best score found so far, we use the following function to skip the subtree:

ByPass(a, i, L, k)
1. for j ← i to 1
2.     if a_j < k
3.         a_j ← a_j + 1
4.         return (a, j)
5. return (a, 0)
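
ByPass translates to Python almost line by line. The sketch below uses the same conventions as the slide (values in 1..k, depth 0 meaning "done") with a 0-based list; the parameter L is unused and kept only so the signature matches the pseudocode.

```python
def bypass(a, i, L, k):
    """Skip the subtree under prefix a[:i]: move to the next sibling at depth i,
    or, if there is none, to the next sibling of the nearest ancestor that has one."""
    for j in range(i, 0, -1):
        if a[j - 1] < k:
            a[j - 1] += 1
            return a, j
    return a, 0                       # no vertex left: the traversal ends


print(bypass([1, 1, 1], 1, 3, 2))     # skips everything below the prefix (1)
print(bypass([1, 2, 1], 2, 3, 2))     # (1, 2) has no right sibling, so climb to depth 1
```

Compared with NextVertex, the only difference is that the loop starts at the current depth i instead of descending first, so none of the vertices below a[:i] are ever generated.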

Chapter 4. Exhaustive Search

4.8 Algorithms for Finding Motifs

First brute force algorithm for motif finding:

BruteForceMotifSearch(DNA, t, n, l)
1. bestScore ← 0
2. for each s = (s_1, ..., s_t) from (1, ..., 1) to (n − l + 1, ..., n − l + 1)
3.     if Score(s, DNA) > bestScore
4.         bestScore ← Score(s, DNA)
5.         bestMotif ← s
6. return bestMotif

Line 2 enumerates all tuples from (1, ..., 1) to (n − l + 1, ..., n − l + 1).
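
The brute-force search can be sketched in Python as below. Two assumptions are made that this slide does not spell out: Score(s, DNA) is taken to be the consensus score (for each of the l columns of the alignment, the count of the most frequent letter, summed over columns), and all sequences have the same length n. Start positions are 0-based here, and `itertools.product` stands in for the enumeration in line 2.

```python
from collections import Counter
from itertools import product


def score(s, dna, l):
    """Consensus score of the l-mers starting at s[r] in sequence r (0-based)."""
    rows = [seq[start:start + l] for seq, start in zip(dna, s)]
    return sum(max(Counter(column).values()) for column in zip(*rows))


def brute_force_motif_search(dna, l):
    n = len(dna[0])                              # assumes equal-length sequences
    best_score, best_motif = 0, None
    for s in product(range(n - l + 1), repeat=len(dna)):
        current = score(s, dna, l)
        if current > best_score:
            best_score, best_motif = current, s
    return best_motif, best_score


dna = ["ACGTA", "TTACG", "GACGT"]                # toy input, made up for this example
print(brute_force_motif_search(dna, 3))
```

In the toy input the 3-mer ACG occurs in all three sequences, so the best alignment is perfect and its score is t × l.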

Chapter 4. Exhaustive Search

Using subroutine NextLeaf to enumerate tuples:

BruteForceMotifSearchAgain(DNA, t, n, l)
 1. s ← (1, ..., 1)
 2. bestScore ← Score(s, DNA)
 3. bestMotif ← s
 4. while forever
 5.     s ← NextLeaf(s, t, n − l + 1)
 6.     if Score(s, DNA) > bestScore
 7.         bestScore ← Score(s, DNA)
 8.         bestMotif ← (s_1, ..., s_t)
 9.     if s = (1, ..., 1)
10.         return bestMotif

There are $(n - l + 1)^t$ such tuples;

Computing Score(s, DNA) takes $O(l \times t)$ steps;

So the complexity is $O(l\,t\,(n - l + 1)^t)$.
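
NextLeaf itself is not shown on these slides. The sketch below assumes the usual "odometer" behaviour implied by the loop's stopping test: the last coordinate is incremented, a coordinate that overflows is reset to 1, and the tuple wraps around to (1, ..., 1) after (k, ..., k). The helper `count_leaves` is an addition that confirms the loop visits exactly k^L tuples, with k = n − l + 1 and L = t in the motif search.

```python
def next_leaf(a, L, k):
    """Advance a (entries in 1..k) to the next leaf; wraps to all ones at the end."""
    for j in range(L, 0, -1):
        if a[j - 1] < k:
            a[j - 1] += 1
            return a
        a[j - 1] = 1                  # overflow: reset and carry to the left
    return a


def count_leaves(L, k):
    """Count the tuples visited by the 'while forever' loop."""
    a = [1] * L
    count = 1                         # the starting tuple (1, ..., 1)
    while True:
        a = next_leaf(a, L, k)
        if a == [1] * L:              # wrapped around: every leaf has been seen
            return count
        count += 1


print(count_leaves(3, 4) == 4 ** 3)
```

Multiplying this count by the O(l × t) cost of one Score evaluation gives the bound stated above.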

Chapter 4. Exhaustive Search

Using subroutine NextVertex:

SimpleMotifSearch(DNA, t, n, l)
 1. s ← (1, ..., 1)
 2. bestScore ← 0
 3. i ← 1
 4. while i > 0
 5.     if i < t
 6.         (s, i) ← NextVertex(s, i, t, n − l + 1)
 7.     else
 8.         if Score(s, DNA) > bestScore
 9.             bestScore ← Score(s, DNA)
10.             bestMotif ← s
11.         (s, i) ← NextVertex(s, i, t, n − l + 1)
12. return bestMotif

Still without branch-and-bound heuristics

Chapter 4. Exhaustive Search

With a branch-and-bound heuristic:

BranchAndBoundMotifSearch(DNA, t, n, l)
 1. s ← (1, ..., 1)
 2. bestScore ← 0
 3. i ← 1
 4. while i > 0
 5.     if i < t
 6.         optimisticScore ← Score(s, i, DNA) + (t − i) · l
 7.         if optimisticScore < bestScore
 8.             (s, i) ← ByPass(s, i, t, n − l + 1)
 9.         else
10.             (s, i) ← NextVertex(s, i, t, n − l + 1)
11.     else
12.         if Score(s, DNA) > bestScore
13.             bestScore ← Score(s, DNA)
14.             bestMotif ← s
15.         (s, i) ← NextVertex(s, i, t, n − l + 1)
16. return bestMotif
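
A runnable sketch of the branch-and-bound search follows. As an assumption, Score(s, i, DNA) is taken to be the consensus score of the first i chosen l-mers; the remaining t − i rows can add at most l in total per row, which gives the optimistic bound in line 6. Start positions are 1-based as on the slide, and all sequences are assumed to have the same length.

```python
from collections import Counter


def next_vertex(a, i, L, k):
    """Preorder successor of prefix a[:i]; depth 0 signals the end of the walk."""
    if i < L:
        a[i] = 1
        return a, i + 1
    for j in range(L, 0, -1):
        if a[j - 1] < k:
            a[j - 1] += 1
            return a, j
    return a, 0


def bypass(a, i, L, k):
    """Skip the subtree under prefix a[:i]."""
    for j in range(i, 0, -1):
        if a[j - 1] < k:
            a[j - 1] += 1
            return a, j
    return a, 0


def score(s, i, dna, l):
    """Consensus score of the l-mers chosen in the first i sequences."""
    rows = [dna[r][s[r] - 1:s[r] - 1 + l] for r in range(i)]
    return sum(max(Counter(column).values()) for column in zip(*rows))


def branch_and_bound_motif_search(dna, l):
    t, n = len(dna), len(dna[0])
    k = n - l + 1
    s, i = [1] * t, 1
    best_score, best_motif = 0, None
    while i > 0:
        if i < t:
            optimistic = score(s, i, dna, l) + (t - i) * l
            if optimistic < best_score:
                s, i = bypass(s, i, t, k)        # no leaf below can win
            else:
                s, i = next_vertex(s, i, t, k)
        else:
            current = score(s, t, dna, l)
            if current > best_score:
                best_score, best_motif = current, tuple(s)  # copy: s is mutated
            s, i = next_vertex(s, i, t, k)
    return best_motif, best_score


print(branch_and_bound_motif_search(["ACGTA", "TTACG", "GACGT"], 3))
```

Because the bound never underestimates what a subtree can reach, pruning does not change the answer; it only reduces the number of vertices visited, and the worst case remains exponential in t.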

Chapter 4. Exhaustive Search

4.9 Finding a median string

BruteForceMedianSearch(DNA, t, n, l)
1. bestWord ← AAA...AAA
2. bestDistance ← ∞
3. for each l-mer word from AAA...AAA to TTT...TTT
4.     if TotalDistance(word, DNA) < bestDistance
5.         bestDistance ← TotalDistance(word, DNA)
6.         bestWord ← word
7. return bestWord
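
The median-string search can be sketched as follows. TotalDistance is not defined on this slide; the code assumes the usual definition, namely the sum over all sequences of the smallest Hamming distance between the word and any window of the same length in that sequence. The helper names are illustrative.

```python
from itertools import product


def hamming(u, v):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(u, v))


def total_distance(word, dna):
    """Sum over sequences of the best (smallest) Hamming distance to any window."""
    l = len(word)
    return sum(
        min(hamming(word, seq[p:p + l]) for p in range(len(seq) - l + 1))
        for seq in dna
    )


def brute_force_median_search(dna, l):
    best_word, best_distance = None, float("inf")
    for letters in product("ACGT", repeat=l):    # AAA...A up to TTT...T
        word = "".join(letters)
        distance = total_distance(word, dna)
        if distance < best_distance:
            best_distance, best_word = distance, word
    return best_word, best_distance


print(brute_force_median_search(["ACGTA", "TTACG", "GACGT"], 3))
```

The loop runs over 4^l words rather than (n − l + 1)^t position tuples, which is usually far fewer when l is small and t is large.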

Chapter 4. Exhaustive Search

Using subroutine NextVertex:

SimpleMedianSearch(DNA, t, n, l)
 1. s ← (1, ..., 1)
 2. bestDistance ← ∞
 3. i ← 1
 4. while i > 0
 5.     if i < l
 6.         (s, i) ← NextVertex(s, i, l, 4)
 7.     else
 8.         word ← nucleotide string from (s_1, ..., s_l)
 9.         if TotalDistance(word, DNA) < bestDistance
10.             bestDistance ← TotalDistance(word, DNA)
11.             bestWord ← word
12.         (s, i) ← NextVertex(s, i, l, 4)
13. return bestWord

Chapter 4. Exhaustive Search

With a branch-and-bound strategy:

BranchAndBoundMedianSearch(DNA, t, n, l)
 1. s ← (1, ..., 1)
 2. bestDistance ← ∞
 3. i ← 1
 4. while i > 0
 5.     if i < l
 6.         prefix ← nucleotide string from (s_1, ..., s_i)
 7.         optimisticDistance ← TotalDistance(prefix, DNA)
 8.         if optimisticDistance > bestDistance
 9.             (s, i) ← ByPass(s, i, l, 4)
10.         else
11.             (s, i) ← NextVertex(s, i, l, 4)
12.     else
13.         word ← nucleotide string from (s_1, ..., s_l)
14.         if TotalDistance(word, DNA) < bestDistance
15.             bestDistance ← TotalDistance(word, DNA)
16.             bestWord ← word
17.         (s, i) ← NextVertex(s, i, l, 4)
18. return bestWord
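
Below is a hedged Python sketch of the same procedure. It assumes the standard definition of TotalDistance (sum over sequences of the minimum Hamming distance to any window) and maps the values 1..4 to the letters A, C, G, T. The bound is valid under that definition because extending a prefix can never decrease its total distance.

```python
def next_vertex(a, i, L, k):
    """Preorder successor of prefix a[:i]; depth 0 signals the end of the walk."""
    if i < L:
        a[i] = 1
        return a, i + 1
    for j in range(L, 0, -1):
        if a[j - 1] < k:
            a[j - 1] += 1
            return a, j
    return a, 0


def bypass(a, i, L, k):
    """Skip the subtree under prefix a[:i]."""
    for j in range(i, 0, -1):
        if a[j - 1] < k:
            a[j - 1] += 1
            return a, j
    return a, 0


def total_distance(word, dna):
    """Sum over sequences of the smallest Hamming distance to any window."""
    l = len(word)
    return sum(
        min(sum(x != y for x, y in zip(word, seq[p:p + l]))
            for p in range(len(seq) - l + 1))
        for seq in dna
    )


def branch_and_bound_median_search(dna, l):
    s, i = [1] * l, 1
    best_word, best_distance = None, float("inf")
    while i > 0:
        prefix = "".join("ACGT"[x - 1] for x in s[:i])
        distance = total_distance(prefix, dna)
        if i < l:
            if distance > best_distance:
                s, i = bypass(s, i, l, 4)        # every extension is at least as far
            else:
                s, i = next_vertex(s, i, l, 4)
        else:
            if distance < best_distance:
                best_distance, best_word = distance, prefix
            s, i = next_vertex(s, i, l, 4)
    return best_word, best_distance


print(branch_and_bound_median_search(["ACGTA", "TTACG", "GACGT"], 3))
```

Once a word at distance 0 has been found, every prefix with a positive distance is bypassed, so on easy inputs most of the 4^l leaves are never generated.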

Chapter 4. Exhaustive Search

How much time is needed to compute

TotalDistance(word, DNA)?

and

TotalDistance(prefix, DNA)?
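
One possible answer, assuming TotalDistance is computed naively: for each of the t sequences, the word is compared against every window of its own length and the smallest Hamming distance is kept. The symbol $d_H$ for Hamming distance and the window notation are introduced here for illustration and are not taken from the slides.

```latex
\begin{align*}
\mathrm{TotalDistance}(w, \mathrm{DNA})
  &= \sum_{r=1}^{t} \; \min_{1 \le p \le n-|w|+1}
     d_H\bigl(w,\ \mathrm{DNA}_r[p \,..\, p+|w|-1]\bigr) \\
\text{cost for a full word } (|w| = l):
  &\quad t \cdot (n-l+1) \cdot l = O(l\,n\,t) \\
\text{cost for a prefix } (|w| = i < l):
  &\quad t \cdot (n-i+1) \cdot i = O(i\,n\,t)
\end{align*}
```

Under this assumption a prefix is cheaper to evaluate than a full word, so the bound test in the branch-and-bound search costs no more than scoring a leaf.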

Chapter 4. Exhaustive Search

4.912 Profile-based Motif Search

Extending the motif finding question:

Once a motif profile is established, how do we find a motif in a new DNA sequence that fits the profile “well”?

I.e., consider the following problem:
INPUT: a DNA sequence D and a motif profile P;
OUTPUT: a position s in D such that Score(s, P) is optimal,

where Score(s, P) is computed by comparing the motif starting at position s against the profile P.

Chapter 4. Exhaustive Search

Two components are needed for the profile-based motif search:

(1) scanning algorithm

enumerating all positions on the DNA sequence

(2) scoring method

computing the score Score(s, P): how?

Chapter 4. Exhaustive Search

nucleotide/position    1 2 3 4 5 6 7 8

A                      5 1 0 0 5 5 0 0
C                      0 0 1 4 2 0 6 1
G                      1 1 6 3 0 1 0 0
T                      1 5 0 0 0 1 1 6

Consensus              A T G C A A C T

motif at position s:   G T G G A A C T

                       * + + * + + + +

One method is to use the Hamming distance between the motif and the consensus (* marks a mismatch, + a match):

Score(s, P ) = 2

disadvantage?
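
The Hamming-based scoring and the scan over all positions can be sketched together. The profile counts are the ones in the table above; the function names and the test sequence are made up for illustration, and positions are 0-based.

```python
PROFILE = {
    "A": [5, 1, 0, 0, 5, 5, 0, 0],
    "C": [0, 0, 1, 4, 2, 0, 6, 1],
    "G": [1, 1, 6, 3, 0, 1, 0, 0],
    "T": [1, 5, 0, 0, 0, 1, 1, 6],
}


def consensus(profile):
    """Most frequent nucleotide in each column of the profile."""
    width = len(profile["A"])
    return "".join(max("ACGT", key=lambda x: profile[x][i]) for i in range(width))


def hamming_score(window, profile):
    """Number of mismatches between the window and the consensus (lower is better)."""
    return sum(x != y for x, y in zip(window, consensus(profile)))


def best_position(seq, profile):
    """Scan every start position and return the one with the fewest mismatches."""
    width = len(profile["A"])
    return min(range(len(seq) - width + 1),
               key=lambda s: hamming_score(seq[s:s + width], profile))


print(consensus(PROFILE))
print(hamming_score("GTGGAACT", PROFILE))
print(best_position("CCGTGGAACTTT", PROFILE))
```

One possible disadvantage shows up in column 4: G was seen 3 times out of 7 and A never, yet both count as exactly one mismatch against the consensus C, so the score ignores most of the information in the profile.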

Chapter 4. Exhaustive Search

(2) Statistical method

The profile gives the probability $p_i(x)$ of nucleotide $x$ in the $i$th column, for $x \in \{A, C, G, T\}$ and $i = 1, 2, \ldots, 8$.

$\mathrm{Score}(s, P) = p_1(G) \times p_2(T) \times \cdots \times p_8(T)$

But one question remains:

how high must the probability be for a motif to be considered acceptable?
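
As a worked instance, the product can be evaluated for the motif G T G G A A C T using the counts from the profile table. The sketch assumes that $p_i(x)$ is simply the count of x in column i divided by the column total (7 here), with no pseudocounts; exact fractions are used so the arithmetic is easy to check by hand.

```python
from fractions import Fraction

PROFILE = {
    "A": [5, 1, 0, 0, 5, 5, 0, 0],
    "C": [0, 0, 1, 4, 2, 0, 6, 1],
    "G": [1, 1, 6, 3, 0, 1, 0, 0],
    "T": [1, 5, 0, 0, 0, 1, 1, 6],
}


def profile_probability(window, profile):
    """Product over columns of count(x, i) / column total, as an exact fraction."""
    total = sum(profile[x][0] for x in "ACGT")   # number of sequences in the profile
    probability = Fraction(1)
    for i, x in enumerate(window):
        probability *= Fraction(profile[x][i], total)
    return probability


motif = profile_probability("GTGGAACT", PROFILE)
best = profile_probability("ATGCAACT", PROFILE)   # the consensus string
print(motif, float(motif))
print(best, float(best))
```

Even the consensus string gets a probability below 0.1 here, which illustrates why the question of an acceptance threshold is not trivial: the absolute value shrinks with motif length, so a comparison against some reference (for example the consensus or a background model) may be more informative than the raw product.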
