

Abstract

Many research groups have described genetic and protein networks as networks of Boolean variables and have provided procedures for reverse engineering them. Recently, Laubenbacher and colleagues proposed using finite fields to represent genes and proteins in biological networks. We develop a procedure for error correction of microarray data based on majority logic decoding, and two procedures for reverse engineering finite field networks: one produces functions over a single uniform finite field, and the other produces functions over different finite sets. We demonstrate the utility of finite fields by applying these techniques to a data set developed at the University of Puerto Rico from rats trained in a memory task called conditioned taste aversion. We introduce the definition of dynamical systems over different finite sets and present applications to biological networks, such as regulatory and protein networks. In our model, variables range over sets with different numbers of elements. This work provides a mathematical model for the biological methods developed by Thomas and colleagues. The importance of our model is that it gives a formalism that allows us to solve the reverse engineering problem.

María Alicia Aviñó, UPR-Cayey

Edusmildo Orozco, UPR-Mayagüez

Humberto Ortiz-Zuazuaga, HPCf, UPR-Rio Piedras

Dynamical Systems of Protein and Gene Networks

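A dynamical system over different finite sets is a function $F$ acting over a finite number of variables $\{x_1, \ldots, x_n\}$, where each variable has a different number of possibilities, that is, $x_i \in X_i = \{0, 1, \ldots, |X_i| - 1\}$.

Example

$$X_1 = \{0, 1\}, \quad X_2 = \{0, 1, 2\}, \quad X_3 = \{0, 1, 2, 3\},$$

$$F : X_1 \times X_2 \times X_3 \to X_1 \times X_2 \times X_3, \qquad F(x_1, x_2, x_3) = (f_1, f_2, f_3).$$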
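As a minimal sketch of such a system, the Python below uses the value sets {0, 1}, {0, 1, 2} and {0, 1, 2, 3} from the example. The coordinate rules inside `F` are hypothetical and chosen only so that each output lands in the right set; they are not taken from the poster.

```python
from itertools import product

# Value sets from the example: each variable has its own number of states.
X1, X2, X3 = range(2), range(3), range(4)

def F(state):
    """Hypothetical update rule F = (f1, f2, f3); an illustration only."""
    x1, x2, x3 = state
    f1 = (x1 + x2) % 2          # lands in X1 = {0, 1}
    f2 = (x2 + x3) % 3          # lands in X2 = {0, 1, 2}
    f3 = (x3 + x1 + 1) % 4      # lands in X3 = {0, 1, 2, 3}
    return (f1, f2, f3)

def trajectory(state, steps):
    """Iterate F from `state` and return the visited states in order."""
    path = [state]
    for _ in range(steps):
        state = F(state)
        path.append(state)
    return path

# The full state space has 2 * 3 * 4 = 24 states.
states = list(product(X1, X2, X3))
```

Calling `trajectory((0, 1, 0), 2)` walks two steps of the system from the state (0, 1, 0). Because the state space is finite, every trajectory eventually enters a cycle.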

We prove that every function acting on a regulatory network can be represented as a polynomial function, in one variable or in several variables, over a finite field.

Sandra Peña de Ortiz, UPR-Rio Piedras

Dorothy Bollman, UPR-Mayagüez

Oscar Moreno, UPR-Rio Piedras

Acknowledgements

M. Aviñó and O. Moreno are supported by CISE MII award EIA-0080926 from the National Science Foundation. H. Ortiz is supported in part by the Puerto Rico BRIN Award P20 RR16470 from the National Center for Research Resources, National Institutes of Health. S. Peña is supported by an Institutional Development Award (IDeA) Program, P20RR15565, from the National Center for Research Resources, National Institutes of Health. E. Orozco and D. Bollman were partially supported by NIH/NIGMS grant No. S06GM08103 and by PRECISE grant NSF #99-77071.

Reverse Engineering as Polynomial Interpolation

Moreno et al. [2, 3] propose solving the reverse engineering problem by treating each tuple of the time series as an element of a finite field. This framework finds an interpolating polynomial in a single variable whose coefficients come from the underlying finite field. There is an algebraic procedure for passing from the one-variable case to the equivalent multivariate version proposed by Laubenbacher [4].

R. Laubenbacher [4, 5] and our group propose to reverse engineer protein networks using a polynomial approach.

This mathematical model can be expressed more simply as

$$f(x) = P(x) + g(x),$$

where $P(x)$ is the univariate polynomial interpolating the points from the time series and $g(x)$ is a polynomial that vanishes at all interpolating values.
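A short worked step shows why adding $g$ does not disturb the interpolation, reading the model as the sum $f(x) = P(x) + g(x)$. The symbols $x_1, \ldots, x_n$ (the distinct interpolation values) and $h(x)$ (an arbitrary polynomial) are introduced here only for illustration. Over a field, a univariate polynomial vanishes at all the $x_i$ exactly when it is a multiple of $\prod_i (x - x_i)$, so:

```latex
g(x) = h(x)\prod_{i=1}^{n}(x - x_i)
\quad\Longrightarrow\quad
f(x_i) = P(x_i) + g(x_i) = P(x_i) + 0 = P(x_i),
\qquad i = 1, \dots, n.
```

Every choice of $h$ therefore gives a model that fits the same data, and $P$ is the one of lowest degree.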

$P(x)$ can be computed using Lagrange interpolation. The computational cost of this is $O(n^2)$.


Example 2

Given the sequence of points $r_0 = (0, 1, 0)$, $r_1 = (0, 2, 1)$, $r_2 = (0, 2, 2)$, $r_3 = (2, 0, 0)$ from $GF(3^3)$, find a dynamic network $f$ that satisfies $f(r_0) = r_1$, $f(r_1) = r_2$, $f(r_2) = r_3$.

This is equivalent to: find a polynomial $f(x)$ in $K[x]$, where $K = GF(3^3)$, that interpolates the three pairs of points $(r_0, r_1)$, $(r_1, r_2)$, $(r_2, r_3)$.
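The interpolation in Example 2 can be sketched in Python as follows. This is an illustration under two assumptions the poster does not state: a tuple `(a0, a1, a2)` is read as a0 + a1·t + a2·t², and GF(3^3) is built with the irreducible modulus t³ + 2t + 1. With a different representation the coefficients of the result would differ, so the sketch only checks that the interpolation conditions hold.

```python
P = 3                                  # characteristic of the field
ZERO, ONE = (0, 0, 0), (1, 0, 0)

def add(a, b):
    return tuple((x + y) % P for x, y in zip(a, b))

def neg(a):
    return tuple((-x) % P for x in a)

def mul(a, b):
    """Multiply in GF(27), reducing with t^3 = t + 2 (assumed modulus)."""
    c = [0] * 5
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] += x * y
    for d in (4, 3):                   # t^d = t^(d-2) + 2*t^(d-3)
        c[d - 2] += c[d]
        c[d - 3] += 2 * c[d]
        c[d] = 0
    return tuple(x % P for x in c[:3])

def inv(a):
    """Inverse via a^25, since every nonzero element satisfies a^26 = 1."""
    if a == ZERO:
        raise ZeroDivisionError("zero has no inverse")
    result = ONE
    for _ in range(25):
        result = mul(result, a)
    return result

def poly_add(f, g):
    n = max(len(f), len(g))
    f = f + [ZERO] * (n - len(f))
    g = g + [ZERO] * (n - len(g))
    return [add(a, b) for a, b in zip(f, g)]

def poly_mul(f, g):
    out = [ZERO] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = add(out[i + j], mul(a, b))
    return out

def poly_eval(f, x):
    acc = ZERO
    for coeff in reversed(f):          # Horner's rule, highest degree first
        acc = add(mul(acc, x), coeff)
    return acc

def lagrange(points):
    """Coefficients (lowest degree first) of the Lagrange interpolant."""
    result = [ZERO]
    for i, (xi, yi) in enumerate(points):
        term = [yi]
        for j, (xj, _) in enumerate(points):
            if j != i:
                scale = inv(add(xi, neg(xj)))
                # multiply by (x - xj) / (xi - xj)
                term = poly_mul(term, [mul(neg(xj), scale), scale])
        result = poly_add(result, term)
    return result

r0, r1, r2, r3 = (0, 1, 0), (0, 2, 1), (0, 2, 2), (2, 0, 0)
f = lagrange([(r0, r1), (r1, r2), (r2, r3)])
```

Three data pairs give a polynomial of degree at most 2, so `f` holds three coefficients, each an element of GF(27).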

Figure 1. A finite dynamical system over $GF(3^3)$ that interpolates the points $(\alpha^5, \alpha^2)$, $(\alpha^2, \alpha^6)$, $(\alpha^6, \alpha^{23})$, $(\alpha^{23}, \alpha^{10})$.

Using Lagrange interpolation we found the polynomial shown at the bottom of the figure.

Brown circles represent the original sequence of points $r_0$, $r_1$, $r_2$, and $r_3$.

[Equation: the coordinate functions $f_1$, $f_2$, $f_3$ of $f = (f_1, f_2, f_3)$, each a polynomial in $x_1$, $x_2$, $x_3$; the individual terms are not recoverable from the transcript.]

This is the multivariate equivalent form of the polynomial

$$f(x) = \alpha^{11}x^2 + \alpha^{18}x + \alpha^{11}.$$

Here $\alpha$ is a primitive element of the finite field; in this case $\alpha = (1, 0, 1)$.

Chinese Remaindering

Chinese remaindering is used to solve a system of congruence equations modulo a set of pairwise relatively prime polynomials.

Fast Chinese Remaindering Interpolating Polynomial

When the modular polynomials are of the form $x - x_i$, the Chinese remainder theorem yields a fast polynomial interpolation algorithm [1]. The complexity of this interpolation algorithm is $O(n \log^2 n)$.
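The link between remaindering and interpolation can be spelled out in one line. The symbols $y_i$ (the prescribed values) and $h_i(x)$ (the quotient) are introduced here for illustration. Dividing $P(x)$ by the linear modulus $x - x_i$ leaves the constant remainder $P(x_i)$, so each congruence is an interpolation condition:

```latex
P(x) = (x - x_i)\,h_i(x) + P(x_i)
\quad\Longrightarrow\quad
\bigl( P(x) \equiv y_i \pmod{x - x_i} \bigr)
\iff P(x_i) = y_i .
```

The moduli $x - x_i$ are pairwise relatively prime exactly when the $x_i$ are distinct, which is the hypothesis the Chinese remainder theorem needs.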

This algorithm is mainly based on the computation of two sets of intermediate polynomials q(x) and S(x) which are recursive in nature as shown in the following figures.

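[Figures: recursion trees. $q_{i,j}$ is built from $q_{i,j-1}$ and $q_{i+2^{j-1},\,j-1}$; $S_{i,j}$ is built from $S_{i,j-1}$ and $S_{i+2^{j-1},\,j-1}$ together with $q_{i,j-1}$ and $q_{i+2^{j-1},\,j-1}$.]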

Parallel Algorithm

1. Compute $q_{i,j}$ in parallel

2. Compute $d_i$ in parallel

3. Compute $S_{i,0}$ in parallel

4. Compute $S_{i,j}$:

4.1 for ( j = 1; j <= t; j++ )

4.2 for ( i = 0; i < k; i += 2^j ) in parallel

4.3 $S_{i,j} = S_{i,j-1} * q_{i+2^{j-1},\,j-1} + S_{i+2^{j-1},\,j-1} * q_{i,\,j-1}$
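The four steps can be simulated sequentially as in the sketch below. It rests on several assumptions that are not in the poster: arithmetic is done modulo a small prime (`P = 7`) rather than in GF(3^3), k is a power of two with k = 2^t, step 4.3 is read as S[i,j] = S[i,j-1]·q[i+2^(j-1),j-1] + S[i+2^(j-1),j-1]·q[i,j-1], q[i,0] is taken to be x - x_i, and d_i is taken to be the product of (x_i - x_m) over m ≠ i. Each inner loop runs one `i` at a time where the algorithm would assign one processor per `i`.

```python
P = 7  # hypothetical prime modulus, chosen only to keep the sketch short

def poly_add(f, g):
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))
    g = g + [0] * (n - len(g))
    return [(a + b) % P for a, b in zip(f, g)]

def poly_mul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % P
    return out

def poly_eval(f, x):
    acc = 0
    for coeff in reversed(f):
        acc = (acc * x + coeff) % P
    return acc

def fast_interpolate(xs, ys):
    """Interpolate through (xs[i], ys[i]); len(xs) must be a power of two."""
    k = len(xs)
    t = k.bit_length() - 1
    # Step 1: q[j][i] is the product of (x - x_m) for m = i .. i + 2^j - 1.
    q = [{i: [(-xs[i]) % P, 1] for i in range(k)}]
    for j in range(1, t):
        half = 2 ** (j - 1)
        q.append({i: poly_mul(q[j - 1][i], q[j - 1][i + half])
                  for i in range(0, k, 2 ** j)})
    # Step 2: d_i is the product of (x_i - x_m) over m != i.
    d = []
    for i in range(k):
        prod = 1
        for m in range(k):
            if m != i:
                prod = prod * (xs[i] - xs[m]) % P
        d.append(prod)
    # Step 3: S[i] at level 0 is the constant y_i / d_i (needs Python 3.8+).
    S = {i: [ys[i] * pow(d[i], -1, P) % P] for i in range(k)}
    # Step 4: merge neighbouring blocks level by level.
    for j in range(1, t + 1):
        half = 2 ** (j - 1)
        S = {i: poly_add(poly_mul(S[i], q[j - 1][i + half]),
                         poly_mul(S[i + half], q[j - 1][i]))
             for i in range(0, k, 2 ** j)}
    return S[0]

xs, ys = [0, 1, 2, 3], [1, 3, 4, 6]
coeffs = fast_interpolate(xs, ys)
```

The sketch uses schoolbook polynomial multiplication and computes each `d[i]` directly, so it does not reach the $O(n \log^2 n)$ bound. It only shows how the blocks combine.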

[Figure: processors active at each step of the parallel combination. Step 0: p0 to p7. Step 1: p0, p2, p4, p6. Step 2: p0, p4. Step 3: p0.]